Tuesday, 6 December 2011


What is reliability?
Reliability is synonymous with consistency. It is the degree to which test scores for an individual test taker or group of test takers are consistent over repeated applications.
Methods to Determine Reliability of Instrument

·         Equivalency
The extent to which two forms of a test measure identical concepts at the same level of difficulty. Equivalency reliability is determined by correlating the two sets of test scores to establish the degree of relationship or association.
·         Internal
Internal consistency is the extent to which the items within a test or procedure assess the same characteristic, skill or quality. It indicates how precisely the items of a single instrument measure one underlying construct.
·         Interrater
Interrater reliability is the extent to which two or more individuals (coders or raters) agree. Interrater reliability addresses the consistency of the implementation of a rating system.
·         Test-retest
The same test is repeated on the same group of test takers on two different occasions. Results are compared and correlated with the initial test to give a measure of stability. This method examines performance over time.
·         Split half method
The test is divided into two halves that are scored separately; the scores on one half are then correlated with the scores on the other half to estimate the reliability of the whole test.
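The split-half procedure above can be sketched numerically. This is a minimal illustration, not a standard implementation: the item scores and function names are invented for the example, and the half-test correlation is stepped up to full test length with the Spearman-Brown formula 2r / (1 + r).

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """Score the odd- and even-numbered items separately, correlate
    the two half-test scores, then step the correlation up to full
    test length with the Spearman-Brown formula 2r / (1 + r)."""
    odd = [sum(person[0::2]) for person in item_scores]
    even = [sum(person[1::2]) for person in item_scores]
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)

# Invented data: one row per test taker, one 0/1 column per item.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(split_half_reliability(scores))  # ~0.96
```

Splitting by odd and even items (rather than first half versus second half) is the usual choice, because it keeps the two halves comparable in difficulty and unaffected by fatigue toward the end of the test.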

Factors Affecting Reliability

·         Administration factor
Instructions accompanying the test may contain errors, which create a type of systematic error. These errors can exist either in the instructions provided to the test taker or in those given to the psychologist conducting the test.
·         Question construction
If test questions are difficult, confusing or ambiguous, reliability is negatively affected. Some people read the question to mean one thing, whereas others read the same question to mean something else.
·         Scoring errors
Reliable tests have an accurate method of scoring and interpreting the results. All tests come with a set of scoring instructions; errors in these instructions, such as ones that lead to unsupported conclusions, reduce the reliability of the test.
·         Test-Taker Factors
Factors related to the test-taker, such as poor sleep, illness, anxiety or stress, can introduce error into the test scores.
·         Heterogeneity of the items
The greater the heterogeneity of the items, that is, the differences in the kind or difficulty of the questions, the lower the internal consistency; a test made up of homogeneous items has a greater chance of high reliability.
·         Heterogeneity of the group members
The greater the heterogeneity of the group members in the preferences, skills or behaviours being tested, the wider the spread of scores and the greater the chance for high reliability.
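Several of the factors above, in particular the homogeneity of the items, are reflected in Cronbach's alpha, a standard index of internal consistency: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The sketch below uses invented 0/1 item scores; the perfectly ordered response pattern is artificial.

```python
def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def cronbach_alpha(item_scores):
    """alpha = k/(k-1) * (1 - sum(item variances) / variance(totals))."""
    k = len(item_scores[0])
    item_vars = sum(variance(col) for col in zip(*item_scores))
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_vars / total_var)

# Invented 0/1 item scores: rows are test takers, columns are items.
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(cronbach_alpha(scores))  # ~0.8
```

When the items all tap the same construct, people who score high on one item tend to score high on the others, so the variance of the total scores grows relative to the summed item variances and alpha rises toward 1.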

Relationship between validity and reliability
·         A test cannot be considered valid unless the measurements resulting from it are reliable.
·         Results from a test can be reliable without necessarily being valid.

Sunday, 4 December 2011


What is validity?
Validity is the extent to which a test measures what it claims to measure. It is vital for a test to be valid in order for the results to be accurately applied and interpreted (Kendra Cherry, 2011).
Validity also refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests (AERA/APA/NCME, 1999).
Validity isn’t determined by a single statistic, but by a body of research that demonstrates the relationship between the test and the behavior it is intended to measure. There are three types of validity:
·         Content validity
Definition: The extent to which the content of the test matches the instructional objectives; in other words, how well the test represents the entire range of possible items it should cover.
Example: A semester or quarter exam that only includes content covered during the last six weeks is not a valid measure of the course's overall objectives; it has very low content validity.
·         Criterion-related validity
Definition: The extent to which scores on the test agree with another criterion (concurrent validity), or predict an external criterion (predictive validity).
Example: If the SPM trial math tests in Form 5 correlate highly with the actual SPM math tests, they have high concurrent validity. Typical criteria are success in school or success in a class.
·         Construct validity
Definition: The extent to which an assessment corresponds to other variables, as predicted by some rationale or theory; it demonstrates an association between the test scores and a theoretical trait.
Example: Intelligence tests are one example of measurement instruments that should have construct validity. The MUET test is another, because it comprises reading, writing, listening and speaking components, covering the full scope of mastering a language.
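Concurrent and predictive validity are usually reported as a validity coefficient, i.e. the correlation between test scores and the criterion measure. A minimal sketch, with invented trial-exam and actual-exam marks (the numbers are illustrative only, not real SPM data):

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x))
                  * sqrt(sum((b - my) ** 2 for b in y)))

# Invented marks for five students: trial-exam scores versus
# actual-exam scores.  A coefficient near 1 indicates high
# concurrent validity of the trial exam.
trial = [50, 60, 70, 80, 90]
actual = [55, 62, 68, 84, 95]
print(round(pearson(trial, actual), 3))  # ~0.984
```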

Factors that influence validity

·         Reliability
A valid test is always reliable; to be valid, it must first be reliable.
·         Nature of the group
The validity coefficient may not be consistent across subgroups that differ in characteristics such as age, gender or educational level.
·         Sample heterogeneity
A wider range of scores results in a higher validity coefficient; restricting the range lowers it (the range-restriction phenomenon).
·         Criterion-predictor relationship
There must be a linear relationship between the predictor and the criterion.
·         Criterion contamination
The criterion measure may be influenced by the predictor or by other irrelevant factors; identify the contaminating influence and correct for it statistically, for example by partial correlation.
·         Moderator variables
Variables such as age, gender or personality characteristics may help to predict performance for particular subgroups only.
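The partial-correlation correction mentioned under criterion contamination can be computed directly from the three pairwise correlations. A minimal sketch; the coefficient values below are made up for illustration:

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of predictor x and criterion y with the
    contaminating variable z partialled out:
    r_xy.z = (r_xy - r_xz*r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))"""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Made-up coefficients: the raw predictor-criterion correlation is .60,
# but a contaminating variable z correlates .50 with both measures.
print(round(partial_correlation(0.60, 0.50, 0.50), 3))  # 0.467
```

Here the validity coefficient drops from .60 to about .47 once the shared influence of z is removed, showing how much of the apparent validity was due to contamination.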