validity
Validity
Validity is a judgment of how well a test measures what it purports to
measure in a particular context.
Validation is the process of gathering and evaluating
evidence about validity. Both the test developer and
the test user may play a role in the validation of a
test for a specific purpose.
Local validation studies are absolutely necessary
when the test user plans to alter in some way the
format, instructions, language, or content of the test.
Validity evidence is commonly grouped into three categories:
1. content validity
2. criterion-related validity
3. construct validity
Face Validity
Face validity relates more to what a test appears to measure to the
person being tested than to what the test actually
measures. It is a judgment concerning how relevant the test items
appear to be.
If a test definitely appears to measure what it
purports to measure “on the face of it,” then it could
be said to be high in face validity.
A test’s lack of face validity could contribute to a lack of
confidence in the perceived effectiveness of the
test, with a consequential decrease in the testtaker’s
cooperation or motivation to do his or her best.
Content Validity
Content validity describes a judgment of how adequately a test
samples behavior representative of the universe of
behavior that the test was designed to sample.
For example, the universe of behavior referred to as
assertive is very wide-ranging. A content-valid,
paper-and-pencil test of assertiveness would be one
that is adequately representative of this wide range.
With respect to educational achievement tests, it is
customary to consider a test a content-valid measure
when the proportion of material covered by the test
approximates the proportion of material covered in
the course.
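The proportion rule above can be checked with a few lines of code. The following is a minimal sketch, not a standard procedure: the function name `coverage_gaps`, the topic names, and all numbers are hypothetical, and it assumes course coverage can be expressed as a weight per topic (here, hours of instruction).

```python
def coverage_gaps(course_weight, test_items):
    """Return, per topic, (share of test items) - (share of course material).

    Values near 0 mean the test covers a topic in roughly the same
    proportion as the course does.
    """
    total_course = sum(course_weight.values())
    total_items = sum(test_items.values())
    if total_course <= 0 or total_items <= 0:
        raise ValueError("both the course and the test need nonzero coverage")
    topics = sorted(set(course_weight) | set(test_items))
    return {
        topic: test_items.get(topic, 0) / total_items
        - course_weight.get(topic, 0) / total_course
        for topic in topics
    }


# Hypothetical history course: hours of instruction vs. exam items per unit.
course_hours = {"ancient": 10, "medieval": 10, "modern": 20}
exam_items = {"ancient": 5, "medieval": 5, "modern": 30}
gaps = coverage_gaps(course_hours, exam_items)
for topic, gap in gaps.items():
    print(f"{topic}: {gap:+.3f}")
```

A positive gap marks a topic that is over-represented on the test relative to the course, and a negative gap marks one that is under-represented. How large a gap is tolerable remains a judgment call.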
For an employment test to be content-valid, its
content must be a representative sample of the job-related
skills required for employment.
A test blueprint emerges for the “structure” of the
evaluation; that is, a plan regarding the types of
information to be covered by the items, the number
of items tapping each area of coverage, the
organization of the items in the test, and so forth.
The quantification of content validity
One approach, developed by C. H. Lawshe, is essentially a method
for gauging agreement among raters or judges
regarding how essential a particular item is. Each rater
answers the following question for each item:
“Is the skill or knowledge measured by this item
■ essential
■ useful but not essential
■ not necessary
to the performance of the job?”
According to Lawshe, if more than half the panelists
indicate that an item is essential, that item has at
least some content validity. Greater levels of content
validity exist as larger numbers of panelists agree
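that a particular item is essential. Using these
assumptions, Lawshe developed a formula termed
the content validity ratio (CVR):
CVR = (n_e - N/2) / (N/2)
where n_e is the number of panelists indicating “essential”
and N is the total number of panelists.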
In validating a test, the content validity ratio is
calculated for each item. Lawshe recommended that
if the amount of agreement observed is more than
5% likely to occur by chance, then the item should be
eliminated. The minimal CVR values corresponding
to this 5% level depend on the number of panelists.
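The per-item calculation can be sketched in code. This is an illustrative sketch only: it assumes Lawshe's ratio in its usual form, (number rating the item essential minus half the panel) divided by half the panel. The function names and the sample panel are hypothetical. It does not encode the minimum CVR cutoffs, which must be looked up for the panel size in use.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: (n_e - N/2) / (N/2), ranging from -1 to +1."""
    if n_panelists <= 0:
        raise ValueError("panel must have at least one member")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must be between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


def cvr_from_ratings(ratings):
    """Compute the CVR for one item from a list of panelist ratings."""
    n_essential = sum(1 for rating in ratings if rating == "essential")
    return content_validity_ratio(n_essential, len(ratings))


# Hypothetical panel of 10 judges rating a single item.
panel = ["essential"] * 8 + ["useful but not essential", "not necessary"]
item_cvr = cvr_from_ratings(panel)  # (8 - 5) / 5 = 0.6
print(item_cvr)
```

The ratio is negative when fewer than half the panelists rate the item essential, zero when exactly half do, and positive when more than half do. This matches the "more than half" rule stated above.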
Culture and the relativity of
content validity
Tests are often thought of as either valid or not valid. A
history test, for example, either does or does not
accurately measure one’s knowledge of historical fact.
However, it is also true that what constitutes historical
fact depends to some extent on who is writing the history.
For example:
Gavrilo Princip was
a. a poet
b. a hero
c. a terrorist
d. a nationalist
e. all of the above
Criterion-Related Validity
Criterion-related validity is a judgment of how adequately a test score can be
used to infer an individual’s most probable standing
on some measure of interest, the measure of interest
being the criterion.
Concurrent validity is an index of the degree to
which a test score is related to some criterion
measure obtained at the same time (concurrently).
Predictive validity is an index of the degree to which
a test score predicts some criterion measure.
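The text above does not name a statistic for these indexes. One common choice, assumed here for illustration, is the Pearson correlation between test scores and criterion scores. In the sketch below, the function `pearson_r` and all data are hypothetical. The same calculation would serve for concurrent evidence (criterion gathered at the same time) or predictive evidence (criterion gathered later); only the timing of the criterion measure differs.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: admission test scores and a later criterion (first-year GPA).
test_scores = [52, 61, 70, 75, 88]
criterion = [2.1, 2.8, 3.0, 3.4, 3.9]
validity_coefficient = pearson_r(test_scores, criterion)
print(round(validity_coefficient, 3))
```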
What Is a Criterion?
There are no hard-and-fast rules for what constitutes
a criterion. It can be a test score, a specific behavior
or group of behaviors, an amount of time, a rating, a
psychiatric diagnosis, a training cost, an index of
absenteeism, an index of alcohol intoxication, and so
on.