how well a test measures what it purports to measure in a particular context.

Validation is the process of gathering and evaluating evidence about validity. Both the test developer and the test user may play a role in the validation of a test for a specific purpose.

Local validation studies are absolutely necessary when the test user plans to alter in some way the format, instructions, language, or content of the test.

Validity is traditionally conceptualized in three categories:

1. content validity
2. criterion-related validity
3. construct validity

Face Validity

relates more to what a test appears to measure to the person being tested than to what the test actually measures.

is a judgment concerning how relevant the test items appear to be.

If a test definitely appears to measure what it purports to measure “on the face of it,” then it could be said to be high in face validity.

A test’s lack of face validity could contribute to a lack of confidence in the perceived effectiveness of the test, with a consequential decrease in the testtaker’s cooperation or motivation to do his or her best.

Content Validity

describes a judgment of how adequately a test samples behavior representative of the universe of behavior that the test was designed to sample.

For example, the universe of behavior referred to as assertive is very wide-ranging. A content-valid, paper-and-pencil test of assertiveness would be one that is adequately representative of this wide range.

With respect to educational achievement tests, it is customary to consider a test a content-valid measure when the proportion of material covered by the test approximates the proportion of material covered in the course.
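
The proportional-coverage idea for achievement tests can be checked with simple arithmetic. The following is a minimal sketch, not a standard procedure: the function name `coverage_gaps`, the topic names, and all numbers are hypothetical, and it assumes course coverage is expressed as hours of instruction per topic and test coverage as number of items per topic.

```python
def coverage_gaps(course_hours, test_items):
    """Return, per topic, (share of test items) minus (share of course time).

    A value near 0 means the test covers the topic in roughly the same
    proportion as the course does. Positive means over-represented on the
    test; negative means under-represented.
    """
    total_hours = sum(course_hours.values())
    total_items = sum(test_items.values())
    if total_hours <= 0 or total_items <= 0:
        raise ValueError("course and test must each have some coverage")

    gaps = {}
    # Consider every topic that appears in either the course or the test.
    for topic in set(course_hours) | set(test_items):
        course_share = course_hours.get(topic, 0) / total_hours
        test_share = test_items.get(topic, 0) / total_items
        gaps[topic] = test_share - course_share
    return gaps


# Hypothetical course: hours of instruction per topic.
course_hours = {"fractions": 10, "decimals": 6, "geometry": 4}
# Hypothetical test: number of items per topic.
test_items = {"fractions": 24, "decimals": 12, "geometry": 4}

gaps = coverage_gaps(course_hours, test_items)
for topic, gap in sorted(gaps.items()):
    print(f"{topic}: {gap:+.2f}")
```

In this made-up example, fractions take up half of the course but 60% of the test, so the test over-samples that topic and under-samples geometry. How large a gap is tolerable remains a judgment call for the test developer.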

For an employment test to be content-valid, its content must be a representative sample of the job-related skills required for employment.

A test blueprint emerges for the “structure” of the evaluation; that is, a plan regarding the types of information to be covered by the items, the number of items tapping each area of coverage, the organization of the items in the test, and so forth.

The quantification of content validity

One method of measuring content validity, developed by C. H. Lawshe, is essentially a method for gauging agreement among raters or judges regarding how essential a particular item is. Each rater answers the following question for each item:

“Is the skill or knowledge measured by this item
■ essential
■ useful but not essential
■ not necessary
to the performance of the job?”

According to Lawshe, if more than half the panelists indicate that an item is essential, that item has at least some content validity. Greater levels of content validity exist as larger numbers of panelists agree that a particular item is essential. Using these assumptions, Lawshe developed a formula termed the content validity ratio (CVR).
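
The formula itself is not reproduced in these notes. The form usually attributed to Lawshe is CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating the item "essential" and N is the total number of panelists. The sketch below assumes that form; the function and parameter names are illustrative only.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Compute a content validity ratio for one item.

    Assumes the commonly cited form CVR = (n_e - N/2) / (N/2):
      n_essential  -- panelists who rated the item "essential" (n_e)
      n_panelists  -- total number of panelists (N)
    The result ranges from -1 (nobody says essential) to +1 (everybody does);
    it is 0 when exactly half the panel says essential.
    """
    if n_panelists <= 0:
        raise ValueError("n_panelists must be positive")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must be between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 raters.
print(content_validity_ratio(9, 10))   # most raters say "essential"
print(content_validity_ratio(5, 10))   # exactly half say "essential"
print(content_validity_ratio(2, 10))   # few say "essential"
```

Under this form the ratio is positive only when more than half the panelists rate the item essential, which matches the "more than half" threshold described above.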

In validating a test, the content validity ratio is calculated for each item. Lawshe recommended that if the amount of agreement observed is more than 5% likely to occur by chance, then the item should be eliminated. The minimal CVR values corresponding to this 5% level

Culture and the relativity of content validity

Tests are often thought of as either valid or not valid. A history test, for example, either does or does not accurately measure one’s knowledge of historical fact. However, it is also true that what constitutes historical fact depends to some extent on who is writing the history. For example:

Gavrilo Princip was
a. a poet
b. a hero
c. a terrorist
d. a nationalist
e. all of the above

Criterion-Related Validity

is a judgment of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest, the measure of interest being the criterion.

Concurrent validity is an index of the degree to which a test score is related to some criterion measure obtained at the same time (concurrently).

Predictive validity is an index of the degree to which a test score predicts some criterion measure.
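
These notes do not say how such an index is computed. A common choice is a correlation between test scores and criterion scores, often called a validity coefficient. The sketch below assumes a Pearson correlation; the helper `pearson_r` and all data are hypothetical and purely illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between paired lists x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and have at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: admission test scores and a criterion measure
# (for example, grade point average) for five people.
test_scores = [10, 12, 15, 18, 20]
criterion = [2.1, 2.4, 3.0, 3.2, 3.9]

r = pearson_r(test_scores, criterion)
print(f"validity coefficient (illustrative): {r:.2f}")
```

The computation is the same for concurrent and predictive validity. What differs is when the criterion is collected: at about the same time as the test scores for concurrent validity, or at some later time for predictive validity.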

What Is a Criterion?

There are no hard-and-fast rules for what constitutes a criterion. It can be a test score, a specific behavior or group of behaviors, an amount of time, a rating, a psychiatric diagnosis, a training cost, an index of absenteeism, an index of alcohol intoxication, and so on.

