25.11.2014 Views

Developmental psychology.pdf


354 Individual Differences

Test items also can be developed on the basis of theory. Tests of personality and intelligence are commonly constructed in this way, for the concepts being measured are broad and difficult to define. Inkblots and other ambiguous figures, for example, are used to assess the so-called depth factors in personality.

Figure 13.7

Assigning Partial Credit. In answering Figure 13.3, a subject might arrange the pictures as follows: F, C, A, E, B, D. Thus, picture D is incorrectly placed at the end of the sequence instead of the beginning. Some subjects explain that the man's dinner has been stolen and now he wants to leave the restaurant before losing anything else. They describe him as taking his hat and coat off the hook rather than placing them there before eating. This line of reasoning greatly lessens the joke and makes a less concise story, but it is not clearly contradicted by evidence in the drawing. Hence, this sequence might receive partial credit, or a change could be made in the test materials (Drawing by Alain, © 1943, 1971 The New Yorker Magazine, Inc.).

Answers to Figure 13.6

1. Alexander Fleming

2. Approximately 3,000 miles

3. _ 4. 5. 6.

7. Weight B 8. Equal

9. 10.

Refining the Test

The first draft of any test usually contains more items than are intended for the final version. The reason is that some test procedures and items will prove to be defective. They must be revised or discarded, and these decisions are made on the basis of some early trials.

Pretesting the Procedures. A small group of subjects is used for the first trial. It is called a pretest because the purpose is simply to try out the test, determining how it is understood and how it functions. The subjects are selected for convenience and willingness to cooperate, but representativeness must be considered, too. If the sample is not reasonably close to the population for which the test is intended, the value of the pretest is greatly diminished.

This administration of the test follows specific procedures, and there are detailed guidelines for recording and scoring each response. One purpose of the pretest is to check the clarity of the instructions, test items, and scoring procedures (Figure 13.7).

Conducting an Item Analysis. Another purpose of pretesting, which would require an appropriate sample of pilot trainees in this instance, is to examine the value of the test items. This procedure is called item analysis, for the aim is to discover which items are satisfactory and which items are not serving the purposes of the test. Then the unsatisfactory test items are improved or discarded.

Basically, item analysis involves a study of how often each item is answered correctly and by whom. For selection purposes, an item that everyone passes or everyone fails is useless. It does not indicate which candidates are more promising than others.
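The "how often" half of item analysis can be sketched in a few lines of Python. This is only an illustration: the function names and the small pretest matrix are hypothetical, and responses are assumed to be scored simply as 1 (correct) or 0 (incorrect).

```python
def item_difficulty(responses):
    """Proportion of subjects answering each item correctly.

    responses: one row per subject, one 0/1 entry per item.
    """
    n_subjects = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_subjects
            for i in range(n_items)]


def uninformative_items(responses):
    """Indices of items that everyone passes or everyone fails."""
    return [i for i, p in enumerate(item_difficulty(responses))
            if p in (0.0, 1.0)]


# Hypothetical pretest: 4 subjects x 3 items (1 = correct, 0 = incorrect).
pretest = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]

print(item_difficulty(pretest))      # [1.0, 0.5, 0.0]
print(uninformative_items(pretest))  # [0, 2]
```

In this made-up sample the first item is passed by everyone and the third by no one, so neither separates one candidate from another; only the second item, passed by half the subjects, remains a candidate for further study.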

Instead, an item has greatest potential when approximately half of the subjects answer correctly. But which half? When the subjects answering correctly are those with the greatest flying aptitude, and those answering incorrectly have the least aptitude, then the item has considerable value. The ideal test item elicits one type of response from candidates with much ability and a different response from those with little ability. In other words, it has high power for discriminating among potentially good and poor pilots.

Suppose one of the items on mechanical comprehension is answered correctly by about half the subjects. Is it a valuable item? Not necessarily. For each item, those subjects answering correctly and those answering incorrectly must be compared on some external criterion, such as past successes in similar training programs. If the most successful candidates answer correctly and those with the least ability answer incorrectly, then the item has discriminated successfully (Figure 13.8).
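One simple way to make this comparison concrete is to take the mean criterion score of the subjects who passed an item and subtract the mean of those who failed it. The sketch below assumes a numeric external criterion (here, hypothetical ratings from an earlier training program); the function name `criterion_gap` and all data are invented for illustration, and other statistics could serve the same purpose.

```python
from statistics import mean


def criterion_gap(item_correct, criterion):
    """Mean criterion score of passers minus mean criterion score of failers.

    item_correct: 0/1 per subject for a single item.
    criterion:    external criterion score per subject, in the same order.
    Returns None when everyone passed or everyone failed the item.
    """
    passed = [c for ok, c in zip(item_correct, criterion) if ok]
    failed = [c for ok, c in zip(item_correct, criterion) if not ok]
    if not passed or not failed:
        return None
    return mean(passed) - mean(failed)


# Hypothetical ratings from a previous training program (higher = better).
criterion = [9, 8, 7, 3, 2, 1]

# Both items are answered correctly by exactly half of the six subjects.
item_a = [1, 1, 1, 0, 0, 0]  # passed by the three highest-rated subjects
item_b = [1, 0, 1, 0, 1, 0]  # passed by a mixture of high and low subjects

print(criterion_gap(item_a, criterion))  # 6
print(criterion_gap(item_b, criterion))  # 2
```

Both hypothetical items have the same pass rate, yet item A separates the highly rated subjects from the poorly rated ones far more sharply than item B, which is the distinction the passage draws.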

Sometimes no external criteria are available, for there has been no previous assessment of the subjects. Then the only recourse for evaluating a test item is to use the total test score as the criterion of flying ability. The premise here is that the test as a whole, even in an early version, is a better measure of flying ability than any one of its items alone. Thus the subjects are considered to be high or low in flying ability according to their total scores on the test. Any item answered correctly by those in the high group and incorrectly by those in the low group is considered to be an effective item, insofar as it can be assessed by this less desirable procedure.
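The total-score procedure can be sketched as follows. The text does not say how the high and low groups are formed, so this example assumes a simple split into top and bottom halves by total score (with an odd number of subjects the middle one is left out); the function name and data are hypothetical, and at least two subjects are required.

```python
def discrimination_by_total_score(responses):
    """For each item: proportion correct in the high group minus the low group.

    responses: one row per subject, one 0/1 entry per item.
    Groups are the top and bottom halves ranked by total test score
    (an assumption made for this sketch).
    """
    totals = [sum(row) for row in responses]
    # Subject indices ordered from highest to lowest total score.
    order = sorted(range(len(responses)), key=lambda s: totals[s], reverse=True)
    half = len(order) // 2
    high, low = order[:half], order[-half:]
    indices = []
    for i in range(len(responses[0])):
        p_high = sum(responses[s][i] for s in high) / half
        p_low = sum(responses[s][i] for s in low) / half
        indices.append(p_high - p_low)
    return indices


# Hypothetical pretest: 6 subjects x 4 items (1 = correct, 0 = incorrect).
pretest_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

for item, d in enumerate(discrimination_by_total_score(pretest_scores)):
    print(f"item {item}: {d:+.2f}")
```

A value near +1 means the item is passed by the high group and failed by the low group; a value near 0 means it does not separate them. Because each item also contributes to the total score it is judged against, the index is somewhat inflated, which is one reason the procedure is less desirable than an external criterion.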

When all of the test items have been retained, revised, or discarded, as required by the item analysis, and the procedures for test administration changed accordingly, a final version of the test is ready. Actually, the test is never in final form, for it can always be improved. But eventually a point is reached at which the test must be evaluated and used.
