

discriminability and efficiency of testing. Aiken (2003: 51) suggests that computer adaptive testing can reduce the number of test items presented to around 50 per cent of those used in conventional tests. Testees can work at their own pace, they need not be discouraged but can be challenged, the test is scored instantly to provide feedback to the testee, a greater range of items can be included in the test, and a greater degree of precision and reliability of measurement can be achieved; indeed, test security can be increased and the problem of understanding answer sheets is avoided.

Clearly the use of computer adaptive testing has several putative attractions. On the other hand, it requires different skills from traditional tests, and these might compromise the reliability of the test, for example:

 The mental processes required to work with a computer screen and computer program differ from those required for a pen and paper test.
 Motivation and anxiety levels increase or decrease when testees work with computers.
 The physical environment might make a significant difference, e.g. lighting, glare from the screen, noise from machines, and loading and running the software.
 Reliability shifts from an index of the variability of the test to an index of the standard error of the testee's performance. The usual formula for calculating standard error assumes that error variance is the same for all scores, whereas in item response theory it is assumed that error variance depends on each testee's ability. The conventional statistic of error variance calculates a single average variance of summed scores; in item response theory this is at best very crude, and at worst misleading, as variation is a function of ability rather than of test variation and cannot fairly be summed (see Thissen (1990) for an analysis of how to address this issue). A sketch contrasting the two calculations follows this list.
 Having so many test items increases the chance of inclusion of poor items.
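To make the contrast concrete, the sketch below compares the classical standard error of measurement, which is a single figure applied to every score, with the conditional standard error used in item response theory, which varies with the testee's ability. It assumes a two-parameter logistic (2PL) model, and the reliability, score spread and item parameters are invented purely for illustration; only the standard textbook formulae are taken as given, not anything specific from Aiken or Thissen.

```python
import numpy as np

# Classical test theory: one standard error of measurement (SEM) for all scores,
# SEM = sd_x * sqrt(1 - reliability), assuming equal error variance everywhere.
sd_scores = 12.0      # illustrative standard deviation of summed test scores
reliability = 0.88    # illustrative reliability coefficient
classical_sem = sd_scores * np.sqrt(1 - reliability)

# Item response theory (2PL): error variance depends on ability theta.
# Item information I_i(theta) = a_i^2 * P_i(theta) * (1 - P_i(theta)),
# test information is the sum over items, and SE(theta) = 1 / sqrt(information).
a = np.array([1.2, 0.8, 1.5, 1.0, 0.9])    # invented discrimination parameters
b = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])  # invented difficulty parameters

def conditional_se(theta):
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))  # 2PL response probabilities
    information = np.sum(a**2 * p * (1 - p))    # test information at theta
    return 1.0 / np.sqrt(information)

for theta in (-2.0, 0.0, 2.0):
    print(f"theta = {theta:+.1f}: conditional SE = {conditional_se(theta):.2f}")
print(f"classical SEM (same for all scores) = {classical_sem:.2f}")
```

Running the sketch shows the point made above: the conditional standard error is small where the items are informative (abilities near the items' difficulties) and large at the extremes, whereas the classical figure is blind to this variation.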

Computer adaptive testing requires a large item pool for each area of the content domain to be developed (Flaugher 1990), with sufficient numbers, variety and spread of difficulty. All items must measure a single aptitude or dimension, and the items must be independent of each other, i.e. a person's response to an item should not depend on that person's response to another item. The items have to be pretested and validated, their difficulty and discriminability calculated, the effect of distractors reduced, the capability of the test to address unidimensionality and/or multidimensionality clarified, and the rules for selecting items enacted.
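As an illustration of the kind of selection rule implied here, the sketch below administers each successive item by maximum information at the current ability estimate under a 2PL model. The item bank, its parameters and the simple step-halving ability update are invented for the example and are not drawn from Flaugher (1990); operational adaptive tests add exposure control, content balancing and proper maximum-likelihood or Bayesian ability estimation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented item bank: discrimination (a) and difficulty (b) for a 2PL model.
n_items = 200
a = rng.uniform(0.6, 2.0, n_items)
b = rng.normal(0.0, 1.2, n_items)

def prob_correct(theta, idx):
    return 1.0 / (1.0 + np.exp(-a[idx] * (theta - b[idx])))

def item_information(theta, idx):
    p = prob_correct(theta, idx)
    return a[idx] ** 2 * p * (1 - p)

def run_cat(true_theta, n_administered=20):
    theta_hat, step = 0.0, 1.0          # initial ability estimate and step size
    available = list(range(n_items))
    for _ in range(n_administered):
        # Select the unused item with maximum Fisher information at the current estimate.
        info = [item_information(theta_hat, i) for i in available]
        idx = available.pop(int(np.argmax(info)))
        # Simulate the testee's response from their (unknown to the test) true ability.
        correct = rng.random() < prob_correct(true_theta, idx)
        # Crude step-halving update; real systems re-estimate theta by ML or Bayes.
        theta_hat += step if correct else -step
        step /= 2
    return theta_hat

print(f"estimated ability: {run_cat(true_theta=0.7):.2f}")
```

The sketch also shows why the pool demands listed above matter: maximum-information selection only works if the bank offers well-calibrated, independent items across the whole range of difficulty, otherwise the routine repeatedly draws the same few items or has nothing informative left to offer at the extremes.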
