12.01.2015 Views

RESEARCH METHOD COHEN ok


424 TESTS

need to pilot home-grown tests. Items with limited discriminability and limited difficulty must be weeded out and replaced; those items with the greatest discriminability and the most appropriate degrees of difficulty can be retained. This can be undertaken only once data from a pilot have been analysed.

Item discriminability and item difficulty take on differential significance in norm-referenced and criterion-referenced tests. In a norm-referenced test we wish to compare students with each other, hence item discriminability is very important. In a criterion-referenced test, on the other hand, it is not important per se to be able to compare or discriminate between students’ performance. For example, it may be the case that we wish to discover whether a group of students has learnt a particular body of knowledge; that is the objective, rather than, say, finding out how many have learned it better than others. Hence it may be that a criterion-referenced test has very low discriminability if all the students achieve very well or achieve very poorly, but the discriminability is less important than the fact that the students have or have not learnt the material. A norm-referenced test would regard such a poorly discriminating item as unsuitable for inclusion, whereas a criterion-referenced test would regard such an item as providing useful information (on success or failure).
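
As a rough illustration of why an item that every student answers correctly (or incorrectly) cannot discriminate, the sketch below computes a simple upper-lower discrimination index: the proportion of high scorers who answer the item correctly minus the proportion of low scorers who do so. The function name, the 0/1 scoring of responses and the 27 per cent grouping fraction are assumptions made for this example, and other formulations of the index exist.

```python
def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Upper-lower discrimination index for a single item.

    item_correct: list of 0/1 flags, one per student, for this item.
    total_scores: total test scores, in the same student order.
    fraction: share of students placed in each of the high and low
              groups (0.27 is an assumed convention, not a requirement).
    """
    n = len(total_scores)
    if n == 0 or n != len(item_correct):
        raise ValueError("need one item flag per student and at least one student")

    # Size of the high-scoring and low-scoring groups (at least one student each).
    k = max(1, round(n * fraction))

    # Student indices ordered from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    low_group, high_group = order[:k], order[-k:]

    p_high = sum(item_correct[i] for i in high_group) / k
    p_low = sum(item_correct[i] for i in low_group) / k
    return p_high - p_low


# Hypothetical data: ten students with total scores 1 to 10.
scores = list(range(1, 11))
item_a = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # only the higher scorers succeed
item_b = [1] * 10                        # every student succeeds

print(discrimination_index(item_a, scores))  # 1.0
print(discrimination_index(item_b, scores))  # 0.0
```

In a norm-referenced test the second item would be a candidate for removal; in a criterion-referenced test the same result simply records that all students met the objective.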

With regard to item difficulty, in a criterion-referenced test the level of difficulty is that which is appropriate to the task or objective. Hence if an objective is easily achieved then the test item should be easily achieved; if the objective is difficult then the test item should be correspondingly difficult. This means that, unlike in a norm-referenced test, where an item might be reworked in order to increase its discriminability index, reworking items is less of an issue in criterion-referencing. Of course, this is not to deny the value of undertaking an item difficulty analysis; rather, it is to question the centrality of such a concern. Gronlund and Linn (1990: 265) suggest that where instruction has been effective the item difficulty index of a criterion-referenced test will be high.
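
By way of illustration, and assuming that the item difficulty index is taken as the proportion of students who answer an item correctly (so that a high value indicates an item most students manage), a minimal sketch of the calculation might look as follows. The function name and the data are hypothetical.

```python
def item_difficulty(responses):
    """Proportion of students answering each item correctly.

    responses: list of per-student lists of 0/1 flags, one flag per item.
    Returns one index per item; values near 1.0 mean most students succeeded.
    """
    if not responses:
        raise ValueError("need at least one student")
    n_students = len(responses)
    n_items = len(responses[0])
    if any(len(student) != n_items for student in responses):
        raise ValueError("every student needs a flag for every item")

    return [
        sum(student[j] for student in responses) / n_students
        for j in range(n_items)
    ]


# Hypothetical results for four students on three items.
results = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
]
print(item_difficulty(results))  # [1.0, 0.75, 0.25]
```

On this reading, a criterion-referenced test given after effective instruction would be expected to show indices close to 1.0 on most items.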

In addressing the item discriminability, item difficulty and distractor effect of particular test items, it is advisable, of course, to pilot these tests and to be cautious about setting too much store by indices of difficulty and discriminability that are computed from small samples.
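
The caution about small samples can be made concrete with a rough calculation. Treating a difficulty index as a simple proportion p from n students, and assuming that students respond independently, its standard error is approximately the square root of p(1 - p)/n. This binomial approximation is an assumption introduced for the example, as are the function name and the sample sizes.

```python
import math


def difficulty_standard_error(p, n):
    """Approximate standard error of a proportion-correct index.

    p: observed proportion of students answering the item correctly.
    n: number of students in the pilot sample.
    Uses the binomial approximation sqrt(p * (1 - p) / n).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return math.sqrt(p * (1.0 - p) / n)


# The same observed index of 0.5 from a small and a larger pilot.
print(round(difficulty_standard_error(0.5, 10), 3))   # 0.158
print(round(difficulty_standard_error(0.5, 250), 3))  # 0.032
```

With only ten students the index could plausibly lie anywhere across a wide band around 0.5, so a decision to discard or retain an item on that basis would be insecure.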

In constructing a test with item analysis, item discriminability, item difficulty and distractor effects in mind, it is important also to consider the actual requirements of the test (Nuttall 1987; Cresswell and Houston 1991):

Are all the items in the test equally difficult?

If not, what makes some items more difficult than the rest?

Which items are easy, moderately hard, hard or very hard?

What kinds of task is each item addressing: is it a practice item (repeating known knowledge), an application item (applying known knowledge), or a synthesis item (bringing together and integrating diverse areas of knowledge)?

Are the items sufficiently within the experience of the students?

How motivated will students be by the contents of each item (i.e. how relevant will they perceive the item to be, how interesting is it)?

The contents of the test will also need to take account of the notion of fitness for purpose, for example in the types of test items. Here the researcher will need to consider whether ability, understanding and achievement will be best demonstrated in, for example (Lewis 1974; Cohen et al. 2004: ch. 16):

an open essay

a factual and heavily directed essay

short answer questions

divergent thinking items

completion items

multiple-choice items (with one correct answer or more than one correct answer)

matching pairs of items or statements

inserting missing words

incomplete sentences or incomplete, unlabelled diagrams
