
422 TESTS

importance of the knowledge or skill being tested, the match of the item to the programme, and the number of items to be included.

The basis of item analysis can be seen in item response theory (see Hambleton 1993). Item response theory (IRT) is based on the principle that it is possible to measure single, specific latent traits, abilities or attributes that are not themselves observable, i.e. to determine observable quantities of unobservable quantities. The theory assumes a relationship between a person's possession or level of a particular attribute, trait or ability and his or her response to a test item. IRT is also based on the view that it is possible:

- to identify objective levels of difficulty of an item, e.g. the Rasch model (Wainer and Mislevy 1990)
- to devise items that will be able to discriminate effectively between individuals
- to describe an item independently of any particular sample of people who might be responding to it, i.e. the item is not group-dependent (the item difficulty and item discriminability are independent of the sample)
- to describe a testee's proficiency in terms of his or her achievement of an item of a known difficulty level
- to describe a person independently of any sample of items that has been administered to that person (i.e. a testee's ability does not depend on the particular sample of test items)
- to specify and predict the properties of a test before it has been administered
- for traits to be unidimensional (single traits are specifiable, e.g. verbal ability, mathematical proficiency) and to account for test outcomes and performance
- for a set of items to measure a common trait or ability
- for a testee's response to any one test item not to affect his or her response to another test item
- that the probability of a correct response to an item does not depend on the number of testees who might be at the same level of ability


- that it is possible to identify objective levels of difficulty of an item
- that a statistic can be calculated that indicates the precision of the measured ability for each testee, and that this statistic depends on the ability of the testee and the number and properties of the test items.
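As an illustration of the first point in the list above (this sketch is not from the text), the one-parameter Rasch model expresses the probability of a correct response as a logistic function of the difference between a person's ability (conventionally θ) and an item's difficulty (conventionally b), which is what allows item difficulty to be described independently of any particular group of testees. A minimal sketch in Python:

```python
import math

def rasch_probability(theta, b):
    """Rasch (one-parameter logistic) model: probability that a person
    with ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability exactly matches item difficulty, the probability is 0.5.
p_equal = rasch_probability(theta=0.0, b=0.0)   # 0.5

# A more able person has a higher probability on the same item.
p_able = rasch_probability(theta=2.0, b=0.0)    # approx. 0.88
```

Note that the item is described by the single parameter b, so its difficulty does not change with the sample of people responding to it, which is the sample-independence property claimed in the list.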

In constructing a test the researcher will need to undertake an item analysis to clarify the item discriminability and item difficulty of each item of the test. Item discriminability refers to the potential of the item in question to be answered correctly by those students who have a lot of the particular quality that the item is designed to measure, and to be answered incorrectly by those students who have less of that quality. In other words, how effective is the test item in showing up differences between a group of students? Does the item enable us to discriminate between students' abilities in a given field? An item with high discriminability will enable the researcher to see a potentially wide variety of scores on that item; an item with low discriminability will yield poorly differentiated scores on that item. Clearly a high measure of discriminability is desirable, and items with low discriminability should be discarded.

Suppose the researcher wishes to construct a test of mathematics for eventual use with 30 students in a particular school (or with class A in a particular school). The researcher devises a test and pilots it in a different school or in class B respectively, administering the test to 30 students of the same age (i.e. the researcher matches the sample of the pilot school or class to the sample in the school which eventually will be used). The scores of the 30 pilot children are then split into three groups of 10 students each (high, medium and low scores). It would be reasonable to assume that there will be more correct answers to a particular item among the high scorers than among the low scorers. For each item compute the following:

(A − B) / ½N
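This extract ends before the text's own definitions of A, B and N, but on one common reading of this discrimination index, A is the number of correct answers to the item in the high-scoring group, B the number in the low-scoring group, and N the total number of students across the two groups. On that assumption, the computation for the 30-student pilot can be sketched as:

```python
def discrimination_index(correct_high, correct_low, group_size):
    """Discrimination index D = (A - B) / (N/2), where A and B are the
    numbers of correct answers in the high- and low-scoring groups and
    N is the total number of students across both groups.

    (These readings of A, B and N are an assumption; the extract ends
    before the text defines them.)
    """
    n_total = 2 * group_size
    return (correct_high - correct_low) / (n_total / 2)

# With high/medium/low groups of 10, suppose 9 high scorers and
# 3 low scorers answered the item correctly:
d = discrimination_index(correct_high=9, correct_low=3, group_size=10)
# d = (9 - 3) / 10 = 0.6
```

On this reading, D ranges from −1 (all low scorers correct, no high scorers) to +1 (the reverse), with higher positive values indicating an item that discriminates more effectively.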
