12.01.2015 Views

RESEARCH METHOD COHEN ok



reliability apply. Here steps are taken to ensure that observers enter data into the appropriate categories consistently (i.e. intra- and inter-rater reliability) and accurately. Further, to ensure validity, a pilot must have been conducted to ensure that the observational categories themselves are appropriate, exhaustive, discrete and unambiguous, and that they effectively operationalize the purposes of the research.

Validity and reliability in tests

The researcher will have to judge the place and significance of test data, not forgetting the problem of the Hawthorne effect operating negatively or positively on students who have to undertake the tests. There is a range of issues which might affect the reliability of the test – for example, the time of day, the time of the school year, the temperature in the test room, the perceived importance of the test, the degree of formality of the test situation, ‘examination nerves’, the amount of guessing of answers by the students (the calculation of standard error which the test demonstrates features here), the way that the test is administered, the way that the test is marked, and the degree of closure or openness of test items. Hence the researcher who is considering using testing as a way of acquiring research data must ensure that it is appropriate, valid and reliable (Linn 1993; Borsboom et al. 2004).
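One common way of expressing how far such influences blur an individual score is the standard error of measurement from classical test theory, SEM = SD × √(1 − reliability). The sketch below is illustrative only: the scores and the reliability coefficient are hypothetical, and this is not necessarily the standard error calculation that the text has in mind.

```python
import math
import statistics


def standard_error_of_measurement(scores, reliability):
    """Return SD * sqrt(1 - reliability) for a set of test scores.

    `reliability` is a reliability coefficient between 0 and 1
    (for example a test-retest or internal-consistency estimate).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    sd = statistics.stdev(scores)  # sample standard deviation
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical scores and reliability, for illustration only.
scores = [52, 61, 47, 70, 58, 65, 49, 74]
sem = standard_error_of_measurement(scores, reliability=0.84)
print(f"Standard error of measurement: {sem:.2f} marks")
```

The lower the reliability, the larger the band of uncertainty that should be read around any single student's score.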

Wolf (1994) suggests four main factors that might affect reliability: the range of the group that is being tested, the group’s level of proficiency, the length of the measure (the longer the test the greater the chance of errors), and the way in which reliability is calculated. Fitz-Gibbon (1997: 36) argues that, other things being equal, longer tests are more reliable than shorter tests. Additionally, there are several ways in which reliability might be compromised in tests. Feldt and Brennan (1993) suggest four types of threat to reliability:

• individuals: their motivation, concentration, forgetfulness, health, carelessness, guessing, their related skills (e.g. reading ability, their familiarity with solving the type of problem set, the effects of practice)

• situational factors: the psychological and physical conditions for the test – the context

• test marker factors: idiosyncrasy and subjectivity

• instrument variables: poor domain sampling, errors in sampling tasks, the realism of the tasks and relatedness to the experience of the testees, poor question items, the assumption or extent of unidimensionality in item response theory, length of the test, mechanical errors, scoring errors, computer errors.
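The claim that, other things being equal, longer tests are more reliable is often formalized with the Spearman-Brown prophecy formula, which predicts the reliability of a test whose length is multiplied by a factor k: r_new = k·r / (1 + (k − 1)·r). The sketch below assumes that the added items are parallel to the existing ones (the same quality and the same construct); the figures used are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predict reliability after changing test length by `length_factor`.

    `reliability` is the current reliability coefficient (0 to 1);
    `length_factor` is new length / old length (2.0 doubles the test).
    Assumes the added or removed items are parallel to the rest.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (
        1.0 + (length_factor - 1.0) * reliability
    )


# Hypothetical example: a test with reliability 0.60 is doubled in length.
predicted = spearman_brown(0.60, 2.0)
print(f"Predicted reliability: {predicted:.2f}")  # approximately 0.75
```

The formula only holds under the parallel-items assumption; lengthening a test with poor items, or to the point of fatigue, can introduce the errors that Wolf warns about.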

Sources of unreliability

There are several threats to reliability in tests and examinations, particularly tests of performance and achievement, for example (Cunningham 1998; Airasian 2001), with respect to examiners and markers:

• errors in marking: e.g. in attributing, adding and transferring marks

• inter-rater reliability: different markers giving different marks for the same or similar pieces of work

• inconsistency in the marker: e.g. being harsh in the early stages of the marking and lenient in the later stages of the marking of many scripts

• variations in the award of grades: for work that is close to grade boundaries, some markers may place the score in a higher or lower category than other markers

• the halo effect: a student who is judged to do well or badly in one assessment is given an undeserved favourable or unfavourable assessment respectively in other areas.
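Inter-rater reliability of the kind described above is frequently quantified with Cohen's kappa, which corrects the raw agreement between two markers for the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The following is a minimal sketch, not a method prescribed by the text; the grades awarded by the two markers are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who categorized the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category only")
    return (observed - expected) / (1.0 - expected)


# Hypothetical grades given by two markers to the same four scripts.
marker_1 = ["A", "A", "B", "B"]
marker_2 = ["A", "B", "B", "B"]
print(cohens_kappa(marker_1, marker_2))  # 0.5
```

Here the markers agree on three scripts out of four (0.75), but half of that agreement would be expected by chance, so kappa is 0.5. A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.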

With reference to the students and teachers themselves, there are several sources of unreliability:

• Motivation and interest in the task have a considerable effect on performance. Clearly, students need to be motivated if they are going to make a serious attempt at any test that they are required to undertake, whether motivation is intrinsic (doing something for its own sake) or extrinsic (doing something for an external reason, e.g. obtaining a
