
VALIDITY AND RELIABILITY IN TESTS 161

- Marking practices are not always reliable; markers may be too generous, marking by effort and ability rather than by performance.
- The context in which the task is presented affects performance: some students can perform the task in everyday life but not under test conditions.

With regard to the test items themselves, there may be problems (e.g. test bias):

- The task itself may be multidimensional: testing 'reading', for example, may require several components and constructs. Students can execute a Mathematics operation in the Mathematics class but cannot perform the same operation in, for example, a Physics class; students will disregard English grammar in a Science class but observe it in an English class. This raises the issue of the number of contexts in which the behaviour must be demonstrated before a criterion is deemed to have been achieved (Cohen et al. 2004). The question of transferability of knowledge and skills is also raised in this connection. The context of the task affects the student's performance.
- The validity of the items may be in question.
- The language of the assessment and the assessor exerts an influence on the testee, for example if the assessment is carried out in the testee's second language or in a 'middle-class' code (Haladyna 1997).
- The readability level of the task can exert an influence on the test: a difficulty in reading, for example, might distract from the purpose of a test which is of the use of a mathematical algorithm.
- The size and complexity of numbers or operations in a test (e.g. of Mathematics) might distract the testee who actually understands the operations and concepts.

- The number and type of operations and stages to a task: the students might know how to perform each element, but when the elements are presented in combination the size of the task can be overwhelming.
- The form and presentation of questions affect the results, giving variability in students' performances.

- A single error early on in a complex sequence may confound the later stages of the sequence (within a question or across a set of questions), even though the student might have been able to perform the later stages of the sequence, thereby preventing the student from gaining credit for all she or he can, in fact, do.
- Questions might favour boys more than girls, or vice versa.
- Essay questions favour boys if they concern impersonal topics and girls if they concern personal and interpersonal topics (Haladyna 1997; Wedeen et al. 2002).
- Boys perform better than girls on multiple-choice questions and girls perform better than boys on essay-type questions (perhaps because boys are more willing than girls to guess in multiple-choice items), and girls perform better in written work than boys.

- Questions and assessment may be culture-bound: what is comprehensible in one culture may be incomprehensible in another.
- The test may be so long, in order to ensure coverage, that boredom and loss of concentration may impair reliability.

Hence specific contextual factors can exert a significant influence on learning, and this has to be recognised in conducting assessments, to render an assessment as unthreatening and natural as possible.

Harlen (1994: 140-2) suggests that inconsistency and unreliability in teacher-based and school-based assessment may derive from differences in:

- interpreting the assessment purposes, tasks and contents, by teachers or assessors
- the actual task set, or the contexts and circumstances surrounding the tasks (e.g. time and place)
- how much help is given to the test-takers during the test
- the degree of specificity in the marking criteria
- the application of the marking criteria and the grading or marking system that accompanies it

Chapter 6
