RESEARCH METHOD COHEN ok


VALIDITY AND RELIABILITY

how much additional information about the student or situation is being referred to in the assessment.

Harlen (1994) advocates the use of a range of moderation strategies, both before and after the tests, including:

- statistical reference/scaling tests
- inspection of samples (by post or by visit)
- group moderation of grades
- post-hoc adjustment of marks
- accreditation of institutions
- visits of verifiers
- agreement panels
- defining marking criteria
- exemplification
- group moderation meetings.

While moderation procedures are essentially post-hoc adjustments to scores, agreement trials and practice-marking can be undertaken before the administration of a test, which is particularly important if there are large numbers of scripts or several markers.

The issue here is that the results as well as the instruments should be reliable. Reliability is also addressed by:

- calculating coefficients of reliability, split-half techniques, the Kuder-Richardson formula, parallel/equivalent forms of a test, test/retest methods, the alpha coefficient
- calculating and controlling the standard error of measurement
- increasing the sample size (to maximize the range and spread of scores in a norm-referenced test), though criterion-referenced tests recognize that scores may bunch around the high level (in mastery learning, for example), i.e. that the range of scores might be limited, thereby lowering the correlation coefficients that can be calculated
- increasing the number of observations made and items included in the test (in order to increase the range of scores)
- ensuring effective domain sampling of items in tests based on item response theory (a particular issue in Computer Adaptive Testing; see Chapter 19; Thissen 1990)
- ensuring effective levels of item discriminability and item difficulty.
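Several of the indices listed above are straightforward to compute. The sketch below, in Python with an invented matrix of dichotomously scored items (it is illustrative only, not data from the text), shows one common form of each: Cronbach's alpha (which, for dichotomous items, coincides with Kuder-Richardson formula 20), the standard error of measurement derived from it, and simple item difficulty (facility) and discrimination indices.

```python
import math

def cronbach_alpha(scores):
    """Alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals).
    For 0/1-scored items this is equivalent to Kuder-Richardson formula 20."""
    k = len(scores[0])                      # number of items
    totals = [sum(row) for row in scores]   # each student's total score

    def variance(xs):                       # sample variance
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    return (k / (k - 1)) * (1 - sum(item_vars) / variance(totals))

def standard_error_of_measurement(scores, reliability):
    """SEM = standard deviation of total scores * sqrt(1 - reliability)."""
    totals = [sum(row) for row in scores]
    mean = sum(totals) / len(totals)
    sd = math.sqrt(sum((t - mean) ** 2 for t in totals) / (len(totals) - 1))
    return sd * math.sqrt(1 - reliability)

def item_difficulty(scores, j):
    """Facility index: proportion of students answering item j correctly."""
    return sum(row[j] for row in scores) / len(scores)

def item_discrimination(scores, j):
    """Crude discrimination: facility of item j among the top half of
    scorers minus its facility among the bottom half."""
    ranked = sorted(scores, key=sum, reverse=True)
    half = len(ranked) // 2
    top, bottom = ranked[:half], ranked[-half:]
    return (sum(r[j] for r in top) - sum(r[j] for r in bottom)) / half

# Invented scores for six students on four items (1 = correct, 0 = incorrect).
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(data)
print(f"alpha = {alpha:.2f}")                                        # 0.71
print(f"SEM   = {standard_error_of_measurement(data, alpha):.2f}")   # 0.76
print(f"item 0: difficulty {item_difficulty(data, 0):.2f}, "
      f"discrimination {item_discrimination(data, 0):.2f}")
```

Note how the SEM line illustrates the point in the list above: the lower the reliability coefficient, the larger the error band that must be placed around any individual's score.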

Reliability has to be not only achieved but also seen to be achieved, particularly in ‘high stakes’ testing (where a lot hangs on the results of the test, e.g. entrance to higher education or employment). Hence the procedures for ensuring reliability must be transparent. The difficulty here is that the more one moves towards reliability as defined above, the more the test will become objective, the more students will be measured as though they are inanimate objects, and the more the test will become decontextualized.

An alternative form of reliability, which is premised on a more constructivist psychology, emphasizes the significance of context, the importance of subjectivity and the need to engage and involve the testee more fully than a simple test. This rehearses the tension between positivism and more interpretive approaches outlined in Chapter 1 of this book. Objective tests, as described in this chapter, lean strongly towards the positivist paradigm, while more phenomenological and interpretive paradigms of social science research will emphasize the importance of settings, of individual perceptions, of attitudes, in short, of ‘authentic’ testing (e.g. by using non-contrived, non-artificial forms of test data, for example portfolios, documents, course work, tasks that are stronger in realism and more ‘hands on’). Though this latter adopts a view which is closer to assessment rather than narrowly ‘testing’, nevertheless the two overlap: both can yield marks, grades and awards, both can be formative as well as summative, and both can be criterion-referenced.

With regard to validity, it is important to note here that an effective test will adequately ensure the following:

- Content validity (e.g. adequate and representative coverage of programme and test objectives in the test items, a key feature of domain sampling): this is achieved by ensuring that the content of the test fairly samples the class or fields of the situations or subject matter
