12.01.2015 Views

RESEARCH METHOD COHEN ok


CONSTRUCTING A TEST

(e.g. to assign grades) is to lose the very purpose of the criterion-referencing (Gipps 1994: 85). For example, if a student is awarded a grade E for spelling in English, and a grade A for imaginative writing, this could be aggregated into a C grade as an overall grade of the student’s English language competence, but what does this C grade mean? It is meaningless: it has no frame of reference or clear criteria, it loses the useful specificity of the A and E grades, and it is a compromise that actually tells us nothing. Further, aggregating such grades assumes equal levels of difficulty of all items.
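The loss of information in aggregation can be illustrated with a minimal sketch. The five-point mapping from letter grades to numbers and the `aggregate` helper are assumptions made for illustration only; the text does not prescribe any particular aggregation rule.

```python
# Hypothetical mapping of letter grades to points (an assumption, not from the text).
GRADE_POINTS = {"A": 5, "B": 4, "C": 3, "D": 2, "E": 1}
POINTS_GRADE = {points: grade for grade, points in GRADE_POINTS.items()}


def aggregate(grades):
    """Average the grade points and convert back to a letter, rounding halves up."""
    mean_points = sum(GRADE_POINTS[g] for g in grades) / len(grades)
    return POINTS_GRADE[int(mean_points + 0.5)]


# A student with weak spelling (E) and strong imaginative writing (A) ...
uneven_profile = aggregate(["E", "A"])
# ... and a student who is middling on both components (C, C).
even_profile = aggregate(["C", "C"])

# Both profiles collapse to the same overall grade.
print(uneven_profile, even_profile)
```

Under these assumptions the two very different profiles receive the same overall grade, so the aggregate no longer says anything about either criterion.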

Of course, raw scores are still open to interpretation – which is a matter of judgement rather than exactitude or precision (Wiliam 1996). For example, if a test is designed to assess ‘mastery’ of a subject, then the researcher is faced with the issue of deciding what constitutes ‘mastery’ – is it an absolute (i.e. very high score) or are there gradations, and if the latter, then where do these gradations fall? For published tests the scoring is standardized and already made clear, as are the conversions of scores into, for example, percentiles and grades.

Underpinning the discussion of scoring is the need to make it unequivocally clear exactly what the marking criteria are – what will and will not score points. This requires a clarification of whether there is a ‘checklist’ of features that must be present in a student’s answer.

Clearly, criterion-referenced tests will have to declare their lowest boundary – a cut-off point – below which the student has been deemed to fail to meet the criteria. A compromise can be seen in those criterion-referenced tests that award different grades for different levels of performance of the same task, necessitating the clarification of different cut-off points in the examination. A common example of this can be seen in the GCSE examinations for secondary school pupils in the United Kingdom, where students can achieve a grade between A and F for a criterion-related examination.

The determination of cut-off points has been addressed by Nedelsky (1954), Angoff (1971), Ebel (1979) and Linn (1993). Angoff (1971) suggests a method for dichotomously scored items. Here judges are asked to identify the proportion of minimally acceptable persons who would answer each item correctly. The sum of these proportions would then be taken to represent the minimally acceptable score. An elaborated version of this principle comes from Ebel (1979). Here a difficulty-by-relevance matrix is constructed for all the items. Difficulty might be assigned three levels (e.g. easy, medium and hard) and relevance might be assigned three levels (e.g. highly relevant, moderately relevant, barely relevant). When every test item has been assigned to a cell of the matrix, the judges estimate the proportion of items in each cell that minimally acceptable persons would answer correctly, with the standard for each judge being the average of the proportions in each cell, weighted by the number of items in each cell. In this method judges have to consider two factors – relevance and difficulty (unlike Angoff (1971), where only difficulty featured). What characterizes these approaches is the trust that they place in experts in making judgements about levels (e.g. of difficulty, or relevance, or proportions of successful achievement); that is, they are based on fallible human subjectivity.
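The arithmetic of the two procedures can be sketched as follows. This is a minimal illustration only: the function names, the data layout and all the figures are hypothetical, and averaging the Angoff sums across judges is an assumption (the text specifies only the sum of proportions for a judge).

```python
from statistics import mean


def angoff_cut_score(judge_ratings):
    """Angoff-style cut score.

    judge_ratings holds one list per judge; each list gives, item by item,
    the estimated proportion (0 to 1) of minimally acceptable persons who
    would answer that item correctly.
    """
    per_judge = [sum(ratings) for ratings in judge_ratings]  # sum of proportions
    return mean(per_judge)  # assumption: average the judges' sums


def ebel_standard(cells):
    """Ebel-style standard for one judge.

    cells holds one (number_of_items, proportion_correct) pair per cell of
    the difficulty-by-relevance matrix; the result is the average of the
    proportions weighted by the number of items in each cell.
    """
    total_items = sum(n for n, _ in cells)
    return sum(n * p for n, p in cells) / total_items


# Hypothetical data: two judges rating a four-item test.
angoff = angoff_cut_score([
    [0.5, 0.75, 0.25, 1.0],
    [0.5, 0.5, 0.5, 1.0],
])

# Hypothetical data: three occupied matrix cells for one judge on a ten-item test.
ebel = ebel_standard([(4, 0.75), (4, 0.5), (2, 0.25)])

print(angoff, ebel)
```

The Angoff figure is expressed in raw score points, whereas the Ebel figure is a proportion of the test; multiplying it by the total number of items gives the corresponding raw score.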

Ebel (1979) argues that one principle in assignation of grades is that they should represent equal intervals on the score scales. Reference is made to median scores and standard deviations, median scores because it is meaningless to assume an absolute zero on scoring, and standard deviations as the unit of convenient size for inclusion of scores for each grade (see also Cohen and Holliday 1996). One procedure is thus:

• Calculate the median and standard deviation of the scores.
• Determine the lower score limits of the mark intervals using the median and the standard deviation as the unit of size for each grade.
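The two steps above can be sketched as follows. The text does not give the exact offsets, so this sketch assumes five grade bands, each one standard deviation wide, with the middle band centred on the median; the function name, the offsets, the use of the population standard deviation and the scores are all illustrative assumptions.

```python
from statistics import median, pstdev


def grade_lower_limits(scores):
    """Return the lower score limit for each grade above the bottom band.

    Assumption: bands are one standard deviation wide and the middle band (C)
    runs from half a standard deviation below the median to half above it.
    Scores below the D limit fall into the lowest band.
    """
    mid = median(scores)
    sd = pstdev(scores)  # population SD; a sample SD could be used instead
    offsets = {"A": 1.5, "B": 0.5, "C": -0.5, "D": -1.5}
    return {grade: mid + k * sd for grade, k in offsets.items()}


# Hypothetical raw scores for eight students.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
limits = grade_lower_limits(scores)
print(limits)
```

Because the limits are anchored to the median and scaled by the standard deviation, each grade covers an equal interval on the score scale, which is the principle the procedure is meant to serve.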

However, the issue of cut-off scores is complicated by the fact that they may vary according to the different purposes and uses of scores (e.g. for diagnosis, for certification, for selection, for programme evaluation), as these purposes will affect the number of cut-off points and grades, and the

Chapter 19
