12.01.2015 Views

RESEARCH METHOD COHEN ok



430 TESTS

the test and its component items. With regard to the former, in part this is a matter of reliability, for the time of day or week etc. might influence how alert, motivated or capable a student might be. With regard to the latter, the researcher will need to decide what time restrictions are being imposed and why; for example, is the pressure of a time constraint desirable – to show what a student can do under time pressure – or an unnecessary impediment, putting a time boundary around something that need not be bounded (was Van Gogh put under a time pressure to produce paintings of sunflowers?) (see also Kohn 2000).

Although it is vital that students know the overall time allowance for the test, it might also be helpful to indicate notional time allowances for different elements of the test; if these are aligned to the relative weightings of the test (see the discussions of weighting and scoring), they enable students to decide where to place emphasis in the test – they may want to concentrate their time on the high-scoring elements of the test. Further, if the items of the test have exact time allowances, this enables a degree of standardization to be built into the test, and this may be useful if the results are going to be used to compare individuals or groups.
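As an illustration of aligning notional time allowances with weightings, the following minimal sketch splits the overall time in proportion to the marks available for each section. The function name, the section labels and the 60-minute, 10/20/30-mark test are hypothetical and are not taken from the text; a strictly proportional split is only one possible rule.

```python
def notional_time_allowances(marks_by_section, total_minutes):
    """Split the overall time across sections in proportion to their marks."""
    total_marks = sum(marks_by_section.values())
    if total_marks <= 0:
        raise ValueError("total marks must be positive")
    return {
        section: total_minutes * marks / total_marks
        for section, marks in marks_by_section.items()
    }


# Hypothetical 60-minute test with sections worth 10, 20 and 30 marks.
allowances = notional_time_allowances(
    {"Section A": 10, "Section B": 20, "Section C": 30},
    total_minutes=60,
)
for section, minutes in allowances.items():
    print(f"{section}: about {minutes:.0f} minutes")
```

A test designer might round these figures or reserve some time for reading and checking; the proportional rule is simply a starting point.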

Plan the scoring of the test<br />

The awarding of scores for different items of the test is a clear indication of the relative significance of each item – the weightings of each item are addressed in their scoring. It is important to ensure that easier parts of the test attract fewer marks than more difficult parts of it, otherwise a student’s results might be artificially inflated by answering many easy questions and fewer more difficult questions (Gronlund and Linn 1990). Additionally, there are several attractions to making the scoring of tests as detailed and specific as possible (Cresswell and Houston 1991; Gipps 1994; Aiken 2003), awarding specific points for each item and sub-item, for example:

• It enables partial completion of the task to be recognized – students gain marks in proportion to how much of the task they have completed successfully (an important feature of domain-referencing).

• It enables a student to compensate for doing badly in some parts of a test by doing well in other parts of the test.

• It enables weightings to be made explicit to the students.

• It enables the rewards for successful completion of parts of a test to reflect considerations such as the length of the item, the time required to complete it, its level of difficulty, its level of importance.

• It facilitates moderation because it is clear and specific.

• It enables comparisons to be made across groups by item.

• It enables reliability indices to be calculated (see discussions of reliability).

• Scores can be aggregated and converted into grades straightforwardly.
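The points above about partial credit, explicit weightings and conversion to grades can be sketched as follows. This is an illustration only: the mark scheme, the awarded marks and the grade boundaries are all hypothetical, and the text does not prescribe any particular boundaries or conversion rule.

```python
def total_score(mark_scheme, awarded):
    """Sum the marks awarded per sub-item, capped at each sub-item's maximum."""
    total = 0
    for sub_item, max_marks in mark_scheme.items():
        marks = awarded.get(sub_item, 0)  # an omitted sub-item earns nothing
        total += max(0, min(marks, max_marks))
    return total


def to_grade(total, max_total, boundaries):
    """Convert a total into a grade using (minimum percentage, grade) pairs."""
    percentage = 100 * total / max_total
    for threshold, grade in sorted(boundaries, reverse=True):
        if percentage >= threshold:
            return grade
    raise ValueError("no grade boundary covers this score")


# Hypothetical mark scheme: the harder item 2 carries more marks than 1a or 1b.
mark_scheme = {"1a": 2, "1b": 3, "2": 10}
# Partial completion is recognized: 1 of 3 marks on 1b, 6 of 10 on item 2.
awarded = {"1a": 2, "1b": 1, "2": 6}
# Hypothetical grade boundaries, expressed as minimum percentages.
boundaries = [(70, "A"), (55, "B"), (40, "C"), (0, "D")]

total = total_score(mark_scheme, awarded)
grade = to_grade(total, sum(mark_scheme.values()), boundaries)
print(total, grade)
```

Because the maximum marks per sub-item are stated in the mark scheme, the weightings are visible to students and moderators alike, and a weak answer on one sub-item can be offset by marks gained elsewhere.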

Ebel (1979) argues that the more marks that are available to indicate different levels of achievement (e.g. for the awarding of grades), the greater the reliability of the grades will be, although clearly this could make the test longer. Scoring will also need to be prepared to handle issues of poor spelling, grammar and punctuation – is it to be penalized, and how will consistency be assured here? Further, how will issues of omission be treated, e.g. if a student omits the units of measurement (miles per hour, dollars or pounds, metres or centimetres)?

Related to the scoring of the test is the issue of reporting the results. If the scoring of a test is specific then this enables variety in reporting to be addressed, for example, results may be reported item by item, section by section, or whole test by whole test. This degree of flexibility might be useful for the researcher, as it will enable particular strengths and weaknesses in groups of students to be exposed.
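A minimal sketch of how item-level scores allow the same data to be reported at three levels (item by item, section by section, and whole test) is given below. The student labels, item names, section assignments and marks are invented for illustration, and the group mean is only one of several summaries a researcher might report.

```python
from statistics import mean

# Hypothetical item-level marks for two students.
scores = {
    "student_1": {"q1": 2, "q2": 3, "q3": 5},
    "student_2": {"q1": 1, "q2": 4, "q3": 3},
}
# Hypothetical assignment of items to sections of the test.
sections = {"q1": "A", "q2": "A", "q3": "B"}


def report_by_item(scores):
    """Group mean for each item."""
    items = sorted({item for marks in scores.values() for item in marks})
    return {item: mean(marks[item] for marks in scores.values()) for item in items}


def report_by_section(scores, sections):
    """Group mean of each student's section total."""
    names = sorted(set(sections.values()))
    return {
        name: mean(
            sum(m for item, m in marks.items() if sections[item] == name)
            for marks in scores.values()
        )
        for name in names
    }


def report_whole_test(scores):
    """Group mean of each student's total mark."""
    return mean(sum(marks.values()) for marks in scores.values())


print(report_by_item(scores))
print(report_by_section(scores, sections))
print(report_whole_test(scores))
```

Reporting at the item level can expose a weakness (here, the low group mean on q1) that the section and whole-test figures would conceal.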

The desirability of some of the above points is open to question. For example, it could be argued that the strength of criterion-referencing is precisely its specificity, and that to aggregate data
