12.01.2015

RESEARCH METHOD COHEN ok


CONSTRUCTING A TEST 425

- true/false statements
- open-ended questions where students are given guidance on how much to write (e.g. 300 words, a sentence, a paragraph)
- closed questions.

These items can test recall, knowledge, comprehension, application, analysis, synthesis and evaluation, i.e. different orders of thinking. They take their rationale from Bloom (1956) on hierarchies of thinking, running from low-order thinking (comprehension, application), through middle-order thinking (analysis, synthesis), to higher-order thinking (evaluation, judgement, criticism). Clearly, the choice of the form of a test item will be guided by the principle of gaining the maximum amount of information in the most economical way. This is evident in the use of machine-scorable multiple-choice completion tests, where optical mark readers and scanners can enter and process large-scale data rapidly.
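To illustrate why machine-scorable formats are so economical, the minimal sketch below scores multiple-choice responses against an answer key, the step an optical mark reader pipeline automates once the sheets have been scanned. The answer key, the student identifiers, the function name `score_responses` and the one-mark-per-correct-item rule are all hypothetical assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical answer key for a five-item multiple-choice test.
ANSWER_KEY: List[str] = ["B", "D", "A", "C", "B"]


def score_responses(responses: Dict[str, List[str]], key: List[str]) -> Dict[str, int]:
    """Award one mark per answer that matches the key (an assumed scoring rule)."""
    scores: Dict[str, int] = {}
    for student, answers in responses.items():
        # A sheet with missing or extra answers cannot be aligned with the key.
        if len(answers) != len(key):
            raise ValueError(f"{student}: expected {len(key)} answers, got {len(answers)}")
        scores[student] = sum(given == correct for given, correct in zip(answers, key))
    return scores


if __name__ == "__main__":
    # Hypothetical responses, e.g. as read from scanned answer sheets.
    responses = {
        "student_a": ["B", "D", "A", "C", "B"],
        "student_b": ["B", "A", "A", "D", "B"],
    }
    print(score_responses(responses, ANSWER_KEY))  # {'student_a': 5, 'student_b': 3}
```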

In considering the contents of a test, the test writer must also, for some kinds of test, consider the scale. A scale (a graded system of classification) can be created in two main ways (Howitt and Cramer 2005: 203):

- A list of items whose measurements go from the lowest to the highest (e.g. an IQ test, a measure of sexism, a measure of aggressiveness), such that it is possible to judge where a student has reached on the scale by seeing the maximum level reached on the items;
- The method of ‘summated scores’ (Howitt and Cramer 2005: 203), in which a pool of items is created, and the student’s score is the total score gained by summing the marks for all the items.
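The two ways of building a scale can be contrasted in a short sketch. It assumes each item is scored 1 (passed or endorsed) or 0 and, for the first approach, that the items are already ordered from the lowest to the highest level; the function names and the response data are hypothetical.

```python
from typing import List


def highest_level_reached(item_scores: List[int]) -> int:
    """First approach: items are ordered from lowest to highest, and the student's
    position is the highest item passed (1-based; 0 if no item is passed)."""
    level = 0
    for position, passed in enumerate(item_scores, start=1):
        if passed:
            level = position
    return level


def summated_score(item_marks: List[int]) -> int:
    """Second approach ('summated scores'): total the marks for all the items."""
    return sum(item_marks)


if __name__ == "__main__":
    # Hypothetical responses to six items ordered from lowest to highest level (1 = passed).
    responses = [1, 1, 1, 0, 1, 0]
    print(highest_level_reached(responses))  # 5
    print(summated_score(responses))         # 4
```

The two results differ here because this student passed item 5 after failing item 4: the first approach reads only the highest level reached, whereas the summated score counts every mark gained.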

Further, many psychological tests used in educational research will be unidimensional, that is, the items all measure a single element or dimension. Howitt and Cramer (2005: 204) liken this to weighing 30 people using 10 bathroom scales, in which case one would expect to find a high intercorrelation between the bathroom scales. Other tests may be multidimensional, i.e. two or more factors or dimensions are measured in the same test. Howitt and Cramer (2005: 204) liken this to weighing 30 people using 10 bathroom scales and then measuring their heights using 5 different tape measures. Here one would expect to find a high intercorrelation between the bathroom scale measures, a high intercorrelation between the tape measurements, and a low intercorrelation between the bathroom scale measures and the tape measurements, because they are measuring different things or dimensions.
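Howitt and Cramer's analogy can be simulated in a few lines. This is a sketch under explicit assumptions: true weights (kg) and true heights (cm) are drawn independently, purely to mirror two distinct dimensions, and each instrument adds a small random error; the means, spreads, error size, seed and function names are hypothetical choices, not values from the text.

```python
import random
from math import sqrt
from typing import List, Tuple

Columns = List[List[float]]  # one list of 30 readings per instrument


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(n_people: int = 30, n_scales: int = 10, n_tapes: int = 5,
             seed: int = 1) -> Tuple[Columns, Columns]:
    """Return readings from each bathroom scale and each tape measure.

    Assumption: weights and heights are independent, and every instrument
    reads the true value plus Gaussian error with a standard deviation of 0.5."""
    rng = random.Random(seed)
    weights = [rng.gauss(70, 10) for _ in range(n_people)]
    heights = [rng.gauss(170, 10) for _ in range(n_people)]
    scales = [[w + rng.gauss(0, 0.5) for w in weights] for _ in range(n_scales)]
    tapes = [[h + rng.gauss(0, 0.5) for h in heights] for _ in range(n_tapes)]
    return scales, tapes


def mean_r(cols_a: Columns, cols_b: Columns, same: bool) -> float:
    """Average correlation over column pairs; if same=True, count each distinct pair once."""
    rs = [pearson(a, b)
          for i, a in enumerate(cols_a)
          for j, b in enumerate(cols_b)
          if not same or i < j]
    return sum(rs) / len(rs)


if __name__ == "__main__":
    scales, tapes = simulate()
    print(f"scale with scale: {mean_r(scales, scales, True):.3f}")
    print(f"tape with tape:   {mean_r(tapes, tapes, True):.3f}")
    print(f"scale with tape:  {mean_r(scales, tapes, False):.3f}")
```

Under these assumptions the two within-instrument averages should come out close to 1, while the scale-with-tape average reflects only the chance association between the independently drawn weights and heights.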

Test constructors, then, need to be clear about whether they are using a unidimensional or a multidimensional scale. Many texts, while advocating the purity of a unidimensional test that measures a single construct or concept, also recognize the efficacy, practicality and efficiency of multidimensional tests. For example, though one might casually regard intelligence as a unidimensional factor, a stronger measure of intelligence would in fact be obtained by regarding it as a multidimensional construct, thereby requiring multidimensional scaling. Of course, some items on a test are automatically unidimensional, for example age or hours spent on homework.

Further, the items need to be selected so as to achieve the highest reliability. Let us say that we have ten items that measure students’ negative examination stress. Each item is intended to measure stress, for example:

- Item 1: Loss of sleep at examination time.
- Item 2: Anxiety at examination time.
- Item 3: Irritability at examination time.
- Item 4: Depression at examination time.
- Item 5: Tearfulness at examination time.
- Item 6: Unwillingness to do household chores at examination time.
- Item 7: Mood swings at examination time.
- Item 8: Increased consumption of coffee at examination time.
- Item 9: Positive attitude and cheerfulness at examination time.
- Item 10: Eager anticipation of the examination.
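The internal consistency of an item pool like this one is often summarized with Cronbach's alpha, alongside each item's corrected item-total correlation (the item correlated with the sum of the other items). The minimal sketch below computes both from a respondents-by-items matrix. It assumes a 1 to 5 rating format; the `reverse_score` helper is included because items 9 and 10 are worded in the opposite (positive) direction to items 1 to 8. The ratings and all function names are hypothetical, not data or procedures from the text.

```python
from math import sqrt
from statistics import pvariance
from typing import List

Matrix = List[List[float]]  # rows = respondents, columns = items


def reverse_score(value: float, low: int = 1, high: int = 5) -> float:
    """Reverse a rating on an assumed low..high scale (5 -> 1, 4 -> 2 on a 1-5 scale)."""
    return low + high - value


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(data: Matrix) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(data[0])
    item_vars = [pvariance([row[i] for row in data]) for i in range(k)]
    total_var = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def corrected_item_total(data: Matrix) -> List[float]:
    """Correlate each item with the total of the remaining items (item itself excluded)."""
    k = len(data[0])
    result = []
    for i in range(k):
        item = [row[i] for row in data]
        rest = [sum(row) - row[i] for row in data]
        result.append(pearson(item, rest))
    return result


if __name__ == "__main__":
    # Hypothetical 1-5 ratings from five students on four stress items;
    # the fourth item is positively worded, like items 9 and 10 above.
    raw = [
        [4, 5, 4, 2],
        [2, 2, 3, 4],
        [5, 4, 5, 1],
        [1, 2, 1, 5],
        [3, 3, 3, 3],
    ]
    scored = [row[:3] + [reverse_score(row[3])] for row in raw]
    print(f"alpha = {cronbach_alpha(scored):.2f}")
    print([round(r, 2) for r in corrected_item_total(scored)])
```

Population variances are used throughout; because alpha depends only on the ratio of item variances to total variance, using sample variances consistently would give the same value.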

You run a reliability test (see Chapter 24 on SPSS reliability) of internal consistency and find strong intercorrelations between items 1–5 (e.g. around

Chapter 19
