
and with the summed score of the total of items in the same scale.

Internal consistency can be measured in a number of different ways. One approach – split-half reliability – is to divide the items in a scale randomly into two groups and to assess the degree of agreement between the two halves. The two halves should correlate highly.
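
To make the procedure concrete, the split-half calculation can be sketched as follows; the code and the simulated responses are illustrative only, and the scale size, seed and scoring range are invented for the example.

```python
import numpy as np

def split_half_reliability(items: np.ndarray, seed: int = 0) -> float:
    """Correlate summed scores of two randomly chosen halves of a scale.

    items: (n_respondents, n_items) array of item responses.
    """
    rng = np.random.default_rng(seed)
    k = items.shape[1]
    order = rng.permutation(k)                  # randomly divide the items
    half_a = items[:, order[:k // 2]].sum(axis=1)
    half_b = items[:, order[k // 2:]].sum(axis=1)
    return float(np.corrcoef(half_a, half_b)[0, 1])

# Invented data: 100 respondents answering eight 5-point items.
rng = np.random.default_rng(42)
latent = rng.normal(size=(100, 1))
items = np.clip(np.rint(3 + latent + rng.normal(scale=0.8, size=(100, 8))), 1, 5)
print(split_half_reliability(items))            # the two halves should correlate highly
```

In practice the half-test correlation is often adjusted upwards to estimate the reliability of the full-length scale (the Spearman–Brown correction), a step omitted here for brevity.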

An extension of this principle is coefficient alpha, usually referred to as Cronbach's alpha, which essentially estimates the average level of agreement of all the possible ways of performing split-half tests (Cronbach, 1951). The higher the alpha, the higher the internal consistency. However, it is also possible to increase Cronbach's alpha by increasing the number of items, even if the average level of correlation does not change (Streiner and Norman, 1995). Also, if the items of a scale correlate perfectly with each other, it is likely that there is some redundancy among items, and also a possibility that the items together are addressing a rather narrow aspect of an attribute. For these reasons, it is suggested that Cronbach's alpha should be above 0.70 but not higher than 0.90 (Nunnally and Bernstein, 1994; Streiner and Norman, 1995).
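
Formally, for a scale of k items, alpha = (k/(k − 1)) × (1 − (sum of the item variances)/(variance of the total score)). A minimal sketch of the calculation, and of the caveat that alpha rises with the number of items, follows; the data and the inter-item correlation value are invented.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed scale score
    return float(k / (k - 1) * (1 - item_vars.sum() / total_var))

# Invented data as in the split-half example:
rng = np.random.default_rng(42)
latent = rng.normal(size=(100, 1))
items = np.clip(np.rint(3 + latent + rng.normal(scale=0.8, size=(100, 8))), 1, 5)
print(cronbach_alpha(items))

# The caveat above: holding the average inter-item correlation r fixed,
# alpha still grows as the number of items k grows
# (standardised alpha = k*r / (1 + (k - 1)*r)).
r = 0.3
for k in (5, 10, 20):
    print(k, round(k * r / (1 + (k - 1) * r), 2))   # 0.68, 0.81, 0.90
```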

Another approach to establishing the internal consistency of items is simply to examine the correlation of individual items with the scale as a whole, omitting that item. Streiner and Norman (1995) cite a common rule of thumb that items should correlate at least 0.20 with the scale.
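
This corrected item–total correlation can be sketched in the same style; the 0.20 threshold in the comment is the rule of thumb cited above, and the invented data layout matches the earlier examples.

```python
import numpy as np

def corrected_item_total(items: np.ndarray) -> np.ndarray:
    """Correlation of each item with the sum of the remaining items."""
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]   # omit item j from the total
        for j in range(items.shape[1])
    ])

# Invented data as before; items correlating below the 0.20 rule of
# thumb would be candidates for review.
rng = np.random.default_rng(42)
latent = rng.normal(size=(100, 1))
items = np.clip(np.rint(3 + latent + rng.normal(scale=0.8, size=(100, 8))), 1, 5)
print(corrected_item_total(items).round(2))
```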

A balance needs to be struck between satisfactory internal consistency and a measure that is too homogeneous because it measures a very restricted aspect of a phenomenon. Kessler and Mroczek (1995) provide an important argument, with illustrative evidence, against excessive emphasis upon internal reliability. Essentially, they advocate a shift toward selecting items in a scale in a way that maximises the additional information content of each item, a principle of minimal redundancy.

As Kessler and Mroczek argue, investigators usually use factor-analytic techniques to identify items that particularly correlate and therefore yield high internal reliability. They argue for the use of regression and related techniques to replace factor-analytic methods of developing scales. Their argument begins with a hypothetical long list of questionnaire items that together may be considered the complete and true measure of some phenomenon, say pain or mobility. The objective of scale development is to produce a small sub-set that reliably measures the full set. Factor analysis will identify the items from the longer set that most correlate with each other to produce an internally reliable scale. If, however, one conceives of the full set of items as the dependent variable and uses regression analysis with individual items as the independent variables, it is possible to identify a small sub-set of items that explains most of the variance in the full set. As they express it, 'most of the variance in the long form can then usually be reproduced with a small subset of the scale items' (Kessler and Mroczek, 1995: AS112).

The argument is then illustrated with data comprising 32 items used to screen for psychological distress in a population survey. They select items to form a short version of the scale from the full set by two methods: factor-analytic methods to maximise internal reliability and regression to minimise redundancy. Results from regression produce consistently higher correlations of the sub-scale with the total variance in the full set of 32 items. On the other hand, factor-analytic techniques produce consistently higher internal reliability.
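
A hedged sketch of the regression side of this comparison is given below: items are selected greedily so that each addition maximises the variance explained in the full-scale score. The 32-item set, sample size and noise levels are invented, and greedy forward selection is only one of several regression-based strategies an approach like Kessler and Mroczek's could take.

```python
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 from an ordinary least-squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

def forward_select(items: np.ndarray, n_keep: int) -> list[int]:
    """Greedily pick the items that explain the most variance in the full-scale score."""
    full_score = items.sum(axis=1)         # the 'complete and true' long form
    chosen: list[int] = []
    while len(chosen) < n_keep:
        best = max(
            (j for j in range(items.shape[1]) if j not in chosen),
            key=lambda j: r_squared(items[:, chosen + [j]], full_score),
        )
        chosen.append(best)
    return chosen

# Hypothetical 32-item screening scale, shortened to eight items.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
items = latent + rng.normal(scale=rng.uniform(0.5, 2.0, 32), size=(500, 32))
subset = forward_select(items, 8)
print(subset, r_squared(items[:, subset], items.sum(axis=1)))
```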

It has been argued that excessive attention to internal reliability can result in the omission of important items, particularly those that reflect the complexity and diversity of a phenomenon (Donovan et al., 1993). Certainly, obtaining the highest possible reliability coefficient should not be the sole objective in developing or selecting an instrument, because the reductio ad absurdum of this principle would be an instrument with high reliability produced by virtually identical items.

Reproducibility<br />

Reproducibility more directly evaluates whether an instrument yields the same results on repeated applications, when respondents have not changed on the domain being measured. This is assessed by test–retest reliability. The degree of agreement is examined between scores at a first assessment and when reassessed. There is no exact agreement about the length of time that should elapse; it needs to be a sufficient length of time that respondents are unlikely to recall their previous answers, but not so long that actual changes in the underlying dimension of health have occurred. Streiner and Norman (1995) suggest that the usual range of time elapsed between assessments tends to be between 2 and 14 days. One way of checking whether the sample has experienced underlying changes in health that would reduce the apparent reproducibility of an instrument is also to administer a transition question at the second assessment ('Is your health better, the same or worse than at the last assessment?').
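
A minimal sketch of this check, with invented scores and transition answers: agreement is computed only for respondents who report their health as the same, so that genuine change does not masquerade as unreliability.

```python
import numpy as np

def test_retest_reliability(score_t1: np.ndarray,
                            score_t2: np.ndarray,
                            transition: np.ndarray) -> float:
    """Pearson correlation of first and second scores, restricted to
    respondents reporting no change on the transition question."""
    stable = transition == "same"
    return float(np.corrcoef(score_t1[stable], score_t2[stable])[0, 1])

# Hypothetical scale scores obtained 2-14 days apart, plus a transition question.
rng = np.random.default_rng(1)
t1 = rng.normal(50, 10, 200)
t2 = t1 + rng.normal(0, 3, 200)
transition = rng.choice(["better", "same", "worse"], size=200, p=[0.1, 0.8, 0.1])
print(test_retest_reliability(t1, t2, transition))
```

Pearson correlation is used here for simplicity; an intraclass correlation is often preferred for test–retest data because it also penalises systematic shifts between the two assessments.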
