Evaluating Patient-Based Outcome Measures - NIHR Health ...
Evaluating Patient-Based Outcome Measures - NIHR Health ...
Evaluating Patient-Based Outcome Measures - NIHR Health ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
and with the summed score of the total of items in<br />
the same scale.<br />
Internal consistency can be measured in a number<br />
of different ways. One approach – split-half reliability<br />
– is randomly to divide the items in a scale<br />
into two groups and to assess the degree of agreement<br />
between the two halves. The two halves<br />
should correlate highly. An extension of this<br />
principle is Coefficient alpha, usually referred<br />
to as Cronbach’s alpha, which essentially estimates<br />
the average level of agreement of all the possible<br />
ways of performing split-half tests (Cronbach,<br />
1951). The higher the alpha, the higher the<br />
internal consistency. However, it is also possible<br />
to increase Cronbach’s alpha by increasing the<br />
number of items, even if the average level of correlation<br />
does not change (Streiner and Norman,<br />
1995). Also if the items of a scale correlate perfectly<br />
with each other, it is likely that there is some redundancy<br />
among items, and also a possibility that the<br />
items together are addressing a rather narrow<br />
aspect of an attribute. For these reasons, it is<br />
suggested that Cronbach’s alpha should be above<br />
0.70 but not higher than 0.90 (Nunnally and<br />
Bernstein, 1994; Streiner and Norman, 1995).<br />
Another approach to establish internal consistency<br />
of items is simply to examine the correlation of<br />
individual items to the scale as a whole, omitting<br />
that item. Steiner and Norman (1995) cite a normal<br />
rule of thumb that items should correlate<br />
at least 0.20 with the scale.<br />
A balance needs to be struck between satisfactory<br />
internal consistency and a measure that is too<br />
homogeneous because it measures a very restricted<br />
aspect of a phenomenon. Kessler and Mroczek<br />
(1995) provide an important argument with<br />
illustrative evidence against excessive emphasis<br />
upon internal reliability. Essentially, they advocate<br />
a shift toward selecting items in a scale in a way<br />
that maximises the additional information content<br />
of each item, a principle of minimal redundancy.<br />
As Kessler and Mroczek argue, investigators usually<br />
use factor-analytic techniques to identify items<br />
that particularly correlate and therefore yield<br />
high internal reliability. They argue for the use<br />
of regression and related techniques to replace<br />
factor analytic methods of developing scales. Their<br />
argument begins with a hypothetical long list of<br />
questionnaire items that together may be considered<br />
the complete and true measure of some<br />
phenomenon, say pain, or mobility. The objective<br />
of scale development is to produce a small sub-set<br />
that reliably measures the full set. Factor analysis<br />
will identify the items from the longer set that<br />
<strong>Health</strong> Technology Assessment 1998; Vol. 2: No. 14<br />
most correlate with each other to produce an<br />
internally reliable scale. If, however, one conceives<br />
of the full set of items as the dependent variable<br />
and uses regression analysis with individual items<br />
as the independent variables, it is possible to<br />
identify a small sub-set of items that explains most<br />
of the variance in the full set. As they express it,<br />
‘most of the variance in the long form can then<br />
usually be reproduced with a small subset of the<br />
scale items’ (Kessler and Mroczek, 1995: AS112).<br />
The argument is then illustrated with data comprising<br />
32 items used to screen for psychological<br />
distress in a population survey. They select items<br />
to form a short version of the scale from the full<br />
set by two methods: factor analytic methods to<br />
maximise internal reliability and regression to<br />
minimise redundancy. Results from regression<br />
produce consistently higher correlations of the<br />
sub-scale with the total variance in the full set<br />
of 32 items. On the other hand, factor analytic<br />
techniques produce consistently higher<br />
internal reliability.<br />
It has been argued that excessive attention to<br />
internal reliability can result in the omission of<br />
important items, particularly those that reflect<br />
the complexity and diversity of a phenomenon<br />
(Donovan et al., 1993). Certainly, obtaining the<br />
highest possible reliability coefficient should not<br />
be the sole objective in developing or selecting an<br />
instrument because the reductio ad absurdum of<br />
this principle would be an instrument with high<br />
reliability produced by virtually identical items.<br />
Reproducibility<br />
Reproducibility more directly evaluates whether<br />
an instrument yields the same results on repeated<br />
applications, when respondents have not changed<br />
on the domain being measured. This is assessed<br />
by test–retest reliability. The degree of agreement<br />
is examined between scores at a first assessment<br />
and when reassessed. There is no exact agreement<br />
about the length of time that should elapse; it<br />
needs to be a sufficient length of time that<br />
respondents are unlikely to recall their previous<br />
answers, but not so long that actual changes in<br />
the underlying dimension of health have<br />
occurred. Streiner and Norman (1995) suggest<br />
that the usual range of time elapsed between<br />
assessments tends to be between 2 and 14 days.<br />
One way of checking whether the sample has<br />
experienced underlying changes in health that<br />
would reduce the apparent reproducibility of<br />
an instrument is also to administer a transition<br />
question at the second assessment (‘Is your<br />
health better, the same or worse than at the<br />
last assessment?’).<br />
23