Evaluating Patient-Based Outcome Measures - NIHR Health ...
‘yes’ or ‘no’. Binary response categories have the advantage of simplicity, but there is evidence that they do not allow respondents to report degrees of difficulty or severity that they experience and consider important to distinguish (Donovan et al., 1993). Many instruments therefore allow for gradations of response, most commonly in the form of a Likert set of response categories:
– strongly agree
– agree
– uncertain
– disagree
– strongly disagree
or some equivalent set of ordinally related items:
– very satisfied
– satisfied
– neither satisfied nor dissatisfied
– dissatisfied
– very dissatisfied
Alternatively, response categories may require respondents to choose between options describing how frequently a problem occurs.
There is some evidence that using seven rather than five response categories increases precision. A sample of older individuals with heart problems was assigned questionnaires assessing satisfaction with various domains of life, with either five or seven response categories per item (Avis and Smith, 1994). The seven-category version showed higher correlations with a criterion measure of QoL completed by respondents. However, there is little evidence in the literature of increased precision beyond seven categories.
The main alternative to Likert format response categories is the visual analogue scale, which would appear to offer considerably more precision. Respondents can mark any point on a continuous line to represent their experience, and in principle this offers an extensive range of response categories. However, the evidence is not strong that the apparent precision is meaningful (Nord, 1991). Guyatt and colleagues (1987a) compared the responsiveness of a health-related QoL measure for respiratory function using alternate forms with Likert and visual analogue scales. They found no significant advantage for the visual analogue scale. Similar results were found in a randomised trial setting, which showed no advantage in responsiveness for visual analogue scales (Jaeschke et al., 1990). An additional concern, cited earlier, is the somewhat lower acceptability of visual analogue scales as a task. Overall, firm empirical evidence of the superiority of visual analogue scales over Likert scales is difficult to find (Remington et al., 1979).
Health Technology Assessment 1998; Vol. 2: No. 14
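As a minimal sketch of how a visual analogue scale response is typically turned into a number, the Python below measures the distance of the mark from the left-hand anchor and expresses it as a score from 0 to 100. The 100 mm line length and the function names `vas_score` and `distinct_values` are illustrative assumptions, not details given in this section. Read to the nearest millimetre, such a line offers 101 possible scores compared with five or seven Likert categories; this is the apparent precision whose meaningfulness the studies above call into question.

```python
def vas_score(mark_mm: float, line_length_mm: float = 100.0) -> float:
    """Convert the distance (mm) of a mark from the left anchor into a 0-100 score.

    The 100 mm default line length is an assumption for illustration.
    """
    if line_length_mm <= 0:
        raise ValueError("line length must be positive")
    if not 0.0 <= mark_mm <= line_length_mm:
        raise ValueError("the mark must lie on the line")
    return 100.0 * mark_mm / line_length_mm


def distinct_values(line_length_mm: float = 100.0, resolution_mm: float = 1.0) -> int:
    """Number of distinct scores available when marks are read to a given resolution."""
    return int(round(line_length_mm / resolution_mm)) + 1


if __name__ == "__main__":
    print(vas_score(73))       # 73.0 on a 100 mm line
    print(vas_score(36, 120))  # 30.0 on a hypothetical 120 mm line
    print(distinct_values())   # 101 possible scores, versus 5 or 7 Likert categories
```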
Precision of numerical values<br />
To be of use in clinical trials, what patients report in health status measures is generally transformed into numerical values or codes that, on the one hand, reflect as accurately as possible differences between individuals and changes within individuals over time and, on the other hand, make possible statistical analysis of the size and importance of results. Clearly, philosophical and epistemological issues can be raised about this process of assigning numerical values to subjective experience (Nordenfelt, 1994). These issues must be acknowledged but are beyond the scope of this review to address. Instead, we need to examine how the field has drawn upon psychometric, social scientific and statistical principles to produce pragmatically plausible numerical values that capture, as accurately as possible, subjective experiences that may in some way be related to health care interventions.
Two fundamentally different methods of numerical scoring can be found among health status measures. On the one hand, the majority of instruments use somewhat arbitrary but common-sense methods of assigning simple ordinal values. For example, many instruments use Likert format response categories in which greater degrees of agreement with a statement are given progressively lower values:
strongly agree = 1; agree = 2; neither agree nor disagree = 3; disagree = 4; strongly disagree = 5.
The direction of such values is entirely arbitrary and can be reversed so that greater agreement is given a higher numerical value.
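The ordinal coding just described, and its reversal, can be sketched in a few lines of Python. The dictionary `LIKERT_CODES` and the function `code_response` are hypothetical names used for illustration. Reversal simply reflects each code about the scale midpoint (1 becomes 5, 2 becomes 4, 3 stays 3), which changes the direction of the values but adds no information.

```python
# Hypothetical coding table following the example in the text:
# greater agreement receives a lower value.
LIKERT_CODES = {
    "strongly agree": 1,
    "agree": 2,
    "neither agree nor disagree": 3,
    "disagree": 4,
    "strongly disagree": 5,
}


def code_response(response: str, reverse: bool = False) -> int:
    """Return the ordinal code for a Likert response.

    With reverse=True the code is reflected about the midpoint, so that
    greater agreement receives a higher value instead.
    """
    code = LIKERT_CODES[response.strip().lower()]
    if reverse:
        low, high = min(LIKERT_CODES.values()), max(LIKERT_CODES.values())
        code = low + high - code
    return code


responses = ["agree", "strongly disagree", "Neither agree nor disagree"]
print([code_response(r) for r in responses])                # [2, 5, 3]
print([code_response(r, reverse=True) for r in responses])  # [4, 1, 3]
```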
It is worth noting that some instruments, such as the SF-36, recode numerical values so that items are expressed as percentages or proportions of the total scale score. To take a hypothetical example, an instrument may have six alternative responses for an assessment of pain, ranging in severity from, let us say, ‘no pain at all’ through to ‘severe pain all of the time’. Instead of scoring responses ‘1’, ‘2’, ‘3’ and so on, the scores may be transformed into percentages of a total: ‘17%’, ‘33%’, ‘50%’ and so on.
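The arithmetic behind this hypothetical six-category pain item can be sketched as follows. `as_percentage_of_total` reproduces the figures above by dividing each score by the number of categories. `rescale_0_100` shows the linear form (score minus the lowest possible score, divided by the possible range) commonly described for SF-36 scale scores; that formula is an assumption brought in for comparison rather than something stated in this section. Both function names are hypothetical. Whichever form is used, six response categories still yield only six distinct values.

```python
def as_percentage_of_total(score: int, n_categories: int = 6) -> float:
    """Express an ordinal score (1..n) as a percentage of the maximum score."""
    if not 1 <= score <= n_categories:
        raise ValueError("score out of range")
    return 100.0 * score / n_categories


def rescale_0_100(score: int, lowest: int = 1, highest: int = 6) -> float:
    """Linearly rescale a score so that lowest -> 0 and highest -> 100."""
    if not lowest <= score <= highest:
        raise ValueError("score out of range")
    return 100.0 * (score - lowest) / (highest - lowest)


scores = range(1, 7)  # six response categories, coded 1..6
print([round(as_percentage_of_total(s)) for s in scores])  # [17, 33, 50, 67, 83, 100]
print([rescale_0_100(s) for s in scores])                   # [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
# However the values are expressed, there are still only six of them.
print(len({rescale_0_100(s) for s in scores}))              # 6
```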
Although this approach produces a range of values between 0 and 100, the simple and limited basis from which values are derived should be kept in mind. In particular, while it might appear that