
Evaluating Patient-Based Outcome Measures - NIHR Health ...


‘yes’ or ‘no’. Binary response categories have the<br />

advantage of simplicity but there is evidence that<br />

they do not allow respondents to report degrees<br />

of difficulty or severity that they experience and<br />

consider important to distinguish (Donovan et al.,<br />

1993). Many instruments therefore allow for<br />

gradations of response, most commonly in<br />

the form of a Likert set of response categories:<br />

– strongly agree<br />

– agree<br />

– uncertain<br />

– disagree<br />

– strongly disagree<br />

or some equivalent set of ordinally related items:<br />

– very satisfied<br />

– satisfied<br />

– neither satisfied nor dissatisfied<br />

– dissatisfied<br />

– very dissatisfied<br />

Alternatively, response categories may require that<br />

respondents choose between different options of<br />

how frequently a problem occurs.<br />

There is some evidence that there is increased<br />

precision from using seven rather than five<br />

response categories. A sample of older individuals<br />

with heart problems were assigned to questionnaires<br />

assessing satisfaction with various domains<br />

of life with either five or seven item response<br />

categories (Avis and Smith, 1994). The latter<br />

showed higher correlations with a criterion<br />

measure of QoL completed by respondents.<br />

However, there is little evidence in the literature<br />

of increased precision beyond seven categories.<br />

The main alternative to Likert format response<br />

categories is the visual analogue scale, which<br />

would appear to offer considerably more precision.<br />

Respondents can mark any point on a continuous<br />

line to represent their experience and in principle<br />

this offers an extensive range of response categories.<br />

However, the evidence is not strong that<br />

the apparent precision is meaningful (Nord,<br />

1991). Guyatt and colleagues (1987a) compared<br />

the responsiveness of a health-related QoL<br />

measure for respiratory function, using alternate<br />

forms of a Likert and visual analogue scale. They<br />

found no significant advantage for the visual<br />

analogue scale. Similar results were found in a<br />

randomised trial setting, showing no advantage<br />

in responsiveness for visual analogue scales<br />

(Jaeschke et al., 1990). An additional concern<br />

cited earlier is the somewhat lower acceptability<br />

of visual analogue scales as a task. Overall, firm<br />

empirical evidence of superiority of visual analogue<br />

scales over Likert scales is difficult to find<br />

(Remington et al., 1979).<br />

Health Technology Assessment 1998; Vol. 2: No. 14<br />

Precision of numerical values<br />

To be of use in clinical trials, what patients<br />

report in health status measures is generally<br />

transformed into numerical values or codes<br />

that, on the one hand, most accurately reflect<br />

differences between individuals and changes<br />

within individuals over time and, on the other<br />

hand, make possible statistical analysis of the size<br />

and importance of results. Clearly, philosophical<br />

and epistemological issues can be raised about<br />

this process of assigning numerical values to<br />

subjective experience (Nordenfelt, 1994). These<br />

issues must be acknowledged but are beyond the<br />

scope of this review to address. Instead, we need<br />

to examine how the field has drawn upon psychometric,<br />

social scientific and statistical principles to<br />

produce pragmatically plausible numerical values<br />

that capture, as accurately as possible, subjective<br />

experiences that may in some way be related<br />

to health care interventions.<br />

Two basically different methods of numerical<br />

scoring can be found amongst health status<br />

measures. On the one hand, the majority of instruments<br />

use somewhat arbitrary but common-sense<br />

based methods of simple ordinal values. For<br />

example, many instruments use Likert format<br />

response categories where degrees of agreement<br />

with a statement are given progressively<br />

lower values:<br />

strongly agree = 1; agree = 2; neither agree nor<br />

disagree = 3; disagree = 4; strongly disagree = 5.<br />

The direction of such values is entirely arbitrary,<br />

and can be reversed so that greater agreement is<br />

given higher numerical value.<br />
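
The ordinal coding described above, and the reversal of its direction, can be illustrated with a minimal sketch. The numerical codes are those given in the text; the names LIKERT_CODES, score_response and reverse_score are hypothetical and are not taken from any particular instrument.

```python
# Ordinal codes for a five-point Likert item, as given in the text.
# The dictionary and function names are illustrative assumptions.
LIKERT_CODES = {
    "strongly agree": 1,
    "agree": 2,
    "neither agree nor disagree": 3,
    "disagree": 4,
    "strongly disagree": 5,
}


def score_response(label: str) -> int:
    """Return the ordinal code for a response label (case-insensitive)."""
    return LIKERT_CODES[label.strip().lower()]


def reverse_score(code: int, n_categories: int = 5) -> int:
    """Reverse the direction of coding so greater agreement scores higher."""
    if not 1 <= code <= n_categories:
        raise ValueError(f"code must lie between 1 and {n_categories}")
    return n_categories + 1 - code


responses = ["agree", "strongly disagree", "neither agree nor disagree"]
codes = [score_response(r) for r in responses]
reversed_codes = [reverse_score(c) for c in codes]

print(codes)           # [2, 5, 3]
print(reversed_codes)  # [4, 1, 3]
```

Reversal leaves the ordering of respondents intact and only changes which end of the scale carries the higher number, which is why the direction can be treated as a matter of convention.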

It is worth noting that some instruments such<br />

as SF-36 recode numerical values so that items are<br />

expressed as percentages or proportions of the<br />

total scale score. To take a hypothetical example,<br />

an instrument may have six alternative responses<br />

for an assessment of pain, ranging in severity from,<br />

let us say, ‘no pain at all’ through to ‘severe pain<br />

all of the time’. Instead of scoring responses ‘1’,<br />

‘2’, ‘3’ and so on, the scores may be transformed<br />

into percentages of a total: ‘17%’, ‘33%’, ‘50%’.<br />
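
The arithmetic of this hypothetical six-category example can be sketched as follows. The sketch assumes that each response code is simply divided by the number of categories and rounded to a whole percentage, which reproduces the '17%', '33%', '50%' quoted above. The function name percent_of_total is an illustrative assumption, and the published scoring rules of any specific instrument may differ.

```python
def percent_of_total(code: int, n_categories: int) -> int:
    """Express an ordinal response code as a rounded percentage of the
    highest possible code (an assumed reading of the hypothetical example)."""
    if not 1 <= code <= n_categories:
        raise ValueError(f"code must lie between 1 and {n_categories}")
    return round(100 * code / n_categories)


# Six response categories, coded 1 ('no pain at all') to 6
# ('severe pain all of the time').
scores = [percent_of_total(k, 6) for k in range(1, 7)]
print(scores)  # [17, 33, 50, 67, 83, 100]
```

Under this reading the transformed values are a fixed rescaling of the original six codes, so they carry no more information than the codes themselves.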

Although this approach produces a range of values<br />

between 0 and 100, the simple and limited basis<br />

from which values are derived should be kept in<br />

mind. In particular, while it might appear that<br />
