
Criteria for selecting a patient-based outcome measure

completed both (King et al., 1996). The reason for the low level of agreement is that the items of one scale focus upon companionship of family and friends, whilst the other instrument's social scale focuses upon the impact of disease on social activities. The same degree of disparate content was found in the social dimensions of instruments used to assess well-being in patients with rheumatoid arthritis (Fitzpatrick et al., 1991). Instruments may also differ in less obvious ways in their content when assessing dimensions such as physical function, about which more agreement might be expected. For example, the physical function of patients with rheumatoid arthritis is assessed in one health status instrument by items that ask respondents how much help they need to perform particular tasks, whereas another instrument addresses similar tasks but its items elicit the degree of difficulty respondents experience with them (Ziebland et al., 1993).

One commonly recommended solution to ensure that a trial has an appropriate set of outcome measures is to use one disease-specific and one generic instrument to assess outcomes (Cox et al., 1992; Bombardier et al., 1995). In this way, it is reasonably likely that both important proximal and distal effects of a treatment will be captured: the most immediate effects upon the disease as well as possible consequences that are harder to anticipate.

Summary

In more general terms, the appropriateness of an instrument for a trial will involve considering the other criteria we have identified and discuss below: evidence of reliability, feasibility, and so on. In the more specific terms with which we have summarised the rather disparate literature on appropriateness, the term requires that investigators consider as directly as possible how well the content of an instrument matches the intended purpose of their specific trial.

Reliability

Does the instrument produce results that are reproducible and internally consistent?

Reliability is concerned with the reproducibility and internal consistency of a measuring instrument. It assesses the extent to which the instrument is free from random error and may be considered as the amount of a score that is signal rather than noise. It is a very important property of any patient-based outcome measure in a clinical trial because it is essential to establish that any changes observed in a trial are due to the intervention and not to problems in the measuring instrument. As the random error of such a measure increases, so the size of the sample required to obtain a precise estimate of effects in a trial will increase. An unreliable measure may therefore underestimate the size of benefit obtained from an intervention. The reliability of a particular measure is not a fixed property, but is dependent upon the context and population studied (Streiner and Norman, 1995).
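
To make the sample size point concrete, here is a minimal sketch (ours, not the report's) under the classical test theory assumption that an observed score is a true score plus independent random error. The function name and the illustrative true effect of 0.5 SD are our own; the point is that the required sample size grows roughly in proportion to 1/reliability.

```python
# A minimal sketch (ours, not the report's): measurement error inflates
# the observed SD, so the standardised effect size is attenuated by
# sqrt(reliability), and the n needed per arm grows roughly as
# 1 / reliability.
from math import ceil
from scipy.stats import norm

def n_per_group(true_effect_sd, reliability, alpha=0.05, power=0.80):
    """Approximate n per arm for a two-sample comparison of means."""
    d_obs = true_effect_sd * reliability ** 0.5  # attenuated effect size
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return ceil(2 * (z / d_obs) ** 2)

# Hypothetical true effect of 0.5 SD (in true-score units):
for r in (1.0, 0.9, 0.7, 0.5):
    print(f"reliability={r:.1f}: n per group = {n_per_group(0.5, r)}")
```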

The degree of reliability required of an instrument used to assess individuals is higher than that required to assess groups (Williams and Naylor, 1992; Nunnally and Bernstein, 1994). As is described below, reliability coefficients of 0.70 may be acceptable for measures in a study of a group of patients in a clinical trial. However, Nunnally and Bernstein (1994) recommend a reliability level of at least 0.90 for a measure if it is going to be used for decisions about an individual on the basis of his or her score. This higher requirement is because the confidence interval around an individual's true score is wide at reliabilities below this recommended level (Hayes et al., 1993). For a similar reason, Jaeschke and colleagues (1991) express extreme caution about the interpretation of QoL scores in n-of-1 trials. Our concern is with group applications such as trials, where the confidence interval around an estimate of the reliability of a measure narrows as sample size increases.
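
The width of that confidence interval can be illustrated with the standard error of measurement (SEM), a standard classical test theory quantity. This is our sketch, not the report's; the scale with mean 50 and SD 10 is hypothetical.

```python
# A minimal sketch (ours, not the report's): the SEM drives the width
# of the confidence interval around an individual's true score.
# SEM = SD * sqrt(1 - reliability); a 95% CI is roughly score +/- 1.96*SEM.
def true_score_ci(observed_score, sd, reliability, z=1.96):
    sem = sd * (1 - reliability) ** 0.5
    return observed_score - z * sem, observed_score + z * sem

# Hypothetical scale with mean 50 and SD 10:
for r in (0.70, 0.90):
    lo, hi = true_score_ci(50, 10, r)
    print(f"reliability={r:.2f}: 95% CI for true score ~ ({lo:.1f}, {hi:.1f})")
# reliability=0.70 gives roughly (39.3, 60.7); 0.90 gives (43.8, 56.2),
# which is why individual-level decisions demand the higher coefficient.
```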

In practice, the evaluation of reliability addresses two different aspects of a measure: internal consistency and reproducibility (sometimes referred to as 'equivalence' and 'stability', respectively; Bohrnstedt, 1983). The two measures derive from classical measurement theory, which regards any observation as the sum of two components: a true score and an error term (Bravo and Potvin, 1991).

Internal consistency

Normally, more than one questionnaire item is used to measure a dimension or construct. This is because of a basic principle of measurement: several related observations will produce a more reliable estimate than one. For this to be true, the items all need to be homogeneous, that is, all measuring aspects of a single attribute or construct rather than different constructs (Streiner and Norman, 1995). The practical consequence of this expectation is that individual items should correlate highly with each other.
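
Item homogeneity of this kind is commonly quantified with Cronbach's alpha, which rises when items correlate highly with one another. The sketch below is ours, not the report's, and the respondent data are hypothetical.

```python
# A minimal sketch (ours, not the report's) of Cronbach's alpha, a
# common index of internal consistency. Alpha approaches 1 when items
# covary strongly, i.e. when they appear to tap a single construct.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array, rows = respondents, columns = scale items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total score
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical responses from 5 patients to a 4-item scale (1-5 ratings):
responses = np.array([
    [4, 4, 5, 4],
    [2, 3, 2, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])
print(f"alpha = {cronbach_alpha(responses):.2f}")  # high: items move together
```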
