29.06.2013 Views

Evaluating Patient-Based Outcome Measures - NIHR Health ...

Evaluating Patient-Based Outcome Measures - NIHR Health ...

Evaluating Patient-Based Outcome Measures - NIHR Health ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

26<br />

Criteria for selecting a patient-based outcome measure<br />

the product of these variables’ reliability coefficients.<br />

Therefore, given typical levels of reliability<br />

of patient-based variables, a correlation coefficient<br />

of 0.60 may be strong evidence in support of<br />

construct validity (McDowell and Newell, 1996).<br />

Because there is considerable vagueness and<br />

variability in the levels of correlation coefficients<br />

that authors cite as evidence of construct validity<br />

of new instruments, McDowell and Jenkinson<br />

(1996) recommend that expected correlations<br />

should be specified at the outset of studies to test<br />

instruments’ validity in order that it be possible<br />

for validity to be disproved.<br />

The most sophisticated form of testing construct<br />

validity is so-called ‘convergent and discriminant<br />

validity’ (Campbell and Fiske, 1959). This approach<br />

requires postulating that an instrument that we<br />

wish to test should have stronger relationships<br />

with some variables and weaker relationships with<br />

others. A new measure of mobility should correlate<br />

more strongly with existing measures of physical<br />

disability than with existing measures of emotional<br />

well-being. Essentially correlations are expected to<br />

be strongest with most related constructs and weakest<br />

with most distally related constructs. Typically,<br />

construct validity is examined by inspecting correlations<br />

of a new measure against a range of other<br />

evidence such as, disease staging, performance<br />

status, clinical or laboratory evidence of disease<br />

severity, illness behaviour, use of health services and<br />

related constructs of well-being (Spitzer et al., 1981;<br />

Fletcher, 1988; Sullivan et al., 1990; Aaronson et al.,<br />

1993). As an example of convergent and discriminant<br />

validation, Sullivan and colleagues (1990)<br />

examined validity of the scales of the SIP for use in<br />

rheumatoid arthritis with the expectation that the<br />

SIP physical function score should correlate most<br />

with various measures of disease severity and the<br />

SIP psychosocial scale should correlate most with<br />

other measures of mood and psychological wellbeing.<br />

Conversely, measures of physical function<br />

were expected to correlate less with measures of<br />

physical mood. Similarly, Morrow and colleagues<br />

(1992) examined the convergent-discriminant<br />

validity of the Functional Living Index-Cancer<br />

(FLIC) by examining relationships to other<br />

variables. As predicted the subscales of FLIC to<br />

measure gastrointestinal symptoms was significantly<br />

related to other ratings by patients of nausea and<br />

vomiting, but correlations were close to zero<br />

between psychological and social subscales of<br />

FLIC and patients’ separate reports of nausea<br />

and vomiting.<br />

The most demanding form of convergent and<br />

discriminant validity is the multitrait-multimethod<br />

matrix (Campbell and Fiske, 1959) in which two<br />

unrelated constructs are measured by two or more<br />

methods. In essence, it is expected that different<br />

measures of a single underlying trait should<br />

correlate most and different measures of different<br />

constructs least. Whilst potentially powerful in<br />

psychometric test development, it is difficult to<br />

apply to patient-based outcome measures because<br />

of problems of obtaining alternative methods of<br />

measuring constructs.<br />

Most instruments to assess outcomes from the<br />

patient’s point of view are multi-dimensional.<br />

They, for example, assess physical, psychological<br />

and social aspects of an illness within one questionnaire.<br />

This internal structure of an instrument can<br />

also be considered a set of assumed relationships<br />

between underlying constructs. At the very least,<br />

an instrument with sub-scales has implied that the<br />

instrument measures different underlying constructs<br />

by providing different sub-scales, rather than<br />

requiring that all items should simply be added to<br />

produce one score of one underlying construct.<br />

Normally instruments with multiple scales also<br />

assume particular underlying relationships for<br />

the constructs measured by the instrument; for<br />

example, scales of different aspects of emotional<br />

response to illness will correlate more with each<br />

other than with scales assessing physical function.<br />

This internal structure of instruments has also to<br />

be established by construct validation. The most<br />

common of methods for this purpose is statistical,<br />

particularly the use of factor analysis. Thus, factor<br />

analysis is often considered an aspect of<br />

construct validity.<br />

To simplify, factor analysis is the analysis of patterns<br />

of, in this field, items that go together to assess<br />

single underlying constructs. Typically, statistical<br />

analysis of answers of a sample of respondents to<br />

a pool of questionnaire items is used to reveal two<br />

or more sub-scales assessing distinct dimensions.<br />

In essence, the data can be checked to see whether<br />

individual questionnaire items correlate more with<br />

the scale of which they are a part and less with<br />

other scales to which they do not belong. Examples<br />

of instruments in which factor analysis has played<br />

a key role in establishing the internal structure<br />

of sub-scales include the Profile of Mood States<br />

(McNair et al., 1992) and the St George’s<br />

Respiratory Questionnaire (Jones et al., 1992).<br />

However, there are problems with factor analytic<br />

methods used in this context. Fayers and Hand<br />

(1997) provide important arguments and evidence<br />

against excessive reliance upon factor analysis<br />

alone to determine or evaluate the construct

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!