01.06.2013

Statistical Methods in Medical Research 4ed


Modelling categorical data

When there are only two response categories, (14.9) and (14.10) are entirely equivalent, and both the cumulative logits model and the adjacent categories model reduce to ordinary logistic regression. In the more general case, with more than two categories, computer programs are available for estimating the coefficients; for example, SAS CATMOD uses weighted least squares.
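Equations (14.9) and (14.10) are not reproduced in this section, so the sketch below assumes the common forms logit Pr(Y <= j | x) = α_j + β'x for the cumulative logits model and ln(π_{j+1}/π_j) = α_j + β'x for the adjacent categories model. Sign conventions vary between texts and may differ from (14.9) and (14.10); the helper names are hypothetical. The sketch computes category probabilities under each assumed form and checks that, with two categories, both collapse to a logistic regression for Pr(Y = 2), the parameters differing only in sign under these conventions.

```python
import math


def expit(z):
    """Inverse logit: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_predictor(beta, x):
    """beta'x for coefficient and covariate vectors of equal length."""
    return sum(b * xi for b, xi in zip(beta, x))


def cumulative_logit_probs(alphas, beta, x):
    """Category probabilities under the ASSUMED form
    logit Pr(Y <= j | x) = alpha_j + beta'x, j = 1, ..., k-1.
    alphas must be strictly increasing so every probability is positive."""
    if any(a2 <= a1 for a1, a2 in zip(alphas, alphas[1:])):
        raise ValueError("alphas must be strictly increasing")
    eta = linear_predictor(beta, x)
    cum = [expit(a + eta) for a in alphas] + [1.0]  # Pr(Y <= j), j = 1..k
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def adjacent_categories_probs(alphas, beta, x):
    """Category probabilities under the ASSUMED form
    ln(pi_{j+1} / pi_j) = alpha_j + beta'x, j = 1, ..., k-1."""
    eta = linear_predictor(beta, x)
    log_weights = [0.0]  # category 1 is the reference
    for a in alphas:
        log_weights.append(log_weights[-1] + a + eta)
    top = max(log_weights)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_weights]
    total = sum(weights)
    return [w / total for w in weights]


# Two categories: both models give a logistic regression for Pr(Y = 2).
x = [1.5]
p_cum = cumulative_logit_probs([0.3], [0.8], x)
p_adj = adjacent_categories_probs([-0.3], [-0.8], x)
p_logistic = expit(-(0.3 + 0.8 * 1.5))
```

With more than two categories the two forms are no longer equivalent: they constrain the category probabilities differently, which is why the choice between them is an empirical one.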

The mean response model<br />

Suppose that scores are assigned to the response categories, as in §15.2, and denote by M(x) the mean score for individuals with explanatory variables x. The model specifies the same linear relation as in multiple regression:

$$M(x) = \alpha + \beta' x. \qquad (14.11)$$

The approach is thus a generalization of that underlying the comparison of mean scores by (15.8) in the simple two-group case. In the general case the regression coefficients cannot be estimated accurately by standard multiple regression methods, because there may be large departures from normality and disparities in variance. Nor can exact variances such as (15.5) be easily exploited.
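As a minimal sketch of one way round the unequal-variance difficulty, assuming grouped data with a single explanatory variable, each group's mean score m can be weighted by the reciprocal of its estimated variance, Σ p_j (s_j - m)² / n, computed from the observed category proportions p_j and scores s_j. This is a weighted least squares fit in the spirit of the approach mentioned above for SAS CATMOD, not a reproduction of that program's procedure; the function names and the data are hypothetical.

```python
def mean_and_variance_of_mean(counts, scores):
    """Mean score of one group and the estimated variance of that mean,
    sum_j p_j (s_j - m)^2 / n, using the observed category proportions."""
    n = sum(counts)
    m = sum(c * s for c, s in zip(counts, scores)) / n
    var_score = sum(c * (s - m) ** 2 for c, s in zip(counts, scores)) / n
    return m, var_score / n


def fit_mean_response_wls(groups, scores):
    """Weighted least squares fit of M(x) = alpha + beta * x (one covariate).
    groups: list of (x, category_counts) pairs; each group is weighted by
    1 / (estimated variance of its mean score)."""
    xs, ms, ws = [], [], []
    for x, counts in groups:
        m, v = mean_and_variance_of_mean(counts, scores)
        if v <= 0:
            raise ValueError("a group with zero variance cannot be weighted")
        xs.append(x)
        ms.append(m)
        ws.append(1.0 / v)
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw  # weighted mean of x
    m_bar = sum(w * m for w, m in zip(ws, ms)) / sw  # weighted mean score
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxm = sum(w * (x - x_bar) * (m - m_bar) for w, x, m in zip(ws, xs, ms))
    beta = sxm / sxx
    alpha = m_bar - beta * x_bar
    return alpha, beta


# Hypothetical data: scores 1, 2, 3 for three ordered categories.
scores = [1, 2, 3]
groups = [(0, [2, 1, 1]), (1, [1, 2, 1]), (2, [1, 1, 2])]
alpha, beta = fit_mean_response_wls(groups, scores)
```

The three hypothetical groups were chosen so that their mean scores (1.75, 2.00, 2.25) lie exactly on a line, so the fit returns α = 1.75 and β = 0.25 whatever the weights. A group whose observations all fall in one category has zero estimated variance and cannot be weighted this way; such groups would need to be pooled or handled by another method.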

Choice of model<br />

The choice between the models described briefly above, or any others, is largely empirical: which is the most convenient to use, and which best describes the data? There is no universally best choice. The two logistic models attempt to describe the relative frequencies of observations in the various categories, and their adequacy for any particular data set may be checked by comparing observed and expected frequencies. The mean response model is less searching, since it aims to describe only the mean values. It may therefore be a little more flexible in fitting data, and particularly appropriate where there is a natural underlying continuous response variate or scoring system, but less appropriate when the fine structure of the categorical response is under study.
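For grouped data, the comparison of observed and expected frequencies can be summarized by the Pearson statistic X² = Σ (O - E)²/E and the deviance G² = 2 Σ O ln(O/E), where each expected frequency E is the group size times a fitted category probability. The sketch below assumes the fitted probabilities have already been obtained from one of the logistic models; the function name and the numbers are hypothetical. The appropriate reference distribution (degrees of freedom, and whether expected counts are large enough) depends on the data structure and is not addressed here.

```python
import math


def observed_vs_expected(observed, fitted_probs):
    """Pearson X^2 and deviance G^2 comparing observed and expected counts.
    observed: one list of category counts per covariate pattern;
    fitted_probs: matching lists of fitted category probabilities."""
    x2 = 0.0
    g2 = 0.0
    for counts, probs in zip(observed, fitted_probs):
        n = sum(counts)
        for o, p in zip(counts, probs):
            e = n * p  # expected frequency in this cell
            if e <= 0:
                raise ValueError("expected frequencies must be positive")
            x2 += (o - e) ** 2 / e
            if o > 0:  # cells with o = 0 contribute nothing to G^2
                g2 += 2.0 * o * math.log(o / e)
    return x2, g2


# Hypothetical single covariate pattern: 40 observations in 3 categories.
x2, g2 = observed_vs_expected([[12, 18, 10]], [[0.25, 0.5, 0.25]])
```

Here the expected frequencies are 10, 20 and 10, so the cells contribute 0.4, 0.2 and 0 to X², giving X² = 0.6.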

Further descriptions of these models are given in Agresti (1990, Chapter 9), and an application to repeated measures data is described in Agresti (1989). An example relating alcohol consumption in eight ordered categories to biochemical and haematological variables was discussed by Ashby et al. (1986). These authors also discussed a test of goodness of fit, which is essentially an extension of the Hosmer-Lemeshow test, and a method of allocating an individual to one of the groups.
