Statistical Methods in Medical Research 4ed


494 Modelling categorical data

in a smaller number of distinct covariate patterns; this is particularly likely to be the case if the covariates are categorical variables with just a few levels. The value of the deviance is unaltered by grouping into covariate patterns, but the degrees of freedom are equal to the number of covariate patterns less the number of parameters fitted.
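As a minimal sketch of this grouping, individual 0/1 records can be collapsed into their distinct covariate patterns, each summarized as $r$ events out of $n$ individuals, and the degrees of freedom then follow as the number of patterns less the number of parameters fitted. The Python below is illustrative only: the helper names `covariate_patterns` and `residual_df`, the data, and the three-parameter model are all hypothetical.

```python
from collections import Counter


def covariate_patterns(covariates, outcomes):
    """Collapse individual 0/1 outcomes into distinct covariate patterns.

    covariates -- one tuple of covariate values per individual
    outcomes   -- one 0/1 response per individual
    Returns a dict mapping each pattern to (r events, n individuals).
    """
    if len(covariates) != len(outcomes):
        raise ValueError("covariates and outcomes must have the same length")
    n = Counter()
    r = Counter()
    for x, y in zip(covariates, outcomes):
        pattern = tuple(x)
        n[pattern] += 1
        r[pattern] += y
    return {pattern: (r[pattern], n[pattern]) for pattern in n}


def residual_df(patterns, n_parameters):
    """Degrees of freedom: number of covariate patterns less parameters fitted."""
    return len(patterns) - n_parameters


# Hypothetical data: two binary covariates (e.g. sex, exposure), 8 individuals.
covariates = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
outcomes = [0, 1, 0, 1, 1, 1, 0, 1]
patterns = covariate_patterns(covariates, outcomes)
# Assumed model: intercept plus two main effects, i.e. 3 parameters.
df = residual_df(patterns, 3)
```

Here eight individuals reduce to four covariate patterns, so the assumed three-parameter model leaves 4 minus 3, that is 1, degree of freedom.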

For individual data that do not reduce to a smaller number of covariate patterns, tests based on grouping the data may be constructed. For a logistic regression, grouping could be by the estimated probabilities, and a $\chi^2$ test produced by comparing observed and expected frequencies (Lemeshow & Hosmer, 1982; Hosmer & Lemeshow, 1989, §5.2.2). In this test the individuals are ranked in terms of the size of the estimated probability, $P$, obtained from the fitted logistic regression model. The individuals are then divided into $g$ groups; often $g = 10$. One way of doing this is to have groups of equal size, that is, the first 10% of subjects are in the first group, and so on. Another way is to define the groups in terms of the estimated probabilities, so that the first group contains those with estimated probabilities less than 0.1, the second those from 0.1 to 0.2, and so on. A $g \times 2$ table is then formed, in which the columns represent the two categories of the dichotomous outcome variable, containing the observed and expected numbers in each cell. The expected numbers for each group are the sum of the estimated probabilities, $P$, and the sum of $1 - P$, over all the individuals in that group. A $\chi^2$ goodness-of-fit statistic is then calculated as in (11.73). Based on simulations, Hosmer and Lemeshow (1980) showed that this test statistic is distributed approximately as $\chi^2$ with $g - 2$ degrees of freedom. This test can be modified when some individuals have the same covariate pattern (Hosmer & Lemeshow, 1989, §5.2.2), provided that the total number of covariate patterns is not too different from the total number of individuals.
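The procedure just described can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: `hosmer_lemeshow` is a hypothetical function name, the fitted probabilities are taken as given and assumed to lie strictly between 0 and 1, ties in $P$ at a group boundary are split arbitrarily by the sort, and empty groups (which can occur with fixed cut points) are dropped, with the degrees of freedom reduced accordingly. The statistic is the Pearson $\chi^2$ summed over all $2g$ cells of the $g \times 2$ table, which is assumed here to correspond to (11.73).

```python
def hosmer_lemeshow(p_hat, y, g=10, equal_size=True):
    """Hosmer-Lemeshow goodness-of-fit statistic (illustrative sketch).

    p_hat      -- fitted probabilities P from the logistic model
    y          -- observed outcomes coded 0/1
    g          -- number of groups
    equal_size -- True: rank by P and form g groups of near-equal size;
                  False: fixed cut points 1/g, 2/g, ... (0.1, 0.2, ... for g = 10)

    Returns (statistic, degrees_of_freedom, table), where each table row is
    (observed events, expected events, observed non-events, expected non-events).
    """
    n = len(p_hat)
    if n != len(y):
        raise ValueError("p_hat and y must have the same length")
    groups = [[] for _ in range(g)]
    if equal_size:
        # Rank individuals by P, then cut the ranking into g near-equal parts.
        order = sorted(range(n), key=lambda i: p_hat[i])
        for rank, i in enumerate(order):
            groups[rank * g // n].append(i)
    else:
        # Group by the value of P itself: [0, 1/g), [1/g, 2/g), ...
        for i in range(n):
            groups[min(int(p_hat[i] * g), g - 1)].append(i)

    table = []
    statistic = 0.0
    for members in groups:
        if not members:
            continue  # empty groups can occur with fixed cut points
        obs1 = sum(y[i] for i in members)
        exp1 = sum(p_hat[i] for i in members)  # sum of P
        obs0 = len(members) - obs1
        exp0 = len(members) - exp1  # equals the sum of 1 - P
        table.append((obs1, exp1, obs0, exp0))
        # Pearson chi-squared contributions from both cells in this row.
        statistic += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    return statistic, len(table) - 2, table
```

A $P$-value can then be taken from the upper tail of the $\chi^2$ distribution with the returned degrees of freedom, for example with `scipy.stats.chi2.sf(statistic, df)` if SciPy is available. Summing over both columns of each row is equivalent to the single-term form $(O - E)^2 / \{E(1 - E/n_k)\}$ for a group of size $n_k$, because the two deviations in a row are equal and opposite.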

Diagnostic methods based on residuals, similar to those used in classical regression (§11.9), can be applied. If the data are already grouped, as in Example 14.1, then standardized residuals can be produced and assessed, each residual being divided by its estimated standard error. In logistic regression the standardized residual is

$$\frac{r - n\hat{\mu}}{\sqrt{n\hat{\mu}(1 - \hat{\mu})}},$$

where there are $r$ events out of $n$. For individual data the residual may be defined using the above expression, with $r$ either 0 or 1, but the individual residuals are of little use since they are not normally distributed and cannot be assessed individually. For example, if $\hat{\mu} = 0.01$, the only possible values of the standardized residual are 9.9 and $-0.1$; the occurrence of the larger residual does not necessarily indicate an outlying point, and if it were accompanied by 99 of the smaller residuals the fit would be perfect. It is, therefore, necessary to group the residuals, defining groups as individuals with similar values of the $x_i$.
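A short Python sketch makes the numerical example concrete and shows one way of pooling residuals. The function names are hypothetical. For a group whose members have different fitted values, the sketch assumes the natural extension $(\sum r_i - \sum \hat{\mu}_i)/\sqrt{\sum \hat{\mu}_i(1 - \hat{\mu}_i)}$, which reduces to the expression above when all the $\hat{\mu}_i$ are equal. It also assumes that "similar values of the $x_i$" can be captured by ranking on a single scalar, such as one covariate or the fitted value.

```python
import math


def standardized_residual(r, n, mu_hat):
    """(r - n*mu_hat) / sqrt(n*mu_hat*(1 - mu_hat)) for r events out of n
    with a common fitted probability mu_hat."""
    return (r - n * mu_hat) / math.sqrt(n * mu_hat * (1 - mu_hat))


def grouped_residuals(x, y, mu_hat, n_groups):
    """Rank individuals by a scalar x, split them into n_groups of
    near-equal size, and return one standardized residual per group.

    Events and expected events are summed within a group; the variance is
    the sum of mu_hat*(1 - mu_hat) over its members (an assumption that
    reduces to standardized_residual when all mu_hat are equal).
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    groups = [[] for _ in range(n_groups)]
    for rank, i in enumerate(order):
        groups[rank * n_groups // n].append(i)
    residuals = []
    for members in groups:
        if not members:
            continue  # only possible if n_groups exceeds the number of individuals
        observed = sum(y[i] for i in members)
        expected = sum(mu_hat[i] for i in members)
        variance = sum(mu_hat[i] * (1 - mu_hat[i]) for i in members)
        residuals.append((observed - expected) / math.sqrt(variance))
    return residuals


# Individual data with mu_hat = 0.01: only two residual values are possible.
large = standardized_residual(1, 1, 0.01)  # sqrt(99) = 9.949...
small = standardized_residual(0, 1, 0.01)  # -1/sqrt(99) = -0.1005...

# One event among 100 individuals, each with mu_hat = 0.01: the pooled
# residual is 0, so the fit is perfect despite one large individual residual.
pooled = standardized_residual(1, 100, 0.01)
```

Algebraically the two individual values are $\sqrt{(1 - \hat{\mu})/\hat{\mu}}$ and $-\sqrt{\hat{\mu}/(1 - \hat{\mu})}$, which is why a single large residual carries little information on its own.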
