13.11.2012 Views

Introduction to Categorical Data Analysis

BUILDING AND APPLYING LOGISTIC REGRESSION MODELS

freedom, called the residual df for the model, subtract the number of parameters in the model from the number of parameters in the saturated model. The number of parameters in the saturated model equals the number of settings of the predictors, which is the number of binomial observations for the data in the grouped form of the contingency table. Large $X^2(M)$ or $G^2(M)$ values provide evidence of lack of fit. The P-value is the right-tail probability.
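
For reference, the quantities named in this paragraph can be written out as follows. This is a sketch in the standard notation for these statistics, not a quotation from the text: "observed" and "fitted" are labels chosen here for the cell counts and the model-fitted counts, and each sum runs over both the success and the failure cell at every predictor setting. The chi-squared reference distribution is a large-sample approximation.

```latex
\begin{align*}
X^2(M) &= \sum \frac{(\text{observed} - \text{fitted})^2}{\text{fitted}}, \\
G^2(M) &= 2 \sum \text{observed}\,\log\!\left(\frac{\text{observed}}{\text{fitted}}\right), \\
\text{residual df} &= (\text{number of binomial observations}) - (\text{number of model parameters}), \\
P &\approx \Pr\!\left(\chi^2_{\text{df}} \ge \text{observed statistic}\right).
\end{align*}
```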

We illustrate by checking the model Section 4.3.2 used for the data on AIDS symptoms (y = 1, yes), AZT use, and race, shown again in Table 5.4. Let x = 1 for those who took AZT immediately and x = 0 otherwise, and let z = 1 for whites and z = 0 for blacks. The ML fit is

$$\operatorname{logit}(\hat{\pi}) = -1.074 - 0.720x + 0.056z$$

The model assumes homogeneous association (Section 2.7.6), with odds ratio between each predictor and the response the same at each category of the other variable. Is this assumption plausible?
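
To make the homogeneous-association assumption concrete, the following minimal sketch evaluates the reported ML fit at each race category and compares the AZT odds ratios. It uses the rounded coefficients printed above; the names `fitted_prob` and `odds` are illustrative helpers introduced here, not anything from the text.

```python
import math

# Coefficients as reported in the text (rounded to three decimals).
ALPHA, BETA_AZT, BETA_RACE = -1.074, -0.720, 0.056


def fitted_prob(x, z):
    """Estimated P(symptoms) for AZT indicator x and race indicator z."""
    logit = ALPHA + BETA_AZT * x + BETA_RACE * z
    return 1.0 / (1.0 + math.exp(-logit))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Conditional odds ratio for AZT use (x = 1 vs x = 0) within each race.
azt_or_white = odds(fitted_prob(1, 1)) / odds(fitted_prob(0, 1))
azt_or_black = odds(fitted_prob(1, 0)) / odds(fitted_prob(0, 0))

print(f"P(symptoms | white, immediate AZT) = {fitted_prob(1, 1):.3f}")
print(f"AZT odds ratio, whites: {azt_or_white:.3f}")
print(f"AZT odds ratio, blacks: {azt_or_black:.3f}")
```

Because the model has no interaction term, both odds ratios reduce to exp(−0.720). The model forces them to be equal by construction, which is why the fit has to be checked against the observed counts.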

Table 5.4. Development of AIDS Symptoms by AZT Use and Race

                         Symptoms
Race     AZT Use      Yes     No     Total
White    Yes           14     93       107
         No            32     81       113
Black    Yes           11     52        63
         No            12     43        55

For this model fit, white veterans with immediate AZT use had estimated probability 0.150 of developing AIDS symptoms during the study. Since 107 white veterans took AZT, the fitted number developing symptoms is 107(0.150) = 16.0, and the fitted number not developing symptoms is 107(0.850) = 91.0. Similarly, one can obtain fitted values for all eight cells in Table 5.4. Substituting these and the cell counts into the goodness-of-fit statistics, we obtain $G^2(M) = 1.38$ and $X^2(M) = 1.39$. The model applies to four binomial observations, one at each of the four combinations of AZT use and race. The model has three parameters, so the residual df = 4 − 3 = 1. The small $G^2$ and $X^2$ values suggest that the model fits decently (P = 0.24).
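
The calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the text's own code. It plugs in the rounded coefficients from the reported fit rather than refitting the model, so the statistics may differ from the published values in the third decimal. It also uses the fact that, for df = 1, the chi-squared right-tail probability equals erfc(sqrt(x/2)). The names `TABLE` and `fit_statistics` are hypothetical.

```python
import math

# Rounded ML estimates reported in the text.
ALPHA, BETA_AZT, BETA_RACE = -1.074, -0.720, 0.056
N_PARAMS = 3

# Table 5.4 as (x, z, symptoms_yes, symptoms_no);
# x = 1 for immediate AZT use, z = 1 for whites.
TABLE = [
    (1, 1, 14, 93),
    (0, 1, 32, 81),
    (1, 0, 11, 52),
    (0, 0, 12, 43),
]


def fit_statistics(table):
    """Return (G2, X2) comparing observed cell counts with fitted counts."""
    g2 = 0.0
    x2 = 0.0
    for x, z, yes, no in table:
        n = yes + no
        logit = ALPHA + BETA_AZT * x + BETA_RACE * z
        p = 1.0 / (1.0 + math.exp(-logit))
        # Each binomial observation contributes a success and a failure cell.
        for observed, fitted in ((yes, n * p), (no, n * (1.0 - p))):
            x2 += (observed - fitted) ** 2 / fitted
            g2 += 2.0 * observed * math.log(observed / fitted)
    return g2, x2


g2, x2 = fit_statistics(TABLE)
residual_df = len(TABLE) - N_PARAMS  # four binomial observations, three parameters

# Right-tail chi-squared probability; this closed form is valid only for df = 1.
p_value = math.erfc(math.sqrt(g2 / 2.0))

print(f"G2 = {g2:.2f}, X2 = {x2:.2f}, df = {residual_df}, P = {p_value:.2f}")
```

For a residual df other than 1, the closed form no longer applies and a general chi-squared survival function would be needed.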

5.2.3 Checking Fit: Grouped Data, Ungrouped Data, and Continuous Predictors

The beginning of Section 4.2 noted that, with categorical predictors, the data file can have the form of ungrouped data or grouped data. The ungrouped data are the
