
Introduction to Categorical Data Analysis


3.2 GENERALIZED LINEAR MODELS FOR BINARY DATA

If we ignored the binary nature of Y and used ordinary regression, the estimates of the parameters would be the least squares estimates. They are the ML estimates under the assumption of a normal response. These estimates exist, because for a normal response an estimated mean of Y can be any real number and is not restricted to the 0–1 range. Of course, an assumption of normality for a binary response is not sensible; when ML fitting with the binomial assumption fails, the least squares method is also likely to give estimated probabilities outside the 0–1 range for some x values.
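To see concretely why least squares can misbehave for a binary response, here is a minimal sketch on a small hypothetical dataset (the data and the helper name `ols_line` are illustrative assumptions, not from the text). Ordinary least squares treats y as an unrestricted real number, so nothing keeps its fitted line inside [0, 1].

```python
# Hypothetical toy data (not from the text): a binary response y observed at six x values.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]


def ols_line(x, y):
    """Ordinary least squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


a, b = ols_line(xs, ys)
fitted = [a + b * xi for xi in xs]
# Least squares treats y as an unrestricted real number, so nothing keeps the
# fitted "probabilities" inside [0, 1].
outside = [xi for xi, f in zip(xs, fitted) if not 0.0 <= f <= 1.0]
print(f"fit: {a:.3f} + {b:.3f}x")
print("fitted:", [round(f, 3) for f in fitted])
print("x values with fitted value outside [0, 1]:", outside)
```

In this toy example the fitted values at the two end points, x = 1 and x = 6, fall below 0 and above 1 respectively.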

3.2.2 Example: Snoring and Heart Disease

Table 3.1 is based on an epidemiological survey of 2484 subjects to investigate snoring as a possible risk factor for heart disease. The subjects were classified according to their snoring level, as reported by their spouses. The linear probability model states that the probability of heart disease π(x) is a linear function of the snoring level x. We treat the rows of the table as independent binomial samples with probabilities π(x). We use scores (0, 2, 4, 5) for x = snoring level, treating the last two snoring categories as closer than the other adjacent pairs.
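As a quick check of this binomial setup, the sketch below takes the counts from Table 3.1, treats each row as one binomial sample of size yes + no, and computes the sample proportions. The list-of-tuples layout and variable names are just illustrative choices.

```python
# Counts from Table 3.1: (snoring level, score x, heart disease yes, heart disease no)
rows = [
    ("Never", 0, 24, 1355),
    ("Occasional", 2, 35, 603),
    ("Nearly every night", 4, 21, 192),
    ("Every night", 5, 30, 224),
]

# Each row is an independent binomial sample of size n = yes + no.
total = sum(yes + no for _, _, yes, no in rows)
proportions = {label: yes / (yes + no) for label, _, yes, no in rows}

print("subjects:", total)
for label, x, yes, no in rows:
    print(f"{label:<20} x={x}  n={yes + no:<5} proportion yes={proportions[label]:.3f}")
```

The four sample sizes add up to the 2484 subjects in the survey, and the rounded proportions match the "Proportion Yes" column of the table.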

Software for GLMs reports the ML model fit, π̂ = 0.0172 + 0.0198x. For example, for nonsnorers (x = 0), the estimated probability of heart disease is π̂ = 0.0172 + 0.0198(0) = 0.0172. The estimated values of E(Y) for a GLM are called fitted values. Table 3.1 shows the sample proportions and the fitted values for the linear probability model. Figure 3.1 graphs the sample proportions and fitted values. The table and graph suggest that the model fits these data well. (Section 5.2.2 discusses goodness-of-fit analyses for binary-response GLMs.)
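One way such an ML fit can be computed is Fisher scoring, also known as iteratively reweighted least squares. The sketch below is a minimal version under that assumption; it is not necessarily what the book's software does, and the function names are hypothetical. With the identity link, the working response is simply the sample proportion, so each step is a weighted least squares fit of the proportions on x, with weights n_i / (π_i(1 − π_i)) evaluated at the current fit. The same weights give the inverse Fisher information, and from it the standard error of β̂.

```python
# Minimal sketch: ML fit of pi(x) = alpha + beta*x to grouped binomial data by
# Fisher scoring (IRLS). Function names are hypothetical, not from the book.

# Grouped data from Table 3.1: snoring scores, heart-disease "yes" counts, row totals.
x = [0.0, 2.0, 4.0, 5.0]
yes = [24, 35, 21, 30]
n = [1379, 638, 213, 254]
p = [yi / ni for yi, ni in zip(yes, n)]  # sample proportions


def weighted_line(x, z, w):
    """Weighted least squares intercept, slope, and weighted Sxx for one predictor."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxz = sum(wi * (xi - x_bar) * (zi - z_bar) for wi, xi, zi in zip(w, x, z))
    slope = sxz / sxx
    return z_bar - slope * x_bar, slope, sxx


def binomial_weights(n, fitted):
    """Inverse variance of each sample proportion: n_i / (pi_i (1 - pi_i))."""
    if not all(0.0 < f < 1.0 for f in fitted):
        raise ValueError("a fitted probability left (0, 1); the ML fit breaks down")
    return [ni / (f * (1.0 - f)) for ni, f in zip(n, fitted)]


def fit_linear_probability(x, p, n, tol=1e-12, max_iter=100):
    # With the identity link the working response equals the sample proportion,
    # so each Fisher-scoring step is a weighted least squares fit of p on x.
    fitted = list(p)  # start from the sample proportions
    alpha, beta = 0.0, 0.0
    for _ in range(max_iter):
        new_alpha, new_beta, _ = weighted_line(x, p, binomial_weights(n, fitted))
        done = abs(new_alpha - alpha) < tol and abs(new_beta - beta) < tol
        alpha, beta = new_alpha, new_beta
        fitted = [alpha + beta * xi for xi in x]
        if done:
            break
    # Inverse Fisher information at the final fit: Var(beta_hat) = 1 / weighted Sxx.
    _, _, sxx = weighted_line(x, p, binomial_weights(n, fitted))
    return alpha, beta, (1.0 / sxx) ** 0.5, fitted


alpha, beta, se_beta, fitted = fit_linear_probability(x, p, n)
print(f"pi_hat = {alpha:.4f} + {beta:.4f} x   (SE of beta_hat = {se_beta:.4f})")
print("fitted values:", [round(f, 3) for f in fitted])
```

The check inside `binomial_weights` reflects the point made earlier: if an iteration drives a fitted probability outside (0, 1), the binomial variance is no longer defined and this fitting approach breaks down.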

The model interpretation is simple. The estimated probability of heart disease is about 0.02 (namely, 0.0172) for nonsnorers; it increases 2(0.0198) = 0.04 for occasional snorers, another 0.04 for those who snore nearly every night, and another 0.02 for those who always snore. This rather surprising effect is significant, as the standard error of β̂ = 0.0198 equals 0.0028.
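The interpretation above is arithmetic on the reported estimates. The short sketch below takes α̂ = 0.0172, β̂ = 0.0198, and SE = 0.0028 from the text. Each increment between adjacent levels is β̂ times the gap in scores. The Wald ratio z = β̂/SE and its two-sided p-value are our own addition, assuming the usual large-sample normal approximation; they show just how clearly the effect is significant.

```python
import math

# Reported ML estimates and standard error from the text; scores as in Table 3.1.
alpha, beta, se_beta = 0.0172, 0.0198, 0.0028
scores = {"Never": 0, "Occasional": 2, "Nearly every night": 4, "Every night": 5}

# Fitted probability at each snoring level.
fitted = {level: alpha + beta * x for level, x in scores.items()}

# Change in estimated probability between adjacent levels is beta times the score gap.
levels = list(scores)
increments = [
    (lower, upper, beta * (scores[upper] - scores[lower]))
    for lower, upper in zip(levels, levels[1:])
]

# Wald statistic for H0: beta = 0 and its two-sided p-value from the standard normal;
# erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)).
z = beta / se_beta
p_value = math.erfc(abs(z) / math.sqrt(2))

for level, prob in fitted.items():
    print(f"{level:<20} {prob:.4f}")
for lower, upper, d in increments:
    print(f"{lower} -> {upper}: +{d:.4f}")
print(f"z = {z:.2f}, two-sided p = {p_value:.1e}")
```

The estimate lies about seven standard errors above zero, which is why the text calls the effect significant.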

Suppose we had chosen snoring-level scores with relative spacings different from those of the scores {0, 2, 4, 5}, such as {0, 2, 4, 4.5} or {0, 1, 2, 3}. Then the fitted values would change somewhat. They would not change if the relative spacings between

Table 3.1. Relationship Between Snoring and Heart Disease

| Snoring | Heart Disease: Yes | Heart Disease: No | Proportion Yes | Linear Fit | Logit Fit | Probit Fit |
|---|---|---|---|---|---|---|
| Never | 24 | 1355 | 0.017 | 0.017 | 0.021 | 0.020 |
| Occasional | 35 | 603 | 0.055 | 0.057 | 0.044 | 0.046 |
| Nearly every night | 21 | 192 | 0.099 | 0.096 | 0.093 | 0.095 |
| Every night | 30 | 224 | 0.118 | 0.116 | 0.132 | 0.131 |

Note: Model fits refer to proportion of "yes" responses.

Source: P. G. Norton and E. V. Dunn, Br. Med. J., 291: 630–632, 1985, published by BMJ Publishing Group.
