13.11.2012 Views

Introduction to Categorical Data Analysis


120 LOGISTIC REGRESSION

1.2, based on df = 1. The evidence of interaction is not strong (P = 0.28). Although the sample slopes for the width effect are quite different for the two colors, the sample had only 24 crabs of dark color, so effects involving dark color have relatively large standard errors.

Fitting the interaction model is equivalent to fitting the logistic regression model with width as the predictor separately for the crabs of each color. The reduced model, without the interaction term, has the advantage of simpler interpretations.
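
To see why the equivalence holds, write the interaction model with a color indicator $c$ and width $x$. The coefficient labels $\alpha, \beta_1, \beta_2, \beta_3$ below are our own notation for this sketch, not symbols taken from the text. Evaluating the model at each value of $c$ gives:

```latex
\begin{aligned}
\text{logit}(\pi) &= \alpha + \beta_1 x + \beta_2 c + \beta_3 (c \times x),\\
c = 0:\quad \text{logit}(\pi) &= \alpha + \beta_1 x,\\
c = 1:\quad \text{logit}(\pi) &= (\alpha + \beta_2) + (\beta_1 + \beta_3)\, x.
\end{aligned}
```

Each color thus gets its own intercept and its own width slope, which is exactly what a separate width-only fit for each color provides.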

4.5 SUMMARIZING EFFECTS IN LOGISTIC REGRESSION

We have interpreted effects in logistic regression using multiplicative effects on the odds, which correspond to odds ratios. However, many find odds ratios difficult to understand.

4.5.1 Probability-Based Interpretations

For a relatively small change in a quantitative predictor, Section 4.1.1 used a straight line to approximate the change in the probability. This simpler interpretation also applies with multiple predictors.

Consider a setting of predictors at which $\hat{P}(Y = 1) = \hat{\pi}$. Then, controlling for the other predictors, a 1-unit increase in $x_j$ corresponds approximately to a $\hat{\beta}_j \hat{\pi}(1 - \hat{\pi})$ change in $\hat{\pi}$. For example, for the horseshoe crab data with predictors $x$ = width and an indicator $c$ that is 0 for dark crabs and 1 otherwise, $\text{logit}(\hat{\pi}) = -12.98 + 1.300c + 0.478x$. When $\hat{\pi} = 0.50$, the approximate effect on $\hat{\pi}$ of a 1 cm increase in $x$ is $(0.478)(0.50)(0.50) = 0.12$. This is considerable, since a 1 cm change in width is less than half its standard deviation (which is 2.1 cm).
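
A minimal Python sketch of the straight-line approximation, checked against the exact change computed from the probability formula. It uses only the width coefficient 0.478 and the starting value $\hat{\pi} = 0.50$ from the example; the function names are hypothetical helpers, not part of any library.

```python
import math

def inv_logit(eta: float) -> float:
    """Map a linear predictor (logit) back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def logit(p: float) -> float:
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def linear_approx_change(beta_j: float, pi_hat: float) -> float:
    """Straight-line approximation: beta_j * pi_hat * (1 - pi_hat)."""
    return beta_j * pi_hat * (1.0 - pi_hat)

def exact_change(beta_j: float, pi_hat: float, delta: float = 1.0) -> float:
    """Exact change in pi_hat when x_j rises by delta, other predictors fixed."""
    return inv_logit(logit(pi_hat) + beta_j * delta) - pi_hat

beta_width = 0.478  # width coefficient in the fitted crab equation
approx = linear_approx_change(beta_width, 0.50)
exact = exact_change(beta_width, 0.50)
print(f"approximate: {approx:.4f}, exact: {exact:.4f}")
```

Here the approximation gives 0.1195 and the exact change is about 0.117, so for a 1 cm step starting at $\hat{\pi} = 0.50$ the straight line is close. Larger steps, or starting points nearer 0 or 1, widen the gap.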

This straight-line approximation deteriorates as the change in the predictor values increases. More precise interpretations use the probability formula directly. One way to describe the effect of a predictor $x_j$ is to set the other predictors at their sample means and find $\hat{\pi}$ at the smallest and largest $x_j$ values. The effect is summarized by reporting those $\hat{\pi}$ values or their difference. However, such summaries are sensitive to outliers on $x_j$. A more robust summary uses the lower and upper quartiles of the $x_j$ values instead.
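
The procedure just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the book's code: `summarize_effect` is a hypothetical helper, the inputs in the usage lines are made-up toy values (not the crab data), and `statistics.quantiles` is used with its default "exclusive" method, which may not match the quartile convention behind the book's numbers.

```python
import math
import statistics

def inv_logit(eta):
    """Map a linear predictor (logit) to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def summarize_effect(intercept, betas, means, j, xj_values):
    """Summarize the effect of predictor j with the others fixed at their means.

    Returns ((pi at min, pi at max), (pi at LQ, pi at UQ)) for x_j.
    means[j] is ignored because x_j is overwritten at each evaluation point.
    """
    def pi_at(value):
        xs = list(means)
        xs[j] = value
        eta = intercept + sum(b * x for b, x in zip(betas, xs))
        return inv_logit(eta)

    # Quartiles via the stdlib default ("exclusive") method; conventions vary.
    lq, _, uq = statistics.quantiles(xj_values, n=4)
    extremes = (pi_at(min(xj_values)), pi_at(max(xj_values)))
    quartiles = (pi_at(lq), pi_at(uq))
    return extremes, quartiles

# Hypothetical toy input: predictor of interest with one outlier (100),
# plus a second predictor held at a hypothetical mean of 2.0.
xj_demo = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
(pi_min, pi_max), (pi_lq, pi_uq) = summarize_effect(
    intercept=-5.0, betas=[1.0, 0.5], means=[0.0, 2.0], j=0, xj_values=xj_demo
)
print(f"min/max: {pi_min:.3f} -> {pi_max:.3f}; LQ/UQ: {pi_lq:.3f} -> {pi_uq:.3f}")
```

The single outlier pushes the max-based probability essentially to 1, while the quartile-based pair depends only on the middle half of the values.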

For the prediction equation $\text{logit}(\hat{\pi}) = -12.98 + 1.300c + 0.478x$, the sample means are 26.3 cm for $x$ = width and 0.873 for $c$ = color. The lower and upper quartiles of $x$ are LQ = 24.9 cm and UQ = 27.7 cm. At $x = 24.9$ and $c = \bar{c}$, $\hat{\pi} = 0.51$. At $x = 27.7$ and $c = \bar{c}$, $\hat{\pi} = 0.80$. The change in $\hat{\pi}$ from 0.51 to 0.80 over the middle 50% of the width values reflects a strong width effect. Since $c$ takes only the values 0 and 1, one could instead report this effect separately for each value of $c$ rather than just at its mean.
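
A short Python check of these numbers, using only the prediction equation, the quartiles, and $\bar{c}$ quoted above. `crab_pi_hat` is a hypothetical helper name. The loop also evaluates the width effect at $c = 0$ and $c = 1$, which is the separate-by-color report mentioned in the last sentence.

```python
import math

def inv_logit(eta: float) -> float:
    """Map a linear predictor (logit) to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def crab_pi_hat(width_cm: float, c: float) -> float:
    """Fitted P(Y = 1) from logit(pi_hat) = -12.98 + 1.300c + 0.478x."""
    return inv_logit(-12.98 + 1.300 * c + 0.478 * width_cm)

LQ, UQ = 24.9, 27.7  # lower and upper quartiles of width (cm), from the text
C_BAR = 0.873        # sample mean of the color indicator c, from the text

for label, c in [("c = c-bar", C_BAR), ("c = 0 (dark)", 0.0), ("c = 1 (other)", 1.0)]:
    lo, hi = crab_pi_hat(LQ, c), crab_pi_hat(UQ, c)
    print(f"{label}: pi_hat(LQ) = {lo:.2f}, pi_hat(UQ) = {hi:.2f}, change = {hi - lo:.2f}")
```

The first line reproduces the 0.51 and 0.80 reported in the text. The other two lines show the same quartile comparison within each color.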

To summarize the effect of an indicator explanatory variable, it makes sense to report the estimated probabilities at its two values rather than at quartiles, which could be identical. For example, consider the color effect in the prediction equation
