13.11.2012 Views

Introduction to Categorical Data Analysis



LOGISTIC REGRESSION

4.4.2 Model Comparison to Check Whether a Term is Needed

Are certain terms needed in a model? To test this, we can compare the maximized log-likelihood values for that model and the simpler model without those terms. To test whether color contributes to model (4.11), we test H0: β1 = β2 = β3 = 0. This hypothesis states that, controlling for width, the probability of a satellite is independent of color. The likelihood-ratio test compares the maximized log-likelihood L1 for the full model (4.11) to the maximized log-likelihood L0 for the simpler model in which those parameters equal 0. Table 4.6 shows that the test statistic is −2(L0 − L1) = 7.0. Under H0, this test statistic has an approximate chi-squared distribution with df = 3, the difference between the numbers of parameters in the two models. The P-value of 0.07 provides slight evidence of a color effect. Since the analysis in the previous subsection noted that estimated probabilities are quite different for dark-colored crabs, it seems safest to leave the color predictor in the model.
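The P-value quoted above can be checked from the reported statistic alone. The following is a minimal sketch, assuming only the values 7.0 and df = 3 from the text; the helper `chi2_sf` is a hypothetical name introduced here, and it uses the standard closed-form upper-tail expressions for a chi-squared distribution with integer degrees of freedom rather than any statistics library.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2k: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
        return math.exp(-half) * sum(
            half**i / math.factorial(i) for i in range(df // 2)
        )
    # Odd df = 2k + 1:
    # erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{k} (x/2)^(i - 1/2) / Gamma(i + 1/2)
    k = df // 2
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, k + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


lr_statistic = 7.0  # -2(L0 - L1), the value the text reports from Table 4.6
df = 3              # number of color parameters set to 0 under H0
p_value = chi2_sf(lr_statistic, df)
print(f"P-value = {p_value:.2f}")
```

The same function applies to any nested-model comparison once the two maximized log-likelihoods, and the difference in parameter counts, are known.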

4.4.3 Quantitative Treatment of Ordinal Predictor

Color has a natural ordering of categories, from lightest to darkest. Model (4.11) ignores this ordering, treating color as nominal scale. A simpler model treats color in a quantitative manner. It supposes a linear effect, on the logit scale, for a set of scores assigned to its categories.

To illustrate, we use scores c = {1, 2, 3, 4} for the color categories and fit the model

logit[P(Y = 1)] = α + β1c + β2x    (4.12)

The prediction equation is

logit[P̂(Y = 1)] = −10.071 − 0.509c + 0.458x
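To see what the prediction equation implies on the probability scale, it can be inverted with the logistic function. This is a minimal sketch: the coefficients are copied from the equation above, while the function name `estimated_probability` and the width value 26.0 are hypothetical choices made only for illustration.

```python
import math

# Coefficients copied from the prediction equation for model (4.12).
ALPHA, BETA_COLOR, BETA_WIDTH = -10.071, -0.509, 0.458


def estimated_probability(color_score: int, width: float) -> float:
    """Estimated P(Y = 1), obtained by inverting logit = alpha + beta1*c + beta2*x."""
    logit = ALPHA + BETA_COLOR * color_score + BETA_WIDTH * width
    return 1.0 / (1.0 + math.exp(-logit))


example_width = 26.0  # hypothetical width, not a value taken from the text
for c in (1, 2, 3, 4):
    p = estimated_probability(c, example_width)
    print(f"color score {c}: estimated probability = {p:.3f}")
```

At a fixed width, the estimated probability falls as the color score increases, because the color coefficient is negative.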

The color and width estimates have SE values of 0.224 and 0.104, showing strong evidence of an effect for each. At a given width, for every one-category increase in color darkness, the estimated odds of a satellite multiply by exp(−0.509) = 0.60. For example, the estimated odds of a satellite for dark-colored crabs are 60% of those for medium-dark crabs.
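The arithmetic behind these statements can be reproduced from the reported estimates and standard errors. The sketch below computes the Wald z statistic (estimate divided by SE) and the multiplicative effect on the odds for each predictor. The approximate 95% Wald interval is an addition for illustration and is not reported in the text; the dictionary names are hypothetical.

```python
import math

# (estimate, standard error) pairs as reported for model (4.12).
reported = {"color": (-0.509, 0.224), "width": (0.458, 0.104)}

summary = {}
for name, (beta, se) in reported.items():
    summary[name] = {
        "wald_z": beta / se,                # Wald z statistic
        "odds_multiplier": math.exp(beta),  # effect on the odds per one-unit increase
        # Approximate 95% Wald interval for the odds multiplier (not from the text).
        "ci_95": (math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)),
    }

for name, row in summary.items():
    lo, hi = row["ci_95"]
    print(
        f"{name}: z = {row['wald_z']:.2f}, "
        f"odds multiplier = {row['odds_multiplier']:.2f}, "
        f"approx. 95% interval = ({lo:.2f}, {hi:.2f})"
    )
```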

A likelihood-ratio test compares the fit of this model to the more complex model (4.11) that has a separate parameter for each color. The test statistic equals −2(L0 − L1) = 1.7, based on df = 2. This statistic tests that the simpler model (4.12) holds, given that model (4.11) is adequate. It tests that the color parameters in equation (4.11), when plotted against the color scores, follow a linear trend. The simplification seems permissible (P = 0.44).
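For df = 2 the chi-squared upper-tail probability has a simple closed form, so this P-value can be checked by hand. Using the rounded statistic 1.7 gives roughly 0.43; the small gap from the reported 0.44 is presumably due to the statistic having been rounded before it was quoted, which is an assumption and not something the text states.

```latex
P(\chi^2_2 > x) = e^{-x/2},
\qquad
P(\chi^2_2 > 1.7) = e^{-0.85} \approx 0.43
```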

The estimates of the color parameters in the model (4.11) that treats color as nominal scale are (1.33, 1.40, 1.11, 0). The 0 value for the dark category reflects the lack of an indicator variable for that category. Though these values do not depart significantly from a linear trend, the first three are similar compared to the last one.
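One rough way to see this pattern is to set the nominal-scale estimates beside the color effects implied by the linear-score model (4.12), both expressed relative to the dark category. This is only an informal sketch: the two models have different intercepts and width coefficients, so the comparison is indicative and is not a formal test. The variable names are hypothetical.

```python
scores = [1, 2, 3, 4]
nominal_estimates = [1.33, 1.40, 1.11, 0.0]  # model (4.11), dark category as baseline
beta_color = -0.509                          # color coefficient from model (4.12)

# Under (4.12), the effect of score c relative to the dark category (c = 4)
# is beta_color * (c - 4), written here as (-beta_color) * (4 - c).
linear_implied = [(-beta_color) * (scores[-1] - c) for c in scores]

for c, nominal, linear in zip(scores, nominal_estimates, linear_implied):
    print(
        f"score {c}: nominal = {nominal:.2f}, "
        f"linear-implied = {linear:.2f}, difference = {nominal - linear:+.2f}"
    )
```

The linear-implied values fall in equal steps of 0.509, whereas the nominal estimates for the first three categories are close to one another and drop only at the dark category.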
