13.11.2012 Views

Introduction to Categorical Data Analysis

Introduction to Categorical Data Analysis

Introduction to Categorical Data Analysis

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

166 BUILDING AND APPLYING LOGISTIC REGRESSION MODELS<br />

a. When the data for the 27 subjects are 14 binomial observations (for the 14<br />

distinct levels of LI), the deviance for this model is 15.7 with df = 12. Is it<br />

appropriate <strong>to</strong> use this <strong>to</strong> check the fit of the model? Why or why not?<br />

b. The model that also has a quadratic term for LI has deviance = 11.8.<br />

Conduct a test comparing the two models.<br />

c. The model in (b) has fit, logit( ˆπ) =−13.096 + 0.9625(LI ) − 0.0160(LI) 2 ,<br />

with SE = 0.0095 for ˆβ2 =−0.0160. If you know basic calculus, explain<br />

why ˆπ is increasing for LI between 0 and 30. Since LI varies between 8<br />

and 38 in this sample, the estimated effect of LI is positive over most of its<br />

observed values.<br />

d. For the model with only the linear term, the Hosmer–Lemeshow test<br />

statistic = 6.6 with df = 6. Interpret.<br />

5.10 For the horseshoe crab data, fit the logistic regression model with x = weight<br />

as the sole predic<strong>to</strong>r of the presence of satellites.<br />

a. For a classification table using the sample proportion of 0.642 as the cu<strong>to</strong>ff,<br />

report the sensitivity and specificity. Interpret.<br />

b. Form a ROC curve, and report and interpret the area under it.<br />

c. Investigate the model goodness-of-fit using the Hosmer–Lemeshow statistic<br />

or some other model-checking approach. Interpret.<br />

d. Inferentially compare the model <strong>to</strong> the model with x and x 2 as predic<strong>to</strong>rs.<br />

Interpret.<br />

e. Compare the models in (d) using the AIC. Interpret.<br />

5.11 Here is an alternative <strong>to</strong> the Hosmer–Lemeshow goodness-of-fit test when at<br />

least one predic<strong>to</strong>r is continuous: Partition values for the explana<strong>to</strong>ry variables<br />

in<strong>to</strong> a set of regions. Add these regions as a predic<strong>to</strong>r in the model by setting<br />

up dummy variables for the regions. The test statistic compares the fit of<br />

this model <strong>to</strong> the simpler one, testing that the extra parameters are not needed.<br />

Doing this for model (4.11) by partitioning according <strong>to</strong> the eight width regions<br />

in Table 4.11, the likelihood-ratio statistic for testing that the extra parameters<br />

are unneeded equals 7.5, based on df = 7. Interpret.<br />

5.12 Refer <strong>to</strong> Table 7.27 in Chapter 7 with opinion about premarital sex as the<br />

response variable. Use a process (such as backward elimination) or criterion<br />

(such as AIC) <strong>to</strong> select a model. Interpret the parameter estimates for that<br />

model.<br />

5.13 Logistic regression is often applied <strong>to</strong> large financial databases. For example,<br />

credit scoring is a method of modeling the influence of predic<strong>to</strong>rs on the<br />

probability that a consumer is credit worthy. The data archive found under the<br />

index at www.stat.uni-muenchen.de for a textbook by L. Fahrmeir and G. Tutz<br />

(Multivariate Statistical Modelling Based on Generalized Linear Models,

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!