
Introduction to Categorical Data Analysis


LOGISTIC REGRESSION

of observations at each point. It appears that y = 1 occurs relatively more often at higher x values. Since y takes only the values 0 and 1, however, it is difficult to determine whether a logistic regression model is reasonable by plotting y against x.

Better information results from grouping the width values into categories and calculating the sample proportion of crabs having satellites in each category. These proportions reveal whether the true proportions approximately follow the trend that the logistic regression model requires. Consider the grouping shown in Table 4.1. In each of the eight width categories, we computed the sample proportion of crabs having satellites and the mean width of the crabs in that category. Figure 4.2 plots these eight sample proportions of female crabs having satellites against the mean widths of the eight categories.
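A minimal Python sketch of the grouping step just described, assuming the widths and the 0/1 satellite indicators are stored in two parallel lists. The function name `group_proportions`, the convention that a category is the interval lo < width <= hi, and the cut points and data in the usage example are all hypothetical illustrations; they are not the categories of Table 4.1, and the toy values are not the horseshoe crab sample.

```python
from statistics import mean


def group_proportions(widths, ys, cutpoints):
    """Summarize binary responses within width categories.

    Category boundaries come from the (hypothetical) sorted list `cutpoints`;
    a width w falls in the category lo < w <= hi, and the first and last
    categories are open-ended. Returns, for each non-empty category in
    increasing width order, a tuple
    (mean width, number of cases, number with y = 1, sample proportion).
    """
    bounds = [float("-inf")] + list(cutpoints) + [float("inf")]
    summary = []
    for lo, hi in zip(bounds, bounds[1:]):
        members = [(w, y) for w, y in zip(widths, ys) if lo < w <= hi]
        if not members:
            continue  # an empty category has no sample proportion
        n = len(members)
        successes = sum(y for _, y in members)
        summary.append((mean(w for w, _ in members), n, successes, successes / n))
    return summary


if __name__ == "__main__":
    # Toy data for illustration only (not the horseshoe crab sample).
    widths = [22.0, 23.0, 24.5, 25.0, 26.0, 27.5, 28.0, 30.0]
    ys = [0, 1, 0, 1, 1, 1, 0, 1]
    for row in group_proportions(widths, ys, [23.25, 25.25, 27.25]):
        print(row)
```

Plotting the first element of each tuple (mean width) against the last (sample proportion) gives the kind of grouped display the text describes.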

Section 3.3.2, which introduced the horseshoe crab data, mentioned that software can smooth the data without grouping observations. Figure 4.2 also shows a curve obtained by smoothing the data with generalized additive models, which allow the effect of x to be much more complex than linear. Both the eight plotted sample proportions and this smoothing curve show a roughly increasing trend, so we proceed with fitting models that imply such trends.
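The smoothing curve in the text comes from a generalized additive model. As a much simpler stand-in (a swapped-in technique, not the GAM fit behind Figure 4.2), the sketch below uses a Nadaraya–Watson kernel smoother: the estimate of P(y = 1) at a point x0 is a Gaussian-weighted average of the 0/1 responses, with nearby observations weighted most heavily, so no grouping is needed. The function name `kernel_smooth`, the `bandwidth` value, and the toy data are hypothetical choices for illustration.

```python
import math


def kernel_smooth(xs, ys, grid, bandwidth):
    """Nadaraya-Watson estimate of P(y = 1 | x) at each point of `grid`.

    Each estimate is a Gaussian-kernel weighted average of the 0/1
    responses, so it always lies in [0, 1]. This is a simple stand-in
    for a GAM smoother, not a GAM fit.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    fitted = []
    for x0 in grid:
        # Observations close to x0 get weights near 1; distant ones near 0.
        weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
        total = sum(weights)
        if total == 0.0:
            raise ValueError(f"no observations near {x0}; increase bandwidth")
        fitted.append(sum(w * y for w, y in zip(weights, ys)) / total)
    return fitted


if __name__ == "__main__":
    # Toy data for illustration only (not the horseshoe crab sample).
    xs = [20, 22, 24, 26, 28, 30]
    ys = [0, 0, 1, 0, 1, 1]
    print(kernel_smooth(xs, ys, [21, 25, 29], bandwidth=1.0))
```

The bandwidth acts as the smoothing parameter: a larger value gives a flatter curve, while a smaller value tracks individual observations more closely.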

4.1.3 Horseshoe Crabs: Interpreting the Logistic Regression Fit

For the ungrouped data in Table 3.2, let $\pi(x)$ denote the probability that a female horseshoe crab of width $x$ has a satellite. The simplest model to interpret is the linear probability model, $\pi(x) = \alpha + \beta x$. During the ML fitting process, some predicted values for this GLM fall outside the legitimate 0–1 range for a binomial parameter, so ML fitting fails. Ordinary least squares fitting (which is what GLM software reports when you assume a normal response and use the identity link function) yields $\hat{\pi}(x) = -1.766 + 0.092x$. The estimated probability of a satellite increases by 0.092

Table 4.1. Relation Between Width of Female Crab and Existence of Satellites, and Predicted Values for Logistic Regression Model

| Width | Number of Cases | Number Having Satellites | Sample Proportion | Estimated Probability | Predicted Number of Crabs with Satellites |
|---|---|---|---|---|---|
| 29.25 | 14 | 14 | 1.00 | 0.93 | 13.1 |

Note: The estimated probability is the predicted number (in the final column) divided by the number of cases.
