13.11.2012 Views

Introduction to Categorical Data Analysis


3.4 STATISTICAL INFERENCE AND MODEL CHECKING

residual is a standardized difference

$$\text{Pearson residual} = e_i = \frac{y_i - \hat{\mu}_i}{\sqrt{\widehat{\mathrm{Var}}(y_i)}} \tag{3.9}$$

For example, for Poisson GLMs the Pearson residual for count $i$ equals

$$e_i = \frac{y_i - \hat{\mu}_i}{\sqrt{\hat{\mu}_i}} \tag{3.10}$$

It divides by the estimated Poisson standard deviation. The reason for calling $e_i$ a Pearson residual is that $\sum_i e_i^2 = \sum_i (y_i - \hat{\mu}_i)^2 / \hat{\mu}_i$. When the GLM is the model corresponding to independence for cells in a two-way contingency table, this is the Pearson chi-squared statistic $X^2$ for testing independence [equation (2.8)]. Therefore, $X^2$ decomposes into terms describing the lack of fit for separate observations. Components of the deviance, called deviance residuals, are alternative measures of lack of fit.
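The decomposition of the Pearson statistic can be checked numerically. The sketch below is only an illustration: the 2×3 table of counts is hypothetical (it does not come from the text), and the fitted values use the usual independence estimates, row total times column total divided by the grand total.

```python
import numpy as np

# Hypothetical 2x3 table of counts (illustrative only, not from the text).
counts = np.array([[30.0, 20.0, 10.0],
                   [20.0, 30.0, 40.0]])

n = counts.sum()
row_totals = counts.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = counts.sum(axis=0, keepdims=True)   # shape (1, 3)

# Fitted values under independence: (row total * column total) / n.
mu_hat = row_totals @ col_totals / n

# Pearson residuals for Poisson counts: (y - mu_hat) / sqrt(mu_hat).
pearson_resid = (counts - mu_hat) / np.sqrt(mu_hat)

# Pearson chi-squared statistic computed directly from its definition.
x2 = ((counts - mu_hat) ** 2 / mu_hat).sum()

print(np.round(pearson_resid, 3))
print("sum of squared residuals:", (pearson_resid ** 2).sum())
print("Pearson X^2:             ", x2)
```

Each squared residual is one cell's contribution to the statistic, so large residuals point to the cells where the independence model fits poorly.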

Pearson residuals fluctuate around zero, following approximately a normal distribution when $\mu_i$ is large. When the model holds, however, these residuals are less variable than standard normal, because the numerator must use the fitted value $\hat{\mu}_i$ rather than the true mean $\mu_i$. Since the sample data determine the fitted value, $(y_i - \hat{\mu}_i)$ tends to be smaller in absolute value than $y_i - \mu_i$.
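A small simulation can make this reduced variability visible. The sketch below assumes the simplest possible Poisson model, a common mean for all observations, so that the fitted value is just the sample mean; the sample size, true mean, and number of replications are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative settings (assumptions, not from the text).
n_obs, n_reps, true_mu = 5, 20000, 10.0

# Each row is one sample of n_obs independent Poisson counts.
y = rng.poisson(true_mu, size=(n_reps, n_obs))

# Fitted value for a common-mean Poisson model is the sample mean of each row.
mu_hat = y.mean(axis=1, keepdims=True)

# Pearson residuals using the fitted mean, and the analogous
# quantity using the true mean (unavailable in practice).
resid_fitted = (y - mu_hat) / np.sqrt(mu_hat)
resid_true = (y - true_mu) / np.sqrt(true_mu)

mean_sq_fitted = (resid_fitted ** 2).mean()
mean_sq_true = (resid_true ** 2).mean()

print("mean squared residual, fitted mean:", round(mean_sq_fitted, 3))
print("mean squared residual, true mean:  ", round(mean_sq_true, 3))
```

With five observations per sample, the residuals based on the fitted mean should have mean square near (5 − 1)/5 = 0.8, while those based on the true mean should be near 1.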

The standardized residual takes $(y_i - \hat{\mu}_i)$ and divides it by its estimated standard error, that is,

$$\text{Standardized residual} = \frac{y_i - \hat{\mu}_i}{\mathrm{SE}}$$

It³ does have an approximate standard normal distribution when $\mu_i$ is large. With standardized residuals it is easier to tell when a deviation $(y_i - \hat{\mu}_i)$ is “large.” Standardized residuals larger than about 2 or 3 in absolute value are worthy of attention, although some values of this size occur by chance alone when the number of observations is large. Section 2.4.5 introduced standardized residuals that follow up tests of independence in two-way contingency tables. We will use standardized residuals with logistic regression in Chapter 5.
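As a rough sketch of how the standard error in the footnote can be computed, the code below fits the independence model to a hypothetical 2×3 table and obtains each leverage as a diagonal element of the usual GLM hat matrix, W^{1/2} X (X'WX)^{-1} X' W^{1/2} with W = diag(fitted means). That hat-matrix formula is an assumption here, since the text defers leverage to Section 5.2.6; the counts and the dummy-variable coding of the design matrix are also illustrative choices.

```python
import numpy as np

# Hypothetical 2x3 table of counts (illustrative only, not from the text).
counts = np.array([[30.0, 20.0, 10.0],
                   [20.0, 30.0, 40.0]])
n_rows, n_cols = counts.shape
n = counts.sum()
row_p = counts.sum(axis=1) / n          # sample row proportions
col_p = counts.sum(axis=0) / n          # sample column proportions
mu_hat = n * np.outer(row_p, col_p)     # fitted values under independence

# Design matrix for the loglinear independence model: intercept,
# dummies for rows 2..I, dummies for columns 2..J (first levels as baseline).
rows, cols = np.indices(counts.shape)
rows, cols = rows.ravel(), cols.ravel()
X = np.column_stack(
    [np.ones(counts.size)]
    + [(rows == i).astype(float) for i in range(1, n_rows)]
    + [(cols == j).astype(float) for j in range(1, n_cols)]
)

# Assumed GLM hat matrix for a Poisson model with log link.
mu = mu_hat.ravel()
Xw = np.sqrt(mu)[:, None] * X
hat = Xw @ np.linalg.solve(Xw.T @ Xw, Xw.T)
leverage = np.diag(hat)

# SE = [Var(y_i) * (1 - h_i)]^(1/2), with Var(y_i) estimated by mu_hat for Poisson.
se = np.sqrt(mu * (1.0 - leverage))
std_resid = ((counts.ravel() - mu) / se).reshape(counts.shape)
pearson_resid = (counts - mu_hat) / np.sqrt(mu_hat)

print(np.round(leverage.reshape(counts.shape), 3))
print(np.round(std_resid, 3))
```

For this model the factor 1 − h works out to (1 − row proportion)(1 − column proportion) in each cell, and every standardized residual is at least as large in absolute value as the corresponding Pearson residual.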

Other diagnostic tools from regression modeling are also helpful in assessing fits of GLMs. For instance, to assess the influence of an observation on the overall fit, one can refit the model with that observation deleted (Section 5.2.6).
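The delete-and-refit idea can be sketched for a Poisson loglinear regression with one predictor. Everything specific below is an assumption for illustration: the data are hypothetical, the helper `fit_poisson_loglinear` is a hand-written Newton-Raphson fit rather than a routine from any package, and the change in the slope estimate is just one possible summary of influence.

```python
import numpy as np

def fit_poisson_loglinear(X, y, n_iter=100, tol=1e-10):
    """Fit a Poisson GLM with log link by Newton-Raphson; X[:, 0] must be all ones."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        # Newton step: solve (X' W X) step = X'(y - mu), with W = diag(mu).
        step = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical data (illustrative only): the last count is unusually large.
x = np.arange(8, dtype=float)
y = np.array([2.0, 3.0, 3.0, 4.0, 5.0, 6.0, 8.0, 30.0])
X = np.column_stack([np.ones_like(x), x])

beta_full = fit_poisson_loglinear(X, y)

# Refit with each observation deleted in turn and record the change in slope.
slope_change = np.empty(len(y))
for i in range(len(y)):
    keep = np.arange(len(y)) != i
    beta_del = fit_poisson_loglinear(X[keep], y[keep])
    slope_change[i] = beta_del[1] - beta_full[1]

most_influential = int(np.argmax(np.abs(slope_change)))
print("slope with all observations:", round(beta_full[1], 3))
print("change in slope when each observation is deleted:", np.round(slope_change, 3))
print("most influential observation (0-based index):", most_influential)
```

An observation whose deletion moves the estimates substantially deserves a closer look, both for possible data errors and for whether the conclusions depend on it.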

³ $\mathrm{SE} = [\widehat{\mathrm{Var}}(y_i)(1 - h_i)]^{1/2}$, where $h_i$ is the leverage of observation $i$ (Section 5.2.6). The greater the value of $h_i$, the more potential that observation has for influencing the model fit.
