13.11.2012 Views

Introduction to Categorical Data Analysis


CONTINGENCY TABLES

Table 2.5. Cross Classification of Party Identification by Gender

                     Party Identification
Gender     Democrat   Independent   Republican   Total
Females    762        327           468          1557
           (703.7)    (319.6)       (533.7)
Males      484        239           477          1200
           (542.3)    (246.4)       (411.3)
Total      1246       566           945          2757

Note: Estimated expected frequencies for hypothesis of independence in parentheses. Data from 2000 General Social Survey.

would be rather unusual if the variables were truly independent. Both test statistics suggest that political party identification and gender are associated.
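
The parenthesized values in Table 2.5 can be reproduced with a short calculation. The sketch below assumes the usual estimate of an expected frequency under independence, (row total × column total) / n, which this excerpt does not restate; the function name `expected_frequencies` is a hypothetical helper, not something from the text.

```python
# Observed counts from Table 2.5.
# Rows: Females, Males. Columns: Democrat, Independent, Republican.
observed = [
    [762, 327, 468],
    [484, 239, 477],
]


def expected_frequencies(table):
    """Estimated expected counts under independence (assumed formula):
    row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_frequencies(observed)
for label, row in zip(["Females", "Males"], expected):
    # Round to one decimal place, as in the table.
    print(label, [round(mu, 1) for mu in row])
```

Rounded to one decimal place, the results should agree with the entries shown in parentheses in Table 2.5.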

2.4.5 Residuals for Cells in a Contingency Table

A test statistic and its P-value describe the evidence against the null hypothesis. A cell-by-cell comparison of observed and estimated expected frequencies helps us better understand the nature of the evidence. Larger differences between $n_{ij}$ and $\hat{\mu}_{ij}$ tend to occur for cells that have larger expected frequencies, so the raw difference $n_{ij} - \hat{\mu}_{ij}$ is insufficient. For the test of independence, a useful cell residual is

$$\frac{n_{ij} - \hat{\mu}_{ij}}{\sqrt{\hat{\mu}_{ij}\,(1 - p_{i+})(1 - p_{+j})}} \tag{2.9}$$

The denominator is the estimated standard error of $n_{ij} - \hat{\mu}_{ij}$, under $H_0$. The ratio (2.9) is called a standardized residual, because it divides $n_{ij} - \hat{\mu}_{ij}$ by its SE. When $H_0$ is true, each standardized residual has a large-sample standard normal distribution. A standardized residual having absolute value that exceeds about 2 when there are few cells or about 3 when there are many cells indicates lack of fit of $H_0$ in that cell. (Under $H_0$, we expect about 5% of the standardized residuals to be farther from 0 than ±2 by chance alone.)
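
As a minimal sketch of how formula (2.9) and the rule of thumb above might be applied to a whole table, the Python below computes a standardized residual for every cell and lists the cells whose absolute residual exceeds a chosen cutoff. The function names `standardized_residuals` and `flag_cells` are hypothetical, the expected frequencies are assumed to be (row total × column total) / n, and the default cutoff of 2 is just the "few cells" guideline from the text.

```python
import math


def standardized_residuals(table):
    """Apply formula (2.9) to every cell of a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        p_row = row_totals[i] / n  # marginal proportion p_{i+}
        out = []
        for j, n_ij in enumerate(row):
            p_col = col_totals[j] / n  # marginal proportion p_{+j}
            # Assumed estimate of the expected frequency under independence.
            mu_ij = row_totals[i] * col_totals[j] / n
            # Estimated standard error of n_ij - mu_ij under H0.
            se = math.sqrt(mu_ij * (1 - p_row) * (1 - p_col))
            out.append((n_ij - mu_ij) / se)
        result.append(out)
    return result


def flag_cells(residuals, threshold=2.0):
    """Return (row, column) indices of cells with |residual| > threshold."""
    return [
        (i, j)
        for i, row in enumerate(residuals)
        for j, r in enumerate(row)
        if abs(r) > threshold
    ]


# Observed counts from Table 2.5 (rows: Females, Males).
observed = [
    [762, 327, 468],
    [484, 239, 477],
]
residuals = standardized_residuals(observed)
for row in residuals:
    print([round(r, 2) for r in row])
print(flag_cells(residuals))
```

Passing `threshold=3.0` gives the stricter cutoff the text suggests for tables with many cells.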

Table 2.6 shows the standardized residuals for testing independence in Table 2.5. For the first cell, for instance, $n_{11} = 762$ and $\hat{\mu}_{11} = 703.7$. The first row and first column marginal proportions equal $p_{1+} = 1557/2757 = 0.565$ and $p_{+1} = 1246/2757 = 0.452$. Substituting into (2.9), the standardized residual for this cell equals

$$\frac{762 - 703.7}{\sqrt{703.7\,(1 - 0.565)(1 - 0.452)}} = 4.50$$

This cell shows a greater discrepancy between $n_{11}$ and $\hat{\mu}_{11}$ than we would expect if the variables were truly independent.
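
The arithmetic for the first cell can be checked step by step. The snippet below is only an illustrative check that uses the rounded inputs quoted in the text; the variable names are made up for the example.

```python
import math

n11 = 762       # observed count for Females / Democrat
mu11 = 703.7    # estimated expected count, rounded as in the text
p_row = 0.565   # p_{1+} = 1557/2757, rounded
p_col = 0.452   # p_{+1} = 1246/2757, rounded

difference = n11 - mu11
# Estimated standard error under H0: the denominator of (2.9).
standard_error = math.sqrt(mu11 * (1 - p_row) * (1 - p_col))
residual = difference / standard_error

print(f"difference            = {difference:.1f}")
print(f"standard error        = {standard_error:.2f}")
print(f"standardized residual = {residual:.2f}")
```

The difference of about 58.3 is roughly 4.5 estimated standard errors (each about 12.95), which matches the value 4.50 above.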
