13.11.2012 Views

Introduction to Categorical Data Analysis


2.4 CHI-SQUARED TESTS OF INDEPENDENCE

The marginal probabilities then determine the joint probabilities. To test $H_0$, we identify $\mu_{ij} = n\pi_{ij} = n\pi_{i+}\pi_{+j}$ as the expected frequency. Here, $\mu_{ij}$ is the expected value of $n_{ij}$ assuming independence. Usually, $\{\pi_{i+}\}$ and $\{\pi_{+j}\}$ are unknown, as is this expected value.

To estimate the expected frequencies, substitute sample proportions for the unknown marginal probabilities, giving

$$\hat{\mu}_{ij} = n p_{i+} p_{+j} = n \left(\frac{n_{i+}}{n}\right) \left(\frac{n_{+j}}{n}\right) = \frac{n_{i+} n_{+j}}{n}$$

This is the row total for the cell multiplied by the column total for the cell, divided by the overall sample size. The $\{\hat{\mu}_{ij}\}$ are called estimated expected frequencies. They have the same row and column totals as the observed counts, but they display the pattern of independence.
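As a minimal sketch of this calculation, the Python function below computes estimated expected frequencies from a table of counts given as a list of rows. The function name `expected_frequencies` and the 2 × 2 counts are hypothetical, chosen only for illustration; they do not come from the text.

```python
def expected_frequencies(table):
    """Return estimated expected frequencies n_{i+} * n_{+j} / n under independence."""
    row_totals = [sum(row) for row in table]          # n_{i+}
    col_totals = [sum(col) for col in zip(*table)]    # n_{+j}
    n = sum(row_totals)                               # overall sample size
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table of counts (illustration only, not from the text).
observed = [[30, 10],
            [20, 40]]
fitted = expected_frequencies(observed)
```

For these counts the fitted values are 20, 20 in the first row and 30, 30 in the second. Their row totals (40, 60) and column totals (50, 50) match those of the observed counts, while every row has the same column proportions, which is the pattern of independence.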

For testing independence in $I \times J$ contingency tables, the Pearson and likelihood-ratio statistics equal

$$X^2 = \sum \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}}, \qquad G^2 = 2 \sum n_{ij} \log\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right) \tag{2.8}$$

Their large-sample chi-squared distributions have $df = (I - 1)(J - 1)$.
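The two statistics in (2.8) can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not a reference one: the function name `pearson_and_lr` and the 2 × 2 counts are hypothetical, every row and column total is assumed to be positive, and a cell with a zero count is taken to contribute 0 to $G^2$ (the usual convention).

```python
import math


def pearson_and_lr(table):
    """Return (X2, G2, df) for testing independence in an I x J table of counts."""
    row_totals = [sum(row) for row in table]          # n_{i+}
    col_totals = [sum(col) for col in zip(*table)]    # n_{+j}
    n = sum(row_totals)                               # overall sample size

    x2 = 0.0
    g2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            mu_ij = row_totals[i] * col_totals[j] / n  # estimated expected frequency
            x2 += (n_ij - mu_ij) ** 2 / mu_ij
            if n_ij > 0:                               # zero counts add nothing to G2
                g2 += 2.0 * n_ij * math.log(n_ij / mu_ij)

    df = (len(table) - 1) * (len(table[0]) - 1)        # (I - 1)(J - 1)
    return x2, g2, df


# Hypothetical 2 x 2 table of counts (illustration only, not from the text).
x2, g2, df = pearson_and_lr([[30, 10], [20, 40]])
```

For this hypothetical table, $X^2$ is about 16.7 and $G^2$ about 17.3, with $df = 1$. The two statistics are typically close but not identical.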

The $df$ value means the following: under $H_0$, $\{\pi_{i+}\}$ and $\{\pi_{+j}\}$ determine the cell probabilities. There are $I - 1$ nonredundant row probabilities. Because they sum to 1, the first $I - 1$ determine the last one through $\pi_{I+} = 1 - (\pi_{1+} + \cdots + \pi_{I-1,+})$. Similarly, there are $J - 1$ nonredundant column probabilities. So, under $H_0$, there are $(I - 1) + (J - 1)$ parameters. The alternative hypothesis $H_a$ merely states that there is not independence. It does not specify a pattern for the $IJ$ cell probabilities. The probabilities are then solely constrained to sum to 1, so there are $IJ - 1$ nonredundant parameters. The value for $df$ is the difference between the number of parameters under $H_a$ and $H_0$, or

$$df = (IJ - 1) - [(I - 1) + (J - 1)] = IJ - I - J + 1 = (I - 1)(J - 1)$$
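As a worked instance of this parameter count, take a table with $I = 2$ rows and $J = 3$ columns (dimensions chosen here for illustration):

```latex
\begin{aligned}
\text{parameters under } H_a &: \; IJ - 1 = 6 - 1 = 5, \\
\text{parameters under } H_0 &: \; (I - 1) + (J - 1) = 1 + 2 = 3, \\
df &= 5 - 3 = 2 = (2 - 1)(3 - 1).
\end{aligned}
```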

2.4.4 Example: Gender Gap in Political Affiliation

Table 2.5, from the 2000 General Social Survey, cross classifies gender and political party identification. Subjects indicated whether they identified more strongly with the Democratic or Republican party or as Independents. Table 2.5 also contains estimated expected frequencies for $H_0$: independence. For instance, the first cell has $\hat{\mu}_{11} = n_{1+} n_{+1} / n = (1557 \times 1246)/2757 = 703.7$.
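The arithmetic for the first cell can be checked directly. The short Python sketch below uses only the three totals quoted in the text; the variable names are illustrative.

```python
# Totals quoted in the text for the first cell of Table 2.5.
row_total = 1557   # n_{1+}, total for the first row
col_total = 1246   # n_{+1}, total for the first column
n = 2757           # overall sample size

# Estimated expected frequency for the first cell under independence.
mu_11 = row_total * col_total / n
print(round(mu_11, 1))  # 703.7
```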

The chi-squared test statistics are $X^2 = 30.1$ and $G^2 = 30.0$, with $df = (I - 1)(J - 1) = (2 - 1)(3 - 1) = 2$. This chi-squared distribution has a mean of $df = 2$ and a standard deviation of $\sqrt{2\,df} = \sqrt{4} = 2$. So, a value of 30 is far out in the right-hand tail. Each statistic has a $P$-value $< 0.0001$. This evidence of association
