13.11.2012 Views

Introduction to Categorical Data Analysis

26 CONTINGENCY TABLES<br />

compare two groups on a binary response, Y. The data can be displayed in a 2 × 2<br />

contingency table, in which the rows are the two groups and the columns are the<br />

response levels of Y. This section presents measures for comparing groups on binary<br />

responses.<br />

2.2.1 Difference of Proportions<br />

As in the discussion of the binomial distribution in Section 1.2, we use the generic<br />

terms success and failure for the outcome categories. For subjects in row 1, let π1<br />

denote the probability of a success, so 1 − π1 is the probability of a failure. For<br />

subjects in row 2, let π2 denote the probability of success. These are conditional<br />

probabilities.<br />

The difference of proportions π1 − π2 compares the success probabilities in the<br />

two rows. This difference falls between −1 and +1. It equals zero when π1 = π2,<br />

that is, when the response is independent of the group classification. Let p1 and p2<br />

denote the sample proportions of successes. The sample difference p1 − p2 estimates<br />

π1 − π2.<br />

For simplicity, we denote the sample sizes for the two groups (that is, the row<br />

totals n1+ and n2+) by n1 and n2. When the counts in the two rows are independent<br />

binomial samples, the estimated standard error of p1 − p2 is<br />

\[
SE = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} \qquad (2.1)
\]

The standard error decreases, and hence the estimate of π1 − π2 improves, as the<br />

sample sizes increase.<br />
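As a minimal sketch of equation (2.1), the following Python snippet computes the sample difference p1 − p2 and its estimated standard error. The function name and the counts are hypothetical, chosen only for illustration; they do not come from the text.

```python
import math

def diff_of_proportions(x1, n1, x2, n2):
    """Return (p1 - p2, SE) for two independent binomial samples.

    x1, x2 are the success counts and n1, n2 the row totals
    (hypothetical argument names, not from the text).
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Equation (2.1): estimated standard error of p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2, se

# Made-up counts: 30 successes of 100 in row 1, 20 of 100 in row 2.
diff, se = diff_of_proportions(30, 100, 20, 100)

# Same sample proportions with ten times the sample size.
diff_big, se_big = diff_of_proportions(300, 1000, 200, 1000)

print(f"n = 100 per row:  diff = {diff:.3f}, SE = {se:.4f}")
print(f"n = 1000 per row: diff = {diff_big:.3f}, SE = {se_big:.4f}")
```

With the proportions held fixed, multiplying both sample sizes by 10 divides the standard error by √10, which illustrates why the estimate improves as the samples grow.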

A large-sample 100(1 − α)% (Wald) confidence interval for π1 − π2 is<br />

\[
(p_1 - p_2) \pm z_{\alpha/2}\,(SE)
\]
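The Wald interval can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: the function name `wald_ci` and the counts are hypothetical, and the normal quantile is taken from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def wald_ci(x1, n1, x2, n2, alpha=0.05):
    """Large-sample 100(1 - alpha)% Wald interval for pi1 - pi2."""
    p1, p2 = x1 / n1, x2 / n2
    # Estimated standard error of p1 - p2 for independent binomial samples
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # z_{alpha/2}: standard normal quantile with right-tail probability alpha/2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Made-up counts: 30 successes of 100 in row 1, 20 of 100 in row 2.
lower, upper = wald_ci(30, 100, 20, 100)
print(f"95% Wald interval for pi1 - pi2: ({lower:.4f}, {upper:.4f})")
```

The interval is centered at the sample difference p1 − p2, and its half-width is the normal quantile times the standard error.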

For small samples the actual coverage probability is closer to the nominal confidence<br />

level if you add 1.0 to every cell of the 2 × 2 table before applying this formula.¹ For<br />

a significance test of H0: π1 = π2, a z test statistic divides (p1 − p2) by a pooled<br />

SE that applies under H0. Because z² is the Pearson chi-squared statistic presented<br />

in Section 2.4.3, we will not discuss this test here.<br />
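The two remarks above can be illustrated with a short Python sketch. It makes several assumptions that the excerpt does not spell out: the pooled standard error is taken in its usual form, sqrt(p(1 − p)(1/n1 + 1/n2)) with p the overall sample proportion of successes, the Pearson statistic is computed from expected counts (row total × column total / n), and all function names and counts are hypothetical.

```python
import math
from statistics import NormalDist

def adjusted_wald_ci(x1, n1, x2, n2, alpha=0.05):
    """Wald interval after adding 1.0 to every cell of the 2 x 2 table."""
    # Adding 1 to each cell adds one success and one failure per row,
    # so each row total grows by 2.
    a1, m1 = x1 + 1, n1 + 2
    a2, m2 = x2 + 1, n2 + 2
    p1, p2 = a1 / m1, a2 / m2
    se = math.sqrt(p1 * (1 - p1) / m1 + p2 * (1 - p2) / m2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (p1 - p2) - z * se, (p1 - p2) + z * se

def pooled_z(x1, n1, x2, n2):
    """z statistic for H0: pi1 = pi2, using a pooled standard error (assumed form)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # pooled estimate of the common probability
    se0 = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se0

def pearson_chi2(x1, n1, x2, n2):
    """Pearson chi-squared statistic for the same 2 x 2 table."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    n = n1 + n2
    rows = [n1, n2]
    cols = [x1 + x2, n - x1 - x2]
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total

# Made-up small sample: 7 successes of 10 in row 1, 3 of 10 in row 2.
lo_adj, hi_adj = adjusted_wald_ci(7, 10, 3, 10)
print(f"adjusted 95% interval: ({lo_adj:.3f}, {hi_adj:.3f})")

# Made-up larger sample: 30 of 100 versus 20 of 100.
z = pooled_z(30, 100, 20, 100)
chi2 = pearson_chi2(30, 100, 20, 100)
print(f"z = {z:.4f}, z^2 = {z * z:.4f}, Pearson chi-squared = {chi2:.4f}")
```

For these counts the squared z statistic and the Pearson statistic agree up to floating-point error, consistent with the identity stated in the text.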

2.2.2 Example: Aspirin and Heart Attacks<br />

Table 2.3 is from a report on the relationship between aspirin use and myocardial<br />

infarction (heart attacks) by the Physicians’ Health Study Research Group at Harvard<br />

¹ A. Agresti and B. Caffo, Am. Statist., 54: 280–288, 2000.
