13.11.2012 Views

Introduction to Categorical Data Analysis


BUILDING AND APPLYING LOGISTIC REGRESSION MODELS

“no intercept” option. When SAS (PROC GENMOD) does this, $\hat{\beta}^Z_1$ and $\hat{\beta}^Z_3$ are both about −28 with standard errors of about 200,000.

The counts in the 2 × 2 marginal table relating treatment to response, shown in the bottom panel of Table 5.7, are all positive. The empty cells in Table 5.7 affect the center estimates, but not the treatment estimates, for this model. The estimated log odds ratio equals 1.55 for the treatment effect (SE = 0.70). The deviance ($G^2$) goodness-of-fit statistic equals 0.50 (df = 4, P = 0.97).

The treatment log odds ratio estimate of 1.55 also results from deleting centers 1 and 3 from the analysis. In fact, when a center has outcomes of only one type, it provides no information about the odds ratio between treatment and response. Such tables also make no contribution to the Cochran–Mantel–Haenszel test (Section 4.3.4) or to a small-sample, exact test of conditional independence between treatment and response (Section 5.4.2).
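To see why a center with outcomes of only one type adds nothing to the Cochran–Mantel–Haenszel statistic, the following minimal Python sketch computes one partial table's contribution: the observed count in the first cell, its expected value, and its variance under conditional independence given the margins. The function name `cmh_components` and the counts are hypothetical, chosen only for illustration; they are not a reproduction of Table 5.7.

```python
def cmh_components(table):
    """Return (n11, expected n11, variance of n11) for one 2x2 partial table.

    `table` is ((n11, n12), (n21, n22)): rows are the two treatments and
    columns are (success, failure). The expected value and variance are
    those of n11 under conditional independence, given the table margins.
    """
    (n11, n12), (n21, n22) = table
    row1, row2 = n11 + n12, n21 + n22
    col1, col2 = n11 + n21, n12 + n22
    n = row1 + row2
    expected = row1 * col1 / n
    variance = row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    return n11, expected, variance


# Hypothetical centers (illustrative counts only).
no_success_center = ((0, 6), (0, 8))  # every subject is a failure
mixed_center = ((5, 4), (2, 7))       # both outcomes occur

print(cmh_components(no_success_center))
print(cmh_components(mixed_center))
```

For the center with no successes, the observed count, its expected value, and its variance are all zero, so that table changes neither the numerator nor the denominator of the statistic.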

An alternative strategy in multi-center analyses combines centers of a similar type. Then, if each resulting partial table has responses with both outcomes, the inferences use all data. For estimating odds ratios, however, this usually has little impact. For Table 5.7, perhaps centers 1 and 3 are similar to center 2, since the success rate is very low for that center. Combining these three centers and re-fitting the model to this table and the tables for the other two centers yields an estimated treatment effect of 1.56 (SE = 0.70). Centers with no successes or with no failures can be useful for estimating some parameters, such as the difference of proportions, but they do not help us estimate odds ratios for logistic regression models or give us information about whether a treatment effect exists in the population.

5.3.4 Effect of Small Samples on $X^2$ and $G^2$ Tests

When a model for a binary response has only categorical predictors, the true sampling distributions of goodness-of-fit statistics are approximately chi-squared, for large sample size n. The adequacy of the chi-squared approximation depends both on n and on the number of cells. It tends to improve as the average number of observations per cell increases.
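As a rough illustration of the quantities involved, the Python sketch below computes the Pearson and deviance goodness-of-fit statistics from observed counts and fitted values, together with the average number of observations per cell. The function names and the counts are hypothetical; the fitted values shown are those of an independence fit to a small 2 × 2 table and stand in for the fitted counts of whatever model is being checked.

```python
import math


def pearson_x2(observed, fitted):
    """Pearson statistic: sum of (observed - fitted)^2 / fitted over cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, fitted))


def deviance_g2(observed, fitted):
    """Deviance statistic: 2 * sum of observed * log(observed / fitted).

    Cells with an observed count of 0 contribute 0 to the sum.
    """
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, fitted) if o > 0)


def mean_count_per_cell(observed):
    """Average number of observations per cell, n / (number of cells)."""
    return sum(observed) / len(observed)


# Hypothetical 2 x 2 table, flattened row by row, with its independence fit.
observed = [3, 7, 0, 10]
fitted = [1.5, 8.5, 1.5, 8.5]

print(pearson_x2(observed, fitted))
print(deviance_g2(observed, fitted))
print(mean_count_per_cell(observed))
```

With so few observations per cell, the two statistics can differ noticeably, which is one informal sign that the chi-squared approximation deserves caution.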

The quality of the approximation has been studied carefully for the Pearson $X^2$ test of independence for two-way tables (Section 2.4.3). Most guidelines refer to the fitted values. When df > 1, a minimum fitted value of about 1 is permissible as long as no more than about 20% of the cells have fitted values below 5. However, the chi-squared approximation can be poor for sparse tables containing both very small and very large fitted values. Unfortunately, a single rule cannot cover all cases. When in doubt, it is safer to use a small-sample, exact test (Section 2.6.1).
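The guideline on fitted values can be checked mechanically. The Python sketch below computes fitted values under independence for a two-way table and tests whether the smallest is at least 1 and at most 20% of cells fall below 5. The function names, the exact cutoffs (the text says "about"), and the two tables are assumptions made for illustration; the rule is intended for tables with df > 1 and, as noted above, cannot cover all cases.

```python
def fitted_values_independence(table):
    """Fitted counts under independence: (row total * column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def meets_guideline(fitted, min_allowed=1.0, small=5.0, max_share_small=0.20):
    """Check the rule of thumb for tables with df > 1.

    True when every fitted value is at least `min_allowed` and no more than
    `max_share_small` of the cells have fitted values below `small`.
    """
    cells = [e for row in fitted for e in row]
    share_small = sum(e < small for e in cells) / len(cells)
    return min(cells) >= min_allowed and share_small <= max_share_small


# Hypothetical 3 x 3 tables (df = 4), illustrative counts only.
dense_table = [[20, 25, 15], [18, 22, 20], [12, 14, 9]]
sparse_table = [[10, 12, 8], [9, 11, 10], [2, 3, 1]]

print(meets_guideline(fitted_values_independence(dense_table)))
print(meets_guideline(fitted_values_independence(sparse_table)))
```

In the sparse table the last row has fitted values near 2 in all three columns, so a third of the cells fall below 5 and the check fails; an exact test would be the safer choice there.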

The $X^2$ statistic tends to be valid with smaller samples and sparser tables than $G^2$. The distribution of $G^2$ is usually poorly approximated by chi-squared when n/(number of cells) is less than 5. Depending on the sparseness, P-values based on referring $G^2$ to a chi-squared distribution can be too large or too small. When most fitted values are smaller than 0.50, treating $G^2$ as chi-squared gives a highly conservative test; that is, when $H_0$ is true, reported P-values tend to be much larger
