
For model (AC, AM, CM), the 95% confidence intervals are (8.0, 49.2) for the AM conditional odds ratio and (12.5, 23.8) for the CM conditional odds ratio. The intervals are wide, but these associations also are strong. In summary, this model reveals strong conditional associations for each pair of drugs. There is a strong tendency for users of one drug to be users of a second drug, and this is true both for users and for nonusers of the third drug. Table 7.5 shows that estimated marginal associations are even stronger. Controlling for outcome on one drug moderates the association somewhat between the other two drugs.
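
As a rough illustration of where such intervals come from, the sketch below fits the loglinear model (AC, AM, CM) as a Poisson GLM with statsmodels and exponentiates a two-factor term, together with its Wald confidence limits, to obtain a conditional odds ratio. The cell counts are illustrative placeholders rather than the survey data of Table 7.3, and the parameter name shown depends on statsmodels' treatment coding; treat this as a sketch of the approach, not the book's own calculation.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

levels = ["yes", "no"]
cells = [(a, c, m) for a in levels for c in levels for m in levels]
df = pd.DataFrame(cells, columns=["A", "C", "M"])
df["count"] = [900, 540, 45, 455, 5, 45, 5, 280]   # placeholder counts, not Table 7.3

# Model (AC, AM, CM): all three two-factor terms, no three-factor term
fit = smf.glm("count ~ A*C + A*M + C*M", data=df,
              family=sm.families.Poisson()).fit()

term = "A[T.yes]:M[T.yes]"               # the AM two-factor parameter under treatment coding
print(np.exp(fit.params[term]))          # estimated AM conditional odds ratio
print(np.exp(fit.conf_int().loc[term]))  # 95% Wald interval for that odds ratio

Intervals such as (8.0, 49.2) quoted above have this general form: a Wald interval, estimate plus or minus 1.96 standard errors, for the log odds ratio, exponentiated back to the odds ratio scale.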

The analyses in this section pertain to association structure. A different analysis pertains to comparing marginal distributions, for instance to determine if one drug has more usage than the others. Section 8.1 presents this type of analysis.

7.2.5 Loglinear Models for Higher Dimensions

Loglinear models are more complex for three-way tables than for two-way tables, because of the variety of potential association patterns. Basic concepts for models with three-way tables extend readily, however, to multiway tables.

We illustrate this for four-way tables, with variables W, X, Y, and Z. Interpretations are simplest when the model has no three-factor terms. Such models are special cases of (WX, WY, WZ, XY, XZ, YZ), which has homogeneous associations. Each pair of variables is conditionally associated, with the same odds ratios at each combination of levels of the other two variables. An absence of a two-factor term implies conditional independence for those variables. Model (WX, WY, WZ, XZ, YZ) does not contain an XY term, so it treats X and Y as conditionally independent at each combination of levels of W and Z.
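
Written out in the notation used for three-way tables (extending the cell-mean indices to four subscripts, a choice of layout rather than a quote from the text), the homogeneous association model (WX, WY, WZ, XY, XZ, YZ) is

\log \mu_{hijk} = \lambda + \lambda^{W}_{h} + \lambda^{X}_{i} + \lambda^{Y}_{j} + \lambda^{Z}_{k} + \lambda^{WX}_{hi} + \lambda^{WY}_{hj} + \lambda^{WZ}_{hk} + \lambda^{XY}_{ij} + \lambda^{XZ}_{ik} + \lambda^{YZ}_{jk}

and deleting the \lambda^{XY}_{ij} term gives model (WX, WY, WZ, XZ, YZ).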

A variety of models have three-factor terms. A model could contain WXY, WXZ, WYZ, or XYZ terms. The XYZ term permits the association between any pair of those three variables to vary across levels of the third variable, at each fixed level of W. The saturated model contains all these terms plus a four-factor term.
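
For example, adding an XYZ term (together with the lower-order terms it implies under the hierarchical convention) to the homogeneous association model displayed above gives model (WX, WY, WZ, XYZ),

\log \mu_{hijk} = \lambda + \lambda^{W}_{h} + \lambda^{X}_{i} + \lambda^{Y}_{j} + \lambda^{Z}_{k} + \lambda^{WX}_{hi} + \lambda^{WY}_{hj} + \lambda^{WZ}_{hk} + \lambda^{XY}_{ij} + \lambda^{XZ}_{ik} + \lambda^{YZ}_{jk} + \lambda^{XYZ}_{ijk},

in which the XY, XZ, and YZ conditional odds ratios may each depend on the level of the remaining variable in that trio, while the associations involving W stay the same at every combination of levels of the other variables.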

7.2.6 Example: Automobile Accidents and Seat Belts

Table 7.9 shows results of accidents in the state of Maine for 68,694 passengers in autos and light trucks. The table classifies passengers by gender (G), location of accident (L), seat-belt use (S), and injury (I). The table reports the sample proportion of passengers who were injured. For each GL combination, the proportion of injuries was about halved for passengers wearing seat belts.

Table 7.10 displays tests of fit for several loglinear models. To investigate the complexity of model needed, we consider model (G, I, L, S) containing only single-factor terms, model (GI, GL, GS, IL, IS, LS) containing also all the two-factor terms, and model (GIL, GIS, GLS, ILS) containing also all the three-factor terms. Model (G, I, L, S) implies mutual independence of the four variables. It fits very poorly (G² = 2792.8, df = 11). Model (GI, GL, GS, IL, IS, LS) fits much better (G² = 23.4, df = 5) but still has lack of fit (P < 0.001).
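
As a quick check on these lack-of-fit conclusions, the reported G² statistics can be referred to chi-squared distributions with the stated residual df. The snippet below is a sketch of that calculation, not code from the text; it only reproduces the P-values from the quoted statistics and does not refit the models.

from scipy.stats import chi2

for label, g2, df in [("(G, I, L, S)", 2792.8, 11),
                      ("(GI, GL, GS, IL, IS, LS)", 23.4, 5)]:
    print(label, "P-value:", chi2.sf(g2, df))

# The model with all two-factor terms gives a P-value of roughly 0.0003,
# i.e., P < 0.001, so it still shows some lack of fit despite the large
# improvement over the mutual independence model.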
