13.11.2012 Views

Introduction to Categorical Data Analysis

Introduction to Categorical Data Analysis

Introduction to Categorical Data Analysis

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

222 LOGLINEAR MODELS FOR CONTINGENCY TABLES<br />

Table 7.12. Equivalent Loglinear and Logistic Models for<br />

a Three-Way Table With Binary Response Variable Y<br />

Loglinear Symbol Logistic Model Logistic Symbol<br />

(Y, XZ) α (—)<br />

(XY, XZ) α + β X i (X)<br />

(YZ, XZ) α + β Z k (Z)<br />

(XY, YZ, XZ) α + β X i + βZ k (X + Z)<br />

(XYZ) α + β X i + βZ k<br />

+ βXZ<br />

ik<br />

(X*Z)<br />

categories, relevant loglinear models correspond <strong>to</strong> baseline-category logit models<br />

(Section 6.1). In such cases it is more sensible <strong>to</strong> fit logistic models directly, rather<br />

than loglinear models. Indeed, one can see by comparing equations (7.8) and (7.9)<br />

how much simpler the logistic structure is. The loglinear approach is better suited<br />

for cases with more than one response variable, as in studying association patterns<br />

for the drug use example in Section 7.1.6. In summary, loglinear models are most<br />

natural when at least two variables are response variables and we want <strong>to</strong> study their<br />

association structure. Otherwise, logistic models are more relevant.<br />

Selecting a loglinear model becomes more difficult as the number of variables<br />

increases, because of the increase in possible associations and interactions. One<br />

explora<strong>to</strong>ry approach first fits the model having only single-fac<strong>to</strong>r terms, the model<br />

having only two-fac<strong>to</strong>r and single-fac<strong>to</strong>r terms, the model having only three-fac<strong>to</strong>r<br />

and lower terms, and so forth, as Section 7.2.6 showed. Fitting such models often<br />

reveals a restricted range of good-fitting models.<br />

When certain marginal <strong>to</strong>tals are fixed by the sampling design or by the response–<br />

explana<strong>to</strong>ry distinction, the model should contain the term for that margin. This<br />

is because the ML fit forces the corresponding fitted <strong>to</strong>tals <strong>to</strong> be identical <strong>to</strong> those<br />

marginal <strong>to</strong>tals. To illustrate, suppose one treats the counts {ng+ℓ+} in Table 7.9 as<br />

fixed at each combination of levels of G = gender and L = location. Then a loglinear<br />

model should contain the GL two-fac<strong>to</strong>r term, because this ensures that {ˆμg+ℓ+ =<br />

ng+ℓ+}. That is, the model should be at least as complex as model (GL,S,I).If<br />

20,629 women had accidents in urban locations, then the fitted counts have 20,629<br />

women in urban locations.<br />

Related <strong>to</strong> this point, the modeling process should concentrate on terms linking<br />

response variables and terms linking explana<strong>to</strong>ry variables <strong>to</strong> response variables.<br />

Allowing a general interaction term among the explana<strong>to</strong>ry variables has the effect<br />

of fixing <strong>to</strong>tals at combinations of their levels. If G and L are both explana<strong>to</strong>ry<br />

variables, models assuming conditional independence between G and L are not of<br />

interest.<br />

For Table 7.9, I is a response variable, and S might be treated either as a response<br />

or explana<strong>to</strong>ry variable. If it is explana<strong>to</strong>ry, we treat the {ng+ℓs} <strong>to</strong>tals as fixed and fit<br />

logistic models for the I response. If S is also a response, we consider the {ng+ℓ+}<br />

<strong>to</strong>tals as fixed and consider loglinear models that are at least as complex as (GL,S,I).

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!