13.11.2012 Views

Introduction to Categorical Data Analysis


4.1 INTERPRETING THE LOGISTIC REGRESSION MODEL

width; that is, there is a 64% increase. To illustrate, the mean width value of x = 26.3
has π̂(x) = 0.674, and odds = 0.674/0.326 = 2.07. At x = 27.3 = 26.3 + 1.0, you
can check that π̂(x) = 0.773 and odds = 0.773/0.227 = 3.40. This is a
64% increase; that is, 3.40 = 2.07(1.64).
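The arithmetic in this illustration can be reproduced with a short sketch. It uses only the two fitted probabilities quoted above; the helper name `odds` and the variable names are illustrative and do not come from the text.

```python
# Fitted probabilities quoted in the text at widths x = 26.3 and x = 27.3.
p_at_26_3 = 0.674
p_at_27_3 = 0.773


def odds(p):
    """Convert a probability p into odds p / (1 - p)."""
    return p / (1.0 - p)


odds_low = odds(p_at_26_3)    # odds at x = 26.3
odds_high = odds(p_at_27_3)   # odds at x = 27.3

# Multiplicative change in the odds for a 1-unit increase in x.
odds_ratio = odds_high / odds_low

print(f"odds at x = 26.3: {odds_low:.3f}")
print(f"odds at x = 27.3: {odds_high:.3f}")
print(f"multiplicative change in odds: {odds_ratio:.3f}")
```

Because the inputs are probabilities rounded to three decimals, the computed values agree with the quoted 3.40 and 1.64 only up to rounding in the second decimal place.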

4.1.5 Logistic Regression with Retrospective Studies

Another property of logistic regression relates to situations in which the explanatory
variable X rather than the response variable Y is random. This occurs with retrospective
sampling designs. Sometimes such designs are used because one of the response
categories occurs rarely, and a prospective study might have too few cases to enable
one to estimate effects of predictors well. For a given sample size, effect estimates
have smaller SEs when the numbers of outcomes of the two types are similar than
when they are very different.
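The remark about standard errors can be illustrated with a rough sketch for the simplest case of a binary X. It relies on the standard large-sample formula for the SE of a log odds ratio in a 2×2 table, the square root of the sum of the reciprocal cell counts. The total sample size of 400 and the exposure proportions 0.5 and 0.3 are hypothetical values chosen only for illustration, and expected cell counts stand in for observed data.

```python
import math


def se_log_odds_ratio(n_cases, n_controls, p_exposed_cases, p_exposed_controls):
    """Large-sample SE of the log odds ratio, using expected 2x2 cell counts."""
    a = n_cases * p_exposed_cases              # exposed cases
    b = n_cases * (1 - p_exposed_cases)        # unexposed cases
    c = n_controls * p_exposed_controls        # exposed controls
    d = n_controls * (1 - p_exposed_controls)  # unexposed controls
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


# Hypothetical designs, both with 400 subjects and the same true association.
se_balanced = se_log_odds_ratio(200, 200, 0.5, 0.3)   # equal numbers of each outcome
se_unbalanced = se_log_odds_ratio(20, 380, 0.5, 0.3)  # one outcome is rare

print(f"SE with 200 cases and 200 controls: {se_balanced:.3f}")
print(f"SE with  20 cases and 380 controls: {se_unbalanced:.3f}")
```

Under these assumed values the unbalanced design gives an SE more than twice as large, because the reciprocals of the small case counts dominate the sum.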

Most commonly, retrospective designs are used with biomedical case–control studies
(Section 2.3.5). For samples of subjects having Y = 1 (cases) and having Y = 0
(controls), the value of X is observed. Evidence exists of an association between X
and Y if the distribution of X values differs between cases and controls. For
case–control studies, it was noted in Section 2.3.5 that it is possible to estimate odds ratios
but not other summary measures. Logistic regression parameters refer to odds and
odds ratios. One can fit logistic regression models with data from case–control studies
and estimate effects of explanatory variables. The intercept term α in the model is
not meaningful, because it relates to the relative numbers of outcomes of y = 1 and
y = 0. We do not estimate this, because the sample frequencies for y = 1 and y = 0
are fixed by the nature of the case–control study.
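A small numerical sketch shows why the slope survives case–control sampling while the intercept does not. It assumes a population in which logit P(Y = 1 | x) = α + βx, and assumes that cases and controls enter the sample with fixed probabilities that do not depend on x. All parameter values and names below are hypothetical. Bayes' theorem then gives the probability of being a case among sampled subjects.

```python
import math

ALPHA, BETA = -4.0, 0.5    # hypothetical population intercept and slope
P_SAMPLE_CASE = 1.0        # hypothetical: every case is sampled
P_SAMPLE_CONTROL = 0.02    # hypothetical: 2% of controls are sampled


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))


def prob_case_in_sample(x):
    """P(Y = 1 | x, subject was sampled), obtained from Bayes' theorem."""
    pi = expit(ALPHA + BETA * x)
    num = P_SAMPLE_CASE * pi
    return num / (num + P_SAMPLE_CONTROL * (1.0 - pi))


xs = [0.0, 1.0, 2.0, 3.0]
sample_logits = [logit(prob_case_in_sample(x)) for x in xs]

# Change in the sample logit per unit increase in x.
sample_slopes = [hi - lo for lo, hi in zip(sample_logits, sample_logits[1:])]

# Difference between the sample intercept and the population intercept.
intercept_shift = sample_logits[0] - ALPHA
expected_shift = math.log(P_SAMPLE_CASE / P_SAMPLE_CONTROL)

print("slopes in the sample:", [round(s, 6) for s in sample_slopes])
print("intercept shift:", round(intercept_shift, 6))
print("log of sampling ratio:", round(expected_shift, 6))
```

In this sketch every slope equals the population β, while the intercept moves by the log of the ratio of sampling probabilities, a quantity set by the study design rather than by the population.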

With case–control studies, it is not possible to estimate effects in binary models
with link functions other than the logit. Unlike the odds ratio, the effect for the
conditional distribution of X given Y does not then equal that for Y given X. This
provides an important advantage of the logit link over links such as the probit. It
is a major reason why logistic regression surpasses other models in popularity for
biomedical studies.
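The contrast with the probit link can be checked numerically in the same way. The sketch below assumes a population probit model P(Y = 1 | x) = Φ(a + bx) and the same kind of outcome-dependent sampling as in a case–control study. The parameter values are hypothetical, and `NormalDist` from Python's standard library supplies Φ and its inverse.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the probit link

A, B = -2.0, 1.0           # hypothetical population probit intercept and slope
P_SAMPLE_CASE = 1.0        # hypothetical: every case is sampled
P_SAMPLE_CONTROL = 0.05    # hypothetical: 5% of controls are sampled


def prob_case_in_sample(x):
    """P(Y = 1 | x, subject was sampled) when the population follows a probit model."""
    pi = STD_NORMAL.cdf(A + B * x)
    num = P_SAMPLE_CASE * pi
    return num / (num + P_SAMPLE_CONTROL * (1.0 - pi))


xs = [-1.0, 0.0, 1.0]

# Apply the probit transform to the probabilities seen in the sample.
sample_probits = [STD_NORMAL.inv_cdf(prob_case_in_sample(x)) for x in xs]

# Change in the sample probit per unit increase in x.
probit_slopes = [hi - lo for lo, hi in zip(sample_probits, sample_probits[1:])]

print("probit-scale slopes in the sample:", [round(s, 3) for s in probit_slopes])
print("population slope:", B)
```

Under these assumptions the probit-scale slopes differ from one another and from the population slope, so the sampled data no longer follow a probit model with the original effect.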

Many case–control studies employ matching. Each case is matched with one or
more control subjects. The controls are like the case on key characteristics such
as age. The model and subsequent analysis should take the matching into account.
Section 8.2.4 discusses logistic regression for matched case–control studies.

4.1.6 Normally Distributed X Implies Logistic Regression for Y

Regardless of the sampling mechanism, the logistic regression model may or may not
describe a relationship well. In one special case, it does necessarily hold. Suppose
the distribution of X for subjects having Y = 1 is normal N(μ1, σ), and suppose the
distribution of X for subjects having Y = 0 is normal N(μ0, σ); that is, with different
mean but the same standard deviation. Then, a Bayes theorem calculation converting
from the distribution of X given Y = y to the distribution of Y given X = x shows
that P(Y = 1 | X = x) follows the logistic regression model.
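The result named in this subsection's heading can be checked numerically. The sketch below assumes a marginal probability P(Y = 1), a quantity introduced here only for illustration, and hypothetical values for μ1, μ0, and σ. Taking the log of the ratio of the two normal densities gives a function linear in x, with slope β = (μ1 − μ0)/σ² and intercept α = log[P(Y = 1)/P(Y = 0)] − (μ1² − μ0²)/(2σ²). The code compares the Bayes' theorem posterior with that logistic curve.

```python
import math
from statistics import NormalDist

MU1, MU0, SIGMA = 2.0, 0.5, 1.5  # hypothetical means and common standard deviation
PRIOR_Y1 = 0.3                   # hypothetical marginal P(Y = 1)


def posterior_y1(x):
    """P(Y = 1 | X = x) from Bayes' theorem with normal densities for X given Y."""
    f1 = NormalDist(MU1, SIGMA).pdf(x)
    f0 = NormalDist(MU0, SIGMA).pdf(x)
    return PRIOR_Y1 * f1 / (PRIOR_Y1 * f1 + (1.0 - PRIOR_Y1) * f0)


# Logistic parameters implied by the log of the density ratio.
beta = (MU1 - MU0) / SIGMA**2
alpha = math.log(PRIOR_Y1 / (1.0 - PRIOR_Y1)) - (MU1**2 - MU0**2) / (2.0 * SIGMA**2)


def logistic_curve(x):
    """Logistic regression curve with the implied alpha and beta."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


xs = [-3.0, -1.0, 0.0, 1.0, 2.5, 4.0, 6.0]

# Largest disagreement between the Bayes posterior and the logistic curve.
max_gap = max(abs(posterior_y1(x) - logistic_curve(x)) for x in xs)

print(f"implied beta = {beta:.4f}, implied alpha = {alpha:.4f}")
print(f"largest gap between the two curves: {max_gap:.2e}")
```

The agreement depends on the common σ: with unequal standard deviations the quadratic terms in x no longer cancel in the log density ratio.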
