13.11.2012 Views

Introduction to Categorical Data Analysis


3.3 GENERALIZED LINEAR MODELS FOR BINARY DATA

the form

π(x) = F(x)    (3.4)

where F is a cdf for some continuous probability distribution.

When F is the cdf of a normal distribution, model type (3.4) is equivalent to the probit model (3.3). The probit link function transforms π(x) so that the regression curve for π(x) [or for 1 − π(x), when β < 0] has the shape of a normal cdf with mean −α/β and standard deviation 1/|β|. Each choice of α and β corresponds to a different normal distribution.

For the snoring and heart disease data, probit[π̂(x)] = −2.061 + 0.188x. This probit fit corresponds to a normal cdf having mean −α̂/β̂ = 2.061/0.188 = 11.0 and standard deviation 1/|β̂| = 1/0.188 = 5.3. The estimated probability of heart disease equals 1/2 at snoring level x = 11.0. That is, x = 11.0 has a fitted probit of −2.061 + 0.188(11) = 0, which is the z-score corresponding to a left-tail probability of 1/2. Since snoring level is restricted to the range 0–5 for these data, well below 11, the fitted probabilities over this range are quite small.
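The arithmetic in this example can be checked with a short Python sketch. It uses only the two fitted coefficients quoted above. The integer grid 0 to 5 is an illustrative assumption (the excerpt gives only the range, not the actual snoring scores), and the names `ALPHA_HAT`, `BETA_HAT`, and `fitted_probability` are introduced here for the illustration.

```python
from statistics import NormalDist

# Fitted probit model quoted in the text: probit[pi_hat(x)] = alpha + beta * x
ALPHA_HAT = -2.061
BETA_HAT = 0.188

# Mean and standard deviation of the normal cdf implied by the fit
mu = -ALPHA_HAT / BETA_HAT
sigma = 1 / abs(BETA_HAT)


def fitted_probability(x: float) -> float:
    """Fitted probability at snoring level x: Phi(alpha + beta * x)."""
    return NormalDist().cdf(ALPHA_HAT + BETA_HAT * x)


# The same curve written as the cdf of a normal distribution N(mu, sigma)
implied_cdf = NormalDist(mu, sigma).cdf

print(f"mean = {mu:.1f}, sd = {sigma:.1f}")
for x in range(0, 6):  # illustrative grid over the range 0-5 (an assumption)
    print(f"x = {x}: probit form {fitted_probability(x):.3f}, "
          f"cdf form {implied_cdf(x):.3f}")
print(f"x = mean: {fitted_probability(mu):.3f}")
```

Both columns should agree, since Φ(α̂ + β̂x) and the N(−α̂/β̂, 1/|β̂|) cdf evaluated at x are the same function when β̂ > 0.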

The logistic regression curve also has form (3.4). When β > 0 in model (3.2), the curve for π(x) has the shape of the cdf F(x) of a two-parameter logistic distribution. The logistic cdf corresponds to a probability distribution with a symmetric, bell shape. It looks similar to a normal distribution but with slightly thicker tails.
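To make the link to form (3.4) explicit, the following sketch assumes that model (3.2) is the usual logistic regression form logit[π(x)] = α + βx (the equation itself is not reproduced in this excerpt). The location μ and scale τ are notation introduced here, not taken from the text.

```latex
\pi(x) = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)}
       = \frac{\exp[(x - \mu)/\tau]}{1 + \exp[(x - \mu)/\tau]} = F(x),
\qquad \mu = -\frac{\alpha}{\beta}, \quad \tau = \frac{1}{\beta} \quad (\beta > 0).
```

Under that assumption, F is the cdf of a logistic distribution with mean μ and standard deviation τπ/√3.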

When both models fit well, parameter estimates in probit models have smaller magnitude than those in logistic regression models. This is because their link functions transform probabilities to scores from standard versions of the normal and logistic distribution, but those two distributions have different spread. The standard normal distribution has a mean of 0 and standard deviation of 1. The standard logistic distribution has a mean of 0 and standard deviation of 1.8. When both models fit well, parameter estimates in logistic regression models are approximately 1.8 times those in probit models.
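A small numerical sketch can illustrate where the factor 1.8 comes from. The standard logistic distribution has standard deviation π/√3, and rescaling by that factor brings the logistic cdf close to the standard normal cdf. The grid limits and the helper names `logistic_cdf` and `max_gap` are arbitrary choices made for this illustration.

```python
import math
from statistics import NormalDist


def logistic_cdf(z: float) -> float:
    """cdf of the standard logistic distribution (location 0, scale 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Standard deviation of the standard logistic distribution
LOGISTIC_SD = math.pi / math.sqrt(3)


def max_gap(scale: float, lo: float = -4.0, hi: float = 4.0,
            steps: int = 801) -> float:
    """Largest |Phi(z) - logistic_cdf(scale * z)| over an evenly spaced grid."""
    phi = NormalDist().cdf
    grid = (lo + (hi - lo) * i / (steps - 1) for i in range(steps))
    return max(abs(phi(z) - logistic_cdf(scale * z)) for z in grid)


print(f"sd of standard logistic: {LOGISTIC_SD:.3f}")
print(f"max gap, logistic score = sd * z: {max_gap(LOGISTIC_SD):.3f}")
print(f"max gap, no rescaling:           {max_gap(1.0):.3f}")
```

The gap shrinks markedly once the scores are rescaled, which is why a logistic fit needs coefficients roughly 1.8 times as large as a probit fit to describe nearly the same curve.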

The probit model and the cdf model form (3.4) were introduced in the mid-1930s in toxicology studies. A typical experiment exposes animals (typically insects or mice) to various dosages of some potentially toxic substance. For each subject, the response is whether it dies. It is natural to assume a tolerance distribution for subjects’ responses. For example, each insect may have a certain tolerance to an insecticide, such that it dies if the dosage level exceeds its tolerance and survives if the dosage level is less than its tolerance. Tolerances would vary among insects. If a cdf F describes the distribution of tolerances, then the model for the probability π(x) of death at dosage level x has form (3.4). If the tolerances vary among insects according to a normal distribution, then π(x) has the shape of a normal cdf. With sample data, model fitting determines which normal cdf best applies.
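The tolerance argument can be illustrated with a small simulation. This is only a sketch: the tolerance mean and standard deviation, the dose grid, the sample size, and the function name `simulate_death_rate` are all hypothetical values chosen for illustration, not data from any experiment.

```python
import random
from statistics import NormalDist


def simulate_death_rate(dose: float, mu: float, sigma: float,
                        n: int = 20_000, seed: int = 0) -> float:
    """Share of n simulated insects whose tolerance is below the dose."""
    rng = random.Random(seed)
    # An insect dies when the dose exceeds its normally distributed tolerance
    deaths = sum(rng.gauss(mu, sigma) < dose for _ in range(n))
    return deaths / n


# Hypothetical tolerance distribution (values chosen only for illustration)
MU, SIGMA = 10.0, 2.0
tolerance_cdf = NormalDist(MU, SIGMA).cdf

for dose in (6.0, 8.0, 10.0, 12.0, 14.0):
    simulated = simulate_death_rate(dose, MU, SIGMA)
    print(f"dose {dose:4.1f}: simulated {simulated:.3f}, "
          f"F(dose) {tolerance_cdf(dose):.3f}")
```

The simulated death rates should track F(dose) closely, which is the sense in which π(x) = F(x) when F is the cdf of the tolerances.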

Logistic regression was not developed until the mid-1940s and not used much until the 1970s, but it is now more popular than the probit model. We will see in the next chapter that the logistic model parameters relate to odds ratios. Thus, one can fit the model to data from case–control studies, because one can estimate odds ratios for such data (Section 2.3.5).
