15.01.2013 Views

an introduction to generalized linear models - GDM@FUDAN ...

an introduction to generalized linear models - GDM@FUDAN ...

an introduction to generalized linear models - GDM@FUDAN ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

The next three chapters provide the theoretical background. Chapter 3 is<br />

about the exponential family of distributions, which includes the Normal,<br />

Poisson <strong>an</strong>d binomial distributions. It also covers <strong>generalized</strong> <strong>linear</strong> <strong>models</strong><br />

(as defined by Nelder <strong>an</strong>d Wedderburn, 1972). Linear regression <strong>an</strong>d m<strong>an</strong>y<br />

other <strong>models</strong> are special cases of<strong>generalized</strong> <strong>linear</strong> <strong>models</strong>. In Chapter 4<br />

methods ofestimation <strong>an</strong>d model fitting are described.<br />

Chapter 5 outlines methods of statistical inference for <strong>generalized</strong> <strong>linear</strong><br />

<strong>models</strong>. Most ofthese are based on how well a model describes the set ofdata.<br />

For example, hypothesis testing is carried out by first specifying alternative<br />

<strong>models</strong> (one corresponding <strong>to</strong> the null hypothesis <strong>an</strong>d the other <strong>to</strong> a more<br />

general hypothesis). Then test statistics are calculated which measure the<br />

‘goodness offit’ ofeach model <strong>an</strong>d these are compared. Typically the model<br />

corresponding <strong>to</strong> the null hypothesis is simpler, so ifit fits the data about<br />

as well as a more complex model it is usually preferred on the grounds of<br />

parsimony (i.e., we retain the null hypothesis).<br />

Chapter 6 is about multiple <strong>linear</strong> regression <strong>an</strong>d <strong>an</strong>alysis of vari<strong>an</strong>ce<br />

(ANOVA). Regression is the st<strong>an</strong>dard method for relating a continuous<br />

response variable <strong>to</strong> several continuous expl<strong>an</strong>a<strong>to</strong>ry (or predic<strong>to</strong>r) variables.<br />

ANOVA is used for a continuous response variable <strong>an</strong>d categorical or qualitative<br />

expl<strong>an</strong>a<strong>to</strong>ry variables (fac<strong>to</strong>rs). Analysis of covari<strong>an</strong>ce (ANCOVA) is<br />

used when at least one ofthe expl<strong>an</strong>a<strong>to</strong>ry variables is continuous. Nowadays<br />

it is common <strong>to</strong> use the same computational <strong>to</strong>ols for all such situations. The<br />

terms multiple regression or general <strong>linear</strong> model are used <strong>to</strong> cover the<br />

r<strong>an</strong>ge ofmethods for <strong>an</strong>alyzing one continuous response variable <strong>an</strong>d multiple<br />

expl<strong>an</strong>a<strong>to</strong>ry variables.<br />

Chapter 7 is about methods for <strong>an</strong>alyzing binary response data. The most<br />

common one is logistic regression which is used <strong>to</strong> model relationships<br />

between the response variable <strong>an</strong>d several expl<strong>an</strong>a<strong>to</strong>ry variables which may<br />

be categorical or continuous. Methods for relating the response <strong>to</strong> a single<br />

continuous variable, the dose, are also considered; these include probit <strong>an</strong>alysis<br />

which was originally developed for <strong>an</strong>alyzing dose-response data from<br />

bioassays. Logistic regression has been <strong>generalized</strong> in recent years <strong>to</strong> include<br />

responses with more th<strong>an</strong> two nominal categories (nominal, multinomial,<br />

poly<strong>to</strong>mous or polycho<strong>to</strong>mous logistic regression) or ordinal categories<br />

(ordinal logistic regression). These new methods are discussed in Chapter<br />

8.<br />

Chapter 9 concerns count data. The counts may be frequencies displayed<br />

in a contingency table or numbers ofevents, such as traffic accidents, which<br />

need <strong>to</strong> be <strong>an</strong>alyzed in relation <strong>to</strong> some ‘exposure’ variable such as the number<br />

ofmo<strong>to</strong>r vehicles registered or the dist<strong>an</strong>ces travelled by the drivers. Modelling<br />

methods are based on assuming that the distribution ofcounts c<strong>an</strong> be<br />

described by the Poisson distribution, at least approximately. These methods<br />

include Poisson regression <strong>an</strong>d log-<strong>linear</strong> <strong>models</strong>.<br />

Survival <strong>an</strong>alysis is the usual term for methods of <strong>an</strong>alyzing failure time<br />

data. The parametric methods described in Chapter 10 fit in<strong>to</strong> the framework<br />

© 2002 by Chapm<strong>an</strong> & Hall/CRC

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!