
an introduction to generalized linear models - GDM@FUDAN ...


3 Exponential Family and Generalized Linear Models

3.1 Introduction

Linear models of the form

$$E(Y_i) = \mu_i = \mathbf{x}_i^T \boldsymbol{\beta}; \qquad Y_i \sim N(\mu_i, \sigma^2) \qquad (3.1)$$

where the random variables $Y_i$ are independent, are the basis of most analyses of continuous data. The transposed vector $\mathbf{x}_i^T$ represents the ith row of the design matrix $\mathbf{X}$. The example about the relationship between birthweight and gestational age is of this form (see Section 2.2.2). So is the exercise on plant growth, where $Y_i$ is the dry weight of plants and $\mathbf{X}$ has elements to identify the treatment and control groups (Exercise 2.1). Generalizations of these examples to the relationship between a continuous response and several explanatory variables (multiple regression) and comparisons of more than two means (analysis of variance) are also of this form.
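The role of the design matrix in (3.1) can be sketched numerically. The following is a minimal illustration, not taken from the book: it assumes NumPy, simulated data, and made-up parameter values (`beta_true`, `sigma`, and the group effects are all hypothetical). It builds a design matrix for a straight-line regression and another for a treatment/control comparison, then estimates the parameters by ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# --- Straight-line regression: row i of X is x_i^T = (1, x_i) ---
n = 200
x = rng.uniform(0.0, 1.0, size=n)        # hypothetical explanatory variable
X = np.column_stack([np.ones(n), x])     # design matrix
beta_true = np.array([1.0, 2.0])         # hypothetical parameters
sigma = 0.5                              # hypothetical standard deviation
mu = X @ beta_true                       # mu_i = x_i^T beta
y = rng.normal(mu, sigma)                # independent Y_i ~ N(mu_i, sigma^2)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimate

# --- Two groups: the second column of X flags the treatment group ---
group = np.repeat([0.0, 1.0], n // 2)    # 0 = control, 1 = treatment
X_groups = np.column_stack([np.ones(n), group])
y_groups = rng.normal(X_groups @ np.array([4.0, 0.5]), sigma)
b_groups, *_ = np.linalg.lstsq(X_groups, y_groups, rcond=None)

print("regression estimate:", beta_hat)
print("control mean, treatment minus control:", b_groups)
```

With an intercept column and a 0/1 indicator, the least-squares solution for the two-group design is the control-group sample mean followed by the difference between the two group means, which is why both the regression and the comparison of means fit the same form.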

Advances in statistical theory and computer software allow us to use methods analogous to those developed for linear models in the following more general situations:

1. Response variables have distributions other than the Normal distribution; they may even be categorical rather than continuous.
2. The relationship between the response and explanatory variables need not be of the simple linear form in (3.1).
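Both situations can occur together. As a hedged illustration (the Poisson distribution, the exponential mean, and the values in `beta` are assumptions chosen only for this sketch, not an example from the text), the code below generates counts whose mean is a non-linear function of a linear combination of the explanatory variable. The mean does not change by a constant amount per step in `x`, but its logarithm does.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the sketch is reproducible

x = np.linspace(0.0, 2.0, 9)                # equally spaced explanatory values
X = np.column_stack([np.ones_like(x), x])   # design matrix with rows (1, x_i)
beta = np.array([0.5, 1.0])                 # hypothetical parameters

eta = X @ beta           # linear combination of the explanatory variable
mu = np.exp(eta)         # mean response: a non-linear function of eta
y = rng.poisson(mu)      # counts: non-negative integers, not Normal

steps_mu = np.diff(mu)               # change in the mean per step in x
steps_log_mu = np.diff(np.log(mu))   # change in log(mean) per step in x

print("counts:", y)
print("steps in mean:", np.round(steps_mu, 3))
print("steps in log mean:", np.round(steps_log_mu, 3))
```

The constant steps on the log scale show that a linear structure can still be present after a suitable transformation of the mean, even though the simple form in (3.1) does not apply directly.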

One of these advances has been the recognition that many of the ‘nice’ properties of the Normal distribution are shared by a wider class of distributions called the exponential family of distributions. These distributions and their properties are discussed in the next section.

A second advance is the extension of the numerical methods to estimate the parameters $\boldsymbol{\beta}$ from the linear model described in (3.1) to the situation where there is some non-linear function relating $E(Y_i) = \mu_i$ to the linear component $\mathbf{x}_i^T \boldsymbol{\beta}$, that is

$$g(\mu_i) = \mathbf{x}_i^T \boldsymbol{\beta}$$

(see Section 2.4). The function $g$ is called the link function. In the initial formulation of generalized linear models by Nelder and Wedderburn (1972) and in most of the examples considered in this book, $g$ is a simple mathematical function. These models have now been further generalized to situations where functions may be estimated numerically; such models are called generalized additive models (see Hastie and Tibshirani, 1990). In theory, the estimation is straightforward. In practice, it may require a considerable amount of com-

© 2002 by Chapman & Hall/CRC
