an introduction to generalized linear models - GDM@FUDAN ...
3 Exponential Family and Generalized Linear Models
3.1 Introduction
Linear models of the form

$$E(Y_i) = \mu_i = \mathbf{x}_i^T\boldsymbol{\beta}; \qquad Y_i \sim N(\mu_i, \sigma^2), \tag{3.1}$$

where the random variables $Y_i$ are independent, are the basis of most analyses of continuous data. The transposed vector $\mathbf{x}_i^T$ represents the $i$th row of the design matrix $\mathbf{X}$. The example about the relationship between birthweight and gestational age is of this form; see Section 2.2.2. So is the exercise on plant growth, where $Y_i$ is the dry weight of plants and $\mathbf{X}$ has elements to identify the treatment and control groups (Exercise 2.1). Generalizations of these examples to the relationship between a continuous response and several explanatory variables (multiple regression) and to comparisons of more than two means (analysis of variance) are also of this form.
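To make (3.1) concrete, here is a minimal sketch, assuming plain Python with no third-party libraries, that builds a design matrix whose rows $\mathbf{x}_i^T = [1, z_i]$ contain an intercept and a 0/1 indicator $z_i$ for treatment versus control (the kind of coding used in the plant growth exercise) and estimates $\boldsymbol{\beta}$ by least squares through the normal equations $\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} = \mathbf{X}^T\mathbf{y}$. The data values and the helper names `solve` and `fit_linear_model` are hypothetical, chosen only for illustration; they are not the exercise data.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear_model(X: Matrix, y: Vector) -> Tuple[Vector, float]:
    """Least-squares estimates for E(Y_i) = x_i^T beta with Normal errors, as in (3.1)."""
    n, p = len(X), len(X[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)] for j in range(p)]
    xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    rss = sum((yi - mi) ** 2 for yi, mi in zip(y, fitted))
    sigma2 = rss / (n - p)  # residual sum of squares / (n - p) estimates sigma^2
    return beta, sigma2


# Hypothetical two-group data: first three control, last three treatment.
group = [0, 0, 0, 1, 1, 1]
y = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
# Each row x_i^T = [1, z_i], where z_i = 1 identifies the treatment group.
X = [[1.0, float(z)] for z in group]

beta, sigma2 = fit_linear_model(X, y)
print(beta, sigma2)  # prints: [5.0, 3.0] 1.0
```

With this coding, the first estimate equals the control-group mean (5.0) and the second equals the difference between the treatment and control means (3.0).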
Advances in statistical theory and computer software allow us to use methods analogous to those developed for linear models in the following more general situations:
1. Response variables have distributions other than the Normal distribution; they may even be categorical rather than continuous.
2. The relationship between the response and explanatory variables need not be of the simple linear form in (3.1).
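As a rough sketch of why item 2 matters for the responses in item 1: for counts the mean must be positive, and for a binary response it is a probability in (0, 1), yet the linear component $\mathbf{x}_i^T\boldsymbol{\beta}$ can take any real value. A non-linear function relating the mean to $\mathbf{x}_i^T\boldsymbol{\beta}$ (the link function introduced later in this section) resolves this. The log and logit choices below are common examples, used here as assumptions for illustration; the coefficients, rows and function names are hypothetical.

```python
import math


def linear_predictor(x, beta):
    """eta_i = x_i^T beta, the linear component."""
    return sum(xj * bj for xj, bj in zip(x, beta))


# Inverse link functions mapping eta to a mean mu (names are illustrative).
def identity_mean(eta):
    # Model (3.1): mu = eta, so mu can be any real number.
    return eta


def log_link_mean(eta):
    # g(mu) = log(mu), so mu = exp(eta) > 0, suitable for counts.
    return math.exp(eta)


def logit_link_mean(eta):
    # g(mu) = log(mu / (1 - mu)), so mu lies in (0, 1), suitable for proportions.
    return 1.0 / (1.0 + math.exp(-eta))


beta = [0.5, -1.0]  # hypothetical coefficients
rows = [[1.0, 0.0], [1.0, 2.0], [1.0, 4.0]]  # hypothetical rows x_i^T

for x in rows:
    eta = linear_predictor(x, beta)
    print(eta, identity_mean(eta), log_link_mean(eta), logit_link_mean(eta))
```

Only the identity version produces negative means here (for the second and third rows), which is impossible for a count or a probability.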
One of these advances has been the recognition that many of the ‘nice’ properties of the Normal distribution are shared by a wider class of distributions called the exponential family of distributions. These distributions and their properties are discussed in the next section.
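As a brief preview, assume the single-parameter form $f(y;\theta) = \exp[a(y)b(\theta) + c(\theta) + d(y)]$, one common way of writing the exponential family (the precise definition is the one given in the next section). With $\sigma^2$ treated as known, the Normal density can be rearranged into this form:

$$
\begin{aligned}
f(y;\mu) &= \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left[-\frac{(y-\mu)^2}{2\sigma^2}\right] \\
&= \exp\!\left[\frac{y\mu}{\sigma^2} - \frac{\mu^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2) - \frac{y^2}{2\sigma^2}\right],
\end{aligned}
$$

so that $a(y) = y$, $b(\mu) = \mu/\sigma^2$, $c(\mu) = -\mu^2/(2\sigma^2) - \tfrac{1}{2}\log(2\pi\sigma^2)$ and $d(y) = -y^2/(2\sigma^2)$.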
A second advance is the extension of the numerical methods used to estimate the parameters $\boldsymbol{\beta}$ of the linear model (3.1) to the situation where some non-linear function relates $E(Y_i) = \mu_i$ to the linear component $\mathbf{x}_i^T\boldsymbol{\beta}$, that is

$$g(\mu_i) = \mathbf{x}_i^T\boldsymbol{\beta}$$
(see Section 2.4). The function $g$ is called the link function. In the initial formulation of generalized linear models by Nelder and Wedderburn (1972), and in most of the examples considered in this book, $g$ is a simple mathematical function. These models have now been further generalized to situations where functions may be estimated numerically; such models are called generalized additive models (see Hastie and Tibshirani, 1990). In theory, the estimation is straightforward. In practice, it may require a considerable amount of com-
© 2002 by Chapman & Hall/CRC