15.01.2013 Views

an introduction to generalized linear models - GDM@FUDAN ...



2.3.4 Residuals and model checking

Firstly, consider residuals for a model involving the Normal distribution. Suppose that the response variable $Y_i$ is modelled by

$$E(Y_i) = \mu_i; \quad Y_i \sim N(\mu_i, \sigma^2).$$

The fitted values are the estimates $\hat{\mu}_i$. Residuals can be defined as $y_i - \hat{\mu}_i$ and the approximate standardized residuals as

$$r_i = (y_i - \hat{\mu}_i)/\hat{\sigma},$$

where $\hat{\sigma}$ is an estimate of the unknown parameter $\sigma$. These standardized residuals are slightly correlated because they all depend on the estimates $\hat{\mu}_i$ and $\hat{\sigma}$ that were calculated from the observations. Also they are not exactly Normally distributed because $\sigma$ has been estimated by $\hat{\sigma}$. Nevertheless, they are approximately Normally distributed and the adequacy of the approximation can be checked using appropriate graphical methods (see below).
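As a small illustration of the definition $r_i = (y_i - \hat{\mu}_i)/\hat{\sigma}$, the following sketch computes approximate standardized residuals in plain Python. The function name, the data and the choice of estimator for $\hat{\sigma}$ (residual sum of squares divided by $n - p$, with $p$ the number of fitted parameters) are assumptions made for this example; the text itself does not specify how $\hat{\sigma}$ is obtained.

```python
import math

def standardized_residuals(y, mu_hat, n_params):
    """Return (r, sigma_hat) with r_i = (y_i - mu_hat_i) / sigma_hat.

    Assumption: sigma_hat^2 is the residual sum of squares divided by
    (n - n_params). This is one common choice, not the only one.
    """
    n = len(y)
    raw = [yi - mi for yi, mi in zip(y, mu_hat)]
    sigma_hat = math.sqrt(sum(e * e for e in raw) / (n - n_params))
    return [e / sigma_hat for e in raw], sigma_hat

# Hypothetical observations, fitted by a mean-only model (one parameter).
y = [2.0, 4.0, 6.0, 8.0]
y_bar = sum(y) / len(y)
mu_hat = [y_bar] * len(y)

r, sigma_hat = standardized_residuals(y, mu_hat, n_params=1)
print(sigma_hat)
print(r)
```

Because every $r_i$ shares the same $\hat{\mu}_i$ and $\hat{\sigma}$, the values are not independent; in this mean-only example they sum to zero by construction.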

The parameters $\mu_i$ are functions of the explanatory variables. If the model is a good description of the relationship between the response and the explanatory variables, this should be well 'captured' or 'explained' by the $\hat{\mu}_i$'s. Therefore there should be little remaining information in the residuals $y_i - \hat{\mu}_i$. This too can be checked graphically (see below). Additionally, the sum of squared residuals $\sum (y_i - \hat{\mu}_i)^2$ provides an overall statistic for assessing the adequacy of the model; in fact, it is the component of the log-likelihood function or least squares expression which is optimized in the estimation process.
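To make the link between the sum of squared residuals and the log-likelihood explicit, the standard Normal log-likelihood can be written out. This is a sketch assuming $n$ independent observations with common variance $\sigma^2$; the symbols $n$ and $\ell$ are notation introduced here rather than taken from the text.

```latex
\ell(\mu_1, \dots, \mu_n, \sigma^2; y_1, \dots, y_n)
  = -\frac{n}{2} \log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu_i)^2
```

For fixed $\sigma^2$, only the last term depends on the $\mu_i$, so maximizing $\ell$ over the $\mu_i$ is equivalent to minimizing $\sum (y_i - \mu_i)^2$, which is the least squares criterion.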

Secondly, consider residuals from a Poisson model. Recall the model for chronic medical conditions

$$E(Y_i) = \theta_i; \quad Y_i \sim \text{Poisson}(\theta_i).$$

In this case approximate standardized residuals are of the form

$$r_i = \frac{y_i - \hat{\theta}_i}{\sqrt{\hat{\theta}_i}}.$$

These can be regarded as signed square roots of contributions to the Pearson goodness-of-fit statistic

$$\sum_i \frac{(o_i - e_i)^2}{e_i},$$

where $o_i$ is the observed value $y_i$ and $e_i$ is the fitted value $\hat{\theta}_i$ 'expected' from the model.
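The relationship between these residuals and the Pearson statistic can be checked numerically. The sketch below uses hypothetical counts and hypothetical fitted values (they are not the chronic medical conditions data), and the function name is chosen for this example only.

```python
import math

def pearson_residuals(observed, expected):
    """r_i = (o_i - e_i) / sqrt(e_i), the signed square root of each
    contribution to the Pearson goodness-of-fit statistic."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]

# Hypothetical observed counts and fitted Poisson means.
observed = [2, 3, 6, 7]
expected = [4.0, 4.0, 5.0, 5.0]

residuals = pearson_residuals(observed, expected)

# Summing the squared residuals recovers the Pearson statistic.
x2_from_residuals = sum(r * r for r in residuals)
x2_direct = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(residuals)
print(x2_from_residuals, x2_direct)
```

The sign of each residual shows whether the observation lies above or below its fitted value, which the squared contributions alone do not reveal.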

For other distributions a variety of definitions of standardized residuals are used. Some of these are transformations of the terms $(y_i - \hat{\mu}_i)$ designed to improve their Normality or independence (for example, see Chapter 9 of Neter et al., 1996). Others are based on signed square roots of contributions to statistics, such as the log-likelihood function or the sum of squares, which are used as overall measures of the adequacy of the model (for example, see

© 2002 by Chapman & Hall/CRC

