an introduction to generalized linear models - GDM@FUDAN ...
2.3.4 Residuals and model checking
Firstly, consider residuals for a model involving the Normal distribution. Suppose
that the response variable $Y_i$ is modelled by
$$E(Y_i) = \mu_i; \quad Y_i \sim N(\mu_i, \sigma^2).$$
The fitted values are the estimates $\hat{\mu}_i$. Residuals can be defined as $y_i - \hat{\mu}_i$ and
the approximate standardized residuals as
$$r_i = (y_i - \hat{\mu}_i)/\hat{\sigma},$$
where $\hat{\sigma}$ is an estimate of the unknown parameter $\sigma$. These standardized residuals
are slightly correlated because they all depend on the estimates $\hat{\mu}_i$ and
$\hat{\sigma}$ that were calculated from the observations. Also they are not exactly Normally
distributed because $\sigma$ has been estimated by $\hat{\sigma}$. Nevertheless, they are
approximately Normally distributed and the adequacy of the approximation
can be checked using appropriate graphical methods (see below).
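As a rough illustration of the standardized residuals defined above, the following Python sketch fits a straight-line Normal model by least squares and computes $r_i = (y_i - \hat{\mu}_i)/\hat{\sigma}$. The data, the function name `standardized_residuals` and the choice of divisor $n - p$ for $\hat{\sigma}^2$ are assumptions made for illustration only; they do not come from the text.

```python
import numpy as np

def standardized_residuals(X, y):
    """Fit a Normal linear model by least squares and return
    (fitted values, sigma_hat, approximate standardized residuals)."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares estimates
    mu_hat = X @ beta_hat                             # fitted values
    resid = y - mu_hat                                # raw residuals y_i - mu_hat_i
    n, p = X.shape
    # Assumption: sigma^2 is estimated by RSS / (n - p)
    sigma_hat = np.sqrt(np.sum(resid ** 2) / (n - p))
    return mu_hat, sigma_hat, resid / sigma_hat

# Hypothetical data, roughly linear in x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 6.2])
X = np.column_stack([np.ones_like(x), x])  # intercept and slope columns

mu_hat, sigma_hat, r = standardized_residuals(X, y)
print(np.round(r, 2))
```

Because every $r_i$ is built from the same $\hat{\mu}_i$ and $\hat{\sigma}$, the residuals satisfy linear constraints (with an intercept in the model they sum to zero), which is one way to see why they are slightly correlated.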
The parameters $\mu_i$ are functions of the explanatory variables. If the model
is a good description of the relationship between the response and the explanatory
variables, this should be well ‘captured’ or ‘explained’ by the $\hat{\mu}_i$’s.
Therefore there should be little remaining information in the residuals $y_i - \hat{\mu}_i$.
This too can be checked graphically (see below). Additionally, the sum of
squared residuals $\sum (y_i - \hat{\mu}_i)^2$ provides an overall statistic for assessing the
adequacy of the model; in fact, it is the component of the log-likelihood function
or least squares expression which is optimized in the estimation process.
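To make the link between the sum of squared residuals and the log-likelihood explicit, the log-likelihood for the Normal model can be written out as below. This is a sketch that assumes $n$ independent observations and treats $\sigma^2$ as fixed; the symbols $l$ and $n$ are introduced here for illustration.

```latex
l(\mu_1, \dots, \mu_n; \sigma^2)
  = -\frac{n}{2} \log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu_i)^2
```

Only the last term involves the $\mu_i$, so maximizing $l$ over the parameters that determine the $\mu_i$ is equivalent to minimizing $\sum (y_i - \mu_i)^2$, the least squares criterion.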
Secondly, consider residuals from a Poisson model. Recall the model for
chronic medical conditions
$$E(Y_i) = \theta_i; \quad Y_i \sim \text{Poisson}(\theta_i).$$
In this case approximate standardized residuals are of the form
$$r_i = \frac{y_i - \hat{\theta}_i}{\sqrt{\hat{\theta}_i}}.$$
These can be regarded as signed square roots of contributions to the Pearson
goodness-of-fit statistic
$$\sum_i \frac{(o_i - e_i)^2}{e_i},$$
where $o_i$ is the observed value $y_i$ and $e_i$ is the fitted value $\hat{\theta}_i$ ‘expected’ from
the model.
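The relationship between the Poisson residuals and the Pearson statistic can be checked numerically. The sketch below uses made-up counts for two groups and takes the fitted value for each observation to be its group mean; the counts, the grouping and the function name `pearson_residuals` are hypothetical and are not the chronic medical conditions data.

```python
import numpy as np

def pearson_residuals(y, theta_hat):
    """Approximate standardized residuals (y_i - theta_hat_i) / sqrt(theta_hat_i)."""
    y = np.asarray(y, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    return (y - theta_hat) / np.sqrt(theta_hat)

# Hypothetical counts for two groups (labelled 0 and 1)
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.array([0, 1, 1, 2, 1, 3, 2, 2])

# Assumption: one Poisson mean per group, estimated by the group sample mean
group_means = np.array([y[group == g].mean() for g in (0, 1)])
theta_hat = group_means[group]

r = pearson_residuals(y, theta_hat)

# Pearson goodness-of-fit statistic: sum of (o_i - e_i)^2 / e_i
pearson_stat = np.sum((y - theta_hat) ** 2 / theta_hat)

print(np.round(r, 3))
print(pearson_stat, np.sum(r ** 2))
```

Squaring and summing the residuals reproduces the Pearson statistic, while each residual keeps the sign of $o_i - e_i$, which is what ‘signed square root of a contribution’ means here.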
For other distributions a variety of definitions of standardized residuals
are used. Some of these are transformations of the terms $(y_i - \hat{\mu}_i)$ designed
to improve their Normality or independence (for example, see Chapter 9 of
Neter et al., 1996). Others are based on signed square roots of contributions
to statistics, such as the log-likelihood function or the sum of squares, which
are used as overall measures of the adequacy of the model (for example, see
© 2002 by Chapman & Hall/CRC