an introduction to generalized linear models - GDM@FUDAN ...

2.3.4 Residuals and model checking

Firstly, consider residuals for a model involving the Normal distribution. Suppose that the response variable $Y_i$ is modelled by

$$E(Y_i) = \mu_i; \qquad Y_i \sim N(\mu_i, \sigma^2).$$

The fitted values are the estimates $\hat{\mu}_i$. Residuals can be defined as $y_i - \hat{\mu}_i$ and the approximate standardized residuals as

$$r_i = (y_i - \hat{\mu}_i)/\hat{\sigma},$$

where $\hat{\sigma}$ is an estimate of the unknown parameter $\sigma$. These standardized residuals are slightly correlated because they all depend on the estimates $\hat{\mu}_i$ and $\hat{\sigma}$ that were calculated from the observations. Also they are not exactly Normally distributed because $\sigma$ has been estimated by $\hat{\sigma}$. Nevertheless, they are approximately Normally distributed and the adequacy of the approximation can be checked using appropriate graphical methods (see below).

The parameters $\mu_i$ are functions of the explanatory variables. If the model is a good description of the relationship between the response and the explanatory variables, this should be well 'captured' or 'explained' by the $\hat{\mu}_i$'s. Therefore there should be little remaining information in the residuals $y_i - \hat{\mu}_i$. This too can be checked graphically (see below). Additionally, the sum of squared residuals $\sum_i (y_i - \hat{\mu}_i)^2$ provides an overall statistic for assessing the adequacy of the model; in fact, it is the component of the log-likelihood function or least squares expression which is optimized in the estimation process.

Secondly, consider residuals from a Poisson model. Recall the model for chronic medical conditions

$$E(Y_i) = \theta_i; \qquad Y_i \sim \text{Poisson}(\theta_i).$$

In this case approximate standardized residuals are of the form

$$r_i = \frac{y_i - \hat{\theta}_i}{\sqrt{\hat{\theta}_i}}.$$

These can be regarded as signed square roots of contributions to the Pearson goodness-of-fit statistic

$$\sum_i \frac{(o_i - e_i)^2}{e_i},$$

where $o_i$ is the observed value $y_i$ and $e_i$ is the fitted value $\hat{\theta}_i$ 'expected' from the model.

For other distributions a variety of definitions of standardized residuals are used.
Some of these are transformations of the terms $(y_i - \hat{\mu}_i)$ designed to improve their Normality or independence (for example, see Chapter 9 of Neter et al., 1996). Others are based on signed square roots of contributions to statistics, such as the log-likelihood function or the sum of squares, which are used as overall measures of the adequacy of the model (for example, see Cox and Snell, 1968; Pregibon, 1981; and Pierce and Schafer, 1986). Many of these residuals are discussed in more detail in McCullagh and Nelder (1989) or Krzanowski (1998).

Residuals are important tools for checking the assumptions made in formulating a model. This is because they should usually be independent and have a distribution which is approximately Normal with a mean of zero and constant variance. They should also be unrelated to the explanatory variables. Therefore, the standardized residuals can be compared to the Normal distribution to assess the adequacy of the distributional assumptions and to identify any unusual values. This can be done by inspecting their frequency distribution and looking for values beyond the likely range; for example, no more than 5% should be less than −1.96 or greater than +1.96 and no more than 1% should be beyond ±2.58.

A more sensitive method for assessing Normality, however, is to use a Normal probability plot. This involves plotting the residuals against the values they would be expected to take, given their rank order, if they were Normally distributed. These values are called the Normal order statistics and they depend on the number of observations. Normal probability plots are available in all good statistical software (and analogous probability plots for other distributions are also commonly available). In the plot the points should lie on or near a straight line representing Normality; systematic deviations or outlying observations indicate a departure from this distribution.

The standardized residuals should also be plotted against each of the explanatory variables that are included in the model. If the model adequately describes the effect of the variable, there should be no apparent pattern in the plot. If it is inadequate, the points may display curvature or some other systematic pattern which would suggest that additional or alternative terms may need to be included in the model.
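
The residual definitions and Normality checks described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names are hypothetical, the data are made up, $\hat{\sigma}$ is assumed to be estimated from the residual sum of squares divided by $N$ minus the number of fitted parameters, and the Normal order statistics are approximated by Blom's formula, which is an assumption of the sketch and not taken from the text.

```python
import math
from statistics import NormalDist


def normal_standardized_residuals(y, mu_hat, n_params):
    """r_i = (y_i - mu_hat_i) / sigma_hat for a Normal model.

    Assumption of this sketch: sigma_hat^2 is the residual sum of squares
    divided by (N - n_params).
    """
    raw = [yi - mi for yi, mi in zip(y, mu_hat)]
    sigma_hat = math.sqrt(sum(e * e for e in raw) / (len(y) - n_params))
    return [e / sigma_hat for e in raw]


def poisson_standardized_residuals(y, theta_hat):
    """r_i = (y_i - theta_hat_i) / sqrt(theta_hat_i) for a Poisson model."""
    return [(yi - ti) / math.sqrt(ti) for yi, ti in zip(y, theta_hat)]


def tail_fractions(residuals):
    """Fractions of residuals beyond +/-1.96 and beyond +/-2.58."""
    n = len(residuals)
    beyond_196 = sum(abs(r) > 1.96 for r in residuals) / n
    beyond_258 = sum(abs(r) > 2.58 for r in residuals) / n
    return beyond_196, beyond_258


def normal_plot_points(residuals):
    """Pairs (approximate Normal order statistic, sorted residual).

    Assumption of this sketch: the k-th expected order statistic is
    approximated by the standard Normal quantile at (k - 0.375) / (N + 0.25).
    """
    n = len(residuals)
    std_normal = NormalDist()
    scores = [std_normal.inv_cdf((k - 0.375) / (n + 0.25))
              for k in range(1, n + 1)]
    return list(zip(scores, sorted(residuals)))


# Made-up counts and fitted means, for illustration only.
y = [1, 4, 9]
theta_hat = [4.0, 4.0, 4.0]
r = poisson_standardized_residuals(y, theta_hat)

# The squared residuals add up to the Pearson statistic:
# (1-4)^2/4 + (4-4)^2/4 + (9-4)^2/4 = 2.25 + 0 + 6.25 = 8.5
pearson = sum(ri ** 2 for ri in r)

# Points for a Normal probability plot; they should lie near a straight line
# if the residuals are approximately Normal.
points = normal_plot_points(r)
```

Plotting the pairs returned by `normal_plot_points` with any graphics package gives the Normal probability plot; with so few observations the tail fractions are of little use, and in practice they would be computed on the full set of residuals.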
The residuals should also be plotted against other potential explanatory variables that are not in the model. If there is any systematic pattern, this suggests that additional variables should be included. Several different residual plots for detecting non-linearity in generalized linear models have been compared by Cai and Tsai (1999).

In addition, the standardized residuals should be plotted against the fitted values $\hat{y}_i$, especially to detect changes in variance. For example, an increase in the spread of the residuals towards the end of the range of fitted values would indicate a departure from the assumption of constant variance (sometimes termed homoscedasticity).

Finally, a sequence plot of the residuals should be made using the order in which the values $y_i$ were measured. The order might be temporal, spatial or any other sequence that could cause a lack of independence among the observations. If the residuals are independent the points should fluctuate randomly without any systematic pattern, such as alternating up and down or steadily increasing or decreasing. If there is evidence of associations among the residuals, this can be checked by calculating serial correlation coefficients among them. If the residuals are correlated, special modelling methods are needed; these are outlined in Chapter 11.

© 2002 by Chapman & Hall/CRC
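
The checks on constant variance and independence described above are primarily graphical, but simple numerical companions can be sketched as follows. Both function names are hypothetical. The spread ratio is only a crude summary assumed here for illustration (it is not a formal test and does not come from the text), and the serial correlation is the usual lag-$k$ sample autocorrelation of the residuals taken in measurement order.

```python
from statistics import pstdev


def spread_ratio(residuals, fitted):
    """Residual spread for the upper half of fitted values over the lower half.

    A ratio well above 1 suggests the spread grows with the fitted values.
    Crude summary assumed for illustration; with an odd number of
    observations the middle one is left out.
    """
    ordered = [r for _, r in sorted(zip(fitted, residuals))]
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[len(ordered) - half:]
    return pstdev(upper) / pstdev(lower)


def serial_correlation(residuals, lag=1):
    """Lag-`lag` sample autocorrelation of residuals in measurement order."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    numer = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    denom = sum(d * d for d in dev)
    return numer / denom


# Made-up residual sequences, for illustration only.
alternating = [1, -1, 1, -1]     # up-and-down pattern
increasing = [1, 2, 3, 4, 5]     # steady drift

neg = serial_correlation(alternating)   # -3 / 4 = -0.75
pos = serial_correlation(increasing)    # 4 / 10 = 0.4
```

A strongly negative coefficient corresponds to the alternating pattern in a sequence plot and a positive one to steady drift; values near zero are consistent with independent residuals.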