an introduction to generalized linear models - GDM@FUDAN ...


2.3.5 Inference and interpretation

It is sometimes useful to think of scientific data as measurements composed of a message, or signal, that is distorted by noise. For instance, in the example about birthweight the ‘signal’ is the usual growth rate of babies and the ‘noise’ comes from all the genetic and environmental factors that lead to individual variation. A goal of statistical modelling is to extract as much information as possible about the signal. In practice, this has to be balanced against other criteria such as simplicity. The Oxford Dictionary describes the law of parsimony (otherwise known as Occam’s Razor) as the principle that no more causes should be assumed than will account for the effect. Accordingly, a simpler or more parsimonious model that describes the data adequately is preferable to a more complicated one which leaves little of the variability ‘unexplained’.

To determine a parsimonious model consistent with the data, we test hypotheses about the parameters. Hypothesis testing is performed in the context of model fitting by defining a series of nested models corresponding to different hypotheses. Then the question about whether the data support a particular hypothesis can be formulated in terms of the adequacy of fit of the corresponding model relative to other more complicated models. This logic is illustrated in the examples earlier in this chapter. Chapter 5 provides a more detailed explanation of the concepts and methods used, including the sampling distributions for the statistics used to describe ‘goodness of fit’.

While hypothesis testing is useful for identifying a good model, it is much less useful for interpreting it. Wherever possible, the parameters in a model should have some natural interpretation; for example, the rate of growth of babies, the relative risk of acquiring a disease or the mean difference in profit from two marketing strategies. The estimated magnitude of the parameter and the reliability of the estimate as indicated by its standard error or a confidence interval are far more informative than significance levels or p-values. They make it possible to answer questions such as: is the effect estimated with sufficient precision to be useful, or is the effect large enough to be of practical, social or biological significance?

2.3.6 Further reading

An excellent discussion of the principles of statistical modelling is in the introductory part of Cox and Snell (1981). The importance of adopting a systematic approach is stressed by Kleinbaum et al. (1998). The various steps of model choice, criticism and validation are outlined by Krzanowski (1998). The use of residuals is described in Neter et al. (1996), Draper and Smith (1998), Belsley et al. (1980) and Cook and Weisberg (1999).

© 2002 by Chapman & Hall/CRC
2.4 Notation and coding for explanatory variables

For the models in this book the equation linking each response variable $Y$ and a set of explanatory variables $x_1, x_2, \ldots, x_m$ has the form

\[ g[E(Y)] = \beta_0 + \beta_1 x_1 + \ldots + \beta_m x_m. \]

For responses $Y_1, \ldots, Y_N$, this can be written in matrix notation as

\[ g[E(\mathbf{y})] = \mathbf{X}\boldsymbol{\beta} \tag{2.13} \]

where

\[ \mathbf{y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_N \end{bmatrix} \]

is a vector of responses,

\[ g[E(\mathbf{y})] = \begin{bmatrix} g[E(Y_1)] \\ \vdots \\ g[E(Y_N)] \end{bmatrix} \]

denotes a vector of functions of the terms $E(Y_i)$ (with the same $g$ for every element),

\[ \boldsymbol{\beta} = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} \]

is a vector of parameters, and $\mathbf{X}$ is a matrix whose elements are constants representing levels of categorical explanatory variables or measured values of continuous explanatory variables.

For a continuous explanatory variable $x$ (such as gestational age in the example on birthweight) the model contains a term $\beta x$ where the parameter $\beta$ represents the change in the response corresponding to a change of one unit in $x$.

For categorical explanatory variables there are parameters for the different levels of a factor. The corresponding elements of $\mathbf{X}$ are chosen to exclude or include the appropriate parameters for each observation; they are called dummy variables. If they are only zeros and ones, the term indicator variable is used.

If there are $p$ parameters in the model and $N$ observations, then $\mathbf{y}$ is an $N \times 1$ random vector, $\boldsymbol{\beta}$ is a $p \times 1$ vector of parameters and $\mathbf{X}$ is an $N \times p$ matrix of known constants. $\mathbf{X}$ is often called the design matrix and $\mathbf{X}\boldsymbol{\beta}$ is the linear component of the model. Various ways of defining the elements of $\mathbf{X}$ are illustrated in the following examples.
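As a minimal sketch of how a design matrix of this kind might be assembled, the Python snippet below combines an intercept column, an indicator variable for a two-level factor and a continuous variable. The data values, the variable names (`age`, `group`, `is_b`) and the parameter values in `beta` are hypothetical and chosen only for illustration; they are not taken from the text, and NumPy is an assumed tool rather than one the book uses.

```python
import numpy as np

# Hypothetical data (not from the text): N = 4 observations.
age = np.array([38.0, 40.0, 36.0, 39.0])   # continuous explanatory variable
group = np.array(["a", "b", "a", "b"])     # factor with two levels, "a" and "b"

# Indicator (0/1) variable for level "b"; level "a" acts as the reference level.
is_b = (group == "b").astype(float)

# Design matrix X with columns: intercept, indicator for level "b", age.
X = np.column_stack([np.ones_like(age), is_b, age])

# Hypothetical parameter vector beta (p = 3), for illustration only.
beta = np.array([1.0, 0.5, 0.1])

# Linear component X @ beta: one value per observation.
linear_component = X @ beta

N, p = X.shape
print(X)
print(linear_component)
```

Here `X` is an N × p matrix of known constants, and each row switches the parameter for level "b" on or off through the indicator column. Treating one level as the reference is only one possible coding.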