
Introduction to Categorical Data Analysis


BUILDING AND APPLYING LOGISTIC REGRESSION MODELS

With categorical predictors, we can use residuals to compare observed and fitted counts. This should be done with the grouped form of the data. Let yi denote the number of “successes” for ni trials at setting i of the explanatory variables. Let π̂i denote the estimated probability of success for the model fit. Then, the estimated binomial mean ni π̂i is the fitted number of successes.

For a GLM with binomial random component, the Pearson residual (3.9) comparing yi to its fit is

    Pearson residual = ei = (yi − ni π̂i) / √[ni π̂i(1 − π̂i)]

Each Pearson residual divides the difference between an observed count and its fitted value by the estimated binomial standard deviation of the observed count. When ni is large, ei has an approximate normal distribution. When the model holds, {ei} has an approximate expected value of zero but a smaller variance than a standard normal variate.
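As a minimal sketch of this formula, the Pearson residuals for grouped binomial data can be computed directly; the counts and fitted probabilities below are illustrative, not taken from the text:

```python
import numpy as np

# Grouped data: at setting i we observe y_i successes out of n_i trials;
# pi_hat_i is the model's fitted success probability (illustrative values).
y = np.array([28, 53, 93, 126])
n = np.array([50, 80, 120, 140])
pi_hat = np.array([0.50, 0.68, 0.79, 0.88])

# Pearson residual: (observed count - fitted count) / estimated binomial SD
fitted = n * pi_hat
e = (y - fitted) / np.sqrt(n * pi_hat * (1 - pi_hat))
print(np.round(e, 3))
```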

The standardized residual divides (yi − ni π̂i) by its SE,

    standardized residual = (yi − ni π̂i) / SE = (yi − ni π̂i) / √[ni π̂i(1 − π̂i)(1 − hi)]

The term hi in this formula is the observation’s leverage, its element from the diagonal of the so-called hat matrix. (Roughly speaking, the hat matrix is a matrix that, when applied to the sample logits, yields the predicted logit values for the model.) The greater an observation’s leverage, the greater its potential influence on the model fit.

The standardized residual equals ei/√(1 − hi), so it is larger in absolute value than the Pearson residual ei. It is approximately standard normal when the model holds. We prefer it. An absolute value larger than roughly 2 or 3 provides evidence of lack of fit. This serves the same purpose as the standardized residual (2.9) defined in Section 2.4.5 for detecting patterns of dependence in two-way contingency tables. It is a special case of the standardized residual presented in Section 3.4.5 for describing lack of fit in GLMs.
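The leverages hi come from the hat matrix of the weighted fit. The sketch below assumes a logit link, for which the working weights are wi = ni π̂i(1 − π̂i), and uses an illustrative model matrix X and illustrative data:

```python
import numpy as np

# Illustrative model matrix (intercept + one predictor) and grouped data.
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
y = np.array([28, 53, 93, 126])
n = np.array([50, 80, 120, 140])
pi_hat = np.array([0.50, 0.68, 0.79, 0.88])

w = n * pi_hat * (1 - pi_hat)          # working weights for the logit link
e = (y - n * pi_hat) / np.sqrt(w)      # Pearson residuals

# Hat matrix of the weighted fit: H = W^{1/2} X (X'WX)^{-1} X' W^{1/2};
# its diagonal gives the leverages h_i.
Xw = X * np.sqrt(w)[:, None]
H = Xw @ np.linalg.solve(Xw.T @ Xw, Xw.T)
h = np.diag(H)

# Standardized residual: e_i / sqrt(1 - h_i)
std_resid = e / np.sqrt(1 - h)
print(np.round(std_resid, 3))
```

Because H is a projection matrix, the leverages sum to the number of model parameters, a useful sanity check on the computation.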

When fitted values are very small, we have noted that X² and G² do not have approximate null chi-squared distributions. Similarly, residuals have limited meaning in that case. For ungrouped binary data and often when explanatory variables are continuous, each ni = 1. Then, yi can equal only 0 or 1, and a residual can assume only two values and is usually uninformative. Plots of residuals also then have limited use, consisting merely of two parallel lines of dots. The deviance itself is then completely uninformative about model fit. When data can be grouped into sets of observations having common predictor values, it is better to compute residuals for the grouped data than for individual subjects.
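One way to carry out this grouping is to aggregate the subject-level 0/1 outcomes into (successes, trials) counts at each distinct predictor setting; a sketch with hypothetical subject-level data:

```python
import pandas as pd

# Ungrouped binary data: one row per subject (hypothetical values).
df = pd.DataFrame({
    "dose": [0, 0, 0, 1, 1, 1, 1, 2, 2, 2],
    "y":    [0, 1, 0, 1, 1, 0, 1, 1, 1, 1],
})

# Collapse to grouped form: successes y_i and trials n_i at each setting,
# so residuals compare binomial counts rather than individual 0/1 outcomes.
grouped = (df.groupby("dose")["y"]
             .agg(successes="sum", trials="count")
             .reset_index())
print(grouped)
```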
