From Algorithms to Z-Scores - matloff - University of California, Davis

More documents

Recommendations

Info

308 CHAPTER 15. RELATIONS AMONG VARIABLES: LINEAR REGRESSION Here I use the term model selection to mean which predictor variables (including powers and interactions) we will use. If we have data on many predictors, we almost certainly will not be able to use them all, for the following reason: 15.17.1 The Overfitting Problem in Regression Recall in Section 15.10) we mentioned that we could add polynomial terms to a regression model. But you can see that if we carry this notion to its extreme, we get absurd results. If we fit a polynomial of degree 99 to our 100 points, we can make our fitted curve exactly pass through every point! This clearly would give us a meaningless, useless curve. We are simply fitting the noise. Recall that we analyzed this problem in Section 14.1.4 in our chapter on modeling. There we noted an absolutely fundamental principle in statistics: In choosing between a simpler model and a more complex one, the latter is more accurate only if either • we have enough data to support it, or • the complex model is sufficiently different from the simpler one This is extremely important in regression analysis, because we often have so many variables we can use, thus often can make highly complex models. In the regression context, the phrase “we have enough data to support the model” means (in the parametric model case) we have enough data so that the confidence intervals for the βi will be reasonably narrow. For fixed n, the more complex the model, the wider the resulting confidence intervals will tend to be. If we use too many predictor variables, 11 , our data is “diluted,” by being “shared” by so many βi. As a result, V ar( βi) will be large, with big implications: Whether our goal is Prediction or Understanding, our estimates will be so poor that neither goal is achieved. On the other hand, if some predictor variable is really important (i.e. its βi is far from 0), then it may pay to include it, even though the confidence intervals might get somewhat wider. The questions raised in turn by the above considerations, i.e. How much data is enough data?, and How different from 0 is “quite different”?, are addressed below in Section 15.17.3. A detailed mathematical example of overfitting in regression is presented in my paper A Careful Look at the Use of Statistical Methodology in Data Mining (book chapter), by N. Matloff, in 11 In the ALOHA example above, b, b 2 , b 3 and b 4 are separate predictors, even though they are of course correlated.
15.17. MODEL SELECTION 309 Foundations of Data Mining and Granular Computing, edited by T.Y. Lin, Wesley Chu and L. Matzlack, Springer-Verlag Lecture Notes in Computer Science, 2005. 15.17.2 Multicollinearity In typical applications, the X (i) are correlated with each other, to various degrees. If the correlation is high—a condition termed multicollinearity—problems may occur. Consider (15.30). Suppose one predictor variable were to be fully correlated with another. That would mean that the first is exactly equal to a linear function of the other, which would mean that in Q one column is an exact linear combination of the first column and another column. Then Q ′ Q −1 would not exist. Well, if one predictor is strongly (but not fully) correlated with another, (Q ′ Q) −1 will exist, but it will be numerically unstable. Moreover, even without numeric roundoff errors, (Q ′ Q) −1 would be very large, and thus (15.42) would be large, giving us large standard errors—not good! Thus we have yet another reason to limit our set of predictor variables. 15.17.3 Methods for Predictor Variable Selection So, we typically must discard some, maybe many, of our predictor variables. In the weight/height/age example, we may need to discard the age variable. In the ALOHA example, we might need to discard b 4 and even b 3 . How do we make these decisions? Note carefully that this is an unsolved problem. If anyone claims they have a foolproof way to do this, then they do not understand the problem in the first place. Entire books have been written on this subject (e.g. Subset Selection in Regression, by Alan Miller, pub. by Chapman and Hall, 2002), discussing myriad different methods. but again, none of them is foolproof. Hypothesis testing: The most commonly used methods for variable selection use hypothesis testing in one form or another. Typically this takes the form H0 : βi = 0 (15.54) In the context of (15.16), for instance, a decision as to whether to include age as one of our predictor variables would mean testing H0 : β2 = 0 (15.55)
Page 1 and 2:
From Algorithms to Z-Scores: Probab
Page 3 and 4:
Contents 1 Time Waste Versus Empowe
Page 5 and 6:
CONTENTS iii 3.7 A Combinatorial Ex
Page 7 and 8:
CONTENTS v 5.5.1.3 Example: Modelin
Page 9 and 10:
CONTENTS vii 7.3.1 Properties of Me
Page 11 and 12:
CONTENTS ix 10.1 Sampling Distribut
Page 13 and 14:
CONTENTS xi 11.9.4 What to Do Inste
Page 15 and 16:
CONTENTS xiii 15.2 Example Applicat
Page 17 and 18:
CONTENTS xv 17.2.3 Logistic Regress
Page 19 and 20:
CONTENTS xvii 19.2 Simulation of Ra
Page 21 and 22:
CONTENTS xix 21.4 Loss Models . . .
Page 23 and 24:
Preface Why is this book different
Page 25 and 26:
Chapter 1 Time Waste Versus Empower
Page 27 and 28:
Chapter 2 Basic Probability Models
Page 29 and 30:
2.2. THE CRUCIAL NOTION OF A REPEAT
Page 31 and 32:
2.3. OUR DEFINITIONS 7 2009, cannot
Page 33 and 34:
2.4. “MAILING TUBES” 9 but in m
Page 35 and 36:
2.5. BASIC PROBABILITY COMPUTATIONS
Page 37 and 38:
2.6. BAYES’ RULE 13 Note by the w
Page 39 and 40:
2.8. SOLUTION STRATEGIES 15 2.8 Sol
Page 41 and 42:
2.10. EXAMPLE: A SIMPLE BOARD GAME
Page 43 and 44:
2.11. EXAMPLE: BUS RIDERSHIP 19 Aga
Page 45 and 46:
2.12. SIMULATION 21 1 # roll d dice
Page 47 and 48:
2.12. SIMULATION 23 So, in evaluati
Page 49 and 50:
2.12. SIMULATION 25 3 count
Page 51 and 52:
2.13. COMBINATORICS-BASED PROBABILI
Page 53 and 54:
Page 55 and 56:
Page 57 and 58:
Page 59 and 60:
Chapter 3 Discrete Random Variables
Page 61 and 62:
3.4. EXPECTED VALUE 37 3.4.1.1 What
Page 63 and 64:
3.4. EXPECTED VALUE 39 So It turns
Page 65 and 66:
3.4. EXPECTED VALUE 41 • For rand
Page 67 and 68:
3.4. EXPECTED VALUE 43 of two rando
Page 69 and 70:
3.5. VARIANCE 45 ance of U is defin
Page 71 and 72:
3.5. VARIANCE 47 for any constant d
Page 73 and 74:
3.7. A COMBINATORIAL EXAMPLE 49 You
Page 75 and 76:
3.8. A USEFUL FACT 51 Note carefull
Page 77 and 78:
3.10. EXPECTED VALUE, ETC. IN THE A
Page 79 and 80:
3.11. DISTRIBUTIONS 55 3.11.1 Examp
Page 81 and 82:
3.12. PARAMETERIC FAMILIES OF PMFS
Page 83 and 84:
Page 85 and 86:
Page 87 and 88:
Page 89 and 90:
Page 91 and 92:
Page 93 and 94:
3.13. RECOGNIZING SOME PARAMETRIC D
Page 95 and 96:
3.13. RECOGNIZING SOME PARAMETRIC D
Page 97 and 98:
3.15. A CAUTIONARY TALE 73 T has a
Page 99 and 100:
3.16. WHY NOT JUST DO ALL ANALYSIS
Page 101 and 102:
3.18. RECONCILIATION OF MATH AND IN
Page 103 and 104:
Page 105 and 106:
Page 107 and 108:
Chapter 4 Introduction to Discrete
Page 109 and 110:
4.3. EXAMPLE: 3-HEADS-IN-A-ROW GAME
Page 111 and 112:
4.4. EXAMPLE: ALOHA 87 The quantity
Page 113 and 114:
4.6. AN INVENTORY MODEL 89 4.6 An I
Page 115 and 116:
Chapter 5 Continuous Probability Mo
Page 117 and 118:
5.3. BUT EQUATION (??) PRESENTS A P
Page 119 and 120:
5.3. BUT EQUATION (??) PRESENTS A P
Page 121 and 122:
5.4. DENSITY FUNCTIONS 97 2(0.1)fX(
Page 123 and 124:
5.4. DENSITY FUNCTIONS 99 5.4.2 Pro
Page 125 and 126:
5.5. FAMOUS PARAMETRIC FAMILIES OF
Page 127 and 128:
Page 129 and 130:
Page 131 and 132:
Page 133 and 134:
Page 135 and 136:
Page 137 and 138:
Page 139 and 140:
Page 141 and 142:
Page 143 and 144:
Page 145 and 146:
Page 147 and 148:
5.8. “HYBRID” CONTINUOUS/DISCRE
Page 149 and 150:
5.8. “HYBRID” CONTINUOUS/DISCRE
Page 151 and 152:
Chapter 6 Stop and Review: Probabil
Page 153 and 154:
• famous parametric families of d
Page 155 and 156:
Chapter 7 Covariance and Random Vec
Page 157 and 158:
7.1. MEASURING CO-VARIATION OF RAND
Page 159 and 160:
7.2. SETS OF INDEPENDENT RANDOM VAR
Page 161 and 162:
7.2. SETS OF INDEPENDENT RANDOM VAR
Page 163 and 164:
7.3. MATRIX FORMULATIONS 139 this l
Page 165 and 166:
7.3. MATRIX FORMULATIONS 141 consis
Page 167 and 168:
7.3. MATRIX FORMULATIONS 143 import
Page 169 and 170:
7.3. MATRIX FORMULATIONS 145 since
Page 171 and 172:
7.3. MATRIX FORMULATIONS 147 minimi
Page 173 and 174:
Chapter 8 Multivariate PMFs and Den
Page 175 and 176:
8.1. MULTIVARIATE PROBABILITY MASS
Page 177 and 178:
8.2. MULTIVARIATE DENSITIES 153 So,
Page 179 and 180:
8.2. MULTIVARIATE DENSITIES 155 we
Page 181 and 182:
8.3. MORE ON SETS OF INDEPENDENT RA
Page 183 and 184:
Page 185 and 186:
Page 187 and 188:
Page 189 and 190:
8.4. EXAMPLE: FINDING THE DISTRIBUT
Page 191 and 192:
8.5. PARAMETRIC FAMILIES OF MULTIVA
Page 193 and 194:
Page 195 and 196:
Page 197 and 198:
Page 199 and 200:
Page 201 and 202:
Page 203 and 204:
Page 205 and 206:
Page 207 and 208:
Chapter 9 Introduction to Continuou
Page 209 and 210:
9.1. MEMORYLESS PROPERTY OF EXPONEN
Page 211 and 212:
9.3. HOLDING-TIME DISTRIBUTION 187
Page 213 and 214:
Page 215 and 216:
Page 217 and 218:
Chapter 10 Introduction to Confiden
Page 219 and 220:
10.1. SAMPLING DISTRIBUTIONS 195 Wh
Page 221 and 222:
10.1. SAMPLING DISTRIBUTIONS 197 Ap
Page 223 and 224:
10.3. CONFIDENCE INTERVALS FOR MEAN
Page 225 and 226:
10.4. MEANING OF CONFIDENCE INTERVA
Page 227 and 228:
10.5. GENERAL FORMATION OF CONFIDEN
Page 229 and 230:
10.7. CONFIDENCE INTERVALS FOR PROP
Page 231 and 232:
10.7. CONFIDENCE INTERVALS FOR PROP
Page 233 and 234:
10.8. CONFIDENCE INTERVALS FOR DIFF
Page 235 and 236:
Page 237 and 238:
Page 239 and 240:
10.10. R COMPUTATION 215 algebra, w
Page 241 and 242:
10.13. OTHER CONFIDENCE LEVELS 217
Page 243 and 244:
10.14. ONE MORE TIME: WHY DO WE USE
Page 245 and 246:
Chapter 11 Introduction to Signific
Page 247 and 248:
11.2. GENERAL TESTING BASED ON NORM
Page 249 and 250:
11.5. ONE-SIDED HA 225 By checking
Page 251 and 252:
11.6. EXACT TESTS 227 It is natural
Page 253 and 254:
11.8. THE POWER OF A TEST 229 11.8
Page 255 and 256:
11.9. WHAT’S WRONG WITH SIGNIFICA
Page 257 and 258:
Page 259 and 260:
Page 261 and 262:
Chapter 12 General Statistical Esti
Page 263 and 264:
12.1. GENERAL METHODS OF PARAMETRIC
Page 265 and 266:
Page 267 and 268:
Page 269 and 270:
Page 271 and 272:
12.2. BIAS AND VARIANCE 247 people,
Page 273 and 274:
12.2. BIAS AND VARIANCE 249 Moreove
Page 275 and 276:
12.3. MORE ON THE ISSUE OF INDEPEND
Page 277 and 278:
12.4. NONPARAMETRIC DISTRIBUTION ES
Page 279 and 280:
12.4. NONPARAMETRIC DISTRIBUTION ES
Page 281 and 282: 12.4. NONPARAMETRIC DISTRIBUTION ES
Page 283 and 284: 12.5. BAYESIAN METHODS 259 his plan
Page 285 and 286: 12.5. BAYESIAN METHODS 261 it now b
Page 287 and 288: 12.5. BAYESIAN METHODS 263 number
Page 289 and 290: 12.5. BAYESIAN METHODS 265 (b) ˆp,
Page 291 and 292: Chapter 13 Simultaneous Inference M
Page 293 and 294: 13.2. SCHEFFE’S METHOD 269 You ca
Page 295 and 296: 13.4. OTHER METHODS FOR SIMULTANEOU
Page 297 and 298: Chapter 14 Introduction to Model Bu
Page 299 and 300: 14.1. “DESPERATE FOR DATA” 275
Page 301 and 302: 14.1. “DESPERATE FOR DATA” 277
Page 303 and 304: 14.2. ASSESSING “GOODNESS OF FIT
Page 305 and 306: 14.4. ROBUSTNESS 281 bin width, or
Page 307 and 308: 14.5. REAL POPULATIONS AND CONCEPTU
Page 309 and 310: Chapter 15 Relations Among Variable
Page 311 and 312: 15.3. ADJUSTING FOR COVARIATES 287
Page 313 and 314: 15.6. ESTIMATING THAT RELATIONSHIP
Page 315 and 316: 15.6. ESTIMATING THAT RELATIONSHIP
Page 317 and 318: 15.7. EXAMPLE: BASEBALL DATA 293
Page 319 and 320: 15.9. EXAMPLE: BASEBALL DATA (CONT
Page 321 and 322: 15.11. PREDICTION 297 matter, it ma
Page 323 and 324: 15.12. PARAMETRIC ESTIMATION OF LIN
Page 325 and 326: 15.12. PARAMETRIC ESTIMATION OF LIN
Page 327 and 328: 15.14. DUMMY VARIABLES 303 15.14 Du
Page 329 and 330: 15.16. WHAT DOES IT ALL MEAN?—EFF
Page 331: 15.17. MODEL SELECTION 307 But look
Page 335 and 336: 15.18. WHAT ABOUT THE ASSUMPTIONS?
Page 337 and 338: 15.19. CASE STUDIES 313 have an und
Page 339 and 340: 15.19. CASE STUDIES 315 Exercises N
Page 341 and 342: 15.19. CASE STUDIES 317 8. Consider
Page 343 and 344: Chapter 16 Advanced Statistical Est
Page 345 and 346: 16.2. THE DELTA METHOD: CONFIDENCE
Page 353 and 354: 16.3. THE BOOTSTRAP METHOD FOR FORM
Page 355 and 356: 16.3. THE BOOTSTRAP METHOD FOR FORM
Page 357 and 358: Chapter 17 Relations Among Variable
Page 359 and 360: 17.2. THE CLASSIFICATION PROBLEM 33
Page 365 and 366: 17.3. NONPARAMETRIC ESTIMATION OF R
Page 375 and 376: 17.4. SYMMETRIC RELATIONS AMONG SEV
Page 381 and 382: 17.5. SIMPSON’S (NON-)PARADOX 357
Page 383 and 384:
17.5. SIMPSON’S (NON-)PARADOX 359
Page 385 and 386:
Chapter 18 Describing “Failure”
Page 387 and 388:
18.2. A CAUTIONARY TALE: THE BUS PA
Page 389 and 390:
18.2. A CAUTIONARY TALE: THE BUS PA
Page 391 and 392:
18.3. RESIDUAL-LIFE DISTRIBUTION 36
Page 393 and 394:
Page 395 and 396:
Page 397 and 398:
Page 399 and 400:
Chapter 19 Advanced Multivariate Me
Page 401 and 402:
19.1. CONDITIONAL DISTRIBUTIONS 377
Page 403 and 404:
Page 405 and 406:
Page 407 and 408:
19.3. MIXTURE MODELS 383 Now that w
Page 409 and 410:
19.4. TRANSFORM METHODS 385 Thus an
Page 411 and 412:
19.4. TRANSFORM METHODS 387 19.4.2
Page 413 and 414:
19.4. TRANSFORM METHODS 389 transfo
Page 415 and 416:
19.5. VECTOR SPACE INTERPRETATIONS
Page 417 and 418:
19.7. CONDITIONAL EXPECTATION AS A
Page 419 and 420:
19.8. PROOF OF THE LAW OF TOTAL EXP
Page 421 and 422:
19.8. PROOF OF THE LAW OF TOTAL EXP
Page 423 and 424:
Chapter 20 Markov Chains One of the
Page 425 and 426:
20.1. DISCRETE-TIME MARKOV CHAINS 4
Page 427 and 428:
Page 429 and 430:
Page 431 and 432:
Page 433 and 434:
Page 435 and 436:
Page 437 and 438:
Page 439 and 440:
20.2. SIMULATION OF MARKOV CHAINS 4
Page 441 and 442:
20.4. CONTINUOUS-TIME MARKOV CHAINS
Page 443 and 444:
20.5. HITTING TIMES ETC. 419 least
Page 445 and 446:
20.5. HITTING TIMES ETC. 421 First
Page 447 and 448:
20.5. HITTING TIMES ETC. 423 valued
Page 449 and 450:
20.5. HITTING TIMES ETC. 425 summon
Page 451 and 452:
Chapter 21 Introduction to Queuing
Page 453 and 454:
21.2. M/M/1 429 busy for approximat
Page 455 and 456:
21.2. M/M/1 431 • Due to the memo
Page 457 and 458:
21.3. MULTI-SERVER MODELS 433 Recal
Page 459 and 460:
21.4. LOSS MODELS 435 1 = i,j,k π
Page 461 and 462:
21.5. NONEXPONENTIAL SERVICE TIMES
Page 463 and 464:
21.6. REVERSED MARKOV CHAINS 439 So
Page 465 and 466:
21.6. REVERSED MARKOV CHAINS 441 21
Page 467 and 468:
21.6. REVERSED MARKOV CHAINS 443 Re
Page 469 and 470:
21.7. NETWORKS OF QUEUES 445 • Gi
Page 471 and 472:
21.7. NETWORKS OF QUEUES 447 Let Li
Page 473 and 474:
Appendix A Review of Matrix Algebra
Page 475 and 476:
A.2. MATRIX TRANSPOSE 451 • Matri
Page 477 and 478:
A.6. EIGENVALUES AND EIGENVECTORS 4
Page 479 and 480:
Appendix B R Quick Start Here we pr
Page 481 and 482:
B.3. FIRST SAMPLE PROGRAMMING SESSI
Page 483 and 484:
B.4. SECOND SAMPLE PROGRAMMING SESS
Page 485 and 486:
B.4. SECOND SAMPLE PROGRAMMING SESS
Page 487 and 488:
B.6. COMPLEX NUMBERS 463 B.6 Comple
show all

From Algorithms to Z-Scores - matloff - University of California, Davis

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?