45. Fisher (1925), in Chapter III within section 12 on the normal distribution. There are a couple of other places in the book in which the same resort to convenience or convention is used. Fisher seems to indicate that the 5% mark was already widely practiced by 1925 and already without clear justification. [67]

46. Fisher (1956). [67]

47. See Henrion and Fischoff (1986) for examples from the estimation of physical constants, such as the speed of light. [68]

48. Robert (2007) provides concise proofs of optimal estimators under several standard loss functions, like this one. It also covers the history of the topic, as well as many related issues in deriving good decisions from statistical procedures. [70]

49. Rice (2010) presents an interesting construction of classical Fisherian testing through the adoption of loss functions. [71]

50. See Hauer (2004) for three tales from transportation safety in which testing resulted in premature incorrect decisions and a demonstrable and continuing loss of human life. [71]

51. It is poorly appreciated that coin tosses are very hard to bias, as long as you catch them in the air. Once they land and bounce and spin, however, it is very easy to bias them. [77]

52. Jaynes (1985), page 351. [78]

Chapter 4

53. Leo Breiman, at the start of Chapter 9 of his classic book on probability theory (Breiman, 1968), says "there is really no completely satisfying answer" to the question "why normal?" Many mathematical results remain mysterious, even after we prove them. So if you don't quite get why the normal distribution is the limiting distribution, you are in good company. [86]

54. For the reader hungry for mathematical details, see Frank (2009) for a nicely illustrated explanation of this, using Fourier transforms. [86]

55. Technically, the distribution of sums converges to normal only when the original distribution has finite variance. What this means practically is that the magnitude of any newly sampled value cannot be so big as to overwhelm all of the previous values. There are natural phenomena with effectively infinite variance, but we won't be working with any. Or rather, when we do, I won't comment on it. [86]

56. Howell (2010) and Howell (2000). See also Lee and DeVore (1976). Much more raw data is available for download from https://tspace.library.utoronto.ca/handle/1807/10395. [91]

57. Jaynes (2003), pages 21–22. See that book's index for other mentions in various statistical arguments. [93]

58. The strategy is the same grid approximation strategy as before (page 48). But now there are two dimensions, and so there is a geometric (literally) increase in bother. The algorithm is mercifully short, however, if not transparent. Think of the code as being six distinct commands. The first two lines of code just establish the range of µ and σ values, respectively, to calculate over, as well as how many points to calculate in between. The third line of code expands those chosen µ and σ values into a matrix of all of the combinations of µ and σ. This matrix is stored in a data frame, post. In the monstrous fourth line of code, shown in expanded form to make it easier to read, the log-likelihood at each combination of µ and σ is computed. This line looks so awful because we have to be careful here to do everything on the log scale. Otherwise rounding error will quickly make all of the posterior probabilities zero. So what sapply does is pass the unique combination of µ and σ on each row of post to a function that computes the log-likelihood of each observed height, and adds all of these log-likelihoods together (sum).
In the fifth line, we multiply the prior by the likelihood to get the product that is proportional to the posterior density. The priors are also on the log scale, and so we add them to the log-likelihood, which is equivalent to multiplying the raw densities by the likelihood. Finally, the obstacle for getting back on the probability scale is that rounding error is always a threat when moving from log-probability to probability. If you use the obvious approach, like exp( post$prod ), you'll get a vector full of zeros, which isn't very helpful. This is a result of R's rounding very small probabilities to zero. Remember, in large samples, all unique samples are unlikely. This is why you have to work with log-probability. The code in the box dodges this problem by scaling all of the log-products by the maximum log-product before exponentiating.
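For readers who want the six commands in one place, here is a minimal sketch of the grid approximation described above. It is a reconstruction, not the book's code box: the grid ranges, the grid resolution, and the particular priors (a normal prior on µ, a uniform prior on σ) are assumptions for illustration, and the observed heights are assumed to live in a data frame d2 with a column height.

# (1)-(2) candidate values of mu and sigma; ranges and resolution are illustrative
mu.list    <- seq(from = 140, to = 160, length.out = 200)
sigma.list <- seq(from = 4, to = 9, length.out = 200)

# (3) every combination of mu and sigma, stored in the data frame post
post <- expand.grid(mu = mu.list, sigma = sigma.list)

# (4) log-likelihood of all observed heights at each row of post
post$LL <- sapply(1:nrow(post), function(i)
    sum(dnorm(d2$height, mean = post$mu[i], sd = post$sigma[i], log = TRUE)))

# (5) add the (assumed) log-priors, giving something proportional to the log-posterior
post$prod <- post$LL +
    dnorm(post$mu, 178, 20, log = TRUE) +
    dunif(post$sigma, 0, 50, log = TRUE)

# (6) scale by the maximum log-product before exponentiating, so the values
# do not all round to zero
post$prob <- exp(post$prod - max(post$prod))

Subtracting max(post$prod) only rescales every value by the same constant, so the relative (unnormalized) posterior probabilities are preserved while the underflow problem disappears.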
