ENDNOTES

…all of the log-products by the maximum log-product. As a result, the values in post$prob are not all zero, but they also aren't exactly probabilities. Instead they are relative posterior probabilities. But that's good enough for what we wish to do with these values. (A small sketch of this normalization appears after these notes.) [96]

59. The most accessible of Galton's writings on the topic has been reprinted as Galton (1989). [104]

60. The implied definition of α in a parabolic model is α = E(y_i) − β_1 E(x_i) − β_2 E(x_i^2). Now even when the average x_i is zero, E(x_i) = 0, the average square will likely not be zero. So α becomes hard to directly interpret again. [123]

Chapter 5

61. "How to Measure a Storm's Fury One Breakfast at a Time." The Wall Street Journal: September 1, 2011. [129]

62. Simpson (1951). Simpson's paradox is very famous in statistics, probably because recognizing it increases the apparent usefulness of statistical modeling. It's a lot less known outside of statistics. [129]

63. Debates about causal inference go back a long time. David Hume is a key citation. One curious obstacle in modern statistics is that classic causal reasoning requires that if A causes B, then B will always appear when A appears. But with probabilistic relationships, like those described in most contemporary scientific models, it is natural to talk about probabilistic causes, in which B only sometimes follows A. See http://plato.stanford.edu/entries/causation-probabilistic/. [130]

64. See Pearl (2014) for an accessible introduction, with discussion. See also Rubin (2005) for a related approach. [130]

65. Data from Table 2 of Hinde and Milligan (2011). [145]

66. Provided the posterior distributions are Gaussian, you could, however, get the variance of their sum by adding their variances and twice their covariance. The variance of the sum of two normal distributions a and b is given by σ_a^2 + σ_b^2 + 2ρσ_a σ_b, where ρ is the correlation between the two; for the difference, the covariance term enters with a minus sign. (See the simulation sketch after these notes.) [162]

67. See Gelman and Stern (2006) for further explanation, and see Nieuwenhuis et al. (2011) for some evidence of how commonly this mistake occurs. [165]

68. See Stigler (1981) for historical context. There are a number of legitimate ways to derive the method of least squares estimation. Gauss' approach was Bayesian, but a probability interpretation isn't always necessary. [166]

69. These data are modified from an example in Grafen and Hails (2002). [170]

Chapter 6

70. De Revolutionibus, Book 1, Chapter 10. [173]

71. See e.g. Akaike (1978), as well as discussion in Burnham and Anderson (2002). [175]

72. Data from Table 1 of McHenry and Coffing (2000). [176]

73. See Grünwald (2007) for a book-length treatment of these ideas. [180]

74. There are many discussions of bias and variance in the literature, some much more mathematical than others. For a broad treatment, I recommend Chapter 7 of Hastie, Tibshirani and Friedman's 2009 book, which explores BIC, AIC, cross-validation, and other measures, all in the context of the bias-variance tradeoff. [182]

75. I first encountered this kind of example in Jaynes (1976), page 246. Jaynes himself credits G. David Forney's 1972 information theory course notes. Forney is an important figure in information theory, having won several awards for his contributions. [182]

76. Shannon (1948). For a more accessible introduction, see the venerable textbook Elements of Information Theory, by Cover and Thomas. Slightly more advanced, but having lots of added value, is Jaynes' (2003, Chapter 11) presentation. A foundational book in applying information theory to statistical inference is Kullback (1959), but it's not easy reading. [184]
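The normalization described in the first (truncated) note above is easy to demonstrate. The sketch below uses a simple binomial grid approximation as a stand-in; the model, data, and the column names post$log.prod and post$prob are assumptions for illustration, not the book's exact code.

```r
# Minimal sketch: scale log-products by the maximum before exponentiating.
# The binomial model and grid here are assumed for illustration only.
p_grid <- seq(from = 0, to = 1, length.out = 1000)
prior  <- rep(1, 1000)

# work on the log scale: log-prior + log-likelihood
log_prod <- log(prior) + dbinom(6, size = 9, prob = p_grid, log = TRUE)

post <- data.frame(p = p_grid, log.prod = log_prod)

# subtract the maximum log-product, then exponentiate;
# exponentiating large negative log-products directly would underflow to zero
post$prob <- exp(post$log.prod - max(post$log.prod))

# post$prob are relative posterior probabilities: they don't sum to 1,
# but their ratios are correct, which is enough for plotting or sampling
head(post$prob)
```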

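To accompany note 66, here is a small simulation checking the variance formula for the sum and difference of two correlated Gaussian quantities. The covariance matrix below is made up for illustration.

```r
# Sketch for note 66: Var(a + b) = sigma_a^2 + sigma_b^2 + 2*rho*sigma_a*sigma_b,
# with a minus sign on the covariance term for a - b. Numbers are illustrative.
library(MASS)

Sigma <- matrix(c(1.0, 0.6,
                  0.6, 2.0), nrow = 2)   # assumed covariance of a and b
samples <- mvrnorm(n = 1e5, mu = c(0, 0), Sigma = Sigma)
a <- samples[, 1]
b <- samples[, 2]

rho <- cor(a, b)

# variance of the sum: formula vs. directly from the summed samples
c(formula = var(a) + var(b) + 2 * rho * sd(a) * sd(b), direct = var(a + b))

# variance of the difference: covariance term enters with a minus sign
c(formula = var(a) + var(b) - 2 * rho * sd(a) * sd(b), direct = var(a - b))
```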