45. Fisher (1925), in Chapter III within section 12 on the normal distribution. There are a couple of other places in the book in which the same resort to convenience or convention is used. Fisher seems to indicate that the 5% mark was already widely practiced by 1925 and already without clear justification. [67]

46. Fisher (1956). [67]

47. See Henrion and Fischoff (1986) for examples from the estimation of physical constants, such as the speed of light. [68]

48. Robert (2007) provides concise proofs of optimal estimators under several standard loss functions, like this one. It also covers the history of the topic, as well as many related issues in deriving good decisions from statistical procedures. [70]

49. Rice (2010) presents an interesting construction of classical Fisherian testing through the adoption of loss functions. [71]

50. See Hauer (2004) for three tales from transportation safety in which testing resulted in premature incorrect decisions and a demonstrable and continuing loss of human life. [71]

51. It is poorly appreciated that coin tosses are very hard to bias, as long as you catch them in the air. Once they land and bounce and spin, however, it is very easy to bias them. [77]

52. Jaynes (1985), page 351. [78]

Chapter 4

53. Leo Breiman, at the start of Chapter 9 of his classic book on probability theory (Breiman, 1968), says "there is really no completely satisfying answer" to the question "why normal?" Many mathematical results remain mysterious, even after we prove them. So if you don't quite get why the normal distribution is the limiting distribution, you are in good company. [86]

54. For the reader hungry for mathematical details, see Frank (2009) for a nicely illustrated explanation of this, using Fourier transforms. [86]

55. Technically, the distribution of sums converges to normal only when the original distribution has finite variance. What this means practically is that the magnitude of any newly sampled value cannot be so big as to overwhelm all of the previous values. There are natural phenomena with effectively infinite variance, but we won't be working with any. Or rather, when we do, I won't comment on it. [86]

56. Howell (2010) and Howell (2000). See also Lee and DeVore (1976). Much more raw data is available for download from https://tspace.library.utoronto.ca/handle/1807/10395. [91]

57. Jaynes (2003), pages 21–22. See that book's index for other mentions in various statistical arguments. [93]

58. The strategy is the same grid approximation strategy as before (page 48). But now there are two dimensions, and so there is a geometric (literally) increase in bother. The algorithm is mercifully short, however, if not transparent. Think of the code as being six distinct commands. The first two lines of code just establish the range of µ and σ values, respectively, to calculate over, as well as how many points to calculate in between. The third line of code expands those chosen µ and σ values into a matrix of all of the combinations of µ and σ. This matrix is stored in a data frame, post. In the monstrous fourth line of code, shown in expanded form to make it easier to read, the log-likelihood at each combination of µ and σ is computed. This line looks so awful because we have to be careful here to do everything on the log scale. Otherwise rounding error will quickly make all of the posterior probabilities zero. So what sapply does is pass the unique combination of µ and σ on each row of post to a function that computes the log-likelihood of each observed height, and adds all of these log-likelihoods together (sum).
In the fifth line, we multiply the prior by the likelihood to get the product that is proportional to the posterior density. The priors are also on the log scale, and so we add them to the log-likelihood, which is equivalent to multiplying the raw densities by the likelihood. Finally, the obstacle for getting back on the probability scale is that rounding error is always a threat when moving from log-probability to probability. If you use the obvious approach, like exp( post$prod ), you'll get a vector full of zeros, which isn't very helpful. This is a result of R's rounding very small probabilities to zero. Remember, in large samples, all unique samples are unlikely. This is why you have to work with log-probability. The code in the box dodges this problem by scaling all of the log-products by the maximum log-product before exponentiating.
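For readers who want the six commands in one place, here is a minimal sketch of the grid approximation described above. It is a reconstruction, not the book's code box: the grid ranges, the grid resolution, and the particular priors (a normal prior on µ, a uniform prior on σ) are assumptions for illustration, and the observed heights are assumed to live in a data frame d2 with a column height.

# (1)-(2) candidate values of mu and sigma; ranges and resolution are illustrative
mu.list    <- seq(from = 140, to = 160, length.out = 200)
sigma.list <- seq(from = 4, to = 9, length.out = 200)

# (3) every combination of mu and sigma, stored in the data frame post
post <- expand.grid(mu = mu.list, sigma = sigma.list)

# (4) log-likelihood of all observed heights at each row of post
post$LL <- sapply(1:nrow(post), function(i)
    sum(dnorm(d2$height, mean = post$mu[i], sd = post$sigma[i], log = TRUE)))

# (5) add the (assumed) log-priors, giving something proportional to the log-posterior
post$prod <- post$LL +
    dnorm(post$mu, 178, 20, log = TRUE) +
    dunif(post$sigma, 0, 50, log = TRUE)

# (6) scale by the maximum log-product before exponentiating, so the values
# do not all round to zero
post$prob <- exp(post$prod - max(post$prod))

Subtracting max(post$prod) only rescales every value by the same constant, so the relative (unnormalized) posterior probabilities are preserved while the underflow problem disappears.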
