11.07.2015 Views

statisticalrethinkin..

94 4. LINEAR MODELS

The point isn’t to say epistemology trumps reality, but rather that, in ignorance of such correlations, the distribution most consistent with our state of information is i.i.d. Such an assumption can be rejected by checking the model. Furthermore, there is a mathematical result known as de Finetti’s theorem that tells us that values which are EXCHANGEABLE can be approximated by mixtures of i.i.d. distributions. Colloquially, exchangeable values can be reordered. The practical impact of this is that “i.i.d.” as an assumption cannot be read too literally, as different process models can correspond to the same statistical model (as argued in Chapter 1). Even furthermore, there are many types of correlation that do little or nothing to the overall shape of a distribution, but only affect the precise sequence in which values appear. For example, pairs of sisters have highly correlated heights. But the overall distribution of female height remains almost perfectly normal. In such cases, i.i.d. remains perfectly useful, despite ignoring the correlations. Consider for example that Markov chain Monte Carlo (Chapter 8) can use highly correlated sequential samples to estimate almost any i.i.d. distribution we like.

To complete the model, we’re going to need some priors. The parameters to be estimated are both µ and σ, so we need a prior Pr(µ, σ), the joint prior probability for all parameters. In most cases, priors are specified independently for each parameter, which amounts to assuming Pr(µ, σ) = Pr(µ) Pr(σ). Then we can write:

$$
\begin{aligned}
h_i &\sim \mathrm{Normal}(\mu, \sigma) && \text{[likelihood]} \\
\mu &\sim \mathrm{Normal}(156, 10) && \text{[$\mu$ prior]} \\
\sigma &\sim \mathrm{Uniform}(0, 50) && \text{[$\sigma$ prior]}
\end{aligned}
$$

The labels on the right are not part of the model, but instead just notes to help you keep track of the purpose of each line. The prior for µ is a broad Gaussian prior, centered on 156cm, with 95% of probability between 156 ± 20.

It’s a very good idea to plot your priors, so you have a sense of the assumptions they build into the model.
In this case:

R code 4.11
    curve( dnorm( x , 156 , 10 ) , from=100 , to=200 )

Execute that code yourself, to see that the golem is assuming that the average height (not each individual height) is almost certainly between 120cm and 190cm. So this prior carries a little information, but not a lot. The σ prior is a truly flat prior, a uniform one, that functions just to constrain σ to have positive probability between zero and 50cm. View it with:

R code 4.12
    curve( dunif( x , 0 , 50 ) , from=-10 , to=60 )

A standard deviation like σ must be positive, so bounding it at zero makes sense. How should we pick the upper bound? In this case, a standard deviation of 50cm would imply that 95% of individual heights lie within 100cm of the average height. That’s a very large range.

All this talk is nice, but it’ll help to really see what these priors imply about the distribution of individual heights. You didn’t specify a prior probability distribution of heights directly, but once you’ve chosen priors for µ and σ, these imply a prior distribution of individual heights. You can quickly simulate heights by sampling from the prior, like you sampled from the posterior back in Chapter 3. Remember, every posterior is also potentially a prior for a subsequent analysis, so you can process priors just like posteriors.
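
The simulation described in the last paragraph can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the seed, the number of draws, and the variable names (n_sims, sample_mu, sample_sigma, prior_h) are not from the text, and the plot uses base R's density() rather than any helper the book may provide.

```r
# Sketch of a prior predictive simulation, using only base R.
# Assumptions not in the text: the seed, the number of draws, and the
# variable names (n_sims, sample_mu, sample_sigma, prior_h).
set.seed(4)
n_sims <- 1e4

# Draw parameter values from the priors stated in the model.
sample_mu    <- rnorm( n_sims , 156 , 10 )   # mu ~ Normal(156, 10)
sample_sigma <- runif( n_sims , 0 , 50 )     # sigma ~ Uniform(0, 50)

# For each (mu, sigma) pair, simulate one height from the likelihood.
prior_h <- rnorm( n_sims , sample_mu , sample_sigma )

# Plot the implied prior distribution of individual heights.
plot( density( prior_h ) , xlab="height (cm)" , main="Prior predictive heights" )

# Numerical summary of the same distribution.
quantile( prior_h , c( 0.025 , 0.5 , 0.975 ) )
```

The simulated heights should be spread much more widely than the Normal(156, 10) prior on µ alone, because each draw also carries the uncertainty in σ. The result is a mixture of Gaussians with different standard deviations, so it need not look exactly Gaussian itself.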
