11.07.2015 Views

statisticalrethinkin..


3.3. SAMPLING TO SIMULATE PREDICTION

spread of the posterior distribution for p, not from sampling variation. But it will interact with the sampling variation when we try to assess what the model tells us about outcomes. We'd like to propagate the parameter uncertainty (carry it forward) as we evaluate the implied predictions. All that is required is averaging over the posterior density for p while computing the predictions. For each possible value of the parameter p, there is an implied distribution of outcomes. So if you were to compute the sampling distribution of outcomes at each value of p, then you could average all of these prediction distributions together, using the posterior probabilities of each value of p, to get a POSTERIOR PREDICTIVE DISTRIBUTION.

FIGURE 3.6 illustrates this averaging. On the left, the posterior distribution is shown, with three unique parameter values highlighted by the vertical lines. The implied distribution of observations specific to each of these parameter values is shown in the middle column of plots. Observations are never certain for any value of p, but they do shift around in response to it. Finally, on the right-hand side, the sampling distributions for all values of p are combined, using the posterior probabilities to compute the weighted average frequency of each possible observation, zero to nine water samples. The actually observed value, six water samples, is shown by the thick line.

The resulting distribution is for predictions, but it incorporates all of the uncertainty embodied in the posterior distribution for the parameter p. As a result, it is honest. While the model does a good job of predicting the data (the most likely observation is indeed the observed data), predictions are still quite spread out.
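
The weighted average described above can be written out as a formula. This is a sketch in notation that is not in the original text: \tilde{w} stands for a predicted count of water samples, w for the observed count, n = 9 for the number of tosses, and p_i for the candidate parameter values when the posterior is approximated on a discrete grid. It also assumes the binomial sampling distribution for outcomes.

```latex
% Sampling distribution of outcomes at one fixed value of p (binomial, assumed)
\Pr(\tilde{w} \mid p) = \binom{n}{\tilde{w}} \, p^{\tilde{w}} (1 - p)^{\,n - \tilde{w}},
\qquad \tilde{w} = 0, 1, \dots, n

% Posterior predictive distribution: average those distributions over the posterior for p
\Pr(\tilde{w} \mid w) = \int_0^1 \Pr(\tilde{w} \mid p) \, \Pr(p \mid w) \, dp
\;\approx\; \sum_i \Pr(\tilde{w} \mid p_i) \, \Pr(p_i \mid w)
```

Each term in the sum is one of the per-parameter sampling distributions, weighted by the posterior probability of that parameter value.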
If instead you were to use only a single parameter value to compute implied predictions, say the most probable value at the peak of the posterior distribution, you'd produce an overconfident distribution of predictions, narrower than the right-hand plot in FIGURE 3.6 and more like the distribution shown for p = 0.64 in the middle row of the middle column. The usual effect of this overconfidence will be to lead you to believe that the model is more consistent with the data than it really is: the predictions will cluster around the observations more tightly. This illusion arises from tossing away uncertainty about the parameters.

So how do you actually do the calculations? To simulate predicted observations for a single value of p, say p = 0.6, you can use rbinom to generate random binomial samples:

w
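
A minimal R sketch of this simulation, set beside a version that carries the parameter uncertainty forward, might look like the following. Several details are assumptions made only for illustration and are not taken from the text: the 10,000 simulated experiments, the 1,000-point grid, the flat prior used to build the posterior, the seed, and all variable names (`n_toss`, `n_sim`, `w_single`, `p_grid`, `posterior`, `p_samples`, `w_post`). The nine tosses, the six observed water samples, and p = 0.6 come from the passage.

```r
set.seed(100)  # arbitrary seed, only so the run is reproducible

n_toss <- 9    # tosses per simulated experiment (outcomes run from 0 to 9)
n_sim  <- 1e4  # number of simulated experiments (assumed)

# 1. Predictions implied by a single parameter value, p = 0.6
w_single <- rbinom(n_sim, size = n_toss, prob = 0.6)

# 2. Posterior for p by grid approximation, assuming a flat prior
#    and the observed 6 water samples in 9 tosses
p_grid    <- seq(from = 0, to = 1, length.out = 1000)
posterior <- dbinom(6, size = n_toss, prob = p_grid)  # flat prior: proportional to likelihood
posterior <- posterior / sum(posterior)               # normalize to sum to 1

# Draw parameter values in proportion to their posterior probability
p_samples <- sample(p_grid, size = n_sim, replace = TRUE, prob = posterior)

# 3. Posterior predictive simulation: one outcome per sampled value of p
w_post <- rbinom(n_sim, size = n_toss, prob = p_samples)

# Relative frequency of each possible count under the two approaches
freq_single <- table(factor(w_single, levels = 0:n_toss)) / n_sim
freq_post   <- table(factor(w_post,   levels = 0:n_toss)) / n_sim
print(freq_single)
print(freq_post)

# Spread of the two prediction distributions
print(c(single_p = sd(w_single), posterior_predictive = sd(w_post)))
```

Under these assumptions, the predictions drawn with sampled values of p should come out more spread out than those from the single fixed value, which is the overconfidence the passage warns about.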
