The mathematical procedure that defines the logic of the posterior density is BAYES' THEOREM. This is the theorem that gives Bayesian data analysis its name. But the theorem itself is a trivial implication of probability theory. Here's a quick derivation of it, in the context of the globe tossing example. To grasp this derivation, remember that we are using probabilities to represent plausibility. But all the rules of ordinary probability theory apply and ensure that the logic of plausible reasoning is correct. For this example, I'm going to omit n, the sample size. Nothing important is being hidden here. It'll just be easier to follow the derivation without having to carry the inert n around in the mathematics.

The joint probability of the data w and any particular value of p is:

    Pr(w, p) = Pr(w|p) Pr(p).

This just says that the probability of w and p is the product of the likelihood Pr(w|p) and the prior probability Pr(p). This is like saying that the probability of rain and cold on the same day is equal to the probability of rain, when it's cold, times the probability that it's cold. This much is just definition. But it's just as true that:

    Pr(w, p) = Pr(p|w) Pr(w).

All I've done is reverse which probability is conditional, on the righthand side. It is still a true definition. It's like saying that the probability of rain and cold on the same day is equal to the probability that it's cold, when it's raining, times the probability of rain. Compare this statement to the one in the previous paragraph.

Now since both righthand sides are equal to the same thing, we can set them equal to one another and solve for the posterior probability, Pr(p|w):

    Pr(p|w) = Pr(w|p) Pr(p) / Pr(w).

And this is Bayes' theorem. It says that the probability of any particular value of p, considering the data, is equal to the product of the likelihood and the prior, divided by this thing Pr(w), which I'll call the average likelihood. In word form:

    Posterior = (Likelihood × Prior) / (Average Likelihood).

The average likelihood, Pr(w), can be confusing. It is commonly called the "evidence" or the "probability of the data," neither of which is a transparent name. The probability Pr(w) is merely the average likelihood of the data. Averaged over what? Averaged over the prior. Its job is to standardize the posterior, to ensure it sums (integrates) to one. In mathematical form:

    Pr(w) = E( Pr(w|p) ) = ∫ Pr(w|p) Pr(p) dp.

The operator E means to take an expectation, which you can think of as an average. Such averages are commonly called marginals in mathematical statistics, and so you may also see this same probability called a marginal likelihood. And the integral above just defines the proper way to compute the average over a continuous distribution of values, like the infinite possible values of p.

The key lesson for now is that the posterior is proportional to the product of the prior and the likelihood. FIGURE 2.6 illustrates the multiplicative interaction of a prior and a likelihood. On each row, a prior on the left is multiplied by the likelihood in the middle to produce a posterior on the right.
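To make that proportionality concrete, here is a minimal sketch in R that computes the posterior on a grid for the globe tossing example. The particular data values (w = 6 waters in n = 9 tosses), the flat prior, and the grid size are assumptions chosen only for illustration; the section above omits n and does not fix any of these.

    # Grid approximation of the posterior for the globe tossing example.
    # Assumed data for illustration: w = 6 waters in n = 9 tosses.
    p_grid <- seq(from = 0, to = 1, length.out = 1000)   # candidate values of p
    prior <- rep(1, 1000)                                # a flat prior Pr(p)
    likelihood <- dbinom(6, size = 9, prob = p_grid)     # Pr(w|p) at each grid value
    unstd_posterior <- likelihood * prior                # posterior is proportional to this
    posterior <- unstd_posterior / sum(unstd_posterior)  # standardize so it sums to one

The sum in the last line approximates the integral ∫ Pr(w|p) Pr(p) dp over the grid, so dividing by it plays the role of the average likelihood Pr(w): it changes no relative plausibilities, it only ensures the posterior sums to one.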
