8.2. MARKOV CHAIN MONTE CARLO

Now, the interesting thing about this kind of slice through the full posterior is that, even when we cannot compute an analytical expression for the full density Pr(µ, σ|D), we often can compute an analytical expression for Pr(σ|µ, D) (or for Pr(µ|σ, D)). Treating one of the parameters as a constant, like the data D, makes the math easier. In the case of a Gaussian outcome, it is possible to write down analytical expressions for Pr(σ|µ, D) and for Pr(µ|σ, D).

Why might we want to do this? In order to generate intelligent proposals, so that the Markov chain will be more efficient. Think of it this way. If the Markov chain is at µ = 150, then the target for σ is the righthand plot in FIGURE 8.3. So if we have a formula that gives us the shape of the slice in FIGURE 8.3, then we can use that function to produce random proposal steps that will automatically be smaller when the chain is close to the center of the target, and automatically larger when the chain is far from the center of the target. So if we sample proposal values for both σ and µ from Pr(σ|µ, D) and Pr(µ|σ, D), respectively, then the Markov chain will converge much faster and we'll need fewer samples overall to get a good picture of the full target, Pr(µ, σ|D), the joint posterior.

The catch is that, in order to get an analytical expression for these slices, we'll have to make an explicit choice of prior distribution for each parameter. Flat priors will no longer do. Indeed, we must choose a prior distribution that is conjugate with the likelihood function. So-called conjugate pairs are matching distributions for priors and likelihoods that produce a posterior in the same family as the prior. For example, the likelihood function in this case is Gaussian. For σ, the standard deviation of the Gaussian, it turns out that we want to use an inverse-gamma distribution (you'll meet gamma distributions in a later chapter).
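To make these conjugate updates concrete, here is a minimal sketch of the two conditional draws, written in Python with NumPy rather than the book's own code. It assumes a Normal(m0, s0) prior on µ and an inverse-gamma(a0, b0) prior on σ²; the prior parameter values and the synthetic data are illustrative placeholders, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(150.0, 8.0, size=50)   # synthetic "heights", illustration only
n, ybar = len(y), y.mean()

# Hypothetical prior parameters (not from the text):
m0, s0 = 0.0, 100.0   # Normal prior on mu
a0, b0 = 2.0, 2.0     # inverse-gamma prior on sigma^2

def draw_mu(sigma2):
    """Sample from Pr(mu | sigma, D): Normal-Normal conjugacy."""
    prec = 1.0 / s0**2 + n / sigma2                     # posterior precision
    mean = (m0 / s0**2 + n * ybar / sigma2) / prec      # precision-weighted mean
    return rng.normal(mean, np.sqrt(1.0 / prec))

def draw_sigma2(mu):
    """Sample from Pr(sigma^2 | mu, D): inverse-gamma conjugacy."""
    a_n = a0 + n / 2.0
    b_n = b0 + 0.5 * np.sum((y - mu) ** 2)
    return 1.0 / rng.gamma(a_n, 1.0 / b_n)  # inverse-gamma via reciprocal gamma
```

Each function returns an exact sample from one conditional slice of the joint posterior, which is precisely the kind of intelligent proposal the text describes.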
The analogous conjugate prior for the mean µ is a normal distribution.

All of this means we can rewrite our Markov chain to generate proposals from the analytical expressions for the slices through the joint posterior. Not only does this vastly accelerate how quickly the Markov chain gets into the high-probability region of parameter values, but it also means that Gibbs sampling will always accept any proposal. No rejected proposals means more efficient search of the target distribution. The King never stands still.

Alright, so let's compare the efficiency of the Gibbs algorithm above to the analogous Metropolis-Hastings algorithm that uses fixed proposal distributions. You have the Metropolis-Hastings code from the previous section. The first thing to compare is how long each approach takes to finish 100-thousand samples. On my 2.66 GHz desktop, the Gibbs chain took 5 seconds, while the plain Metropolis-Hastings took about 17 seconds. That's about 19956 samples per second for the Gibbs chain, and 5974 for the Metropolis-Hastings chain. Now, these models are so simple that the improvement hardly matters. But at this relative rate of improvement, a Metropolis chain that takes an hour to finish can be made to finish in about 18 minutes, once converted to Gibbs sampling.

Not only will the Gibbs chain execute faster, but it'll also explore the parameter space better. We can compare how quickly the two approaches get into the high-probability region of the target distribution. In FIGURE 8.4, I display the first 500 samples from both chains, with Gibbs sampling on the left and fixed-step Metropolis-Hastings on the right. Both chains started at the same parameter values, but Gibbs sampling approached the target region in exactly two giant steps. Metropolis-Hastings, in contrast, took hundreds of steps to arrive in the same region. In the end, both algorithms produced the same approximate estimate of the joint posterior.
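A Gibbs chain for this two-parameter model simply alternates the two conditional draws, accepting every "proposal" by construction. The sketch below (Python with NumPy, not the book's own code) is self-contained; the synthetic data, the prior parameters, and the deliberately bad starting values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(150.0, 8.0, size=50)       # synthetic data, illustration only
n, ybar = len(y), y.mean()
m0, s0, a0, b0 = 0.0, 100.0, 2.0, 2.0     # hypothetical conjugate priors

mu, sigma2 = 0.0, 1.0                     # deliberately far from the target
trace = []
for _ in range(2000):
    # Pr(mu | sigma, D): Normal, by Normal-Normal conjugacy
    prec = 1.0 / s0**2 + n / sigma2
    mu = rng.normal((m0 / s0**2 + n * ybar / sigma2) / prec,
                    np.sqrt(1.0 / prec))
    # Pr(sigma^2 | mu, D): inverse-gamma, drawn as reciprocal of a gamma
    b_n = b0 + 0.5 * np.sum((y - mu) ** 2)
    sigma2 = 1.0 / rng.gamma(a0 + n / 2.0, 1.0 / b_n)
    trace.append((mu, sigma2))

mus = np.array([t[0] for t in trace[100:]])   # discard warm-up samples
print(round(mus.mean(), 1))                   # sits near the sample mean of y
```

Even from a terrible starting point, the chain lands in the high-probability region almost immediately, mirroring the "two giant steps" behavior the text describes for FIGURE 8.4.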
But Gibbs did so much more efficiently.

Gibbs sampling is the basis of the powerful and widely used software BUGS (Bayesian inference Using Gibbs Sampling) and JAGS (Just Another Gibbs Sampler). BUGS and JAGS