Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981
You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

29 — Monte Carlo Methods

To decide whether to accept the new state, we compute the quantity

    a = \frac{P^*(x')\, Q(x^{(t)}; x')}{P^*(x^{(t)})\, Q(x'; x^{(t)})}.    (29.31)

If a ≥ 1 then the new state is accepted.
Otherwise, the new state is accepted with probability a.
If the step is accepted, we set x^{(t+1)} = x'.
If the step is rejected, then we set x^{(t+1)} = x^{(t)}.

Note the difference from rejection sampling: in rejection sampling, rejected points are discarded and have no influence on the list of samples {x^{(r)}} that we collected. Here, a rejection causes the current state to be written again onto the list.

Notation. I have used the superscript r = 1, . . . , R to label points that are independent samples from a distribution, and the superscript t = 1, . . . , T to label the sequence of states in a Markov chain. It is important to note that a Metropolis–Hastings simulation of T iterations does not produce T independent samples from the target distribution P. The samples are dependent.

To compute the acceptance probability (29.31) we need to be able to compute the probability ratios P^*(x')/P^*(x^{(t)}) and Q(x^{(t)}; x')/Q(x'; x^{(t)}). If the proposal density is a simple symmetrical density such as a Gaussian centred on the current point, then the latter factor is unity, and the Metropolis–Hastings method simply involves comparing the value of the target density at the two points. This special case is sometimes called the Metropolis method.
However, with apologies to Hastings, I will call the general Metropolis–Hastings algorithm for asymmetric Q 'the Metropolis method' since I believe important ideas deserve short names.

Convergence of the Metropolis method to the target density

It can be shown that for any positive Q (that is, any Q such that Q(x'; x) > 0 for all x, x'), as t → ∞, the probability distribution of x^{(t)} tends to P(x) = P^*(x)/Z. [This statement should not be seen as implying that Q has to assign positive probability to every point x' – we will discuss examples later where Q(x'; x) = 0 for some x, x'; notice also that we have said nothing about how rapidly the convergence to P(x) takes place.]

The Metropolis method is an example of a Markov chain Monte Carlo method (abbreviated MCMC). In contrast to rejection sampling, where the accepted points {x^{(r)}} are independent samples from the desired distribution, Markov chain Monte Carlo methods involve a Markov process in which a sequence of states {x^{(t)}} is generated, each sample x^{(t)} having a probability distribution that depends on the previous value, x^{(t−1)}. Since successive samples are dependent, the Markov chain may have to be run for a considerable time in order to generate samples that are effectively independent samples from P.

Just as it was difficult to estimate the variance of an importance sampling estimator, so it is difficult to assess whether a Markov chain Monte Carlo method has 'converged', and to quantify how long one has to wait to obtain samples that are effectively independent samples from P.

Demonstration of the Metropolis method

The Metropolis method is widely used for high-dimensional problems. Many implementations of the Metropolis method employ a proposal distribution
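The accept/reject rule of equation (29.31) can be sketched in code. The following is a minimal one-dimensional illustration, not the book's own implementation: the target `p_star` (an unnormalized Gaussian), the Gaussian proposal width `step_size`, and all names are assumptions chosen for the demonstration. Because the proposal is symmetric, the ratio Q(x^{(t)}; x')/Q(x'; x^{(t)}) is unity and a reduces to P^*(x')/P^*(x^{(t)}).

```python
import math
import random

def p_star(x):
    # Unnormalized target density P*(x): exp(-x^2/2).
    # The normalizing constant Z is never needed, since it cancels in a.
    return math.exp(-0.5 * x * x)

def metropolis(p_star, x0, n_steps, step_size=1.0, seed=0):
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_steps):
        # Symmetric proposal: Gaussian centred on the current point,
        # so Q(x; x') / Q(x'; x) = 1 and a = P*(x') / P*(x).
        x_prop = x + rng.gauss(0.0, step_size)
        a = p_star(x_prop) / p_star(x)
        if a >= 1.0 or rng.random() < a:
            x = x_prop       # accept the proposed state
        chain.append(x)      # a rejection writes the current state again
    return chain

chain = metropolis(p_star, x0=5.0, n_steps=20000)
# Successive states are dependent; discard an initial burn-in portion
# before treating the remainder as (dependent) samples from P.
samples = chain[2000:]
mean = sum(samples) / len(samples)
```

Note that rejected proposals still append the unchanged state to `chain`, exactly as the text describes: a rejection is not a discarded sample, unlike in rejection sampling.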
