Information Theory, Inference, and Learning Algorithms - Inference Group

converge to the target density as a function of m. Find the optimal m and deduce how long the Metropolis method must run for.

Compare the result with the results for an evolving population under natural selection found in Chapter 19.

The insight that the fastest progress a standard Metropolis method can make, in information terms, is about one bit per iteration gives a strong motivation for speeding up the algorithm. This chapter has already reviewed several methods for reducing random-walk behaviour. Do these methods also speed up the rate at which information is acquired?

Exercise 30.6.[4] Does Gibbs sampling, which is a smart Metropolis method whose proposal distributions do depend on P(x), allow information about P(x) to leak out at a rate faster than one bit per iteration? Find toy examples in which this question can be precisely investigated.

Exercise 30.7.[4] Hamiltonian Monte Carlo is another smart Metropolis method in which the proposal distributions depend on P(x). Can Hamiltonian Monte Carlo extract information about P(x) at a rate faster than one bit per iteration?

Exercise 30.8.[5] In importance sampling, the weight w_r = P*(x^(r))/Q*(x^(r)), a floating-point number, is computed and retained until the end of the computation. In contrast, in the dumb Metropolis method, the ratio a = P*(x')/P*(x) is reduced to a single bit ('is a bigger than or smaller than the random number u?'). Thus in principle importance sampling preserves more information about P* than does dumb Metropolis. Can you find a toy example in which this extra information does indeed lead to faster convergence of importance sampling than Metropolis? Can you design a Markov chain Monte Carlo algorithm that moves around adaptively, like a Metropolis method, and that retains more useful information about the value of P*, like importance sampling?

In Chapter 19 we noticed that an evolving population of N individuals can make faster evolutionary progress if the individuals engage in sexual reproduction. This observation motivates looking at Monte Carlo algorithms in which multiple parameter vectors x are evolved and interact.

30.6 Multi-state methods

In a multi-state method, multiple parameter vectors x are maintained; they evolve individually under moves such as Metropolis and Gibbs; there are also interactions among the vectors. The intention is either that eventually all the vectors x should be samples from P(x) (as illustrated by Skilling's leapfrog method), or that information associated with the final vectors x should allow us to approximate expectations under P(x), as in importance sampling.

Genetic methods

Genetic algorithms are not often described by their proponents as Monte Carlo algorithms, but I think this is the correct categorization, and an ideal genetic algorithm would be one that can be proved to be a valid Monte Carlo algorithm that converges to a specified density.
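
As a concrete illustration (my own toy construction, not from the original text), here is a minimal sketch in Python of such a multi-state scheme: two vectors evolve individually under ordinary random-walk Metropolis moves, and a crossover interaction exchanges their tails. Because the crossover move is its own inverse, it is a symmetric proposal, so accepting it with the usual Metropolis rule applied to the product density P(x1)P(x2) keeps the pair correctly distributed. The nearest-neighbour coupling in the target is an assumption chosen only to make crossover non-trivial.

    import numpy as np

    rng = np.random.default_rng(1)

    def log_p_star(x):
        # Log of an unnormalized target density over a vector x. The
        # nearest-neighbour coupling term is a toy choice that makes
        # crossover moves non-trivial.
        return -0.5 * np.sum(x**2) - 0.5 * np.sum((x[1:] - x[:-1])**2)

    def metropolis_update(x):
        # Ordinary random-walk Metropolis move applied to one vector.
        x_new = x + 0.5 * rng.normal(size=x.shape)
        if np.log(rng.uniform()) < log_p_star(x_new) - log_p_star(x):
            return x_new
        return x

    # Two states evolving on the product density P(x1) P(x2).
    x1 = rng.normal(size=10)
    x2 = rng.normal(size=10)

    for _ in range(1000):
        # Individual moves for each vector ...
        x1 = metropolis_update(x1)
        x2 = metropolis_update(x2)
        # ... plus an interaction: crossover exchanges the tails of the
        # two vectors beyond a random point c. The move is its own
        # inverse, hence a symmetric proposal.
        c = int(rng.integers(1, len(x1)))
        y1 = np.concatenate([x1[:c], x2[c:]])
        y2 = np.concatenate([x2[:c], x1[c:]])
        log_a = (log_p_star(y1) + log_p_star(y2)
                 - log_p_star(x1) - log_p_star(x2))
        if np.log(rng.uniform()) < log_a:
            x1, x2 = y1, y2

Here the crossover acceptance rule is what makes the genetic-style move a valid Monte Carlo step; dropping it, as a plain genetic algorithm does, would in general leave the pair converging to the wrong density.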
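
Returning to Exercise 30.6, one toy example in which the question can be set up precisely (again my own choice, not the book's) is a bivariate Gaussian with correlation rho: the Gibbs proposals are the exact conditional distributions, so they depend on P(x) and are always accepted, and the information gained per sweep can be compared with the one-bit budget of dumb Metropolis.

    import numpy as np

    rng = np.random.default_rng(2)

    # Bivariate Gaussian target with unit marginals and correlation rho.
    rho = 0.95
    cond_sd = np.sqrt(1.0 - rho**2)  # sd of each exact conditional

    x1, x2 = 0.0, 0.0
    samples = []
    for _ in range(1000):
        # Each Gibbs update resamples one coordinate from its exact
        # conditional given the other -- a proposal that depends on P(x)
        # and is accepted with probability 1.
        x1 = rng.normal(rho * x2, cond_sd)
        x2 = rng.normal(rho * x1, cond_sd)
        samples.append((x1, x2))

    samples = np.array(samples)
    print(np.corrcoef(samples.T)[0, 1])  # should approach rho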
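
And for Exercise 30.8, the following sketch (with an assumed standard-Gaussian target P* and a wider Gaussian sampler Q*, both my own choices) makes the contrast concrete: importance sampling retains the full floating-point weight w_r for every sample, while dumb Metropolis reduces each ratio a to a single accept/reject bit.

    import numpy as np

    rng = np.random.default_rng(0)

    def p_star(x):
        # Unnormalized target density P*(x): standard Gaussian shape.
        return np.exp(-0.5 * x**2)

    def q_star(x):
        # Unnormalized sampler density Q*(x): a wider Gaussian.
        return np.exp(-0.5 * (x / 2.0)**2)

    # Importance sampling: each sample keeps a floating-point weight w_r.
    xs = rng.normal(0.0, 2.0, size=1000)      # draws from Q
    w = p_star(xs) / q_star(xs)               # w_r = P*(x^(r)) / Q*(x^(r))
    is_estimate = np.sum(w * xs**2) / np.sum(w)

    # Dumb Metropolis: the ratio a = P*(x') / P*(x) is reduced to one bit.
    x = 0.0
    samples = []
    for _ in range(1000):
        x_new = x + rng.normal()               # random-walk proposal
        a = p_star(x_new) / p_star(x)
        if rng.uniform() < a:                  # only 'accepted or not' survives
            x = x_new
        samples.append(x)
    mh_estimate = np.mean(np.array(samples)**2)

    # Both estimate E[x^2] = 1 under P; the exercise asks which converges
    # faster, and how to combine the two ideas.
    print(is_estimate, mh_estimate)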
