
Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981
You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

396  30 — Efficient Monte Carlo Methods

I'll use R to denote the number of vectors in the population. We aim to have P^*({x^{(r)}}_1^R) = ∏_r P^*(x^{(r)}). A genetic algorithm involves moves of two or three types.

First, individual moves in which one state vector is perturbed, x^{(r)} → x^{(r)'}, which could be performed using any of the Monte Carlo methods we have mentioned so far.
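As a concrete illustration (not from the text), an individual move can be a plain Metropolis update applied to one population member at a time. In this minimal sketch the target `log_p_star` (a 1-D Gaussian), the step size, and the population size are all illustrative assumptions:

```python
import math
import random

random.seed(0)  # reproducible for this sketch

def log_p_star(x):
    """Illustrative unnormalized log-target: a 1-D standard Gaussian."""
    return -0.5 * x * x

def individual_move(x, step=0.5):
    """Metropolis update of one state vector: perturb x -> x' and
    accept with probability min(1, P*(x') / P*(x))."""
    x_new = x + random.gauss(0.0, step)          # symmetric proposal
    log_a = log_p_star(x_new) - log_p_star(x)    # log acceptance ratio
    if math.log(random.random()) < log_a:
        return x_new                             # accept
    return x                                     # reject

# Apply the individual move to every member of a population of R = 10.
population = [random.gauss(0.0, 3.0) for _ in range(10)]
population = [individual_move(x) for x in population]
```

Because each member is updated independently, this alone is just R parallel copies of a standard Metropolis sampler.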

Second, we allow crossover moves of the form x, y → x', y'; in a typical crossover move, the progeny x' receives half his state vector from one parent, x, and half from the other, y; the secret of success in a genetic algorithm is that the parameter x must be encoded in such a way that the crossover of two independent states x and y, both of which have good fitness P^*, should have a reasonably good chance of producing progeny who are equally fit. This constraint is a hard one to satisfy in many problems, which is why genetic algorithms are mainly talked about and hyped up, and rarely used by serious experts.

Having introduced a crossover move x, y → x', y', we need to choose an acceptance rule. One easy way to obtain a valid algorithm is to accept or reject the crossover proposal using the Metropolis rule with P^*({x^{(r)}}_1^R) as the target density – this involves comparing the fitnesses before and after the crossover using the ratio

    P^*(x') P^*(y') / (P^*(x) P^*(y)).    (30.15)
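A crossover move with this Metropolis acceptance rule can be sketched as follows. The bit-vector encoding, the Ising-like fitness `p_star`, and single-point crossover are illustrative assumptions, but the acceptance ratio is exactly the one in equation (30.15):

```python
import math
import random

random.seed(1)

def p_star(x):
    """Illustrative unnormalized target on bit vectors: favours runs
    of equal neighbouring bits (an Ising-like fitness)."""
    return math.exp(sum(1.0 for a, b in zip(x, x[1:]) if a == b))

def crossover_move(x, y):
    """Propose x, y -> x', y' by single-point crossover, then accept
    or reject the pair using the Metropolis ratio (30.15)."""
    c = random.randrange(1, len(x))    # crossover point
    x_new = x[:c] + y[c:]
    y_new = y[:c] + x[c:]
    ratio = (p_star(x_new) * p_star(y_new)) / (p_star(x) * p_star(y))
    if random.random() < ratio:
        return x_new, y_new            # accept the progeny
    return x, y                        # reject: keep the parents

x = [1, 1, 1, 0, 0, 0]
y = [0, 0, 0, 1, 1, 1]
x, y = crossover_move(x, y)
```

Applying single-point crossover at the same point again recovers the parents, so the proposal is reversible, which is what the detailed-balance argument requires.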

If the crossover operator is reversible then we have an easy proof that this procedure satisfies detailed balance and so is a valid component in a chain converging to P^*({x^{(r)}}_1^R).

⊲ Exercise 30.9.[3] Discuss whether the above two operators, individual variation and crossover with the Metropolis acceptance rule, will give a more efficient Monte Carlo method than a standard method with only one state vector and no crossover.

The reason why the sexual community could acquire information faster than the asexual community in Chapter 19 was that the crossover operation produced diversity with standard deviation √G; the Blind Watchmaker was then able to convey lots of information about the fitness function by killing off the less fit offspring. The above two operators do not offer a speed-up of √G compared with standard Monte Carlo methods because there is no killing. What's required, in order to obtain a speed-up, is two things: multiplication and death; and at least one of these must operate selectively. Either we must kill off the less-fit state vectors, or we must allow the more-fit state vectors to give rise to more offspring. While it's easy to sketch these ideas, it is hard to define a valid method for doing it.

Exercise 30.10.[5] Design a birth rule and a death rule such that the chain converges to P^*({x^{(r)}}_1^R).

I believe this is still an open research problem.

Particle filters<br />

Particle filters, which are particularly popular in inference problems involving temporal tracking, are multistate methods that mix the ideas of importance sampling and Markov chain Monte Carlo. See Isard and Blake (1996), Isard and Blake (1998), Berzuini et al. (1997), Berzuini and Gilks (2001), and Doucet et al. (2001).
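As a rough illustration of the importance-sampling-plus-resampling idea behind particle filters, here is a minimal bootstrap-filter step; the 1-D random-walk dynamics, the Gaussian observation model, and all parameter values are illustrative assumptions, not taken from the references above:

```python
import math
import random

random.seed(2)

def bootstrap_filter_step(particles, obs, step=0.5, obs_sigma=1.0):
    """One step of a bootstrap particle filter for a 1-D random walk
    observed with Gaussian noise: propagate, weight, resample."""
    # 1. Propagate each particle through the assumed dynamics.
    moved = [x + random.gauss(0.0, step) for x in particles]
    # 2. Importance-weight each particle by the observation likelihood.
    weights = [math.exp(-0.5 * ((obs - x) / obs_sigma) ** 2)
               for x in moved]
    # 3. Selective multiplication and death: resample in proportion to
    #    weight, so fit particles multiply and unfit ones die.
    return random.choices(moved, weights=weights, k=len(moved))

particles = [random.gauss(0.0, 5.0) for _ in range(200)]
for obs in [0.2, 0.4, 0.7]:
    particles = bootstrap_filter_step(particles, obs)
```

Note that step 3 is exactly the selective multiplication and death discussed above: resampling duplicates high-weight particles and discards low-weight ones while keeping the population size fixed.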
