
It is straightforward to calculate all the terms on the right hand side except for the denominator $P[\mathbf{d};\mathcal{G}]$, which involves a sum over all the possible states of $\mathbf{x}$ and $\mathbf{y}$ (a set which grows exponentially large as the number of elements in $\mathbf{x}$ and $\mathbf{y}$ grows). Thus, an approximation to $P[\mathbf{x},\mathbf{y}\,|\,\mathbf{d};\mathcal{G}]$ is usually required. The stochastic version of the Helmholtz machine (the only version we discuss here) uses for approximate recognition a bottom-up belief network (see figure 1) over exactly the same units, giving a probability distribution $Q[\mathbf{x},\mathbf{y}\,|\,\mathbf{d};\mathcal{R}] = Q[\mathbf{y}\,|\,\mathbf{d};\mathcal{R}]\,Q[\mathbf{x}\,|\,\mathbf{y};\mathcal{R}]$ using a separate set of parameters, the bottom-up biases and weights $\mathcal{R} = \{\mathbf{r}^x, \mathbf{r}^y, \mathbf{R}^{dy}, \mathbf{R}^{yx}\}$. A critical approximation is that the recognition model is assumed to be factorial in the bottom-up direction, eg $y_j$ is independent of $y_k$ given $\mathbf{d}$. Over the course of learning, it is intended that $Q[\mathbf{x},\mathbf{y}\,|\,\mathbf{d};\mathcal{R}]$ should come to be as close to $P[\mathbf{x},\mathbf{y}\,|\,\mathbf{d};\mathcal{G}]$ as possible. Just as it is simple to generate a sample, ie a fantasy, top-down from the generative model, it is easy to generate a sample, ie to recognize the input in terms of its generators, bottom-up from the recognition model.
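To make the two sampling passes concrete, the following is a minimal sketch of a stochastic Helmholtz machine with binary stochastic units and logistic (sigmoid) conditional probabilities, the standard parameterization in the Helmholtz machine literature. The layer sizes, variable names (n_d, n_y, n_x, G_xy, G_yd, R_dy, R_yx, and the bias vectors), and parameter values are illustrative assumptions, not anything specified in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sample_bernoulli(p):
    """Draw a binary vector whose elements are 1 with the given probabilities."""
    return (rng.random(p.shape) < p).astype(float)

# Illustrative layer sizes: d (input), y (middle), x (top).
n_d, n_y, n_x = 8, 4, 2

# Generative parameters G: top-down biases and weights (x -> y -> d).
g_x  = np.zeros(n_x)                     # biases for the top layer x
G_xy = rng.normal(0, 0.1, (n_x, n_y))    # weights x -> y
g_y  = np.zeros(n_y)
G_yd = rng.normal(0, 0.1, (n_y, n_d))    # weights y -> d
g_d  = np.zeros(n_d)

# Recognition parameters R: bottom-up biases and weights (d -> y -> x).
R_dy = rng.normal(0, 0.1, (n_d, n_y))    # weights d -> y
r_y  = np.zeros(n_y)
R_yx = rng.normal(0, 0.1, (n_y, n_x))    # weights y -> x
r_x  = np.zeros(n_x)

def fantasy():
    """Top-down ancestral sample (a 'fantasy') from the generative model P[x, y, d; G]."""
    x = sample_bernoulli(sigmoid(g_x))
    y = sample_bernoulli(sigmoid(x @ G_xy + g_y))
    d = sample_bernoulli(sigmoid(y @ G_yd + g_d))
    return x, y, d

def recognize(d):
    """Bottom-up sample from the factorial recognition model Q[x, y | d; R]."""
    y = sample_bernoulli(sigmoid(d @ R_dy + r_y))
    x = sample_bernoulli(sigmoid(y @ R_yx + r_x))
    return x, y

x_f, y_f, d_f = fantasy()      # a fantasy, as used in the sleep phase
x_r, y_r = recognize(d_f)      # recognition of an input, as used in the wake phase
```

Both passes are single ancestral sweeps through factorial conditional distributions, which is why sampling is cheap in either direction even though computing $P[\mathbf{x},\mathbf{y}\,|\,\mathbf{d};\mathcal{G}]$ exactly is not.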

4 Wake-Sleep Learning

As for many unsupervised learning methods, the underlying goal of wake-sleep learning is to perform maximum likelihood density estimation by maximising the log probability of the observed data under the generative model, that is $\sum_t \log P[\mathbf{d}^t;\mathcal{G}]$. One key idea, due to Neal and Hinton (1998) and Zemel (1994), is that

\begin{align}
\log P[\mathbf{d};\mathcal{G}] &= \sum_{\mathbf{x},\mathbf{y}} P[\mathbf{x},\mathbf{y}\,|\,\mathbf{d};\mathcal{G}]\,\log P[\mathbf{x},\mathbf{y},\mathbf{d};\mathcal{G}] + H\big[P[\mathbf{x},\mathbf{y}\,|\,\mathbf{d};\mathcal{G}]\big] \tag{7}\\
&\geq \sum_{\mathbf{x},\mathbf{y}} Q[\mathbf{x},\mathbf{y}\,|\,\mathbf{d};\mathcal{R}]\,\log P[\mathbf{x},\mathbf{y},\mathbf{d};\mathcal{G}] + H\big[Q[\mathbf{x},\mathbf{y}\,|\,\mathbf{d};\mathcal{R}]\big] \tag{8}\\
&= \log P[\mathbf{d};\mathcal{G}] - \mathrm{KL}\big[Q[\mathbf{x},\mathbf{y}\,|\,\mathbf{d};\mathcal{R}],\, P[\mathbf{x},\mathbf{y}\,|\,\mathbf{d};\mathcal{G}]\big] \tag{9}\\
&= \mathcal{F}[\mathbf{d};\mathcal{G},\mathcal{R}] \tag{10}
\end{align}

where $H[P] = -\sum_a P[a]\log P[a]$ is the entropy of probability distribution $P$, $\mathrm{KL}[Q, P] = \sum_a Q[a]\log(Q[a]/P[a])$ is the Kullback-Leibler (KL) divergence between two distributions $Q$ and $P$, and, in inequality 8, the fact that the KL divergence is never negative has been used.
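Because the sum over $(\mathbf{x},\mathbf{y})$ can be carried out exactly when the hidden layers are tiny, the identity between the bound in (8) and the form in (9), and hence the inequality itself, can be checked numerically by brute-force enumeration. The sketch below does this for a two-layer binary model with logistic conditionals; the layer sizes, random parameter values, and the particular observation d are illustrative assumptions, not anything from the text.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def bern(bit, p):
    """Probability of a binary vector `bit` under independent Bernoulli(p) units."""
    return float(np.prod(np.where(bit == 1, p, 1.0 - p)))

# Tiny illustrative model: 2 top units x, 2 middle units y, 3 input units d,
# so the sum over all (x, y) states (4 * 4 = 16 terms) is exact.
n_x, n_y, n_d = 2, 2, 3
g_x, G_xy = np.zeros(n_x), rng.normal(0, 0.5, (n_x, n_y))
g_y, G_yd = np.zeros(n_y), rng.normal(0, 0.5, (n_y, n_d))
R_dy, r_y = rng.normal(0, 0.5, (n_d, n_y)), np.zeros(n_y)
R_yx, r_x = rng.normal(0, 0.5, (n_y, n_x)), np.zeros(n_x)

d = np.array([1.0, 0.0, 1.0])        # one observed data vector (illustrative)

def log_P_xyd(x, y):
    """log P[x, y, d; G] for the fixed observation d."""
    return np.log(bern(x, sigmoid(g_x)) *
                  bern(y, sigmoid(x @ G_xy + g_y)) *
                  bern(d, sigmoid(y @ G_yd + g_d_zero)))

g_d_zero = np.zeros(n_d)             # generative biases for the input layer

def Q_xy(x, y):
    """Q[x, y | d; R], the factorial bottom-up recognition distribution."""
    return bern(y, sigmoid(d @ R_dy + r_y)) * bern(x, sigmoid(y @ R_yx + r_x))

states = [np.array(s, dtype=float)
          for s in itertools.product([0, 1], repeat=n_x + n_y)]
pairs = [(s[:n_x], s[n_x:]) for s in states]

# Equation (8): F = sum_{x,y} Q log P[x,y,d] + H[Q].
F = (sum(Q_xy(x, y) * log_P_xyd(x, y) for x, y in pairs)
     - sum(Q_xy(x, y) * np.log(Q_xy(x, y)) for x, y in pairs))

# Equation (9): log P[d] - KL[Q, P[x,y|d]].
log_P_d = np.log(sum(np.exp(log_P_xyd(x, y)) for x, y in pairs))
KL = sum(Q_xy(x, y) * (np.log(Q_xy(x, y)) - (log_P_xyd(x, y) - log_P_d))
         for x, y in pairs)

assert np.isclose(F, log_P_d - KL)   # (8) and (9) describe the same quantity
assert F <= log_P_d + 1e-12          # the bound in (8), since KL >= 0
```

The check makes the role of $\mathcal{F}$ explicit: it equals the log likelihood minus the KL divergence from the recognition distribution to the true posterior, so raising $\mathcal{F}$ either raises $\log P[\mathbf{d};\mathcal{G}]$ or pulls $Q$ towards $P[\mathbf{x},\mathbf{y}\,|\,\mathbf{d};\mathcal{G}]$.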
