⊲ Exercise 43.2.[2] Derive the gradient of the log likelihood with respect to v_ijk.

It is possible that the spines found on biological neurons are responsible for detecting correlations between small numbers of incoming signals. However, to capture statistics of high enough order to describe the ensemble of images of chairs well would require an unimaginable number of terms. To capture merely the fourth-order statistics in a 128 × 128 pixel image, we need more than 10^7 parameters.

So measuring moments of images is not a good way to describe their underlying structure. Perhaps what we need instead or in addition are hidden variables, also known to statisticians as latent variables. This is the important innovation introduced by Hinton and Sejnowski (1986). The idea is that the high-order correlations among the visible variables are described by including extra hidden variables and sticking to a model that has only second-order interactions between its variables; the hidden variables induce higher-order correlations between the visible variables.

43.2 Boltzmann machine with hidden units

We now add hidden neurons to our stochastic model. These are neurons that do not correspond to observed variables; they are free to play any role in the probabilistic model defined by equation (43.4). They might actually take on interpretable roles, effectively performing 'feature extraction'.

Learning in Boltzmann machines with hidden units

The activity rule of a Boltzmann machine with hidden units is identical to that of the original Boltzmann machine. The learning rule can again be derived by maximum likelihood, but now we need to take into account the fact that the states of the hidden units are unknown. We will denote the states of the visible units by x, the states of the hidden units by h, and the generic state of a neuron (either visible or hidden) by y_i, with y ≡ (x, h). The state of the network when the visible neurons are clamped in state x^(n) is y^(n) ≡ (x^(n), h). The likelihood of W given a single data example x^(n) is

    P(x^{(n)} \mid W) = \sum_h P(x^{(n)}, h \mid W) = \sum_h \frac{1}{Z(W)} \exp\left[ \tfrac{1}{2} [y^{(n)}]^T W y^{(n)} \right],    (43.14)

where

    Z(W) = \sum_{x,h} \exp\left[ \tfrac{1}{2} y^T W y \right].    (43.15)

Equation (43.14) may also be written

    P(x^{(n)} \mid W) = \frac{Z_{x^{(n)}}(W)}{Z(W)},    (43.16)

where

    Z_{x^{(n)}}(W) = \sum_h \exp\left[ \tfrac{1}{2} [y^{(n)}]^T W y^{(n)} \right].    (43.17)

Differentiating the likelihood as before, we find that the derivative with respect to any weight w_ij is again the difference between a 'waking' term and a 'sleeping' term,

    \frac{\partial}{\partial w_{ij}} \ln P(\{x^{(n)}\}_1^N \mid W) = \sum_n \left\{ \langle y_i y_j \rangle_{P(h \mid x^{(n)}, W)} - \langle y_i y_j \rangle_{P(x, h \mid W)} \right\}.    (43.18)
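
The waking/sleeping contrast in equation (43.18) can be checked numerically for a network small enough to enumerate. The sketch below is not from the text: it assumes ±1-valued units with no separate bias terms, a symmetric weight matrix W with zero diagonal, and a hypothetical helper boltzmann_gradient that computes the waking average under P(h | x^(n), W) and the sleeping average under P(x, h | W) by brute-force summation over all configurations.

    import itertools
    import numpy as np

    def boltzmann_gradient(W, X):
        """Gradient (43.18) for a tiny Boltzmann machine, by exhaustive enumeration.
        W: symmetric (n_vis + n_hid) x (n_vis + n_hid) weight matrix, zero diagonal.
        X: array of shape (N, n_vis) holding the +/-1 data examples x^(n)."""
        n_vis = X.shape[1]
        n_hid = W.shape[0] - n_vis

        def unnorm(y):
            # unnormalized probability exp( (1/2) y^T W y ), as in (43.14)
            return np.exp(0.5 * y @ W @ y)

        hid_states = [np.array(h) for h in itertools.product([-1, 1], repeat=n_hid)]
        all_states = [np.array(y) for y in itertools.product([-1, 1], repeat=W.shape[0])]

        # 'Sleeping' term: <y_i y_j> under the free-running distribution P(x, h | W).
        weights = np.array([unnorm(y) for y in all_states])
        Z = weights.sum()
        sleeping = sum(w * np.outer(y, y) for w, y in zip(weights, all_states)) / Z

        # 'Waking' term: sum over data of <y_i y_j> under the posterior P(h | x^(n), W).
        waking = np.zeros_like(W, dtype=float)
        for x in X:
            ys = [np.concatenate([x, h]) for h in hid_states]   # y^(n) = (x^(n), h)
            ws = np.array([unnorm(y) for y in ys])
            ws /= ws.sum()                                      # posterior over h
            waking += sum(w * np.outer(y, y) for w, y in zip(ws, ys))

        # Summed over the N examples, as in (43.18).
        return waking - len(X) * sleeping

    # Illustrative usage (hypothetical sizes): 3 visible, 2 hidden units.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(5, 5)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
    X = rng.choice([-1, 1], size=(10, 3))
    grad = boltzmann_gradient(W, X)     # ascend the log likelihood: W += eta * grad

Exhaustive enumeration is only feasible for a handful of units; in larger networks both averages would have to be estimated by sampling, for example by Gibbs sampling.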
