Information Theory, Inference, and Learning ... - Inference Group
Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981 You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

524   43 — Boltzmann Machines

While the network is ‘asleep’, it ‘dreams’ about the world using the generative model (43.4), and measures the correlations between x_i and x_j in the model world; these correlations determine a proportional decrease in the weights. If the second-order correlations in the dream world match the correlations in the real world, then the two terms balance and the weights do not change.

Criticism of Hopfield networks and simple Boltzmann machines

Up to this point we have discussed Hopfield networks and Boltzmann machines in which all of the neurons correspond to visible variables x_i. The result is a probabilistic model that, when optimized, can capture the second-order statistics of the environment. [The second-order statistics of an ensemble P(x) are the expected values ⟨x_i x_j⟩ of all the pairwise products x_i x_j.] The real world, however, often has higher-order correlations that must be included if our description of it is to be effective. Often the second-order correlations in themselves may carry little or no useful information.

Consider, for example, the ensemble of binary images of chairs. We can imagine images of chairs with various designs – four-legged chairs, comfy chairs, chairs with five legs and wheels, wooden chairs, cushioned chairs, chairs with rockers instead of legs. A child can easily learn to distinguish these images from images of carrots and parrots. But I expect the second-order statistics of the raw data are useless for describing the ensemble. Second-order statistics only capture whether two pixels are likely to be in the same state as each other.
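The wake/sleep rule described above – raise each weight in proportion to the correlation ⟨x_i x_j⟩ measured on the data while awake, lower it in proportion to the correlation measured on dream samples – can be sketched numerically. This is a minimal illustration, not the book's code; the ±1 unit convention, the Gibbs sampler, and the function names are my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_sample(W, x, n_sweeps=5):
    """Run Gibbs sweeps on a Boltzmann machine with symmetric weights W
    (zero diagonal) and units taking values +/-1; returns the final state."""
    for _ in range(n_sweeps):
        for i in range(len(x)):
            a = W[i] @ x                        # activation of unit i
            p = 1.0 / (1.0 + np.exp(-2.0 * a))  # P(x_i = +1 | all others)
            x[i] = 1 if rng.random() < p else -1
    return x

def learning_step(W, data, eta=0.05, n_dreams=50):
    """One step of the wake/sleep rule: raise w_ij by the correlation
    <x_i x_j> measured on the data, lower it by the correlation measured
    on 'dream' samples drawn from the model by Gibbs sampling."""
    corr_awake = data.T @ data / len(data)
    dreams = np.array([gibbs_sample(W, rng.choice([-1.0, 1.0], size=len(W)))
                       for _ in range(n_dreams)])
    corr_asleep = dreams.T @ dreams / n_dreams
    dW = eta * (corr_awake - corr_asleep)
    np.fill_diagonal(dW, 0.0)                   # no self-connections
    return W + dW
```

When the two correlation matrices agree, dW vanishes and the weights stop changing, which is exactly the balance condition stated above.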
Higher-order concepts are needed to make a good generative model of images of chairs.

A simpler ensemble of images in which high-order statistics are important is the ‘shifter ensemble’, which comes in two flavours. Figure 43.1a shows a few samples from the ‘plain shifter ensemble’. In each image, the bottom eight pixels are a copy of the top eight pixels, either shifted one pixel to the left, or unshifted, or shifted one pixel to the right. (The top eight pixels are set at random.) This ensemble is a simple model of the visual signals from the two eyes arriving at early levels of the brain. The signals from the two eyes are similar to each other but may differ by small translations because of the varying depth of the visual world. This ensemble is simple to describe, but its second-order statistics convey no useful information. The correlation between one pixel and any of the three pixels above it is 1/3. The correlation between any other two pixels is zero.

Figure 43.1b shows a few samples from the ‘labelled shifter ensemble’. Here, the problem has been made easier by including an extra three neurons that label the visual image as being an instance of either the ‘shift left’, ‘no shift’, or ‘shift right’ sub-ensemble. But with this extra information, the ensemble is still not learnable using second-order statistics alone. The second-order correlation between any label neuron and any image neuron is zero. We need models that can capture higher-order statistics of an environment.

So, how can we develop such models? One idea might be to create models that directly capture higher-order correlations, such as:

  P′(x | W, V, …) = (1/Z′) exp( (1/2) Σ_ij w_ij x_i x_j + (1/6) Σ_ijk v_ijk x_i x_j x_k + · · · ).   (43.13)

Such higher-order Boltzmann machines are equally easy to simulate using stochastic updates, and the learning rule for the higher-order parameters v_ijk is equivalent to the learning rule for w_ij.
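As an illustrative sketch (not from the book; the function name and the dense array holding v_ijk are my own choices), the unnormalised log-probability of (43.13), truncated after the third-order term, can be evaluated directly:

```python
import numpy as np

def log_p_unnorm(x, W, V):
    """Unnormalised log-probability of equation (43.13): the pairwise
    term (1/2) sum_ij w_ij x_i x_j plus the third-order term
    (1/6) sum_ijk v_ijk x_i x_j x_k.  (The constant -log Z' is omitted.)"""
    pairwise = 0.5 * np.einsum('i,ij,j->', x, W, x)
    third = np.einsum('i,j,k,ijk->', x, x, x, V) / 6.0
    return pairwise + third
```

A stochastic update of unit i would then compare this quantity with x_i flipped against x_i unflipped, just as for the ordinary Boltzmann machine.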
Figure 43.1. The ‘shifter’ ensembles. (a) Four samples from the plain shifter ensemble. (b) Four corresponding samples from the labelled shifter ensemble.
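The plain shifter ensemble is easy to simulate, and doing so confirms the correlation values quoted above. A minimal sketch, assuming ±1 pixels and wraparound shifts at the image edges (the function name and these conventions are my own):

```python
import numpy as np

rng = np.random.default_rng(0)

def plain_shifter(n_samples, width=8):
    """Draw samples from the plain shifter ensemble: the top row of
    pixels is random +/-1, and the bottom row is a copy shifted one
    pixel left, not at all, or one pixel right (wrapping at the edges)."""
    top = rng.choice([-1.0, 1.0], size=(n_samples, width))
    shifts = rng.integers(-1, 2, size=n_samples)           # -1, 0, or +1
    bottom = np.array([np.roll(t, s) for t, s in zip(top, shifts)])
    return top, bottom
```

With many samples, the empirical correlation between a bottom pixel and the pixel directly above it comes out near 1/3 (it matches only when the shift is zero, which happens a third of the time), while pixels that can never be copies of each other are uncorrelated.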
