

maintaining the 2-layer structure characteristic of Restricted Boltzmann Machines.

The above learning method can be seen as a variational approximation wherein the constraint is that the weights in the higher levels ensure the complementary priors condition, therefore yielding a factorial posterior distribution. However, as weights in higher levels are learned, the priors for lower layers cease to be complementary, so the weights used during inference are incorrect. Nonetheless, it can be shown that each time the weights of a layer are adapted, the variational lower bound on the log probability of the training data is improved, consequently improving the overall generative model. The weights of the model are then fine-tuned in a final stage by performing an up and down pass of a variant of the wake-sleep algorithm (Hinton et al. 1995). Although the learning is unsupervised in the directed layers, the top two associative layers can be used to learn labeled data.
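The greedy scheme just described can be illustrated with a short sketch: each Restricted Boltzmann Machine is trained with one-step contrastive divergence (CD-1), and its hidden activations become the training data for the next layer. This is a minimal illustration under stated assumptions, not the implementation discussed in the text; the layer sizes, learning rate, and random `data` placeholder are assumptions, and biases are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.1, epochs=10, seed=0):
    """Train one RBM with CD-1; biases omitted to keep the sketch short."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((data.shape[1], n_hidden))
    for _ in range(epochs):
        # Positive phase: hidden probabilities and a binary sample given the data.
        h_prob = sigmoid(data @ W)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one Gibbs step down to the visibles and back up.
        v_recon = sigmoid(h_sample @ W.T)
        h_recon = sigmoid(v_recon @ W)
        # CD-1 update: data-driven statistics minus reconstruction statistics.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / data.shape[0]
    return W

def greedy_pretrain(data, layer_sizes):
    """Stack RBMs layer by layer, feeding each layer's activations upward."""
    weights, x = [], data
    for n_hidden in layer_sizes:
        W = train_rbm(x, n_hidden)
        weights.append(W)
        x = sigmoid(x @ W)  # hidden activities become "data" for the next RBM
    return weights

# Hypothetical usage: binary 784-dimensional inputs, three stacked layers.
data = np.random.default_rng(1).random((100, 784)).round()
weights = greedy_pretrain(data, [500, 500, 2000])
```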

Inference is achieved by a single up pass along the bottom directed layers, yielding the binary states of the units in the lower associative memory layer. Further Gibbs sampling or free-energy optimization activates the correct label unit at the top layer. The performance of the model on the MNIST digit recognition task was superior to that of previous models, including Support Vector Machines and back-propagation. This demonstrates that generative models can learn many more parameters than discriminative models without overfitting. The model is still limited in that top-down feedback during inference is restricted to the top associative layers. Additionally, it does not deal systematically with perceptual invariances. Instead, invariance arises as a consequence of the wide range of sample images that can be generated by the model for each given category.
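A correspondingly minimal sketch of this inference pass, reusing `sigmoid` and the hypothetical shapes from the pretraining example above: the input is propagated deterministically up the directed layers, and each candidate label is then clamped in the top associative memory, with labels scored here by the free energy of the top RBM rather than by explicit Gibbs sampling. `W_top`, the layer shapes, and this label-scoring shortcut are illustrative assumptions.

```python
def up_pass(x, directed_weights):
    """Deterministic up pass through the lower directed layers."""
    for W in directed_weights:
        x = sigmoid(x @ W)
    return x

def rbm_free_energy(v, W):
    """Free energy of an RBM with binary hidden units (biases omitted):
    F(v) = -sum_j log(1 + exp((v @ W)_j))."""
    return -np.sum(np.logaddexp(0.0, v @ W))

def classify(x, directed_weights, W_top, n_labels=10):
    """Clamp each candidate label as part of the top RBM's visible vector
    and return the label that yields the lowest free energy."""
    pen = up_pass(x, directed_weights)  # penultimate-layer activities
    scores = []
    for k in range(n_labels):
        v = np.concatenate([pen, np.eye(n_labels)[k]])  # [features, one-hot label]
        scores.append(rbm_free_energy(v, W_top))
    return int(np.argmin(scores))

# Hypothetical shapes: directed layers 784 -> 500 -> 500, then a top RBM
# over 500 + 10 visible units (features plus labels) and 2000 hidden units.
rng = np.random.default_rng(2)
dw = [0.01 * rng.standard_normal(s) for s in [(784, 500), (500, 500)]]
W_top = 0.01 * rng.standard_normal((510, 2000))
print(classify(data[0], dw, W_top))
```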

3.4.2.3 Models based on variational approximation methods

The free-energy model proposed by Friston (Friston 2003, 2005, Friston et al. 2006, Friston and Stephan 2007, Friston and Kiebel 2009, Friston 2010) has already been described in some detail in Section 3.1.1. It is based on a variational approximation and therefore converts the complex inference problem into an optimization task which tries to minimize the free energy between the true posterior distribution and the recognition distribution. By assuming a Gaussian approximation (Laplace assumption) to the recognition distribution, optimization becomes equiv-

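For reference, the variational identity underlying this formulation is standard rather than specific to this model: writing q(ϑ) for the recognition density over hidden causes ϑ, and p(y, ϑ) for the generative model of sensory data y, the free energy decomposes as

\[
F \;=\; \big\langle \ln q(\vartheta) - \ln p(y, \vartheta) \big\rangle_{q}
\;=\; D_{\mathrm{KL}}\big[\, q(\vartheta) \,\big\|\, p(\vartheta \mid y) \,\big] \;-\; \ln p(y).
\]

Because the Kullback-Leibler term is non-negative and -ln p(y) does not depend on q, minimizing F drives the recognition distribution toward the true posterior, which is the sense in which the free energy is minimized "between" the two distributions; under the Laplace assumption, q is Gaussian, so the optimization reduces to adjusting its mode and covariance.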
