
3.4. EXISTING MODELS

by the higher levels to infer additional characteristics of the image, such as the precise identity of the face.

Lewicki and Sejnowski (1997) demonstrate the efficiency of Gibbs sampling in learning higher level parameters in Bayesian networks. A simple 3-level network of stochastic binary variables with a 5x5 pixel input image is used to discover higher level motion patterns from the input image correlations (the Shifter problem). Importantly, feedback from the third layer, containing the global direction of motion, is used to disambiguate the local shift direction in layer two. The combination of information from multiple parents/causes was approximated using the Noisy-OR gate, previously described in Section 3.3.4.
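As an illustration of how a Noisy-OR gate combines evidence from several parents/causes, the short sketch below computes the activation probability of a binary child node. The weights and leak term are hypothetical values chosen for the example, not parameters of the model described above.

```python
import numpy as np

def noisy_or(parent_states, connection_probs, leak=0.0):
    """Noisy-OR gate: probability that a binary child is active given its parents.

    parent_states    -- binary vector, 1 if the corresponding parent/cause is active
    connection_probs -- P(child = 1 | only parent i active), one value per parent
    leak             -- probability that the child is active with no parent active
    """
    parent_states = np.asarray(parent_states, dtype=float)
    connection_probs = np.asarray(connection_probs, dtype=float)
    # Each active parent independently fails to switch the child on with
    # probability (1 - q_i); the child stays off only if every active cause
    # (and the leak) fails.
    p_off = (1.0 - leak) * np.prod((1.0 - connection_probs) ** parent_states)
    return 1.0 - p_off

# Hypothetical example: two of three causes active, with different link strengths.
print(noisy_or([1, 0, 1], [0.8, 0.5, 0.3], leak=0.05))   # ~0.867
```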

Hinton et al. (2006) proposed a new type of network called a deep belief net, which is composed of a Bayesian network (directed acyclic graph) with two undirected associative memory layers at the top. The motivation for this model is to ease the intractable unsupervised learning process in hierarchical Bayesian networks, where, in order to learn the weights of the bottom layer, it is necessary to calculate the posterior probability, which depends not only on the likelihood (bottom-up data) but also on the prior (top-down data). In other words, as a result of the explaining-away effect, the weights of all the higher layers are required. Further, it is necessary to sum over all possible configurations of the higher variables in order to obtain the bottom layer prior.
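Stated generically (this notation is illustrative, not taken from the thesis), for a directed hierarchy with hidden layers h^(1), ..., h^(L) above the visible layer v, the posterior over the first hidden layer is

```latex
P(h^{(1)} \mid v) \;\propto\; P(v \mid h^{(1)})\, P(h^{(1)}),
\qquad
P(h^{(1)}) \;=\; \sum_{h^{(2)},\dots,h^{(L)}}
P(h^{(1)} \mid h^{(2)})\, P(h^{(2)} \mid h^{(3)}) \cdots P(h^{(L)}),
```

so the prior term couples the bottom-layer weights to every higher layer, and the sum over higher-layer configurations grows exponentially with the number of hidden units.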

The authors introduce the concept of complementary priors, which are prior distributions that, when multiplied by the corresponding likelihood function, yield a posterior distribution which can be factorized. This implies eliminating the explaining-away effect, thus making each hidden layer independent of its parents' weights. This yields a network which is equivalent to a Restricted Boltzmann Machine, i.e. a network with an independent hidden layer of binary variables with undirected symmetric connections to a layer of observed nodes. Under these conditions a fast learning algorithm is derived which obtains the approximate parameters of the network layer by layer. First, a visible layer (input image) is used to train the bottom hidden layer of the network. After learning the weights of the hidden layer, the activations of that layer, given the input image, are used as the input data for the hidden layer above, thus always training a single layer at a time.
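As a rough sketch of this layer-by-layer procedure, the Python code below trains a stack of Restricted Boltzmann Machines with one step of contrastive divergence (CD-1). The layer sizes, learning rate and random input data are hypothetical; the sketch only illustrates the greedy training scheme, not the exact model of Hinton et al. (2006).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.05):
    """Train one RBM with CD-1; return its weights and hidden activations."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)          # visible biases
    b_h = np.zeros(n_hidden)           # hidden biases
    for _ in range(epochs):
        # Positive phase: hidden probabilities given the training data.
        p_h = sigmoid(data @ W + b_h)
        h = (rng.random(p_h.shape) < p_h).astype(float)
        # Negative phase: one reconstruction step (CD-1).
        p_v = sigmoid(h @ W.T + b_v)
        p_h_recon = sigmoid(p_v @ W + b_h)
        # Approximate gradient of the log-likelihood.
        n = data.shape[0]
        W += lr * (data.T @ p_h - p_v.T @ p_h_recon) / n
        b_v += lr * (data - p_v).mean(axis=0)
        b_h += lr * (p_h - p_h_recon).mean(axis=0)
    return W, sigmoid(data @ W + b_h)

# Greedy stacking: each layer is trained on the activations of the layer below.
data = (rng.random((100, 25)) < 0.5).astype(float)   # hypothetical 5x5 binary inputs
layer_sizes = [50, 20]                               # hypothetical hidden layer sizes
activations = data
for size in layer_sizes:
    W, activations = train_rbm(activations, size)
```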

