34.2: The generative model for independent component analysis

Algorithm 34.2. Independent component analysis – online steepest ascents version. See also algorithm 34.4, which is to be preferred.

Repeat for each datapoint x:

1. Put x through a linear mapping:

       a = W x.

2. Put a through a nonlinear map:

       z_i = \phi_i(a_i),

   where a popular choice for φ is \phi_i(a_i) = -\tanh(a_i).

3. Adjust the weights in accordance with

       \Delta W \propto [W^{T}]^{-1} + z x^{T}.

…and z_i = φ_i(a_i), which indicates in which direction a_i needs to change to make the probability of the data greater. We may then obtain the gradient with respect to G_{ji} using equations (34.10) and (34.11):

    \frac{\partial}{\partial G_{ji}} \ln P(x^{(n)} \mid G, \mathcal{H}) = -W_{ij} - a_i z_{i'} W_{i'j} .    (34.14)

Or alternatively, the derivative with respect to W_{ij}:

    \frac{\partial}{\partial W_{ij}} \ln P(x^{(n)} \mid G, \mathcal{H}) = G_{ji} + x_j z_i .    (34.15)

If we choose to change W so as to ascend this gradient, we obtain the learning rule

    \Delta W \propto [W^{T}]^{-1} + z x^{T} .    (34.16)

The algorithm so far is summarized in algorithm 34.2.

Choices of φ

The choice of the function φ defines the assumed prior distribution of the latent variable s.

Let's first consider the linear choice, φ_i(a_i) = −κ a_i, which implicitly (via equation 34.13) assumes a Gaussian distribution on the latent variables. The Gaussian distribution on the latent variables is invariant under rotation of the latent variables, so there can be no evidence favouring any particular alignment of the latent variable space. The linear algorithm is thus uninteresting in that it will never recover the matrix G or the original sources. Our only hope is thus that the sources are non-Gaussian. Thankfully, most real sources have non-Gaussian distributions; often they have heavier tails than Gaussians.

We thus move on to the popular tanh nonlinearity. If

    \phi_i(a_i) = -\tanh(a_i)    (34.17)

then implicitly we are assuming

    p_i(s_i) \propto \frac{1}{\cosh(s_i)} \propto \frac{1}{e^{s_i} + e^{-s_i}} .    (34.18)

This is a heavier-tailed distribution for the latent variables than the Gaussian distribution.
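The online learning rule of algorithm 34.2 is simple enough to sketch directly. The following is a minimal illustration, not code from the book: it assumes NumPy, the tanh nonlinearity of equation (34.17), an arbitrary learning rate eta, and a toy pair of Laplacian sources mixed by a hypothetical matrix G; all of these names and settings are illustrative choices rather than anything specified in the text.

    import numpy as np

    def ica_online(X, eta=0.01, n_sweeps=20, seed=None):
        """Online steepest-ascent ICA (algorithm 34.2) with phi(a) = -tanh(a).

        X is an (N, I) array of N datapoints x; returns the unmixing matrix W.
        """
        rng = np.random.default_rng(seed)
        N, I = X.shape
        W = np.eye(I)                           # starting point (an assumption)
        for _ in range(n_sweeps):
            for n in rng.permutation(N):        # repeat for each datapoint x
                x = X[n]
                a = W @ x                       # step 1: a = W x
                z = -np.tanh(a)                 # step 2: z_i = phi_i(a_i), eq. (34.17)
                # step 3: Delta W proportional to [W^T]^{-1} + z x^T, eq. (34.16)
                W += eta * (np.linalg.inv(W.T) + np.outer(z, x))
        return W

    # Toy usage: mix two heavy-tailed (Laplacian) sources with a hypothetical
    # matrix G and try to recover them.  If learning succeeds, W G should be
    # roughly a scaled permutation of the identity, i.e. the sources are
    # recovered up to ordering and scale.
    rng = np.random.default_rng(0)
    S = rng.laplace(size=(5000, 2))             # non-Gaussian sources (illustrative)
    G = np.array([[1.0, 0.6], [0.4, 1.0]])      # hypothetical mixing matrix
    X = S @ G.T                                 # observations x = G s
    W = ica_online(X, eta=0.01, n_sweeps=20, seed=1)
    print(np.round(W @ G, 2))

Because φ = −tanh corresponds to the heavy-tailed 1/cosh prior of equation (34.18), a sketch like this can separate heavy-tailed sources; with Gaussian sources it could not, for the rotation-invariance reason discussed above.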
