Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981
You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

438                34 — Independent Component Analysis and Latent Variable Modelling

H, is:
$$
P\bigl(\{x^{(n)}, s^{(n)}\}_{n=1}^{N} \mid G, \mathcal{H}\bigr)
  = \prod_{n=1}^{N} \Bigl[ P\bigl(x^{(n)} \mid s^{(n)}, G, \mathcal{H}\bigr)\, P\bigl(s^{(n)} \mid \mathcal{H}\bigr) \Bigr]
  \tag{34.2}
$$
$$
  = \prod_{n=1}^{N} \left[ \left( \prod_j \delta\!\Bigl( x_j^{(n)} - \sum_i G_{ji}\, s_i^{(n)} \Bigr) \right)
    \left( \prod_i p_i\bigl(s_i^{(n)}\bigr) \right) \right].
  \tag{34.3}
$$

We assume that the vector x is generated without noise. This assumption is not usually made in latent variable modelling, since noise-free data are rare; but it makes the inference problem far simpler to solve.

The likelihood function

For learning about G from the data D, the relevant quantity is the likelihood function
$$
P(D \mid G, \mathcal{H}) = \prod_n P\bigl(x^{(n)} \mid G, \mathcal{H}\bigr)
\tag{34.4}
$$
which is a product of factors, each of which is obtained by marginalizing over the latent variables. When we marginalize over delta functions, remember that $\int \mathrm{d}s\, \delta(x - vs) f(s) = \frac{1}{v} f(x/v)$. We adopt summation convention at this point, such that, for example, $G_{ji} s_i^{(n)} \equiv \sum_i G_{ji} s_i^{(n)}$. A single factor in the likelihood is given by
$$
P\bigl(x^{(n)} \mid G, \mathcal{H}\bigr)
  = \int \mathrm{d}^{I} s^{(n)}\; P\bigl(x^{(n)} \mid s^{(n)}, G, \mathcal{H}\bigr)\, P\bigl(s^{(n)} \mid \mathcal{H}\bigr)
  \tag{34.5}
$$
$$
  = \int \mathrm{d}^{I} s^{(n)} \prod_j \delta\!\bigl( x_j^{(n)} - G_{ji} s_i^{(n)} \bigr) \prod_i p_i\bigl(s_i^{(n)}\bigr)
  \tag{34.6}
$$
$$
  = \frac{1}{|\det G|} \prod_i p_i\bigl(G^{-1}_{ij} x_j\bigr)
  \tag{34.7}
$$
$$
\Rightarrow \quad \ln P\bigl(x^{(n)} \mid G, \mathcal{H}\bigr)
  = -\ln |\det G| + \sum_i \ln p_i\bigl(G^{-1}_{ij} x_j\bigr).
  \tag{34.8}
$$

To obtain a maximum likelihood algorithm we find the gradient of the log likelihood. If we introduce $W \equiv G^{-1}$, the log likelihood contributed by a single example may be written:
$$
\ln P\bigl(x^{(n)} \mid G, \mathcal{H}\bigr) = \ln |\det W| + \sum_i \ln p_i(W_{ij} x_j).
\tag{34.9}
$$

We'll assume from now on that $\det W$ is positive, so that we can omit the absolute value sign. We will need the following identities:
$$
\frac{\partial}{\partial G_{ji}} \ln \det G = G^{-1}_{ij} = W_{ij}
\tag{34.10}
$$
$$
\frac{\partial}{\partial G_{ji}} G^{-1}_{lm} = -G^{-1}_{lj} G^{-1}_{im} = -W_{lj} W_{im}
\tag{34.11}
$$
$$
\frac{\partial}{\partial W_{ij}} f = -G_{jm} \left( \frac{\partial}{\partial G_{lm}} f \right) G_{li}.
\tag{34.12}
$$

Let us define $a_i \equiv W_{ij} x_j$,
$$
\phi_i(a_i) \equiv \mathrm{d} \ln p_i(a_i) / \mathrm{d}a_i,
\tag{34.13}
$$
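The identity (34.10) and the single-example log likelihood (34.9) are easy to check numerically. The sketch below verifies $\partial \ln \det G / \partial G_{ji} = W_{ij}$ by central finite differences and then evaluates (34.9) for one data point; the particular random matrix and the choice of a Laplacian source prior $p_i(a) = \frac{1}{2} e^{-|a|}$ are illustrative assumptions, not part of the text's derivation.

```python
import numpy as np

# Illustrative check of identity (34.10): d(ln det G)/dG_ji = (G^{-1})_ij = W_ij.
# The matrix G here is an arbitrary well-conditioned example with det G > 0.
rng = np.random.default_rng(0)
I = 3
G = rng.normal(size=(I, I)) + 3.0 * np.eye(I)
assert np.linalg.det(G) > 0          # the text assumes det W = 1/det G is positive
W = np.linalg.inv(G)

eps = 1e-6
grad = np.zeros((I, I))
for j in range(I):
    for i in range(I):
        Gp, Gm = G.copy(), G.copy()
        Gp[j, i] += eps
        Gm[j, i] -= eps
        # central finite difference of ln det G with respect to G_ji
        grad[j, i] = (np.log(np.linalg.det(Gp))
                      - np.log(np.linalg.det(Gm))) / (2 * eps)

# grad[j, i] should equal W[i, j], i.e. grad == W transposed
assert np.allclose(grad, W.T, atol=1e-5)

# Single-example log likelihood (34.9) with an assumed Laplacian prior
# p_i(a) = (1/2) exp(-|a|), so ln p_i(a) = -|a| - ln 2.
x = rng.normal(size=I)
a = W @ x                            # a_i = W_ij x_j (summation convention)
logp = np.log(np.linalg.det(W)) + np.sum(-np.abs(a) - np.log(2.0))
print("ln P(x | G, H) =", logp)
```

Note that the gradient of $\ln \det G$ with respect to the $(j,i)$ entry lands at position $(i,j)$ of $W$, which is exactly the index transposition written in (34.10).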
