Information Theory, Inference, and Learning ... - Inference Group
Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981 You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

The objective function for this approximation is
\[
G(Q_X, Q_Y) = \sum_{x,y} P(x, y) \log_2 \frac{P(x, y)}{Q_X(x) Q_Y(y)} .
\]
Show that the minimal value of $G$ is achieved when $Q_X$ and $Q_Y$ are equal to the marginal distributions over $x$ and $y$.

Now consider the alternative objective function
\[
F(Q_X, Q_Y) = \sum_{x,y} Q_X(x) Q_Y(y) \log_2 \frac{Q_X(x) Q_Y(y)}{P(x, y)} ;
\]
the probability distribution $P(x, y)$ shown in the margin is to be approximated by a separable distribution $Q(x, y) = Q_X(x) Q_Y(y)$. State the value of $F(Q_X, Q_Y)$ if $Q_X$ and $Q_Y$ are set to the marginal distributions over $x$ and $y$.

Show that $F(Q_X, Q_Y)$ has three distinct minima, identify those minima, and evaluate $F$ at each of them.

P(x, y):

           x = 1   x = 2   x = 3   x = 4
  y = 1     1/8     1/8      0       0
  y = 2     1/8     1/8      0       0
  y = 3      0       0      1/4      0
  y = 4      0       0       0      1/4

33.10 Solutions

Solution to exercise 33.5 (p.434). We need to know the relative entropy between two one-dimensional Gaussian distributions:
\[
\int \mathrm{d}x \, \mathrm{Normal}(x; 0, \sigma_Q) \ln \frac{\mathrm{Normal}(x; 0, \sigma_Q)}{\mathrm{Normal}(x; 0, \sigma_P)}
= \int \mathrm{d}x \, \mathrm{Normal}(x; 0, \sigma_Q) \left[ \ln \frac{\sigma_P}{\sigma_Q} - \frac{1}{2} x^2 \left( \frac{1}{\sigma_Q^2} - \frac{1}{\sigma_P^2} \right) \right]
\tag{33.55}
\]
\[
= \frac{1}{2} \left( \ln \frac{\sigma_P^2}{\sigma_Q^2} - 1 + \frac{\sigma_Q^2}{\sigma_P^2} \right) .
\tag{33.56}
\]
So, if we approximate $P$, whose variances are $\sigma_1^2$ and $\sigma_2^2$, by $Q$, whose variances are both $\sigma_Q^2$, we find
\[
F(\sigma_Q^2) = \frac{1}{2} \left( \ln \frac{\sigma_1^2}{\sigma_Q^2} - 1 + \frac{\sigma_Q^2}{\sigma_1^2} + \ln \frac{\sigma_2^2}{\sigma_Q^2} - 1 + \frac{\sigma_Q^2}{\sigma_2^2} \right) ;
\tag{33.57}
\]
differentiating,
\[
\frac{\mathrm{d}}{\mathrm{d} \ln(\sigma_Q^2)} F = \frac{1}{2} \left[ -2 + \frac{\sigma_Q^2}{\sigma_1^2} + \frac{\sigma_Q^2}{\sigma_2^2} \right] ,
\tag{33.58}
\]
which is zero when
\[
\frac{1}{\sigma_Q^2} = \frac{1}{2} \left( \frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2} \right) .
\tag{33.59}
\]
Thus we set the approximating distribution's inverse variance to the mean inverse variance of the target distribution $P$.

In the case $\sigma_1 = 10$ and $\sigma_2 = 1$, we obtain $\sigma_Q \simeq \sqrt{2}$, which is just a factor of $\sqrt{2}$ larger than $\sigma_2$, pretty much independent of the value of the larger standard deviation $\sigma_1$. Variational free energy minimization typically leads to approximating distributions whose length scales match the shortest length scale of the target distribution. The approximating distribution might be viewed as too compact.
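As a quick numerical sanity check of this result (a sketch, not part of the book's text), the following Python snippet minimizes the objective $F(\sigma_Q^2)$ of equation (33.57) by brute-force grid search and compares the minimizer with the closed-form optimum of equation (33.59), for $\sigma_1 = 10$ and $\sigma_2 = 1$. The function names `kl_gauss` and `F` are illustrative, not from the book.

```python
# Check equation (33.59): the optimal variance of a spherical Gaussian Q
# fitted to a diagonal Gaussian P (variances s1^2, s2^2) by minimizing
# KL(Q || P) satisfies  1/sigma_Q^2 = (1/2)(1/s1^2 + 1/s2^2).
import math

def kl_gauss(sq2, sp2):
    """KL( Normal(0, sq2) || Normal(0, sp2) ) in nats, equation (33.56)."""
    return 0.5 * (math.log(sp2 / sq2) - 1.0 + sq2 / sp2)

def F(sq2, s1=10.0, s2=1.0):
    """Objective (33.57): sum of the per-dimension relative entropies."""
    return kl_gauss(sq2, s1 ** 2) + kl_gauss(sq2, s2 ** 2)

s1, s2 = 10.0, 1.0

# Closed-form optimum from equation (33.59).
sq2_opt = 2.0 / (1.0 / s1 ** 2 + 1.0 / s2 ** 2)

# Brute-force check: grid search over sigma_Q^2 in (0, 20].
grid = [k / 1000.0 for k in range(1, 20001)]
sq2_num = min(grid, key=F)

print(math.sqrt(sq2_opt))               # roughly 1.41, i.e. about sqrt(2)
print(abs(sq2_num - sq2_opt) < 5e-3)    # grid minimizer agrees with (33.59)
```

The grid search lands at $\sigma_Q^2 \approx 1.98$, i.e. $\sigma_Q \approx 1.41$, matching the book's observation that $\sigma_Q \simeq \sqrt{2}$ regardless of how large $\sigma_1$ is.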
