Information Theory, Inference, and Learning ... - Inference Group

Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981 You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

430                                                   33 — Variational Methods

[Figure 33.4. Optimization of an approximating distribution. Six contour plots (a)–(f) over (µ, σ), with µ on the horizontal axis (0 to 2) and σ on the vertical axis (0.2 to 1). The posterior distribution P(µ, σ | {x_n}), which is the same as that in figure 24.1, is shown by solid contours. (a) Initial condition. The approximating distribution Q(µ, σ) (dotted contours) is an arbitrary separable distribution. (b) Q_µ has been updated, using equation (33.41). (c) Q_σ has been updated, using equation (33.44). (d) Q_µ updated again. (e) Q_σ updated again. (f) Converged approximation (after 15 iterations). The arrows point to the peaks of the two distributions, which are at σ_N = 0.45 (for P) and σ_{N−1} = 0.5 (for Q).]

Optimization of Q_µ(µ)

As a functional of Q_µ(µ), F̃ is:

$$\tilde{F} = - \int \mathrm{d}\mu \, Q_\mu(\mu) \left[ \int \mathrm{d}\sigma \, Q_\sigma(\sigma) \ln P(D \,|\, \mu, \sigma) + \ln[P(\mu)/Q_\mu(\mu)] \right] + \kappa \qquad (33.38)$$

$$= \int \mathrm{d}\mu \, Q_\mu(\mu) \left[ \int \mathrm{d}\sigma \, Q_\sigma(\sigma) \, N\beta \, \tfrac{1}{2}(\mu - \bar{x})^2 + \ln Q_\mu(\mu) \right] + \kappa', \qquad (33.39)$$

where β ≡ 1/σ² and κ, κ′ denote constants that do not depend on Q_µ(µ). The dependence on Q_σ thus collapses down to a simple dependence on the mean

$$\bar{\beta} \equiv \int \mathrm{d}\sigma \, Q_\sigma(\sigma) \, 1/\sigma^2. \qquad (33.40)$$

Now we can recognize the function $-N\bar{\beta}\,\tfrac{1}{2}(\mu - \bar{x})^2$ as the logarithm of a Gaussian identical to the posterior distribution for a particular value of β = β̄. Since a relative entropy $\int Q \ln(Q/P)$ is minimized by setting Q = P, we can immediately write down the distribution $Q_\mu^{\rm opt}(\mu)$ that minimizes F̃ for fixed Q_σ:

$$Q_\mu^{\rm opt}(\mu) = P(\mu \,|\, D, \bar{\beta}, \mathcal{H}) = {\rm Normal}(\mu; \bar{x}, \sigma^2_{\mu|D}), \qquad (33.41)$$

where $\sigma^2_{\mu|D} = 1/(N\bar{\beta})$.

Optimization of Q_σ(σ)

We represent Q_σ(σ) using the density over β, $Q_\sigma(\beta) \equiv Q_\sigma(\sigma) \, |\mathrm{d}\sigma/\mathrm{d}\beta|$. As a functional of Q_σ(β), F̃ is (neglecting additive constants):

$$\tilde{F} = - \int \mathrm{d}\beta \, Q_\sigma(\beta) \left[ \int \mathrm{d}\mu \, Q_\mu(\mu) \ln P(D \,|\, \mu, \sigma) + \ln[P(\beta)/Q_\sigma(\beta)] \right] \qquad (33.42)$$

$$= \int \mathrm{d}\beta \, Q_\sigma(\beta) \left[ (N\sigma^2_{\mu|D} + S)\beta/2 - \left( \tfrac{N}{2} - 1 \right) \ln \beta + \ln Q_\sigma(\beta) \right], \qquad (33.43)$$

[Margin note: The prior P(σ) ∝ 1/σ transforms to P(β) ∝ 1/β.]
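The alternating updates above can be sketched in code. The helper below is a hypothetical implementation (the function name and signature are not from the book): it fixes Q_µ via equation (33.41), and reads the Q_σ update off equation (33.43), whose minimizer is a Gamma density proportional to $\beta^{N/2-1} \exp(-\beta(N\sigma^2_{\mu|D}+S)/2)$, so the mean precision is $\bar{\beta} = N/(S + N\sigma^2_{\mu|D})$.

```python
import numpy as np

def variational_gaussian(x, n_iters=30, beta_bar_init=1.0):
    """Alternate the Q_mu and Q_sigma updates for a separable
    approximation Q(mu, sigma) = Q_mu(mu) Q_sigma(sigma).

    Hypothetical helper sketching the iteration in figure 33.4:
      Q_mu is Normal(xbar, sigma_muD^2) with sigma_muD^2 = 1/(N beta_bar);
      Q_sigma over beta = 1/sigma^2 is Gamma-shaped, with mean
      beta_bar = N / (S + N sigma_muD^2), read off equation (33.43).
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    xbar = x.mean()
    S = np.sum((x - xbar) ** 2)     # sum of squared deviations
    beta_bar = beta_bar_init        # arbitrary initial mean precision
    for _ in range(n_iters):
        sigma_muD2 = 1.0 / (N * beta_bar)     # Q_mu update, eq. (33.41)
        beta_bar = N / (S + N * sigma_muD2)   # Q_sigma update (Gamma mean)
    return xbar, beta_bar
```

Eliminating $\sigma^2_{\mu|D}$ at the fixed point gives $\bar{\beta} = N/(S + 1/\bar{\beta})$, i.e. $\bar{\beta} = (N-1)/S$, so Q's marginal over σ peaks near $\sigma_{N-1}$ rather than $\sigma_N$ — consistent with panel (f) of figure 33.4.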
