33 Variational Methods

Variational methods are an important technique for the approximation of complicated probability distributions, having applications in statistical physics, data modelling and neural networks.

33.1 Variational free energy minimization

One method for approximating a complex distribution in a physical system is mean field theory. Mean field theory is a special case of a general variational free energy approach of Feynman and Bogoliubov, which we will now study. The key piece of mathematics needed to understand this method is Gibbs' inequality, which we repeat here. [Gibbs' inequality first appeared in equation (1.24); see also exercise 2.26 (p.37).]

The relative entropy between two probability distributions Q(x) and P(x) that are defined over the same alphabet A_X is

\[
D_{\mathrm{KL}}(Q \,\|\, P) = \sum_x Q(x) \log \frac{Q(x)}{P(x)} . \tag{33.1}
\]

The relative entropy satisfies D_KL(Q || P) ≥ 0 (Gibbs' inequality), with equality only if Q = P. In general, D_KL(Q || P) ≠ D_KL(P || Q).

In this chapter we will replace the log by ln, and measure the divergence in nats.
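To make equation (33.1) concrete, here is a minimal Python sketch (not from the original text) that evaluates the relative entropy for two small made-up distributions, measuring the divergence in nats as this chapter does; the function name and the example values of Q and P are illustrative assumptions.

```python
import numpy as np

def kl_divergence(Q, P):
    """Relative entropy D_KL(Q||P) in nats (natural logarithm).

    Q and P are probability vectors over the same alphabet A_X.
    Terms with Q(x) = 0 contribute zero, by the convention 0 ln 0 = 0.
    """
    Q, P = np.asarray(Q, dtype=float), np.asarray(P, dtype=float)
    mask = Q > 0
    return float(np.sum(Q[mask] * np.log(Q[mask] / P[mask])))

# Two hypothetical distributions over a three-symbol alphabet:
Q = [0.5, 0.3, 0.2]
P = [0.4, 0.4, 0.2]
print(kl_divergence(Q, P))  # >= 0, with equality only if Q = P (Gibbs' inequality)
print(kl_divergence(P, Q))  # generally different: the divergence is not symmetric
```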
Probability distributions in statistical physics

In statistical physics one often encounters probability distributions of the form

\[
P(x \mid \beta, J) = \frac{1}{Z(\beta, J)} \exp[-\beta E(x; J)] , \tag{33.2}
\]

where, for example, the state vector is x ∈ {−1, +1}^N, and E(x; J) is some energy function such as

\[
E(x; J) = -\frac{1}{2} \sum_{m,n} J_{mn} x_m x_n - \sum_n h_n x_n . \tag{33.3}
\]

The partition function (normalizing constant) is

\[
Z(\beta, J) \equiv \sum_x \exp[-\beta E(x; J)] . \tag{33.4}
\]

The probability distribution of equation (33.2) is complex. Not unbearably complex – we can, after all, evaluate E(x; J) for any particular x in a time polynomial in N.
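As a concrete illustration of equations (33.2)–(33.4) (again a sketch, not the book's code), the following Python fragment evaluates the energy, the partition function and the resulting distribution by exhaustive enumeration of all 2^N states; the couplings J, fields h and inverse temperature beta are arbitrary made-up values, and the brute-force sum is feasible only because N is tiny here.

```python
import itertools
import numpy as np

def energy(x, J, h):
    """E(x; J) = -1/2 sum_{m,n} J_mn x_m x_n - sum_n h_n x_n, as in equation (33.3)."""
    return -0.5 * x @ J @ x - h @ x

def boltzmann(beta, J, h):
    """Enumerate all x in {-1,+1}^N and return (states, P(x | beta, J), Z(beta, J))."""
    N = len(h)
    states = [np.array(s) for s in itertools.product([-1, +1], repeat=N)]
    weights = np.array([np.exp(-beta * energy(x, J, h)) for x in states])
    Z = weights.sum()               # partition function, equation (33.4)
    return states, weights / Z, Z   # probabilities as in equation (33.2)

# A small made-up system: N = 3 spins, symmetric couplings, uniform field.
rng = np.random.default_rng(0)
J = rng.normal(size=(3, 3))
J = (J + J.T) / 2                   # symmetrize the couplings
np.fill_diagonal(J, 0.0)            # diagonal terms only shift E, since x_n^2 = 1
h = np.full(3, 0.1)

states, probs, Z = boltzmann(beta=1.0, J=J, h=h)
for x, p in zip(states, probs):
    print(x, f"{p:.4f}")
print("Z =", Z)
```

Note that the sum defining Z runs over 2^N states, so this direct evaluation becomes infeasible for systems of realistic size; approximating such distributions is precisely what the variational methods of this chapter are for.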