along a dynamical trajectory in $(\mathbf{w}, \mathbf{p})$ space, where $\mathbf{p}$ are the extra 'momentum' variables of the Langevin and Hamiltonian Monte Carlo methods. The number of steps 'Tau' was set at random to a number between 100 and 200 for each trajectory. The step size $\epsilon$ was kept fixed so as to retain comparability with the simulations that have gone before; it is recommended that one randomize the step size in practical applications, however.

Figure 41.9 compares the sampling properties of the Langevin and Hamiltonian Monte Carlo methods. The autocorrelation of the state of the Hamiltonian Monte Carlo simulation falls much more rapidly with simulation time than that of the Langevin method. For this toy problem, Hamiltonian Monte Carlo is at least ten times more efficient in its use of computer time.

41.5 Implementing inference with Gaussian approximations

Physicists love to take nonlinearities and locally linearize them, and they love to approximate probability distributions by Gaussians. Such approximations offer an alternative strategy for dealing with the integral
$$
P(t^{(N+1)} = 1 \mid \mathbf{x}^{(N+1)}, D, \alpha)
 = \int \mathrm{d}^K \mathbf{w} \; y(\mathbf{x}^{(N+1)}; \mathbf{w}) \, \frac{1}{Z_M} \exp(-M(\mathbf{w})),
 \tag{41.21}
$$
which we just evaluated using Monte Carlo methods.

We start by making a Gaussian approximation to the posterior probability. We go to the minimum of $M(\mathbf{w})$ (using a gradient-based optimizer) and Taylor-expand $M$ there:
$$
M(\mathbf{w}) \simeq M(\mathbf{w}_{\rm MP}) + \tfrac{1}{2} (\mathbf{w} - \mathbf{w}_{\rm MP})^{\mathsf T} \mathbf{A} (\mathbf{w} - \mathbf{w}_{\rm MP}) + \cdots,
 \tag{41.22}
$$
where $\mathbf{A}$ is the matrix of second derivatives, also known as the Hessian, defined by
$$
A_{ij} \equiv \left. \frac{\partial^2}{\partial w_i \, \partial w_j} M(\mathbf{w}) \right|_{\mathbf{w} = \mathbf{w}_{\rm MP}}.
 \tag{41.23}
$$
We thus define our Gaussian approximation:
$$
Q(\mathbf{w}; \mathbf{w}_{\rm MP}, \mathbf{A}) = \left[ \det(\mathbf{A}/2\pi) \right]^{1/2}
 \exp\!\left[ -\tfrac{1}{2} (\mathbf{w} - \mathbf{w}_{\rm MP})^{\mathsf T} \mathbf{A} (\mathbf{w} - \mathbf{w}_{\rm MP}) \right].
 \tag{41.24}
$$
We can think of the matrix $\mathbf{A}$ as defining error bars on $\mathbf{w}$. To be precise, $Q$ is a normal distribution whose variance–covariance matrix is $\mathbf{A}^{-1}$.

Exercise 41.1.$^{[2]}$ Show that the second derivative of $M(\mathbf{w})$ with respect to $\mathbf{w}$ is given by
$$
\frac{\partial^2}{\partial w_i \, \partial w_j} M(\mathbf{w})
 = \sum_{n=1}^{N} f'(a^{(n)}) \, x_i^{(n)} x_j^{(n)} + \alpha \delta_{ij},
 \tag{41.25}
$$
where $f'(a)$ is the first derivative of $f(a) \equiv 1/(1 + e^{-a})$, which is
$$
f'(a) = \frac{\mathrm{d}}{\mathrm{d}a} f(a) = f(a)(1 - f(a)),
 \tag{41.26}
$$
and
$$
a^{(n)} = \sum_j w_j x_j^{(n)}.
 \tag{41.27}
$$
Having computed the Hessian, our task is then to perform the integral (41.21) using our Gaussian approximation.
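As a minimal sketch of the Gaussian-approximation step just described, the following Python code finds $\mathbf{w}_{\rm MP}$ with a gradient-based optimizer, builds the Hessian of equation (41.25) for a logistic-regression objective $M(\mathbf{w}) = G(\mathbf{w}) + \alpha E_W(\mathbf{w})$, and forms the mean and covariance of the Gaussian $Q$ of equation (41.24). The function names, the synthetic data, and the use of scipy.optimize.minimize are illustrative assumptions, not the book's own code.

```python
# Sketch of the Laplace (Gaussian) approximation for the logistic-regression
# posterior of this chapter. Helper names and data are illustrative only.
import numpy as np
from scipy.optimize import minimize

def f(a):
    """Logistic sigmoid f(a) = 1 / (1 + exp(-a)), equation (41.26)."""
    return 1.0 / (1.0 + np.exp(-a))

def M(w, X, t, alpha):
    """Objective M(w) = G(w) + alpha * E_W(w): cross-entropy error plus
    a quadratic weight-decay term (assumed form of M from the chapter)."""
    a = X @ w                                   # activations a^(n), eq. (41.27)
    y = np.clip(f(a), 1e-12, 1 - 1e-12)         # guard against log(0)
    G = -np.sum(t * np.log(y) + (1 - t) * np.log(1 - y))
    return G + 0.5 * alpha * (w @ w)

def hessian(w, X, t, alpha):
    """Hessian A_ij of M(w), equation (41.25):
    A = sum_n f'(a^(n)) x^(n) x^(n)^T + alpha I, with f'(a) = f(a)(1 - f(a))."""
    a = X @ w
    fprime = f(a) * (1 - f(a))
    return (X * fprime[:, None]).T @ X + alpha * np.eye(X.shape[1])

def laplace_approximation(X, t, alpha):
    """Go to the minimum of M with a gradient-based optimizer and return
    (w_MP, A), defining the Gaussian Q(w; w_MP, A) of equation (41.24)."""
    K = X.shape[1]
    res = minimize(M, np.zeros(K), args=(X, t, alpha), method="BFGS")
    w_MP = res.x
    A = hessian(w_MP, X, t, alpha)
    return w_MP, A

# Toy usage on synthetic two-class data (illustrative only):
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
t = (f(X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)) > 0.5).astype(float)
w_MP, A = laplace_approximation(X, t, alpha=0.1)
Sigma = np.linalg.inv(A)   # error bars on w: the covariance of Q is A^{-1}
```

The remaining step, approximating the predictive integral (41.21) by integrating $y(\mathbf{x}; \mathbf{w})$ under $Q$, is what the text turns to next; the sketch above only constructs $\mathbf{w}_{\rm MP}$ and $\mathbf{A}$.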
