24  Exact Marginalization

How can we avoid the exponentially large cost of complete enumeration of all hypotheses? Before we stoop to approximate methods, we explore two approaches to exact marginalization: first, marginalization over continuous variables (sometimes known as nuisance parameters) by doing integrals; and second, summation over discrete variables by message-passing.

Exact marginalization over continuous parameters is a macho activity enjoyed by those who are fluent in definite integration. This chapter uses gamma distributions; as was explained in the previous chapter, gamma distributions are a lot like Gaussian distributions, except that whereas the Gaussian goes from −∞ to ∞, gamma distributions go from 0 to ∞.

24.1  Inferring the mean and variance of a Gaussian distribution

We discuss again the one-dimensional Gaussian distribution, parameterized by a mean µ and a standard deviation σ:

\[
P(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \equiv \mathrm{Normal}(x;\, \mu, \sigma^2). \tag{24.1}
\]

When inferring these parameters, we must specify their prior distribution. The prior gives us the opportunity to include specific knowledge that we have about µ and σ (from independent experiments, or on theoretical grounds, for example). If we have no such knowledge, then we can construct an appropriate prior that embodies our supposed ignorance. In section 21.2, we assumed a uniform prior over the range of parameters plotted. If we wish to be able to perform exact marginalizations, it may be useful to consider conjugate priors; these are priors whose functional form combines naturally with the likelihood such that the inferences have a convenient form.

Conjugate priors for µ and σ

The conjugate prior for a mean µ is a Gaussian: we introduce two 'hyperparameters', $\mu_0$ and $\sigma_\mu$, which parameterize the prior on µ, and write $P(\mu \mid \mu_0, \sigma_\mu) = \mathrm{Normal}(\mu;\, \mu_0, \sigma_\mu^2)$. In the limit $\mu_0 = 0$, $\sigma_\mu \to \infty$, we obtain the noninformative prior for a location parameter, the flat prior. This is noninformative because it is invariant under the natural reparameterization $\mu' = \mu + c$. The prior $P(\mu) = \mathrm{const.}$ is also an improper prior, that is, it is not normalizable.

The conjugate prior for a standard deviation σ is a gamma distribution, which has two parameters, $b_\beta$ and $c_\beta$. It is most convenient to define the prior density of the inverse variance (the precision) $\beta = 1/\sigma^2$.
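For reference, in one common shape–scale convention (the previous chapter fixes the exact notation used in this book), a gamma density with scale $s$ and shape $c$ is

\[
\Gamma(x;\, s, c) = \frac{1}{\Gamma(c)\, s} \left( \frac{x}{s} \right)^{c-1} \exp\!\left( -\frac{x}{s} \right), \qquad 0 \le x < \infty,
\]

with mean $cs$ and standard deviation $\sqrt{c}\,s$. In this notation the prior on the precision would be written $P(\beta) = \Gamma(\beta;\, b_\beta, c_\beta)$.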
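To see concretely what 'combines naturally' means, consider the sub-problem in which σ is known and we infer µ alone from data $x_1, \ldots, x_N$. The following update is a standard result, included here as an illustration:

\[
P(\mu \mid \{x_n\}, \sigma) \propto \mathrm{Normal}(\mu;\, \mu_0, \sigma_\mu^2) \prod_{n=1}^{N} \mathrm{Normal}(x_n;\, \mu, \sigma^2) \propto \mathrm{Normal}(\mu;\, \mu_N, \sigma_N^2),
\]

where

\[
\frac{1}{\sigma_N^2} = \frac{N}{\sigma^2} + \frac{1}{\sigma_\mu^2}, \qquad \mu_N = \sigma_N^2 \left( \frac{N \bar{x}}{\sigma^2} + \frac{\mu_0}{\sigma_\mu^2} \right), \qquad \bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n.
\]

The posterior has the same Gaussian form as the prior; this closure under multiplication by the likelihood is precisely the conjugacy property. In the flat-prior limit $\sigma_\mu \to \infty$, it reduces to $\mathrm{Normal}(\mu;\, \bar{x}, \sigma^2/N)$.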
