21 — Exact Inference by Complete Enumeration

[Figure 21.2. Enumeration of an entire (discretized) hypothesis space for one Gaussian with parameters µ (horizontal axis) and σ (vertical).]

…function for the decay length as a function of λ by evaluating the likelihood at a finely spaced series of points.

A two-parameter model

Let's look at the Gaussian distribution as an example of a model with a two-dimensional hypothesis space. The one-dimensional Gaussian distribution is parameterized by a mean µ and a standard deviation σ:

\[
P(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\,
\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
\equiv \mathrm{Normal}(x;\, \mu, \sigma^2).
\tag{21.8}
\]

Figure 21.2 shows an enumeration of one hundred hypotheses about the mean and standard deviation of a one-dimensional Gaussian distribution. These hypotheses are evenly spaced in a ten by ten square grid covering ten values of µ and ten values of σ. Each hypothesis is represented by a picture showing the probability density that it puts on x. We now examine the inference of µ and σ given data points x_n, n = 1, …, N, assumed to be drawn independently from this density.

Imagine that we acquire data, for example the five points shown in figure 21.3. We can now evaluate the posterior probability of each of the one hundred subhypotheses by evaluating the likelihood of each, that is, the value of P({x_n}_{n=1}^{5} | µ, σ). The likelihood values are shown diagrammatically in figure 21.4, using the line thickness to encode the value of the likelihood. Subhypotheses with likelihood smaller than e^{−8} times the maximum likelihood have been deleted.

Using a finer grid, we can represent the same information by plotting the likelihood as a surface plot or contour plot as a function of µ and σ (figure 21.5).

[Figure 21.3. Five data points {x_n}_{n=1}^{5}. The horizontal coordinate is the value of the datum, x_n; the vertical coordinate has no meaning.]

A five-parameter mixture model

Eyeballing the data (figure 21.3), you might agree that it seems more plausible that they come not from a single Gaussian but from a mixture of two Gaussians, defined by two means, two standard deviations, and two mixing coefficients π₁ and π₂, satisfying π₁ + π₂ = 1, πᵢ ≥ 0.
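To make the grid enumeration above concrete, here is a minimal Python sketch, assuming a uniform prior over a 10 × 10 grid of (µ, σ) subhypotheses. The data values and grid ranges are stand-ins chosen for illustration (the five points of figure 21.3 are only given graphically), not values from the text.

```python
import numpy as np

# Stand-in data: five points roughly in the range shown in figure 21.3.
x = np.array([-0.2, 0.2, 0.9, 1.3, 1.7])

# Assumed grid of ten candidate means and ten candidate standard deviations.
mus = np.linspace(0.0, 2.25, 10)
sigmas = np.linspace(0.1, 2.0, 10)

# Log-likelihood of all five points under each (mu, sigma) subhypothesis.
log_lik = np.zeros((10, 10))
for i, mu in enumerate(mus):
    for j, sigma in enumerate(sigmas):
        log_lik[i, j] = np.sum(-0.5 * ((x - mu) / sigma) ** 2
                               - np.log(np.sqrt(2.0 * np.pi) * sigma))

# Discard subhypotheses whose likelihood falls below e^-8 of the maximum,
# mirroring the thinning used in figure 21.4.
keep = log_lik > log_lik.max() - 8.0

# With a uniform prior over the grid, the posterior over subhypotheses
# is just the likelihood, normalized to sum to one.
posterior = np.exp(log_lik - log_lik.max())
posterior /= posterior.sum()
```

Subtracting the maximum log-likelihood before exponentiating keeps the normalization numerically stable; the same enumerate-and-normalize loop works for any likelihood that can be evaluated pointwise.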
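The likelihood of the two-Gaussian mixture just introduced can be evaluated pointwise in the same way, over an enumerated five-parameter grid. The sketch below assumes the standard mixture density π₁ Normal(x; µ₁, σ₁²) + (1 − π₁) Normal(x; µ₂, σ₂²); the function name and the parameter values are illustrative choices, not taken from the text.

```python
import numpy as np

def mixture_log_likelihood(x, mu1, sigma1, mu2, sigma2, pi1):
    """Log-likelihood of data x under a two-Gaussian mixture with
    mixing coefficients pi1 and pi2 = 1 - pi1 (hypothetical helper)."""
    def normal(x, mu, sigma):
        return (np.exp(-0.5 * ((x - mu) / sigma) ** 2)
                / (np.sqrt(2.0 * np.pi) * sigma))
    return np.sum(np.log(pi1 * normal(x, mu1, sigma1)
                         + (1.0 - pi1) * normal(x, mu2, sigma2)))

# Same stand-in data as above; parameter values are arbitrary illustrations.
x = np.array([-0.2, 0.2, 0.9, 1.3, 1.7])
print(mixture_log_likelihood(x, mu1=0.0, sigma1=0.3,
                             mu2=1.3, sigma2=0.4, pi1=0.4))
```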
