
Information Theory, Inference, and Learning ... - Inference Group



Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981 You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links.

3.1: A first inference problem

[Figure 3.1. The probability density P(x | λ) as a function of x. Curves shown: P(x | λ=2), P(x | λ=5), P(x | λ=10).]

[Figure 3.2. The probability density P(x | λ) as a function of λ, for three different values of x. Curves shown: P(x=3 | λ), P(x=5 | λ), P(x=12 | λ). When plotted this way round, the function is known as the likelihood of λ. The marks indicate the three values of λ, λ = 2, 5, 10, that were used in the preceding figure.]

Steve wrote down the probability of one data point, given λ:

\[
P(x \mid \lambda) =
\begin{cases}
\dfrac{1}{\lambda}\, e^{-x/\lambda} \Big/ Z(\lambda) & 1 < x < 20 \\[4pt]
0 & \text{otherwise}
\end{cases}
\tag{3.1}
\]

where

\[
Z(\lambda) = \int_{1}^{20} dx\, \frac{1}{\lambda}\, e^{-x/\lambda} = \left( e^{-1/\lambda} - e^{-20/\lambda} \right). \tag{3.2}
\]

This seemed obvious enough. Then he wrote Bayes’ theorem:

\begin{align}
P(\lambda \mid \{x_1, \ldots, x_N\}) &= \frac{P(\{x\} \mid \lambda)\, P(\lambda)}{P(\{x\})} \tag{3.3} \\
&\propto \frac{1}{\left(\lambda Z(\lambda)\right)^{N}} \exp\!\left( -\sum_{1}^{N} x_n / \lambda \right) P(\lambda). \tag{3.4}
\end{align}

Suddenly, the straightforward distribution P({x_1, . . . , x_N} | λ), defining the probability of the data given the hypothesis λ, was being turned on its head so as to define the probability of a hypothesis given the data. A simple figure showed the probability of a single data point P(x | λ) as a familiar function of x, for different values of λ (figure 3.1). Each curve was an innocent exponential, normalized to have area 1.
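The claim that each curve has area 1 can be checked numerically. The following is a minimal sketch, assuming the density of equation (3.1) with the normalizer of equation (3.2); the function names (`normalizer`, `density`, `area`) and the midpoint-rule integration are illustrative choices, not taken from the book.

```python
import math

# Observation window from equation (3.1): only 1 < x < 20 is observable.
X_MIN, X_MAX = 1.0, 20.0

def normalizer(lam):
    """Z(lambda) from equation (3.2)."""
    return math.exp(-X_MIN / lam) - math.exp(-X_MAX / lam)

def density(x, lam):
    """P(x | lambda) from equation (3.1); zero outside the window."""
    if not (X_MIN < x < X_MAX):
        return 0.0
    return math.exp(-x / lam) / (lam * normalizer(lam))

def area(lam, steps=20000):
    """Midpoint-rule estimate of the integral of P(x | lambda) over x."""
    h = (X_MAX - X_MIN) / steps
    return h * sum(density(X_MIN + (i + 0.5) * h, lam) for i in range(steps))

# The three values of lambda plotted in figure 3.1.
for lam in (2.0, 5.0, 10.0):
    print(f"lambda = {lam:4.1f}  area = {area(lam):.6f}")
```

Each printed area should be very close to 1, which is the role Z(λ) plays: it rescales the truncated exponential so that it integrates to 1 over the window.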
Plotting the same function as a function of λ for a fixed value of x, something remarkable happens: a peak emerges (figure 3.2). To help understand these two points of view of the one function, figure 3.3 shows a surface plot of P(x | λ) as a function of x and λ.

For a dataset consisting of several points, e.g., the six points $\{x\}_{n=1}^{N} = \{1.5, 2, 3, 4, 5, 12\}$, the likelihood function P({x} | λ) is the product of the N functions of λ, P(x_n | λ) (figure 3.4).

[Figure 3.3. The probability density P(x | λ) as a function of x and λ. Figures 3.1 and 3.2 are vertical sections through this surface.]

[Figure 3.4. The likelihood function in the case of a six-point dataset, P({x} = {1.5, 2, 3, 4, 5, 12} | λ), as a function of λ.]
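The product of the N single-point likelihoods can be evaluated directly. The sketch below assumes the six-point dataset quoted above and the density of equation (3.1); it works with the logarithm of the likelihood to keep the product numerically well behaved. The helper names and the logarithmic grid from λ = 1 to λ = 100 (matching the axis range in figure 3.4) are illustrative assumptions.

```python
import math

# Observation window and the six-point dataset from the text.
X_MIN, X_MAX = 1.0, 20.0
DATA = [1.5, 2.0, 3.0, 4.0, 5.0, 12.0]

def log_likelihood(lam, data=DATA):
    """log P({x} | lambda): the sum over data points of log P(x_n | lambda).

    Assumes every data point lies inside the window 1 < x < 20.
    """
    z = math.exp(-X_MIN / lam) - math.exp(-X_MAX / lam)  # Z(lambda), eq. (3.2)
    return -len(data) * math.log(lam * z) - sum(data) / lam

def log_grid(lo, hi, count):
    """Return `count` points spaced evenly on a logarithmic scale."""
    ratio = (hi / lo) ** (1.0 / (count - 1))
    return [lo * ratio ** i for i in range(count)]

# Scan lambda over the range shown in figure 3.4 and locate the peak.
grid = log_grid(1.0, 100.0, 2001)
peak = max(grid, key=log_likelihood)
print(f"likelihood peaks near lambda = {peak:.2f}")
print(f"peak likelihood value        = {math.exp(log_likelihood(peak)):.3e}")
```

A grid scan is used here only because it mirrors how the curve in figure 3.4 would be plotted; a one-dimensional optimizer would locate the same peak more precisely.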
