

22.1 Maximum likelihood for one Gaussian

[Figure 22.1 appears here: (a1) surface plot and (a2) contour plot of the log likelihood over the mean µ and standard deviation σ; (b) posterior probability of the mean for σ = 0.2, 0.4, 0.6; (c) posterior probability of σ for µ = 1, 1.25, 1.5. See the full caption at the end of this excerpt.]

If we Taylor-expand the log likelihood about the maximum, we can define approximate error bars on the maximum-likelihood parameter: we use a quadratic approximation to estimate how far from the maximum-likelihood parameter setting we can go before the likelihood falls by some standard factor, for example e^{1/2}, or e^{4/2}. In the special case of a likelihood that is a Gaussian function of the parameters, the quadratic approximation is exact.
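As a minimal sketch of what this quadratic approximation looks like for a single parameter θ (the symbols θ_ML and σ_θ below are introduced here for illustration and are not the book's notation):

```latex
% Quadratic approximation to the log likelihood about its maximum.
% Illustrative notation: \theta_{ML} is the maximising value, \sigma_\theta the error bar.
\ln P(\theta) \;\simeq\; \ln P(\theta_{\mathrm{ML}})
   \;-\; \frac{(\theta-\theta_{\mathrm{ML}})^{2}}{2\sigma_{\theta}^{2}},
\qquad
\frac{1}{\sigma_{\theta}^{2}} \;\equiv\;
   -\left.\frac{\partial^{2}\ln P}{\partial\theta^{2}}\right|_{\theta_{\mathrm{ML}}}.
% At \theta = \theta_{ML} \pm \sigma_\theta the likelihood has fallen by the factor e^{1/2};
% at \theta_{ML} \pm 2\sigma_\theta, by e^{4/2}.
```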

Example 22.2. Find the second derivative of the log likelihood with respect to µ, and find the error bars on µ, given the data and σ.

Solution.

    ∂²/∂µ² ln P = −N/σ².    ✷    (22.7)

Comparing this curvature with the curvature of the log of a Gaussian distribution over µ of standard deviation σ_µ, exp(−µ²/(2σ_µ²)), which is −1/σ_µ², we can deduce that the error bars on µ (derived from the likelihood function) are

    σ_µ = σ/√N.    (22.8)

The error bars have this property: at the two points µ = x̄ ± σ_µ, the likelihood is smaller than its maximum value by a factor of e^{1/2}.
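A short numerical check of this property is easy to write; the sketch below (synthetic, made-up data, not from the book) confirms that the log likelihood drops by exactly 1/2 at µ = x̄ ± σ/√N:

```python
import numpy as np

# Illustrative sketch (not from the book): check numerically that the likelihood
# of the mean mu, with sigma known, falls by a factor e^{1/2} at mu = xbar +/- sigma/sqrt(N).
rng = np.random.default_rng(0)
sigma = 1.0
x = rng.normal(1.0, sigma, size=5)        # N = 5 made-up data points
N, xbar = len(x), x.mean()

def log_lik(mu):
    """Log likelihood ln P({x_n} | mu, sigma) for a Gaussian with known sigma."""
    return -N * np.log(np.sqrt(2 * np.pi) * sigma) - np.sum((x - mu) ** 2) / (2 * sigma ** 2)

sigma_mu = sigma / np.sqrt(N)             # error bar from eq. (22.8)
drop = log_lik(xbar) - log_lik(xbar + sigma_mu)
print(drop)                               # prints 0.5: the likelihood has fallen by e^{1/2}
```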

Example 22.3. Find the maximum-likelihood standard deviation σ of a Gaussian, whose mean is known to be µ, in the light of data {x_n}_{n=1}^N. Find the second derivative of the log likelihood with respect to ln σ, and error bars on ln σ.

Solution. The likelihood's dependence on σ is

    ln P({x_n}_{n=1}^N | µ, σ) = −N ln(√(2π) σ) − S_tot/(2σ²),    (22.9)

where S_tot = Σ_n (x_n − µ)². To find the maximum of the likelihood, we can differentiate with respect to ln σ. [It's often most hygienic to differentiate with respect to ln u, when u is a scale variable; we use d(uⁿ)/d(ln u) = n uⁿ.]
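As a sketch of that maximization carried out numerically (the data values and grid search below are my additions, not the book's), one can check that the maximum of (22.9) over ln σ lands at the standard closed-form value √(S_tot/N):

```python
import numpy as np

# Illustrative sketch (not from the book): evaluate eq. (22.9) on a grid of
# ln(sigma) values and locate its maximum; data values here are made up.
mu = 1.0
x = np.array([0.4, 0.9, 1.1, 1.3, 1.8])      # N = 5 points with mean known to be mu
N = len(x)
S_tot = np.sum((x - mu) ** 2)

log_sigma = np.linspace(-3.0, 2.0, 100001)    # dense grid over ln(sigma)
sigma = np.exp(log_sigma)
log_lik = -N * np.log(np.sqrt(2 * np.pi) * sigma) - S_tot / (2 * sigma ** 2)

sigma_ml = sigma[np.argmax(log_lik)]
print(sigma_ml, np.sqrt(S_tot / N))           # the two values should agree closely
```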

Figure 22.1. The likelihood function for the parameters of a Gaussian distribution. (a1, a2) Surface plot and contour plot of the log likelihood as a function of µ and σ. The data set of N = 5 points had mean x̄ = 1.0 and S = Σ(x − x̄)² = 1.0. (b) The posterior probability of µ for various values of σ. (c) The posterior probability of σ for various fixed values of µ (shown as a density over ln σ).
