22 — Maximum Likelihood and Clustering

…respect to \(\ln u\) rather than \(u\), when \(u\) is a scale variable; we use \(\mathrm{d}u^n/\mathrm{d}(\ln u) = n u^n\).]

\[
\frac{\partial \ln P(\{x_n\}_{n=1}^N \mid \mu, \sigma)}{\partial \ln \sigma} = -N + \frac{S_{\text{tot}}}{\sigma^2}. \tag{22.10}
\]

This derivative is zero when

\[
\sigma^2 = \frac{S_{\text{tot}}}{N}, \tag{22.11}
\]

i.e.,

\[
\sigma = \sqrt{\frac{\sum_{n=1}^N (x_n - \mu)^2}{N}}. \tag{22.12}
\]

The second derivative is

\[
\frac{\partial^2 \ln P(\{x_n\}_{n=1}^N \mid \mu, \sigma)}{\partial (\ln \sigma)^2} = -\frac{2 S_{\text{tot}}}{\sigma^2}, \tag{22.13}
\]

and at the maximum-likelihood value of \(\sigma^2\), this equals \(-2N\). So error bars on \(\ln \sigma\) are

\[
\sigma_{\ln \sigma} = \frac{1}{\sqrt{2N}}. \qquad \text{✷} \tag{22.14}
\]

⊲ Exercise 22.4.[1] Show that the values of \(\mu\) and \(\ln \sigma\) that jointly maximize the likelihood are \(\{\mu, \sigma\}_{\text{ML}} = \left\{ \bar{x},\ \sigma_N = \sqrt{S/N} \right\}\), where

\[
\sigma_N \equiv \sqrt{\frac{\sum_{n=1}^N (x_n - \bar{x})^2}{N}}. \tag{22.15}
\]

22.2 Maximum likelihood for a mixture of Gaussians

We now derive an algorithm for fitting a mixture of Gaussians to one-dimensional data. In fact, this algorithm is so important to understand that you, gentle reader, get to derive the algorithm. Please work through the following exercise.

Exercise 22.5.[2, p.310] A random variable \(x\) is assumed to have a probability distribution that is a mixture of two Gaussians,

\[
P(x \mid \mu_1, \mu_2, \sigma) = \sum_{k=1}^{2} \left[ p_k \, \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu_k)^2}{2\sigma^2} \right) \right], \tag{22.16}
\]

where the two Gaussians are given the labels \(k = 1\) and \(k = 2\); the prior probability of the class label \(k\) is \(\{p_1 = 1/2,\ p_2 = 1/2\}\); \(\{\mu_k\}\) are the means of the two Gaussians; and both have standard deviation \(\sigma\). For brevity, we denote these parameters by \(\theta \equiv \{\{\mu_k\}, \sigma\}\).

A data set consists of \(N\) points \(\{x_n\}_{n=1}^N\) which are assumed to be independent samples from this distribution. Let \(k_n\) denote the unknown class label of the \(n\)th point.

Assuming that \(\{\mu_k\}\) and \(\sigma\) are known, show that the posterior probability of the class label \(k_n\) of the \(n\)th point can be written as

\[
P(k_n = 1 \mid x_n, \theta) = \frac{1}{1 + \exp[-(w_1 x_n + w_0)]}, \qquad
P(k_n = 2 \mid x_n, \theta) = \frac{1}{1 + \exp[+(w_1 x_n + w_0)]}, \tag{22.17}
\]
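To make equations (22.12) and (22.14) concrete, here is a minimal numerical sketch — not part of the book; it assumes NumPy and an arbitrary synthetic data set — that computes the maximum-likelihood \(\sigma\) and the \(1/\sqrt{2N}\) error bar on \(\ln \sigma\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: N samples from a Gaussian with known parameters.
mu_true, sigma_true, N = 0.0, 2.0, 10_000
x = rng.normal(mu_true, sigma_true, size=N)

# Maximum-likelihood estimates, using the sample mean for mu
# (exercise 22.4) and sigma_N = sqrt(S/N) from equation (22.12).
mu_ml = x.mean()
S_tot = ((x - mu_ml) ** 2).sum()
sigma_ml = np.sqrt(S_tot / N)

# Error bar on ln(sigma) from the curvature -2N (equation 22.14);
# a small error on ln(sigma) is a fractional error on sigma itself.
sigma_ln_sigma = 1.0 / np.sqrt(2 * N)
print(f"sigma_ML = {sigma_ml:.3f} "
      f"(+/- {sigma_ml * sigma_ln_sigma:.3f}), true value {sigma_true}")
```

With \(N = 10\,000\) the fractional error bar is \(1/\sqrt{2N} \approx 0.007\), so \(\sigma_{\text{ML}}\) should land within about one percent of the true \(\sigma\).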
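For exercise 22.5, a small sketch (again an illustration, not the book's solution) computes the posterior class probability of equation (22.17) directly from Bayes' theorem applied to the mixture (22.16). Once you have derived \(w_1\) and \(w_0\), you can check your logistic form against it numerically:

```python
import numpy as np

def posterior_k1(x, mu1, mu2, sigma, p1=0.5, p2=0.5):
    """P(k_n = 1 | x_n, theta) for the two-Gaussian mixture (22.16),
    computed directly from Bayes' theorem."""
    # Unnormalized posteriors p_k * exp(-(x - mu_k)^2 / (2 sigma^2));
    # the common factor 1/sqrt(2 pi sigma^2) cancels in the ratio.
    a1 = p1 * np.exp(-((x - mu1) ** 2) / (2 * sigma ** 2))
    a2 = p2 * np.exp(-((x - mu2) ** 2) / (2 * sigma ** 2))
    return a1 / (a1 + a2)

# Example: the responsibility for class 1 sweeps from 1 to 0 as x
# crosses the midpoint between the two means -- a sigmoid in x.
x = np.linspace(-4.0, 4.0, 9)
print(posterior_k1(x, mu1=-1.0, mu2=1.0, sigma=1.0))
# After deriving w1 and w0, verify numerically:
#   np.allclose(posterior_k1(x, -1.0, 1.0, 1.0),
#               1 / (1 + np.exp(-(w1 * x + w0))))
```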
