… and give expressions for $w_1$ and $w_0$.

Assume now that the means $\{\mu_k\}$ are not known, and that we wish to infer them from the data $\{x_n\}_{n=1}^N$. (The standard deviation $\sigma$ is known.) In the remainder of this question we will derive an iterative algorithm for finding values for $\{\mu_k\}$ that maximize the likelihood,
\[
P(\{x_n\}_{n=1}^N \mid \{\mu_k\}, \sigma) = \prod_n P(x_n \mid \{\mu_k\}, \sigma). \tag{22.18}
\]
Let $L$ denote the natural log of the likelihood. Show that the derivative of the log likelihood with respect to $\mu_k$ is given by
\[
\frac{\partial L}{\partial \mu_k} = \sum_n p_{k|n} \, \frac{(x_n - \mu_k)}{\sigma^2}, \tag{22.19}
\]
where $p_{k|n} \equiv P(k_n = k \mid x_n, \theta)$ appeared above at equation (22.17).

Show, neglecting terms in $\frac{\partial}{\partial \mu_k} P(k_n = k \mid x_n, \theta)$, that the second derivative is approximately given by
\[
\frac{\partial^2 L}{\partial \mu_k^2} = - \sum_n p_{k|n} \, \frac{1}{\sigma^2}. \tag{22.20}
\]
Hence show that from an initial state $\mu_1, \mu_2$, an approximate Newton–Raphson step updates these parameters to $\mu'_1, \mu'_2$, where
\[
\mu'_k = \frac{\sum_n p_{k|n}\, x_n}{\sum_n p_{k|n}}. \tag{22.21}
\]
[The Newton–Raphson method for maximizing $L(\mu)$ updates $\mu$ to $\mu' = \mu - \left[\frac{\partial L}{\partial \mu}\right] \Big/ \left[\frac{\partial^2 L}{\partial \mu^2}\right]$.]

[Figure: a one-dimensional data set of 32 points, plotted on the interval 0 to 6.]

Assuming that $\sigma = 1$, sketch a contour plot of the likelihood function as a function of $\mu_1$ and $\mu_2$ for the data set shown above. The data set consists of 32 points. Describe the peaks in your sketch and indicate their widths.

Notice that the algorithm you have derived for maximizing the likelihood is identical to the soft K-means algorithm of section 20.4. Now that it is clear that clustering can be viewed as mixture-density modelling, we are able to derive enhancements to the K-means algorithm which rectify the problems we noted earlier.

22.3 Enhancements to soft K-means

Algorithm 22.2 shows a version of the soft K-means algorithm corresponding to a modelling assumption that each cluster is a spherical Gaussian having its own width (each cluster has its own $\beta^{(k)} = 1/\sigma_k^2$). The algorithm updates the lengthscales $\sigma_k$ for itself. The algorithm also includes cluster weight parameters $\pi_1, \pi_2, \ldots, \pi_K$, which also update themselves, allowing accurate modelling of data from clusters of unequal weights. This algorithm is demonstrated in figure 22.3 for two data sets that we've seen before. The second example shows …
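For readers who want to see the derived update in action, the following is a minimal NumPy sketch (not taken from the book) of the iteration: compute the responsibilities $p_{k|n}$, then apply the mean update of equation (22.21). When the fixed width $\sigma$ is dropped, the same loop also re-estimates per-cluster widths $\sigma_k$ and weights $\pi_k$, in the spirit of the enhanced algorithm of section 22.3. The function name, argument names, and initialization are illustrative assumptions, not the book's notation.

```python
import numpy as np

def soft_kmeans(x, K, n_iters=50, sigma=None, seed=None):
    """Soft K-means on 1-D data x (illustrative sketch).

    If sigma is given, every cluster keeps that fixed width and only the
    means are updated, i.e. the maximum-likelihood step of equation (22.21).
    If sigma is None, each cluster also adapts its own width sigma_k and
    weight pi_k, as in the enhanced algorithm described in section 22.3.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    N = len(x)
    mu = rng.choice(x, size=K, replace=False)            # initial means
    sig = np.full(K, sigma if sigma is not None else x.std())
    pi = np.full(K, 1.0 / K)                              # cluster weights

    for _ in range(n_iters):
        # Responsibilities p_{k|n} = P(k_n = k | x_n, theta), up to a
        # constant that cancels in the normalization below.
        log_r = (np.log(pi)[:, None]
                 - np.log(sig)[:, None]
                 - 0.5 * ((x[None, :] - mu[:, None]) / sig[:, None]) ** 2)
        log_r -= log_r.max(axis=0)                        # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=0)                                # shape (K, N)

        R = r.sum(axis=1)                                 # total responsibility per cluster
        mu = (r @ x) / R                                  # mean update, equation (22.21)
        if sigma is None:
            # Re-estimate per-cluster widths and weights (enhanced version).
            sig = np.sqrt((r * (x[None, :] - mu[:, None]) ** 2).sum(axis=1) / R)
            sig = np.maximum(sig, 1e-6)                   # guard against collapse
            pi = R / N
    return mu, sig, pi
```

With `K=2` and `sigma=1.0` this should reproduce the fixed-width update derived in the exercise; with `sigma=None` it behaves like the adaptive soft K-means sketched by algorithm 22.2, each cluster settling on its own lengthscale and weight.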
