20 — An Example Inference Task: Clustering

[Figure 20.8. Soft K-means algorithm, version 1, applied to a data set of 40 points, with K = 4. The implicit lengthscale parameter σ = 1/β^(1/2) is varied from a large value to a small value. Each picture shows the state of all four means, with the implicit lengthscale shown by the radius of the four circles, after running the algorithm for several tens of iterations. At the largest lengthscale, all four means converge exactly to the data mean. Then the four means separate into two groups of two. At shorter lengthscales, each of these pairs itself bifurcates into subgroups.]

…and the clusters of unequal weight and width? Adding one stiffness parameter β is not going to make all these problems go away.

We'll come back to these questions in a later chapter, as we develop the mixture-density-modelling view of clustering.

Further reading

For a vector-quantization approach to clustering see (Luttrell, 1989; Luttrell, 1990).

20.4 Exercises

⊲ Exercise 20.3. [3, p.291] Explore the properties of the soft K-means algorithm, version 1, assuming that the datapoints {x} come from a single separable two-dimensional Gaussian distribution with mean zero and variances (var(x_1), var(x_2)) = (σ_1^2, σ_2^2), with σ_1^2 > σ_2^2. Set K = 2, assume N is large, and investigate the fixed points of the algorithm as β is varied. [Hint: assume that m^(1) = (m, 0) and m^(2) = (−m, 0).]

⊲ Exercise 20.4. [3] Consider the soft K-means algorithm applied to a large amount of one-dimensional data that comes from a mixture of two equal-weight Gaussians with true means µ = ±1 and standard deviation σ_P, for example σ_P = 1. Show that the hard K-means algorithm with K = 2 leads to a solution in which the two means are further apart than the two true means. Discuss what happens for other values of β, and find the value of β such that the soft algorithm puts the two means in the correct places.

[Marginal figure: the mixture density, with the two true means marked at −1 and 1.]
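These exercises are natural to explore numerically. The following is a minimal NumPy sketch of soft K-means, version 1, assuming the squared Euclidean distance for d(m^(k), x^(n)) and responsibilities r_k^(n) proportional to exp(−β d(m^(k), x^(n))), with each mean updated to the responsibility-weighted mean of the data. The function name soft_kmeans_v1, the toy data set, and the β schedule are illustrative choices, not from the book.

import numpy as np

def soft_kmeans_v1(x, K, beta, n_iters=200, rng=None):
    """Soft K-means, version 1: softmax responsibilities with stiffness
    beta, followed by responsibility-weighted mean updates."""
    rng = np.random.default_rng(rng)
    N, D = x.shape
    # Initialize the means at K randomly chosen data points.
    m = x[rng.choice(N, size=K, replace=False)].copy()
    for _ in range(n_iters):
        # Assignment step: r[n, k] is proportional to
        # exp(-beta * ||x_n - m_k||^2), normalized over k.
        d2 = ((x[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)  # (N, K)
        logits = -beta * d2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)
        # Update step: move each mean to the responsibility-weighted
        # mean of all the data points.
        m = (r.T @ x) / r.sum(axis=0)[:, None]
    return m, r

# Sweep beta upward (shrinking sigma = 1/beta^(1/2)), in the spirit of
# figure 20.8: a toy data set of 40 points, K = 4 means.
rng = np.random.default_rng(0)
x = rng.normal(size=(40, 2)) * np.array([2.0, 0.5])
for beta in [0.05, 0.5, 5.0, 50.0]:
    m, _ = soft_kmeans_v1(x, K=4, beta=beta, rng=1)
    print(f"sigma = {beta ** -0.5:.2f}  means =\n{np.round(m, 2)}")

Sweeping β in this way reproduces the qualitative behaviour described in the caption of figure 20.8: at small β (large lengthscale) all four means coincide at the data mean, and as β grows they split in stages. Replacing the data set with samples from the distributions in exercises 20.3 and 20.4 gives a direct way to check the fixed points found analytically.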
