20.5 Solutions

Solution to exercise 20.1 (p.287). We can associate an 'energy' with the state of the K-means algorithm by connecting a spring between each point x^{(n)} and the mean that is responsible for it. The energy of one spring is proportional to its squared length, namely \beta d(x^{(n)}, m^{(k)}), where \beta is the stiffness of the spring. The total energy of all the springs is a Lyapunov function for the algorithm, because (a) the assignment step can only decrease the energy – a point only changes its allegiance if the length of its spring would be reduced; (b) the update step can only decrease the energy – moving m^{(k)} to the mean is the way to minimize the energy of its springs; and (c) the energy is bounded below – which is the second condition for a Lyapunov function. Since the algorithm has a Lyapunov function, it converges.

Solution to exercise 20.3 (p.290). If the means are initialized to m^{(1)} = (m, 0) and m^{(2)} = (-m, 0), the assignment step for a point at location (x_1, x_2) gives

    r_1(x) = \frac{\exp(-\beta (x_1 - m)^2 / 2)}{\exp(-\beta (x_1 - m)^2 / 2) + \exp(-\beta (x_1 + m)^2 / 2)}    (20.10)
           = \frac{1}{1 + \exp(-2 \beta m x_1)},    (20.11)

and the updated m is

    m' = \frac{\int dx_1 \, P(x_1) \, x_1 \, r_1(x)}{\int dx_1 \, P(x_1) \, r_1(x)}    (20.12)
       = 2 \int dx_1 \, P(x_1) \, x_1 \, \frac{1}{1 + \exp(-2 \beta m x_1)},    (20.13)

where the factor 2 arises because, for a P(x_1) symmetric about zero, the denominator \int dx_1 \, P(x_1) \, r_1(x) equals 1/2.

Now, m = 0 is a fixed point, but the question is, is it stable or unstable? For tiny m (that is, \beta \sigma_1 m \ll 1), we can Taylor-expand

    \frac{1}{1 + \exp(-2 \beta m x_1)} \simeq \frac{1}{2} (1 + \beta m x_1) + \cdots,    (20.14)

so

    m' \simeq \int dx_1 \, P(x_1) \, x_1 \, (1 + \beta m x_1)    (20.15)
       = \sigma_1^2 \beta m.    (20.16)

For small m, m either grows or decays exponentially under this mapping, depending on whether \sigma_1^2 \beta is greater than or less than 1. The fixed point m = 0 is stable if

    \sigma_1^2 \le 1/\beta    (20.17)

and unstable otherwise. [Incidentally, this derivation shows that this result is general, holding for any true probability distribution P(x_1) having variance \sigma_1^2, not just the Gaussian.]

If \sigma_1^2 > 1/\beta then there is a bifurcation, and there are two stable fixed points surrounding the unstable fixed point at m = 0. To illustrate this bifurcation, figure 20.10 shows the outcome of running the soft K-means algorithm with \beta = 1 on one-dimensional data with standard deviation \sigma_1, for various values of \sigma_1. Figure 20.11 shows this pitchfork bifurcation from the other point of view, where the data's standard deviation \sigma_1 is fixed and the algorithm's lengthscale \sigma = 1/\beta^{1/2} is varied on the horizontal axis.

Figure 20.9. Schematic diagram of the bifurcation as the largest data variance \sigma_1 increases from below 1/\beta^{1/2} to above 1/\beta^{1/2}. The data variance is indicated by the ellipse.

Figure 20.10. The stable mean locations as a function of \sigma_1, for constant \beta, found numerically (thick lines), and the approximation (20.22) (thin lines). [Panels show the data density and the mean locations.]

Figure 20.11. The stable mean locations as a function of 1/\beta^{1/2}, for constant \sigma_1. [Panels show the data density and the mean locations.]
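Both solutions can be illustrated numerically. The first sketch below is a minimal check of the spring-energy argument for exercise 20.1, assuming Euclidean squared distance for d, stiffness \beta = 1, and synthetic Gaussian data; the helper names (assign, energy) are illustrative, not taken from the text. It runs hard K-means and asserts that the total energy never increases at either the update or the assignment step.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))            # data points x^(n)
M = rng.normal(size=(3, 2))              # initial means m^(k)

def assign(X, M):
    """Assignment step: each point goes to its nearest mean."""
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(-1)   # squared distances
    return d2.argmin(axis=1)

def energy(X, M, k):
    """Total spring energy: sum of squared spring lengths (beta = 1)."""
    return ((X - M[k]) ** 2).sum()

k = assign(X, M)
E = energy(X, M, k)
for _ in range(20):
    # update step: move each mean to the mean of its assigned points
    for j in range(M.shape[0]):
        if np.any(k == j):
            M[j] = X[k == j].mean(axis=0)
    E_update = energy(X, M, k)
    # assignment step: reallocate points to their nearest means
    k = assign(X, M)
    E_assign = energy(X, M, k)
    assert E_update <= E + 1e-9 and E_assign <= E_update + 1e-9
    E = E_assign
print("final energy:", E)   # the energy never increased, so the run converged
```

The second sketch is a rough check of the fixed-point analysis for exercise 20.3: it iterates the mean-update map (20.13), assuming a Gaussian P(x_1) truncated at \pm 10 \sigma_1 and a simple Riemann-sum quadrature (updated_mean and fixed_point are again illustrative names). The iteration should drive m toward 0 when \sigma_1^2 \beta \le 1, and toward a nonzero stable fixed point once \sigma_1^2 \beta > 1, in agreement with (20.17).

```python
import numpy as np

def updated_mean(m, beta, sigma1, n_grid=20001):
    """One step of the mean-update map, equation (20.13):
    m' = 2 * integral dx1 P(x1) x1 / (1 + exp(-2 beta m x1)),
    with P(x1) a zero-mean Gaussian of standard deviation sigma1,
    truncated at +/- 10 sigma1 and integrated by a Riemann sum."""
    x = np.linspace(-10.0 * sigma1, 10.0 * sigma1, n_grid)
    dx = x[1] - x[0]
    p = np.exp(-x**2 / (2.0 * sigma1**2)) / np.sqrt(2.0 * np.pi * sigma1**2)
    # responsibility of mean 1, equation (20.11); clip to avoid overflow in exp
    r1 = 1.0 / (1.0 + np.exp(np.clip(-2.0 * beta * m * x, -700.0, 700.0)))
    return 2.0 * np.sum(p * x * r1) * dx

def fixed_point(beta, sigma1, m0=0.01, n_iter=200):
    """Iterate the map from a small positive m to see where it settles."""
    m = m0
    for _ in range(n_iter):
        m = updated_mean(m, beta, sigma1)
    return m

beta = 1.0
for sigma1 in [0.5, 0.9, 1.0, 1.1, 2.0]:
    m_star = fixed_point(beta, sigma1)
    print(f"sigma1 = {sigma1:3.1f}  sigma1^2*beta = {sigma1**2 * beta:4.2f}"
          f"  ->  m settles near {m_star:8.5f}")
# Expected: m decays toward 0 when sigma1^2 * beta < 1 (only very slowly at the
# critical value 1), and a clearly nonzero fixed point appears once
# sigma1^2 * beta > 1, matching the stability condition (20.17).
```

For \beta = 1, the nonzero fixed points found this way should correspond to the stable mean locations plotted in figure 20.10.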
