08.10.2016 Views

Foundations of Data Science


Minimizing the sum of squared distances to cluster centers finds the maximum likelihood $\mu_1, \mu_2, \ldots, \mu_k$. This motivates using the sum of squared distances to the cluster centers.

8.2.2 Structural properties of the k-means objective

Suppose we have already determined the clustering or the partitioning into $C_1, C_2, \ldots, C_k$. What are the best centers for the clusters? The following lemma shows that the answer is the centroids, the coordinate means, of the clusters.

Lemma 8.1 Let $\{a_1, a_2, \ldots, a_n\}$ be a set of points. The sum of the squared distances of the $a_i$ to any point $x$ equals the sum of the squared distances to the centroid of the $a_i$ plus $n$ times the squared distance from $x$ to the centroid. That is,

$$\sum_i |a_i - x|^2 = \sum_i |a_i - c|^2 + n\,|c - x|^2$$

where $c = \frac{1}{n}\sum_{i=1}^{n} a_i$ is the centroid of the set of points.

Proof:

$$\begin{aligned}
\sum_i |a_i - x|^2 &= \sum_i |a_i - c + c - x|^2 \\
&= \sum_i |a_i - c|^2 + 2(c - x) \cdot \sum_i (a_i - c) + n\,|c - x|^2
\end{aligned}$$

Since $c$ is the centroid, $\sum_i (a_i - c) = 0$. Thus, $\sum_i |a_i - x|^2 = \sum_i |a_i - c|^2 + n\,|c - x|^2$.
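The identity in Lemma 8.1 can also be checked numerically. The following is a minimal sketch, assuming points are represented as tuples of floats; the helper names `sq_dist`, `centroid`, and `lemma_sides` are hypothetical and chosen only for this illustration.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance |p - q|^2 between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def lemma_sides(points, x):
    """Return both sides of the identity in Lemma 8.1 for the given x."""
    c = centroid(points)
    lhs = sum(sq_dist(a, x) for a in points)
    rhs = sum(sq_dist(a, c) for a in points) + len(points) * sq_dist(c, x)
    return lhs, rhs

# Compare the two sides on random 2-D data (seeded for repeatability).
rng = random.Random(0)
points = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(20)]
lhs, rhs = lemma_sides(points, (3.0, -1.5))
print(abs(lhs - rhs) < 1e-9)  # the two sides agree up to rounding error
```

The comparison uses a small tolerance rather than exact equality because the two sides are computed through different floating-point operations.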

A corollary of Lemma 8.1 is that the centroid minimizes the sum of squared distances since the first term, $\sum_i |a_i - c|^2$, is a constant independent of $x$ and setting $x = c$ sets the second term, $n\,|c - x|^2$, to zero.

Corollary 8.2 Let $\{a_1, a_2, \ldots, a_n\}$ be a set of points. The sum of squared distances of the $a_i$ to a point $x$ is minimized when $x$ is the centroid, namely $x = \frac{1}{n}\sum_i a_i$.
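As a cross-check that does not go through Lemma 8.1, the same minimizer can be recovered by calculus. This is a sketch under the assumption (made here only for the illustration) that the points lie in $\mathbf{R}^d$ and $|\cdot|$ is the Euclidean norm. Differentiating the objective with respect to $x$ gives

$$\nabla_x \sum_{i=1}^{n} |a_i - x|^2 = -2 \sum_{i=1}^{n} (a_i - x) = 2\Big(n x - \sum_{i=1}^{n} a_i\Big).$$

Setting the gradient to zero yields $x = \frac{1}{n}\sum_{i=1}^{n} a_i$. The objective is a strictly convex quadratic in $x$ (its Hessian is $2n I$), so this stationary point is the unique minimizer, in agreement with the corollary.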

8.2.3 Lloyd’s k-means clustering algorithm<br />

Corollary 8.2 suggests the following natural strategy for k-means clustering, known as Lloyd's algorithm. Lloyd's algorithm does not necessarily find a globally optimal solution but will find a locally optimal one. An important but unspecified step in the algorithm is its initialization: how the starting k centers are chosen. We discuss this after discussing the main algorithm.
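The alternating scheme commonly called Lloyd's algorithm can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes the starting centers are simply passed in by the caller, that points are tuples of numbers, and that an empty cluster keeps its previous center. The names `assign`, `lloyd`, and `max_iters` are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance |p - q|^2 between two points."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def assign(points, centers):
    """Put each point in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[j].append(p)
    return clusters

def lloyd(points, centers, max_iters=100):
    """Alternate assignment and centroid updates until centers stop changing."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iters):
        clusters = assign(points, centers)
        # By Corollary 8.2 the centroid is the best center for a fixed cluster.
        # Assumption: an empty cluster keeps its old center.
        new_centers = [centroid(cl) if cl else centers[j]
                       for j, cl in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    # Return the final centers with the assignment that matches them.
    return centers, assign(points, centers)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
centers, clusters = lloyd(points, [(0, 0), (10, 0)])
print(centers)  # [(0.0, 0.5), (10.0, 0.5)]
```

Neither step can increase the sum of squared distances: reassigning a point to its nearest center cannot raise its cost, and replacing a center by the centroid of its cluster cannot raise that cluster's cost. The result still depends on the starting centers, which this sketch leaves to the caller.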
