K-means clustering algorithm


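Once the number of clusters is set, the algorithm to find the clusters is straightforward:

Let the set of samples be $\{x^{(i)} : i = 0, \ldots, N-1\}$, and let the set of cluster centers be $\{m_j : j = 0, \ldots, K-1\}$.

• For each $x^{(i)}$, $i = 0, \ldots, N-1$, assign $x^{(i)}$ to Cluster $k$ if

$$k = \arg\min_j \lVert x^{(i)} - m_j \rVert.$$

• For each Cluster $j$, $j = 0, \ldots, K-1$, update

$$m_j = \frac{1}{\lvert \text{Cluster } j \rvert} \sum_{x^{(i)} \in \text{Cluster } j} x^{(i)}.$$

• Stop if no cluster center is updated; otherwise, repeat the assignment and update steps.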
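The assignment and update steps above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used to produce the results in these notes. The slides do not say how the centers are initialized, what happens to an empty cluster, or whether there is an iteration cap, so drawing K samples at random as initial centers, keeping an empty cluster's old center, and the `max_iter` limit are all assumptions. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance ||a - b||^2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(samples: List[Vector], centers: List[Vector]) -> List[int]:
    """Assignment step: x^(i) goes to Cluster k = argmin_j ||x^(i) - m_j||."""
    return [min(range(len(centers)), key=lambda j: squared_distance(x, centers[j]))
            for x in samples]


def kmeans(samples: List[Vector], k: int, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[Vector], List[int]]:
    """Alternate assignment and update steps until no center changes.

    ASSUMPTION: the initial centers are k samples drawn without replacement.
    """
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(samples, k)]
    for _ in range(max_iter):
        labels = assign(samples, centers)
        # Update step: m_j becomes the mean of the samples in Cluster j.
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # ASSUMPTION: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        # Stop if no cluster center is updated.
        if new_centers == centers:
            break
        centers = new_centers
    # Return labels computed against the final centers.
    return centers, assign(samples, centers)
```

For example, `kmeans(data, 3)` returns three centers and a cluster index for every sample. Because the result depends on the initial centers, a common practice (not described in these notes) is to try several seeds and keep the run with the smallest distortion.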

How to set K in k-means clustering

For K=1, 2, 3, …, run the k-means clustering algorithm. 

After the k-means algorithm has converged, we have cluster assignments for each 

sample as well as the locations of the cluster centers. 

Let $d_K$ be the mean squared distance of a sample from its corresponding cluster center:

$$d_K = \frac{1}{N} \sum_{j=0}^{K-1} \sum_{x^{(i)} \in \text{Cluster } j} \lVert x^{(i)} - m_j \rVert^2.$$

Let the distortion of the clustering result be $d_K$. Compute the transformed distortion as $d_K^{-p/2}$, where $p$ is the dimension of the data samples.

$d_K$ decreases as $K$ increases.

The jump value of the transformed distortion is

$$J_K = d_K^{-p/2} - d_{K-1}^{-p/2}.$$

(Assume $d_0^{-p/2} = 0$ when computing $J_1$.)

The peak of the jump values corresponds to the K that provides the best description of the original samples.
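The quantities $d_K$, $d_K^{-p/2}$, and $J_K$ defined above can be computed directly, as in this sketch. It assumes the centers and labels come from some k-means run, that a distortion is available for every K from 1 up to the largest K tried (no gaps), and that every $d_K$ is strictly positive, since $d_K = 0$ would make $d_K^{-p/2}$ undefined. The function names are hypothetical.

```python
from typing import Dict, List, Sequence


def distortion(samples: List[Sequence[float]], centers: List[Sequence[float]],
               labels: List[int]) -> float:
    """d_K: mean squared distance of a sample from its cluster center."""
    total = sum(
        sum((a - b) ** 2 for a, b in zip(x, centers[lab]))
        for x, lab in zip(samples, labels)
    )
    return total / len(samples)


def jump_values(d: Dict[int, float], p: int) -> Dict[int, float]:
    """J_K = d_K^(-p/2) - d_{K-1}^(-p/2) for K = 1..max(d).

    d maps K -> d_K and must contain every K from 1 to max(d);
    d_0^(-p/2) is taken to be 0, as in the notes.
    """
    k_max = max(d)
    transformed = {0: 0.0}
    for k in range(1, k_max + 1):
        transformed[k] = d[k] ** (-p / 2)
    return {k: transformed[k] - transformed[k - 1] for k in range(1, k_max + 1)}


def best_k(d: Dict[int, float], p: int) -> int:
    """K at which the jump of the transformed distortion peaks."""
    jumps = jump_values(d, p)
    return max(jumps, key=jumps.get)
```

Choosing K by minimizing $d_K$ alone would always favor the largest K tried, because $d_K$ decreases as K increases; the transform followed by differencing is what produces a peak to select.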


• Another example

– Higher-dimensional data set

In class, I often talk about a training set of 4 billion vectors, each having 4000 features…

And yet, all the examples we have seen are 2-d or, as in the case of the “bonus” examples in the Lecture 20 notes, 3-d.

Let us look at the results of processing a high-dimensional data set!


• 17-dimensional data set; i.e., p=17 

• 12,000 vectors 

• The data set has 21 groups 

– Group 0 has prior probability 0.20 

– The remaining 20 groups have equal probability (0.04) 

• Every group has a Gaussian density; the densities are identical except for the group mean

• The group covariance matrix is the identity matrix; i.e., features are pairwise uncorrelated, each with unit variance
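A data set with the same structure (p = 17, 12,000 vectors, 21 groups with priors 0.20 and 0.04, identity covariance) can be simulated as below for readers who want to repeat the experiment. The notes do not give the group means, so the means here are HYPOTHETICAL: each coordinate is drawn uniformly from [-spread, spread], where `spread` is an arbitrary choice, and results will depend on how well separated those means turn out to be. The function name and seed are also illustrative.

```python
import random
from typing import List, Tuple

P = 17            # dimension p of each vector
N = 12_000        # number of vectors
NUM_GROUPS = 21   # group 0 plus 20 equally likely groups


def make_dataset(seed: int = 0, spread: float = 5.0) -> Tuple[List[List[float]], List[int]]:
    """Return (vectors, true group index of each vector)."""
    rng = random.Random(seed)
    # HYPOTHETICAL means: not given in the notes; each coordinate is
    # drawn uniformly from [-spread, spread].
    means = [[rng.uniform(-spread, spread) for _ in range(P)]
             for _ in range(NUM_GROUPS)]
    # Prior 0.20 for group 0 and 0.04 for each of the other 20 groups.
    priors = [0.20] + [0.04] * (NUM_GROUPS - 1)
    groups = rng.choices(range(NUM_GROUPS), weights=priors, k=N)
    # Identity covariance: add independent N(0, 1) noise to every feature.
    data = [[mu + rng.gauss(0.0, 1.0) for mu in means[g]] for g in groups]
    return data, groups
```

Sampling each vector's group from the priors, rather than fixing exactly 2,400 vectors in group 0, is one reading of "prior probability"; the notes do not say which was used.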

Results 

• Run k-means for K=1, 2, …, 25 

• After each run,

– Compute the distortion, i.e., the mean squared distance of each sample to its corresponding cluster center

– Compute the transformed distortion

– Compute the jump of the transformed distortion

– Compute the inverse of the distortion (for comparison)

– Compute the jump of the inverted distortion
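The five quantities listed above can be tabulated from the distortions of all runs, as in this sketch. It assumes the distortions $d_1, \ldots, d_{K_{max}}$ are already available (for example, one k-means run per K), that each is positive, and that the value before K=1 is taken as 0 for both transforms, as in the jump definition earlier. The function names and the test distortions are hypothetical, not the values behind the plots.

```python
from typing import List, Tuple

Row = Tuple[int, float, float, float, float, float]


def jumps(values: List[float]) -> List[float]:
    """Successive differences, with the value before K=1 taken as 0."""
    return [v - prev for prev, v in zip([0.0] + values[:-1], values)]


def summarize(distortions: List[float], p: int) -> List[Row]:
    """Rows of (K, d_K, d_K^(-p/2), its jump, 1/d_K, its jump).

    distortions[K-1] holds d_K for K = 1..len(distortions).
    """
    transformed = [d ** (-p / 2) for d in distortions]
    inverted = [1.0 / d for d in distortions]
    jt, ji = jumps(transformed), jumps(inverted)
    return [(k + 1, distortions[k], transformed[k], jt[k], inverted[k], ji[k])
            for k in range(len(distortions))]


def best_ks(distortions: List[float], p: int) -> Tuple[int, int]:
    """K chosen by the transformed-distortion jump and by the inverted-distortion jump."""
    rows = summarize(distortions, p)
    k_transformed = max(rows, key=lambda r: r[3])[0]
    k_inverted = max(rows, key=lambda r: r[5])[0]
    return k_transformed, k_inverted
```

With the hypothetical distortions `[2.0, 1.5, 1.0]` and `p=4`, `best_ks` returns `(3, 1)`: because the K=0 term is taken as 0, the first jump of the inverted distortion is the whole value $1/d_1$, while the steeper $-p/2$ power gives more weight to the later drop in $d_K$.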

[Figures: Distortion, Transformed Distortion, and Inverted Distortion, each plotted against the Number of Clusters]

[Figures: Inverted Distortion and Jump of Inverted Distortion vs. Number of Clusters]

Using the inverted distortion, the best choice is K=1

[Figures: Transformed Distortion and Jump of Transformed Distortion vs. Number of Clusters]

Using the transformed distortion, the best choice is K=25!

Results 

• I expected the jump value of the transformed distortion to peak at K=21, the number of groups in the data set

• I did not get what I expected

• Although not shown here, two other examples show similar results

Discussion 

• We note that the results are subject to sampling errors (also see the “bonus” 4-group 2-d example in the Lecture 20 notes)

• The jump value of the transformed distortion does get us to the neighborhood of the correct K (the high values are at K=17, 20, 22, 25)

• Because the peak occurs at K=25, the largest K tried, we really should have run k-means for a few larger values of K as well

• By comparison, the jump value of the inverted distortion selected a one-cluster result as the best description
