08.10.2016 Views

Foundations of Data Science


Figure 8.1: Example where the natural clustering is not center-based. (Labels in the figure: A, B.)

to estimate densities of regions when data lies in high dimensions. So, as a preprocessing step, one may want to first perform some type of projection into a low dimensional space, such as SVD, before running a clustering algorithm.

We begin with a discussion of algorithms for center-based clustering, then examine algorithms for high-density clusters, and then examine some algorithms that allow combining the two. A resource for information on center-based clustering is the book chapter [?].

8.2 k-means Clustering<br />

We assume in this section that data points lie in $\mathbb{R}^d$ and focus on the k-means criterion.

8.2.1 A maximum-likelihood motivation for k-means<br />

We now consider a maximum-likelihood motivation for using the k-means criterion. Suppose that the data was generated according to an equal weight mixture of $k$ spherical well-separated Gaussian densities centered at $\mu_1, \mu_2, \ldots, \mu_k$, each with variance one in every direction. Then the density of the mixture is

$$\mathrm{Prob}(x) = \frac{1}{(2\pi)^{d/2}} \, \frac{1}{k} \sum_{i=1}^{k} e^{-|x-\mu_i|^2}.$$

Denote by $\mu(x)$ the center nearest to $x$. Since the exponential function falls off fast, we can approximate $\sum_{i=1}^{k} e^{-|x-\mu_i|^2}$ by $e^{-|x-\mu(x)|^2}$. Thus

$$\mathrm{Prob}(x) \approx \frac{1}{(2\pi)^{d/2} \, k} \, e^{-|x-\mu(x)|^2}.$$
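The approximation above can be checked numerically. The following is a minimal Python sketch, not part of the original text: the centers, the query point, and the helper names `sq_dist`, `mixture_sum`, and `nearest_term` are hypothetical choices made for illustration. It compares the full sum $\sum_{i=1}^{k} e^{-|x-\mu_i|^2}$ with the single term for the nearest center.

```python
import math

def sq_dist(x, mu):
    """Squared Euclidean distance |x - mu|^2."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))

def mixture_sum(x, centers):
    """Full sum over all centers of exp(-|x - mu_i|^2)."""
    return sum(math.exp(-sq_dist(x, mu)) for mu in centers)

def nearest_term(x, centers):
    """Single term exp(-|x - mu(x)|^2), where mu(x) is the nearest center."""
    return math.exp(-min(sq_dist(x, mu) for mu in centers))

# Hypothetical, reasonably well-separated centers in R^2 (assumed, not from the text).
centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
x = (0.5, -0.3)  # a point close to the first center

exact = mixture_sum(x, centers)
approx = nearest_term(x, centers)
rel_err = (exact - approx) / exact  # fraction of the sum contributed by the far centers

print(f"exact sum    = {exact:.10f}")
print(f"nearest term = {approx:.10f}")
print(f"relative err = {rel_err:.2e}")
```

In this toy setup the terms from the two far centers are several orders of magnitude smaller than the nearest term. Moving the centers closer together makes the relative error grow, which is why the argument relies on the well-separated assumption.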

The likelihood of drawing the sample of points $x^{(1)}, x^{(2)}, \ldots, x^{(n)}$ from the mixture, if the centers were $\mu_1, \mu_2, \ldots, \mu_k$, is approximately

$$\frac{1}{k^n} \, \frac{1}{(2\pi)^{nd/2}} \prod_{i=1}^{n} e^{-|x^{(i)}-\mu(x^{(i)})|^2} = c \, e^{-\sum_{i=1}^{n} |x^{(i)}-\mu(x^{(i)})|^2}.$$
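Taking logarithms of the expression above gives $\log c - \sum_{i=1}^{n} |x^{(i)}-\mu(x^{(i)})|^2$, where $c = k^{-n}(2\pi)^{-nd/2}$ does not depend on the centers. So, under this approximation, a set of centers has higher likelihood exactly when its sum of squared distances to nearest centers is smaller. The Python sketch below illustrates this on toy data. The points, both sets of centers, and the function names are assumptions made for the example, not taken from the text.

```python
import math

def sq_dist(x, mu):
    """Squared Euclidean distance |x - mu|^2."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))

def kmeans_cost(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    return sum(min(sq_dist(x, mu) for mu in centers) for x in points)

def approx_log_likelihood(points, centers):
    """Log of the approximate likelihood: log c - kmeans_cost,
    with c = k^(-n) * (2*pi)^(-n*d/2) as in the formula above."""
    n, k, d = len(points), len(centers), len(points[0])
    log_c = -n * math.log(k) - (n * d / 2) * math.log(2 * math.pi)
    return log_c - kmeans_cost(points, centers)

# Hypothetical toy data: two tight groups in R^2 (assumed for illustration).
points = [(0.1, 0.2), (-0.2, 0.1), (5.0, 5.1), (4.9, 4.8)]
good_centers = [(0.0, 0.0), (5.0, 5.0)]  # one center near each group
bad_centers = [(2.0, 2.0), (3.0, 3.0)]   # both centers between the groups

cost_good = kmeans_cost(points, good_centers)
cost_bad = kmeans_cost(points, bad_centers)
ll_good = approx_log_likelihood(points, good_centers)
ll_bad = approx_log_likelihood(points, bad_centers)

print("k-means cost:   good =", cost_good, " bad =", cost_bad)
print("log-likelihood: good =", ll_good, " bad =", ll_bad)
```

Because `log_c` is the same for both sets of centers, the gap between the two log-likelihoods equals the gap between the two costs with the sign flipped.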
