
8.6.4 Spectral Clustering Algorithm

Spectral Clustering - The Algorithm

1. Select a value for ε.

2. Find the top k right singular vectors of data matrix A and let V be the n × k matrix of the top k right singular vectors.

3. Select a random row v_i of V and form a cluster with all rows v_j such that |v_i − v_j| ≤ 6kσ/ε.

4. Repeat the above step k times. (A code sketch of these steps follows.)
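A minimal Python sketch of these steps is given below, with two assumptions that the statement above leaves open: the threshold 6kσ/ε is passed in as a known quantity, and the rows v_i are taken to be the k-dimensional coordinates of the data rows in the span of the top k right singular vectors.

```python
import numpy as np

def spectral_cluster(A, k, threshold, seed=None):
    """Sketch of the spectral clustering algorithm above.

    A:         n x d data matrix (rows are data points)
    k:         number of clusters
    threshold: the cutoff 6*k*sigma/epsilon, assumed given
    Returns one label in {0, ..., k-1} per row, or -1 if never captured.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Step 2: top k right singular vectors; rows of V are the
    # coordinates of the data rows in the top-k singular subspace.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V = A @ Vt[:k].T
    labels = np.full(n, -1)
    for c in range(k):                       # Step 4: repeat k times
        unassigned = np.flatnonzero(labels == -1)
        if unassigned.size == 0:
            break
        # Step 3: pick a random remaining row and cluster everything
        # within the given distance threshold of it.
        i = rng.choice(unassigned)
        dists = np.linalg.norm(V[unassigned] - V[i], axis=1)
        labels[unassigned[dists <= threshold]] = c
    return labels
```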

Theorem 8.6 If in a k-clustering C, every pair of centers is separated by at least 15kσ(C)/ε and every cluster has at least εn points in it, then with probability at least 1 − ε, Spectral Clustering finds a clustering C′ which differs from C in at most ε²n points.
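To make the hypothesis concrete: with k = 2 and ε = 1/10, the theorem asks that the two cluster centers be at least 300σ(C) apart and that each cluster contain at least n/10 points; it then guarantees, with probability at least 9/10, a clustering that misclassifies at most n/100 points.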

Before we prove the theorem, we show that the condition that every pair of cluster centers is separated by 15kσ(C)/ε holds for the Stochastic Block Model and Gaussian Mixture models discussed above, for appropriate parameter settings. For this, we will need the following theorem from Random Matrix Theory, which we do not prove here.

Theorem 8.7 Suppose B is an n × d matrix with mutually independent, zero mean random entries with variance ν ∈ Ω((ln n)/n) that are well-behaved. If |b_ij| ≤ 1 for all i and j, or if the b_ij are Gaussian random variables, they are well-behaved; the theorem holds in greater generality. Then, with high probability,

‖B‖₂ ≤ c √(n + d) √ν.
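As an illustrative numerical check (not part of any proof, with the matrix size and entry distribution chosen arbitrarily), one can draw a random sign matrix and compare its spectral norm to √(n + d) √ν; the observed ratio is a modest constant c:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 500
nu = 0.25                                  # variance of +/- 1/2 entries
B = rng.choice([-0.5, 0.5], size=(n, d))   # zero mean, |b_ij| <= 1

observed = np.linalg.norm(B, 2)            # spectral norm ||B||_2
bound = np.sqrt((n + d) * nu)              # sqrt(n + d) * sqrt(nu)
print(observed, bound, observed / bound)   # ratio stays a small constant
```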

Now, for the stochastic block model with two communities of n/2 people each and p, q, α, and β as above, we have E(a_ij) = c_ij, so with B = A − C:

E(b_ij) = 0 ;  var(b_ij) = p(1 − p) or q(1 − q) ≤ p.

Setting ν = p, the theorem gives

‖A − C‖₂ ≤ c √n √p = c √α  ⟹  σ(C) ≤ c √α / √n.

From (8.1), the inter-center separation is (α − β)/√n. Thus, the condition of the theorem is satisfied as long as α − β ∈ Ω(√α), which is a reasonable assumption in the regime where α is at least a large constant.
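A small simulation makes this concrete. As a simplification, the sketch below draws every entry of A independently (rather than as a symmetric adjacency matrix), uses arbitrary illustrative values of α and β, and computes σ(C) as ‖A − C‖₂/√n, matching the implication above:

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha, beta = 2000, 40.0, 10.0
p, q = alpha / n, beta / n

# Expected matrix C: probability p inside each community of n/2, q across.
C = np.full((n, n), q)
half = n // 2
C[:half, :half] = p
C[half:, half:] = p

A = (rng.random((n, n)) < C).astype(float)   # independent 0/1 entries
sigma = np.linalg.norm(A - C, 2) / np.sqrt(n)

print(sigma)                          # sigma(C), on the order of sqrt(alpha)/sqrt(n)
print(np.sqrt(alpha) / np.sqrt(n))    # leading term of the upper bound
print((alpha - beta) / np.sqrt(n))    # inter-center separation from (8.1)
```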

The proof for the Gaussian mixture model is similar. Suppose we have a mixture of k Gaussians and A is a data matrix whose rows are n independent, identically distributed samples from the mixture. The Gaussians need not be spherical. Let σ_max be the maximum standard deviation of any of the k Gaussians in any direction. We again consider C to
