08.10.2016 Views

Foundations of Data Science


e made up of the means of the Gaussians. Now the theorem is satisfied by $A - C$ with $\nu = \sigma_{\max}^2$. For $k \in O(1)$, it is easy to see that the hypothesis of Theorem 8.6 is satisfied provided the means of the component Gaussians are separated by $\Omega(\sigma_{\max})$.

The proof of Theorem 8.6 relies on a crucial lemma, which is simple to prove.

Lemma 8.8 Suppose $A$ is an $n \times d$ matrix and suppose $V$ is obtained by projecting the rows of $A$ to the subspace of the first $k$ right singular vectors of $A$. For any matrix $C$ of rank less than or equal to $k$,

$$\|V - C\|_F^2 \le 8k\,\|A - C\|_2^2.$$

Note: $V$ is just one matrix, yet it is close to every $C$ in the sense that $\|V - C\|_F^2 \le 8k\,\|A - C\|_2^2$. While this seems contradictory, the point of the lemma is that for $C$ far away from $V$, $\|A - C\|_2$ will be high.

Proof: Since the rank of $V - C$ is less than or equal to $2k$,

$$\|V - C\|_F^2 \le 2k\,\|V - C\|_2^2$$

and

$$\|V - C\|_2 \le \|V - A\|_2 + \|A - C\|_2 \le 2\,\|A - C\|_2.$$

The last inequality follows since $V$ is the best rank $k$ approximation in spectral norm and $C$ has rank at most $k$, so $\|V - A\|_2 \le \|A - C\|_2$. Squaring the second bound and substituting it into the first gives $\|V - C\|_F^2 \le 2k \cdot 4\,\|A - C\|_2^2 = 8k\,\|A - C\|_2^2$, and the lemma follows.
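As a sanity check on the lemma, the following is a minimal numerical sketch using NumPy. The helper names `project_to_top_k` and `lemma_sides`, the synthetic data, and the choice of $C$ as "each row's cluster center" are illustrative assumptions, not part of the text. It computes both sides of $\|V - C\|_F^2 \le 8k\,\|A - C\|_2^2$ for one random instance.

```python
import numpy as np

def project_to_top_k(A, k):
    """Project the rows of A onto the span of its first k right singular vectors."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k]                      # k x d, rows are right singular vectors
    return A @ Vk.T @ Vk             # n x d matrix of rank at most k

def lemma_sides(A, C, k):
    """Return (||V - C||_F^2, 8k ||A - C||_2^2) for the projected matrix V."""
    V = project_to_top_k(A, k)
    lhs = np.linalg.norm(V - C, "fro") ** 2
    rhs = 8 * k * np.linalg.norm(A - C, 2) ** 2   # ord=2 is the spectral norm
    return lhs, rhs

# Hypothetical test data: k cluster centers plus unit Gaussian noise.
rng = np.random.default_rng(0)
n, d, k = 200, 30, 3
centers = rng.normal(scale=5.0, size=(k, d))
labels = rng.integers(0, k, size=n)
C = centers[labels]                  # row i is the center of point i; rank <= k
A = C + rng.normal(size=(n, d))

lhs, rhs = lemma_sides(A, C, k)
print(lhs <= rhs)
```

The lemma places no condition on $C$ beyond rank at most $k$, so replacing `C` with any other matrix of rank at most `k` (for instance a product of random $n \times k$ and $k \times d$ factors) should leave the inequality intact.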

Proof of Theorem 8.6: We use the lemma to argue that, barring a few exceptions, most $\bar{a}_i$ (the rows of $V$, also written $v_i$) are at distance at most $3k\sigma(C)/\varepsilon$ from the corresponding $c_i$. This will imply that for most $i$, the point $\bar{a}_i$ is at distance at most $6k\sigma(C)/\varepsilon$ from most other $\bar{a}_j$ in its own cluster. Since we assumed the cluster centers of $C$ are well separated, this will imply that for most $i$ and $j$ in different clusters, $|v_i - v_j| \ge 9k\sigma(C)/\varepsilon$. This will enable us to prove that the distance based clustering step, Step 3, of the algorithm works.

Define $M$ to be the set of exceptions:

$$M = \{\, i : |\bar{a}_i - c_i| \ge 3k\sigma(C)/\varepsilon \,\}.$$

Since $\|V - C\|_F^2 = \sum_i |v_i - c_i|^2 \ge \sum_{i \in M} |v_i - c_i|^2 \ge |M|\,\frac{9k^2\sigma^2(C)}{\varepsilon^2}$, using the lemma we get:

$$|M|\,\frac{9k^2\sigma^2(C)}{\varepsilon^2} \le \|V - C\|_F^2 \le 8kn\sigma^2(C) \implies |M| \le \frac{8\varepsilon^2 n}{9k}. \tag{8.2}$$
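The counting argument behind (8.2) can be checked numerically. The sketch below is illustrative only: the function name `exception_set` and the synthetic data are hypothetical, and it assumes $\sigma^2(C) = \|A - C\|_2^2 / n$, which is the reading under which the lemma's bound becomes $8kn\sigma^2(C)$. It collects the exceptional indices and compares their count with $8\varepsilon^2 n/(9k)$.

```python
import numpy as np

def exception_set(A, C, k, eps):
    """Return (M, bound): indices i with |v_i - c_i| >= 3k*sigma/eps, and 8*eps^2*n/(9k)."""
    n = A.shape[0]
    # V: rows of A projected onto the span of the first k right singular vectors.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V = A @ Vt[:k].T @ Vt[:k]
    # Assumption: sigma(C)^2 = ||A - C||_2^2 / n (spectral norm).
    sigma = np.linalg.norm(A - C, 2) / np.sqrt(n)
    dist = np.linalg.norm(V - C, axis=1)          # |v_i - c_i| for each row i
    M = np.flatnonzero(dist >= 3 * k * sigma / eps)
    bound = 8 * eps**2 * n / (9 * k)
    return M, bound

# Hypothetical test data: k cluster centers plus unit Gaussian noise.
rng = np.random.default_rng(1)
n, d, k = 300, 40, 4
centers = rng.normal(scale=6.0, size=(k, d))
labels = rng.integers(0, k, size=n)
C = centers[labels]
A = C + rng.normal(size=(n, d))

for eps in (0.5, 1.0, 2.0, 4.0, 8.0):
    M, bound = exception_set(A, C, k, eps)
    print(eps, len(M), bound, len(M) <= bound)
```

Small values of `eps` make the distance threshold large, so `M` tends to be empty; large values make the bound exceed `n`. In either case the inequality in (8.2) should hold, since it follows directly from the lemma.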

For $i, j \notin M$ with $i, j$ in the same cluster in $C$ (so that $c_i = c_j$),

$$|v_i - v_j| \le |v_i - c_i| + |c_i - v_j| \le \frac{6k\sigma(C)}{\varepsilon}. \tag{8.3}$$
