08.10.2016 Views

Foundations of Data Science


Proof: Let

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$$

be the SVD of $A$. If the rank of $A$ is less than $d$, then for convenience complete $\{v_1, v_2, \ldots, v_r\}$ into an orthonormal basis $\{v_1, v_2, \ldots, v_d\}$ of $d$-space. Write $x$ in the basis of the $v_i$'s as

$$x = \sum_{i=1}^{d} c_i v_i.$$

Since $(A^T A)^k = \sum_{i=1}^{d} \sigma_i^{2k} v_i v_i^T$, it follows that $(A^T A)^k x = \sum_{i=1}^{d} \sigma_i^{2k} c_i v_i$. By hypothesis, $|c_1| \ge \delta$.

Suppose that $\sigma_1, \sigma_2, \ldots, \sigma_m$ are the singular values of $A$ that are greater than or equal to $(1-\varepsilon)\sigma_1$ and that $\sigma_{m+1}, \ldots, \sigma_d$ are the singular values that are less than $(1-\varepsilon)\sigma_1$.
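The expansion of $(A^T A)^k x$ in the basis $\{v_i\}$ can be checked numerically. The sketch below is illustrative only and uses plain Python: the helper names (`orthonormal_basis`, `gram_matrix`, `matvec`), the dimension, the singular values, and the use of a Householder reflection to obtain an orthonormal basis are all assumptions made for the example, not part of the text. It builds $A^T A = \sum_i \sigma_i^2 v_i v_i^T$ as an explicit matrix, applies it $k$ times to a vector $x$, and compares the result with $\sum_i \sigma_i^{2k} c_i v_i$.

```python
import math
import random

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

def orthonormal_basis(d, rng):
    # Hypothetical helper: the rows of the Householder reflection
    # H = I - 2 u u^T (u a unit vector) form an orthonormal basis of d-space.
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(dot(u, u))
    u = [t / norm for t in u]
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] for j in range(d)]
            for i in range(d)]

def gram_matrix(sigmas, basis):
    # A^T A = sum_i sigma_i^2 v_i v_i^T, assembled as an explicit d x d matrix.
    d = len(basis)
    return [[sum(s * s * v[i] * v[j] for s, v in zip(sigmas, basis))
             for j in range(d)] for i in range(d)]

def matvec(M, x):
    return [dot(row, x) for row in M]

rng = random.Random(0)
d, k = 5, 3
sigmas = [2.0, 1.5, 1.0, 0.5, 0.0]   # assumed values; the final 0 models rank(A) < d
V = orthonormal_basis(d, rng)
B = gram_matrix(sigmas, V)

x = [rng.gauss(0.0, 1.0) for _ in range(d)]
c = [dot(v, x) for v in V]           # coordinates c_i of x in the basis {v_i}

# Left side: apply A^T A to x, k times.
lhs = x
for _ in range(k):
    lhs = matvec(B, lhs)

# Right side: sum_i sigma_i^(2k) c_i v_i.
rhs = [sum(s ** (2 * k) * ci * v[j] for s, ci, v in zip(sigmas, c, V))
       for j in range(d)]

max_gap = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_gap)   # expected to be at the level of floating-point rounding
```

The two sides agree because the $v_i$ are orthonormal, so all cross terms $v_i^T v_j$ with $i \ne j$ vanish when the matrix is applied repeatedly.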

Now

$$\left|(A^T A)^k x\right|^2 = \left|\sum_{i=1}^{d} \sigma_i^{2k} c_i v_i\right|^2 = \sum_{i=1}^{d} \sigma_i^{4k} c_i^2 \ge \sigma_1^{4k} c_1^2 \ge \sigma_1^{4k} \delta^2.$$

The squared length of the component of $(A^T A)^k x$ perpendicular to the space $V$ is

$$\sum_{i=m+1}^{d} \sigma_i^{4k} c_i^2 \le (1-\varepsilon)^{4k} \sigma_1^{4k} \sum_{i=m+1}^{d} c_i^2 \le (1-\varepsilon)^{4k} \sigma_1^{4k},$$

since $\sum_{i=1}^{d} c_i^2 = |x|^2 = 1$. Dividing by the lower bound on $|(A^T A)^k x|^2$, the component of $w$ perpendicular to $V$ has squared length at most $\frac{(1-\varepsilon)^{4k}\sigma_1^{4k}}{\sigma_1^{4k}\delta^2}$, and so its length is at most

$$\frac{(1-\varepsilon)^{2k}\sigma_1^{2k}}{\delta\sigma_1^{2k}} = \frac{(1-\varepsilon)^{2k}}{\delta} \le \frac{e^{-2k\varepsilon}}{\delta} = \varepsilon.$$
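The bound can be tried on a small example. The following is a minimal sketch in plain Python, not the authors' implementation. It assumes that $w$ is the normalized vector $(A^T A)^k x / |(A^T A)^k x|$ and that $k = \ln(1/(\varepsilon\delta))/(2\varepsilon)$, the value that makes the last equality $e^{-2k\varepsilon}/\delta = \varepsilon$ hold (rounded up here, which only shrinks the bound). The singular values, the dimension, and the helper names are hypothetical choices for illustration.

```python
import math
import random

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

def normalize(x):
    n = math.sqrt(dot(x, x))
    return [t / n for t in x]

def orthonormal_basis(d, rng):
    # Hypothetical helper: rows of the Householder reflection I - 2 u u^T.
    u = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] for j in range(d)]
            for i in range(d)]

def power_method(B, x, k):
    # k rounds of w <- B w / |B w|; the resulting direction equals B^k x / |B^k x|.
    w = normalize(x)
    for _ in range(k):
        w = normalize([dot(row, w) for row in B])
    return w

rng = random.Random(1)
d, eps = 6, 0.1
sigmas = [3.0, 2.95, 2.8, 2.0, 1.0, 0.5]    # assumed singular values, sigma_1 first
V = orthonormal_basis(d, rng)                # assumed right singular vectors v_1..v_d
# A^T A = sum_i sigma_i^2 v_i v_i^T as an explicit d x d matrix.
B = [[sum(s * s * v[i] * v[j] for s, v in zip(sigmas, V)) for j in range(d)]
     for i in range(d)]

x = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
delta = abs(dot(x, V[0]))                    # |c_1|, so the hypothesis |c_1| >= delta holds
k = math.ceil(math.log(1.0 / (eps * delta)) / (2.0 * eps))

w = power_method(B, x, k)
# Squared length of the part of w outside V = span{v_i : sigma_i >= (1 - eps) sigma_1}.
perp_sq = sum(dot(w, v) ** 2
              for s, v in zip(sigmas, V) if s < (1.0 - eps) * sigmas[0])
perp = math.sqrt(perp_sq)
print(k, perp, eps)
```

In this example the three singular values 3.0, 2.95 and 2.8 lie at or above $(1-\varepsilon)\sigma_1 = 2.7$, so $V$ is three-dimensional. The power method is not asked to separate those three directions, only to suppress the remaining ones.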

Lemma 3.12 Let $y \in \mathbf{R}^d$ be a random vector with the unit variance spherical Gaussian as its probability density. Let $x = y/|y|$. Let $v$ be any fixed (not random) unit length vector. Then

$$\text{Prob}\left(|x^T v| \le \frac{1}{20\sqrt{d}}\right) \le \frac{1}{10} + 3e^{-d/64}.$$

Proof: With $c = \sqrt{d}$ substituted in Theorem (2.9) of Chapter 2, the probability that $|y| \ge 2\sqrt{d}$ is at most $3e^{-d/64}$. Further, $y^T v$ is a random, zero mean, unit variance Gaussian. Thus, the probability that $|y^T v| \le \frac{1}{10}$ is at most $1/10$. If $|x^T v| \le \frac{1}{20\sqrt{d}}$, then either $|y| \ge 2\sqrt{d}$ or $|y^T v| = |y|\,|x^T v| \le \frac{1}{10}$, so combining these two facts with the union bound establishes the lemma.
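A quick Monte Carlo experiment can make the lemma concrete. This is a rough sketch under stated assumptions, not a proof: the dimension `d = 256`, the number of trials, the seed, the choice $v = (1, \ldots, 1)/\sqrt{d}$, and the function name `estimate` are all arbitrary choices for illustration. It samples $y$ from the unit variance spherical Gaussian, sets $x = y/|y|$, and records how often $|x^T v| \le 1/(20\sqrt{d})$.

```python
import math
import random

def estimate(d, trials, rng):
    # Empirical frequency of the event |x^T v| <= 1/(20 sqrt(d)) with x = y/|y|.
    v = [1.0 / math.sqrt(d)] * d             # a fixed (not random) unit vector
    threshold = 1.0 / (20.0 * math.sqrt(d))
    hits = 0
    for _ in range(trials):
        y = [rng.gauss(0.0, 1.0) for _ in range(d)]   # spherical Gaussian sample
        norm = math.sqrt(sum(t * t for t in y))
        x_dot_v = sum(a * b for a, b in zip(y, v)) / norm
        if abs(x_dot_v) <= threshold:
            hits += 1
    return hits / trials

rng = random.Random(2)
d, trials = 256, 2000
freq = estimate(d, trials, rng)
bound = 0.1 + 3.0 * math.exp(-d / 64.0)      # right-hand side of the lemma
print(freq, bound)
```

The observed frequency should fall well below the bound, which is not tight. For small $d$ the term $3e^{-d/64}$ exceeds 1 and the bound says nothing, which is why a moderately large $d$ is used here.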
