08.10.2016 Views

Foundations of Data Science


First suppose $x$ is in the row space $V$ of $R$. By Lemma 7.6, $Px = x$, so for $x \in V$, $(A - AP)x = 0$. Since every vector can be written as the sum of a vector in $V$ and a vector orthogonal to $V$, the maximum of $|(A - AP)x|$ over unit-length $x$ must occur at some $x \in V^\perp$. For such $x$, by Lemma 7.6, $(A - AP)x = Ax$. Thus, the question becomes: for unit-length $x \in V^\perp$, how large can $|Ax|^2$ be? To analyze this, write:

$$|Ax|^2 = x^T A^T A x = x^T (A^T A - R^T R) x \le \|A^T A - R^T R\|_2 \, |x|^2 \le \|A^T A - R^T R\|_2 .$$

The second equality holds because $Rx = 0$ for every $x$ orthogonal to the rows of $R$. This implies that $\|A - AP\|_2^2 \le \|A^T A - R^T R\|_2$. So it suffices to prove that $\|A^T A - R^T R\|_2^2 \le \|A\|_F^4 / r$, which follows directly from Theorem 7.5, since we can think of $R^T R$ as a way of estimating $A^T A$ by picking, according to the length-squared distribution, columns of $A^T$, i.e., rows of $A$. This proves Proposition 7.7.
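The deterministic inequality at the heart of this proof, $\|A - AP\|_2^2 \le \|A^T A - R^T R\|_2$, can be checked numerically. The following is a minimal sketch in Python with NumPy, not the authors' code. The test matrix, the helper names `length_squared_row_sample` and `row_space_projector`, and the choice $r = 20$ are all assumptions made for illustration. The scaling of each sampled row by $1/\sqrt{r p_i}$ is the usual length-squared convention and is also assumed here, since the sampling step is not shown in this excerpt.

```python
import numpy as np

def length_squared_row_sample(A, r, rng):
    """Pick r rows of A i.i.d. with probability proportional to squared length.

    Each picked row is scaled by 1/sqrt(r * p_i) so that E[R^T R] = A^T A
    (assumed convention, not taken from the excerpt).
    """
    row_norms_sq = np.sum(A * A, axis=1)
    p = row_norms_sq / row_norms_sq.sum()
    idx = rng.choice(A.shape[0], size=r, p=p)
    return A[idx] / np.sqrt(r * p[idx])[:, None]

def row_space_projector(R):
    """Orthogonal projector onto the row space of R, built from the SVD of R."""
    _, sing, Vt = np.linalg.svd(R, full_matrices=False)
    V = Vt[sing > 1e-10 * sing.max()]  # orthonormal basis of the row space
    return V.T @ V

rng = np.random.default_rng(0)
# Hypothetical test matrix: 200 x 50 with rank 30.
A = rng.standard_normal((200, 30)) @ rng.standard_normal((30, 50))
R = length_squared_row_sample(A, r=20, rng=rng)
P = row_space_projector(R)

lhs = np.linalg.norm(A - A @ P, 2) ** 2          # ||A - AP||_2^2
rhs = np.linalg.norm(A.T @ A - R.T @ R, 2)       # ||A^T A - R^T R||_2
print(lhs, rhs)
```

The inequality `lhs <= rhs` holds for any $R$, sampled or not, because the argument above only uses $Rx = 0$ on $V^\perp$. Sampling matters only for the second step, where Theorem 7.5 bounds the right-hand side.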

Proposition 7.8 is easy to see. By Lemma 7.6, $P$ is the identity on the space $V$ spanned by the rows of $R$, and $Px = 0$ for $x$ perpendicular to the rows of $R$. Thus the nonzero singular values of $P$ all equal 1, and there are at most $r$ of them since $V$ has dimension at most $r$. Hence $\|P\|_F^2$, the sum of the squared singular values of $P$, is at most $r$, as claimed.

We now briefly look at the time needed to compute $U$. The only involved step in computing $U$ is to find $(RR^T)^{-1}$ or to compute the SVD of $RR^T$. But $RR^T$ is an $r \times r$ matrix, and since $r$ is much smaller than $n$ and $m$, this is fast.
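To make the last two points concrete, here is a small Python sketch, offered under stated assumptions and not as the book's construction of $U$. It builds the projector onto the row space of $R$ from the $r \times r$ matrix $RR^T$ alone, using the standard linear-algebra identity $P = R^T (RR^T)^{+} R$, which is assumed here because the statement of Lemma 7.6 is not part of this excerpt. The random matrix `R` is a hypothetical stand-in for the sampled rows.

```python
import numpy as np

rng = np.random.default_rng(1)
r, n = 15, 400
R = rng.standard_normal((r, n))    # hypothetical stand-in for the r sampled rows

G = R @ R.T                        # r x r Gram matrix, cheap to handle when r << n
P = R.T @ np.linalg.pinv(G) @ R    # projector onto the row space of R (assumed formula)

frob_sq = np.linalg.norm(P, "fro") ** 2   # sum of squared singular values of P
print(frob_sq, r)
```

The only inversion is of the $r \times r$ matrix `G`, so its cost depends on $r$ and not on $n$ or $m$. The squared Frobenius norm of `P` equals the rank of $R$, which is consistent with the bound $\|P\|_F^2 \le r$ in Proposition 7.8.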

Understanding the bound in Theorem 7.9: To better understand the bound in Theorem 7.9, consider when it is meaningful and when it is not. First, choose parameters $s = \Theta(1/\varepsilon^3)$ and $r = \Theta(1/\varepsilon^2)$ so that the bound becomes $E(\|A - CUR\|_2^2) \le \varepsilon \|A\|_F^2$. Recall that $\|A\|_F^2 = \sum_i \sigma_i^2(A)$, i.e., the sum of squares of all the singular values of $A$. Also, for convenience, scale $A$ so that $\sigma_1^2(A) = 1$. Then

$$\sigma_1^2(A) = \|A\|_2^2 = 1 \quad \text{and} \quad E(\|A - CUR\|_2^2) \le \varepsilon \sum_i \sigma_i^2(A).$$

This gives an intuitive sense of when the guarantee is good and when it is not. If the top $k$ singular values of $A$ are all $\Omega(1)$ for $k \gg m^{1/3}$, so that $\sum_i \sigma_i^2(A) \gg m^{1/3}$, then the guarantee is only meaningful when $\varepsilon = o(m^{-1/3})$, which is not interesting because it requires $s > m$. On the other hand, if just the first few singular values of $A$ are large and the rest are quite small, e.g., $A$ represents a collection of points that lie very close to a low-dimensional pancake, and in particular if $\sum_i \sigma_i^2(A)$ is a constant, then to be meaningful the bound requires $\varepsilon$ to be a small constant. In this case, the guarantee is indeed meaningful because it implies that a constant number of rows and columns provides a good 2-norm approximation to $A$.
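The two regimes can be illustrated with a short Python sketch. It evaluates the right-hand side $\varepsilon \sum_i \sigma_i^2(A)$ for two made-up spectra after scaling so that $\sigma_1 = 1$. The helper `relative_bound`, the two spectra, and the values $m = 1000$ and $\varepsilon = 0.1$ are illustrative assumptions, not values from the text. Since $\|A\|_2^2 = 1$ after scaling, a bound of 1 or more says nothing, because approximating $A$ by the zero matrix already achieves error 1.

```python
import numpy as np

def relative_bound(singular_values, eps):
    """Return eps * sum_i sigma_i^2 after scaling so that sigma_1 = 1."""
    s = np.asarray(singular_values, dtype=float)
    s = s / s.max()
    return eps * np.sum(s ** 2)

m = 1000
flat = np.ones(m)                  # hypothetical: every singular value equally large
decaying = 0.5 ** np.arange(m)     # hypothetical: geometric decay, "pancake"-like data

eps = 0.1
flat_bound = relative_bound(flat, eps)
decaying_bound = relative_bound(decaying, eps)
print(flat_bound, decaying_bound)
```

With the flat spectrum the bound exceeds 1 at this $\varepsilon$, so the guarantee is vacuous. With the rapidly decaying spectrum the sum of squares stays bounded by a constant, and a constant $\varepsilon$ already gives a bound well below 1.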

7.4 Sketches of Documents

Suppose one wished to store all the web pages from the WWW. Since there are billions of web pages, one might store just a sketch of each page where a sketch is a few hundred
