08.10.2016 Views

Foundations of Data Science


would require s ≥ n, which clearly is of no use since this would pick as many or more columns than the whole of A.

Let’s use the identity-like matrix P instead of I in the above discussion. Using the fact that R is picked according to length squared sampling, we will show the following proposition later.

Proposition 7.7 $A \approx AP$ and the error $E\left(\|A - AP\|_2^2\right)$ is at most $\frac{1}{\sqrt{r}}\|A\|_F^2$.

We then use Theorem 7.5 to argue that instead of doing the multiplication AP, we can use the sampled columns of A and the corresponding rows of P. The s sampled columns of A form C. We have to take the corresponding s rows of $P = R^T(RR^T)^{-1}R$, which is the same as taking the corresponding s rows of $R^T$ and multiplying this by $(RR^T)^{-1}R$. It is easy to check that this leads to an expression of the form CUR. Further, by Theorem 7.5, the error is bounded by

$$E\left(\|AP - CUR\|_2^2\right) \le E\left(\|AP - CUR\|_F^2\right) \le \frac{\|A\|_F^2\,\|P\|_F^2}{s} \le \frac{r}{s}\|A\|_F^2, \qquad (7.6)$$

since we will show later that:

Proposition 7.8 $\|P\|_F^2 \le r$.
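The construction described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the names `length_squared_probs` and `cur_sketch` are hypothetical, the rows of R are rescaled by $1/\sqrt{r p_i}$, the sampled rows of P are rescaled by $1/(s p_k)$ so that the sampled product estimates AP without bias, and a pseudo-inverse with an `rcond` cutoff stands in for $(RR^T)^{-1}$ in case $RR^T$ is singular.

```python
import numpy as np


def length_squared_probs(M, axis):
    """Probabilities proportional to squared lengths.

    axis=1 gives one probability per row, axis=0 one per column.
    """
    sq = np.sum(M * M, axis=axis)
    return sq / sq.sum()


def cur_sketch(A, r, s, rng, rcond=1e-10):
    """Return (C, U, R) with C: m x s, U: s x r, R: r x n (illustrative sketch)."""
    m, n = A.shape

    # r rows of A by length squared sampling (rescaling is an assumption here;
    # it does not change the projection P onto the row space of R).
    p_row = length_squared_probs(A, axis=1)
    rows = rng.choice(m, size=r, replace=True, p=p_row)
    R = A[rows, :] / np.sqrt(r * p_row[rows])[:, None]

    # s columns of A by length squared sampling; C keeps them unscaled.
    p_col = length_squared_probs(A, axis=0)
    cols = rng.choice(n, size=s, replace=True, p=p_col)
    C = A[:, cols]

    # The matching s rows of P = R^T (R R^T)^+ R are the matching rows of R^T
    # times (R R^T)^+ R. Folding the 1/(s p_k) sampling weights and (R R^T)^+
    # into U gives the C U R form.
    scale = 1.0 / (s * p_col[cols])
    U = (scale[:, None] * R[:, cols].T) @ np.linalg.pinv(R @ R.T, rcond=rcond)
    return C, U, R
```

A call such as `C, U, R = cur_sketch(A, r=20, s=200, rng=np.random.default_rng(0))` returns the three factors. For a rank-one A the product `C @ U @ R` reproduces A up to round-off, which makes a convenient sanity check.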

Putting (7.6) and Proposition 7.7 together, and using the fact that by the triangle inequality $\|A - CUR\|_2 \le \|A - AP\|_2 + \|AP - CUR\|_2$, which in turn implies that $\|A - CUR\|_2^2 \le 2\|A - AP\|_2^2 + 2\|AP - CUR\|_2^2$, the main result follows.
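The combination step can be written out explicitly. The only ingredient not stated in the text is the elementary inequality $(a + b)^2 \le 2a^2 + 2b^2$; the remaining steps are linearity of expectation, Proposition 7.7, and (7.6):

$$
\begin{aligned}
\|A - CUR\|_2^2 &\le \left(\|A - AP\|_2 + \|AP - CUR\|_2\right)^2 \le 2\|A - AP\|_2^2 + 2\|AP - CUR\|_2^2, \\
E\left(\|A - CUR\|_2^2\right) &\le 2E\left(\|A - AP\|_2^2\right) + 2E\left(\|AP - CUR\|_2^2\right) \le \frac{2}{\sqrt{r}}\|A\|_F^2 + \frac{2r}{s}\|A\|_F^2.
\end{aligned}
$$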

Theorem 7.9 Let A be an m × n matrix and r and s be positive integers. Let C be an m × s matrix of s columns of A picked according to length squared sampling and let R be a matrix of r rows of A picked according to length squared sampling. Then, we can find from C and R an s × r matrix U so that

$$E\left(\|A - CUR\|_2^2\right) \le \|A\|_F^2\left(\frac{2}{\sqrt{r}} + \frac{2r}{s}\right).$$

If s is fixed, the error bound is minimized, up to a constant factor, when $r = s^{2/3}$. Choosing $s = r/\varepsilon$ and $r = 1/\varepsilon^2$, the bound becomes $O(\varepsilon)\|A\|_F^2$. When is this bound meaningful? We discuss this further after first proving all the claims used in the discussion above.
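The parameter choice can be checked with a few lines of arithmetic. The sketch below evaluates the factor $2/\sqrt{r} + 2r/s$ from Theorem 7.9; the function names `cur_bound_factor` and `params_for_eps` are hypothetical, and rounding r and s up to integers is an assumption made for illustration. Setting the derivative $-r^{-3/2} + 2/s$ to zero gives the exact minimizer $r = (s/2)^{2/3}$, which agrees with $s^{2/3}$ up to a constant factor.

```python
import math


def cur_bound_factor(r, s):
    """Factor multiplying ||A||_F^2 in the Theorem 7.9 bound: 2/sqrt(r) + 2r/s."""
    return 2.0 / math.sqrt(r) + 2.0 * r / s


def params_for_eps(eps):
    """Return (r, s) with r = 1/eps^2 and s = r/eps, rounded up to integers."""
    r = math.ceil(1.0 / eps**2)
    s = math.ceil(r / eps)
    return r, s


def best_r_for_s(s):
    """Brute-force the integer r in [1, s] that minimizes the bound factor."""
    return min(range(1, s + 1), key=lambda r: cur_bound_factor(r, s))
```

With `eps = 0.25` this gives `r = 16`, `s = 64`, and a factor of `2/4 + 32/64 = 1.0`, that is, $4\varepsilon$, consistent with the $O(\varepsilon)$ claim.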

Proof (of Lemma 7.6): First consider the case when $RR^T$ is invertible. For $x = R^T y$, $R^T(RR^T)^{-1}Rx = R^T(RR^T)^{-1}RR^T y = R^T y = x$. If x is orthogonal to every row of R, then $Rx = 0$, so $Px = 0$. More generally, if $R = \sum_t \sigma_t u_t v_t^T$ is the singular value decomposition of R, then $RR^T = \sum_t \sigma_t^2 u_t u_t^T$, and with $\sum_t \frac{1}{\sigma_t^2} u_t u_t^T$ in place of $(RR^T)^{-1}$ we get $P = R^T\left(\sum_t \frac{1}{\sigma_t^2} u_t u_t^T\right)R = \sum_t v_t v_t^T$, which clearly satisfies (i) and (ii).
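The two properties used in this proof are easy to check numerically. The sketch below is illustrative only: the name `row_space_projector` is hypothetical, and the pseudo-inverse of $RR^T$ (with an assumed `rcond` cutoff) stands in for the inverse so that a singular $RR^T$ is also covered.

```python
import numpy as np


def row_space_projector(R, rcond=1e-10):
    """P = R^T (R R^T)^+ R, the projection onto the row space of R."""
    return R.T @ np.linalg.pinv(R @ R.T, rcond=rcond) @ R


# Rows span only the first two coordinates, so R R^T is singular (rank 2).
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])
P = row_space_projector(R)

x = R.T @ np.array([1.0, 2.0, 3.0])   # a vector in the row space of R
z = np.array([0.0, 0.0, 1.0])         # orthogonal to every row of R

print(np.allclose(P @ x, x))   # P leaves row-space vectors unchanged
print(np.allclose(P @ z, 0))   # P annihilates vectors orthogonal to the rows
```

In this small example P works out to the diagonal matrix with entries 1, 1, 0, so its squared Frobenius norm is 2, the rank of R, which is at most the number of rows r = 3.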

Next we prove Proposition 7.7. First, recall that

$$\|A - AP\|_2^2 = \max_{\{x : |x| = 1\}} |(A - AP)x|^2.$$
