08.10.2016 Views

Foundations of Data Science

2dLYwbK


[Figure: the product of A (m × n) and B (n × p) is approximated by the product of the sampled, scaled columns of A (m × s) and the corresponding scaled rows of B (s × p).]

Figure 7.3: Approximate Matrix Multiplication using sampling

Define $R$ to be the $s \times p$ matrix with the corresponding rows of $B$ similarly scaled, namely, $R$ has rows

$$\frac{B(k_1,:)}{\sqrt{s\,p_{k_1}}},\ \frac{B(k_2,:)}{\sqrt{s\,p_{k_2}}},\ \ldots,\ \frac{B(k_s,:)}{\sqrt{s\,p_{k_s}}}.$$

The reader may verify that

$$E\left(R^T R\right) = B^T B. \tag{7.4}$$

From (7.2), we see that $\frac{1}{s}\sum_{i=1}^{s} X_i = CR$. This is represented in Figure 7.3. We summarize our discussion in Theorem 7.5.

Theorem 7.5 Suppose $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix. The product $AB$ can be estimated by $CR$, where $C$ is an $m \times s$ matrix consisting of $s$ columns of $A$ picked according to length-squared distribution and scaled to satisfy (7.3) and $R$ is the $s \times p$ matrix consisting of the corresponding rows of $B$ scaled to satisfy (7.4). The error is bounded by:

$$E\left(\|AB - CR\|_F^2\right) \le \frac{\|A\|_F^2\,\|B\|_F^2}{s}.$$

Thus, to ensure $E\left(\|AB - CR\|_F^2\right) \le \varepsilon^2 \|A\|_F^2 \|B\|_F^2$, it suffices to make $s \ge 1/\varepsilon^2$. If $\varepsilon \in \Omega(1)$ (so $s \in O(1)$), then the multiplication $CR$ can be carried out in time $O(mp)$.
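The sampling procedure behind Theorem 7.5 can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the names `approx_matmul`, `matmul`, and `frob_sq` are hypothetical, matrices are plain lists of rows, and only the Python standard library is used (a numerical library would be the usual choice in practice). It assumes the columns are drawn independently with replacement, with $p_k$ proportional to the squared length of column $k$ of $A$.

```python
import math
import random


def approx_matmul(A, B, s, rng=random):
    """Return (C, R) such that C times R estimates A times B.

    A is m x n and B is n x p, both given as lists of rows.
    """
    n = len(B)
    # Length-squared distribution over the columns of A:
    # p_k = |A(:, k)|^2 / ||A||_F^2.
    col_sq = [sum(row[k] ** 2 for row in A) for k in range(n)]
    total = sum(col_sq)
    p = [c / total for c in col_sq]
    # Draw s indices k_1, ..., k_s independently (with replacement).
    ks = rng.choices(range(n), weights=p, k=s)
    # Scale each sampled column of A and matching row of B
    # by 1 / sqrt(s * p_k).
    C = [[row[k] / math.sqrt(s * p[k]) for k in ks] for row in A]
    R = [[x / math.sqrt(s * p[k]) for x in B[k]] for k in ks]
    return C, R


def matmul(X, Y):
    """Plain matrix product for lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def frob_sq(X):
    """Squared Frobenius norm of a list-of-rows matrix."""
    return sum(x * x for row in X for x in row)


if __name__ == "__main__":
    rng = random.Random(0)
    A = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(4)]
    B = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(50)]
    s = 10
    C, R = approx_matmul(A, B, s, rng)
    exact, est = matmul(A, B), matmul(C, R)
    err = sum((e - x) ** 2 for re, rx in zip(est, exact)
              for e, x in zip(re, rx))
    print("squared error of this sample:", err)
    print("bound on the expected squared error:",
          frob_sq(A) * frob_sq(B) / s)
```

A single run can exceed the printed bound, since the theorem controls only the expectation; averaging the squared error over many independent samples is the fair comparison.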

When is this error bound good and when is it not? Let's focus on the case that $B = A^T$ so we have just one matrix to consider. If $A$ is the identity matrix, then the guarantee is not very good. In this case, $\|AA^T\|_F^2 = n$, but the right-hand side of the inequality is $\frac{n^2}{s}$. So we would need $s > n$ for the bound to be any better than approximating the product with the zero matrix.
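The arithmetic for the identity case can be written out as follows, taking $A = B = I_n$ in Theorem 7.5. This is only the computation stated above, spelled out step by step:

$$
\|A\|_F^2 = \|B\|_F^2 = n
\quad\Longrightarrow\quad
\frac{\|A\|_F^2\,\|B\|_F^2}{s} = \frac{n^2}{s},
\qquad
\|AA^T - 0\|_F^2 = \|I_n\|_F^2 = n,
\qquad
\frac{n^2}{s} < n \iff s > n.
$$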

More generally, the trivial estimate of zero (the all-zero matrix) for $AA^T$ makes an error in Frobenius norm of $\|AA^T\|_F$. What $s$ do we need to ensure that the error is at most this? If $\sigma_1, \sigma_2, \ldots$ are the singular values of $A$, then the singular values of $AA^T$ are $\sigma_1^2, \sigma_2^2, \ldots$ and

$$\|AA^T\|_F^2 = \sum_t \sigma_t^4 \quad\text{and}\quad \|A\|_F^2 = \sum_t \sigma_t^2.$$
