
mean.

(ii) There is "no free lunch". Since we only work on a small random sample and not on the whole input matrix, our error bounds will not be good for certain matrices. For example, if the input matrix is the identity, it is intuitively clear that picking a few random columns will miss the other directions. Indeed, the initial error bounds we prove using length squared sampling are useful only for "numerically low-rank matrices", which we define later. But there are important applications, for example, Principal Component Analysis, where one has numerically low-rank input matrices and these techniques are useful. There are more sophisticated and time-consuming sampling methods which have error bounds which are good even for non-numerically-low-rank matrices.

To the Reader: Why aren't (i) and (ii) mutually contradictory?

7.3.1 Matrix Multiplication Using Sampling

Suppose A is an m × n matrix and B is an n × p matrix and the product AB is desired. We show how to use sampling to get an approximate product faster than the traditional multiplication. Let A(:, k) denote the k-th column of A; A(:, k) is an m × 1 matrix. Let B(k, :) be the k-th row of B; B(k, :) is a 1 × p matrix. It is easy to see that

$$AB = \sum_{k=1}^{n} A(:,k)\, B(k,:).$$
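To make the identity concrete, here is a small numerical check in NumPy (the matrix sizes and random entries are illustrative assumptions, not part of the text): summing the rank-one terms A(:, k)B(k, :) over all k reproduces AB.

```python
import numpy as np

# Check the column-row expansion AB = sum_k A(:, k) B(k, :) on small
# random matrices (sizes chosen arbitrarily for illustration).
m, n, p = 4, 6, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))

# np.outer(A[:, k], B[k, :]) is the m x p rank-one term for index k.
outer_sum = sum(np.outer(A[:, k], B[k, :]) for k in range(n))

assert np.allclose(A @ B, outer_sum)
```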

Note that for each value of k, A(:, k)B(k, :) is an m × p matrix, each element of which is a single product of elements of A and B. An obvious use of sampling suggests itself. Sample some values for k and compute A(:, k)B(k, :) for the sampled k's and use their suitably scaled sum as the estimate of AB. It turns out that nonuniform sampling probabilities are useful. Define a random variable z that takes on values in {1, 2, …, n}. Let p_k denote the probability that z assumes the value k. We will solve for a good choice of probabilities later, but for now just consider the p_k as nonnegative numbers that sum to one. Define an associated random matrix variable that has value

$$X = \frac{1}{p_k}\, A(:,k)\, B(k,:) \qquad (7.1)$$

with probability p_k. Let E(X) denote the entry-wise expectation.

$$E(X) = \sum_{k=1}^{n} \operatorname{Prob}(z = k)\, \frac{1}{p_k}\, A(:,k)\, B(k,:) = \sum_{k=1}^{n} A(:,k)\, B(k,:) = AB.$$

This explains the scaling by 1/p_k in X. In particular, X is a matrix-valued random variable each of whose components is correct in expectation. We will be interested in

$$E\!\left( \|AB - X\|_F^2 \right).$$
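The following sketch illustrates the estimator of Equation (7.1); the matrix sizes, the random matrices, and the specific choice of p_k proportional to the squared length of A(:, k) are illustrative assumptions (the good choice of probabilities is derived later). It draws z = k with probability p_k, forms X = A(:, k)B(k, :)/p_k, and reports the Frobenius error ||AB − X||_F for a single draw and for the average of many independent draws, which is small because each entry of X is correct in expectation.

```python
import numpy as np

# Sketch of the single-sample estimator X of Equation (7.1).
# The probabilities p_k used here (squared column lengths of A) are only an
# illustrative choice; the text derives the good choice of p_k later.
rng = np.random.default_rng(1)
m, n, p = 50, 200, 40
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, p))

p_k = np.sum(A**2, axis=0) / np.sum(A**2)   # nonnegative, sums to one

def sample_X():
    # Draw z = k with probability p_k and return (1/p_k) A(:, k) B(k, :).
    k = rng.choice(n, p=p_k)
    return np.outer(A[:, k], B[k, :]) / p_k[k]

AB = A @ B
X = sample_X()
print("single-sample error ||AB - X||_F :", np.linalg.norm(AB - X))

# Averaging many independent copies of X illustrates that E(X) = AB.
avg = np.mean([sample_X() for _ in range(10000)], axis=0)
print("error of the averaged estimate   :", np.linalg.norm(AB - avg))
```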
