Foundations of Data Science


    A     ≈   [ Sample columns ] [ Multiplier ] [ Sample rows ]
  n × m          n × s              s × r          r × m

Figure 7.4: Schematic diagram of the approximation of A by a sample of s columns and r rows.

uct j. The objective is to collect a few sample entries of A and, based on them, get an approximation to A so that we can make future recommendations. A few sampled rows of A (the preferences of a few customers) and a few sampled columns (customers' preferences for a few products) give a good approximation to A, provided that the samples are drawn according to the length-squared distribution.
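As a concrete illustration, length-squared sampling of columns can be sketched as follows. This is a minimal NumPy sketch; the preference matrix A and all sizes here are hypothetical, and the 1/sqrt(s p_j) rescaling follows the standard length-squared sampling convention:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical preference matrix A: rows are customers, columns are products.
A = rng.random((100, 50))

s = 10  # number of columns to sample

# Length-squared distribution: pick column j with probability
# proportional to its squared Euclidean norm.
col_norms_sq = (A ** 2).sum(axis=0)
p = col_norms_sq / col_norms_sq.sum()

# Sample s column indices i.i.d. from p; rescale each picked column by
# 1 / sqrt(s * p_j) so that sampled matrix products are unbiased in expectation.
idx = rng.choice(A.shape[1], size=s, p=p)
C = A[:, idx] / np.sqrt(s * p[idx])
```

Sampling rows of A by the length-squared distribution over rows is the same computation applied to A^T.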

It remains now to describe how to find U from C and R. Since R is r × m, there is an m × m matrix P of the form P = QR that acts as the identity on the space spanned by the rows of R and zeros out all vectors orthogonal to this space. We state this now and postpone the proof.

Lemma 7.6 If RR^T is invertible, then P = R^T (RR^T)^{-1} R has the following properties:

(i) It acts as the identity matrix on the row space of R, i.e., Px = x for every vector x of the form x = R^T y (this defines the row space of R). Furthermore,

(ii) if x is orthogonal to the row space of R, then Px = 0.

If RR^T is not invertible, let rank(RR^T) = r and let RR^T = ∑_{t=1}^{r} σ_t u_t v_t^T be the SVD of RR^T. Then

    P = R^T ( ∑_{t=1}^{r} (1/σ_t^2) u_t v_t^T ) R

satisfies (i) and (ii).
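The invertible case of Lemma 7.6 is easy to check numerically. Below is a small NumPy sketch (the matrix R is random and purely illustrative) verifying that P = R^T (RR^T)^{-1} R fixes the row space of R and annihilates vectors orthogonal to it:

```python
import numpy as np

rng = np.random.default_rng(1)

# A random R with full row rank, so RR^T is invertible (with probability 1).
R = rng.random((3, 8))

# The projection matrix from Lemma 7.6: P = R^T (R R^T)^{-1} R.
P = R.T @ np.linalg.inv(R @ R.T) @ R

# (i) P acts as the identity on the row space of R: P(R^T y) = R^T y.
y = rng.random(3)
x = R.T @ y
print(np.allclose(P @ x, x))        # True

# (ii) P zeros out vectors orthogonal to the row space: subtracting Pz
# from an arbitrary z leaves such an orthogonal vector, and P kills it.
z = rng.random(8)
z_perp = z - P @ z
print(np.allclose(P @ z_perp, 0))   # True
```

Both checks rely on P being an orthogonal projection (P is symmetric and P^2 = P), which is exactly what properties (i) and (ii) assert.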

We begin with some intuition. In particular, we first present a simpler idea that does not work, but that motivates an idea that does. Write A as AI, where I is the m × m identity matrix. Approximate the product AI using the algorithm of Theorem 7.5, i.e., by sampling s columns of A according to length-squared. Then, as in the last section, write AI ≈ CW, where W consists of a scaled version of the s rows of I corresponding to the s columns of A that were picked. Theorem 7.5 bounds the error ||A − CW||_F^2 by ||A||_F^2 ||I||_F^2 / s = (m/s) ||A||_F^2. But we would like the error to be a small fraction of ||A||_F^2, which
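This failed-but-instructive bound can be checked empirically. The sketch below (all sizes hypothetical) implements the column sampling described above, averages ||A − CW||_F^2 over repeated trials, and compares the average against the bound of (m/s) ||A||_F^2, where m is the number of columns:

```python
import numpy as np

rng = np.random.default_rng(2)

n, m, s = 20, 30, 5          # hypothetical sizes: A is n x m, s sampled columns
A = rng.random((n, m))

# Length-squared distribution over the columns of A.
p = (A ** 2).sum(axis=0)
p /= p.sum()

trials = 300
errs = []
for _ in range(trials):
    idx = rng.choice(m, size=s, p=p)
    scale = 1.0 / np.sqrt(s * p[idx])
    C = A[:, idx] * scale                    # scaled sampled columns of A
    W = np.eye(m)[idx, :] * scale[:, None]   # matching scaled rows of I
    errs.append(np.linalg.norm(A - C @ W, "fro") ** 2)

bound = (m / s) * np.linalg.norm(A, "fro") ** 2
print(np.mean(errs) / bound)   # average error relative to the (m/s)||A||_F^2 bound
```

The printed ratio is close to 1, illustrating the point of the paragraph above: the error is on the order of (m/s) ||A||_F^2, which is not a small fraction of ||A||_F^2 unless s is nearly as large as m.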

