
orthonormal basis $w_1, w_2, \ldots, w_k$ of $W$ so that $w_k$ is perpendicular to $v_1, v_2, \ldots, v_{k-1}$. Then
$$|Aw_1|^2 + |Aw_2|^2 + \cdots + |Aw_{k-1}|^2 \le |Av_1|^2 + |Av_2|^2 + \cdots + |Av_{k-1}|^2$$
since $V_{k-1}$ is an optimal $(k-1)$-dimensional subspace. Since $w_k$ is perpendicular to $v_1, v_2, \ldots, v_{k-1}$, by the definition of $v_k$, $|Aw_k|^2 \le |Av_k|^2$. Thus
$$|Aw_1|^2 + |Aw_2|^2 + \cdots + |Aw_{k-1}|^2 + |Aw_k|^2 \le |Av_1|^2 + |Av_2|^2 + \cdots + |Av_{k-1}|^2 + |Av_k|^2,$$
proving that $V_k$ is at least as good as $W$ and hence is optimal.
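As a quick numerical sanity check (not from the text), one can compare the top-$k$ right singular vectors of a small example matrix against a random orthonormal basis of some other $k$-dimensional subspace and confirm that the singular directions capture at least as much squared length. The matrix `A` and the dimensions below are illustrative choices, not anything prescribed here.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))   # arbitrary example matrix: n = 20 rows, d = 5 columns
k = 3

# Top-k right singular vectors v_1, ..., v_k (rows of Vt).
_, _, Vt = np.linalg.svd(A, full_matrices=False)
V_k = Vt[:k].T                     # d x k, columns are v_1, ..., v_k

# A random orthonormal basis w_1, ..., w_k of some other k-dimensional subspace W.
W, _ = np.linalg.qr(rng.standard_normal((5, k)))

best = np.sum(np.linalg.norm(A @ V_k, axis=0) ** 2)   # |Av_1|^2 + ... + |Av_k|^2
other = np.sum(np.linalg.norm(A @ W, axis=0) ** 2)    # |Aw_1|^2 + ... + |Aw_k|^2
assert other <= best + 1e-10       # V_k is an optimal k-dimensional subspace
```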

Note that the $n$-dimensional vector $Av_i$ is a list of lengths (with signs) of the projections of the rows of $A$ onto $v_i$. Think of $|Av_i| = \sigma_i(A)$ as the "component" of the matrix $A$ along $v_i$. For this interpretation to make sense, it should be true that adding up the squares of the components of $A$ along each of the $v_i$ gives the square of the "whole content of the matrix $A$". This is indeed the case and is the matrix analogy of decomposing a vector into its components along orthogonal directions.

Consider one row, say $a_j$, of $A$. Since $v_1, v_2, \ldots, v_r$ span the space of all rows of $A$, $a_j \cdot v = 0$ for all $v$ perpendicular to $v_1, v_2, \ldots, v_r$. Thus, for each row $a_j$, $\sum_{i=1}^{r} (a_j \cdot v_i)^2 = |a_j|^2$. Summing over all rows $j$,
$$\sum_{j=1}^{n} |a_j|^2 = \sum_{j=1}^{n} \sum_{i=1}^{r} (a_j \cdot v_i)^2 = \sum_{i=1}^{r} \sum_{j=1}^{n} (a_j \cdot v_i)^2 = \sum_{i=1}^{r} |Av_i|^2 = \sum_{i=1}^{r} \sigma_i^2(A).$$
But $\sum_{j=1}^{n} |a_j|^2 = \sum_{j=1}^{n} \sum_{k=1}^{d} a_{jk}^2$, the sum of squares of all the entries of $A$. Thus, the sum of squares of the singular values of $A$ is indeed the square of the "whole content of $A$", i.e., the sum of squares of all the entries. There is an important norm associated with this quantity, the Frobenius norm of $A$, denoted $||A||_F$, defined as
$$||A||_F = \sqrt{\sum_{j,k} a_{jk}^2}.$$
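The row-by-row identity above is easy to check numerically. The following sketch (an illustration, not part of the text) verifies that $\sum_{i=1}^{r}(a_j \cdot v_i)^2 = |a_j|^2$ for every row of a small example matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 4))            # arbitrary example matrix
r = np.linalg.matrix_rank(A)

# Right singular vectors v_1, ..., v_r span the row space of A.
_, _, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:r].T                               # d x r, columns are v_1, ..., v_r

# For each row a_j, the squared projections onto the v_i sum to |a_j|^2.
proj_sq = (A @ V) ** 2                     # entry (j, i) is (a_j . v_i)^2
assert np.allclose(proj_sq.sum(axis=1), np.sum(A ** 2, axis=1))
```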

Lemma 3.2 For any matrix $A$, the sum of squares of the singular values equals the square of the Frobenius norm. That is, $\sum_i \sigma_i^2(A) = ||A||_F^2$.

Proof: By the preceding discussion.
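A one-line numerical check of Lemma 3.2 (again just an illustrative sketch, with an arbitrary test matrix): the squared singular values returned by a standard SVD routine sum to the squared Frobenius norm.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 6))                     # arbitrary example matrix
sigma = np.linalg.svd(A, compute_uv=False)           # singular values of A
assert np.isclose(np.sum(sigma ** 2), np.linalg.norm(A, "fro") ** 2)
```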

The vectors $v_1, v_2, \ldots, v_r$ are called the right-singular vectors. The vectors $Av_i$ form a fundamental set of vectors and we normalize them to length one by
$$u_i = \frac{1}{\sigma_i(A)} Av_i.$$
Later we will show that $u_1, u_2, \ldots, u_r$ similarly maximize $|u^T A|$ when multiplied on the left and are called the left-singular vectors. Clearly, the right-singular vectors are orthogonal by definition. We will show later that the left-singular vectors are also orthogonal.
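As a final sketch (illustrative only, for a small full-rank example matrix), one can confirm that the normalized vectors $u_i = Av_i/\sigma_i(A)$ coincide with the left-singular vectors returned by a library SVD, and that they are orthonormal.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((7, 5))             # arbitrary example matrix (full rank a.s.)
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# u_i = A v_i / sigma_i(A), computed directly from the right-singular vectors.
U_direct = (A @ Vt.T) / sigma               # column i is A v_i / sigma_i

assert np.allclose(U_direct, U)                         # matches the library's left-singular vectors
assert np.allclose(U_direct.T @ U_direct, np.eye(5))    # and they are orthonormal
```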

