Foundations of Data Science

3.6 Left Singular Vectors

Theorem 3.7 The left singular vectors are pairwise orthogonal.

Proof: First we show that each $u_i$, $i \ge 2$, is orthogonal to $u_1$. Suppose not, and for some $i \ge 2$, $u_1^T u_i \ne 0$. Without loss of generality assume that $u_1^T u_i = \delta > 0$. (If $u_1^T u_i < 0$, then just replace $u_i$ with $-u_i$.) For $\varepsilon > 0$, let
\[
v_1' = \frac{v_1 + \varepsilon v_i}{|v_1 + \varepsilon v_i|}.
\]
Notice that $v_1'$ is a unit-length vector. Then
\[
A v_1' = \frac{\sigma_1 u_1 + \varepsilon \sigma_i u_i}{\sqrt{1 + \varepsilon^2}}
\]
has length at least as large as its component along $u_1$, which is
\[
u_1^T \left( \frac{\sigma_1 u_1 + \varepsilon \sigma_i u_i}{\sqrt{1 + \varepsilon^2}} \right)
\ge \left( \sigma_1 + \varepsilon \sigma_i \delta \right) \left( 1 - \frac{\varepsilon^2}{2} \right)
\ge \sigma_1 - \frac{\varepsilon^2}{2}\sigma_1 + \varepsilon \sigma_i \delta - \frac{\varepsilon^3}{2}\sigma_i \delta
> \sigma_1
\]
for sufficiently small $\varepsilon$ (the first inequality uses $\frac{1}{\sqrt{1+\varepsilon^2}} \ge 1 - \frac{\varepsilon^2}{2}$), a contradiction to the definition of $\sigma_1$. Thus $u_1 \cdot u_i = 0$ for $i \ge 2$.

The proof for other $u_i$ and $u_j$, $j > i > 1$, is similar. Suppose without loss of generality that $u_i^T u_j > \delta > 0$. Then
\[
A \left( \frac{v_i + \varepsilon v_j}{|v_i + \varepsilon v_j|} \right) = \frac{\sigma_i u_i + \varepsilon \sigma_j u_j}{\sqrt{1 + \varepsilon^2}}
\]
has length at least as large as its component along $u_i$, which is
\[
u_i^T \left( \frac{\sigma_i u_i + \varepsilon \sigma_j u_j}{\sqrt{1 + \varepsilon^2}} \right)
\ge \left( \sigma_i + \varepsilon \sigma_j \, u_i^T u_j \right) \left( 1 - \frac{\varepsilon^2}{2} \right)
\ge \sigma_i - \frac{\varepsilon^2}{2}\sigma_i + \varepsilon \sigma_j \delta - \frac{\varepsilon^3}{2}\sigma_j \delta
> \sigma_i
\]
for sufficiently small $\varepsilon$, a contradiction, since $v_i + \varepsilon v_j$ is orthogonal to $v_1, v_2, \ldots, v_{i-1}$ and $\sigma_i$ is defined to be the maximum of $|Av|$ over such vectors.
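A quick numerical illustration of Theorem 3.7 (not from the text): the sketch below, which assumes numpy and an arbitrary random test matrix, checks that the columns of $U$ returned by an off-the-shelf SVD routine are pairwise orthogonal unit vectors.

```python
import numpy as np

# Sketch: the left singular vectors of a matrix are pairwise
# orthogonal (Theorem 3.7). The test matrix is arbitrary.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))

# Columns of U are the left singular vectors u_1, ..., u_r.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# U^T U should be the identity: u_i . u_j = 0 for i != j and |u_i| = 1.
print(np.allclose(U.T @ U, np.eye(U.shape[1])))  # True
```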

Next we prove that $A_k$ is the best rank-$k$ 2-norm approximation to $A$. We first show that the square of the 2-norm of $A - A_k$ is the square of the $(k+1)$st singular value of $A$. This is essentially by definition of $A_k$; that is, $A_k$ represents the projections of the points in $A$ onto the space spanned by the top $k$ singular vectors, and so $A - A_k$ is the remaining portion of those points, whose top singular value will be $\sigma_{k+1}$.
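The claim that $A_k$ consists of the projections of the rows of $A$ onto the span of the top $k$ right singular vectors can also be checked numerically. The following sketch is an illustration of ours, assuming numpy and an arbitrary test matrix; it compares the truncated sum $\sum_{i=1}^{k} \sigma_i u_i v_i^T$ with the explicit projection $A V_k V_k^T$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 6))
k = 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A_k = sum of sigma_i u_i v_i^T over the top k singular triples.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Projection of each row of A onto span(v_1, ..., v_k).
V_k = Vt[:k, :].T              # columns are v_1, ..., v_k
proj = A @ V_k @ V_k.T

print(np.allclose(A_k, proj))  # True: A_k is exactly this projection
```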

Lemma 3.8 $\|A - A_k\|_2^2 = \sigma_{k+1}^2$.

Proof: Let $A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$ be the singular value decomposition of $A$. Then $A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$ and $A - A_k = \sum_{i=k+1}^{r} \sigma_i u_i v_i^T$. Let $v$ be the top singular vector of $A - A_k$. Express $v$ as a
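The statement of Lemma 3.8 is easy to confirm numerically as well. Below is a minimal sketch, assuming numpy and an arbitrary test matrix, checking that the spectral norm of $A - A_k$ equals $\sigma_{k+1}$ (so their squares agree too).

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((9, 7))
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Lemma 3.8: ||A - A_k||_2 = sigma_{k+1}; ord=2 gives the largest singular value.
spectral_norm = np.linalg.norm(A - A_k, ord=2)
print(np.isclose(spectral_norm, s[k]))  # True: s[k] is sigma_{k+1} (0-indexed)
```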
