08.10.2016 Views

Foundations of Data Science

2dLYwbK

2dLYwbK

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

establishing the first inequality.<br />

σ 1 (A) ≥ u T Av = ∑ u i v j a ij = d(S, T ) = d(A)<br />

ij<br />

To prove the second inequality, express σ 1 (A) in terms <strong>of</strong> the first left and right<br />

singular vectors x and y.<br />

σ 1 (A) = x T Ay = ∑ i,j<br />

x i a ij y j , |x| = |y| = 1.<br />

Since the entries <strong>of</strong> A are nonnegative, the components <strong>of</strong> the first left and right singular<br />

vectors<br />

∑<br />

must all be nonnegative, that is, x i ≥ 0 and y j ≥ 0 for all i and j. To bound<br />

x i a ij y j , break the summation into O (log n log d) parts. Each part corresponds to a<br />

i,j<br />

given α and β and consists <strong>of</strong> all i such that α ≤ x i < 2α and all j such that β ≤ y i < 2β.<br />

The log n log d parts are defined by breaking the rows into log n blocks with α equal to<br />

1 √1<br />

1<br />

2 n<br />

, √n , 2 √ 1<br />

n<br />

, 4 √ 1<br />

n<br />

, . . . , 1 and by breaking the columns into log d blocks with β equal to<br />

1 √1<br />

1<br />

2 d<br />

, √d 2<br />

, √d 4<br />

, √d , . . . , 1. The i such that x i < 1<br />

2 √ and the j such that y n j < 1<br />

2 √ will be<br />

d<br />

ignored at a loss <strong>of</strong> at most 1σ 4 1(A). Exercise (8.28) proves the loss is at most this amount.<br />

Since ∑ x 2 i = 1, the set S = {i|α ≤ x i < 2α} has |S| ≤ 1 and similarly,<br />

α 2<br />

i<br />

T = {j|β ≤ y j ≤ 2β} has |T | ≤ 1 . Thus<br />

β 2<br />

∑<br />

i<br />

α≤x i ≤2α<br />

∑<br />

x i y j a ij ≤ 4αβA(S, T )<br />

j<br />

β≤y j ≤2β<br />

≤ 4αβd(S, T ) √ |S||T |<br />

≤ 4d(S, T )<br />

≤ 4d(A).<br />

From this it follows that<br />

or<br />

proving the second inequality.<br />

σ 1 (A) ≤ 4d (A) log n log d<br />

d (A) ≥<br />

σ 1(A)<br />

4 log n log d<br />

It is clear that for each <strong>of</strong> the values <strong>of</strong> (α, β), we can compute A(S, T ) and d(S, T )<br />

as above and taking the best <strong>of</strong> these d(S, T ) ’s gives us an algorithm as claimed in the<br />

theorem.<br />

Note that in many cases, the nonzero values <strong>of</strong> x i and y j after zeroing out the low<br />

entries will only go from 1 √1<br />

2 n<br />

to √ c<br />

n<br />

for x i and 1 √1<br />

2 d<br />

to √ c<br />

d<br />

for y j , since the singular vectors<br />

286

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!