Foundations of Data Science

Definition 3.1 If p is a probability density in d-space, the best fit line for p is the line l = {cv_1 : c ∈ R} where
\[
v_1 = \arg\max_{|v|=1} E_{x \sim p}\big[(v^T x)^2\big].
\]
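
As a concrete illustration (not from the text), the sketch below estimates v_1 from a finite sample: the empirical analogue of arg max over unit v of E_{x∼p}[(v^T x)^2] is the top right singular vector of the sample matrix, since the sum of (v^T x_i)^2 over the sample equals |Xv|^2. The density, dimension, mean, and sample size are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample n points in d-space from a density p; here a spherical Gaussian
# with a nonzero mean, chosen only for illustration.
d, n, sigma = 5, 100_000, 1.0
mu = np.array([3.0, 1.0, 0.0, 0.0, 0.0])
X = mu + sigma * rng.standard_normal((n, d))

# Empirical objective: the average of (v^T x)^2 over the sample.
def avg_sq_proj(v, X):
    v = v / np.linalg.norm(v)
    return np.mean((X @ v) ** 2)

# Over unit vectors v, the maximizer of |Xv|^2 is the top right singular
# vector of X, so it is the empirical best-fit direction v_1.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]

print(avg_sq_proj(v1, X))                        # largest achievable value
print(avg_sq_proj(rng.standard_normal(d), X))    # a random direction does worse
```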

For a spherical Gaussian centered at the origin, it is easy to see that any line passing through the origin is a best fit line. Our next lemma shows that the best fit line for a spherical Gaussian centered at µ ≠ 0 is the line passing through µ and the origin.

Lemma 3.16 Let the probability density p be a spherical Gaussian with center µ ≠ 0. The unique best fit 1-dimensional subspace is the line passing through µ and the origin. If µ = 0, then any line through the origin is a best-fit line.

Proof: For a randomly chosen x (according to p) and a fixed unit length vector v,
\[
\begin{aligned}
E_{x \sim p}\big[(v^T x)^2\big] &= E_{x \sim p}\Big[\big(v^T (x - \mu) + v^T \mu\big)^2\Big] \\
&= E_{x \sim p}\Big[\big(v^T (x - \mu)\big)^2 + 2\big(v^T \mu\big)\big(v^T (x - \mu)\big) + \big(v^T \mu\big)^2\Big] \\
&= E_{x \sim p}\Big[\big(v^T (x - \mu)\big)^2\Big] + 2\big(v^T \mu\big)\, E_{x \sim p}\big[v^T (x - \mu)\big] + \big(v^T \mu\big)^2 \\
&= E_{x \sim p}\Big[\big(v^T (x - \mu)\big)^2\Big] + \big(v^T \mu\big)^2 \\
&= \sigma^2 + \big(v^T \mu\big)^2
\end{aligned}
\]

where the fourth line follows from the fact that E[v^T(x − µ)] = 0, and the fifth line follows from the fact that E[(v^T(x − µ))^2] is the variance σ^2 in the direction v. The best fit line v maximizes E_{x∼p}[(v^T x)^2] and therefore maximizes (v^T µ)^2. This is maximized when v is aligned with the center µ. To see uniqueness, note that if µ ≠ 0, then (v^T µ)^2 is strictly smaller when the unit vector v is not aligned with the center.
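
The identity E[(v^T x)^2] = σ^2 + (v^T µ)^2 and the alignment claim are easy to check numerically. The following is only a sketch; the mean, variance, dimension, and sample size are arbitrary choices, and the sample-based singular vector only approximates the population best-fit direction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Spherical Gaussian with nonzero center mu and per-coordinate variance sigma^2.
d, n, sigma = 4, 200_000, 2.0
mu = np.array([1.0, -2.0, 0.5, 0.0])
X = mu + sigma * rng.standard_normal((n, d))

# Check E[(v^T x)^2] = sigma^2 + (v^T mu)^2 for an arbitrary unit vector v.
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
print(np.mean((X @ v) ** 2))          # empirical expectation
print(sigma**2 + (v @ mu) ** 2)       # value predicted by the proof

# The empirical best-fit direction (top right singular vector) aligns with mu.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]
print(abs(v1 @ mu) / np.linalg.norm(mu))   # cosine of the angle, close to 1
```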

We now extend Definition 3.1 to k-dimensional subspaces.

Definition 3.2 If p is a probability density in d-space, then the best-fit k-dimensional subspace V_k is
\[
V_k = \arg\max_{V : \dim(V) = k} E_{x \sim p}\big[\,|\mathrm{proj}(x, V)|^2\big],
\]
where proj(x, V) is the orthogonal projection of x onto V.

Lemma 3.17 For a spherical Gaussian with center µ, a k-dimensional subspace is a best fit subspace if and only if it contains µ.

Proof: If µ = 0, then by symmetry any k-dimensional subspace is a best-fit subspace. If µ ≠ 0, then the best-fit line must pass through µ by Lemma 3.16. Now, as in the greedy algorithm for finding subsequent singular vectors, we project perpendicular to the first singular vector. But after the projection, the mean of the Gaussian becomes 0, so any direction will do as a subsequent best-fit direction. Hence the best-fit k-dimensional subspaces are exactly those containing µ.
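
A numerical sanity check of Lemma 3.17 (again only a sketch; the top-k singular subspace of a finite sample merely approximates the population best-fit subspace, and the parameters below are arbitrary): the span of the top k right singular vectors of the sample matrix should nearly contain µ.

```python
import numpy as np

rng = np.random.default_rng(2)

d, k, n, sigma = 6, 3, 200_000, 1.0
mu = np.array([2.0, -1.0, 3.0, 0.0, 0.0, 0.0])
X = mu + sigma * rng.standard_normal((n, d))

# Empirical best-fit k-dimensional subspace: span of the top k right
# singular vectors of the sample matrix.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Vk = Vt[:k]                        # rows form an orthonormal basis of V_k

# proj(mu, V_k): orthogonal projection of mu onto the subspace.
proj_mu = Vk.T @ (Vk @ mu)
print(np.linalg.norm(mu - proj_mu) / np.linalg.norm(mu))   # close to 0
```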
