
Foundations of Data Science

3.8 Singular Vectors and Eigenvectors<br />

Recall that for a square matrix B, if the vector x and scalar λ are such that Bx = λx,<br />

then x is an eigenvector <strong>of</strong> B and λ is the corresponding eigenvalue. As we saw in Section<br />

3.7, if B = A T A, then the right singular vectors v j <strong>of</strong> A are eigenvectors <strong>of</strong> B with eigenvalues<br />

σ 2 j . The same argument shows that the left singular vectors u j <strong>of</strong> A are eigenvectors<br />

<strong>of</strong> AA T with eigenvalues σ 2 j .<br />

Notice that B = A^T A has the property that for any vector x we have x^T Bx ≥ 0. This is because B = ∑_i σ_i^2 v_i v_i^T and for any x we have x^T v_i v_i^T x = (x^T v_i)^2 ≥ 0, so x^T Bx = ∑_i σ_i^2 (x^T v_i)^2 ≥ 0. A matrix B with the property that x^T Bx ≥ 0 for all x is called positive semi-definite. Every matrix of the form A^T A is positive semi-definite. In the other direction, any positive semi-definite matrix B can be decomposed into a product A^T A, and so its eigenvalue decomposition can be obtained from the singular value decomposition of A. The interested reader should consult a linear algebra book.

3.9 Applications <strong>of</strong> Singular Value Decomposition<br />

3.9.1 Centering <strong>Data</strong><br />

Singular value decomposition is used in many applications, and for some of these applications it is essential to first center the data by subtracting the centroid of the data from each data point.¹⁰ For instance, if you are interested in the statistics of the data and how it varies in relationship to its mean, then you would want to center the data. On the other hand, if you are interested in finding the best low-rank approximation to a matrix, then you do not center the data. The issue is whether you are finding the best fitting subspace or the best fitting affine space; in the latter case you first center the data and then find the best fitting subspace.

We first show that the line minimizing the sum of squared distances to a set of points, if not restricted to go through the origin, must pass through the centroid of the points. This implies that if the centroid is subtracted from each data point, such a line will pass through the origin. The best-fit line can be generalized to k-dimensional “planes”. The operation of subtracting the centroid from all data points is useful in other contexts as well, so we give it the name “centering data”.

Lemma 3.13 The best-fit line (minimizing the sum of perpendicular distances squared) of a set of data points must pass through the centroid of the points.

Pro<strong>of</strong>: Subtract the centroid from each data point so that the centroid is 0. Let l be the<br />

best-fit line and assume for contradiction that l does not pass through the origin. The<br />

line l can be written as {a + λv|λ ∈ R}, where a is the closest point to 0 on l and v is<br />

a unit length vector in the direction <strong>of</strong> l, which is perpendicular to a. For a data point<br />

10 The centroid <strong>of</strong> a set <strong>of</strong> points is the coordinate-wise average <strong>of</strong> the points.<br />
