Foundations of Data Science

capture this situation. Now $AA^T = U_S D_S^2 U_S^T$ and $A^TA = V_S D_S^2 V_S^T$. By an argument similar to the one above, $U_S$ and $V_S$ are essentially unique and are the eigenvectors or negatives of the eigenvectors of $A$ and $A^T$. The eigenvalues of $AA^T$ or $A^TA$ are the squares of the eigenvalues of $A$. If $A$ is not positive semidefinite and has negative eigenvalues, then in the singular value decomposition $A = U_S D_S V_S^T$, some of the left singular vectors are the negatives of the eigenvectors. Let $S$ be a diagonal matrix with $\pm 1$'s on the diagonal depending on whether the corresponding eigenvalue is positive or negative. Then $A = (U_S S)(S D_S)V_S^T$ where $U_S S = V_E$ and $S D_S = D_E$.
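This sign bookkeeping can be checked numerically. Below is a small sketch (the matrix is an illustrative choice of mine, not from the text) comparing the eigendecomposition and the SVD of a symmetric indefinite matrix:

```python
import numpy as np

# A symmetric matrix with one negative eigenvalue (an illustrative
# choice, not from the text).
A = np.array([[2.0, 1.0],
              [1.0, -3.0]])

eig_vals, eig_vecs = np.linalg.eigh(A)   # A = V_E D_E V_E^T
U, sv, Vt = np.linalg.svd(A)             # A = U_S D_S V_S^T

# The singular values are the absolute values of the eigenvalues,
# i.e. D_S = S D_E for a diagonal matrix S of +-1's.
assert np.allclose(sv, np.sort(np.abs(eig_vals))[::-1])

# Each left singular vector is an eigenvector up to sign: every
# column of U matches some column of V_E or its negative.
for u in U.T:
    assert any(np.allclose(u, sgn * v) for v in eig_vecs.T for sgn in (1, -1))
```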

12.7.3 Extremal Properties of Eigenvalues

In this section we derive a min-max characterization of eigenvalues which implies that the largest eigenvalue of a symmetric matrix $A$ equals the maximum of $x^TAx$ over all vectors $x$ of unit length. That is, the largest eigenvalue of $A$ equals the 2-norm of $A$. If $A$ is a real symmetric matrix, there exists an orthogonal matrix $P$ that diagonalizes $A$. Thus
$$P^TAP = D$$
where $D$ is a diagonal matrix with the eigenvalues of $A$, $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, on its diagonal. Rather than working with $A$, it is easier to work with the diagonal matrix $D$. This will be an important technique that will simplify many proofs.
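A quick numerical sanity check of this claim may help. The sketch below uses a hypothetical randomly generated positive semidefinite matrix (so the largest eigenvalue is nonnegative and coincides with the largest singular value, i.e. the 2-norm):

```python
import numpy as np

# Hypothetical example: a random symmetric positive semidefinite matrix.
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B.T @ B                        # symmetric PSD

lam = np.linalg.eigvalsh(A)        # eigenvalues in ascending order
v = np.linalg.eigh(A)[1][:, -1]    # unit eigenvector for the largest eigenvalue

# The Rayleigh quotient x^T A x attains lam[-1] at the top eigenvector
# and never exceeds it for any unit vector x.
assert np.isclose(v @ A @ v, lam[-1])
x = rng.standard_normal(5)
x /= np.linalg.norm(x)
assert x @ A @ x <= lam[-1] + 1e-9

# The largest eigenvalue equals the spectral (2-)norm of A.
assert np.isclose(lam[-1], np.linalg.norm(A, 2))
```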

Consider maximizing $x^TAx$ subject to the conditions

1. $\sum_{i=1}^n x_i^2 = 1$

2. $r_i^Tx = 0$ for $1 \le i \le s$,

where the $r_i$ are any set of nonzero vectors. We ask: over all possible sets $\{r_i \mid 1 \le i \le s\}$ of $s$ vectors, what is the minimum value assumed by this maximum?

Theorem 12.12 (Min max theorem) For a symmetric matrix $A$,
$$\min_{r_1,\ldots,r_s}\ \max_{\substack{x \perp r_1,\ldots,r_s \\ |x|=1}} \left(x^TAx\right) = \lambda_{s+1},$$
where the minimum is over all sets $\{r_1, r_2, \ldots, r_s\}$ of $s$ nonzero vectors and the maximum is over all unit vectors $x$ orthogonal to the $s$ nonzero vectors.
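Before the proof, a numerical illustration (a sketch with hypothetical random matrices, not from the text): choosing the $r_i$ to be the top $s$ eigenvectors achieves $\lambda_{s+1}$ exactly, while an arbitrary choice of $s$ vectors can only yield a larger constrained maximum.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
A = (B + B.T) / 2                  # hypothetical random symmetric matrix

lam, Q = np.linalg.eigh(A)
lam, Q = lam[::-1], Q[:, ::-1]     # eigenvalues descending, Q orthonormal

s = 2
# Choice 1: r_i = top s eigenvectors.  The subspace orthogonal to them
# is spanned by the remaining eigenvectors Q[:, s:].
P = Q[:, s:]
max1 = np.linalg.eigvalsh(P.T @ A @ P)[-1]   # max of x^T A x over unit x in span(P)
assert np.isclose(max1, lam[s])              # equals lambda_{s+1} (0-indexed lam[s])

# Choice 2: arbitrary nonzero r_i.  Maximize over their orthogonal
# complement; the result is at least lambda_{s+1}.
R = rng.standard_normal((6, s))
proj = np.eye(6) - R @ np.linalg.pinv(R)     # projector onto the complement of span(R)
P2 = np.linalg.svd(proj)[0][:, :6 - s]       # orthonormal basis of that complement
max2 = np.linalg.eigvalsh(P2.T @ A @ P2)[-1]
assert max2 >= lam[s] - 1e-9
```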

Pro<strong>of</strong>: A is orthogonally diagonalizable. Let P satisfy P T P = I and P T AP = D, D<br />

diagonal. Let y = P T x. Then x = P y and<br />

x T Ax = y T P T AP y = y T Dy =<br />

n∑<br />

λ i yi<br />

2<br />

i=1<br />

Since there is a one-to-one correspondence between unit vectors x and y, maximizing<br />

x T Ax subject to ∑ ∑<br />

x 2 i = 1 is equivalent to maximizing n λ i yi 2 subject to ∑ yi 2 = 1. Since<br />

409<br />

i=1
