Foundations of Data Science


Figure 10.6: The system of linear equations used to find the internal code for some observable phenomenon. (Labels in the figure: position on genome; trees; phenotype: outward manifestation, observables; genotype: internal code.)

sparse solution x where x is all zeros except for the two 1's for 12345 and 678910, where the two teams are {1,2,3,4,5} and {6,7,8,9,10}. The question is whether we can recover x from b. If the matrix A satisfied the restricted isometry condition, we could surely do this. Although A does not satisfy the restricted isometry condition, which guarantees recovery of all sparse vectors, we can still recover the sparse vector when the teams are non-overlapping or almost non-overlapping. If A satisfied the restricted isometry property, we would minimize $\|x\|_1$ subject to $Ax = b$. Instead, we minimize $\|x\|_1$ subject to $\|Ax - b\|_\infty \le \varepsilon$, where we bound the largest error.
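The problem of minimizing the 1-norm of x subject to a bound on the largest entry of Ax − b can be written as a linear program. The following is a sketch of the standard reformulation, assuming x has n coordinates; the auxiliary vector t and the all-ones vector $\mathbf{1}$ are names introduced here for illustration and do not appear in the text.

```latex
\begin{aligned}
\min_{x,\,t \in \mathbb{R}^n} \quad & \sum_{i=1}^{n} t_i \\
\text{subject to} \quad & -t \le x \le t, \\
& -\varepsilon \mathbf{1} \le Ax - b \le \varepsilon \mathbf{1}.
\end{aligned}
```

At an optimum each $t_i$ equals $|x_i|$, so the objective equals $\|x\|_1$, and the last pair of inequalities says exactly that every entry of $Ax - b$ has absolute value at most $\varepsilon$.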

10.4.5 Low Rank Matrices<br />

Suppose L is a low rank matrix that has been corrupted by noise. That is, M = L + R. If R is Gaussian, then principal component analysis will recover L from M. However, if L has been corrupted by several missing entries, or several entries have had large noise added to them and become outliers, then principal component analysis may be far off. Even so, if L is low rank and R is sparse, then L can be recovered effectively from L + R. To do this, find the L and R that minimize $\|L\|_* + \lambda \|R\|_1$ subject to $L + R = M$. Here $\|L\|_*$, the nuclear norm, is the sum of the singular values of L, and $\|R\|_1$ is the sum of the absolute values of the entries of R. A small value of $\|L\|_*$ indicates a low rank matrix. Notice that we do not need to know the rank of L or the elements that were corrupted. All we need is that the low rank matrix L is not sparse and that the sparse matrix R is not low rank. We leave the proof as an exercise.
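As an illustration of the decomposition described above, here is a minimal sketch in Python with NumPy that splits M into a low rank part and a sparse part by penalizing the sum of the singular values of L (the nuclear norm) plus λ times the entrywise 1-norm of R. The text does not say how to solve the minimization. The augmented Lagrangian iteration below, the function names `shrink`, `svd_threshold`, and `robust_pca`, and the default choices of `lam` and `mu` are all assumptions made for this example, taken from common practice rather than from the text.

```python
import numpy as np

def shrink(X, tau):
    """Entrywise soft-thresholding: shrink each entry toward zero by tau."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Soft-threshold the singular values of X by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def robust_pca(M, lam=None, mu=None, tol=1e-7, max_iter=2000):
    """Approximately minimize ||L||_* + lam * ||R||_1 subject to L + R = M.

    Uses an augmented Lagrangian iteration (an assumed solver, not from the text).
    M is expected to be a nonzero floating point matrix.
    """
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))        # assumed default weight
    if mu is None:
        mu = m * n / (4.0 * np.abs(M).sum())  # assumed default step parameter
    R = np.zeros_like(M)
    Y = np.zeros_like(M)                      # Lagrange multiplier for L + R = M
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svd_threshold(M - R + Y / mu, 1.0 / mu)  # update the low rank part
        R = shrink(M - L + Y / mu, lam / mu)         # update the sparse part
        residual = M - L - R
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, R

# Synthetic test case: a rank 2 matrix with about 5% of entries hit by outliers.
rng = np.random.default_rng(0)
n, r = 60, 2
L_true = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
R_true = np.zeros((n, n))
mask = rng.random((n, n)) < 0.05
R_true[mask] = rng.choice([-10.0, 10.0], size=mask.sum())
M = L_true + R_true

L_hat, R_hat = robust_pca(M)
rel_err = np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true)
print("relative error in recovered L:", rel_err)
```

Note that the routine is given neither the rank of L nor the positions of the corrupted entries, in line with the remark above. The synthetic data use a dense random L and a scattered sparse R, so that L is not sparse and R is not low rank.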

An example where low rank matrices that have been corrupted might occur is aerial<br />
