08.10.2016 Views

Foundations of Data Science


$$
\begin{pmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & z & z^2 & z^3 & z^4 & z^5 & z^6 & z^7 & z^8 \\
1 & z^2 & z^4 & z^6 & z^8 & z & z^3 & z^5 & z^7 \\
1 & z^3 & z^6 & 1 & z^3 & z^6 & 1 & z^3 & z^6 \\
1 & z^4 & z^8 & z^3 & z^7 & z^2 & z^6 & z & z^5 \\
1 & z^5 & z & z^6 & z^2 & z^7 & z^3 & z^8 & z^4 \\
1 & z^6 & z^3 & 1 & z^6 & z^3 & 1 & z^6 & z^3 \\
1 & z^7 & z^5 & z^3 & z & z^8 & z^6 & z^4 & z^2 \\
1 & z^8 & z^7 & z^6 & z^5 & z^4 & z^3 & z^2 & z
\end{pmatrix}
$$

Figure 10.5: The matrix Z for n = 9.
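The pattern of exponents in Figure 10.5 can be reproduced with a short sketch. It assumes that z is a primitive 9th root of unity and that the entry in row i, column j (both counted from 0) is z raised to i·j, reduced modulo 9. The choice z = e^{2πi/9} and the names `exponents` and `Z` are illustrative assumptions, not taken from the text.

```python
import cmath

n = 9

# Exponent of z in row i, column j, reduced mod n because z**n == 1.
exponents = [[(i * j) % n for j in range(n)] for i in range(n)]

# Assumed choice of z: a primitive n-th root of unity.
z = cmath.exp(2j * cmath.pi / n)

# The numeric matrix itself, built from the exponent table.
Z = [[z ** e for e in row] for row in exponents]

for row in exponents:
    # Print "1" for exponent 0, "z" for exponent 1, and "z^k" otherwise.
    print(" ".join("1" if e == 0 else "z" if e == 1 else f"z^{e}" for e in row))
```

Each printed row should match the corresponding row of the figure, for example `1 z^2 z^4 z^6 z^8 z z^3 z^5 z^7` for the third row.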

when we subtract the two representations.

Suppose two sparse signals had Fourier transforms that agreed in almost all of their coordinates. Then the difference would be a sparse signal with a sparse transform. This is not possible. Thus, if one selects log n elements of their transform, these elements should distinguish between these two signals.
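The claim that a signal and its transform cannot both be sparse can be checked numerically on small examples. The sketch below assumes the standard discrete Fourier transform with the e^{-2πijk/n} sign convention and uses the well-known bound that the number of nonzeros of a nonzero signal times the number of nonzeros of its transform is at least n. The helper names `dft` and `support_size` and the three test signals are illustrative assumptions.

```python
import cmath

def dft(x):
    """Plain O(n^2) discrete Fourier transform (sign convention is an assumption)."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]

def support_size(v, tol=1e-9):
    """Number of entries whose magnitude exceeds a small tolerance."""
    return sum(1 for c in v if abs(c) > tol)

n = 9
signals = {
    "spike": [1, 0, 0, 0, 0, 0, 0, 0, 0],   # one nonzero
    "comb":  [1, 0, 0, 1, 0, 0, 1, 0, 0],   # three equally spaced nonzeros
    "pair":  [1, 0, 0, 0, -1, 0, 0, 0, 0],  # difference of two spikes
}

for name, x in signals.items():
    s_time = support_size(x)
    s_freq = support_size(dft(x))
    # The product of the two support sizes never drops below n.
    print(name, s_time, s_freq, s_time * s_freq >= n)
```

The comb shows that the bound can be tight when n is composite: three nonzeros in time and three in frequency give a product of exactly 9.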

10.4.3 Biological<br />

There are many areas where linear systems arise in which a sparse solution is unique. One is in plant breeding. Consider a breeder who has a number of apple trees and for each tree observes the strength of some desirable feature. He wishes to determine which genes are responsible for the feature so he can crossbreed to obtain a tree that better expresses the desirable feature. This gives rise to a set of equations Ax = b where each row of the matrix A corresponds to a tree and each column to a position on the genome. See Figure 10.6. The vector b corresponds to the strength of the desired feature in each tree. The solution x tells us the position on the genome corresponding to the genes that account for the feature. It would be surprising if there were two small independent sets of genes that accounted for the desired feature. Thus, the matrix must have a property that allows only one sparse solution.
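A toy version of the breeding system makes the uniqueness claim concrete. The sketch below uses invented data: five trees, six genome positions, and a 0/1 matrix A whose entry says whether a tree carries a particular variant at a position. It further assumes that x is a 0/1 indicator vector, so that b simply counts the relevant variants each tree carries. Neither the data nor this restriction comes from the text. The function `sparse_solutions` is a hypothetical brute-force helper that lists every support of at most `max_support` positions consistent with Ax = b.

```python
from itertools import combinations

# Hypothetical toy data: 5 trees (rows) x 6 genome positions (columns).
A = [
    [1, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0, 0],
]
# Observed feature strength per tree (generated from positions 1 and 4).
b = [1, 1, 1, 1, 1]

def sparse_solutions(A, b, max_support):
    """Brute force: all supports of size <= max_support whose 0/1
    indicator vector x satisfies Ax = b exactly."""
    n_cols = len(A[0])
    found = []
    for size in range(max_support + 1):
        for support in combinations(range(n_cols), size):
            if all(sum(row[j] for j in support) == target
                   for row, target in zip(A, b)):
                found.append(support)
    return found

solutions = sparse_solutions(A, b, max_support=2)
print(solutions)  # a single support means the sparse solution is unique
```

Brute force is only feasible at this scale; it is used here because it makes the uniqueness check explicit rather than relying on a particular recovery algorithm.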

10.4.4 Finding Overlapping Cliques or Communities<br />

Consider a graph that consists of several cliques. Suppose we can observe only low-level information such as edges and we wish to identify the cliques. An instance of this problem is the task of identifying which of ten players belongs to which of two teams of five players each when one can only observe interactions between pairs of individuals. There is an interaction between two players if and only if they are on the same team. In this situation we have a matrix A with $\binom{10}{5}$ columns and $\binom{10}{2}$ rows. The columns represent possible teams and the rows represent pairs of individuals. Let b be the $\binom{10}{2}$-dimensional vector of observed interactions. Let x be a solution to Ax = b. There is a
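The matrix described above is small enough to build explicitly. The following sketch assumes players are labeled 0 through 9 and that the two true teams are {0,...,4} and {5,...,9}; both choices are illustrative. It constructs A with one row per pair and one column per candidate team, builds b from the observed interactions, and checks that the indicator vector of the two true teams satisfies Ax = b.

```python
from itertools import combinations

players = range(10)
pairs = list(combinations(players, 2))  # rows: C(10, 2) = 45 pairs
teams = list(combinations(players, 5))  # columns: C(10, 5) = 252 candidate teams

# A[r][c] = 1 when both players of pair r belong to candidate team c.
A = [[1 if set(pair) <= set(team) else 0 for team in teams] for pair in pairs]

# Assumed ground truth for illustration: two disjoint teams of five.
true_teams = [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]

# b[r] = 1 when the two players of pair r are on the same true team.
b = [1 if any(p in t and q in t for t in true_teams) else 0 for p, q in pairs]

# Indicator vector that selects exactly the two true teams.
x = [1 if team in true_teams else 0 for team in teams]

Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
print(len(pairs), len(teams), Ax == b)
```

Only two of the 252 entries of x are nonzero, so this particular solution is very sparse compared with the dimension of the system.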
