
Foundations of Data Science


[Figure 10.2: an n × d matrix A multiplying a vector x to give the vector b. Caption: Ax = b has a vector space of solutions but possibly only one sparse solution.]

If Ax = b had two distinct s-sparse solutions, their difference would be a 2s-sparse solution to the homogeneous system Ax = 0. A 2s-sparse solution to the homogeneous equation Ax = 0 requires that some 2s columns of A be linearly dependent. Unless A has 2s linearly dependent columns, there can be only one s-sparse solution.

Now suppose n is of order s² and we pick an n × d matrix A with random independent zero-mean, unit-variance Gaussian entries. Take any subset of 2s columns of A. Since we have already seen in Chapter 2 that each of these 2s vectors is likely to be essentially orthogonal to the space spanned by the previous vectors, the sub-matrix is unlikely to be singular. This intuition can be made rigorous.
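As a purely illustrative sanity check of this intuition (the dimensions and variable names below are our own, not taken from the text), one can draw a Gaussian matrix and verify that a random 2s-column sub-matrix is numerically full rank:

```python
import numpy as np

# Illustrative check: the columns of a random 2s-column sub-matrix of a
# Gaussian matrix are (numerically) linearly independent.
rng = np.random.default_rng(0)
s, d = 5, 200
n = s * s                                  # n of order s^2, as in the text
A = rng.standard_normal((n, d))            # zero-mean, unit-variance entries

cols = rng.choice(d, size=2 * s, replace=False)   # any subset of 2s columns
sub_matrix = A[:, cols]
print(np.linalg.matrix_rank(sub_matrix))   # prints 10 (= 2s) with overwhelming probability
```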

To find a sparse solution to Ax = b, one would like to minimize the zero norm ‖x‖_0 over {x | Ax = b}. This is a computationally hard problem: there are techniques for minimizing a convex function over a convex set, but ‖x‖_0 is not a convex function, and with no further hypotheses the problem is NP-hard. With this in mind, we use the one norm as a proxy for the zero norm and minimize ‖x‖_1 over {x | Ax = b}. Although this problem appears to be nonlinear, it can be solved by linear programming: write x = u − v with u ≥ 0 and v ≥ 0, and then minimize the linear function ∑_i u_i + ∑_i v_i subject to Au − Av = b, u ≥ 0, and v ≥ 0.
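A minimal sketch of this linear-programming reformulation, using scipy.optimize.linprog; the function name l1_min_solution and the problem sizes are illustrative assumptions, not from the text:

```python
import numpy as np
from scipy.optimize import linprog

def l1_min_solution(A, b):
    """Return argmin ||x||_1 subject to Ax = b, via the split x = u - v."""
    n, d = A.shape
    c = np.ones(2 * d)                 # objective: sum(u_i) + sum(v_i)
    A_eq = np.hstack([A, -A])          # equality constraint: Au - Av = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))  # u, v >= 0
    u, v = res.x[:d], res.x[d:]
    return u - v

# Example: a random 25 x 100 Gaussian matrix and a 3-sparse signal.
rng = np.random.default_rng(0)
n, d, s = 25, 100, 3
A = rng.standard_normal((n, d))
x_true = np.zeros(d)
x_true[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
b = A @ x_true

x_hat = l1_min_solution(A, b)
print(np.allclose(x_hat, x_true, atol=1e-6))   # typically True for these sizes
```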

Under what conditions will minimizing ‖x‖_1 over {x | Ax = b} recover the s-sparse solution to Ax = b? If g(x) is a convex function, then any local minimum of g is a global minimum. If g(x) is differentiable at its minimum, the gradient ∇g must be zero there. However, the 1-norm is not differentiable at a sparse minimum, since it is not differentiable wherever a coordinate of x is zero. Thus, we introduce the concept of a subgradient of a convex function. Where the function is differentiable, the subgradient is the gradient. Where the function is not differentiable, a subgradient is given by any line through the point that touches the function there and otherwise lies entirely on or below the function. See Figure 10.3.
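For reference, the standard convex-analysis form of this definition can be written as follows; the one-dimensional absolute-value example is ours, added for illustration:

```latex
\[
  g \text{ is a subgradient of a convex } f \text{ at } x_0
  \quad\Longleftrightarrow\quad
  f(x) \;\ge\; f(x_0) + g^{T}(x - x_0) \quad \text{for all } x.
\]
\[
  \text{Example: for } f(x) = |x|, \qquad
  \partial f(x) =
  \begin{cases}
    \{\operatorname{sign}(x)\}, & x \neq 0,\\[2pt]
    [-1,\,1], & x = 0,
  \end{cases}
  \qquad 0 \in \partial f(0) \ \Rightarrow\ x = 0 \text{ minimizes } |x|.
\]
```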

