26.12.2022 Views

TheoryofDeepLearning.2022


tractable landscapes for nonconvex optimization 67

For simplicity, we will look at an even simpler version of the

top eigenvector problem. In particular, we consider the case where

$M = zz^\top$ is a rank-1 matrix, and $z$ is a unit vector. In this case, the objective function we defined in Equation (7.6) becomes

$$\min_x \; f(x) = \frac{1}{4}\,\|zz^\top - xx^\top\|_F^2. \tag{7.8}$$

The intended global optimal solutions are x = ±z. This problem is

often called the matrix factorization problem as we are given a matrix

$M = zz^\top$ and the goal is to find a decomposition $M = xx^\top$. (Note that we only observe $M$, not $z$.)
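To make the objective concrete, here is a minimal NumPy sketch of (7.8). The function name `objective` and the random test vector are illustrative assumptions, not part of the text; the function only touches the observed matrix $M$, never $z$ itself.

```python
import numpy as np


def objective(x, M):
    """f(x) = (1/4) * ||M - x x^T||_F^2, following Equation (7.8)."""
    residual = M - np.outer(x, x)
    return 0.25 * np.linalg.norm(residual, "fro") ** 2


# Hypothetical instance: a random unit vector z and the observed M = z z^T.
rng = np.random.default_rng(0)
z = rng.standard_normal(5)
z /= np.linalg.norm(z)
M = np.outer(z, z)

f_at_z = objective(z, M)                 # intended optimum x = z
f_at_minus_z = objective(-z, M)          # intended optimum x = -z
f_at_zero = objective(np.zeros(5), M)    # equals ||z||^4 / 4 for x = 0
```

Both $x = z$ and $x = -z$ give the same product $xx^\top$, which is why the objective cannot distinguish them and both are global optima.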

Which direction should we move to decrease the objective function?

In this problem we only have the optimal direction z and the

current direction x, so the natural guesses would be z, x or z − x.

Indeed, these directions are enough:

Lemma 7.4.3. For objective function (7.8), there exists a universal constant $c > 0$ such that for any $\tau < 1$, if neither $x$ nor $z$ is a $(c\tau, , 1/4)$-direction of improvement for the point $x$, then $f(x) \le \tau$.

The proof of this lemma involves some detailed calculation. To get

some intuition, we can first think about what happens if neither $x$ nor $z$ is a direction of improvement.

Lemma 7.4.4. For objective function (7.8), if neither $x$ nor $z$ is a direction of improvement of $f$ at $x$, then $f(x) = 0$.

Proof. We will use the same calculation for gradient and Hessian as in Equation (7.7), except that $M$ is now $zz^\top$. First, since $x$ is not a direction of improvement, we must have

$$\langle \nabla f(x), x\rangle = 0 \;\Longrightarrow\; \|x\|_2^4 = \langle x, z\rangle^2. \tag{7.9}$$
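As a sketch of the calculation behind this step, the quantities involved can be obtained by expanding (7.8) directly and using $\|z\|_2 = 1$. The lines below are derived from (7.8) itself rather than quoted from Equation (7.7):

```latex
\begin{align*}
f(x) &= \tfrac14\left(\|z\|_2^4 - 2\langle x, z\rangle^2 + \|x\|_2^4\right)
      = \tfrac14\left(1 - 2\langle x, z\rangle^2 + \|x\|_2^4\right), \\
\nabla f(x) &= \|x\|_2^2\, x - \langle x, z\rangle\, z, \\
\nabla^2 f(x) &= \|x\|_2^2\, I + 2\, xx^\top - zz^\top, \\
\langle \nabla f(x), x\rangle &= \|x\|_2^4 - \langle x, z\rangle^2 .
\end{align*}
```

Setting the last line to zero gives the identity in (7.9).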

If $z$ is not a direction of improvement, we know $z^\top [\nabla^2 f(x)]\, z \ge 0$, which means

$$\|x\|_2^2 + 2\langle x, z\rangle^2 - 1 \ge 0 \;\Longrightarrow\; \|x\|_2^2 \ge 1/3.$$

Here we used the fact that $\langle x, z\rangle^2 \le \|x\|_2^2 \|z\|_2^2 = \|x\|_2^2$. Together with Equation (7.9) we know $\langle x, z\rangle^2 = \|x\|_2^4 \ge 1/9$.

Finally, since $z$ is not a direction of improvement, we know $\langle \nabla f(x), z\rangle = 0$, which implies $\langle x, z\rangle(\|x\|_2^2 - 1) = 0$. We have already proved $\langle x, z\rangle^2 \ge 1/9 > 0$, thus $\|x\|_2^2 = 1$. Again combining with Equation (7.9) we know $\langle x, z\rangle^2 = \|x\|_2^4 = 1$. The only two vectors with $\langle x, z\rangle^2 = 1$ and $\|x\|_2^2 = 1$ are $x = \pm z$.
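The three quantities the proof relies on can be checked numerically. The sketch below is illustrative only: the helper names are hypothetical, and the closed forms for the gradient and Hessian are the ones obtained by expanding (7.8) with $M = zz^\top$ and $\|z\|_2 = 1$, namely $\nabla f(x) = \|x\|_2^2 x - \langle x, z\rangle z$ and $\nabla^2 f(x) = \|x\|_2^2 I + 2xx^\top - zz^\top$.

```python
import numpy as np


def gradient(x, z):
    # grad f(x) = ||x||^2 x - <x, z> z  (assumes M = z z^T)
    return np.dot(x, x) * x - np.dot(x, z) * z


def hessian(x, z):
    # Hessian f(x) = ||x||^2 I + 2 x x^T - z z^T
    d = x.shape[0]
    return np.dot(x, x) * np.eye(d) + 2.0 * np.outer(x, x) - np.outer(z, z)


def proof_quantities(x, z):
    """Return the three scalars used in the proof of Lemma 7.4.4."""
    g = gradient(x, z)
    H = hessian(x, z)
    return {
        "grad_dot_x": float(g @ x),   # zero in Equation (7.9)
        "grad_dot_z": float(g @ z),   # <x, z> (||x||^2 - 1)
        "z_H_z": float(z @ H @ z),    # ||x||^2 + 2 <x, z>^2 - 1
    }


# Hypothetical instance: a random unit vector z in dimension 4.
rng = np.random.default_rng(0)
z = rng.standard_normal(4)
z /= np.linalg.norm(z)

at_opt = proof_quantities(z, z)              # the optimum x = z
at_zero = proof_quantities(np.zeros(4), z)   # the stationary point x = 0
```

At $x = z$ both inner products vanish and the curvature along $z$ is positive. At $x = 0$ the gradient also vanishes, but the curvature along $z$ is $-1$, so the second-order condition on $z$ is what rules this point out.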

The proof of Lemma 7.4.3 is very similar to that of Lemma 7.4.4, except that we need to allow slack in every equation and inequality we use. The

additional benefit of having the more robust Lemma 7.4.3 is that the
