TheoryofDeepLearning.2022
tractable landscapes for nonconvex optimization 67
To keep the analysis simple, we look at an even simpler version of the
top eigenvector problem. In particular, we consider the case where
$M = zz^\top$ is a rank-1 matrix and $z$ is a unit vector. In this case, the
objective function we defined in Equation (7.6) becomes
$$\min_x \; f(x) = \frac{1}{4}\|zz^\top - xx^\top\|_F^2. \tag{7.8}$$
The intended global optimal solutions are $x = \pm z$. This problem is
often called the matrix factorization problem, as we are given a matrix
$M = zz^\top$ and the goal is to find a decomposition $M = xx^\top$.
(Note that we only observe $M$, not $z$.)
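To make objective (7.8) concrete, here is a minimal NumPy sketch. The gradient and Hessian formulas in it are derived directly from (7.8) with $M = zz^\top$, namely $\nabla f(x) = \|x\|_2^2 x - \langle x, z\rangle z$ and $\nabla^2 f(x) = \|x\|_2^2 I + 2xx^\top - zz^\top$. The function names and the finite-difference check are illustrative assumptions, not part of the text.

```python
import numpy as np

def objective(x, z):
    """f(x) = 1/4 * ||z z^T - x x^T||_F^2, as in (7.8)."""
    residual = np.outer(z, z) - np.outer(x, x)
    return 0.25 * np.sum(residual ** 2)

def gradient(x, z):
    """grad f(x) = (x x^T - z z^T) x = ||x||^2 x - <x, z> z."""
    return np.dot(x, x) * x - np.dot(x, z) * z

def hessian(x, z):
    """Hessian of f at x: ||x||^2 I + 2 x x^T - z z^T."""
    d = x.shape[0]
    return np.dot(x, x) * np.eye(d) + 2.0 * np.outer(x, x) - np.outer(z, z)

def numerical_gradient(x, z, h=1e-6):
    """Central finite differences, used only to sanity-check `gradient`."""
    grad = np.zeros_like(x)
    for i in range(x.shape[0]):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (objective(x + step, z) - objective(x - step, z)) / (2 * h)
    return grad

# Hypothetical test data: a random unit vector z and a random point x.
rng = np.random.default_rng(0)
z = rng.standard_normal(5)
z /= np.linalg.norm(z)
x = rng.standard_normal(5)

print("f(z)  =", objective(z, z))    # zero up to rounding
print("f(-z) =", objective(-z, z))   # zero up to rounding
print("max gradient error =",
      np.max(np.abs(gradient(x, z) - numerical_gradient(x, z))))
```

The objective only ever touches $z$ through the product $zz^\top$, which matches the remark that we observe $M$ and not $z$ itself.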
In which direction should we move to decrease the objective function?
In this problem the only directions we have are the optimal direction $z$ and the
current direction $x$, so the natural guesses are $z$, $x$ or $z - x$.
Indeed, these directions are enough:
Lemma 7.4.3. For objective function (7.8), there exists a universal constant
$c > 0$ such that for any $\tau < 1$, if neither $x$ nor $z$ is a $(c\tau, 1/4)$-direction of
improvement for the point $x$, then $f(x) \le \tau$.
The proof of this lemma involves some detailed calculation. To get
some intuition, we first consider what happens if neither $x$ nor
$z$ is a direction of improvement.
Lemma 7.4.4. For objective function (7.8), if neither $x$ nor $z$ is a direction of
improvement of $f$ at $x$, then $f(x) = 0$.
Proof. We will use the same calculation for gradient and Hessian
as in Equation (7.7), except that $M$ is now $zz^\top$. First, since $x$ is not a
direction of improvement, we must have
$$\langle \nabla f(x), x\rangle = 0 \implies \|x\|_2^4 = \langle x, z\rangle^2. \tag{7.9}$$
If $z$ is not a direction of improvement, we know $z^\top[\nabla^2 f(x)]z \ge 0$,
which means
$$\|x\|_2^2 + 2\langle x, z\rangle^2 - 1 \ge 0 \implies \|x\|_2^2 \ge 1/3.$$
Here we used the fact that $\langle x, z\rangle^2 \le \|x\|_2^2\|z\|_2^2 = \|x\|_2^2$. Together with
Equation (7.9) we know $\langle x, z\rangle^2 = \|x\|_2^4 \ge 1/9$.
Finally, since $z$ is not a direction of improvement, we know
$\langle \nabla f(x), z\rangle = 0$, which implies $\langle x, z\rangle(\|x\|_2^2 - 1) = 0$. We have already
proved $\langle x, z\rangle^2 \ge 1/9 > 0$, thus $\|x\|_2^2 = 1$. Again combining
with Equation (7.9) we know $\langle x, z\rangle^2 = \|x\|_2^4 = 1$. The only two vectors
with $\langle x, z\rangle^2 = 1$ and $\|x\|_2^2 = 1$ are $x = \pm z$, and at both of these points $f(x) = 0$.
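The statement of Lemma 7.4.4 can be checked numerically. The sketch below assumes, as the proof suggests, that $u$ is a direction of improvement at $x$ when $\langle \nabla f(x), u\rangle \neq 0$ or $u^\top[\nabla^2 f(x)]u < 0$. The helper names and the numerical tolerance `tol` are assumptions made for this illustration, not definitions from the text.

```python
import numpy as np

def gradient(x, z):
    """grad f(x) = ||x||^2 x - <x, z> z for objective (7.8)."""
    return np.dot(x, x) * x - np.dot(x, z) * z

def hessian(x, z):
    """Hessian of f at x: ||x||^2 I + 2 x x^T - z z^T."""
    d = x.shape[0]
    return np.dot(x, x) * np.eye(d) + 2.0 * np.outer(x, x) - np.outer(z, z)

def is_direction_of_improvement(u, x, z, tol=1e-9):
    """Assumed definition: nonzero first-order term or negative curvature along u."""
    first_order = abs(np.dot(gradient(x, z), u)) > tol
    second_order = u @ hessian(x, z) @ u < -tol
    return bool(first_order or second_order)

def x_or_z_improves(x, z, tol=1e-9):
    """True if at least one of the two candidate directions x, z improves f at x."""
    return (is_direction_of_improvement(x, x, z, tol)
            or is_direction_of_improvement(z, x, z, tol))

# Hypothetical test data: a random unit vector z in dimension 4.
rng = np.random.default_rng(1)
d = 4
z = rng.standard_normal(d)
z /= np.linalg.norm(z)

# At the global optima x = +z and x = -z, neither x nor z is a direction of improvement.
at_optimum = [x_or_z_improves(sign * z, z) for sign in (1.0, -1.0)]

# At the origin the gradient vanishes, but z^T [Hessian] z = -1, so z improves.
at_origin = x_or_z_improves(np.zeros(d), z)

# At generic random points, at least one of x, z should be a direction of improvement.
random_points = [rng.standard_normal(d) for _ in range(1000)]
all_random_improve = all(x_or_z_improves(x, z) for x in random_points)

print(at_optimum, at_origin, all_random_improve)
```

The origin is a useful case: it is a stationary point, yet the second-order condition along $z$ rules it out, which is exactly the step of the proof that gives $\|x\|_2^2 \ge 1/3$.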
The proof of Lemma 7.4.3 is very similar to that of Lemma 7.4.4, except
that we need to allow slack in every equation and inequality we use. The
additional benefit of having the more robust Lemma 7.4.3 is that the