Theory of Deep Learning, 2022
tractable landscapes for nonconvex optimization 59
Claim 7.1.4. For an objective function f(w) : ℝ^d → ℝ and a critical point w (∇f(w) = 0), we know
• If ∇²f(w) ≻ 0, w is a local minimum.
• If ∇²f(w) ≺ 0, w is a local maximum.
• If ∇²f(w) has both a positive and a negative eigenvalue, w is a saddle point.
These criteria are known as second-order sufficient conditions
in optimization. Intuitively, one can prove this claim by looking at
the second-order Taylor expansion. The three cases in the claim do
not cover all possible Hessian matrices. The remaining cases are
considered degenerate, and such a point can be a local minimum,
a local maximum, or a saddle point 2 .
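To see why, write the second-order Taylor expansion around a critical point w (where ∇f(w) = 0):

f(w + δ) = f(w) + ½ δ⊤∇²f(w)δ + o(‖δ‖²).

If ∇²f(w) ≻ 0, the quadratic term is positive for every small δ ≠ 0, so f increases in every direction and w is a local minimum; if ∇²f(w) ≺ 0, the term is negative in every direction and w is a local maximum; and if the Hessian has eigenvalues of both signs, f increases along an eigenvector with positive eigenvalue and decreases along one with negative eigenvalue, so w is a saddle point.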
Flat regions Even if a function does not have any spurious local
minima or saddle points, it can still be nonconvex; see Figure 7.1. In
high dimensions such functions can still be very hard to optimize.
The main difficulty here is that even if the norm ‖∇f(w)‖₂ is small,
unlike for convex functions one cannot conclude that f(w) is close to
f(w∗). However, in such cases one can often hope for the function f(w)
to satisfy some relaxed notion of convexity, and design efficient
algorithms accordingly. We discuss one such case in Section 7.2.
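As a toy one-dimensional illustration (our own example, not from the text): take f(w) = 1 − exp(−w²), which has a unique global minimum at w∗ = 0 but flattens out away from it, so a tiny gradient norm does not certify that the value is near-optimal.

```python
import numpy as np

# Toy nonconvex function with a flat region: f(w) = 1 - exp(-w^2).
# Unique global minimum at w* = 0 with f(w*) = 0, but for large |w|
# the gradient is exponentially small while f(w) stays close to 1.
f = lambda w: 1.0 - np.exp(-w**2)
grad = lambda w: 2.0 * w * np.exp(-w**2)

w = 10.0
print(abs(grad(w)))     # exponentially small gradient norm...
print(f(w) - f(0.0))    # ...yet the suboptimality gap is still near 1
```

A gradient-norm stopping criterion would halt here while the iterate is far from optimal in value, which is exactly the obstacle flat regions pose.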
2 One can consider the point w = 0 for the functions w⁴, −w⁴, and w³: it is a local minimum, a local maximum, and a saddle point, respectively.
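The case analysis in Claim 7.1.4, including the degenerate fallthrough discussed above, can be sketched as a small numerical check; the function name and the tolerance below are our own choices, not from the text.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-8):
    """Classify a critical point by the eigenvalues of its Hessian
    (second-order sufficient conditions)."""
    # The Hessian is symmetric, so eigvalsh returns real eigenvalues.
    eigs = np.linalg.eigvalsh(hessian)
    if np.all(eigs > tol):
        return "local minimum"      # positive definite
    if np.all(eigs < -tol):
        return "local maximum"      # negative definite
    if np.any(eigs > tol) and np.any(eigs < -tol):
        return "saddle point"       # indefinite
    return "degenerate"             # some eigenvalue ~ 0: test inconclusive

# Example: f(w1, w2) = w1^2 - w2^2 has a saddle at the origin,
# with Hessian diag(2, -2).
print(classify_critical_point(np.diag([2.0, -2.0])))  # saddle point
```

Note that the degenerate branch mirrors the footnote: for w⁴, −w⁴, and w³ the second derivative at w = 0 vanishes, so the eigenvalue test alone cannot decide the type.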
Figure 7.1: Obstacles for nonconvex optimization. From left to right: local minimum, saddle point, and flat region.
7.2 Cases with a unique global minimum
We first consider the case that is most similar to convex objectives.
In this section, the objective functions we look at have no spurious
local minima or saddle points. In fact, in our example the objective
will have a unique global minimum. The only obstacle to
optimizing these functions is that points with small gradients may
not be near-optimal.
The main idea here is to identify properties of the objective and
also a potential function, such that the potential function keeps de-