Theory of Deep Learning, 2022

7  Tractable Landscapes for Nonconvex Optimization

Deep learning relies on optimizing complicated, nonconvex loss

functions. Finding the global minimum of a nonconvex objective is

NP-hard in the worst case. However, in deep learning, simple algorithms

such as stochastic gradient descent often drive the objective value to

zero or near-zero by the end of training. This chapter focuses on the optimization

landscape defined by a nonconvex objective and identifies properties

of these landscapes that allow simple optimization algorithms to find

global minima (or near-minima). These properties thus far apply to

simpler nonconvex problems than deep learning, and it remains open how

to analyse deep learning with such landscape analysis.

Warm-up: Convex Optimization

To understand optimization landscapes,

one can first look at optimizing a convex function. If a function

f (w) is convex, then it satisfies many nice properties, including

∀α ∈ [0, 1], ∀w, w′:  f(αw + (1 − α)w′) ≤ α f(w) + (1 − α) f(w′). (7.1)

∀w, w′:  f(w′) ≥ f(w) + 〈∇f(w), w′ − w〉. (7.2)
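As a quick sanity check, both inequalities can be verified numerically for a concrete convex function. The sketch below uses f(w) = ‖w‖² (an example chosen here for illustration, not one from the text):

```python
import numpy as np

def f(w):
    # A simple convex function: squared Euclidean norm.
    return np.dot(w, w)

def grad_f(w):
    # Gradient of ||w||^2 is 2w.
    return 2 * w

rng = np.random.default_rng(0)
w, w2 = rng.normal(size=3), rng.normal(size=3)

# Equation (7.1): f(aw + (1-a)w') <= a f(w) + (1-a) f(w').
for alpha in np.linspace(0.0, 1.0, 11):
    lhs = f(alpha * w + (1 - alpha) * w2)
    rhs = alpha * f(w) + (1 - alpha) * f(w2)
    assert lhs <= rhs + 1e-12

# Equation (7.2): f(w') >= f(w) + <grad f(w), w' - w>.
assert f(w2) >= f(w) + grad_f(w).dot(w2 - w) - 1e-12
print("both convexity inequalities hold")
```

Any other convex function (with its gradient) could be substituted for f and grad_f and the same checks would pass.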

These equations characterize important geometric properties of

the objective function f (w). In particular, Equation (7.1) shows that

all the global minima of f (w) must be connected, because if w, w ′

are both globally optimal, anything on the segment αw + (1 − α)w ′

must also be optimal. Such properties are important because they give

a characterization of all the global minima. Equation (7.2) shows that

every point with ∇ f (w) = 0 must be a global minimum, because

for every w′ we have f(w′) ≥ f(w) + 〈∇f(w), w′ − w〉 = f(w), where the

last equality uses ∇f(w) = 0. Such properties are important because they

connect a local property (the gradient being 0) to global optimality.
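This local-to-global connection is exactly what lets a first-order method succeed on a convex problem: driving the gradient to zero is enough. A minimal sketch, using an assumed convex quadratic f(w) = ½ wᵀAw − bᵀw with A positive definite (so the unique stationary point solves Aw = b):

```python
import numpy as np

# Convex quadratic: f(w) = 0.5 * w^T A w - b^T w, with A positive definite.
# Its gradient is Aw - b, so the unique stationary point solves Aw = b and,
# by Equation (7.2), is the global minimum.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(w):
    return A @ w - b

# Plain gradient descent with a fixed step size.
w = np.zeros(2)
for _ in range(500):
    w -= 0.1 * grad(w)

w_star = np.linalg.solve(A, b)  # the stationary (hence globally optimal) point
assert np.allclose(w, w_star, atol=1e-6)
print("gradient descent reached the global minimum")
```

The step size 0.1 is below 2 divided by the largest eigenvalue of A here, which is what makes the iteration contract toward w*.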

In general, optimization landscape analysis looks for properties of the

objective function that characterize its local/global optimal points

(such as Equation (7.1)) or connect local properties with global

optimality (such as Equation (7.2)).
