Theory of Deep Learning, 2022
7
Tractable Landscapes for Nonconvex Optimization
Deep learning relies on optimizing complicated, nonconvex loss
functions. Finding the global minimum of a nonconvex objective is
NP-hard in the worst case. However, in deep learning, simple algorithms
such as stochastic gradient descent often drive the objective value to
zero or near zero by the end of training. This chapter focuses on the optimization
landscape defined by a nonconvex objective and identifies properties
of these landscapes that allow simple optimization algorithms to find
global minima (or near-minima). So far these properties have been established
for nonconvex problems simpler than deep learning, and it remains open how
to analyse deep learning with such landscape analysis.
Warm-up: Convex Optimization
To understand optimization landscapes, one can first
look at optimizing a convex function. If a function
f (w) is convex, then it satisfies many nice properties, including
∀α ∈ [0, 1], ∀w, w ′ , f (αw + (1 − α)w ′ ) ≤ α f (w) + (1 − α) f (w ′ ). (7.1)
∀w, w ′ , f (w ′ ) ≥ f (w) + 〈∇ f (w), w ′ − w〉. (7.2)
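As a quick sanity check, both inequalities can be verified numerically for a simple convex function, say f (w) = ‖w‖², whose gradient is 2w. (This is an illustrative example, not one from the text.)

```python
import numpy as np

def f(w):
    # A simple convex function: f(w) = ||w||^2
    return np.dot(w, w)

def grad_f(w):
    # Gradient of f: 2w
    return 2 * w

rng = np.random.default_rng(0)
w, w_prime = rng.standard_normal(3), rng.standard_normal(3)
alpha = 0.3

# Equation (7.1): f evaluated on the segment lies below the chord
lhs = f(alpha * w + (1 - alpha) * w_prime)
rhs = alpha * f(w) + (1 - alpha) * f(w_prime)
assert lhs <= rhs + 1e-12

# Equation (7.2): f lies above its tangent plane at w
assert f(w_prime) >= f(w) + grad_f(w) @ (w_prime - w) - 1e-12
```

Both assertions pass for any choice of w, w ′ and α, since ‖w‖² is convex.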
These equations characterize important geometric properties of
the objective function f (w). In particular, Equation (7.1) shows that
all the global minima of f (w) must be connected, because if w, w ′
are both globally optimal, anything on the segment αw + (1 − α)w ′
must also be optimal. Such properties are important because they give
a characterization of all the global minima. Equation (7.2) shows that
every point with ∇ f (w) = 0 must be a global minimum, because
for every w ′ we have f (w ′ ) ≥ f (w) + 〈∇ f (w), w ′ − w〉 = f (w).
Such properties are important because they connect a local property
(the gradient being 0) to global optimality.
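The local-to-global implication is exactly what makes gradient descent succeed on convex objectives: driving the gradient to zero suffices for global optimality. A minimal sketch on a convex quadratic f (w) = ½ wᵀAw − bᵀw (a hypothetical example; A and the step size are chosen for illustration):

```python
import numpy as np

# Convex quadratic f(w) = 0.5 * w^T A w - b^T w with A positive definite.
# Its unique stationary point solves A w = b and, by Equation (7.2),
# is the global minimum.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive definite
b = np.array([1.0, -1.0])

def grad(w):
    # Gradient of the quadratic: A w - b
    return A @ w - b

w = np.zeros(2)
lr = 0.1                                  # step size below 2 / lambda_max(A)
for _ in range(500):
    w = w - lr * grad(w)                  # plain gradient descent

w_star = np.linalg.solve(A, b)            # exact global minimizer
print(np.allclose(w, w_star, atol=1e-6))  # prints True
```

Gradient descent converges to the point where ∇ f (w) = 0, and convexity guarantees that this point is globally optimal rather than merely a local stationary point.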
In general, optimization landscape analysis looks for properties of the
objective function that characterize its local/global optimal points
(such as Equation (7.1)) or connect local properties with global
optimality (such as Equation (7.2)).