2 Basics of Optimization
This chapter sets up the basic analysis framework for gradient-based
optimization algorithms and discusses how it applies to deep learning.
≪Tengyu notes: Sanjeev notes:
Suggestion: when introducing the usual abstractions like the Lipschitz constant, Hessian norm, etc., let's relate them concretely to what they mean in the context of deep learning (noting that the Lipschitz constant is with respect to the vector of parameters). Be frank about what these numbers might be for deep learning, or even how feasible it is to estimate them. (Maybe that discussion can go in the side bar.)
BTW, it may be useful to give some numbers for the empirical Lipschitz constant encountered in training.
One suspects that the optimization speed analysis is rather pessimistic.≫
≪Suriya notes: To ground optimization in our case, we can also mention that $f$ is often of either the ERM or stochastic optimization form $L(w) = \sum_i \ell(w; x_i, y_i)$; it might also be useful to mention that outside of this chapter, we typically use $f$ as an alternative to $h$ to denote a function computed by a network.≫
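To make the ERM form in the note above concrete, here is a minimal sketch in Python; the squared-error per-example loss and the toy dataset are illustrative assumptions, not from the text:

```python
import numpy as np

# Hypothetical per-example loss: squared error of a linear model,
# l(w; x, y) = (<w, x> - y)^2.
def loss(w, x, y):
    return (w @ x - y) ** 2

# ERM objective: L(w) = sum_i l(w; x_i, y_i) over a toy dataset.
def L(w, xs, ys):
    return sum(loss(w, x, y) for x, y in zip(xs, ys))

xs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
ys = [1.0, 2.0]
print(L(np.array([0.0, 0.0]), xs, ys))  # 5.0
print(L(np.array([1.0, 2.0]), xs, ys))  # 0.0 at the interpolating w
```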
≪Tengyu notes: should we use $w$ or $\theta$ in this section?≫ ≪Suriya notes: I remember that we agreed on $w$ for parameters a long time back; did we go back to theta?≫
2.1 Gradient descent
Suppose we would like to optimize a continuous function $f(w)$ over $\mathbb{R}^d$:
$$\min_{w \in \mathbb{R}^d} f(w).$$
The gradient descent (GD) algorithm is
$$w_0 = \text{initialization}$$
$$w_{t+1} = w_t - \eta \nabla f(w_t),$$
where $\eta$ is the step size or learning rate.
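As a minimal sketch (not from the text), the GD iteration in Python on an illustrative least-squares objective; the matrix $A$, vector $b$, step size, and iteration count are all hypothetical choices:

```python
import numpy as np

# Illustrative objective: f(w) = 1/2 ||A w - b||^2, a simple least-squares problem.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])

def f(w):
    return 0.5 * np.sum((A @ w - b) ** 2)

def grad_f(w):
    # Gradient of the least-squares objective: A^T (A w - b).
    return A.T @ (A @ w - b)

eta = 0.1          # step size (learning rate)
w = np.zeros(2)    # w_0 = initialization
for t in range(100):
    w = w - eta * grad_f(w)   # w_{t+1} = w_t - eta * grad f(w_t)

print(w, f(w))     # w approaches the minimizer A^{-1} b = [0.5, 1.0]
```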
One motivation or justification for GD is that the update direction $-\nabla f(w_t)$ is locally the steepest descent direction. Consider the Taylor expansion at a point $w_t$:
$$f(w) = f(w_t) + \underbrace{\langle \nabla f(w_t), w - w_t \rangle}_{\text{linear in } w} + \cdots$$
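Since the linear term satisfies $\langle \nabla f(w_t), u \rangle \geq -\|\nabla f(w_t)\|$ for every unit vector $u$, with equality at $u = -\nabla f(w_t)/\|\nabla f(w_t)\|$, no unit direction decreases the first-order approximation faster than the negative gradient. A small numerical check of this claim, on an illustrative objective and base point of our choosing (not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(w):
    # Illustrative smooth objective (an assumption, not from the text).
    return np.sum(w ** 4) + np.sin(w[0])

def grad_f(w):
    return np.array([4 * w[0] ** 3 + np.cos(w[0]), 4 * w[1] ** 3])

w_t = np.array([1.0, -0.5])
g = grad_f(w_t)
eps = 1e-4  # small radius, so the linear Taylor term dominates

# Value after a tiny step along the negative gradient direction.
best = f(w_t - eps * g / np.linalg.norm(g))

for _ in range(1000):
    u = rng.standard_normal(2)
    u /= np.linalg.norm(u)  # random unit direction
    # Allow O(eps^2) slack for the higher-order Taylor terms.
    assert best <= f(w_t + eps * u) + 1e-6
print("no unit direction beats -grad f(w_t), up to O(eps^2)")
```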