6 Algorithmic Regularization
Large-scale neural networks used in practice are highly over-parameterized,
with far more trainable parameters than training examples. Consequently,
the optimization objectives for learning such high-capacity models have
many global minima that fit the training data perfectly. However,
minimizing the training loss with a specific optimization algorithm takes
us not to an arbitrary global minimum, but to special global minima – in
this sense the choice of optimization algorithm introduces an implicit
form of inductive bias into learning, which can aid generalization.
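To make this concrete, consider the following minimal numerical sketch (the toy problem sizes, variable names, and constants below are illustrative assumptions, not taken from any particular experiment): in an under-determined linear regression problem every interpolating solution attains zero training error, yet different interpolants can generalize very differently.

    import numpy as np

    # Toy under-determined regression: n = 20 samples, d = 200 parameters,
    # noise-free labels generated by a simple ground-truth vector.
    rng = np.random.default_rng(0)
    n, d = 20, 200
    X = rng.standard_normal((n, d))
    w_true = np.zeros(d)
    w_true[:5] = 1.0
    y = X @ w_true

    # Two solutions with zero training error:
    # (1) the minimum-l2-norm interpolant,
    # (2) an arbitrary interpolant obtained by adding a large component
    #     from the null space of X.
    w_min_norm = np.linalg.pinv(X) @ y
    null_basis = np.linalg.svd(X)[2][n:]      # rows spanning the null space of X
    w_arbitrary = w_min_norm + 5.0 * null_basis.T @ rng.standard_normal(d - n)

    X_test = rng.standard_normal((1000, d))
    y_test = X_test @ w_true
    for name, w in [("min-norm", w_min_norm), ("arbitrary", w_arbitrary)]:
        print(name,
              "train MSE:", np.mean((X @ w - y) ** 2),
              "test MSE:", np.mean((X_test @ w - y_test) ** 2))

Both solutions fit the training set perfectly, but only the minimum-norm one predicts well on fresh data; which of the many global minima the training procedure selects is therefore what matters.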
In over-parameterized models, especially deep neural networks,
much, if not most, of the inductive bias of the learned model comes
from this implicit regularization by the optimization algorithm. For
example, early empirical work on this topic shows that deep models
often generalize well even when trained purely by minimizing the
training error without any explicit regularization, and even when the
networks are over-parameterized to the extent of being able to fit
random labels. Consequently, there are
many zero training error solutions, all global minima of the training
objective, most of which generalize horribly. Nevertheless, our choice
of optimization algorithm, typically a variant of gradient descent,
seems to prefer solutions that do generalize well. This generalization
ability cannot be explained by the capacity of the explicitly specified
model class (namely, the functions representable in the chosen
architecture). Instead, the optimization algorithm's bias toward
a “simple” model, one minimizing some implicit “regularization measure”,
say R(w), is key for generalization. Understanding the implicit
inductive bias, e.g. via characterizing R(w), is thus essential for
understanding how and what the model learns. For example, in linear
regression it can be shown that minimizing an under-determined
least-squares objective (with more parameters than samples) using gradient
descent yields the minimum ℓ2-norm solution (see Proposition 6.1.1), and for
linear logistic regression trained on linearly separable data, gradient
descent converges in direction to the maximum-margin separator.
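As an illustration of the first claim, here is a minimal numerical sketch (the problem sizes, step size, and iteration count are illustrative choices, not part of Proposition 6.1.1): gradient descent on an under-determined least-squares objective, initialized at zero, converges to the same point as the pseudoinverse (minimum-ℓ2-norm) solution.

    import numpy as np

    # Gradient descent on (1/2)||Xw - y||^2 with d > n, started from w = 0.
    rng = np.random.default_rng(1)
    n, d = 30, 300
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    w = np.zeros(d)                        # zero initialization keeps w in the row space of X
    lr = 1.0 / np.linalg.norm(X, 2) ** 2   # step size 1/L, L = squared spectral norm of X
    for _ in range(2000):
        w -= lr * X.T @ (X @ w - y)        # gradient step on the squared loss

    w_min_norm = np.linalg.pinv(X) @ y     # closed-form minimum-norm interpolant
    print("training residual          :", np.linalg.norm(X @ w - y))
    print("distance to min-norm point :", np.linalg.norm(w - w_min_norm))

The iterates never leave the row space of X, and within that subspace the interpolant is unique, which is why the limit is the minimum-norm solution; starting from a nonzero point would instead yield the interpolant closest to the initialization.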