9
Inductive Biases due to Algorithmic Regularization
Many successful modern machine learning systems based on deep
neural networks are over-parametrized, i.e., the number of parameters
is typically much larger than the sample size. In other words,
there exist (infinitely) many (approximate) minimizers of the empirical
risk, many of which would not generalize well on unseen
data. For learning to succeed, then, it is crucial to bias the learning
algorithm towards “simpler” hypotheses by trading off the empirical loss
against a complexity term that keeps the empirical and population
risks close. Several explicit regularization strategies have
been used in practice to help these systems generalize, including $\ell_1$
and $\ell_2$ regularization of the parameters [? ].
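As a concrete formulation (the notation below is ours and is meant only as a standard illustration, not a definition fixed by this chapter), the $\ell_2$-regularized empirical risk minimization problem over training examples $(x_1, y_1), \dots, (x_n, y_n)$, with model $f_w$ and loss $\ell$, reads
\[
\min_{w}\ \frac{1}{n}\sum_{i=1}^{n} \ell\big(f_w(x_i),\, y_i\big) \;+\; \lambda\,\|w\|_2^2 ,
\]
where the hyperparameter $\lambda > 0$ trades off data fit against the complexity penalty; replacing $\|w\|_2^2$ with $\|w\|_1$ gives the $\ell_1$ variant.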
Besides explicit regularization techniques, practitioners have used
a spectrum of algorithmic approaches to improve the generalization
ability of over-parametrized models, including early stopping
of back-propagation [? ], batch normalization [? ], dropout [? ], and
more.¹ While these heuristics have enjoyed tremendous success in
training deep networks, a theoretical understanding of how they
provide regularization in deep learning remains somewhat limited.
In this chapter, we investigate regularization due to Dropout,
an algorithmic heuristic recently proposed by [? ]. The basic idea
when training a neural network with dropout is that, during a
forward pass, we randomly drop neurons in the network,
independently and identically according to a Bernoulli distribution.
Specifically, at each round of the back-propagation algorithm, for
each neuron independently, with probability p we “drop” the neuron,
so that it does not participate in making a prediction for the given
data point, and with probability 1 − p we retain that neuron.²
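As a minimal sketch of this mechanism in Python/NumPy (the function name dropout_forward, the rescaling by 1/(1 − p) in the style of “inverted dropout”, and all other details are our own assumptions, not part of the text), each activation is zeroed independently with probability p during a forward pass:

import numpy as np

def dropout_forward(activations, p, rng):
    # Each neuron is dropped (zeroed) independently with probability p and
    # retained with probability 1 - p, as described above.
    mask = rng.random(activations.shape) >= p
    # Rescaling by 1/(1 - p) ("inverted dropout") is a common implementation
    # choice, assumed here, that keeps the expected activation unchanged.
    return activations * mask / (1.0 - p)

# Hypothetical usage on one hidden layer's activations during training:
rng = np.random.default_rng(0)
h = rng.standard_normal((4, 8))          # activations of a hidden layer
h_train = dropout_forward(h, p=0.5, rng=rng)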
Deep learning is a field where key innovations have been driven
by practitioners, with several techniques motivated by drawing insights
from other fields. For instance, Dropout was introduced as
a way of breaking up “co-adaptation” among neurons, drawing in-
¹ We refer the reader to [? ] for an excellent exposition of over 50 such proposals.
² The parameter p is treated as a hyperparameter, which we typically tune based on a validation set.