TheoryofDeepLearning.2022

Constraint qualifications allow the first-order optimality conditions of Definition 6.4.1 to be necessary conditions for optimality. Without constraint qualifications, even the global optimum may not satisfy the optimality conditions.

For example, in linear SVM, LICQ holds if the support vectors $x_i$ are linearly independent. For data sampled from an absolutely continuous distribution, the linear SVM solution will always have linearly independent support vectors.

Theorem 6.4.5. Define $\bar{w} = \lim_{t \to \infty} \frac{w_t}{\|w_t\|}$. Under Assumptions 6.4.2, 6.4.3, and 6.4.4, $\bar{w} \in W$ is a first-order stationary point of (Max-Margin).

Proof. Define $S = \{i : f_i(\bar{w}) = \gamma\}$, where $\gamma$ is the optimal margin attainable by a unit norm $w$.

Lemma 6.4.6. Under the setting of Theorem 6.4.5,
\[ \nabla f_i(w_t) = \nabla f_i(g_t \bar{w}) + O(B g_t^{\alpha-1} \|\delta_t\|). \tag{6.20} \]
For $i \in S$, the second term is asymptotically negligible as a function of $t$:
\[ \nabla f_i(w_t) = \nabla f_i(g_t \bar{w}) + o(\nabla f_i(g_t \bar{w})). \]

Lemma 6.4.7. Under the conditions of Theorem 6.4.5, $a_i = 0$ for $i \notin S$.

From the gradient flow dynamics,
\[ \dot{w}_t = \sum_i \exp(-f_i(w_t)) \nabla f_i(w_t) = \sum_i (h_t a_i + h_t \epsilon_{it}) (\nabla f_i(g_t \bar{w}) + \Delta_{it}), \]
where $\Delta_{it} = \int_{s=0}^{s=1} \nabla^2 f_i(g_t \bar{w} + s g_t \delta_t) \, g_t \delta_t \, ds$. By expanding and using $a_i = 0$ for $i \notin S$ (Lemma 6.4.7),
\[ \dot{w}_t = \underbrace{\sum_{i \in S} h_t a_i \nabla f_i(g_t \bar{w})}_{\mathrm{I}} + \underbrace{h_t \sum_{i \in S} a_i \Delta_{it}}_{\mathrm{II}} + \underbrace{h_t \sum_i \epsilon_{it} \nabla f_i(g_t \bar{w})}_{\mathrm{III}} + \underbrace{\sum_i h_t \epsilon_{it} \Delta_{it}}_{\mathrm{IV}}. \]
Via Assumption 6.4.4, term $\mathrm{I} = \Omega(g_t^{\alpha-1} h_t)$, and from Lemma 6.4.6, $\mathrm{II} = o(\mathrm{I})$. Using these, the first term I is the largest, and so after normalizing,
\[ \frac{\dot{w}_t}{\|\dot{w}_t\|} = \sum_{i \in S} a_i \nabla f_i(g_t \bar{w}) + o(1). \tag{6.21} \]
Since $\lim_{t} \frac{w_t}{\|w_t\|} = \lim_{t} \frac{\dot{w}_t}{\|\dot{w}_t\|}$ [?], then
\[ \lim_{t \to \infty} \frac{w_t}{\|w_t\|} = \sum_{i \in S} a_i \nabla f_i(g_t \bar{w}). \tag{6.22} \]
Thus we have shown that $\bar{w}$ satisfies the first-order optimality condition of Definition 6.4.1.

6.5 Induced bias in function space

≪Suriya notes: Jason: can you introduce the idea of induced biases and give special results for linear convnets, any relevant results from yours+tengyu's margin paper, and infinite width 2 layer ReLU network?≫
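As a small numerical aside on Section 6.4, the active set $S$ and the LICQ condition for a linear model can be checked directly. The sketch below is only an illustration under stated assumptions: it takes $f_i(w) = y_i \langle w, x_i \rangle$ (the linear SVM case mentioned above, so $\nabla f_i(w) = y_i x_i$), uses a hand-picked toy dataset whose unit-norm max-margin direction is $(1,1)/\sqrt{2}$ by symmetry, and the helper names `active_set` and `licq_holds` are hypothetical, not taken from the text.

```python
import numpy as np


def active_set(X, y, w, tol=1e-9):
    """Indices i whose margin f_i(w) = y_i <w, x_i> attains the minimum margin."""
    margins = y * (X @ w)
    return np.flatnonzero(margins <= margins.min() + tol)


def licq_holds(X, y, w, tol=1e-9):
    """Check LICQ for the assumed linear model.

    The constraint gradients are y_i x_i, so LICQ holds when the gradients
    of the active constraints are linearly independent.
    """
    S = active_set(X, y, w, tol)
    grads = y[S, None] * X[S]
    return bool(np.linalg.matrix_rank(grads) == len(S))


# Toy data (an assumption for illustration only).
X = np.array([[2.0, 0.0], [0.0, 2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0])
w_bar = np.array([1.0, 1.0]) / np.sqrt(2.0)

S = active_set(X, y, w_bar)    # the two support vectors
ok = licq_holds(X, y, w_bar)   # independent support vectors

# Duplicating a support vector gives three active constraints in two
# dimensions, so their gradients cannot be linearly independent.
X_dup = np.vstack([X, [2.0, 0.0]])
y_dup = np.append(y, 1.0)
ok_dup = licq_holds(X_dup, y_dup, w_bar)

print(S, ok, ok_dup)
```

The duplicated point is exactly the kind of degenerate configuration that has probability zero under an absolutely continuous data distribution, which is why the remark about LICQ for the linear SVM is stated for such distributions.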