randomness in the distribution, i.e., it is a deterministic transformation, depending on $\phi$, of a parameter-less distribution. With the $s$ i.i.d. samples from $q(z \mid x = x_i)$ we obtain an unbiased estimate of the objective ELBO
$$
\frac{1}{s}\sum_{j=1}^{s}\log p(x_i, z_{ij}) \;-\; \frac{1}{s}\sum_{j=1}^{s}\log q(z_{ij}\mid x_i)
\;=\; \frac{1}{s}\sum_{j=1}^{s}\bigl[\log p(x_i\mid z_{ij}) + \log p(z_{ij}) - \log q(z_{ij}\mid x_i)\bigr]
\tag{10.11}
$$
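For concreteness, here is a minimal sketch of how the estimate (10.11) can be computed with the reparameterization trick, assuming a diagonal-Gaussian encoder $q(z\mid x_i)=\mathcal{N}(\mu_\phi(x_i),\sigma_\phi(x_i)^2 I)$. The names `mu_phi`, `sigma_phi`, and `log_p_joint` are hypothetical placeholders for the encoder outputs and the joint log-density $\log p(x,z)$; they are not part of the text.

```python
import math
import torch

def elbo_estimate(x_i, mu_phi, sigma_phi, log_p_joint, s=8):
    """Monte Carlo estimate (10.11) of the ELBO at a single data point x_i:
    average of log p(x_i, z_ij) - log q(z_ij | x_i) over s reparameterized samples."""
    mu, sigma = mu_phi(x_i), sigma_phi(x_i)       # encoder outputs for x_i (tensors)
    d = mu.shape[0]                               # latent dimension
    eps = torch.randn(s, d)                       # parameter-free noise
    z = mu + sigma * eps                          # z_ij ~ q(z | x_i), deterministic in phi given eps
    # log q(z_ij | x_i) for the diagonal Gaussian N(mu, sigma^2 I)
    log_q = (-0.5 * ((z - mu) / sigma) ** 2
             - torch.log(sigma)
             - 0.5 * math.log(2 * math.pi)).sum(dim=1)
    log_joint = torch.stack([log_p_joint(x_i, z_j) for z_j in z])
    return (log_joint - log_q).mean()             # unbiased estimate of ELBO(x_i)
```

Because the samples are a deterministic transformation of parameter-free noise, gradients of this estimate with respect to $\phi$ flow through $z$ itself rather than through the sampling distribution.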
Here the batch size $s$ governs the fundamental tradeoff between computational efficiency and accuracy of the estimate. Since each of the distributions appearing in (10.11) is Gaussian, we can write the ELBO estimate explicitly in terms of the parameter-dependent $\mu_\phi(x_i)$, $\sigma_\phi(x_i)$, $\mu_\theta(z_{ij})$, $\sigma_\theta(z_{ij})$ (while skipping some constants). A single term for $j \in [s]$ is given by
$$
-\frac{1}{2}\left[\frac{\|x_i-\mu_\theta(z_{ij})\|^2}{\sigma_\theta(z_{ij})^2} + n\log\sigma_\theta(z_{ij})^2 + \|z_{ij}\|^2 - \frac{\|z_{ij}-\mu_\phi(x_i)\|^2}{\sigma_\phi(x_i)^2} - d\log\sigma_\phi(x_i)^2\right]
\tag{10.12}
$$
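As a sanity check on (10.12), the bracketed expression can be computed directly from the Gaussian parameters. The sketch below (in the same illustrative PyTorch style as above) assumes an isotropic Gaussian decoder $p(x\mid z)=\mathcal{N}(\mu_\theta(z),\sigma_\theta(z)^2 I_n)$, prior $p(z)=\mathcal{N}(0,I_d)$, and encoder $q(z\mid x)=\mathcal{N}(\mu_\phi(x),\sigma_\phi(x)^2 I_d)$; the argument names are placeholders for the corresponding network outputs.

```python
import torch

def elbo_term(x_i, z_ij, mu_theta, sigma_theta, mu_phi, sigma_phi):
    """One summand of (10.12) for a fixed j, with additive constants dropped.
    sigma_theta = sigma_theta(z_ij) and sigma_phi = sigma_phi(x_i) are scalar tensors."""
    n, d = x_i.shape[0], z_ij.shape[0]
    recon     = ((x_i - mu_theta) ** 2).sum() / sigma_theta ** 2  # ||x_i - mu_theta(z_ij)||^2 / sigma_theta^2
    prior     = (z_ij ** 2).sum()                                 # ||z_ij||^2, from log p(z_ij)
    posterior = ((z_ij - mu_phi) ** 2).sum() / sigma_phi ** 2     # ||z_ij - mu_phi(x_i)||^2 / sigma_phi^2
    return -0.5 * (recon + n * torch.log(sigma_theta ** 2)
                   + prior
                   - posterior - d * torch.log(sigma_phi ** 2))
```

The first two summands come from $\log p(x_i\mid z_{ij})$, the third from the prior $\log p(z_{ij})$, and the last two (with their signs flipped by the outer factor) from $-\log q(z_{ij}\mid x_i)$.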
Notice that (10.12) is differentiable with respect to all the components $\mu_\phi(x_i)$, $\sigma_\phi(x_i)$, $\mu_\theta(z_{ij})$, $\sigma_\theta(z_{ij})$, while each of these components, being an output of a neural network with parameters $\phi$ or $\theta$, is differentiable with respect to those parameters. Thus, thanks to the reparameterization trick, the tractable gradient of the estimate (10.11) with respect to $\phi$ (or $\theta$) is an unbiased estimate of $\nabla_\phi \mathrm{ELBO}$ (or $\nabla_\theta \mathrm{ELBO}$), which can be used in any stochastic gradient-based optimization algorithm to maximize the ELBO objective and train the VAE.
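To make the training loop concrete, here is a minimal sketch of stochastic gradient ascent on the Monte Carlo ELBO, implemented as descent on its negation. The two-layer encoder/decoder, the scalar log-standard-deviation outputs, and all hyperparameters are illustrative assumptions, not taken from the text.

```python
import torch
import torch.nn as nn

n, d, s = 10, 2, 4                                   # data dim, latent dim, samples per point (illustrative)
enc = nn.Sequential(nn.Linear(n, 32), nn.ReLU(), nn.Linear(32, d + 1))  # -> (mu_phi, log sigma_phi)
dec = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, n + 1))  # -> (mu_theta, log sigma_theta)
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

def train_step(x_i):
    h = enc(x_i)
    mu_phi, sigma_phi = h[:d], h[d:].exp()           # encoder mean and scalar std
    eps = torch.randn(s, d)
    z = mu_phi + sigma_phi * eps                     # reparameterized samples z_ij
    g = dec(z)
    mu_theta, sigma_theta = g[:, :n], g[:, n:].exp() # decoder mean and scalar std per sample
    # per-sample terms of (10.12), additive constants dropped
    terms = -0.5 * (((x_i - mu_theta) ** 2).sum(1) / sigma_theta[:, 0] ** 2
                    + n * torch.log(sigma_theta[:, 0] ** 2)
                    + (z ** 2).sum(1)
                    - ((z - mu_phi) ** 2).sum(1) / sigma_phi ** 2
                    - d * torch.log(sigma_phi ** 2))
    loss = -terms.mean()                             # maximize ELBO = minimize its negation
    opt.zero_grad()
    loss.backward()
    opt.step()
    return -loss.item()

elbo_value = train_step(torch.randn(n))              # one update on a random data point
```

Calling `loss.backward()` here computes exactly the reparameterization-trick gradient described above, since the only source of randomness, `eps`, does not depend on $\phi$ or $\theta$.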
10.5 Main open question
The main open question for this lecture is to design methods with provable guarantees for the learning problems discussed here. Can we show that VAEs correctly (and efficiently) learn simple families of probability distributions?
There have been notable successes in the analysis of the method of moments for learning probability distributions, as mentioned above. Variational methods rely on gradient descent, which for now seems harder to analyze.