Theory of Deep Learning, 2022
11
Generative Adversarial Nets
Chapter 10 described some classical approaches to generative models,
which are often trained using a log-likelihood approach. We also saw
that they often do not suffice for high-fidelity learning of complicated
distributions such as the distribution of real-life images. Generative
Adversarial Nets (GANs) are an approach that generates more realistic
samples. For convenience, in this chapter we assume the model is
trying to generate images. The following would be one standard
interpretation of what it means for the distribution produced by the
model to be realistic.
Interpretation 1: The distributions of real and synthetic images, viewed as
distributions in R^d, are close in some statistical measure.
The main novelty in GANs is a different interpretation that leverages
the power of supervised deep learning.
Interpretation 2: If we try to train a powerful deep net to distinguish
between real and synthetic images, by training it to output "1" on a training
set of real images and "0" on a training set of synthetic images from
our model, then such a net fails to have significant success in distinguishing
between held-out real and synthetic images at test time.
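Interpretation 2 can be made concrete with a toy experiment. The following is a minimal sketch (not the book's setup): it stands in for "real" and "synthetic" images with samples from two different Gaussians in R^2, trains a logistic-regression discriminator to output 1 on real and 0 on synthetic samples, and measures its accuracy on held-out samples. All sizes and learning rates here are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 2, 500

# Stand-ins for real and synthetic images: two well-separated Gaussians.
real = rng.normal(+2.0, 1.0, (n, d))
synth = rng.normal(-2.0, 1.0, (n, d))
X = np.vstack([real, synth])
y = np.concatenate([np.ones(n), np.zeros(n)])   # 1 = real, 0 = synthetic

# Hold out 20% of the samples to test the discriminator.
idx = rng.permutation(2 * n)
split = int(0.8 * 2 * n)
Xtr, ytr = X[idx[:split]], y[idx[:split]]
Xte, yte = X[idx[split:]], y[idx[split:]]

# Train a logistic-regression discriminator by gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(Xtr @ w + b)))    # predicted P(real)
    g = p - ytr                                 # gradient of logistic loss
    w -= 0.1 * (Xtr.T @ g) / len(ytr)
    b -= 0.1 * g.mean()

# Held-out accuracy: high here, because the two distributions differ a lot.
acc = np.mean(((Xte @ w + b) > 0) == (yte == 1))
print(acc)
```

Per Interpretation 2, a good generator is one for which this held-out accuracy stays near 1/2; in this sketch the two distributions are far apart, so the discriminator succeeds easily.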
Is it possible that the two interpretations are interrelated? The simplistic
answer is yes: the rich mathematical framework of transportation
distances gives a relationship between the two interpretations. A
more nuanced answer is "maybe", at least if one treats deep nets as
black boxes with limited representation power. Then a simple but
surprising result shows that the richness of the synthetic distribution
can be quite limited, and one has to worry about mode collapse.
11.1 Basic definitions
A generative model G_θ (where θ is the vector of parameters) is a
deep net that maps R^k to R^d. Given a random seed h —which
is usually a sample from a multivariate Normal distribution—it
produces a vector G_θ(h) that is interpreted as an image.
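A minimal sketch of such a generator, assuming a one-hidden-layer net with hypothetical sizes k = 8 (seed dimension) and d = 16 (image dimension); the parameters θ here are randomly initialized rather than trained, purely to illustrate the mapping.

```python
import numpy as np

rng = np.random.default_rng(0)
k, hidden, d = 8, 32, 16        # illustrative sizes, not from the text

# theta: the generator's parameters (randomly initialized, untrained)
W1 = rng.normal(0.0, 0.1, (hidden, k))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.1, (d, hidden))
b2 = np.zeros(d)

def G(h):
    """Map a seed h in R^k to a synthetic 'image' in R^d."""
    z = np.maximum(0.0, W1 @ h + b1)   # ReLU hidden layer
    return np.tanh(W2 @ z + b2)        # entries in (-1, 1), like pixel values

h = rng.standard_normal(k)             # seed from a multivariate Normal
x = G(h)
print(x.shape)                         # a d-dimensional synthetic image
```

Sampling from the model means drawing a fresh Gaussian seed h and computing G_θ(h); the synthetic distribution is the pushforward of the Gaussian through G_θ.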