9  Inductive Biases due to Algorithmic Regularization

Many successful modern machine learning systems based on deep neural networks are over-parametrized, i.e., the number of parameters is typically much larger than the sample size. In other words, there exist (infinitely) many (approximate) minimizers of the empirical risk, many of which would not generalize well on unseen data. For learning to succeed, then, it is crucial to bias the learning algorithm towards "simpler" hypotheses by trading off the empirical loss with a complexity term that ensures that the empirical and population risks are close. Several explicit regularization strategies have been used in practice to help these systems generalize, including ℓ1 and ℓ2 regularization of the parameters [? ].
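
As a concrete illustration (a minimal sketch, not taken from the text), ℓ2 regularization adds a penalty λ‖w‖² to the empirical risk, so that minimizing the regularized objective trades off fit against parameter norm. The snippet below shows this for linear least squares with NumPy; the penalty weight lam and the toy data are hypothetical choices.

    import numpy as np

    def l2_regularized_risk(w, X, y, lam=0.1):
        """Empirical squared loss plus an l2 penalty lam * ||w||^2."""
        residuals = X @ w - y                      # predictions minus targets
        empirical_risk = np.mean(residuals ** 2)   # average squared error
        penalty = lam * np.sum(w ** 2)             # complexity term biasing toward "simpler" w
        return empirical_risk + penalty

    # Tiny over-parametrized example: more parameters (5) than samples (3),
    # so many w achieve zero empirical risk; the penalty breaks the tie.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(3, 5)), rng.normal(size=3)
    w = rng.normal(size=5)
    print(l2_regularized_risk(w, X, y, lam=0.1))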

Besides explicit regularization techniques, practitioners have used a spectrum of algorithmic approaches to improve the generalization ability of over-parametrized models. This includes early stopping of back-propagation [? ], batch normalization [? ], dropout [? ], and more.¹ While these heuristics have enjoyed tremendous success in training deep networks, a theoretical understanding of how these heuristics provide regularization in deep learning remains somewhat limited.

In this chapter, we investigate regularization due to Dropout, an algorithmic heuristic recently proposed by [? ]. The basic idea when training a neural network using dropout is that during a forward pass we randomly drop neurons in the network, independently and identically according to a Bernoulli distribution. Specifically, at each round of the back-propagation algorithm, for each neuron independently, with probability p we "drop" the neuron, so that it does not participate in making a prediction for the given data point, and with probability 1 − p we retain that neuron.²
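
To make the mechanism concrete, the following is a minimal NumPy sketch (not the book's code) of applying a dropout mask to a layer's activations during a forward pass. Here p is the drop probability, matching the convention above; the 1/(1 − p) rescaling ("inverted dropout") is a common implementation choice, not something stated in the text.

    import numpy as np

    def dropout_forward(activations, p=0.5, rng=None, train=True):
        """Drop each neuron independently with probability p (retain with 1 - p).

        During training, surviving activations are rescaled by 1 / (1 - p)
        ("inverted dropout") so their expectation matches test time, where
        no units are dropped.
        """
        if not train or p == 0.0:
            return activations
        rng = np.random.default_rng() if rng is None else rng
        keep_mask = rng.random(activations.shape) >= p   # Bernoulli(1 - p) per neuron
        return activations * keep_mask / (1.0 - p)

    # Example: one forward pass through a hidden layer's activations
    # (batch of 2 data points, layer width 4).
    h = np.ones((2, 4))
    print(dropout_forward(h, p=0.5, rng=np.random.default_rng(0)))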

Deep learning is a field where key innovations have been driven by practitioners, with several techniques motivated by drawing insights from other fields. For instance, Dropout was introduced as a way of breaking up "co-adaptation" among neurons, drawing in-

¹ We refer the reader to [? ] for an excellent exposition of over 50 such proposals.

² The parameter p is treated as a hyperparameter, which we typically tune based on a validation set.
