1  Basic Setup and some math notions
This chapter introduces the basic nomenclature: training/test error, generalization error, etc. ≪Tengyu notes: Todos: illustrate with plots a typical training curve and test curve; mention some popular architectures (feedforward, convolutional, pooling, ResNet, DenseNet) in a brief paragraph each.≫
We review the basic notions in statistical learning theory.
• A space of possible data points $\mathcal{X}$.
• A space of possible labels $\mathcal{Y}$.
• A joint probability distribution $\mathcal{D}$ on $\mathcal{X} \times \mathcal{Y}$. We assume that our training data consist of $n$ data points
$$(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)}) \overset{\text{i.i.d.}}{\sim} \mathcal{D},$$
each drawn independently from $\mathcal{D}$.
• Hypothesis space: $\mathcal{H}$ is a family of hypotheses, or a family of predictors. E.g., $\mathcal{H}$ could be the set of all neural networks with a fixed architecture: $\mathcal{H} = \{h_\theta\}$, where $h_\theta$ is a neural network parameterized by $\theta$.
• Loss function: $\ell : (\mathcal{X} \times \mathcal{Y}) \times \mathcal{H} \to \mathbb{R}$.
  – E.g., in binary classification where $\mathcal{Y} = \{-1, +1\}$, the logistic loss of a hypothesis $h_\theta$ on a data point $(x, y)$ is
  $$\ell((x, y), \theta) = \log\bigl(1 + \exp(-y\, h_\theta(x))\bigr).$$
• Expected loss:
$$L(h) = \mathop{\mathbb{E}}_{(x, y) \sim \mathcal{D}}\bigl[\ell((x, y), h)\bigr].$$
Recall that $\mathcal{D}$ is the data distribution over $\mathcal{X} \times \mathcal{Y}$. A minimal numerical sketch of these definitions appears below.
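To make the definitions concrete, here is a minimal Python sketch; it is not from the text. It assumes a toy distribution $\mathcal{D}$ in which the label $y$ is a uniform random sign and $x$ is a Gaussian centered at $y \cdot (1, 1)$, and a linear hypothesis $h_\theta(x) = \langle \theta, x \rangle$ standing in for a neural network. It draws $n$ i.i.d. training points, evaluates the logistic loss on each, and estimates the expected loss $L(h)$ by Monte Carlo on a large fresh sample.

import numpy as np

rng = np.random.default_rng(0)

def sample_from_D(n):
    """Draw n i.i.d. pairs (x, y) from a toy distribution D on X x Y.

    Here X = R^2 and Y = {-1, +1}: y is a uniform random sign and
    x is a standard Gaussian centered at y * (1, 1).  (Illustrative
    assumption; the text does not fix a particular D.)
    """
    y = rng.choice([-1, 1], size=n)
    x = y[:, None] * np.ones(2) + rng.standard_normal((n, 2))
    return x, y

def h(theta, x):
    """A linear hypothesis h_theta(x) = <theta, x>, standing in for a
    neural network with parameters theta."""
    return x @ theta

def logistic_loss(theta, x, y):
    """l((x, y), theta) = log(1 + exp(-y * h_theta(x))), computed
    stably via logaddexp."""
    return np.logaddexp(0.0, -y * h(theta, x))

theta = np.array([1.0, 1.0])

# Training data: n i.i.d. draws from D, as in the setup above.
x_train, y_train = sample_from_D(n=100)
print("average training loss:", logistic_loss(theta, x_train, y_train).mean())

# Monte Carlo estimate of the expected loss L(h) = E_{(x,y)~D}[l((x,y),h)],
# using a large fresh sample in place of the exact expectation.
x_big, y_big = sample_from_D(n=100_000)
print("estimated L(h):", logistic_loss(theta, x_big, y_big).mean())

As $n$ grows, the average loss on the i.i.d. training sample concentrates around $L(h)$; the gap between the two is the generalization error this chapter goes on to discuss.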