Theory of Deep Learning, 2022

1 Basic Setup and some math notions

This chapter introduces the basic nomenclature: training/test error, generalization error, etc. ≪Tengyu notes: Todos: Illustrate with plots: a typical training curve and test curve. Mention some popular architectures (feed-forward, convolutional, pooling, ResNet, DenseNet) in a brief paragraph each.≫

We review the basic notions in statistical learning theory.

• A space of possible data points X.

• A space of possible labels Y.

• A joint probability distribution D on X × Y. We assume that our training data consist of n data points (x^(1), y^(1)), . . . , (x^(n), y^(n)) i.i.d. ∼ D, each drawn independently from D.

• Hypothesis space: H is a family of hypotheses, or a family of predictors. E.g., H could be the set of all neural networks with a fixed architecture: H = {h_θ}, where h_θ is a neural network parameterized by parameters θ.

• Loss function: l : (X × Y) × H → R.

– E.g., in binary classification, where Y = {−1, +1}, suppose we have a hypothesis h_θ(x); then the logistic loss function for the hypothesis h_θ on data point (x, y) is

l((x, y), θ) = 1 / (1 + exp(−y h_θ(x))).

• Expected loss:

L(h) = E_{(x,y)∼D} [l((x, y), h)].

Recall D is the data distribution over X × Y. (A short code sketch of these definitions follows this list.)
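
To make these notions concrete, here is a minimal sketch in Python (not part of the text): the linear hypothesis h_θ(x) = ⟨θ, x⟩, the synthetic distribution D, and the function names logistic_loss and expected_loss_estimate are all illustrative assumptions. It evaluates the loss written above for a parameterized hypothesis h_θ and forms a Monte Carlo estimate of the expected loss L(h) by averaging over samples drawn i.i.d. from D.

```python
import numpy as np

def h_theta(theta, x):
    # Illustrative hypothesis: a linear predictor h_theta(x) = <theta, x>.
    # In the text, h_theta could instead be any neural network with parameters theta.
    return np.dot(theta, x)

def logistic_loss(theta, x, y):
    # The loss l((x, y), theta) = 1 / (1 + exp(-y * h_theta(x))) as written in the text,
    # for a binary label y in {-1, +1}.
    return 1.0 / (1.0 + np.exp(-y * h_theta(theta, x)))

def expected_loss_estimate(theta, samples):
    # Monte Carlo estimate of L(h) = E_{(x, y) ~ D}[ l((x, y), h) ]:
    # average the loss over data points drawn i.i.d. from D.
    return float(np.mean([logistic_loss(theta, x, y) for x, y in samples]))

# Toy usage with a synthetic distribution D (purely for illustration).
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0])           # parameters used to generate labels
xs = rng.normal(size=(1000, 2))              # x ~ N(0, I)
ys = np.sign(xs @ theta_true)                # labels y in {-1, +1}
samples = list(zip(xs, ys))

theta = np.array([0.5, -1.0])
print(expected_loss_estimate(theta, samples))
```

The empirical average over the i.i.d. samples is simply the sample-mean approximation of the expectation E_{(x,y)∼D}[l((x, y), h)]; with n = 1000 draws it gives a rough numerical value of L(h_θ) for the chosen θ.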
