
such saddle points are non-degenerate, it suffices to show $g''(0) < 0$. It is easy to check that the second directional derivative at the origin is given by
$$g''(0) = -4\sigma_j^2 \left( w_{j(t)}^\top M w_{j(t)} - w_j^\top M w_j \right) < 0,$$
which completes the proof.

9.4 Role of Parametrization

For least squares linear regression (i.e., for $k = 1$ and $u = W_1^\top \in \mathbb{R}^{d_0}$ in Problem 9.8), we can show that using dropout amounts to solving the following regularized problem:
$$\min_{u \in \mathbb{R}^{d_0}} \; \frac{1}{n} \sum_{i=1}^{n} \left( y_i - u^\top x_i \right)^2 + \lambda \, u^\top \hat{C} u.$$
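
As a concrete sanity check, the following sketch verifies this equivalence numerically. The dropout convention here is our assumption (the excerpt does not restate it): the single hidden unit is kept with probability $p$ and rescaled by $1/p$, which gives $\hat{C} = \frac{1}{n} X^\top X$ and $\lambda = (1-p)/p$, consistent with the normal equations displayed below.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d0, p = 200, 5, 0.8           # p = keep probability (assumed convention)
X = rng.standard_normal((n, d0))
y = rng.standard_normal(n)
u = rng.standard_normal(d0)
preds = X @ u                    # u^T x_i for every sample

# Monte Carlo estimate of the dropout objective
# E_delta[(1/n) sum_i (y_i - (delta/p) u^T x_i)^2], delta ~ Bernoulli(p).
T = 20_000
delta = rng.binomial(1, p, size=T) / p
mc = np.mean((y[None, :] - delta[:, None] * preds[None, :]) ** 2)

# Closed form: empirical risk + lambda * u^T C_hat u,
# with C_hat = (1/n) X^T X and lambda = (1 - p) / p (assumed above).
lam = (1 - p) / p
C_hat = X.T @ X / n
closed = np.mean((y - preds) ** 2) + lam * u @ C_hat @ u

print(mc, closed)  # the two agree up to Monte Carlo error
```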

All the minimizers of the above problem are solutions to the following system of linear equations: $(1 + \lambda) X^\top X u = X^\top y$, where $X = [x_1, \cdots, x_n]^\top \in \mathbb{R}^{n \times d_0}$ and $y = [y_1, \cdots, y_n]^\top \in \mathbb{R}^{n \times 1}$ are the design matrix and the response vector, respectively. Unlike Tikhonov regularization, which yields solutions to the system $(X^\top X + \lambda I) u = X^\top y$ (a useful prior that discards directions accounting for little variance in the data, even when they exhibit good discriminability), the dropout regularizer manifests merely as a scaling of the parameters. This suggests that parametrization plays an important role in determining the nature of the resulting regularizer.
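
The contrast can be seen directly by solving both linear systems (a minimal sketch; the dimensions and $\lambda$ below are arbitrary, and $X^\top X$ is assumed invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d0, lam = 100, 4, 0.5
X = rng.standard_normal((n, d0))
y = rng.standard_normal(n)

u_ls = np.linalg.solve(X.T @ X, X.T @ y)                        # ordinary least squares
u_dropout = np.linalg.solve((1 + lam) * (X.T @ X), X.T @ y)     # (1+lam) X^T X u = X^T y
u_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d0), X.T @ y)  # Tikhonov

# Dropout merely rescales the least squares solution by 1/(1+lam) ...
assert np.allclose(u_dropout, u_ls / (1 + lam))
# ... while ridge shrinks different directions by different amounts.
print(u_ridge / u_ls)  # generally not a constant vector
```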

However, a similar result was shown for deep linear networks [?]: the data-dependent regularization due to dropout again results in a mere rescaling of the parameters. At the same time, in the case of matrix sensing we see a richer class of regularizers. One potential explanation is that, in the case of linear networks, a convolutional structure in the network is required to yield rich inductive biases. For instance, matrix sensing can be written as a two-layer network in the following convolutional form:
$$\langle UV^\top, A \rangle = \langle U^\top, V^\top A^\top \rangle = \langle U^\top, (I \otimes V^\top) A^\top \rangle.$$
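
As a sanity check, the chain of identities above can be verified numerically; note that the last inner product implicitly treats $U^\top$ and $A^\top$ as column-major vectorized matrices, which is the convention assumed here:

```python
import numpy as np

rng = np.random.default_rng(2)
d1, d2, k = 4, 5, 3
U = rng.standard_normal((d1, k))
V = rng.standard_normal((d2, k))
A = rng.standard_normal((d1, d2))

vec = lambda M: M.flatten(order="F")  # column-major (column-stacking) vectorization

lhs = np.sum((U @ V.T) * A)            # <U V^T, A>
mid = np.sum(U.T * (V.T @ A.T))        # <U^T, V^T A^T>
rhs = vec(U.T) @ (np.kron(np.eye(d1), V.T) @ vec(A.T))  # <U^T, (I kron V^T) A^T>

assert np.allclose([lhs, mid], rhs)
```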
