
Provided that the sample size n is large enough, the explicit regularizer on a given sample behaves much like its expected value with respect to the underlying data distribution.⁴ Further, given that we seek a minimum of $\widehat{L}_{\mathrm{drop}}$, it suffices to consider the factors with the minimal value of the regularizer among all those that yield the same empirical loss. This motivates studying the following distribution-dependent induced regularizer:

\[
\Theta(M) := \min_{U V^\top = M} R(U, V), \qquad \text{where } R(U, V) := \mathbb{E}_A\big[\widehat{R}(U, V)\big].
\]

⁴ Under mild assumptions, we can formally show that the dropout regularizer is well concentrated around its mean.

Next, we consider two important examples of random sensing matrices.

9.1.1 Gaussian Sensing Matrices

We assume that the entries of the sensing matrices are independently and identically distributed as standard Gaussian, i.e., $A^{(i)}_{kl} \sim \mathcal{N}(0, 1)$.

For Gaussian sensing matrices, we show that the induced regularizer due to dropout provides nuclear-norm regularization. Formally, we show that
\[
\Theta(M) = \frac{1}{d_1} \|M\|_*^2. \tag{9.6}
\]
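As a quick numerical illustration of Equation 9.6 (our own sketch, not from the text; the dimensions, variable names, and the choice of a Hadamard mixing matrix are assumptions), the value $\frac{1}{d_1}\|M\|_*^2$ is attained by an explicit factorization: build a balanced factorization from the SVD of M, pad it to $d_1$ columns, and mix the columns by an orthogonal matrix whose entries all have equal magnitude, so that every column-norm product $\|u_j\|\,\|v_j\|$ is the same.

import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
d2, d0, d1 = 6, 5, 8                 # d1 is a power of two so that hadamard(d1) exists
M = rng.standard_normal((d2, d0))

# Balanced factorization from the SVD of M, padded with zero columns to width d1.
W, s, Zt = np.linalg.svd(M, full_matrices=False)
r = s.size
U0 = np.hstack([W * np.sqrt(s), np.zeros((d2, d1 - r))])
V0 = np.hstack([Zt.T * np.sqrt(s), np.zeros((d0, d1 - r))])

# Mixing the columns by an orthogonal matrix with equal-magnitude entries makes
# all column-norm products equal, which is what attains the minimum in Theta(M).
R = hadamard(d1) / np.sqrt(d1)
U, V = U0 @ R, V0 @ R

assert np.allclose(U @ V.T, M)       # (U, V) still factorizes M
reg = np.sum(np.sum(U**2, axis=0) * np.sum(V**2, axis=0))   # sum_j ||u_j||^2 ||v_j||^2
print(reg, np.linalg.norm(M, 'nuc')**2 / d1)                # the two values agree

This only exhibits one factorization achieving the value in Equation 9.6; that no factorization of M can do better is the other half of the claim.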

Proof. We recall the general form of the dropout regularizer for the matrix sensing problem in Equation 9.4, and take the expectation with respect to the distribution on the sensing matrices. Then, for any pair of factors (U, V), the expected regularizer is given as follows:

\[
\begin{aligned}
R(U, V) &= \sum_{j=1}^{d_1} \mathbb{E}\big(u_j^\top A v_j\big)^2 \\
&= \sum_{j=1}^{d_1} \mathbb{E}\Big(\sum_{k=1}^{d_2} \sum_{l=1}^{d_0} U_{kj} A_{kl} V_{lj}\Big)^2 \\
&= \sum_{j=1}^{d_1} \sum_{k,k'=1}^{d_2} \sum_{l,l'=1}^{d_0} U_{kj} U_{k'j} V_{lj} V_{l'j}\, \mathbb{E}[A_{kl} A_{k'l'}] \\
&= \sum_{j=1}^{d_1} \sum_{k=1}^{d_2} \sum_{l=1}^{d_0} U_{kj}^2 V_{lj}^2\, \mathbb{E}[A_{kl}^2] \\
&= \sum_{j=1}^{d_1} \sum_{k=1}^{d_2} \sum_{l=1}^{d_0} U_{kj}^2 V_{lj}^2 \\
&= \sum_{j=1}^{d_1} \|u_j\|^2 \|v_j\|^2,
\end{aligned}
\]

where the second equality writes out $u_j^\top A v_j$ entrywise, the third expands the square, the fourth uses that distinct entries of A are independent and zero-mean (so only the terms with $k = k'$ and $l = l'$ survive), and the last two use $\mathbb{E}[A_{kl}^2] = 1$ and the definition of the column norms.
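The key identity used above, $\mathbb{E}(u_j^\top A v_j)^2 = \|u_j\|^2 \|v_j\|^2$ for A with i.i.d. standard Gaussian entries, is easy to check by Monte Carlo. The short sketch below is our own illustration (the dimensions, sample count, and variable names are assumptions, not from the text):

import numpy as np

rng = np.random.default_rng(1)
d2, d0, n = 7, 4, 200_000
u = rng.standard_normal(d2)
v = rng.standard_normal(d0)

# n i.i.d. sensing matrices with standard Gaussian entries, A_kl ~ N(0, 1).
A = rng.standard_normal((n, d2, d0))
vals = np.einsum('k,nkl,l->n', u, A, v) ** 2       # (u^T A v)^2 for each sample
print(vals.mean(), (u @ u) * (v @ v))              # empirical mean vs ||u||^2 ||v||^2

The two printed values agree up to Monte Carlo error, matching the last line of the display above.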
