Provided that the sample size $n$ is large enough, the explicit regularizer on a given sample behaves much like its expected value with respect to the underlying data distribution.⁴ Further, since we seek a minimum of $\widehat{L}_{\mathrm{drop}}$, it suffices to consider the factors with the minimal value of the regularizer among all those that yield the same empirical loss. This motivates studying the following distribution-dependent induced regularizer:

$$\Theta(M) := \min_{UV^\top = M} R(U, V), \qquad \text{where } R(U, V) := \mathbb{E}_A\big[\widehat{R}(U, V)\big].$$

⁴ Under mild assumptions, we can formally show that the dropout regularizer is well concentrated around its mean.
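To make footnote 4 concrete, the sketch below is a minimal Monte Carlo check of this concentration, not the formal argument. It assumes, hypothetically since Equation 9.4 is not reproduced here, that the empirical regularizer averages $\sum_j (u_j^\top A^{(i)} v_j)^2$ over the $n$ sensing matrices, and it compares against the closed-form mean $\sum_j \|u_j\|^2 \|v_j\|^2$ derived below for Gaussian sensing; all dimensions and names are illustrative.

```python
import numpy as np

# Monte Carlo sketch of footnote 4: the sample regularizer concentrates
# around its mean R(U, V) as the number n of sensing matrices grows.
# ASSUMPTION: the per-sample penalty is sum_j (u_j^T A^(i) v_j)^2,
# averaged over the sample (a hypothetical reading of Equation 9.4).

rng = np.random.default_rng(0)
d2, d0, d1 = 10, 8, 5                      # U is d2 x d1, V is d0 x d1
U = rng.standard_normal((d2, d1))
V = rng.standard_normal((d0, d1))

# Closed-form mean for Gaussian sensing (derived in Section 9.1.1):
# R(U, V) = sum_j ||u_j||^2 ||v_j||^2.
R_mean = np.sum(np.sum(U**2, axis=0) * np.sum(V**2, axis=0))

for n in [10, 100, 10_000]:
    A = rng.standard_normal((n, d2, d0))         # n i.i.d. sensing matrices
    vals = np.einsum('kj,ikl,lj->ij', U, A, V)   # u_j^T A^(i) v_j
    R_hat = (vals**2).sum(axis=1).mean()         # empirical regularizer
    print(f"n={n:6d}  R_hat={R_hat:9.2f}  R={R_mean:9.2f}")
```

As $n$ grows, the printed values of $\widehat{R}$ fluctuate less and less around $R(U, V)$.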
Next, we consider two important examples of random sensing matrices.
9.1.1 Gaussian Sensing Matrices
We assume that the entries of the sensing matrices are independently and identically distributed as standard Gaussian, i.e., $A^{(i)}_{kl} \sim \mathcal{N}(0, 1)$.
For Gaussian sensing matrices, the induced regularizer due to dropout provides nuclear-norm regularization. Formally, we show that

$$\Theta(M) = \frac{1}{d_1} \|M\|_*^2. \tag{9.6}$$
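Before the proof, the following numerical sketch illustrates Equation 9.6: a balanced factorization of $M$ attains $\sum_j \|u_j\|^2 \|v_j\|^2 = \|M\|_*^2 / d_1$, whereas a naive SVD factorization is generally worse. The Hadamard-based balancing is one construction that achieves the minimum (it assumes $d_1$ is a power of two); the helper `reg` and all dimensions are illustrative.

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
d2, d0, d1 = 10, 8, 4
M = rng.standard_normal((d2, d1)) @ rng.standard_normal((d1, d0))  # rank <= d1

P, s, Qt = np.linalg.svd(M, full_matrices=False)
P, s, Qt = P[:, :d1], s[:d1], Qt[:d1, :]   # top d1 components (rest are ~0)
nuc = s.sum()                              # nuclear norm ||M||_*

def reg(U, V):
    # R(U, V) = sum_j ||u_j||^2 ||v_j||^2 for Gaussian sensing
    return np.sum(np.sum(U**2, axis=0) * np.sum(V**2, axis=0))

# Naive factorization M = U0 V0^T: column norms are the singular values.
U0, V0 = P * np.sqrt(s), Qt.T * np.sqrt(s)

# Balanced factorization: H is orthogonal with entries of equal magnitude,
# so every column of U0 @ H has squared norm tr(diag(s)) / d1 = ||M||_* / d1.
H = hadamard(d1) / np.sqrt(d1)
U1, V1 = U0 @ H, V0 @ H

print(np.allclose(U1 @ V1.T, M))            # True: still factorizes M
print(reg(U0, V0))                          # sum_j s_j^2  (suboptimal)
print(reg(U1, V1), nuc**2 / d1)             # both equal ||M||_*^2 / d1
```

By the Cauchy–Schwarz inequality, $\sum_j \|u_j\|^2 \|v_j\|^2 \ge \frac{1}{d_1}\big(\sum_j \|u_j\| \|v_j\|\big)^2 \ge \frac{1}{d_1}\|M\|_*^2$, so no factorization can do better than the balanced one.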
Proof. Recall the general form of the dropout regularizer for the matrix sensing problem in Equation 9.4, and take the expectation with respect to the distribution of the sensing matrices. Then, for any pair of factors $(U, V)$, the expected regularizer is given as follows:
$$
\begin{aligned}
R(U, V) &= \sum_{j=1}^{d_1} \mathbb{E}\,\big(u_j^\top A v_j\big)^2
= \sum_{j=1}^{d_1} \mathbb{E}\,\Big(\sum_{k=1}^{d_2} \sum_{l=1}^{d_0} U_{kj} A_{kl} V_{lj}\Big)^2 \\
&= \sum_{j=1}^{d_1} \sum_{k,k'=1}^{d_2} \sum_{l,l'=1}^{d_0} U_{kj} U_{k'j} V_{lj} V_{l'j}\, \mathbb{E}[A_{kl} A_{k'l'}] \\
&= \sum_{j=1}^{d_1} \sum_{k=1}^{d_2} \sum_{l=1}^{d_0} U_{kj}^2 V_{lj}^2\, \mathbb{E}[A_{kl}^2]
= \sum_{j=1}^{d_1} \sum_{k=1}^{d_2} \sum_{l=1}^{d_0} U_{kj}^2 V_{lj}^2
= \sum_{j=1}^{d_1} \|u_j\|^2 \|v_j\|^2,
\end{aligned}
$$

where we used that distinct entries of $A$ are independent and zero-mean, so $\mathbb{E}[A_{kl} A_{k'l'}] = 0$ unless $(k, l) = (k', l')$, and that $\mathbb{E}[A_{kl}^2] = 1$.
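The key step above, $\mathbb{E}\,(u^\top A v)^2 = \|u\|^2 \|v\|^2$ for $A$ with i.i.d. standard Gaussian entries, is easy to check by simulation; the sketch below (with illustrative dimensions and sample size) does exactly that.

```python
import numpy as np

# Monte Carlo check: for A with i.i.d. N(0, 1) entries,
# E[(u^T A v)^2] = ||u||^2 ||v||^2, since cross terms vanish
# and E[A_kl^2] = 1.
rng = np.random.default_rng(1)
u, v = rng.standard_normal(7), rng.standard_normal(5)
A = rng.standard_normal((200_000, 7, 5))            # 200k i.i.d. samples of A
est = np.mean(np.einsum('k,ikl,l->i', u, A, v)**2)  # empirical E[(u^T A v)^2]
print(est, np.dot(u, u) * np.dot(v, v))             # the two agree closely
```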