Theory of Deep Learning, 2022
Inductive biases due to algorithmic regularization
Now, using the Cauchy-Schwarz inequality, we can bound the
expected regularizer as
\[
\begin{aligned}
R(U, V) &\ge \frac{1}{d_1} \left( \sum_{i=1}^{d_1} \|u_i\| \, \|v_i\| \right)^2 \\
&= \frac{1}{d_1} \left( \sum_{i=1}^{d_1} \|u_i v_i^\top\|_* \right)^2 \\
&\ge \frac{1}{d_1} \left\| \sum_{i=1}^{d_1} u_i v_i^\top \right\|_*^2 \\
&= \frac{1}{d_1} \|UV^\top\|_*^2,
\end{aligned}
\]
where the equality follows because for any pair of vectors $a, b$ it
holds that $\|ab^\top\|_* = \|ab^\top\|_F = \|a\| \|b\|$, and the last inequality is due
to the triangle inequality.
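This chain of inequalities can be checked numerically. The sketch below assumes, consistently with the evaluation of $R(UQ, VQ)$ later in the proof, that the expected dropout regularizer is $R(U, V) = \sum_{i=1}^{d_1} \|u_i\|^2 \|v_i\|^2$ over the columns of $U$ and $V$; the dimensions are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d2, d0, d1 = 5, 4, 3  # illustrative dimensions
U = rng.standard_normal((d2, d1))
V = rng.standard_normal((d0, d1))

# Expected dropout regularizer: sum_i ||u_i||^2 ||v_i||^2 over columns.
R = sum(np.linalg.norm(U[:, i])**2 * np.linalg.norm(V[:, i])**2
        for i in range(d1))

# Middle term of the chain: (1/d1) * (sum_i ||u_i|| ||v_i||)^2.
mid = (1 / d1) * sum(np.linalg.norm(U[:, i]) * np.linalg.norm(V[:, i])
                     for i in range(d1))**2

# Nuclear norm of U V^T, computed from its singular values.
nuc = np.linalg.norm(U @ V.T, ord='nuc')

assert R >= mid - 1e-9            # Cauchy-Schwarz step
assert mid >= nuc**2 / d1 - 1e-9  # triangle inequality for the nuclear norm
```

The first inequality is Cauchy-Schwarz applied to the vector of products $\|u_i\|\|v_i\|$ against the all-ones vector; the second uses subadditivity of the nuclear norm on $\sum_i u_i v_i^\top = UV^\top$.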
Next, we need the following key result from [? ].
Theorem 9.1.1. For any pair of matrices $U \in \mathbb{R}^{d_2 \times d_1}$, $V \in \mathbb{R}^{d_0 \times d_1}$, there
exists a rotation matrix $Q \in SO(d_1)$ such that the matrices $\tilde{U} := UQ$, $\tilde{V} := VQ$
satisfy $\|\tilde{u}_i\| \|\tilde{v}_i\| = \frac{1}{d_1} \|UV^\top\|_*$ for all $i \in [d_1]$.
Using Theorem 9.1.1 on (U, V), the expected dropout regularizer
at (UQ, VQ) is given as
\[
R(UQ, VQ) = \sum_{i=1}^{d_1} \|Uq_i\|^2 \|Vq_i\|^2
= \sum_{i=1}^{d_1} \frac{1}{d_1^2} \|UV^\top\|_*^2
= \frac{1}{d_1} \|UV^\top\|_*^2
\le \Theta(UV^\top),
\]
which completes the proof.
For completeness we provide a proof of Theorem 9.1.1.
Proof. Define $M := UV^\top$. Let $M = W \Sigma Y^\top$ be a compact SVD of $M$.
Define $\hat{U} := W \Sigma^{1/2}$ and $\hat{V} := Y \Sigma^{1/2}$. Let $G_U = \hat{U}^\top \hat{U}$ and $G_V = \hat{V}^\top \hat{V}$
be the respective Gram matrices. Observe that $G_U = G_V = \Sigma$. We will
show that there exists a rotation $Q$ such that for $\tilde{U} = \hat{U} Q$, $\tilde{V} = \hat{V} Q$, it
holds that
\[
\|\tilde{u}_j\|^2 = \frac{1}{d_1} \|\tilde{U}\|_F^2 = \frac{1}{d_1} \operatorname{Tr}(\tilde{U}^\top \tilde{U}) = \frac{1}{d_1} \operatorname{Tr}(\Sigma) = \frac{1}{d_1} \|M\|_*
\]
and
\[
\|\tilde{v}_j\|^2 = \frac{1}{d_1} \|\tilde{V}\|_F^2 = \frac{1}{d_1} \operatorname{Tr}(\tilde{V}^\top \tilde{V}) = \frac{1}{d_1} \operatorname{Tr}(\Sigma) = \frac{1}{d_1} \|M\|_*
\]
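The balancing rotation can be exhibited explicitly in a special case. When $d_1$ is a power of two, a scaled Hadamard matrix $Q$ is orthogonal with all entries $\pm 1/\sqrt{d_1}$, so every column $q_j$ satisfies $q_j^\top \Sigma q_j = \operatorname{Tr}(\Sigma)/d_1$, which equalizes the column norms exactly as the proof requires. A minimal numerical sketch of this special case (the Hadamard choice is an assumption for illustration; the general proof does not need it):

```python
import numpy as np

rng = np.random.default_rng(1)
d2, d0, d1 = 6, 5, 4  # d1 a power of two so a Hadamard matrix exists
M = rng.standard_normal((d2, d1)) @ rng.standard_normal((d1, d0))  # rank <= d1

# Compact SVD M = W Sigma Y^T, then balanced factors U_hat, V_hat.
W, s, Yt = np.linalg.svd(M, full_matrices=False)
W, s, Y = W[:, :d1], s[:d1], Yt.T[:, :d1]
U_hat = W * np.sqrt(s)  # W Sigma^{1/2}
V_hat = Y * np.sqrt(s)  # Y Sigma^{1/2}

# Scaled 4x4 Hadamard matrix: orthogonal, entries +-1/2.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
Q = np.kron(H2, H2) / 2.0

U_t = U_hat @ Q
V_t = V_hat @ Q
nuc = s.sum()  # ||M||_* = Tr(Sigma)

# The factorization is preserved, and every column pair attains ||M||_*/d1.
assert np.allclose(U_t @ V_t.T, M)
for j in range(d1):
    prod = np.linalg.norm(U_t[:, j]) * np.linalg.norm(V_t[:, j])
    assert abs(prod - nuc / d1) < 1e-8
```

Since $\|\tilde{u}_j\|^2 = q_j^\top \hat{U}^\top \hat{U} q_j = q_j^\top \Sigma q_j$, the only property of $Q$ actually used is that its entries all have squared magnitude $1/d_1$; a sign flip of one column can fix the determinant to put $Q$ in $SO(d_1)$ without changing any norms.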