inductive biases due to algorithmic regularization 85

Now, using the Cauchy-Schwarz inequality, we can bound the expected regularizer as
\[
R(U, V) \geq \frac{1}{d_1} \left( \sum_{i=1}^{d_1} \|u_i\| \|v_i\| \right)^{2} = \frac{1}{d_1} \left( \sum_{i=1}^{d_1} \|u_i v_i^\top\|_* \right)^{2} \geq \frac{1}{d_1} \left\| \sum_{i=1}^{d_1} u_i v_i^\top \right\|_*^{2} = \frac{1}{d_1} \|UV^\top\|_*^{2},
\]
where the equality follows because for any pair of vectors $a, b$ it holds that $\|ab^\top\|_* = \|ab^\top\|_F = \|a\| \|b\|$, and the last inequality is due to the triangle inequality.
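The bound can be checked numerically. The following is a minimal sketch, assuming (consistently with the per-column formula used for $R(UQ, VQ)$ later in this section) that the expected dropout regularizer takes the form $R(U, V) = \sum_{i=1}^{d_1} \|u_i\|^2 \|v_i\|^2$ up to constants; the dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d2, d0, d1 = 5, 4, 3  # illustrative dimensions (not from the text)

U = rng.standard_normal((d2, d1))
V = rng.standard_normal((d0, d1))

# Assumed form of the expected dropout regularizer (up to constants):
# R(U, V) = sum_i ||u_i||^2 ||v_i||^2 over the d1 column pairs u_i, v_i.
R = np.sum(np.linalg.norm(U, axis=0) ** 2 * np.linalg.norm(V, axis=0) ** 2)

# The chain of inequalities lower-bounds R by (1/d1) ||U V^T||_*^2,
# where ||.||_* is the nuclear norm (sum of singular values).
lower = np.linalg.norm(U @ V.T, ord="nuc") ** 2 / d1

assert R >= lower - 1e-10
```

Random Gaussian factors generally do not attain the bound; equality requires all column-norm products $\|u_i\| \|v_i\|$ to be equal and the rank-one terms $u_i v_i^\top$ to be "aligned" in the nuclear norm, which is exactly what the rotation of Theorem 9.1.1 arranges.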

Next, we need the following key result from [? ].

Theorem 9.1.1. For any pair of matrices $U \in \mathbb{R}^{d_2 \times d_1}$, $V \in \mathbb{R}^{d_0 \times d_1}$, there exists a rotation matrix $Q \in \mathrm{SO}(d_1)$ such that the matrices $\tilde{U} := UQ$, $\tilde{V} := VQ$ satisfy $\|\tilde{u}_i\| \|\tilde{v}_i\| = \frac{1}{d_1} \|UV^\top\|_*$ for all $i \in [d_1]$.

Using Theorem 9.1.1 on $(U, V)$, and noting that $(UQ)(VQ)^\top = U Q Q^\top V^\top = UV^\top$, so that $(UQ, VQ)$ is a factorization of the same matrix, the expected dropout regularizer at $(UQ, VQ)$ is given as
\[
R(UQ, VQ) = \sum_{i=1}^{d_1} \|Uq_i\|^2 \|Vq_i\|^2 = \sum_{i=1}^{d_1} \frac{1}{d_1^2} \|UV^\top\|_*^2 = \frac{1}{d_1} \|UV^\top\|_*^2 \leq \Theta(UV^\top),
\]
which completes the proof.

For completeness, we provide a proof of Theorem 9.1.1.

Proof. Define $M := UV^\top$, and let $M = W \Sigma Y^\top$ be the compact SVD of $M$. Define $\hat{U} := W \Sigma^{1/2}$ and $\hat{V} := Y \Sigma^{1/2}$, and let $G_U = \hat{U}^\top \hat{U}$ and $G_V = \hat{V}^\top \hat{V}$ be the respective Gram matrices. Observe that $G_U = G_V = \Sigma$. We will show that there exists a rotation $Q$ such that for $\tilde{U} = \hat{U} Q$, $\tilde{V} = \hat{V} Q$, it holds that
\[
\|\tilde{u}_j\|^2 = \frac{1}{d_1} \|\tilde{U}\|_F^2 = \frac{1}{d_1} \operatorname{Tr}(\tilde{U}^\top \tilde{U}) = \frac{1}{d_1} \operatorname{Tr}(\Sigma) = \frac{1}{d_1} \|M\|_*
\]
and
\[
\|\tilde{v}_j\|^2 = \frac{1}{d_1} \|\tilde{V}\|_F^2 = \frac{1}{d_1} \operatorname{Tr}(\tilde{V}^\top \tilde{V}) = \frac{1}{d_1} \operatorname{Tr}(\Sigma) = \frac{1}{d_1} \|M\|_*
\]
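The balancing rotation can be exhibited concretely in a special case. The following numerical sketch (an illustration, not the general construction of the proof) uses the fact that when $d_1$ is a power of two, a normalized Hadamard matrix $Q = H / \sqrt{d_1}$ is orthogonal with every entry of squared magnitude $1/d_1$, so $\|\hat{U} q_i\|^2 = q_i^\top \Sigma q_i = \frac{1}{d_1} \operatorname{Tr}(\Sigma)$ for every column $q_i$.

```python
import numpy as np

def sylvester_hadamard(n):
    """Build an n x n Hadamard matrix (entries +-1) for n a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

rng = np.random.default_rng(1)
d2, d0, d1 = 6, 5, 4  # d1 must be a power of two for this construction

U = rng.standard_normal((d2, d1))
V = rng.standard_normal((d0, d1))
M = U @ V.T  # rank at most d1

# Balanced factors from the compact SVD: Uhat = W Sigma^{1/2}, Vhat = Y Sigma^{1/2},
# so that Uhat^T Uhat = Vhat^T Vhat = Sigma.
W, s, Yt = np.linalg.svd(M, full_matrices=False)
W, s, Yt = W[:, :d1], s[:d1], Yt[:d1, :]  # keep the (at most) d1 components
Uhat = W * np.sqrt(s)
Vhat = Yt.T * np.sqrt(s)

# Normalized Hadamard matrix: orthogonal, every entry has squared magnitude 1/d1.
Q = sylvester_hadamard(d1) / np.sqrt(d1)
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1.0  # flip one column so Q is a rotation (det = +1)

Ut, Vt = Uhat @ Q, Vhat @ Q
target = np.linalg.norm(M, ord="nuc") / d1

# Every column pair attains the common value (1/d1) ||M||_*.
prods = np.linalg.norm(Ut, axis=0) * np.linalg.norm(Vt, axis=0)
assert np.allclose(prods, target)

# Consequently the regularizer at the rotated factors equals (1/d1) ||M||_*^2.
R_rot = np.sum(np.linalg.norm(Ut, axis=0) ** 2 * np.linalg.norm(Vt, axis=0) ** 2)
assert np.isclose(R_rot, np.linalg.norm(M, ord="nuc") ** 2 / d1)
```

The sign flip preserves the equal-magnitude-entries property, so it does not disturb the balancing; for $d_1$ that is not a power of two, any orthogonal matrix whose entries all have squared magnitude $1/d_1$ would serve the same purpose.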
