26.12.2022 Views

TheoryofDeepLearning.2022

inductive biases due to algorithmic regularization 87

9.1.2 Matrix Completion

Next, we consider the problem of matrix completion, which can be formulated as a special case of matrix sensing in which the sensing matrices are random indicator matrices. Formally, for all $j \in [n]$, let $A^{(j)}$ be an indicator matrix whose $(i, k)$-th element is selected randomly with probability $p(i)q(k)$, where $p(i)$ and $q(k)$ denote the probability of choosing the $i$-th row and the $k$-th column, respectively.
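To make the sampling model concrete, here is a minimal NumPy sketch. It assumes that each sensing matrix has the form $A = e_i e_k^\top$, with the row index drawn from $p$ and the column index drawn independently from $q$. The sizes `d2` and `d0`, the helper `sample_indicator`, and the random choices of `p`, `q`, `u`, `v` are illustrative assumptions, not part of the text. The sketch also computes the second moment $\mathbb{E}(u^\top A v)^2$ by enumerating all outcomes and compares it with the product of weighted norms used in the proof that follows.

```python
import numpy as np

rng = np.random.default_rng(0)
d2, d0 = 4, 3  # assumed sizes: d2 rows, d0 columns

# Hypothetical row and column sampling distributions (each sums to 1).
p = rng.random(d2)
p /= p.sum()
q = rng.random(d0)
q /= q.sum()


def sample_indicator(p, q, rng):
    """Draw A = e_i e_k^T with row i ~ p and column k ~ q, independently."""
    i = rng.choice(len(p), p=p)
    k = rng.choice(len(q), p=q)
    A = np.zeros((len(p), len(q)))
    A[i, k] = 1.0
    return A


u = rng.standard_normal(d2)
v = rng.standard_normal(d0)

# Exact second moment E(u^T A v)^2, enumerating all d2 * d0 outcomes.
exact = sum(
    p[i] * q[k] * (u[i] * v[k]) ** 2 for i in range(d2) for k in range(d0)
)

# Product form: ||sqrt(diag(p)) u||^2 * ||sqrt(diag(q)) v||^2.
closed = np.sum(p * u**2) * np.sum(q * v**2)

A = sample_indicator(p, q, rng)
print(np.isclose(exact, closed))
```

The two quantities agree because the double sum over $(i, k)$ factorizes into a sum over rows times a sum over columns.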

We will show next that in this setup Dropout induces the weighted

trace-norm studied by [? ] and [? ]. Formally, we show that

\[
\Theta(M) = \frac{1}{d_1} \left\| \operatorname{diag}(\sqrt{p})\, U V^\top \operatorname{diag}(\sqrt{q}) \right\|_*^2. \tag{9.7}
\]
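As a rough illustration, the right-hand side of (9.7) can be evaluated numerically from the singular values of the reweighted matrix. The sketch below is only that: the function name `weighted_trace_norm_sq` and the test sizes are assumptions made for this example, and the code evaluates the formula rather than the definition of $\Theta$. With uniform $p$ and $q$, the weighted quantity should reduce to a rescaled ordinary trace norm, which serves as a sanity check.

```python
import numpy as np


def weighted_trace_norm_sq(M, p, q, d1):
    """Evaluate (1/d1) * || diag(sqrt(p)) M diag(sqrt(q)) ||_*^2."""
    # Scaling rows by sqrt(p) and columns by sqrt(q) is the same as
    # multiplying by the two diagonal matrices.
    W = np.sqrt(p)[:, None] * M * np.sqrt(q)[None, :]
    return np.linalg.norm(W, ord="nuc") ** 2 / d1


rng = np.random.default_rng(0)
d2, d0, d1 = 5, 4, 3  # illustrative sizes
M = rng.standard_normal((d2, d0))

# Uniform sampling: every row and every column is equally likely.
p = np.full(d2, 1.0 / d2)
q = np.full(d0, 1.0 / d0)

theta = weighted_trace_norm_sq(M, p, q, d1)
unweighted = np.linalg.norm(M, ord="nuc") ** 2 / (d1 * d2 * d0)
print(np.isclose(theta, unweighted))
```

In the uniform case both diagonal matrices are multiples of the identity, so the weights factor out as $1/(d_2 d_0)$.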

Proof. For any pair of factors (U, V) it holds that

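\[
\begin{aligned}
R(U, V) &= \sum_{j=1}^{d_1} \mathbb{E}\,(u_j^\top A v_j)^2 \\
&= \sum_{j=1}^{d_1} \sum_{k=1}^{d_2} \sum_{l=1}^{d_0} p(k)\, q(l)\, (u_j^\top e_k e_l^\top v_j)^2 \\
&= \sum_{j=1}^{d_1} \sum_{k=1}^{d_2} \sum_{l=1}^{d_0} p(k)\, q(l)\, U(k, j)^2\, V(l, j)^2 \\
&= \sum_{j=1}^{d_1} \left\| \sqrt{\operatorname{diag}(p)}\, u_j \right\|^2 \left\| \sqrt{\operatorname{diag}(q)}\, v_j \right\|^2 \\
&\ge \frac{1}{d_1} \left( \sum_{j=1}^{d_1} \left\| \sqrt{\operatorname{diag}(p)}\, u_j \right\| \left\| \sqrt{\operatorname{diag}(q)}\, v_j \right\| \right)^2 \\
&= \frac{1}{d_1} \left( \sum_{j=1}^{d_1} \left\| \sqrt{\operatorname{diag}(p)}\, u_j v_j^\top \sqrt{\operatorname{diag}(q)} \right\|_* \right)^2 \\
&\ge \frac{1}{d_1} \left( \left\| \sqrt{\operatorname{diag}(p)} \sum_{j=1}^{d_1} u_j v_j^\top \sqrt{\operatorname{diag}(q)} \right\|_* \right)^2 \\
&= \frac{1}{d_1} \left\| \sqrt{\operatorname{diag}(p)}\, U V^\top \sqrt{\operatorname{diag}(q)} \right\|_*^2
\end{aligned}
\]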

where the first inequality is due to Cauchy-Schwarz and the second inequality follows from the triangle inequality. The equality right after the first inequality follows from the fact that for any two vectors $a, b$, $\|ab^\top\|_* = \|ab^\top\|_F = \|a\|\|b\|$. Since the inequalities hold for any $U, V$, it follows that

\[
\Theta(UV^\top) \ge \frac{1}{d_1} \left\| \sqrt{\operatorname{diag}(p)}\, U V^\top \sqrt{\operatorname{diag}(q)} \right\|_*^2.
\]
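The chain of inequalities in the proof can be checked numerically on a random instance. The following is a minimal sketch, assuming $U \in \mathbb{R}^{d_2 \times d_1}$ and $V \in \mathbb{R}^{d_0 \times d_1}$ (consistent with the index ranges in the sums) and using arbitrary illustrative sizes and random `p`, `q`. It computes $R(U, V)$ in its product-of-norms form, the Cauchy-Schwarz intermediate bound, and the final weighted trace-norm bound, and also verifies the rank-one identity $\|ab^\top\|_* = \|ab^\top\|_F = \|a\|\|b\|$ for one pair of columns.

```python
import numpy as np

rng = np.random.default_rng(1)
d2, d0, d1 = 6, 5, 4  # illustrative sizes
U = rng.standard_normal((d2, d1))
V = rng.standard_normal((d0, d1))

# Hypothetical sampling distributions over rows and columns.
p = rng.random(d2)
p /= p.sum()
q = rng.random(d0)
q /= q.sum()

# a[j] = ||sqrt(diag(p)) u_j||, b[j] = ||sqrt(diag(q)) v_j||.
a = np.sqrt(p @ U**2)
b = np.sqrt(q @ V**2)

# R(U, V) as the sum over j of squared weighted column norms.
R = np.sum(a**2 * b**2)

# After Cauchy-Schwarz: (1/d1) * (sum_j a_j b_j)^2.
cs_bound = np.sum(a * b) ** 2 / d1

# After the triangle inequality: (1/d1) * ||sqrt(diag(p)) U V^T sqrt(diag(q))||_*^2.
W = np.sqrt(p)[:, None] * (U @ V.T) * np.sqrt(q)[None, :]
trace_bound = np.linalg.norm(W, ord="nuc") ** 2 / d1

# Rank-one identity for the first pair of weighted columns.
outer = np.outer(np.sqrt(p) * U[:, 0], np.sqrt(q) * V[:, 0])
nuc = np.linalg.norm(outer, ord="nuc")
fro = np.linalg.norm(outer, ord="fro")

print(R >= cs_bound >= trace_bound)
```

A single random instance does not prove the bound, but a violation here would signal an error in the reconstruction of the derivation.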

Applying Theorem 9.1.1 on $(\sqrt{\operatorname{diag}(p)}\, U, \sqrt{\operatorname{diag}(q)}\, V)$, there exists
