inductive biases due to algorithmic regularization 87
9.1.2 Matrix Completion
Next, we consider the problem of matrix completion, which can be formulated as a special case of matrix sensing with sensing matrices that are random indicator matrices. Formally, we assume that for each j ∈ [n], A^{(j)} is an indicator matrix whose (i, k)-th element equals one with probability p(i)q(k), where p(i) and q(k) denote the probability of choosing the i-th row and the k-th column, respectively.
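This sampling model can be sketched numerically. Below is a minimal NumPy sketch; the dimensions and the distributions p, q are made-up placeholders, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
d2, d0 = 4, 5                    # hypothetical ambient dimensions
p = rng.dirichlet(np.ones(d2))   # row-sampling probabilities p(i)
q = rng.dirichlet(np.ones(d0))   # column-sampling probabilities q(k)

def sample_indicator(rng, p, q):
    """Draw a sensing matrix A = e_i e_k^T with P[(i, k)] = p(i) q(k)."""
    i = rng.choice(len(p), p=p)
    k = rng.choice(len(q), p=q)
    A = np.zeros((len(p), len(q)))
    A[i, k] = 1.0
    return A

A = sample_indicator(rng, p, q)  # exactly one entry is nonzero, equal to 1
```

Each draw reveals a single entry of the underlying matrix, which is why this sensing model corresponds to matrix completion.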
We will show next that in this setup Dropout induces the weighted trace-norm studied by [? ] and [? ]. Formally, we show that
\[
\Theta(M) = \frac{1}{d_1}\,\big\|\operatorname{diag}(\sqrt{p})\, U V^\top \operatorname{diag}(\sqrt{q})\big\|_*^2. \tag{9.7}
\]
Proof. For any pair of factors (U, V) it holds that
\begin{align*}
R(U, V) &= \sum_{j=1}^{d_1} \mathbb{E}\big(u_j^\top A v_j\big)^2 \\
&= \sum_{j=1}^{d_1} \sum_{k=1}^{d_2} \sum_{l=1}^{d_0} p(k)\, q(l)\, \big(u_j^\top e_k e_l^\top v_j\big)^2 \\
&= \sum_{j=1}^{d_1} \sum_{k=1}^{d_2} \sum_{l=1}^{d_0} p(k)\, q(l)\, U(k, j)^2\, V(l, j)^2 \\
&= \sum_{j=1}^{d_1} \big\|\operatorname{diag}(\sqrt{p})\, u_j\big\|^2\, \big\|\operatorname{diag}(\sqrt{q})\, v_j\big\|^2 \\
&\ge \frac{1}{d_1} \Big( \sum_{j=1}^{d_1} \big\|\operatorname{diag}(\sqrt{p})\, u_j\big\|\, \big\|\operatorname{diag}(\sqrt{q})\, v_j\big\| \Big)^2 \\
&= \frac{1}{d_1} \Big( \sum_{j=1}^{d_1} \big\|\operatorname{diag}(\sqrt{p})\, u_j v_j^\top \operatorname{diag}(\sqrt{q})\big\|_* \Big)^2 \\
&\ge \frac{1}{d_1} \Big\| \operatorname{diag}(\sqrt{p}) \Big( \sum_{j=1}^{d_1} u_j v_j^\top \Big) \operatorname{diag}(\sqrt{q}) \Big\|_*^2 \\
&= \frac{1}{d_1} \big\| \operatorname{diag}(\sqrt{p})\, U V^\top \operatorname{diag}(\sqrt{q}) \big\|_*^2,
\end{align*}
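The step collapsing the expectation into a product of weighted column norms can be checked numerically. The following is a sanity-check sketch with NumPy; the dimensions and random factors are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
d2, d0, d1 = 4, 5, 3                   # hypothetical dimensions
p = rng.dirichlet(np.ones(d2))         # p(k), row distribution
q = rng.dirichlet(np.ones(d0))         # q(l), column distribution
U = rng.standard_normal((d2, d1))
V = rng.standard_normal((d0, d1))

# Expectation expanded exactly over the finite support A = e_k e_l^T,
# which occurs with probability p(k) q(l):
lhs = sum(p[k] * q[l] * (U[k, j] * V[l, j]) ** 2
          for j in range(d1) for k in range(d2) for l in range(d0))

# Closed form: sum_j ||diag(sqrt(p)) u_j||^2 * ||diag(sqrt(q)) v_j||^2
rhs = sum((p * U[:, j] ** 2).sum() * (q * V[:, j] ** 2).sum()
          for j in range(d1))
```

Both quantities agree, since the expectation is just a weighted sum over the finitely many possible indicator matrices.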
where the first inequality is due to Cauchy-Schwarz and the second inequality follows from the triangle inequality. The equality right after the first inequality follows from the fact that for any two vectors a, b, we have \|ab^\top\|_* = \|ab^\top\|_F = \|a\|\,\|b\|. Since these inequalities hold for any pair of factors U, V, it implies that
\[
\Theta(UV^\top) \ge \frac{1}{d_1}\,\big\|\operatorname{diag}(\sqrt{p})\, U V^\top \operatorname{diag}(\sqrt{q})\big\|_*^2.
\]
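The rank-one norm identity used above is easy to confirm numerically; a small sketch with NumPy, using arbitrary random vectors:

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal(4)
b = rng.standard_normal(5)

M = np.outer(a, b)                    # rank-one matrix a b^T
nuc = np.linalg.norm(M, ord='nuc')    # nuclear norm: sum of singular values
fro = np.linalg.norm(M, ord='fro')    # Frobenius norm
prod = np.linalg.norm(a) * np.linalg.norm(b)
```

A rank-one matrix has a single nonzero singular value, so its nuclear and Frobenius norms coincide and both equal \|a\|\,\|b\|.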
Applying Theorem 9.1.1 on (\operatorname{diag}(\sqrt{p})\,U, \operatorname{diag}(\sqrt{q})\,V), there exists