86 theory of deep learning
Consequently, it holds that $\|\tilde{u}_i\| \|\tilde{v}_i\| = \frac{1}{d_1} \|M\|_*$.
All that remains is to give a construction of the matrix $Q$. We note that a rotation matrix $Q$ satisfies the desired properties above if and only if all diagonal elements of $Q^\top G_U Q$ are equal, and hence equal to $\frac{\operatorname{Tr} G_U}{d_1}$, since $(Q^\top G_U Q)_{jj} = \|\tilde{u}_j\|^2$. The key idea is that for the trace-zero matrix $G_1 := G_U - \frac{\operatorname{Tr} G_U}{d_1} I_{d_1}$, if $G_1 = \sum_{i=1}^{r} \lambda_i e_i e_i^\top$ is an eigendecomposition of $G_1$, then for the average of the eigenvectors, i.e. for $w_{11} = \frac{1}{\sqrt{r}} \sum_{i=1}^{r} e_i$, it holds that $w_{11}^\top G_1 w_{11} = 0$. We use this property recursively to exhibit an orthogonal transformation $Q$ such that $Q^\top G_1 Q$ is zero on its diagonal.
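As a quick numerical sanity check of the averaged-eigenvector property (a sketch, not from the text; the matrix size and random construction are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6

# Build a symmetric matrix G_U, then center it to trace zero,
# as in G_1 := G_U - (Tr G_U / d) I.
A = rng.standard_normal((d, d))
G_U = A @ A.T
G_1 = G_U - (np.trace(G_U) / d) * np.eye(d)

# Eigendecomposition G_1 = sum_i lambda_i e_i e_i^T
# (columns of E are the orthonormal eigenvectors e_i).
lam, E = np.linalg.eigh(G_1)
r = d  # generically full rank here

# Average of the eigenvectors: w_11 = (1/sqrt(r)) sum_i e_i.
w11 = E.sum(axis=1) / np.sqrt(r)

print(np.linalg.norm(w11))  # ~1: unit norm, since the e_i are orthonormal
print(w11 @ G_1 @ w11)      # ~0, since sum_i lambda_i = Tr G_1 = 0
```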
To verify the claim, first notice that $w_{11}$ is unit norm:
$$\|w_{11}\|^2 = \Big\| \frac{1}{\sqrt{r}} \sum_{i=1}^{r} e_i \Big\|^2 = \frac{1}{r} \sum_{i=1}^{r} \|e_i\|^2 = 1.$$
Further, it is easy to see that
$$w_{11}^\top G_1 w_{11} = \frac{1}{r} \sum_{i,j=1}^{r} e_i^\top G_1 e_j = \frac{1}{r} \sum_{i,j=1}^{r} \lambda_j e_i^\top e_j = \frac{1}{r} \sum_{i=1}^{r} \lambda_i = 0.$$
Complete $w_{11}$ to an orthonormal basis $W_1 := [w_{11}, w_{12}, \cdots, w_{1d}]$, so that $W_1^\top W_1 = W_1 W_1^\top = I_d$. Observe that $W_1^\top G_1 W_1$ has a zero in its first diagonal element:
$$W_1^\top G_1 W_1 = \begin{bmatrix} 0 & b_1^\top \\ b_1 & G_2 \end{bmatrix}.$$
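One way to carry out this completion in practice is a QR factorization (a sketch; the helper name and the QR-based completion are illustrative choices, not prescribed by the text):

```python
import numpy as np

def complete_to_orthonormal(w):
    """Return an orthogonal W whose first column is the unit vector w."""
    d = w.shape[0]
    # QR of [w | I_d] yields d orthonormal columns; the first spans w.
    Q, _ = np.linalg.qr(np.column_stack([w, np.eye(d)]))
    if Q[:, 0] @ w < 0:  # fix QR's sign ambiguity in the first column
        Q[:, 0] *= -1
    return Q

# Demo on a trace-zero symmetric matrix, as in the text.
rng = np.random.default_rng(1)
d = 5
A = rng.standard_normal((d, d))
G_1 = (A + A.T) / 2
G_1 -= (np.trace(G_1) / d) * np.eye(d)

lam, E = np.linalg.eigh(G_1)
w11 = E.sum(axis=1) / np.sqrt(d)
W1 = complete_to_orthonormal(w11)

M = W1.T @ G_1 @ W1
print(M[0, 0])        # ~0: the first diagonal entry vanishes
print(np.trace(M[1:, 1:]))  # ~0: the principal submatrix G_2 again has zero trace
```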
The principal submatrix $G_2$ also has zero trace. By a similar argument, let $w_{22} \in \mathbb{R}^{d-1}$ be such that $\|w_{22}\| = 1$ and $w_{22}^\top G_2 w_{22} = 0$, and define
$$W_2 = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & w_{22} & w_{23} & \cdots & w_{2d} \end{bmatrix} \in \mathbb{R}^{d \times d}$$
such that $W_2^\top W_2 = W_2 W_2^\top = I_d$, and observe that
$$(W_1 W_2)^\top G_1 (W_1 W_2) = \begin{bmatrix} 0 & \cdot & \cdot \\ \cdot & 0 & \cdot \\ \cdot & \cdot & G_3 \end{bmatrix}.$$
This procedure can be applied recursively, so that for the matrix $Q = W_1 W_2 \cdots W_d$ we have
$$Q^\top G_1 Q = \begin{bmatrix} 0 & \cdot & \cdots & \cdot \\ \cdot & 0 & \cdots & \cdot \\ \vdots & & \ddots & \vdots \\ \cdot & \cdot & \cdots & 0 \end{bmatrix},$$
so that $\operatorname{Tr}(\tilde{U} \tilde{U}^\top) = \operatorname{Tr}(Q^\top G_U Q) = \operatorname{Tr}(\Sigma) = \operatorname{Tr}(Q^\top G_V Q) = \operatorname{Tr}(\tilde{V}^\top \tilde{V})$.
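The full recursion can be sketched in NumPy as follows (an illustrative implementation under the assumptions above; the function name and the QR-based basis completion at each step are hypothetical choices, not from the text):

```python
import numpy as np

def zero_diagonal_rotation(G):
    """For symmetric G with Tr G = 0, build an orthogonal Q with diag(Q^T G Q) = 0."""
    d = G.shape[0]
    Q = np.eye(d)
    for k in range(d - 1):
        # Trailing principal submatrix; it still has (numerically) zero trace.
        Gk = (Q.T @ G @ Q)[k:, k:]
        lam, E = np.linalg.eigh(Gk)
        # Averaged eigenvectors: unit norm, and w^T Gk w = Tr(Gk)/(d-k) = 0.
        w = E.sum(axis=1) / np.sqrt(d - k)
        # Complete w to an orthonormal basis and embed it as the trailing block of W_k.
        B, _ = np.linalg.qr(np.column_stack([w, np.eye(d - k)]))
        Wk = np.eye(d)
        Wk[k:, k:] = B
        # Wk is identity on the first k coordinates, so earlier zeros are preserved.
        Q = Q @ Wk
    return Q

rng = np.random.default_rng(2)
d = 6
A = rng.standard_normal((d, d))
G_U = A @ A.T
G_1 = G_U - (np.trace(G_U) / d) * np.eye(d)

Q = zero_diagonal_rotation(G_1)
print(np.diag(Q.T @ G_1 @ Q))  # all ~0: zero diagonal, as constructed
print(np.diag(Q.T @ G_U @ Q))  # all ~Tr(G_U)/d: equal diagonal, as desired
```

Zeroing the diagonal of $G_1$ is equivalent to equalizing the diagonal of $G_U$, since the two matrices differ only by a multiple of the identity, which the final check confirms.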