
86 theory of deep learning

Consequently, it holds that
$$\|\tilde{u}_i\|\|\tilde{v}_i\| = \frac{1}{d_1}\|M\|_*.$$

All that remains is to give a construction of the matrix $Q$. We note that a rotation matrix $Q$ satisfies the desired properties above if and only if all diagonal elements of $Q^\top G_U Q$ are equal, and hence equal to $\frac{\operatorname{Tr} G_U}{d_1}$, since $(Q^\top G_U Q)_{jj} = \|\tilde{u}_j\|^2$. The key idea is that for the trace-zero matrix
$$G_1 := G_U - \frac{\operatorname{Tr} G_U}{d_1} I_{d_1},$$
if $G_1 = \sum_{i=1}^r \lambda_i e_i e_i^\top$ is an eigendecomposition of $G_1$, then for the average of the eigenvectors, i.e. for $w_{11} = \frac{1}{\sqrt{r}} \sum_{i=1}^r e_i$, it holds that $w_{11}^\top G_1 w_{11} = 0$. We use this property recursively to exhibit an orthogonal transformation $Q$ such that $Q^\top G_1 Q$ is zero on its diagonal.
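As a quick numerical sanity check of this key idea, the following NumPy sketch (illustrative, not from the text; the random matrix `G_U` and all names are assumptions) verifies that the normalized eigenvector average has unit norm and zeroes the quadratic form. Here all $d$ eigenvectors are summed, which works for the same reason as summing only the $r$ eigenvectors with nonzero eigenvalue:

```python
import numpy as np

# Key idea: for a symmetric trace-zero matrix G1, the normalized average
# of its (orthonormal) eigenvectors zeroes the quadratic form.
rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d))
G_U = A @ A.T                                # random symmetric PSD matrix
G1 = G_U - (np.trace(G_U) / d) * np.eye(d)   # trace-zero by construction

lam, E = np.linalg.eigh(G1)                  # G1 = sum_i lam_i e_i e_i^T
w11 = E.sum(axis=1) / np.sqrt(d)             # average of the eigenvectors

print(np.linalg.norm(w11))   # ~1.0  (unit norm, by orthonormality)
print(w11 @ G1 @ w11)        # ~0.0  (= tr(G1)/d = 0)
```

The second printed value is $\frac{1}{d}\sum_i \lambda_i = \frac{\operatorname{Tr} G_1}{d} = 0$ up to floating-point error.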

To verify the claim, first notice that $w_{11}$ is unit norm, since the eigenvectors are orthonormal:
$$\|w_{11}\|^2 = \Big\|\frac{1}{\sqrt{r}} \sum_{i=1}^r e_i\Big\|^2 = \frac{1}{r} \sum_{i,j=1}^r e_i^\top e_j = \frac{1}{r} \sum_{i=1}^r \|e_i\|^2 = 1.$$
Further, it is easy to see that
$$w_{11}^\top G_1 w_{11} = \frac{1}{r} \sum_{i,j=1}^r e_i^\top G_1 e_j = \frac{1}{r} \sum_{i,j=1}^r \lambda_j e_i^\top e_j = \frac{1}{r} \sum_{i=1}^r \lambda_i = 0.$$

Complete $w_{11}$ to an orthonormal basis $W_1 := [w_{11}, w_{12}, \cdots, w_{1d}]$ such that $W_1^\top W_1 = W_1 W_1^\top = I_d$. Observe that $W_1^\top G_1 W_1$ has zero as its first diagonal element:
$$W_1^\top G_1 W_1 = \begin{bmatrix} 0 & b_1^\top \\ b_1 & G_2 \end{bmatrix}.$$

The principal submatrix $G_2$ also has zero trace. By a similar argument, let $w_{22} \in \mathbb{R}^{d-1}$ be such that $\|w_{22}\| = 1$ and $w_{22}^\top G_2 w_{22} = 0$, and define
$$W_2 = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & w_{22} & w_{23} & \cdots & w_{2d} \end{bmatrix} \in \mathbb{R}^{d \times d}$$
such that $W_2^\top W_2 = W_2 W_2^\top = I_d$, and observe that
$$(W_1 W_2)^\top G_1 (W_1 W_2) = \begin{bmatrix} 0 & \cdot & \cdot \\ \cdot & 0 & \cdot \\ \cdot & \cdot & G_3 \end{bmatrix}.$$

This procedure can be applied recursively, so that for the matrix $Q = W_1 W_2 \cdots W_d$ we have
$$Q^\top G_1 Q = \begin{bmatrix} 0 & \cdot & \cdots & \cdot \\ \cdot & 0 & \cdots & \cdot \\ \vdots & & \ddots & \vdots \\ \cdot & \cdot & \cdots & 0 \end{bmatrix}.$$
Since $Q^\top G_1 Q$ has zero diagonal, every diagonal entry of $Q^\top G_U Q$ equals $\frac{\operatorname{Tr} G_U}{d_1}$, as desired, so that $\operatorname{Tr}(\tilde{U}\tilde{U}^\top) = \operatorname{Tr}(Q^\top G_U Q) = \operatorname{Tr}(\Sigma) = \operatorname{Tr}(Q^\top G_V Q) = \operatorname{Tr}(\tilde{V}^\top \tilde{V})$.
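The recursive construction above can be sketched in code (an illustrative implementation, not the book's; `zero_diag_rotation` and `basis_with_first_column` are hypothetical helper names, and the basis completion step uses a Householder reflection, one standard way to extend a unit vector to an orthonormal basis):

```python
import numpy as np

def basis_with_first_column(w):
    """Orthogonal matrix whose first column is the unit vector w
    (a Householder reflection mapping e_1 to w)."""
    m = w.shape[0]
    v = w - np.eye(m)[:, 0]
    if np.linalg.norm(v) < 1e-12:   # w is already (numerically) e_1
        return np.eye(m)
    return np.eye(m) - 2.0 * np.outer(v, v) / (v @ v)

def zero_diag_rotation(G_U):
    """Orthogonal Q such that every diagonal entry of Q^T G_U Q
    equals tr(G_U)/d, via the recursive construction above."""
    d = G_U.shape[0]
    G1 = G_U - (np.trace(G_U) / d) * np.eye(d)   # trace-zero part of G_U
    Q = np.eye(d)
    for k in range(d - 1):
        Gk = (Q.T @ G1 @ Q)[k:, k:]              # trailing block; zero trace
        _, E = np.linalg.eigh(Gk)
        w = E.sum(axis=1) / np.sqrt(Gk.shape[0]) # w^T Gk w = tr(Gk)/m = 0
        Wk = np.eye(d)
        Wk[k:, k:] = basis_with_first_column(w)  # embed step W_k into R^{d x d}
        Q = Q @ Wk
    return Q   # last diagonal entry is zero automatically (zero trace)
```

For a random symmetric `G_U`, `np.diag(Q.T @ G_U @ Q)` then agrees with $\operatorname{Tr}(G_U)/d$ in every entry, which is exactly the equal-diagonal property the proof requires.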
