Theory of Deep Learning (2022)

98 theory of deep learning

rotations. Observe that
\[
\begin{aligned}
g(t) := L_\theta(U + \Delta(t))
&= L_\theta(U) + \big\|\sqrt{1-t^2}\,u_1 + t u_2\big\|^4 - \|u_1\|^4 + \big\|\sqrt{1-t^2}\,u_2 - t u_1\big\|^4 - \|u_2\|^4 \\
&= L_\theta(U) - 2t^2\big(\|u_1\|^4 + \|u_2\|^4\big) + 8t^2 (u_1^\top u_2)^2 + 4t^2 \|u_1\|^2 \|u_2\|^2 \\
&\quad + 4t\sqrt{1-t^2}\,(u_1^\top u_2)\big(\|u_1\|^2 - \|u_2\|^2\big) + O(t^3).
\end{aligned}
\]

The derivative of g is then given as
\[
g'(t) = -4t\big(\|u_1\|^4 + \|u_2\|^4\big) + 16t(u_1^\top u_2)^2 + 8t\|u_1\|^2\|u_2\|^2 + 4\left(\sqrt{1-t^2} - \frac{t^2}{\sqrt{1-t^2}}\right)(u_1^\top u_2)\big(\|u_1\|^2 - \|u_2\|^2\big) + O(t^2).
\]

Since U is a critical point and L_\theta is continuously differentiable, it must hold that
\[
g'(0) = 4(u_1^\top u_2)\big(\|u_1\|^2 - \|u_2\|^2\big) = 0.
\]

Since by assumption \|u_1\|^2 - \|u_2\|^2 > 0, it must be the case that u_1^\top u_2 = 0. We now consider the second-order directional derivative:
\[
g''(0) = -4\big(\|u_1\|^4 + \|u_2\|^4\big) + 16(u_1^\top u_2)^2 + 8\|u_1\|^2\|u_2\|^2 = -4\big(\|u_1\|^2 - \|u_2\|^2\big)^2 < 0,
\]
where the last equality uses u_1^\top u_2 = 0. This completes the proof.
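As a quick numerical sanity check (my addition, not part of the original proof), the sketch below compares the derived expressions for g'(0) and g''(0) against central finite differences of the regularizer change under the column rotation. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
u1, u2 = rng.standard_normal(5), rng.standard_normal(5)

a, b, c = u1 @ u1, u2 @ u2, u1 @ u2  # ||u1||^2, ||u2||^2, u1.T @ u2

def g(t):
    # Regularizer change under the rotation (u1, u2) -> (v1, v2);
    # the rotation leaves U U^T unchanged, so only the ||.||^4 terms move.
    v1 = np.sqrt(1 - t**2) * u1 + t * u2
    v2 = np.sqrt(1 - t**2) * u2 - t * u1
    return (v1 @ v1) ** 2 + (v2 @ v2) ** 2 - a**2 - b**2

# Closed-form first and second derivatives at t = 0, from the proof.
g1 = 4 * c * (a - b)
g2 = -4 * (a**2 + b**2) + 16 * c**2 + 8 * a * b

h = 1e-4
fd1 = (g(h) - g(-h)) / (2 * h)           # central difference for g'(0)
fd2 = (g(h) - 2 * g(0) + g(-h)) / h**2   # central difference for g''(0)

print(abs(fd1 - g1), abs(fd2 - g2))      # both should be near zero
```

Note that when u1 and u2 happen to be orthogonal, fd2 reduces numerically to -4(a - b)^2, matching the strict-saddle conclusion.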

We now focus on the critical points that are equalized, i.e. points U such that \nabla L_\theta(U) = 0 and
\[
\operatorname{diag}(U^\top U) = \frac{\|U\|_F^2}{d_1}\, I.
\]
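A toy illustration of the equalized condition (my addition, not from the text): \operatorname{diag}(U^\top U) collects the squared column norms of U, so equalization simply says that every column of U has the same norm.

```python
import numpy as np

# Every column of this U has squared norm 2, so U is equalized.
U = np.array([[1., 0., 1.],
              [1., 1., 0.],
              [0., 1., 1.]])
d1 = U.shape[1]

lhs = np.diag(U.T @ U)                     # squared column norms
rhs = (np.sum(U**2) / d1) * np.ones(d1)    # ||U||_F^2 / d1 on the diagonal

print(np.allclose(lhs, rhs))  # prints True
```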

Lemma 9.3.7. Let r := \operatorname{Rank}(M). Assume that d_1 \leq d_0 and
\[
\lambda < \frac{r\lambda_r}{\sum_{i=1}^{r}(\lambda_i - \lambda_r)}.
\]
Then all equalized local minima are global. All other equalized critical points are strict saddle points.

Proof of Lemma 9.3.7. Let U be a critical point that is equalized. Furthermore, let r' be the rank of U, and let U = W\Sigma V^\top be its rank-r' SVD, i.e. W \in \mathbb{R}^{d_0 \times r'} and V \in \mathbb{R}^{d_1 \times r'} are such that W^\top W = V^\top V = I_{r'}, and \Sigma \in \mathbb{R}^{r' \times r'} is a positive definite diagonal matrix whose diagonal entries are sorted in descending order. We have:

\[
\begin{aligned}
\nabla L_\theta(U) = 4(UU^\top - M)U + 4\lambda\, U \operatorname{diag}(U^\top U) = 0
&\implies UU^\top U + \lambda \frac{\|U\|_F^2}{d_1}\, U = MU \\
&\implies W\Sigma^3 V^\top + \lambda \frac{\|\Sigma\|_F^2}{d_1}\, W\Sigma V^\top = MW\Sigma V^\top \\
&\implies \Sigma^2 + \lambda \frac{\|\Sigma\|_F^2}{d_1}\, I = W^\top M W.
\end{aligned}
\]
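The gradient formula at the start of this chain can be checked numerically. The loss below, \|UU^\top - M\|_F^2 + \lambda \sum_i \|u_i\|^4, is an assumption inferred from the displayed gradient (only \nabla L_\theta appears in this excerpt); the sketch compares the closed-form gradient against entry-wise central differences.

```python
import numpy as np

rng = np.random.default_rng(1)
d0, d1, lam = 4, 3, 0.1
A = rng.standard_normal((d0, d0))
M = A @ A.T                         # symmetric target; 4(UU^T - M)U needs M = M^T
U = rng.standard_normal((d0, d1))

def loss(U):
    # Assumed loss whose gradient matches the one used in the proof:
    # ||UU^T - M||_F^2 + lam * sum_i ||u_i||^4 over the columns u_i of U.
    return np.sum((U @ U.T - M) ** 2) + lam * np.sum(np.sum(U**2, axis=0) ** 2)

# Closed-form gradient from the text: 4(UU^T - M)U + 4*lam*U diag(U^T U).
grad = 4 * (U @ U.T - M) @ U + 4 * lam * U @ np.diag(np.sum(U**2, axis=0))

# Entry-wise central finite differences.
h = 1e-6
fd = np.zeros_like(U)
for i in range(d0):
    for j in range(d1):
        E = np.zeros_like(U)
        E[i, j] = h
        fd[i, j] = (loss(U + E) - loss(U - E)) / (2 * h)

print(np.max(np.abs(fd - grad)))  # should be near zero
```

Under the equalized condition, \operatorname{diag}(U^\top U) = (\|U\|_F^2 / d_1) I, which is exactly the substitution used in the first implication of the chain above.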
