
Theory of Deep Learning (2022)



$$U(t) = W_{d_0} \Sigma' V_{d_1}^\top,$$

where $\Sigma' \in \mathbb{R}^{d_0 \times d_1}$ is diagonal, with diagonal elements given by $\sigma_i' = \sqrt{\sigma_i^2 + t^2}$ for $i \le d_1$. Observe that

$$U(t)^\top U(t) = V \Sigma^2 V^\top + t^2 V_{d_1} V_{d_1}^\top = U^\top U + t^2 I_{d_1}.$$
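As a quick numerical sanity check of the identity $U(t)^\top U(t) = U^\top U + t^2 I_{d_1}$, the following NumPy sketch builds $U(t)$ from random orthogonal factors $W_{d_0}$ and $V_{d_1}$. The dimensions, the singular values in `sigma`, and the helper names `random_orthogonal` and `curve` are hypothetical choices for illustration only.

```python
import numpy as np

def random_orthogonal(n, rng):
    # The Q factor of a QR decomposition of a Gaussian matrix is orthogonal.
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def curve(W, sigma, V, t):
    """U(t) = W Sigma' V^T with Sigma'_{ii} = sqrt(sigma_i^2 + t^2) for i <= d1."""
    d0, d1 = W.shape[0], V.shape[0]
    Sigma_t = np.zeros((d0, d1))
    Sigma_t[:d1, :d1] = np.diag(np.sqrt(sigma**2 + t**2))
    return W @ Sigma_t @ V.T

rng = np.random.default_rng(0)
d0, d1, t = 6, 3, 0.7
W, V = random_orthogonal(d0, rng), random_orthogonal(d1, rng)
sigma = np.array([1.5, 0.8, 0.0])  # hypothetical singular values of U (rank r' = 2)

U = curve(W, sigma, V, 0.0)        # U(0) = U
U_t = curve(W, sigma, V, t)
lhs = U_t.T @ U_t
rhs = U.T @ U + t**2 * np.eye(d1)
print(np.allclose(lhs, rhs))
```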

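Since $U$ is equalized, the diagonal of $U^\top U$ is constant, and adding $t^2 I_{d_1}$ preserves this property; thus the parametric curve $U(t)$ is equalized for all $t$. The population risk at $U(t)$ equals: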

$$L(U(t)) = \sum_{i=1}^{d_1} (\lambda_i - \sigma_i^2 - t^2)^2 + \sum_{i=d_1+1}^{d_0} \lambda_i^2 = L(U) + d_1 t^4 - 2t^2 \sum_{i=1}^{d_1} (\lambda_i - \sigma_i^2).$$
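The expansion of $L(U(t))$ is a purely algebraic identity in the $\lambda_i$, $\sigma_i$, and $t$, so it can be spot-checked with arbitrary numbers. The sketch below uses only the two expressions in the display, obtaining $L(U)$ by setting $t = 0$; the eigenvalues, singular values, and function names are hypothetical.

```python
import numpy as np

def risk_along_curve(lam, sigma, t):
    """sum_{i<=d1} (lam_i - sigma_i^2 - t^2)^2 + sum_{i>d1} lam_i^2."""
    d1 = sigma.size
    head = np.sum((lam[:d1] - sigma**2 - t**2) ** 2)
    tail = np.sum(lam[d1:] ** 2)
    return head + tail

def risk_expanded(lam, sigma, t):
    """L(U) + d1 t^4 - 2 t^2 sum_{i<=d1} (lam_i - sigma_i^2)."""
    d1 = sigma.size
    L_U = risk_along_curve(lam, sigma, 0.0)  # L(U) = L(U(0))
    return L_U + d1 * t**4 - 2 * t**2 * np.sum(lam[:d1] - sigma**2)

lam = np.array([3.0, 2.0, 1.2, 0.5, 0.1])  # hypothetical eigenvalues, d0 = 5
sigma = np.array([1.4, 0.9, 0.0])          # hypothetical singular values, d1 = 3
for t in (0.0, 0.3, 1.1):
    assert np.isclose(risk_along_curve(lam, sigma, t), risk_expanded(lam, sigma, t))
```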

Furthermore, since $U(t)$ is equalized, we obtain the following form for the regularizer:

$$R(U(t)) = \frac{\lambda}{d_1}\|U(t)\|_F^4 = \frac{\lambda}{d_1}\left(\|U\|_F^2 + d_1 t^2\right)^2 = R(U) + \lambda d_1 t^4 + 2\lambda t^2 \|U\|_F^2.$$
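The regularizer expansion relies on $\|U(t)\|_F^2 = \operatorname{tr}(U(t)^\top U(t)) = \|U\|_F^2 + d_1 t^2$ and on $U(t)$ staying equalized. The sketch below checks both numerically, using the form $R(U) = \frac{\lambda}{d_1}\|U\|_F^4$ stated above for equalized $U$. To obtain an equalized $U$ it assumes, purely for illustration, a Hadamard-type orthogonal $V$ whose entries all have magnitude $1/\sqrt{d_1}$; the variable `lam_reg` stands in for the regularization parameter $\lambda$, and all numeric values are hypothetical.

```python
import numpy as np

lam_reg = 0.4            # regularization parameter lambda (hypothetical value)
d0, d1, t = 6, 4, 0.6

# Hadamard-type orthogonal V: all entries are +-1/2, so every column of
# U = W Sigma V^T has squared norm ||U||_F^2 / d1, i.e. U is equalized.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
V = np.kron(H2, H2) / 2.0
W, _ = np.linalg.qr(np.random.default_rng(1).standard_normal((d0, d0)))
sigma = np.array([1.3, 1.0, 0.6, 0.0])   # hypothetical singular values of U

def curve(t):
    # U(t) = W Sigma' V^T with Sigma'_{ii} = sqrt(sigma_i^2 + t^2).
    Sigma_t = np.zeros((d0, d1))
    Sigma_t[:d1, :d1] = np.diag(np.sqrt(sigma**2 + t**2))
    return W @ Sigma_t @ V.T

def reg_equalized(U):
    # R(U) = (lambda / d1) * ||U||_F^4, the form used for equalized U.
    return lam_reg / d1 * np.linalg.norm(U, "fro") ** 4

U, U_t = curve(0.0), curve(t)
col_norms = np.sum(U_t**2, axis=0)            # squared column norms of U(t)
assert np.allclose(col_norms, col_norms[0])   # U(t) is still equalized
fro2 = np.linalg.norm(U, "fro") ** 2
expanded = reg_equalized(U) + lam_reg * d1 * t**4 + 2 * lam_reg * t**2 * fro2
assert np.isclose(reg_equalized(U_t), expanded)
```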

Define $g(t) := L(U(t)) + R(U(t))$. We have that

$$g(t) = L(U) + R(U) + d_1 t^4 - 2t^2 \sum_{i=1}^{d_1} (\lambda_i - \sigma_i^2) + \lambda d_1 t^4 + 2\lambda t^2 \|U\|_F^2.$$

It is easy to verify that $g'(0) = 0$, since $g$ depends on $t$ only through $t^2$. Moreover, the second derivative of $g$ at $t = 0$ is given by:

$$g''(0) = -4 \sum_{i=1}^{d_1} (\lambda_i - \sigma_i^2) + 4\lambda \|U\|_F^2 = -4 \sum_{i=1}^{d_1} \lambda_i + 4(1 + \lambda)\|U\|_F^2 \qquad (9.15)$$
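Because $g$ is an even polynomial in $t$, Equation (9.15) can be cross-checked against central finite differences of $g$ at $t = 0$. In this sketch, $g$ is assembled from the closed forms of $L(U(t))$ and $R(U(t))$ above, with $\|U\|_F^2 = \sum_i \sigma_i^2$. Since the text uses $\lambda$ both for the regularization parameter and (with a subscript) for the eigenvalues, the code names them `reg` and `lam`; all numbers are hypothetical.

```python
import numpy as np

lam = np.array([3.0, 2.0, 1.2, 0.5, 0.1])  # hypothetical eigenvalues lambda_i, d0 = 5
sigma = np.array([1.4, 0.9, 0.0])          # hypothetical singular values of U, d1 = 3
reg = 0.4                                  # regularization parameter lambda (hypothetical)
d1 = sigma.size
fro2 = np.sum(sigma**2)                    # ||U||_F^2

def g(t):
    # g(t) = L(U(t)) + R(U(t)) from the closed forms derived above.
    risk = np.sum((lam[:d1] - sigma**2 - t**2) ** 2) + np.sum(lam[d1:] ** 2)
    regularizer = reg / d1 * (fro2 + d1 * t**2) ** 2
    return risk + regularizer

h = 1e-3
g1_fd = (g(h) - g(-h)) / (2 * h)                           # estimate of g'(0)
g2_fd = (g(h) - 2 * g(0.0) + g(-h)) / h**2                 # estimate of g''(0)
g2_formula = -4 * np.sum(lam[:d1]) + 4 * (1 + reg) * fro2  # Equation (9.15)
print(g1_fd, g2_fd, g2_formula)
```

For a quartic $g(t) = a + bt^2 + ct^4$, the second-difference quotient equals $2b + 2ch^2$ exactly, so with $h = 10^{-3}$ its gap from $g''(0) = 2b$ is of order $10^{-5}$ for these values.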

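We use $\|U\|_F^2 = \sum_{i=1}^{r'} \sigma_i^2$ and Equation (9.14) to arrive at

$$\|U\|_F^2 = \operatorname{tr}\Sigma^2 = \sum_{i=1}^{r'} \left(\lambda_i - \frac{\lambda \sum_{j=1}^{r'} \lambda_j}{d_1 + \lambda r'}\right) = \left(\sum_{i=1}^{r'} \lambda_i\right)\left(1 - \frac{\lambda r'}{d_1 + \lambda r'}\right) = \frac{d_1 \sum_{i=1}^{r'} \lambda_i}{d_1 + \lambda r'}.$$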
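The closed form for $\|U\|_F^2$ can be checked by summing the individual $\sigma_i^2$. The sketch assumes that Equation (9.14), which is not reproduced in this section, gives $\sigma_i^2 = \lambda_i - \lambda\sum_{j=1}^{r'}\lambda_j/(d_1 + \lambda r')$ for $i \le r'$, which is how it enters the display above; the eigenvalues, $d_1$, $r'$, $\lambda$ (named `reg`), and the function name are hypothetical.

```python
import numpy as np

def sigma_sq_from_9_14(lam, reg, r_prime, d1):
    """Assumed form of Eq. (9.14): sigma_i^2 = lam_i - reg * sum_{j<=r'} lam_j / (d1 + reg * r')."""
    top = lam[:r_prime]
    return top - reg * np.sum(top) / (d1 + reg * r_prime)

lam = np.array([3.0, 2.5, 2.0, 0.4])   # hypothetical eigenvalues, sorted decreasingly
reg, r_prime, d1 = 0.5, 3, 4

sig2 = sigma_sq_from_9_14(lam, reg, r_prime, d1)
fro2_direct = np.sum(sig2)                                         # tr(Sigma^2)
fro2_closed = d1 * np.sum(lam[:r_prime]) / (d1 + reg * r_prime)    # closed form
print(fro2_direct, fro2_closed)
```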

Plugging this equality back into Equation (9.15), we get

$$g''(0) = -4 \sum_{i=1}^{d_1} \lambda_i + 4\,\frac{d_1 + d_1 \lambda}{d_1 + \lambda r'} \sum_{i=1}^{r'} \lambda_i = -4 \sum_{i=r'+1}^{d_1} \lambda_i + 4\,\frac{(d_1 - r')\lambda}{d_1 + \lambda r'} \sum_{i=1}^{r'} \lambda_i.$$
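The two expressions for $g''(0)$, together with Equation (9.15) evaluated at the closed form for $\|U\|_F^2$, should agree for every $r' \le d_1$. The sketch below checks this on hypothetical eigenvalues and tabulates the sign of $g''(0)$, the quantity the text goes on to use for a strict-saddle condition; the function names are illustrative only, and `reg` again stands for the regularization parameter $\lambda$.

```python
import numpy as np

def g2_from_9_15(lam, reg, r_prime, d1):
    # Equation (9.15) with ||U||_F^2 = d1 * sum_{i<=r'} lam_i / (d1 + reg * r').
    fro2 = d1 * np.sum(lam[:r_prime]) / (d1 + reg * r_prime)
    return -4 * np.sum(lam[:d1]) + 4 * (1 + reg) * fro2

def g2_plugged(lam, reg, r_prime, d1):
    # -4 sum_{i<=d1} lam_i + 4 (d1 + d1 reg) / (d1 + reg r') sum_{i<=r'} lam_i
    coef = (d1 + d1 * reg) / (d1 + reg * r_prime)
    return -4 * np.sum(lam[:d1]) + 4 * coef * np.sum(lam[:r_prime])

def g2_simplified(lam, reg, r_prime, d1):
    # -4 sum_{r'<i<=d1} lam_i + 4 (d1 - r') reg / (d1 + reg r') sum_{i<=r'} lam_i
    coef = (d1 - r_prime) * reg / (d1 + reg * r_prime)
    return -4 * np.sum(lam[r_prime:d1]) + 4 * coef * np.sum(lam[:r_prime])

lam = np.array([3.0, 2.5, 2.0, 1.8, 0.4, 0.1])  # hypothetical eigenvalues, d0 = 6
d1 = 4
results = {}
for reg in (0.05, 0.5, 5.0):
    for r_prime in range(1, d1 + 1):
        vals = [f(lam, reg, r_prime, d1) for f in (g2_from_9_15, g2_plugged, g2_simplified)]
        assert np.allclose(vals, vals[0])      # all three forms coincide
        results[(reg, r_prime)] = vals[2]
        print(f"lambda = {reg:<5} r' = {r_prime}: g''(0) = {vals[2]:+.4f}")
```

Note that when $r' = d_1$ both terms of the simplified expression vanish, so this second-order test is inconclusive in the full-rank case.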

To get a sufficient condition for U to be a strict saddle point, it suf-
