
Several remarks are in order.

Remark 1: The assumption that $y_i = O(1)$ is mild, because in practice labels are typically bounded by an absolute constant.

Remark 2: The assumption that $u_i(\tau) = O(1)$ for all $\tau \le t$, as well as the dependency of $m$ on $t$, can be relaxed. This requires a more refined analysis. See [? ].

Remark 3: One can generalize the proof to multi-layer neural networks. See [? ] for more details.

Remark 4: While we only prove the continuous-time limit, it is not hard to show that with a small learning rate, (discrete-time) gradient descent also keeps $H(t)$ close to $H^*$. See [? ].

8.3 Explaining Optimization and Generalization of Ultra-wide Neural Networks via NTK

We have now established the following approximation:
$$\frac{du(t)}{dt} \approx -H^* \cdot (u(t) - y), \tag{8.8}$$
where $H^*$ is the NTK matrix. We now use this approximation to analyze the optimization and generalization behavior of ultra-wide neural networks.
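To see the approximation (8.8) in action, the following minimal sketch integrates the dynamics with a small forward-Euler step and prints the training residual $\|u(t) - y\|$, which decays over time. The setup is purely illustrative: a synthetic positive semidefinite matrix stands in for the NTK matrix $H^*$, the labels are random, and the predictions start at zero.

```python
import numpy as np

# Minimal numerical sketch of the linearized dynamics (8.8): du/dt = -H*(u - y).
# Assumptions for illustration only: H_star is a synthetic PSD matrix standing
# in for the NTK matrix, labels y are random, and predictions start at u(0) = 0.
rng = np.random.default_rng(0)
n = 10
A = rng.standard_normal((n, n))
H_star = A @ A.T / n        # symmetric positive semidefinite stand-in for H*

y = rng.standard_normal(n)  # training labels
u = np.zeros(n)             # u(0): predictions at initialization

eta = 1e-2                  # small step size approximates the continuous-time flow
for step in range(5001):
    if step % 1000 == 0:
        print(f"t = {eta * step:5.1f}   ||u(t) - y|| = {np.linalg.norm(u - y):.4f}")
    u = u - eta * H_star @ (u - y)   # forward-Euler step of (8.8)
```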

Understanding Optimization

The dynamics of $u(t)$, which follows
$$\frac{du(t)}{dt} = -H^* \cdot (u(t) - y),$$
is actually a linear dynamical system. For this kind of dynamics there is a standard analysis. We denote the eigenvalue decomposition of $H^*$ as

$$H^* = \sum_{i=1}^{n} \lambda_i v_i v_i^\top,$$
where $\lambda_1 \ge \cdots \ge \lambda_n \ge 0$ are the eigenvalues and $v_1, \ldots, v_n$ are the corresponding eigenvectors.

With this decomposition, we can consider the dynamics of $u(t)$ along each eigenvector separately. Formally, fixing an eigenvector $v_i$ and multiplying both sides by $v_i^\top$, we obtain

$$\frac{d\, v_i^\top u(t)}{dt} = -v_i^\top H^* \cdot (u(t) - y) = -\lambda_i \left( v_i^\top (u(t) - y) \right).$$

Observe that the dynamics of $v_i^\top u(t)$ depends only on itself and $\lambda_i$, so this is actually a one-dimensional ODE. Moreover, this ODE admits an analytical solution:

$$v_i^\top (u(t) - y) = \exp(-\lambda_i t)\, v_i^\top (u(0) - y). \tag{8.9}$$
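One standard consequence of (8.9), using that the eigenvectors of the symmetric matrix $H^*$ form an orthonormal basis, is an explicit formula for the training error:
$$\|u(t) - y\|_2^2 = \sum_{i=1}^{n} \exp(-2\lambda_i t) \left( v_i^\top (u(0) - y) \right)^2.$$
In particular, the residual decays exponentially along every direction with $\lambda_i > 0$, and the component along $v_i$ shrinks at rate $\lambda_i$: directions with larger NTK eigenvalues are learned faster.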
