
Theory of Deep Learning (2022)


76 theory of deep learning

Figure 8.2: Generalization error vs. complexity measure.

that original labels have much better alignment with top eigenvectors, thus enjoying faster convergence.

Understanding Generalization of Ultra-wide Neural Networks

The approximation in Equation (8.8) implies that the final prediction function of an ultra-wide neural network is approximately the kernel prediction function defined in Equation (8.6). Therefore, we can use the generalization theory for kernels to analyze the generalization behavior of ultra-wide neural networks. For the kernel prediction function defined in Equation (8.6), we can use a Rademacher complexity bound to derive the following generalization bound for a 1-Lipschitz loss function (which upper bounds the classification error):

$$\frac{\sqrt{2\, y^\top (H^*)^{-1} y \cdot \operatorname{tr}(H^*)}}{n}. \tag{8.10}$$

This is a data-dependent complexity measure that upper bounds the generalization error.
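To get a feel for the scale of Equation (8.10), here is a worked special case under an assumption that this section does not state: inputs have unit norm, and H* is the kernel of a two-layer ReLU network with only the first layer trained, taken to have the closed form below. Every diagonal entry is then 1/2, so the trace is n/2 and the measure simplifies:

```latex
\begin{aligned}
H^*_{ij} &= \frac{x_i^\top x_j\,\bigl(\pi - \arccos(x_i^\top x_j)\bigr)}{2\pi},
\qquad H^*_{ii} = \frac{\pi - \arccos(1)}{2\pi} = \frac{1}{2},
\qquad \operatorname{tr}(H^*) = \frac{n}{2},\\
\frac{\sqrt{2\, y^\top (H^*)^{-1} y \cdot \operatorname{tr}(H^*)}}{n}
&= \frac{\sqrt{n\, y^\top (H^*)^{-1} y}}{n}
= \sqrt{\frac{y^\top (H^*)^{-1} y}{n}}
= \sqrt{\frac{1}{n}\sum_{i=1}^{n} \frac{(v_i^\top y)^2}{\lambda_i}}.
\end{aligned}
```

Here $H^* = \sum_i \lambda_i v_i v_i^\top$ is an eigendecomposition, so the measure is small when y places most of its weight on eigenvectors with large eigenvalues. This is the same alignment property that governs the convergence speed discussed above.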

We can check this complexity measure empirically. In Figure 8.2, we compare the generalization error (ℓ1 loss and classification error) with this complexity measure, varying the portion of random labels in the dataset to see how both quantities change. We use the neural network architecture defined in Equation (8.7) with ReLU activation and train only the first layer. The left panel uses two classes of MNIST and the right panel uses two classes of CIFAR. As the portion of random labels increases, the complexity measure closely tracks the trend of the generalization error.
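The protocol above can be sketched on synthetic data. This is a minimal sketch under stated assumptions, not the authors' experiment: it uses random unit-norm points in R^5 with a hypothetical ground-truth label sign(x_1) instead of MNIST or CIFAR, and it assumes that the kernel of the ReLU network in Equation (8.7) with only the first layer trained is H*_ij = x_i^T x_j (π − arccos(x_i^T x_j)) / (2π) for unit-norm inputs. All function names (`ntk_two_layer_relu`, `randomize_labels`, `sweep`, and so on) are hypothetical. A small Gaussian-elimination solver keeps the example dependency-free.

```python
import math
import random


def unit_vector(d, rng):
    """Sample a point uniformly from the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


def ntk_two_layer_relu(xi, xj):
    """Assumed H*_ij for a two-layer ReLU net with only the first layer trained:
    <xi, xj> * (pi - arccos<xi, xj>) / (2 pi), for unit-norm inputs."""
    dot = sum(a * b for a, b in zip(xi, xj))
    dot = max(-1.0, min(1.0, dot))  # keep acos in its domain despite rounding
    return dot * (math.pi - math.acos(dot)) / (2 * math.pi)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def complexity(h, y):
    """Equation (8.10): sqrt(2 * y^T H^{-1} y * tr(H)) / n."""
    n = len(y)
    quad = sum(yi * ai for yi, ai in zip(y, solve(h, y)))
    trace = sum(h[i][i] for i in range(n))
    return math.sqrt(2.0 * quad * trace) / n


def randomize_labels(y, frac, rng):
    """Replace a `frac` portion of the +/-1 labels with uniformly random +/-1 labels."""
    y = list(y)
    for i in rng.sample(range(len(y)), round(frac * len(y))):
        y[i] = rng.choice([-1.0, 1.0])
    return y


def sweep(n=40, d=5, fracs=(0.0, 0.25, 0.5, 0.75, 1.0), seed=0):
    """Complexity measure vs. portion of random labels on synthetic data."""
    rng = random.Random(seed)
    xs = [unit_vector(d, rng) for _ in range(n)]
    y_clean = [1.0 if x[0] >= 0 else -1.0 for x in xs]  # hypothetical clean labels
    h = [[ntk_two_layer_relu(xi, xj) for xj in xs] for xi in xs]
    return {f: complexity(h, randomize_labels(y_clean, f, rng)) for f in fracs}


if __name__ == "__main__":
    for frac, value in sweep().items():
        print(f"random-label portion {frac:.2f}: complexity {value:.3f}")
```

The script only evaluates the quantity in Equation (8.10) for each label-noise level; it trains no network and measures no test error, so it illustrates the complexity measure rather than reproducing Figure 8.2.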

8.4 NTK formula for Multilayer Fully-connected Neural Network

In this section we present the NTK formulas for fully-connected neural networks. We first define a fully-connected neural network formally. Let $x \in \mathbb{R}^d$ be the input, and denote $g^{(0)}(x) = x$ and $d_0 = d$ for notational convenience. We define an $L$-hidden-layer fully-connected
