Applying this approximation inductively in Equation (8.16), we get
$$\left\langle b^{(h)}(x),\, b^{(h)}(x') \right\rangle \to \prod_{h'=h}^{L} \dot{\Sigma}^{(h')}(x, x').$$
Finally, since
$$\left\langle \frac{\partial f(w,x)}{\partial w},\, \frac{\partial f(w,x')}{\partial w} \right\rangle = \sum_{h=1}^{L+1} \left\langle \frac{\partial f(w,x)}{\partial W^{(h)}},\, \frac{\partial f(w,x')}{\partial W^{(h)}} \right\rangle,$$
we obtain the final NTK expression for the fully-connected neural network:
$$\Theta^{(L)}(x, x') = \sum_{h=1}^{L+1} \left( \Sigma^{(h-1)}(x, x') \cdot \prod_{h'=h}^{L+1} \dot{\Sigma}^{(h')}(x, x') \right),$$
where we adopt the convention $\dot{\Sigma}^{(L+1)}(x, x') := 1$, so that the $h = L+1$ term reduces to $\Sigma^{(L)}(x, x')$.
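To make the recursion concrete, the following is a minimal NumPy sketch (not from the text) that evaluates $\Theta^{(L)}(x, x')$ for the ReLU activation, assuming the standard closed-form arc-cosine expressions for $\Sigma^{(h)}$ and $\dot{\Sigma}^{(h)}$ (see Exercise 1 below) with the $c_\sigma = 2$ normalization; the function name `relu_ntk` is our own.

```python
import numpy as np

def relu_ntk(x, xp, L):
    """Sketch: Theta^(L)(x, x') for a fully-connected ReLU network,
    using closed-form arc-cosine expectations.  Assumes the c_sigma = 2
    normalization, under which Sigma^(h)(x, x) = ||x||^2 at every layer."""
    sxx, spp = x @ x, xp @ xp        # Sigma^(h)(x,x), Sigma^(h)(x',x'): constant in h
    sigma = x @ xp                   # Sigma^(0)(x, x') = x^T x'
    sigmas, sigma_dots = [sigma], []
    for h in range(1, L + 1):
        # correlation of the bivariate Gaussian (u, v) at layer h
        lam = np.clip(sigma / np.sqrt(sxx * spp), -1.0, 1.0)
        theta = np.arccos(lam)
        # c_sigma * E[sigma(u) sigma(v)]  and  c_sigma * E[sigma'(u) sigma'(v)]
        sigma = np.sqrt(sxx * spp) * (lam * (np.pi - theta) + np.sqrt(1 - lam**2)) / np.pi
        sigmas.append(sigma)                         # Sigma^(h)(x, x')
        sigma_dots.append((np.pi - theta) / np.pi)   # Sigma_dot^(h)(x, x')
    sigma_dots.append(1.0)           # convention: Sigma_dot^(L+1) := 1
    # Theta^(L) = sum_{h=1}^{L+1} Sigma^(h-1) * prod_{h'=h}^{L+1} Sigma_dot^(h')
    return sum(sigmas[h - 1] * np.prod(sigma_dots[h - 1:]) for h in range(1, L + 2))
```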
8.5 NTK in Practice
Up to now we have shown that an ultra-wide neural network, under a certain initialization scheme and trained by gradient flow, corresponds to a kernel classifier with a particular kernel function. A natural question is: why don't we use this kernel classifier directly?
A recent line of work has shown that NTKs can be empirically useful, especially on small to medium scale datasets. Arora et al. [? ] tested the NTK classifier on 90 small to medium scale datasets from the UCI database.3 They found that the NTK can beat neural networks, other kernels such as the Gaussian kernel, and the best previous classifier, random forest, under various metrics, including average rank and average accuracy. This suggests the NTK classifier should belong in any list of off-the-shelf machine learning methods.
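As an illustration of how the NTK serves as such an off-the-shelf method, here is a hypothetical kernel ridge regression sketch built on the `relu_ntk` function above; the depth `L` and ridge parameter `reg` are illustrative choices, not values from the cited paper.

```python
import numpy as np

def ntk_ridge_regression(X_train, y_train, X_test, L=3, reg=1e-4):
    """Kernel ridge regression with the fully-connected ReLU NTK.
    relu_ntk is the sketch given earlier; L and reg are illustrative."""
    K = np.array([[relu_ntk(a, b, L) for b in X_train] for a in X_train])
    K_test = np.array([[relu_ntk(a, b, L) for b in X_train] for a in X_test])
    # solve (K + reg * I) alpha = y, then predict with the test Gram matrix
    alpha = np.linalg.solve(K + reg * np.eye(len(X_train)), y_train)
    return K_test @ alpha
```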
For every neural network architecture, one can derive a corresponding kernel function. Du et al. [? ] derived the graph NTK (GNTK) for graph classification tasks. On various social network and bioinformatics datasets, the GNTK can outperform graph neural networks. Similarly, Arora et al. [? ] derived the convolutional NTK (CNTK) formula that corresponds to convolutional neural networks. For image classification tasks, in small-scale data and low-shot settings, CNTKs can be quite strong [? ]. However, for large scale data, Arora et al. [? ] found there is still a performance gap between the CNTK and CNNs. It is an open problem to explain this phenomenon theoretically; this may require going beyond the NTK framework.
3 https://archive.ics.uci.edu/ml/datasets.php
8.6 Exercises
1. NTK formula for ReLU activation function: prove
$$\mathbb{E}_{w \sim \mathcal{N}(0, I)}\left[\dot{\sigma}(w^\top x)\, \dot{\sigma}(w^\top x')\right] = \frac{\pi - \arccos\left(\frac{x^\top x'}{\|x\|_2 \|x'\|_2}\right)}{2\pi}.$$
(A numerical sanity check of this identity is sketched after the exercises.)
2. Prove Equation (8.10).
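As referenced in Exercise 1, the identity can be checked numerically by Monte Carlo; the sample size and random seed below are arbitrary. For ReLU, $\dot{\sigma}(u)$ is the indicator that $u > 0$, so the left-hand side is the probability that $w^\top x$ and $w^\top x'$ are both positive.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
x, xp = rng.normal(size=d), rng.normal(size=d)

# Monte Carlo estimate of E[sigma'(w^T x) sigma'(w^T x')] for sigma = ReLU:
# the fraction of draws where both projections are positive
W = rng.normal(size=(1_000_000, d))
mc = np.mean((W @ x > 0) & (W @ xp > 0))

# closed form from Exercise 1
lam = np.clip(x @ xp / (np.linalg.norm(x) * np.linalg.norm(xp)), -1.0, 1.0)
closed = (np.pi - np.arccos(lam)) / (2 * np.pi)
print(mc, closed)   # the two values should agree to about 3 decimal places
```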