First, our goal is reflected by the size of our intermediate values generally shrinking: this is done by reducing the number of channels in the convolutions, by reducing the number of pixels through pooling, and by having an output dimension lower than the input dimension in the linear layers. This is a common trait of classification networks. However, in many popular architectures like the ResNets we saw in chapter 2 and discuss more in section 8.5.3, the reduction is achieved by pooling in the spatial resolution, but the number of channels increases (still resulting in a reduction in size). It seems that our pattern of fast information reduction works well with networks of limited depth and small images; but for deeper networks, the decrease is typically slower.
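
To make the shrinking concrete, here is a minimal sketch (the channel counts are illustrative only, not necessarily those of the chapter's Net): the second convolution drops the channel count, each pooling step halves the resolution, and the linear layers reduce the dimension further. Note that the very first stage grows the data, which is exactly the second point discussed next.

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
conv2 = nn.Conv2d(32, 8, kernel_size=3, padding=1)
pool = nn.MaxPool2d(2)
fc1 = nn.Linear(8 * 8 * 8, 32)
fc2 = nn.Linear(32, 2)

x = torch.randn(1, 3, 32, 32)               # 3 * 32 * 32 = 3,072 values
x = pool(torch.tanh(conv1(x)))              # 32 * 16 * 16 = 8,192 values (the first layer grows the data)
x = pool(torch.tanh(conv2(x)))              # 8 * 8 * 8 = 512 values
x = torch.tanh(fc1(x.view(x.size(0), -1)))  # 32 values
x = fc2(x)                                  # 2 values (class scores)
```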

Second, in one layer, there is not a reduction of output size with regard to input size: the initial convolution. If we consider a single output pixel as a vector of 32 elements (the channels), it is a linear transformation of 27 elements (as a convolution of 3 channels × 3 × 3 kernel size), only a moderate increase. In ResNet, the initial convolution generates 64 channels from 147 elements (3 channels × 7 × 7 kernel size).⁶ So the first layer is exceptional in that it greatly increases the overall dimension (as in channels times pixels) of the data flowing through it, but the mapping for each output pixel considered in isolation still has approximately as many outputs as inputs.⁷
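
A quick way to see those numbers is to inspect the convolution weights directly. This is only a sketch; the stride and padding used for the ResNet stem below are those of the torchvision implementation.

```python
import torch.nn as nn

# Each output pixel of the first convolution is a vector of 32 channel values
# computed from a 3 x 3 patch over 3 channels: a linear map from 27 inputs
# (plus one bias per output channel).
conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
print(conv1.weight.shape)  # torch.Size([32, 3, 3, 3]): 32 outputs from 3*3*3 = 27 inputs each

# The ResNet stem does the same with a bigger kernel:
# 64 output channels from 3 * 7 * 7 = 147 inputs per output pixel.
stem = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
print(stem.weight.shape)   # torch.Size([64, 3, 7, 7])
```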

8.3.2 How PyTorch keeps track of parameters and submodules

Interestingly, assigning an instance of nn.Module to an attribute in an nn.Module, as we did in the earlier constructor, automatically registers the module as a submodule.

NOTE The submodules must be top-level attributes, not buried inside list or dict instances! Otherwise the optimizer will not be able to locate the submodules (and, hence, their parameters). For situations where your model requires a list or dict of submodules, PyTorch provides nn.ModuleList and nn.ModuleDict.
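
To see why this matters, here is a hypothetical pair of modules (the class names are made up): submodules buried in a plain Python list are invisible to .parameters(), while nn.ModuleList registers every element.

```python
import torch.nn as nn

class PlainListStack(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(8, 8) for _ in range(3)]   # plain list: not registered!

class ModuleListStack(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(8, 8) for _ in range(3))  # registered

print(sum(p.numel() for p in PlainListStack().parameters()))   # 0
print(sum(p.numel() for p in ModuleListStack().parameters()))  # 3 * (8*8 + 8) = 216
```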

We can call arbitrary methods of an nn.Module subclass. For example, for a model where training is substantially different than its use, say, for prediction, it may make sense to have a predict method. Be aware that calling such methods will be similar to calling forward instead of the module itself: they will be ignorant of hooks, and the JIT does not see the module structure when using them, because we are missing the equivalent of the __call__ bits shown in section 6.2.1.
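
As a rough illustration (the class and method bodies here are made up, not the book's), such a predict method is an ordinary Python method: calling it does not go through nn.Module.__call__, so no hooks fire for it; only the self(x) call inside goes through the usual forward machinery.

```python
import torch
import torch.nn as nn

class NetWithPredict(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc(x)

    def predict(self, x):
        # plain method: bypasses __call__ (no hooks, not seen by the JIT)
        with torch.no_grad():
            return torch.argmax(self(x), dim=1)

net = NetWithPredict()
labels = net.predict(torch.randn(4, 8))  # class indices, shape (4,)
```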

This allows Net to have access to the parameters of its submodules without further action by the user:
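
A minimal sketch of what that looks like in practice; model would normally be an instance of the Net class built earlier in the chapter (here we reuse the toy module from the sketch above so the snippet stands on its own).

```python
# .parameters() reaches every parameter of every registered submodule,
# which is exactly what the optimizer relies on.
model = NetWithPredict()
numel_list = [p.numel() for p in model.parameters()]
print(sum(numel_list), numel_list)  # total parameter count, plus per-tensor counts
```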

⁶ The dimensions in the pixel-wise linear mapping defined by the first convolution were emphasized by Jeremy Howard in his fast.ai course (https://www.fast.ai).

⁷ Outside of and older than deep learning, projecting into high-dimensional space and then doing conceptually simpler (than linear) machine learning is commonly known as the kernel trick. The initial increase in the number of channels could be seen as a somewhat similar phenomenon, but striking a different balance between the cleverness of the embedding and the simplicity of the model working on the embedding.
