
the context of a very deep network. The derivatives of the loss function with respect to the parameters, especially those in early layers, need to be multiplied by a lot of other numbers originating from the chain of derivative operations between the loss and the parameter. Those numbers being multiplied could be small, generating ever-smaller numbers, or large, swallowing smaller numbers due to floating-point approximation. The bottom line is that a long chain of multiplications will tend to make the contribution of the parameter to the gradient vanish, leading to ineffective training of that layer, since that parameter and others like it won’t be properly updated.
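
To make the chain-of-multiplications effect concrete, here is a small sketch (not from the book; the depth, width, and Tanh activation are arbitrary choices for the demonstration) that stacks many plain fully connected layers and compares the average gradient magnitude reaching the first and last layers:

# Hypothetical illustration of vanishing gradients in a deep plain stack.
# The architecture here is made up for the demonstration, not the book's model.
import torch
import torch.nn as nn

torch.manual_seed(0)

depth = 50
layers = []
for _ in range(depth):
    layers += [nn.Linear(32, 32), nn.Tanh()]
model = nn.Sequential(*layers)

x = torch.randn(16, 32)
loss = model(x).pow(2).mean()
loss.backward()

grad_first = model[0].weight.grad.abs().mean().item()   # earliest layer
grad_last = model[-2].weight.grad.abs().mean().item()   # last Linear layer
print(f"mean |grad| first layer: {grad_first:.2e}")
print(f"mean |grad| last layer:  {grad_last:.2e}")
# With default initialization, the gradient reaching the first layer is
# typically orders of magnitude smaller than the one at the last layer.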

In December 2015, Kaiming He and coauthors presented residual networks (ResNets), an architecture that uses a simple trick to allow very deep networks to be successfully trained (https://arxiv.org/abs/1512.03385). That work opened the door to networks ranging from tens of layers to 100 layers in depth, surpassing the then state of the art in computer vision benchmark problems. We encountered residual networks when we were playing with pretrained models in chapter 2. The trick we mentioned is the following: using a skip connection to short-circuit blocks of layers, as shown in figure 8.11.

A skip connection is nothing but the addition of the input to the output of a block of layers. This is exactly how it is done in PyTorch. Let’s add one layer to our simple convolutional model, and let’s use ReLU as the activation for a change. The vanilla module with an extra layer looks like this:


# In[51]:
class NetDepth(nn.Module):
    def __init__(self, n_chans1=32):
        super().__init__()
        self.n_chans1 = n_chans1
        self.conv1 = nn.Conv2d(3, n_chans1, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(n_chans1, n_chans1 // 2, kernel_size=3,
                               padding=1)
        self.conv3 = nn.Conv2d(n_chans1 // 2, n_chans1 // 2,
                               kernel_size=3, padding=1)
        self.fc1 = nn.Linear(4 * 4 * n_chans1 // 2, 32)
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        # Three conv -> ReLU -> 2x2 max-pool blocks, as in figure 8.11,
        # then flatten and apply the two linear layers.
        out = F.max_pool2d(torch.relu(self.conv1(x)), 2)
        out = F.max_pool2d(torch.relu(self.conv2(out)), 2)
        out = F.max_pool2d(torch.relu(self.conv3(out)), 2)
        out = out.view(-1, 4 * 4 * self.n_chans1 // 2)
        out = torch.relu(self.fc1(out))
        out = self.fc2(out)
        return out
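
As a quick sanity check (not part of the book’s listing; the dummy input is just for illustration), the model can be run on a single fake 32 × 32 RGB image to confirm that it produces two class scores:

# Hypothetical smoke test, assuming torch is imported as in the earlier cells.
model = NetDepth()
out = model(torch.randn(1, 3, 32, 32))
print(out.shape)  # torch.Size([1, 2])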

[Figure 8.11 The architecture of our network with three convolutional layers (NetDepth / NetRes): Input (3c, 32×32) → Conv2d(3×3, 3c→nc) → ReLU → MaxPool(2×2) → Conv2d(3×3, nc→n/2c) → ReLU → MaxPool(2×2) → Conv2d(3×3, n/2c→n/2c) → ReLU → MaxPool(2×2) → View → Linear((n/2·16)d→32d) → ReLU → Linear(32d→2d) → Output (2d), with a skip connection around the third convolution. The skip connection is what differentiates NetRes from NetDepth.]
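
Coming back to the skip connection itself: since it is just the addition of a block’s input to that block’s output, a minimal sketch of how the forward pass of the residual variant in figure 8.11 might look (assuming the same layers as NetDepth above; this is an illustration, not the book’s NetRes listing) is:

# Hypothetical forward pass with a skip connection around the third conv block.
# Layer names and shapes are assumed to match NetDepth above.
def forward(self, x):
    out = F.max_pool2d(torch.relu(self.conv1(x)), 2)
    out = F.max_pool2d(torch.relu(self.conv2(out)), 2)
    out1 = out                                                  # remember the block's input
    out = F.max_pool2d(torch.relu(self.conv3(out)) + out1, 2)   # add it back to the output
    out = out.view(-1, 4 * 4 * self.n_chans1 // 2)
    out = torch.relu(self.fc1(out))
    out = self.fc2(out)
    return out

The addition works because conv3 keeps the number of channels and the spatial size unchanged, so its input and output tensors have identical shapes.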
