Deep-Learning-with-PyTorch

Convolutions in action

solve our birds versus airplanes problem effectively, since although CIFAR-10 images are small, the objects still have a (wing-)span several pixels across.

One possibility could be to use large convolution kernels. Well, sure, at the limit we could get a 32 × 32 kernel for a 32 × 32 image, but we would converge to the old fully connected, affine transformation and lose all the nice properties of convolution.

Another option, which is used in convolutional neural networks, is stacking one convolution after the other and at the same time downsampling the image between successive convolutions.
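As a rough sketch of this stacking idea (the layer sizes here are illustrative, not a model from the book; the downsampling uses 2 × 2 max pooling, discussed next), each convolution keeps the spatial size while the pooling step in between halves it:

```python
import torch
import torch.nn as nn

# Illustrative stack: convolution, activation, downsample, repeat.
# Shapes in the comments are channels x height x width.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3x32x32 -> 16x32x32
    nn.Tanh(),
    nn.MaxPool2d(2),                             # 16x32x32 -> 16x16x16
    nn.Conv2d(16, 8, kernel_size=3, padding=1),  # 16x16x16 -> 8x16x16
    nn.Tanh(),
    nn.MaxPool2d(2),                             # 8x16x16 -> 8x8x8
)

x = torch.randn(1, 3, 32, 32)  # one fake CIFAR-10-sized image
print(model(x).shape)          # torch.Size([1, 8, 8, 8])
```

After two rounds of convolution plus downsampling, a kernel that only ever looks at a 3 × 3 neighborhood effectively sees an 8 × 8 patch of the original image, which is how small kernels can still capture wing-sized structures.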

FROM LARGE TO SMALL: DOWNSAMPLING

Downsampling could in principle occur in different ways. Scaling an image by half is the equivalent of taking four neighboring pixels as input and producing one pixel as output. How we compute the value of the output based on the values of the input is up to us. We could

• Average the four pixels. This average pooling was a common approach early on but has fallen out of favor somewhat.
• Take the maximum of the four pixels. This approach, called max pooling, is currently the most commonly used, but it has the downside of discarding the other three-quarters of the data.
• Perform a strided convolution, where only every Nth pixel is calculated. A 3 × 4 convolution with stride 2 still incorporates input from all pixels in the previous layer. The literature shows promise for this approach, but it has not yet supplanted max pooling.
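All three options above are available as ready-made PyTorch modules, and each halves the spatial dimensions in the same way (the channel count and padding here are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 32, 32)  # one single-channel 32x32 image

avg = nn.AvgPool2d(2)   # average the four pixels of each 2x2 tile
mx = nn.MaxPool2d(2)    # take the maximum of each 2x2 tile
# Strided convolution: stride=2 computes only every 2nd output pixel,
# but the 3x3 kernel still reads every input pixel.
strided = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)

for layer in (avg, mx, strided):
    print(layer(x).shape)  # each yields torch.Size([1, 1, 16, 16])
```

Swapping one module for another in a network changes only what information survives the downsampling, not the output geometry.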

We will be focusing on max pooling, illustrated in figure 8.7, going forward. The figure shows the most common setup: taking non-overlapping 2 × 2 tiles and taking the maximum over each of them as the new pixel at the reduced scale.

Intuitively, the output images from a convolution layer, especially since they are followed by an activation just like any other linear layer, tend to have a high magnitude where the features corresponding to the kernel are detected.

Figure 8.7 Max pooling in detail. The 4 × 4 input (the output of conv + activation)

    2 2 2 0
    2 5 2 1
    2 2 2 0
    0 1 0 0

is split into non-overlapping 2 × 2 tiles, and the maximum of each tile becomes one pixel of the 2 × 2 output:

    5 2
    2 2
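We can reproduce figure 8.7's numbers directly with `nn.MaxPool2d`:

```python
import torch
import torch.nn as nn

# The 4x4 input from figure 8.7, with a leading channel dimension
# since MaxPool2d expects (C, H, W) or (N, C, H, W) input.
inp = torch.tensor([[2., 2., 2., 0.],
                    [2., 5., 2., 1.],
                    [2., 2., 2., 0.],
                    [0., 1., 0., 0.]]).unsqueeze(0)

pool = nn.MaxPool2d(2)  # non-overlapping 2x2 tiles
print(pool(inp))
# tensor([[[5., 2.],
#          [2., 2.]]])
```

The lone 5 survives because it is the maximum of its tile; the three 2s in the same tile are discarded, which is exactly the data loss mentioned above.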
