The case for convolutions

Of course, this approach is more than impractical. Fortunately, there is a readily available, local, translation-invariant linear operation on the image: a convolution. We can come up with a more compact description of a convolution, but what we are going to describe is exactly what we just delineated, only taken from a different angle.

Convolution, or more precisely, discrete convolution¹ (there’s an analogous continuous version that we won’t go into here), is defined for a 2D image as the scalar product of a weight matrix, the kernel, with every neighborhood in the input. Consider a 3 × 3 kernel (in deep learning, we typically use small kernels; we’ll see why later on) as a 2D tensor

weight = torch.tensor([[w00, w01, w02],
                       [w10, w11, w12],
                       [w20, w21, w22]])

and a 1-channel, M × N image:

image = torch.tensor([[i00, i01, i02, i03, ..., i0N],
                      [i10, i11, i12, i13, ..., i1N],
                      [i20, i21, i22, i23, ..., i2N],
                      [i30, i31, i32, i33, ..., i3N],
                      ...
                      [iM0, iM1, iM2, iM3, ..., iMN]])

We can compute an element of the output image (without bias) as follows:

o11 = i11 * w00 + i12 * w01 + i13 * w02 +
      i21 * w10 + i22 * w11 + i23 * w12 +
      i31 * w20 + i32 * w21 + i33 * w22

Figure 8.1 shows this computation in action.
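To see this agree with PyTorch’s own convolution, here is a minimal runnable sketch; the 5 × 5 size and the random values are our choices for illustration, not anything prescribed by the text:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
image = torch.randn(5, 5)   # a small 1-channel image with arbitrary values
weight = torch.randn(3, 3)  # a 3 x 3 kernel

# The nine-term sum above: the kernel translated so that w00 sits on i11.
o11 = (image[1:4, 1:4] * weight).sum()

# F.conv2d expects (batch, channel, height, width) tensors; without padding,
# output element (1, 1) covers the same 3 x 3 neighborhood.
out = F.conv2d(image[None, None], weight[None, None])
print(torch.allclose(o11, out[0, 0, 1, 1]))  # True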

That is, we “translate” the kernel to the i11 location of the input image, and we multiply each weight by the value of the input image at the corresponding location. Thus, the output image is created by translating the kernel over all input locations and performing the weighted sum. For a multichannel image, like our RGB image, the weight matrix would be a 3 × 3 × 3 matrix: one set of weights for every channel, contributing together to the output values.
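For instance, a short sketch (this layer is just an example for inspection, not part of any model here) shows how PyTorch lays out such a kernel:

import torch.nn as nn

# A convolution from a 3-channel RGB input to one output channel.
conv = nn.Conv2d(in_channels=3, out_channels=1, kernel_size=3)

# The weight is stored as (out_channels, in_channels, height, width):
# one 3 x 3 set of weights per input channel, summed into each output value.
print(conv.weight.shape)  # torch.Size([1, 3, 3, 3])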

Note that, just like the elements in the weight matrix of nn.Linear, the weights in the kernel are not known in advance, but they are initialized randomly and updated through backpropagation. Note also that the same kernel, and thus each weight in the kernel, is reused across the whole image. Thinking back to autograd, this means the use of each weight has a history spanning the entire image. Thus, the derivative of the loss with respect to a convolution weight includes contributions from the entire image.
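We can watch this happen in autograd with a small sketch; the 1 × 1 kernel is our choice, made purely so the arithmetic stays easy to read:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
image = torch.randn(1, 1, 5, 5)
weight = torch.randn(1, 1, 1, 1, requires_grad=True)  # one shared weight

# With a 1 x 1 kernel, every output pixel is weight * input pixel.
loss = F.conv2d(image, weight).sum()
loss.backward()

# The gradient of the shared weight is the sum over the whole image:
# every input location contributes to it.
print(torch.allclose(weight.grad, image.sum().reshape(1, 1, 1, 1)))  # True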

¹ There is a subtle difference between PyTorch’s convolution and mathematics’ convolution: one argument’s sign is flipped. If we were in a pedantic mood, we could call PyTorch’s convolutions discrete cross-correlations.
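If we ever want the textbook definition, flipping the kernel before the sliding dot product recovers it; a quick sketch with made-up tensors:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
image = torch.randn(1, 1, 5, 5)
weight = torch.randn(1, 1, 3, 3)

# F.conv2d slides the kernel as-is, which is a cross-correlation.
cross_corr = F.conv2d(image, weight)

# Flipping the kernel along both spatial dimensions yields the
# mathematical convolution.
true_conv = F.conv2d(image, torch.flip(weight, dims=[2, 3]))
print(torch.equal(cross_corr, true_conv))  # False in general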
