
Our first-pass neural network design


self.conv1 = nn.Conv3d(
    in_channels, conv_channels, kernel_size=3, padding=1, bias=True,
)
self.relu1 = nn.ReLU(inplace=True)
self.conv2 = nn.Conv3d(
    conv_channels, conv_channels, kernel_size=3, padding=1, bias=True,
)
self.relu2 = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool3d(2, 2)

def forward(self, input_batch):
    block_out = self.conv1(input_batch)
    block_out = self.relu1(block_out)    # These could be implemented as
    block_out = self.conv2(block_out)    # calls to the functional API instead.
    block_out = self.relu2(block_out)
    return self.maxpool(block_out)
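As the annotation notes, the stateless operations (ReLU and max pooling) could be expressed through the functional API instead of being registered as modules. A minimal sketch of that variant follows; the class name is our own, and the convolution parameters mirror the listing above:

```python
import torch
from torch import nn
import torch.nn.functional as F

class LunaBlockFunctional(nn.Module):
    # Hypothetical functional-API variant of the block: only the layers with
    # learned parameters (the convolutions) are registered as submodules;
    # ReLU and max pooling carry no state, so they can be plain function
    # calls inside forward().
    def __init__(self, in_channels, conv_channels):
        super().__init__()
        self.conv1 = nn.Conv3d(
            in_channels, conv_channels, kernel_size=3, padding=1, bias=True,
        )
        self.conv2 = nn.Conv3d(
            conv_channels, conv_channels, kernel_size=3, padding=1, bias=True,
        )

    def forward(self, input_batch):
        block_out = F.relu(self.conv1(input_batch))
        block_out = F.relu(self.conv2(block_out))
        return F.max_pool3d(block_out, 2, 2)
```

Both styles produce identical results; keeping modules as attributes makes the layers show up when the model is printed, while the functional form is slightly more compact.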

Finally, the head of the network takes the output from the backbone and converts it into the desired output form. For convolutional networks, this often involves flattening the intermediate output and passing it to a fully connected layer. For some networks, it makes sense to also include a second fully connected layer, although that is usually more appropriate for classification problems in which the imaged objects have more structure (think about cars versus trucks having wheels, lights, grill, doors, and so on) and for projects with a large number of classes. Since we are only doing binary classification, and we don’t seem to need the additional complexity, we have only a single flattening layer.
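A head along those lines might look like the following sketch. The class name and the channel and voxel counts here are illustrative assumptions, not values from the book:

```python
import torch
from torch import nn

class ClassificationHead(nn.Module):
    # Hypothetical head: flatten the backbone's output (keeping the batch
    # dimension) and map it to two logits for binary classification.
    def __init__(self, in_features, num_classes=2):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, backbone_out):
        flat = backbone_out.flatten(start_dim=1)  # (N, C, D, H, W) -> (N, C*D*H*W)
        return self.fc(flat)
```

Note that `in_features` must match the flattened size of the backbone's output, which depends on the input resolution and the number of pooling stages.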

Using a structure like this can be a good first building block for a convolutional network. There are more complicated designs out there, but for many projects they’re overkill in terms of both implementation complexity and computational demands. It’s a good idea to start simple and add complexity only when there’s a demonstrable need for it.

We can see the convolutions of our block represented in 2D in figure 11.6. Since this is a small portion of a larger image, we ignore padding here. (Note that the ReLU activation function is not shown, as applying it does not change the image sizes.)

Let’s walk through the information flow between our input voxels and a single voxel of output. We want to have a strong sense of how our output will respond when the inputs change. It might be a good idea to review chapter 8, particularly sections 8.1 through 8.3, just to make sure you’re 100% solid on the basic mechanics of convolutions.

We’re using 3 × 3 × 3 convolutions in our block. A single 3 × 3 × 3 convolution has a receptive field of 3 × 3 × 3, which is almost tautological. Twenty-seven voxels are fed in, and one comes out.
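That 27-in, 1-out relationship can be checked directly: a 3 × 3 × 3 convolution with no padding reduces a 3 × 3 × 3 input to a single voxel. The layer and tensor sizes here are just for the check:

```python
import torch
from torch import nn

# A single 3x3x3 convolution with no padding: each output voxel's receptive
# field is exactly the 3x3x3 = 27 input voxels under the kernel.
conv = nn.Conv3d(1, 1, kernel_size=3, padding=0, bias=False)

patch = torch.randn(1, 1, 3, 3, 3)  # batch, channel, depth, height, width
out = conv(patch)
print(patch.numel(), "->", out.numel())  # prints: 27 -> 1
```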

It gets interesting when we use two 3 × 3 × 3 convolutions stacked back to back. Stacking convolutional layers allows the final output voxel (or pixel) to be influenced by an input further away than the size of the convolutional kernel suggests. If that output

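The receptive-field growth from stacking can also be verified numerically: two 3 × 3 × 3 convolutions without padding collapse a 5 × 5 × 5 input to a single voxel, so that one output is influenced by all 125 inputs. A quick sketch, with sizes chosen only for the check:

```python
import torch
from torch import nn

# Two stacked 3x3x3 convolutions with no padding: the single output voxel of
# the second layer depends on a 5x5x5 region of the original input.
stack = nn.Sequential(
    nn.Conv3d(1, 1, kernel_size=3, padding=0, bias=False),
    nn.Conv3d(1, 1, kernel_size=3, padding=0, bias=False),
)

patch = torch.randn(1, 1, 5, 5, 5)
out = stack(patch)
print(patch.numel(), "->", out.numel())  # prints: 125 -> 1
```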