Pretraining setup and initialization

Using SGD is generally considered a safe place to start when it comes to picking an optimizer; there are some problems that might not work well with SGD, but they're relatively rare. Similarly, a learning rate of 0.001 and a momentum of 0.9 are pretty safe choices. Empirically, SGD with those values has worked reasonably well for a wide range of projects, and it's easy to try a learning rate of 0.01 or 0.0001 if things aren't working well right out of the box.
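
In PyTorch, that default configuration is a one-liner. Here is a minimal sketch; the Linear module is a stand-in purely for illustration, not the model our project actually uses:

import torch

# Stand-in model; any torch.nn.Module with parameters works here.
model = torch.nn.Linear(16, 2)

# The "safe default" configuration discussed above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)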

That’s not to say any of those values is the best for our use case, but trying to find better

ones is getting ahead of ourselves. Systematically trying different values for learning

rate, momentum, network size, and other similar configuration settings is called a hyperparameter

search. There are other, more glaring issues we need to address first in the coming

chapters. Once we address those, we can begin to fine-tune these values. As we

mentioned in the section “Testing other optimizers” in chapter 5, there are also other,

more exotic optimizers we might choose; but other than perhaps swapping

torch.optim.SGD for torch.optim.Adam, understanding the trade-offs involved is a

topic too advanced for this book.
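
For reference, that swap really is a one-line change (again using a hypothetical stand-in model; Adam's default learning rate happens to be 0.001, so the explicit value below just makes the choice visible):

import torch

# Stand-in model, as before.
model = torch.nn.Linear(16, 2)

# Swapping SGD for Adam; Adam manages momentum-like terms internally.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)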

11.3.2 Care and feeding of data loaders

The LunaDataset class that we built in the last chapter acts as the bridge between whatever Wild West data we have and the somewhat more structured world of tensors that the PyTorch building blocks expect. For example, torch.nn.Conv3d (https://pytorch.org/docs/stable/nn.html#conv3d) expects five-dimensional input, (N, C, D, H, W): number of samples, channels per sample, depth, height, and width. Quite different from the native 3D our CT provides!
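
A quick shape check makes that requirement concrete. This is a sketch; the channel count and spatial sizes are chosen for illustration:

import torch

# Conv3d expects input of shape (N, C, D, H, W).
conv = torch.nn.Conv3d(in_channels=1, out_channels=8, kernel_size=3)

# A hypothetical batch: 2 samples, 1 channel, 32 x 48 x 48 voxels.
batch = torch.randn(2, 1, 32, 48, 48)
print(conv(batch).shape)  # torch.Size([2, 8, 30, 46, 46])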

You may recall the ct_t.unsqueeze(0) call in LunaDataset.__getitem__ from the last chapter; it provides the fourth dimension, a "channel" for our data. Recall from chapter 4 that an RGB image has three channels, one each for red, green, and blue. Astronomical data could have dozens, one each for various slices of the electromagnetic spectrum: gamma rays, X-rays, ultraviolet light, visible light, infrared, microwaves, and/or radio waves. Since CT scans are single-intensity, our channel dimension is only size 1.
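
In tensor terms, the effect of that call looks like the following sketch, with a random tensor standing in for the CT chunk:

import torch

# Stand-in for a CT chunk: depth x height x width, no channel dim yet.
ct_t = torch.randn(32, 48, 48)

# unsqueeze(0) prepends the channel dimension, giving (C, D, H, W).
ct_t = ct_t.unsqueeze(0)
print(ct_t.shape)  # torch.Size([1, 32, 48, 48])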

Also recall from part 1 that training on single samples at a time is typically an inefficient use of computing resources, because most processing platforms are capable of more parallel calculations than are required by a model to process a single training or validation sample. The solution is to group sample tuples together into a batch tuple, as in figure 11.4, allowing multiple samples to be processed at the same time. The fifth dimension (N) differentiates multiple samples in the same batch.
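
PyTorch's DataLoader performs that batching for us. The sketch below uses a TensorDataset of random chunks in place of LunaDataset, purely to keep the example self-contained:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten stand-in single-channel CT chunks of shape (1, 32, 48, 48) each.
chunks = torch.randn(10, 1, 32, 48, 48)
labels = torch.zeros(10, dtype=torch.long)

loader = DataLoader(TensorDataset(chunks, labels), batch_size=4)
batch, batch_labels = next(iter(loader))
print(batch.shape)  # torch.Size([4, 1, 32, 48, 48]), that is, (N, C, D, H, W)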
