CHAPTER 11 Training a classification model to detect suspected tumors

def initModel(self):
    model = LunaModel()
    if self.use_cuda:
        log.info("Using CUDA; {} devices.".format(torch.cuda.device_count()))
        if torch.cuda.device_count() > 1:      # Detects multiple GPUs
            model = nn.DataParallel(model)     # Wraps the model
        model = model.to(self.device)          # Sends model parameters to the GPU
    return model

def initOptimizer(self):
    return SGD(self.model.parameters(), lr=0.001, momentum=0.99)

If the system used for training has more than one GPU, we will use the nn.DataParallel class to distribute the work between all of the GPUs in the system and then collect and resync parameter updates and so on. This is almost entirely transparent in terms of both the model implementation and the code that uses that model, as the sketch below illustrates.
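As a quick illustration of that transparency, here is a minimal sketch, not taken from the book's codebase: it uses a stand-in nn.Sequential in place of LunaModel (our own assumption for brevity) and mirrors the wrap-then-move ordering of initModel. The only place the wrapper shows through is the .module attribute.

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model for illustration; the book's LunaModel would be used in practice.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))

# Mirror initModel: wrap only when more than one GPU is present, then move.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

# The forward pass looks identical whether or not the wrapper is in place;
# DataParallel scatters the input batch across the GPUs behind the scenes.
batch = torch.randn(4, 16, device=device)
output = model(batch)

# The one visible difference: the original module sits behind .module, which
# matters when saving a state_dict or inspecting layers directly.
underlying = model.module if isinstance(model, nn.DataParallel) else model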

DataParallel vs. DistributedDataParallel

In this book, we use DataParallel to handle utilizing multiple GPUs. We chose DataParallel because it's a simple drop-in wrapper around our existing models. It is not the best-performing solution for using multiple GPUs, however, and it is limited to working with the hardware available in a single machine.

PyTorch also provides DistributedDataParallel, which is the recommended wrapper class to use when you need to spread work between more than one GPU or machine. Since the proper setup and configuration are nontrivial, and we suspect that the vast majority of our readers won't see any benefit from the complexity, we won't cover DistributedDataParallel in this book. If you wish to learn more, we suggest reading the official documentation: https://pytorch.org/tutorials/intermediate/ddp_tutorial.html.
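For readers who do want a starting point, the following is a minimal single-machine sketch of the DistributedDataParallel pattern described in that tutorial. It is not part of this book's code; the tiny nn.Linear model, the port number, and the one-process-per-GPU layout are illustrative assumptions.

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def ddp_worker(rank, world_size):
    # One process per GPU; MASTER_ADDR/MASTER_PORT let the processes rendezvous.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Stand-in model; a real training app would construct its own model here.
    model = nn.Linear(16, 2).to(rank)
    ddp_model = DDP(model, device_ids=[rank])

    # ... build the optimizer from ddp_model.parameters() and run the training
    # loop here; gradients are synchronized across processes on backward() ...

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(ddp_worker, args=(world_size,), nprocs=world_size)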

Assuming that self.use_cuda is true, the call model.to(self.device) moves the model parameters to the GPU, setting up the various convolutions and other calculations to use the GPU for the heavy numerical lifting. It's important to do so before constructing the optimizer, since otherwise the optimizer would be left looking at the CPU-based parameter objects rather than those copied to the GPU.
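As a small sketch of that ordering, using a stand-in nn.Linear model (our assumption, not the book's code):

import torch
import torch.nn as nn
from torch.optim import SGD

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model; in the book this is the LunaModel built in initModel.
model = nn.Linear(16, 2)

# Move the model first, then construct the optimizer, so the optimizer holds
# references to the GPU-resident parameter tensors.
model = model.to(device)
optimizer = SGD(model.parameters(), lr=0.001, momentum=0.99)

# Reversing the two steps would leave the optimizer tracking the old CPU
# parameter objects, so optimizer.step() would never touch the tensors the
# GPU-resident model actually uses.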

For our optimizer, we'll use basic stochastic gradient descent (SGD; https://pytorch.org/docs/stable/optim.html#torch.optim.SGD) with momentum. We first saw this optimizer in chapter 5. Recall from part 1 that many different optimizers are available in PyTorch; while we won't cover most of them in any detail, the official documentation (https://pytorch.org/docs/stable/optim.html#algorithms) does a good job of linking to the relevant papers.
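To make the drop-in nature of those optimizers concrete, here is a brief sketch; the nn.Linear stand-in and the Adam learning rate are illustrative assumptions, not values from this project.

import torch.nn as nn
from torch.optim import SGD, Adam

# Stand-in model; the book passes self.model.parameters() instead.
model = nn.Linear(16, 2)

# The optimizer used in this chapter: plain SGD with momentum.
optimizer = SGD(model.parameters(), lr=0.001, momentum=0.99)

# Any other torch.optim optimizer drops in with the same interface, e.g.:
# optimizer = Adam(model.parameters(), lr=3e-4)

# Whatever the choice, per-batch usage is identical:
#     optimizer.zero_grad()
#     loss.backward()
#     optimizer.step()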
