
PyTorch’s autograd: Backpropagating all things


Figure 5.10 The forward graph and backward graph of the model as computed with autograd

ACCUMULATING GRAD FUNCTIONS

We could have any number of tensors with requires_grad set to True and any composition of functions. In this case, PyTorch would compute the derivatives of the loss throughout the chain of functions (the computation graph) and accumulate their values in the grad attribute of those tensors (the leaf nodes of the graph).
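To make this concrete, here is a minimal sketch with made-up values (not the book's model): a leaf tensor created with requires_grad=True, a short chain of operations, and a backward call that writes the derivative of the loss into the leaf's grad attribute.

import torch

# Toy values for illustration only
params = torch.tensor([1.0, 0.0], requires_grad=True)   # leaf node
x = torch.tensor([1.0, 2.0, 3.0])
y = params[0] * x + params[1]          # a composition of functions
loss = (y ** 2).mean()

loss.backward()        # walk the computation graph backward from loss
print(params.grad)     # d(loss)/d(params), stored on the leaf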

Alert! Big gotcha ahead. This is something PyTorch newcomers (and a lot of more experienced folks, too) trip up on regularly. We just wrote accumulate, not store.

WARNING Calling backward will lead derivatives to accumulate at leaf nodes. We need to zero the gradient explicitly after using it for parameter updates.

Let's repeat together: calling backward will lead derivatives to accumulate at leaf nodes. So if backward was called earlier, the loss is evaluated again, backward is called again (as in any training loop), and the gradient at each leaf is accumulated (that is, summed) on top of the one computed at the previous iteration, which leads to an incorrect value for the gradient.
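Here is a minimal sketch of the problem, again with made-up values: calling backward a second time (after re-evaluating the loss, as a training loop does) sums the new derivative into grad instead of replacing it.

import torch

w = torch.tensor(3.0, requires_grad=True)

loss = w * 2.0
loss.backward()
print(w.grad)          # tensor(2.)  -- d(loss)/dw

loss = w * 2.0         # re-evaluate the loss, as any training loop would
loss.backward()
print(w.grad)          # tensor(4.)  -- summed on top of the previous gradient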

In order to prevent this from occurring, we need to zero the gradient explicitly at each iteration. We can do this easily using the in-place zero_ method:

# In[8]:
if params.grad is not None:
    params.grad.zero_()
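To see where this fits, here is a sketch of a training loop; model, loss_fn, t_u, t_c, learning_rate, and n_epochs are placeholders standing in for whatever forward function, loss, data, and hyperparameters the surrounding code defines.

import torch

for epoch in range(n_epochs):
    if params.grad is not None:        # grad is None before the first backward
        params.grad.zero_()            # clear the accumulated gradient in place

    t_p = model(t_u, *params)          # forward pass (placeholder model)
    loss = loss_fn(t_p, t_c)
    loss.backward()                    # derivatives accumulate into params.grad

    with torch.no_grad():              # the update itself must not be tracked
        params -= learning_rate * params.grad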
