
import torch
import torch.nn as nn

torch.manual_seed(42)

# A random parameter and some fake gradients to play with
parm = nn.Parameter(torch.randn(2, 1))
fake_grads = torch.tensor([[2.5], [.8]])

We’re also generating the fake gradients above so we can manually set them as if they were the computed gradients of our random parameters. We’ll use these gradients to illustrate two different ways of clipping them.

Value Clipping

This is the most straightforward way: It clips gradients element-wise so they stay inside the [-clip_value, +clip_value] range. We can use PyTorch’s nn.utils.clip_grad_value_() to clip gradients in-place:

# Set the fake gradients as if they had been computed by backpropagation
parm.grad = fake_grads.clone()

# Gradient Value Clipping: clips each element to [-1.0, 1.0] in-place
nn.utils.clip_grad_value_(parm, clip_value=1.0)

parm.grad.view(-1,)

Output

tensor([1.0000, 0.8000])

The first gradient got clipped, while the other one kept its original value. It doesn’t get any simpler than that.
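Under the hood, value clipping boils down to an element-wise clamp. A minimal sketch, assuming the same fake_grads tensor as above, shows that torch.clamp() produces the same result:

# Sketch: element-wise clamping to [-clip_value, +clip_value]
clip_value = 1.0
manual_clip = fake_grads.clamp(min=-clip_value, max=clip_value)
manual_clip.view(-1,)  # tensor([1.0000, 0.8000]), same as clip_grad_value_()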

Now, pause for a moment and think of the gradients above as the steps gradient descent is taking along two different dimensions to navigate the loss surface toward (some) minimum value. What happens if we clip some of these steps? We’re actually changing the direction of its path toward the minimum. Figure E.5 illustrates both vectors, original and clipped.
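We can make that direction change concrete with a quick check, assuming the same fake_grads tensor as above: the cosine of the angle between the original and clipped gradient vectors is below one, so they are no longer parallel:

# Sketch: comparing the directions of the original and clipped gradients
original = fake_grads.view(-1)                   # tensor([2.5000, 0.8000])
clipped = fake_grads.clamp(-1.0, 1.0).view(-1)   # tensor([1.0000, 0.8000])
cosine = torch.dot(original, clipped) / (original.norm() * clipped.norm())
cosine  # roughly 0.93, so the clipped step points in a different direction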

By clipping values, we’re modifying the gradients in such a way that not only is the step smaller, but it is also in a different direction. Is this a problem? No, not necessarily. Can we avoid changing directions? Yes, we can; that’s what norm clipping is good for.
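As a quick preview, a minimal sketch of norm clipping, assuming the same parm and fake_grads as above, uses nn.utils.clip_grad_norm_() to rescale the whole gradient vector instead of clipping each element:

# Sketch: norm clipping rescales all gradients by the same factor
parm.grad = fake_grads.clone()
nn.utils.clip_grad_norm_(parm, max_norm=1.0)
parm.grad.view(-1,)  # roughly tensor([0.9524, 0.3048])
# The ratio between the two gradients is still 2.5 / 0.8 = 3.125,
# so the step got shorter but kept its original direction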

