
Figure E.7 - Losses—clipping by value

What about taking a look at the average gradients once again (there are 320 updates now, so we're looking at the extremes only)?

import numpy as np

# Stack the stored gradients of fc1's weights (one array per update)
# and average over the weight dimensions, leaving one value per update
avg_grad = np.array(
    sbs_reg_clip._gradients['fc1']['weight']
).mean(axis=(1, 2))
avg_grad.min(), avg_grad.max()

Output

(-24.69288555463155, 14.385948762893676)

"How come these (absolute) values are much larger than our clipping

value?"

These are the computed gradients; that is, before clipping. Left unchecked, these gradients would have caused large updates, which, in turn, would have resulted in even larger gradients, and so on. Explosion, basically. But these values were all clipped before being used in the parameter update, so all went well with the model training.
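To make that sequencing concrete, here is a minimal sketch of a single training step that clips by value between the backward pass and the parameter update. The model, data, optimizer, and clip value of 1.0 are placeholders for illustration, not the book's actual setup:

import torch
import torch.nn as nn

# Hypothetical model, loss, and optimizer for illustration only
model = nn.Linear(10, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(16, 10)
y = torch.randn(16, 1)

loss = loss_fn(model(x), y)
loss.backward()  # gradients are computed here (possibly very large)

# Clamp every gradient element to [-1.0, 1.0] BEFORE the update,
# so the optimizer only ever sees clipped gradients
nn.utils.clip_grad_value_(model.parameters(), clip_value=1.0)

optimizer.step()
optimizer.zero_grad()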

It is possible to take a more aggressive approach and clip the gradients at the origin using the backward hooks we discussed before.
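As a rough sketch of that idea, the snippet below registers a backward hook on each parameter so its gradient is clamped the moment it is computed during backprop, instead of after the whole backward pass. The toy model and clip value are assumptions for illustration:

import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # hypothetical model for illustration
clip_value = 1.0

# The hook fires during backprop, clamping each parameter's
# gradient as soon as it is computed ("at the origin")
for p in model.parameters():
    if p.requires_grad:
        p.register_hook(
            lambda grad: torch.clamp(grad, -clip_value, clip_value)
        )

x = torch.randn(16, 10)
loss = model(x).pow(2).mean()
loss.backward()  # gradients arrive already clipped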

