
Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)


Output

-3.044811379650508 -1.8337537171510832

Visualizing Gradients

Since the gradient for b is larger in absolute value (3.04) than the gradient for w (1.83), the answer to the question I posed you in the "Cross-Sections" section is: the black curve (b changes, w is constant) yields the largest changes in loss.
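The two gradients above come from differentiating the MSE loss with respect to each parameter. A minimal sketch of that computation, using synthetic data and an arbitrary starting point (the book's actual dataset and parameter values are not reproduced here, so the printed numbers will differ):

```python
import numpy as np

# Illustrative data only -- NOT the book's dataset
np.random.seed(42)
x = np.random.rand(100)
y = 1.0 + 2.0 * x + 0.1 * np.random.randn(100)

b, w = 0.5, -0.5  # arbitrary current parameters

# MSE loss is mean((yhat - y)^2), with yhat = b + w * x,
# so its partial derivatives are:
error = (b + w * x) - y
b_grad = 2 * error.mean()        # dMSE/db = 2 * mean(error)
w_grad = 2 * (x * error).mean()  # dMSE/dw = 2 * mean(x * error)
print(b_grad, w_grad)
```

In the book's setup, this is the computation that produced the two values shown in the output above.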

"Why is that?"

To answer that, let's first put both cross-section plots side by side, so we can compare them more easily. What is the main difference between them?

Figure 0.7 - Cross-sections of the loss surface

The curve on the right is steeper. That's your answer! Steeper curves have larger gradients.

Cool! That's the intuition… Now, let's get a bit more geometrical. So, I am zooming in on the regions given by the red and black squares of Figure 0.7.

From the "Cross-Sections" section, we already know that, to minimize the loss, both b and w needed to be increased. So, keeping in the spirit of using gradients, let's increase each parameter a little bit (always keeping the other one fixed!). By the

Step 3 - Compute the Gradients | 39
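The nudge described above can also be checked numerically: bump each parameter by a small amount while holding the other fixed, and compare how much the loss changes along each cross-section. A sketch under the same illustrative assumptions (synthetic data and an arbitrary starting point, not the book's):

```python
import numpy as np

# Illustrative data only -- NOT the book's dataset
np.random.seed(42)
x = np.random.rand(100)
y = 1.0 + 2.0 * x + 0.1 * np.random.randn(100)

def mse(b, w):
    return np.mean((b + w * x - y) ** 2)

b, w = 0.5, -0.5  # arbitrary current parameters
delta = 0.1       # small nudge

loss_now = mse(b, w)
change_b = mse(b + delta, w) - loss_now  # only b changes (black curve)
change_w = mse(b, w + delta) - loss_now  # only w changes (red curve)
print(change_b, change_w)
```

With this setup, increasing either parameter decreases the loss, and the nudge in b (the parameter with the larger absolute gradient) changes the loss by more, which is exactly the behavior the cross-sections illustrate.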
