
way, in this example, a little bit equals 0.12 (for convenience’s sake, so it results in a nicer plot). What effect do these increases have on the loss? Let’s check it out:

Figure 0.8 - Computing (approximate) gradients, geometrically

On the left plot, increasing w by 0.12 yields a loss reduction of 0.21. The geometrically computed and roughly approximate gradient is given by the ratio between the two values: -1.79. How does this result compare to the actual value of the gradient (-1.83)? It is actually not bad for a crude approximation. Could it be better? Sure, if we make the increase in w smaller and smaller (like 0.01, instead of 0.12), we’ll get better and better approximations. In the limit, as the increase approaches zero, we’ll arrive at the precise value of the gradient. Well, that’s the definition of a derivative!
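
In symbols, with the MSE as our loss, the geometric approximation is just a difference quotient, and the actual gradient is its limit as the step goes to zero:

$$\frac{\partial \text{MSE}}{\partial w} = \lim_{\Delta w \to 0} \frac{\text{MSE}(b, w + \Delta w) - \text{MSE}(b, w)}{\Delta w}$$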

The same reasoning goes for the plot on the right: increasing b by the same 0.12 yields a larger loss reduction of 0.35. Larger loss reduction, larger ratio, larger gradient, and a larger error too, since the geometric approximation (-2.90) is farther away from the actual value (-3.04).
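
We can check both approximations directly in code. The sketch below is a minimal, self-contained version of this idea; the synthetic data, the starting values of b and w, and the mse helper are assumptions for illustration, so the printed numbers won’t match the figure exactly, but the pattern is the same.

```python
import numpy as np

# Assumed setup for illustration: synthetic data for y = 1 + 2x + noise and an
# arbitrary starting guess for b and w (not necessarily the book's exact values)
np.random.seed(42)
x = np.random.rand(100, 1)
y = 1 + 2 * x + 0.1 * np.random.randn(100, 1)
b, w = 0.49, -0.13

def mse(b, w):
    # Mean squared error of the predictions yhat = b + w * x
    return (((b + w * x) - y) ** 2).mean()

loss = mse(b, w)

# Analytical gradients of the MSE loss w.r.t. b and w
error = (b + w * x) - y
grad_b = 2 * error.mean()
grad_w = 2 * (x * error).mean()

# Geometric approximation: nudge one parameter "a little bit" and take the
# ratio between the change in the loss and the change in the parameter
for delta in [0.12, 0.01, 1e-6]:
    approx_w = (mse(b, w + delta) - loss) / delta
    approx_b = (mse(b + delta, w) - loss) / delta
    print(f"delta={delta:g}: "
          f"dMSE/dw ~ {approx_w:.3f} (exact {grad_w:.3f}), "
          f"dMSE/db ~ {approx_b:.3f} (exact {grad_b:.3f})")
```

The smaller the step, the closer each ratio gets to its analytical counterpart, which is exactly the limit argument above.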

Time for another question: which curve, red or black, do you think is better for reducing the loss? It should be the black one, right? Well, yes, but it is not as straightforward as we’d like it to be. We’ll dig deeper into this in the "Learning Rate" section.

Backpropagation

Now that you’ve learned about computing the gradient of the loss function w.r.t.

