Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step A Beginner’s Guide-leanpub

Figure 6.22 - Path taken by each SGD flavor

Take the third point in the lower-left part of the black line, for instance: its location is quite different in each of the plots, and thus so are the corresponding gradients. The two plots on the left are already known to us. The new plot in town is the one to the right. The dampening of the oscillations is abundantly clear, but Nesterov's momentum still gets past its target and has to backtrack a little to approach it from the opposite direction. And let me remind you that this is one of the easiest loss surfaces of all!
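The three flavors compared above differ only in their update rules. Here is a minimal pure-Python sketch (not the book's code) of those rules on a one-dimensional bowl-shaped loss, L(w) = w²; the learning rate, momentum factor, and step count are illustrative choices, and the Nesterov variant uses the common "look-ahead gradient" formulation:

```python
def grad(w):
    # Gradient of the bowl-shaped loss L(w) = w**2
    return 2.0 * w

def run(flavor, w=1.0, lr=0.1, beta=0.9, steps=300):
    v = 0.0  # velocity (used by the momentum flavors)
    for _ in range(steps):
        if flavor == "sgd":
            # Vanilla SGD: step against the current gradient
            w = w - lr * grad(w)
        elif flavor == "momentum":
            # Momentum: accumulate a decaying sum of past gradients
            v = beta * v + grad(w)
            w = w - lr * v
        elif flavor == "nesterov":
            # Nesterov: evaluate the gradient at the "look-ahead" point
            v = beta * v + grad(w - lr * beta * v)
            w = w - lr * v
    return w
```

On this bowl, all three flavors eventually reach the minimum at w = 0; the momentum version overshoots and oscillates the most, which is exactly the behavior the figure shows.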

Talking about losses, let’s take a peek at their trajectories.

Figure 6.23 - Losses for each SGD flavor

The plot on the left is there just for comparison; it is the same as before. The one on the right is quite straightforward too, depicting the fact that Nesterov's momentum quickly found its way to a lower loss and slowly approached the optimal value.

The plot in the middle is a bit more intriguing: even though regular momentum produced a path with wild swings over the loss surface (each black dot corresponds to a mini-batch), its loss trajectory oscillates less than Adam's does. This is an artifact of this simple linear regression problem (namely, the bowl-shaped loss surface), and should not be taken as representative of typical behavior.
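For contrast with the momentum flavors, here is a minimal sketch of Adam's bias-corrected update rule (again illustrative, not the book's code) on the same bowl-shaped loss; beta1, beta2, and eps are Adam's usual defaults, while the learning rate and step count are arbitrary choices:

```python
def adam(w=1.0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=300):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = 2.0 * w                          # gradient of L(w) = w**2
        m = beta1 * m + (1 - beta1) * g      # first moment: EWMA of gradients
        v = beta2 * v + (1 - beta2) * g * g  # second moment: EWMA of squared gradients
        m_hat = m / (1 - beta1 ** t)         # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w

w_final = adam()
```

Because Adam divides by the square root of the second moment, its effective step size stays close to the learning rate even as gradients shrink, which is one reason its loss trajectory can keep oscillating near the minimum on a surface this simple.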

Learning Rates | 477
