
After applying each scheduler to SGD with momentum, and to SGD with Nesterov's momentum, we obtain the following paths:

Figure 6.28 - Paths taken by SGD combining momentum and scheduler

Adding a scheduler to the mix seems to have helped the optimizer to achieve a more stable path toward the minimum.
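As a minimal sketch of the recipe above, the snippet below pairs SGD (with Nesterov's momentum; set nesterov=False for plain momentum) with a StepLR scheduler. The toy model, data, and hyperparameter values are illustrative assumptions, not the book's actual setup:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

# Toy model and data, just so the loop runs end to end
model = nn.Linear(1, 1)
x, y = torch.randn(64, 1), torch.randn(64, 1)
loss_fn = nn.MSELoss()

# nesterov=True switches from plain momentum to Nesterov's momentum
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, nesterov=True)
# StepLR halves the learning rate every four epochs (illustrative values)
scheduler = StepLR(optimizer, step_size=4, gamma=0.5)

for epoch in range(12):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()  # one scheduler step per epoch
```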

The general idea behind using a scheduler is to allow the optimizer to alternate between exploring the loss surface (high learning rate phase) and targeting a minimum (low learning rate phase).
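The short sketch below makes that alternation visible using a cyclical scheduler, which explicitly bounces the learning rate between a low and a high value; the bounds and step sizes are made-up values chosen only to print a readable cycle:

```python
import torch
import torch.optim as optim
from torch.optim.lr_scheduler import CyclicLR

# A single dummy parameter, just so the optimizer has something to update
dummy = torch.nn.Parameter(torch.zeros(1))
optimizer = optim.SGD([dummy], lr=0.1, momentum=0.9)
# The learning rate oscillates between base_lr (targeting) and max_lr (exploring)
scheduler = CyclicLR(optimizer, base_lr=0.025, max_lr=0.1,
                     step_size_up=4, mode='triangular')

for step in range(16):
    optimizer.step()
    scheduler.step()
    print(f"step {step:2d}: lr = {scheduler.get_last_lr()[0]:.4f}")
```

Printing the learning rate at each step shows it ramping up toward max_lr and back down toward base_lr, which is exactly the explore/target alternation described above.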

What is the impact of the scheduler on loss trajectories? Let’s check it out!
