Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)


different optimizer, set them to capture parameters, and train them for ten epochs.
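A minimal sketch of that setup follows. It is a stand-in for the book's own training helper, not its actual code: the synthetic data, learning rates, and batch size here are all assumptions, chosen only to make the two runs comparable.

```python
import torch
import torch.nn as nn

# Synthetic linear data (assumed): y = 1 + 2x + noise, so the "optimal"
# bias/weight pair the paths converge to is roughly (1, 2)
torch.manual_seed(42)
x = torch.rand(100, 1)
y = 1 + 2 * x + 0.1 * torch.randn(100, 1)

def train_and_capture(optim_cls, n_epochs=10, **optim_kwargs):
    torch.manual_seed(13)            # identical initialization for both runs
    model = nn.Linear(1, 1)
    optimizer = optim_cls(model.parameters(), **optim_kwargs)
    loss_fn = nn.MSELoss()
    path = []                        # (bias, weight) captured after each epoch
    for _ in range(n_epochs):
        perm = torch.randperm(len(x))
        for start in range(0, len(x), 16):   # mini-batches of 16 (assumed size)
            idx = perm[start:start + 16]
            loss = loss_fn(model(x[idx]), y[idx])
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        path.append((model.bias.item(), model.weight.item()))
    return path

sgd_path = train_and_capture(torch.optim.SGD, lr=0.1)
adam_path = train_and_capture(torch.optim.Adam, lr=0.1)
```

Because the seed is reset inside the function, both optimizers start from the same initial parameters and see the same mini-batch shuffles, so any difference between the two paths comes from the optimizer alone.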

The captured parameters (bias and weight) will draw the following paths (the red dot represents their optimal values).

Figure 6.18 - Paths taken by SGD and Adam

On the left plot, we have the typical well-behaved (and slow) path taken by simple gradient descent. You can see it is wiggling a bit due to the noise introduced by using mini-batches. On the right plot, we see the effect of using exponentially weighted moving averages: on the one hand, the path is smoother and moves faster; on the other hand, it overshoots and has to change course back and forth as it approaches the target. It is adapting to the loss surface, if you will.
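The smoothing effect is easy to see in isolation. The sketch below applies an exponentially weighted moving average to a made-up sequence of noisy gradients; it illustrates only Adam's first-moment estimate, not the full update rule (which also keeps a second-moment average and applies bias correction):

```python
# EWMA: new_avg = beta * old_avg + (1 - beta) * value
# beta close to 1 means heavier smoothing; Adam maintains averages
# like this over the gradients themselves
def ewma(values, beta=0.9):
    avg, out = 0.0, []
    for v in values:
        avg = beta * avg + (1 - beta) * v
        out.append(avg)
    return out

noisy_grads = [1.0, -0.5, 1.2, 0.8, -0.2, 1.1]   # made-up noisy gradients
smoothed = ewma(noisy_grads)
# The smoothed sequence swings over a much narrower range than the raw
# one, which is why the averaged path in the figure looks smoother
```

The price of the smoothing is inertia: the average keeps pointing in the old direction for a few steps after the raw gradients have turned, which is exactly the overshooting seen in the right plot.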

If you like the idea of visualizing (and animating) the paths of optimizers, make sure to check out Louis Tiao's tutorial [103] on the subject.

Talking about losses, we can also compare the trajectories of training and validation losses for each optimizer.
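To record both trajectories, one would evaluate the model on the validation set at the end of every epoch, with gradients disabled. A minimal sketch follows; the helper's name and signature are mine, not the book's training class:

```python
import torch
import torch.nn as nn

def validation_loss(model, loss_fn, x_val, y_val):
    model.eval()                  # eval mode (matters for dropout/batch norm)
    with torch.no_grad():         # no gradients needed for evaluation
        loss = loss_fn(model(x_val), y_val)
    model.train()                 # back to training mode
    return loss.item()
```

Appending this value to one list per optimizer, alongside the epoch's mean training loss, yields the loss curves being compared here.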

