
Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step A Beginner’s Guide-leanpub

Mini-Batch Schedulers

These schedulers have their step() method called at the end of every mini-batch. They are all cyclical schedulers.
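To make the placement concrete, here is a minimal sketch of where the call goes in a training loop. The toy model, the random two-batch dataset, and the scheduler's hyperparameters are placeholders for illustration only:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CyclicLR

# Toy model and data (placeholders): eight points split into two mini-batches
model = nn.Linear(1, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)
scheduler = CyclicLR(optimizer, base_lr=1e-4, max_lr=1e-3, step_size_up=2)
loss_fn = nn.MSELoss()

x = torch.randn(8, 1)
y = torch.randn(8, 1)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(x, y), batch_size=4
)

for epoch in range(2):
    for x_batch, y_batch in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x_batch), y_batch)
        loss.backward()
        optimizer.step()
        scheduler.step()  # stepped at the end of EVERY mini-batch
```

Notice that scheduler.step() sits inside the inner (mini-batch) loop, not the outer (epoch) loop as with epoch schedulers.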

• CyclicLR: This cycles between base_lr and max_lr (so it disregards the initial learning rate set in the optimizer), using step_size_up updates to go from the base to the max learning rate, and step_size_down updates to go back. This behavior corresponds to mode='triangular'. Additionally, it is possible to shrink the amplitude using different modes: 'triangular2' will halve the amplitude after each cycle, while 'exp_range' will exponentially shrink the amplitude, using gamma as the base and the cycle number as the exponent.

A typical choice of value for max_lr is the learning rate found using the LR Range Test.

• OneCycleLR: This uses a method called annealing to update the learning rate from its initial value up to a defined maximum learning rate (max_lr), and then down to a much lower learning rate, over a total_steps number of updates, thus performing a single cycle.

• CosineAnnealingWarmRestarts: This uses cosine annealing [104] to update the learning rate, but we're not delving into details here, except to say that this particular scheduler requires the epoch number (including the fractional part corresponding to the number of mini-batches already processed over the length of the data loader) as an argument of its step() method.
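The fractional-epoch requirement of the last scheduler can be sketched as follows; the two-batches-per-epoch setup and T_0 value are assumptions for illustration, not taken from the book:

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

dummy_parm = [nn.Parameter(torch.randn(1))]
dummy_optimizer = optim.SGD(dummy_parm, lr=0.01)
# T_0 is the number of epochs in the first cycle (an illustrative choice)
dummy_scheduler = CosineAnnealingWarmRestarts(dummy_optimizer, T_0=2)

n_batches = 2  # pretend the data loader yields two mini-batches per epoch
for epoch in range(2):
    for i in range(n_batches):
        # forward pass, backward pass, and optimizer.step() would go here
        dummy_optimizer.step()
        # the epoch number, plus the fraction of the epoch already processed
        dummy_scheduler.step(epoch + i / n_batches)
```

The fractional argument (0.0, 0.5, 1.0, 1.5, ...) is what lets the cosine curve advance smoothly within an epoch instead of jumping once per epoch.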

Let’s try CyclicLR in its different modes for a range of learning rates between 1e-4 and 1e-3, with two steps in each direction.

import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CyclicLR

dummy_parm = [nn.Parameter(torch.randn(1))]
dummy_optimizer = optim.SGD(dummy_parm, lr=0.01)

dummy_scheduler1 = CyclicLR(dummy_optimizer, base_lr=1e-4,
                            max_lr=1e-3, step_size_up=2, mode='triangular')
dummy_scheduler2 = CyclicLR(dummy_optimizer, base_lr=1e-4,
                            max_lr=1e-3, step_size_up=2, mode='triangular2')
dummy_scheduler3 = CyclicLR(dummy_optimizer, base_lr=1e-4,
                            max_lr=1e-3, step_size_up=2, mode='exp_range',
                            gamma=np.sqrt(.5))
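To see the triangular cycle in action, we can step a scheduler a few times and watch the learning rate bounce between the two bounds. This sketch uses a fresh optimizer/scheduler pair (so the shared dummy_optimizer above is left untouched):

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CyclicLR

parm = [nn.Parameter(torch.randn(1))]
optimizer = optim.SGD(parm, lr=0.01)
scheduler = CyclicLR(optimizer, base_lr=1e-4, max_lr=1e-3,
                     step_size_up=2, mode='triangular')

lrs = [scheduler.get_last_lr()[0]]  # starts at base_lr, not the optimizer's lr
for _ in range(4):
    optimizer.step()
    scheduler.step()  # one call per mini-batch
    lrs.append(scheduler.get_last_lr()[0])

print(lrs)  # approximately [1e-4, 5.5e-4, 1e-3, 5.5e-4, 1e-4]
```

Two updates up, two updates down: one full triangular cycle, with the optimizer's original lr=0.01 playing no role at all.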

