
'RMSprop',
'Rprop',
'SGD',
'SparseAdam',
...
]

Every optimizer constructor takes a list of parameters (aka PyTorch tensors, typically with requires_grad set to True) as the first input. All parameters passed to the optimizer are retained inside the optimizer object so the optimizer can update their values and access their grad attribute, as represented in figure 5.11.

Figure 5.11 (A) Conceptual representation of how an optimizer holds a reference to parameters. (B) After a loss is computed from inputs, (C) a call to .backward leads to .grad being populated on parameters. (D) At that point, the optimizer can access .grad and compute the parameter updates.
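As a quick check of that claim (not from the book), you can inspect the optimizer's param_groups attribute and verify that it holds the very tensor object you passed in; the tensor values and learning rate below are arbitrary:

import torch
import torch.optim as optim

p = torch.tensor([1.0, 0.0], requires_grad=True)
opt = optim.SGD([p], lr=1e-2)

# The optimizer stores a reference to the same tensor object, not a copy:
assert opt.param_groups[0]['params'][0] is p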

Each optimizer exposes two methods: zero_grad and step. zero_grad zeroes the grad attribute of all the parameters passed to the optimizer upon construction. step updates the value of those parameters according to the optimization strategy implemented by the specific optimizer.
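Here is a small standalone sketch (not the book's example) of those two methods in action, using a toy scalar parameter and a made-up quadratic loss:

import torch
import torch.optim as optim

p = torch.tensor([2.0], requires_grad=True)
optimizer = optim.SGD([p], lr=0.1)

loss = (p ** 2).sum()    # toy loss
loss.backward()          # populates p.grad with dloss/dp = 2 * p = 4.0
optimizer.step()         # SGD update: p <- p - lr * p.grad = 2.0 - 0.4 = 1.6
optimizer.zero_grad()    # clears p.grad so the next backward call doesn't accumulate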

USING A GRADIENT DESCENT OPTIMIZER

Let’s create params and instantiate a gradient descent optimizer:

# In[6]:
params = torch.tensor([1.0, 0.0], requires_grad=True)
learning_rate = 1e-5
optimizer = optim.SGD([params], lr=learning_rate)
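Assuming the model and loss_fn functions and the t_u and t_c tensors defined earlier in this chapter, a single optimization step would then look roughly like this (a sketch, not the book's listing):

t_p = model(t_u, *params)    # forward pass with the current parameters
loss = loss_fn(t_p, t_c)     # mean squared error between predictions and targets

optimizer.zero_grad()        # clear any previously accumulated gradients
loss.backward()              # populate params.grad
optimizer.step()             # update params in place according to its grad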
