
respect to (w.r.t.) a single parameter. We have two parameters, b and w, so we must compute two partial derivatives.

A derivative tells you how much a given quantity changes when you slightly vary some other quantity. In our case, how much does our MSE loss change when we vary each of our two parameters separately?

Gradient = how much the loss changes if ONE parameter changes a little bit!
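To make that intuition concrete, here is a minimal sketch using made-up synthetic data and an arbitrary current guess for b and w (neither is the book's actual dataset or parameters): nudge b by a tiny amount, leave w alone, and measure how much the MSE loss moves.

import numpy as np

# Made-up synthetic data: y = 1 + 2x + noise (an assumption, not the book's dataset)
np.random.seed(42)
x = np.random.rand(100, 1)
y = 1 + 2 * x + 0.1 * np.random.randn(100, 1)

def mse(b, w):
    # Mean squared error of the line b + w*x on this data
    return (((b + w * x) - y) ** 2).mean()

b, w = 0.5, -0.3   # arbitrary current guesses for the parameters
eps = 1e-6         # a "little bit"

# How much does the loss change when ONLY b changes a little bit?
approx_b_grad = (mse(b + eps, w) - mse(b, w)) / eps
print(approx_b_grad)  # approximates the partial derivative of the loss w.r.t. b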

The right-most part of the equations below is what you usually see in implementations of gradient descent for simple linear regression. In the intermediate step, I show you all elements that pop up from the application of the chain rule,[37] so you know how the final expression came to be.
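Spelled out, with the MSE loss $\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$ and the predictions $\hat{y}_i = b + w x_i$ as defined in the earlier steps, the chain rule yields:

$$\frac{\partial \text{MSE}}{\partial b} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial b}\big(b + w x_i - y_i\big)^2 = \frac{1}{n}\sum_{i=1}^{n} 2\,\big(b + w x_i - y_i\big)\cdot 1 = 2\cdot\frac{1}{n}\sum_{i=1}^{n}\big(\hat{y}_i - y_i\big)$$

$$\frac{\partial \text{MSE}}{\partial w} = \frac{1}{n}\sum_{i=1}^{n}\frac{\partial}{\partial w}\big(b + w x_i - y_i\big)^2 = \frac{1}{n}\sum_{i=1}^{n} 2\,\big(b + w x_i - y_i\big)\cdot x_i = 2\cdot\frac{1}{n}\sum_{i=1}^{n} x_i\,\big(\hat{y}_i - y_i\big)$$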

Equation 0.4 - Computing gradients w.r.t. coefficients b and w using n points

Just to be clear: we will always use our "regular" error computed at the beginning of Step 2. The loss surface is surely eye candy, but, as I mentioned before, it is only feasible to use it for educational purposes.

Step 3

# Step 3 - Computes gradients for both "b" and "w" parameters
b_grad = 2 * error.mean()
w_grad = 2 * (x_train * error).mean()
print(b_grad, w_grad)
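As a quick sanity check (not part of the book's code), here is a self-contained sketch on made-up stand-in data that reproduces the Step 3 one-liners and confirms they match the explicit chain-rule sums from Equation 0.4.

import numpy as np

# Made-up stand-in for the book's x_train/y_train and current parameter guesses
np.random.seed(13)
x_train = np.random.rand(80, 1)
y_train = 1 + 2 * x_train + 0.1 * np.random.randn(80, 1)
b, w = np.random.randn(1), np.random.randn(1)

# Steps 1 and 2 recap: predictions and the "regular" error
yhat = b + w * x_train
error = yhat - y_train

# Step 3 one-liners (the right-most part of Equation 0.4)
b_grad = 2 * error.mean()
w_grad = 2 * (x_train * error).mean()

# The same gradients written as the explicit sums from the chain rule
n = len(x_train)
b_grad_explicit = (2 / n) * np.sum(b + w * x_train - y_train)
w_grad_explicit = (2 / n) * np.sum(x_train * (b + w * x_train - y_train))

print(np.isclose(b_grad, b_grad_explicit), np.isclose(w_grad, w_grad_explicit))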
