
For a mini-batch of n data points, given one particular feature x, batch normalization will first compute the statistics for that mini-batch:

Equation 7.2 - Mean and standard deviation

$$\mu_{batch} = \frac{1}{n}\sum_{i=1}^{n}{x_i} \qquad \sigma_{batch} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}{(x_i - \mu_{batch})^2}}$$

Then, it will use these statistics to standardize each data point in the mini-batch:

Equation 7.3 - Standardization

$$\hat{x}_i = \frac{x_i - \mu_{batch}}{\sqrt{\sigma_{batch}^2 + \epsilon}}$$

So far, this is pretty much the same as the standardization of features, except for the epsilon term that has been added to the denominator to make it numerically stable (its typical value is 1e-5).
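As a minimal sketch (the dummy mini-batch and variable names below are illustrative, not the book's code), we can standardize a single-feature mini-batch by hand and check that it matches the output of PyTorch's nn.BatchNorm1d in training mode:

```python
import torch
import torch.nn as nn

torch.manual_seed(42)
batch = torch.randn(16, 1)  # dummy mini-batch: 16 data points, 1 feature

# Mini-batch statistics (Equation 7.2); batch norm uses the biased variance
mu = batch.mean(dim=0)
var = batch.var(dim=0, unbiased=False)

# Standardization with the epsilon term (Equation 7.3)
eps = 1e-5
standardized = (batch - mu) / torch.sqrt(var + eps)

# affine=False skips the optional transformation discussed further below
bn = nn.BatchNorm1d(num_features=1, affine=False)
bn.train()  # training mode, so the mini-batch statistics are used
output = bn(batch)

print(torch.allclose(standardized, output, atol=1e-6))  # True
```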

Since the batch normalization layer is meant to produce a zero mean output, it makes the bias in the layer that precedes it totally redundant. It would be a waste of computation to learn a bias that will be immediately removed by the following layer. So, it is best practice to set bias=False in the preceding layer (you can check it out in the code for BasicConv2d in the previous section).
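For illustration only (ConvBNBlock is a hypothetical name, not the book's BasicConv2d), a convolution-plus-batch-norm block following this practice could be sketched like this:

```python
import torch.nn as nn

class ConvBNBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        # bias=False: the batch norm layer right after it would remove
        # any learned bias anyway
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              bias=False, **kwargs)
        self.bn = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

# e.g., ConvBNBlock(in_channels=3, out_channels=64, kernel_size=3, padding=1)
```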

The actual difference is the optional affine transformation at the end:

Equation 7.4 - Batch normalization

$$y_i = w \, \hat{x}_i + b$$
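As a rough sketch (the dummy input below is illustrative), we can apply Equations 7.2 to 7.4 by hand, using the layer's own w (weight) and b (bias) parameters, and compare the result to the layer's output:

```python
import torch
import torch.nn as nn

torch.manual_seed(42)
dummy = torch.randn(16, 1)  # dummy mini-batch: 16 data points, 1 feature

# affine=True is the default: w (weight) and b (bias) are learnable,
# initialized to one and zero, respectively
bn = nn.BatchNorm1d(num_features=1, affine=True)
bn.train()  # training mode, so the mini-batch statistics are used

# Equations 7.2 and 7.3: standardization using the mini-batch statistics
mu = dummy.mean(dim=0)
var = dummy.var(dim=0, unbiased=False)
standardized = (dummy - mu) / torch.sqrt(var + bn.eps)

# Equation 7.4: the affine transformation
affine_output = bn.weight * standardized + bn.bias

print(torch.allclose(affine_output, bn(dummy), atol=1e-6))  # True
```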

If you choose not to perform an affine transformation, parameters b and w will be automatically set to zero and one, respectively. Although I’ve chosen the familiar b
