
import torch.nn as nn
import torch.nn.functional as F

class BasicConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, **kwargs):
        super(BasicConv2d, self).__init__()
        # the convolution has no bias because the batch norm layer
        # right after it has its own (learnable) shift parameter
        self.conv = nn.Conv2d(
            in_channels, out_channels, bias=False, **kwargs
        )
        self.bn = nn.BatchNorm2d(out_channels, eps=0.001)

    def forward(self, x):
        # convolution -> batch normalization -> ReLU
        x = self.conv(x)
        x = self.bn(x)
        return F.relu(x, inplace=True)

Sure, its main component is still nn.Conv2d, but it also applies a ReLU activation function to the final output. More important, though, it calls nn.BatchNorm2d between the two.
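
To make the convolution, batch normalization, and ReLU sequence concrete, here is a minimal usage sketch; the mini-batch shape, number of channels, and kernel arguments below are made up for illustration only:

import torch

# hypothetical mini-batch: N=16 images, 3 channels, 32x32 pixels
dummy_images = torch.randn(16, 3, 32, 32)
block = BasicConv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)

output = block(dummy_images)
print(output.shape)         # torch.Size([16, 8, 32, 32])
print((output >= 0).all())  # tensor(True), since the final ReLU guarantees non-negative values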

"What is that?"

That is…

Batch Normalization

The batch normalization layer is a very important component of many modern architectures. Although its inner workings are not exactly complex (you’ll see that in the following paragraphs), its impact on model training certainly is complex. From its placement (before or after an activation function) to the way its behavior is impacted by the mini-batch size, I try to briefly address the main discussion points in asides along the main text. This is meant to bring you up to speed on this topic, but is by no means a comprehensive take on it.
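
One of those discussion points, the dependence on the mini-batch, can already be glimpsed in how PyTorch’s nn.BatchNorm2d behaves in training versus evaluation mode. The sketch below uses made-up shapes and activation values purely for illustration:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(num_features=8, eps=0.001)
dummy = torch.randn(16, 8, 28, 28) * 3 + 5  # made-up, non-standardized activations

bn.train()             # training mode: uses the statistics of the current mini-batch
out_train = bn(dummy)

bn.eval()              # evaluation mode: uses the running statistics tracked so far
out_eval = bn(dummy)

# the very same input produces different outputs depending on the mode,
# since only training mode standardizes using this particular mini-batch
print(torch.allclose(out_train, out_eval))  # False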

We briefly talked in Chapter 4 about the need for normalization layers in order to prevent (or mitigate) an issue commonly called "internal covariate shift," which is just fancy for having different distributions of activation values in different layers. In general, we would like to have all layers produce activation values with similar distributions, ideally with zero mean and unit standard deviation.
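
As a quick sanity check of that idea, the sketch below (with made-up shapes and values) shows that nn.BatchNorm2d, in its default training mode, indeed produces activation values with roughly zero mean and unit standard deviation in each channel:

import torch
import torch.nn as nn

torch.manual_seed(42)
# made-up "activations" that are far from standardized
activations = torch.randn(32, 4, 8, 8) * 10 + 3

bn = nn.BatchNorm2d(num_features=4, eps=0.001)
normalized = bn(activations)  # training mode is the default

# statistics per channel, computed over the batch and spatial dimensions
print(normalized.mean(dim=(0, 2, 3)))  # all values close to 0
print(normalized.std(dim=(0, 2, 3)))   # all values close to 1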

