
The sigmoid's gradient peaks at a mere 0.25 (for z = 0) and gets close to zero as the absolute value of z reaches a value of five.
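We can quickly verify this with autograd (this snippet is just a sanity check, not one of the book's code listings; the tensor z_check and the values -5, 0, and 5 are illustrative choices):

import torch

z_check = torch.tensor([-5., 0., 5.], requires_grad=True)
# the gradient of the sigmoid is sigmoid(z) * (1 - sigmoid(z))
torch.sigmoid(z_check).sum().backward()
z_check.grad  # tensor([0.0066, 0.2500, 0.0066])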

Also, remember that the activation values of any given layer are the inputs of the following layer and, given the range of the sigmoid, those activation values are going to be centered around 0.5 instead of zero. This means that, even if we normalize the inputs we feed to the first layer, the inputs to the remaining layers will not be normalized anymore.
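A minimal sketch illustrates the point (the tensor sizes, the seed, and the hidden layer below are made up for illustration only): even if the inputs to a layer are standardized, the sigmoid activations it produces, which feed the next layer, hover around 0.5 rather than zero.

import torch
import torch.nn as nn

torch.manual_seed(42)
standardized_inputs = torch.randn(1000, 10)  # roughly zero mean, unit std
hidden_layer = nn.Linear(10, 5)              # a made-up hidden layer
activations = torch.sigmoid(hidden_layer(standardized_inputs))
activations.mean()  # close to 0.5, not zero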

"Why does it matter if the outputs are centered around zero or not?"

In previous chapters, we standardized features (zero mean, unit standard deviation) to improve the performance of gradient descent. The same reasoning applies here, since the outputs of any given layer are the inputs of the following layer. There is actually more to it, and we'll briefly touch on this topic again in the section on the ReLU activation function, when talking about the "internal covariate shift."

PyTorch has the sigmoid function available in two flavors, as we've already seen in Chapter 3: torch.sigmoid() and nn.Sigmoid. The first one is a simple function, and the second one is a full-fledged class that inherits from nn.Module, thus being, for all intents and purposes, a model on its own.

dummy_z = torch.tensor([-3., 0., 3.])
torch.sigmoid(dummy_z)

Output

tensor([0.0474, 0.5000, 0.9526])

nn.Sigmoid()(dummy_z)

Output

tensor([0.0474, 0.5000, 0.9526])
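Since nn.Sigmoid is a module, it can be composed like any other layer. The tiny model below is only a sketch to illustrate this (the layer sizes are arbitrary and not taken from the book):

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 5),
    nn.Sigmoid(),     # the activation is just another module in the pipeline
    nn.Linear(5, 1)
)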

