
Hyperbolic Tangent (TanH)

The hyperbolic tangent activation function is an evolution of the sigmoid: unlike its predecessor, its outputs are zero-centered.

Figure 4.12 - TanH function and its gradient

As you can see in Figure 4.12, the TanH activation function "squashes" the input values into the range (-1, 1). Therefore, being centered at zero, the activation values are already (somewhat) normalized inputs for the next layer, making the hyperbolic tangent a better activation function than the sigmoid.
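If you want to convince yourself of this, here is a quick sketch (an extra one, beyond the book's example) that compares both functions on the same inputs and checks the identity tanh(z) = 2*sigmoid(2z) - 1, which makes the "evolution of the sigmoid" explicit:

import torch

dummy_z = torch.linspace(-3, 3, 7)
# TanH squashes the inputs into (-1, 1), centered at zero...
torch.tanh(dummy_z)
# ...while the sigmoid squashes them into (0, 1), centered at 0.5
torch.sigmoid(dummy_z)
# TanH is a rescaled and shifted sigmoid: tanh(z) = 2*sigmoid(2z) - 1
torch.allclose(torch.tanh(dummy_z), 2 * torch.sigmoid(2 * dummy_z) - 1)  # True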

Regarding the gradient, it has a much larger peak value of 1.0 (again, for z = 0), but it decreases even faster, approaching zero for absolute values of z as low as three. This is the underlying cause of what is referred to as the problem of vanishing gradients, which causes the training of the network to be progressively slower.
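We can verify this behavior with a minimal sketch of my own (it is not one of the book's snippets) using autograd: the derivative of TanH is 1 - tanh(z)^2, so it peaks at 1.0 for z = 0 and is already below 0.01 for |z| = 3.

import torch

# points where we evaluate the gradient of TanH
dummy_z = torch.tensor([-3., 0., 3.], requires_grad=True)
torch.tanh(dummy_z).sum().backward()
# the gradient is 1 - tanh(z)**2: 1.0 at z = 0, roughly 0.0099 at |z| = 3
dummy_z.grad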

Just like the sigmoid function, the hyperbolic tangent also comes in two flavors: torch.tanh() and nn.Tanh.

import torch

dummy_z = torch.tensor([-3., 0., 3.])
# functional version: apply TanH element-wise to a tensor
torch.tanh(dummy_z)

Output

tensor([-0.9951, 0.0000, 0.9951])
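The module version, nn.Tanh, produces the same values; you can call it directly on a tensor or, more typically, place it between the layers of a model. The tiny model below is just an illustration of mine, not one of the book's models:

import torch
import torch.nn as nn

dummy_z = torch.tensor([-3., 0., 3.])
# the module version produces the same values as torch.tanh()
nn.Tanh()(dummy_z)  # tensor([-0.9951, 0.0000, 0.9951])

# as a module, TanH can be used as a layer in a model
model = nn.Sequential(
    nn.Linear(2, 10),
    nn.Tanh(),
    nn.Linear(10, 1),
)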
