
nn.Tanh()(dummy_z)

Output

tensor([-0.9951, 0.0000, 0.9951])

Rectified Linear Unit (ReLU)

Maybe "squashing" is not the way to go—what if we bend the rules a bit and use an activation function that bends (yay, another pun!) the line? The ReLU was born like that, and it spawned a whole family of similar functions! The ReLU, or one of its relatives, is the commonplace choice of activation function nowadays. It addresses the problem of vanishing gradients found with its two predecessors, while also being the fastest to compute gradients for.

Figure 4.13 - ReLU function and its gradient

As you can see in Figure 4.13, the ReLU is a totally different beast: It does not "squash" the values into a range—it simply preserves positive values and turns all negative values into zero.
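To make this concrete, here is a minimal sketch mirroring the tanh snippet above, assuming dummy_z holds the values [-3., 0., 3.] (inferred from the tanh output, since tanh(±3) ≈ ±0.9951):

import torch
import torch.nn as nn

# assumed values for dummy_z, matching the tanh example above
dummy_z = torch.tensor([-3.0, 0.0, 3.0])
nn.ReLU()(dummy_z)

Output

tensor([0., 0., 3.])

The positive value passes through untouched, while zero and the negative value are both mapped to zero.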

The upside of using a ReLU is that its gradient is either one (for positive values) or zero (for negative values)—no more vanishing gradients! This pattern leads to a

