We are missing the activation functions!

An activation function is a nonlinear function that transforms the outputs of the hidden layers, in a similar way to how the sigmoid function transforms the logits in the output layer. Actually, the sigmoid is one of many activation functions; there are others, like the hyperbolic tangent (tanh) and the rectified linear unit (ReLU).
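
As a quick, minimal sketch (not the book's own code), here is how those three activations behave when applied to the same tensor of hidden-layer outputs in PyTorch:

import torch

# The same batch of "hidden layer outputs", before any activation
z = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

print(torch.sigmoid(z))  # squashes values into (0, 1)
print(torch.tanh(z))     # squashes values into (-1, 1)
print(torch.relu(z))     # zeroes out negatives, keeps positives unchanged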

A deeper model without activation functions in its hidden layers is no better than a linear or logistic regression. That's what I wanted to illustrate with the two models we've trained, the shallow one and the deep one. That's also why I removed the bias in both models: it makes the comparison more straightforward.
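
Here is a minimal sketch of that equivalence in PyTorch. The hidden-layer sizes below (5 and 3) are assumptions for illustration; only the 25 inputs and the single logit come from the text. Stacking bias-free linear layers with no activations in between collapses into a single linear layer:

import torch
import torch.nn as nn

torch.manual_seed(42)

# Deep-ish model: bias-free linear layers, NO activations in between
# (hidden sizes 5 and 3 are hypothetical, chosen just for this sketch)
deep_model = nn.Sequential(
    nn.Linear(25, 5, bias=False),
    nn.Linear(5, 3, bias=False),
    nn.Linear(3, 1, bias=False),
)

# Multiplying the weight matrices collapses the stack into a single
# (1, 25) weight matrix, i.e., a plain logistic regression's weights
w0, w1, w2 = (layer.weight for layer in deep_model)
w_equiv = w2 @ w1 @ w0  # (1, 3) @ (3, 5) @ (5, 25) -> (1, 25)

x = torch.randn(8, 25)            # a batch of 25-feature inputs
logits_deep = deep_model(x)       # forward pass through three layers
logits_shallow = x @ w_equiv.t()  # one equivalent shallow layer
print(torch.allclose(logits_deep, logits_shallow, atol=1e-5))  # True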

Show Me the Math!

This subsection is optional. If you're curious to understand, using matrix multiplication, why our deep-ish model is equivalent to a logistic regression, check the sequence of equations below.

The deep-ish model is above the line, each row corresponding to a layer. The data flows from right to left (since that's how one multiplies a sequence of matrices), starting with the 25 features on the right and finishing with a single logit output on the left. Looking at each layer (row) individually, it should also be clear that the outputs of a given layer (each row's left-most vector) are the inputs of the next layer, the same way the features are the inputs of the first layer.
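
The sequence can be sketched like this, reading each line right to left (the hidden-layer sizes of five and three units are assumptions for illustration; only the 25 features and the single logit are stated above):

% Layer-by-layer collapse (hidden sizes 5 and 3 are assumed, not given)
\begin{aligned}
h_1 &= W_1 \, x,   & W_1 &\in \mathbb{R}^{5 \times 25} \\
h_2 &= W_2 \, h_1, & W_2 &\in \mathbb{R}^{3 \times 5}  \\
z   &= w^T \, h_2, & w   &\in \mathbb{R}^{3 \times 1}  \\[4pt]
z   &= w^T W_2 W_1 \, x = w_{eq}^T \, x, & w_{eq}^T &= w^T W_2 W_1 \in \mathbb{R}^{1 \times 25}
\end{aligned}

The last line is exactly the logit of a (bias-free) logistic regression on the 25 original features.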
