
• Its original motivation was to address the so-called internal covariate shift by producing similar distributions across different layers, but it was later found that it actually improves model training by making the loss surface smoother.

• The batch normalization layer may be placed either before or after the activation function; there is no "right" or "wrong" way.

• The layer preceding the batch normalization layer should have its bias argument set to False to avoid useless computation (see the sketch after this list).

• Even though batch normalization works for a different reason than initially thought, addressing the internal covariate shift may still bring benefits, like solving the vanishing gradients problem, one of the topics of the next chapter.
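As a minimal sketch (not the book's actual model), a linear layer feeding a batch normalization layer could disable its bias, since batch norm's own learnable shift makes it redundant; the layer and feature sizes here are illustrative:

import torch
import torch.nn as nn

torch.manual_seed(42)

# Illustrative model: the linear layer feeding BatchNorm1d skips its bias,
# since batch norm's learnable shift (beta) would make it redundant anyway.
model = nn.Sequential(
    nn.Linear(10, 32, bias=False),  # bias=False to avoid useless computation
    nn.BatchNorm1d(32),             # placed before the activation here; after works too
    nn.ReLU(),
    nn.Linear(32, 1),
)

dummy_x = torch.randn(16, 10)
print(model(dummy_x).shape)  # torch.Size([16, 1])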

So, we’ve learned that batch normalization speeds up training by making the loss surface smoother. It turns out, there is yet another technique that works along these lines…

Residual Connections

The idea of a residual connection is quite simple, actually: After passing the input through a layer and activation function, the input itself is added to the result. That’s it! Simple, elegant, and effective.
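A minimal sketch of this idea in PyTorch, assuming a single linear layer and a ReLU activation (the class name and sizes are illustrative, not the book's):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Illustrative residual block: the input is added to the output of a
    # layer followed by an activation function.
    def __init__(self, n_features):
        super().__init__()
        self.linear = nn.Linear(n_features, n_features)
        self.activation = nn.ReLU()

    def forward(self, x):
        out = self.activation(self.linear(x))
        return out + x  # the residual (skip) connection

block = ResidualBlock(5)
dummy_x = torch.randn(8, 5)
print(block(dummy_x).shape)  # torch.Size([8, 5])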

"Why would I want to add the input to the result?"

Learning the Identity

Neural networks and their nonlinearities (activation functions) are great! We’ve seen in the "Bonus" chapter how models manage to twist and turn the feature space to the point where classes can be separated by a straight line in the activated feature space. But nonlinearities are both a blessing and a curse: They make it extremely hard for a model to learn the identity function.

To illustrate this, let’s start with a dummy dataset containing 100 random data points with a single feature. But this feature isn’t simply a feature: it is also the label. Data preparation is fairly straightforward:
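The code itself is not shown in this excerpt, but a minimal sketch of such a preparation step might look like this (the variable names and the use of TensorDataset/DataLoader are assumptions, not necessarily the book's choices):

import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

np.random.seed(42)

# 100 random data points with a single feature...
dummy_points = np.random.randn(100, 1).astype(np.float32)

# ...and that same feature also serves as the label
dummy_x = torch.as_tensor(dummy_points)
dummy_y = torch.as_tensor(dummy_points)

dummy_dataset = TensorDataset(dummy_x, dummy_y)
dummy_loader = DataLoader(dummy_dataset, batch_size=16, shuffle=True)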
