I’d like to draw your attention to the third column in particular: It clearly shows the effect of a gate, the reset gate in this case, over the feature space. Since a gate has a distinct value for each dimension, each dimension will shrink differently (it can only shrink because values are always between zero and one). In the third row, for example, the first dimension gets multiplied by 0.70, while the second dimension gets multiplied by only 0.05, making the resulting feature space really small.
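To make the element-wise shrinking concrete, here is a minimal sketch in PyTorch; the hidden state values are made up for the example, and 0.70 and 0.05 are simply the illustrative gate values quoted above:

```python
import torch

# Illustrative two-dimensional hidden state (made-up values)
hidden = torch.tensor([1.0, 1.0])

# Gate output: one value per dimension, each between zero and one
# (0.70 and 0.05 are the illustrative factors from the text)
gate = torch.tensor([0.70, 0.05])

# The gate acts element-wise, so each dimension shrinks by its own factor
gated_hidden = gate * hidden

print(gated_hidden)  # tensor([0.7000, 0.0500])
```

The second dimension is squashed to a twentieth of its original size while the first keeps most of it, which is why the resulting feature space looks so lopsided and small.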

Can We Do Better?

The gated recurrent unit is definitely an improvement over the regular RNN, but there are a couple of points I’d like to raise:

• Using the reset gate inside the hyperbolic tangent seems "weird" (not a scientific argument at all, I know).

• The best thing about the hidden state is that it is bounded by the hyperbolic tangent—it guarantees the next cell will get the hidden state in the same range.

• The worst thing about the hidden state is that it is bounded by the hyperbolic tangent—it constrains the values the hidden state can take and, along with them, the corresponding gradients (the short sketch below illustrates this).

• Since we cannot have the cake and eat it too when it comes to the hidden state being bounded, what is preventing us from using two hidden states in the same cell?

Yes, let’s try that—two hidden states are surely better than one, right?
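To make those two bullet points concrete, here is a minimal, self-contained sketch (not part of the book’s code) of how the hyperbolic tangent bounds both the hidden state’s values and, through its derivative, the corresponding gradients; the input values are made up for illustration:

```python
import torch

# tanh squashes any input into the (-1, 1) range...
z = torch.tensor([-5.0, -1.0, 0.0, 1.0, 5.0], requires_grad=True)
h = torch.tanh(z)
print(h)  # outputs approach -1 and 1 at the extremes

# ...and its gradient, 1 - tanh(z)**2, shrinks toward zero as tanh saturates
h.sum().backward()
print(z.grad)  # nearly zero for the large-magnitude inputs
```

For the large-magnitude inputs, the output saturates near -1 or 1 and the gradient all but vanishes, which is exactly the constraint the bullet above complains about.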

By the way—I know that GRUs were invented a long time AFTER the development of LSTMs, but I’ve decided to present them in order of increasing complexity. Please don’t take the "story" I’m telling too literally—it is just a way to facilitate learning.

Long Short-Term Memory (LSTM)

Long short-term memory, or LSTM for short, uses two states instead of one. Besides the regular hidden state (h), which is bounded by the hyperbolic tangent, as usual, it introduces a second state, the cell state (c), which is unbounded.

So, let’s work through the points raised in the last section. First, let’s keep it simple and use a regular RNN to generate a candidate hidden state (g):
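As a rough sketch of that candidate state (not the book’s actual model), we can lean on PyTorch’s nn.RNNCell, whose default nonlinearity is the hyperbolic tangent; the sizes and names below (n_features, hidden_dim, candidate_cell) are illustrative assumptions:

```python
import torch
import torch.nn as nn

torch.manual_seed(19)

n_features, hidden_dim = 2, 2  # illustrative sizes

# A plain RNN cell plays the role of the candidate hidden state (g):
# g = tanh(W_ih x + b_ih + W_hh h + b_hh), so g is bounded by tanh
candidate_cell = nn.RNNCell(n_features, hidden_dim)

x = torch.randn(1, n_features)   # one data point of the sequence
h = torch.zeros(1, hidden_dim)   # previous hidden state

g = candidate_cell(x, h)         # candidate hidden state, in (-1, 1)
print(g)
```

Since the candidate g comes out of a hyperbolic tangent, it is bounded just like a regular hidden state.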
