22.02.2024 Views

Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step A Beginner’s Guide-leanpub


By the way, the two new parameters, r and z, are called gates: respectively, the reset and update gates. Both of them must produce values between zero and one, thus allowing only a fraction of the original values to go through.

Every gate produces a vector of values (each value between zero and one) with a size corresponding to the number of hidden dimensions. For two hidden dimensions, a gate may have values like [0.52, 0.87], for example.

Since gates produce vectors, operations involving them are element-wise multiplications.
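A minimal NumPy sketch of how a gate behaves, assuming the gate is a sigmoid applied to some pre-activation values. The pre-activations and the hidden state below are hypothetical numbers chosen to reproduce the [0.52, 0.87] example; they do not come from a trained model.

```python
import numpy as np

def sigmoid(x):
    # Squashes any real value into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical pre-activation values for a gate with two hidden dimensions
gate_logits = np.array([0.08, 1.90])
gate = sigmoid(gate_logits)  # roughly [0.52, 0.87]

# Hypothetical hidden state, also with two dimensions
hidden = np.array([1.0, -2.0])

# Element-wise multiplication: each gate value scales its own dimension,
# letting only a fraction of each original value go through
gated = gate * hidden
print(gate.round(2), gated.round(2))
```

Because the gate has the same size as the hidden state, no matrix multiplication is involved: the first gate value only affects the first hidden dimension, the second only the second, and so on.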

GRU Cell

If we place both expressions next to one another, we can more easily see that the RNN is a special case of the GRU (for r=1 and z=0):

Equation 8.5 - RNN vs GRU
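To make the special case concrete, here is a small NumPy sketch. It assumes the GRU update takes the form n = tanh(r ⊙ t_h + t_x) and h' = (1 − z) ⊙ n + z ⊙ h, as in the equations documented for PyTorch's GRU, where t_h and t_x (labels used here for illustration) are the linearly transformed hidden state and input. The random weights are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
hidden_dim, input_dim = 2, 3

# Hypothetical weights and biases for the transformed hidden state (t_h)
# and the transformed input (t_x)
W_hh, b_hh = rng.normal(size=(hidden_dim, hidden_dim)), rng.normal(size=hidden_dim)
W_ih, b_ih = rng.normal(size=(hidden_dim, input_dim)), rng.normal(size=hidden_dim)

def rnn_step(h, x):
    # Plain RNN: tanh of the sum of transformed hidden state and input
    t_h = W_hh @ h + b_hh
    t_x = W_ih @ x + b_ih
    return np.tanh(t_h + t_x)

def gru_step(h, x, r, z):
    # Same transformations, but r scales t_h and z mixes new and old states
    t_h = W_hh @ h + b_hh
    t_x = W_ih @ x + b_ih
    n = np.tanh(r * t_h + t_x)   # candidate hidden state
    return (1 - z) * n + z * h   # weighted average of candidate and old state

h = rng.normal(size=hidden_dim)
x = rng.normal(size=input_dim)
ones, zeros = np.ones(hidden_dim), np.zeros(hidden_dim)

# With r=1 and z=0 the GRU step collapses into the RNN step
print(np.allclose(gru_step(h, x, r=ones, z=zeros), rnn_step(h, x)))
```

At the other extreme, z=1 ignores the candidate entirely and simply carries the old hidden state forward, which is what lets a GRU preserve information across many steps.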

"OK, I see it; but where do r and z come from?"

Well, this is a deep learning book, so the only right answer to "Where does something come from?" is: a neural network! Just kidding … or am I? Actually, we’ll train both gates using a structure that is pretty much an RNN cell, except for the fact that it uses a sigmoid activation function:

Equation 8.6 - Gates (r and z) and candidate hidden state (n)
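One way to see how the gates and the candidate hidden state fit together is a small NumPy GRU cell. This is a sketch, not the book's code: the class name GRUCellSketch, the per-gate parameter dictionary, and the uniform initialization range are assumptions, and the equations follow those documented for PyTorch's nn.GRUCell. Each gate is an RNN-like combination of hidden state and input passed through a sigmoid, while the candidate state uses tanh and applies the reset gate to the transformed hidden state.

```python
import numpy as np

def sigmoid(x):
    # Squashes any real value into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

class GRUCellSketch:
    """Hypothetical NumPy GRU cell (for illustration only)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        k = 1.0 / np.sqrt(hidden_dim)  # assumed initialization range

        def init(*shape):
            return rng.uniform(-k, k, size=shape)

        # One (input weights, input bias, hidden weights, hidden bias)
        # tuple for each gate (r, z) and for the candidate state (n)
        self.params = {
            name: (init(hidden_dim, input_dim), init(hidden_dim),
                   init(hidden_dim, hidden_dim), init(hidden_dim))
            for name in ("r", "z", "n")
        }

    def _transform(self, name, h, x):
        # Returns the transformed hidden state and transformed input
        W_i, b_i, W_h, b_h = self.params[name]
        return W_h @ h + b_h, W_i @ x + b_i

    def __call__(self, h, x):
        t_h, t_x = self._transform("r", h, x)
        r = sigmoid(t_h + t_x)      # reset gate: RNN-like, sigmoid activation
        t_h, t_x = self._transform("z", h, x)
        z = sigmoid(t_h + t_x)      # update gate: RNN-like, sigmoid activation
        t_h, t_x = self._transform("n", h, x)
        n = np.tanh(r * t_h + t_x)  # candidate hidden state, tanh activation
        return (1 - z) * n + z * h, (r, z, n)

cell = GRUCellSketch(input_dim=3, hidden_dim=2)
h = np.zeros(2)
x = np.array([0.5, -1.0, 2.0])
h_new, (r, z, n) = cell(h, x)
print(r, z, h_new)  # gate entries lie between zero and one
```

Keeping separate weights per gate makes the "three small RNN cells" structure easy to read; PyTorch instead stacks the three sets of weights into single tensors, which is equivalent but harder to follow line by line.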

626 | Chapter 8: Sequences
