Since the RNN cell has both of them (t_h and t_x) on the same footing and simply adds them up, there is no way to address the two questions above. To do so, we would need something different, like…

Gated Recurrent Units (GRUs)

Gated recurrent units, or GRUs for short, provide the answer to those two questions! Let’s see how they do it by tackling one problem at a time. What if, instead of simply computing a new hidden state and going with it, we tried a weighted average of both hidden states, old and new?
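
In symbols, keeping the same notation used for the RNN cell (t_h for the transformed hidden state, t_x for the transformed data point, h_{t-1} for the old hidden state, and h_t for the new one), that weighted average can be written roughly as:

h_t = (1 - z) * tanh(t_h + t_x) + z * h_{t-1}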

Equation 8.2 - Weighted average of old and new hidden states

The new parameter z controls how much weight the GRU should give to the old hidden state. OK, the first question has been addressed, and we can recover the typical RNN behavior simply by setting z to zero.

Now, what if, instead of computing the new hidden state by simply adding up t_h and t_x, we tried scaling t_h first?
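
Keeping the same notation, and calling the result n (a name explained right below), the scaled update can be sketched as:

n = tanh(r * t_h + t_x)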

Equation 8.3 - Scaling the old hidden state

The new parameter r controls how much we keep from the old hidden state before adding the transformed input. For low values of r, the relative importance of the data point is increased, thus addressing the second question. Moreover, we can recover the typical RNN behavior simply by setting r to one. The new hidden state is called the candidate hidden state (n).

Next, we can combine these two changes into a single expression:
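
Plugging the candidate hidden state from Equation 8.3 into the weighted average from Equation 8.2, the combined update looks roughly like this:

h_t = (1 - z) * n + z * h_{t-1},  with  n = tanh(r * t_h + t_x)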

Equation 8.4 - Hidden state, the GRU way

And we’ve (re)invented the gated recurrent unit cell on our own :-)
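
To make the equations concrete, here is a minimal sketch of the update in plain PyTorch, treating r and z as fixed scalars just for illustration (the tensor sizes and values below are made up):

import torch

torch.manual_seed(42)

hidden_dim = 2
h_old = torch.randn(1, hidden_dim)  # old hidden state, h_{t-1}
t_h = torch.randn(1, hidden_dim)    # transformed hidden state
t_x = torch.randn(1, hidden_dim)    # transformed data point

r = 0.3  # how much of the old hidden state to keep
z = 0.7  # how much weight to give to the old hidden state

# Equation 8.3: candidate hidden state (r scales t_h before adding t_x)
n = torch.tanh(r * t_h + t_x)

# Equations 8.2 and 8.4: weighted average of old hidden state and candidate
h_new = (1 - z) * n + z * h_old

# Setting r to one and z to zero recovers the typical RNN update
h_rnn = (1 - 0) * torch.tanh(1 * t_h + t_x) + 0 * h_old
print(torch.allclose(h_rnn, torch.tanh(t_h + t_x)))  # True

With r and z pinned like this, nothing is being learned yet; the point is only to show how the two parameters reshape the plain RNN update.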
