
Then, there is the cross-attention mechanism, the first one we discussed.

Figure 9.53 - Cross-attention scores for its three heads

There is a lot of variation in the matrices above: in the third head of the first sequence, for example, points #3 and #4 pay attention to point #1 only, while the first head pays attention to alternate points; in the second sequence, though, it's the second head that pays attention to alternate points.
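If you'd like to inspect scores like these yourself, the sketch below shows one way to retrieve per-head attention weights. It uses PyTorch's built-in nn.MultiheadAttention as a stand-in for the chapter's own multi-headed attention class, and the dimensions and tensors are made up purely for illustration:

import torch
import torch.nn as nn

torch.manual_seed(42)

# stand-in for the chapter's multi-headed attention: d_model=6, three heads
mha = nn.MultiheadAttention(embed_dim=6, num_heads=3, batch_first=True)

# hypothetical decoder states (query) attending to encoder states (keys/values):
# two target points against two source points, batch of one
query = torch.randn(1, 2, 6)  # (batch, target length, d_model)
keys = torch.randn(1, 2, 6)   # (batch, source length, d_model)

# average_attn_weights=False keeps the scores separate for each head
out, alphas = mha(query, keys, keys, average_attn_weights=False)
print(alphas.shape)  # torch.Size([1, 3, 2, 2]) -> (batch, heads, target, source)

Each row of alphas sums to one, so, for a given head, you can read it directly as "how much each target point pays attention to each source point."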

Putting It All Together

In this chapter, we used the same dataset of colored squares, but this time we focused on predicting the coordinates of the last two corners (target sequence) given the coordinates of the first two corners (source sequence). In the beginning, we used familiar recurrent neural networks to build an encoder-decoder architecture. Then, we progressively built on top of it by using (cross-)attention, self-attention, and positional encoding.

Data Preparation

The training set has the full sequences as features, while the test set has only the source sequences as features:
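The original listing isn't reproduced here, but a minimal, self-contained sketch of that split looks like this. The generate_sequences() helper below is a random stand-in for the chapter's colored-squares generator (the real one draws square corners in order), so the shapes match but the values are made up:

import torch
from torch.utils.data import TensorDataset, DataLoader

# hypothetical stand-in for the chapter's generator: n sequences of
# four corners, each corner an (x, y) coordinate -> shape (n, 4, 2)
def generate_sequences(n=128, seed=None):
    if seed is not None:
        torch.manual_seed(seed)
    return torch.randn(n, 4, 2)

full_train = generate_sequences(n=256, seed=13).float()
target_train = full_train[:, 2:]   # last two corners

full_test = generate_sequences(n=64, seed=19).float()
source_test = full_test[:, :2]     # first two corners only
target_test = full_test[:, 2:]

# training set keeps the FULL sequence as features (the decoder needs the
# target corners as inputs for teacher forcing); the test set gets only
# the source sequence
train_data = TensorDataset(full_train, target_train)
test_data = TensorDataset(source_test, target_test)

train_loader = DataLoader(train_data, batch_size=16, shuffle=True)
test_loader = DataLoader(test_data, batch_size=16)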
