
"Great, but there is something weird—position four should have

a larger distance than position three to position zero, right?"

Not necessarily, no. The distance does not need to always increase. It is OK

for the distance between positions zero and four (1.86) to be less than the

distance between positions zero and three (2.02), as long as the diagonals

hold.

For a more detailed discussion about using sines and cosines for positional encoding, check Amirhossein Kazemnejad’s great post on the subject: "Transformer Architecture: The Positional Encoding." [145]

Since we’re using four different angular speeds, the positional encoding depicted in Figure 9.47 has eight dimensions. Notice that the red arrow barely moves in the last two rows.

In practice, we’ll choose the number of dimensions first and then compute the corresponding speeds. For example, for an encoding with eight dimensions, like the one in Figure 9.47, there are four angular speeds:

Equation 9.20 - Angular speeds
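As a minimal sketch, assuming these speeds follow the usual Transformer formulation, ω_i = 1/10000^(2i/d) for i = 0, 1, 2, 3 and d = 8 dimensions, the four values can be computed directly with NumPy:

```python
import numpy as np

# Assumption: standard Transformer angular speeds,
# omega_i = 1 / 10000^(2i / d), for an 8-dimensional encoding.
d_model = 8
i = np.arange(d_model // 2)                      # i = 0, 1, 2, 3
angular_speeds = 1 / 10000 ** (2 * i / d_model)  # 1.0, 0.1, 0.01, 0.001
print(angular_speeds)
```

The last two speeds (0.01 and 0.001) are so small that their sines and cosines barely change over the first few positions, which is why the red arrow in Figure 9.47 hardly moves in the last two rows.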

The positional encoding is given by the two formulas below:

Equation 9.21 - Positional encoding
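Here is a minimal NumPy sketch of these two formulas, assuming the standard sine/cosine formulation: sines on the even dimensions, cosines on the odd ones, each pair using one of the angular speeds from Equation 9.20. It also checks the distances discussed above; with eight dimensions, position three ends up roughly 2.02 away from position zero, while position four is only about 1.86 away.

```python
import numpy as np

def encode_positions(max_positions, d_model=8, base=10000):
    """Sinusoidal positional encoding, one row per position:
    sines on even dimensions, cosines on odd dimensions."""
    positions = np.arange(max_positions)[:, None]      # shape (L, 1)
    i = np.arange(d_model // 2)[None, :]               # shape (1, d/2)
    angular_speeds = 1 / base ** (2 * i / d_model)     # shape (1, d/2)
    pe = np.zeros((max_positions, d_model))
    pe[:, 0::2] = np.sin(positions * angular_speeds)   # even dimensions
    pe[:, 1::2] = np.cos(positions * angular_speeds)   # odd dimensions
    return pe

pe = encode_positions(5)                          # positions 0 through 4
distances = np.linalg.norm(pe - pe[0], axis=1)    # distances to position 0
print(distances.round(2))                         # [0.   0.96 1.69 2.02 1.86]
```

Note that the distance from position zero grows up to position three and then shrinks at position four, exactly the behavior questioned at the start of this section.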

