"What about 'normalizing' each sequence by its own length?"

It solves that problem, but it raises another one; namely, the same position gets different encoding values depending on the length of the sequence.

Figure 9.39 - "Normalizing" over different lengths
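To make the issue concrete, here is a minimal sketch (not from the book) that "normalizes" positions by three hypothetical sequence lengths:

for length in [4, 5, 7]:
    encoding = [pos / length for pos in range(length)]
    print(f"length={length}: {[round(v, 2) for v in encoding]}")

# length=4: [0.0, 0.25, 0.5, 0.75]
# length=5: [0.0, 0.2, 0.4, 0.6, 0.8]
# length=7: [0.0, 0.14, 0.29, 0.43, 0.57, 0.71, 0.86]

Position one, for example, is encoded as 0.25, 0.2, or 0.14, depending on the length.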

Ideally, the positional encoding should remain constant for a given position, regardless of the length of the sequence.

"What if we take the module first, and then 'normalize' it?"

Well, that indeed solves the two problems above, but the values aren’t unique anymore.

Figure 9.40 - "Normalizing" over a modulo of the length
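Here is another minimal sketch (again, not from the book) that takes the position modulo a fixed base (four) before "normalizing":

base = 4
encoding = [(pos % base) / base for pos in range(10)]
print([round(v, 2) for v in encoding])

# [0.0, 0.25, 0.5, 0.75, 0.0, 0.25, 0.5, 0.75, 0.0, 0.25]

Now a given position always gets the same value, whatever the sequence length, but the values repeat every four positions, so they are no longer unique.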

"OK, I give up! How do we handle this?"

Let’s think outside the box for a moment; no one said we must use only one vector, right? Let’s build three vectors instead, using three hypothetical sequence lengths (four, five, and seven).

Figure 9.41 - Combining results for different moduli

The positional encoding above is unique up to the 140th position (140 being the least common multiple of four, five, and seven), and we can easily extend that by adding more vectors.
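As a minimal sketch (not from the book), we can stack three "normalized modulo" vectors, one per base, so that each position gets a triplet of values; since four, five, and seven are pairwise coprime, the triplets only start repeating after lcm(4, 5, 7) = 140 positions:

import numpy as np

bases = np.array([4, 5, 7])
positions = np.arange(140)
# Shape (140, 3): one row per position, one column per base
encoding = (positions[:, None] % bases) / bases
# Every row (triplet of values) is distinct within the first 140 positions
assert len(np.unique(encoding, axis=0)) == 140
print(encoding[:3])

Adding a fourth base that is coprime with the others (say, nine or eleven) would multiply the period accordingly.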
