22.02.2024 Views

Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step A Beginner’s Guide-leanpub

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

clear that the model simply cannot cheat (there’s no K 3 or V 3 ):

Equation 9.15 - Context vector for the second target

We can also check it quickly by looking at the subscript indices: As long as the

indices of the "values" are lower than the index of the context vector, there is no

cheating. By the way, it is even easier to check what’s happening if we use the

alphas matrix:

Equation 9.16 - Decoder’s attention scores

For the decoder, the shape of the alphas attribute is given by (N, L target , L target ) since

it is looking at itself. Any alphas above the diagonal are, literally, cheating codes.

We need to force the self-attention mechanism to ignore them. If only there was a

way to do it…

"What about those masks we discussed earlier?"

You’re absolutely right! They are perfect for this case.

Target Mask (Training)

The purpose of the target mask is to zero attention scores for "future" data

points. In our example, that’s the alphas matrix we’re aiming for:

752 | Chapter 9 — Part II: Sequence-to-Sequence

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!