
Equation 9.17 - Decoder’s (masked) attention scores
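For a target sequence of length two, the masked attention scores form a lower-triangular matrix, along the lines of (a plausible rendering; the original equation is not shown here):

$$\text{alphas} = \begin{bmatrix} \alpha_{0,0} & 0 \\ \alpha_{1,0} & \alpha_{1,1} \end{bmatrix}$$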

Therefore, we need a mask that flags every element above the diagonal as invalid, as we did with the padded data points in the source mask. The shape of the target mask, though, must match the shape of the alphas attribute: (1, L_target, L_target).

We can create a function to generate the mask for us:

Subsequent Mask

import torch

def subsequent_mask(size):
    # (1, L, L) shape, so the mask broadcasts over the batch dimension
    attn_shape = (1, size, size)
    # triu(..., diagonal=1) keeps only the elements above the diagonal;
    # 1 minus that makes the diagonal and everything below it True (valid)
    subsequent_mask = (
        1 - torch.triu(torch.ones(attn_shape), diagonal=1)
    ).bool()
    return subsequent_mask

subsequent_mask(2) # 1, L, L

Output

tensor([[[ True, False],
         [ True,  True]]])

Perfect! The element above the diagonal is indeed set to False.

We must use this mask while querying the decoder to prevent it from cheating. You can choose to use an additional mask to "hide" more data from the decoder if you wish, but the subsequent mask is a necessity with a self-attention decoder.
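The decoder applies this mask when computing its self-attention scores. As a minimal sketch of the underlying mechanics (the random scores tensor below is illustrative, not the book's decoder code), the boolean mask can push the invalid positions to negative infinity before the softmax, so they receive zero attention weight:

import torch
import torch.nn.functional as F

L_target = 2
# Illustrative raw attention scores, shape (1, L_target, L_target);
# in the decoder these come from the query-key dot products
scores = torch.randn(1, L_target, L_target)
mask = subsequent_mask(L_target)  # (1, L, L), True = visible position

# False (future) positions get -inf, so softmax gives them zero weight
masked_scores = scores.masked_fill(~mask, float('-inf'))
alphas = F.softmax(masked_scores, dim=-1)  # each row still sums to one

Since the first row keeps only its first element, the softmax assigns it a weight of 1.0: the first "hidden state" can attend only to itself.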

