
chunks to compute the other half of the context vector, which, in the end, has the desired dimension.

• As in the former multi-headed attention mechanism, the context vector goes through a feed-forward network to generate the "hidden states" (only the first one is depicted in the figure above).

It looks complicated, I know, but it really isn’t that bad. Maybe it helps to see it in code.
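Before we get to the actual class, here is a minimal, shape-only sketch of the chunking idea (the sizes below are toy values chosen for illustration, not from the book):

import torch

n_heads, d_model = 2, 8
d_k = d_model // n_heads  # each head handles a chunk of size d_k = 4

N, L = 1, 3  # one source sequence with three data points
# chunked "value" projections and per-head attention scores
values = torch.randn(N, n_heads, L, d_k)
alphas = torch.softmax(torch.randn(N, n_heads, 1, L), dim=-1)

# each head computes its own d_k-dimensional slice of the context vector
context_per_head = torch.matmul(alphas, values)  # N, n_heads, 1, d_k
# concatenating the chunks restores the desired dimension (d_model)
context = context_per_head.transpose(1, 2).reshape(N, 1, d_model)
print(context.shape)  # torch.Size([1, 1, 8])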

Multi-Headed Attention

The new multi-headed attention class is more than a combination of both the Attention and MultiHeadedAttention classes from the previous chapter: it implements the chunking of the projections and introduces dropout for attention scores.
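Dropout on the attention scores is easy to see in isolation. Here is a minimal sketch (the shapes and dropout probability are illustrative choices, not the book’s):

import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.5)
# scores for one query attending over a source sequence of three points
alphas = torch.softmax(torch.randn(1, 2, 1, 3), dim=-1)  # N, n_heads, 1, L
# dropout zeroes out random scores (and rescales the survivors by 1/(1-p)),
# so some data points are randomly ignored during training
dropped = dropout(alphas)
print(dropped)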

Multi-Headed Attention

import torch
import torch.nn as nn

class MultiHeadedAttention(nn.Module):
    def __init__(self, n_heads, d_model, dropout=0.1):
        super(MultiHeadedAttention, self).__init__()
        self.n_heads = n_heads
        self.d_model = d_model
        self.d_k = int(d_model / n_heads)  # dimension of each chunk
        self.linear_query = nn.Linear(d_model, d_model)
        self.linear_key = nn.Linear(d_model, d_model)
        self.linear_value = nn.Linear(d_model, d_model)
        self.linear_out = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(p=dropout)  # dropout for attention scores
        self.alphas = None

    def make_chunks(self, x):
        batch_size, seq_len = x.size(0), x.size(1)
        # N, L, D -> N, L, n_heads, d_k
        x = x.view(batch_size, seq_len, self.n_heads, self.d_k)
        # N, n_heads, L, d_k
        x = x.transpose(1, 2)
        return x

    def init_keys(self, key):
        # N, n_heads, L, d_k
        self.proj_key = self.make_chunks(self.linear_key(key))
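To see make_chunks in action, we can run a dummy tensor through it (the sizes are made up for this sanity check; it is a sketch, not the book’s own example):

dummy_points = torch.randn(16, 2, 4)  # N=16, L=2, d_model=4
mha = MultiHeadedAttention(n_heads=2, d_model=4, dropout=0.0)
# each projection gets split into two chunks of size d_k = 2
print(mha.make_chunks(dummy_points).shape)
# torch.Size([16, 2, 2, 2]) -> N, n_heads, L, d_k
mha.init_keys(dummy_points)
print(mha.proj_key.shape)
# torch.Size([16, 2, 2, 2])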
