
Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner's Guide (Leanpub)


8. Multi-Headed Attention

The multi-headed attention mechanism below replicates the narrow attention mechanism implemented at the start of this chapter, chunking the projections of the "keys" (K), "values" (V), and "queries" (Q) to keep the size of the model manageable:

Multi-Headed Attention

import numpy as np
import torch
import torch.nn as nn

class MultiHeadedAttention(nn.Module):
    def __init__(self, n_heads, d_model, dropout=0.1):
        super(MultiHeadedAttention, self).__init__()
        self.n_heads = n_heads
        self.d_model = d_model
        self.d_k = int(d_model / n_heads)
        self.linear_query = nn.Linear(d_model, d_model)
        self.linear_key = nn.Linear(d_model, d_model)
        self.linear_value = nn.Linear(d_model, d_model)
        self.linear_out = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(p=dropout)
        self.alphas = None

    def make_chunks(self, x):
        batch_size, seq_len = x.size(0), x.size(1)
        # N, L, d_model -> N, L, n_heads, d_k
        x = x.view(batch_size, seq_len, self.n_heads, self.d_k)
        # N, L, n_heads, d_k -> N, n_heads, L, d_k
        x = x.transpose(1, 2)
        return x

    def init_keys(self, key):
        # N, n_heads, L, d_k
        self.proj_key = self.make_chunks(self.linear_key(key))
        self.proj_value = \
            self.make_chunks(self.linear_value(key))

    def score_function(self, query):
        # scaled dot product
        # N, n_heads, L, d_k x N, n_heads, d_k, L
        # -> N, n_heads, L, L
        proj_query = self.make_chunks(self.linear_query(query))
        dot_products = torch.matmul(
            proj_query, self.proj_key.transpose(-2, -1)
        )
        # scale by the square root of d_k
        scores = dot_products / np.sqrt(self.d_k)
        return scores
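To see what make_chunks does to the tensor shapes, here is a minimal standalone sketch of the same view-and-transpose sequence; the sizes N=3, L=5, d_model=8, and n_heads=2 are arbitrary choices for illustration:

```python
import torch

n_heads, d_model = 2, 8
d_k = d_model // n_heads  # 4

# A batch of N=3 sequences, each of length L=5, with d_model features
x = torch.randn(3, 5, d_model)

# N, L, d_model -> N, L, n_heads, d_k
chunks = x.view(3, 5, n_heads, d_k)
# N, L, n_heads, d_k -> N, n_heads, L, d_k
chunks = chunks.transpose(1, 2)

print(chunks.shape)  # torch.Size([3, 2, 5, 4])
```

Note that the view does not copy any data: each head simply attends over its own d_k-sized slice of the d_model features, which is what makes this "narrow" attention.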

