
OK, we have the "keys" and a "query," so let’s pretend we can compute attention scores (alphas) using them:

def calc_alphas(ks, q):
    # uniform attention scores: each of the L hidden states
    # gets the same weight (1/L), regardless of the query
    N, L, H = ks.size()
    alphas = torch.ones(N, 1, L).float() * 1/L
    return alphas

alphas = calc_alphas(keys, query)
alphas

Output

tensor([[[0.5000, 0.5000]]])
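In the book, keys, values, and query are outputs of the encoder and decoder built earlier in the chapter. If you want to run these snippets in isolation, a minimal stand-in setup like the one below is enough (hypothetical random tensors, so the exact numbers in the outputs will differ), assuming N=1, L=2, and H=2 to match the shapes above:

import torch

torch.manual_seed(21)
# hypothetical stand-ins for the encoder/decoder outputs:
# one sequence in the mini-batch (N=1), two hidden states (L=2),
# each with two dimensions (H=2)
keys = torch.randn(1, 2, 2)                        # encoder states as "keys"
values = torch.randn(1, 2, 2, requires_grad=True)  # same role as the "values"
query = torch.randn(1, 1, 2)                       # decoder state as the "query"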

We had to make sure alphas had the right shape (N, 1, L) so that, when it is multiplied by the "values" with shape (N, L, H), the result is a weighted sum of the alignment vectors with shape (N, 1, H). We can use batch matrix multiplication (torch.bmm()) for that:

$$\underbrace{(N, 1, L)}_{\text{alphas}} \times \underbrace{(N, L, H)}_{\text{values}} = \underbrace{(N, 1, H)}_{\text{context vector}}$$

Equation 9.2 - Shapes for batch matrix multiplication

In other words, we can simply ignore the first dimension, and PyTorch will go over all the elements in the mini-batch for us:

# N, 1, L x N, L, H -> 1, L x L, H -> 1, H
context_vector = torch.bmm(alphas, values)
context_vector

Output

tensor([[[ 0.1968, -0.2809]]], grad_fn=<BmmBackward0>)
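As a quick sanity check (a sketch of my own, not from the book), the batch matrix multiplication above is equivalent to weighting each of the L alignment vectors by its alpha and summing them up:

# permute turns alphas from (N, 1, L) into (N, L, 1), so it
# broadcasts over the H dimension of values (N, L, H); summing
# over L reproduces the context vector
weighted_sum = (alphas.permute(0, 2, 1) * values).sum(dim=1, keepdim=True)
print(torch.allclose(context_vector, weighted_sum))  # True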

"Why are you spending so much time on matrix multiplication, of all

things?"

Although it seems a fairly basic topic, getting the shapes and dimensions right is of utmost importance.

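Get one of them wrong and torch.bmm() fails right away. For instance (an illustrative sketch, not from the book), mismatched inner dimensions raise a RuntimeError:

try:
    # (1, 1, 3) x (1, 2, 2): the inner dimensions (3 vs. 2) do not match
    torch.bmm(torch.ones(1, 1, 3), torch.ones(1, 2, 2))
except RuntimeError as err:
    print(err)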
