
◦ If a mask is provided, with shape (N, 1, L) for the source mask (in the encoder) or (N, L, L) for the target mask (in the decoder), it unsqueezes a new dimension after the first one to accommodate the multiple heads, since every head should use the same mask (as shown in the sketch below).
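A minimal sketch of that unsqueezing, assuming a boolean source mask of shape (N, 1, L) (the variable names and sizes here are illustrative, not the book's actual code):

import torch

N, L = 16, 2
# hypothetical boolean source mask, one row per sequence: (N, 1, L)
source_mask = torch.ones(N, 1, L).bool()
# unsqueeze a new dimension after the first one so the very same
# mask broadcasts across all heads: (N, 1, 1, L)
multihead_mask = source_mask.unsqueeze(1)
print(multihead_mask.shape)  # torch.Size([16, 1, 1, 2])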

torch.bmm() vs torch.matmul()

In the last chapter, we used torch.bmm() to perform batch matrix multiplication. It was the right tool for the task at hand since we had two three-dimensional tensors (for example, computing the context vector using alphas and "values"):

Equation 10.1 - Batch matrix multiplication using torch.bmm()
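As a quick sketch of that operation in code (the sizes below are illustrative only):

import torch

N, L, H = 16, 2, 4  # illustrative batch size, sequence length, hidden dim
alphas = torch.randn(N, 1, L)  # attention scores: (N, 1, L)
values = torch.randn(N, L, H)  # "values": (N, L, H)
# batch matrix multiplication: (N, 1, L) x (N, L, H) -> (N, 1, H)
context = torch.bmm(alphas, values)
print(context.shape)  # torch.Size([16, 1, 4])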

Unfortunately, torch.bmm() cannot handle tensors with more dimensions than that. Since we have a four-dimensional tensor after chunking, we need something more powerful: torch.matmul(). It is a more generic operation that, depending on its inputs, behaves like torch.dot(), torch.mm(), or torch.bmm().
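A minimal sketch of that shape-dependent behavior, using dummy tensors with illustrative sizes:

import torch

v1, v2 = torch.randn(3), torch.randn(3)
m1, m2 = torch.randn(2, 3), torch.randn(3, 4)
b1, b2 = torch.randn(5, 2, 3), torch.randn(5, 3, 4)

torch.matmul(v1, v2)  # 1D x 1D: dot product, like torch.dot()
torch.matmul(m1, m2)  # 2D x 2D: matrix product, like torch.mm()
torch.matmul(b1, b2)  # 3D x 3D: batch matrix product, like torch.bmm()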

If we use torch.matmul() to multiply alphas and "values" again, now with multiple heads and chunking, it looks like this:

Equation 10.2 - Batch matrix multiplication using torch.matmul()

It is quite similar to batch matrix multiplication, but you're free to have as many extra dimensions as you want: it still looks at the last two dimensions only.
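A minimal sketch of that, assuming the four-dimensional tensors produced by chunking into two heads (names and sizes are illustrative):

import torch

N, n_heads, L, d_k = 16, 2, 2, 2  # illustrative sizes
alphas = torch.randn(N, n_heads, 1, L)    # per-head attention scores
values = torch.randn(N, n_heads, L, d_k)  # chunked "values"
# matmul only multiplies the last two dimensions, (1, L) x (L, d_k),
# and carries the leading (N, n_heads) dimensions along
context = torch.matmul(alphas, values)
print(context.shape)  # torch.Size([16, 2, 1, 2])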

We can generate some dummy points corresponding to a mini-batch of 16 sequences (N), each sequence having two data points (L), and each data point having four features (F):
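A minimal sketch of generating such dummy points (the seed and variable name are our own choices, not necessarily the book's):

import torch

torch.manual_seed(42)  # arbitrary seed, just for reproducibility
dummy_points = torch.randn(16, 2, 4)  # (N, L, F)
print(dummy_points.shape)  # torch.Size([16, 2, 4])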

