Figure 9.20 - Attention scores

See? For the second sequence, it mostly paid attention to either the first data point (for predicting point #3) or the second data point (for predicting point #4). The model picks and chooses what it's going to look at depending on the inputs. How amazing is that?

Do you know what’s even better than one attention mechanism?

Multi-Headed Attention

There is no reason to stick with only one attention mechanism: We can use several attention mechanisms at once, each referred to as an attention head.

Each attention head will output its own context vector, and they will all get concatenated together and combined using a linear layer. In the end, the multi-headed attention mechanism will still output a single context vector.
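To make the idea concrete, here is a minimal sketch in PyTorch. The class names (AttentionHead, MultiHeadedAttention) and the n_heads / d_model parameters are illustrative choices, not necessarily the chapter's exact implementation; each head is assumed to be a simple scaled dot-product attention mechanism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionHead(nn.Module):
    # A single scaled dot-product attention mechanism (one "head").
    def __init__(self, d_model):
        super().__init__()
        self.d_model = d_model
        self.linear_q = nn.Linear(d_model, d_model)
        self.linear_k = nn.Linear(d_model, d_model)
        self.linear_v = nn.Linear(d_model, d_model)

    def forward(self, query, keys, values):
        q = self.linear_q(query)   # (batch, target_len, d_model)
        k = self.linear_k(keys)    # (batch, source_len, d_model)
        v = self.linear_v(values)  # (batch, source_len, d_model)
        # Attention scores: (batch, target_len, source_len)
        scores = torch.bmm(q, k.transpose(1, 2)) / (self.d_model ** 0.5)
        alphas = F.softmax(scores, dim=-1)
        # Context vector: attention-weighted sum of the values
        return torch.bmm(alphas, v)

class MultiHeadedAttention(nn.Module):
    # n_heads attention mechanisms running side by side; their context
    # vectors are concatenated and combined by a linear layer, so the
    # module still outputs a single context vector per target step.
    def __init__(self, n_heads, d_model):
        super().__init__()
        self.heads = nn.ModuleList(
            [AttentionHead(d_model) for _ in range(n_heads)]
        )
        self.output = nn.Linear(n_heads * d_model, d_model)

    def forward(self, query, keys, values):
        contexts = [head(query, keys, values) for head in self.heads]
        return self.output(torch.cat(contexts, dim=-1))
```

A quick usage example: for our two-feature sequences, three heads each produce their own (batch, target_len, 2) context vector; concatenation gives (batch, target_len, 6), and the output linear layer blends them back down to (batch, target_len, 2).

```python
mha = MultiHeadedAttention(n_heads=3, d_model=2)
seq = torch.randn(1, 4, 2)  # (batch, seq_len, features)
context = mha(query=seq, keys=seq, values=seq)
print(context.shape)  # torch.Size([1, 4, 2])
```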
