
The figure below depicts the self-attention part of a decoder.

Figure 9.28 - Decoder with self-attention (simplified)

Once again, we can dive deeper into the inner workings of the self-attention mechanism.

Figure 9.29 - Decoder with self-attention

There is one small difference in the self-attention architecture between encoder and decoder: the feed-forward network in the decoder sits atop the cross-attention mechanism (not depicted in the figure above) instead of the self-attention mechanism. The feed-forward network also maps the decoder's output from the dimensionality of the model (d_model) back to the number of features, thus yielding predictions.
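To make that layout concrete, here is a minimal sketch of a decoder block using PyTorch's nn.MultiheadAttention. The class name DecoderSelfAttn, the single linear feed-forward layer, and the argument names are simplifications for illustration, not the book's actual implementation:

import torch
import torch.nn as nn

class DecoderSelfAttn(nn.Module):
    # Illustrative decoder block: self-attention over the (shifted) target
    # sequence, cross-attention over the encoder states, and a feed-forward
    # layer that maps d_model back to n_features to yield predictions.
    def __init__(self, d_model, n_features, n_heads=1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Linear(d_model, n_features)  # d_model -> n_features

    def forward(self, query, encoder_states, target_mask=None):
        # Self-attention: queries, keys, and values all come from the decoder input
        att1, _ = self.self_attn(query, query, query, attn_mask=target_mask)
        # Cross-attention: queries from the decoder, keys/values from the encoder
        att2, _ = self.cross_attn(att1, encoder_states, encoder_states)
        # The feed-forward network sits atop the cross-attention output
        return self.ffn(att2)

d_model, n_features, seq_len = 4, 2, 2
decoder = DecoderSelfAttn(d_model, n_features)
shifted_targets = torch.randn(1, seq_len, d_model)
encoder_states = torch.randn(1, seq_len, d_model)
preds = decoder(shifted_targets, encoder_states)  # shape: (1, seq_len, n_features)

Notice that the last layer is the only one whose output is not d_model-dimensional; that is what turns the attention outputs into actual predictions.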

