Transformer Decoder

We’ll be representing the decoder using "stacked" layers in detail (like Figure 10.6 (b)); that is, showing the internal wrapped "sub-layers" (the dashed rectangles).
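To make the "stacked" layers and their wrapped "sub-layers" concrete, here is a minimal sketch of one decoder "layer" in PyTorch, using the norm-first arrangement from Figure 10.9. The names and sizes (DecoderLayerSketch, d_model, n_heads, d_ff) are illustrative assumptions, not the book's actual implementation.

```python
import torch
import torch.nn as nn

class DecoderLayerSketch(nn.Module):
    # Hypothetical sketch: one decoder "layer" and its three wrapped
    # "sub-layers" (the dashed rectangles): masked self-attention,
    # cross-attention, and a feed-forward network, each inside a
    # norm-first residual wrapper.
    def __init__(self, d_model=6, n_heads=3, d_ff=24):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, tgt, enc_states, tgt_mask=None):
        # Sub-layer 1: (masked) self-attention over the target sequence
        x = self.norm1(tgt)
        out, _ = self.self_attn(x, x, x, attn_mask=tgt_mask)
        tgt = tgt + out
        # Sub-layer 2: cross-attention; keys/values come from the encoder
        x = self.norm2(tgt)
        out, _ = self.cross_attn(x, enc_states, enc_states)
        tgt = tgt + out
        # Sub-layer 3: position-wise feed-forward network
        return tgt + self.ffn(self.norm3(tgt))
```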

Figure 10.9 - Transformer decoder—norm-last vs norm-first

The small arrow on the left represents the states produced by the encoder, which will be used as inputs for "keys" and "values" of the (cross-)multi-headed attention mechanism in each "layer."
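In code, that arrow corresponds to passing the same encoder output to the cross-attention of every stacked layer. A minimal sketch, reusing the hypothetical DecoderLayerSketch above:

```python
import copy
import torch.nn as nn

class DecoderSketch(nn.Module):
    # Hypothetical stack of decoder layers: the SAME encoder states
    # (the small arrow on the left) are used as "keys" and "values"
    # by the cross-attention of every "layer".
    def __init__(self, layer, n_layers=2):
        super().__init__()
        self.layers = nn.ModuleList(
            [copy.deepcopy(layer) for _ in range(n_layers)]
        )

    def forward(self, tgt, enc_states, tgt_mask=None):
        for layer in self.layers:
            tgt = layer(tgt, enc_states, tgt_mask)
        return tgt
```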

Moreover, there is one final linear layer responsible for projecting the decoder’s output back to the original number of dimensions (corners’ coordinates, in our case). This linear layer is not included in our decoder’s class, though: It will be part of the full encoder-decoder model.
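A short sketch of where that projection might live, under the same assumptions as the snippets above (d_model=6 and two coordinates per point are illustrative values; enc_states here is a random stand-in for real encoder output):

```python
# Hypothetical wiring: the final linear layer lives OUTSIDE the decoder,
# projecting d_model back to the original number of dimensions
# (assumed here to be two coordinates per corner).
decoder = DecoderSketch(DecoderLayerSketch(), n_layers=2)
proj = nn.Linear(6, 2)

enc_states = torch.randn(1, 2, 6)  # stand-in for the encoder's output
tgt = torch.randn(1, 2, 6)         # (shifted) target sequence
preds = proj(decoder(tgt, enc_states))
print(preds.shape)  # torch.Size([1, 2, 2])
```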
