22.02.2024

Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step: A Beginner’s Guide (Leanpub)


Transformer Encoder

We’ll represent the encoder using "stacked" layers in detail (as in Figure 10.6 (b)); that is, showing the internal wrapped "sub-layers" (the dashed rectangles).
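The stacking and wrapping described above can be sketched without any framework. This is a minimal illustration under stated assumptions, not the book's PyTorch code: tokens are plain Python lists, `layer_norm` has no learnable scale or shift, and `uniform_attention` and `relu_ffn` are hypothetical toy stand-ins for the learned self-attention and feed-forward sub-layers. The `norm_first` flag switches between the two wrapper placements compared in Figure 10.8: norm(x + sublayer(x)) for norm-last and x + sublayer(norm(x)) for norm-first.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Tokens = List[Vector]                     # one feature vector per token
SubLayer = Callable[[Tokens], Tokens]


def layer_norm(seq: Tokens, eps: float = 1e-5) -> Tokens:
    """Normalize each token vector to zero mean, unit variance (no learnable scale/shift)."""
    out = []
    for v in seq:
        mean = sum(v) / len(v)
        var = sum((a - mean) ** 2 for a in v) / len(v)
        out.append([(a - mean) / math.sqrt(var + eps) for a in v])
    return out


def add(a: Tokens, b: Tokens) -> Tokens:
    """Element-wise sum of two sequences: the residual (skip) connection."""
    return [[x + y for x, y in zip(u, v)] for u, v in zip(a, b)]


def wrap(sublayer: SubLayer, x: Tokens, norm_first: bool) -> Tokens:
    """Run one sub-layer inside its residual + layer-norm wrapper."""
    if norm_first:
        return add(x, sublayer(layer_norm(x)))   # x + sublayer(norm(x))
    return layer_norm(add(x, sublayer(x)))       # norm(x + sublayer(x))


def encoder_layer(x: Tokens, att: SubLayer, ffn: SubLayer, norm_first: bool) -> Tokens:
    """One encoder layer: two wrapped sub-layers, self-attention then feed-forward."""
    x = wrap(att, x, norm_first)
    return wrap(ffn, x, norm_first)


def encoder(x: Tokens, layers: List[Tuple[SubLayer, SubLayer]],
            norm_first: bool = False) -> Tokens:
    """Stacked encoder: each layer's output is the next layer's input."""
    for att, ffn in layers:
        x = encoder_layer(x, att, ffn, norm_first)
    return x


# Hypothetical toy sub-layers (no learned weights), for illustration only.
def uniform_attention(seq: Tokens) -> Tokens:
    """Every token receives the average of all token vectors (equal attention weights)."""
    n = len(seq)
    avg = [sum(col) / n for col in zip(*seq)]
    return [list(avg) for _ in seq]


def relu_ffn(seq: Tokens) -> Tokens:
    """Position-wise toy 'feed-forward': relu(2a - 1) applied to every feature."""
    return [[max(0.0, 2.0 * a - 1.0) for a in v] for v in seq]


tokens = [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0]]     # 2 tokens, 3 features each
layers = [(uniform_attention, relu_ffn)] * 2      # two stacked layers
out_last = encoder(tokens, layers, norm_first=False)
out_first = encoder(tokens, layers, norm_first=True)
```

With `norm_first=False`, the last operation of every layer is a layer normalization, so each output token vector has (approximately) zero mean and unit variance; with `norm_first=True`, only the sub-layer inputs are normalized and the residual stream itself is not. PyTorch exposes the same choice through the `norm_first` argument of `nn.TransformerEncoderLayer` (default `False`, that is, norm-last).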

Figure 10.8 - Transformer encoder—norm-last vs norm-first

On the left, the encoder uses a norm-last wrapper, and its output (the encoder’s states) is given by:

Equation 10.5 - Encoder’s output: norm-last
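As a sketch, write inputs for the layer's input and assume the hypothetical notation att for the self-attention sub-layer, ffn for the feed-forward sub-layer, and norm for layer normalization (the book's own symbols may differ). Since a norm-last wrapper computes norm(x + sublayer(x)) around each of the two sub-layers, the output composes as:

$$
\begin{aligned}
h &= \text{norm}\big(\text{inputs} + \text{att}(\text{inputs})\big) \\
\text{outputs}_{\text{norm-last}} &= \text{norm}\big(h + \text{ffn}(h)\big)
\end{aligned}
$$

Substituting h gives the single expression norm(norm(inputs + att(inputs)) + ffn(norm(inputs + att(inputs)))), in which the final operation is always a normalization.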
