Stacking Encoders and Decoders

Let's make our encoder-decoder architecture deeper by stacking two encoders on top of one another, and then doing the same with two decoders. It looks like this.

Figure 10.5 - Stacking encoders and decoders

The output of one encoder feeds the next, and the last encoder outputs states as usual. These states will feed the cross-attention mechanism of all stacked decoders. The output of one decoder feeds the next, and the last decoder outputs predictions as usual.
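
In code, this stacking boils down to two loops. The snippet below is a minimal sketch using PyTorch's built-in nn.TransformerEncoderLayer and nn.TransformerDecoderLayer as stand-ins for our own layer classes; the StackedEncoderDecoder name and its arguments (n_layers, d_model, n_heads, ff_units) are illustrative only, not code from the book.

```python
import torch
import torch.nn as nn

class StackedEncoderDecoder(nn.Module):
    # Illustrative sketch: PyTorch's built-in layers stand in
    # for our own encoder/decoder "layer" classes
    def __init__(self, n_layers, d_model, n_heads, ff_units):
        super().__init__()
        self.enc_layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model, n_heads, ff_units,
                                       batch_first=True)
            for _ in range(n_layers)
        ])
        self.dec_layers = nn.ModuleList([
            nn.TransformerDecoderLayer(d_model, n_heads, ff_units,
                                       batch_first=True)
            for _ in range(n_layers)
        ])

    def forward(self, source_seq, target_seq):
        # the output of one encoder feeds the next...
        states = source_seq
        for enc in self.enc_layers:
            states = enc(states)
        # ...and the LAST encoder's states feed the cross-attention
        # mechanism of ALL stacked decoders
        outputs = target_seq
        for dec in self.dec_layers:
            outputs = dec(outputs, states)
        return outputs

model = StackedEncoderDecoder(n_layers=2, d_model=16, n_heads=2, ff_units=32)
src = torch.randn(1, 4, 16)   # (batch, source length, d_model)
tgt = torch.randn(1, 4, 16)   # (batch, target length, d_model)
print(model(src, tgt).shape)  # torch.Size([1, 4, 16])
```

Notice that every decoder layer receives the same memory, namely, the final encoder's states; only the target-side input changes from one decoder layer to the next.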

The former encoder is now a so-called "layer", and a stack of "layers" composes the new, deeper encoder. The same holds true for the decoder. Moreover, each operation (multi-headed self- and cross-attention mechanisms, and feed-forward networks) is now a "sub-layer" within its corresponding "layer."
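
To make the "layer" / "sub-layer" terminology concrete, here is a bare-bones sketch of a single encoder "layer" built from its two "sub-layers"; the names are illustrative, and residual connections and normalization are deliberately left out to keep the focus on the structure.

```python
import torch.nn as nn

class EncoderLayer(nn.Module):
    # Illustrative sketch: one encoder "layer" and its "sub-layers"
    def __init__(self, d_model, n_heads, ff_units):
        super().__init__()
        # sub-layer 1: multi-headed self-attention
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               batch_first=True)
        # sub-layer 2: feed-forward network
        self.ffn = nn.Sequential(
            nn.Linear(d_model, ff_units),
            nn.ReLU(),
            nn.Linear(ff_units, d_model),
        )

    def forward(self, x):
        attn_out, _ = self.self_attn(x, x, x)  # queries=keys=values=x
        return self.ffn(attn_out)
```

A decoder "layer" would have a third "sub-layer", the cross-attention mechanism, sitting between the self-attention and the feed-forward network.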
