
Let’s create an encoder and feed it a source sequence:

torch.manual_seed(11)
# EncoderSelfAttn and source_seq were defined earlier in the chapter
encself = EncoderSelfAttn(n_heads=3, d_model=2,
                          ff_units=10, n_features=2)
query = source_seq
encoder_states = encself(query)
encoder_states

Output

tensor([[[-0.0498, 0.2193],
         [-0.0642, 0.2258]]], grad_fn=<AddBackward0>)

It produced a sequence of states that will be the input of the (cross-)attention mechanism used by the decoder. Business as usual.

Cross-Attention

Cross-attention was the first mechanism we discussed: the decoder provided a "query" (Q), which served not only as input but also got concatenated to the resulting context vector. That won't be the case anymore! Instead of concatenation, the context vector will go through a feed-forward network in the decoder to generate the predicted coordinates.

The figure below illustrates the current state of the architecture: self-attention as the encoder, cross-attention on top of it, and the modifications to the decoder.
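
To make the change concrete before we build the actual decoder class, here is a minimal sketch of that idea, assuming single-head, scaled dot-product cross-attention; the class name CrossAttnDecoderSketch and the stand-in target sequence are hypothetical, not the book's real decoder, and the point is only that the context vector goes straight into a feed-forward network instead of being concatenated to the query:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttnDecoderSketch(nn.Module):
    def __init__(self, d_model=2, ff_units=10, n_features=2):
        super().__init__()
        # projections: Q comes from the decoder input,
        # K and V come from the encoder states
        self.linear_query = nn.Linear(n_features, d_model)
        self.linear_key = nn.Linear(d_model, d_model)
        self.linear_value = nn.Linear(d_model, d_model)
        # feed-forward network that replaces the old concatenation step:
        # it maps the context vector directly to predicted coordinates
        self.ffn = nn.Sequential(
            nn.Linear(d_model, ff_units),
            nn.ReLU(),
            nn.Linear(ff_units, n_features),
        )

    def forward(self, query, encoder_states):
        q = self.linear_query(query)           # (N, L_target, d_model)
        k = self.linear_key(encoder_states)    # (N, L_source, d_model)
        v = self.linear_value(encoder_states)  # (N, L_source, d_model)
        # scaled dot-product attention scores and weights
        scores = torch.bmm(q, k.transpose(1, 2)) / (k.size(-1) ** 0.5)
        alphas = F.softmax(scores, dim=-1)
        context = torch.bmm(alphas, v)         # context vectors
        # no concatenation with the query: the context vector alone
        # goes through the feed-forward network
        return self.ffn(context)

decoder_sketch = CrossAttnDecoderSketch()
target_seq = torch.randn(1, 2, 2)  # hypothetical shifted target sequence
preds = decoder_sketch(target_seq, encoder_states)
preds.shape  # torch.Size([1, 2, 2]), one pair of coordinates per step

This is only an illustration of the data flow; the actual decoder we are about to build also includes self-attention over the target sequence, which the figure above already hints at.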
