
Next, we shift our focus to the self-attention mechanism on the right:

• It is the second data point's turn to be the "query" (Q): it is paired with both "keys" (K), generating attention scores and a context vector, which results in the second "hidden state":

Equation 9.12 - Context vector for the second input (x₁)
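In scaled dot-product notation, that context vector can be sketched as follows, where α denotes the attention scores and d_k the dimension of the "keys" (a reconstruction in our notation, not the book's original rendering):

$$
% hedged reconstruction: q_1 is the second "query", k_i / v_i the "keys" / "values"
\text{context vector} = \alpha_{1,0}\,v_0 + \alpha_{1,1}\,v_1,
\qquad
\alpha_{1,i} = \operatorname{softmax}_i\!\left(\frac{q_1 \cdot k_i}{\sqrt{d_k}}\right)
$$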

As you probably already noticed, the context vector (and thus the "hidden state") associated with a data point is basically a function of the corresponding "query" (Q), while everything else (the "keys" (K), the "values" (V), and the parameters of the self-attention mechanism) is held constant across all queries.
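To make this concrete, here is a minimal sketch of that idea using scaled dot-product attention (the tensor names and random data are ours, for illustration only): the "keys" and "values" stay fixed, and only the "query" changes between calls.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(42)

d_k = 2                       # dimension of "queries" and "keys" (illustrative)
keys = torch.randn(2, d_k)    # K: one "key" per data point (held constant)
values = torch.randn(2, d_k)  # V: one "value" per data point (held constant)

def context_vector(query):
    # scaled dot-product attention for a single "query"
    scores = query @ keys.T / d_k ** 0.5  # (1, 2) attention scores
    alphas = F.softmax(scores, dim=-1)    # scores become probabilities
    return alphas @ values                # weighted sum of "values"

q0 = torch.randn(1, d_k)  # "query" from the first data point
q1 = torch.randn(1, d_k)  # "query" from the second data point

# same K and V, different queries -> different context vectors
print(context_vector(q0))
print(context_vector(q1))
```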

Therefore, we can simplify our previous diagram a bit and depict only one self-attention mechanism, assuming it will be fed a different "query" (Q) every time.

Figure 9.25 - Encoder with self-attention
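In matrix form, feeding the mechanism one "query" at a time is equivalent to stacking all the "queries" into a single tensor and making one pass; a sketch under the same assumptions as before:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(42)

d_k = 2
queries = torch.randn(2, d_k)  # Q: one "query" per data point
keys = torch.randn(2, d_k)     # K
values = torch.randn(2, d_k)   # V

# one pass of the self-attention mechanism handles every "query";
# each row of the result is one context vector / "hidden state"
alphas = F.softmax(queries @ keys.T / d_k ** 0.5, dim=-1)
hidden_states = alphas @ values
print(hidden_states)  # shape: (2, d_k)
```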
