
Figure 9.12 - Matching a query to the keys

The encoder’s hidden states are called "keys" (K), while the decoder’s hidden state is called a "query" (Q).

"Wait a minute! I thought the encoder’s hidden states were called

"values" (V)."

You’re absolutely right. The encoder’s hidden states are used as both "keys" (K) and "values" (V). Later on, we’ll apply affine transformations to the hidden states, one for the "keys," another for the "values," so they will actually have different values.
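As a minimal sketch of that idea (the dimensions, tensors, and layer names below are made up purely for illustration), two separate linear layers can play the role of those affine transformations:

```python
import torch
import torch.nn as nn

torch.manual_seed(17)

hidden_dim = 2   # size of the hidden states (assumed)
source_len = 2   # number of tokens in the source sequence (assumed)

# Hypothetical encoder hidden states (batch, source_len, hidden_dim)
# and the decoder's hidden state (batch, 1, hidden_dim)
encoder_states = torch.randn(1, source_len, hidden_dim)
decoder_state = torch.randn(1, 1, hidden_dim)

# One affine transformation for the "keys", another for the "values"
# (and one for the "query"); the layer names are illustrative only
proj_key = nn.Linear(hidden_dim, hidden_dim)
proj_value = nn.Linear(hidden_dim, hidden_dim)
proj_query = nn.Linear(hidden_dim, hidden_dim)

keys = proj_key(encoder_states)      # K: one key per source token
values = proj_value(encoder_states)  # V: same states, different transformation
query = proj_query(decoder_state)    # Q: from the decoder's hidden state
```

Even though keys and values start from the very same hidden states, the two different transformations give them different values, as described above.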

"Where do these names come from, anyway?"

Well, the general idea is that the encoder works like a key-value store, as if it were some sort of database, and then the decoder queries it. The attention mechanism looks the query up in its keys (the matching part) and returns its values. Honestly, I don’t think this idea helps much, because the mechanism doesn’t return a single original value, but rather a weighted average of all of them. But this naming convention is used everywhere, so you need to know it.
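To make the "weighted average of all of them" concrete, here is a minimal sketch, assuming scaled dot products followed by a softmax as the matching step (the tensors are made up for illustration):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(17)
hidden_dim = 2

# Hypothetical keys/values (one per source token) and a single query
keys = torch.randn(1, 2, hidden_dim)
values = torch.randn(1, 2, hidden_dim)
query = torch.randn(1, 1, hidden_dim)

# "Looking the query up in the keys": dot products turned into weights
scores = torch.bmm(query, keys.transpose(1, 2))         # (1, 1, 2)
alphas = F.softmax(scores / hidden_dim ** 0.5, dim=-1)  # attention scores

# What comes back is NOT one of the original values, but a
# weighted average of all of them
context = torch.bmm(alphas, values)                     # (1, 1, hidden_dim)
```

The resulting context vector mixes every value according to its attention score; no single original value is returned as-is.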

"Why is 'the' a better match than 'zone' in this case?"

Fair enough. These are made-up values, and their sole purpose is to illustrate the attention mechanism. If it helps, consider that sentences are more likely to start with "the" than "zone," so the former is likely a better match to the special <start> token.

"OK, I will play along."

Thanks! Even though we haven’t actually discussed how to match a given "query" (Q) to the "keys" (K), we can update our diagram to include them.
