
But it is also common to use a special classifier token [CLS], especially in NLP tasks (as we’ll see in Chapter 11). The idea is quite simple and elegant: add the same token to the beginning of every sequence. This special token has an embedding as well, and it will be learned by the model like any other parameter.
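
A minimal sketch of what this looks like in PyTorch (the names cls_token and d_model, and the shapes below, are illustrative, not the chapter's actual model):

import torch
import torch.nn as nn

# a single learnable [CLS] embedding (initialized at zero here; other schemes exist)
d_model = 16
cls_token = nn.Parameter(torch.zeros(1, 1, d_model))

# prepend the same token to every sequence in a mini-batch of patch embeddings
embeddings = torch.randn(2, 9, d_model)                      # two images, nine patches
cls_tokens = cls_token.expand(2, -1, -1)                     # replicated across the batch
embeddings_cls = torch.cat([cls_tokens, embeddings], dim=1)  # shape: (2, 10, 16)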

The first "hidden state" produced by the encoder, the output corresponding to the added special token, plays the role of an overall representation of the image, just like the final hidden state represented the overall sequence in recurrent neural networks.
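
In code, that amounts to slicing out position zero of the encoder's output. Here is a minimal, self-contained sketch (the one-layer encoder and the three-class head are stand-ins, not the chapter's model):

import torch
import torch.nn as nn

# illustrative encoder: 16-dimensional embeddings, two attention heads, one layer
layer = nn.TransformerEncoderLayer(d_model=16, nhead=2, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=1)

seq = torch.randn(2, 10, 16)          # [CLS] + nine patch embeddings per image
states = encoder(seq)                 # (2, 10, 16): one hidden state per token
cls_state = states[:, 0]              # first hidden state = overall representation
logits = nn.Linear(16, 3)(cls_state)  # hypothetical three-class classifier head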

Remember that the Transformer encoder uses self-attention and that every token can pay attention to every other token in the sequence. Therefore, the special classifier token can actually learn which tokens (patches, in our case) it needs to pay attention to in order to correctly classify the sequence.
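
To see this mechanism at work, one could inspect the attention weights directly. A small sketch using PyTorch's nn.MultiheadAttention (random tensors as stand-ins for our patch embeddings):

import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=16, num_heads=2, batch_first=True)
seq = torch.randn(2, 10, 16)       # [CLS] + nine patch embeddings per image
out, weights = mha(seq, seq, seq)  # self-attention; weights: (2, 10, 10)
cls_attention = weights[:, 0, :]   # row zero: where the [CLS] token is looking

Each row of the weights is a distribution over the ten tokens, so row zero shows how much the classifier token attends to each patch.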

Let’s illustrate the addition of the [CLS] token by taking two images from our dataset.

Figure 10.25 - Two images

Next, we get their corresponding patch embeddings:

embeddeds = patch_embed(imgs)  # nine patch embeddings of size 16 per image

Our images were transformed into sequences of nine patch embeddings of size 16 each, so they can be represented like this (the features are on the horizontal axis now).
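
Since we took two images, a quick shape check should confirm it (embeddeds comes from the patch_embed call above):

embeddeds.shape

Output

torch.Size([2, 9, 16])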

