"What is that?"

The EncoderLayer's constructor takes four arguments:

• n_heads: the number of attention heads in the self-attention mechanism

• d_model: the number of (projected) features, that is, the dimensionality of the model (remember, this number will be split among the attention heads, so it must be a multiple of the number of heads)

• ff_units: the number of units in the hidden layer of the feed-forward network

• dropout: the probability of dropping out inputs

The forward() method takes a "query" and a source mask (to ignore padded data points) as usual.
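To make this concrete, here is a minimal sketch of what such an encoder "layer" could look like. It is not the book's implementation: PyTorch's built-in nn.MultiheadAttention stands in for the book's own multi-headed attention class, and the exact placement of residual connections, dropout, and layer normalization is just one common arrangement.

import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, n_heads, d_model, ff_units, dropout=0.1):
        super().__init__()
        self.n_heads = n_heads
        self.d_model = d_model
        self.ff_units = ff_units
        # self-attention over the source sequence (PyTorch's module is a
        # stand-in for the book's own multi-headed attention class)
        self.self_attn = nn.MultiheadAttention(d_model, n_heads,
                                               dropout=dropout,
                                               batch_first=True)
        # position-wise feed-forward network with a single hidden layer
        self.ffn = nn.Sequential(
            nn.Linear(d_model, ff_units),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(ff_units, d_model),
        )
        # layer normalization, not batch normalization
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop1 = nn.Dropout(dropout)
        self.drop2 = nn.Dropout(dropout)

    def forward(self, query, mask=None):
        # sub-layer 1: self-attention plus residual connection
        # (mask is a boolean key_padding_mask; True marks padded positions)
        att, _ = self.self_attn(query, query, query, key_padding_mask=mask)
        att = self.norm1(query + self.drop1(att))
        # sub-layer 2: feed-forward network plus residual connection
        out = self.norm2(att + self.drop2(self.ffn(att)))
        return out

# quick shape check: (N, L, d_model) in, (N, L, d_model) out
layer = EncoderLayer(n_heads=3, d_model=6, ff_units=10)
print(layer(torch.randn(16, 5, 6)).shape)  # torch.Size([16, 5, 6])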

"What is that nn.LayerNorm?"

It is one teeny-tiny detail I haven’t mentioned before: Transformers do not use batch

normalization, but rather layer normalization.

"What’s the difference?"

Short answer: Batch normalization normalizes features, while layer normalization

normalizes data points. Long answer: There is a whole section on it; we’ll get back to

it soon enough.
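To make the short answer concrete, a tiny (made-up) mini-batch is enough to show which dimension each normalization reduces over:

import torch
import torch.nn as nn

torch.manual_seed(42)
# a mini-batch of three data points, each with four features
batch = torch.randn(3, 4)

# batch norm: statistics are computed per FEATURE, across the mini-batch
bn = nn.BatchNorm1d(num_features=4, affine=False)
print(bn(batch).mean(dim=0))  # each feature (column) has roughly zero mean

# layer norm: statistics are computed per DATA POINT, across its features
ln = nn.LayerNorm(normalized_shape=4, elementwise_affine=False)
print(ln(batch).mean(dim=1))  # each data point (row) has roughly zero mean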

Now we can stack a bunch of "layers" like that to build an actual encoder (EncoderTransf). Its constructor takes an instance of an EncoderLayer, the number of "layers" we’d like to stack on top of one another, and the maximum length of the source sequence, which is used for the positional encoding.

We’re using deepcopy() to make sure we create real copies of the encoder layer, and nn.ModuleList to make sure PyTorch can find the "layers" inside the list. Our default for the number of "layers" is only one, but the original Transformer uses six.

The forward() method is quite straightforward (I was actually missing making puns): it adds positional encoding to the "query," loops over the "layers," and normalizes the outputs at the end. The final outputs are, as usual, the states of the encoder that will feed the cross-attention mechanism of every "layer" of the decoder.
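Putting that description into code, here is a sketch of what such an encoder could look like. Again, this is not the book's own implementation: it reuses the EncoderLayer sketched above, and PositionalEncoding is a minimal sinusoidal stand-in for the class the book builds (whose interface may differ).

import copy
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    # minimal sinusoidal positional encoding (a stand-in, not the book's class)
    def __init__(self, max_len, d_model):
        super().__init__()
        self.d_model = d_model
        position = torch.arange(max_len).float().unsqueeze(1)
        angular_speed = torch.exp(torch.arange(0, d_model, 2).float()
                                  * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * angular_speed)  # even dimensions
        pe[:, 1::2] = torch.cos(position * angular_speed)  # odd dimensions
        self.register_buffer('pe', pe.unsqueeze(0))  # shape (1, max_len, d_model)

    def forward(self, x):
        # scale the inputs, then add the (fixed) positional encoding
        return x * math.sqrt(self.d_model) + self.pe[:, :x.size(1), :]

class EncoderTransf(nn.Module):
    def __init__(self, encoder_layer, n_layers=1, max_len=100):
        super().__init__()
        self.d_model = encoder_layer.d_model
        self.pe = PositionalEncoding(max_len, self.d_model)
        self.norm = nn.LayerNorm(self.d_model)
        # deepcopy() creates real, independent copies of the encoder layer;
        # nn.ModuleList makes sure PyTorch can find the "layers" inside the list
        self.layers = nn.ModuleList(
            [copy.deepcopy(encoder_layer) for _ in range(n_layers)]
        )

    def forward(self, query, mask=None):
        # add positional encoding to the "query"
        x = self.pe(query)
        # loop over the "layers"
        for layer in self.layers:
            x = layer(x, mask)
        # normalize the outputs; these are the encoder states that will feed
        # the cross-attention mechanism of every "layer" of the decoder
        self.states = self.norm(x)
        return self.states

# six "layers", like the original Transformer (our default is only one)
enc = EncoderTransf(EncoderLayer(n_heads=8, d_model=512, ff_units=2048),
                    n_layers=6, max_len=100)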
