
torch.manual_seed(11)
proj_dim = 6
# Projects the two original features to the (higher) model dimensionality
linear_proj = nn.Linear(2, proj_dim)
pe = PositionalEncoding(2, proj_dim)
# First project, then add the positional encoding
source_seq_proj = linear_proj(source_seq)
source_seq_proj_enc = pe(source_seq_proj)
source_seq_proj_enc

Output

tensor([[[-2.0934,  1.5040,  1.8742,  0.0628,  0.3034,  2.0190],
         [-0.8853,  2.8213,  0.5911,  2.4193, -2.5230,  0.3599]]],
       grad_fn=<AddBackward0>)
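Before moving on, it doesn't hurt to confirm the shape of the result (a quick, optional check): one sequence in the mini-batch, two data points, and six features each.

source_seq_proj_enc.shape  # torch.Size([1, 2, 6])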

See? Now each data point in our source sequence has six features (the projected dimensions), and they are positionally-encoded too. Sure, this particular projection is totally random, but that won't be the case once we add the corresponding linear layer to our model. It will learn a meaningful projection that, after being positionally-encoded, will be normalized:

norm = nn.LayerNorm(proj_dim)
norm(source_seq_proj_enc)

Output

tensor([[[-1.9061,  0.6287,  0.8896, -0.3868, -0.2172,  0.9917],
         [-0.7362,  1.2864,  0.0694,  1.0670, -1.6299, -0.0568]]],
       grad_fn=<NativeLayerNormBackward>)
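If you'd like to see layer normalization in action, a quick (optional) sanity check is to look at the statistics over the feature dimension: each data point should end up with roughly zero mean and unit standard deviation across its six features (layer norm uses the biased standard deviation, hence unbiased=False below).

normalized = norm(source_seq_proj_enc)
# Statistics over the last dimension (the six features) of each data point
normalized.mean(dim=-1), normalized.std(dim=-1, unbiased=False)

Both values should come out very close to zero and one, respectively, for each of the two data points.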

Problem solved! Finally, we have everything we need to build a full-blown Transformer!

In Chapter 9, we used affine transformations inside the attention heads to map from input dimensions to hidden (or model) dimensions. Now, this change in dimensionality is being performed using projections directly on the input sequences before they are passed to the encoder and the decoder.
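To make that difference concrete, here is a minimal sketch of how such a projection, followed by positional encoding, could be wired in front of an encoder or decoder. The ProjectedInput class is hypothetical (it is not one of the book's classes); it only bundles the two steps we just performed, reusing the PositionalEncoding class.

class ProjectedInput(nn.Module):
    # Hypothetical helper: maps the raw features to the model dimension
    # and adds positional encoding before the sequence reaches the
    # encoder (or the decoder)
    def __init__(self, n_features, d_model, max_len):
        super().__init__()
        self.linear_proj = nn.Linear(n_features, d_model)
        self.pe = PositionalEncoding(max_len, d_model)

    def forward(self, seq):
        # seq has shape (N, L, n_features); the output, (N, L, d_model)
        return self.pe(self.linear_proj(seq))

An instance like ProjectedInput(2, proj_dim, max_len=2) would reproduce the two steps above in a single module (with different projection weights, of course, since they are randomly initialized).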

