
Layer normalization, in a tabular dataset, standardizes the rows: each data point will have the average of its features equal to zero, and the standard deviation of its features equal to one.
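In symbols (the μ, σ, and hat notation are introduced here just for illustration), a data point x with D features is standardized as:

$$
\mu = \frac{1}{D}\sum_{d=1}^{D} x_d, \qquad
\sigma = \sqrt{\frac{1}{D}\sum_{d=1}^{D}\left(x_d - \mu\right)^2}, \qquad
\hat{x}_d = \frac{x_d - \mu}{\sigma}
$$

In practice, layer normalization also adds a small epsilon inside the square root for numerical stability, plus learnable scale and shift parameters on top of the standardized values.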

Let’s assume we have a mini-batch of three sequences (N=3), each sequence having a length of two (L=2), and each data point having four features (D=4). To illustrate the importance of layer normalization, let’s add positional encoding to it too:

import torch

d_model = 4
seq_len = 2
n_points = 3

torch.manual_seed(34)
data = torch.randn(n_points, seq_len, d_model)
# PositionalEncoding was implemented earlier in the chapter
pe = PositionalEncoding(seq_len, d_model)
inputs = pe(data)
inputs

Output

tensor([[[-3.8049,  1.9899, -1.7325,  2.1359],
         [ 1.7854,  0.8155,  0.1116, -1.7420]],

        [[-2.4273,  1.3559,  2.8615,  2.0084],
         [-1.0353, -1.2766, -2.2082, -0.6952]],

        [[-0.8044,  1.9707,  3.3704,  2.0587],
         [ 4.2256,  6.9575,  1.4770,  2.0762]]])

It should be straightforward to identify the different dimensions, N (three vertical groups), L (two rows in each group), and D (four columns), in the tensor above. There are six data points in total, and their value range is mostly the result of the addition of positional encoding.
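The PositionalEncoding class used above was implemented earlier in the chapter. For reference, a minimal sketch compatible with the call above, assuming the conventional Transformer recipe (inputs scaled by the square root of d_model before the bounded sinusoidal encodings are added, which would also explain the wide value range), could look like this; it is a sketch only, not necessarily the chapter’s exact implementation:

import numpy as np
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    def __init__(self, max_len, d_model):
        super().__init__()
        self.d_model = d_model
        # Precompute sinusoidal encodings for up to max_len positions
        position = torch.arange(0, max_len).float().unsqueeze(1)
        angular_speed = torch.exp(
            torch.arange(0, d_model, 2).float() * (-np.log(10000.0) / d_model)
        )
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(position * angular_speed)  # even dimensions
        pe[:, 1::2] = torch.cos(position * angular_speed)  # odd dimensions
        self.register_buffer('pe', pe.unsqueeze(0))

    def forward(self, x):
        # Scale the inputs before adding the encodings; the sinusoidal
        # values are bounded, so the scaled inputs dominate the range
        scaled_x = x * np.sqrt(self.d_model)
        return scaled_x + self.pe[:, :x.size(1), :]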

Well, layer normalization standardizes individual data points, the rows in the tensor above, so we need to compute statistics over the corresponding dimension (D). Let’s start with the means:
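A sketch of that computation, assuming we simply average over the features dimension (D) and keep that dimension around so the result can be broadcast against the inputs:

# One mean per data point, computed over its D features
inputs_mean = inputs.mean(dim=-1, keepdim=True)  # shape: (N, L, 1)
inputs_mean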
