
curse. On the one hand, it makes computation very efficient; on the other hand, it is throwing away valuable information.

"Can we fix it?"

Definitely! Instead of using a model designed to encode the order of the inputs (like the recurrent neural networks), let's encode the positional information ourselves and add it to the inputs.

Positional Encoding (PE)

We need to find a way to inform the model about the position of every data point such that it knows the order of the data points. In other words, we need to generate a unique value for each position in the input.

Let's put our simple sequence-to-sequence problem aside for a while and imagine we have a sequence of four data points instead. The first idea that comes to mind is to use the index of the position itself, right? Let's just use zero, one, two, and three, done! Maybe it wouldn't be so bad for a short sequence like this, but what about a sequence of, say, one thousand data points? The positional encoding of the last data point would be 999. We shouldn't be using unbounded values like that as inputs for a neural network.
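A quick sketch of that naive idea, using the hypothetical four-point and thousand-point sequences from the text:

```python
import torch

# Naive "encoding": just use each position's index as its value
positions = torch.arange(4)       # tensor([0, 1, 2, 3]) — fine so far

# But the values grow without bound for longer sequences
long_positions = torch.arange(1000)
print(long_positions[-1])         # tensor(999) — far too large for a network input
```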

"What about 'normalizing' the indices?"

Sure, we can try that and divide the indices by the length of our sequence (four).

Figure 9.37 - "Normalizing" over the length

Unfortunately, that didn't solve the problem: since the divisor stays fixed at the original length (four), a longer sequence will still generate values larger than one.

Figure 9.38 - "Normalizing" over a (shorter) length
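A minimal sketch of what the figures depict, assuming the divisor stays fixed at the original length of four:

```python
import torch

fixed_length = 4  # length of our ORIGINAL sequence

# For the original sequence, the values stay nicely within [0, 1)
print(torch.arange(4) / fixed_length)
# tensor([0.0000, 0.2500, 0.5000, 0.7500])

# But a longer sequence, still divided by four, blows past one
print(torch.arange(8) / fixed_length)
# tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000, 1.2500, 1.5000, 1.7500])
```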
