predictions in previous steps have no effect whatsoever.

"This is great for training time, sure—but what about testing time,

when the target sequence is unknown?"

At testing time, there is no escape from using only the model’s own predictions from previous steps.
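For concreteness, here is a minimal sketch of what that testing-time loop looks like, reusing the decoder, hidden_seq, source_seq, and target_len from the snippet further below; since no target_seq is available, every input after the first one has to be the model’s own previous prediction:

    # No targets available at testing time: the decoder consumes its own outputs
    with torch.no_grad():
        decoder.init_hidden(hidden_seq)   # encoder's final hidden state
        inputs = source_seq[:, -1:]       # last element of the source sequence
        predictions = []
        for i in range(target_len):
            out = decoder(inputs)         # predicted next data point
            predictions.append(out)
            inputs = out                  # the only choice: feed the prediction back in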

The problem is, a model trained using teacher forcing will minimize the loss given the correct inputs at every step of the target sequence. But, since this will never be the case at testing time, the model is likely to perform poorly when using its own predictions as inputs.

"What can we do about it?"

When in doubt, flip a coin. Literally. During training, sometimes the model will use teacher forcing, and sometimes it will use its own predictions. So we occasionally help the model by providing an actual input, but we still force it to be robust enough to generate and use its own inputs. In code, we just have to add an if statement and draw a random number:

# Initial hidden state will be encoder's final hidden state
decoder.init_hidden(hidden_seq)

# Initial data point is the last element of source sequence
inputs = source_seq[:, -1:]

teacher_forcing_prob = 0.5
target_len = 2

for i in range(target_len):
    print(f'Hidden: {decoder.hidden}')
    out = decoder(inputs)
    print(f'Output: {out}\n')

    # If it is teacher forcing
    if torch.rand(1) <= teacher_forcing_prob:
        # Takes the actual element
        inputs = target_seq[:, i:i+1]
    else:
        # Otherwise uses the last predicted output
        inputs = out
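Note that torch.rand(1) draws a single number uniformly from the interval [0, 1), so with teacher_forcing_prob = 0.5 each step has roughly a 50% chance of using the actual target element instead of the model’s prediction; setting the probability to 1.0 recovers pure teacher forcing, while lower values force the model to rely more on its own outputs.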
