
entirely handled by the model itself using its hidden_state attribute.

There is one problem with the approach above, though: an untrained model will make really bad predictions, and these predictions will still be used as inputs for subsequent steps. This makes model training unnecessarily hard, because the prediction error in one step is caused by both the (untrained) model and the prediction error in the previous step.
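
To make the problem concrete, the loop described above looks roughly like this. It is a minimal sketch only, reusing the decoder, hidden_seq, and source_seq objects from the previous snippets, which are assumed to be in scope:

# Initial hidden state will be encoder's final hidden state
decoder.init_hidden(hidden_seq)

# Initial data point is the last element of source sequence
inputs = source_seq[:, -1:]

target_len = 2
for i in range(target_len):
    out = decoder(inputs)  # Predicts coordinates
    inputs = out           # Feeds its own (possibly bad) prediction back in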

"Can’t we use the actual target sequence instead?"

Sure we can! This technique is called teacher forcing.

Teacher Forcing

The reasoning is simple: ignore the predictions and use the real data from the target sequence instead. In code, we only need to change the last line:

# Initial hidden state will be encoder's final hidden state
decoder.init_hidden(hidden_seq)

# Initial data point is the last element of source sequence
inputs = source_seq[:, -1:]

target_len = 2
for i in range(target_len):
    print(f'Hidden: {decoder.hidden}')
    out = decoder(inputs)   # Predicts coordinates
    print(f'Output: {out}\n')
    # Completely ignores the predictions and uses real data instead
    inputs = target_seq[:, i:i+1]                               1

1 Inputs to the next step are not predictions anymore.

Output

Hidden: tensor([[[ 0.3105, -0.5263]]], grad_fn=<SliceBackward>)
Output: tensor([[[-0.2339, 0.4702]]], grad_fn=<ViewBackward>)

Hidden: tensor([[[ 0.3913, -0.6853]]], grad_fn=<StackBackward>)
Output: tensor([[[0.2265, 0.4529]]], grad_fn=<ViewBackward>)
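
If you would like to run the snippet above in isolation, a stand-in decoder like the one below will do. The ToyDecoder class and the random tensors are assumptions made purely for illustration; they are not the actual decoder or data used to produce the output above, so the printed values will differ:

import torch
import torch.nn as nn

class ToyDecoder(nn.Module):
    # Illustrative stand-in with the same interface (init_hidden, hidden)
    # as the decoder used above, so the teacher forcing loop can run end to end
    def __init__(self, n_features=2, hidden_dim=2):
        super().__init__()
        self.basic_rnn = nn.GRU(n_features, hidden_dim, batch_first=True)
        self.regression = nn.Linear(hidden_dim, n_features)
        self.hidden = None

    def init_hidden(self, hidden_seq):
        # Keeps the last hidden state of the sequence, in "sequence-first" shape
        self.hidden = hidden_seq[:, -1:].permute(1, 0, 2)  # (N, 1, H) -> (1, N, H)

    def forward(self, X):
        out, self.hidden = self.basic_rnn(X, self.hidden)
        return self.regression(out)

torch.manual_seed(21)
decoder = ToyDecoder()
source_seq = torch.randn(1, 2, 2)  # made-up source sequence (N, L, F)
target_seq = torch.randn(1, 2, 2)  # made-up target sequence (N, L, F)
hidden_seq = torch.randn(1, 2, 2)  # pretend sequence of encoder hidden states

decoder.init_hidden(hidden_seq)
inputs = source_seq[:, -1:]
target_len = 2
for i in range(target_len):
    out = decoder(inputs)           # Predicts coordinates
    inputs = target_seq[:, i:i+1]   # Teacher forcing: real data, not predictions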

Now, a bad prediction can only be traced to the model itself, and any bad predictions in previous steps have no effect on the ones that follow.

