
7. "Sub-Layer" Wrapper

The "sub-layer" wrapper implements the norm-first approach to wrapping "sublayers."

Figure 10.31 - "Sub-Layer" wrapper—norm-first

It normalizes the inputs, calls the "sub-layer" itself (passed as argument), applies dropout, and adds the residual connection at the end:

Sub-Layer Wrapper

import torch.nn as nn

class SubLayerWrapper(nn.Module):
    def __init__(self, d_model, dropout):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x, sublayer, is_self_attn=False, **kwargs):
        # norm-first: normalize the input before calling the sub-layer
        norm_x = self.norm(x)
        if is_self_attn:
            # self-attention uses its own (normalized) input as keys
            sublayer.init_keys(norm_x)
        # dropout on the sub-layer's output, then the residual connection
        out = x + self.drop(sublayer(norm_x, **kwargs))  # 1
        return out

1 Calls Multi-Headed Attention (8) (and the feed-forward network as well, depending on the sublayer argument)
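To see the wrapper in action, here is a minimal, self-contained sketch (the feed-forward network and all sizes below are made up for illustration, not taken from the book). For a self-attention sub-layer, you would pass is_self_attn=True instead, so that init_keys() receives the normalized input:

import torch
import torch.nn as nn

d_model, dropout = 6, 0.1
wrapper = SubLayerWrapper(d_model, dropout)

# a stand-in feed-forward sub-layer (the hidden size of 10 is arbitrary)
ffn = nn.Sequential(
    nn.Linear(d_model, 10),
    nn.ReLU(),
    nn.Linear(10, d_model),
)

x = torch.randn(1, 4, d_model)  # (batch size, sequence length, d_model)
out = wrapper(x, ffn)           # x + dropout(ffn(norm(x))); same shape as x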

To make it clearer how this module was used to replace most of the code in the encoder's and decoder's "layers", consider the example below.
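This is only a rough sketch, not the book's actual code: the names self_attn and ffn, the constructor signature, and the mask argument are assumptions. It shows how the norm / sub-layer / dropout / residual boilerplate collapses into two calls to the wrapper, one per sub-layer:

class EncoderLayer(nn.Module):
    def __init__(self, d_model, self_attn, ffn, dropout):
        super().__init__()
        self.self_attn = self_attn  # placeholder: an attention module with init_keys()
        self.ffn = ffn              # placeholder: a position-wise feed-forward network
        # one wrapper per sub-layer, each with its own layer norm
        self.sublayers = nn.ModuleList(
            [SubLayerWrapper(d_model, dropout) for _ in range(2)]
        )

    def forward(self, query, mask=None):
        # self-attention sub-layer: the wrapper calls init_keys() on the normalized input
        att = self.sublayers[0](query, self.self_attn, is_self_attn=True, mask=mask)
        # feed-forward sub-layer: no extra keyword arguments needed
        out = self.sublayers[1](att, self.ffn)
        return out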
