
Model Configuration

class BERTClassifier(nn.Module):
    def __init__(self, bert_model, ff_units,
                 n_outputs, dropout=0.3):
        super().__init__()
        self.d_model = bert_model.config.dim
        self.n_outputs = n_outputs
        self.encoder = bert_model
        self.mlp = nn.Sequential(
            nn.Linear(self.d_model, ff_units),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(ff_units, n_outputs)
        )

    def encode(self, source, source_mask=None):
        states = self.encoder(
            input_ids=source, attention_mask=source_mask)[0]
        cls_state = states[:, 0]
        return cls_state

    def forward(self, X):
        source_mask = (X > 0)
        # Featurizer
        cls_state = self.encode(X, source_mask)
        # Classifier
        out = self.mlp(cls_state)
        return out

Both the encode() and forward() methods are roughly the same as before, but the classifier (mlp) now has both a hidden layer and a dropout layer.

Our model takes an instance of a pre-trained BERT model, the number of units in the hidden layer of the classifier, and the desired number of outputs (logits) corresponding to the number of existing classes. The forward() method takes a mini-batch of token IDs, encodes them using BERT (the featurizer), and outputs logits (the classifier).
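To make those pieces concrete, here is a minimal sketch (not the book's own pipeline) of how the class defined above could be instantiated and called, assuming the Hugging Face transformers library and a pre-trained DistilBERT encoder (whose config exposes the dim attribute used in the constructor); the checkpoint name, ff_units=128, the single output, and the example sentences are arbitrary choices for illustration:

import torch
from transformers import DistilBertModel, DistilBertTokenizer

# Pre-trained encoder and its matching tokenizer (assumed checkpoint)
bert_model = DistilBertModel.from_pretrained('distilbert-base-uncased')
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')

# Hidden layer size and number of outputs are arbitrary here
model = BERTClassifier(bert_model, ff_units=128, n_outputs=1)

# A padded mini-batch of token IDs from two illustrative sentences
batch = tokenizer(['The first illustrative sentence.',
                   'A second, somewhat longer, illustrative sentence.'],
                  padding=True, return_tensors='pt')

# forward() only needs the token IDs; the mask is built internally
logits = model(batch['input_ids'])
print(logits.shape)  # torch.Size([2, 1])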

"Why does the model compute the source mask itself instead of using

the output from the tokenizer?"
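Whatever the motivation, the two masks coincide whenever padding uses token ID zero. A brief sketch (reusing the tokenizer from the sketch above, and assuming DistilBERT's vocabulary, where [PAD] is ID 0) shows that the mask built from the token IDs reproduces the tokenizer's own attention_mask:

# DistilBERT's [PAD] token has ID 0, so a mask built from the token IDs
# matches the attention_mask returned by the tokenizer.
batch = tokenizer(['A short sentence.',
                   'A noticeably longer sentence that needs no padding.'],
                  padding=True, return_tensors='pt')
mask_from_ids = (batch['input_ids'] > 0).long()
print(torch.equal(mask_from_ids, batch['attention_mask']))  # True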
