Figure 11.24 - Losses—simple classifier with BERT embeddings

OK, it’s still not overfitting, but can it deliver good predictions? You betcha!

StepByStep.loader_apply(test_loader, sbs_doc_emb.correct)

Output

tensor([[424, 440],
        [310, 331]])

That’s 95.20% accuracy on the validation (test) set! Quite impressive for a model with only three hidden units, I might say.
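In case you are wondering where that figure comes from: assuming, as in previous chapters, that each row of the tensor above holds the number of correct predictions and the total number of data points for one class, the accuracy is simply the ratio between the two column sums:

import torch

# each row above holds [correct, total] counts for one of the two classes
results = torch.tensor([[424, 440],
                        [310, 331]])
n_correct, n_total = results.sum(dim=0)       # 734 correct out of 771 points
accuracy = n_correct.item() / n_total.item()
print(f'{accuracy:.2%}')                      # 95.20%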

Now, imagine what can be accomplished if we fine-tune the actual BERT model instead! Right? Right?

BERT

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a model based on a Transformer encoder. It was introduced by Devlin, J. et al. in their paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" [199] (2019).

The original BERT model was trained on two huge corpora: BookCorpus [200] (composed of 800M words in 11,038 unpublished books) and English Wikipedia [201] (2.5B words). It has twelve "layers" (the original Transformer had only six), twelve attention heads, and 768 hidden dimensions, totaling 110 million parameters.
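If you would like to verify those figures yourself, here is a minimal sketch using the HuggingFace transformers library (assuming the bert-base-uncased checkpoint, i.e., the original base model) that loads the model's configuration and counts its parameters:

from transformers import AutoConfig, AutoModel

# configuration of the pre-trained BERT base model (uncased)
config = AutoConfig.from_pretrained('bert-base-uncased')
print(config.num_hidden_layers,    # 12 "layers"
      config.num_attention_heads,  # 12 attention heads
      config.hidden_size)          # 768 hidden dimensions

# counting the parameters of the actual pre-trained model
model = AutoModel.from_pretrained('bert-base-uncased')
n_params = sum(p.numel() for p in model.parameters())
print(f'{n_params:,}')             # roughly 110 million parameters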

