
torch.all(new_flair_sentences[0].tokens[31].embedding ==
          new_flair_sentences[1].tokens[13].embedding)

Output

tensor(True, device='cuda:0')
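Both new_flair_sentences and the classical embeddings they carry were created earlier in the chapter. As a self-contained illustration (the GloVe model and the two sentences below are made up for this sketch, not the ones used above), classical word embeddings in flair always assign the same vector to a given word, no matter the context:

import torch
from flair.data import Sentence
from flair.embeddings import WordEmbeddings

# classical (non-contextual) embeddings: one fixed vector per word
glove = WordEmbeddings('glove')

sent1 = Sentence('The cat sat on the mat')
sent2 = Sentence('My cat is black')
glove.embed([sent1, sent2])

# "cat" is the second token (index one) in both sentences
torch.all(sent1.tokens[1].embedding == sent2.tokens[1].embedding)
# returns tensor(True): same word, same vector, regardless of context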

For more details on classical word embeddings, please check "Tutorial 3: Word Embeddings" [194] and "Classic Word Embeddings" [195].

BERT

The general idea, introduced by ELMo, of obtaining contextual word embeddings using a language model still holds true for BERT. While ELMo is only a Muppet, BERT is both Muppet and Transformer (such a bizarre sentence to write!).

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a model based on a Transformer encoder. We'll skip further details about its architecture for now (don't worry, BERT has a full section of its own) and use it only to get contextual word embeddings (just like we did with ELMo).

First, we need to load BERT in flair using TransformerWordEmbeddings:

from flair.embeddings import TransformerWordEmbeddings

bert = TransformerWordEmbeddings('bert-base-uncased', layers='-1')

By the way, flair uses HuggingFace models under the hood, so you can load any pre-trained model [196] to generate embeddings for you.
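For instance, we could swap BERT for DistilBERT (an illustrative choice, not one used in the book) simply by passing a different model name:

# any model hosted on the HuggingFace Hub works the same way
distilbert = TransformerWordEmbeddings('distilbert-base-uncased', layers='-1')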

In the example above, we're using the traditional bert-base-uncased to generate contextual word embeddings using BERT's last layer (-1).
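To see what "contextual" means in practice, here is a quick check using the bert embeddings we just loaded (the sentences below are made up for this sketch): the same word in two different contexts no longer gets the same vector.

import torch
from flair.data import Sentence

sent1 = Sentence('The bank raised interest rates')
sent2 = Sentence('They sat by the river bank')
bert.embed([sent1, sent2])

# "bank" is token index 1 in the first sentence and index 5 in the second
torch.all(sent1.tokens[1].embedding == sent2.tokens[5].embedding)
# returns tensor(False, ...): same word, different contexts, different vectors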

Next, we can use the same get_embeddings() function to get the stacked embeddings for every token in a sentence.

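The get_embeddings() helper was defined earlier in the chapter (outside this excerpt). A minimal sketch consistent with what the text describes, namely stacking every token's embedding for a given sentence, could look like the one below; the author's actual implementation may differ:

import torch
from flair.data import Sentence

# hypothetical sketch of the helper, not necessarily the book's exact definition
def get_embeddings(embeddings, sentence):
    sent = Sentence(sentence)
    embeddings.embed(sent)
    # stack the embeddings of all tokens into a single tensor, one row per token
    return torch.stack([token.embedding for token in sent.tokens]).float()

embedded = get_embeddings(bert, 'To be or not to be, that is the question')
embedded.shape  # torch.Size([num_tokens, 768]) for bert-base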
