
That’s a fairly simple model, right? If our vocabulary had only five words ("the," "small," "is," "barking," and "dog"), we could try to represent each word with an embedding of three dimensions. Let’s create a dummy model to inspect its (randomly initialized) embeddings:

torch.manual_seed(42)
dummy_cbow = CBOW(vocab_size=5, embedding_size=3)
dummy_cbow.embedding.state_dict()

Output

OrderedDict([('weight', tensor([[ 0.3367,  0.1288,  0.2345],
                                [ 0.2303, -1.1229, -0.1863],
                                [ 2.2082, -0.6380,  0.4617],
                                [ 0.2674,  0.5349,  0.8094],
                                [ 1.1103, -1.6898, -0.9890]]))])
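By the way, the CBOW class itself was built earlier in the chapter. If you're running these snippets in isolation, a minimal sketch along those lines will do; the embedding attribute is the only thing the code above relies on, while the linear layer and the averaging in forward are assumptions about that earlier implementation:

import torch
import torch.nn as nn

class CBOW(nn.Module):
    def __init__(self, vocab_size, embedding_size):
        super().__init__()
        # the lookup table: one row per word in the vocabulary
        self.embedding = nn.Embedding(vocab_size, embedding_size)
        # maps the averaged context embedding back to scores over the vocabulary
        self.linear = nn.Linear(embedding_size, vocab_size)

    def forward(self, X):
        # X: (batch, context length) of token indices
        embeddings = self.embedding(X)   # (batch, context, embedding_size)
        bow = embeddings.mean(dim=1)     # average the context word vectors
        return self.linear(bow)          # (batch, vocab_size)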

Figure 11.12 - Word embeddings

As depicted in the figure above, PyTorch’s nn.Embedding layer is a large lookup table. It may be randomly initialized given the size of the vocabulary (num_embeddings) and the number of dimensions (embedding_dim). To actually retrieve the values, we need to call the embedding layer with a list of token indices, and it will return the corresponding rows of the table.
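To make the "lookup table" idea concrete, here is a short sketch (not from the book's code; the embedding, idx, and one_hot names are ours) showing that calling the layer with token indices is equivalent to multiplying one-hot vectors by its weight matrix:

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(42)
# same shape as the dummy example: five words, three dimensions
embedding = nn.Embedding(num_embeddings=5, embedding_dim=3)

idx = torch.as_tensor([2, 3])

# 1) the lookup: each index simply picks one row of the weight table
rows = embedding(idx)

# 2) the equivalent matrix product: one-hot vectors times the weight matrix
one_hot = F.one_hot(idx, num_classes=5).float()
product = one_hot @ embedding.weight

print(torch.allclose(rows, product))  # True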

For example, we can retrieve the embeddings for the tokens "is" and "barking" using their corresponding indices (two and three):

# tokens: ['is', 'barking']
dummy_cbow.embedding(torch.as_tensor([2, 3]))
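Since the layer is just a lookup table, this call should return exactly the rows at indices two and three of the weight matrix printed above. A quick sanity check, sketched here reusing the same dummy_cbow from before, is to compare the lookup against indexing the weight matrix directly:

indices = torch.as_tensor([2, 3])  # 'is' and 'barking'

# calling the layer performs the lookup...
looked_up = dummy_cbow.embedding(indices)

# ...which matches indexing the weight matrix directly
same_rows = dummy_cbow.embedding.weight[indices]

print(torch.allclose(looked_up, same_rows))  # True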
