
Output

tensor([[ 2.2082, -0.6380,  0.4617],
        [ 0.2674,  0.5349,  0.8094]], grad_fn=<EmbeddingBackward>)

That’s why the main job of the tokenizer is to transform a sentence into a list of token IDs. That list is used as an input to the embedding layer, and from then on, the tokens are represented by dense vectors.
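
A minimal sketch of that flow (not the book's own code): the word-to-index mapping, the sentence, and the embedding sizes below are assumptions chosen for illustration only.

import torch
import torch.nn as nn

# hypothetical word-to-index mapping; in practice the tokenizer builds
# this from the training corpus
word2idx = {'the': 0, 'small': 1, 'is': 2, 'barking': 3, 'dog': 4}

sentence = ['the', 'small', 'dog', 'is', 'barking']
token_ids = torch.as_tensor([[word2idx[w] for w in sentence]]).long()

# the embedding layer turns each token ID into a dense vector
# (five tokens in the vocabulary, three dimensions each, for illustration)
torch.manual_seed(42)
embedding_layer = nn.Embedding(num_embeddings=len(word2idx), embedding_dim=3)
dense_vectors = embedding_layer(token_ids)

print(token_ids)            # tensor([[0, 1, 4, 2, 3]])
print(dense_vectors.shape)  # torch.Size([1, 5, 3])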

"How do you choose the number of dimensions?"

It is commonplace to use between 50 and 300 dimensions for word embeddings, but some embeddings may be as large as 3,000 dimensions. That may look like a lot but, compared to one-hot-encoded vectors, it is a bargain! The vocabulary of our tiny dataset would already require more than 3,000 dimensions if it were one-hot encoded.
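
For a rough size comparison (a sketch; the 3,000-word vocabulary and the 100-dimensional embedding are assumed figures for illustration):

import torch
import torch.nn as nn

vocab_size = 3000  # assumed vocabulary size, for illustration only

# one-hot encoding: one dimension per word in the vocabulary
one_hot = nn.functional.one_hot(torch.tensor([42]), num_classes=vocab_size)
print(one_hot.shape)  # torch.Size([1, 3000])

# embedding: the number of dimensions is a free choice, e.g. 100
embedding_layer = nn.Embedding(num_embeddings=vocab_size, embedding_dim=100)
print(embedding_layer(torch.tensor([42])).shape)  # torch.Size([1, 100])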

In our former example, "dog" was the central word and the other four words were the context words:

tiny_vocab = ['the', 'small', 'is', 'barking', 'dog']
context_words = ['the', 'small', 'is', 'barking']
target_words = ['dog']

Now, let’s pretend that we tokenized the words and got their corresponding indices:

batch_context = torch.as_tensor([[0, 1, 2, 3]]).long()
batch_target = torch.as_tensor([4]).long()

In its very first training step, the model would compute the continuous bag-of-words for the inputs by averaging the corresponding embeddings.
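
A minimal sketch of that step, assuming a freshly initialized embedding layer with three dimensions (the layer and its sizes are illustrative, not the book's full model):

import torch
import torch.nn as nn

torch.manual_seed(42)
# five words in the vocabulary, three dimensions each (illustrative sizes)
embedding_layer = nn.Embedding(num_embeddings=5, embedding_dim=3)

batch_context = torch.as_tensor([[0, 1, 2, 3]]).long()

# look up the four context embeddings: shape (1, 4, 3)
context_vectors = embedding_layer(batch_context)

# continuous bag-of-words: average over the context words -> shape (1, 3)
cbow = context_vectors.mean(dim=1)
print(cbow)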

