
The largest logit corresponds to the word "small" (class index one), so that would be the predicted central word: "The small small is barking." The prediction is obviously wrong, but, then again, that’s still a randomly initialized model. Given a large enough dataset of context and target words, we could train the CBOW model above using an nn.CrossEntropyLoss() to learn actual word embeddings.
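
Just to make that concrete, here is a minimal sketch of such a training loop. The tiny vocabulary, the bare-bones CBOW module (embedding, average, linear layer), and all hyperparameters are illustrative assumptions, not the chapter’s actual implementation:

import torch
import torch.nn as nn

# Illustrative five-word vocabulary for "the small dog is barking"
vocab = {'the': 0, 'small': 1, 'dog': 2, 'is': 3, 'barking': 4}

class CBOW(nn.Module):
    # Minimal CBOW: embed the context words, average them, and
    # project the average onto logits over the vocabulary
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.linear = nn.Linear(embedding_dim, vocab_size)

    def forward(self, contexts):
        # contexts: (batch, context_size) tensor of word indices
        embedded = self.embedding(contexts)  # (batch, context, dim)
        bow = embedded.mean(dim=1)           # average over the context
        return self.linear(bow)              # (batch, vocab_size) logits

torch.manual_seed(42)
model = CBOW(vocab_size=len(vocab), embedding_dim=5)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# A single (context, target) pair: ["the","small","is","barking"] -> "dog"
context = torch.tensor([[0, 1, 3, 4]])
target = torch.tensor([2])

for epoch in range(100):
    optimizer.zero_grad()
    logits = model(context)
    loss = loss_fn(logits, target)  # CrossEntropyLoss takes raw logits
    loss.backward()
    optimizer.step()

After enough updates (on a real dataset, not this single pair), the rows of model.embedding.weight would be the learned word embeddings.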

The Word2Vec model may also be trained using the skip-gram approach instead of continuous bag-of-words. The skip-gram approach uses the central word to predict the surrounding words, which makes it a multi-label multiclass classification problem. In our simple example, the input would be the central word "dog," and the model would try to predict the four context words ("the," "small," "is," and "barking") at once.

We’re not diving any deeper into the inner workings of the Word2Vec model, but you can check Jay Alammar’s "The Illustrated Word2Vec" [181] and Lilian Weng’s "Learning Word Embedding" [182], two amazing posts on the subject.

If you’re interested in training a Word2Vec model yourself, follow Jason Brownlee’s great tutorial: "How to Develop Word Embeddings in Python with Gensim." [183]
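
In a nutshell, it boils down to a few lines of Gensim; the sketch below assumes Gensim 4.x (where the argument is vector_size; older versions called it size), and the one-sentence corpus and parameter values are placeholders:

from gensim.models import Word2Vec

# A one-sentence "corpus", just for illustration
sentences = [['the', 'small', 'dog', 'is', 'barking']]

# sg=0 selects CBOW; sg=1 would select skip-gram
model = Word2Vec(sentences, vector_size=5, window=2, min_count=1, sg=0)

# Each word now has a learned dense vector
print(model.wv['dog'])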

So far, it looks like we’re learning word embeddings just for the sake of getting more compact (denser) representations than one-hot encoding can offer for each word. But word embeddings are more than that.

What Is an Embedding Anyway?

An embedding is a representation of an entity (a word, in our case), and each of its dimensions can be seen as an attribute or feature.

Let’s forget about words for a moment and talk about restaurants instead. We can rate restaurants over many different dimensions, like food, price, and service, for example.
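
In other words, each restaurant becomes a small vector with one number per attribute; the ratings below are made up purely for illustration:

import torch

# Each dimension is an attribute: (food, affordability, service)
restaurants = {
    'fancy_bistro': torch.tensor([0.9, 0.2, 0.8]),  # great food, pricey
    'corner_diner': torch.tensor([0.6, 0.9, 0.5]),  # cheap, decent food
}

# Restaurants with similar attributes end up close together in this space
similarity = torch.cosine_similarity(
    restaurants['fancy_bistro'], restaurants['corner_diner'], dim=0
)
print(similarity)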
