
100 CHAPTER 4 Real-world data representation using tensors

[Figure: 2-D scatter plot of hand-crafted embeddings, axes running from 0.0 to 0.8. Fruit words (red apple, tangerine, lemon, lychee, kiwi), flower words (rose, poppy, daffodil, lily), and dog words (redbone, fox, golden retriever, poodle) are grouped by category, with color cues red, orange, yellow, white, and brown along one axis.]

Figure 4.7 Our manual word embeddings

rather shallow) neural network to generate the embedding. Once the embedding was available, we could use it for downstream tasks.

One interesting aspect of the resulting embeddings is that similar words end up not only clustered together, but also having consistent spatial relationships with other words. For example, if we were to take the embedding vector for apple and begin to add and subtract the vectors for other words, we could begin to perform analogies like apple - red - sweet + yellow + sour and end up with a vector very similar to the one for lemon.
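This vector arithmetic can be sketched in a few lines. The 2-D vectors below are hypothetical stand-ins for the hand-picked embeddings of figure 4.7 (one axis loosely encoding color, the other taste); the `nearest` helper is ours, not from the book's code:

```python
import torch

# Toy hand-picked 2-D embeddings (hypothetical values for illustration):
# first component ~ "redness", second component ~ "sweetness"
emb = {
    "apple":  torch.tensor([0.9, 0.9]),   # red and sweet
    "red":    torch.tensor([0.9, 0.0]),
    "sweet":  torch.tensor([0.0, 0.9]),
    "yellow": torch.tensor([0.1, 0.0]),
    "sour":   torch.tensor([0.0, 0.1]),
    "lemon":  torch.tensor([0.1, 0.1]),   # yellow and sour
}

# apple - red - sweet + yellow + sour
query = emb["apple"] - emb["red"] - emb["sweet"] + emb["yellow"] + emb["sour"]

def nearest(vec, vocab):
    # Return the word whose embedding lies closest (Euclidean) to vec
    return min(vocab, key=lambda w: torch.norm(vec - vocab[w]).item())

print(nearest(query, emb))  # prints "lemon" with these toy vectors
```

With real learned embeddings the same arithmetic is done in a few hundred dimensions, and the nearest neighbor is usually found by cosine similarity rather than Euclidean distance.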

More contemporary embedding models—with BERT and GPT-2 making headlines even in mainstream media—are much more elaborate and are context sensitive: that is, the mapping of a word in the vocabulary to a vector is not fixed but depends on the surrounding sentence. Yet they are often used just like the simpler classic embeddings we've touched on here.

4.5.5 Text embeddings as a blueprint

Embeddings are an essential tool whenever a large number of entries in a vocabulary have to be represented by numeric vectors. But we won't be using text and text embeddings in this book, so you might wonder why we introduce them here. We believe that how text is represented and processed can also be seen as an example of dealing with categorical data in general. Embeddings are useful wherever one-hot encoding becomes cumbersome. Indeed, in the form described previously, they are an efficient way of implementing a one-hot encoding immediately followed by multiplication with the matrix containing the embedding vectors as rows.
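That equivalence is easy to verify in PyTorch. The sketch below (with hypothetical vocabulary and embedding sizes) shows that an `nn.Embedding` lookup and an explicit one-hot matrix multiplication produce the same row:

```python
import torch

torch.manual_seed(0)
vocab_size, embed_dim = 7, 3   # hypothetical sizes for illustration

# An embedding layer stores one embedding vector per vocabulary row
embedding = torch.nn.Embedding(vocab_size, embed_dim)

idx = torch.tensor([2])        # look up the word with index 2
looked_up = embedding(idx)

# The same lookup, spelled out as a one-hot encoding followed by a matmul
one_hot = torch.zeros(1, vocab_size)
one_hot[0, 2] = 1.0
multiplied = one_hot @ embedding.weight

print(torch.allclose(looked_up, multiplied))  # True
```

The indexed lookup is what makes embeddings efficient: it skips materializing the mostly-zero one-hot vector entirely, which matters when the vocabulary has tens of thousands of entries.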
