Deep-Learning-with-PyTorch

34 CHAPTER 2 Pretrained networks

Figure 2.9 Concept of a captioning model: a convolutional network (image recognition) feeds a recurrent network (text generation) that produces a caption such as “An odd-looking fellow holding a pink balloon”; the two are trained end-to-end on image-caption pairs.

along with a paired sentence description: for example, “A Tabby cat is leaning on a wooden table, with one paw on a laser mouse and the other on a black laptop.” 3

This captioning model has two connected halves. The first half of the model is a network that learns to generate “descriptive” numerical representations of the scene (Tabby cat, laser mouse, paw), which are then taken as input to the second half. That second half is a recurrent neural network that generates a coherent sentence by putting those numerical descriptions together. The two halves of the model are trained together on image-caption pairs.
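To make the two-halves idea concrete, here is a minimal PyTorch sketch of such a model and a single joint training step. It is not the NeuralTalk2 architecture: the class name `TinyCaptioner`, the layer sizes, the 12-word vocabulary, and the random tensors standing in for images and captions are all assumptions made for illustration. What it does show is that one loss, computed on the generated words, sends gradients back through both the recurrent half and the convolutional half.

```python
import torch
from torch import nn


class TinyCaptioner(nn.Module):
    """Toy captioning model (hypothetical): conv encoder + recurrent decoder."""

    def __init__(self, vocab_size=12, feat_dim=16, hidden_dim=32):
        super().__init__()
        # First half: a small convolutional network that turns an image
        # into a fixed-size numerical description of the scene.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(8, feat_dim),
        )
        # Second half: a recurrent network that emits one word per step.
        self.init_hidden = nn.Linear(feat_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.cell = nn.GRUCell(hidden_dim, hidden_dim)
        self.to_vocab = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # images: (B, 3, H, W); captions: (B, T) integer word ids.
        h = torch.tanh(self.init_hidden(self.encoder(images)))
        logits = []
        for t in range(captions.shape[1] - 1):
            # Feed the word at position t, predict the word at position t + 1.
            h = self.cell(self.embed(captions[:, t]), h)
            logits.append(self.to_vocab(h))
        return torch.stack(logits, dim=1)  # (B, T - 1, vocab_size)


torch.manual_seed(0)
model = TinyCaptioner()
images = torch.randn(4, 3, 32, 32)        # random stand-in for an image batch
captions = torch.randint(0, 12, (4, 6))   # random stand-in for word ids
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One training step on (image, caption) pairs: a single loss updates both halves.
logits = model(images, captions)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, 12), captions[:, 1:].reshape(-1)
)
optimizer.zero_grad()
loss.backward()
optimizer.step()

encoder_got_grad = model.encoder[0].weight.grad is not None
decoder_got_grad = model.cell.weight_ih.grad is not None
```

Because the encoder's output initializes the decoder's hidden state, the backward pass reaches the convolution weights as well, which is what training the two halves together means in practice.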

The second half of the model is called recurrent because it generates its outputs (individual words) in subsequent forward passes, where the input to each forward pass includes the outputs of the previous forward pass. This makes the next word depend on the words that were generated earlier, as we would expect when dealing with sentences or, in general, with sequences.
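The feedback loop described here can be sketched in a few lines of plain Python. In this sketch the trained network is replaced by a hand-written lookup table (`TOY_MODEL`, keyed on the last two words), and the `<start>` and `<end>` markers and the function names are likewise assumptions made for illustration. Only the shape of the loop matters: each pass receives what earlier passes produced.

```python
# Hypothetical stand-in for a trained recurrent network: it maps the most
# recent words to the next word. A real model would compute this from
# learned weights and the image representation.
TOY_MODEL = {
    ("<start>",): "a",
    ("<start>", "a"): "person",
    ("a", "person"): "riding",
    ("person", "riding"): "a",
    ("riding", "a"): "horse",
    ("a", "horse"): "on",
    ("horse", "on"): "a",
    ("on", "a"): "beach",
    ("a", "beach"): "<end>",
}


def forward_pass(history):
    """One 'forward pass': look at the previously generated words, return the next."""
    return TOY_MODEL[tuple(history[-2:])]


def generate(max_words=20):
    history = ["<start>"]
    for _ in range(max_words):
        word = forward_pass(history)   # input includes the previous outputs
        if word == "<end>":
            break
        history.append(word)           # this output becomes part of the next input
    return " ".join(history[1:])


caption = generate()
print(caption)
```

Note that the word "a" is followed by three different words in this sentence. Which one comes next can only be decided by looking at what was generated before, which is exactly the dependency the recurrent half provides.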

2.3.1 NeuralTalk2

The NeuralTalk2 model can be found at https://github.com/deep-learning-with-pytorch/ImageCaptioning.pytorch. We can place a set of images in the data directory and run the following script:

python eval.py --model ./data/FC/fc-model.pth \
    --infos_path ./data/FC/fc-infos.pkl --image_folder ./data

Let’s try it with our horse.jpg image. It says, “A person riding a horse on a beach.” Quite appropriate.

3 Andrej Karpathy and Li Fei-Fei, “Deep Visual-Semantic Alignments for Generating Image Descriptions,” https://cs.stanford.edu/people/karpathy/cvpr2015.pdf.
