
Predicting malignancy


Recall from chapter 8 that we could interpret the intermediate values as features extracted from the image: features could be edges or corners that the model detects, or indications of some other pattern. Before deep learning, it was very common to use handcrafted features, similar to what we briefly experimented with when starting with convolutions. Deep learning instead has the network derive from the data whatever features are useful for the task at hand, such as discriminating between classes. Now, fine-tuning has us mix the ancient way (almost a decade ago!) of using preexisting features with the new way of using learned features: we treat some (often large) part of the network as a fixed feature extractor and only train a relatively small part on top of it.
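To make the fixed-feature-extractor idea concrete, here is a minimal, generic sketch that is not the chapter's model: it reuses a torchvision ResNet-18 pretrained on ImageNet, freezes its parameters, and trains only a freshly initialized linear classifier on top. The two-class output size and the optimizer settings are arbitrary choices for illustration.

```python
import torch
from torch import nn
from torchvision import models

# The pretrained backbone acts as the fixed feature extractor.
backbone = models.resnet18(pretrained=True)
for param in backbone.parameters():
    param.requires_grad = False  # frozen: no gradients, no updates

# Replace the final layer; its new parameters default to requires_grad=True,
# so this small part on top is the only thing that gets trained.
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

# Hand the optimizer only the trainable parameters.
optimizer = torch.optim.SGD(backbone.fc.parameters(), lr=1e-3, momentum=0.9)
```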

This generally works very well. Pretrained networks trained on ImageNet, as we saw in chapter 2, are very useful as feature extractors for many tasks dealing with natural images; sometimes they also work amazingly well for completely different inputs, from paintings or imitations thereof in style transfer to audio spectrograms. There are cases where this strategy works less well, though. For example, one of the common data augmentation strategies when training models on ImageNet is randomly flipping the images: a dog looking right is the same class as one looking left. As a result, the features extracted from an image and its flipped version are very similar. But if we now try to use the pretrained model for a task where left or right matters, we will likely run into accuracy problems. If we want to identify traffic signs, "turn left here" is quite different from "turn right here," but a network building on ImageNet-based features will probably make lots of wrong assignments between the two classes.[6]

In our case, we have a network that has been trained on similar data: the nodule classification network. Let's try using that.

For the sake of exposition, we will stay very basic in our fine-tuning approach. In the model architecture in figure 14.8, the two bits of particular interest are highlighted: the last convolutional block and the head_linear module. The simplest fine-tuning is to cut out the head_linear part; in truth, we are just keeping its random initialization. After we try that, we will also explore a variant where we retrain both head_linear and the last convolutional block.

We need to do the following:

• Load the weights of the model we wish to start with, except for the last linear layer, where we want to keep the initialization.
• Disable gradients for the parameters we do not want to train (everything except parameters with names starting with head).
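A minimal sketch of these two steps, under a few assumptions not spelled out above: the checkpoint stores its weights under a 'model_state' key, and the model's last linear layer is a top-level child named head_linear (as in figure 14.8). The function name and path argument are illustrative.

```python
import torch

def prepare_for_finetuning(model, finetune_path):
    """Load pretrained weights except head_linear, then freeze the rest."""
    ckpt = torch.load(finetune_path, map_location='cpu')

    # Keep every pretrained weight except those of the last linear layer,
    # so head_linear stays at its fresh random initialization.
    pretrained_state = {
        k: v for k, v in ckpt['model_state'].items()
        if not k.startswith('head_linear')
    }
    model.load_state_dict(pretrained_state, strict=False)

    # Disable gradients for everything whose name does not start with
    # 'head'; frozen parameters are never updated during training.
    for name, param in model.named_parameters():
        if not name.startswith('head'):
            param.requires_grad_(False)

    return model
```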

When we do fine-tuning training on more than head_linear, we still only reset head_linear to random, because we believe the previous feature-extraction layers might not be ideal for our problem, but we expect them to be a reasonable starting point.
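For that deeper variant, only the freezing step changes. The following sketch generalizes it by freezing everything except the last few top-level blocks; selecting blocks via named_children and the finetune_depth parameter are assumptions for illustration, not something fixed by the text above.

```python
def freeze_all_but_last(model, finetune_depth=1):
    """Freeze every top-level block except the last `finetune_depth` ones.

    finetune_depth=1 trains only the head; finetune_depth=2 also trains
    the last convolutional block.
    """
    # Top-level children that actually own parameters.
    blocks = [
        name for name, child in model.named_children()
        if list(child.parameters())
    ]
    trainable = set(blocks[-finetune_depth:])

    for name, param in model.named_parameters():
        if name.split('.')[0] not in trainable:
            param.requires_grad_(False)
```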

[6] You can try it yourself with the venerable German Traffic Sign Recognition Benchmark dataset at http://mng.bz/XPZ9.
