
In ensembling, we typically use completely separate training runs or even varying model structures. But if we were to make it particularly simple, we could take several snapshots of the model from a single training run, preferably shortly before the end or before we start to observe overfitting. We might try to build an ensemble of these snapshots, but as they will still be somewhat close to each other, we could instead average them. This is the core idea of stochastic weight averaging.[17] We need to exercise some care when doing so: for example, when our models use batch normalization, we might want to adjust the statistics, but we can likely get a small accuracy boost even without that.
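To make this concrete, here is a minimal sketch of stochastic weight averaging using PyTorch's built-in torch.optim.swa_utils; the tiny model, data, and training loop are placeholders standing in for this chapter's actual model and training code rather than the book's implementation.

```python
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel, update_bn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data; substitute the real model and training loader.
model = nn.Sequential(nn.Linear(10, 16), nn.BatchNorm1d(16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
loader = DataLoader(TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,))),
                    batch_size=32)

swa_model = AveragedModel(model)    # keeps a running average of the weights
swa_start = 5                       # start averaging toward the end of training

for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)   # fold this snapshot into the average

# The averaged weights no longer match the batch-norm running statistics
# collected during training, so we recompute them on the training data.
update_bn(loader, swa_model)
```

The update_bn call at the end is what adjusts the batch normalization statistics mentioned above; skipping it usually still works, but recomputing them is cheap.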

GENERALIZING WHAT WE ASK THE NETWORK TO LEARN

We could also look at multitask learning, where we require a model to learn additional outputs beyond the ones we will then evaluate,[18] which has a proven track record of improving results. We could try to train on nodule versus non-nodule and benign versus malignant at the same time. Actually, the data source for the malignancy data provides additional labeling we could use as additional tasks; see the next section. This idea is closely related to the transfer-learning concept we looked at earlier, but here we would typically train both tasks in parallel rather than first doing one and then trying to move to the next.
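As a rough sketch of what training both tasks in parallel could look like, the following hypothetical two-headed model shares one backbone and simply sums the two classification losses; the backbone, feature size, and data here are stand-ins rather than the chapter's model.

```python
import torch
import torch.nn as nn

class TwoHeadedModel(nn.Module):
    def __init__(self, backbone, feature_dim):
        super().__init__()
        self.backbone = backbone
        self.nodule_head = nn.Linear(feature_dim, 2)      # nodule vs. non-nodule
        self.malignancy_head = nn.Linear(feature_dim, 2)  # benign vs. malignant

    def forward(self, x):
        features = self.backbone(x)
        return self.nodule_head(features), self.malignancy_head(features)

# Placeholder backbone and batch; both heads share the backbone's features.
model = TwoHeadedModel(nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64), nn.ReLU()),
                       feature_dim=64)
x = torch.randn(8, 1, 32, 32)
nodule_labels = torch.randint(0, 2, (8,))
malignancy_labels = torch.randint(0, 2, (8,))

nodule_logits, malignancy_logits = model(x)
loss_fn = nn.CrossEntropyLoss()
# Training both tasks in parallel: one backward pass over the summed losses.
loss = loss_fn(nodule_logits, nodule_labels) + loss_fn(malignancy_logits, malignancy_labels)
loss.backward()
```

In practice we might weight the two losses differently, or mask the malignancy loss for samples that are not nodules in the first place.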

If we do not have additional tasks but rather have a stash of additional unlabeled data, we can look into semi-supervised learning. An approach that was recently proposed and looks very effective is unsupervised data augmentation.[19] Here we train our model as usual on the labeled data. On the unlabeled data, we make a prediction on an unaugmented sample. We then take that prediction as the target for this sample and train the model to predict the same target on an augmented version of the sample as well. In other words, we don't know whether the prediction is correct, but we ask the network to produce consistent outputs whether we augment or not.
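A minimal sketch of this consistency idea might look like the following; the model and the augment function are simple placeholders (the paper uses much stronger augmentations and additional training tricks).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))  # placeholder model

def augment(x):
    return x + 0.1 * torch.randn_like(x)   # stand-in for a real augmentation

def consistency_loss(model, unlabeled_x):
    with torch.no_grad():                               # the prediction on the clean sample
        target = F.softmax(model(unlabeled_x), dim=1)   # becomes the (fixed) target
    augmented_logits = model(augment(unlabeled_x))
    # Ask for consistent outputs on the augmented version of the same sample.
    return F.kl_div(F.log_softmax(augmented_logits, dim=1), target,
                    reduction='batchmean')

unlabeled_x = torch.randn(16, 10)
# The full loss would be the usual supervised loss on the labeled data plus a
# weighted consistency term on the unlabeled data.
loss = consistency_loss(model, unlabeled_x)
loss.backward()
```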

When we run out of tasks of genuine interest but do not have additional data, we may look at making things up. Making up data is somewhat difficult (although people sometimes use GANs similar to the ones we briefly saw in chapter 2, with some success), so we instead make up tasks. This is when we enter the realm of self-supervised learning; the tasks are often called pretext tasks. A very popular crop of pretext tasks applies some sort of corruption to some of the inputs. We can then train a network to reconstruct the original (for example, using a U-Net-like architecture) or train a classifier to distinguish real from corrupted data while sharing large parts of the model (such as the convolutional layers).
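As one possible illustration of such a pretext task, the sketch below uses a made-up corruption (zeroing a random square patch) and trains a small head on top of a shared convolutional backbone to classify real versus corrupted inputs; all names and sizes are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Placeholder shared backbone; after pretext training, its convolutional layers
# could be reused for the task we actually care about.
backbone = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
pretext_head = nn.Linear(8, 2)     # real vs. corrupted

def corrupt(x, patch=8):
    x = x.clone()
    i = torch.randint(0, x.shape[-1] - patch, (1,)).item()
    x[..., i:i + patch, i:i + patch] = 0     # zero out a random square patch
    return x

batch = torch.randn(16, 1, 32, 32)           # stand-in for (unlabeled) image data
inputs = torch.cat([batch, corrupt(batch)])
labels = torch.cat([torch.zeros(16), torch.ones(16)]).long()

logits = pretext_head(backbone(inputs))
loss = F.cross_entropy(logits, labels)
loss.backward()
```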

This is still dependent on us coming up with a way to corrupt our inputs. If we don't have such a method in mind and aren't getting the results we want, there are

[17] Pavel Izmailov and Andrew Gordon Wilson present an introduction with PyTorch code at http://mng.bz/gywe.
[18] See Sebastian Ruder, "An Overview of Multi-Task Learning in Deep Neural Networks," https://arxiv.org/abs/1706.05098; but this is also a key idea in many areas.
[19] Q. Xie et al., "Unsupervised Data Augmentation for Consistency Training," https://arxiv.org/abs/1904.12848.
