
b ROC/AUC metrics—Before we can start our last classification step, we’ll define

some new metrics for examining the performance of classification models, as

well as establish a baseline metric against which to compare our malignancy

classifiers.

c Fine-tuning the malignancy model—Once our new metrics are in place, we will define a model specifically for classifying benign and malignant nodules, train it, and see how it performs. We will do the training by fine-tuning: a process that cuts out some of the weights of an existing model and replaces them with fresh values that we then adapt to our new task (a minimal code sketch of this pattern follows the roadmap).

At that point we will be within arm’s reach of our ultimate goal: to classify nodules into

benign and malignant classes and then derive a diagnosis from the CT. Again, diagnosing

lung cancer in the real world involves much more than staring at a CT scan, so

our performing this diagnosis is more an experiment to see how far we can get using

deep learning and imaging data alone.

3 End-to-end detection. Finally, we will put all of this together to get to the finish

line, combining the components into an end-to-end solution that can look at a

CT and answer the question “Are there malignant nodules present in the

lungs?”

a IRC—We will segment our CT to get nodule candidate samples to classify.

b Determine the nodules—We will perform nodule classification on each candidate to determine whether it should be fed into the malignancy classifier.

c Determine malignancy—We will perform malignancy classification on the nodules

that pass through the nodule classifier to determine whether the patient

has cancer.
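The fine-tuning step described in 2c is easier to picture with a little code. The following is a minimal sketch of the pattern only, not the book's actual model: the class, the head_linear attribute name, and the optimizer settings are placeholders for this illustration, and the real classifier is a 3D CNN rather than this toy module.

import torch
from torch import nn

# Stand-in backbone plus linear head; placeholder names for this sketch only.
class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
        self.head_linear = nn.Linear(64, 2)

    def forward(self, x):
        return self.head_linear(self.backbone(x))

model = TinyClassifier()
# In the real project we would load the trained nodule classifier's weights
# here, for example model.load_state_dict(torch.load(path)['model_state']).

# Fine-tuning: throw away the head's learned weights and reinitialize it for
# the new benign-versus-malignant task, keeping the backbone's features.
model.head_linear = nn.Linear(model.head_linear.in_features, 2)

# Only the freshly initialized head receives gradient updates.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith('head_linear')

optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=0.001, momentum=0.99,
)

The two blocks at the end carry the idea: a fresh final layer, plus requires_grad flags that restrict training to that layer, so the pretrained features survive while the head adapts to the new labels.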

We’ve got a lot to do. To the finish line!

NOTE As in the previous chapter, we will discuss the key concepts in detail in

the text and leave out the code for repetitive, tedious, or obvious parts. Full

details can be found in the book’s code repository.

14.2 Independence of the validation set

We are in danger of making a subtle but critical mistake, which we need to discuss and

avoid: we have a potential leak from the training set to the validation set! For each of

the segmentation and classification models, we took care of splitting the data into a

training set and an independent validation set by taking every tenth example for validation

and the remainder for training.

However, the split for the classification model was done on the list of nodules, and

the split for the segmentation model was done on the list of CT scans. This means we

likely have nodules from the segmentation validation set in the training set of the classification

model and vice versa. We must avoid that! If left unfixed, this situation could

lead to performance figures that would be artificially higher than what we would see on truly independent data.
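One way to fix it is to make the CT scan the unit of splitting everywhere: partition the series UIDs once, then derive both the segmentation split and the classification split from that single partition. The following is a minimal sketch of the idea; the record type and function name are placeholders, not the project's actual data structures.

from collections import namedtuple

# Illustrative candidate record; the real project stores more fields.
CandidateInfo = namedtuple('CandidateInfo', ['series_uid', 'center_xyz', 'is_nodule'])

def split_by_series(candidates, val_stride=10):
    """Partition candidates so that everything from a given CT scan
    (identified by series_uid) lands entirely in either the training set
    or the validation set."""
    series_uids = sorted({c.series_uid for c in candidates})
    val_series = set(series_uids[::val_stride])  # every tenth CT goes to validation

    train = [c for c in candidates if c.series_uid not in val_series]
    val = [c for c in candidates if c.series_uid in val_series]
    return train, val

Because every nodule carries the series_uid of the CT it came from, splitting on series UIDs guarantees that no scan contributes samples to both sides of either model's split.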
