
324 CHAPTER 12 Improving training with metrics and augmentation

events in the middle of our graph. They must pick a vertical bark-worthiness threshold, which means it's impossible for either one of them to do so perfectly. Sometimes the person hauling your appliances to their van is the repair person you hired to fix your washing machine, and sometimes burglars show up in a van that says "Washing Machine Repair" on the side. Expecting a dog to pick up on those nuances is bound to fail.

The actual input data we're going to use has high dimensionality—we need to consider a ton of CT voxel values, along with more abstract things like candidate size, overall location in the lungs, and so on. The job of our model is to map each of these events and respective properties into this rectangle in such a way that we can separate those positive and negative events cleanly using a single vertical line (our classification threshold). This is done by the nn.Linear layers at the end of our model. The position of the vertical line corresponds exactly to the classificationThreshold_float we saw in section 11.6.1. There, we chose the hardcoded value 0.5 as our threshold.
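As a minimal sketch of that final step (the feature width of 8 and the batch size of 4 here are made up for illustration; the real model's penultimate layer is wider), the last nn.Linear reduces each sample's feature vector to a single scalar, which is squashed into a probability and compared against the 0.5 threshold:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

head = nn.Linear(8, 1)               # hypothetical final layer: 8 features -> 1 scalar
features = torch.randn(4, 8)         # a batch of 4 candidate feature vectors

logits = head(features)              # one scalar per sample, shape (4, 1)
probabilities = torch.sigmoid(logits)  # squashed into [0, 1]
predictions = probabilities > 0.5    # the vertical line: classificationThreshold_float
```

Everything to the right of the threshold (probability above 0.5) is predicted positive; everything to the left is predicted negative.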

Note that in reality, the data presented is not two-dimensional; it goes from very high-dimensional after the second-to-last layer, to one-dimensional (here, our X-axis) at the output—just a single scalar per sample (which is then bisected by the classification threshold). Here, we use the second dimension (the Y-axis) to represent per-sample features that our model cannot see or use: things like age or gender of the patient, location of the nodule candidate in the lung, or even local aspects of the candidate that the model hasn't utilized. It also gives us a convenient way to represent confusion between non-nodule and nodule samples.

The quadrant areas in figure 12.5 and the count of samples contained in each will be the values we use to discuss model performance, since we can use the ratios between these values to construct increasingly complex metrics that objectively measure how well we are doing. As they say, "the proof is in the proportions."1 Next, we'll use ratios between these event subsets to start defining better metrics.

12.3.1 Recall is Roxie’s strength

Recall is basically "Make sure you never miss any interesting events!" Formally, recall is the ratio of the true positives to the union of true positives and false negatives. We can see this depicted in figure 12.6.

NOTE In some contexts, recall is referred to as sensitivity.
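In code, that definition is just a ratio of counts (this helper is a sketch for illustration, not part of the book's project code):

```python
def recall(true_positives: int, false_negatives: int) -> float:
    # Recall = TP / (TP + FN): the fraction of actual positive
    # events that the classifier caught.
    return true_positives / (true_positives + false_negatives)
```

Barking at 99 of 100 robbers, for instance, means 99 true positives and 1 false negative, for a recall of 0.99.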

To improve recall, minimize false negatives. In guard dog terms, that means if you're unsure, bark at it, just in case. Don't let any rodent thieves sneak by on your watch! Roxie achieves her incredibly high recall by pushing her classification threshold all the way to the left, such that it encompasses nearly all of the positive events in figure 12.7. Note how doing so means her recall value is near 1.0, which means 99% of robbers are barked at. Since that's how Roxie defines success, in her mind, she's doing a great job. Never mind the huge expanse of false positives!

1. No one actually says this.
