
...

E15 trn 0.2226 loss, 0.6234 precision, 0.9536 recall, 0.7540 f1 score
E15 trn_all 0.2226 loss, 95.4% tp, 4.6% fn, 57.6% fp
...
E20 trn 0.2149 loss, 0.6368 precision, 0.9584 recall, 0.7652 f1 score
E20 trn_all 0.2149 loss, 95.8% tp, 4.2% fn, 54.7% fp

In these rows, we are particularly interested in the F1 score: it is trending up. Good! TPs are trending up, too, and FNs and FPs are trending down.
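As a reminder of how these columns relate, precision, recall, and F1 are all derived from the true positive, false negative, and false positive counts shown in the trn_all rows. The sketch below only illustrates that arithmetic; the function name and the plain-float inputs are assumptions made for this example, not the book's actual metrics-logging code.

# Illustration only: how precision, recall, and F1 follow from the
# tp/fn/fp counts in the log rows above.
def scores_from_counts(true_pos, false_neg, false_pos, eps=1e-7):
    recall = true_pos / (true_pos + false_neg + eps)      # same number as the tp percentage
    precision = true_pos / (true_pos + false_pos + eps)   # dragged down by a large fp count
    f1 = 2 * precision * recall / (precision + recall + eps)
    return precision, recall, f1

# Epoch 20 training row, treating the percentages as counts per 100
# positive pixels: 95.8 tp, 4.2 fn, 54.7 fp.
print(scores_from_counts(95.8, 4.2, 54.7))
# ~(0.637, 0.958, 0.765), matching the E20 trn line above.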

Overall, it looks pretty good. True positives and the F1 score are trending up, false positives and negatives are trending down. That's what we want to see! The validation metrics will tell us whether these results are legitimate. Keep in mind that since we're training on 64 × 64 crops, but validating on whole 512 × 512 CT slices, we are almost certainly going to have drastically different TP:FN:FP ratios. Let's see:

The highest TP rate in the validation listing below is great; note that the TP rate is the same as recall. But FPs of 4,495% sound like a lot.

E1 val 0.9441 loss, 0.0219 precision, 0.8131 recall, 0.0426 f1 score
E1 val_all 0.9441 loss, 81.3% tp, 18.7% fn, 3637.5% fp
E5 val 0.9009 loss, 0.0332 precision, 0.8397 recall, 0.0639 f1 score
E5 val_all 0.9009 loss, 84.0% tp, 16.0% fn, 2443.0% fp
E10 val 0.9518 loss, 0.0184 precision, 0.8423 recall, 0.0360 f1 score
E10 val_all 0.9518 loss, 84.2% tp, 15.8% fn, 4495.0% fp
E15 val 0.8100 loss, 0.0610 precision, 0.7792 recall, 0.1132 f1 score
E15 val_all 0.8100 loss, 77.9% tp, 22.1% fn, 1198.7% fp
E20 val 0.8602 loss, 0.0427 precision, 0.7691 recall, 0.0809 f1 score
E20 val_all 0.8602 loss, 76.9% tp, 23.1% fn, 1723.9% fp

Ouch, false positive rates over 4,000%? Yes, actually, that's expected. Our validation slice area is 2^18 pixels (512 is 2^9), while our training crop is only 2^12. That means we're validating on a slice surface that's 2^6 = 64 times bigger! Having a false positive count that's also 64 times bigger makes sense. Remember that our true positive rate won't have changed meaningfully, since it would all have been included in the 64 × 64 sample we trained on in the first place. This situation also results in very low precision, and, hence, a low F1 score. That's a natural result of how we've structured the training and validation, so it's not a cause for alarm.
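To make the area argument concrete, here is the arithmetic from the paragraph above as a tiny sketch; nothing here comes from the book's code, it just restates the numbers.

# Pixel area of a full validation slice vs. a training crop.
val_area = 512 * 512   # 2**18 pixels per 512 x 512 slice
trn_area = 64 * 64     # 2**12 pixels per 64 x 64 crop

print(val_area // trn_area)   # 64, i.e. 2**6 times more area per validation sample

# The fp percentage is reported relative to the number of true positive
# pixels, which is roughly the same whether we score a crop or a whole
# slice (the crop was built around those pixels). The whole slice simply
# exposes about 64 times more negative area where false positives can
# appear, which is why fp percentages far above 100% are expected here.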

What’s problematic, however, is our recall (and, hence, our true positive rate). Our recall plateaus between epochs 5 and 10 and then starts to drop. It’s pretty obvious that we begin overfitting very quickly, and we can see further evidence of that in figure 13.18: while the training recall keeps trending upward, the validation recall decreases after 3 million samples. This is how we identified overfitting in chapter 5, in particular figure 5.14.
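One practical way to act on this observation is to track validation recall per epoch and keep the checkpoint from the epoch where it peaked, rather than the last one. The sketch below is a generic illustration of that idea using the recall values from the validation log above; the variable names are made up, and this is not the book's training-loop code.

# Illustration only: pick the epoch with the best validation recall so a
# later, overfitted epoch doesn't win. Values copied from the val rows above.
val_recall_by_epoch = {1: 0.8131, 5: 0.8397, 10: 0.8423, 15: 0.7792, 20: 0.7691}

best_epoch, best_recall = None, 0.0
for epoch, recall in sorted(val_recall_by_epoch.items()):
    if recall > best_recall:
        best_epoch, best_recall = epoch, recall
        # In a real training loop we would also save the model weights here,
        # for example with torch.save(model.state_dict(), ...).

print(best_epoch, best_recall)   # 10 0.8423 -> validation recall peaks at epoch 10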
