
E1 trn_neg 0.0026 loss, 100.0% correct (494717 of 494743)
E1 trn_pos 6.5267 loss, 0.0% correct (0 of 1215)
...
E1 val 0.0173 loss, 99.8% correct, nan precision, 0.0000 recall, nan f1 score
E1 val_neg 0.0026 loss, 100.0% correct (54971 of 54971)
E1 val_pos 5.9577 loss, 0.0% correct (0 of 136)
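
Note why the precision comes out as nan: the unbalanced model never predicts positive, so both true and false positives are zero and the precision ratio is 0/0. Here is a throwaway sketch of that arithmetic (not code from the project), using the validation counts above:

# Confusion-matrix counts from the unbalanced validation run above
true_pos = 0      # 0 of 136 positives flagged
false_neg = 136   # all positives missed
false_pos = 0     # the model never says "positive," so no false alarms either

# precision = TP / (TP + FP) is 0/0 here, which the metrics code reports as nan
precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else float('nan')
recall = true_pos / (true_pos + false_neg)    # 0 / 136 = 0.0
print(precision, recall)                      # nan 0.0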

But when we run with --balanced, we see the following:

$ python -m p2ch12.training --balanced
...
E1 LunaTrainingApp
E1 trn 0.1734 loss, 92.8% correct, 0.9363 precision, 0.9194 recall, 0.9277 f1 score
E1 trn_neg 0.1770 loss, 93.7% correct (93741 of 100000)
E1 trn_pos 0.1698 loss, 91.9% correct (91939 of 100000)
...
E1 val 0.0564 loss, 98.4% correct, 0.1102 precision, 0.7941 recall, 0.1935 f1 score
E1 val_neg 0.0542 loss, 98.4% correct (54099 of 54971)
E1 val_pos 0.9549 loss, 79.4% correct (108 of 136)

This seems much better! We’ve given up about 5% correct answers on the negative samples to gain 86% correct positive answers. We’re back into a solid B range again!⁵
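
As a quick refresher on what --balanced does: the dataset alternates positive and negative samples so the model sees the two classes 1:1, which is why trn_neg and trn_pos each report 100,000 samples. Here is a minimal sketch of that interleaving (hypothetical class and variable names, not the chapter’s exact LunaDataset code):

from torch.utils.data import Dataset

class BalancedLunaSketch(Dataset):
    """Alternate positive and negative candidates so each class is seen 1:1."""
    def __init__(self, pos_list, neg_list):
        self.pos_list = pos_list   # the (rare) positive candidates
        self.neg_list = neg_list   # the (plentiful) negative candidates

    def __len__(self):
        return 200000              # fixed epoch size, not the raw candidate count

    def __getitem__(self, ndx):
        if ndx % 2:                # odd indices pull from the negatives...
            return self.neg_list[(ndx // 2) % len(self.neg_list)]
        else:                      # ...even indices cycle through the positives
            return self.pos_list[(ndx // 2) % len(self.pos_list)]

Because the positive list is tiny, the even indices repeat the positives over and over, while only a subset of the negatives is seen each epoch; that’s also why an “epoch” here is a fixed 200,000 samples rather than the 500,000+ raw candidates.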

As in chapter 11, however, this result is deceptive. Since there are 400 times as many negative samples as positive ones, even getting just 1% wrong means we’d be incorrectly classifying negative samples as positive four times more often than there are actually positive samples in total!
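
To make that arithmetic concrete, here’s a throwaway calculation (not code from the project) using the validation counts from the run above, plus the hypothetical 1%-wrong case:

pos_total = 136        # actual positive samples in the validation set
neg_total = 54971      # actual negative samples

# Hypothetical case from the text: get just 1% of the negatives wrong
false_pos_1pct = 0.01 * neg_total            # ~550 false alarms, about 4x pos_total

# Actual run above: 54099 of 54971 negatives and 108 of 136 positives correct
true_pos = 108
false_pos = neg_total - 54099                # 872 false alarms
precision = true_pos / (true_pos + false_pos)        # 108 / 980 = 0.1102
recall = true_pos / pos_total                        # 108 / 136 = 0.7941
f1 = 2 * precision * recall / (precision + recall)   # 0.1935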

Still, this is clearly better than the outright wrong behavior from chapter 11 and much better than a random coin flip. In fact, we’ve even crossed over into being (almost) legitimately useful in real-world scenarios. Recall our overworked radiologist poring over each and every speck of a CT: well, now we’ve got something that can do a reasonable job of screening out 95% of the false positives. That’s a huge help, since it translates into about a tenfold increase in productivity for the machine-assisted human.

Of course, there’s still that pesky issue of the 14% of positive samples that were missed, which we should probably deal with. Perhaps some additional epochs of training would help. Let’s see (and again, expect to spend at least 10 minutes per epoch):

$ python -m p2ch12.training --balanced --epochs 20
...
E2 LunaTrainingApp
E2 trn 0.0432 loss, 98.7% correct, 0.9866 precision, 0.9879 recall, 0.9873 f1 score
E2 trn_ben 0.0545 loss, 98.7% correct (98663 of 100000)
E2 trn_mal 0.0318 loss, 98.8% correct (98790 of 100000)

⁵ And remember that this is after only the 200,000 training samples presented, not the 500,000+ of the unbalanced dataset, so we got there in less than half the time.
