Graphing the positives and negatives


$ ../.venv/bin/python -m p2ch12.training
Starting LunaTrainingApp...
...
E1 LunaTrainingApp
.../p2ch12/training.py:274: RuntimeWarning: invalid value encountered in double_scalars
  metrics_dict['pr/f1_score'] = 2 * (precision * recall) / (precision + recall)
E1 trn 0.0025 loss, 99.8% correct, 0.0000 prc, 0.0000 rcl, nan f1
E1 trn_ben 0.0000 loss, 100.0% correct (494735 of 494743)
E1 trn_mal 1.0000 loss, 0.0% correct (0 of 1215)
.../p2ch12/training.py:269: RuntimeWarning: invalid value encountered in long_scalars
  precision = metrics_dict['pr/precision'] = truePos_count / (truePos_count + falsePos_count)
E1 val 0.0025 loss, 99.8% correct, nan prc, 0.0000 rcl, nan f1
E1 val_ben 0.0000 loss, 100.0% correct (54971 of 54971)
E1 val_mal 1.0000 loss, 0.0% correct (0 of 136)

(The exact count and line numbers of these RuntimeWarning lines might be different from run to run.)

Bummer. We’ve got some warnings, and given that some of the values we computed were nan, there’s probably a division by zero happening somewhere. Let’s see what we can figure out.

First, since none of the positive samples in the training set are getting classified as positive, both precision and recall are zero, which results in our F1 score calculation dividing by zero. Second, for our validation set, truePos_count and falsePos_count are both zero due to nothing being flagged as positive. It follows that the denominator of our precision calculation is also zero; that makes sense, as that’s where we’re seeing the other RuntimeWarning.
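For reference, a guard like the following sketch would keep these calculations from producing nan when a denominator is empty. The helper name safe_pr_metrics and the choice to report 0.0 in that case are our own assumptions for illustration; this is not what p2ch12/training.py actually does.

def safe_pr_metrics(truePos_count, falsePos_count, falseNeg_count):
    # Guarded precision/recall/F1: report 0.0 instead of dividing by zero.
    pos_pred = truePos_count + falsePos_count      # everything flagged positive
    pos_actual = truePos_count + falseNeg_count    # everything actually positive
    precision = truePos_count / pos_pred if pos_pred else 0.0
    recall = truePos_count / pos_actual if pos_actual else 0.0
    denom = precision + recall
    f1_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1_score

print(safe_pr_metrics(0, 0, 136))    # epoch-1 validation counts: (0.0, 0.0, 0.0) instead of nan
print(safe_pr_metrics(0, 8, 1215))   # epoch-1 training counts: precision, recall, and F1 all 0.0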

A handful of negative training samples are classified as positive (494735 of 494743 are classified as negative, so that leaves 8 samples misclassified). While that might seem odd at first, recall that we are collecting our training results throughout the epoch, rather than using the model’s end-of-epoch state as we do for the validation results. That means the first batch is literally producing random results. A few of the samples from that first batch being flagged as positive isn’t surprising.

NOTE Due to both the random initialization of the network weights and the random ordering of the training samples, individual runs will likely exhibit slightly different behavior. Having exactly reproducible behavior can be desirable but is out of scope for what we’re trying to do in part 2 of this book.
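If you did want more reproducible runs, a common approach is to seed the relevant random number generators before building the model and data loaders. The sketch below shows the standard Python, NumPy, and PyTorch calls; the helper name seed_everything is our own, and the part-2 code does not do this.

import random
import numpy as np
import torch

def seed_everything(seed=0):
    # Seed the Python, NumPy, and PyTorch generators for (mostly) repeatable runs.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds the CPU and default CUDA generators
    # Have cuDNN pick deterministic convolution algorithms, trading away some speed.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

seed_everything(0)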

Well, that was somewhat painful. Switching to our new metrics resulted in going from A+ to “Zero, if you’re lucky.” And if we’re not lucky, the score is so bad that it’s not even a number. Ouch.

That said, in the long run, this is good for us. We’ve known that our model’s performance was garbage since chapter 11. If our metrics told us anything but that, it would point to a fundamental flaw in the metrics!
