Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step A Beginner’s Guide-leanpub

actual data, it is as bad as it can be. We can simply generate uniformly distributed values between zero and one as our random probabilities:

import numpy as np
from sklearn.metrics import roc_curve, precision_recall_curve

np.random.seed(39)
# Uniformly distributed "probabilities" - a model that is just guessing
random_probs = np.random.uniform(size=y_val.shape)

fpr_random, tpr_random, thresholds1_random = \
    roc_curve(y_val, random_probs)
prec_random, rec_random, thresholds2_random = \
    precision_recall_curve(y_val, random_probs)

Figure 3.21 - Worst curves ever

We have only 20 data points, so our curves are not as bad as they theoretically could be :-) The black dashed lines are the theoretical worst for both curves. On the left, the diagonal line is as bad as it can be. On the right, it is a bit more nuanced: the worst is a horizontal line, but its level is given by the proportion of positive samples in the dataset. In our example, we have 11 positive examples out of 20 data points, so the line sits at the level of 0.55.
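That 0.55 baseline is easy to check: a no-skill classifier's precision equals the positive-class proportion, so the worst-case PR curve is a horizontal line at that level. A minimal sketch, using made-up labels that mirror the example's 11-positives-out-of-20 split (not the actual y_val):

```python
import numpy as np

# Hypothetical labels mirroring the example: 11 positives among 20 points
y_fake = np.array([1] * 11 + [0] * 9)

# The positive-class proportion gives the level of the worst-case
# (horizontal) precision-recall curve
baseline = y_fake.mean()
print(baseline)  # 0.55
```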

Comparing Models

"If I have two models, how do I choose the best one?"

"The best model is the one with the best curve."

Captain Obvious

Thank you, Captain. The real question here is: how do you compare curves? The closer they are to squares, the better they are; this much we already know.
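One common way to turn "closer to a square" into a single number is to measure the area under the curve: a perfect square has area 1.0, while the random diagonal has area 0.5. A minimal sketch using the trapezoidal rule, with made-up curve points (in practice, Scikit-Learn's auc or roc_auc_score does this for you):

```python
def curve_area(xs, ys):
    # Trapezoidal area under the piecewise-linear curve through (x, y)
    pts = sorted(zip(xs, ys))
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

# Made-up ROC points for two hypothetical models (not the book's models)
fpr_a, tpr_a = [0, .1, .3, 1], [0, .6, .9, 1]
fpr_b, tpr_b = [0, .2, .5, 1], [0, .4, .7, 1]

area_a = curve_area(fpr_a, tpr_a)  # model A hugs the top-left corner more
area_b = curve_area(fpr_b, tpr_b)
```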

Besides, if one curve has all its points above all the points of another curve, the one above is clearly the best. The problem is, two different models may produce

Classification Threshold | 257
