
Deep-Learning-with-PyTorch


Predicting malignancy

Here, we also call out two specific threshold values: diameters of 5.42 mm and 10.55 mm. We chose those two values because they give us somewhat reasonable endpoints for the range of thresholds we might consider, were we to need to pick a single threshold. Anything smaller than 5.42 mm, and we'd only be raising our FPR (lowering the threshold flags more nodules as malignant, so it cannot reduce the TPR). Larger than 10.55 mm, and we'd just be flagging malignant nodules as benign for no gain. The best threshold for this classifier will probably be somewhere in the middle.
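To make the trade-off concrete, here is a minimal sketch in plain Python that evaluates a diameter threshold on a handful of nodules. The diameters and labels are invented for illustration (they are not taken from the dataset), and `rates_at` is a hypothetical helper name.

```python
# Toy data, invented for illustration (not taken from the LUNA dataset).
diameters_mm = [3.5, 4.0, 6.0, 7.5, 9.0, 12.0, 15.0, 20.0]
is_malignant = [False, False, False, True, False, True, True, True]


def rates_at(threshold_mm):
    """Return (TPR, FPR) when nodules with diameter >= threshold_mm are called malignant."""
    pairs = list(zip(diameters_mm, is_malignant))
    tp = sum(d >= threshold_mm and m for d, m in pairs)      # malignant, flagged
    fp = sum(d >= threshold_mm and not m for d, m in pairs)  # benign, flagged
    num_mal = sum(is_malignant)
    num_ben = len(is_malignant) - num_mal
    return tp / num_mal, fp / num_ben


for t in (3.0, 7.0, 10.0, 16.0):
    tpr, fpr = rates_at(t)
    print(f"threshold {t:5.1f} mm -> TPR {tpr:.2f}, FPR {fpr:.2f}")
```

With these toy values, lowering the threshold below 7.5 mm only adds false positives, and raising it above 9 mm only loses true positives, which is the kind of reasoning that motivates picking two endpoints.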

How do we actually compute the values shown here? We first grab the candidate info list, keep only the annotated nodules, and get the malignancy label and diameter of each. For convenience, we also get the number of benign and malignant nodules.

Listing 14.8 p2ch14_malben_baseline.ipynb

# In[2]:
# Takes the regular dataset and in particular the list of benign and malignant nodules
ds = p2ch14.dsets.MalignantLunaDataset(val_stride=10, isValSet_bool=True)
nodules = ds.ben_list + ds.mal_list
# Gets lists of malignancy status and diameter
is_mal = torch.tensor([n.isMal_bool for n in nodules])
diam = torch.tensor([n.diameter_mm for n in nodules])
# For normalization of the TPR and FPR, we take the number of malignant and benign nodules.
num_mal = is_mal.sum()
num_ben = len(is_mal) - num_mal
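For readers without the dataset at hand, the same bookkeeping can be sketched with plain Python lists. The `Nodule` tuple below is a hypothetical stand-in for the dataset's candidate-info entries. Only the attribute names `isMal_bool` and `diameter_mm` come from the listing; the values are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for the candidate-info tuples; the values are invented.
Nodule = namedtuple("Nodule", ["isMal_bool", "diameter_mm"])

ben_list = [Nodule(False, 3.5), Nodule(False, 6.0), Nodule(False, 9.0)]
mal_list = [Nodule(True, 7.5), Nodule(True, 12.0)]

nodules = ben_list + mal_list
is_mal = [n.isMal_bool for n in nodules]   # malignancy label per nodule
diam = [n.diameter_mm for n in nodules]    # diameter in mm per nodule
num_mal = sum(is_mal)                      # True counts as 1
num_ben = len(is_mal) - num_mal
print(num_ben, num_mal)
```

The two counts are the denominators that later turn raw true-positive and false-positive counts into rates.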

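To compute the ROC curve, we need an array of the possible thresholds. We get this from torch.linspace, which takes the two boundary elements and a number of steps (older PyTorch releases defaulted to 100 steps when none was given). We wish to start with as few predicted positives as possible, so we go from the maximal threshold to the minimal. This is the 3.25 mm to 22.78 mm range we already mentioned, traversed from the top down: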

# In[3]:
# steps=100 matches the old default; recent PyTorch versions require it explicitly
threshold = torch.linspace(diam.max(), diam.min(), steps=100)
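As a quick check of the ordering, the following sketch builds a short descending threshold vector. The four diameters are toy values whose minimum and maximum match the range quoted in the text, and `steps=5` is chosen only to keep the output readable.

```python
import torch

# Toy diameters in mm; only the minimum and maximum match the range in the text.
diam = torch.tensor([3.25, 5.42, 10.55, 22.78])

# Passing start > end yields a descending sequence; .item() converts the
# 0-dim tensors to plain Python floats.
threshold = torch.linspace(diam.max().item(), diam.min().item(), steps=5)
print(threshold)
```

The first entry is the largest diameter and the last is the smallest, so the sweep starts at the strictest threshold.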

We then build a two-dimensional tensor in which the rows are per threshold, the columns are per sample, and the value is whether this sample is predicted as positive. This Boolean tensor is then filtered by whether the label of the sample is malignant or benign. We sum along each row (that is, over the samples) to count the number of True entries. Dividing by the number of malignant or benign nodules gives us the TPR and FPR, the two coordinates for the ROC curve:

# In[4]:
# Indexing by None adds a dimension of size 1, just like .unsqueeze(ndx). This gets us
# a 2D tensor of whether a given nodule (in a column) is classified as malignant for
# a given diameter (in the row).
predictions = (diam[None] >= threshold[:, None])
# With the predictions matrix, we can compute the TPRs and FPRs for each diameter
# by summing over the columns.
tp_diam = (predictions & is_mal[None]).sum(1).float() / num_mal
fp_diam = (predictions & ~is_mal[None]).sum(1).float() / num_ben
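The broadcasting step is easier to follow on a small example. The sketch below repeats the computation on eight invented nodules and four thresholds; the data and the names `tpr` and `fpr` are assumptions for illustration, not values from the notebook.

```python
import torch

# Toy stand-ins (invented); the real values come from MalignantLunaDataset.
diam = torch.tensor([3.5, 4.0, 6.0, 7.5, 9.0, 12.0, 15.0, 20.0])
is_mal = torch.tensor([False, False, False, True, False, True, True, True])
num_mal = is_mal.sum()
num_ben = len(is_mal) - num_mal

threshold = torch.linspace(diam.max().item(), diam.min().item(), steps=4)

# diam[None] has shape (1, 8); threshold[:, None] has shape (4, 1).
# Broadcasting yields a (4, 8) Boolean matrix: rows = thresholds, columns = nodules.
predictions = diam[None] >= threshold[:, None]

# Mask by label, then sum along dim 1 (over the nodules in each row).
tpr = (predictions & is_mal[None]).sum(1).float() / num_mal
fpr = (predictions & ~is_mal[None]).sum(1).float() / num_ben

print(predictions.shape)
print(tpr)
print(fpr)
```

Because the thresholds descend, both rates can only stay equal or grow from one row to the next, which traces the ROC curve from the lower left toward the upper right.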
