20.03.2021 Views

Deep-Learning-with-PyTorch

Running the following should take half an hour to an hour on a GPU.
After coffee (or a full-blown nap), here is what we get:

$ python3 -m p2ch14.nodule_analysis --run-validation

...

Total
             | Complete Miss | Filtered Out | Pred. Nodule
 Non-Nodules |               |       164893 |         2156
      Benign |            12 |            3 |           87
   Malignant |             1 |            6 |           45

We detected 132 of the 154 nodules, or about 86%. Of the 22 we missed, 13 (the Complete Miss column) were never considered
candidates by the segmentation, so this would be the obvious starting point for
improvements.
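The detection rate quoted above can be re-derived from the table. The following is a small sketch of that arithmetic; the dictionary layout and variable names are our own for illustration and are not part of the p2ch14 code.

```python
# Counts copied from the Benign and Malignant rows of the validation table.
counts = {
    "Benign":    {"complete_miss": 12, "filtered_out": 3, "pred_nodule": 87},
    "Malignant": {"complete_miss": 1,  "filtered_out": 6, "pred_nodule": 45},
}

# All real nodules, regardless of what happened to them.
total = sum(sum(row.values()) for row in counts.values())
# Nodules that made it through segmentation and classification.
detected = sum(row["pred_nodule"] for row in counts.values())
# Nodules the segmentation never proposed as candidates.
missed_by_segmentation = sum(row["complete_miss"] for row in counts.values())
# Candidates the classifier then rejected.
missed_by_classifier = sum(row["filtered_out"] for row in counts.values())

recall = detected / total
print(f"detected {detected}/{total} = {recall:.1%}")
print(f"missed: {missed_by_segmentation} by segmentation, "
      f"{missed_by_classifier} by classification")
```

Splitting the misses by stage makes it easy to see which part of the pipeline loses the most nodules.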

About 95% of the candidates we flag as nodules are false positives (2,156 non-nodules against 132 real nodules). This is of course not great; on
the other hand, it’s a lot less critical: having to look at 20 nodule candidates to find one
nodule will be much easier than looking at the entire CT. We will go into this in more detail

in section 14.7.2, but we want to stress that rather than treating these mistakes as a black

box, it’s a good idea to investigate the misclassified cases and see if they have commonalities.

Are there characteristics that differentiate them from the samples that were correctly

classified? Can we find anything that could be used to improve our performance?
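To put a number on the review workload, we can work out the false-positive share and the number of flagged candidates per real nodule from the Pred. Nodule column. This is only a sketch of the arithmetic, with variable names chosen here for illustration.

```python
# Pred. Nodule column of the validation table.
non_nodule_hits = 2156        # non-nodules flagged as nodules (false positives)
true_nodule_hits = 87 + 45    # benign + malignant nodules flagged as nodules

flagged = non_nodule_hits + true_nodule_hits
false_positive_share = non_nodule_hits / flagged
candidates_per_nodule = flagged / true_nodule_hits

print(f"{false_positive_share:.1%} of {flagged} flagged candidates are false positives")
print(f"roughly {candidates_per_nodule:.0f} candidates to review per real nodule")
```

The exact figures come out slightly below the rounded 95% and 20 used in the text, which does not change the conclusion.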

For now, we’re going to accept our numbers as is: not bad, but not perfect. The

exact numbers may differ when you run your self-trained model. Toward the end of

this chapter, we will provide some pointers to papers and techniques that can help

improve these numbers. With inspiration and some experimentation, we are confident

that you can achieve better scores than we show here.

14.5 Predicting malignancy

Now that we have implemented the nodule-detection task of the LUNA challenge and

can produce our own nodule predictions, we ask ourselves the logical next question:

can we distinguish malignant nodules from benign ones? We should say that even with

a good system, diagnosing malignancy would probably take a more holistic view of the

patient, additional non-CT context, and eventually a biopsy, rather than just looking

at single nodules in isolation on a CT scan. As such, this seems to be a task that is likely

to be performed by a doctor for some time to come.

14.5.1 Getting malignancy information

The LUNA challenge focuses on nodule detection and does not come with malignancy

information. The LIDC-IDRI dataset (http://mng.bz/4A4R) has a superset of

the CT scans used for the LUNA dataset and includes additional information about

the degree of malignancy of the identified tumors. Conveniently, there is a PyLIDC

library that can be installed easily, as follows:

$ pip3 install pylidc
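Once installed, the library can be queried for the per-radiologist malignancy ratings of a scan. The sketch below is an illustration under stated assumptions rather than the code used in this chapter: the helper names are hypothetical, and the rule in is_malignant (at least two ratings of 4 or higher on the 1 to 5 scale) is one possible convention chosen here for the example. To the best of our knowledge, pl.query, Scan.series_instance_uid, Scan.cluster_annotations, and Annotation.malignancy are part of PyLIDC's public interface.

```python
def is_malignant(ratings, threshold=4, min_votes=2):
    """Hypothetical rule: malignant if at least `min_votes` ratings reach `threshold`."""
    return sum(r >= threshold for r in ratings) >= min_votes


def malignancy_ratings(series_uid):
    """Return, for each nodule in a scan, the list of radiologist ratings (1-5)."""
    # Imported lazily so that is_malignant() can be used without PyLIDC installed.
    import pylidc as pl

    scan = (
        pl.query(pl.Scan)
        .filter(pl.Scan.series_instance_uid == series_uid)
        .first()
    )
    if scan is None:
        return []
    # cluster_annotations() groups annotations that refer to the same nodule.
    return [
        [ann.malignancy for ann in cluster]
        for cluster in scan.cluster_annotations()
    ]


# Usage with made-up ratings, no dataset required:
print(is_malignant([4, 5, 2]))     # two ratings of 4 or higher
print(is_malignant([5, 1, 1, 1]))  # only one rating of 4 or higher
```

Keeping the labeling rule separate from the data access makes it easy to try other thresholds without touching the query.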
