Deep-Learning-with-PyTorch

Updating the dataset for segmentation

Consortium image collection (LIDC-IDRI)⁹ and includes detailed annotation information from multiple radiologists. We've already done the legwork to get the original LIDC annotations, pull out the nodules, dedupe them, and save them to the file data/part2/luna/annotations_with_malignancy.csv.

With that file, we can update our getCandidateInfoList function to pull our nodules from our new annotations file. First, we loop over the new annotations for the actual nodules. Because the CSV reader¹⁰ returns every field as a string, we need to convert the data to the appropriate types before we store it in our CandidateInfoTuple data structure.
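To see why the conversion step matters, here is a minimal sketch using a made-up, in-memory CSV row. The header names and values are hypothetical; only the column order (series UID, three coordinates, diameter, malignancy flag) follows the description above. It shows that every field arrives as a string and that calling bool() on the malignancy column would give the wrong answer.

```python
import csv
import io

# Hypothetical one-row sample; the column order mirrors the text:
# series UID, x, y, z, diameter in mm, malignancy flag.
sample_csv = io.StringIO(
    "seriesuid,coordX,coordY,coordZ,diameter_mm,mal_bool\n"
    "1.2.3,-100.5,67.25,-231.0,6.4,False\n"
)

rows = list(csv.reader(sample_csv))[1:]  # drop the header row
row = rows[0]

# csv.reader hands back every field as a str, numbers included.
assert all(isinstance(value, str) for value in row)

# Explicit conversions to the types we want to store.
center_xyz = tuple(float(x) for x in row[1:4])
diameter_mm = float(row[4])

# Pitfall: any non-empty string is truthy, so bool('False') is True.
naive_flag = bool(row[5])

# An explicit lookup maps the two expected spellings to real booleans
# and raises KeyError on anything unexpected.
is_mal = {'False': False, 'True': True}[row[5]]
```

The dictionary lookup is a little stricter than a string comparison such as row[5] == 'True': a typo or an empty cell in the file fails loudly instead of silently becoming False.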

Listing 13.9 dsets.py:43, def getCandidateInfoList

    candidateInfo_list = []
    with open('data/part2/luna/annotations_with_malignancy.csv', "r") as f:
        # For each line in the annotations file that represents one nodule, …
        for row in list(csv.reader(f))[1:]:
            series_uid = row[0]
            annotationCenter_xyz = tuple([float(x) for x in row[1:4]])
            annotationDiameter_mm = float(row[4])
            isMal_bool = {'False': False, 'True': True}[row[5]]

            # … we add a record to our list.
            candidateInfo_list.append(
                CandidateInfoTuple(
                    True,  # isNodule_bool
                    True,  # hasAnnotation_bool
                    isMal_bool,
                    annotationDiameter_mm,
                    series_uid,
                    annotationCenter_xyz,
                )
            )

Similarly, we loop over candidates from candidates.csv as before, but this time we only use the non-nodules. As these are not nodules, the nodule-specific information will just be filled with False and 0.
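As a rough, self-contained illustration of that second pass, the sketch below runs over a small in-memory stand-in for candidates.csv. Several things here are assumptions rather than the book's code: the namedtuple field names (only the field order is taken from the annotations listing), the column layout of the candidates file (series UID, three coordinates, then a 0/1 class flag), the helper name non_nodule_records, and all sample values.

```python
import csv
import io
from collections import namedtuple

# Field names are assumed for illustration; the field order follows the
# order of arguments used for the annotated nodules.
CandidateInfoTuple = namedtuple(
    'CandidateInfoTuple',
    'isNodule_bool, hasAnnotation_bool, isMal_bool, '
    'diameter_mm, series_uid, center_xyz',
)

# Hypothetical candidates data. Assumed columns:
# series UID, x, y, z, class (1 = nodule, 0 = non-nodule).
candidates_csv = io.StringIO(
    "seriesuid,coordX,coordY,coordZ,class\n"
    "1.2.3,-56.0,86.5,-115.0,0\n"
    "1.2.3,-100.5,67.25,-231.0,1\n"
    "4.5.6,12.0,-30.0,44.5,0\n"
)


def non_nodule_records(f):
    """Build records for the non-nodule rows only (hypothetical helper)."""
    records = []
    for row in list(csv.reader(f))[1:]:  # skip the header row
        series_uid = row[0]
        center_xyz = tuple(float(x) for x in row[1:4])
        is_nodule = bool(int(row[4]))  # '0'/'1' -> int -> bool
        if not is_nodule:
            # Nodule-specific fields are filled with False and 0.
            records.append(
                CandidateInfoTuple(
                    False,  # isNodule_bool
                    False,  # hasAnnotation_bool
                    False,  # isMal_bool
                    0.0,    # diameter_mm
                    series_uid,
                    center_xyz,
                )
            )
    return records


records = non_nodule_records(candidates_csv)
```

Rows flagged as nodules are skipped on purpose: those come from the annotations file instead, so including them here would count them twice.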

Listing 13.10 dsets.py:62, def getCandidateInfoList

    with open('data/part2/luna/candidates.csv', "r") as f:
        # For each line in the candidates file …
        for row in list(csv.reader(f))[1:]:
            series_uid = row[0]
            # ... line 72
            # … but only the non-nodules (we have the others from earlier) …
            if not isNodule_bool:
                # … we add a candidate record.
                candidateInfo_list.append(
                    CandidateInfoTuple(

⁹ Samuel G. Armato 3rd et al., “The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): A Completed Reference Database of Lung Nodules on CT Scans,” Medical Physics 38, no. 2 (2011): 915-31, https://pubmed.ncbi.nlm.nih.gov/21452728/. See also Bruce Vendt, LIDC-IDRI, Cancer Imaging Archive, http://mng.bz/mBO4.

¹⁰ If you do this a lot, the pandas library, which released version 1.0 in 2020, is a great tool to make this faster. We stick with the CSV reader included in the standard Python distribution here.
