
Deep-Learning-with-PyTorch


378 CHAPTER 13 Using segmentation to find suspected nodules

        False,  # isNodule_bool
        False,  # hasAnnotation_bool
        False,  # isMal_bool
        0.0,
        series_uid,
        candidateCenter_xyz,
    )
)

Other than the addition of the hasAnnotation_bool and isMal_bool flags (which we won’t use in this chapter), the new annotations will slot in and be usable just like the old ones.
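As a rough illustration of why the extra flags do not disturb existing code, here is a minimal sketch of a named tuple carrying them. Only the three `*_bool` names and the `CandidateInfoTuple` name come from the text above; the remaining field names (`diameter_mm`, `center_xyz`), the field order, and the sample values are assumptions made for this example.

```python
from collections import namedtuple

# Sketch only: the field order and the non-flag field names are assumed,
# not taken from the chapter's actual definition.
CandidateInfoTuple = namedtuple(
    "CandidateInfoTuple",
    "isNodule_bool, hasAnnotation_bool, isMal_bool, diameter_mm, series_uid, center_xyz",
)

# A non-nodule candidate: all three flags False and a placeholder diameter.
non_nodule = CandidateInfoTuple(
    False,                  # isNodule_bool
    False,                  # hasAnnotation_bool
    False,                  # isMal_bool
    0.0,                    # assumed to be the diameter field
    "example-series-uid",   # made-up identifier
    (0.0, 0.0, 0.0),        # made-up center coordinates
)

# Code that reads fields by name keeps working when new flags are added.
nodule_count = sum(1 for c in [non_nodule] if c.isNodule_bool)
```

Accessing fields by name rather than by position is what lets the new annotations "slot in": consumers that only care about `isNodule_bool` never see the added fields.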

NOTE You might be wondering why we haven’t discussed the LIDC before now. As it turns out, the LIDC has a large amount of tooling that’s already been constructed around the underlying dataset, which is specific to the LIDC. You could even get ready-made masks from PyLIDC. That tooling presents a somewhat unrealistic picture of what sort of support a given dataset might have, since the LIDC is anomalously well supported. What we’ve done with the LUNA data is much more typical and provides for better learning, since we’re spending our time manipulating the raw data rather than learning an API that someone else cooked up.

13.5.4 Implementing Luna2dSegmentationDataset

Compared to previous chapters, we are going to take a different approach to the training and validation split in this chapter. We will have two classes: one acting as a general base class suitable for validation data, and one subclassing the base for the training set, with randomization and a cropped sample.

While this approach is somewhat more complicated in some ways (the classes aren’t perfectly encapsulated, for example), it actually simplifies the logic of selecting randomized training samples and the like. It also becomes extremely clear which code paths impact both training and validation, and which are isolated to training only. Without this, we found that some of the logic can become nested or intertwined in ways that make it hard to follow. This is important because our training data will look significantly different from our validation data!
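The two-class arrangement described above might look roughly like the following sketch. It is not the chapter's implementation: the class names, the nested-list stand-in for CT slices, the `crop_size` and `epoch_len` parameters, and the choice to ignore the requested index during training are all assumptions made for illustration. Real code would typically subclass `torch.utils.data.Dataset` and return tensors.

```python
import random


class BaseSliceDataset:
    """Hypothetical base class: deterministic, full-size samples (validation-style)."""

    def __init__(self, slices):
        # `slices` is a list of 2D grids (lists of rows) standing in for CT slices.
        self.slices = slices

    def __len__(self):
        return len(self.slices)

    def load_slice(self, ndx):
        # Shared code path: used by both validation and training.
        return self.slices[ndx % len(self.slices)]

    def __getitem__(self, ndx):
        return self.load_slice(ndx)


class TrainingSliceDataset(BaseSliceDataset):
    """Hypothetical training subclass: adds randomization and cropping."""

    def __init__(self, slices, crop_size, epoch_len, seed=None):
        super().__init__(slices)
        self.crop_size = crop_size
        self.epoch_len = epoch_len
        self.rng = random.Random(seed)

    def __len__(self):
        # Training-only: the epoch length is decoupled from the sample count.
        return self.epoch_len

    def __getitem__(self, ndx):
        # Training-only: ignore `ndx`, pick a random slice, then a random crop.
        full = self.load_slice(self.rng.randrange(len(self.slices)))
        rows, cols = len(full), len(full[0])
        r0 = self.rng.randint(0, rows - self.crop_size)
        c0 = self.rng.randint(0, cols - self.crop_size)
        return [row[c0:c0 + self.crop_size] for row in full[r0:r0 + self.crop_size]]


# Three made-up 8x8 "slices" to exercise both classes.
demo_slices = [[[r * 8 + c for c in range(8)] for r in range(8)] for _ in range(3)]
val_ds = BaseSliceDataset(demo_slices)
trn_ds = TrainingSliceDataset(demo_slices, crop_size=4, epoch_len=10, seed=0)
```

In this layout, anything defined only on the subclass is visibly training-only, while `load_slice` is the single place where both code paths meet.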

NOTE Other class arrangements are also viable; we considered having two entirely separate Dataset subclasses, for example. Standard software engineering design principles apply, so try to keep your structure relatively simple, and try not to copy and paste code, but don’t invent complicated frameworks to prevent having to duplicate three lines of code.

The data that we produce will be two-dimensional CT slices with multiple channels. The extra channels will hold adjacent slices of CT. Recall figure 4.2, shown here as figure 13.12; we can see that each slice of a CT scan can be thought of as a 2D grayscale image.
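To make the idea of "adjacent slices as channels" concrete, here is a small sketch of how the channel indices for one sample could be chosen. The function name, the `context_count` parameter, and the decision to repeat the edge slice when a neighbor would fall outside the volume are assumptions for this example; the text above does not specify how edges are handled.

```python
def context_slice_indices(slice_ndx, context_count, num_slices):
    """Return the slice indices used as channels for the sample centered on slice_ndx.

    Assumption: indices outside the volume are clamped to the first or last
    slice, so every sample has exactly 2 * context_count + 1 channels.
    """
    start = slice_ndx - context_count
    end = slice_ndx + context_count + 1
    return [min(max(i, 0), num_slices - 1) for i in range(start, end)]


# A made-up "volume" of 10 slices, each represented here by a label only.
volume = ["slice{}".format(i) for i in range(10)]

# Stack the center slice and two neighbors on each side into one sample.
channels = [volume[i] for i in context_slice_indices(5, 2, len(volume))]
```

With real data, each entry of `channels` would be a 2D grayscale array, and stacking them would give a multichannel 2D input whose center channel is the slice of interest.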
