
What does an ideal dataset look like?


With roughly 500 negative samples for every positive one in group 3 and a batch size of 32, about 500/32 ≈ 15 batches will go by before we see a single positive sample. That implies that 14 out of 15 training batches will be 100% negative and will only pull all model weights toward predicting negative. That lopsided pull is what produces the degenerate behavior we've been seeing.
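To make that back-of-the-envelope arithmetic explicit, here is a small sketch. The 500:1 negative-to-positive ratio is an assumption read off the 500/32 calculation above, not a figure measured from the dataset itself.

# Rough sketch of the batch arithmetic above; the 500:1 ratio is an
# assumption taken from the 500/32 calculation, not measured from LUNA.
neg_per_pos = 500
batch_size = 32

# Expected number of batches between positive samples.
print(neg_per_pos / batch_size)             # ~15.6 batches per positive

# Probability that a random batch of 32 contains no positives at all.
p_negative = neg_per_pos / (neg_per_pos + 1)
print(p_negative ** batch_size)             # ~0.94: most batches are all-negative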

Instead, we’d like to have just as many positive samples as negative ones. For the first part of training, then, half of each class will be classified incorrectly, meaning that groups 2 and 3 should be roughly equal in size. We also want to make sure we present batches with a mix of negative and positive samples. Balance would result in the tug-of-war evening out, and the mixture of classes per batch would give the model a decent chance of learning to discriminate between the two classes. Since our LUNA data has only a small, fixed number of positive samples, we’ll have to settle for taking the positive samples that we have and presenting them repeatedly during training.
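One way to act on this, sketched below rather than taken from the book's actual LunaDataset code, is to let PyTorch's WeightedRandomSampler draw positives and negatives with equal overall probability, sampling with replacement so the few positives are re-presented many times per epoch. The make_balanced_loader helper and its labels argument are hypothetical names introduced for this sketch.

import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def make_balanced_loader(dataset, labels, batch_size=32):
    # labels: one 0/1 entry per sample in `dataset` (hypothetical input).
    labels = torch.as_tensor(labels)
    pos_count = int(labels.sum())
    neg_count = len(labels) - pos_count

    # Give each class the same total weight, so every batch is roughly
    # half positive in expectation.
    weights = torch.empty(len(labels), dtype=torch.double)
    weights[labels == 1] = 1.0 / pos_count
    weights[labels == 0] = 1.0 / neg_count

    # replacement=True is what lets the scarce positives repeat.
    sampler = WeightedRandomSampler(weights, num_samples=len(labels),
                                    replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)

The text goes on to describe a deterministic alternation between the two classes instead (figure 12.17); a sketch of that appears after the discrimination sidebar below.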

Discrimination

Here, we define discrimination as “the ability to separate two classes from each other.” Building and training a model that can tell “actually nodule” candidates from normal anatomical structures is the entire point of what we’re doing in part 2.

Some other definitions of discrimination are more problematic. While out of scope for the discussion of our work here, there is a larger issue with models trained from real-world data. If that real-world dataset is collected from sources that have a real-world discriminatory bias (for example, racial bias in arrest and conviction rates, or anything collected from social media), and that bias is not corrected for during dataset preparation or training, then the resulting model will continue to exhibit the same biases present in the training data. Just as in humans, racism is learned.

This means almost any model trained from internet-at-large data sources will be compromised in some fashion, unless extreme care is taken to scrub those biases from the model. Note that like our goal in part 2, this is considered an unsolved problem.

Recall our professor from chapter 11 who had a final exam with 99 false answers and 1 true answer. The next semester, after being told “You should have a more even balance of true and false answers,” the professor decided to add a midterm with 99 true answers and 1 false one. “Problem solved!”

Clearly, the correct approach is to intermix true and false answers in a way that doesn’t allow the students to exploit the larger structure of the tests to answer things correctly. Whereas a student would pick up on a pattern like “odd questions are true, even questions are false,” the batching system used by PyTorch doesn’t allow the model to “notice” or utilize that kind of pattern. Our training dataset will need to be updated to alternate between positive and negative samples, as in figure 12.17.
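As a rough illustration of that alternation, and not the actual LunaDataset implementation, an index-mapping Dataset might look like the sketch below; pos_list and neg_list are placeholder names for the positive and negative candidate lists.

from torch.utils.data import Dataset

class AlternatingDataset(Dataset):
    def __init__(self, pos_list, neg_list):
        self.pos_list = pos_list  # small, fixed list of positive candidates
        self.neg_list = neg_list  # much larger list of negative candidates

    def __len__(self):
        # Two slots per negative: every other sample will be positive.
        return 2 * len(self.neg_list)

    def __getitem__(self, ndx):
        if ndx % 2 == 0:
            # Even indices wrap around the short positive list,
            # re-presenting the same positives over and over.
            return self.pos_list[(ndx // 2) % len(self.pos_list)]
        # Odd indices walk through the negatives once per epoch.
        return self.neg_list[ndx // 2]

Because the model only ever sees the samples inside a batch, never the even/odd index pattern that produced them, it cannot exploit this regularity the way the students in the exam example could.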

The unbalanced data is the proverbial needle in the haystack we mentioned at the start of chapter 9. If you had to perform this classification work by hand, you’d probably start to empathize with Preston.
