
12.4 What does an ideal dataset look like?

Before we start crying into our cups over the current sorry state of affairs, let's instead think about what we actually want our model to do. Figure 12.14 says that first we need to balance our data so that our model can train properly. Let's build up the logical steps needed to get us there.

Figure 12.14 The set of topics for this chapter, with a focus on balancing our positive and negative samples. (Figure panels: 1. Guard dogs; 2. Birds and burglars; 3. Ratios: recall and precision; 4. New metric: F1 score; 5. Balancing (POS vs. NEG); 6. Augmentation; 7. Workin' great!)

Recall figure 12.5 earlier, and the following discussion of classification thresholds. Getting better results by moving the threshold has limited effectiveness; there's just too much overlap between the positive and negative classes to work with.³
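
To make that trade-off concrete, here is a minimal sketch (not taken from the book's code) that sweeps a classification threshold over some made-up, heavily overlapping scores and prints precision and recall at each setting. The probs and labels tensors are hypothetical stand-ins for per-sample model outputs and ground-truth labels (1 = nodule, 0 = non-nodule).

    # Sketch: sweep a classification threshold over overlapping score
    # distributions and watch precision and recall trade off against each other.
    import torch

    def precision_recall_at(probs, labels, threshold):
        preds = (probs >= threshold).long()
        tp = ((preds == 1) & (labels == 1)).sum().item()
        fp = ((preds == 1) & (labels == 0)).sum().item()
        fn = ((preds == 0) & (labels == 1)).sum().item()
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        return precision, recall

    # Fake, heavily overlapping score distributions: no single threshold
    # separates them cleanly.
    torch.manual_seed(0)
    neg_probs = torch.rand(4000) * 0.8        # negatives spread over [0.0, 0.8)
    pos_probs = 0.2 + torch.rand(10) * 0.8    # positives spread over [0.2, 1.0)
    probs = torch.cat([neg_probs, pos_probs])
    labels = torch.cat([torch.zeros(4000), torch.ones(10)]).long()

    for threshold in (0.3, 0.5, 0.7, 0.9):
        p, r = precision_recall_at(probs, labels, threshold)
        print(f"threshold={threshold:.1f}  precision={p:.3f}  recall={r:.3f}")

With this much overlap, no threshold gives both high precision and high recall, which is exactly the limitation the text describes.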

Instead, we want to see an image like figure 12.15. Here, our label threshold is nearly vertical. That's what we want, because it means the label threshold and our classification threshold can line up reasonably well. Similarly, most of the samples are concentrated at either end of the diagram. Both of these things require that our data be easily separable and that our model have the capacity to perform that separation. Our model currently has enough capacity, so that's not the issue. Instead, let's take a look at our data.

Recall that our data is wildly imbalanced. There's a 400:1 ratio of negative samples to positive ones. That's crushingly imbalanced! Figure 12.16 shows what that looks like. No wonder our "actually nodule" samples are getting lost in the crowd!
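
One way to keep the positives from getting lost, sketched below purely as an illustration (this is not necessarily the balancing scheme the chapter goes on to build), is to resample at load time with PyTorch's WeightedRandomSampler so each batch draws positives and negatives at roughly equal rates. The tensors here are hypothetical stand-ins for real candidate data.

    # Sketch: class-balanced sampling with a weighted sampler, so batches see
    # positives and negatives at roughly 1:1 instead of 1:400.
    import torch
    from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

    # Hypothetical stand-in data: 4,000 negatives, 10 positives.
    features = torch.randn(4010, 8)
    labels = torch.cat([torch.zeros(4000), torch.ones(10)]).long()
    dataset = TensorDataset(features, labels)

    # Give each class a total weight of 1.0, split evenly among its samples,
    # so any single positive is drawn ~400x more often than any single negative.
    class_counts = torch.bincount(labels)                 # tensor([4000, 10])
    sample_weights = 1.0 / class_counts[labels].float()   # per-sample weight

    sampler = WeightedRandomSampler(
        weights=sample_weights,
        num_samples=len(dataset),
        replacement=True,  # positives must repeat to fill half of each epoch
    )
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    batch_features, batch_labels = next(iter(loader))
    print("positives in batch:", batch_labels.sum().item(), "of", len(batch_labels))

Because the positives are drawn with replacement, the same few samples are seen over and over during an epoch; that repetition is part of why augmentation (topic 6 in figure 12.14) comes next in this chapter.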

³ Keep in mind that these images are just a representation of the classification space and do not represent ground truth.
