
Figure 12.17 Batch after batch of imbalanced data will have nothing but negative events long before the first positive event, while balanced data can alternate every other sample.

We will not be doing any balancing for validation, however. Our model needs to function well in the real world, and the real world is imbalanced (after all, that’s where we got the raw data!).

How should we accomplish this balancing? Let’s discuss our choices.

SAMPLERS CAN RESHAPE DATASETS

One of the optional arguments to DataLoader is sampler=…. This allows the data loader to override the iteration order native to the dataset passed in and instead shape, limit, or reemphasize the underlying data as desired. This can be incredibly useful when working with a dataset that isn’t under your control. Taking a public dataset and reshaping it to meet your needs is far less work than reimplementing that dataset from scratch.
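
Here is a minimal sketch of that kind of balancing (our own illustration, not the book's eventual implementation). It assumes a map-style dataset train_ds whose items are (sample, label) pairs with integer labels, and uses PyTorch's WeightedRandomSampler to draw both classes with roughly equal probability:

import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# Hypothetical setup: train_ds is any map-style Dataset whose items are
# (sample, label) pairs with integer labels (0 = negative, 1 = positive).
labels = torch.tensor([train_ds[i][1] for i in range(len(train_ds))])

# Weight each sample by the inverse frequency of its class, so positives
# and negatives are drawn with roughly equal probability.
class_counts = torch.bincount(labels)
sample_weights = 1.0 / class_counts[labels].float()

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(train_ds),  # one "epoch" worth of draws
    replacement=True,           # required to oversample the rare class
)

# Passing sampler= implies shuffle=False; the sampler alone now decides
# the iteration order. Validation stays unbalanced, as discussed above.
train_loader = DataLoader(train_ds, batch_size=32, sampler=sampler)
val_loader = DataLoader(val_ds, batch_size=32)

With replacement=True, the rare positive samples can be drawn many times per epoch, which is what lets balanced batches roughly alternate classes the way figure 12.17 shows.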

The downside is that many of the mutations we could accomplish with samplers require that we break encapsulation of the underlying dataset. For example, let’s assume we have a dataset like CIFAR-10 (www.cs.toronto.edu/~kriz/cifar.html) whose class makeup we want to reshape: a sampler chooses indices before any sample is loaded, so to pick indices by class it needs the labels up front, and the Dataset interface itself doesn’t expose them.
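
As a rough sketch of what that peek looks like in practice (again, our illustration, not the book's code): torchvision's CIFAR10 happens to store its labels in a plain .targets list, an implementation detail rather than part of the Dataset interface, and a SubsetRandomSampler can then restrict iteration to a single class.

from torch.utils.data import DataLoader, SubsetRandomSampler
from torchvision import datasets, transforms

cifar10 = datasets.CIFAR10('data/cifar10', train=True, download=True,
                           transform=transforms.ToTensor())

# Breaking encapsulation: .targets is a torchvision implementation
# detail, not part of the Dataset interface (__len__/__getitem__).
automobile_indices = [i for i, label in enumerate(cifar10.targets)
                      if label == 1]  # class 1 is "automobile"

# Iterate over automobiles only, without reimplementing the dataset.
loader = DataLoader(cifar10, batch_size=32,
                    sampler=SubsetRandomSampler(automobile_indices))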
