We can also check if the loaders are returning the correct number of mini-batches:

len(iter(train_loader)), len(iter(val_loader))

Output

(15, 4)

There are 15 mini-batches in the training loader (15 mini-batches * 16 batch size = 240 data points), and four mini-batches in the validation loader (4 mini-batches * 16 batch size = 64 data points). In the validation set, the last mini-batch will have only 12 points, since there are only 60 points in total.
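
To see where these numbers come from, here is a minimal sketch of how such loaders could be built, assuming a single dataset, two SubsetRandomSamplers, and a batch size of 16 (the tensor shapes below are made up for illustration):

import torch
from torch.utils.data import TensorDataset, DataLoader, SubsetRandomSampler

# Made-up data: 300 points split into 240 for training and 60 for validation
x_tensor = torch.randn(300, 1, 5, 5)
y_tensor = torch.randint(0, 2, (300, 1)).float()

idx = torch.randperm(300)
train_idx, val_idx = idx[:240], idx[240:]

# One dataset, two samplers drawing from disjoint index lists
dataset = TensorDataset(x_tensor, y_tensor)
train_loader = DataLoader(dataset, batch_size=16, sampler=SubsetRandomSampler(train_idx))
val_loader = DataLoader(dataset, batch_size=16, sampler=SubsetRandomSampler(val_idx))

len(iter(train_loader)), len(iter(val_loader))  # (15, 4): ceil(240/16) and ceil(60/16)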

OK, cool, this means we don’t need two (split) datasets anymore—we only need two samplers. Right? Well, it depends.

Data Augmentation Transforms

No, I did not change topics :-) The reason why we may still need two split datasets is exactly that: data augmentation. In general, we want to apply data augmentation to the training data only (yes, there is test-data augmentation too, but that’s a different matter). Data augmentation is accomplished using composing transforms, which will be applied to all points in the dataset. See the problem?
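
To make the problem concrete, here is a sketch of a tensor-based dataset that applies a composed transform inside __getitem__ (the class name and implementation are illustrative, not necessarily the book’s). Since the transform runs for every index, augmentation would hit training and validation points alike if they all lived in a single dataset:

from torch.utils.data import Dataset

class TransformedTensorDataset(Dataset):
    # Illustrative wrapper: stores feature and label tensors plus a transform
    def __init__(self, x, y, transform=None):
        self.x = x
        self.y = y
        self.transform = transform

    def __getitem__(self, index):
        x = self.x[index]
        # The same transform is applied to EVERY point we fetch
        if self.transform is not None:
            x = self.transform(x)
        return x, self.y[index]

    def __len__(self):
        return len(self.x)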

If we need some data points to be augmented, but not others, the easiest way to accomplish this is to create two composers and use them in two different datasets. We can still use the indices, though:

# Uses indices to perform the split
x_train_tensor = x_tensor[train_idx]
y_train_tensor = y_tensor[train_idx]
x_val_tensor = x_tensor[val_idx]
y_val_tensor = y_tensor[val_idx]

Then, here come the two composers: the train_composer augments the data and then scales it (min-max); the val_composer only scales the data (min-max).
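
As a sketch of what these composers could look like, using torchvision’s Compose (the specific transforms are assumptions: a random horizontal flip for the augmentation, and a Normalize step standing in for the min-max scaling):

from torchvision.transforms import Compose, RandomHorizontalFlip, Normalize

# Illustrative composers: the training one augments (random flip) and then scales;
# the validation one only scales
train_composer = Compose([RandomHorizontalFlip(p=0.5),
                          Normalize(mean=(0.5,), std=(0.5,))])
val_composer = Compose([Normalize(mean=(0.5,), std=(0.5,))])

Each composer would then be passed to its own dataset (one built from the training tensors, the other from the validation tensors), so only the training points get augmented.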
