from torchvision.transforms import Compose, RandomHorizontalFlip, Normalize

train_composer = Compose([RandomHorizontalFlip(p=.5),
                          Normalize(mean=(.5,), std=(.5,))])

val_composer = Compose([Normalize(mean=(.5,), std=(.5,))])
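As a reminder, TransformedTensorDataset was built earlier in this chapter; a minimal sketch of such a class, assuming the transform is applied to the features only, looks like this:

from torch.utils.data import Dataset

class TransformedTensorDataset(Dataset):
    def __init__(self, x, y, transform=None):
        self.x = x
        self.y = y
        self.transform = transform

    def __getitem__(self, index):
        x = self.x[index]
        # Applies the composed transforms (augmentation and/or
        # normalization) to the features on the fly
        if self.transform:
            x = self.transform(x)
        return x, self.y[index]

    def __len__(self):
        return len(self.x)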

Next, we use them to create two datasets and their corresponding data loaders:

train_dataset = TransformedTensorDataset(
    x_train_tensor, y_train_tensor, transform=train_composer
)
val_dataset = TransformedTensorDataset(
    x_val_tensor, y_val_tensor, transform=val_composer
)

# Builds a loader for each set
train_loader = DataLoader(
    dataset=train_dataset, batch_size=16, shuffle=True
)
val_loader = DataLoader(dataset=val_dataset, batch_size=16)
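As a quick sanity check, we could fetch one mini-batch from the training loader and inspect its shapes:

# Fetches one mini-batch of (possibly augmented) images and labels
x_batch, y_batch = next(iter(train_loader))
print(x_batch.shape, y_batch.shape)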

And, since we’re not using a sampler to perform the split anymore, we can (and should) set shuffle to True.

If you do not perform data augmentation, you may keep using samplers and a single dataset, as sketched below.
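In that case, the split-by-sampler approach from earlier in the chapter still works; here is a minimal sketch, assuming a single TensorDataset and hypothetical index splits train_idx and val_idx:

from torch.utils.data import TensorDataset, DataLoader, \
    SubsetRandomSampler

# A single dataset holding all data points
dataset = TensorDataset(x_tensor, y_tensor)
# Each loader draws only the indices assigned to its split
train_loader = DataLoader(
    dataset=dataset, batch_size=16,
    sampler=SubsetRandomSampler(train_idx)
)
val_loader = DataLoader(
    dataset=dataset, batch_size=16,
    sampler=SubsetRandomSampler(val_idx)
)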

Disappointed with the apparently short-lived use of samplers? Don’t be! I saved the best sampler for last.

WeightedRandomSampler

We have already talked about imbalanced datasets when learning about binary cross-entropy losses in Chapter 3. We adjusted the loss weight for points in the positive class to compensate for the imbalance. It wasn’t quite the weighted average one would expect, though. Now, we can tackle the imbalance using a different approach: a weighted sampler.

The reasoning is pretty much the same but, instead of weighted losses, we use weights for sampling: The class with fewer data points (minority class) should get larger weights, while the class with more data points (majority class) gets smaller weights, so points from the minority class end up being sampled more often.
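A minimal sketch of this idea, assuming binary 0/1 labels in y_train_tensor: each class gets a weight inversely proportional to its count, and every data point inherits the weight of its class:

from torch.utils.data import WeightedRandomSampler, DataLoader

# Counts how many points belong to each class
classes, counts = \
    y_train_tensor.squeeze().long().unique(return_counts=True)
# Each class gets a weight inversely proportional to its count
weights = 1.0 / counts.float()
# Each data point gets the weight of its own class
sample_weights = weights[y_train_tensor.squeeze().long()]

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True
)
train_loader = DataLoader(
    dataset=train_dataset, batch_size=16, sampler=sampler
)

Notice that shuffle and sampler are mutually exclusive in a DataLoader: once a sampler is given, it takes over the job of shuffling.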
