
import random

# seed both the sampler's generator and Python's random module
train_loader.sampler.generator.manual_seed(42)
random.seed(42)

Now we can check if our sampler is doing its job correctly. Let's have it sample a full run (240 data points in 15 mini-batches of 16 points each), and sum up the labels so we know how many points are in the positive class:

torch.tensor([t[1].sum() for t in iter(train_loader)]).sum()

Output

tensor(123.)

Close enough! We have 160 images of the positive class, and now, thanks to the weighted sampler, we're sampling only 123 of them. This means we're oversampling the negative class (which has 80 images) to a total of 117 images, adding up to 240 images. Mission accomplished: our dataset is balanced now.
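For reference, here's a minimal sketch of how a weighted sampler like this one can be built with PyTorch's WeightedRandomSampler. The variable names (y_train, sample_weights) are placeholders for illustration, not necessarily the ones used earlier in the chapter:

import torch
from torch.utils.data import WeightedRandomSampler

# y_train: tensor of labels (0 = negative, 1 = positive); placeholder name
counts = torch.bincount(y_train.long())         # e.g. tensor([ 80, 160])
class_weights = 1.0 / counts.float()            # rarer class gets a larger weight
sample_weights = class_weights[y_train.long()]  # one weight per data point

generator = torch.Generator()
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),  # one full epoch's worth of draws
    replacement=True,                 # required for oversampling
    generator=generator,
)

The sampler is then handed to the DataLoader through its sampler argument, which is why we could reach its generator as train_loader.sampler.generator above.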

"Wait a minute! Why on Earth there was an extra seed

(random.seed(42)) in the code above? Don’t we have enough

already?"

I agree, too many seeds. Besides one specific seed for the generator, we also have to set yet another seed for Python's random module.

Honestly, this came as a surprise to me too when I found out about it! As weird as it may sound, in Torchvision versions prior to 0.8, there was still some code that depended upon Python's native random module, instead of PyTorch's own random generators. The problem happened when some of the random transformations for data augmentation were used, like RandomRotation(), RandomAffine(), and others.
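To make this concrete, here is a sketch of an augmentation pipeline using those transforms. Under Torchvision versions prior to 0.8, its output would only be reproducible if Python's random module were seeded too; the specific transforms and parameters below are illustrative:

import random
import torch
from torchvision import transforms

random.seed(42)        # used internally by pre-0.8 Torchvision transforms
torch.manual_seed(42)  # PyTorch's own generator

augmenter = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    transforms.ToTensor(),
])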

It's better to be safe than sorry, so we'd better set yet another seed to ensure the reproducibility of our code.

And that's exactly what we're going to do! Remember the set_seed() method we implemented earlier? We can update it to set this extra seed as well.
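Here is a minimal sketch of what that update could look like, written as a standalone function for clarity; the original method's exact body may differ:

import random
import numpy as np
import torch

def set_seed(seed=42):
    # make cuDNN deterministic so GPU runs are reproducible
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.manual_seed(seed)  # PyTorch's generators
    np.random.seed(seed)     # NumPy's generator
    random.seed(seed)        # Python's random module (pre-0.8 Torchvision)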

