
Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step A Beginner’s Guide-leanpub

larger weights, while the class with more data points (majority class) should get smaller weights. This way, on average, we'll end up with mini-batches containing roughly the same number of data points in each class: a balanced dataset.

"How are the weights computed?"

First, we need to find out how imbalanced the dataset is; that is, how many data points belong to each label. We can use PyTorch's unique() method on our training set labels (y_train_tensor), with return_counts=True, to get a list of the existing labels and the corresponding number of data points:

classes, counts = y_train_tensor.unique(return_counts=True)

print(classes, counts)

Output

tensor([0., 1.]) tensor([ 80, 160])

Ours is a binary classification, so it is no surprise we have two classes: zero (not diagonal) and one (diagonal). There are 80 images with lines that are not diagonal, and 160 images with diagonal lines. Clearly, an imbalanced dataset.

Next, we compute the weights by inverting the counts. It is as simple as that:

weights = 1.0 / counts.float()

weights

Output

tensor([0.0125, 0.0063])
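A quick sanity check (not in the original text): multiplying each count by its inverse-count weight yields 1 for both classes, which is exactly why sampling in proportion to these weights equalizes the expected number of draws per class:

```python
import torch

# Counts from the output above: 80 negative, 160 positive
counts = torch.tensor([80, 160])
weights = 1.0 / counts.float()

# count * weight == 1 for every class, so each class
# carries the same total sampling mass
expected_mass = counts.float() * weights
print(expected_mass)  # tensor([1., 1.])
```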

The first weight (0.0125) corresponds to the negative class (not diagonal). Since this class has only 80 out of 240 images in our training set, it is also the minority class. The other weight (0.0063) corresponds to the positive class (diagonal), which has the remaining 160 images, thus making it the majority class.
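To put these class weights to work, each individual data point can be assigned the weight of its class and the result handed to a sampler. A minimal sketch, assuming PyTorch's WeightedRandomSampler and a stand-in y_train_tensor that mirrors the counts above (80 zeros, 160 ones):

```python
import torch
from torch.utils.data import WeightedRandomSampler

# Stand-in labels mirroring the counts above: 80 negative, 160 positive
y_train_tensor = torch.cat([torch.zeros(80), torch.ones(160)])

classes, counts = y_train_tensor.unique(return_counts=True)
weights = 1.0 / counts.float()

# Each data point gets the weight of its own class
sample_weights = weights[y_train_tensor.long()]

# Draw indices with probability proportional to sample_weights,
# so both classes are equally represented on average
sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True,
)
```

The sampler can then be passed to a DataLoader via its sampler argument (instead of shuffle=True, since a sampler already controls the ordering).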

290 | Chapter 4: Classifying Images
