
What does an ideal dataset look like?


Listing 12.8 dsets.py:280, LunaDataset.__len__

def __len__(self):
    if self.ratio_int:
        return 200000
    else:
        return len(self.candidateInfo_list)

We’re no longer tied to a specific number of samples, and presenting “a full epoch” doesn’t really make sense when we would have to repeat positive samples many, many times to present a balanced training set. By picking 200,000 samples, we reduce the time between starting a training run and seeing results (faster feedback is always nice!), and we give ourselves a nice, clean number of samples per epoch. Feel free to adjust the length of an epoch to meet your needs.
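The __len__ above only fixes the epoch length; the balancing itself comes from index arithmetic in __getitem__, which interleaves one positive sample per ratio_int negatives and wraps the short positive list with a modulo so it repeats as often as needed. A minimal, self-contained sketch of that arithmetic (BalancedToyDataset, pos_list, and neg_list are illustrative names, not the book’s actual code):

```python
class BalancedToyDataset:
    """Toy stand-in for a balanced dataset. pos_list and neg_list are
    hypothetical pre-split sample lists; ratio_int is negatives per positive."""

    def __init__(self, pos_list, neg_list, ratio_int=1, epoch_len=200000):
        self.pos_list = pos_list
        self.neg_list = neg_list
        self.ratio_int = ratio_int
        self.epoch_len = epoch_len

    def __len__(self):
        # Epoch length is decoupled from the number of underlying samples.
        return self.epoch_len

    def __getitem__(self, ndx):
        pos_ndx = ndx // (self.ratio_int + 1)
        if ndx % (self.ratio_int + 1):
            # Nonzero remainder: serve a negative sample.
            neg_ndx = ndx - 1 - pos_ndx
            return self.neg_list[neg_ndx % len(self.neg_list)]
        # Remainder zero: serve a positive; the modulo wraps the short
        # positive list, repeating positives as many times as needed.
        return self.pos_list[pos_ndx % len(self.pos_list)]

ds = BalancedToyDataset(['p0', 'p1'], ['n0', 'n1', 'n2', 'n3'], ratio_int=1)
print([ds[i] for i in range(6)])  # ['p0', 'n0', 'p1', 'n1', 'p0', 'n2']
```

With ratio_int=1, every other index is a positive sample, so the stream is half positive, half negative regardless of how lopsided the underlying lists are.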

For completeness, we also add a command-line parameter.

Listing 12.9 training.py:31, class LunaTrainingApp

class LunaTrainingApp:
    def __init__(self, sys_argv=None):
        # ... line 52
        parser.add_argument('--balanced',
            help="Balance the training data to half positive, half negative.",
            action='store_true',
            default=False,
        )

Then we pass that parameter into the LunaDataset constructor.

Listing 12.10 training.py:137, LunaTrainingApp.initTrainDl

def initTrainDl(self):
    train_ds = LunaDataset(
        val_stride=10,
        isValSet_bool=False,
        ratio_int=int(self.cli_args.balanced),
    )

Here we rely on Python’s True being convertible to a 1.
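That conversion is guaranteed: in Python, bool is a subclass of int, so True and False behave as 1 and 0 wherever an integer is expected.

```python
# bool is a subclass of int, so the int() conversion is exact.
assert issubclass(bool, int)
assert int(True) == 1
assert int(False) == 0
```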

We’re all set. Let’s run it!

12.4.2 Contrasting training with a balanced LunaDataset to previous runs

As a reminder, our unbalanced training run had results like these:

$ python -m p2ch12.training
...
E1 LunaTrainingApp
E1 trn 0.0185 loss, 99.7% correct, 0.0000 precision, 0.0000 recall,
➥ nan f1 score
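Those zeros and the nan are exactly what an all-negative classifier produces: with no true positives, precision and recall are both 0, and the F1 score (their harmonic mean) becomes 0/0, which is undefined. A quick check of the arithmetic (the helper and the sample counts below are illustrative, not the book’s metric code):

```python
import math

def prf1(tp, fp, fn):
    """Precision, recall, and F1 from raw counts, guarding 0/0 with nan."""
    precision = tp / (tp + fp) if (tp + fp) else float('nan')
    recall = tp / (tp + fn) if (tp + fn) else float('nan')
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else float('nan'))
    return precision, recall, f1

# Hypothetical counts for a model that finds no nodules:
# zero true positives, a couple of false positives, every real nodule missed.
p, r, f1 = prf1(tp=0, fp=2, fn=154)
print(p, r, f1)  # 0.0 0.0 nan
```

Accuracy can still look excellent (99.7% correct) because negatives dominate the data, which is why precision, recall, and F1 are the metrics to watch here.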
