22.02.2024 Views

Daniel Voigt Godoy - Deep Learning with PyTorch Step-by-Step A Beginner’s Guide-leanpub

Output

{'labels': 1,
 'sentence': 'There was nothing so VERY remarkable in that; nor did Alice think it so VERY much out of the way to hear the Rabbit say to itself, `Oh dear!',
 'source': 'alice28-1476.txt'}

Now that the labels are in place, we can finally shuffle the dataset and split it into training and test sets:

Data Preparation

shuffled_dataset = dataset.shuffle(seed=42)
split_dataset = shuffled_dataset.train_test_split(test_size=0.2)
split_dataset

Output

DatasetDict({
    train: Dataset({
        features: ['sentence', 'source'],
        num_rows: 3081
    })
    test: Dataset({
        features: ['sentence', 'source'],
        num_rows: 771
    })
})
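To make the shuffle-and-split step concrete, here is a minimal pure-Python sketch of the same idea. It is not how the datasets library implements it: the helper shuffle_and_split and the placeholder rows are hypothetical, and rounding the test share up is an assumption chosen because it reproduces the row counts shown above (3,852 rows in total, 771 of them in the test set).

```python
import math
import random


def shuffle_and_split(rows, test_size=0.2, seed=42):
    """Hypothetical helper: seeded shuffle followed by a train/test split."""
    rng = random.Random(seed)      # private generator, so the result is repeatable
    shuffled = list(rows)          # copy, leaving the caller's list untouched
    rng.shuffle(shuffled)
    # Assumption: the test share is rounded up, which matches 771 out of 3852
    n_test = math.ceil(test_size * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder rows standing in for the real sentences (3081 + 771 = 3852)
rows = [{"sentence": f"sentence {i}", "labels": i % 2} for i in range(3852)]
train_rows, test_rows = shuffle_and_split(rows)
print(len(train_rows), len(test_rows))
```

Calling the helper twice with the same seed returns the same two lists. In the datasets library, train_test_split() accepts a seed argument of its own, so passing one there should make the split itself repeatable as well.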

The result of the split is a dataset dictionary, so you may want to retrieve the individual datasets from it:

Data Preparation

train_dataset = split_dataset['train']
test_dataset = split_dataset['test']
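Since the split is random, it may be worth checking that both sets end up with a similar mix of labels. The sketch below is only an illustration: label_shares is a hypothetical helper, and split_standin is a small hand-made dictionary standing in for the real split_dataset, with made-up labels.

```python
from collections import Counter


def label_shares(labels):
    """Hypothetical helper: fraction of examples per label value."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


# Hand-made stand-in for split_dataset; the labels here are made up
split_standin = {
    "train": {"labels": [0, 1, 1, 0, 1, 0, 1, 1]},
    "test": {"labels": [1, 0]},
}

train_labels = split_standin["train"]["labels"]
test_labels = split_standin["test"]["labels"]
print(label_shares(train_labels))
print(label_shares(test_labels))
```

With the real datasets, the same helper could be applied to the label columns, for example train_dataset['labels'] and test_dataset['labels'].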

Done! We now have two randomly shuffled datasets: one for training and one for testing.
