20.03.2021 Views

Deep-Learning-with-PyTorch

Representing tabular data

        [6],
        ...,
        [7],
        [6]])

The call to unsqueeze adds a singleton dimension, turning a 1D tensor of 4,898 elements into a 2D tensor of size (4,898 × 1) without changing its contents. No extra elements are added; we just decided to use an extra index to access the elements. That is, we access the first element of target as target[0] and the first element of its unsqueezed counterpart as target_unsqueezed[0,0].
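To make the indexing change concrete, here is a minimal sketch. It uses a made-up four-element tensor in place of the 4,898-element target, so the values are an assumption for illustration and not the actual data.

```python
import torch

# Hypothetical stand-in for the 4,898-element target tensor.
target = torch.tensor([6, 6, 7, 6])

# Add a singleton dimension at position 1: shape (4,) becomes (4, 1).
target_unsqueezed = target.unsqueeze(1)

print(target.shape)             # torch.Size([4])
print(target_unsqueezed.shape)  # torch.Size([4, 1])

# Same contents, but one more index is needed to reach an element.
first = target[0]
first_unsqueezed = target_unsqueezed[0, 0]

# The number of stored elements is unchanged.
same_count = target.numel() == target_unsqueezed.numel()
```

Both first and first_unsqueezed refer to the same value; only the number of indices needed to reach it differs.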

PyTorch allows us to use class indices directly as targets while training neural networks. However, if we wanted to use the score as a categorical input to the network, we would have to transform it to a one-hot-encoded tensor.
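The two uses of the score can be sketched as follows. This is one possible way to do it, not necessarily the approach used elsewhere in the chapter: the four scores, the all-zero placeholder logits, and the choice of 10 classes are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

# Hypothetical scores; real values would come from the dataset.
target = torch.tensor([6, 6, 7, 6])

# Assumption for this sketch: scores fall in the range 0..9.
num_classes = 10

# As a training target, class indices are passed as they are.
logits = torch.zeros(4, num_classes)  # placeholder model outputs
loss = F.cross_entropy(logits, target)

# As a categorical input, each score becomes a row with a single 1
# at the position given by the score.
target_onehot = F.one_hot(target, num_classes=num_classes).float()
print(target_onehot.shape)  # torch.Size([4, 10])
```

The one-hot tensor is converted to float because network inputs are usually floating point, while F.one_hot returns integers.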

4.3.5 When to categorize

Now we have seen ways to deal with both continuous and categorical data. You may wonder what the deal is with the ordinal case discussed in the earlier sidebar. There is no general recipe for it; most commonly, such data is either treated as categorical (losing the ordering, and hoping that our model will pick it up during training if we only have a few categories) or as continuous (introducing an arbitrary notion of distance). We will do the latter for the weather situation in figure 4.5. We summarize our data mapping in a small flow chart in figure 4.4.

[Flow chart. Column contains continuous data? Yes: use values directly (example representation of one value: 3.1415). No: ordinal data? Yes: is ordering a priority? If yes, treat as continuous; if no, treat as categorical. Categorical data? Yes: use one-hot or embedding (example representation of one value: 0 0 0 0 1 0 0 0 0).]

Figure 4.4 How to treat columns with continuous, ordinal, and categorical data
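The two options for an ordinal column can be sketched as below. The column, its four levels, and the integer coding 0..3 are hypothetical and chosen only to show the shapes that result from each choice.

```python
import torch
import torch.nn.functional as F

# Hypothetical ordinal column with four ordered levels, coded 0..3.
levels = torch.tensor([0, 2, 1, 3, 0])

# Option 1: treat as continuous. The ordering is kept, but the equal
# spacing between levels is an arbitrary notion of distance.
as_continuous = levels.float().unsqueeze(1)                 # shape (5, 1)

# Option 2: treat as categorical. The ordering is discarded and each
# level gets its own column.
as_categorical = F.one_hot(levels, num_classes=4).float()   # shape (5, 4)

print(as_continuous.shape, as_categorical.shape)
```

The continuous form adds one input column, while the categorical form adds one column per level.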
