11.04.2024

Thinking Data Science: A Data Science Practitioner's Guide


1 Data Science Process

Fig. 1.5 Seaborn sample outputs

Fig. 1.6 Options in features engineering

Features Engineering

Selecting appropriate features while keeping the feature count low is essential for training the model efficiently. We call this features engineering. The features engineering workflow has two paths: feature selection and dimensionality reduction. This is shown in Fig. 1.6.

Feature selection means keeping some features and excluding others without changing them. Dimensionality reduction lowers the feature count by transforming the features into a smaller number of dimensions, using one of the many techniques that researchers have developed over the years.
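The difference between the two paths can be seen in a minimal sketch. The data, the column names, and the choice of principal component analysis (PCA) as the dimensionality reduction technique are assumptions made for illustration; the text does not name a specific technique here.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical data: 100 rows, 4 features (names are illustrative only).
feature_names = ["age", "income", "height_cm", "height_in"]
X = rng.normal(size=(100, 4))
X[:, 3] = X[:, 2] / 2.54  # height_in repeats the information in height_cm

# Feature selection: keep a subset of the original columns, values unchanged.
keep = [0, 1, 2]
X_selected = X[:, keep]

# Dimensionality reduction (PCA via SVD, assumed technique):
# build 2 new columns as linear combinations of all 4 original ones.
X_centered = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
X_reduced = X_centered @ Vt[:2].T

print(X_selected.shape)  # (100, 3)
print(X_reduced.shape)   # (100, 2)
```

The selected columns are still the original measurements, so they remain easy to interpret. The reduced columns are new, derived values that no longer correspond to any single input feature.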

With domain knowledge, you can eliminate unwanted features by manual inspection, simply dropping those columns from the database. You may remove features that have missing values, have low variance, or are highly correlated. You may
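The three removal criteria mentioned above (missing values, low variance, high correlation) could be sketched with pandas roughly as follows. The function name `drop_weak_features`, the threshold values, and the sample data are all hypothetical; suitable cut-offs depend on the dataset.

```python
import numpy as np
import pandas as pd

def drop_weak_features(df, max_missing=0.5, min_variance=1e-3, max_corr=0.95):
    """Drop numeric columns that are mostly missing, nearly constant, or redundant.

    All thresholds are illustrative assumptions, not recommended values.
    """
    # 1. Missing values: drop columns whose share of NaN exceeds max_missing.
    missing_share = df.isna().mean()
    df = df.loc[:, missing_share <= max_missing]

    # 2. Low variance: drop columns that barely change across rows.
    variances = df.var()
    df = df.loc[:, variances >= min_variance]

    # 3. High correlation: in each highly correlated pair, drop the later column.
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    redundant = [col for col in upper.columns if (upper[col] > max_corr).any()]
    return df.drop(columns=redundant)

# Hypothetical sample data with one example of each problem.
rng = np.random.default_rng(seed=0)
n = 200
x = rng.normal(size=n)
df = pd.DataFrame({
    "a": x,
    "a_copy": 2 * x + 1,          # perfectly correlated with "a"
    "b": rng.normal(size=n),
    "constant": np.full(n, 3.0),  # zero variance
    "mostly_missing": np.where(np.arange(n) < 180, np.nan, rng.normal(size=n)),
})

reduced = drop_weak_features(df)
print(list(reduced.columns))  # ['a', 'b']
```

The correlation step keeps the first column of each correlated pair. That is an arbitrary choice made for this sketch, and domain knowledge may suggest keeping the other one.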
