
2 Dimensionality Reduction

So far, you have studied feature selection techniques based on manual inspection of datasets. I will now show you a technique that largely automates this manual process.

Random Forest

Random forest is very useful for feature selection. The algorithm produces a feature importance chart, as seen in Fig. 2.7. This chart helps you eliminate features that have a low impact on the model's performance.

Looking at this chart, we can drop the low-importance features such as Education, Gender, Married, and Self_Employed.

Random forest is widely used in feature engineering because of its built-in feature importance mechanism. The RandomForestRegressor computes a score based on each feature's impact on the target variable. The visual representation of these scores makes it easier for a data scientist to create a final list of features. For example, looking at Fig. 2.7, a data scientist may select only the first four features for model building, as they have the maximum impact on the target.

Fig. 2.7 Plot of feature importance generated by random forest
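The workflow above can be sketched in a few lines of scikit-learn. The snippet below is a minimal illustration, not the book's actual code: it builds a synthetic stand-in for the loan dataset (the column names mirror those mentioned in the text, and the target's dependence on the features is invented for the example), fits a RandomForestRegressor, and reads the per-feature scores from its `feature_importances_` attribute.

```python
# Sketch of feature importance with a random forest, assuming scikit-learn.
# The dataset here is synthetic; column names echo the loan data in the text.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "ApplicantIncome": rng.normal(5000, 1500, n),
    "LoanAmount": rng.normal(150, 40, n),
    "Credit_History": rng.integers(0, 2, n),
    "Education": rng.integers(0, 2, n),
    "Gender": rng.integers(0, 2, n),
    "Married": rng.integers(0, 2, n),
    "Self_Employed": rng.integers(0, 2, n),
})
# An invented target that depends mainly on income, loan amount,
# and credit history, so those features should score highest.
y = (0.002 * X["ApplicantIncome"]
     - 0.01 * X["LoanAmount"]
     + 3.0 * X["Credit_History"]
     + rng.normal(0, 1, n))

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# feature_importances_ holds one score per column; the scores sum to 1.
importance = (pd.Series(model.feature_importances_, index=X.columns)
              .sort_values(ascending=False))
print(importance)
```

A chart like Fig. 2.7 can then be drawn with `importance.plot.barh()`, and the final feature list is obtained by keeping only the top-ranked columns.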
