Thinking-data-science-a-data-science-practitioners-guide

2 Dimensionality Reduction

This is the output in my run:

Support features

[ True True False True False False False False False True False]

In this output, the first, second, fourth, and tenth features are marked True, which means the selector kept them as the most significant ones. You can also get the ranking of each feature by examining the ranking_ attribute.

print('Features ranking by importance')

print(selector.ranking_)

This is the output in my run:

Features ranking by importance

[1 1 4 1 3 8 7 6 5 1 2]

A ranking value of 1 marks the most significant features, and larger values indicate less significant ones. Thus, the features at indices 0, 1, 3, and 9 are the most significant in our case. You can print the names of these four features by writing a small function:

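def get_top_features():
    # Collect the column positions of all features ranked 1
    rank_1 = []
    for i in range(len(selector.ranking_)):
        if selector.ranking_[i] == 1:
            rank_1.append(i)
    print('The four most informative features are:')
    # Look up the column names at those positions
    print(X.iloc[:, rank_1].columns)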

Calling this function gave the following output:

The four most informative features are:

Index(['Gender', 'Married', 'Education', 'Credit_History'], dtype='object')

You observe that Gender, Married, Education, and Credit_History are the four most significant features for our model training.
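The relationship between the support mask and the ranking can be checked with a minimal sketch in plain Python. It hard-codes the ranking values from the run shown above instead of using a fitted selector, and it assumes the selector follows the recursive-feature-elimination convention (as in scikit-learn's RFE) where rank 1 means "selected". The variable names ranking, support, selected, and by_rank are illustrative, not part of the original code.

```python
# Ranking values copied from the run shown in the text (1 = selected).
ranking = [1, 1, 4, 1, 3, 8, 7, 6, 5, 1, 2]

# Assumption: the support mask is simply "rank == 1" for each feature.
support = [rank == 1 for rank in ranking]

# Positions of the selected features.
selected = [i for i, keep in enumerate(support) if keep]

# All feature positions ordered from most to least significant.
# sorted() is stable, so the features tied at rank 1 keep their original order.
by_rank = sorted(range(len(ranking)), key=lambda i: ranking[i])

print(support)
print(selected)
print(by_rank)
```

The derived mask matches the support output printed earlier, and selected holds the indices 0, 1, 3, and 9. If X is a pandas DataFrame and selector exposes a boolean support_ array, X.columns[selector.support_] should give the same four names without a helper function.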
