10.11.2016 Views

Learning Data Mining with Python


Extracting Features with Transformers

The adult file itself contains two blank lines at the end of the file. By default, pandas will interpret the penultimate new line to be an empty (but valid) row. To remove this, we drop any row in which every value is missing (the use of inplace just makes sure the same DataFrame is modified, rather than a new one being created):

adult.dropna(how='all', inplace=True)
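To make the effect of how='all' concrete, here is a minimal sketch on a made-up DataFrame. The name toy and its values are hypothetical stand-ins rather than the real adult data. It contrasts how='all', which drops only rows that are entirely empty, with how='any', which drops every row that has at least one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for the adult data; values are invented.
toy = pd.DataFrame({
    "Age": [39.0, 50.0, np.nan, np.nan],
    "Education": ["Bachelors", None, "HS-grad", None],
    "Hours-per-week": [40.0, 13.0, 40.0, np.nan],
})

# how='all' drops a row only when every value in it is missing,
# so just the final, completely empty row is removed.
only_empty_removed = toy.dropna(how="all")

# how='any' drops a row as soon as one value is missing,
# which here also discards the two partially filled rows.
any_missing_removed = toy.dropna(how="any")

# inplace=True modifies the existing object instead of returning a new one.
toy_inplace = toy.copy()
toy_inplace.dropna(how="all", inplace=True)

print(len(toy), len(only_empty_removed), len(any_missing_removed))
```

For a file that merely ends in blank lines, how='all' is the gentler choice, because rows with a few missing fields are kept for later handling.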

Having a look at the dataset, we can see a variety of features from adult.columns:

adult.columns

The result lists each of the feature names, which are stored inside a pandas Index object:

Index(['Age', 'Work-Class', 'fnlwgt', 'Education',
       'Education-Num', 'Marital-Status', 'Occupation', 'Relationship',
       'Race', 'Sex', 'Capital-gain', 'Capital-loss', 'Hours-per-week',
       'Native-Country', 'Earnings-Raw'], dtype='object')
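An Index behaves much like an ordered, immutable sequence of labels. The sketch below assumes an empty DataFrame named adult_like (a hypothetical stand-in built only from the column names listed above, not the loaded file) and shows a few common ways to inspect the column labels:

```python
import pandas as pd

# Column names copied from the output above.
column_names = [
    "Age", "Work-Class", "fnlwgt", "Education",
    "Education-Num", "Marital-Status", "Occupation", "Relationship",
    "Race", "Sex", "Capital-gain", "Capital-loss", "Hours-per-week",
    "Native-Country", "Earnings-Raw",
]

# Hypothetical empty frame that carries only the column labels.
adult_like = pd.DataFrame(columns=column_names)

names = adult_like.columns          # a pandas Index object
has_age = "Age" in names            # membership test on the labels
as_list = names.tolist()            # plain Python list of the labels
position = names.get_loc("Sex")     # zero-based position of a label

print(len(names), has_age, position)
```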

Common feature patterns<br />

While there are countless ways to create features, some common patterns are employed across different disciplines. However, choosing appropriate features is tricky, and it is worth considering how a feature might correlate with the end result. As the adage says, don't judge a book by its cover: it is probably not worth considering the size of a book if you are interested in the message contained within.

Some commonly used features focus on the physical properties of the real-world objects being studied, for example:

• Spatial properties such as the length, width, and height of an object
• Weight and/or density of the object
• Age of an object or its components
• The type of the object
• The quality of the object

Other features might rely on the usage or history of the object:

• The producer, publisher, or creator of the object
• The year of manufacturing
• The use of the object
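As a rough illustration of the patterns listed above, the following sketch builds a few such features with pandas. Everything in it is assumed for the example: the objects table, its column names, its values, and the reference year are invented and do not come from the adult dataset.

```python
import pandas as pd

# Hypothetical objects; every value here is made up for illustration.
objects = pd.DataFrame({
    "length_cm": [20.0, 10.0],
    "width_cm": [10.0, 10.0],
    "height_cm": [5.0, 10.0],
    "weight_g": [500.0, 2000.0],
    "type": ["book", "box"],
    "year_made": [2010, 2000],
})

# Spatial properties combined into a single derived feature.
objects["volume_cm3"] = (
    objects["length_cm"] * objects["width_cm"] * objects["height_cm"]
)

# Density derived from weight and volume.
objects["density_g_cm3"] = objects["weight_g"] / objects["volume_cm3"]

# Age derived from the year of manufacturing (reference year is an assumption).
reference_year = 2016
objects["age_years"] = reference_year - objects["year_made"]

print(objects[["type", "volume_cm3", "density_g_cm3", "age_years"]])
```

Whether any of these derived columns is useful still depends on how it relates to the outcome being predicted.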
