
Learning Data Mining with Python


Extracting Features with Transformers

The datasets we have used so far have been described in terms of features. In the previous chapter, we used a transaction-centric dataset. However, ultimately this was just a different format for representing feature-based data.

There are many other types of datasets, including text, images, sounds, movies, or even real objects. Most data mining algorithms, however, rely on having numerical or categorical features. This means we need a way to represent these types of data as features before we pass them to a data mining algorithm.
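As a rough illustration of what such a representation can look like, the sketch below turns short text documents into word-count vectors. The function name `bag_of_words` and the whitespace tokenization are assumptions made for this example, not something prescribed by the chapter; real text usually needs more careful preprocessing.

```python
from collections import Counter


def bag_of_words(documents):
    """Turn a list of strings into (vocabulary, count vectors).

    Hypothetical helper for illustration only.
    """
    # Naive tokenization: lowercase, then split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    # Every distinct word becomes one feature; sorting fixes the column order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One numerical feature per vocabulary word: how often it occurs.
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


documents = ["the cat sat", "the dog sat on the cat"]
vocabulary, vectors = bag_of_words(documents)
print(vocabulary)
print(vectors)
```

Each document becomes a row of numbers with one column per word, which is the kind of input most data mining algorithms expect.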

In this chapter, we will discuss how to extract numerical and categorical features, and how to choose the best features once we have them. Along the way, we will cover some common patterns and techniques for extracting features.

The key concepts introduced in this chapter include:

• Extracting features from datasets
• Creating new features
• Selecting good features
• Creating your own transformer for custom datasets
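To give a feel for the last item, here is a minimal sketch of a transformer written in plain Python. It assumes the common `fit` / `transform` convention, where `fit` learns something from the data and `transform` uses it to produce new features. The class name `MeanDiscretizer` and its behaviour are hypothetical choices for this example, not the implementation developed in the chapter.

```python
class MeanDiscretizer:
    """Hypothetical transformer: 1 if a value is above its column mean, else 0."""

    def fit(self, X):
        # Learn the mean of each column from the training rows.
        n_rows = len(X)
        n_cols = len(X[0])
        self.means_ = [sum(row[j] for row in X) / n_rows for j in range(n_cols)]
        return self

    def transform(self, X):
        # Replace each value with a binary feature based on the learned means.
        return [
            [1 if value > mean else 0 for value, mean in zip(row, self.means_)]
            for row in X
        ]

    def fit_transform(self, X):
        # Convenience method: learn from X, then transform the same X.
        return self.fit(X).transform(X)


X = [[1.0, 10.0], [2.0, 20.0], [6.0, 0.0]]
discretizer = MeanDiscretizer()
X_binary = discretizer.fit_transform(X)
print(discretizer.means_)
print(X_binary)
```

Separating `fit` from `transform` means the column means learned from training data can be reused unchanged on new data.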
