
Chapter 11

Feature extraction

At some point, after we have removed the redundant features and dropped the irrelevant ones, we often still find that we have too many features. No matter what learning method we use, they all perform badly, and given the huge feature space, we understand that they actually cannot do better. We realize that we have to cut into healthy tissue and get rid of features that common sense tells us are valuable. Another situation in which we need to reduce the dimensions, and in which feature selection does not help much, is when we want to visualize data; then we need at most three dimensions at the end to produce any meaningful graph.

Enter the feature extraction methods. They restructure the feature space to make it more accessible to the model, or simply cut down the dimensions to two or three so that we can show dependencies visually.

Again, we can distinguish between linear and non-linear feature extraction methods. And as before, in the feature selection section, we will present one method of each type: principal component analysis for the linear version and multidimensional scaling for the non-linear one. Although they are widely known and used, they are only representatives of many more interesting and powerful feature extraction methods.

About principal component analysis (PCA)

Principal component analysis is often the first thing to try if you want to cut down the number of features and do not know which feature extraction method to use. PCA is limited in that it is a linear method, but chances are that it already goes far enough for your model to learn well. Add to that the strong mathematical properties it offers, the speed at which it finds the transformed feature space, and its ability to transform between the original and transformed features later, and we can almost guarantee that it will become one of your frequently used machine learning tools.

To summarize it: given the original feature space, PCA finds a linear projection into a lower-dimensional space that has the following properties:

• The conserved variance is maximized
• The final reconstruction error (when trying to go back from the transformed features to the original ones) is minimized
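
Both properties are easy to check in code. Here is a minimal sketch (assuming scikit-learn's PCA implementation and random toy data, neither of which is prescribed by the text) that projects ten features down to three, then reports the conserved variance and the reconstruction error:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(42)
X = rng.normal(size=(100, 10))  # 100 samples, 10 original features

pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)  # project into the 3-dimensional space

# Property 1: fraction of the variance conserved by each component
print(pca.explained_variance_ratio_)

# Property 2: reconstruction error when mapping back to the original space
X_restored = pca.inverse_transform(X_reduced)
print(np.mean((X - X_restored) ** 2))
```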

As PCA simply transforms the input data, it can be applied to both classification and regression problems. In this section, we will use a classification task to discuss the method.
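
As a hedged illustration of what that looks like in practice (the dataset and classifier here are our own choices, not necessarily the ones used in the rest of the chapter), PCA can simply be chained in front of a classifier:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Reduce to two components, then classify in the transformed space
clf = make_pipeline(PCA(n_components=2), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```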

