Building Machine Learning Systems with Python - Richert, Coelho
Chapter 11
Feature extraction
At some point, after we have removed the redundant features and dropped the
irrelevant ones, we often still find that we have too many features. No matter what
learning method we use, they all perform badly, and given the huge feature space,
we understand that they actually cannot do better. We realize that we have to cut
into living flesh; that we have to get rid of features that common sense tells us
are valuable. Another situation in which we need to reduce the dimensions, and in
which feature selection does not help much, is when we want to visualize data. Then
we need at most three dimensions at the end to produce any meaningful graph.
Enter the feature extraction methods. They restructure the feature space to make it
more accessible to the model, or simply cut the dimensions down to two or three so
that we can show dependencies visually.
Again, we can distinguish between linear and non-linear feature extraction
methods. And as before, in the feature selection section, we will present one
method of each type: principal component analysis for the linear case and
multidimensional scaling for the non-linear one. Although they are widely known
and used, they are only representatives of many more interesting and powerful
feature extraction methods.
About principal component analysis (PCA)
Principal component analysis is often the first thing to try if you want to cut
down the number of features and do not know which feature extraction method to
use. PCA is limited in that it is a linear method, but chances are that it already
goes far enough for your model to learn well. Add to that the strong mathematical
properties it offers, the speed at which it finds the transformed feature space,
and its ability to transform between the original and transformed features later,
and it is almost guaranteed to become one of your frequently used machine
learning tools.
To summarize, given the original feature space, PCA finds a linear projection of it
into a lower-dimensional space that has the following properties:
• The conserved variance is maximized
• The final reconstruction error (when trying to go back from the transformed
features to the original ones) is minimized
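Both properties can be checked numerically. The following is a minimal sketch using scikit-learn's `PCA` on synthetic correlated data (the data set and parameter choices here are illustrative, not from the book): projecting two correlated features onto a single component keeps most of the variance, and reconstructing from that component loses little.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Two correlated features: the second is mostly a scaled copy of the first
x = rng.normal(size=200)
X = np.c_[x, 0.8 * x + rng.normal(scale=0.1, size=200)]

pca = PCA(n_components=1)
X_trans = pca.fit_transform(X)           # project down to one dimension
X_back = pca.inverse_transform(X_trans)  # map back to the original space

# Share of the total variance conserved by the single component
print(pca.explained_variance_ratio_[0])
# Mean squared reconstruction error
print(np.mean((X - X_back) ** 2))
```

With data this strongly correlated, the explained variance ratio comes out close to 1 and the reconstruction error close to 0, which is exactly the trade-off the two bullet points describe.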
As PCA simply transforms the input data, it can be applied to both classification
and regression problems. In this section, we will use a classification task to
discuss the method.
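Because PCA is just a transformation of the inputs, it can be dropped in front of any estimator. As a hedged sketch (the Iris data set and logistic regression classifier are my choices here, not necessarily the ones the book uses), a scikit-learn pipeline can reduce four features to two principal components before classifying:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# Reduce the four Iris features to two principal components,
# then train the classifier in the reduced space
clf = make_pipeline(PCA(n_components=2), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```

Putting PCA inside the pipeline also ensures that, during cross-validation, the projection is fitted on each training fold only, so no information from the test fold leaks into the transformation.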