08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho



Dimensionality Reduction

Of course, using MDS requires an understanding of the units of the individual features; we may be using features that cannot be meaningfully compared with the Euclidean metric. For instance, a categorical variable, even when encoded as an integer (0 = red circle, 1 = blue star, 2 = green triangle, and so on), cannot be compared using Euclidean distance (is red closer to blue than to green?).
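
To make the problem concrete, here is a minimal sketch, assuming NumPy and the three illustrative categories from the text. It compares pairwise Euclidean distances on the raw integer codes with distances on a one-hot encoding. One-hot encoding is a common workaround brought in for illustration, not something the chapter prescribes.

```python
import numpy as np

# Integer codes as in the text: 0 = red circle, 1 = blue star, 2 = green triangle
codes = np.array([0, 1, 2])

# Pairwise Euclidean distance on the raw codes (1-D, so it is just |a - b|).
# This imposes an arbitrary ordering: red ends up "closer" to blue than to green.
d_int = np.abs(codes[:, None] - codes[None, :]).astype(float)

# One-hot encoding (an assumption for illustration): one column per category
one_hot = np.eye(3)[codes]

# Pairwise Euclidean distance between the one-hot rows
d_onehot = np.linalg.norm(one_hot[:, None, :] - one_hot[None, :, :], axis=2)

print(d_int)
print(d_onehot)
```

On the integer codes, red and green are at distance 2 while red and blue are at distance 1, an ordering that means nothing. With the one-hot rows, every pair of distinct categories is at the same distance (the square root of 2), so no spurious ordering leaks into MDS.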

But once we are aware of this issue, MDS is a useful tool that reveals similarities in our data that would otherwise be difficult to see in the original feature space.

Looking a bit deeper into MDS, we realize that it is not a single algorithm, but a family of different algorithms, of which we have used just one. The same was true for PCA, and if you find that neither PCA nor MDS solves your problem, take a look at the other manifold learning algorithms that are available in the scikit-learn toolkit.
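
As a rough sketch of how interchangeable these estimators are, the following runs MDS and one alternative from `sklearn.manifold`, Isomap, on the Iris data. The dataset, the choice of Isomap, and the parameter values are assumptions made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.manifold import MDS, Isomap

# Iris is used here only as a small, convenient example dataset
X = load_iris().data

# Metric MDS down to two dimensions; random_state fixes the initialization
mds = MDS(n_components=2, random_state=0)
X_mds = mds.fit_transform(X)

# Isomap, another manifold learner with the same fit_transform interface;
# n_neighbors=10 is an arbitrary illustrative choice
iso = Isomap(n_neighbors=10, n_components=2)
X_iso = iso.fit_transform(X)

print(X_mds.shape, X_iso.shape)
```

Because both estimators expose `fit_transform`, swapping one for another usually means changing a single line. Which one reveals useful structure depends on the data.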

Summary

We learned that sometimes we can get rid of entire features using feature selection methods. We also saw that in some cases this is not enough, and we have to employ feature extraction methods that reveal the real, lower-dimensional structure in our data, hoping that the model has an easier time with it.
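
The difference between the two approaches can be sketched as follows, assuming scikit-learn and the Iris data. `SelectKBest` with `f_classif` stands in for feature selection and `PCA` for feature extraction; both choices are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Feature selection: keep the 2 original columns with the highest ANOVA F-score
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)

# Feature extraction: build 2 new features as linear combinations of all columns
pca = PCA(n_components=2)
X_extracted = pca.fit_transform(X)

print(selector.get_support())  # boolean mask of the columns that were kept
print(X_selected.shape, X_extracted.shape)
```

Selection returns a subset of the original columns unchanged, so the features stay interpretable. Extraction returns new columns that mix all of the original ones.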

We have only scratched the surface of the huge body of available dimensionality reduction methods. Still, we hope that we have got you interested in this whole field, as there are lots of other methods waiting for you to pick up. In the end, feature selection and extraction is an art, just like choosing the right learning method or training a model.

The next chapter covers the use of Jug, a little Python framework to manage computations in a way that takes advantage of multiple cores or multiple machines. We will also learn about AWS, the Amazon cloud.

