08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


Dimensionality Reduction

However, the p-value tells us that, whatever the correlation coefficient is, we should not pay attention to it. The output in the following screenshot illustrates this:

In the first three cases, which have high correlation coefficients, we would probably want to throw out one of the two features, since they seem to convey similar, if not the same, information.

In the last case, however, we should keep both features. In our application, this decision would of course be driven by the p-value.
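The decision rule described here can be sketched in a few lines. This is only an illustration under stated assumptions: the thresholds `r_threshold=0.9` and `alpha=0.05`, the function names, and the sample data are all hypothetical, and the p-value comes from an exact permutation test written with the standard library, which stands in for whatever p-value a statistics package would report.

```python
from itertools import permutations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def permutation_p_value(xs, ys):
    """Exact two-sided permutation p-value: the share of all reorderings
    of ys whose |r| is at least as large as the observed |r|.
    Only feasible for small samples (n! reorderings)."""
    observed = abs(pearson_r(xs, ys))
    hits = 0
    total = 0
    for shuffled in permutations(ys):
        total += 1
        # Small tolerance so floating-point ties still count as hits.
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    return hits / total


def is_redundant(xs, ys, r_threshold=0.9, alpha=0.05):
    """Hypothetical rule: drop one feature only if the correlation is
    both high and backed by a low p-value. Thresholds are assumptions."""
    r = pearson_r(xs, ys)
    p = permutation_p_value(xs, ys)
    drop_one = (abs(r) >= r_threshold) and (p < alpha)
    return drop_one, r, p


# Three samples: r is high, but the p-value says not to trust it.
few = is_redundant([1, 2, 3], [1, 2, 3.5])

# Seven samples of a similar relationship: high r and a low p-value.
many = is_redundant([1, 2, 3, 4, 5, 6, 7],
                    [1.1, 1.9, 3.2, 3.8, 5.1, 6.0, 7.2])

print("few samples :", few)
print("many samples:", many)
```

With only three points, even a coefficient above 0.9 is not flagged, because a third of all reorderings produce a correlation just as strong. With seven points the same kind of relationship passes both checks, so one of the two features could be dropped.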

Although it worked nicely in the previous example, reality is seldom nice to us. One big disadvantage of correlation-based feature selection is that it only detects linear relationships (relationships that can be modeled by a straight line). If we use correlation on non-linear data, we see the problem. In the following example, we have a quadratic relationship:
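As a minimal sketch of this limitation, the snippet below computes the Pearson coefficient by hand for a linear and a quadratic feature built from the same inputs. The data and the helper name `pearson_r` are assumptions made for illustration, not the example used in the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Inputs placed symmetrically around zero (an assumption of this sketch).
xs = list(range(-5, 6))

linear = [2 * x + 1 for x in xs]   # straight-line relationship
quadratic = [x ** 2 for x in xs]   # fully determined by x, but not linear

r_linear = pearson_r(xs, linear)
r_quadratic = pearson_r(xs, quadratic)

print("linear   :", r_linear)
print("quadratic:", r_quadratic)
```

The quadratic feature depends entirely on `x`, yet its correlation coefficient is zero because the positive and negative halves cancel. A purely correlation-based filter would therefore treat the two as unrelated.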
