08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


Dimensionality Reduction

Asking the model about the features using wrappers

While filters can help tremendously in getting rid of useless features, they can go only so far. After all the filtering, there might still be some features that are independent among themselves and show some degree of dependence with the result variable, yet are totally useless from the model's point of view. Filters that look at one feature at a time can also miss features that matter only in combination. Just think of the following data, which describes the XOR function. Individually, neither A nor B shows any sign of dependence on Y, whereas together they clearly do:

A  B  Y
0  0  0
0  1  1
1  0  1
1  1  0
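To make the XOR point concrete, the following minimal sketch computes the Pearson correlation of each column with Y. The helper `pearson` and the derived column `a_xor_b` are our own illustrative names, not something defined in the text, and correlation is used here only as one simple example of a per-feature filter score.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# The four rows of the XOR table
A = [0, 0, 1, 1]
B = [0, 1, 0, 1]
Y = [0, 1, 1, 0]

corr_a = pearson(A, Y)   # each feature on its own carries no linear signal
corr_b = pearson(B, Y)

# Hypothetical combined feature: only the pair (A, B) together explains Y
a_xor_b = [a ^ b for a, b in zip(A, B)]
corr_joint = pearson(a_xor_b, Y)

print(corr_a, corr_b, corr_joint)
```

A filter that scores A and B one at a time would discard both, although a model that sees them together can reproduce Y exactly.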

So why not ask the model itself to give its vote on the individual features? This is what wrappers do, as we can see in the following flowchart:

[Figure: flowchart of the wrapper process. The current features are initialized with all features x1, x2, ..., xN. The model is trained with y and the importance of the individual features is checked. If the feature set is too big (Yes), the features that are unimportant are dropped and the model is trained again. If not (No), the process ends with the resulting features, for example x2, x10, x14.]

Here, we have pushed the calculation of feature importance into the model training process. Unfortunately (but understandably), feature importance is not determined as a binary value but as a ranking. So we still have to specify where to make the cut: what share of the features are we willing to keep, and what share do we want to drop?
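The loop described above can be sketched in a few lines of plain Python. This is only an illustration under our own assumptions: the model is a linear least-squares fit trained by gradient descent, the importance of a feature is taken to be the absolute value of its weight (reasonable here only because all features share the same scale), and the cut is given as a fixed number of features, `n_keep`. The function names and the synthetic data are hypothetical.

```python
import random

def train_linear_model(X, y, steps=300, lr=0.1):
    """Fit least-squares weights (no intercept) by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for row, target in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, row)) - target
            for j in range(d):
                grad[j] += err * row[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def wrapper_select(X, y, n_keep):
    """Retrain, drop the least important feature, repeat until n_keep remain."""
    current = list(range(len(X[0])))       # initialized with all features
    while len(current) > n_keep:           # "feature set too big?"
        X_sub = [[row[j] for j in current] for row in X]
        weights = train_linear_model(X_sub, y)
        importance = [abs(w) for w in weights]   # assumed importance measure
        weakest = importance.index(min(importance))
        del current[weakest]               # drop the unimportant feature
    return current

# Synthetic data: only features 0 and 2 influence y; 1 and 3 are pure noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [3 * row[0] - 2 * row[2] + rng.gauss(0, 0.1) for row in X]

selected = wrapper_select(X, y, n_keep=2)
print(selected)
```

Note that `n_keep` is exactly the cut we still have to choose ourselves; the ranking alone does not tell us where to stop. In practice, a ready-made implementation such as the `RFE` class in `sklearn.feature_selection` follows the same retrain-and-drop pattern.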
