
Dimensionality Reduction

Asking the model about the features using wrappers

While filters can tremendously help in getting rid of useless features, they can go only so far. After all the filtering, there might still be some features that are independent among themselves and show some degree of dependence on the result variable, yet are totally useless from the model's point of view. Just think of the following data, which describes the XOR function. Individually, neither A nor B would show any signs of dependence on Y, whereas together they clearly do:

A  B  Y
0  0  0
0  1  1
1  0  1
1  1  0
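We can verify this in a few lines of code. The following is a minimal sketch (assuming NumPy and scikit-learn are available; the choice of a decision tree as the model is arbitrary here): the per-feature correlations with Y come out as zero, while a model trained on both features predicts Y perfectly.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# The XOR data from the table above
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([0, 1, 1, 0])

# Individually, neither A nor B correlates with Y
print(np.corrcoef(X[:, 0], Y)[0, 1])  # 0.0
print(np.corrcoef(X[:, 1], Y)[0, 1])  # 0.0

# Together, they determine Y perfectly
model = DecisionTreeClassifier().fit(X, Y)
print(model.score(X, Y))  # 1.0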

So why not ask the model itself to give its vote on the individual features? This is what wrappers do, as we can see in the following process chart:

[Figure: the wrapper feature selection loop. The current feature set is initialized with all features x1, x2, ..., xN. A model is trained with y and the importance of the individual features is checked. If the feature set is still too big, the unimportant features are dropped and the loop repeats; otherwise the resulting features (here x2, x10, x14) are returned.]

Here we have pushed the calculation of feature importance into the model training process. Unfortunately (but understandably), feature importance is not reported as a binary decision but as a ranking value. So we still have to specify where to make the cut: which part of the features are we willing to keep, and which part do we want to drop?
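One readily available wrapper of this kind is scikit-learn's RFE (recursive feature elimination), which implements exactly the loop in the chart: train, rank, drop the weakest features, and repeat until a requested number remains. The following sketch uses synthetic data; the dataset parameters, the logistic regression estimator, and the choice of keeping three features are arbitrary illustration values.

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, of which only 3 are informative
X, y = make_classification(n_samples=100, n_features=10,
                           n_informative=3, random_state=0)

# Repeatedly train the model and drop the least important
# features until only three are left
selector = RFE(LogisticRegression(), n_features_to_select=3)
selector.fit(X, y)

print(selector.support_)  # boolean mask of the kept features
print(selector.ranking_)  # 1 = kept, higher = dropped earlier

Note that the cut point still has to be supplied by us (here via n_features_to_select); the wrapper only turns the ranking into a concrete feature subset.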
