10.11.2016 Views

Learning Data Mining with Python


Chapter 5

       [18, 19, 20],
       [21, 22, 23],
       [24, 25, 26],
       [27, 28, 29]])

Then, we set the entire second column/feature to the value 1:

X[:, 1] = 1

The result has lots of variance in the first and third columns, but no variance in the second column:

array([[ 0,  1,  2],
       [ 3,  1,  5],
       [ 6,  1,  8],
       [ 9,  1, 11],
       [12,  1, 14],
       [15,  1, 17],
       [18,  1, 20],
       [21,  1, 23],
       [24,  1, 26],
       [27,  1, 29]])
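The setup so far can be reproduced with a minimal sketch like the one below. It assumes that X was built with np.arange(30).reshape((10, 3)); that call matches the values printed above, but it is not shown in this excerpt. The name column_variances is introduced here only for illustration.

```python
import numpy as np

# Assumption: a 10x3 array holding 0..29, matching the values printed above.
X = np.arange(30).reshape((10, 3))

# Overwrite the second column (index 1) with a constant value.
X[:, 1] = 1

# Population variance of each column (axis=0 reduces over the rows).
column_variances = X.var(axis=0)
print(column_variances)
```

Under that assumption, the first and third columns each have a variance of 74.25, while the constant second column has a variance of 0.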

We can now create a VarianceThreshold transformer and apply it to our dataset:

from sklearn.feature_selection import VarianceThreshold
vt = VarianceThreshold()
Xt = vt.fit_transform(X)

Now, the result Xt does not have the second column:

array([[ 0,  2],
       [ 3,  5],
       [ 6,  8],
       [ 9, 11],
       [12, 14],
       [15, 17],
       [18, 20],
       [21, 23],
       [24, 26],
       [27, 29]])

We can observe the variances for each column by printing the vt.variances_ attribute:

print(vt.variances_)
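As a rough end-to-end sketch of the steps above, the following also checks which columns the transformer kept, using its get_support() method. The construction of X is again an assumption (np.arange(30).reshape((10, 3))), and kept is an illustrative name. The values that vt.variances_ reports for the non-constant columns may differ between scikit-learn versions, so the sketch inspects only the constant column.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Assumption: same 10x3 dataset as above, with a constant second column.
X = np.arange(30).reshape((10, 3))
X[:, 1] = 1

# The default threshold of 0 removes only features with zero variance.
vt = VarianceThreshold()
Xt = vt.fit_transform(X)

# Boolean mask over the original columns: True means the column was kept.
kept = vt.get_support()
print(kept)

# The constant column is reported with zero variance.
print(vt.variances_[1])
```

Passing a larger value, for example VarianceThreshold(threshold=0.5), would also drop features whose variance is small but not exactly zero.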

