08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


Dimensionality Reduction

Sketching PCA

PCA involves a lot of linear algebra, which we do not want to go into. Nevertheless, the basic algorithm can be easily described with the help of the following steps:

1. Center the data by subtracting the mean from it.
2. Calculate the covariance matrix.
3. Calculate the eigenvectors of the covariance matrix.
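The three steps can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the implementation a library would use: the helper name `pca_eigen` and the random demo data are hypothetical, and the eigenvectors are additionally sorted by decreasing eigenvalue for convenience.

```python
import numpy as np

def pca_eigen(X):
    """Hypothetical helper: eigen-decompose the covariance matrix of X.

    X has one row per sample and one column per feature.
    """
    # Step 1: center the data by subtracting the per-feature mean.
    X_centered = X - X.mean(axis=0)
    # Step 2: covariance matrix of the features (columns are variables).
    cov = np.cov(X_centered, rowvar=False)
    # Step 3: eigenvectors of the (symmetric) covariance matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Demo data (an assumption): 200 samples with 3 random features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
eigvals, eigvecs = pca_eigen(X_demo)
print(eigvals)
```

Each column of `eigvecs` is one eigenvector, and `eigvals` holds the matching eigenvalues in the same order.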

If we start with N features, the algorithm will again return a transformed feature space with N dimensions, so we have gained nothing so far. The nice thing about this algorithm, however, is that the eigenvalues indicate how much of the variance is described by the corresponding eigenvector.

Let us assume we start with N features (N greater than 20), and we know that our model does not work well with more than 20 features. Then we simply pick the 20 eigenvectors having the highest eigenvalues.
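As a rough sketch of that selection step, the following keeps the 20 eigenvectors with the largest eigenvalues and projects the data onto them. The helper name `project_top_k`, the random data, and the choice of 50 starting features are all assumptions made for illustration.

```python
import numpy as np

def project_top_k(X, k):
    """Hypothetical helper: project X onto its k leading eigenvectors."""
    X_centered = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X_centered, rowvar=False))
    # Indices of the k largest eigenvalues, in descending order.
    top = np.argsort(eigvals)[::-1][:k]
    # Fraction of the total variance described by the kept eigenvectors.
    explained = eigvals[top].sum() / eigvals.sum()
    return X_centered @ eigvecs[:, top], explained

# Demo data (an assumption): 500 samples, 50 features standing in for N.
rng = np.random.default_rng(0)
X_wide = rng.normal(size=(500, 50))
X_reduced, explained = project_top_k(X_wide, k=20)
print(X_reduced.shape, explained)
```

The returned `explained` value is the share of the total variance retained, which is what the eigenvalues let us read off before committing to a given number of dimensions.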

Applying PCA

Let us consider the following artificial dataset, which is visualized in the left plot:

>>> import numpy as np
>>> x1 = np.arange(0, 10, .2)
>>> x2 = x1+np.random.normal(loc=0, scale=1, size=len(x1))
>>> X = np.c_[(x1, x2)]
>>> good = (x1>5) | (x2>5) # some arbitrary classes
>>> bad = ~good # to make the example look good
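One plausible way to apply PCA to this dataset is scikit-learn's `PCA` class, reducing the two features to a single component. This is a sketch and not necessarily the authors' exact continuation: the fixed random seed and the variable name `Xtrans` are assumptions added here.

```python
import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)  # assumption: fixed seed so the run is repeatable

# Rebuild the artificial dataset from the session above.
x1 = np.arange(0, 10, .2)
x2 = x1 + np.random.normal(loc=0, scale=1, size=len(x1))
X = np.c_[(x1, x2)]

# Keep only the single direction of highest variance.
pca = PCA(n_components=1)
Xtrans = pca.fit_transform(X)

# Share of the total variance captured by that one component.
print(pca.explained_variance_ratio_)
```

Because `x2` is just `x1` plus noise, most of the variance lies along one diagonal direction, so one component should retain the bulk of it.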
