
Dimensionality Reduction

Sketching PCA

PCA involves a lot of linear algebra, which we do not want to go into. Nevertheless, the basic algorithm can be easily described with the help of the following steps:

1. Center the data by subtracting the mean from it.
2. Calculate the covariance matrix.
3. Calculate the eigenvectors of the covariance matrix.

If we start with N features, the algorithm will again return a transformed feature space with N dimensions – we have gained nothing so far. The nice thing about this algorithm, however, is that the eigenvalues indicate how much of the variance is described by the corresponding eigenvector.

Let us assume we start with N features, and we know that our model does not work well with more than 20 features. Then we simply pick the 20 eigenvectors having the highest eigenvalues.
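To make the sketch concrete, here is a minimal NumPy illustration of the three steps, keeping the k eigenvectors with the largest eigenvalues. The data X and the choice of k here are made up purely for illustration and are not part of the chapter's example:

>>> import numpy as np
>>> X = np.random.normal(size=(100, 5))    # hypothetical data: 100 samples, 5 features
>>> k = 2                                  # number of dimensions we want to keep
>>> Xc = X - X.mean(axis=0)                # 1. center the data
>>> cov = np.cov(Xc, rowvar=False)         # 2. covariance matrix of the features
>>> evals, evecs = np.linalg.eigh(cov)     # 3. eigenvalues and eigenvectors
>>> order = np.argsort(evals)[::-1]        # sort eigenvalues, largest first
>>> components = evecs[:, order[:k]]       # keep the k eigenvectors with the highest eigenvalues
>>> X_reduced = Xc.dot(components)         # project the data onto them

The ratio evals[order[:k]].sum() / evals.sum() then tells us how much of the total variance the kept eigenvectors describe.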

Applying PCA

Let us consider the following artificial dataset, which is visualized in the left plot as follows:

>>> import numpy as np
>>> x1 = np.arange(0, 10, .2)
>>> x2 = x1+np.random.normal(loc=0, scale=1, size=len(x1))
>>> X = np.c_[(x1, x2)]
>>> good = (x1>5) | (x2>5) # some arbitrary classes
>>> bad = ~good # to make the example look good
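One way to continue from here is to hand X to a ready-made PCA implementation instead of coding the steps by hand. The following sketch uses scikit-learn's decomposition.PCA; the one-component choice is only for illustration and assumes scikit-learn is installed:

>>> from sklearn import decomposition
>>> pca = decomposition.PCA(n_components=1)  # keep only the direction of highest variance
>>> Xtrans = pca.fit_transform(X)            # center, fit, and project in one call
>>> print(pca.explained_variance_ratio_)     # share of the variance kept by that component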
