
an introduction to Principal Component Analysis (PCA)


abstract

Principal component analysis (PCA) is a technique that is useful for the compression and classification of data. The purpose is to reduce the dimensionality of a data set (sample) by finding a new set of variables, smaller than the original set of variables, that nonetheless retains most of the sample's information.

By information we mean the variation present in the sample, given by the correlations between the original variables. The new variables, called principal components (PCs), are uncorrelated, and are ordered by the fraction of the total information each retains.


overview

• geometric picture of PCs
• algebraic definition and derivation of PCs
• usage of PCA
• astronomical application


Geometric picture of principal components (PCs)

A sample of n observations in the 2-D space $\mathbf{x} = (x_1, x_2)$.
Goal: to account for the variation in a sample in as few variables as possible, to some accuracy.


Geometric picture of principal components (PCs)

• the 1st PC is a minimum-distance fit to a line in the space of the variables
• the 2nd PC is a minimum-distance fit to a line in the plane perpendicular to the 1st PC

PCs are a series of linear least-squares fits to a sample, each orthogonal to all the previous ones.
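A minimal numerical sketch, not from the original slides, of the geometric claim above: the 1st PC direction is the line through the sample mean that minimizes the summed squared perpendicular distances to the points. The sample, seed, and names are illustrative.

import numpy as np

rng = np.random.default_rng(0)
n = 500
# a correlated 2-D sample of n observations
x = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[3.0, 1.2], [1.2, 1.0]], size=n)
xc = x - x.mean(axis=0)                        # centre on the sample mean

def perp_ss(theta):
    # summed squared perpendicular distance from the points to a line
    # through the origin at angle theta
    d = np.array([np.cos(theta), np.sin(theta)])
    along = xc @ d                             # component along the line
    return np.sum(np.sum(xc**2, axis=1) - along**2)

# direction of the 1st PC: leading eigenvector of the sample covariance matrix
evals, evecs = np.linalg.eigh(np.cov(xc, rowvar=False))
a1 = evecs[:, np.argmax(evals)]
theta_pc = np.arctan2(a1[1], a1[0]) % np.pi

# a brute-force search over line directions lands on the same angle
thetas = np.linspace(0.0, np.pi, 2000)
theta_best = thetas[np.argmin([perp_ss(t) for t in thetas])]
print(theta_pc, theta_best)                    # nearly identical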


Algebraic definition of PCs

Given a sample of n observations on a vector of p variables
  $\mathbf{x} = (x_1, x_2, \ldots, x_p)$
define the first principal component of the sample by the linear transformation
  $z_1 = \mathbf{a}_1^T \mathbf{x} = \sum_{i=1}^{p} a_{i1} x_i$
where the vector $\mathbf{a}_1 = (a_{11}, a_{21}, \ldots, a_{p1})$
is chosen such that $\mathrm{var}[z_1]$ is maximum.


Algebraic definition of PCs

Likewise, define the k-th PC of the sample by the linear transformation
  $z_k = \mathbf{a}_k^T \mathbf{x} = \sum_{i=1}^{p} a_{ik} x_i, \qquad k = 1, \ldots, p$
where the vector $\mathbf{a}_k = (a_{1k}, a_{2k}, \ldots, a_{pk})$
is chosen such that $\mathrm{var}[z_k]$ is maximum,
subject to $\mathrm{cov}[z_k, z_l] = 0$ for $1 \le l < k$
and to $\mathbf{a}_k^T \mathbf{a}_k = 1$.


Algebraic derivation of coefficient vectors

To find $\mathbf{a}_1$, first note that
  $\mathrm{var}[z_1] = \mathbf{a}_1^T \mathbf{S}\,\mathbf{a}_1$
where
  $\mathbf{S} = \dfrac{1}{n-1} \sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T$
is the covariance matrix for the variables $\mathbf{x}$.


Algebraic derivation of coefficient vectors

To find $\mathbf{a}_1$: maximize $\mathbf{a}_1^T \mathbf{S}\,\mathbf{a}_1$ subject to $\mathbf{a}_1^T \mathbf{a}_1 = 1$.
Let λ be a Lagrange multiplier, then maximize
  $\mathbf{a}_1^T \mathbf{S}\,\mathbf{a}_1 - \lambda\,(\mathbf{a}_1^T \mathbf{a}_1 - 1)$
by differentiating with respect to $\mathbf{a}_1$ and setting the result to zero:
  $\mathbf{S}\,\mathbf{a}_1 - \lambda\,\mathbf{a}_1 = 0$, i.e. $(\mathbf{S} - \lambda \mathbf{I}_p)\,\mathbf{a}_1 = 0$
therefore $\mathbf{a}_1$ is an eigenvector of $\mathbf{S}$ corresponding to eigenvalue λ.


Algebraic derivation of $\mathbf{a}_1$

We have maximized
  $\mathrm{var}[z_1] = \mathbf{a}_1^T \mathbf{S}\,\mathbf{a}_1 = \lambda\,\mathbf{a}_1^T \mathbf{a}_1 = \lambda$
So λ is the largest eigenvalue of $\mathbf{S}$, written $\lambda_1$.
The first PC $z_1 = \mathbf{a}_1^T \mathbf{x}$ retains the greatest amount of variation in the sample.
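A minimal numerical check of this result, not part of the original slides (data and names are illustrative): the variance of the first PC scores equals the largest eigenvalue of the sample covariance matrix.

import numpy as np

rng = np.random.default_rng(1)
n, p = 400, 5
mix = rng.normal(size=(p, p))                 # mixing matrix to correlate the variables
X = rng.normal(size=(n, p)) @ mix             # n observations on p variables
Xc = X - X.mean(axis=0)                       # measure about the sample mean

S = Xc.T @ Xc / (n - 1)                       # sample covariance matrix
evals, evecs = np.linalg.eigh(S)              # eigh: S is symmetric
a1 = evecs[:, -1]                             # eigenvector of the largest eigenvalue
z1 = Xc @ a1                                  # scores of the first PC

print(z1.var(ddof=1), evals[-1])              # var[z1] equals the largest eigenvalue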


Algebraic derivation of coefficient vectors

To find the next coefficient vector $\mathbf{a}_2$, maximize $\mathbf{a}_2^T \mathbf{S}\,\mathbf{a}_2$
subject to $\mathbf{a}_2^T \mathbf{a}_2 = 1$
and to $\mathrm{cov}[z_2, z_1] = 0$.
First note that
  $\mathrm{cov}[z_2, z_1] = \mathbf{a}_1^T \mathbf{S}\,\mathbf{a}_2 = \lambda_1\,\mathbf{a}_1^T \mathbf{a}_2$
then let λ and φ be Lagrange multipliers, and maximize
  $\mathbf{a}_2^T \mathbf{S}\,\mathbf{a}_2 - \lambda\,(\mathbf{a}_2^T \mathbf{a}_2 - 1) - \phi\,\mathbf{a}_2^T \mathbf{a}_1$.


Algebraic derivation of coefficient vectors

We find that $\mathbf{a}_2$ is also an eigenvector of $\mathbf{S}$, whose eigenvalue $\lambda_2$ is the second largest.

In general
• The k-th largest eigenvalue $\lambda_k$ of $\mathbf{S}$ is the variance of the k-th PC.
• The k-th PC $z_k$ retains the k-th greatest fraction of the variation in the sample.


Algebraic formulation of PCA

Given a sample of n observations $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$
on a vector of p variables $\mathbf{x} = (x_1, x_2, \ldots, x_p)$,
define a vector of p PCs $\mathbf{z} = (z_1, z_2, \ldots, z_p)$
according to
  $\mathbf{z} = \mathbf{A}^T \mathbf{x}$
where $\mathbf{A}$ is an orthogonal p × p matrix whose k-th column $\mathbf{a}_k$ is the k-th eigenvector of $\mathbf{S}$.
Then
  $\mathbf{\Lambda} = \mathbf{A}^T \mathbf{S}\,\mathbf{A} = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$
is the covariance matrix of the PCs, being diagonal with elements $\lambda_k$.
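A minimal sketch (assumed, not from the original slides) confirming the statement above: with the eigenvectors of S as the columns of A, the covariance matrix of the PCs, A^T S A, is diagonal with the eigenvalues on the diagonal.

import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 4
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
Xc = X - X.mean(axis=0)

S = Xc.T @ Xc / (n - 1)
evals, A = np.linalg.eigh(S)                  # columns of A are the eigenvectors of S
order = np.argsort(evals)[::-1]               # decreasing eigenvalue order
evals, A = evals[order], A[:, order]

Lam = A.T @ S @ A                             # covariance matrix of the PCs
print(np.allclose(Lam, np.diag(evals)))       # True: diagonal with elements lambda_k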


usage of PCA: Probability distribution for sample PCs

If (i) the n observations of $\mathbf{x}$ in the sample are independent, and
(ii) $\mathbf{x}$ is drawn from an underlying population that follows a p-variate normal (Gaussian) distribution with known covariance matrix $\mathbf{\Sigma}$,
then
  $(n-1)\,\mathbf{S} \sim W_p(\mathbf{\Sigma},\, n-1)$
where $W_p$ is the Wishart distribution;
else
utilize a bootstrap approximation.
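A minimal bootstrap sketch (assumed, not from the original slides): approximate the sampling distribution of the PC variances (the eigenvalues of S) by resampling the n observations with replacement; the data and names are illustrative.

import numpy as np

def pc_variances(X):
    # eigenvalues of the sample covariance matrix, in decreasing order
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / (len(X) - 1)
    return np.sort(np.linalg.eigvalsh(S))[::-1]

rng = np.random.default_rng(3)
n, p = 200, 3
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))

# resample the n observations with replacement to approximate the
# sampling distribution of each eigenvalue
boot = np.array([pc_variances(X[rng.integers(0, n, size=n)])
                 for _ in range(2000)])
print(pc_variances(X))                        # eigenvalues from the sample itself
print(boot.std(axis=0))                       # bootstrap standard errors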


usage of PCA: Probability distribution for sample PCs

If (i) $(n-1)\,\mathbf{S}$ follows a Wishart distribution, and
(ii) the population eigenvalues $\tilde{\lambda}_k$ are all distinct,
then the following results hold as $n \to \infty$:
• all the $\lambda_k$ are independent of all the $\mathbf{a}_k$
• the $\lambda_k$ and the $\mathbf{a}_k$ are jointly normally distributed
(a tilde denotes a population quantity)


usage of PCA: Probability distribution for sample PCs

and, to leading order in 1/n,
  $\mathrm{E}[\lambda_k] \simeq \tilde{\lambda}_k, \qquad \mathrm{var}[\lambda_k] \simeq 2\tilde{\lambda}_k^2 / n$
  $\mathrm{E}[\mathbf{a}_k] \simeq \tilde{\mathbf{a}}_k, \qquad \mathrm{cov}[\mathbf{a}_k] \simeq \dfrac{\tilde{\lambda}_k}{n} \sum_{l \ne k} \dfrac{\tilde{\lambda}_l}{(\tilde{\lambda}_l - \tilde{\lambda}_k)^2}\, \tilde{\mathbf{a}}_l \tilde{\mathbf{a}}_l^T$
(a tilde denotes a population quantity)

usage of PCA: Inference about population PCs

If the underlying population $\tilde{\mathbf{x}}$ follows a p-variate normal distribution,
then analytic expressions exist* for
  MLE's of $\tilde{\lambda}_k$, $\tilde{\mathbf{a}}_k$, and $\mathbf{\Sigma}$,
  confidence intervals for $\tilde{\lambda}_k$ and $\tilde{\mathbf{a}}_k$, and
  hypothesis tests for $\tilde{\lambda}_k$ and $\tilde{\mathbf{a}}_k$;
else
bootstrap and jackknife approximations exist.

*see references, esp. Jolliffe
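One such analytic expression, sketched here only under the asymptotic result quoted above ($\mathrm{var}[\lambda_k] \simeq 2\tilde{\lambda}_k^2/n$) and not taken verbatim from the slides: an approximate confidence interval for a population eigenvalue, formed on a log scale (see Jolliffe for the exact treatment). The function name and numbers are illustrative.

import numpy as np

def eigenvalue_ci(lam_k, n, z=1.96):
    # approximate 95% interval for the population eigenvalue, built from
    # var[lambda_k] ~ 2*lambda_k^2/n applied on a log scale
    half_width = z * np.sqrt(2.0 / n)
    return lam_k * np.exp(-half_width), lam_k * np.exp(half_width)

print(eigenvalue_ci(lam_k=4.2, n=200))        # e.g. sample eigenvalue 4.2, n = 200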


usage of PCA: Practical computation of PCs

In general it is useful to define standardized variables by
  $x'_k = x_k \,/\, \sqrt{\mathrm{var}[x_k]}, \qquad k = 1, \ldots, p$
If the $x_k$ are each measured about their sample mean, then the covariance matrix of $\mathbf{x}' = (x'_1, x'_2, \ldots, x'_p)$ will be equal to the correlation matrix of $\mathbf{x}$, and the PCs will be dimensionless.
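A minimal sketch (assumed, not from the original slides): after centring on the sample mean and dividing each variable by its standard deviation, the covariance matrix of the standardized variables equals the correlation matrix of the originals.

import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(250, 3)) @ rng.normal(size=(3, 3)) + [10.0, -2.0, 5.0]

# standardize: centre on the sample mean and divide by the standard deviation
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

print(np.allclose(np.cov(Xs, rowvar=False),       # covariance of standardized variables
                  np.corrcoef(X, rowvar=False)))  # correlation of the originals -> True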


usage of PCA: Practical computation of PCs

Given a sample of n observations on a vector of p variables $\mathbf{x} = (x_1, x_2, \ldots, x_p)$
(each measured about its sample mean),
compute the covariance matrix
  $\mathbf{S} = \dfrac{1}{n-1}\,\mathbf{X}^T \mathbf{X}$
where $\mathbf{X}$ is the n × p matrix whose i-th row is the i-th observation $\mathbf{x}_i^T$.
Then compute the n × p matrix
  $\mathbf{Z} = \mathbf{X}\,\mathbf{A}$
whose i-th row $\mathbf{z}_i^T$ is the PC score for the i-th observation.
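A minimal end-to-end sketch (assumed, not from the original slides) of the recipe above: centre the data, form S, take the eigenvectors of S as the columns of A in decreasing eigenvalue order, and compute the PC scores Z = X A.

import numpy as np

def compute_pcs(X):
    # centre the data, form S = X^T X / (n-1), and take the eigenvectors of S
    # (in decreasing eigenvalue order) as the columns of A; PC scores are Z = X A
    Xc = X - X.mean(axis=0)
    n = Xc.shape[0]
    S = Xc.T @ Xc / (n - 1)
    evals, A = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, A = evals[order], A[:, order]
    Z = Xc @ A
    return Z, A, evals

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 6)) @ rng.normal(size=(6, 6))
Z, A, evals = compute_pcs(X)
print(np.allclose(np.cov(Z, rowvar=False), np.diag(evals)))   # PCs are uncorrelated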


usage of PCA: Practical computation of PCs

Write
  $\mathbf{X} = \mathbf{Z}\,\mathbf{A}^T$, i.e. $\mathbf{x}_i = \sum_{k=1}^{p} z_{ik}\,\mathbf{a}_k$
to decompose each observation into PCs.


usage of PCA: Data compression

Because the k-th PC retains the k-th greatest fraction of the variation, we can approximate each observation by truncating the sum at the first m < p PCs:
  $\mathbf{x}_i \approx \sum_{k=1}^{m} z_{ik}\,\mathbf{a}_k$


usage of PCA: Data compression

Reduce the dimensionality of the data from p to m < p by approximating
  $\mathbf{X} \approx \mathbf{Z}_m \mathbf{A}_m^T$
where $\mathbf{Z}_m$ is the n × m portion of $\mathbf{Z}$ (its first m columns)
and $\mathbf{A}_m$ is the p × m portion of $\mathbf{A}$ (its first m columns).
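A minimal sketch (assumed, not from the original slides) of this compression step: keep only the first m columns of Z and A, reconstruct the centred data, and note that the discarded fraction of variation equals the discarded eigenvalue fraction.

import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 6)) @ rng.normal(size=(6, 6))
Xc = X - X.mean(axis=0)

S = Xc.T @ Xc / (len(Xc) - 1)
evals, A = np.linalg.eigh(S)
order = np.argsort(evals)[::-1]
evals, A = evals[order], A[:, order]
Z = Xc @ A

m = 2                                          # keep only the first m < p PCs
X_m = Z[:, :m] @ A[:, :m].T                    # X ~ Z_m A_m^T, a rank-m approximation
lost = np.sum((Xc - X_m)**2) / np.sum(Xc**2)
print(lost, evals[m:].sum() / evals.sum())     # fraction of variation discarded agrees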


astronomical application: PCs for elliptical galaxies

Rotating to the PC in B_T – Σ space improves the Faber-Jackson relation as a distance indicator.
Dressler et al. 1987


astronomical application: Eigenspectra (KL transform)

Connolly et al. 1995


references

Connolly, A.J., Szalay, A.S., et al., "Spectral Classification of Galaxies: An Orthogonal Approach", AJ, 110, 1071-1082, 1995.
Dressler, A., et al., "Spectroscopy and Photometry of Elliptical Galaxies. I. A New Distance Estimator", ApJ, 313, 42-58, 1987.
Efstathiou, G., and Fall, S.M., "Multivariate analysis of elliptical galaxies", MNRAS, 206, 453-464, 1984.
Johnston, D.E., et al., "SDSS J0903+5028: A New Gravitational Lens", AJ, 126, 2281-2290, 2003.
Jolliffe, I.T., 2002, Principal Component Analysis (Springer-Verlag New York, Secaucus, NJ).
Lupton, R., 1993, Statistics in Theory and Practice (Princeton University Press, Princeton, NJ).
Murtagh, F., and Heck, A., Multivariate Data Analysis (D. Reidel Publishing Company, Dordrecht, Holland).
Yip, C.W., and Szalay, A.S., et al., "Distributions of Galaxy Spectral Types in the SDSS", AJ, 128, 585-609, 2004.
