an introduction to Principal Component Analysis (PCA)
abstract

Principal component analysis (PCA) is a technique that is useful for the compression and classification of data. The purpose is to reduce the dimensionality of a data set (sample) by finding a new set of variables, smaller than the original set of variables, that nonetheless retains most of the sample's information. By information we mean the variation present in the sample, given by the correlations between the original variables. The new variables, called principal components (PCs), are uncorrelated, and are ordered by the fraction of the total information each retains.
overview

• geometric picture of PCs
• algebraic definition and derivation of PCs
• usage of PCA
• astronomical application
Geometric picture of principal components (PCs)

A sample of n observations in the 2-D space x = (x_1, x_2).

Goal: to account for the variation in a sample in as few variables as possible, to some accuracy.
Geometric picture of principal components (PCs)

• the 1st PC z_1 is a minimum-distance fit to a line in x space
• the 2nd PC z_2 is a minimum-distance fit to a line in the plane perpendicular to the 1st PC

PCs are a series of linear least-squares fits to a sample, each orthogonal to all the previous ones.
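This geometric picture can be checked numerically. Below is a minimal NumPy sketch on synthetic 2-D data (all variable names hypothetical): the direction of the 1st PC is the leading eigenvector of the sample covariance matrix, and the 2nd PC is orthogonal to it.

```python
import numpy as np

# Synthetic correlated 2-D sample (hypothetical data for illustration)
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])
x -= x.mean(axis=0)                    # measure about the sample mean

S = np.cov(x, rowvar=False)            # 2x2 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
a1 = eigvecs[:, -1]                    # 1st PC direction (largest eigenvalue)
a2 = eigvecs[:, -2]                    # 2nd PC direction, orthogonal to a1

# The variance along a1 is at least the variance along any other
# direction, e.g. along either coordinate axis:
var_a1 = (x @ a1).var(ddof=1)
```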
Algebraic definition of PCs

Given a sample of n observations on a vector of p variables x = (x_1, x_2, ..., x_p), define the first principal component of the sample by the linear transformation

    z_1 = a_1^T x = sum_{i=1}^p a_{i1} x_i

where the vector a_1 = (a_11, a_21, ..., a_p1) is chosen such that var[z_1] is maximum.
Algebraic definition of PCs

Likewise, define the kth PC of the sample by the linear transformation

    z_k = a_k^T x

where the vector a_k is chosen such that var[z_k] is maximum, subject to

    cov[z_k, z_l] = 0 for k > l >= 1

and to

    a_k^T a_k = 1
Algebraic derivation of coefficient vectors

To find a_1, first note that

    var[z_1] = a_1^T S a_1

where S is the covariance matrix for the variables x.
Algebraic derivation of coefficient vectors

To find a_1, maximize var[z_1] = a_1^T S a_1 subject to a_1^T a_1 = 1.

Let λ be a Lagrange multiplier; then maximize

    a_1^T S a_1 - λ (a_1^T a_1 - 1)

by differentiating with respect to a_1:

    S a_1 - λ a_1 = 0,   i.e.   (S - λ I_p) a_1 = 0

therefore a_1 is an eigenvector of S corresponding to eigenvalue λ.
Algebraic derivation of λ_1

We have maximized

    var[z_1] = a_1^T S a_1 = λ a_1^T a_1 = λ

So λ_1 is the largest eigenvalue of S.

The first PC z_1 = a_1^T x retains the greatest amount of variation in the sample.
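This result is easy to verify numerically; a sketch with synthetic data (NumPy, names hypothetical) shows that the sample variance of z_1 = a_1^T x equals the largest eigenvalue of S:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(1000, 3)) @ rng.normal(size=(3, 3))  # correlated data
x -= x.mean(axis=0)

S = np.cov(x, rowvar=False)               # p x p covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)
a1 = eigvecs[:, np.argmax(eigvals)]       # eigenvector of largest eigenvalue

z1 = x @ a1                               # scores of the 1st PC
# var[z1] = a1^T S a1 = lambda_1, the largest eigenvalue of S
```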
Algebraic derivation of coefficient vectors

To find the next coefficient vector a_2, maximize var[z_2] = a_2^T S a_2 subject to cov[z_2, z_1] = 0 and to a_2^T a_2 = 1.

First note that

    cov[z_2, z_1] = a_1^T S a_2 = λ_1 a_1^T a_2

then let λ and φ be Lagrange multipliers, and maximize

    a_2^T S a_2 - λ (a_2^T a_2 - 1) - φ a_2^T a_1
Algebraic derivation of coefficient vectors

We find that a_2 is also an eigenvector of S, whose eigenvalue λ = λ_2 is the second largest.

In general:
• the kth largest eigenvalue λ_k of S is the variance of the kth PC
• the kth PC z_k retains the kth greatest fraction of the variation in the sample
Algebraic formulation of PCA

Given a sample of n observations on a vector of p variables x, define a vector of p PCs

    z = A^T x

where A is an orthogonal p x p matrix whose kth column a_k is the kth eigenvector of S.

Then

    Λ = A^T S A

is the covariance matrix of the PCs, being diagonal with elements λ_1, ..., λ_p.
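A sketch of the full formulation on synthetic data (names hypothetical): the PC scores Z = X A have a diagonal covariance matrix whose entries are the eigenvalues of S.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # n x p sample
X -= X.mean(axis=0)

S = np.cov(X, rowvar=False)
eigvals, A = np.linalg.eigh(S)          # columns of A: eigenvectors of S
order = np.argsort(eigvals)[::-1]       # sort by descending eigenvalue
eigvals, A = eigvals[order], A[:, order]

Z = X @ A                               # row i holds z_i = A^T x_i
Lambda = np.cov(Z, rowvar=False)        # covariance matrix of the PCs
```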
usage of PCA: Probability distribution for sample PCs

If (i) the n observations of x in the sample are independent, and (ii) x is drawn from an underlying population that follows a p-variate normal (Gaussian) distribution with known covariance matrix Σ,

then (n-1) S follows a Wishart distribution W_p(Σ, n-1);

else utilize a bootstrap approximation.
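When the Gaussian/Wishart assumptions are doubtful, a bootstrap can approximate the sampling distribution of the sample eigenvalues. A minimal sketch on hypothetical data (the choice of 200 resamples and a percentile interval are illustrative, not prescriptive):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(150, 3))            # hypothetical sample, n=150, p=3
X -= X.mean(axis=0)

boot = []
for _ in range(200):                     # 200 bootstrap resamples
    idx = rng.integers(0, len(X), len(X))    # resample rows with replacement
    Xb = X[idx] - X[idx].mean(axis=0)
    boot.append(np.linalg.eigvalsh(np.cov(Xb, rowvar=False)).max())
boot = np.array(boot)

lo, hi = np.percentile(boot, [2.5, 97.5])    # 95% percentile interval
```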
usage of PCA: Probability distribution for sample PCs

If (i) S follows a Wishart distribution, and (ii) the population eigenvalues λ̃_k are all distinct,

then the following results hold as n → ∞:

• all the λ_k are independent of all the a_k
• the λ_k and the a_k are jointly normally distributed
• E[λ_k] = λ̃_k and E[a_k] = ã_k
• var[λ_k] = 2 λ̃_k^2 / n

(a tilde denotes a population quantity)
usage of PCA: Inference about population PCs

If x follows a p-variate normal distribution, then analytic expressions exist* for:

• MLE's of λ̃_k, ã_k, and Σ̃
• confidence intervals for λ̃_k and ã_k
• hypothesis testing for λ̃_k and ã_k

else bootstrap and jackknife approximations exist.

*see references, esp. Jolliffe
usage of PCA: Practical computation of PCs

In general it is useful to define standardized variables by

    x_k' = x_k / sqrt(var[x_k])

If the x_k' are each measured about their sample mean, then the covariance matrix of x' = (x_1', ..., x_p') will be equal to the correlation matrix of x, and the PCs will be dimensionless.
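A sketch of this standardization on synthetic data with very different scales (names hypothetical): after dividing each mean-subtracted variable by its sample standard deviation, the covariance matrix of the standardized variables equals the correlation matrix of the originals.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 3)) * np.array([1.0, 10.0, 100.0])  # mixed scales
X -= X.mean(axis=0)                      # measure about the sample mean

Xs = X / X.std(axis=0, ddof=1)           # standardized variables
C = np.cov(Xs, rowvar=False)             # covariance of standardized vars
R = np.corrcoef(X, rowvar=False)         # correlation matrix of originals
```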
usage of PCA: Practical computation of PCs

Given a sample of n observations on a vector x of p variables (each measured about its sample mean), compute the covariance matrix

    S = X^T X / (n - 1)

where X is the n x p matrix whose ith row is the ith observation x_i^T.

Then compute the n x p matrix

    Z = X A

whose ith row is the PC score z_i^T for the ith observation.
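The recipe above, sketched in NumPy on synthetic data (variable names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 100, 5
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)                      # each column about its sample mean

S = X.T @ X / (n - 1)                    # covariance matrix S = X^T X/(n-1)

eigvals, A = np.linalg.eigh(S)           # eigenvectors of S as columns of A
order = np.argsort(eigvals)[::-1]        # descending eigenvalue order
eigvals, A = eigvals[order], A[:, order]

Z = X @ A                                # n x p matrix of PC scores
```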
usage of PCA: Practical computation of PCs

Write

    x_i = sum_{k=1}^p z_{ik} a_k

to decompose each observation into PCs.
usage of PCA: Data compression

Because the kth PC retains the kth greatest fraction of the variation, we can approximate each observation by truncating the sum at the first m < p PCs:

    x_i ≈ sum_{k=1}^m z_{ik} a_k
usage of PCA: Data compression

Reduce the dimensionality of the data from p to m < p by approximating

    X ≈ Z_m A_m^T

where Z_m is the n x m portion of Z, and A_m is the p x m portion of A.
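A sketch of this truncated reconstruction on synthetic, nearly rank-2 data (names hypothetical): keeping only the first m columns of Z and A recovers X up to the small variance carried by the discarded PCs.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, m = 200, 6, 2
X = rng.normal(size=(n, 2)) @ rng.normal(size=(2, p))  # intrinsically 2-D
X += 0.01 * rng.normal(size=(n, p))                    # plus small noise
X -= X.mean(axis=0)

S = np.cov(X, rowvar=False)
eigvals, A = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
A = A[:, order]

Z = X @ A
X_m = Z[:, :m] @ A[:, :m].T              # approximate X with first m PCs
rel_err = np.linalg.norm(X - X_m) / np.linalg.norm(X)
```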
astronomical application: PCs for elliptical galaxies

Rotating to the PC in B_T – Σ space improves the Faber-Jackson relation as a distance indicator (Dressler et al. 1987).
astronomical application: Eigenspectra (KL transform)

Connolly et al. 1995
references

Connolly, and Szalay, et al., "Spectral Classification of Galaxies: An Orthogonal Approach", AJ, 110, 1071-1082, 1995.
Dressler, et al., "Spectroscopy and Photometry of Elliptical Galaxies. I. A New Distance Estimator", ApJ, 313, 42-58, 1987.
Efstathiou, G., and Fall, S.M., "Multivariate analysis of elliptical galaxies", MNRAS, 206, 453-464, 1984.
Johnston, D.E., et al., "SDSS J0903+5028: A New Gravitational Lens", AJ, 126, 2281-2290, 2003.
Jolliffe, Ian T., 2002, Principal Component Analysis (Springer-Verlag New York, Secaucus, NJ).
Lupton, R., 1993, Statistics in Theory and Practice (Princeton University Press, Princeton, NJ).
Murtagh, F., and Heck, A., Multivariate Data Analysis (D. Reidel Publishing Company, Dordrecht, Holland).
Yip, C.W., and Szalay, A.S., et al., "Distributions of Galaxy Spectral Types in the SDSS", AJ, 128, 585-609, 2004.