Applied Statistics Using SPSS, STATISTICA, MATLAB and R

8.2 Dimensional Reduction

When using principal component analysis for dimensional reduction, one must decide how many components (and corresponding variances) to retain. Several criteria have been published in the literature; the following are commonly used:

1. Select the principal components that explain a certain percentage (say, 95%) of tr(Λ). This is a very simplistic criterion and is not recommended.

2. The Guttman-Kaiser criterion discards eigenvalues below the average tr(Λ)/d (below 1 for standardised data), which amounts to retaining only those components that account for more variance than a single variable would if the total variance were equally distributed.

3. The so-called scree test uses a plot of the eigenvalues (the scree plot), discarding those from the point where the plot levels off.

4. A more elaborate criterion is based on the so-called broken stick model. This criterion discards the eigenvalues whose proportion of explained variance is smaller than the expected length lk of the kth longest segment of a unit-length stick randomly broken into d segments:

$$l_k = \frac{1}{d}\sum_{i=k}^{d}\frac{1}{i}\,. \qquad 8.12$$

A table of lk values is given in Tools.xls.

5. Bartlett's test method assesses whether the null hypothesis that the last p − q eigenvalues are equal, λq+1 = λq+2 = … = λp, can be accepted. The mathematics of this test is intricate (see Jolliffe IT, 2002, for a detailed discussion) and its results are often unreliable. We pay no further attention to this procedure.

6. The Velicer partial correlation procedure uses the partial correlations among the original variables when one or more principal components are removed. Let Sk represent the remaining covariance matrix when the covariance of the first k principal components is removed:

$$\mathbf{S}_k = \mathbf{S} - \sum_{i=1}^{k}\lambda_i \mathbf{u}_i \mathbf{u}_i'\,; \quad k = 0, 1, \ldots, d\,. \qquad 8.13$$

Using the diagonal matrix Dk of Sk, containing the variances, we compute the correlation matrix:

$$\mathbf{R}_k = \mathbf{D}_k^{-1/2}\,\mathbf{S}_k\,\mathbf{D}_k^{-1/2}\,. \qquad 8.14$$

Finally, with the elements rij(k) of Rk we compute the following quantity:
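The first, second and fourth criteria above can be tried out numerically. The following is a minimal sketch (not from the book) using NumPy; the data set, variable names and retention rule details are illustrative assumptions only:

```python
# Illustrative sketch of three component-retention criteria; the data are
# synthetic and all names are my own, not the book's.
import numpy as np

rng = np.random.default_rng(1)                   # hypothetical data set
X = rng.normal(size=(200, 6))
X[:, 5] = X[:, 0] + 0.1 * rng.normal(size=200)   # make two variables correlate

# Eigenvalues of the correlation matrix, i.e. PCA on standardised data,
# sorted in descending order.
eigvals = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
d = eigvals.size
prop = eigvals / eigvals.sum()                   # proportions of explained variance

# Criterion 1: smallest k whose cumulative proportion reaches 95% of tr(Lambda).
k_95 = int(np.searchsorted(np.cumsum(prop), 0.95)) + 1

# Criterion 2 (Guttman-Kaiser): keep eigenvalues above the average tr(Lambda)/d,
# which equals 1 for standardised data.
k_kaiser = int(np.sum(eigvals > eigvals.mean()))

# Criterion 4 (broken stick): expected length of the kth longest segment of a
# unit stick randomly broken into d parts, l_k = (1/d) * sum_{i=k}^{d} 1/i
# (equation 8.12); retain components while their proportion exceeds l_k.
l = np.array([(1.0 / np.arange(k, d + 1)).sum() / d for k in range(1, d + 1)])
k_stick = int(np.argmax(prop < l)) if (prop < l).any() else d

print(k_95, k_kaiser, k_stick)
```

Note that the broken-stick lengths l_k sum to 1, so they form a valid reference distribution for the explained-variance proportions; the three criteria will in general suggest different numbers of components, with the broken stick usually the most conservative.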
