01.03.2013 Views

Applied Statistics Using SPSS, STATISTICA, MATLAB and R


8 Data Structure Analysis

In order to analyse this issue in further detail, let us consider the simple dataset shown in Figure 8.9a, consisting of normally distributed bivariate data generated with (true) mean $\mu_o = [3\ 3]'$ and the following (true) covariance matrix:

$$\Sigma_o = \begin{bmatrix} 5 & 3 \\ 3 & 2 \end{bmatrix}.$$

Figure 8.9b shows this dataset after standardisation (subtraction of the mean and division by the standard deviation) with the new covariance matrix:

$$\Sigma = \begin{bmatrix} 1 & 0.9487 \\ 0.9487 & 1 \end{bmatrix}.$$
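The standardisation step can be checked numerically. The following is a minimal sketch in plain Python, not the book's own code; the helper name `standardise_cov` is hypothetical. It divides each entry of the true covariance matrix by the product of the corresponding standard deviations.

```python
import math

# True covariance matrix of the generated data, as given in the text.
sigma_o = [[5.0, 3.0],
           [3.0, 2.0]]

def standardise_cov(cov):
    """Divide entry (i, j) by sd_i * sd_j, giving the covariance of the
    standardised variables (hypothetical helper, for illustration only)."""
    n = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)]
            for i in range(n)]

sigma = standardise_cov(sigma_o)
for row in sigma:
    print([round(v, 4) for v in row])
```

The diagonal entries become 1 (unit variance) and the off-diagonal entry is 3/(√5·√2), which rounds to 0.9487.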

The standardised data has unit variance along all variables with the new covariance: $\sigma_{12} = \sigma_{21} = 3/(\sqrt{5}\sqrt{2}) = 0.9487$. The eigenvalues and eigenvectors of $\Sigma$ (computed with MATLAB function eig) are:

$$\Lambda = \begin{bmatrix} 1.9487 & 0 \\ 0 & 0.0513 \end{bmatrix}; \quad U = \begin{bmatrix} -1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}.$$
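The text obtains these values with MATLAB's eig. As a rough cross-check, the sketch below uses the closed-form solution for a symmetric 2×2 matrix in plain Python. The function name `eig_sym_2x2` is hypothetical, and the approach is a stand-in for eig, not a reproduction of it. Eigenvectors are defined only up to sign, and the order in which a routine returns them is a convention, so the signs and column order may differ from the printed U.

```python
import math

def eig_sym_2x2(a, b, d):
    """Eigenvalues (descending) and unit eigenvectors of [[a, b], [b, d]].
    Assumes b != 0, which holds for the correlation matrix in the text."""
    mean = (a + d) / 2.0
    half_gap = math.hypot((a - d) / 2.0, b)
    values = (mean + half_gap, mean - half_gap)
    vectors = []
    for lam in values:
        # (b, lam - a) satisfies (A - lam*I) v = 0 whenever b != 0.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        vectors.append((vx / norm, vy / norm))
    return values, vectors

r = 3 / math.sqrt(5 * 2)          # off-diagonal entry of the standardised matrix
values, vectors = eig_sym_2x2(1.0, r, 1.0)
print([round(v, 4) for v in values])
print([[round(c, 4) for c in v] for v in vectors])
```

The eigenvalues come out as 1 + r and 1 − r, and every eigenvector component has magnitude 1/√2.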

Note that $\mathrm{tr}(\Lambda) = 2$, the total variance, and that the first principal component explains 97% of the total variance.
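Written out with the eigenvalues quoted above, the arithmetic behind these two statements is:

$$\mathrm{tr}(\Lambda) = 1.9487 + 0.0513 = 2, \qquad \frac{\lambda_1}{\mathrm{tr}(\Lambda)} = \frac{1.9487}{2} \approx 0.974 .$$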

Figure 8.9c shows the standardised data projected onto the new system of variables $F_1$ and $F_2$.

Let us now consider a group of data with mean $m_o = [4\ 4]'$ and a one-standard-deviation boundary corresponding to the ellipse shown in Figure 8.9a, with $s_x = \sqrt{5}/2$ and $s_y = \sqrt{2}/2$, respectively. The mean vector maps onto $m = m_o - \mu_o = [1\ 1]'$; given the values of the standard deviation, the ellipse maps onto a circle of radius 0.5 (Figure 8.9b). This same group of data is shown in the $F_1$-$F_2$ plane (Figure 8.9c) with mean:

$$m_p = U' m = \begin{bmatrix} -1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ \sqrt{2} \end{bmatrix}.$$
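The mapping of this group can be retraced numerically. The sketch below is plain Python written for illustration, with a hypothetical helper `project`. It takes U exactly as printed in the text, applies U' to the mean m, and rescales the semi-axes of the ellipse by the standard deviations √5 and √2 of the original variables.

```python
import math

# Eigenvector matrix U as printed in the text (columns are eigenvectors).
s = 1 / math.sqrt(2)
U = [[-s, s],
     [ s, s]]

def project(U, m):
    """Return U' m: output coordinate k is column k of U dotted with m."""
    n = len(m)
    return [sum(U[i][k] * m[i] for i in range(n)) for k in range(n)]

m = [1.0, 1.0]                    # group mean after subtracting mu_o = [3 3]'
m_p = project(U, m)

# Semi-axes of the one-standard-deviation ellipse after dividing by the
# standard deviations of the original variables.
radius_x = (math.sqrt(5) / 2) / math.sqrt(5)
radius_y = (math.sqrt(2) / 2) / math.sqrt(2)

print([round(c, 4) for c in m_p])
print(round(radius_x, 4), round(radius_y, 4))
```

Both rescaled semi-axes equal 0.5, which is why the ellipse becomes a circle, and the projected mean has components 0 and √2.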

Figure 8.9d shows the correlations of the principal components with the original variables, computed with formula 8.9:

$$r_{F_1 X} = r_{F_1 Y} = 0.987; \quad r_{F_2 X} = -r_{F_2 Y} = 0.16 .$$
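Formula 8.9 is not reproduced on this page. The sketch below assumes it has the usual loading form, in which the correlation of a component with a variable is the eigenvector component times the square root of the eigenvalue, divided by the variable's standard deviation (1 for standardised data). Under that assumption, the magnitudes quoted above can be recomputed in plain Python:

```python
import math

r = 3 / math.sqrt(5 * 2)          # correlation between the two variables
lam1, lam2 = 1 + r, 1 - r         # eigenvalues of [[1, r], [r, 1]]
u = 1 / math.sqrt(2)              # magnitude of every eigenvector component

# Assumed loading formula: |r_FiX| = |u| * sqrt(lambda_i) / s, with s = 1.
r_f1 = u * math.sqrt(lam1)
r_f2 = u * math.sqrt(lam2)

print(round(r_f1, 3), round(r_f2, 2))   # prints "0.987 0.16"
print(round(r_f1**2 + r_f2**2, 6))      # squared correlations sum to 1
```

The signs depend on the sign chosen for each eigenvector, so only the magnitudes are computed here. Because both components are retained in this two-variable case, the point formed by the two correlations of each variable falls on the unit circle itself.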

These correlations always lie inside a unit-radius circle. Equal magnitude correlations occur when the original variables are uncorrelated, with $\lambda_1 = \lambda_2 = 1$. The correlations are then $r_{F_1 X} = r_{F_1 Y} = 1/\sqrt{2}$ (apply formula 8.9).
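As a worked check of this last value, assume that formula 8.9 (not reproduced on this page) has the usual loading form $r = u\sqrt{\lambda}/s$, with $s = 1$ for standardised data and eigenvector components of magnitude $1/\sqrt{2}$ as above. Then, for $\lambda_1 = \lambda_2 = 1$:

$$|r_{F_i X}| = \frac{1}{\sqrt{2}}\sqrt{\lambda_i} = \frac{1}{\sqrt{2}}\sqrt{1} = \frac{1}{\sqrt{2}} \approx 0.707, \qquad i = 1, 2 .$$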
