02.07.2014 Views

Lecture Notes on Compositional Data Analysis - Sedimentology ...


Chapter 5. Exploratory data analysis

Figure 5.1: Simulated data set before (left) and after (right) centring.

with the same relative contribution of each log-ratio in the variation array. This is a significant difference from conventional standardisation: with real vectors, the relative contribution of each variable is an artifact of its units, and most usually should be ignored; in contrast, in compositional vectors, all parts share the same “units”, and their relative contribution to total variation is rich information.

5.4 The biplot: a graphical display<br />

Gabriel (1971) introduced the biplot to represent simultaneously the rows and columns of any matrix by means of a rank-2 approximation. Aitchison (1997) adapted it for compositional data and proved it to be a useful exploratory and expository tool. Here we briefly describe first the philosophy and mathematics of this technique, and then its interpretation in depth.

5.4.1 Construction of a biplot

Consider the data matrix X with n rows and D columns, so that D measurements have been obtained from each one of n samples. Centre the data set as described in Section 5.3, and find the coefficients Z in clr coordinates (Eq. 4.3). Note that Z is of the same order as X, i.e. it has n rows and D columns, and recall that clr coordinates preserve distances. Thus, we can apply standard results to Z, and in particular the fact that the best rank-2 approximation Y to Z in the least squares sense is provided by the singular value decomposition of Z (Krzanowski, 1988, pp. 126-128).
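As an illustration only, the steps above can be sketched in Python with NumPy. This is not code from the lecture notes: the simulated data set, the function names `clr` and `rank2_approximation`, and the choice to centre by subtracting column means in clr coordinates (equivalent to perturbing each sample by the inverse of the geometric centre) are assumptions made for the example.

```python
import numpy as np


def clr(X):
    """Centred log-ratio transform, applied row-wise to strictly positive parts."""
    logX = np.log(X)
    return logX - logX.mean(axis=1, keepdims=True)


def rank2_approximation(Z):
    """Best rank-2 least-squares approximation of Z, built from its SVD."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Keep only the two largest singular values and their singular vectors.
    Y = (U[:, :2] * s[:2]) @ Vt[:2, :]
    return Y, s


# Hypothetical data set: n = 20 samples, D = 4 parts, each row closed to sum 1.
rng = np.random.default_rng(0)
raw = rng.lognormal(mean=0.0, sigma=0.5, size=(20, 4))
X = raw / raw.sum(axis=1, keepdims=True)

# clr coordinates, then centring (assumed here to be done in clr space).
Z = clr(X)
Z = Z - Z.mean(axis=0, keepdims=True)

Y, s = rank2_approximation(Z)

# Frobenius norm of what the rank-2 approximation leaves out.
residual = np.linalg.norm(Z - Y)
```

Here `Z` has the same order as `X`, and `residual` equals the square root of the sum of the squared singular values that were discarded, which is the least-squares property the text relies on.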

The singular value decomposition of a matrix of coefficients is obtained from the matrix of eigenvectors $L$ of $ZZ'$, the matrix of eigenvectors $M$ of $Z'Z$ and the square roots of the $s$ positive eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_s$ of either $ZZ'$ or $Z'Z$, which are the
