
Lecture Notes on Compositional Data Analysis - Sedimentology ...


Chapter 5. Exploratory data analysis

3. a box-plot, summarising the order statistics of each ilr coordinate.

Each coordinate is represented on a horizontal axis whose limits correspond to a certain range (the same for every coordinate). The vertical bar going up from each of these coordinate axes represents the variance of that specific coordinate, and the contact point is the coordinate mean. Figure 5.2 shows these elements in an illustrative example.
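For reference, the coordinates drawn in a balance-dendrogram are balances. Assuming the usual definition for a split of the parts into a group $I_+$ of $r$ parts and a group $I_-$ of $s$ parts, the balance of a composition $\mathbf{x}$ and its sample mean over $n$ observations are

$$b = \sqrt{\frac{rs}{r+s}}\,\ln\frac{\left(\prod_{i\in I_+} x_i\right)^{1/r}}{\left(\prod_{j\in I_-} x_j\right)^{1/s}}, \qquad \bar{b} = \frac{1}{n}\sum_{k=1}^{n} b(\mathbf{x}_k) = \sqrt{\frac{rs}{r+s}}\,\ln\frac{\left(\prod_{i\in I_+} g_i\right)^{1/r}}{\left(\prod_{j\in I_-} g_j\right)^{1/s}},$$

where $\mathbf{g}$ is the centre of the sample (the closed vector of part-wise geometric means). The second equality holds because $b$ is linear in $\ln x_1, \dots, \ln x_D$ with coefficients summing to zero, so averaging commutes with it and closure does not change it. The contact point of each vertical bar is thus the balance of the centre.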

Given that the range of each coordinate is symmetric (in Figure 5.2 it goes from −3 to +3), a box plot lying closer to one part (or group of parts) indicates that this part (or group) is more abundant. Thus, in Figure 5.2, SiO2 is slightly more abundant than Al2O3, there is more FeO than Fe2O3, and the structural oxides (SiO2 and Al2O3) are much more abundant than the rest. Another feature easily read from a balance-dendrogram is symmetry: it can be assessed both by comparing the several quantile boxes and by looking at the difference between the median (marked as “Q2” in Figure 5.2, right) and the mean.
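The reading rules above can be sketched in Python. This is a minimal sketch, not the notes' own code: it assumes the usual ilr balance between a group of parts `plus` and a group `minus`, uses the 1/n (population) variance, and the helper names `balance` and `summarise` as well as the toy compositions are hypothetical. A positive mean says the `plus` group is more abundant (in geometric-mean terms), and a mean that departs from the median hints at asymmetry.

```python
import math
import statistics


def balance(x, plus, minus):
    """ilr balance between the parts indexed by `plus` and by `minus` (hypothetical helper)."""
    r, s = len(plus), len(minus)
    log_gm_plus = sum(math.log(x[i]) for i in plus) / r    # log of geometric mean, + group
    log_gm_minus = sum(math.log(x[j]) for j in minus) / s  # log of geometric mean, - group
    return math.sqrt(r * s / (r + s)) * (log_gm_plus - log_gm_minus)


def summarise(values):
    """Quantities a balance-dendrogram draws for one coordinate, plus two reading aids."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # quartiles of the balance
    mean = statistics.fmean(values)
    return {
        "mean": mean,                              # contact point of the vertical bar
        "variance": statistics.pvariance(values),  # length of the vertical bar
        "quartiles": (q1, q2, q3),                 # the box plot
        "more_abundant": "plus" if mean > 0 else "minus" if mean < 0 else "neither",
        "mean_minus_median": mean - statistics.median(values),  # asymmetry signal
    }


# Toy 3-part compositions (made up, not Table 5.1): balance of part 0 against part 1.
samples = [(60, 30, 10), (55, 35, 10), (70, 20, 10), (50, 40, 10)]
b = [balance(x, plus=[0], minus=[1]) for x in samples]
print(summarise(b))
```

Because a balance only involves log-ratios, rescaling a composition (for instance from percentages to proportions) leaves it unchanged.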

In fact, a balance-dendrogram contains information on the marginal distribution of each coordinate. It can potentially contain any other representation of these marginals, not only box-plots: one could use the horizontal axes to represent, e.g., histograms or kernel density estimates, or even the sample itself. On the other hand, a balance-dendrogram does not contain any information on the relationship between coordinates: this can be approximately inferred from the biplot or simply by computing the correlation matrix of the coordinates.
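As a sketch of the last option, the correlation matrix of the coordinates can be computed with plain Python. It assumes the ilr coordinates are balances defined by a sequential binary partition written as rows of +1/−1/0 sign codes (a common convention, assumed here); the function names and the toy data are hypothetical. The correlations depend on the partition chosen.

```python
import math


def ilr_balances(x, sbp):
    """Balances of composition x for a sign-coded partition: +1 / -1 / 0 per part."""
    coords = []
    for row in sbp:
        plus = [math.log(v) for v, c in zip(x, row) if c == 1]
        minus = [math.log(v) for v, c in zip(x, row) if c == -1]
        r, s = len(plus), len(minus)
        coords.append(math.sqrt(r * s / (r + s)) * (sum(plus) / r - sum(minus) / s))
    return coords


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def correlation_matrix(data, sbp):
    """Correlation matrix of the ilr coordinates (one row/column per balance)."""
    columns = list(zip(*(ilr_balances(x, sbp) for x in data)))
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


# Toy 4-part data (made up) and the partition {0,1 | 2,3}, {0 | 1}, {2 | 3}.
sbp = [[1, 1, -1, -1], [1, -1, 0, 0], [0, 0, 1, -1]]
data = [(40, 30, 20, 10), (35, 35, 20, 10), (50, 20, 15, 15), (45, 25, 25, 5), (30, 30, 30, 10)]
for row in correlation_matrix(data, sbp):
    print(["%.2f" % r for r in row])
```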

5.6 Illustration

We are going to use, both for illustration and for the exercises, the data set X given in Table 5.1. It consists of 17 chemical analyses of rocks from the Kilauea Iki lava lake, Hawaii, published by Richter and Moore (1966) and cited by Rollinson (1995).

Originally, 14 parts had been recorded, but H2O+ and H2O− have been omitted because of the large number of zeros. CO2 has been kept in the table to draw attention to parts with some zeros, but it has been omitted from the study precisely because of those zeros. This is the strategy to follow if the part is not essential in the characterisation of the phenomenon under study. If the part is essential and the proportion of zeros is high, then we are dealing with two populations, one characterised by zeros in that component and the other by non-zero values. If the part is essential and the proportion of zeros is small, then we can look for imputation techniques, as explained at the beginning of this chapter.
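The first step of this strategy, finding which parts contain zeros and dropping the non-essential ones, can be sketched as follows. Deciding whether a part is essential is left to the analyst: the code makes no such judgement and uses no threshold. The function names and the toy values are hypothetical, and re-closing the remaining subcomposition to 100 is an assumption made only for readability (log-ratio analyses are unaffected by it).

```python
def zero_proportions(data, names):
    """Fraction of samples in which each named part is exactly zero."""
    n = len(data)
    return {name: sum(1 for row in data if row[i] == 0) / n for i, name in enumerate(names)}


def drop_parts(data, names, to_drop, k=100.0):
    """Remove the parts listed in `to_drop` and re-close each sample to k."""
    keep = [i for i, name in enumerate(names) if name not in to_drop]
    kept_names = [names[i] for i in keep]
    closed = []
    for row in data:
        sub = [row[i] for i in keep]
        total = sum(sub)
        closed.append([k * v / total for v in sub])
    return kept_names, closed


# Toy values (made up): CO2 has zeros, as in the situation described in the text.
names = ["SiO2", "CO2", "MgO"]
data = [(50.0, 0.0, 10.0), (48.0, 0.1, 12.0), (49.0, 0.0, 11.0)]
print(zero_proportions(data, names))
kept_names, sub = drop_parts(data, names, to_drop={"CO2"})
print(kept_names, sub[0])
```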

The centre of this data set is

$$\mathbf{g} = (48.57,\ 2.35,\ 11.23,\ 1.84,\ 9.91,\ 0.18,\ 13.74,\ 9.65,\ 1.82,\ 0.48,\ 0.22),$$

the total variance is $\operatorname{totvar}[\mathbf{X}] = 0.3275$, and the normalised variation matrix $\mathbf{T}^*$ is given in Table 5.2.
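The three summaries quoted above can be computed as in the following sketch. It assumes the usual definitions: the centre is the vector of part-wise geometric means closed to 100 (the scale of the g quoted above), t_ij = var(ln(x_i/x_j)), t*_ij = var(ln(x_i/x_j)/√2) = t_ij/2, and totvar[X] = (1/(2D)) Σ_ij t_ij. The 1/n variance and all function names are assumptions; this section does not say whether 1/n or 1/(n−1) is used, so results on Table 5.1 may differ from the quoted totvar by the corresponding factor.

```python
import math
import statistics


def centre(data, k=100.0):
    """Sample centre g: geometric mean of each part, closed to k."""
    D = len(data[0])
    gm = [math.exp(statistics.fmean(math.log(row[i]) for row in data)) for i in range(D)]
    total = sum(gm)
    return [k * v / total for v in gm]


def variation_matrix(data):
    """T with t_ij = var(ln(x_i / x_j)); uses the 1/n (population) variance."""
    D = len(data[0])
    return [[statistics.pvariance([math.log(row[i] / row[j]) for row in data])
             for j in range(D)] for i in range(D)]


def normalised_variation_matrix(data):
    """T* with t*_ij = var(ln(x_i / x_j) / sqrt(2)) = t_ij / 2."""
    return [[t / 2 for t in row] for row in variation_matrix(data)]


def total_variance(data):
    """totvar[X] = (1 / (2D)) * sum over all i, j of t_ij."""
    T = variation_matrix(data)
    D = len(T)
    return sum(sum(row) for row in T) / (2 * D)


# Toy 4-part data (made up, not Table 5.1).
data = [(40, 30, 20, 10), (35, 35, 20, 10), (50, 20, 15, 15), (45, 25, 25, 5)]
print([round(v, 2) for v in centre(data)])
print(round(total_variance(data), 4))
```

As a check on the definitions, totvar[X] also equals the sum of the variances of the clr coefficients, and equivalently (1/D) times the sum of all entries of T*.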
