12.01.2015 Views

Additive vs ultrametric: triangle inequality Additive vs ultrametric ...

Additive vs ultrametric: triangle inequality Additive vs ultrametric ...

Additive vs ultrametric: triangle inequality Additive vs ultrametric ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>Additive</strong> <strong>vs</strong> <strong>ultrametric</strong>: <strong>triangle</strong> <strong>inequality</strong><br />

• Pairwise distance: any numeric variable that:<br />

– Expresses the difference between two objects a,b.<br />

– Has at least the properties of being metric:<br />

(1) d(a,a)=0, d(b,b)=0, and d(a,b)≥0<br />

– Identical objects are indistinguishable.<br />

– Nonidentical objects might or might not be<br />

distinguishable.<br />

(2) If a≠b, then d(a,b)>0<br />

– If the objects differ in properties being measured, then<br />

their distance must be greater than zero.<br />

(3) d(a,b)=d(b,a)<br />

– Symmetry.<br />

<strong>Additive</strong> <strong>vs</strong> <strong>ultrametric</strong>: <strong>triangle</strong> <strong>inequality</strong><br />

(4) d(a,c) ≤ d(a,b) + d(b,c) a<br />

– Triangle <strong>inequality</strong>.<br />

b<br />

• Any set of distances having the 4 metric properties will<br />

produce an additive tree.<br />

• Properties can be superimposed on distances, even if<br />

they don’t possess the properties.<br />

– ‘Forces’ the distances into an additive tree.<br />

a<br />

c<br />

–4 th property can be relaxed:<br />

(4) d(a,c) ≤ max{d(a,b), d(b,c)} d(bc)}<br />

b<br />

• Any set of distances having these properties will produce<br />

an <strong>ultrametric</strong> tree.<br />

• Any distances can be ‘forced’ into an <strong>ultrametric</strong> tree.<br />

• Ensures that clustering monotonically increases with<br />

distance.<br />

c<br />

1


Some commonly used distances<br />

• Euclidean distance:<br />

• Manhattan distance:<br />

• Mahalanobis distance:<br />

• Hamming distance between two strings of equal<br />

length: number of positions in which symbols are<br />

different (→ percent dissimilarity).<br />

Ultrametric and additive trees for same data<br />

UPGMA<br />

7<br />

Neighbor-joining<br />

g<br />

6<br />

4<br />

5<br />

5<br />

7<br />

1<br />

4<br />

3<br />

2<br />

6<br />

1<br />

0.8<br />

0.7<br />

0.6 0.5 0.4 0.3<br />

Distance<br />

0.2<br />

0.1<br />

0<br />

2<br />

3<br />

Monotonically<br />

Increasing clustering<br />

2


General procedure for<br />

hierarchical cluster analysis<br />

(1) Begin with N x N symmetric distance matrix.<br />

(2) Each object (taxon) initially considered to be a<br />

separate cluster.<br />

(3) Find two objects i and j separated by smallest<br />

distance.<br />

(4) Combine objects i and j into a new cluster, k.<br />

(5) Calculate distance between the new cluster and all<br />

other existing clusters.<br />

– Reduces size of the distance matrix.<br />

(6) Go to step 3. Continue until all objects have been<br />

merged into a single cluster.<br />

Cluster analysis<br />

• Methods differ according to (5) how distances are<br />

calculated between new cluster and all other<br />

existing clusters:<br />

– Single linkage:<br />

– Complete linkage:<br />

– UPGMA, WPGMA:<br />

3


Cluster analyses<br />

http://www.biomedcentral.com/1471-2105/9/90/figure/F5<br />

Distances are always estimates<br />

• Pairwise distance values are never exact:<br />

– Complete phylogenetic record of all<br />

genetic/phenotypic events would constitute set of<br />

distances that are completely:<br />

• <strong>Additive</strong>.<br />

• Mutually consistent across all taxa.<br />

– Observed distances are approximations:<br />

• Comprise ‘true’ distances, plus error.<br />

• Don’t display complete additivity and mutual<br />

consistency.<br />

– Additivity (metricity) and <strong>ultrametric</strong>ity are properties<br />

of a distance matrix.<br />

• Can be tested statistically.<br />

– Q: to what extent do methods recover ‘true’<br />

distances, in presence of error.<br />

4


UPGMA<br />

• UPGMA and other <strong>ultrametric</strong> methods assume<br />

that evolutionary rates have been constant:<br />

– Simulations show will work if rates are variable,<br />

but constant t on average (clocklike, =stationary).<br />

ti – But if rates are not clocklike, can give misleading<br />

results.<br />

– Estimation error can mimic effects of<br />

nonstationarity.<br />

<strong>Additive</strong> trees<br />

• Calculated in iterative fashion.<br />

– Several algorithms, most giving similar results.<br />

– Update placement of nodes rather than formation<br />

of clusters.<br />

• Rate uniformity not assumed.<br />

– Corrects original distances for unequal divergence<br />

among branches.<br />

• Least-squares and neighbor-joining trees<br />

guaranteed to recover true tree if distance matrix<br />

is an exact reflection of a tree.<br />

5


Phenetic <strong>vs</strong> patristic distances<br />

• Assessing agreement between tree and original<br />

distance matrix:<br />

– Patristic distance: predicted between two taxa by<br />

tree.<br />

• UPGMA: 2 x distance to nearest common node.<br />

• NJ: sum of horizontal branch lengths between taxa.<br />

UPGMA<br />

2<br />

Neighbor-joining<br />

7<br />

1<br />

6<br />

3<br />

3<br />

5<br />

2<br />

4<br />

1<br />

1<br />

0.8<br />

0.6 0.4<br />

Distance<br />

0.2<br />

7<br />

6<br />

0<br />

4<br />

5<br />

Correlations between<br />

phenetic and patristic distances<br />

• Ex: phenetic distances calculated from clocklike<br />

tree:<br />

– Add noise.<br />

– Calculate cophenetic correlation between phenetic<br />

and patristic distances.<br />

8<br />

r = 0.59<br />

UPGMA, 7 taxa<br />

8<br />

7.5<br />

r = 0.82<br />

Neighbor-joining tree, 7 taxa<br />

7<br />

Patristic distanc ce<br />

7<br />

6<br />

5<br />

4<br />

3<br />

ce<br />

Patristic distanc<br />

6.5<br />

6<br />

5.5<br />

5<br />

4.5<br />

4<br />

3.5<br />

4 4.5 5 5.5 6 6.5 7 7.5 8 8.5 9<br />

Original phenetic distance<br />

3 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8<br />

Original phenetic distance<br />

6


Assessing confidence in trees<br />

• Two basic kinds of questions:<br />

(1) Is the overall tree “significant”, or significantly better<br />

than another tree<br />

• Permutation tests.<br />

• Frequency distributions of tree lengths.<br />

– E.g., g 1 statistic.<br />

• Likelihood and Bayesian scores.<br />

(2) Are particular parts of the tree “significant”<br />

• Bootstrapping.<br />

• Bayesian posterior probabilities.<br />

• Bremer support.<br />

Bootstrap<br />

• Bootstrap: general randomization procedure for estimating<br />

reliability of statistics:<br />

– Used primarily for estimating sampling distributions and<br />

associated confidence intervals.<br />

– Bootstrap p( (or bootstrapped) sample from a sample of N<br />

observations is a resample of those N observations with<br />

replacement.<br />

• Any particular observation might be sampled once, twice, or more<br />

often, or might be missed altogether.<br />

• Number of unique bootstrap samples: N N .<br />

• In practice, use a random set of bootstrap samples, sufficient to<br />

stabilize the values we’re trying to estimate.<br />

Original<br />

sample<br />

Bootstrapped resamples (of 46656 possible)<br />

1 1 2 6 5 4 5 6 3 6 4<br />

2 3 3 3 6 5 6 5 1 5 1<br />

3 4 1 2 3 1 6 4 5 1 5<br />

4 2 5 1 1 4 6 1 4 3 3<br />

5 4 1 4 6 1 5 4 5 6 2<br />

6 4 3 4 2 1 3 5 5 6 2<br />

7


Bootstrap<br />

Bootstrapped sampling distribution<br />

0.2<br />

cy<br />

Relative Frequen<br />

015 0.15<br />

0.1<br />

0.05<br />

0<br />

15 1.5 2 25 2.5 3 35 3.5 4 45 4.5 5 55 5.5<br />

Mean<br />

• Rationale and strong assumption: all observations are<br />

equivalent (i.e., independent replicates) and are<br />

representative off all possible observations that might<br />

have been sampled from the population.<br />

Bootstrapping trees<br />

• Very important topic in the theory and practice of<br />

phylogenetic inference.<br />

• Originally proposed for trees estimated by<br />

“parsimony” with discrete data (Felsenstein 1985).<br />

• Can be extended to all trees and networks:<br />

– Cladograms, dendrograms, phenograms, etc.<br />

8


Bootstrapping trees<br />

• Requires data matrix from which distance matrix<br />

is calculated:<br />

– Sample characters, with replacement, maintaining<br />

number of characters.<br />

• Bootstrapped sample, = pseudoreplicate.<br />

• Acts as a proxy for a true replicate from nature.<br />

– Convert pseudoreplicate data matrix to distance<br />

matrix.<br />

• For <strong>ultrametric</strong> and additive trees.<br />

– Construct tree, keep track of appearance of<br />

clusters (groups, clades, etc.).<br />

• Proportion of bootstrapped trees in which observed<br />

clusters occur.<br />

• Gives bootstrap support frequency (BSF).<br />

Original data matrix<br />

Bootstrapping trees<br />

Bootstrapped data matrix<br />

Characters<br />

Characters<br />

Taxon a b c d<br />

Taxon a a c c<br />

A 1 3 2 1 A 1 1 2 2<br />

B 2 4 5 1 B 2 2 5 5<br />

C 1 2 2 1 C 1 1 2 2<br />

D 2 1 3 3 D 2 2 3 3<br />

E 2 1 2 3 E 2 2 2 2<br />

• Assumes that characters are:<br />

– Independent (uncorrelated).<br />

– Equivalent (equally weighted).<br />

– Representative of an underlying pool of characters that<br />

might have been sampled.<br />

9


Bootstrapping trees<br />

Data matrix<br />

Pseudoreplicate<br />

data matrix<br />

Pseudoreplicate<br />

distance matrix<br />

Distance matrix<br />

Tree<br />

Record groups<br />

observed on tree<br />

Yes<br />

Continue<br />

No<br />

Calculate<br />

BSFs<br />

Pseudoreplicate<br />

tree<br />

Record whether<br />

observed<br />

groups are on<br />

pseudoreplicate<br />

tree<br />

Bootstrapped UPGMA and NJ trees<br />

from same random data<br />

0.52<br />

UPGMA<br />

0.98<br />

2<br />

1<br />

Neighbor-joining<br />

1.00<br />

0.64<br />

6<br />

7<br />

0.64<br />

3<br />

0.01<br />

3<br />

0.76<br />

5<br />

4<br />

0.01<br />

0.01<br />

1<br />

2<br />

1<br />

0.8<br />

0.6 0.4<br />

Distance<br />

0.96<br />

0.2<br />

7<br />

6<br />

0<br />

4<br />

5<br />

10


Common interpretations of<br />

bootstrap support frequencies<br />

• Probability that the cluster (clade) will be observed on<br />

repeated sampling of many characters from the<br />

underlying pool of characters.<br />

• Confidence limit on a cluster.<br />

• Level of “significance” of a cluster.<br />

• Probability that a given cluster is a “real” group.<br />

• General measure of support for a given cluster.<br />

Assessing the “significance” of a<br />

bootstrap support frequency<br />

• Minimum-50% (majority consensus) rule.<br />

• Gestalt 70% rule (Hillis & Bull 1993).<br />

• 1-P rule (Felsenstein & Kishino 1993).<br />

• However:<br />

– None of these rules is adequate.<br />

– All can be misleading.<br />

11


Expected BSFs per<br />

hundred bootstrap iterations<br />

(by simulation)<br />

0.7<br />

1.5<br />

6.7<br />

A<br />

B<br />

C<br />

6.7<br />

1.5<br />

1.5<br />

6.7<br />

D<br />

E<br />

F<br />

G<br />

H<br />

I<br />

Other methods for modeling distance matrices<br />

Cluster analysis<br />

methods<br />

Ultrametric<br />

2.5<br />

<strong>Additive</strong><br />

2<br />

1.5<br />

3<br />

1<br />

5<br />

0.5<br />

4<br />

1<br />

2<br />

0<br />

4<br />

3<br />

5<br />

2<br />

1<br />

Distance matrix<br />

1 2 3 4 5<br />

1 0.00 1.41 2.45 2.45 2.83<br />

2 1.41 0.00 2.45 3.16 2.45<br />

3 2.45 2.45 0.00 1.41 1.41<br />

4 245 2.45 316 3.16 141 1.41 000 0.00 245 2.45<br />

5 2.83 2.45 1.41 2.45 0.00<br />

Data<br />

matrix<br />

Characters<br />

Taxa a b c<br />

1 1 2 1<br />

2 2 1 1<br />

3 3 3 2<br />

4 2 4 2<br />

5 3 2 3<br />

Correlation matrix<br />

a b c<br />

a 1.00 0.16 0.79<br />

b 0.16 1.00 0.37<br />

c 0.79 0.37 1.00<br />

PCo2 (34.5% %)<br />

Dim2<br />

PC2 (29.4%)<br />

1<br />

0.5<br />

Ordination<br />

methods<br />

Principal coordinates<br />

0<br />

-0.5<br />

-1<br />

1<br />

0.5<br />

0<br />

-0.5<br />

-1<br />

-1.5<br />

4<br />

2<br />

3<br />

5<br />

-1 -0.5 0 0.5 1 1.5<br />

PCo1 (60.6%)<br />

Multidimensional scaling<br />

1<br />

0.5<br />

0<br />

-0.5<br />

-1<br />

1<br />

5<br />

-1.5 -1 -0.5 0 0.5 1<br />

Dim1<br />

Principal components<br />

1<br />

2<br />

-1.5 -1 -0.5 0 0.5 1 1.5<br />

PC1 (64.5%)<br />

4<br />

3<br />

1<br />

3<br />

2<br />

4<br />

5<br />

12


Principal coordinates analysis (PCoA)<br />

• Eigenanalysis method:<br />

– Decomposes information in the distance matrix into a set of<br />

orthogonal axes:<br />

• Axes = principal coordinates.<br />

• Orthogonal = statistically independent.<br />

– Principal coordinates:<br />

• PCo1 accounts for the maximum information in the distance matrix.<br />

• PCo2 accounts for the maximum residual information in the<br />

distance matrix, independent of PCo1.<br />

• PCo3 accounts for the maximum residual information in the<br />

distance matrix, independent of both PCo1 and PCo2.<br />

• Etc.<br />

– For an N×N distance dsta matrix, ,there eeae are N principal pa coordinates.<br />

• The full set of N principal coordinates accounts for all of the<br />

information in the distance matrix.<br />

– Observations (taxa) are projected onto the axes to give<br />

projection scores, which can be plotted with scattergrams.<br />

Principal coordinates analysis (PCoA)<br />

5<br />

Distance matrix<br />

1 2 3 4 5<br />

1 0.00 1.41 2.45 2.45 2.83<br />

2 1.41 0.00 2.45 3.16 2.45<br />

3 2.45 2.45 0.00 1.41 1.41<br />

4 2.45 3.16 1.41 0.00 2.45<br />

5 2.83 2.45 1.41 2.45 0.00<br />

<strong>Additive</strong><br />

4<br />

PCo2 (34.5%)<br />

1<br />

0.5<br />

0<br />

-0.5<br />

-1<br />

4<br />

3<br />

-1 -0.5 0 0.5 1 1.5<br />

PCo1 (60.6%)<br />

1<br />

2<br />

3<br />

5<br />

2<br />

1<br />

13


Multidimensional scaling (MDS)<br />

• Not an eigenanalysis method:<br />

– Information in distance matrix is not decomposed.<br />

• Points representing observations (taxa) are<br />

‘squeezed’ into 1D, 2D, or 3D… space.<br />

– Placed into space so that interpoint distances<br />

reconstruct original distances as much as possible.<br />

• Metric and nonmetric (monotonic) versions.<br />

– Axes are arbitrary, and points can be arbitrarily rotated.<br />

– Amount of distortion measured as ‘stress’.<br />

• Observations (taxa) are projected onto the axes to<br />

give projection scores, which can be plotted with<br />

scattergrams.<br />

Multidimensional scaling (MDS)<br />

Stress = 0.27<br />

Distance matrix<br />

1 2 3 4 5<br />

1 0.00 1.41 2.45 2.45 2.83<br />

2 1.41 0.00 2.45 3.16 2.45<br />

3 2.45 2.45 0.00 1.41 1.41<br />

4 2.45 3.16 1.41 0.00 2.45<br />

5 2.83 2.45 1.41 2.45 0.00<br />

Dim2<br />

1<br />

0.5<br />

0<br />

-0.5<br />

-1<br />

2<br />

1<br />

3<br />

4<br />

<strong>Additive</strong><br />

4<br />

-1.5 15<br />

-1.5 -1 -0.5 0 0.5 1<br />

Dim1<br />

5<br />

3<br />

5<br />

2<br />

1<br />

14


Principal components analysis (PCA)<br />

• Eigenanalysis method, but based on correlation or covariance<br />

matrix:<br />

– Decomposes information in the correlation matrix into a set of<br />

orthogonal axes:<br />

• Axes = principal p components.<br />

• Orthogonal = statistically independent.<br />

– Principal components:<br />

• PC1 accounts for the maximum information in the distance matrix.<br />

• PC2 accounts for the maximum residual information in the distance<br />

matrix, independent of PC1.<br />

• PC3 accounts for the maximum residual information in the distance<br />

matrix, independent of both PC1 and PC2.<br />

• Etc.<br />

– For an N×N correlation matrix, there are N principal<br />

components.<br />

• The full set of N principal components accounts for all of the<br />

information in the distance matrix.<br />

– Observations (taxa) are projected onto the axes to give<br />

projection scores, which can be plotted with scattergrams.<br />

Principal components analysis (PCA)<br />

4<br />

1<br />

Correlation matrix<br />

a b c<br />

a 1.00 0.16 0.79<br />

b 0.16 1.00 0.37<br />

c 0.79 0.37 1.00<br />

PC2 (29.4%)<br />

0.5<br />

0<br />

1<br />

3<br />

-0.5<br />

<strong>Additive</strong><br />

-1<br />

2<br />

5<br />

-1.5 -1 -0.5 0 0.5 1 1.5<br />

PC1 (64.5%)<br />

4<br />

3<br />

5<br />

2<br />

1<br />

15

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!