22.01.2015 Views

Refined Buneman Trees


representation of evolutionary relationships, but of course we would like to do more than just look at the trees.

Definition 1 (Evolutionary tree). An evolutionary tree 𝒯 is an ordered pair (T; φ), where T is a tree with vertex set V and φ : X → V is a map with the property that, for each v ∈ V of degree at most two, v ∈ φ(X). An evolutionary tree is also called a semi-labeled tree (on X).

The main thing to notice about Definition 1 is that all leaves must correspond to species, but not all species have to correspond to leaves. While working on this thesis, the author experienced some confusion until Definition 2 filled in a gap in the author's understanding of evolutionary and phylogenetic trees and their relation to graph-theoretical trees. One might be tempted to interpret evolutionary trees as unrooted trees where leaves represent species, but the small detail that evolutionary trees do not need to be fully resolved is quite important.

Definition 2 (Phylogenetic tree). A phylogenetic tree 𝒯 is an evolutionary tree (T; φ) with the property that φ is a bijection from X into the set of leaves of T.
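
The two definitions can be checked mechanically. The following is a minimal sketch, not taken from the thesis: it assumes the tree T is given as an adjacency dictionary `adj` (node to set of neighbours) and the labeling φ as a dictionary `phi` (species to node). Both names and the function names are hypothetical, and the code does not verify that `adj` really is a tree.

```python
def is_evolutionary_tree(adj, phi):
    """Definition 1: every vertex of degree at most two is the image of a species."""
    labeled = set(phi.values())
    return all(v in labeled for v, nbrs in adj.items() if len(nbrs) <= 2)


def is_phylogenetic_tree(adj, phi):
    """Definition 2: phi maps the species one-to-one onto the leaves."""
    leaves = {v for v, nbrs in adj.items() if len(nbrs) == 1}
    images = list(phi.values())
    injective = len(set(images)) == len(images)
    return is_evolutionary_tree(adj, phi) and injective and set(images) == leaves


# Hypothetical example: inner node 2 joined to the leaves 1, 3 and 4.
adj = {1: {2}, 2: {1, 3, 4}, 3: {2}, 4: {2}}
leaf_labeled = {"A": 1, "B": 3, "C": 4}
semi_labeled = {"A": 1, "B": 3, "C": 4, "D": 2}  # species D sits on the inner node

assert is_phylogenetic_tree(adj, leaf_labeled)
assert is_evolutionary_tree(adj, semi_labeled)
assert not is_phylogenetic_tree(adj, semi_labeled)
```

The second labeling is an evolutionary tree but not a phylogenetic one, because species D is mapped to an inner node.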

In graph-theoretical terms, a phylogenetic tree is a leaf-labeled tree, while an evolutionary tree is a semi-labeled tree. An illustration of this distinction is given in Figure 2.1. Here we have an evolutionary tree with three “abnormal” regions A, B and C, where the tree is not fully resolved. If we look at regions A and B, we see that a species appears to be the ancestor of other species. It is of course possible that we have such a dataset, but a more likely explanation would be that the underlying evolutionary data simply does not tell us how to resolve these species, which is precisely the situation in region C. We might argue that since we cannot distinguish between these species, they must be the same species. But it is also very likely that the underlying evolutionary data is inaccurate.

The important thing to stress is that our tree reconstruction method might output a tree that is not fully resolved for whatever reason, and additional analysis might be needed to find the answers we are looking for. An easy way of obtaining a leaf-labeled tree is to simply attach an extra edge to each labeled node of degree two or more, so that the label sits on a new leaf. This is illustrated in Figure 2.2. Of course, by doing so we are losing information about the original tree, but this can be remedied by marking the extra edges. The reason for performing this transformation is that leaf-labeled trees are easier to work with than semi-labeled trees, for the average computer scientist.
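
The transformation described above can be sketched as follows. This is an illustration under assumptions, not the thesis's implementation: the tree is an adjacency dictionary `adj`, the labeling is a dictionary `phi` from species to node, and each new leaf is named `("leaf", species)`, which is assumed not to clash with an existing node. The returned set `extra_edges` plays the role of the marked edges. The sketch also assumes that no two species share a node; unresolved groups of species would need separate handling.

```python
def to_leaf_labeled(adj, phi):
    """Return (new_adj, new_phi, extra_edges) with every species on a leaf."""
    new_adj = {v: set(nbrs) for v, nbrs in adj.items()}  # copy, input stays intact
    new_phi = {}
    extra_edges = set()
    for species, v in phi.items():
        if len(adj[v]) >= 2:
            # Labeled node of degree two or more: hang a new leaf on it
            # and move the label there.
            leaf = ("leaf", species)
            new_adj[leaf] = {v}
            new_adj[v].add(leaf)
            extra_edges.add((v, leaf))  # marked so the original tree can be recovered
            new_phi[species] = leaf
        else:
            new_phi[species] = v
    return new_adj, new_phi, extra_edges


# Hypothetical example: species D labels the inner node 2.
adj = {1: {2}, 2: {1, 3, 4}, 3: {2}, 4: {2}}
phi = {"A": 1, "B": 3, "C": 4, "D": 2}
new_adj, new_phi, extra_edges = to_leaf_labeled(adj, phi)
assert all(len(new_adj[v]) == 1 for v in new_phi.values())
```

Contracting the edges in `extra_edges` gives back the original semi-labeled tree, which is why marking them avoids the loss of information.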

Two graph-theoretical results are handy when reasoning about trees and tree search spaces: a semi-labeled unrooted tree with at most n leaves has at most n − 2 inner nodes, for a maximum of 2n − 2 nodes in the whole tree, and there are at most 2n − 3 edges.
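
As a sanity check on these counts, the sketch below builds a fully resolved unrooted tree (every inner node of degree three) by repeatedly subdividing an edge and attaching a new leaf, then confirms that it reaches the stated maxima. The function name and the node names are hypothetical. This checks only the extremal case; the general bounds follow from counting degrees, assuming every inner node has degree at least three.

```python
def fully_resolved_tree(n):
    """Build an unrooted tree with n >= 3 leaves and all inner nodes of degree three."""
    if n < 3:
        raise ValueError("need at least three leaves")
    # Start from a star: one inner node joined to three leaves.
    edges = [("i0", "l0"), ("i0", "l1"), ("i0", "l2")]
    for k in range(3, n):
        # Subdivide an existing edge with a new inner node and hang a new leaf on it.
        u, v = edges.pop()
        inner, leaf = f"i{k - 2}", f"l{k}"
        edges += [(u, inner), (inner, v), (inner, leaf)]
    nodes = {x for edge in edges for x in edge}
    return nodes, edges


for n in range(3, 10):
    nodes, edges = fully_resolved_tree(n)
    inner = [x for x in nodes if x.startswith("i")]
    assert len(inner) == n - 2       # inner nodes
    assert len(nodes) == 2 * n - 2   # all nodes
    assert len(edges) == 2 * n - 3   # edges
```

Each insertion step adds two nodes and two edges, so starting from four nodes and three edges at n = 3 gives exactly 2n − 2 nodes and 2n − 3 edges.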
