
Refined Buneman Trees


Figure 4.1: An illustration of the parsimony tree method.

4.3 Maximum likelihood methods

Using maximum likelihood methods is quite simple in theory. We start out with, e.g., a set of n nucleotide sequences of length m, aligned such that only substitutions occur, not insertions or deletions. We then postulate some evolutionary tree topology over the n sequences, giving us a rooted binary tree with n − 1 inner nodes. For each inner node we assume there is some ancestor sequence for the sequences in the subtree of that node, but we do not know which nucleotides are in this ancestor sequence.
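To make this setup concrete, here is a minimal Python sketch. It assumes a tree is stored as nested pairs with sequence names at the leaves; this representation, the helper names `count_leaves` and `count_inner`, and the sequences `s1` to `s4` are all hypothetical and chosen only for illustration. The sketch checks that the aligned sequences share a common length m and that a rooted binary tree over n leaves has n − 1 inner nodes.

```python
from typing import Tuple, Union

# Hypothetical representation: a leaf is a sequence name (str),
# an inner node is a pair (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def count_leaves(tree: Tree) -> int:
    """Number of leaf sequences (n) below this node."""
    if isinstance(tree, str):
        return 1
    left, right = tree
    return count_leaves(left) + count_leaves(right)


def count_inner(tree: Tree) -> int:
    """Number of inner (ancestor) nodes, including the root."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    return 1 + count_inner(left) + count_inner(right)


# Hypothetical input: n = 4 aligned sequences of length m = 6.
alignment = {"s1": "ACGTAA", "s2": "ACGTTA", "s3": "ACCTAG", "s4": "ACCTTG"}
# Equal lengths: the alignment contains substitutions only, no gaps.
assert len({len(seq) for seq in alignment.values()}) == 1

tree = (("s1", "s2"), ("s3", "s4"))
print(count_leaves(tree), count_inner(tree))  # 4 3
```

Every inner node of a binary tree merges two subtrees into one, so building a tree over n leaves takes exactly n − 1 merges, whatever the shape of the tree.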

We assume some substitution model, and there are many to choose from, ranging from simple to very complicated. Jukes-Cantor ([JC69]), Kimura ([Kim80]) and Hasegawa-Kishino-Yano ([HKY85]) are well-known substitution models, and there is a long hierarchy of increasingly complex models based on more and more biologically founded assumptions. Now, to find the likelihood of a single nucleotide site in the n leaf sequences, we multiply the substitution probabilities along every edge of the tree for one assignment of nucleotides to that site in the n − 1 inner nodes, and then sum these products over all 4^(n−1) possible assignments. Assuming that sites evolve independently, the likelihood of the entire sequences is the product of the m site likelihoods, i.e. a sum over m terms when we work with log-likelihoods.
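The computation just described can be sketched directly in Python. This is a minimal, deliberately naive sketch assuming the Jukes-Cantor model with branch lengths measured in expected substitutions per site, a uniform distribution for the root base, and independent sites; the tree ((s1, s2), s3), its branch lengths and the alignment are hypothetical. For each site it enumerates all 4^(n−1) assignments of bases to the inner nodes, multiplies the substitution probabilities along every edge, and sums the products.

```python
import itertools
import math

NUCS = "ACGT"


def jc69(same: bool, d: float) -> float:
    """Jukes-Cantor probability of a given end base after a branch of length d
    (expected substitutions per site), given whether it equals the start base."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


# Hypothetical rooted tree ((s1, s2), s3) with inner nodes "root" and "x";
# each edge is (parent, child, branch length) with made-up lengths.
INNER = ["root", "x"]
EDGES = [("root", "x", 0.1), ("x", "s1", 0.2), ("x", "s2", 0.2), ("root", "s3", 0.3)]


def site_likelihood(leaf_bases, inner=INNER, edges=EDGES):
    """Sum, over all 4**len(inner) assignments of bases to the inner nodes,
    of the product of substitution probabilities along every edge."""
    total = 0.0
    for assignment in itertools.product(NUCS, repeat=len(inner)):
        bases = dict(leaf_bases)
        bases.update(zip(inner, assignment))
        p = 0.25  # uniform root base (the JC69 equilibrium distribution)
        for parent, child, d in edges:
            p *= jc69(bases[parent] == bases[child], d)
        total += p
    return total


def log_likelihood(alignment, inner=INNER, edges=EDGES):
    """Sites are assumed independent: the likelihood is the product of the
    m site likelihoods, i.e. a sum of m log-likelihood terms."""
    m = len(next(iter(alignment.values())))
    return sum(
        math.log(site_likelihood({name: seq[i] for name, seq in alignment.items()}, inner, edges))
        for i in range(m)
    )


alignment = {"s1": "ACGTA", "s2": "ACGTT", "s3": "ACCTA"}
print(log_likelihood(alignment))
```

The inner loop runs 4^(n−1) times per site, which is why this direct evaluation is only feasible for a handful of sequences.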

Now, this expression would have to be evaluated for all possible tree topologies, and since the number of topologies grows faster than exponentially in n, this results in a very time-consuming algorithm. But since the substitution model is grounded in biology, we can have high confidence that the tree with maximum likelihood is the real tree, at least to the extent that the model describes the actual evolutionary process.
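To see how quickly the space of topologies grows, the sketch below counts rooted binary topologies on n labelled leaves using the well-known formula (2n − 3)!! = 1 · 3 · 5 ⋯ (2n − 3). The formula is a standard combinatorial result rather than something stated in the text above, and the function name is hypothetical.

```python
def num_rooted_topologies(n: int) -> int:
    """Number of rooted binary tree topologies on n labelled leaves: (2n - 3)!!."""
    count = 1
    for k in range(3, 2 * n - 2, 2):  # multiply 3, 5, ..., 2n - 3
        count *= k
    return count


for n in (4, 10, 20):
    print(n, num_rooted_topologies(n))
```

Even for n = 10 sequences there are already 34,459,425 rooted topologies, and each would need its own likelihood evaluation.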

One way of making this method useful in practice is to search the space of tree topologies heuristically, so that the search can skip large parts of the search space. This of course limits the accuracy of the method, since the tree with maximum likelihood may lie among the skipped topologies. The ML method is also useful for evaluating trees found by other tree reconstruction methods, since we assume that the substitution model captures much of the biological meaning, depending on which model is used.
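As an illustration of using ML to evaluate trees produced by other methods, the sketch below scores three candidate topologies for four hypothetical sequences. It swaps in Felsenstein's pruning algorithm, a standard technique not described in the text above, which computes the same per-site sum over inner-node assignments in time linear in the number of nodes instead of 4^(n−1). It assumes the Jukes-Cantor model, a uniform root distribution, independent sites and a made-up branch length of 0.1 on every edge; the helper names `node` and `partial` are hypothetical.

```python
import math

NUCS = "ACGT"


def jc69_matrix(d):
    """4x4 Jukes-Cantor transition matrix for a branch of length d."""
    e = math.exp(-4.0 * d / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def node(left, right, d_left=0.1, d_right=0.1):
    """Hypothetical inner node: a pair of (subtree, branch length); leaves are names."""
    return ((left, d_left), (right, d_right))


def partial(tree, site):
    """Felsenstein pruning: entry x is P(leaf bases below | this node has base x)."""
    if isinstance(tree, str):
        return [1.0 if b == site[tree] else 0.0 for b in NUCS]
    result = [1.0] * 4
    for child, d in tree:
        P = jc69_matrix(d)
        below = partial(child, site)
        for x in range(4):
            result[x] *= sum(P[x][y] * below[y] for y in range(4))
    return result


def log_likelihood(tree, alignment):
    m = len(next(iter(alignment.values())))
    total = 0.0
    for i in range(m):
        site = {name: seq[i] for name, seq in alignment.items()}
        # Uniform root distribution; independent sites, so log terms add up.
        total += math.log(sum(0.25 * v for v in partial(tree, site)))
    return total


alignment = {"s1": "ACGTAA", "s2": "ACGTTA", "s3": "ACCTAG", "s4": "ACCTTG"}
candidates = {
    "((s1,s2),(s3,s4))": node(node("s1", "s2"), node("s3", "s4")),
    "((s1,s3),(s2,s4))": node(node("s1", "s3"), node("s2", "s4")),
    "((s1,s4),(s2,s3))": node(node("s1", "s4"), node("s2", "s3")),
}
scores = {name: log_likelihood(t, alignment) for name, t in candidates.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.4f}")
```

In this toy alignment two informative sites group s1 with s2 and s3 with s4, while only one groups s1 with s3, so the first candidate receives the highest log-likelihood. A real evaluation would also optimise the branch lengths for each candidate rather than fixing them.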
