22.01.2015 Views

Refined Buneman Trees


The refined Buneman algorithm, which is the main focus of this work, belongs to this subclass of evolutionary tree methods. The hope is that the refined Buneman method will be less safe than its namesake and infer more splits, while still maintaining a high degree of confidence in the splits it infers.

4.2 Parsimony methods

The foundation for this class of methods is the philosophy of William of Ockham: Pluralitas non est ponenda sine necessitate, meaning something along the lines of "the best hypothesis is the one that requires the smallest number of assumptions".²

This philosophy is also known as Ockham’s Razor or the parsimony principle. In our context we shall use it to create a condition of optimality: if two proposed evolutionary processes have the same starting and ending points, we shall assume that the simplest or shortest process is the correct one. For example, we could imagine two substitutions of the same nucleotide working in reverse, A → G → A; in this case we would say that no substitutions took place at all.

The parsimony method works by considering one specific site at a time across a set of nucleotide sequences. For each site, we postulate all possible binary tree topologies linking the sequences at that site. For each topology we then search for an assignment of nucleotides to the inner nodes such that the total number of substitutions is minimal, and we note the topology or topologies that attain the smallest number. Since we might not identify the same optimal topologies for all sites, we sum these minimal counts over all sites for each topology and select the topology with the smallest total number of substitutions across all sites.
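The per-site minimisation and the summation across sites can be sketched as follows. This is a minimal illustration that assumes Fitch's set-based counting rule for the per-site step, which the text above does not name. The functions `fitch_site` and `parsimony_score`, the nested-tuple tree encoding, and the four example sequences are all hypothetical and chosen only for illustration.

```python
# A tree is either a leaf name (str) or a pair (left_subtree, right_subtree).

def fitch_site(tree, states):
    """Return (candidate ancestral states, minimal substitutions) for one site."""
    if isinstance(tree, str):
        # Leaf: the observed nucleotide is the only candidate, at no cost.
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra substitution here.
        return common, left_cost + right_cost
    # The children disagree: one substitution is needed below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, sequences):
    """Sum the minimal substitution counts over all sites for one topology."""
    n_sites = len(next(iter(sequences.values())))
    total = 0
    for i in range(n_sites):
        states = {name: seq[i] for name, seq in sequences.items()}
        total += fitch_site(tree, states)[1]
    return total


# Hypothetical aligned sequences (not taken from the text).
sequences = {"a": "AAGA", "b": "AATA", "c": "CATC", "d": "CAGC"}

# The three possible topologies on four taxa, each drawn with an arbitrary root.
topologies = [
    (("a", "b"), ("c", "d")),
    (("a", "c"), ("b", "d")),
    (("a", "d"), ("b", "c")),
]

scores = [parsimony_score(t, sequences) for t in topologies]
best_topology = topologies[scores.index(min(scores))]
```

With these assumed sequences the first topology, which groups a with b and c with d, needs the fewest substitutions in total and would be the one selected.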

Figure 4.1 shows an example of estimating the number of substitutions for a topology. Firstly, we have a tree spanning a specific site across 5 nucleotide sequences. Secondly, we fill in the site for the ancestral taxa by considering the minimum number of substitutions required, working bottom-up from the sites we are already given. In this case, looking at the subtree of C and T in the lower left corner, we know that their ancestor site must have been either C or T, requiring only one substitution, since the other choices (A or G) would require two substitutions. Thirdly, we present one of many solutions for how nucleotides might have been assigned to the ancestral sites. In this case we could make do with only two substitutions, but this solution is not unique: there are several assignments which require only two substitutions.
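The non-uniqueness of the optimal assignment can be checked by brute force on a small example. Figure 4.1 is not reproduced here, so the tree shape, the leaf nucleotides, and all names below (`INNER`, `LEAVES`, `count_substitutions`, `optimal_assignments`) are hypothetical. The sketch only assumes five sequences and a C/T subtree, as in the description above.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# Hypothetical rooted tree ((s1, s2), (s3, (s4, s5))).
# Each inner node is listed as (name, left child, right child).
INNER = [
    ("u", "s1", "s2"),
    ("w", "s4", "s5"),
    ("v", "s3", "w"),
    ("root", "u", "v"),
]

# Hypothetical nucleotides observed at one site of the five sequences.
LEAVES = {"s1": "C", "s2": "T", "s3": "C", "s4": "T", "s5": "T"}


def count_substitutions(assignment):
    """Count the edges whose two endpoints carry different nucleotides."""
    state = {**LEAVES, **assignment}
    return sum(
        (state[node] != state[left]) + (state[node] != state[right])
        for node, left, right in INNER
    )


def optimal_assignments():
    """Try every assignment to the inner nodes and keep all the cheapest ones."""
    names = [node for node, _, _ in INNER]
    best, solutions = None, []
    for combo in product(NUCLEOTIDES, repeat=len(names)):
        assignment = dict(zip(names, combo))
        cost = count_substitutions(assignment)
        if best is None or cost < best:
            best, solutions = cost, [assignment]
        elif cost == best:
            solutions.append(assignment)
    return best, solutions


best_cost, best_solutions = optimal_assignments()
```

For this assumed tree the minimum is two substitutions, and it is reached by more than one assignment of the inner nodes. Enumerating all 4^k assignments for k inner nodes is only feasible for tiny trees; the bottom-up reasoning described above avoids this enumeration.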

The intuition for this method is quite easy to understand, and according to [NK00] and others, under favorable conditions the method is expected to produce the correct tree. However, under less favorable conditions the method is known to produce incorrect topologies, and in any case the method is hopelessly inefficient for large data sets, at least when using exhaustive search techniques. [NK00] describes how the search might be sped up by using, e.g., branch and bound.
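To give a feeling for why exhaustive search becomes infeasible, the following sketch counts the candidate topologies. It assumes that the topologies in question are unrooted binary trees on n labeled taxa, of which there are (2n − 5)!! = 3 · 5 · … · (2n − 5). The function name `num_unrooted_binary_trees` is hypothetical.

```python
def num_unrooted_binary_trees(n_taxa: int) -> int:
    """Number of unrooted binary tree topologies on n labeled leaves: (2n - 5)!!."""
    if n_taxa < 3:
        return 1  # With fewer than three taxa there is only one topology.
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


counts = {n: num_unrooted_binary_trees(n) for n in (4, 5, 10, 20)}
```

Ten taxa already give more than two million topologies, each of which would have to be scored at every site.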

² Or even shorter: keep it simple!
