Refined Buneman Trees
worst case time and space complexity. However, with respect to performance
in terms of reconstructing accurate phylogenetic trees, the refined Buneman
method might demonstrate unknown strengths.
16.1 Test setup
In this section we test the refined Buneman tree algorithm against
two other known algorithms, the Buneman method ([Bun71]) and the
Neighbor-Joining method ([SN87]). All experiments were run on test data from the
PFAM database of protein sequence families. The data consists of distance
matrices in three size ranges: 10–50 (30 matrices), 100–200 (50 matrices) and
500–700 (50 matrices), for a total of 130 test matrices. Larger matrices are
available, but they were disregarded due to time constraints. The distance data
is given in the common Phylip format.
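As an illustration of the input format, the following is a minimal sketch of a reader for Phylip distance matrices. It assumes the square layout with whitespace-separated fields and one matrix row per line; the function name `parse_phylip_square` is hypothetical and is not taken from the implementation described here. Lower-triangular or line-wrapped files would need additional handling.

```python
from typing import List, Tuple


def parse_phylip_square(text: str) -> Tuple[List[str], List[List[float]]]:
    """Parse a square Phylip distance matrix.

    Assumed layout (an assumption of this sketch, not of the source):
    the first non-empty line holds the number of taxa n, followed by
    n lines of the form "<name> d_1 d_2 ... d_n".
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    n = int(lines[0].split()[0])
    if len(lines) - 1 < n:
        raise ValueError(f"expected {n} rows, found {len(lines) - 1}")

    names: List[str] = []
    matrix: List[List[float]] = []
    for ln in lines[1:n + 1]:
        fields = ln.split()
        if len(fields) != n + 1:
            raise ValueError(f"row for {fields[0]!r} does not have {n} distances")
        names.append(fields[0])
        matrix.append([float(x) for x in fields[1:]])
    return names, matrix


if __name__ == "__main__":
    example = "3\nA 0.0 0.1 0.2\nB 0.1 0.0 0.3\nC 0.2 0.3 0.0\n"
    taxa, dist = parse_phylip_square(example)
    print(taxa, dist[1][2])
```

The three-taxon matrix in the usage example is made up for illustration and is not part of the PFAM test data.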
16.2 Buneman and refined Buneman
The implementation of the refined Buneman tree algorithm can easily be adapted
to mark those splits which are in both B(δ) and RB(δ). Since we have the
n − 3 lowest-scoring quartets used to calculate the refined Buneman index for a
split, it is easy to find the lowest-scoring quartet among them. If that quartet
has a positive Buneman score, we can mark the split as belonging to B(δ).
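The marking step can be sketched as follows. The sketch assumes the usual definitions: the Buneman score of a quartet ab|cd is half the difference between the smaller of the two crossing distance sums and the sum δ(a,b) + δ(c,d), and the refined Buneman index of a split is the mean of its n − 3 lowest quartet scores. The function names and the tuple return value are hypothetical, and trivial splits are not treated here.

```python
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def buneman_score(dist: Matrix, a: int, b: int, c: int, d: int) -> float:
    """Buneman score of the quartet ab|cd under the distance matrix dist."""
    crossing = min(dist[a][c] + dist[b][d], dist[a][d] + dist[b][c])
    return 0.5 * (crossing - (dist[a][b] + dist[c][d]))


def classify_split(lowest_scores: Sequence[float]) -> Tuple[float, bool, bool]:
    """Classify a split from its n - 3 lowest quartet scores.

    Returns (refined index, split is in RB, split is marked as in B).
    The split is marked as in B when the lowest of the scores is positive,
    following the criterion stated in the text.
    """
    if not lowest_scores:
        raise ValueError("need at least one quartet score")
    refined_index = sum(lowest_scores) / len(lowest_scores)
    in_rb = refined_index > 0
    in_b = min(lowest_scores) > 0
    return refined_index, in_rb, in_b


if __name__ == "__main__":
    # Tree metric on four leaves: pendant edges of length 1, internal edge 2.
    tree = [
        [0, 2, 4, 4],
        [2, 0, 4, 4],
        [4, 4, 0, 2],
        [4, 4, 2, 0],
    ]
    print(buneman_score(tree, 0, 1, 2, 3))
    print(classify_split([-0.5, 1.0, 2.0]))
```

In the second call the scores are made up: their mean is positive while the lowest score is negative, so the split would be kept in RB(δ) but not marked as belonging to B(δ).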
This first experiment consists of running the (modified) refined Buneman
tree algorithm on examples from the set of PFAM distance matrices, counting
for each one the number of splits in B(δ) and in RB(δ). The results of the
experiment are given in Figure 16.1, sorted by increasing distance
matrix size and plotted as percentages (the size of RB(δ) is 100%).
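The normalisation used for the plot amounts to the following small computation, given here as a sketch. The input tuples (matrix size, number of splits in B(δ), number of splits in RB(δ)) and the function name are assumptions made for illustration; the numbers in the usage example are invented and are not experimental results.

```python
from typing import Iterable, List, Tuple


def buneman_percentages(
    counts: Iterable[Tuple[int, int, int]]
) -> List[Tuple[int, float]]:
    """Express |B| as a percentage of |RB| for each matrix.

    counts holds (matrix size, |B|, |RB|) per distance matrix. The result is
    sorted by increasing matrix size. Matrices with |RB| = 0 are skipped,
    since the percentage is undefined for them.
    """
    rows: List[Tuple[int, float]] = []
    for size, n_b, n_rb in sorted(counts):
        if n_rb == 0:
            continue
        rows.append((size, 100.0 * n_b / n_rb))
    return rows


if __name__ == "__main__":
    # Invented counts for two hypothetical matrices.
    print(buneman_percentages([(20, 5, 10), (10, 3, 12)]))
```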
Figure 16.1 shows that the size of B(δ) fluctuates considerably, especially
as the size of the distance matrix increases, where more and more datasets
yield no Buneman splits at all. The refined Buneman method is clearly
much less restrictive than the Buneman method, as expected. Regarding the
quality of the splits that are in RB(δ) but not in B(δ), further studies would
need to be undertaken: one could either study the specific datasets for which
the Buneman method produces few splits, or use simulated data, which would
provide a key to which splits are well supported and which are unsupported.
16.3 Refined Buneman and Neighbor-Joining
To test the refined Buneman tree method against the Neighbor-Joining method,
the author has run the implementation of the refined Buneman tree algorithm
described in this paper against the Quick-Join algorithm described in [BFM+03].
The Quick-Join software is available from this website:<br />
http://www.birc.dk/Software/QuickJoin/<br />