22.01.2015 Views

Refined Buneman Trees



Refined Buneman therefore suffers a great penalty as the number of species goes up. Mathematically, when the number of species approaches infinity,

$$\frac{n-3}{\binom{n}{4}} \to 0 \quad \text{as } n \to \infty$$
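To get a feeling for how quickly this ratio shrinks, it can be evaluated for a few values of n. The sketch below assumes that the denominator is the binomial coefficient C(n, 4), i.e. the number of four-species subsets; the function name `considered_fraction` is purely illustrative and does not come from the thesis.

```python
from math import comb

def considered_fraction(n: int) -> float:
    """Return (n - 3) / C(n, 4) for n >= 4 species.

    Assumption: the denominator is the number of four-species subsets.
    """
    if n < 4:
        raise ValueError("a quartet needs at least four species")
    return (n - 3) / comb(n, 4)

# Print the ratio for a few data set sizes to see how it decays.
for n in (4, 10, 100, 1000):
    print(n, considered_fraction(n))
```

Under this assumption the ratio is 1 for n = 4 and 7/210 for n = 10, and it keeps decreasing as n grows.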

This suggests that the method is of limited use for large data sets. It would be interesting to see the performance of a quartet-based method that considers a fixed percentage of quartets per split, e.g. 5–10%, which would not suffer from the same scalability problems as the refined Buneman method. The problem then becomes finding the 5–10% least-scoring quartets for every split efficiently, while ensuring that the set of splits produced is still compatible, that is, a tree. The Q* method [BG00] tackles the problem of going from favorable quartets to a compatible set of splits, at the cost of performance compared to the refined Buneman method.
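The fixed-percentage idea can be illustrated with a brute-force sketch. It assumes the usual Buneman quartet score, β(ab|cd) = ½(min{δ(a,c) + δ(b,d), δ(a,d) + δ(b,c)} − (δ(a,b) + δ(c,d))), which is not restated in this section, and it only forms quartets from pairs of distinct species on each side of the split. The names `quartet_score` and `fraction_split_score` are hypothetical. The sketch enumerates every quartet, so it addresses neither the efficiency question nor the compatibility of the resulting splits.

```python
from itertools import combinations
from math import ceil

def quartet_score(d, a, b, c, e):
    """Buneman score of the quartet ab|ce under the distance matrix d."""
    return 0.5 * (min(d[a][c] + d[b][e], d[a][e] + d[b][c]) - (d[a][b] + d[c][e]))

def fraction_split_score(d, side_u, side_v, fraction=0.10):
    """Average the lowest `fraction` of quartet scores across the split U|V.

    Assumption: this is an illustration of the proposal, not a published method.
    """
    scores = sorted(
        quartet_score(d, a, b, c, e)
        for a, b in combinations(side_u, 2)
        for c, e in combinations(side_v, 2)
    )
    if not scores:
        raise ValueError("each side of the split needs at least two species")
    keep = max(1, ceil(fraction * len(scores)))  # always keep at least one quartet
    return sum(scores[:keep]) / keep

# Toy tree metric on four species: leaf edges of length 1, internal edge of length 1,
# with species 0 and 1 on one side and species 2 and 3 on the other.
d = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]
true_split = fraction_split_score(d, [0, 1], [2, 3])
false_split = fraction_split_score(d, [0, 2], [1, 3])
```

On this toy input the split that matches the tree scores 1.0 (the internal edge length), while the conflicting split scores -1.0.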

One experiment that would be interesting to perform, but which the author has not undertaken, is to measure the quality of the splits from the Neighbor-Joining method that are either accepted or rejected by the Buneman method. The author's expectation is that the splits accepted by the refined Buneman method would receive high confidence, as that method relies on much more evidence than the Neighbor-Joining method does. Such an investigation could be done by inspecting the PFAM data and interpreting their biological meaning, by using bootstrap tests, or by using simulated data where the evolutionary history is known. Is it the case that the splits in RB(δ) are more trustworthy than the splits in NJ(δ) \ RB(δ)? In the cases where the NJ method infers a fully resolved binary tree and the RB method infers no splits at all, which one tells the truth? Some datasets might be completely random, and an NJ tree based on such a set is completely useless. On the other hand, tests show that the refined Buneman method produces very few splits, or none at all, on input distance matrices with random entries. So when the RB method tells us that the dataset does not warrant any splits, is that because the RB method has good biological properties? This question is beyond the scope of this thesis, but certainly deserves further investigation.
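A first step in such an experiment would be to separate the NJ splits into those also found by the RB method and those in NJ(δ) \ RB(δ). The sketch below assumes each split is given as one of its two sides (a set of taxon names) and normalises it to the side that excludes the smallest taxon, so that the same split written from either side compares equal. The helper names `canonical` and `partition_nj_splits` are hypothetical, and the example splits are made up for illustration only.

```python
def canonical(side, taxa):
    """Represent the split side | (taxa - side) by the side without the smallest taxon."""
    side = frozenset(side)
    taxa = frozenset(taxa)
    reference = min(taxa)
    return taxa - side if reference in side else side

def partition_nj_splits(nj_splits, rb_splits, taxa):
    """Return (NJ splits also in RB, NJ splits not in RB) as sets of canonical sides."""
    rb = {canonical(s, taxa) for s in rb_splits}
    nj = {canonical(s, taxa) for s in nj_splits}
    return nj & rb, nj - rb

# Made-up example: five taxa, two NJ splits, one RB split.
taxa = "abcde"
nj_splits = [{"a", "b"}, {"d", "e"}]
rb_splits = [{"c", "d", "e"}]  # the same split as {"a", "b"}, seen from the other side
accepted, rejected = partition_nj_splits(nj_splits, rb_splits, taxa)
```

Here `accepted` holds the split ab|cde and `rejected` holds de|abc; the two groups could then be compared by bootstrap support or against a known simulated history, as described above.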

