Refined Buneman Trees
Refined <strong>Buneman</strong> therefore suffers a great penalty as the number of species<br />
goes up. Mathematically, when the number of species approaches infinity,<br />
\[ \frac{n - 3}{n^{4}} \to 0 \]<br />
This would indicate that the method is not very useful for large data sets,<br />
since each split is judged on a vanishing fraction of its quartets. It would be interesting to see the performance<br />
of a quartet based method that considers a fixed percentage of quartets<br />
per split, e.g. 5–10%, which would not suffer from the same scalability problem<br />
as the refined <strong>Buneman</strong> method. The problem becomes finding the 5–10%<br />
of lowest-scoring quartets for every split in an efficient manner, while ensuring that the<br />
set of splits produced is still compatible, i.e. still forms a tree. The Q* method<br />
[BG00] tackles the problem of going from favorable quartets to a compatible set<br />
of splits, at the cost of performance compared to the refined <strong>Buneman</strong> method.<br />
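The difference between the two selection rules can be illustrated with a brute-force sketch. It assumes the usual definition of the refined Buneman index of a nontrivial split U|V as the mean of the n − 3 lowest Buneman scores among the quartets uv|xy with u, v in U and x, y in V. The function names and the `fraction` parameter are hypothetical, the enumeration is far from the efficient selection the text asks for, and the sketch does not address whether the resulting splits are compatible.

```python
from itertools import combinations
from math import ceil

def buneman_score(delta, u, v, x, y):
    """Buneman score of the quartet uv|xy under the dissimilarity matrix delta."""
    cross = min(delta[u][x] + delta[v][y], delta[u][y] + delta[v][x])
    return 0.5 * (cross - (delta[u][v] + delta[x][y]))

def quartet_scores(delta, U, V):
    """Sorted scores of all quartets uv|xy with u, v in U and x, y in V.

    Assumes a nontrivial split, i.e. both sides hold at least two taxa.
    """
    return sorted(
        buneman_score(delta, u, v, x, y)
        for u, v in combinations(sorted(U), 2)
        for x, y in combinations(sorted(V), 2)
    )

def mean_of_lowest(sorted_scores, k):
    """Mean of the k lowest scores of an already sorted list."""
    return sum(sorted_scores[:k]) / k

def refined_buneman_index(delta, U, V):
    """Mean of the n - 3 lowest quartet scores (assumed standard definition)."""
    n = len(U) + len(V)
    return mean_of_lowest(quartet_scores(delta, U, V), n - 3)

def fixed_fraction_index(delta, U, V, fraction=0.05):
    """Hypothetical variant: mean of a fixed fraction of the lowest scores."""
    scores = quartet_scores(delta, U, V)
    k = max(1, ceil(fraction * len(scores)))
    return mean_of_lowest(scores, k)

# Four taxa on a tree with split {0, 1} | {2, 3}: unit leaf edges, unit inner edge.
delta4 = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]

# Six taxa: distance 2 within {0, 1, 2} and within {3, 4, 5}, distance 3 across.
delta6 = [
    [0 if i == j else (2 if (i < 3) == (j < 3) else 3) for j in range(6)]
    for i in range(6)
]

print(refined_buneman_index(delta4, {0, 1}, {2, 3}))  # split of the tree
print(refined_buneman_index(delta4, {0, 2}, {1, 3}))  # split not in the tree
print(len(quartet_scores(delta6, {0, 1, 2}, {3, 4, 5})))  # quartets of a 3|3 split
```

In the six-taxon example the refined index uses n − 3 = 3 of the 9 quartets of the balanced split. The number of quartets per split grows much faster than n − 3, which is the scaling effect discussed above, whereas a fixed fraction keeps the share of inspected quartets constant.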
One experiment that would be interesting to perform, but which the author<br />
has not undertaken, is to measure the quality of the splits from the Neighbor-Joining<br />
method that are either accepted or rejected by the refined <strong>Buneman</strong> method. The<br />
author's expectation is that the splits accepted by the refined <strong>Buneman</strong> method<br />
would get a high confidence, as that method relies on much more evidence<br />
than the Neighbor-Joining method does. Such an investigation could be done<br />
by inspecting the PFAM data and interpreting their biological meaning, by<br />
using bootstrap tests, or by using simulated data where one would know the<br />
evolutionary history. Is it the case that the splits in RB(δ) are more trustworthy<br />
than the splits in NJ(δ) \ RB(δ)? In the cases where the NJ method infers a fully<br />
resolved binary tree and the RB method infers no splits at all, which one<br />
tells the truth? Some datasets might be completely random, and an NJ tree<br />
based on such a set is completely useless. On the other hand, tests show that<br />
the refined <strong>Buneman</strong> method produces very few splits, or none at all, on input<br />
distance matrices with random entries. So when the RB method tells us that<br />
the dataset does not warrant any splits, is that because the RB method has<br />
good biological properties? This question is beyond the scope of this thesis, but<br />
certainly deserves further investigation.<br />
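As a starting point for such an experiment, the NJ splits could be partitioned into those also found in RB(δ) and those in NJ(δ) \ RB(δ). The following is a minimal sketch under stated assumptions: each split is given as one side of a bipartition of the taxa, and the taxon names, the split lists, and the helper names are hypothetical illustrations rather than output of either method.

```python
def canonical(side, taxa):
    """Represent a split by the side that does not contain the smallest taxon,
    so that both ways of writing the same bipartition compare equal."""
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side

def partition_nj_splits(nj_splits, rb_splits, taxa):
    """Return (splits in both NJ and RB, splits in NJ but not in RB)."""
    nj = {canonical(s, taxa) for s in nj_splits}
    rb = {canonical(s, taxa) for s in rb_splits}
    return nj & rb, nj - rb

# Hypothetical input: a fully resolved NJ tree on five taxa has two
# nontrivial splits; the RB tree is assumed to contain only one of them.
taxa = {"a", "b", "c", "d", "e"}
nj_splits = [{"a", "b"}, {"d", "e"}]
rb_splits = [{"c", "d", "e"}]  # same bipartition as {"a", "b"}

shared, nj_only = partition_nj_splits(nj_splits, rb_splits, taxa)
print(sorted(map(sorted, shared)))   # splits accepted by both methods
print(sorted(map(sorted, nj_only)))  # splits only NJ proposes
```

The two groups could then be scored separately, for instance by bootstrap support or against a known simulated history, to compare their trustworthiness.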