22.01.2015 Views

Refined Buneman Trees


4.4 Hybrid methods<br />

All the methods we have described until now are very one-sided in their bias,<br />

exploring only one side of the trade-off between accuracy and speed. Hybrid<br />

methods do exist, where a combination of a fast method and an accurate one<br />

might produce a practical method with sound biological meaning.<br />

One such hybrid method, called Disc Covering, is described in [HNP+98]; it uses a<br />

divide-and-conquer approach. From the abstract:<br />

(The Disc-Covering Method) DCM obtains a decomposition of the<br />

input dataset into small overlapping sets of closely related taxa,<br />

reconstructs trees on these subsets (using a “base” phylogenetic<br />

method of choice), and then combines the subtrees into one tree on<br />

the entire set of taxa. Because the subproblems analyzed by DCM<br />

are smaller, computationally expensive methods such as maximum<br />

likelihood estimation can be used without incurring too much cost.<br />

4.5 Accuracy of inferred trees<br />

It is possible to evaluate the quality or confidence of branches in the evolutionary<br />

trees we find using our tree reconstruction methods. These statistical methods<br />

are known as bootstrap tests, and they are described in [NK00], with references to<br />

other articles.<br />

The basic idea is to find some tree T using some method M, for some data<br />

set D. Let's say D consists of n nucleotide sequences. Now we may select n<br />

sequences from D with replacement, to form a new sample dataset D′; notice<br />

that the same sequence might occur several times in the new set, while some<br />

sequences might not occur at all. Now we use the new sample to infer a new tree<br />

T′ by the same method M, and by comparing T and T′ we can assign a count<br />

of 1 to the branches in T which also occur in T′, and 0 to the rest. Repeating<br />

this process many times (e.g. a thousand times) yields a statistic of how often<br />

each branch occurs for different samples, and thus a reflection of how confident<br />

we can be in this particular branch.<br />
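
The resampling loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of [NK00]: the names `bootstrap_support`, `infer_branches` and `most_common_split` are hypothetical, the method M is passed in as a function that returns the set of branches of the tree it infers, and the toy method used in the demonstration is a stand-in that does not build a real tree. The sketch is agnostic about what one resampled unit is; in the usual phylogenetic bootstrap the units are typically alignment columns rather than whole sequences, and that choice is left to the caller.

```python
import random
from collections import Counter


def bootstrap_support(data, infer_branches, replicates=1000, seed=0):
    """Return, for each branch of T, the fraction of replicates containing it.

    data           -- sequence of resampling units (the dataset D)
    infer_branches -- hypothetical stand-in for method M: takes a dataset and
                      returns an iterable of hashable branch identifiers
    """
    rng = random.Random(seed)
    reference = frozenset(infer_branches(data))      # branches of T
    counts = Counter({branch: 0 for branch in reference})
    n = len(data)
    for _ in range(replicates):
        # Draw n units from D with replacement to form D'.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        resampled = set(infer_branches(sample))      # branches of T'
        # Count 1 for every branch of T that also occurs in T'.
        for branch in reference & resampled:
            counts[branch] += 1
    return {branch: counts[branch] / replicates for branch in reference}


def most_common_split(units):
    """Toy method M (assumption): each unit names the split it supports,
    and the 'tree' consists of the single most frequent split."""
    split, _ = Counter(units).most_common(1)[0]
    return {split}


if __name__ == "__main__":
    toy_data = ["AB|CD"] * 8 + ["AC|BD"] * 2
    print(bootstrap_support(toy_data, most_common_split, replicates=200))
```

Passing the method in as a function keeps the resampling logic independent of how trees are built, so the same loop could wrap any reconstruction method that can report its branches in a comparable form.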
