22.01.2015 Views

Refined Buneman Trees


Algorithm 11 A naive algorithm that computes the refined Buneman tree

Require: δ is a distance matrix of size n × n
Ensure: S = RB(δ)

S = ∅
for σ ∈ σ(X) do
    Q = ∅
    for q ∈ q(σ) do
        Q = Q ∪ {q}
    end for
    SORT(Q)
    w = $\frac{1}{n-3} \sum_{i=1}^{n-3} \beta_{Q[i]}(\delta)$
    if w > 0 then
        S = S ∪ {σ}
    end if
end for
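
As a rough illustration, the pseudocode above can be sketched in Python. This is not the author's reference implementation, and it rests on several assumptions. The function names `buneman_score` and `naive_refined_buneman` are hypothetical. The Buneman score uses the standard definition β_{ab|cd}(δ) = ½(min(δ_ac + δ_bd, δ_ad + δ_bc) − (δ_ab + δ_cd)), which this section does not restate. The sketch assumes n ≥ 4, sorts Q by increasing score, and skips splits with fewer than two taxa on a side, since those induce no quartets. Splits are represented as integer masks whose highest bit stays 0, so that a split and its complement are not both visited.

```python
from itertools import combinations


def buneman_score(d, a, b, c, e):
    """Buneman score of the quartet ab|ce (standard definition, assumed here)."""
    return 0.5 * (min(d[a][c] + d[b][e], d[a][e] + d[b][c])
                  - (d[a][b] + d[c][e]))


def naive_refined_buneman(d):
    """Return the splits U|V whose refined Buneman index is positive.

    d is an n x n symmetric distance matrix (list of lists), n >= 4.
    """
    n = len(d)
    result = []
    # Masks 1 .. 2^(n-1) - 1: taxon n-1 always stays on the V side.
    for mask in range(1, 2 ** (n - 1)):
        U = [i for i in range(n) if ((mask >> i) & 1) == 1]
        V = [i for i in range(n) if ((mask >> i) & 1) == 0]
        if len(U) < 2 or len(V) < 2:
            continue  # assumption: splits without quartets are skipped
        # Q: scores of all quartets induced by the split, in increasing order.
        scores = sorted(
            buneman_score(d, a, b, c, e)
            for a, b in combinations(U, 2)
            for c, e in combinations(V, 2)
        )
        # w: mean of the n - 3 smallest scores.
        w = sum(scores[:n - 3]) / (n - 3)
        if w > 0:
            result.append((frozenset(U), frozenset(V)))
    return result
```

For example, on the four-taxon tree metric `[[0, 2, 3, 3], [2, 0, 3, 3], [3, 3, 0, 2], [3, 3, 2, 0]]`, only the split {0, 1}|{2, 3} has a positive index. The running time is exponential in n, which is acceptable only because the algorithm serves as a reference for small inputs.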

13.2 Implementation highlights

There are two important issues in this algorithm: we need to avoid considering duplicate splits, and we need to avoid considering duplicate quartets. Both have built-in symmetries.

Firstly, let us consider splits as bitvectors. We start with the bitvector containing all zeros and count through the splits by bit-flipping, exactly as when incrementing a binary number: moving from the low-order end towards the high-order end, we flip every bit that is '1' to '0'; when we reach a bit that is '0', we flip it to '1' and stop. The result is the next split. Since a bitvector and its complement describe the same split, we avoid duplicates by counting only through the first n − 1 bits, leaving the nth bit as '0'.
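
The bit-flipping scheme described above can be sketched as follows. This is a minimal illustration, not the author's code: the names `next_split` and `enumerate_splits` are hypothetical, bitvectors are stored as Python lists with the low-order bit at index 0, and the all-zero starting vector itself is assumed not to count as a split.

```python
def next_split(bits, n):
    """Advance bits in place to the next split.

    Returns False once the first n - 1 bits have wrapped around to all zeros.
    """
    for i in range(n - 1):       # the nth bit (index n - 1) is never touched
        if bits[i] == 1:
            bits[i] = 0          # flip every '1' met on the way up
        else:
            bits[i] = 1          # first '0': flip it to '1' and stop
            return True
    return False                 # every counted bit was '1': enumeration done


def enumerate_splits(n):
    """Yield each split of n taxa once, as a tuple of n bits."""
    bits = [0] * n
    while next_split(bits, n):
        yield tuple(bits)
```

Because the nth bit stays '0', no yielded bitvector is the complement of another, so each of the 2^(n−1) − 1 splits appears exactly once.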

Regarding quartets, we need to recognize that quartets are symmetric on either side of their central edge, i.e. the quartet ab|cd is the same quartet as ba|cd, ab|dc and ba|dc. For the split σ = U|V we say that a, b ∈ U and c, d ∈ V, and to avoid duplicates we just have to make sure that a ≤ b and c ≤ d.
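
One possible way to enforce this ordering is sketched below. The function name `quartets_of_split` and the representation of a split as a tuple of bits (with '1' marking membership of U) are assumptions made for illustration. Since the four taxa of a quartet are distinct, the condition a ≤ b amounts to a < b, which is the order in which `itertools.combinations` produces pairs from a sorted list.

```python
from itertools import combinations


def quartets_of_split(bits):
    """Yield each quartet ab|cd of the split once, with a < b and c < d."""
    U = [i for i, bit in enumerate(bits) if bit == 1]   # taxa on one side
    V = [i for i, bit in enumerate(bits) if bit == 0]   # taxa on the other
    for a, b in combinations(U, 2):       # pairs from U with a < b
        for c, d in combinations(V, 2):   # pairs from V with c < d
            yield (a, b, c, d)
```

A split with |U| = k therefore yields C(k, 2) · C(n − k, 2) quartets, and none at all when either side holds fewer than two taxa.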

13.3 Correctness of the reference implementation

We are going to use this reference implementation of the refined Buneman tree algorithm to demonstrate the correctness of our implementation of the refined Buneman tree algorithm described in [BFÖ+03]. But first we must convince ourselves that the reference implementation is correct.

To this end, the author has written a test program for the reference implementation. The test program generates a random distance matrix, and during the computation of the reference implementation the program will output:

• the distance matrix.
