22.01.2015 Views

Refined Buneman Trees


So how do we find σ′? We start by counting T bottom-up with respect to U and V. For every node w ∈ T which hangs from some edge e, we then have that u_w is equal to the value |U′ ∩ U|. Here U′ is the set of leaves which lie in the subtree of w, so u_w is both the number of elements from U in w's subtree and the number of elements that U and U′ have in common. Similarly, we also have that v_w = |U′ ∩ V| by the same argument.

Now we apply basic set theory to find the values |V′ ∩ U| = |U| − |U′ ∩ U| and |V′ ∩ V| = |V| − |U′ ∩ V|, where V′ is the set of leaves that do not lie in the subtree of w. We now have all four quantities and can check whether any one of them is zero. Two splits are compatible exactly when at least one of the four intersections is empty, so w is the node we are looking for (and consequently e is the edge we are looking for) precisely when all four quantities are non-zero.

The time spent searching for an incompatible node is linear. We use linear time decorating T with U/V counts, and we use linear time checking each node for incompatibility. Given u_w, v_w, |U| and |V|, we can determine incompatibility in constant time by calculating the quantities described above.
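The counting and checking steps described above can be sketched as follows. This is a minimal illustration, not the author's implementation: the `Node` class, the function names, and the representation of U and V as Python sets of integer leaf labels are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    # Hypothetical node type: a leaf carries a label, an inner node has children.
    leaf: Optional[int] = None
    children: List["Node"] = field(default_factory=list)
    u: int = 0  # u_w: number of leaves from U in the subtree of this node
    v: int = 0  # v_w: number of leaves from V in the subtree of this node


def decorate(w: Node, U: Set[int], V: Set[int]) -> None:
    """Fill in u_w and v_w for every node, bottom-up (recursive for brevity)."""
    if w.leaf is not None:
        w.u = 1 if w.leaf in U else 0
        w.v = 1 if w.leaf in V else 0
        return
    w.u = w.v = 0
    for c in w.children:
        decorate(c, U, V)
        w.u += c.u
        w.v += c.v


def is_incompatible(w: Node, size_u: int, size_v: int) -> bool:
    """Constant-time test: all four intersections must be non-empty."""
    return (
        w.u > 0                 # |U' ∩ U|
        and w.v > 0             # |U' ∩ V|
        and size_u - w.u > 0    # |V' ∩ U|
        and size_v - w.v > 0    # |V' ∩ V|
    )


def find_incompatible(root: Node, U: Set[int], V: Set[int]) -> Optional[Node]:
    """Return some node whose edge induces a split incompatible with U|V."""
    decorate(root, U, V)
    stack = list(root.children)  # the root itself hangs from no edge
    while stack:
        w = stack.pop()
        if is_incompatible(w, len(U), len(V)):
            return w
        stack.extend(w.children)
    return None
```

Both passes visit every node once, so the sketch keeps the linear bound stated in the text. A production version would likely replace the recursion in `decorate` with an explicit stack to avoid depth limits on unbalanced trees.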

We are not guaranteed to find an incompatible split, but if we do, we can report it in linear time. To do this, we allocate a bitvector b of length n and search through one of the subtrees induced by e. We mark an entry in b corresponding to the index of every leaf we find.
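Reporting the split could look roughly like this. The sketch assumes that leaves are indexed 0 to n − 1 and uses a hypothetical `Node` type and a plain list of booleans in place of a packed bitvector. Neither choice comes from the original text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical node type: a leaf carries its index, an inner node has children.
    leaf: Optional[int] = None
    children: List["Node"] = field(default_factory=list)


def report_split(w: Node, n: int) -> List[bool]:
    """Mark every leaf found in the subtree of w in a bitvector of length n."""
    b = [False] * n
    stack = [w]
    while stack:
        x = stack.pop()
        if x.leaf is not None:
            b[x.leaf] = True   # leaf lies on w's side of the edge e
        stack.extend(x.children)
    return b
```

The marked entries form one side of the split and the unmarked entries form the other. The traversal touches each node of the subtree once, so the work is linear in n.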

6.3 Testing the tree data structure<br />

The tree data structure has not been tested formally. Of course, all tree operations have been unit-tested, and have been found to be correct. Also, performance tests have been run to some extent, but this is not easy: if we want to test the performance of the tree operations, we would need to be able to create highly resolved trees by inserting a large number of compatible splits created at random, which could then be a basis for inserting, deleting or searching for splits. Doing the operations on an unresolved tree would not give a realistic picture of the performance. It is quite easy to generate a larvae-tree by starting out with a bitvector of the form 11000..., and then generating splits of the form 111000..., 1111000..., 11111000... and so on. This was used for informal performance testing, but clearly this kind of tree is biased, and the author has chosen not to present a formal test on this basis. Still, the author claims that the tree data structure does indeed run as specified, referring to the performance test for the whole algorithm: if the tree data structure did not support linear time insertion, deletion and searching, the refined Buneman algorithm would not be able to perform as specified.
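The larvae-tree splits mentioned above can be generated with a few lines. This is a sketch under assumptions: splits are represented as strings of '1' and '0', and the range of k is chosen here so that each side of every split has at least two leaves. The text does not specify either detail.

```python
from typing import List


def larvae_splits(n: int) -> List[str]:
    """Splits of the form 1^k 0^(n-k) for k = 2 .. n-2, as bit strings."""
    return ["1" * k + "0" * (n - k) for k in range(2, n - 1)]


def compatible(s: str, t: str) -> bool:
    """Two splits are compatible if one of the four side intersections is empty."""
    pairs = set(zip(s, t))
    return len(pairs) < 4  # some combination of sides never occurs together
```

For example, `larvae_splits(6)` yields `110000`, `111000` and `111100`. The sets of ones are nested, so every pair of generated splits is compatible and all of them can be inserted into the same tree.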
