13.07.2015 Views

computing the quartet distance between general trees



5.3. IMPLEMENTING LEAF SET ALGORITHMS

whereas the star tree, having n edges, would be faster to deal with. In between, sqrt-trees should be faster than wc-trees, again due to the number of edges.

This algorithm has a quadratic space consumption, due to the table used for storage, which should not cause any problems with regard to memory. As an example, think of two binary trees of 800 leaves. The table has an entry for each pair of subtrees, which is equal to each pair of directed edges. Since a binary tree has |E| = 2n − 3 edges, the table size is roughly:

2|E| × 2|E| × size_of_int ≈ (2 × 1600) × (2 × 1600) × 4 bytes ≈ 41 MB.

This will be no problem for the test environment described in Sec. 2.2, which will have plenty of memory to spare.

Result  Figures 5.2 and 5.3 display the results of the two implementations, Python and C++ respectively, applied to the five types of trees. The first thing to observe is the correctness of the time bound. Every plot is parallel to the line indicating quadratic growth. Furthermore, the estimated exponents of the expressions describing the plots are very close to 2. Next, the expectations about the order among the trees hold true. It seems to be correct that more edges lead to higher processing time. The difference between the best and worst results is as large as a factor of ten.

What is less clear, because of the log-log plot, is that the absolute difference becomes more pronounced the larger the trees become. The distance between the plots remains the same, but one step on an axis represents an increasingly large absolute value when moving along the axis. This is of course a consequence of the quadratic growth.

Last and less important, comparing the two figures shows that the C++ implementation is clearly faster than the Python implementation.
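The memory estimate for the table can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the text: the function name `table_size_bytes` and its parameters are made up for illustration, and the 4-byte integer size is the value assumed in the estimate above.

```python
def table_size_bytes(n_leaves_a, n_leaves_b, size_of_int=4):
    """Rough size of a table with one entry per pair of directed edges.

    Assumes both trees are unrooted binary trees, so |E| = 2n - 3,
    and that each entry is one integer of size_of_int bytes.
    """
    directed_a = 2 * (2 * n_leaves_a - 3)  # each edge yields two directed edges
    directed_b = 2 * (2 * n_leaves_b - 3)
    return directed_a * directed_b * size_of_int


estimate = table_size_bytes(800, 800)
print(f"{estimate / 1e6:.1f} MB")  # the text rounds |E| up to 1600 and gets about 41 MB
```

Using the exact edge count 1597 instead of the rounded 1600 gives a slightly smaller figure, which does not change the conclusion.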
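The two observations about the log-log plots, an exponent close to 2 and a constant distance between curves despite a growing absolute gap, can be illustrated with a small sketch. The timings below are synthetic values generated from made-up constants, not the measurements behind Figures 5.2 and 5.3, and `estimate_exponent` is a hypothetical helper that fits a straight line to the logarithms of the data.

```python
import math


def estimate_exponent(sizes, times):
    """Least-squares slope of log(time) against log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic quadratic timings (assumed constants) for a "best" and a "worst"
# tree type that differ by a factor of ten.
sizes = [100, 200, 400, 800]
best = [1e-6 * n ** 2 for n in sizes]
worst = [1e-5 * n ** 2 for n in sizes]

print(estimate_exponent(sizes, best))   # slope of the fitted line on the log-log plot
print(estimate_exponent(sizes, worst))

# The ratio stays the same (equal distance on a log-log plot),
# while the absolute gap grows quadratically with the tree size.
ratios = [w / b for w, b in zip(worst, best)]
gaps = [w - b for w, b in zip(worst, best)]
print(ratios)
print(gaps)
```

With real measurements the fitted slope would only be approximately 2, since constant overheads and noise bend the curve for small inputs.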
