computing the quartet distance between general trees
5.3. IMPLEMENTING LEAF SET ALGORITHMS

whereas the star tree, having n edges, would be faster to deal with. In between, sqrt-trees should be faster than wc-trees, again due to the number of edges.

This algorithm has quadratic space consumption, due to the table being used for storage, which should not cause any problems with regard to memory. As an example, think of two binary trees of 800 leaves. The table has an entry for each pair of subtrees, that is, for each pair of directed edges. Since a binary tree has |E| = 2n − 3 edges, roughly 1600 for n = 800, the table size is approximately:

2|E| × 2|E| × size_of_int ≈ 2 × 1600 × 2 × 1600 × 4 Bytes ≈ 41 MB.

This will be no problem for the test environment described in Sec. 2.2, which will have plenty of memory to spare.

Result  Figures 5.2 and 5.3 display the results of the two implementations, Python and C++ respectively, being applied to the five types of trees. The first thing to observe is the correctness of the time bound. Every plot is parallel to the line indicating quadratic growth. Furthermore, the estimated exponents of the expressions describing the plots are very close to 2. Next, the expectations about the order among the trees hold true. It seems to be correct that more edges lead to higher processing time. There is actually as large a difference as a factor of ten between the best and worst results.
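The table-size estimate for two binary trees of 800 leaves can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function names `directed_edge_count` and `table_bytes` are hypothetical, and the 4-byte integer size is the assumption used in the estimate, not a property of any particular implementation.

```python
def directed_edge_count(n_leaves: int) -> int:
    """Directed edges of an unrooted binary tree with n_leaves leaves."""
    undirected = 2 * n_leaves - 3  # |E| = 2n - 3, as stated in the text
    return 2 * undirected          # each edge is counted once per direction


def table_bytes(n_leaves: int, size_of_int: int = 4) -> int:
    """Bytes for a table holding one integer per pair of directed edges,
    one edge taken from each of two binary trees of equal size."""
    d = directed_edge_count(n_leaves)
    return d * d * size_of_int


if __name__ == "__main__":
    b = table_bytes(800)
    print(f"{b} bytes = {b / 1e6:.1f} MB")
```

Using the exact edge count 2n − 3 = 1597 instead of the rounded 1600 gives a slightly smaller figure, which still rounds to about 41 MB.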
What is not as clear, because of the log-log plot, is that the difference is more pronounced the larger the trees become. This is the case, because the distance between plots remains the same, even though one step on an axis becomes increasingly significant when moving along the axis. This is of course a consequence of the quadratic development. Last and less important, we can compare the two figures and see that the C++ implementation is clearly faster than the Python implementation.
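The two observations about the log-log plots, an exponent close to 2 and a constant vertical distance that hides a growing absolute difference, can be illustrated with a small sketch. The timings below are synthetic and are not the measurements behind Figures 5.2 and 5.3; the helper `loglog_slope` is a hypothetical name, and an ordinary least-squares fit is assumed as the way to estimate the exponent.

```python
import math


def loglog_slope(sizes, times):
    """Least-squares slope of log(time) against log(size).

    For timings of the form t = c * n**k the slope equals k.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Synthetic timings, NOT measurements from the thesis: two quadratic
# curves whose constants differ by a factor of ten.
sizes = [100, 200, 400, 800]
fast = [1e-6 * n ** 2 for n in sizes]
slow = [1e-5 * n ** 2 for n in sizes]

if __name__ == "__main__":
    print("slopes:", loglog_slope(sizes, fast), loglog_slope(sizes, slow))
    for n, f, s in zip(sizes, fast, slow):
        # The ratio (vertical distance in the log-log plot) stays the same,
        # while the absolute difference grows with n**2.
        print(n, s / f, s - f)
```

Both curves yield a slope of 2 and a constant ratio of ten, yet the absolute gap between them grows sixteen-fold each time n is quadrupled, which is the effect the log-log scale conceals.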