13.07.2015

Computing the quartet distance between general trees

7.2 Implementation

This section describes my work on implementing the sub-cubic algorithm of Sec. 7.1. As mentioned, it comes down to implementing two algorithms, shared_B(T, T′) and diff_B(T, T′). They count the butterflies that are shared by the two trees T and T′ and the butterflies that differ between them, respectively.

The total number of butterflies in a single tree is calculated with the algorithm for counting shared butterflies, by comparing the tree with itself. This means that the basic preprocessing, namely calculating the shared leaf set sizes, has to be done three times: once for counting shared and different butterflies in the two trees, once for counting the butterflies in T, and once for counting the butterflies in T′. The implementation of the shared leaf set size calculation is described in Sec. 5.3. It is the most space-consuming part of the sub-cubic algorithm and results in O(n²) memory usage.

The paper [14] presents the algorithms as follows: first do the preprocessing for each pair of inner nodes, then do the counting for each pair of directed edges. Instead, I process each pair of inner nodes in turn. For each pair I perform both the preprocessing and the counting for every pair of directed edges adjacent to the two nodes. Consequently, the only preprocessing done before the nodes are processed is the calculation of the shared leaf set sizes (Sec. 5.3). The full set of preprocessing information needed to handle a pair of nodes is listed in App. A.
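The shared leaf set sizes dominate memory because they form a table indexed by pairs of nodes (or directed edges) of the two trees. The thesis's own implementation is the one in Sec. 5.3 and is not reproduced here. What follows is only a minimal C++ sketch of one standard way to fill such a table in O(n²) time and space, under several assumptions:

- `RootedTree`, `sharedLeafSetSizes` and `sharedSides` are hypothetical names invented for illustration.
- Each tree is stored as children lists with leaf labels in [0, n), and both trees have the same leaf labels.
- The unrooted, directed-edge case is recovered by rooting each tree and applying inclusion-exclusion to the complement side of each edge.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical input format (an assumption of this sketch): children[u] lists
// the children of node u; leafLabel[u] is a label in [0, n) for leaves and -1
// for inner nodes. An unrooted tree can be stored this way after rooting it.
struct RootedTree {
    std::vector<std::vector<int>> children;
    std::vector<int> leafLabel;
    int root;
};

// Returns the nodes so that every child comes before its parent.
static std::vector<int> postOrder(const RootedTree& t) {
    std::vector<int> order, stack{t.root};
    while (!stack.empty()) {
        const int u = stack.back();
        stack.pop_back();
        order.push_back(u);  // parents are recorded before their children
        for (int c : t.children[u]) stack.push_back(c);
    }
    return std::vector<int>(order.rbegin(), order.rend());  // reverse it
}

// table[u][v] = |L(u) intersect L(v)|, where L(x) is the set of leaf labels
// below x. One entry per pair of nodes: O(n^2) time and memory.
std::vector<std::vector<int>> sharedLeafSetSizes(const RootedTree& t1,
                                                 const RootedTree& t2) {
    assert(t1.leafLabel.size() == t1.children.size());
    assert(t2.leafLabel.size() == t2.children.size());
    std::vector<std::vector<int>> table(t1.children.size(),
                                        std::vector<int>(t2.children.size(), 0));
    const std::vector<int> order1 = postOrder(t1);
    for (int v : postOrder(t2)) {
        if (t2.leafLabel[v] >= 0) {
            // v is a leaf of t2: a leaf u of t1 scores 1 iff it carries v's
            // label, and inner nodes of t1 sum their children's entries.
            for (int u : order1) {
                if (t1.leafLabel[u] >= 0)
                    table[u][v] = (t1.leafLabel[u] == t2.leafLabel[v]) ? 1 : 0;
                else
                    for (int c : t1.children[u]) table[u][v] += table[c][v];
            }
        } else {
            // v is inner: L(v) is the disjoint union of its children's leaf sets.
            for (std::size_t u = 0; u < table.size(); ++u)
                for (int w : t2.children[v]) table[u][v] += table[u][w];
        }
    }
    return table;
}

// Unrooted view: the edge (parent(u), u) splits the leaves into L(u) and its
// complement. Inclusion-exclusion yields every combination of sides from the
// rooted table. Assumes both trees have the same leaf labels.
int sharedSides(const std::vector<std::vector<int>>& table, int root1, int root2,
                int u, bool complement1, int v, bool complement2) {
    const int n = table[root1][root2];  // total number of leaves
    const int a = table[u][root2];      // |L(u)|
    const int b = table[root1][v];      // |L(v)|
    const int ab = table[u][v];         // |L(u) intersect L(v)|
    if (!complement1 && !complement2) return ab;
    if (complement1 && !complement2) return b - ab;
    if (!complement1 && complement2) return a - ab;
    return n - a - b + ab;
}
```

Following the text, such a routine would be run three times, for (T, T′), (T, T) and (T′, T′). Each call allocates a table with one entry per pair of nodes, which is where the quadratic memory goes in this sketch.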
Some of this preprocessing information is needed by both shared_B(T, T′) and diff_B(T, T′), and some is specific to the latter. Therefore, I implemented the two together in a single function capable of returning both counts. When both values are needed, the preprocessing specific to counting different butterflies is simply an extension of the preprocessing done for shared butterflies.

Implementing shared_B(T, T′) is straightforward following the recipe given in Sec. 7.1.2, Sec. 7.1.4 and App. A. I loop through every pair of internal nodes and fill out all the preprocessing tables needed for shared quartets only. Then I go through the pairs of directed edges adjacent to the two nodes, apply Eq. (7.8) and add the result to the running total of shared butterflies. I use the Boost¹ library for computing binomial coefficients. Before the total number of shared butterflies is returned, it is divided by 4. One factor of 2 is due to symmetry (see the clarification in Contribution 7.1). The other is because there are twice as many directed edges as undirected edges. Note that only edges pointing to internal nodes need to be considered, since there must be at least two leaves behind an edge (e.g. in the subtree F_i in Fig. 7.3) to form a butterfly.

Implementing diff_B(T, T′) requires more attention. If the analysis given in the paper

¹ Boost: http://www.boost.org/