13.07.2015 Views

computing the quartet distance between general trees



that we can look up in constant time:

$$
\begin{aligned}
|T'_{rest} \cap T_a| &= |T_a| - \left(|T_a \cap T'_a| + |T_a \cap T'_b| + |T_a \cap T'_c|\right) \\
|T'_{rest} \cap T_b| &= |T_b| - \left(|T_b \cap T'_a| + |T_b \cap T'_b| + |T_b \cap T'_c|\right) \\
|T'_{rest} \cap T_c| &= |T_c| - \left(|T_c \cap T'_a| + |T_c \cap T'_b| + |T_c \cap T'_c|\right)
\end{aligned}
\tag{6.3}
$$

The last expression is derived directly from the number of leaves in $T'$:

$$
|T'_{rest}| = n - \left(|T'_a| + |T'_b| + |T'_c|\right) \tag{6.4}
$$

We are now able to compute the number of shared star quartets, and thus the overall number of shared quartets of a triplet, in constant time. Combining this with the approach of finding all centers associated with a pair of leaves in linear time yields a running time of $O(n^3)$ for the entire algorithm. Because of the table of shared leaf set sizes, the algorithm requires $O(n^2)$ space.

As with the quartic algorithm, the shared quartets are counted too many times. Here, however, each quartet is counted twelve times. This is because we work with pairs: each quartet is considered once for each of the six pairs that can be formed from its four leaves, and each of those pairs is used to construct two triplets, one for each of the two remaining leaves. Therefore, the number of shared quartet topologies counted is divided by twelve.

See Alg. 6.1 for an outline of the algorithm.

Contribution 6.1 Contrary to the description in Section 2.2 of Christiansen et al. [6], which states that each quartet is counted four times, the cubic algorithm actually counts each quartet twelve times, as explained in the text. This is a minor change, but one that must be kept in mind to get the correct result when implementing the algorithm. I made the discovery while working on my implementation, seeing that the result was consistently triple what was expected.

6.1 Implementation

The cubic algorithm has been implemented first in Python and later in C++. Since no unusual tricks or language features are required, the two implementations are very similar.

The first step is the implementation of the algorithms for calculating the leaf sets (lines 1 and 2 of Alg. 6.1). These have been described and tested in Sec. 5.3.
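The constant-time lookups of Eq. (6.3) and (6.4), and the factor of twelve, can be illustrated with a minimal Python sketch. It is not the thesis implementation: the names `rest_sizes` and `times_counted`, the dictionary layout of the shared-size table, and the toy leaf sets are all assumptions made for illustration. The sketch also assumes that $T'_a$, $T'_b$, $T'_c$ and $T'_{rest}$ partition the $n$ leaves.

```python
from itertools import combinations


def rest_sizes(n, t_sizes, tp_sizes, shared):
    """Apply Eq. (6.3) and (6.4) using only precomputed sizes.

    t_sizes[x]   = |T_x|          for x in 'abc'
    tp_sizes[y]  = |T'_y|         for y in 'abc'
    shared[x][y] = |T_x n T'_y|   (entry of the shared leaf set size table)
    """
    # Eq. (6.3): leaves of T_x that fall in none of T'_a, T'_b, T'_c.
    rest_cap = {
        x: t_sizes[x] - sum(shared[x][y] for y in "abc") for x in "abc"
    }
    # Eq. (6.4): leaves of T' outside T'_a, T'_b and T'_c.
    rest_total = n - sum(tp_sizes[y] for y in "abc")
    return rest_cap, rest_total


def times_counted(quartet):
    """Count the (pair, third leaf) choices that revisit one quartet."""
    return sum(
        1
        for pair in combinations(quartet, 2)
        for third in quartet
        if third not in pair
    )


# Toy data (hypothetical): ten leaves, three subtrees on each side.
leaves = set(range(10))
T = {"a": {0, 1, 2}, "b": {3, 4}, "c": {5, 6, 7}}
Tp = {"a": {0, 3}, "b": {1, 4, 5}, "c": {6}}
Tp_rest = leaves - Tp["a"] - Tp["b"] - Tp["c"]

t_sizes = {x: len(s) for x, s in T.items()}
tp_sizes = {y: len(s) for y, s in Tp.items()}
shared = {x: {y: len(T[x] & Tp[y]) for y in "abc"} for x in "abc"}

rest_cap, rest_total = rest_sizes(len(leaves), t_sizes, tp_sizes, shared)

# The formulas agree with the explicit set intersections.
assert all(rest_cap[x] == len(T[x] & Tp_rest) for x in "abc")
assert rest_total == len(Tp_rest)

# Six pairs times two remaining leaves gives the divisor used in the text.
print(times_counted(("w", "x", "y", "z")))
```

In this sketch the explicit sets are used only to build the table and to check the result; `rest_sizes` itself reads a fixed number of table entries, which is what makes the lookup constant time.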
