computing the quartet distance between general trees
that we can look up in constant time:

$$
\begin{aligned}
|T'_{\mathrm{rest}} \cap T_a| &= |T_a| - \left(|T_a \cap T'_a| + |T_a \cap T'_b| + |T_a \cap T'_c|\right) \\
|T'_{\mathrm{rest}} \cap T_b| &= |T_b| - \left(|T_b \cap T'_a| + |T_b \cap T'_b| + |T_b \cap T'_c|\right) \\
|T'_{\mathrm{rest}} \cap T_c| &= |T_c| - \left(|T_c \cap T'_a| + |T_c \cap T'_b| + |T_c \cap T'_c|\right)
\end{aligned}
\tag{6.3}
$$

The last expression is derived directly from the number of leaves in $T'$:

$$
|T'_{\mathrm{rest}}| = n - \left(|T'_a| + |T'_b| + |T'_c|\right) \tag{6.4}
$$

We are now able to compute the number of shared star quartets, and thus the overall number of shared quartets of a triplet, in constant time. Combining this with the approach of finding all centers associated with a pair of leaves in linear time yields a running time of $O(n^3)$ for the entire algorithm. Because of the table of shared leaf set sizes, the algorithm requires $O(n^2)$ space.

As with the quartic algorithm, the shared quartets are counted too many times. Here, however, each quartet is counted twelve times. This is because we deal with pairs: each quartet is considered once for each of the six pairs that can be formed from its four leaves, and each of those pairs is used to construct two triplets, one for each of the two remaining leaves. Therefore, the number of shared quartet topologies counted is divided by twelve.

See Alg. 6.1 for an outline of the algorithm.

Contribution 6.1 Despite the description in Section 2.2 of Christiansen et al. [6], which states that each quartet is counted four times, the cubic algorithm actually counts each quartet twelve times, as explained in the text. This is a minor change, but it must be kept in mind to get the correct result when implementing the algorithm. I made the discovery while working on my implementation, seeing that the result was consistently three times what was expected.

6.1 Implementation

The cubic algorithm has been implemented first in Python and later in C++. Since no unusual tricks or language features are required, the two implementations are very similar.

The first step is the implementation of the algorithms for calculating the leaf sets (lines 1 and 2 of Alg. 6.1). These have been described and tested in Sec. 5.3.
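As an illustration of the constant-time step in Eq. (6.3) and (6.4) and of the twelve-fold overcount, the following is a minimal Python sketch. It is not the implementation described in this chapter: the function names `rest_intersections`, `rest_size` and `times_counted`, as well as the dictionary layout of the shared-size table, are assumptions made for the example.

```python
from itertools import combinations

SUBTREES = "abc"  # labels of the three named subtrees around a center


def rest_intersections(size_T, shared):
    """Eq. (6.3): |T'_rest ∩ T_x| for x in a, b, c.

    size_T[x]    : |T_x|, size of subtree x of the center in T
    shared[x][y] : |T_x ∩ T'_y|, assumed to come from a precomputed table
    """
    return {
        x: size_T[x] - sum(shared[x][y] for y in SUBTREES)
        for x in SUBTREES
    }


def rest_size(n, size_Tp):
    """Eq. (6.4): |T'_rest| = n - (|T'_a| + |T'_b| + |T'_c|)."""
    return n - sum(size_Tp[y] for y in SUBTREES)


def times_counted(quartet):
    """Number of (pair, third leaf) combinations that visit one quartet."""
    count = 0
    for pair in combinations(quartet, 2):        # six pairs of leaves
        for _third in set(quartet) - set(pair):  # two remaining leaves
            count += 1
    return count


# Hypothetical sizes for one pair of centers in trees with n = 10 leaves.
size_T = {"a": 3, "b": 2, "c": 1}
size_Tp = {"a": 2, "b": 3, "c": 1}
shared = {
    "a": {"a": 1, "b": 1, "c": 0},
    "b": {"a": 0, "b": 1, "c": 0},
    "c": {"a": 0, "b": 0, "c": 1},
}
rest = rest_intersections(size_T, shared)
rest_total = rest_size(10, size_Tp)
overcount = times_counted(("w", "x", "y", "z"))
```

Each lookup touches a fixed number of table entries, which is why the per-triplet work stays constant. The value returned by `times_counted` is the factor by which the accumulated count would be divided at the end.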