13.07.2015 Views

computing the quartet distance between general trees


Chapter 6

Cubic time algorithm

Here I will describe another algorithm for quartet distance computation, by Christiansen et al. [6], that improves the solution to a cubic running time at the expense of an increased space consumption. The algorithm is based on the concept of shared leaf set sizes, introduced in Section 5.2, and further extends the idea of using centers, as introduced along with the quartic algorithm described in Chapter 4.

We are interested in the number of quartets for which the topology differs between the two trees. Therefore, we might as well count or calculate the number of quartets that share the same topology and then subtract this number from the overall number of quartets, $\binom{n}{4}$.

Having calculated the shared leaf set sizes, the number of leaves common to two subtrees, $|T_x \cap T_{x'}|$, can be found with a constant time look-up. This will be used extensively to calculate the number of shared quartets containing some triplet of leaves, (a,b,c), in constant time. However, the main idea of the algorithm is to process pairs of leaves, (a,b), of which there are $O(n^2)$, and then, in linear time, to calculate the number of shared quartets that include a given pair. This is done by considering all internal nodes on the path from a to b as centers. These centers can clearly be found in linear time.
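
The counting argument above can be restated compactly. The symbols $d$ and $S$ are names introduced here for illustration only and are not notation taken from the source: $S(T, T')$ stands for the number of quartets with the same topology in both trees.

$$
d(T, T') = \binom{n}{4} - S(T, T'), \qquad \underbrace{\binom{n}{2}}_{\text{pairs } (a,b)} \cdot\; O(n) = O(n^2) \cdot O(n) = O(n^3).
$$

As a purely hypothetical numeric illustration: with $n = 6$ leaves there are $\binom{6}{4} = 15$ quartets, so if 11 of them had the same topology in both trees, the quartet distance would be $15 - 11 = 4$.
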
Every leaf c, different from a and b, can be reached from exactly one of the centers C, by following an outgoing edge from C that is not part of the path between a and b. See Fig. 6.1.

In linear time, a path between a and b is found and an array is computed that stores, in entry i, the center of the triplet (a,b,i). This is done for both trees. The arrays are linear in size, and for each leaf i the pair of centers of the triplet (a,b,i), one from each tree, is processed in constant time. Each pair (a,b) is thus handled in linear time, which gives an overall running time of $O(n^3)$. The remaining part of the algorithm is a recipe for constant time computation of the number of shared quartets containing a triplet (a,b,c), given the two centers of the triplet, C in T and C′ in T′.
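
The center array for one pair (a,b) in one tree can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation from Christiansen et al.: the tree is assumed to be an adjacency dictionary in which leaves are the nodes with exactly one neighbour, a dictionary keyed by leaf stands in for the array, and the names `path_between`, `centers_for_pair` and `example_tree` are hypothetical.

```python
from collections import deque


def path_between(adj, a, b):
    """Return the nodes on the unique path from a to b (BFS, linear time)."""
    parent = {a: None}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        if u == b:
            break
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    # Walk the parent pointers back from b to a, then reverse.
    path = []
    node = b
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return path


def centers_for_pair(adj, a, b):
    """Map every leaf c other than a and b to the center of the triplet (a, b, c)."""
    path = path_between(adj, a, b)
    visited = set(path)          # path nodes are never entered from a side branch
    center_of = {}
    for center in path[1:-1]:    # the internal nodes on the path are the centers
        stack = [center]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in visited:
                    visited.add(v)
                    if len(adj[v]) == 1:   # assumption: degree-1 nodes are leaves
                        center_of[v] = center
                    stack.append(v)
    return center_of


# Hypothetical example tree: leaves a..e, internal nodes x and y.
example_tree = {
    "a": ["x"], "c": ["x"],
    "b": ["y"], "d": ["y"], "e": ["y"],
    "x": ["a", "c", "y"],
    "y": ["x", "b", "d", "e"],
}

if __name__ == "__main__":
    print(path_between(example_tree, "a", "b"))
    print(centers_for_pair(example_tree, "a", "b"))
```

Every node is visited at most once across all centers, so the sketch runs in linear time per pair, in line with the bound stated above. Running it once per tree yields the two centers for each third leaf that the constant time step takes as input.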
