13.07.2015

computing the quartet distance between general trees

It is evident that within binary trees, a quartet can only inherit one of the three butterfly topologies; however, with the inclusion of the star topology, the method works just as well on general trees. This means that the quartet distance can be used with trees that include partly resolved relationships, i.e. polytomies, but the two trees must specify exactly the same set of leaves.

The quartet distance does not consider branch lengths but focuses solely on topological properties. It works equally well with rooted and unrooted trees, since a rooted tree can be interpreted as unrooted and a quartet will inherit the same topology in both. However, in a rooted tree even a group of only three leaves can have more than one topology, which means that a triplet distance is also justified; see Dobson [11].

1.3 Overview of algorithms for quartet distance computation

In computer science, the tree is a widely used data structure. It comes in numerous variants and has countless applications, and algorithms working on trees have therefore been studied in great detail. Algorithms for calculating the quartet distance between evolutionary trees are thus just another application, and while they benefit from previous research, they might end up being of use in areas completely different from phylogenetics.
Likewise, the focus of this thesis will be purely algorithmic rather than on applications in bioinformatics.

There are $\binom{n}{4} \in O(n^4)$ unique quartets in a tree with $n$ leaves, which makes quartet distance calculation a computationally heavy problem. Computing the quartet distance naively, by explicitly inspecting the topologies of the $O(n^4)$ quartets in the two trees, takes $O(n^5)$ time, since determining the topology of a single quartet requires $O(n)$ time.

Several algorithms have been designed over the years, resulting in dramatic improvements in running time. The focus has been on calculating the quartet distance between binary trees, which do not include star quartets and seem to be less complex to handle. Steel and Penny [16] showed how to calculate the quartet distance in $O(n^3)$ time. Bryant et al. [4] improved this to $O(n^2)$ and introduced some concepts important to this thesis. The work of Brodal et al. [3] has also been important to this thesis and resulted in the fastest known algorithm for binary trees, with a time bound of $O(n \log n)$.

For general trees, Bansal et al. [1] describe an $O(n^2)$ time 2-approximation algorithm, but this thesis will only deal with exact quartet distance calculation. Christiansen et al. [6] present three algorithms with running times of $O(n^4)$, $O(n^3)$ and $O(n^2 d^2)$, respectively, where $d$ is the maximum degree of any node in the two trees. Stissing et al. [17] present an $O(d^9 n \log n)$ time algorithm.
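The naive $O(n^5)$ approach described above can be made concrete with a short Python sketch. It assumes unrooted trees stored as adjacency maps in which leaves are the degree-one nodes, and it uses the fact that a quartet {a, b, c, d} induces the butterfly ab|cd exactly when the a-b and c-d paths share no node; if none of the three pairings qualifies, the quartet is a star, which can only happen in a general tree. The helper names (`tree_from_edges`, `path_nodes`, `quartet_topology`, `quartet_distance`) and the example trees are hypothetical and not taken from the thesis; this is a baseline for checking faster algorithms, not one of the surveyed methods.

```python
from collections import deque
from itertools import combinations


def tree_from_edges(edges):
    """Build an unrooted tree as an adjacency map {node: set of neighbours}."""
    tree = {}
    for u, v in edges:
        tree.setdefault(u, set()).add(v)
        tree.setdefault(v, set()).add(u)
    return tree


def path_nodes(tree, u, v):
    """Return the set of nodes on the unique u-v path (BFS, O(n) time)."""
    parent = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            break
        for y in tree[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    nodes, x = set(), v
    while x is not None:  # walk back from v to u along parent pointers
        nodes.add(x)
        x = parent[x]
    return nodes


def quartet_topology(tree, a, b, c, d):
    """Return the butterfly induced on {a, b, c, d}, or "star" if unresolved.

    xy|zw is the butterfly exactly when the x-y and z-w paths share no node;
    in a general tree none of the three pairings may qualify (a star quartet).
    """
    for (x, y), (z, w) in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
        if path_nodes(tree, x, y).isdisjoint(path_nodes(tree, z, w)):
            return frozenset({frozenset({x, y}), frozenset({z, w})})
    return "star"


def quartet_distance(t1, t2, leaves):
    """Naive quartet distance: O(n^4) quartets, O(n) work each, O(n^5) total."""
    return sum(
        quartet_topology(t1, *q) != quartet_topology(t2, *q)
        for q in combinations(sorted(leaves), 4)
    )


# Hypothetical example trees on the leaf set {a, b, c, d, e}.
leaves = ["a", "b", "c", "d", "e"]
t1 = tree_from_edges([("a", "x"), ("b", "x"), ("x", "y"), ("c", "y"),
                      ("y", "z"), ("d", "z"), ("e", "z")])  # ((a,b),c,(d,e))
t2 = tree_from_edges([("a", "x"), ("c", "x"), ("x", "y"), ("b", "y"),
                      ("y", "z"), ("d", "z"), ("e", "z")])  # ((a,c),b,(d,e))
star = tree_from_edges([(leaf, "r") for leaf in leaves])    # a single polytomy

print(quartet_distance(t1, t2, leaves))    # 2
print(quartet_distance(t1, star, leaves))  # 5
```

Comparing `t1` with `t2` differs only on {a, b, c, d} and {a, b, c, e}, while every quartet of the binary tree `t1` differs from the all-star quartets of `star`, giving the maximum distance $\binom{5}{4} = 5$.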
Note that the running times of some of the algorithms depend on the degree of the internal nodes, whereas others are independent of this factor and
