computing the quartet distance between general trees
Chapter 5

Calculating leaf set sizes

Calculating the size of every subtree leaf set in a given tree, that is, the number of leaves contained in the subtree, is an essential part of two of the algorithms studied in this thesis, namely the cubic time algorithm of Chap. 6 and, most importantly, the sub-cubic algorithm of Chap. 7, which is the central topic of the thesis. So is the calculation of shared leaf set sizes between two trees. A shared leaf set is the intersection of a pair of subtree leaf sets. It turns out that the subtree leaf set sizes are needed for fast calculation of the shared leaf set sizes. Christiansen and Randers [5] study these calculations for different types of trees and have been my primary inspiration. Here I will focus solely on outlining the approach and the aspects important to this thesis.

5.1 Subtree leaf set sizes

The calculation of all subtree leaf set sizes can be done in time O(n). Consider some general unrooted tree T, like the one in Fig. 5.1(a). If T is rooted in one of its internal nodes, e.g. r, one can consider directed edges as either pointing away from r or towards r, like e and e_2 respectively, see Fig. 5.1(b). Each directed edge represents a subtree. The subtree F, represented by e, does not contain r, and the subtree F̄, represented by the opposite edge, ē, does contain r.

The leaf set size of every subtree that, like F, does not contain r can be calculated recursively using the following recipe. Make a depth-first traversal of all nodes, v ∈ T, starting at r.
For each node v other than r we look at the subtree defined by the edge entering v and pointing away from r. If v is a leaf node, the leaf set size is 1. If v is an internal node, the leaf set size is the sum of the leaf set sizes of the subtrees pointing downwards from v. After this traversal, half of the leaf set sizes have been calculated, namely those of the subtrees that do not contain r. The calculation takes O(n) time, since a tree contains a linear number of nodes.
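
The recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the thesis: the tree is assumed to be given as an adjacency dictionary, the function name leaf_set_sizes and the example node labels are hypothetical, and the traversal is written iteratively rather than recursively to avoid recursion limits on deep trees.

```python
def leaf_set_sizes(adj, root):
    """Leaf set sizes of all subtrees that do not contain `root`.

    `adj` maps each node to a list of its neighbours (assumed input format).
    Returns a dict mapping each directed edge (u, v) pointing away from
    `root` to the number of leaves in the subtree below v.
    """
    sizes = {}
    # Each stack entry is (node, parent, children_already_visited).
    stack = [(root, None, False)]
    while stack:
        node, parent, children_done = stack.pop()
        children = [w for w in adj[node] if w != parent]
        if not children_done:
            # Revisit this node once all of its children are finished.
            stack.append((node, parent, True))
            for child in children:
                stack.append((child, node, False))
        elif parent is not None:
            if not children:
                # A leaf node: its leaf set has size 1.
                sizes[(parent, node)] = 1
            else:
                # An internal node: sum over the subtrees pointing downwards.
                sizes[(parent, node)] = sum(sizes[(node, c)] for c in children)
    return sizes


# Hypothetical example: a general tree with internal nodes "r" and "a".
example_adj = {
    "r": ["a", "l1", "l2"],
    "a": ["r", "l3", "l4", "l5"],
    "l1": ["r"], "l2": ["r"],
    "l3": ["a"], "l4": ["a"], "l5": ["a"],
}
example_sizes = leaf_set_sizes(example_adj, "r")
```

In this example the edge from r to a represents a subtree with three leaves, and every edge entering a leaf represents a subtree of size 1. Each node is pushed and popped a constant number of times and each adjacency list is scanned a constant number of times, so the sketch runs in time linear in the number of nodes.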