13.07.2015 Views

computing the quartet distance between general trees


Chapter 5

Calculating leaf set sizes

Calculating the size of every subtree leaf set in a given tree, that is, the number of leaves contained in the subtree, is an essential part of two of the algorithms studied in this thesis, namely the cubic time algorithm of Chap. 6 and, most importantly, the sub-cubic algorithm of Chap. 7, which is the central topic of the thesis. So is the calculation of shared leaf set sizes between two trees. A shared leaf set is the intersection of a pair of subtree leaf sets, one from each tree. It turns out that the former is needed for fast calculation of the latter.

Christiansen and Randers [5] study these calculations for different types of trees and have been my primary inspiration. Here I will focus solely on outlining the approach and the aspects important to this thesis.

5.1 Subtree leaf set sizes

The calculation of all subtree leaf set sizes can be done in time O(n). Consider some general unrooted tree T, like the one in Fig. 5.1(a). If T is rooted in one of its internal nodes, e.g. r, one can consider directed edges as either pointing away from r or towards r, like e and e2 respectively, see Fig. 5.1(b). Each directed edge represents a subtree. The subtree F, represented by e, does not contain r, and the subtree F̄, represented by the opposite edge, ē, does contain r.

The leaf set size of every subtree that, like F, does not contain r can be calculated recursively using the following recipe. Make a depth-first traversal of all nodes v ∈ T, starting at r. For each node v we look at the subtree defined by the edge entering v and pointing away from r. If v is a leaf node, the leaf set size is 1. If v is an internal node, the leaf set size is the sum of the leaf set sizes of the subtrees pointing downwards from v. After this traversal, half of the leaf set sizes have been calculated, namely those of the edges pointing away from r, and the calculation takes O(n) time, since a tree contains a linear number of nodes.
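The depth-first recipe above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation used in the thesis: the edge-list input, the function name `subtree_leaf_set_sizes`, and the example tree are all hypothetical, and the traversal is written iteratively to avoid recursion limits on deep trees.

```python
from collections import defaultdict


def subtree_leaf_set_sizes(edges, root):
    """Leaf set sizes for all directed edges pointing away from `root`.

    `edges` is a list of undirected (a, b) pairs describing a tree, and
    `root` is assumed to be an internal node. The result maps each
    directed edge (parent, child) to the number of leaves in the subtree
    that the edge represents.
    """
    adjacency = defaultdict(list)
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    sizes = {}
    # Each stack entry is (node, parent, children_already_processed).
    stack = [(root, None, False)]
    while stack:
        node, parent, children_done = stack.pop()
        children = [w for w in adjacency[node] if w != parent]
        if not children_done:
            # Revisit this node once all of its children are finished.
            stack.append((node, parent, True))
            for child in children:
                stack.append((child, node, False))
        elif parent is not None:
            if not children:
                # A leaf node: its subtree contains exactly one leaf.
                sizes[(parent, node)] = 1
            else:
                # An internal node: sum over the subtrees below it.
                sizes[(parent, node)] = sum(sizes[(node, c)] for c in children)
    return sizes


# Hypothetical example: r has leaves a and b and an internal child x,
# and x has leaves c and d.
example_edges = [("r", "a"), ("r", "b"), ("r", "x"), ("x", "c"), ("x", "d")]
example_sizes = subtree_leaf_set_sizes(example_edges, "r")
```

Every node is pushed onto the stack twice and every edge is inspected a constant number of times, which matches the linear running time stated above.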
