Refined Buneman Trees


know that we need at most 2(n − 3) matrices initialised with diagonal quartets of minimum score. Therefore, in lines 13–15 we measure the size of Q_e: if |Q_e| ≥ 3(n − 3), we can safely remove the n − 3 diagonal quartets with the largest scores, and thus the matrices to which they belong, since we are guaranteed to find n − 3 minimum diagonals in the remaining 2(n − 3) matrices. The procedure for selecting the n − 3 largest members of a set of size 3(n − 3) is described in chapter 10, and it runs in linear time. Thus in lines 4–17 we spend only O(n^3) time.

After initialising the matrices it is time to search through them. We iterate through the edges in line 18, and for each edge we initialise a list of diagonal quartets S_e to the empty set (line 19). We need the n − 3 smallest minimum diagonals in order to identify the n − 3 minimum quartets, so we iterate in line 20 until we have enough minimum diagonals. The number of iterations is linear, since we need to examine at most 2(n − 3) diagonal quartets.

We find minimum diagonals by successively looking at the diagonal quartets with minimum score in Q_e. The size of Q_e is bounded by O(n): |Q_e| is at most 3(n − 3) after initialisation, and each time we traverse our matrices we remove one entry and add at most two, so Q_e grows by at most one element per step. We do this at most 2(n − 3) times, since by then we are sure to have found the n − 3 minimum diagonals. So |Q_e| has at most 3(n − 3) + 2(n − 3) = 5(n − 3) elements. Thus we can find and remove the minimum element of Q_e in linear time in line 21.

If ab_i||cd_j is a minimum diagonal, we add it to S_e (lines 22–24). After considering ab_i||cd_j, we use our matrix traversal and entry-readying scheme to add its appropriate neighbours in lines 25 and 26. Since we do not have the imaginary matrices, we instead have to search the two subtrees of T induced by e for the choices of b and d that yield the matrix neighbours. Such a search can be done in linear time.
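The pruning step of lines 13–15 can be illustrated with a small Java sketch. It is only an illustration under stated assumptions: quartets are reduced to their bare scores, the class and method names (`QuartetPruning`, `pruneLargest`, `keepSmallest`) are hypothetical, and the selection is done with a randomised quickselect, which is linear only in expectation. It is not the selection algorithm of chapter 10.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper, not taken from the thesis: prunes a list of quartet scores.
final class QuartetPruning {
    private static final Random RNG = new Random(42);

    // If at least 3(n - 3) scores are present, drop the n - 3 largest ones;
    // otherwise return a copy of the input unchanged.
    static double[] pruneLargest(double[] scores, int n) {
        int m = n - 3;
        if (m <= 0 || scores.length < 3 * m) {
            return scores.clone();
        }
        return keepSmallest(scores, scores.length - m);
    }

    // Returns the k smallest scores in arbitrary order (randomised quickselect,
    // expected linear time; many equal scores can slow it down).
    static double[] keepSmallest(double[] scores, int k) {
        double[] a = scores.clone();
        if (k <= 0) return new double[0];
        if (k >= a.length) return a;
        int lo = 0, hi = a.length - 1;
        // Invariant: a[0..lo-1] <= a[lo..hi] <= a[hi+1..] and lo <= k <= hi.
        while (lo < hi) {
            int p = partition(a, lo, hi, lo + RNG.nextInt(hi - lo + 1));
            if (p == k) break;          // a[0..k-1] are now the k smallest
            if (p < k) lo = p + 1; else hi = p - 1;
        }
        return Arrays.copyOf(a, k);
    }

    // Lomuto partition around a[pivotIndex]; returns the pivot's final index.
    private static int partition(double[] a, int lo, int hi, int pivotIndex) {
        double pivot = a[pivotIndex];
        swap(a, pivotIndex, hi);
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(double[] a, int i, int j) {
        double t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

For n = 5 the threshold is 3(n − 3) = 6, so calling `QuartetPruning.pruneLargest` on six scores discards the two largest and keeps four, while a list of five scores is returned unchanged.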
All in all we iterate a linear number of times, performing linear-time tasks, so this part of the algorithm takes only O(n^2) time. In line 29 we initialise a list of splits to the empty set. Going through the edges of T and summing up minimum diagonals in lines 30–34 takes time O(n^2).

The pruning part of the refined Buneman tree algorithm is extremely complicated, and it is very hard both to capture the intuition behind the algorithm and to keep track of all the details. In the description of the algorithm, the author has tried to identify modules that could be split off and described separately, in order to reduce the complexity. This was possible for the first part, but the author was less successful with the second part. The point of the description was to give intuition rather than convey all details, and it is therefore possible that some small errors and oversights have crept in, particularly in the pruning part. For a full and concise account of the algorithm, turn to [BFÖ+ 03].
Chapter 6 The Tree Data Structure

This chapter discusses a tree data structure designed to maintain a compatible set of splits. In part 1 of the algorithm described in [BFÖ+ 03], we need to insert splits into the tree, remove splits from it, and search for incompatible splits. These operations must all run in time O(|X|). Part 2 of the algorithm requires traversing paths in the tree and searching subtrees, also in linear time.

6.1 Design of the tree data structure

As we saw in a previous section, a compatible set of splits is a tree: not necessarily a regular, leaf-labeled tree, but at least a semi-labeled tree. We also saw that there is a close connection between the evolutionary trees we want to represent and phylogenetic trees, which can be represented by leaf-labeled trees. Thus, the tree data structure used in this work will be a leaf-labeled tree that uses special markers, when needed, to indicate edges that are not "real" edges. Since the refined Buneman tree algorithm deals with overapproximations of evolutionary trees, we shall simply disregard the controversy altogether; all complexity bounds will hold even if we do.

The operations we need to support with the tree data structure are not affected by the presence of extra trivial splits. The search for incompatible splits will of course never return a trivial split, since trivial splits are compatible with any split, and in the insert and delete operations we can just ignore or work around the cases where trivial splits are involved.

Even though we say that our tree is unrooted, we still need some sort of root as a starting point for our tree traversals. One way of creating such an artificial root would be to select an inner node and use it as the starting point for every tree operation. However, since the topology of the tree changes as we insert and remove splits from the compatible set of splits that the tree represents, a fixed root node might be removed at any time.
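The remark above that a trivial split is compatible with every split follows from the usual definition: two splits A|B and C|D of the same set are compatible when at least one of the intersections A∩C, A∩D, B∩C, B∩D is empty. The Java sketch below checks this directly. It assumes the taxa are numbered 0 to n − 1 and that a split is stored as the `BitSet` of one of its sides; the class and method names are hypothetical. It is a naive pairwise test that takes O(|X|) per pair, not the tree-based search for incompatible splits that this chapter describes.

```java
import java.util.BitSet;

// Hypothetical helper, not taken from the thesis: naive split compatibility test.
final class SplitCompatibility {

    // Builds one side of a split from the given taxon indices.
    static BitSet sideOf(int... taxa) {
        BitSet s = new BitSet();
        for (int t : taxa) s.set(t);
        return s;
    }

    // Complement of a side within the taxon set {0, ..., n - 1}.
    static BitSet complement(BitSet side, int n) {
        BitSet r = (BitSet) side.clone();
        r.flip(0, n);
        return r;
    }

    // Splits A|B and C|D are compatible iff one of the four
    // pairwise intersections of their sides is empty.
    static boolean compatible(BitSet a, BitSet c, int n) {
        BitSet b = complement(a, n);
        BitSet d = complement(c, n);
        return !a.intersects(c) || !a.intersects(d)
            || !b.intersects(c) || !b.intersects(d);
    }
}
```

With five taxa, the splits {0,1}|{2,3,4} and {0,2}|{1,3,4} are incompatible because all four intersections are non-empty, whereas the trivial split {3}|{0,1,2,4} is compatible with both: taxon 3 lies on only one side of any other split, so one intersection is always empty.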
Instead, we shall choose a random node in the tree as the starting point each time we start a new operation, to ensure that we have a valid starting point. This of course means we have to have a design
Page 1 and 2: Refined Buneman Trees Lasse Westh-N