Refined Buneman Trees

and David Bryant ([BB99], section 3) states the complexity of computing the anchored Buneman tree.

Lemma 2. Let δ be a distance measure on X. Then $B(\delta) = \bigcap_{x \in X} B_x(\delta)$.

Lemma 3. $B_x(\delta)$ can be computed in time and space $O(n^2)$.

2.9 Refined Buneman tree

Given a split σ for a set of size n, let $m = |q(\sigma)|$ and let $q_1, \ldots, q_m$ be an ordering of the elements in $q(\sigma)$ in non-decreasing order of their Buneman scores. Then the refined Buneman index of a split σ is defined as:

$$\mu_\sigma(\delta) = \frac{1}{n-3} \sum_{i=1}^{n-3} \beta_{q_i} \qquad (2.6)$$

In other words, the refined Buneman index of a split is the average over the $n - 3$ least scoring quartets. The choice of $n - 3$ is credited to divine intervention: apparently, it was the choice that would make the proof in [MS99] work.

The set of splits $RB(\delta) = \{\sigma : \mu_\sigma(\delta) > 0\}$ is a compatible set of splits ([MS99]). Thus the Refined Buneman Tree corresponding to a given dissimilarity measure δ is defined to be the weighted unrooted tree whose edges represent the splits $\sigma \in RB(\delta)$ and are weighted according to $\mu_\sigma(\delta)$.

When constructing the refined Buneman tree as described later in this work, we shall rely heavily on Lemma 4. The lemma is due to [MS99], and it is used to maintain compatibility in a set of splits overapproximating the set of refined Buneman splits.

Lemma 4. Given two incompatible splits σ and σ′, $\mu_\sigma(\delta) \le 0 \lor \mu_{\sigma'}(\delta) \le 0$, and a split for which the inequality holds can be identified in time $O(n)$.

In the algorithm that computes the refined Buneman tree, we are going to construct a compatible set of splits which is an overapproximation of the set of refined Buneman splits. We shall do this for subsets of X of increasing size. After bootstrapping some compatible set of splits, we introduce a set of candidates to go into the overapproximation. To maintain compatibility in the set, we test the candidate splits against the existing splits to find pairs that are incompatible.
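The two ingredients used so far, the index in (2.6) and the pairwise compatibility test, can be sketched as follows. This is only an illustration under stated assumptions: the function names and the representation of a split as a pair of frozensets are hypothetical, the Buneman scores of the quartets in q(σ) are taken as given input, and the compatibility criterion is the standard one (two splits are compatible when at least one of the four pairwise intersections of their sides is empty), which we assume matches the definition used earlier in this work.

```python
from typing import FrozenSet, Hashable, Sequence, Tuple

# Hypothetical representation: a split U|V is a pair of disjoint frozensets.
Split = Tuple[FrozenSet[Hashable], FrozenSet[Hashable]]


def refined_buneman_index(scores: Sequence[float], n: int) -> float:
    """Average of the n - 3 smallest quartet scores, following (2.6).

    `scores` holds the Buneman scores of the quartets in q(sigma);
    computing those scores is outside the scope of this sketch.
    """
    k = n - 3
    if k < 1 or len(scores) < k:
        raise ValueError("need n > 3 and at least n - 3 quartet scores")
    # Sorting gives the non-decreasing order q_1, ..., q_m of the definition.
    return sum(sorted(scores)[:k]) / k


def compatible(s: Split, t: Split) -> bool:
    """Standard test (assumed): some pairwise intersection of sides is empty."""
    (u1, v1), (u2, v2) = s, t
    return not (u1 & u2) or not (u1 & v2) or not (v1 & u2) or not (v1 & v2)
```

For example, `refined_buneman_index([4.0, 1.0, 3.0, 2.0], 5)` averages the two smallest scores, 1.0 and 2.0. Sorting all m scores is the simplest way to follow the definition; it is not meant to reflect the running time of the algorithms discussed in this work.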
Once we find a pair of incompatible splits, we use Lemma 4 to determine which one does not belong in the set. We are not concerned with testing whether either split in the pair actually belongs in the set of refined Buneman splits. We are only interested in throwing away candidates that are clearly incompatible. We can allow ourselves this luxury since we are only creating an overapproximation of $RB(\delta|_{X_k})$, not the set itself, and this saves valuable time in the algorithm. In [BFÖ+03], an algorithm is given which solves the problem in Lemma 4 in linear time. In this text, this algorithm is called the DISCARD-RIGHT algorithm.

The second important lemma regarding refined Buneman trees is the foundation for the incremental algorithm presented in the article by Brodal et al. ([BFÖ+03]), which is presented later in this work. It is due to Bryant and Moulton ([BM99], proposition 3). It says that a split $\sigma \in RB(\delta|_{X_k})$ is either a member of $B_{x_k}(\delta|_{X_k})$ or, after removing $x_k$, a member of $RB(\delta|_{X_{k-1}})$. Turning this around, given the refined Buneman tree for $X_k$, we can calculate the refined Buneman tree for $X_{k+1}$ by looking only at splits in $B_{x_{k+1}}(\delta|_{X_{k+1}})$ and $RB(\delta|_{X_k})$ (with the discussion from the previous paragraph in mind, this would be "bootstrap set" and "candidate set", respectively).

Lemma 5. Suppose $|X| > 4$, and fix $x \in X$. If $\sigma = U|V$ is a split in $RB(\delta)$ with $x \in U$ and $|U| > 2$, then either $U|V \in B_x(\delta)$ or $U - \{x\}|V \in RB(\delta|_{X - \{x\}})$, or both.
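Read in the incremental direction, Lemma 5 suggests how candidates for $X_{k+1}$ can be derived from splits of $X_k$: a split of $X_{k+1}$ that is not in the anchored Buneman tree must, once the new element is removed, be a split of the previous set. A minimal sketch of that extension step is given below. The function name and the frozenset representation of splits are hypothetical, and the sketch deliberately ignores the small cases with $|U| \le 2$ that the lemma does not cover.

```python
from typing import FrozenSet, Hashable, List, Tuple

# Hypothetical representation: a split U|V is a pair of disjoint frozensets.
Split = Tuple[FrozenSet[Hashable], FrozenSet[Hashable]]


def extend_splits(splits: List[Split], x: Hashable) -> List[Split]:
    """Extend every split U|V of X_k to the two splits of X_k + {x}.

    The new element x may land on either side, so each input split
    yields the two candidates (U + {x})|V and U|(V + {x}).
    """
    candidates: List[Split] = []
    for u, v in splits:
        candidates.append((u | {x}, v))  # x joins the left side
        candidates.append((u, v | {x}))  # x joins the right side
    return candidates
```

Each split yields two candidates, so the candidate set is at most twice the size of the input set. Deciding which candidates survive is the job of the compatibility test and Lemma 4 described above.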