Refined Buneman Trees

2.5 Diagonal quartets

Looking at the definition of a quartet and its associated Buneman score, we might observe that the quartet $q = ab|cd$ can be viewed as two diagonal quartets, denoted $ab\|cd$ and $ab\|dc$. This view is not immediately obvious, but the following rewrite illustrates the point:

$$\beta_q = \tfrac{1}{2}\bigl(\min\{\delta_{ac} + \delta_{bd},\ \delta_{ad} + \delta_{bc}\} - (\delta_{ab} + \delta_{cd})\bigr) \tag{2.2}$$

$$\phantom{\beta_q} = \min\left\{\begin{array}{l}\tfrac{1}{2}(-\delta_{ab} + \delta_{bc} - \delta_{cd} + \delta_{da}) \\ \tfrac{1}{2}(-\delta_{ab} + \delta_{bd} - \delta_{dc} + \delta_{ca})\end{array}\right\} \tag{2.3}$$

$$\phantom{\beta_q} = \min\{\eta_{ab\|cd},\ \eta_{ab\|dc}\} \tag{2.4}$$

where $\eta_{ab\|cd} = \tfrac{1}{2}(\delta_{bc} - \delta_{ab} + \delta_{ad} - \delta_{cd})$ is the score of the diagonal quartet $ab\|cd$.

Looking at Figure 2.5 we can see that the terms $\eta_{ab\|cd}$ and $\eta_{ab\|dc}$ each correspond to a "tour" in one of the two diagonal quartets. In words, we go through either $a \to b \to c \to d \to a$ or $a \to b \to d \to c \to a$, adding or subtracting chords as appropriate.

We might also notice that diagonal quartets are symmetric on either side of their central edge: starting out with the diagonal quartet $ab\|cd$, we can swap $a \leftrightarrow b$ and $c \leftrightarrow d$, or swap $ab \leftrightarrow cd$, to obtain $ba\|dc$ or $cd\|ab$, where $\eta_{ab\|cd} = \eta_{ba\|dc} = \eta_{cd\|ab}$ and all three diagonal quartets still identify the same quartet $ab|cd$.

[Figure 2.5: The symmetric quartet $ab|cd$ and its associated asymmetric diagonal quartets $ab\|cd$ and $ab\|dc$.]

To keep an ordering on diagonal quartets we shall define the minimum diagonal of $ab|cd$: Let $a$ be the smallest (by index in $X$) of the species $a, b, c, d \in X$. $ab\|cd$ is the minimum diagonal if $\eta_{ab\|cd}$
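The identity in equations (2.2) to (2.4) and the symmetry of the diagonal scores can be checked numerically. The following is a minimal sketch, not the thesis implementation: the function names `buneman_score` and `diagonal_score` and the dissimilarity matrix `DIST` are made up for illustration, and species are assumed to be indexed 0 to 3.

```python
def diagonal_score(dist, a, b, c, d):
    """Score eta of the diagonal quartet ab||cd, as defined after (2.4)."""
    return 0.5 * (dist[b][c] - dist[a][b] + dist[a][d] - dist[c][d])


def buneman_score(dist, a, b, c, d):
    """Buneman score beta of the quartet ab|cd, as in equation (2.2)."""
    cross = min(dist[a][c] + dist[b][d], dist[a][d] + dist[b][c])
    return 0.5 * (cross - (dist[a][b] + dist[c][d]))


# Hypothetical symmetric dissimilarity matrix on four species 0..3.
# It is deliberately not a tree metric, so the two diagonals differ.
DIST = [
    [0, 3, 7, 9],
    [3, 0, 6, 7],
    [7, 6, 0, 3],
    [9, 7, 3, 0],
]

a, b, c, d = 0, 1, 2, 3
eta_abcd = diagonal_score(DIST, a, b, c, d)  # diagonal quartet ab||cd
eta_abdc = diagonal_score(DIST, a, b, d, c)  # diagonal quartet ab||dc
beta = buneman_score(DIST, a, b, c, d)

print(eta_abcd, eta_abdc, beta)
```

With this matrix the two diagonal scores differ, and the Buneman score equals the smaller of the two, as (2.4) states. Swapping $a \leftrightarrow b$ together with $c \leftrightarrow d$, or swapping the pairs $ab \leftrightarrow cd$, leaves `diagonal_score` unchanged.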
2.6 Splits

The partition of a finite set $S$ into two non-empty parts $U$ and $V$ is denoted a split $\sigma = U|V$. If $|U| = 1$ or $|V| = 1$ the split is called trivial. It is reasonable to represent a split as a bitvector or binary number, and by convention we shall say that for a bitvector $A$ representing $\sigma$, $x_i \in U$ if and only if $A[i] = 0$. Splits are symmetric, so if $w$ represents the split $\sigma$, then $\neg w$ also represents $\sigma$.

The set of splits on a set $X$ is denoted $\sigma(X)$. The size of $\sigma(X)$ is the number of unique splits on $X$, so

$$|\sigma(X)| = \frac{|\mathcal{P}(X)| - 2}{2} = \frac{2^n - 2}{2} = 2^{n-1} - 1,$$

where $n = |X|$. We deduct the two subsets of $X$ where $U = \emptyset$ or $V = \emptyset$, and halve the result because symmetric splits are counted only once.

The set of quartets associated with a split $\sigma = U|V$ on a set $X$ is defined by $q(\sigma) = \{uu'|vv' : u, u' \in U \wedge v, v' \in V\}$. Here $u, u'$ (and similarly $v, v'$) need not be distinct. The size of $q(U|V)$ is in the order of $O(|X|^4)$; recall that an edge in a tree induces $O(|X|^4)$ quartets, and splits are equivalent to edges in this case.

Definition 3 (Compatibility). Two splits $A|B$ and $C|D$ are said to be compatible if and only if one of $A \cap C$, $A \cap D$, $B \cap C$ or $B \cap D$ is empty.

Compatible sets of splits are the foundation for the algorithm presented in this thesis, and they are a perfect tool when dealing with evolutionary trees. A set of splits is compatible if and only if all splits in the set are pairwise compatible. It follows that any subset of a compatible set of splits is again compatible.

There is a close connection between compatible sets of splits and evolutionary trees. Any edge $e$ in an unrooted tree $T$ splits the set of leaves of $T$ into two non-empty parts. Let $\Sigma(T)$ denote the set of splits associated with the edges of a tree $T$. Then Theorem 1 (from [SS03]) gives the relation between compatible sets of splits and evolutionary trees.

Theorem 1 (Splits-Equivalence Theorem). Let $\Sigma$ be a collection of splits on $X$. Then there is an evolutionary tree $T$ such that $\Sigma = \Sigma(T)$ if and only if $\Sigma$ is a compatible set of splits. If $T$ exists it is unique up to isomorphism.

From now on we shall use the terms compatible set of splits/evolutionary tree and split/edge interchangeably. They are one and the same: Table 2.1 shows a compatible set of (weighted) splits, and Figure 2.6 shows the equivalent evolutionary tree.

Recall the discussion of evolutionary trees versus phylogenetic trees: when working with a method such as the refined Buneman tree algorithm, which outputs compatible sets of splits that might or might not correspond to a fully resolved tree, it is important to be able to describe such a tree in a precise manner. When dealing with e.g. Neighbor-Joining, we can rely on the more regular phylogenetic trees, since the NJ method always resolves trees completely.

Lemma 1 is due to Dan Gusfield ([Gus91], section 1.2) and gives an important upper bound for the time required to go from compatible sets of splits to phylogenetic trees.

Lemma 1. An unrooted tree with $n$ leaves can be constructed from its set of non-trivial splits in time $O(kn)$, where $k$ is the number of non-trivial splits.
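The bitvector representation of splits, the count $2^{n-1} - 1$ and the compatibility test of Definition 3 can be sketched as follows. This is an illustration under assumptions rather than the thesis code: a split is stored as a Python integer whose bit $i$ is 0 exactly when $x_i \in U$, and all function names (`canonical`, `is_trivial`, `compatible`, `is_compatible_set`, `all_splits`) are hypothetical. It does not implement the $O(kn)$ tree construction of Lemma 1.

```python
from itertools import combinations


def canonical(mask, n):
    """Pick one representative of a split: w and its complement denote the
    same split, so we choose the bitvector whose bit 0 is 0 (x_0 in U)."""
    full = (1 << n) - 1
    return mask if (mask & 1) == 0 else (~mask & full)


def is_trivial(mask, n):
    """A split is trivial if one of its two parts has exactly one element."""
    ones = bin(mask).count("1")
    return ones == 1 or ones == n - 1


def compatible(s, t, n):
    """Definition 3: A|B and C|D are compatible iff at least one of the
    four intersections A&C, A&D, B&C, B&D is empty."""
    full = (1 << n) - 1
    parts_s = (~s & full, s)  # (A, B) as sets of bit positions
    parts_t = (~t & full, t)  # (C, D) as sets of bit positions
    return any((p & q) == 0 for p in parts_s for q in parts_t)


def is_compatible_set(splits, n):
    """A set of splits is compatible iff its members are pairwise compatible."""
    return all(compatible(s, t, n) for s, t in combinations(splits, 2))


def all_splits(n):
    """Enumerate every split on n elements, skipping the two bitvectors
    with an empty part and merging each bitvector with its complement."""
    full = (1 << n) - 1
    return {canonical(m, n) for m in range(1, full)}


# Example on five species x_0..x_4 (bit i set means x_i is in V).
s1 = 0b11100  # {x0, x1} | {x2, x3, x4}
s2 = 0b11000  # {x0, x1, x2} | {x3, x4}
s3 = 0b11010  # {x0, x2} | {x1, x3, x4}

print(len(all_splits(5)))
print(compatible(s1, s2, 5), compatible(s1, s3, 5))
```

Here `s1` and `s2` are compatible because $\{x_0, x_1\} \cap \{x_3, x_4\} = \emptyset$, while `s1` and `s3` are not, since all four intersections are non-empty. By Theorem 1, the set `{s1, s2}` therefore corresponds to a tree, whereas no tree contains both `s1` and `s3` as edges.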
Page 1 and 2: Refined Buneman Trees Lasse Westh-N