computing the quartet distance between general trees

7.1.4 How to count butterflies in constant time

Sec. 7.1.1, 7.1.2 and 7.1.3 gave an intuitive explanation of the algorithm. This section clarifies that the number of butterflies, shared and different, associated with a pair of directed edges can in fact be calculated in constant time, given the right preprocessing information. All preprocessing arrays and tables are listed in Appendix A. They are referenced here, and the most important table, whose calculation threatens the overall complexity of the algorithm, is explained in detail below.

The article explains how the expression in Eq. (7.3), used to calculate the number of shared directed butterflies associated with a pair of internal nodes, can be translated into the following constant-time computable expression:

\[
\begin{aligned}
\frac{1}{2}\binom{I[i,j]}{2}\Bigl(\; & M' - R'[i] - C'[j] + I'[i,j] \\
& + (I[i,j] - R[i] - C[j])(M - R[i] - C[j] + I[i,j]) \\
& + R''[i] - I[i,j](C[j] - I[i,j]) \\
& + C''[j] - I[i,j](R[i] - I[i,j]) \;\Bigr)
\end{aligned}
\tag{7.8}
\]

It furthermore explains how the expression in Eq. (7.7), used to calculate the number of different directed butterflies associated with a pair of internal nodes, is translated into one of the following two expressions:

\[
\begin{aligned}
I[i,j]\Bigl(\; & (M - R[i] - C[j] + I[i,j])(R[i] - I[i,j])(C[j] - I[i,j]) \\
& + (R[i] - I[i,j])\bigl(I[i,j](R[i] - I[i,j]) - C''[j]\bigr) \\
& + (C[j] - I[i,j])\bigl(I[i,j](C[j] - I[i,j]) - R''[i]\bigr) \\
& + I_1'''[i,j] - I[i,j]\,I_1''[i,i] - I[i,j]\bigl(C'''[j] - I[i,j]^2\bigr) \;\Bigr)
\end{aligned}
\tag{7.9}
\]

\[
\begin{aligned}
I[i,j]\Bigl(\; & (M - R[i] - C[j] + I[i,j])(R[i] - I[i,j])(C[j] - I[i,j]) \\
& + (R[i] - I[i,j])\bigl(I[i,j](R[i] - I[i,j]) - C''[j]\bigr) \\
& + (C[j] - I[i,j])\bigl(I[i,j](C[j] - I[i,j]) - R''[i]\bigr) \\
& + I_2'''[i,j] - I[i,j]\,I_2''[j,j] - I[i,j]\bigl(R'''[i] - I[i,j]^2\bigr) \;\Bigr)
\end{aligned}
\tag{7.10}
\]

This requires some explanation.
The respective translations into these constant-time computable expressions are very cumbersome and not essential to this thesis, so I merely refer to the appendix of the original article by Mailund et al. [14] for details on this process.
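The trade of preprocessing for constant-time look-ups can be illustrated on the factor $M - R[i] - C[j] + I[i,j]$ that recurs in Eqs. (7.8) to (7.10). The following is a minimal sketch, not the thesis implementation. Appendix A is not reproduced here, so it assumes that $R[i]$ and $C[j]$ are the row and column sums of $I$ and that $M$ is the sum of all entries. Under that assumption the factor is, by inclusion-exclusion, the sum of all entries outside row $i$ and column $j$. The function names are hypothetical.

```python
def preprocess(I):
    """Compute row sums R, column sums C and the total M of table I.

    ASSUMPTION: these are the definitions behind R, C and M in Appendix A.
    Runs in time proportional to the number of entries of I.
    """
    R = [sum(row) for row in I]
    C = [sum(col) for col in zip(*I)]
    M = sum(R)
    return R, C, M


def outside_naive(I, i, j):
    """Sum of I[k][l] over all k != i and l != j, by direct enumeration."""
    return sum(
        I[k][l]
        for k in range(len(I))
        for l in range(len(I[0]))
        if k != i and l != j
    )


def outside_constant_time(I, R, C, M, i, j):
    """Same quantity from the preprocessed tables, in O(1) per query.

    Row i and column j are both subtracted from the total, so their
    shared entry I[i][j] has to be added back once.
    """
    return M - R[i] - C[j] + I[i][j]


# Small hypothetical table for a pair of nodes with degrees 2 and 3.
I = [[2, 0, 1],
     [1, 3, 0]]
R, C, M = preprocess(I)
agree = all(
    outside_naive(I, i, j) == outside_constant_time(I, R, C, M, i, j)
    for i in range(len(I))
    for j in range(len(I[0]))
)
print(agree)
```

After one pass over $I$, each query touches only four stored numbers, which is the sense in which the expressions above are constant-time computable.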
More interesting are the steps needed to prepare each of the tables used for look-up, because the actual calculation of these is part of the algorithm and a thorough understanding is therefore essential. All tables are prepared during the preprocessing of each pair of internal nodes, along with the table $I$ presented in Sec. 7.1.1. They all result from further processing of $I$ and are necessary for the constant-time calculation. With the exception of the tables $I_1'''$ and $I_2'''$, none of the tables is time-consuming to compute: each takes no more than $O(d_v d_{v'})$ time, which, over all pairs of inner nodes, leads to a total time of

\[
\sum_{v \in T} \sum_{v' \in T'} d_v d_{v'} = \Bigl(\sum_{v \in T} d_v\Bigr)\Bigl(\sum_{v' \in T'} d_{v'}\Bigr) \le (2|E|)(2|E|) = O(n^2).
\]

The two exceptions, $I_1'''$ and $I_2'''$, are actually identical and differ only in the way they are calculated. Collectively they shall simply be known as $I'''$, and the calculation of this table is explained thoroughly in Sec. 7.1.4.1.

7.1.4.1 The calculation of I′′′

The table $I'''$ plays a part in the calculation of the number of different butterflies, and special care is needed in its calculation in order to keep the total running time of the algorithm sub-cubic. It is defined as follows:

\[
I'''[i,j] = \sum_{\substack{k=1 \\ k \neq i}}^{d_v} \; \sum_{\substack{l=1 \\ l \neq j}}^{d_{v'}} I[i,l]\, I[k,j]\, I[k,l]
\tag{7.11}
\]

Filling the table naively, in accordance with the formula, takes time $O(n^4)$, which would break the promised sub-cubic time bound. Hence, the calculation constitutes a serious barrier and demands separate attention. The solution is instead to calculate either $I_1''' = (I I^T) I$ or $I_2''' = I (I^T I)$, both of which are described in more detail in the appendix. At first sight, this does not seem to solve the problem, since the solution relies on matrix multiplication, whose complexity, if done naively, is $O(n^3)$.
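The relation between the restricted sum in Eq. (7.11) and the matrix products can be checked on a small example. The sketch below is an illustration under assumptions, not the thesis code. It evaluates Eq. (7.11) literally, then computes $(I I^T) I$ with a schoolbook product and removes the $k = i$ and $l = j$ terms in the way the last line of Eq. (7.9) suggests. It assumes that $I_1'' = I I^T$ and that $C'''[j] = \sum_k I[k,j]^2$, since Appendix A is not reproduced here. All function names are hypothetical.

```python
def transpose(A):
    """Transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Schoolbook matrix product; a faster method could be swapped in here."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def i3_naive(I):
    """Eq. (7.11) evaluated term by term for every entry (i, j)."""
    dv, dw = len(I), len(I[0])  # dw stands for d_{v'}
    return [[sum(I[i][l] * I[k][j] * I[k][l]
                 for k in range(dv) if k != i
                 for l in range(dw) if l != j)
             for j in range(dw)]
            for i in range(dv)]


def i3_via_product(I):
    """Same table from (I I^T) I, corrected for the excluded indices."""
    dv, dw = len(I), len(I[0])
    IIt = matmul(I, transpose(I))   # ASSUMPTION: plays the role of I_1''
    full = matmul(IIt, I)           # I_1''' = (I I^T) I, sums over all k, l
    # ASSUMPTION: C'''[j] is the sum of squared entries in column j.
    col_sq = [sum(I[k][j] ** 2 for k in range(dv)) for j in range(dw)]
    return [[full[i][j]
             - I[i][j] * IIt[i][i]                    # terms with k == i
             - I[i][j] * (col_sq[j] - I[i][j] ** 2)   # terms with l == j, k != i
             for j in range(dw)]
            for i in range(dv)]


# Small hypothetical table for a pair of nodes with degrees 2 and 3.
I = [[2, 0, 1],
     [1, 3, 0]]
same_table = i3_naive(I) == i3_via_product(I)
# Associativity: both multiplication orders give the same full product.
same_order = (matmul(matmul(I, transpose(I)), I)
              == matmul(I, matmul(transpose(I), I)))
print(same_table, same_order)
```

The correction terms cost only constant time per entry, so the cost of the table is dominated by the two matrix products. That is why the choice of multiplication method matters for the overall bound.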
As explained in the article [14], choosing either the first or the second solution results in an explicit running time of either $O(d_v^2 d_{v'})$ or $O(d_v d_{v'}^2)$ for processing a pair of internal nodes $(v, v')$ with degrees $d_v$ and $d_{v'}$ respectively. However, other methods for calculating the matrix product may be utilized, and this is essential to the algorithm. Applying a matrix multiplication method with a time complexity of $O(n^\omega)$ on square matrices, one can make the calculation in time $O(\max(d_v, d_{v'})^\omega)$. This value might be smaller for some matrices that are nearly square, but the approach requires that the matrices are padded with zeroes to become square, i.e. extended to fit the requirements of the matrix multiplication method used.

It is difficult to predict the impact of the matrix multiplication on the entire algorithm; since it is applied on a per-internal-node basis, the complete running time is not identical to that of the multiplication method. In the article [14], Section 4 gives a thorough case