13.07.2015 Views

computing the quartet distance between general trees



CHAPTER 7. SUB-CUBIC TIME ALGORITHM

is optimal, this is a key point to success or failure since the matrix multiplication poses the worst threat to the sub-cubic time bound promised. Solving the task involves deciding which library to use for matrix multiplication and studying its behaviour, so as to find out how to make the crucial comparison of the three quantities $\max(d_v, d_{v'})^{\omega}$, $d_v^2 d_{v'}$ and $d_v d_{v'}^2$.

After implementing those two algorithms only one thing remains, namely to apply the algorithms to the two trees and put the counts together according to the expression shown in Eq. (7.1).

7.2.1 Prototype

Starting out softly, the first goal is to make a prototype implementation focusing only on the correct result. Consequently, I will make a simple implementation of the choice and base it solely on which of $d_v^2 d_{v'}$ and $d_v d_{v'}^2$ is smaller. That is easy to determine, and along with the choice comes the calculation of either $C'''$ and $I'''_1 = (I I^T) I$, or $R'''$ and $I'''_2 = I (I^T I)$, respectively. I will use a basic, naive implementation of matrix multiplication. For the Python implementation I will utilize the NumPy library for scientific computing² and for the C++ implementation I will utilize the Boost.uBLAS Library³. They both provide matrix data structures and routines for matrix multiplication. Then it is all about looping through pairs of edges, applying Eq. (7.9) or Eq. (7.10), and summing up the results.
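The choice between the two multiplication orders can be illustrated with a minimal sketch. It is not the thesis code: the names `matmul`, `transpose` and `triple_product` are hypothetical, the matrix is assumed to be a non-empty, rectangular list of lists of shape $d_v \times d_{v'}$, and pure-Python naive multiplication stands in for the NumPy and Boost.uBLAS routines mentioned above. Only the triple product $I I^T I$ is computed; what is derived from the intermediate products is left out.

```python
def matmul(a, b):
    """Naive product of an n x m and an m x p matrix (lists of lists)."""
    n, m, p = len(a), len(b), len(b[0])
    assert all(len(row) == m for row in a), "inner dimensions must agree"
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def transpose(a):
    """Transpose a rectangular matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def triple_product(I):
    """Compute I * I^T * I in the cheaper of the two association orders.

    I is assumed to be d_v x d_w, with d_w playing the role of d_{v'}.
    Returns a (label, matrix) pair; the label records which order was used.
    """
    d_v, d_w = len(I), len(I[0])
    It = transpose(I)
    if d_v * d_v * d_w <= d_v * d_w * d_w:
        # (I I^T) I: the intermediate is d_v x d_v, cost grows as d_v^2 * d_w.
        return "left", matmul(matmul(I, It), I)
    # I (I^T I): the intermediate is d_w x d_w, cost grows as d_v * d_w^2.
    return "right", matmul(I, matmul(It, I))


if __name__ == "__main__":
    order, result = triple_product([[1, 2, 3], [4, 5, 6]])
    print(order, result)
```

Since matrix multiplication is associative, both orders give the same matrix. The comparison only decides whether the intermediate product has $d_v^2$ or $d_{v'}^2$ entries, and it reduces to checking whether $d_v \le d_{v'}$.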
Before returning, the result is divided by four because directed edges give rise to four different situations (see Contribution 7.2).

Expectations

So, what would one expect from the prototype implementations when subjected to the usual range of trees? Using naive matrix multiplication, and hence the value of α = 1, will make the whole algorithm $O(n^3)$. Thus, we can expect to see cubic worst-case behavior. However, since the analysis is simply worst-case, we can still, as mentioned earlier, hope that it is not a tight upper bound and that the algorithm is efficient in practice.

With regards to the difference in performance between the trees used, I will, once again, base my expectations on the actual code written. From my point of view there is one reasonable way to look at this. The algorithm is processing pairs of internal nodes, and for each of these pairs there is some preprocessing and some calculations to do, both of which are dependent on the degree of the two internal nodes. The preprocessing relies on matrix multiplication, meaning that larger degrees will be harder to deal with. Depending on the overhead due to preprocessing and calculations the relationship be-

² SciPy/NumPy: http://numpy.scipy.org/
³ Boost.uBLAS: http://www.boost.org/doc/libs/1_42_0/libs/numeric/ublas/doc/index.htm
