13.07.2015 Views

Computing the quartet distance between general trees


7.1. THE ALGORITHM

Figure 7.2: Example of a claim. The directed edge is a unique identifier of the directed quartet ab → cd.

edges, count the quartet as a shared butterfly if the two claims give rise to the same topology, and otherwise count the quartet as a different butterfly. Making heavy use of preprocessing, one such pair of edges can be handled in constant time, see Sec. 7.1.4. Since $|E| = O(n)$ and there are $O(n^2)$ pairs of edges, the process of counting the butterflies can be done in $O(n^2)$ time. Thus, the preprocessing step is crucial to the running time.

7.1.1 Basic preprocessing

Here I will describe only the fundamental part of the preprocessing that is necessary to understand the intuition behind the algorithm and how to count shared and different butterflies. More preprocessing is needed to make it possible to process a pair of directed edges in constant time, as mentioned. Unfortunately, that part of the preprocessing threatens the sub-cubic complexity of the entire algorithm: it directly influences the running time and must be handled with care. For now it would merely be confusing, so I will postpone its introduction until the appropriate section.

As we shall see, it comes in handy that the notion of claims and the concept of shared leaf set sizes both deal with subtrees. The first preprocessing step is to calculate the shared leaf set sizes, as explained in Sec. 5.2, which has quadratic time and space consumption.

The next step is to calculate, for each pair of internal nodes $v \in T$ and $v' \in T'$, with subtrees $F_1, \ldots, F_{d_v}$ and $G_1, \ldots, G_{d_{v'}}$, a matrix $I$ where $I[i, j] = |F_i \cap G_j|$. When processing pairs of edges as mentioned above, we will need the matrix $I$ associated with the two nodes that the edges point to.

This is enough information about the preprocessing step to complete the intuitive explanation for counting butterflies in Sec. 7.1.2 and 7.1.3.
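To make the definition of the matrix $I$ concrete, the following is a minimal sketch in Python. It is not the preprocessing described in this chapter: it intersects explicit leaf sets directly instead of reusing the shared leaf set sizes from Sec. 5.2, so it does not meet the stated time bounds. The adjacency-dictionary representation, the function names `subtree_leaf_sets` and `intersection_matrix`, and the two small example trees are all assumptions made for illustration.

```python
def subtree_leaf_sets(adj, v):
    """Return the leaf sets of the subtrees obtained by removing node v.

    adj is an adjacency dictionary of an unrooted tree (an assumed,
    illustrative representation). One leaf set is returned per neighbour
    of v, in the order the neighbours are listed in adj[v].
    """
    result = []
    for start in adj[v]:
        leaves, stack, seen = set(), [start], {v, start}
        while stack:
            node = stack.pop()
            if len(adj[node]) == 1:  # a node of degree one is a leaf
                leaves.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result.append(leaves)
    return result


def intersection_matrix(adj_t, v, adj_t2, v2):
    """Naively compute I[i][j] = |F_i ∩ G_j| for v in T and v2 in T'."""
    F = subtree_leaf_sets(adj_t, v)
    G = subtree_leaf_sets(adj_t2, v2)
    return [[len(f & g) for g in G] for f in F]


# Hypothetical example trees on the leaves a, b, c, d.
# T has the topology ab|cd, T' has the topology ac|bd.
adj_t = {
    "x": ["a", "b", "y"], "y": ["x", "c", "d"],
    "a": ["x"], "b": ["x"], "c": ["y"], "d": ["y"],
}
adj_t2 = {
    "u": ["a", "c", "w"], "w": ["u", "b", "d"],
    "a": ["u"], "c": ["u"], "b": ["w"], "d": ["w"],
}

I = intersection_matrix(adj_t, "x", adj_t2, "u")
print(I)  # [[1, 0, 0], [0, 0, 1], [0, 1, 1]]
```

Here the subtrees around `x` have leaf sets {a}, {b}, {c, d} and those around `u` have {a}, {c}, {b, d}. Every leaf falls in exactly one cell of the matrix, so the entries sum to the number of leaves.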
