13.07.2015

Computing the Quartet Distance between General Trees


CHAPTER 3. EXPERIMENTAL APPROACH

be chosen as root, so that is what one can expect when parsing such data. This is not a problem; in any case, some entry point is needed to access the tree.

When I began implementing the first algorithm in Python, I did not yet have a thorough understanding of the requirements each algorithm would place on the data structure used to store and represent the tree. Consequently, I used the simple top-down data structure provided by the Python Newick parser described earlier. This worked well enough in my first attempt at implementing the quartic algorithm (see Sec. 4.1). However, for the other algorithms, which sometimes take an edge-based approach (recall that subtrees are identified by directed edges), it turned out to be insufficient, not expressive enough and confusing.

Therefore, it seemed natural to enrich the data structure with constructs corresponding to the ideas used in the algorithms. As long as the modifications could be done in linear time by a simple traversal of the tree, the overhead would fit easily within the time bounds of the algorithms considered in this thesis.

One improvement was to decorate the tree with additional directed edges. A recursive descent parser built with the Toy Parser Generator (TPG) framework is tail recursive and offers no opportunity to pass information back up through the recursion. As a result, trees parsed with a TPG parser have no back-edges, i.e. edges pointing towards the root.
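To make the missing back-edges concrete, here is a minimal Python sketch. It assumes a hypothetical `Node` class in which the parser has stored only child references, as a parser that cannot pass information back up through the recursion would produce. The hypothetical function `add_back_edges` then decorates the tree with parent references in a single traversal, i.e. in linear time, which is the kind of cheap modification described above. This is an illustration, not the thesis's actual code; the internal-node names `ab` and `root` are made up for the example, since internal nodes in Newick strings are usually unlabelled.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    """Hypothetical top-down node: the parser only stores child references."""
    name: str = ""
    children: list[Node] = field(default_factory=list)
    # Back-edge towards the root; left as None by the parser, filled in later.
    parent: Optional[Node] = field(default=None, repr=False)


def add_back_edges(root: Node) -> None:
    """Set every node's parent reference in one linear traversal.

    An explicit stack is used instead of recursion, so deep trees do not
    run into Python's recursion limit.
    """
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            child.parent = node
            stack.append(child)


def path_to_root(node: Optional[Node]) -> list[str]:
    """Walk towards the root, which is only possible once back-edges exist."""
    names = []
    while node is not None:
        names.append(node.name)
        node = node.parent
    return names


# ((A,B),C,D); built by hand as a purely top-down structure.
ab = Node("ab", [Node("A"), Node("B")])
root = Node("root", [ab, Node("C"), Node("D")])
add_back_edges(root)
print(path_to_root(ab.children[1]))  # ['B', 'ab', 'root']
```

Each node is visited once and each child reference is followed once, so the decoration costs O(n) for a tree with n nodes.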
Subsequently adding back-edges made it easy to traverse the tree from any start node and in any direction. Also, after realising that the algorithms mostly deal with directed edges, I let each edge know its opposite, so that two directed edges together correspond to one actual undirected edge. The algorithms naturally require leaves to be uniquely identifiable, and they assume that internal nodes and edges can be identified uniquely as well. Therefore, ids were also added in a subsequent traversal of the tree. Because I implemented the C++ parser and data structure with this in mind, I made the C++ parser somewhat more advanced: it adds the necessary edges while parsing, keeps track of the ids of nodes and edges, and collects nodes and edges in lists for easy access. Of course, this required the parsing procedures to return information about the part of the tree just parsed (the subtree below).

The few differences between the two data structures do not play a significant role in the implementations. There is a slight difference in the way a tree is traversed, since an internal node in the Python data structure has a parent and a number of subtrees, whereas an internal node in the C++ data structure simply has a number of related subtrees. Furthermore, because the two implementations perform these steps in different orders, edges receive different ids in the two languages, which changes the order in which edges are processed. However, this should not affect the results of the algorithms.
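The enriched structure can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the thesis's C++ code: the classes `Edge`, `Node` and `Tree` and the function `leaves_below` are invented for the example, and the parser assumes Newick input without branch lengths or whitespace inside names (leaf names and optional internal labels only). Like the C++ parser described above, the recursive-descent method `_parse_subtree` returns the root of the subtree it has just parsed, so the caller can create both directed edges of the connecting edge, link them as opposites, assign ids, and collect nodes and edges in lists. Each node simply holds its outgoing directed edges, and `leaves_below` collects the leaves of the subtree identified by a directed edge.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Edge:
    """A directed edge; together with `opposite` it forms one undirected edge."""
    id: int
    source: Node = field(repr=False)
    target: Node = field(repr=False)
    opposite: Optional[Edge] = field(default=None, repr=False)


@dataclass(eq=False)
class Node:
    """A node simply holds its outgoing directed edges (no distinguished parent)."""
    id: int
    name: str = ""
    edges: list[Edge] = field(default_factory=list, repr=False)


class Tree:
    """Parses a Newick string (no branch lengths) into the enriched structure."""

    def __init__(self, newick: str) -> None:
        self.nodes: list[Node] = []
        self.edges: list[Edge] = []
        self._text = "".join(newick.split())  # assume names contain no whitespace
        self._pos = 0
        self.root = self._parse_subtree()
        if self._peek() != ";":
            raise ValueError(f"expected ';' at position {self._pos}")

    def _peek(self) -> str:
        return self._text[self._pos] if self._pos < len(self._text) else ""

    def _read_label(self) -> str:
        start = self._pos
        while self._peek() not in ("", "(", ")", ",", ";"):
            self._pos += 1
        return self._text[start:self._pos]

    def _new_node(self, name: str = "") -> Node:
        node = Node(len(self.nodes), name)  # ids assigned in creation order
        self.nodes.append(node)
        return node

    def _connect(self, parent: Node, child: Node) -> None:
        """Create both directed edges of one undirected edge and link them."""
        down = Edge(len(self.edges), parent, child)
        up = Edge(len(self.edges) + 1, child, parent, opposite=down)
        down.opposite = up
        self.edges += [down, up]
        parent.edges.append(down)
        child.edges.append(up)

    def _parse_subtree(self) -> Node:
        """Parse one subtree and return its root so the caller can attach it."""
        if self._peek() == "(":
            node = self._new_node()
            while True:
                self._pos += 1  # consume '(' on the first pass, ',' afterwards
                self._connect(node, self._parse_subtree())
                if self._peek() != ",":
                    break
            if self._peek() != ")":
                raise ValueError(f"expected ')' at position {self._pos}")
            self._pos += 1
            node.name = self._read_label()  # optional internal-node label
            return node
        return self._new_node(self._read_label())


def leaves_below(edge: Edge) -> list[str]:
    """Names of the leaves in the subtree that the directed edge points into."""
    names, stack = [], [edge]
    while stack:
        e = stack.pop()
        # Continue away from where we came from: skip the edge's opposite.
        # A target with no onward edges is a leaf (assumes the root has degree >= 2).
        onward = [f for f in e.target.edges if f is not e.opposite]
        if not onward:
            names.append(e.target.name)
        stack.extend(onward)
    return sorted(names)


tree = Tree("((A,B),C,(D,E));")
print(leaves_below(tree.edges[4]), leaves_below(tree.edges[5]))  # ['A', 'B'] ['C', 'D', 'E']
```

For `((A,B),C,(D,E));` the directed edge with id 4 points from the root into `(A,B)`, and its opposite (id 5) points back into the rest of the tree, so the pair splits the leaves into {A, B} and {C, D, E}. The ids depend on the order in which this parser creates nodes and edges; an implementation that does the work in a different order would number them differently.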
