computing the quartet distance between general trees
CHAPTER 1. INTRODUCTION

has is called the degree of the node. An unrooted tree does not contain internal nodes of degree two. A tree where internal nodes are allowed to be polytomies, that is, they can have any degree equal to or greater than three, is called a general tree. General trees are often used to represent partly resolved relationships where the complete topology is not known and each species therefore cannot be represented by a distinct node. Sometimes branches are assigned a length, which adds the notion of time to the evolution.

The true evolutionary relation among a set of EUs is rarely known. Multiple methods for determining the exact relationship from biological data are available. They do not necessarily agree and might induce different trees. Some methods for inferring relationships will result in a large range of plausible tree reconstructions. Furthermore, multiple data sets, e.g. DNA sequences, describing a single species are often at hand. Thus, one method may yield a different solution for each data set used. Figure 1.1 is an illustration of two alternative relationships inferred for the Panthera (big cats).

[Figure: two trees, each labelled Panthera. Leaves of the first, in order: Clouded Leopard, Jaguar, Leopard, Lion, Snow Leopard, Tiger. Leaves of the second, in order: Clouded Leopard, Tiger, Jaguar, Snow Leopard, Leopard, Lion.]

Figure 1.1: Two alternative views of the relationship between the Panthera (big cats). Note that one is a binary tree whereas the other includes a polytomy and is thus a general tree. Example from Davis et al. [9].

This disagreement between trees introduces the need for some means of assessing trees. One approach is to make a pairwise comparison of trees in an attempt to quantify the differences or similarities.

1.2 Measuring difference or similarity

Various methods for tree comparison have been defined, and each measure has certain properties and takes certain aspects of the trees into consideration. Some can only handle fully resolved trees while others are able to take branch lengths into account. Some metrics consider topological properties only. An example of the latter is the nearest–neighbor interchange metric, proposed by Waterman and Smith [20], defined as the minimum number of nearest–neighbor interchanges required to convert one tree into another. The metric only works for binary trees, and the problem of computing it has been shown to be NP-complete (see DasGupta et al. [8]). In this thesis the focus will be on general trees. Here, an example is the Robinson–Foulds distance metric, proposed by Robinson and Foulds [15], and also known as the symmetric difference metric. It is defined as the