CHAPTER 2. PREREQUISITES

identifies the subtree F̄, which includes every part of the tree behind e. The quantity |F| is called the leaf set size and denotes the number of leaves in the subtree F. Therefore, |F| + |F̄| = n. This is a slight abuse of the mathematical notation for sets, and one could argue that F should merely denote the set of leaves in the subtree; however, I will stick to this notation as it is used in the background literature. Most often, the algorithms will deal with two subtrees, one from each tree, where F is a subtree of T and G is a subtree of T′. Since T and T′ contain the same set of leaves, F and G might also have some leaves in common. The number of such leaves is written |F ∩ G| and will be referred to as the shared leaf set size.

The concept of quartets of four leaves a, b, c and d has been described in Sec. 1.2.1. The three butterfly topologies, illustrated in Fig. 1.2 (a)–(c), are written ab|cd, ac|bd and ad|bc respectively, while the star quartet in Fig. 1.2 (d) is written a b × c d.

2.2 Choice of language and test environments

My first choice of implementation language was Python. Python is not among the most efficient languages and is usually not used in performance-critical algorithmic or mathematical applications. However, it is a popular language for fast prototyping, because it favors clarity and simplicity and allows a very short work cycle. This was a good match for me during the early parts of the implementation process, where I gained a complete understanding through experimental work.
In addition, I did not know the actual time needed for quartet distance calculation, and therefore productivity was more important than performance.

Later on, I decided to implement the algorithms in C++ as well. We have seen, in Sec. 1.3.1, that the results of the Python implementation were indeed informative; however, the practical running times were rather slow. The time-consuming calculations were carried out on remote servers, and although this thesis is not about optimizations, I found it interesting to see if I could bring down the running time to a level where the experiments could be carried out on my own laptop. Furthermore, another language with other properties might give the whole study a new perspective. C++ does indeed have other properties. It is compiled and, with its support for a set of low-level language features, considerably closer to the machine level; it is often used for algorithms and other time-critical calculations.

These two tracks of implementation will be described in parallel, and in some cases no distinction will be made between them. However, they should not be compared one-to-one; the resulting programs will merely be used as two different opportunities to experiment with the theoretical results.
Details about the two programming languages, libraries and physical environments used are listed below. For a guide on how to obtain, compile and run the code, see App. B.

Python programming environment
  platform:          PC
  processor:         Intel Xeon 3.00 GHz
  memory:            1 GB
  operating system:  Red Hat Linux (kernel: 2.6.18)
  language:          Python (v. 2.4.3)
  libraries:         NumPy (v. 1.2.1)
                     SciPy (v. 0.6.0)
                     BLAS (preinstalled, v. 3.1.1)

C++ programming environment
  platform:          MacBook Pro 13”
  model:             MacBookPro5,5
  processor:         Intel Core 2 Duo 2.26 GHz
  memory:            4 GB
  operating system:  Mac OS X 10.6.4 (10F569)
  language:          C++ (g++ from gcc version 4.2.1 (Apple Inc. build 5664))
  libraries:         Boost.uBlas (Boost libraries v. 1.42.0)
                     Boost.NumericBindings (v. v1)
                     BLAS (Apple Accelerate framework: v. vecLib-268.0)
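
The leaf set notation introduced at the beginning of this chapter can be made concrete with a small Python sketch. It is only an illustration and not the implementation used in this thesis: the nested-tuple tree representation and the function names leaves, leaf_set_size, complement_size and shared_leaf_set_size are assumptions made for the example, and it is written for a current Python 3 interpreter rather than the Python 2.4.3 listed above.

```python
# Hypothetical representation (not from the thesis code): a subtree is
# either a leaf label (a string) or a tuple of child subtrees.

def leaves(subtree):
    """Return the set of leaf labels contained in `subtree`."""
    if isinstance(subtree, str):
        return {subtree}
    result = set()
    for child in subtree:
        result |= leaves(child)
    return result


def leaf_set_size(subtree):
    """|F|: the number of leaves in the subtree F."""
    return len(leaves(subtree))


def complement_size(subtree, n):
    """|F-bar| = n - |F| for a tree with n leaves in total."""
    return n - leaf_set_size(subtree)


def shared_leaf_set_size(f, g):
    """Shared leaf set size: the number of leaves F and G have in common."""
    return len(leaves(f) & leaves(g))


# Two toy subtrees taken from trees T and T' on the same n = 5 leaves.
all_leaves = {"a", "b", "c", "d", "e"}
n = len(all_leaves)
F = (("a", "b"), "c")   # assumed subtree of T
G = ("b", ("c", "d"))   # assumed subtree of T'

size_F = leaf_set_size(F)
size_F_bar = complement_size(F, n)
shared = shared_leaf_set_size(F, G)
```

Here the leaves are stored as explicit sets, which sidesteps the abuse of notation mentioned earlier: F is the subtree, leaves(F) is its leaf set, and the sizes are derived from it. Recomputing the sets on every call is wasteful and is chosen here only for clarity.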