Refined Buneman Trees
Algorithm 5. Offhand, the algorithm looks like it would run in $O(n^3)$ time. We iterate through a loop, reducing the size of the distance matrix by one on each iteration (adding one row and column and removing two) until the distance matrix is spent. Inside the loop, line 3 needs to find the minimum entry in an $n \times n$ matrix, normally an $O(n^2)$ operation. However, if we overlay the distance matrix with a quad tree search structure, we can find the minimum entry in constant time, while spending only linear time updating the search structure after the distance matrix is updated in lines 5 and 6. With $n - 1$ iterations of $O(n)$ work each, the total running time drops to $O(n^2)$. The quad tree data structure is described in more detail in chapter 8.
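The quad tree itself is deferred to chapter 8, so the following is only a sketch under our own assumption of how constant-time minimum queries and linear-time updates can be obtained: a "min pyramid" in which every node stores the minimum of the four quadrants beneath it. The class `MinQuadTree` and its methods `set_row_and_col` and `deactivate` are hypothetical names, not the structure from chapter 8. Since single linkage changes one row and one column per merge, only the nodes covering that row or column need repair.

```python
import math


class MinQuadTree:
    """Hypothetical min quad tree (a "min pyramid") over an n x n matrix.

    Level 0 holds the matrix entries as (value, row, col) tuples, padded with
    +inf up to a power-of-two size N. Each cell at level l stores the minimum
    of the 2 x 2 block of cells beneath it at level l - 1, so the single cell
    at the top level is the global minimum.
    """

    def __init__(self, matrix):
        self.n = len(matrix)
        size = 1
        while size < self.n:
            size *= 2
        base = [[(math.inf, -1, -1)] * size for _ in range(size)]
        for i in range(self.n):
            for j in range(self.n):
                base[i][j] = (matrix[i][j], i, j)
        self.levels = [base]
        while len(self.levels[-1]) > 1:
            below = self.levels[-1]
            half = len(below) // 2
            self.levels.append(
                [[self._block_min(below, r, c) for c in range(half)] for r in range(half)]
            )

    @staticmethod
    def _block_min(below, r, c):
        # Minimum over the four children of cell (r, c).
        return min(below[2 * r][2 * c], below[2 * r][2 * c + 1],
                   below[2 * r + 1][2 * c], below[2 * r + 1][2 * c + 1])

    def minimum(self):
        """Return (value, row, col) of the smallest entry in O(1) time."""
        return self.levels[-1][0][0]

    def set_row_and_col(self, i, values):
        """Overwrite row i and column i with values, then repair the tree.

        Only cells whose block contains row i or column i can change; level l
        has N / 2**l cells per row and per column, so the total work is O(N).
        """
        base = self.levels[0]
        for j in range(self.n):
            base[i][j] = (values[j], i, j)
            base[j][i] = (values[j], j, i)
        for l in range(1, len(self.levels)):
            below, level = self.levels[l - 1], self.levels[l]
            k = i >> l
            for c in range(len(level)):
                level[k][c] = self._block_min(below, k, c)
            for r in range(len(level)):
                level[r][k] = self._block_min(below, r, k)

    def deactivate(self, i):
        """Mark cluster i as dead by filling its row and column with +inf."""
        self.set_row_and_col(i, [math.inf] * self.n)


d = [[0, 4, 1], [4, 0, 3], [1, 3, 0]]
# +inf on the diagonal so a cluster is never paired with itself.
m = [[math.inf if i == j else d[i][j] for j in range(3)] for i in range(3)]
tree = MinQuadTree(m)
print(tree.minimum())  # (1, 0, 2)
```

After merging clusters 0 and 2, one would call `set_row_and_col(0, ...)` with the new single linkage distances and `deactivate(2)`; both calls touch $O(n)$ cells.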
Algorithm 5 The single linkage clustering tree algorithm
Require: $\delta$ is a distance measure on $X$
Ensure: $C$ is the single linkage clustering tree for $\delta$
1: $C \leftarrow$ a set of clusters, one for each element in $X$
2: while $|C| > 1$ do
3: Choose the distinct clusters $c_1, c_2 \in C$ that minimize the quantity $\delta(c_1, c_2)$.
4: Create $c' = c_1 \cup c_2$.
5: Calculate distances from $c'$ to all other clusters $c'' \in C$ by setting $\delta(c', c'') = \min\{\delta(c_1, c''), \delta(c_2, c'')\}$.
6: Erase $c_1$ and $c_2$ from $C$, and add $c'$.
7: end while
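As an illustration, the following is a direct, unoptimized transcription of Algorithm 5, with comments pointing at the corresponding lines. The function name `single_linkage_tree` and the nested-tuple tree representation are our own choices for the sketch. It scans every live cluster pair on each iteration, so it is the $O(n^3)$ baseline discussed above.

```python
from itertools import combinations


def single_linkage_tree(elements, delta):
    """Naive single linkage clustering, following Algorithm 5 step by step.

    elements: non-empty list of items (the set X); delta(a, b): their distance.
    Returns (tree, heights): tree is a nested tuple whose leaves are the
    elements, heights lists the merge distances in the order merges happen.
    """
    # Line 1: one cluster per element, stored by integer id as its subtree.
    trees = dict(enumerate(elements))
    # Distances between live clusters, keyed by the unordered pair of ids.
    dist = {frozenset((i, j)): delta(elements[i], elements[j])
            for i, j in combinations(range(len(elements)), 2)}
    next_id = len(elements)
    heights = []
    while len(trees) > 1:  # line 2
        # Line 3: the pair of distinct clusters with the smallest distance.
        pair = min(dist, key=dist.get)
        c1, c2 = sorted(pair)
        heights.append(dist[pair])
        # Line 4: c' = c1 ∪ c2 gets a fresh id.
        new, next_id = next_id, next_id + 1
        # Line 5: single linkage distance from c' to every other live cluster.
        for other in trees:
            if other not in (c1, c2):
                dist[frozenset((new, other))] = min(dist[frozenset((c1, other))],
                                                    dist[frozenset((c2, other))])
        # Line 6: erase c1 and c2 (and their distances), add c'.
        dist = {k: v for k, v in dist.items() if c1 not in k and c2 not in k}
        trees[new] = (trees.pop(c1), trees.pop(c2))
    return next(iter(trees.values())), heights


pos = {"a": 0, "b": 1, "c": 5, "d": 7}
tree, heights = single_linkage_tree(list(pos), lambda p, q: abs(pos[p] - pos[q]))
print(tree, heights)  # (('a', 'b'), ('c', 'd')) [1, 2, 4]
```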
Also, instead of actually inserting and removing entire rows and columns in the distance matrix, we can reuse the row and column of one of the merged clusters for the new cluster, and rather than deleting rows, we can simply keep track of which clusters are still alive. Figure 7.1 illustrates this point.
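A minimal sketch of this bookkeeping, assuming the merged cluster $c'$ takes over the row and column of the lower-indexed cluster and the other row is merely flagged as dead. The name `single_linkage_inplace` is hypothetical, and the closest pair is found by a plain $O(n^2)$ scan where the quad tree from the previous discussion would be used instead.

```python
def single_linkage_inplace(d):
    """Single linkage on an n x n distance matrix without deleting rows.

    d: symmetric matrix as a list of lists (n >= 1); it is copied, not modified.
    Returns (tree, heights) with leaves labelled by row index.
    """
    n = len(d)
    d = [row[:] for row in d]
    alive = [True] * n
    tree = list(range(n))  # subtree currently stored in each row
    heights = []
    for _ in range(n - 1):
        # Closest live pair by a plain O(n^2) scan.
        bi, bj = None, None
        for i in range(n):
            for j in range(i + 1, n):
                if alive[i] and alive[j] and (bi is None or d[i][j] < d[bi][bj]):
                    bi, bj = i, j
        heights.append(d[bi][bj])
        # Reuse row/column bi for c' (single linkage: take the minimum).
        for k in range(n):
            if alive[k] and k not in (bi, bj):
                d[bi][k] = d[k][bi] = min(d[bi][k], d[bj][k])
        alive[bj] = False  # retire row bj instead of deleting it
        tree[bi] = (tree[bi], tree[bj])
    root = next(k for k in range(n) if alive[k])
    return tree[root], heights


pos = [0, 1, 5, 7]
d = [[abs(p - q) for q in pos] for p in pos]
print(single_linkage_inplace(d))  # (((0, 1), (2, 3)), [1, 2, 4])
```

Reusing rows keeps the matrix at a fixed size, so no memory is reallocated during the loop; the price is skipping dead rows in every scan.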
7.3 Converting the distance matrix<br />
One important point about using the single linkage clustering algorithm instead of the anchored Buneman tree algorithm is the distinction between a distance matrix and a similarity matrix. The anchored Buneman tree for a species $x \in X$ is computed from a distance measure $\delta$ on $X$, while the single linkage clustering tree for $x \in X$ is computed from an inverted similarity measure on $X$ with respect to $x$. In other words, we must distinguish between $\mathrm{SLCT}(\delta, X)$ and $\mathrm{SLCT}(F_x(\delta), X)$.
Basically, this means that we have to transform $\delta$ before using it to calculate the single linkage clustering tree. The transformation is described in [BB99], section 3, where it is termed a Farris transformation:
$$F_x(a, b) = \frac{1}{2}\left(\delta_{ax} + \delta_{bx} - \delta_{ab}\right)$$
where $a, b \in X \setminus \{x\}$. The Farris transformation yields a similarity measure, which can be converted into a distance measure, e.g. by changing signs.
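To make the transformation concrete, here is a small sketch that computes $F_x$ from a distance matrix and then changes signs to obtain a distance. The function names `farris_transform` and `similarity_to_distance` are hypothetical. Sign change is one choice among several: single linkage only compares distances, so any strictly decreasing function of the similarity yields the same merge order, ties aside.

```python
from itertools import combinations


def farris_transform(d, x):
    """F_x(a, b) = (d[a][x] + d[b][x] - d[a][b]) / 2 for all pairs a < b, a, b != x.

    d: symmetric distance matrix (list of lists); x: index of the anchor species.
    Returns a dict mapping index pairs (a, b) to their similarity.
    """
    others = [a for a in range(len(d)) if a != x]
    return {(a, b): (d[a][x] + d[b][x] - d[a][b]) / 2
            for a, b in combinations(others, 2)}


def similarity_to_distance(sim):
    """Change signs: the most similar pair becomes the closest pair."""
    return {pair: -s for pair, s in sim.items()}


pos = [0, 1, 5, 7]  # four species on a line; species 0 is the anchor x
d = [[abs(p - q) for q in pos] for p in pos]
sim = farris_transform(d, 0)
print(sim)                          # {(1, 2): 1.0, (1, 3): 1.0, (2, 3): 5.0}
print(similarity_to_distance(sim))  # {(1, 2): -1.0, (1, 3): -1.0, (2, 3): -5.0}
```

In this example, $F_0(2, 3) = 5$ is the length of the path that species 2 and 3 share when traced from $x$, so they end up as the closest pair after the sign change.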