22.01.2015 Views

Refined Buneman Trees


Algorithm 5. Offhand, the algorithm looks like it would run in time O(n^3).
We iterate through a loop, reducing the size of the distance matrix by one
(adding one entry and removing two) in each iteration, until the distance matrix
is spent. Inside the loop, in line 3, we need to find the minimum entry in an
n × n matrix, normally an O(n^2) operation. However, if we overlay our distance
matrix with a quad tree search structure, we can get away with spending constant
time finding the minimum entry in the matrix, while spending only linear time
updating the search structure after we update the distance matrix in line 6.
With n − 1 iterations of linear cost each, this brings the total running time
down to O(n^2). More about the quad tree data structure in chapter 8.
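To make the constant-time minimum and linear-time update concrete, here is a minimal Python sketch of one way such a search structure could look. It is not the structure from chapter 8, which is not reproduced here; the class name `MinQuadTree`, its methods, the padding to a power of two, and the choice to ignore the diagonal are all assumptions made for illustration. Each node stores the smallest entry of its 2 × 2 block of children together with its position, so the root always holds the overall minimum.

```python
import math


class MinQuadTree:
    """Hypothetical min-quad-tree over a symmetric matrix (illustrative only).

    levels[0] holds the matrix entries as (value, row, col) tuples;
    levels[k+1][i][j] is the minimum of the 2 x 2 block below it.
    """

    def __init__(self, matrix):
        n = len(matrix)
        size = 1
        while size < n:          # pad to a power of two (assumption)
            size *= 2
        inf = math.inf
        # Diagonal and padding cells are set to infinity so that they
        # never win the minimum (assumption: self-distances are ignored).
        base = [[(matrix[i][j], i, j) if (i < n and j < n and i != j)
                 else (inf, i, j)
                 for j in range(size)]
                for i in range(size)]
        self.levels = [base]
        while len(self.levels[-1]) > 1:
            prev = self.levels[-1]
            half = len(prev) // 2
            self.levels.append([[self._merge(prev, i, j) for j in range(half)]
                                for i in range(half)])

    @staticmethod
    def _merge(prev, i, j):
        # Minimum over the four children of node (i, j).
        return min(prev[2 * i][2 * j], prev[2 * i][2 * j + 1],
                   prev[2 * i + 1][2 * j], prev[2 * i + 1][2 * j + 1])

    def minimum(self):
        """Return (value, row, col) of the smallest entry in O(1)."""
        return self.levels[-1][0][0]

    def set_row_and_column(self, r, values):
        """Overwrite row r and column r, then repair the tree.

        Level k has half as many cells per row as level k-1, so the
        repair touches about n + n/2 + n/4 + ... = O(n) nodes.
        """
        base = self.levels[0]
        for k, v in enumerate(values):
            if k != r:           # keep the diagonal at infinity
                base[r][k] = (v, r, k)
                base[k][r] = (v, k, r)
        idx = r
        for lvl in range(1, len(self.levels)):
            idx //= 2
            prev, cur = self.levels[lvl - 1], self.levels[lvl]
            for k in range(len(cur)):
                cur[idx][k] = self._merge(prev, idx, k)
                cur[k][idx] = self._merge(prev, k, idx)
```

In this sketch a cluster is retired by overwriting its row and column with infinity, and a merged cluster is stored by overwriting the row and column of one of its parts with the new distances; both are single calls to `set_row_and_column`.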

Algorithm 5 The single linkage clustering tree algorithm

Require: δ is a distance measure on X

Ensure: C is the single linkage clustering tree for δ

1: C ← a set of clusters, one for each element in X

2: while |C| > 1 do

3: Choose the clusters c_1, c_2 ∈ C that minimize the quantity δ(c_1, c_2).

4: Create c′ = c_1 ∪ c_2.

5: Calculate distances from c′ to all other clusters c″ ∈ C by setting δ(c′, c″) = min{δ(c_1, c″), δ(c_2, c″)}.

6: Erase c_1 and c_2 from C, add c′.

7: end while

Also, instead of actually inserting and removing entire rows and columns in
the distance matrix, we can reuse an existing row and column for the new
cluster, and instead of deleting rows we can keep track of which clusters are
still alive. Figure 7.1 illustrates this point.
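The bookkeeping described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `single_linkage_tree`, the nested-tuple tree representation, and the choice to reuse the slot of the first merged cluster are hypothetical, and the minimum is found by a plain scan rather than by the quad tree, so this version still runs in O(n^3).

```python
def single_linkage_tree(dist):
    """Sketch of Algorithm 5 on a symmetric distance matrix (list of lists).

    Returns the tree as nested tuples whose leaves are the indices 0..n-1.
    Assumes len(dist) >= 1.
    """
    n = len(dist)
    d = [row[:] for row in dist]   # work on a copy of the matrix
    tree = list(range(n))          # tree[i]: subtree currently stored in slot i
    alive = [True] * n             # alive[i]: slot i still holds a cluster

    for _ in range(n - 1):
        # Line 3: naive scan for the closest pair of live clusters.
        best = None
        for i in range(n):
            if not alive[i]:
                continue
            for j in range(i + 1, n):
                if alive[j] and (best is None or d[i][j] < best[0]):
                    best = (d[i][j], i, j)
        _, i, j = best

        # Lines 4-5: the merged cluster reuses row and column i.
        for k in range(n):
            if alive[k] and k != i and k != j:
                d[i][k] = d[k][i] = min(d[i][k], d[j][k])
        tree[i] = (tree[i], tree[j])

        # Line 6: slot j is not deleted, only marked as dead.
        alive[j] = False

    return tree[alive.index(True)]
```

For the four-element matrix `[[0, 1, 4, 5], [1, 0, 3, 6], [4, 3, 0, 2], [5, 6, 2, 0]]` the function returns `((0, 1), (2, 3))`: elements 0 and 1 merge first, then 2 and 3, and finally the two pairs.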

7.3 Converting the distance matrix<br />

One important point about using the single linkage clustering algorithm
instead of the anchored Buneman tree algorithm is the distinction between a
distance matrix and a similarity matrix. The anchored Buneman tree for a species
x ∈ X is computed from a distance measure δ on X, while the single linkage
clustering tree for x ∈ X is computed from an inverted similarity measure on X
with respect to x. In other words, we must distinguish between SLCT(δ, X) and
SLCT(F_x(δ), X).

Basically, this means that we have to transform δ before using it to calculate
the single linkage clustering tree. The transformation is described in [BB99],
section 3, where it is termed a Farris transformation:

$$F_x(a, b) = \frac{1}{2}\left(\delta_{ax} + \delta_{bx} - \delta_{ab}\right)$$

where a, b ∈ X \ {x}. The Farris transformation creates a similarity measure,
but this can be converted into a distance measure by, for example, changing signs.
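As a small worked illustration of the formula, the following Python sketch computes the Farris transform of a distance matrix with respect to an anchor x and then flips the signs. The function names `farris_transform` and `to_distance` and the list-of-lists representation are assumptions made for this example. Note that the sign change yields negative values; it only preserves the ordering of the entries, which is what a procedure based purely on minima and comparisons relies on.

```python
def farris_transform(dist, x):
    """Return (labels, sim) for the Farris transform with respect to x.

    labels lists the elements of X without x, and
    sim[p][q] = (dist[a][x] + dist[b][x] - dist[a][b]) / 2
    for a = labels[p], b = labels[q].
    """
    labels = [i for i in range(len(dist)) if i != x]
    sim = [[0.5 * (dist[a][x] + dist[b][x] - dist[a][b]) for b in labels]
           for a in labels]
    return labels, sim


def to_distance(sim):
    """Turn the similarity matrix into a dissimilarity by changing signs."""
    return [[-s for s in row] for row in sim]
```

With the matrix `[[0, 1, 4, 5], [1, 0, 3, 6], [4, 3, 0, 2], [5, 6, 2, 0]]` and x = 0, the remaining elements are 1, 2 and 3, and for example F_0(2, 3) = (4 + 5 − 2) / 2 = 3.5, which becomes −3.5 after the sign change.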
