
hierarchical clustering algorithm on it. The main difference between our implementation and the original one lies in the fact that we cut the resulting dendrogram at the desired number of clusters k, whereas Fred proposes performing the cut at the highest lifetime level, so that the consensus function itself finds the natural number of clusters in the data set. Its computational complexity is O(n²l) (Fred and Jain, 2005).

– ALSAD (Average-Link on Similarity As Data): this is one of the three consensus functions presented in (Kuncheva, Hadjitodorov, and Todorova, 2006) based on considering object similarity measures as object features. Although the authors do not give a specific name to this family of consensus functions, we have named them xxSAD to indicate that similarities are treated as data, replacing xx by the acronym of the particular clustering algorithm used for obtaining the consensus clustering solution. In this case, the pairwise object co-association matrix is partitioned using the average-link (AL) hierarchical clustering algorithm, cutting the resulting dendrogram at the desired number of clusters (a sketch of this family of consensus functions is given after this list). Its computational complexity is O(n²l) for creating the object similarity matrix plus O(n²) for partitioning it with the hierarchical AL clustering algorithm (Xu and Wunsch II, 2005).

– KMSAD (K-Means on Similarity As Data): this consensus function belongs to the same family as the previous one. This time, the object co-association matrix is clustered using the classic k-means (KM) partitional algorithm. Its computational complexity is O(n²l) for creating the object similarity matrix plus O(tkm) for partitioning it with the k-means clustering algorithm (Xu and Wunsch II, 2005), where t is the number of iterations of k-means.

– SLSAD (Single-Link on Similarity As Data): following the same approach as the ALSAD and KMSAD consensus functions, the pairwise object co-association matrix is partitioned by means of the single-link (SL) hierarchical clustering algorithm in this case. As in the ALSAD consensus function, the consensus clustering solution is obtained by cutting the dendrogram at the desired number of clusters. Its computational cost is the same as that of ALSAD.

– VMA (Voting Merging Algorithm): this consensus function is based on sequentially solving the cluster correspondence problem on pairs of cluster ensemble components and, at each iteration, applying a weighted version of the sum rule confidence voting method (a simplified sketch of this relabel-and-vote scheme also follows the list). This algorithm scales linearly in the number of objects in the data set and the number of cluster ensemble components, i.e. its complexity is O(nl) (Dimitriadou, Weingessel, and Hornik, 2002).
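
The xxSAD consensus functions and our variant of Fred's evidence accumulation share the same two-step pattern: build the pairwise co-association matrix from the cluster ensemble, then run an off-the-shelf clustering algorithm on it. The following Python sketch (not the thesis's Matlab implementation) illustrates that pattern under the assumption that the ensemble is given as a list of l label vectors over the same n objects; the function names are ours, and scipy/scikit-learn stand in for the corresponding Matlab built-ins.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans


def coassociation_matrix(ensemble):
    """n x n fraction of partitions in which each pair of objects co-clusters; O(n^2 l)."""
    ensemble = np.asarray(ensemble)
    l, n = ensemble.shape
    S = np.zeros((n, n))
    for labels in ensemble:
        S += labels[:, None] == labels[None, :]
    return S / l


def hierarchical_consensus(S, k, method="average"):
    """ALSAD (method='average') or SLSAD (method='single'): hierarchically cluster
    the co-association matrix and cut the dendrogram at the desired k clusters."""
    D = 1.0 - S                              # turn co-associations into distances
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method=method)
    return fcluster(Z, t=k, criterion="maxclust")


def kmsad_consensus(S, k, seed=0):
    """KMSAD: k-means on the rows of the co-association matrix (similarities as features)."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(S)
```

Given an ensemble of l partitions, `hierarchical_consensus(coassociation_matrix(ensemble), k)` corresponds to the ALSAD pipeline, while `method="single"` yields the SLSAD variant; cutting at the highest lifetime instead of at a fixed k, as Fred proposes, would amount to choosing the cut level with the largest gap between consecutive merge distances.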
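
The VMA consensus function itself is more involved; purely as an illustration of its sequential relabel-and-vote idea, the sketch below aligns each new partition to the running vote matrix with a Hungarian matching and averages the aligned cluster indicators. This is an assumption-laden simplification (all partitions are taken to have exactly k clusters labelled 0..k-1, and the weighted sum-rule update of Dimitriadou et al. is replaced by a plain running average), not the original algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def one_hot(labels, k):
    """n x k indicator matrix of a hard partition with labels in 0..k-1."""
    H = np.zeros((len(labels), k))
    H[np.arange(len(labels)), labels] = 1.0
    return H


def vma_like_consensus(ensemble, k):
    """Sequentially align each ensemble component to the accumulated votes and
    average the aligned indicators; linear in n and l, plus a k x k matching per step."""
    votes = one_hot(np.asarray(ensemble[0]), k)
    for i, labels in enumerate(ensemble[1:], start=2):
        H = one_hot(np.asarray(labels), k)
        overlap = votes.T @ H                       # k x k label agreement matrix
        _, cols = linear_sum_assignment(-overlap)   # correspondence maximizing agreement
        votes = ((i - 1) * votes + H[:, cols]) / i  # relabel, then update running average
    return votes.argmax(axis=1)                     # consensus label = most-voted cluster
```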

A.6 Computational resources

All the experiments conducted in this thesis have been executed under Matlab 7.0.4 on Dual Pentium 4 3 GHz / 1 GB RAM computers. The reason for choosing Matlab as the programming language for implementing our proposals is threefold: besides the fact that we are familiar with it, the existence of multiple built-in functions simplifies the implementation of many of the processes involved in our proposals (Principal Component Analysis and Random Projection feature extraction, for instance). Moreover, the availability of the full Matlab source code of several components of our proposals (e.g. hypergraph consensus functions
