29.04.2013

Doctoral Thesis - La Salle

Abstract

When facing the task of partitioning a data collection in an unsupervised fashion, the clustering practitioner must make several crucial decisions (which clustering algorithm to apply, how the objects in the data set are represented, how many clusters are to be found, among others) that largely determine the quality of the resulting partition. However, the unsupervised nature of the clustering problem makes it difficult, if not impossible, to make well-founded decisions unless domain knowledge is available.

In an attempt to overcome these indeterminacies, we propose an approach to the clustering problem that intentionally minimizes user decision making. Rather than committing to a single configuration, the clustering practitioner is encouraged to employ as many clustering systems as possible simultaneously, compiling their outcomes into a cluster ensemble, and to combine them in order to obtain the final partition (or consensus clustering). The greater the similarity between the consensus clustering and the highest-quality cluster ensemble component, the greater the robustness achieved against the inherent indeterminacies of clustering.
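
As a minimal sketch of the "build an ensemble, then combine it" idea, the Python below derives a consensus clustering with a co-association (evidence accumulation) rule: two objects are linked when a strict majority of ensemble components put them in the same cluster, and the connected components of that graph form the consensus. This well-known consensus function is assumed here purely for illustration; it is not the thesis's own method. Normalized mutual information (NMI) is likewise an assumed choice for measuring the similarity between the consensus and each component. Partitions are plain label lists, and `majority_consensus` and `nmi` are hypothetical helper names.

```python
from collections import Counter
from itertools import combinations
from math import log, sqrt


def co_association_counts(ensemble):
    """For every pair of objects, count the components that co-cluster them."""
    n = len(ensemble[0])
    counts = [[0] * n for _ in range(n)]
    for labels in ensemble:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    return counts


def majority_consensus(ensemble):
    """Illustrative stand-in consensus function (not the thesis's method):
    link objects co-clustered by a strict majority of the components and
    return the connected components as consensus labels 0, 1, 2, ..."""
    counts = co_association_counts(ensemble)
    m, n = len(ensemble), len(ensemble[0])
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # depth-first traversal of the majority graph
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and 2 * counts[i][j] > m:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def nmi(a, b):
    """NMI between two labelings (geometric-mean normalization, an assumed choice)."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * log(n * c / (ca[x] * cb[y])) for (x, y), c in cab.items())
    ha = -sum(c / n * log(c / n) for c in ca.values())
    hb = -sum(c / n * log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:
        return 1.0 if ha == hb else 0.0
    return mi / sqrt(ha * hb)


ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as above, with permuted labels
    [0, 0, 1, 1, 2, 2],  # a noisier component
]
consensus = majority_consensus(ensemble)
print(consensus)  # [0, 0, 0, 1, 1, 1]
print([round(nmi(consensus, c), 3) for c in ensemble])
```

Here the consensus coincides with the first two components (NMI of 1, regardless of label permutation) and is less similar to the noisier third one, which is the kind of comparison the robustness criterion above relies on.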

However, the indiscriminate creation of cluster ensemble components poses two main challenges to the clustering combination process: i) an increase in its computational complexity, to the point that creating the consensus clustering may even become infeasible if the number of combined clustering systems is too large, and ii) the risk of obtaining a low-quality consensus partition due to the inclusion of poor clustering systems in the cluster ensemble. To overcome these drawbacks, this thesis introduces hierarchical self-refining consensus architectures as a means of obtaining good-quality partitions at a reduced computational cost, as confirmed by an extensive experimental evaluation.
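
The hierarchical part of such an architecture can be sketched as follows, assuming a simplified scheme: the ensemble is split into mini-ensembles of at most `mini_size` components, each mini-ensemble is combined separately, and the intermediate consensus partitions are combined again until a single partition remains. The consensus function plugged in here, `medoid_consensus` (return the component with the highest average Rand index to the others), is a deliberately simple stand-in, and the self-refining stage described in the thesis is not reproduced. All names are hypothetical.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of object pairs on which two labelings agree (together or apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def medoid_consensus(ensemble):
    """Stand-in consensus function (an assumption, not the thesis's):
    the component that agrees most, on average, with all the others."""
    return max(ensemble, key=lambda p: sum(rand_index(p, q) for q in ensemble))


def hierarchical_consensus(ensemble, consensus_fn, mini_size=3):
    """Combine the ensemble stage by stage, never passing more than
    `mini_size` components to a single consensus call."""
    if mini_size < 2:
        raise ValueError("mini_size must be at least 2")
    level = list(ensemble)
    while len(level) > 1:
        level = [consensus_fn(level[k:k + mini_size])
                 for k in range(0, len(level), mini_size)]
    return level[0]


ensemble = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 2],
    [0, 1, 0, 1, 0, 1],  # a poor component
    [0, 0, 0, 1, 1, 1],
]
print(hierarchical_consensus(ensemble, medoid_consensus))  # [0, 0, 0, 1, 1, 1]
```

Because each call sees at most `mini_size` components, the cost of any single consensus step stays bounded as the ensemble grows, while the total number of calls grows roughly linearly with the ensemble size; in this toy run the poor component is already discarded at the first stage.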

Aiming to port this robust clustering strategy to a more generic framework, we propose a set of voting-based consensus functions for combining fuzzy clustering systems. Several experiments demonstrate that the quality of the consensus clusterings they yield is comparable to or better than that of multiple state-of-the-art soft consensus functions.
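
One common way to let fuzzy partitions "vote" is to first resolve the label correspondence problem (cluster 0 in one partition may be cluster 1 in another) and then average the aligned membership matrices. The sketch below assumes that scheme, with brute-force alignment of every partition to the first one and plain averaging; it illustrates voting-based soft consensus in general and does not reproduce the specific consensus functions proposed in the thesis. Function names are hypothetical.

```python
from itertools import permutations


def align_to_reference(reference, memberships):
    """Reorder the clusters (columns) of an n x k fuzzy membership matrix so that
    they best match `reference`. Assumption: the permutation maximizing the summed
    product of memberships is found by brute force, practical only for small k."""
    k = len(reference[0])

    def agreement(perm):
        return sum(ref_row[c] * row[perm[c]]
                   for ref_row, row in zip(reference, memberships)
                   for c in range(k))

    best = max(permutations(range(k)), key=agreement)
    return [[row[best[c]] for c in range(k)] for row in memberships]


def soft_voting_consensus(ensemble):
    """Align every fuzzy partition to the first one, then average the memberships."""
    reference = ensemble[0]
    aligned = [align_to_reference(reference, u) for u in ensemble]
    n, k = len(reference), len(reference[0])
    return [[sum(u[i][c] for u in aligned) / len(aligned) for c in range(k)]
            for i in range(n)]


# Three fuzzy partitions of four objects into two clusters (rows sum to 1).
U1 = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
U2 = [[0.2, 0.8], [0.3, 0.7], [0.7, 0.3], [0.9, 0.1]]  # cluster labels swapped
U3 = [[0.6, 0.4], [0.7, 0.3], [0.4, 0.6], [0.2, 0.8]]

consensus = soft_voting_consensus([U1, U2, U3])
hard = [max(range(len(row)), key=row.__getitem__) for row in consensus]
print(hard)  # [0, 0, 1, 1]
```

The consensus stays fuzzy (each row still sums to 1); taking the arg-max per object, as in the last step, is optional and only needed when a crisp partition is required.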

Our proposals find a natural field of application in the robust clustering of multimodal data (a problem of current interest due to the growing ubiquity of multimedia), as the existence of multiple data modalities introduces additional indeterminacies that hinder obtaining robust clustering results. The basis of our proposal is the creation of multimodal cluster ensembles, which naturally allow the simultaneous use of early and late modality fusion techniques, thus providing a highly generic and efficient approach to multimedia clustering whose performance is analyzed in multiple experiments.
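
A multimodal cluster ensemble can be sketched as follows, under several assumptions: each modality is a list of numeric feature vectors, early fusion is plain feature concatenation (no scaling or weighting), and every representation is clustered with a small k-means using deterministic farthest-point initialization. The resulting ensemble mixes unimodal and early-fused components, so any consensus function applied to it afterwards performs the late fusion step. Names such as `build_multimodal_ensemble` are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization (assumed)."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def build_multimodal_ensemble(modalities, ks=(2,)):
    """Cluster each modality on its own plus the early-fused representation
    (hypothetical: concatenation of all modalities) for every k in `ks`."""
    names = list(modalities)
    n = len(modalities[names[0]])
    fused = [[x for m in names for x in modalities[m][i]] for i in range(n)]
    representations = {**modalities, "+".join(names): fused}
    return {(name, k): kmeans(points, k)
            for name, points in representations.items() for k in ks}


audio = [[0.0], [0.2], [0.4], [4.0], [4.2], [4.4]]
visual = [[0.0], [0.2], [4.0], [4.2], [4.4], [0.4]]
ensemble = build_multimodal_ensemble({"audio": audio, "visual": visual})
for key, labels in ensemble.items():
    print(key, labels)
# ('audio', 2) [0, 0, 0, 1, 1, 1]
# ('visual', 2) [0, 0, 1, 1, 1, 0]
# ('audio+visual', 2) [0, 0, 1, 1, 1, 1]
```

In this toy run the two modalities disagree about objects 2 and 5, and the early-fused component yields yet another partition; combining all three with a consensus function is where the late fusion happens.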
