29.04.2013

Doctoral Thesis - La Salle


6.5. Discussion

solutions on hard cluster ensembles (Dudoit and Fridlyand, 2003; Fischer and Buhmann, 2003; Greene and Cunningham, 2006). To our knowledge, the only voting-based consensus function for soft cluster ensembles is the Voting-Merging Algorithm (VMA) of Dimitriadou, Weingessel, and Hornik (2002), which employs a weighted version of the sum rule for confidence voting. Moreover, all these works have in common that they use the Hungarian algorithm to solve the cluster correspondence problem.
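To make the correspondence-then-voting pipeline concrete, the following Python sketch aligns each soft clustering of an ensemble to a reference clustering by solving a linear assignment problem with the Hungarian method (maximising the total membership overlap between matched clusters). It then combines the aligned membership matrices with an unweighted sum rule, normalised by the ensemble size so that every row still sums to one. This is a minimal illustration under stated assumptions, not the implementation of VMA or of any cited work: all clusterings are assumed to have the same number of clusters k, the first one serves as the reference, the overlap criterion is our own choice, and the function names (`hungarian_min`, `align_to_reference`, `sum_rule_consensus`) are hypothetical. VMA itself uses a weighted sum rule, which this sketch does not reproduce.

```python
from typing import List

Matrix = List[List[float]]  # rows = objects, columns = cluster memberships


def hungarian_min(cost: Matrix) -> List[int]:
    """O(n^3) Hungarian method for a square cost matrix (minimisation).
    Returns col, where col[r] is the column assigned to row r."""
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)  # row potentials (1-indexed, slot 0 unused)
    v = [0.0] * (n + 1)  # column potentials
    p = [0] * (n + 1)    # p[j]: row matched to column j (0 = none)
    way = [0] * (n + 1)  # previous column on the current augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):  # shift potentials by the minimal slack
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    col = [0] * n
    for j in range(1, n + 1):
        col[p[j] - 1] = j - 1
    return col


def align_to_reference(ref: Matrix, soft: Matrix) -> Matrix:
    """Permute the clusters (columns) of `soft` so they best match those of `ref`."""
    k = len(ref[0])
    # overlap[a][b]: membership mass shared by reference cluster a and cluster b
    overlap = [[sum(r[a] * s[b] for r, s in zip(ref, soft)) for b in range(k)]
               for a in range(k)]
    # maximising the total overlap == minimising its negation
    col = hungarian_min([[-x for x in row] for row in overlap])
    return [[s[col[a]] for a in range(k)] for s in soft]


def sum_rule_consensus(ensemble: List[Matrix]) -> Matrix:
    """Align every clustering to the first one, then average the memberships."""
    ref = ensemble[0]
    aligned = [ref] + [align_to_reference(ref, m) for m in ensemble[1:]]
    n, k, h = len(ref), len(ref[0]), len(aligned)
    return [[sum(m[i][a] for m in aligned) / h for a in range(k)] for i in range(n)]


if __name__ == "__main__":
    # two soft clusterings of three objects; U2 roughly mirrors U1 with swapped labels
    U1 = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
    U2 = [[0.2, 0.8], [0.3, 0.7], [0.9, 0.1]]
    # rows 0 and 1 lean towards cluster 0, row 2 towards cluster 1
    for row in sum_rule_consensus([U1, U2]):
        print([round(x, 2) for x in row])
```

The result is itself a fuzzy partition; taking the argmax of each row yields a crisp labelling if one is needed.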

Additional cluster disambiguation techniques developed in the consensus clustering literature include correspondence estimation based on a common-space representation of the clusters, obtained either by clustering the clusters or by Singular Value Decomposition (Boulis and Ostendorf, 2004); the Soft Correspondence Ensemble Clustering algorithm, which establishes a soft correspondence between clusters, in the sense that a cluster of a given clustering corresponds to every cluster of another clustering with different weights (Long, Zhang, and Yu, 2005); the cumulative voting approach, which, unlike common one-to-one voting schemes (e.g. those based on the Hungarian method), computes a probabilistic mapping between clusters (Ayad and Kamel, 2008); and the FullSearch, Greedy and LargeKGreedy cluster alignment algorithms (Jakobsson and Rosenberg, 2007). The first two approaches can be applied indistinctly to align the clusters of both crisp and fuzzy partitions. Given the key role of cluster disambiguation as the step preceding voting, we plan to evaluate these alternatives to the Hungarian method in order to investigate their impact on both the quality of the consensus clusterings obtained and the time complexity of the whole consensus process.
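The idea of a soft correspondence, where a cluster maps to every cluster of another clustering with different weights, can be illustrated with a deliberately simple weighting: the membership overlap between each cluster of a soft clustering and each reference cluster, normalised per cluster so that its weights sum to one. This weighting is our own assumption for illustration; it is neither the estimator of Long, Zhang, and Yu (2005) nor the cumulative voting mapping of Ayad and Kamel (2008), and the function names `soft_correspondence` and `project` are hypothetical. Unlike a one-to-one assignment, it does not require both clusterings to have the same number of clusters.

```python
from typing import List

Matrix = List[List[float]]  # rows = objects, columns = cluster memberships


def soft_correspondence(ref: Matrix, soft: Matrix) -> Matrix:
    """Return W, where W[b][a] is the weight with which cluster b of `soft`
    corresponds to cluster a of `ref` (illustrative row-normalised overlap)."""
    k_ref, k_soft = len(ref[0]), len(soft[0])
    overlap = [[sum(s[b] * r[a] for r, s in zip(ref, soft)) for a in range(k_ref)]
               for b in range(k_soft)]
    weights = []
    for row in overlap:
        total = sum(row)
        # an empty cluster is spread uniformly over the reference clusters
        weights.append([x / total if total > 0 else 1.0 / k_ref for x in row])
    return weights


def project(soft: Matrix, weights: Matrix) -> Matrix:
    """Express `soft` in the reference label space: U' = U W (n x k_ref)."""
    k_ref = len(weights[0])
    return [[sum(s[b] * weights[b][a] for b in range(len(s))) for a in range(k_ref)]
            for s in soft]


if __name__ == "__main__":
    ref = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]               # crisp, 2 clusters
    soft = [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2], [0.1, 0.1, 0.8]]  # fuzzy, 3 clusters
    W = soft_correspondence(ref, soft)
    print(W)
    print(project(soft, W))
```

Because each row of W sums to one, projecting a membership matrix whose rows sum to one yields rows that also sum to one, so clusterings projected onto the same reference can be averaged directly.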

The comparative performance analysis of the four proposed consensus functions has revealed that they constitute a feasible alternative for conducting consensus clustering on soft cluster ensembles, as they yield consensus clustering solutions of comparable or superior quality to those obtained by state-of-the-art clustering combiners at a reasonable computational cost. An additional appealing feature of our proposals is that they naturally deliver fuzzy consensus clustering solutions, which makes complete sense in a soft clustering scenario and which other recent consensus functions for soft cluster ensembles, such as the one presented by Punera and Ghosh (2007), do not consider. However, the lack of a fuzzy ground truth has prevented us from evaluating the soft consensus clusterings obtained; doing so constitutes one of the future research directions of the work conducted in this chapter. As mentioned earlier, such an evaluation would probably make the differences between the proposed consensus functions more evident, as it would highlight the differences between the distinct voting methods employed.

As reported earlier, the sequential application of the cluster disambiguation and voting processes penalizes the time complexity of our proposals, especially when they are compared to VMA. Thus, in the future, we plan to adopt the iterative cluster alignment plus voting strategy employed by this consensus function, which, in our opinion, should reduce the execution time of the proposed voting-based consensus functions without significantly reducing the quality of the consensus clusterings obtained.
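A minimal sketch of an iterative align-and-vote loop of the kind referred to above: each soft clustering is aligned against the current consensus and immediately merged into it, so disambiguation and voting are interleaved rather than run as two sequential passes. Assumptions, all ours: every clustering has the same number of clusters k, the merge is an unweighted running mean (VMA's actual weighting may differ), and alignment uses exhaustive permutation search instead of the Hungarian method, which finds the same optimal one-to-one mapping but costs O(k!) and is therefore only practical for small k. The function names `best_permutation` and `iterative_vote` are hypothetical.

```python
from itertools import permutations
from typing import List, Tuple

Matrix = List[List[float]]  # rows = objects, columns = cluster memberships


def best_permutation(consensus: Matrix, soft: Matrix) -> Tuple[int, ...]:
    """Exhaustively find the column permutation of `soft` that maximises its
    membership overlap with the current consensus (exact, small k only)."""
    k = len(consensus[0])

    def agreement(perm: Tuple[int, ...]) -> float:
        return sum(c[a] * s[perm[a]] for c, s in zip(consensus, soft) for a in range(k))

    return max(permutations(range(k)), key=agreement)


def iterative_vote(ensemble: List[Matrix]) -> Matrix:
    """Align each clustering to the running consensus, then fold it in with a
    running mean (an assumed update rule, not necessarily VMA's weighting)."""
    consensus = [row[:] for row in ensemble[0]]  # copy so the input is not mutated
    for t, soft in enumerate(ensemble[1:], start=2):
        perm = best_permutation(consensus, soft)
        for c, s in zip(consensus, soft):
            for a in range(len(c)):
                c[a] += (s[perm[a]] - c[a]) / t  # mean over the t clusterings seen
    return consensus


if __name__ == "__main__":
    U1 = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
    U2 = [[0.2, 0.8], [0.3, 0.7], [0.9, 0.1]]
    print(iterative_vote([U1, U2, U1]))
```

Because the consensus is updated after every alignment, each clustering is matched against an increasingly stable reference in a single pass over the ensemble.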

Another significant conclusion is that the EAC and HGPA consensus functions yield the lowest-quality consensus clusterings, as already observed in the vast majority of the experiments conducted in the hard clustering scenario.
