29.04.2013 Views

TESI DOCTORAL - La Salle

7.4 Voting based soft consensus functions

The outcome of a fuzzy clustering process is considerably more informative than its crisp counterpart, as it indicates the strength of association of each object to each cluster. Despite this, soft clustering combination strategies are a minority in the consensus clustering literature. With this in mind, we have made several proposals in this area, aiming to extend all our previous proposals to the more general framework of soft clustering.

There is a clear parallel between the strength of association of each object to each cluster in a fuzzy clustering solution and the degree of preference of a voter for a candidate in an election. This parallel makes it possible to apply certain voting methods directly to the consolidation of soft clusterings, treating the clusters as the candidates, the cluster ensemble components as the voters, and the clustering of each object as an election. However, because cluster identifiers are inherently ambiguous (the same cluster may carry a different label in each ensemble component), the clusters of the ensemble components must be aligned prior to voting.
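The alignment step can be illustrated with a minimal sketch. It assumes each ensemble component is stored as a list of per-object membership rows (n objects by k clusters) and that all components have the same number of clusters; the function name `align_to_reference` and the overlap measure (sum of products of membership values) are illustrative choices, not taken from the thesis. For brevity the sketch searches all k! relabelings exhaustively, which is only feasible for small k; a polynomial-time assignment solver such as the Hungarian algorithm would solve the same problem in practice.

```python
from itertools import permutations


def align_to_reference(reference, other):
    """Reorder the cluster columns of `other` so they best match `reference`.

    Both arguments are lists of membership rows (n objects x k clusters).
    Returns (aligned, perm), where perm[i] is the column of `other` that is
    mapped onto cluster i of `reference`.
    """
    k = len(reference[0])
    # agreement[i][j]: overlap between reference cluster i and other cluster j,
    # measured here (an assumption) as the sum of membership products.
    agreement = [
        [sum(r[i] * o[j] for r, o in zip(reference, other)) for j in range(k)]
        for i in range(k)
    ]
    # Exhaustive search over relabelings: k! candidates, small k only.
    best = max(
        permutations(range(k)),
        key=lambda p: sum(agreement[i][p[i]] for i in range(k)),
    )
    aligned = [[row[best[i]] for i in range(k)] for row in other]
    return aligned, best


# Two soft clusterings of three objects with the cluster labels swapped.
reference = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]]
other = [[0.2, 0.8], [0.3, 0.7], [0.9, 0.1]]
aligned, perm = align_to_reference(reference, other)
print(perm)     # column of `other` assigned to each reference cluster
print(aligned)  # `other` with its columns reordered accordingly
```

After alignment, column i of every ensemble component refers to the same cluster, so the membership values can be compared and combined object by object.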

In this work, we have proposed four consensus functions for soft cluster ensembles, each resulting from the application of a different voting strategy to the combination of the clusterings in the ensemble. In particular, we have employed two confidence voting methods, the sum and product rules, which give rise to the SumConsensus (SC) and ProductConsensus (PC) consensus functions, and two positional voting techniques, the Borda and Condorcet voting strategies, which constitute the basis of the BordaConsensus (BC) and CondorcetConsensus (CC) clustering combiners. The main difference between these two families of voting methods is that the former operate directly on the object-to-cluster association values that make up the cluster ensemble components, whereas the latter operate on the ranking of the candidates according to the voters' preferences. To disambiguate the clusters, we have employed the classic Hungarian algorithm (Kuhn, 1955).
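The contrast between the two families can be made concrete with a small sketch of the four voting rules applied to a single object. It assumes the ensemble components have already been aligned, so that position c in every vote refers to the same cluster. The function names are hypothetical, and the exact scoring conventions are assumptions rather than the thesis' definitions: the Borda sketch awards 0 points to the least preferred cluster and k-1 to the most preferred, and the Condorcet sketch scores each cluster by the number of pairwise majority contests it wins.

```python
from math import prod


def sum_rule(votes):
    """Confidence voting: add the membership values given to each cluster."""
    return [sum(v[c] for v in votes) for c in range(len(votes[0]))]


def product_rule(votes):
    """Confidence voting: multiply the membership values of each cluster."""
    return [prod(v[c] for v in votes) for c in range(len(votes[0]))]


def borda_rule(votes):
    """Positional voting: each voter ranks the clusters; rank gives points."""
    k = len(votes[0])
    scores = [0] * k
    for v in votes:
        ranking = sorted(range(k), key=lambda c: v[c])  # least preferred first
        for points, c in enumerate(ranking):
            scores[c] += points
    return scores


def condorcet_rule(votes):
    """Positional voting: count the pairwise majority contests each cluster wins."""
    k = len(votes[0])
    wins = [0] * k
    for a in range(k):
        for b in range(k):
            if a == b:
                continue
            prefer_a = sum(1 for v in votes if v[a] > v[b])
            prefer_b = sum(1 for v in votes if v[b] > v[a])
            if prefer_a > prefer_b:
                wins[a] += 1
    return wins


def winner(scores):
    """Index of the highest-scoring cluster (first one in case of a tie)."""
    return max(range(len(scores)), key=scores.__getitem__)


def consensus(ensemble, rule):
    """Apply a voting rule object by object over aligned membership matrices."""
    n = len(ensemble[0])
    return [rule([clustering[i] for clustering in ensemble]) for i in range(n)]


# Membership values of one object according to three voters (three clusters).
votes = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.05, 0.5, 0.45]]
print(winner(sum_rule(votes)), winner(product_rule(votes)))
print(borda_rule(votes), condorcet_rule(votes))
```

In this toy election the sum rule favours cluster 0, while the product rule favours cluster 1 because a single near-zero vote heavily penalizes cluster 0. The positional rules see only the order of preference within each vote, so the magnitude of that low value plays no role in their scores.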

The experiments conducted have evaluated our four consensus functions (SC, PC, BC and CC), comparing them with several state-of-the-art soft consensus functions in terms of their computational complexity and the quality of the consensus clusterings they yield.

In terms of execution time, the confidence voting consensus functions are faster than their positional voting counterparts, since the candidate ranking step adds computational cost to the latter. CC is the slowest proposal, owing to the exhaustive pairwise comparison of candidates implicit in the Condorcet voting method. By contrast, the more computationally efficient PC and SC consensus functions are as fast as or faster than CSPA, EAC, HGPA and MCLA in 81% of the experiments conducted; however, they are slower than VMA in 92% of the cases.

If the quality of the hardened version of the fuzzy consensus clusterings (measured in terms of φ (NMI) with respect to the ground truth) is used as the comparison criterion, we observe that the four proposed consensus functions yield statistically significantly better results than any of the state-of-the-art consensus functions in an average of 72% of the experiments conducted, which clearly supports the quality of our proposals. It should be noted that it has not been possible to evaluate directly the fuzzy consensus clusterings output by the four proposed consensus functions, owing to the unavailability of soft labels in the data sets employed. As a future direction of research, we plan to conduct this fuzzy evaluation, and we conjecture that greater differences between SC, PC, BC and CC will be observed, as the differences between the results of the voting strategies they are based upon are somewhat masked when the fuzzy consensus clusterings they yield are
