
6.3. Voting based consensus functions

classifier ensembles, as categories (i.e. candidates) are univocally defined in that case. We elaborate on the cluster disambiguation technique employed in this work in section 6.3.1. It is important to highlight that consensus functions based on object co-association matrices circumvent this inconvenience (Lange and Buhmann, 2005), although their main drawback is that the complexity of the object co-association matrix computation is quadratic with the number of objects in the data set (Long, Zhang, and Yu, 2005).
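To make that quadratic cost explicit, the following is a minimal sketch of how an object co-association matrix can be built from a set of hard labelings; the function name and array shapes are our assumptions, not the thesis' notation.

```python
import numpy as np

def co_association(labelings):
    """Fraction of clusterings in which each pair of objects shares a
    cluster. The result is an n x n matrix, hence the quadratic cost
    in the number of objects discussed above."""
    labelings = np.asarray(labelings)            # shape: (num_clusterings, n)
    n = labelings.shape[1]
    coassoc = np.zeros((n, n))
    for labels in labelings:                     # one pass per clustering
        coassoc += (labels[:, None] == labels[None, :]).astype(float)
    return coassoc / labelings.shape[0]
```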

The problem of combining the outcomes of multiple soft clustering processes by means of voting strategies implies interpreting the contents of the soft cluster ensemble as the preference of each voter (clusterer) for each candidate (cluster), as soft clustering algorithms output the degree of association of each object to all the clusters. For this reason, voting methods capable of dealing with voters’ preferences (in particular, confidence and ranking voting strategies) are the basis of our consensus functions, as they lend themselves naturally to being applied in this context. However, care must be taken regarding how these preferences are expressed, that is, whether they are directly or inversely proportional to the strength of the association between objects and clusters (e.g. membership probabilities or distances to centroids, respectively). In section 6.3.2, we describe four voting strategies that give rise to the proposed consensus functions.
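As an illustration of this distinction, the sketch below (our own, with assumed function names and a simple inverse-distance normalization) turns both kinds of association into preferences that grow with the strength of the object-cluster association:

```python
import numpy as np

def preferences_from_memberships(memberships):
    """Membership probabilities (k x n) are already directly proportional
    to the object-cluster association; just renormalize each column."""
    return memberships / memberships.sum(axis=0, keepdims=True)

def preferences_from_distances(distances, eps=1e-12):
    """Distances to centroids (k x n) are inversely proportional to the
    association strength, so invert them before normalizing."""
    closeness = 1.0 / (distances + eps)
    return closeness / closeness.sum(axis=0, keepdims=True)
```

Per-object candidate rankings, as required by ranking voting strategies, can then be read off each column, for instance with np.argsort(-preferences, axis=0).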

6.3.1 Cluster disambiguation

In this section, we elaborate on the problem of cluster disambiguation, also known as the cluster correspondence problem.

As pointed out earlier, a single hard clustering solution can be expressed by multiple equivalent labeling vectors λ, due to the symbolic nature of the labels with which clusters are identified. This also occurs in soft clustering, as a permutation of the rows of a clustering matrix Λ also gives rise to equivalent fuzzy partitions. Quite obviously, this cluster identification ambiguity also arises between the multiple clustering solutions compiled in a cluster ensemble E, and thus it becomes an issue of concern when it comes to applying voting strategies for conducting consensus clustering, given the equivalence between clusters and candidates defined by the previously described analogy with voting procedures. For this reason, our voting-based consensus functions for soft cluster ensembles make use of a cluster disambiguation technique prior to the voting proper.
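As a toy illustration (not drawn from the thesis), the two labeling vectors below encode the same hard partition under different cluster symbols; permuting the rows of a soft clustering matrix Λ creates exactly the same kind of equivalence:

```python
import numpy as np

# Two labelings of five objects that differ only in the symbols used:
lambda_a = np.array([0, 0, 1, 2, 2])
lambda_b = np.array([2, 2, 0, 1, 1])     # clusters renamed 0->2, 1->0, 2->1

# The induced partitions are identical: every pair of objects is grouped
# together (or apart) in both labelings.
same = (lambda_a[:, None] == lambda_a[None, :]) == \
       (lambda_b[:, None] == lambda_b[None, :])
print(same.all())                        # True
```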

In particular, we require from such a method the ability to solve the cluster re-labeling problem, an instance of the cluster correspondence problem in which a one-to-one correspondence between clusters is considered (recall that, in this work, all the clusterings in the ensemble and the consensus clustering are assumed to have the same number of clusters, namely k).

To solve the cluster re-labeling problem, we make use of the Hungarian method (also known as the Kuhn-Munkres algorithm or the Munkres assignment algorithm) (Kuhn, 1955), a technique that allows us to obtain the most consistent alignment among the different clusterings (Ayad and Kamel, 2008).

Given a pair of clustering solutions with k clusters each, the Hungarian method is capable of finding, among the k! possible cluster permutations, the one that maximizes the overlap between them. In particular, this cluster permutation is applied to one of the two clustering solutions, while the other is taken as a reference. Put in probabilistic terms, the Hungarian algorithm selects the cluster permutation that best fits the empirical cluster assignment
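A minimal sketch of this re-labeling step, assuming SciPy's implementation of the Hungarian method and a k x k overlap matrix computed from the two k x n clustering matrices (the function name and the choice of overlap measure are our assumptions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def relabel_to_reference(lambda_ref, lambda_other):
    """Permute the rows (clusters) of lambda_other (k x n) so that they
    best match lambda_ref, finding the overlap-maximizing permutation
    without enumerating the k! possibilities."""
    overlap = lambda_ref @ lambda_other.T            # k x k cluster overlap
    _, col_ind = linear_sum_assignment(-overlap)     # negate to maximize
    return lambda_other[col_ind]                     # rows aligned to the reference
```

Once every clustering matrix in the ensemble has been aligned to a common reference in this way, each row refers to the same candidate cluster across all voters, and the voting strategies described in section 6.3.2 can be applied directly.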
