29.04.2013

DOCTORAL THESIS - La Salle


Chapter 6. Voting based consensus functions for soft cluster ensembles

probabilities estimated from the reference clustering solution (i.e., it finds the cluster permutation that yields the largest probability mass over all cluster assignment probabilities; Fischer and Buhmann, 2003). Depending on whether the aforementioned clustering solutions correspond to hard or fuzzy partitions, cluster permutations amount to label reassignments or to rearrangements of the rows of the membership matrix, respectively.
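To make the distinction concrete, here is a minimal Python sketch of what a cluster permutation does to each kind of partition. It assumes 0-based cluster labels (the examples in this chapter use 1-based ones), a permutation given as a list `perm` in which `perm[c]` is the new index of cluster `c`, and a fuzzy partition stored as a $k \times n$ membership matrix with one row per cluster; the function names and the membership values are hypothetical.

```python
# Sketch: applying one cluster permutation to a hard and to a fuzzy partition.
# Assumptions: 0-based labels; perm[c] is the new index of cluster c;
# a fuzzy partition is a k x n membership matrix with one row per cluster.

def relabel_hard(labels, perm):
    """Hard partition: the permutation is a reassignment of labels."""
    return [perm[c] for c in labels]


def reorder_fuzzy(memberships, perm):
    """Fuzzy partition: the permutation is a rearrangement of the matrix rows."""
    reordered = [None] * len(memberships)
    for c, row in enumerate(memberships):
        reordered[perm[c]] = list(row)  # the row of cluster c moves to position perm[c]
    return reordered


perm = [2, 0, 1]
print(relabel_hard([0, 0, 1, 2], perm))  # [2, 2, 0, 1]

fuzzy = [
    [0.7, 0.6, 0.1, 0.2],  # memberships in cluster 0
    [0.2, 0.3, 0.8, 0.1],  # memberships in cluster 1
    [0.1, 0.1, 0.1, 0.7],  # memberships in cluster 2
]
print(reorder_fuzzy(fuzzy, perm))
# [[0.2, 0.3, 0.8, 0.1], [0.1, 0.1, 0.1, 0.7], [0.7, 0.6, 0.1, 0.2]]
```

Applied to the 0/1 incidence matrix of a hard partition, `reorder_fuzzy` returns the incidence matrix of the relabeled partition, so both views describe the same operation.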

The Hungarian algorithm poses the cluster correspondence problem as a weighted bipartite matching problem and solves it in $O(k^3)$ time. An analysis of its error probability can be found in (Topchy et al., 2004). In this work, we have employed the implementation of (Buehren, 2008), which bases the cluster disambiguation process on a measure of the dissimilarity between the clusters of the two clustering solutions under consideration. Cluster dissimilarity is usually encoded in a $k \times k$ matrix whose $(i,j)$th entry is proportional to the degree of dissimilarity between the $i$th cluster of one clustering solution and the $j$th cluster of the other.
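As an illustration of how a $k \times k$ dissimilarity matrix is turned into a one-to-one cluster correspondence, the following is a minimal pure-Python sketch of the Hungarian (Kuhn-Munkres) algorithm in its $O(k^3)$ potentials-based formulation. It is not the implementation of (Buehren, 2008) used in this work; the function name `hungarian` and the example matrix are hypothetical. Given the dissimilarity `cost[i][j]` between cluster $i$ of one solution and cluster $j$ of the other, it returns the cluster matched to each row so that the total dissimilarity is minimal.

```python
import math


def hungarian(cost):
    """Minimum-cost one-to-one assignment for a square cost matrix, in O(k^3).

    Returns a list `match` where match[i] is the column assigned to row i.
    Indices 1..k are used internally; index 0 is a sentinel column.
    """
    k = len(cost)
    u = [0.0] * (k + 1)  # row potentials
    v = [0.0] * (k + 1)  # column potentials
    p = [0] * (k + 1)    # p[j]: row currently matched to column j (0 = free)
    way = [0] * (k + 1)  # previous column on the augmenting path
    for i in range(1, k + 1):
        p[0] = i
        j0 = 0
        minv = [math.inf] * (k + 1)
        used = [False] * (k + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], math.inf, 0
            for j in range(1, k + 1):
                if not used[j]:
                    reduced = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if reduced < minv[j]:
                        minv[j], way[j] = reduced, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(k + 1):  # shift potentials by the smallest reduced cost
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the path back to the sentinel column
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * k
    for j in range(1, k + 1):
        match[p[j] - 1] = j - 1
    return match


# Hypothetical 3 x 3 dissimilarity matrix between the clusters of two solutions.
D = [[3, 3, 0],
     [1, 3, 2],
     [3, 1, 2]]
print(hungarian(D))  # [2, 0, 1]
```

For this matrix the optimum pairs cluster 0 with cluster 2, cluster 1 with cluster 0 and cluster 2 with cluster 1, for a total dissimilarity of 2; checking all $k!$ permutations gives the same answer but does not scale with $k$.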

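Cluster dissimilarity can easily be derived from the pair of clustering solutions under consideration, regardless of whether they are hard or fuzzy partitions, as we show next. In the crisp case, a cluster similarity matrix $S_{\lambda_1,\lambda_2}$ can be obtained by a simple matrix product between the incidence matrices $I_{\lambda_1}$ and $I_{\lambda_2}$ of the two clusterings, denoted as $\lambda_1$ and $\lambda_2$ (see equation (6.15)). Each incidence matrix has one row per cluster and one column per object, with a 1 in entry $(i,x)$ if object $x$ belongs to the $i$th cluster and a 0 otherwise, so the $(i,j)$th entry of $S_{\lambda_1,\lambda_2}$ counts the objects assigned to the $i$th cluster of $\lambda_1$ and to the $j$th cluster of $\lambda_2$.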

$$
S_{\lambda_1,\lambda_2} = I_{\lambda_1}\, I_{\lambda_2}^{T} \tag{6.15}
$$
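Equation (6.15) can be sketched in plain Python as follows. The sketch assumes 0-based integer labels and the incidence-matrix layout described above (one row per cluster, one column per object); the function names and the toy label vectors are hypothetical. Since each object falls in exactly one cluster of each solution, the entries of the resulting matrix add up to the number of objects.

```python
# Sketch of equation (6.15): S = I1 * I2^T for two hard clusterings.
# Assumptions: labels are 0-based integers in range(k); names are hypothetical.

def incidence_matrix(labels, k):
    """k x n 0/1 matrix: entry (c, x) is 1 if object x belongs to cluster c."""
    return [[1 if label == c else 0 for label in labels] for c in range(k)]


def times_transpose(a, b):
    """Return a * b^T, where a and b are lists of rows of equal length."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]


def cluster_similarity(labels1, labels2, k):
    """Entry (i, j) counts the objects put in cluster i by labels1 and in cluster j by labels2."""
    return times_transpose(incidence_matrix(labels1, k), incidence_matrix(labels2, k))


# Hypothetical toy example with n = 5 objects and k = 3 clusters.
S = cluster_similarity([0, 0, 1, 1, 2], [1, 1, 0, 2, 2], k=3)
print(S)  # [[0, 2, 0], [1, 0, 1], [0, 0, 1]]
```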

For illustration purposes, consider the two crisp clustering solutions of equation (6.16):

$$
\begin{aligned}
\lambda_1 &= [\,2\;2\;2\;1\;1\;1\;3\;3\;3\,]\\
\lambda_2 &= [\,1\;1\;3\;3\;3\;3\;3\;2\;2\,]
\end{aligned}
\tag{6.16}
$$

The incidence matrices corresponding to $\lambda_1$ and $\lambda_2$ are presented in equation (6.17).

$$
I_{\lambda_1} =
\begin{pmatrix}
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0\\
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1
\end{pmatrix},
\qquad
I_{\lambda_2} =
\begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1\\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 0
\end{pmatrix}
\tag{6.17}
$$
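The example of equations (6.16) and (6.17) can be carried through to a cluster correspondence with the short sketch below. It counts co-assigned objects directly, which yields the same numbers as the product $I_{\lambda_1} I_{\lambda_2}^{T}$ of equation (6.15), and, since $k = 3$ here, it finds the permutation with the largest total similarity by exhaustive search over all $3! = 6$ candidates rather than with the Hungarian algorithm. Maximizing the total overlap is an assumption of this sketch, and the function names are hypothetical.

```python
from itertools import permutations

# Label vectors of equation (6.16), keeping the chapter's 1-based labels.
lambda1 = [2, 2, 2, 1, 1, 1, 3, 3, 3]
lambda2 = [1, 1, 3, 3, 3, 3, 3, 2, 2]
K = 3


def similarity_counts(l1, l2, k):
    """S[i][j] = number of objects in cluster i+1 of l1 and cluster j+1 of l2."""
    s = [[0] * k for _ in range(k)]
    for a, b in zip(l1, l2):
        s[a - 1][b - 1] += 1
    return s


def best_correspondence(s):
    """Exhaustive search: perm[i] is the l2 cluster index matched to l1 cluster i."""
    k = len(s)
    return max(permutations(range(k)), key=lambda q: sum(s[i][q[i]] for i in range(k)))


S = similarity_counts(lambda1, lambda2, K)
perm = best_correspondence(S)
# Rename every lambda2 label after the lambda1 cluster it is matched to.
to_lambda1 = {perm[i] + 1: i + 1 for i in range(K)}
lambda2_aligned = [to_lambda1[b] for b in lambda2]
print(S)                # [[0, 0, 3], [2, 0, 1], [0, 2, 1]]
print(lambda2_aligned)  # [2, 2, 1, 1, 1, 1, 1, 3, 3]
```

Under this sketch, cluster 3 of $\lambda_2$ corresponds to cluster 1 of $\lambda_1$, cluster 1 to cluster 2, and cluster 2 to cluster 3; after relabeling, the two solutions disagree only on the third and seventh objects. The $k!$ cost of exhaustive search is what the $O(k^3)$ Hungarian algorithm avoids for larger $k$.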

The cluster similarity matrix derived from these two clustering solutions is presented in equation (6.18).
