29.04.2013 Views

TESI DOCTORAL - La Salle


Appendix F

Experiments on soft consensus clustering

This appendix presents the results of the consensus clustering experiments on soft cluster ensembles. The main purpose of these experiments is to compare the four voting-based consensus functions put forward in chapter 6, namely BordaConsensus (BC), CondorcetConsensus (CC), ProductConsensus (PC) and SumConsensus (SC), with five state-of-the-art clustering combiners: the soft versions of the hypergraph-based hard consensus functions CSPA, HGPA and MCLA (Strehl and Ghosh, 2002), the evidence accumulation approach (EAC) (Fred and Jain, 2005) (see section 6.2), and the voting-merging soft consensus function (VMA) of Dimitriadou, Weingessel, and Hornik (2002).

This comparison covers two aspects: the quality of the consensus clustering solutions obtained (measured in terms of normalized mutual information, φ(NMI), with respect to the ground truth of each data set), and the time complexity of each consensus function (measured in terms of the CPU time required for its execution; see appendix A.6 for a description of the computational resources employed in this work).
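As an illustration of the two measurements described above, the following minimal sketch computes an NMI score between a hard labeling and the ground truth, and the CPU time of a single call. It assumes the geometric-mean normalization of Strehl and Ghosh (2002), since the normalization is not restated in this appendix, and it assumes that a soft consensus solution has already been converted to hard labels. The names `nmi` and `timed` are hypothetical helpers, not part of the experimental code of this work.

```python
import math
import time
from collections import Counter


def nmi(labels_a, labels_b):
    """NMI with geometric-mean normalization: I(A;B) / sqrt(H(A) * H(B)).

    Assumption: this normalization follows Strehl and Ghosh (2002).
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and of equal length")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    if h_a == 0.0 or h_b == 0.0:
        # Convention chosen here: a single-cluster partition scores 0.
        return 0.0

    mutual_info = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    return mutual_info / math.sqrt(h_a * h_b)


def timed(consensus_fn, *args):
    """Return (result, CPU seconds) for one call of a consensus function."""
    start = time.process_time()
    result = consensus_fn(*args)
    return result, time.process_time() - start
```

NMI is invariant to a relabeling of the clusters, so `nmi([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1 up to floating-point rounding, whereas two statistically independent partitions such as `[0, 0, 1, 1]` and `[0, 1, 0, 1]` score 0.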

The results of these experiments are presented by means of a φ(NMI) vs. CPU time diagram, on which the performance of each consensus function is depicted as a scatterplot covering the mean ± 2 standard deviations region of the corresponding magnitude (i.e. φ(NMI) and CPU time). Moreover, the statistical significance of the results is evaluated by means of Student's t-tests that compare all the consensus functions on a pairwise basis, thus analyzing whether the apparent superiority of any of them rests on firm statistical grounds, using the traditional 95% confidence level as a reference for distinguishing between significant and non-significant differences.
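The evaluation procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this work: it assumes the independent two-sample form of Student's t-test with pooled variance (the text does not say whether the tests are paired or unpaired), and it obtains the p-value by numerically integrating the t density with Simpson's rule so that only the standard library is needed. All function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def region(values):
    """Mean +/- 2 standard deviations interval of one magnitude."""
    m, s = mean(values), stdev(values)
    return m - 2 * s, m + 2 * s


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df):
    """Two-sided p-value, integrating the density over [0, |t|] (Simpson)."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    if math.isinf(upper):
        return 0.0
    steps = 2 * max(1000, math.ceil(upper * 100))  # even number of intervals
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def student_t_test(x, y):
    """Pooled-variance two-sample t-test; returns (t statistic, p-value)."""
    nx, ny = len(x), len(y)
    if nx < 2 or ny < 2:
        raise ValueError("each sample needs at least two observations")
    df = nx + ny - 2
    pooled = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / df
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    diff = mean(x) - mean(y)
    if se == 0.0:
        # Degenerate case: both samples are constant.
        return (0.0, 1.0) if diff == 0 else (math.copysign(math.inf, diff), 0.0)
    t = diff / se
    return t, two_sided_p(t, df)


def pairwise_significance(scores, alpha=0.05):
    """Map each pair of consensus functions to True if they differ at alpha."""
    verdicts = {}
    for a, b in combinations(sorted(scores), 2):
        _, p = student_t_test(scores[a], scores[b])
        verdicts[(a, b)] = p < alpha
    return verdicts
```

Here `scores` would map each consensus function name (e.g. "BC", "CC") to its list of φ(NMI) values, or of CPU times, across runs, and `alpha=0.05` corresponds to the 95% reference level. When many pairs are compared at once, a correction for multiple comparisons may be worth considering; none is applied in this sketch.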

These soft consensus clustering experiments have been conducted on the twelve unimodal data collections employed in this work (see appendix A.2.1 for a description). The results corresponding to the Zoo data collection are presented in chapter 6, and the following paragraphs describe the results obtained on the eleven remaining data sets.
