29.04.2013 Views

TESI DOCTORAL - La Salle

Chapter 6. Voting based consensus functions for soft cluster ensembles

      CSPA    EAC     HGPA    MCLA    VMA     BC      CC      PC      SC
CSPA  ---     ×       ×       0.0001  0.0001  0.0001  0.0001  0.0013  0.0012
EAC   0.0001  ---     ×       0.0001  0.0002  0.0001  0.0001  0.002   0.0019
HGPA  0.0001  0.0001  ---     0.0001  0.0001  0.0001  0.0001  0.0017  0.0016
MCLA  ×       0.0001  0.0001  ---     0.0001  0.0009  ×       0.0003  0.0003
VMA   0.0001  0.0001  0.0001  0.0001  ---     0.0001  0.0001  0.0001  0.0001
BC    0.0001  0.0001  0.0001  0.0001  ×       ---     0.0001  ×       ×
CC    0.0001  0.0001  0.0001  0.0001  ×       ×       ---     0.0001  0.0001
PC    0.0001  0.0001  0.0001  0.0001  ×       ×       ×       ---     ×
SC    0.0001  0.0001  0.0001  0.0001  ×       0.0337  0.0419  ×       ---

Table 6.2: Significance levels p corresponding to the pairwise comparison of soft consensus functions using a paired t-test on the Zoo data set. The upper and lower triangular sections of the table correspond to the comparison in terms of CPU time and φ^(NMI), respectively. Statistically non-significant differences (p > 0.05) are denoted by the symbol ×.

differences is denoted by means of the × symbol.<br />

For instance, let us see how BordaConsensus (BC) compares to the eight remaining consensus functions in terms of CPU execution time (for easier identification, the contents of the corresponding cells of table 6.2 are italicized). These cells tell us that the differences observed in figure 6.2 (according to which BC is apparently faster than MCLA and CC, and slower than CSPA, EAC, HGPA, VMA, PC and SC) are statistically significant with respect to all but the PC and SC consensus functions.
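The reading of table 6.2 described above can be sketched in a few lines of Python. The p-values are transcribed from the table itself; the names `NAMES`, `TABLE`, `lookup` and `non_significant_against`, and the use of `None` for the × symbol, are illustrative choices and not part of the original work.

```python
# Consensus functions in the row/column order of table 6.2.
NAMES = ["CSPA", "EAC", "HGPA", "MCLA", "VMA", "BC", "CC", "PC", "SC"]

# None stands for the × symbol (p > 0.05) and for the diagonal.
X = None
TABLE = [
    [X, X, X, 0.0001, 0.0001, 0.0001, 0.0001, 0.0013, 0.0012],            # CSPA
    [0.0001, X, X, 0.0001, 0.0002, 0.0001, 0.0001, 0.002, 0.0019],        # EAC
    [0.0001, 0.0001, X, 0.0001, 0.0001, 0.0001, 0.0001, 0.0017, 0.0016],  # HGPA
    [X, 0.0001, 0.0001, X, 0.0001, 0.0009, X, 0.0003, 0.0003],            # MCLA
    [0.0001, 0.0001, 0.0001, 0.0001, X, 0.0001, 0.0001, 0.0001, 0.0001],  # VMA
    [0.0001, 0.0001, 0.0001, 0.0001, X, X, 0.0001, X, X],                 # BC
    [0.0001, 0.0001, 0.0001, 0.0001, X, X, X, 0.0001, 0.0001],            # CC
    [0.0001, 0.0001, 0.0001, 0.0001, X, X, X, X, X],                      # PC
    [0.0001, 0.0001, 0.0001, 0.0001, X, 0.0337, 0.0419, X, X],            # SC
]


def lookup(a, b, metric):
    """Return the p-value for the pair (a, b), or None if non-significant.

    metric == "cpu" reads the upper triangle (CPU time);
    metric == "nmi" reads the lower triangle (normalized mutual information).
    """
    if metric not in ("cpu", "nmi"):
        raise ValueError("metric must be 'cpu' or 'nmi'")
    i, j = sorted((NAMES.index(a), NAMES.index(b)))
    if i == j:
        raise ValueError("a consensus function is not compared with itself")
    return TABLE[i][j] if metric == "cpu" else TABLE[j][i]


def non_significant_against(name, metric):
    """List the consensus functions whose difference to `name` is marked ×."""
    return [other for other in NAMES
            if other != name and lookup(name, other, metric) is None]
```

Calling `non_significant_against("BC", "cpu")` collects the consensus functions whose CPU time difference to BC is marked × in the upper triangle.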

If this comparison is based on the φ^(NMI) of the consensus clustering solutions, figure 6.2 suggests that BC performs better than CSPA, EAC, HGPA and MCLA, which is true from a statistical significance standpoint, as the corresponding entries of table 6.2 (which are typed in boldface for ease of identification) confirm. In contrast, the small differences between the φ^(NMI) values of BC, VMA, CC and PC observed in figure 6.2 are not statistically significant, whereas the difference with respect to SC is significant (p = 0.0337) despite the apparent closeness of their φ^(NMI) values.
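As an illustration of how a significance level like those in table 6.2 can be obtained, the following is a minimal sketch of a two-sided paired t-test using only the Python standard library. It is not the code used in this work: the function names are hypothetical, and the p-value is approximated by Simpson integration of the Student's t density rather than by a statistics package.

```python
import math
from statistics import mean, stdev


def t_pdf(x, dof):
    """Density of Student's t distribution with `dof` degrees of freedom."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(x * x / dof))


def two_sided_p(t, dof, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, dof) + t_pdf(upper, dof)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, dof)
    central_mass = 2 * total * h / 3  # P(-|t| <= T <= |t|)
    return max(0.0, 1.0 - central_mass)


def paired_t_test(a, b):
    """Return (t statistic, two-sided p-value) for paired samples a and b.

    Raises ZeroDivisionError if all paired differences are identical.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, two_sided_p(t, n - 1)
```

Here `a` and `b` would hold one value per experiment run (CPU time or φ^(NMI)) for the two consensus functions being compared, paired run by run; a pair is reported as × when the returned p-value exceeds 0.05.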

In order to provide the reader with a global perspective on the performance of the proposed consensus functions compared to their state-of-the-art counterparts across the twelve unimodal collections employed in this work, we have computed the total percentage of experiments in which the latter yield better, equivalent or worse results than the voting-based consensus functions, taking into account the statistical significance of the differences between the compared magnitudes (CPU time and φ^(NMI)).
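The tally described above can be sketched as follows, assuming that each experiment is summarized by the mean value of the state-of-the-art function, the mean value of the voting-based function, and the p-value of their comparison. The function names and the tuple layout are hypothetical; only the 0.05 threshold comes from the text.

```python
from collections import Counter

ALPHA = 0.05  # significance threshold used for the × symbol in table 6.2


def outcome(baseline_mean, proposed_mean, p_value, higher_is_better=True):
    """Classify a state-of-the-art (baseline) function against a proposed one.

    Differences with p > ALPHA count as "equivalent". Otherwise the baseline
    is "better" or "worse" depending on the direction of the difference.
    Use higher_is_better=True for NMI and False for CPU time.
    """
    if p_value > ALPHA or baseline_mean == proposed_mean:
        return "equivalent"
    baseline_wins = (baseline_mean > proposed_mean) == higher_is_better
    return "better" if baseline_wins else "worse"


def percentages(experiments, higher_is_better=True):
    """Percentage of experiments per outcome.

    `experiments` is a non-empty list of
    (baseline_mean, proposed_mean, p_value) tuples.
    """
    counts = Counter(outcome(b, p, pv, higher_is_better)
                     for b, p, pv in experiments)
    total = len(experiments)
    return {label: 100.0 * counts[label] / total
            for label in ("better", "equivalent", "worse")}
```

Treating every difference with p > 0.05 as "equivalent" before looking at its direction is what keeps small, non-significant gaps from being counted as wins or losses.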

First, table 6.3 presents the results of this comparative analysis with regard to the quality of the consensus clusterings output by the consensus functions in all the experiments conducted. It can be observed that the four proposed consensus functions outperform EAC, HGPA and MCLA in an overwhelming percentage of the experiments (94.4% of the total on average). When compared to CSPA and VMA, certain differences emerge between the performance of the consensus functions based on confidence voting (PC and SC) and those based on positional voting (BC and CC). In general terms, SC and PC perform slightly better than BC and CC. Moreover, notice that BordaConsensus and CondorcetConsensus attain exactly the same results, and the results of SC and PC are also very similar. We conjecture that this high degree of resemblance is due to the fact that evaluation is conducted upon a hardened version of the soft consensus clustering output by these consensus functions. Thus, the intrinsic
