TESI DOCTORAL - La Salle

4.2. Flat vs. hierarchical self-refining

Consensus                          Consensus function
solution                CSPA   EAC    HGPA   MCLA   ALSAD  KMSAD  SLSAD
non-refined             0.011  0.011  0.026  0.638  0.009  0.011  0.029
best non/self-refined   0.004  0.019  0.005  0.002  0.002  0.001  0.006

Table 4.8: φ^(NMI) variance of the non-refined and the best non/self-refined consensus clustering solutions across the flat, RHCA and DHCA consensus architectures, averaged across the twelve data collections.

Consensus                          Consensus function
architecture            CSPA   EAC    HGPA   MCLA   ALSAD  KMSAD  SLSAD
flat                    30.4   50     53.1   11     25     23.8   37.5
RHCA                    25     35.9   38.4   3.9    24.4   24.1   36.6
DHCA                    12.5   42     40.1   17     0      29.5   37.5

Table 4.9: Percentage of experiments in which the supraconsensus function selects the top quality consensus clustering solution, averaged across the twelve data collections.

we have assumed the use of the top quality self-refined consensus clustering solution. Obviously, achieving the encouraging results reported would require a supraconsensus function capable of automatically detecting the best self-refined consensus clustering solution in any given situation. The next section is devoted to the performance analysis of such a supraconsensus function.

4.2.2 Evaluation of the supraconsensus process

As regards the performance of the supraconsensus function proposed by (Strehl and Ghosh, 2002), we have first evaluated the percentage of experiments in which the supraconsensus function selects the highest quality consensus clustering solution. Table 4.9 presents the results averaged across all the data collections, for each consensus function and architecture. The average accuracy with which the supraconsensus function selects the top quality self-refined consensus clustering solution is 29%, i.e. it manages to select the best solution in less than a third of the experiments conducted.
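For reference, the supraconsensus function of (Strehl and Ghosh, 2002) selects, among the candidate consensus clusterings, the one maximizing the average normalized mutual information φ^(ANMI) against the cluster ensemble. The following is a minimal Python sketch of that selection rule; the labelings used in the usage note below are illustrative toy data, not drawn from the experiments.

```python
import math
from collections import Counter

def nmi(a, b):
    """phi^(NMI): normalized mutual information between two labelings,
    I(a;b) / sqrt(H(a) * H(b))."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    # Mutual information from the joint and marginal label counts.
    i = sum(c / n * math.log(c * n / (ca[x] * cb[y]))
            for (x, y), c in cab.items())
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    return i / math.sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0

def anmi(ensemble, labels):
    """phi^(ANMI): average NMI of `labels` against every ensemble component."""
    return sum(nmi(e, labels) for e in ensemble) / len(ensemble)

def supraconsensus(ensemble, candidates):
    """Return the candidate consensus clustering maximizing phi^(ANMI)."""
    return max(candidates, key=lambda lam: anmi(ensemble, lam))
```

For instance, given the toy ensemble `[[0, 0, 1, 1], [0, 0, 1, 2], [1, 1, 0, 0]]`, `supraconsensus` prefers the candidate `[0, 0, 1, 1]` over `[0, 1, 0, 1]`, since the former agrees far more with the ensemble components (NMI is invariant to label permutations, so `[1, 1, 0, 0]` counts as full agreement).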

This somewhat contradicts the conclusions of (Strehl and Ghosh, 2002), where φ^(ANMI)(E, λ_c) is presented as a suitable surrogate of φ^(NMI)(γ, λ_c) for selecting the best consensus clustering solutions in real scenarios, where a ground truth γ is not available. That conclusion was supported by the fact that both φ^(ANMI)(E, λ_c) and φ^(NMI)(γ, λ_c) follow very similar growth patterns (i.e. the higher φ^(NMI)(γ, λ_c), the higher φ^(ANMI)(E, λ_c)). However, those claims were based on experiments using synthetic clustering results. In several of our experiments, in contrast, we have observed that this behaviour is not always strictly obeyed.
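The two magnitudes being compared can be computed side by side as follows. This is a minimal Python sketch on a small synthetic ensemble of our own devising (far smaller than the 300-component Zoo ensemble of the actual experiment): for each clustering we obtain i) its φ^(NMI) with respect to a ground truth and ii) its φ^(ANMI) with respect to the remaining ensemble components.

```python
import math
from collections import Counter

def nmi(a, b):
    """phi^(NMI): normalized mutual information between two labelings."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    i = sum(c / n * math.log(c * n / (ca[x] * cb[y]))
            for (x, y), c in cab.items())
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    return i / math.sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0

# Hypothetical ground truth and ensemble (illustrative, not the Zoo data).
ground_truth = [0, 0, 0, 1, 1, 1, 2, 2]
ensemble = [
    [0, 0, 0, 1, 1, 1, 2, 2],  # agrees with the ground truth
    [0, 0, 1, 1, 1, 1, 2, 2],  # one object misplaced
    [0, 1, 0, 1, 0, 1, 0, 1],  # nearly unrelated to the ground truth
]

# For each component: phi^(NMI) w.r.t. the ground truth, and
# phi^(ANMI) w.r.t. the remaining components of the ensemble.
scores = []
for lam in ensemble:
    rest = [e for e in ensemble if e is not lam]
    phi_nmi = nmi(ground_truth, lam)
    phi_anmi = sum(nmi(e, lam) for e in rest) / len(rest)
    scores.append((phi_nmi, phi_anmi))
```

Sorting the components by φ^(NMI) and inspecting φ^(ANMI) alongside, as in the toy experiment below, reveals whether the two measures grow together; on some ensembles the orderings coincide, but, as noted above, the agreement is not guaranteed to be strict.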

Just for illustration purposes, we have conducted a toy experiment in which a set of 300 randomly picked cluster ensemble components corresponding to the Zoo data collection have been evaluated in terms of i) their φ^(NMI) with respect to the ground truth, and ii) their φ^(ANMI) with respect to the 299 remaining clusterings selected. Figure 4.2 depicts both magnitudes, where the horizontal axis of each figure corresponds to an index of the clusterings in the ensemble arranged in decreasing order of φ^(NMI) with respect to the

