TESI DOCTORAL - La Salle


Chapter 4. Self-refining consensus architectures

                                    Consensus function
                         CSPA   EAC    HGPA   MCLA   ALSAD   KMSAD   SLSAD
% of experiments         37.5   61.2   84.6   30.6   12.5    23      60
relative % φ (NMI) loss  24.6   10.4   16.8   12.2   10.9    12.7    11.9

Table 4.16: Percentage of experiments in which the supraconsensus function selects the top quality clustering solution, and relative percentage φ (NMI) losses between the top quality clustering solution and the one selected by supraconsensus, averaged across the twelve data collections.

sensus function). The results of these two experiments are presented in table 4.16, averaged across all the data collections and for each consensus function.

The average accuracy with which the supraconsensus function selects the top quality self-refined consensus clustering solution is 44.2%, i.e. it manages to select the best solution in fewer than half of the experiments conducted. Moreover, this lack of precision entails an average relative φ (NMI) reduction of 14.2%.

These results reinforce the idea that the supraconsensus function proposed in (Strehl and Ghosh, 2002) is still far from constituting the most appropriate means for selecting, in a completely unsupervised manner, the best consensus clustering solution from a set of candidates, especially when their qualities are very similar. This is also why the average selection accuracy attained by the supraconsensus function in the selection-based self-refining scenario is higher than in the consensus-based context (44.2% vs. 29%): the φ (NMI) differences between the top quality clustering solution and the remaining ones are notably larger in the former case than in the latter. Indeed, in selection-based self-refining, the selected cluster ensemble component λref is often of notably higher quality than the self-refined consensus clustering solutions λcpi (see appendix D.2).
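For reference, the supraconsensus criterion of Strehl and Ghosh (2002) selects, among a set of candidate clustering solutions, the one with the highest average normalized mutual information with respect to the components of the cluster ensemble. A minimal sketch follows; the function and variable names are ours, not taken from the thesis, and labelings are represented as plain lists of integer cluster ids:

```python
import math
from collections import Counter

def nmi(a, b):
    """Normalized mutual information between two labelings (lists of cluster ids),
    using the sqrt(H(A)H(B)) normalization of Strehl and Ghosh (2002)."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    # Mutual information I(A;B) from the joint and marginal cluster counts.
    mi = sum(nij / n * math.log(n * nij / (ca[i] * cb[j]))
             for (i, j), nij in cab.items())
    # Entropies H(A) and H(B).
    ha = -sum(c / n * math.log(c / n) for c in ca.values())
    hb = -sum(c / n * math.log(c / n) for c in cb.values())
    if ha == 0.0 or hb == 0.0:
        # Degenerate single-cluster labelings.
        return 1.0 if ha == hb else 0.0
    return mi / math.sqrt(ha * hb)

def supraconsensus(candidates, ensemble):
    """Return the candidate clustering maximizing average NMI
    against the cluster ensemble components."""
    def avg_nmi(lam):
        return sum(nmi(lam, comp) for comp in ensemble) / len(ensemble)
    return max(candidates, key=avg_nmi)
```

Note that the criterion never consults ground-truth labels, which is precisely why it can misselect when the candidates' qualities (and hence their average NMI scores) are close to one another.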

In contrast, similar results are obtained in both the selection-based and consensus-based self-refining scenarios when the efficiency of the supraconsensus function is measured in terms of the φ (NMI) loss caused by erroneous selections (i.e. when a clustering solution other than the highest quality one is selected by the supraconsensus function). In selection-based self-refining, this relative percentage φ (NMI) loss is 14.2%, while it is 14.9% in the consensus-based self-refining context.
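The relative percentage φ (NMI) loss reported above can be read as the quality gap between the best candidate and the selected one, expressed relative to the best. This reading is ours, not quoted from the thesis:

```python
def relative_phi_nmi_loss(phi_best, phi_selected):
    """Relative percentage phi(NMI) loss incurred when the supraconsensus
    function selects a solution other than the top quality one.
    Assumes phi_best > 0."""
    return 100.0 * (phi_best - phi_selected) / phi_best
```

For instance, selecting a candidate with φ (NMI) = 0.60 when the best candidate scores 0.70 corresponds to a relative loss of roughly 14.3%, close to the average losses in table 4.16.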

4.4 Discussion<br />

In this chapter, we have put forward two proposals aimed at obtaining a high quality clustering solution given a cluster ensemble and a similarity measure between partitions, using consensus clustering in a fully unsupervised procedure. Together with the computationally efficient consensus architectures presented in chapter 3, these proposals constitute the basis for constructing robust consensus clustering systems.

Our proposals are based on applying consensus clustering on a set of clusterings (compiled in a select cluster ensemble) which are chosen from the cluster ensemble according to their similarity with respect to an initially available clustering solution. By doing so, we
