

                        Consensus function
Consensus architecture   CSPA   EAC   HGPA   MCLA   ALSAD   KMSAD   SLSAD
flat                     58.3   25.0  42.3   33.2    66.7    69.8    41.7
RHCA                     69.8   24.7  15.9   74.2    79.1    79.1    50.4
DHCA                     83.3    8.3   3.9   77.1    75.0    77.9    58.3

Table 3.12: Percentage of experiments in which the consensus clustering solution is better than the median cluster ensemble component.

When the comparison is made against the highest quality cluster ensemble component with respect to the given ground truth (i.e. the BEC), we observe that the consensus clustering solution attains higher φ(NMI) values in only 0.1% of the experiments (see table 3.14). If the degree of improvement of those consensus clustering solutions that attain a higher φ(NMI) than the BEC is measured in terms of relative percentage φ(NMI) increase, a modest 0.6% φ(NMI) gain is obtained on average (see table 3.15 for a detailed view across consensus functions and architectures).
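For concreteness, the relative percentage φ(NMI) gain reported in tables 3.13 and 3.15 can be computed as in the minimal sketch below, which assumes scikit-learn's normalized_mutual_info_score (whose normalization may differ in detail from the φ(NMI) variant used in this thesis); the function name relative_nmi_gain is hypothetical.

```python
from sklearn.metrics import normalized_mutual_info_score

def relative_nmi_gain(labels_true, consensus_labels, reference_labels):
    """Relative percentage phi(NMI) gain of the consensus solution over a
    reference ensemble component (e.g. the median component or the BEC)."""
    nmi_consensus = normalized_mutual_info_score(labels_true, consensus_labels)
    nmi_reference = normalized_mutual_info_score(labels_true, reference_labels)
    return 100.0 * (nmi_consensus - nmi_reference) / nmi_reference
```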

                        Consensus function
Consensus architecture   CSPA   EAC   HGPA   MCLA   ALSAD   KMSAD   SLSAD
flat                     90.0   12.9  80.2   50.1   107.1    87.7    24.6
RHCA                     78.8   16.3   9.2   96.4    94.7    90.6    33.3
DHCA                     73.7   11.6   5.6   53.1    83.7    72.2    67.6

Table 3.13: Relative percentage φ(NMI) gain between the consensus clustering solution and the median cluster ensemble component.

In conclusion, applying consensus clustering processes to a collection of partitions of a given data collection provides a means of obtaining a summarized clustering that, although rarely better than the best component available in the cluster ensemble, is quite often notably better than the median data partition. However, despite these fairly good results, we aim to obtain clustering solutions that are more robust to the inherent indeterminacies of clustering (i.e. closer to, or even better than, the maximum quality cluster ensemble component). For this reason, chapter 4 introduces what we call consensus self-refining procedures, which aim to improve the quality of the consensus clustering solutions obtained from either hierarchical or flat consensus architectures.
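As an illustration of what such a consensus process involves, the sketch below implements an evidence-accumulation style consensus in the spirit of EAC, assuming NumPy and SciPy are available; it is not the exact implementation evaluated in this chapter, merely a compact rendition of the co-association idea.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def eac_consensus(ensemble, n_clusters):
    """Evidence-accumulation consensus: accumulate a co-association matrix
    from the ensemble, then cut an average-linkage dendrogram over it.

    ensemble: list of 1-D integer label arrays, one per ensemble component.
    """
    n = len(ensemble[0])
    coassoc = np.zeros((n, n))
    for labels in ensemble:
        labels = np.asarray(labels)
        # Increment co-association for every pair placed in the same cluster.
        coassoc += (labels[:, None] == labels[None, :]).astype(float)
    coassoc /= len(ensemble)
    # Convert the similarity into a distance and cluster hierarchically.
    dist = 1.0 - coassoc
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method='average')
    return fcluster(Z, t=n_clusters, criterion='maxclust')
```

Graph-based consensus functions such as CSPA, HGPA and MCLA would replace the hierarchical cut over the co-association matrix with graph or hypergraph partitioning.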

3.5 Discussion

Our proposal for building clustering systems that behave robustly in the face of the indeterminacies inherent in unsupervised classification problems relies on applying consensus clustering processes to large cluster ensembles created by applying multiple mutually crossed diversity factors.
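As a minimal sketch of how such an ensemble might be generated, the following crosses three hypothetical diversity factors (number of clusters, random initialization and feature subsampling) using k-means as the base clusterer; the actual diversity factors employed in this thesis are those described in the preceding sections.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_ensemble(X, k_values, seeds, feature_fracs):
    """Create a cluster ensemble by crossing three diversity factors:
    number of clusters x random seed x random feature subset."""
    rng = np.random.default_rng(0)
    ensemble = []
    for k in k_values:
        for seed in seeds:
            for frac in feature_fracs:
                n_feats = max(1, int(frac * X.shape[1]))
                feats = rng.choice(X.shape[1], size=n_feats, replace=False)
                labels = KMeans(n_clusters=k, random_state=seed,
                                n_init=10).fit_predict(X[:, feats])
                ensemble.append(labels)
    return ensemble  # size = |k_values| * |seeds| * |feature_fracs|
```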

However, relatively few works in the consensus clustering literature address the problem of combining large numbers of clusterings, as most authors tend to employ rather small cluster ensembles when evaluating their proposals. Moreover, the application of certain consensus clustering approaches in computationally demanding scenarios can be difficult. Typical examples include consensus functions based on object co-association measures, which become inapplicable on large data collections, or clustering combiners not executable on
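To make the co-association scalability concern concrete, a back-of-the-envelope estimate of the memory required by a dense co-association matrix (assuming 8-byte floating-point entries) shows how quickly it grows with the number of objects:

```python
def coassoc_memory_gib(n_objects, bytes_per_entry=8):
    """Memory needed for a dense n x n co-association matrix, in GiB."""
    return n_objects ** 2 * bytes_per_entry / 2 ** 30

print(coassoc_memory_gib(10_000))     # ~0.75 GiB: feasible
print(coassoc_memory_gib(1_000_000))  # ~7450 GiB: clearly infeasible
```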

