29.04.2013 Views

TESI DOCTORAL - La Salle

Chapter 4. Self-refining consensus architectures

cluster ensembles of distinct sizes, and iv) the use of normalized mutual information as the guiding principle for comparing clusterings.

However, there are also several differences between the two works, since in (Fern and Lin, 2008) i) refining is not presented as a means of improving the quality of a previously derived consensus clustering solution, but as a means of obtaining a good quality one from a select cluster ensemble that results from discarding those poor components that may induce a quality loss in it, ii) the criteria employed for choosing the components included in the select cluster ensemble consider both clustering quality and diversity, iii) clustering refinement results obtained by a single consensus function (CSPA) are reported, and iv) no supraconsensus methodology for selecting the best refined consensus clustering is studied.

To conclude, we would like to highlight again the significant quality improvements that can be obtained by means of self-refining consensus procedures. However, it is also important to be aware that we cannot take full advantage of these gains unless a well-performing supraconsensus methodology makes it possible to select the top quality self-refined clustering solution with a high level of confidence. For this reason, in our opinion, devising such a technique is a matter of paramount importance for further research in this particular field.

In the future, it would be interesting to analyze how consensus self-refining procedures perform if the cluster ensemble selection process were based on clustering similarity measures other than normalized mutual information. Furthermore, we also intend to study the possibility of creating the select cluster ensemble by including in it all those clusterings exceeding a certain φ (ANMI) threshold with respect to the reference clustering, instead of selecting a percentage p of the cluster ensemble components.
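The two selection strategies contrasted above can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names (`nmi`, `select_top_percentage`, `select_by_threshold`), the toy labelings, and the geometric-mean normalization of mutual information are all assumptions made for illustration, and the similarity score is taken to be the plain normalized mutual information between each ensemble component and the reference clustering.

```python
import math
from collections import Counter


def nmi(a, b):
    """Normalized mutual information between two labelings of the same objects.

    Assumption: mutual information is normalized by the geometric mean of the
    two entropies; other normalizations exist.
    """
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum((nij / n) * math.log(n * nij / (ca[i] * cb[j]))
             for (i, j), nij in cab.items())
    ha = -sum((c / n) * math.log(c / n) for c in ca.values())
    hb = -sum((c / n) * math.log(c / n) for c in cb.values())
    if ha == 0.0 or hb == 0.0:
        return 0.0  # convention chosen here for single-cluster labelings
    return mi / math.sqrt(ha * hb)


def select_top_percentage(ensemble, reference, p):
    """Keep the p percent of components most similar to the reference."""
    ranked = sorted(ensemble, key=lambda c: nmi(c, reference), reverse=True)
    k = max(1, math.ceil(len(ranked) * p / 100))
    return ranked[:k]


def select_by_threshold(ensemble, reference, threshold):
    """Keep every component whose similarity to the reference reaches the threshold."""
    return [c for c in ensemble if nmi(c, reference) >= threshold]


# Hypothetical toy data: six objects, one reference clustering, four components.
reference = [0, 0, 0, 1, 1, 1]
ensemble = [
    [1, 1, 1, 0, 0, 0],  # same partition as the reference, labels swapped
    [0, 0, 1, 1, 1, 1],  # one object moved
    [0, 1, 0, 1, 0, 1],  # nearly unrelated to the reference
    [0, 0, 0, 0, 1, 1],  # one object moved
]

by_percentage = select_top_percentage(ensemble, reference, 25)
by_threshold = select_by_threshold(ensemble, reference, 0.4)
```

With percentage-based selection the size of the select cluster ensemble is fixed in advance, whereas with a threshold it depends on how many components happen to be similar enough to the reference clustering.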

4.5 Related publications

As mentioned earlier, the aim of the proposed self-refining consensus clustering approach is to obtain partitions that are robust to the indeterminacies inherent to the clustering problem. This has been the main driver of our research since its early days, and it is reflected in several publications at international conferences and in national journals. The application focus of these works was document clustering, so they were mainly published at Information Retrieval and Natural Language Processing forums. None of these works, however, includes self-refining procedures as a means of obtaining improved quality clustering results, so our proposals in this specific area remain, for the moment, unpublished.

The first publication regarding robust document clustering based on cluster ensembles was presented as a poster at the SIGIR 2006 conference held in Seattle. The details of this publication follow.

Authors: Xavier Sevillano, Germán Cobo, Francesc Alías and Joan Claudi Socoró<br />

Title: Feature Diversity in Cluster Ensembles for Robust Document Clustering<br />

In: Proceedings of the 29th ACM SIGIR Conference<br />

Pages: 697-698
