29.04.2013 Views

TESI DOCTORAL - La Salle



4.4. Discussion

have experimentally proved that it is very likely to obtain a refined consensus clustering solution of higher quality than the original one.

The main difference between our two proposals lies in the origin of the clustering employed as a reference for creating the select cluster ensemble. In the first proposal, referred to as consensus-based self-refining, this initial clustering is the consensus clustering solution λc resulting from a previous consensus process run on the whole cluster ensemble. In our second proposal, the starting point of the refining process is one of the components of the cluster ensemble, selected using an average normalized mutual information criterion; we call this approach selection-based self-refining.
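
The selection step of selection-based self-refining can be sketched as follows. This is a minimal illustration rather than the implementation used in the experiments: it assumes each clustering is a flat list of labels over the same objects, that the ensemble has at least two components, and that NMI is normalized by the geometric mean of the entropies. The function names and the `keep` parameter are hypothetical, and ranking components by their NMI with the reference is only one plausible way of building the select cluster ensemble.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(a, b):
    """NMI with geometric-mean normalization: I(a;b) / sqrt(H(a) * H(b))."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    mutual = sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    denom = sqrt(entropy(a) * entropy(b))
    return mutual / denom if denom > 0 else 0.0


def select_reference(ensemble):
    """Index of the component with the highest average NMI to the others."""
    def average_nmi(i):
        others = [p for j, p in enumerate(ensemble) if j != i]
        return sum(nmi(ensemble[i], p) for p in others) / len(others)
    return max(range(len(ensemble)), key=average_nmi)


def select_ensemble(ensemble, reference, keep):
    """Assumed rule: keep the `keep` components most similar to the reference."""
    ranked = sorted(ensemble, key=lambda p: nmi(reference, p), reverse=True)
    return ranked[:keep]
```

A refined solution would then be obtained by running a consensus function on the output of `select_ensemble`; that step is omitted here because it depends on the consensus function chosen.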

Unfortunately, the optimal configuration of this self-refining procedure (e.g. the size of the select cluster ensemble, or the consensus function employed for creating the refined clustering solutions) is local to each particular experiment. This inconvenience, which is by no means new in the consensus clustering literature, can be tackled by means of a supraconsensus function that blindly selects the highest quality clustering solution from a set of candidates created using distinct self-refining configurations. However, applying one of the most widely used supraconsensus functions (the one proposed in (Strehl and Ghosh, 2002)) in our experiments has yielded somewhat disappointing results, as it selects the highest quality clustering solution in a relatively low percentage of the experiments conducted. Moreover, alternative supraconsensus functions based on average normalized mutual information gave rise to even poorer selection accuracies (not reported here due to the limited interest of the results obtained). This suggests that further research is needed to devise novel supraconsensus functions capable of satisfying a constraint as restrictive as the one imposed here, i.e. selecting the top quality solution among a set of clustering solutions in a fully unsupervised fashion.
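
For reference, the supraconsensus criterion of (Strehl and Ghosh, 2002) can be written, as we understand it, as picking the candidate with the highest average normalized mutual information with respect to the cluster ensemble. The notation below is assumed for illustration only: \hat{\lambda}_1, ..., \hat{\lambda}_m denote the candidate solutions, \lambda_1, ..., \lambda_l the cluster ensemble components, I mutual information and H entropy.

```latex
\lambda^{*} = \arg\max_{\lambda \in \{\hat{\lambda}_1,\dots,\hat{\lambda}_m\}}
  \frac{1}{l} \sum_{i=1}^{l} \phi^{(\mathrm{NMI})}(\lambda, \lambda_i),
\qquad
\phi^{(\mathrm{NMI})}(\lambda_a, \lambda_b) =
  \frac{I(\lambda_a; \lambda_b)}{\sqrt{H(\lambda_a)\, H(\lambda_b)}}
```

The criterion uses no ground truth, which is what makes the selection blind: a candidate is rewarded only for agreeing with the ensemble, not for being correct.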

As mentioned above, the concepts of consensus self-refining and supraconsensus functions are closely related. In fact, supraconsensus was originally presented in (Strehl and Ghosh, 2002) as a means of selecting the best consensus clustering solution among several created using different consensus functions. Therefore, it seems logical to consider applying supraconsensus not to a set of previously derived consensus clustering solutions, but to the cluster ensemble components themselves, so as to select the highest quality ones.

Some very recent works have dealt with this issue, such as (Gionis, Mannila, and Tsaparas, 2007), where the BESTCLUSTERING algorithm is defined as a means of identifying the individual partition that minimizes the number of disagreements with respect to the remaining components of the cluster ensemble. Nevertheless, no subsequent consensus-clustering-based refinement process is applied to this presumably high quality cluster ensemble component, a step which, as we have experimentally proved, may bring about important quality gains.
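
The disagreement criterion behind BESTCLUSTERING can be illustrated with a short sketch. It assumes that a disagreement is an object pair that one partition places in the same cluster and the other separates, and that partitions are flat label lists over the same objects. The function names are hypothetical, and the quadratic pair enumeration is chosen for clarity, not efficiency.

```python
from itertools import combinations


def disagreements(a, b):
    """Number of object pairs co-clustered by one labeling but split by the other."""
    return sum(
        (a[u] == a[v]) != (b[u] == b[v])
        for u, v in combinations(range(len(a)), 2)
    )


def best_clustering(ensemble):
    """Index of the component with the fewest total disagreements with the rest."""
    def total(i):
        return sum(
            disagreements(ensemble[i], p)
            for j, p in enumerate(ensemble)
            if j != i
        )
    return min(range(len(ensemble)), key=total)
```

Because only co-membership of pairs is compared, the measure is unaffected by how cluster labels are named in each partition.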

More recently, the use of clustering solution refinement procedures based on consensus clustering has been studied in (Fern and Lin, 2008), concurrently with the completion of this thesis. That work and ours have multiple points in common, such as i) the primary purpose of avoiding the negative influence of poor clusterings contained in large cluster ensembles on the quality of the consensus clustering solutions built upon them, ii) the use of one of the components of the cluster ensemble as the reference partition for creating the reduced select cluster ensemble, as we propose in selection-based self-refining, iii) the analysis of the quality of several refined consensus clustering solutions generated upon multiple select

