TESI DOCTORAL - La Salle


Chapter 4. Self-refining consensus architectures

– What do we want to measure?

i) The quality of the self-refined consensus clusterings obtained by the proposed methodology when applied to the consensus clusterings output by the flat and allegedly fastest RHCA and DHCA consensus architectures.

ii) The ability of the proposed self-refining procedure to obtain a consensus clustering of higher quality than that of a) its non-refined counterpart, and b) the highest and median quality cluster ensemble components.

iii) The quality of the self-refined consensus clustering of maximum quality compared to its non-refined counterpart.

iv) The ability of the supraconsensus function to select, in a fully unsupervised manner, the highest quality self-refined consensus clustering among the set of self-refined clusterings generated.

v) Whether self-refining constitutes a means of uniformizing the quality of the consensus clustering solutions yielded by the flat and allegedly fastest RHCA and DHCA consensus architectures, thus making it possible to decide, on computational grounds alone, which consensus architecture is the most suitable for a given clustering problem.

– How do we measure it?

i) The quality of the self-refined consensus clusterings is measured in terms of the φ (NMI) with respect to the ground truth of each data collection.

ii) The percentage of experiments in which the proposed self-refining procedure gives rise to at least one self-refined consensus clustering of higher quality than that of a) its non-refined counterpart, and b) the highest and median quality cluster ensemble components.

iii) We measure the relative φ (NMI) percentage difference between the self-refined consensus clustering of maximum quality and its non-refined counterpart.

iv) The precision of the supraconsensus function is measured in terms of the percentage of experiments in which it manages to select the highest quality self-refined consensus clustering.

v) We compare the average variance between the φ (NMI) scores of the consensus clusterings λc yielded by the three evaluated consensus architectures (i.e. prior to self-refining) with the variance between the φ (NMI) scores of the consensus clusterings (λc^final) selected by the supraconsensus function after the self-refining procedure is conducted.
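To make these measures concrete, the following Python sketch computes them on toy label vectors. It is illustrative only: the `nmi` function implements the standard φ (NMI) normalization I(A;B)/√(H(A)·H(B)), and all label vectors and per-architecture scores below are invented assumptions, not results from the thesis.

```python
import numpy as np

def nmi(labels_a, labels_b):
    """Normalized mutual information, phi(NMI) form: I(A;B) / sqrt(H(A) * H(B))."""
    a, b = np.asarray(labels_a), np.asarray(labels_b)
    n = a.size
    mi = 0.0
    for ca in np.unique(a):
        for cb in np.unique(b):
            n_ab = np.sum((a == ca) & (b == cb))
            if n_ab == 0:
                continue
            n_a, n_b = np.sum(a == ca), np.sum(b == cb)
            mi += (n_ab / n) * np.log(n * n_ab / (n_a * n_b))

    def entropy(x):
        _, counts = np.unique(x, return_counts=True)
        p = counts / x.size
        return -np.sum(p * np.log(p))

    denom = np.sqrt(entropy(a) * entropy(b))
    return mi / denom if denom > 0 else 1.0  # both partitions trivial

# Toy label vectors (assumptions, not the thesis's data): ground truth,
# a non-refined consensus clustering and a self-refined one.
ground_truth = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
non_refined  = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2])
self_refined = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])

# Measure i): quality as phi(NMI) w.r.t. the ground truth.
nmi_non  = nmi(ground_truth, non_refined)
nmi_self = nmi(ground_truth, self_refined)

# Measure iii): relative phi(NMI) percentage difference.
rel_pct_diff = 100.0 * (nmi_self - nmi_non) / nmi_non

# Measure v): variance of consensus quality across the three architectures
# (flat, RHCA, DHCA), before vs. after self-refining (toy scores).
var_before = np.var([0.55, 0.70, 0.62])
var_after  = np.var([0.68, 0.71, 0.69])

print(f"phi(NMI) non-refined:  {nmi_non:.3f}")
print(f"phi(NMI) self-refined: {nmi_self:.3f}")
print(f"relative difference:   {rel_pct_diff:+.1f}%")
print(f"variance before / after self-refining: {var_before:.5f} / {var_after:.5f}")
```

Note that φ (NMI) is invariant to a relabelling of the clusters, so only the partition structure, not the particular label values, affects the scores.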

– How are the experiments designed? We only analyze the results of the consensus self-refining process executed on the highest diversity scenario (i.e. the one where cluster ensembles are created by applying the |dfA| = 28 clustering algorithms from the CLUTO clustering package). The reason for this is twofold: besides brevity, this limitation of our analysis prevents the results of the self-refining process from being masked by the consensus quality variability observed in the lower diversity scenarios —recall that, in those cases, the quality of the consensus clustering solutions shows larger variances, as the cluster ensemble changes from experiment to experiment due to the random selection of |dfA| = {1, 10, 19} clustering algorithms, whereas exactly the same cluster ensemble is employed across the ten experiments in the highest diversity scenario. As
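The precision of the supraconsensus function (measure iv above) can be tallied across such a set of experiments as in the following sketch. The per-experiment φ (NMI) scores and selected indices are hypothetical toy values, not the thesis's results.

```python
# Hypothetical per-experiment records: the phi(NMI) scores of the
# self-refined consensus clusterings generated in one experiment, plus
# the index selected by the supraconsensus function (toy values).
experiments = [
    {"nmi_scores": [0.61, 0.72, 0.68], "picked": 1},  # selection matches argmax
    {"nmi_scores": [0.55, 0.58, 0.64], "picked": 2},  # selection matches argmax
    {"nmi_scores": [0.70, 0.66, 0.69], "picked": 2},  # miss: argmax is index 0
]

def supraconsensus_precision(experiments):
    """Percentage of experiments in which the supraconsensus function
    selects the highest quality self-refined consensus clustering."""
    hits = sum(
        exp["picked"] == max(range(len(exp["nmi_scores"])),
                             key=exp["nmi_scores"].__getitem__)
        for exp in experiments
    )
    return 100.0 * hits / len(experiments)

print(f"supraconsensus precision: {supraconsensus_precision(experiments):.1f}%")
```

The selection itself is fully unsupervised (it never sees the ground truth); the φ (NMI) scores enter only afterwards, to judge whether the unsupervised pick coincided with the truly best clustering.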

