29.04.2013 Views

TESI DOCTORAL - La Salle


4.2. Flat vs. hierarchical self-refining

the 30% and 60% of the whole cluster ensemble, i.e. λc30 and λc60. Moreover, notice that supraconsensus selects λc30 as the final consensus clustering solution (that is, it performs correctly).

In other cases, supraconsensus fails to select the highest quality consensus clustering solution. See, for instance, that supraconsensus selects the non-refined consensus clustering solution λc yielded by the flat consensus architecture based on the EAC consensus function as the optimal one, whereas the refined clusterings λc40, λc50 and λc60 attain higher φ (NMI) values (leftmost boxplot chart on the second row of figure 4.1). Furthermore, notice that in a minority of cases the self-refining procedure introduces little or no improvement, as when it is applied to the consensus solution output by the RHCA using the ALSAD consensus function (central column boxplot on the fifth row of figure 4.1).
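
The φ (NMI) values discussed above measure how closely a clustering agrees with the ground truth. As a minimal sketch, assuming the geometric-mean normalization of mutual information (the text here does not state which normalization is used), the score could be computed as follows. The function name `nmi` and the toy labelings are illustrative only.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings.

    Assumption: normalization by sqrt(H(A) * H(B)); other variants exist.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    # Mutual information I(A;B) from the contingency counts.
    mi = sum(
        (n_ab / n) * math.log(n * n_ab / (count_a[a] * count_b[b]))
        for (a, b), n_ab in count_ab.items()
    )
    # Entropies H(A) and H(B).
    h_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if h_a == 0.0 or h_b == 0.0:
        # A single-cluster labeling carries no information; return 0 by convention.
        return 0.0
    return mi / math.sqrt(h_a * h_b)


# Toy example: the same partition under permuted labels, and an independent one.
same_partition = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the score depends only on the partition and not on the label names, a consensus clustering can be compared with the ground truth without first aligning cluster identifiers.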

Last, notice that the boxplot corresponding to the refining of the consensus clustering solution output by the flat consensus architecture using MCLA (leftmost boxplot on the fourth row) only presents the φ (NMI) values corresponding to the cluster ensemble E. This is because, for this particular consensus function and diversity scenario, flat consensus is not executable with our computational resources (see appendix A.6). Moreover, as all self-refining consensus processes in our experiments have been conducted using a flat consensus architecture, the self-refining of the consensus clustering solutions output by RHCA and DHCA is not computed from λc40 onwards due to memory limitations when using the MCLA consensus function. However, recall from chapter 3 that hierarchical consensus architectures would allow the computation of consensus clustering solutions in situations where flat consensus is not executable.

A deeper and more quantitative evaluation of the proposed consensus self-refining procedure requires analyzing two of its facets. Firstly, it is necessary to evaluate the self-refining process in itself, answering questions such as: i) how often does the self-refining process yield a higher quality consensus clustering solution than the non-refined one? ii) to what extent are the top quality self-refined consensus clustering solutions better than their non-refined counterpart? iii) how do the best self-refined consensus clustering solutions compare to the cluster ensemble components? or iv) does the self-refining procedure reduce the differences between the quality of the consensus clustering solutions output by distinct consensus architectures? The answers to these questions are presented in section 4.2.1.
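
Questions i) and ii) above can be illustrated with a small sketch. The function `refinement_stats`, the use of a relative percentage gain, and the φ (NMI) values below are all hypothetical, chosen only to show the bookkeeping. They are not results from the experiments.

```python
def refinement_stats(experiments):
    """Summarize self-refining outcomes.

    experiments: list of (baseline_nmi, refined_nmis) pairs, where baseline_nmi
    is the score of the non-refined consensus and refined_nmis holds the scores
    of its self-refined versions (hypothetical input format).
    Returns (percentage of experiments improved, mean relative gain in percent).
    """
    improved = 0
    gains = []
    for baseline, refined in experiments:
        best = max(refined)
        if best > baseline:
            improved += 1
            if baseline > 0:
                # Relative gain of the best refined solution over the baseline.
                gains.append(100.0 * (best - baseline) / baseline)
    pct_improved = 100.0 * improved / len(experiments)
    mean_gain = sum(gains) / len(gains) if gains else 0.0
    return pct_improved, mean_gain


# Made-up scores for four experiments.
toy_experiments = [
    (0.50, [0.55, 0.60]),
    (0.40, [0.38, 0.40]),
    (0.80, [0.70, 0.84]),
    (0.25, [0.20, 0.10]),
]
pct_improved, mean_gain = refinement_stats(toy_experiments)
```

Only strict improvements are counted, so an experiment whose best refined solution merely ties the non-refined one is treated as not improved.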

Secondly, given a set of self-refined consensus clustering solutions, a supraconsensus function capable of blindly selecting the highest quality self-refined solution is required. Its performance can be evaluated in terms of i) the percentage of occasions on which the supraconsensus function selects the highest quality consensus solution, and ii) the degree of quality loss caused by the supraconsensus selection of suboptimal consensus clustering solutions. These aspects are evaluated in section 4.2.2.
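
The two supraconsensus performance measures just described can be sketched as follows. This is only an illustration: the function `supraconsensus_stats`, the input format, the choice of a relative (percentage) loss, and the toy scores are assumptions rather than the procedure or data used in the experiments.

```python
def supraconsensus_stats(trials):
    """Evaluate a selection rule against ground-truth quality.

    trials: list of (candidate_nmis, selected_index) pairs, where candidate_nmis
    holds the scores of the candidate consensus solutions and selected_index is
    the one picked by the supraconsensus function (hypothetical input format).
    Returns (percentage of correct selections, mean relative loss in percent
    over the trials where a suboptimal candidate was selected).
    """
    hits = 0
    losses = []
    for nmis, selected in trials:
        best = max(nmis)
        if nmis[selected] == best:
            hits += 1
        elif best > 0:
            # Quality lost by selecting a suboptimal candidate.
            losses.append(100.0 * (best - nmis[selected]) / best)
    pct_hits = 100.0 * hits / len(trials)
    mean_loss = sum(losses) / len(losses) if losses else 0.0
    return pct_hits, mean_loss


# Made-up scores: the selection is optimal in the first and third trials.
toy_trials = [
    ([0.50, 0.60, 0.55], 1),
    ([0.70, 0.80, 0.90], 0),
    ([0.30, 0.30, 0.20], 1),
]
pct_hits, mean_loss = supraconsensus_stats(toy_trials)
```

A selection that ties with the best candidate counts as correct in this sketch, and the loss is averaged only over the failed selections so that it reflects how costly a wrong choice is when one occurs.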

4.2.1 Evaluation of the consensus-based self-refining process<br />

As regards the evaluation of the self-refining process, we have firstly analyzed the percentage of self-refining experiments in which at least one of the self-refined consensus clustering solutions attains a φ (NMI) with respect to the ground truth that is higher than the one achieved by the consensus clustering solution available prior to self-refinement. The results presented in table 4.2, which correspond to an average across all the data sets for each consensus architecture and consensus function, reveal that the proposed self-refining proce-
