TESI DOCTORAL - La Salle


4.2. Flat vs. hierarchical self-refining

In all the experimental sections of this thesis, consensus processes have been replicated using the set of seven consensus functions described in appendix A.5, namely: CSPA, EAC, HGPA, MCLA, ALSAD, KMSAD and SLSAD. Results are averaged across ten independent experiments consisting of ten consensus function runs each. With the aim of analyzing the expected dependence between the degree of refinement of the consensus clustering solution and the percentage p of cluster ensemble components included in the select cluster ensemble Ep, the experiments have been replicated for a set of percentage values in the range p ∈ [2, 90]. Subsequently, the final consensus label vector λc^final is selected among all the available (i.e. non-refined and refined) consensus clustering solutions through the application of the supraconsensus function presented in equation (4.3). Last, it is important to state that, although it is possible (and, in fact, advisable) to apply either flat or hierarchical consensus on the select cluster ensemble depending on which is the more computationally efficient option, all self-refining consensus processes in our experiments have been conducted, for simplicity, using a flat consensus architecture.
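Equation (4.3) is not reproduced in this excerpt; assuming the φ(NMI)-based definition common in the consensus clustering literature, in which the candidate labeling that maximizes its average normalized mutual information with the cluster ensemble components is selected, the selection step can be sketched as follows. The `nmi` helper and the function names are illustrative assumptions, not the thesis implementation.

```python
from collections import Counter
from math import log, sqrt

def nmi(a, b):
    """phi(NMI): normalized mutual information between two label
    vectors, computed as I(a;b) / sqrt(H(a) * H(b))."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    # Mutual information from the joint and marginal cluster counts.
    i = sum(nij / n * log(n * nij / (ca[x] * cb[y]))
            for (x, y), nij in cab.items())
    # Shannon entropies of each labeling.
    ha = -sum(c / n * log(c / n) for c in ca.values())
    hb = -sum(c / n * log(c / n) for c in cb.values())
    return i / sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0

def supraconsensus(candidates, ensemble):
    """Among the non-refined and self-refined consensus labelings,
    return the one with the highest average phi(NMI) against the
    cluster ensemble components (assumed form of equation (4.3))."""
    return max(candidates,
               key=lambda lam: sum(nmi(lam, comp) for comp in ensemble)
                               / len(ensemble))
```

For instance, given the ensemble components `[[0, 0, 1, 1], [1, 1, 0, 0]]`, the candidate `[0, 0, 1, 1]` agrees perfectly with both (φ(NMI) is invariant to cluster relabeling) and would be selected over `[0, 1, 0, 1]`, whose average agreement is zero.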

– How are results presented? Results are presented by means of boxplot charts of the φ(NMI) values corresponding to the consensus self-refining process. In particular, each subfigure depicts, from left to right, the φ(NMI) values of: i) the components of the cluster ensemble E, ii) the non-refined consensus clustering solution (i.e. the one resulting from the application of either a hierarchical or a flat consensus architecture, denoted as λc), and iii) the self-refined consensus labelings λc^pi obtained upon select cluster ensembles created using percentages pi = {2, 5, 10, 15, 20, 30, 40, 50, 60, 75, 90}. Moreover, the consensus clustering solution deemed optimal (across a majority of experiment runs) by the supraconsensus function is identified by means of a vertical green dashed line. In addition, the quality comparisons between the self-refined consensus clusterings, the non-refined consensus clusterings and the cluster ensemble components are presented by means of tables showing the average values of the measured magnitudes.

– Which data sets are employed? These experiments span the twelve unimodal data collections employed in this work. For the sake of brevity, and following the presentation scheme of the previous chapter, this section only describes in detail the results of the self-refining procedure obtained on the Zoo data set, deferring the presentation of the results obtained on the remaining data collections to appendix D.1. However, the global evaluation of the self-refining and supraconsensus processes encompasses the results obtained on all twelve unimodal collections employed in this work.

Figure 4.1 presents the boxplot charts of the φ(NMI) values corresponding to the consensus self-refining process applied to the Zoo data set. Notice that figure 4.1 is organized into three columns of subfigures, each of which corresponds to one of the three consensus architectures, i.e. flat, RHCA and DHCA.

Notably varied results can be observed in figure 4.1, as regards both the performance of the self-refining process itself and that of the supraconsensus selection function. For instance, when the consensus clustering solution output by the flat consensus architecture using the CSPA consensus function is subjected to self-refining (see the leftmost boxplot on the top row of figure 4.1), we can observe that two of the refined solutions yield clearly higher φ(NMI) values than their non-refined counterpart, in particular, the ones obtained using
