
Chapter 3. Hierarchical consensus architectures

Prediction is deemed successful in the case that both the real and estimated running times are minimized by the same RHCA variant, and the percentage of experiments in which prediction is successful is given as a measure of its performance. In order to measure the impact of incorrect predictions, we also compute the execution time differences (in both absolute and relative terms) between the truly and the allegedly fastest RHCA variants in the cases where prediction fails. This evaluation process is replicated for a range of values of c ∈ [1, 20], so as to measure the influence of this factor on the prediction accuracy of the proposed methodology.
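As an illustration, the sketch below shows one way these prediction accuracy figures could be computed, assuming the estimated and real running times of every RHCA variant have already been collected for each experiment; the array layout and the function name prediction_metrics are hypothetical and not part of the thesis.

```python
import numpy as np

def prediction_metrics(estimated, real):
    """Evaluate running time prediction for a set of RHCA variants.

    estimated, real: arrays of shape (n_experiments, n_variants) holding the
    estimated and measured running times of each variant in each experiment
    (hypothetical layout). Returns the percentage of successful predictions
    and the absolute and relative penalties incurred when prediction fails.
    """
    estimated = np.asarray(estimated, dtype=float)
    real = np.asarray(real, dtype=float)

    predicted_best = estimated.argmin(axis=1)   # allegedly fastest variant
    true_best = real.argmin(axis=1)             # truly fastest variant
    hits = predicted_best == true_best
    success_pct = 100.0 * hits.mean()

    # Cost of running the allegedly fastest variant instead of the truly
    # fastest one, measured only on the experiments where prediction fails.
    rows = np.arange(real.shape[0])
    abs_diff = real[rows, predicted_best] - real[rows, true_best]
    rel_diff = abs_diff / real[rows, true_best]
    fails = ~hits
    return success_pct, abs_diff[fails], rel_diff[fails]
```

The same computation would simply be repeated for every value of c in the [1, 20] sweep to trace its influence on prediction accuracy.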

– How are the experiments designed? All the RHCA variants corresponding to the sweep of values of b resulting from the proposed running time estimation methodology have been implemented (see table 3.2). In order to test our proposals under a wide spectrum of experimental situations, consensus processes have been conducted using the seven consensus functions for hard cluster ensembles presented in appendix A.5 (i.e. CSPA, EAC, HGPA, MCLA, ALSAD, KMSAD and SLSAD), employing cluster ensembles of the sizes corresponding to the four diversity scenarios described in appendix A.4, which basically boils down to compiling the clusterings output by |dfA| = {1, 10, 19, 28} clustering algorithms. In all cases, the real running times correspond to an average of 10 independent runs of the whole RHCA, in order to obtain representative real running time values (recall that the mini-ensemble components change from run to run, as they are randomly selected); a sketch of this averaging procedure is given after this list. For a description of the computational resources employed in our experiments, see appendix A.6.

– How are results presented? Both the real and estimated running times of the serial and parallel implementations of the RHCA variants are depicted by means of curves representing their average values.

– Which data sets are employed? For the sake of brevity, this section only describes the results of the experiments conducted on the Zoo data collection. The presentation of the results of these same experiments on the Iris, Wine, Glass, Ionosphere, WDBC, Balance and MFeat unimodal data collections is deferred to appendix C.2.
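The following minimal sketch illustrates the averaging of real running times over independent runs mentioned in the first item above. The run_rhca callable stands for one complete execution of the serial RHCA being timed and is an assumption of this sketch, not code from the thesis.

```python
import random
import statistics
import time

def average_running_time(run_rhca, cluster_ensemble, runs=10):
    """Average the measured running time of a serial RHCA over several runs.

    run_rhca: caller-supplied callable executing one full serial RHCA on a
              list of clusterings (hypothetical helper, not defined here).
    The ensemble is reshuffled before every run, mimicking the random
    selection of mini-ensemble components, so each run times a different
    grouping of the same clusterings.
    """
    times = []
    for _ in range(runs):
        shuffled = random.sample(cluster_ensemble, len(cluster_ensemble))
        start = time.perf_counter()
        run_rhca(shuffled)
        times.append(time.perf_counter() - start)
    return statistics.mean(times)
```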

A remark before proceeding to present the results obtained: in practice, only the serial RHCA variants have been implemented in our experiments. The real execution times of their parallel counterparts are, in fact, an estimation based on retrieving the execution time of the longest-lasting consensus process of each stage of the serial RHCA and plugging these values into equation (3.9).
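Equation (3.9) is not reproduced on this page; under the assumption that it simply adds up the duration of the slowest consensus process of each stage (stages running one after another, while the consensus processes within a stage run concurrently), the estimation just described could be sketched as follows.

```python
def estimate_parallel_running_time(stage_times):
    """Approximate the parallel RHCA running time from serial measurements.

    stage_times: list of lists, where stage_times[s] contains the measured
    running times of all consensus processes executed in stage s of the
    serial RHCA. Each stage is assumed to last as long as its slowest
    (longest-lasting) consensus process, and stages to run sequentially.
    """
    return sum(max(times) for times in stage_times)
```

For instance, a two-stage RHCA with per-stage timings [[1.2, 0.8, 1.0], [0.5]] would be assigned an estimated parallel running time of 1.2 + 0.5 = 1.7 seconds.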

Diversity scenario |dfA| = 1

Firstly, figure 3.4 presents the results corresponding to the lowest diversity scenario, i.e. the one resulting from using a single randomly chosen clustering algorithm for generating the cluster ensemble. That is, the cardinality of the algorithmic diversity factor is equal to one, i.e. |dfA| = 1, which, on this data set, gives rise to a cluster ensemble of size l = 57. Following the methodology of table 3.2, the sweep of values of the mini-ensemble size is b = {2, 3, 4, 6, 7, 28, 57} (recall that each value of b gives rise to a distinct RHCA variant). Figure 3.4(a) presents the serial RHCA estimated running time (SERTRHCA), while figure 3.4(b) depicts the real serial running time (or SRTRHCA) of the implemented RHCA

