TESI DOCTORAL - La Salle


C.2 Estimation of the computationally optimal RHCA

Section 3.2 presents a methodology for selecting the most computationally efficient implementation variant of random hierarchical consensus architectures. In short, this methodology estimates the running time of several RHCA variants that differ in the mini-ensemble size b, and selects for actual execution the variant that yields the minimum estimated running time.
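As a rough sketch of this selection step (the function name and the dictionary layout of per-variant estimates are illustrative assumptions, not the thesis's actual code), the argmin over candidate mini-ensemble sizes could look like:

```python
def select_optimal_rhca_variant(estimated_running_times):
    """Pick the RHCA variant predicted to be the fastest.

    estimated_running_times: dict mapping the mini-ensemble size b of each
    candidate RHCA variant to its estimated running time (in seconds).
    Returns the value of b whose variant minimizes the estimate; only that
    variant would then actually be executed.
    """
    return min(estimated_running_times, key=estimated_running_times.get)


# Hypothetical estimates for variants with different mini-ensemble sizes b
estimates = {2: 41.7, 3: 35.2, 5: 38.9, 9: 52.4}
best_b = select_optimal_rhca_variant(estimates)
print(best_b)  # prints 3, the b with the minimum estimated running time
```

Only the selected variant is then run in full, so the cost of the procedure is the cost of producing the estimates plus one real execution.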

To validate this procedure, this section presents the estimated and real running times of several variants of the fully serial and fully parallel implementations of RHCA on the Iris, Wine, Glass, Ionosphere, WDBC, Balance and MFeat unimodal data sets (see appendix A.2.1 for a description of these collections) across the four experimental diversity scenarios employed in this work (see appendix A.4). The objective of this experiment is twofold: first, we seek to verify whether the proposed strategy succeeds in predicting the most computationally efficient RHCA variant; and second, we intend to analyze the conditions under which random hierarchical consensus architectures are computationally advantageous compared to flat consensus clustering. The experimental design is outlined next.

– What do we want to measure?

i) The time complexity of random hierarchical consensus architectures.

ii) The ability of the proposed methodology to predict the computationally optimal RHCA variant, in both the fully serial and fully parallel implementations.

– How do we measure it?

i) The time complexity of the implemented serial and parallel RHCA variants is measured in terms of the CPU time required for their execution: the serial running time (SRTRHCA) and the parallel running time (PRTRHCA).

ii) The estimated running times of the same RHCA variants, namely the serial estimated running time (SERTRHCA) and the parallel estimated running time (PERTRHCA), are computed by means of the proposed running time estimation methodology, which is based on the measured running time of c = 1 consensus clustering process. A prediction of the computationally optimal RHCA variant is deemed successful if both the real and the estimated running times are minimized by the same RHCA variant, and the percentage of experiments in which the prediction succeeds is reported as a measure of its performance. To quantify the impact of incorrect predictions, we also measure the execution time differences (in both absolute and relative terms) between the truly and the allegedly fastest RHCA variants whenever the prediction fails. This evaluation process is replicated for a range of values of c ∈ [1, 20], so as to measure the influence of this factor on the prediction accuracy of the proposed methodology.
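The success criterion and the failure-cost measures just described can be sketched as follows; the data layout (dictionaries of real and estimated times keyed by the mini-ensemble size b) is an assumption made for illustration:

```python
def evaluate_prediction(real_times, estimated_times):
    """Check whether the estimation methodology picks the truly fastest variant.

    real_times, estimated_times: dicts mapping each candidate mini-ensemble
    size b to the measured and the estimated running time, respectively.
    Returns (success, abs_diff, rel_diff): success is True when both dicts are
    minimized by the same variant; on failure, abs_diff and rel_diff give the
    absolute and relative extra time spent by running the allegedly fastest
    variant instead of the truly fastest one.
    """
    truly_fastest = min(real_times, key=real_times.get)
    predicted_fastest = min(estimated_times, key=estimated_times.get)
    if truly_fastest == predicted_fastest:
        return True, 0.0, 0.0
    abs_diff = real_times[predicted_fastest] - real_times[truly_fastest]
    rel_diff = abs_diff / real_times[truly_fastest]
    return False, abs_diff, rel_diff


# Hypothetical experiment where the prediction fails: the estimates favour
# b = 5, but the measured times show b = 3 was actually fastest.
real = {3: 30.0, 5: 33.0, 9: 40.0}
est = {3: 31.0, 5: 29.0, 9: 41.0}
print(evaluate_prediction(real, est))  # prints (False, 3.0, 0.1)
```

Averaging the first component of this tuple over all experiments yields the success percentage, while the second and third components feed the absolute and relative failure-cost statistics.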

– How are the experiments designed? All the RHCA variants corresponding to the sweep of values of b resulting from the proposed running time estimation methodology have been implemented (see table 3.2). In order to test our proposals under a wide spectrum of experimental situations, consensus processes have been conducted using the seven consensus functions for hard cluster ensembles presented in appendix

