29.04.2013 Views

TESI DOCTORAL - La Salle

C.3. Estimation of the computationally optimal DHCA

and in terms of the execution time overheads (in both absolute and relative terms) between the truly and the allegedly fastest DHCA variants in case the prediction fails.

– How are the experiments designed? The f! DHCA variants corresponding to all the possible permutations of the f diversity factors employed in the generation of the cluster ensemble have been implemented (see table 3.6). As described in appendix A.4, cluster ensembles have been created by mutually crossing f = 3 diversity factors: clustering algorithms (dfA), object representations (dfR) and data dimensionalities (dfD). Thus, in all our experiments, the number of DHCA variants is f! = 3! = 6. Each variant is identified by an acronym describing the order in which diversity factors are assigned to stages; for instance, ADR denotes the DHCA variant defined by the ordered list O = {df1 = dfA, df2 = dfD, df3 = dfR}. For a given data collection, the cardinalities of the representational and dimensional diversity factors (|dfR| and |dfD|, respectively) are constant, while the cardinality of the algorithmic diversity factor takes four distinct values, |dfA| ∈ {1, 10, 19, 28}, giving rise to the four diversity scenarios in which our proposals are analyzed. Moreover, consensus clustering has been conducted by means of the seven consensus functions for hard cluster ensembles described in appendix A.5, which allows us to evaluate the behaviour of our proposals under distinct consensus paradigms. In all cases, the real running times are averages over 10 independent runs of the whole RHCA, so as to obtain representative values. As described in appendix A.6, all the experiments have been executed under Matlab 7.0.4 on Pentium 4 3 GHz/1 GB RAM computers.

– How are results presented? Both the real and estimated running times of the serial and parallel implementations of the DHCA variants are depicted by means of curves representing their average values.

– Which data sets are employed? For brevity, this section only describes the results of the experiments conducted on the Zoo data collection. On this data set, the cardinalities of the representational and dimensional diversity factors are |dfR| = 5 and |dfD| = 14, respectively. The presentation of the results of these same experiments on the Iris, Wine, Glass, Ionosphere, WDBC, Balance and MFeat unimodal data collections is deferred to appendix C.3.
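The naming scheme for the f! = 6 DHCA variants described above can be illustrated with a short sketch. This is only an illustration in Python, not the Matlab code used in the experiments; the function name `dhca_variants` and the dictionary layout are assumptions introduced here.

```python
from itertools import permutations

# Diversity factors named in the text: clustering algorithms (A),
# object representations (R) and data dimensionalities (D).
FACTORS = {"A": "dfA", "R": "dfR", "D": "dfD"}


def dhca_variants(factors=FACTORS):
    """Map each acronym (e.g. 'ADR') to its ordered list of diversity factors.

    Hypothetical helper: it only enumerates the stage orderings; it does not
    build or run any consensus architecture.
    """
    return {
        "".join(order): [factors[key] for key in order]
        for order in permutations(sorted(factors))
    }


variants = dhca_variants()
for acronym, ordered_list in variants.items():
    print(acronym, ordered_list)
```

With f = 3 factors the enumeration yields six orderings, and the entry for ADR is the ordered list (dfA, dfD, dfR) given as an example in the text.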

C.3.1 Iris data set<br />

In this section, we present the estimated and real running times of the serial and parallel implementations of DHCA on the Iris data collection. As mentioned above, this experiment has been replicated across four diversity scenarios that, for this data set, correspond to cluster ensembles of size l = 9, 90, 171 and 252.
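The four ensemble sizes can be related to the four values of |dfA| with a small sketch. It assumes that mutually crossing the diversity factors makes the ensemble size the product l = |dfA| · |dfR| · |dfD|; this is an assumption, although it is consistent with the sizes reported for Iris if |dfR| · |dfD| = 9 (a product inferred here from l = 9 at |dfA| = 1, not stated in the text). The helper name `ensemble_sizes` is hypothetical.

```python
# Values of |dfA| defining the four diversity scenarios (from the text).
ALGORITHM_CARDINALITIES = (1, 10, 19, 28)


def ensemble_sizes(algorithm_cardinalities, components_per_algorithm):
    """Ensemble size per scenario, assuming l = |dfA| * (|dfR| * |dfD|)."""
    return [n_alg * components_per_algorithm for n_alg in algorithm_cardinalities]


# Iris: |dfR| * |dfD| = 9 is inferred from the reported l = 9 when |dfA| = 1.
iris_sizes = ensemble_sizes(ALGORITHM_CARDINALITIES, 9)
print(iris_sizes)

# Zoo: |dfR| = 5 and |dfD| = 14 are given in the text; the resulting sizes
# follow only under the product assumption above.
zoo_sizes = ensemble_sizes(ALGORITHM_CARDINALITIES, 5 * 14)
print(zoo_sizes)
```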

The left and right columns of figure C.15 present the estimated and real running times, respectively, of several variants of the serial implementation of the DHCA and of flat consensus on this data set across the four diversity scenarios. Two issues are worth noting. Firstly, SERTDHCA is a fairly accurate estimator of the real execution time of the serial DHCA implementation, SRTDHCA. Secondly, flat consensus is faster than the most efficient DHCA variants regardless of the consensus function and the diversity scenario.
