
3.1. Motivation

A further, highly relevant point regarding the computational efficiency of hierarchical consensus architectures is that they naturally allow the parallel execution of the consensus clustering processes of every HCA stage, although this obviously depends on the availability of computing resources. Thus, the degree of parallelism attained in executing the consensus processes of every HCA stage sets the lower and upper bounds of the time required for obtaining the final consensus clustering λc.

In the best-case scenario, the HCA running time can be as low as the sum of the execution times of the longest-lasting consensus task of each stage of the architecture, provided that the available computational resources allow the parallel computation of all the intermediate consensus solutions of any given stage.

Conversely, if the execution of the halfway consensus processes is serialized, the time required to run the whole HCA amounts to the sum of the execution times of all the consensus processes of all its stages, which constitutes the upper bound on the running time of a hierarchical consensus architecture.
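These two extremes can be summarized compactly as follows. This is a hedged formalization rather than an expression taken verbatim from this chapter: the symbol t_ij, denoting the execution time of the jth consensus process run at the ith HCA stage, is introduced here purely for illustration, whereas s (the number of stages) and Ki (the number of consensus processes executed at the ith stage) follow the notation defined later in this section:

T_parallel = \sum_{i=1}^{s} \max_{j \in [1, K_i]} t_{ij}        T_serial = \sum_{i=1}^{s} \sum_{j=1}^{K_i} t_{ij}

so that the running time of any execution schedule of the HCA satisfies T_parallel ≤ T_HCA ≤ T_serial.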

Therefore, depending on the design of the HCA, the simultaneously available computing resources and the characteristics of the data set, structuring the consensus clustering task in a hierarchical manner may be more or less computationally beneficial (or not beneficial at all) compared to its flat counterpart. From a practical viewpoint, our general aim is to provide the user with simple tools that, for a given consensus clustering problem, make it possible to decide a priori whether hierarchical consensus architectures are more computationally efficient than traditional flat consensus and, if so, to implement the HCA variant of minimal complexity.

Moreover, it is important to highlight that, in cases where the flat execution of the consensus function F becomes impossible due to memory limitations caused by the large size of the cluster ensemble, a carefully designed HCA still allows a consensus clustering solution to be obtained.

Let us now briefly introduce several notational definitions regarding hierarchical consensus architectures that will be helpful when describing our proposals in detail. We suggest that the reader refer to the generic HCA topology depicted in figure 3.1(b) for a better understanding of the concepts presented next.

Firstly, a hierarchical consensus architecture is structured in s successive stages. The number of intermediate consensus solutions obtained at the output of the ith stage is denoted as Ki; notice that Ks = 1 (i.e. the last stage yields the single final consensus clustering solution λc). The jth halfway consensus clustering created at the ith HCA stage is denoted as λ^i_cj, where i ∈ [1, s−1] and j ∈ [1, Ki].

Another important factor in the definition of HCAs is the size of the mini-ensembles, which may vary from stage to stage or even within the same stage. For this reason, we denote as bij the size of the mini-ensemble upon which the jth consensus process of the ith HCA stage is conducted. Notice that bs1 = Ks−1 (i.e. the last consensus stage combines all the intermediate clusterings output by the previous stage into the single final consensus clustering solution λc), while, in the HCA presented in figure 3.1(b), b1j = 2 ∀j ∈ [1, K1].

Moreover, notice that hierarchical architectures naturally allow the use of distinct consensus functions across different stages (or even within the same stage). However, in this work we assume that a single consensus function F is applied to conduct all the consensus processes involved.
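As an illustration of this bookkeeping, the following sketch, written in Python, derives the number of consensus processes per stage from the mini-ensemble sizes bij and computes the parallel and serialized running-time bounds discussed above. It is a minimal, hypothetical example rather than an implementation used in this thesis: the function names, the example ensemble of 8 clusterings and the assumption that every stage consumes exactly the clusterings produced by the previous one (with the first stage operating on the original cluster ensemble) are introduced here for illustration only.

# Minimal, hypothetical sketch of HCA bookkeeping (not part of the thesis).
# Each stage i runs K_i consensus processes; the j-th one combines a
# mini-ensemble of size b_ij. The last stage must output the single final
# consensus clustering (K_s = 1).

def stage_outputs(mini_ensemble_sizes):
    """K_i: one intermediate consensus solution per mini-ensemble of the stage."""
    return len(mini_ensemble_sizes)

def check_hca(ensemble_size, stages):
    """stages[i][j] holds b_ij, the mini-ensemble sizes of stage i+1.
    Assumes each stage consumes all clusterings available to it and that
    the final stage yields a single consensus clustering."""
    available = ensemble_size  # stage 1 operates on the original cluster ensemble
    for i, sizes in enumerate(stages, start=1):
        assert sum(sizes) == available, f"stage {i} must consume all {available} clusterings"
        available = stage_outputs(sizes)  # the K_i outputs feed stage i+1
    assert available == 1, "the last stage must yield the single final consensus clustering"

def running_time_bounds(times):
    """times[i][j] holds t_ij, the execution time of the j-th consensus process
    of stage i+1. Returns (fully parallel lower bound, serialized upper bound)."""
    t_parallel = sum(max(stage) for stage in times)
    t_serial = sum(sum(stage) for stage in times)
    return t_parallel, t_serial

# Hypothetical two-stage HCA over an ensemble of 8 clusterings with b_1j = 2,
# as in figure 3.1(b): stage 1 runs K_1 = 4 pairwise consensus processes and
# stage 2 combines their outputs into the final consensus clustering (b_21 = K_1).
stages = [[2, 2, 2, 2], [4]]
check_hca(ensemble_size=8, stages=stages)
print(running_time_bounds([[1.0, 1.2, 0.9, 1.1], [2.5]]))  # approximately (3.7, 6.7)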

