
– the computational complexity of combining a large number of individual partitions, by means of hierarchical consensus architectures, which consist in the layered construction of the consensus clustering solution through a hierarchical structure of low-complexity intermediate consensus processes.

– the negative bias induced by poor-quality clusterings on the consensus clustering solution, by means of a self-refining post-processing that, using the obtained consensus clustering solution as a reference, builds a select and reduced cluster ensemble (i.e. a subset of the original cluster ensemble) and derives a new, refined consensus clustering from it in a fully unsupervised manner (a minimal sketch of this selection step is given after this list).
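As a concrete illustration of the second strategy, the following Python sketch shows one plausible form of such a self-refining step. It assumes that partition agreement is measured with normalized mutual information; the helper names (self_refine, consensus_fn, keep_ratio) are hypothetical and do not come from the thesis:

```python
from sklearn.metrics import normalized_mutual_info_score as nmi

def self_refine(ensemble, consensus_fn, keep_ratio=0.5):
    """Illustrative self-refining post-processing (a sketch, not the
    exact procedure in the thesis): use the consensus solution as a
    reference to select the most consistent partitions, then re-run
    the consensus on that reduced ensemble, without supervision."""
    # Step 1: obtain the reference consensus from the full ensemble.
    reference = consensus_fn(ensemble)
    # Step 2: score each partition by its agreement with the reference
    # (no ground-truth labels are involved, so the procedure remains
    # fully unsupervised).
    scores = [nmi(reference, labels) for labels in ensemble]
    # Step 3: keep the best-scoring subset (the select and reduced
    # cluster ensemble) and derive a refined consensus from it.
    n_keep = max(1, int(keep_ratio * len(ensemble)))
    ranked = sorted(range(len(ensemble)), key=lambda i: scores[i],
                    reverse=True)
    selected = [ensemble[i] for i in ranked[:n_keep]]
    return consensus_fn(selected)
```

Any consensus function can be plugged in as consensus_fn; the point of the sketch is only the reference-select-recombine loop, not the particular agreement measure or selection ratio.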

Although both strategies are complementary (indeed, they can be naturally combined, giving rise to SHCA), their description and study are decoupled in the present and the next chapters, respectively. Thus, in our description and analysis of hierarchical consensus architectures (chapter 3), we ultimately aim to design computationally optimal consensus architectures and, consequently, we focus solely on aspects regarding their time complexity. Meanwhile, the study of consensus self-refining procedures, presented in chapter 4, is centered on improving the quality of the consensus solutions yielded by the most computationally efficient consensus architectures devised in the present chapter.

The introduction, the discussion of their rationale and the theoretical description of hierarchical consensus architectures are complemented by the presentation of multiple experiments analyzing several aspects of their performance on a number of real data collections. Last but not least, it is worth noting that although all the proposals put forward in this chapter focus on a hard cluster ensemble scenario, they are also applicable to the combination of fuzzy clusterings.

3.1 Motivation

The construction of consensus clustering solutions is usually tackled as a one-step process, in the sense that the whole cluster ensemble E is input to the consensus function F at once (see figure 3.1(a)). This is what we call flat consensus clustering. However, as outlined in chapter 2, the time and space complexities of consensus functions typically scale linearly or quadratically with the size of the cluster ensemble l (i.e. O(l^w), where w ∈ {1, 2}), which may lead to a highly costly or even impossible execution of the consensus clustering task if it is to be conducted on a cluster ensemble containing a large number of partitions¹.
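To make the flat setting concrete, here is a minimal Python sketch of one classic consensus function, evidence accumulation over a co-association matrix, whose cost grows linearly with the ensemble size l (the w = 1 case). It is an illustrative instance only, not the particular consensus function F studied here, and the name flat_consensus is ours:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def flat_consensus(ensemble, k):
    """Flat consensus: the whole cluster ensemble is processed at once.

    ensemble: list of l label vectors, each of length n.
    k: number of clusters in the consensus clustering solution.
    """
    l, n = len(ensemble), len(ensemble[0])
    coassoc = np.zeros((n, n))
    for labels in ensemble:  # one pass per partition: linear in l
        labels = np.asarray(labels)
        # Accumulate evidence: +1 whenever two objects co-cluster.
        coassoc += (labels[:, None] == labels[None, :]).astype(float)
    coassoc /= l
    # Cut an average-linkage dendrogram built on co-association
    # distances (condensed upper-triangle form expected by scipy).
    dist = 1.0 - coassoc[np.triu_indices(n, k=1)]
    return fcluster(linkage(dist, method='average'),
                    t=k, criterion='maxclust')
```

Every one of the l partitions must be visited before the final dendrogram cut, so doubling the ensemble size doubles the accumulation cost, and consensus functions that are quadratic in l (w = 2) degrade even faster.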

For this reason, a natural way of avoiding this limitation, while also reducing the computational complexity of the consensus solution creation process, consists in applying the classic divide-and-conquer strategy (Dasgupta, Papadimitriou, and Vazirani, 2006), which basically:

– breaks the original problem into subproblems that are nothing but smaller instances of the same type of problem (see the sketch below)
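By way of illustration, the divide-and-conquer idea maps onto consensus clustering as a layered construction: mini-ensembles are combined by intermediate consensus processes, whose outputs form the next, smaller layer. The following Python sketch conveys only this generic scheme, not the specific hierarchical consensus architectures formalized later in the chapter, and the branch parameter (the mini-ensemble size) is an illustrative assumption:

```python
def hierarchical_consensus(ensemble, consensus_fn, branch=8):
    """Generic layered (divide-and-conquer) consensus sketch."""
    layer = list(ensemble)
    while len(layer) > 1:
        # Divide: split the current layer into mini-ensembles of at
        # most `branch` partitions. Conquer: replace each mini-ensemble
        # by its intermediate consensus. Repeat until one remains.
        layer = [consensus_fn(layer[i:i + branch])
                 for i in range(0, len(layer), branch)]
    return layer[0]
```

Reusing the earlier flat_consensus as the intermediate step, e.g. hierarchical_consensus(ensemble, lambda e: flat_consensus(e, k=3)), no single consensus process ever receives more than branch partitions, which is precisely the complexity-control property motivating the architectures developed in this chapter.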

¹ Moreover, the time complexity of consensus functions also depends (linearly or quadratically, see appendix A.5) on the number of objects in the data set n and the number of clusters k of the clusterings in the ensemble. However, as we assume that these two factors are constant for a given cluster ensemble corresponding to a specific data set, the only dependence of concern is the one on the cluster ensemble size l.

