29.04.2013 Views

TESI DOCTORAL - La Salle


Chapter 3. Hierarchical consensus architectures

3.3 Deterministic hierarchical consensus architectures

This section describes deterministic hierarchical consensus architectures (DHCA). As in the previous section, we present a generic definition of this architectural variant along with a study of its computational complexity.

3.3.1 Rationale and definition

Unlike random HCA (RHCA), this proposal creates the mini-ensembles according to a deterministic criterion. The main idea behind DHCA is to use the distinct ways of introducing diversity in the cluster ensemble as the guiding principle for creating the mini-ensembles upon which the intermediate consensus clustering solutions are built. A key difference between DHCA and RHCA is therefore that the former is designed indirectly by the user when creating the cluster ensemble, whereas the latter requires the user to fix an architecture-defining factor (i.e. to assign a value to the mini-ensemble size $b$).

To elaborate on the relationship between the creation of the cluster ensemble and the configuration of the DHCA, it is worth recalling the strategies employed for introducing diversity in cluster ensembles (see section 2.1).

For instance, heterogeneous cluster ensembles, whose components are generated by running multiple clustering algorithms on the data set, have a single diversity factor: the set of distinct clustering algorithms employed. By contrast, homogeneous cluster ensembles (those compiling the outcomes of multiple runs of a single clustering algorithm) admit a wider range of diversity factors, such as the random starting configuration of a stochastic algorithm or the use of distinct attributes for representing the objects in the data set, among others.

As mentioned earlier, in this work we combine the homogeneous and heterogeneous approaches for creating cluster ensembles, aiming not only to obtain highly diverse cluster ensembles, but also to design a strategy for overcoming clustering indeterminacies. This means that we employ several mutually crossed diversity factors (e.g. multiple clustering algorithms are run on several data representations with varying dimensionalities), and this is the scenario in which DHCA is defined.

In general terms, let $f$ denote the number of diversity factors employed in the cluster ensemble creation process. Each diversity factor $df_i$, $\forall i \in [1, f]$, has a cardinality $|df_i|$. For example, if the $i$th diversity factor $df_i$ represents the algorithmic diversity of the ensemble, then $|df_i|$ is the number of clustering algorithms employed for creating the cluster ensemble.

Finally, notice that if all diversity factors are fully crossed with one another (e.g. the ensemble contains one component for every combination of clustering algorithm, document representation and dimensionality), the cluster ensemble size $l$ can be expressed as:

$$ l = \prod_{k=1}^{f} |df_k| \qquad (3.12) $$
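As an illustration, the following minimal Python sketch computes the ensemble size under full mutual crossing as the product of the factor cardinalities, and enumerates the corresponding components. The factor names and values (`algorithm`, `representation`, `dimensionality` and their entries) and the helper functions are hypothetical, chosen only for this example; they are not taken from the experiments in this work.

```python
from itertools import product
from math import prod

# Hypothetical diversity factors (illustrative names and values only).
diversity_factors = {
    "algorithm": ["kmeans", "agglomerative", "spectral"],  # |df_1| = 3
    "representation": ["terms", "lsa"],                    # |df_2| = 2
    "dimensionality": [50, 100, 200, 400],                 # |df_3| = 4
}


def ensemble_size(factors):
    """Ensemble size l under full crossing: product of the cardinalities |df_k|."""
    return prod(len(values) for values in factors.values())


def enumerate_components(factors):
    """Return one dict per combination of factor values (one per ensemble component)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


l = ensemble_size(diversity_factors)
components = enumerate_components(diversity_factors)
print(l, len(components))
```

With these assumed cardinalities (3, 2 and 4), both the product and the explicit enumeration yield the same number of components, which is what full mutual crossing requires.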

Let us see how the design of the cluster ensemble determines the topology of a deterministic hierarchical consensus architecture. The guiding principle is that the consensus
