29.04.2013 Views

TESI DOCTORAL - La Salle



2.1. Related work on cluster ensembles

– weak clustering: another approach to the generation of homogeneous cluster ensembles is the repeated application of computationally cheap and conceptually simple clustering procedures that, although yielding poor clustering solutions by themselves (this is why they are said to be weak), may lead to better data clustering if combined. Strategies of this type are of special interest when clustering high dimensional and/or large data collections, as deriving multiple partitions by traditional means may become too costly (Fern and Brodley, 2003). Examples include using random hyperplanes for splitting the data (Topchy, Jain, and Punch, 2003) or prematurely halted executions of k-means (Hadjitodorov and Kuncheva, 2007).

– noise injection: the random perturbation of the representation of the objects (Hadjitodorov, Kuncheva, and Todorova, 2006) or of the labels contained in the individual clustering solutions (Hadjitodorov and Kuncheva, 2007) has also been applied in a few research works, although these strategies constitute a far less natural way of creating diverse cluster ensembles than the previous ones.
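
As a minimal illustration of the two strategies above, the following Python sketch builds a homogeneous ensemble of weak two-cluster partitions by splitting the data with random hyperplanes, and perturbs a labeling by random label reassignment. It is a sketch under stated assumptions, not the procedure of the cited works: the function names, the choice of passing each hyperplane through a randomly picked data point, and the uniform relabeling scheme are all hypothetical.

```python
import random


def random_hyperplane_partition(points, rng):
    """Weak clusterer: split the points into two clusters with one random hyperplane."""
    dim = len(points[0])
    # Random normal vector fixing the orientation of the hyperplane.
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Assumption: the hyperplane passes through a randomly chosen data point.
    anchor = rng.choice(points)
    offset = sum(n * a for n, a in zip(normal, anchor))
    # Label 1 for points strictly on the positive side, 0 otherwise.
    return [1 if sum(n * x for n, x in zip(normal, p)) > offset else 0
            for p in points]


def inject_label_noise(labels, flip_prob, n_clusters, rng):
    """Noise injection: reassign each label to a random cluster with probability flip_prob."""
    return [rng.randrange(n_clusters) if rng.random() < flip_prob else lab
            for lab in labels]


def build_weak_ensemble(points, n_components, seed=0):
    """Homogeneous ensemble: repeat the same weak clusterer n_components times."""
    rng = random.Random(seed)
    return [random_hyperplane_partition(points, rng) for _ in range(n_components)]
```

Each component is cheap to compute (one dot product per object), which is the property that makes weak clustering attractive for large or high dimensional collections.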

The second approach for creating cluster ensembles consists of applying several distinct clustering algorithms for generating the individual components of the ensemble, which gives rise to what are known as heterogeneous cluster ensembles. If clustering algorithms with substantially different biases are employed, cluster ensembles with a high degree of diversity can be obtained. This strategy has been applied in several works, such as (Strehl and Ghosh, 2002; Lange and Buhmann, 2005; Gonzàlez and Turmo, 2006; Gionis, Mannila, and Tsaparas, 2007; Gonzàlez and Turmo, 2008a; Gonzàlez and Turmo, 2008b).

Notice that the strategies used for creating homogeneous and heterogeneous cluster ensembles can be combined so as to create even more diverse ensembles, as in (Sevillano et al., 2007c), where several clustering algorithms are applied on different representations of the objects in the data set (obtained by means of multiple feature extraction techniques with distinct dimensionalities). In this work, we will follow this approach as regards the generation of cluster ensembles, using the clustering algorithms and object representations described in appendices A.1 and A.3, as our goal is to overcome the indeterminacies resulting from the selection of a particular clustering configuration.
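
The combined strategy amounts to running every clustering algorithm on every object representation. A minimal Python sketch of that cross product follows; the toy representations and threshold-based clusterers are hypothetical stand-ins chosen only to keep the example self-contained, and are not the algorithms or representations of appendices A.1 and A.3.

```python
from itertools import product
from statistics import mean, median


def identity(points):
    """Toy representation: keep the original features."""
    return [list(p) for p in points]


def first_coordinate(points):
    """Toy representation: reduce each object to its first feature."""
    return [[p[0]] for p in points]


def split_at_mean(points):
    """Toy clusterer: threshold the feature sum at its mean."""
    scores = [sum(p) for p in points]
    threshold = mean(scores)
    return [int(s > threshold) for s in scores]


def split_at_median(points):
    """Toy clusterer: threshold the feature sum at its median."""
    scores = [sum(p) for p in points]
    threshold = median(scores)
    return [int(s > threshold) for s in scores]


def build_ensemble(points, representations, algorithms):
    """One ensemble component per (representation, algorithm) pair."""
    ensemble = {}
    for (rep_name, rep), (alg_name, alg) in product(
            representations.items(), algorithms.items()):
        ensemble[(rep_name, alg_name)] = alg(rep(points))
    return ensemble
```

With R representations and A algorithms the ensemble holds R × A components, so no single clustering configuration has to be chosen in advance.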

There exist several recent works in the literature dealing with the design of the cluster ensemble. In general terms, they can be divided into two categories: i) those works focused on analyzing which strategies should be followed for creating cluster ensemble components that give rise to good quality consensus clustering solutions, and ii) those centered on obtaining a good quality consensus clustering given a particular cluster ensemble.

Among the first group, we highlight the works by Kuncheva and Hadjitodorov. In (Hadjitodorov, Kuncheva, and Todorova, 2006), the authors analyze the diversity of the individual partitions composing a hard cluster ensemble and its effect on the quality of the consensus clustering. To do so, several measures for evaluating the diversity of a cluster ensemble are proposed and evaluated. Moreover, such measures are employed in the derivation of a procedure for selecting the candidate with the median diversity among a population of cluster ensembles, a criterion that yields consensus clustering solutions equal to or better than those obtained on arbitrarily chosen cluster ensembles.
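
The median-diversity selection idea can be sketched in Python as follows. The text above does not state which diversity measures are used, so this sketch assumes one simple possibility: diversity is taken as one minus the mean pairwise Rand index between the partitions of an ensemble. The function names are hypothetical, and this is an illustration of the selection criterion rather than the cited authors' procedure.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of object pairs on which two hard partitions agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def ensemble_diversity(ensemble):
    """Assumed measure: 1 minus the mean pairwise Rand index of the components."""
    pairs = list(combinations(ensemble, 2))
    return 1.0 - sum(rand_index(p, q) for p, q in pairs) / len(pairs)


def select_median_diversity(population):
    """Return the ensemble whose diversity is the median of the population.

    For an even-sized population the upper median is returned.
    """
    ranked = sorted(population, key=ensemble_diversity)
    return ranked[len(ranked) // 2]
```

Given a population of candidate ensembles, `select_median_diversity` discards both the nearly identical and the most discordant candidates and keeps a moderately diverse one.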

The notion that moderately diverse cluster ensembles lead to good quality consensus is reinforced by the experimental results presented in (Hadjitodorov and Kuncheva, 2007), where the authors apply a standard genetic algorithm for driving the selection of the cluster
