29.04.2013 Views

Doctoral Thesis - La Salle

Chapter 2. Cluster ensembles and consensus clustering

2.1 Related work on cluster ensembles

Our aim in this section is to review the strategies applied in the literature for constructing cluster ensembles, given that the way an ensemble is built influences the results of the consensus clustering process.

Two alternative approaches have traditionally been followed in this context, which differ in the number of distinct clustering algorithms used to generate the individual partitions in the ensemble.

The first cluster ensemble creation strategy consists of compiling the outcomes of multiple runs of a single clustering algorithm, which gives rise to what is known as a homogeneous cluster ensemble (Hadjitodorov, Kuncheva, and Todorova, 2006). In this case, the diversity of the ensemble components can be induced by several means, which are often combined:

– application of a stochastic clustering algorithm: this strategy relies on the fact that the outcome of a stochastic clustering algorithm depends on how its parameters are initialized. For instance, diverse clustering solutions can be obtained by randomly initializing the starting centroids of k-means (Fred, 2001; Fred and Jain, 2002a; Fred and Jain, 2003; Dimitriadou, Weingessel, and Hornik, 2001; Greene et al., 2004; Long, Zhang, and Yu, 2005; Hore, Hall, and Goldgof, 2006; Kuncheva, Hadjitodorov, and Todorova, 2006; Li, Ding, and Jordan, 2007; Nguyen and Caruana, 2007; Ayad and Kamel, 2008; Fern and Lin, 2008) or fuzzy c-means (Dimitriadou, Weingessel, and Hornik, 2002), or by randomizing the initial settings of EM clustering (Punera and Ghosh, 2007; Gonzàlez and Turmo, 2008a; Gonzàlez and Turmo, 2008b).

– random number of clusters: in this case, the number of clusters to be found is set randomly at each run of the clustering algorithm (Fred and Jain, 2002b; Fred and Jain, 2005; Topchy, Jain, and Punch, 2004; Kuncheva, Hadjitodorov, and Todorova, 2006; Hadjitodorov and Kuncheva, 2007; Gonzàlez and Turmo, 2008a; Gonzàlez and Turmo, 2008b; Ayad and Kamel, 2008). This number of clusters is usually set to be much larger than the expected number of categories in the data set (Dimitriadou, Weingessel, and Hornik, 2001; Fred and Jain, 2002a), and it is often selected at random from a predefined interval (Long, Zhang, and Yu, 2005; Hore, Hall, and Goldgof, 2006).

– distinct object representations: another source of diversity lies in the way objects are represented. Indeed, as we showed in section 1.4, running the same clustering algorithm on distinct representations of the same data set often leads to quite diverse clustering solutions. Building on this fact, cluster ensembles have been created by running a single clustering algorithm on different data representations generated by random feature selection (Agogino and Tumer, 2006; Hadjitodorov and Kuncheva, 2007; Fern and Lin, 2008), random feature extraction (Greene et al., 2004; Long, Zhang, and Yu, 2005; Hore, Hall, and Goldgof, 2006; Hadjitodorov and Kuncheva, 2007; Fern and Lin, 2008) or deterministic feature extraction (Sevillano et al., 2006a; Sevillano et al., 2006b; Sevillano et al., 2007a; Sevillano, Alías, and Socoró, 2007b).

– data subsampling: creating multiple clustering solutions on distinct random subsamples of the data set has been used as a means of generating diverse cluster ensembles (Fischer and Buhmann, 2003; Dudoit and Fridlyand, 2003; Minaei-Bidgoli, Topchy, and Punch, 2004; Kuncheva, Hadjitodorov, and Todorova, 2006; Punera and Ghosh, 2007).
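
As a minimal sketch of the first two diversity sources listed above, the following plain-Python code builds a homogeneous ensemble by running Euclidean k-means several times. Each run draws its starting centroids at random and draws its number of clusters from a predefined interval. The helper names (`kmeans`, `homogeneous_ensemble`), the interval [4, 8] and the toy data are illustrative assumptions, not taken from the cited works.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, max_iter=100):
    """Plain Lloyd's k-means whose starting centroids are k objects drawn at random."""
    centroids = [list(p) for p in rng.sample(points, k)]  # random initialization
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no object changed cluster: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def homogeneous_ensemble(points, n_partitions, k_min, k_max, seed=0):
    """Collect n_partitions runs of k-means on the same data.

    Diversity comes from two sources: the random starting centroids and a
    number of clusters k drawn uniformly from [k_min, k_max] at every run.
    """
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_partitions):
        k = rng.randint(k_min, k_max)  # random number of clusters
        ensemble.append(kmeans(points, k, rng))
    return ensemble


# Toy data: two Gaussian blobs, i.e. two expected categories.
data_rng = random.Random(42)
points = [
    (data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5))
    for cx, cy in [(0.0, 0.0), (5.0, 5.0)]
    for _ in range(30)
]
# k is drawn from an interval well above the 2 expected categories.
ensemble = homogeneous_ensemble(points, n_partitions=5, k_min=4, k_max=8, seed=1)
for labels in ensemble:
    print(len(set(labels)), "clusters")
```

Every component is a label vector over the same set of objects, which is the form in which the ensemble would then be handed to a consensus function.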
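
The data-side sources of diversity listed above (random feature selection, random feature extraction and data subsampling) can be sketched as functions that produce distinct views of the same data set; each view would then be clustered by the same algorithm (e.g. k-means) to yield one ensemble component. This is a minimal sketch under stated assumptions: the function names are hypothetical, Gaussian random projection is used as one common instance of random feature extraction (the cited works may use other schemes), and subsampling is done without replacement at an assumed 80% rate, bootstrap sampling with replacement being a common alternative.

```python
import random


def random_feature_subset(points, n_features, rng):
    """Random feature selection: keep a random subset of the original attributes."""
    dims = sorted(rng.sample(range(len(points[0])), n_features))
    return [tuple(p[d] for d in dims) for p in points], dims


def random_projection(points, n_components, rng):
    """Random feature extraction: project every object onto n_components
    random directions whose entries are drawn from a standard Gaussian."""
    n_dims = len(points[0])
    directions = [[rng.gauss(0.0, 1.0) for _ in range(n_dims)] for _ in range(n_components)]
    return [tuple(sum(w * x for w, x in zip(row, p)) for row in directions) for p in points]


def random_subsample(points, fraction, rng):
    """Data subsampling: draw a random subset of the objects without replacement.

    The indices are returned too, so that each partition obtained on the
    subsample can later be related to the objects it actually covers.
    """
    size = max(1, round(fraction * len(points)))
    idx = sorted(rng.sample(range(len(points)), size))
    return [points[i] for i in idx], idx


# Toy data: 20 objects described by 6 attributes.
rng = random.Random(7)
data = [tuple(rng.uniform(0.0, 1.0) for _ in range(6)) for _ in range(20)]

selected, dims = random_feature_subset(data, 3, rng)  # 3 of the 6 attributes
projected = random_projection(data, 2, rng)            # 2 random directions
subsample, idx = random_subsample(data, 0.8, rng)      # 80% of the objects
print("selected attributes:", dims)
print("projected dimensionality:", len(projected[0]))
print("subsample size:", len(subsample))
```

Calling these functions repeatedly with the same random generator yields a different view each time, which is what makes the resulting partitions diverse even though the clustering algorithm itself is fixed.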
