29.04.2013 Views

TESI DOCTORAL - La Salle


fuzzy consensus clustering solutions (Dimitriadou, Weingessel, and Hornik, 2002; Punera and Ghosh, 2007; Sevillano, Alías, and Socoró, 2007b).

Regardless of whether the cluster ensemble is hard or soft, combining the results of several clustering processes has multiple applications, which are well described by Strehl and Ghosh (2002). In a nutshell, consensus clustering is useful for:

– knowledge reuse: in some scenarios, one may want to create a partition of a set of objects, but access to the original data may be restricted for copyright or privacy reasons (customer databases are the prototypical example of this situation). However, if a set of legacy partitions of the data exists (e.g. segmentations of a customer database based on distinct criteria such as residence, purchasing patterns, or age), consensus clustering provides a means of reconciling the knowledge contained in those legacy clusterings.

– distributed clustering: for security or operational reasons, there are situations in which the data to be clustered is scattered across different locations. In this context, as an alternative to gathering and processing all the data at one site (which can be infeasible, for instance, due to storage costs), the data available at each location would be subjected to a clustering process, and the resulting label vectors would be combined by means of consensus clustering, yielding a consolidated classification of the data.

– robust clustering: in this case, the goal is to obtain a consensus clustering solution that improves on the quality of the component clusterings. The rationale is that, if the distinct clustering processes disagree, combining their outcomes may offer additional information and discriminatory power, thus yielding a better combined clustering that is closer to a hypothetical true classification (Pinto et al., 2007).

It is in this last application that consensus clustering can most clearly be regarded as the unsupervised counterpart of classifier committees, as the objective of both strategies is to combine the outcomes of several classification processes in order to improve on the quality of the component classifiers (Dietterich, 2000). However, the purely symbolic nature of the labels returned by unsupervised classifiers makes consensus clustering a more challenging task. Possibly for this reason, consensus clustering has historically been far less popular than classifier committees, and it has only begun to attract considerable attention from researchers during the last decade.
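
To make the difficulty posed by symbolic labels concrete, the following minimal Python sketch compares label vectors in two ways. It is only an illustration under stated assumptions: the toy label vectors and the helper names `co_membership` and `co_association` are hypothetical, and the pairwise co-association representation is one common way of sidestepping the label correspondence problem, not necessarily the approach adopted in this work.

```python
from itertools import combinations

def co_membership(labels):
    """Return the set of object pairs (i, j), i < j, placed in the same cluster."""
    return {(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] == labels[j]}

def co_association(ensemble):
    """Fraction of clusterings in which each pair of objects shares a cluster."""
    n = len(ensemble[0])
    counts = {pair: 0 for pair in combinations(range(n), 2)}
    for labels in ensemble:
        for pair in co_membership(labels):
            counts[pair] += 1
    return {pair: c / len(ensemble) for pair, c in counts.items()}

# Hypothetical toy ensemble over five objects.
a = [0, 0, 1, 1, 2]
b = [2, 2, 0, 0, 1]   # same partition as a, but with different label symbols
c = [0, 0, 0, 1, 1]   # a genuinely different partition

# Comparing labels position by position wrongly suggests total disagreement...
naive_agreement = sum(x == y for x, y in zip(a, b)) / len(a)

# ...whereas the pairwise view is invariant to how clusters are named.
same_partition = co_membership(a) == co_membership(b)

# Pairwise co-occurrence frequencies across the whole ensemble.
coassoc = co_association([a, b, c])
```

Here `a` and `b` share no label at any position, yet they induce identical sets of co-clustered pairs, which is why cluster labels cannot simply be put to a vote the way class labels are in a classifier committee. The co-association frequencies summarise the ensemble without requiring any label alignment.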

In the quest for good quality consensus clustering solutions, the design of both the cluster ensemble and the consensus function is of critical importance. Although a cluster ensemble is always necessary in order to conduct consensus clustering, some works focus mainly on the design of the consensus function, relegating the construction of the cluster ensemble to a secondary role, and vice versa. Given the importance of both elements, we split the review of the related work in this field into two separate parts, devoting section 2.1 to previous work on the construction of cluster ensembles and section 2.2 to an overview of the existing approaches to the design of consensus functions.
