
TESI DOCTORAL - La Salle


2.2. Related work on consensus functions

mation ($\phi^{(NMI)}$) with respect to all the partitions in the cluster ensemble. The authors prove that, by maximizing the number of shared objects in the consolidated clusters, the EAC consensus function maximizes the aforementioned information-theoretic objective function, although reaching its global optimum is not ensured in all situations. Moreover, cutting the dendrogram that results from applying single-link clustering to the co-association matrix at the level with the highest lifetime minimizes the variance of the average $\phi^{(NMI)}$, which guarantees that the clustering solution is robust to small variations in the composition of the cluster ensemble. Furthermore, this avoids making assumptions on the number of clusters, a significant advantage with respect to other consensus functions. A compendium of the evidence accumulation consensus clustering approach is presented in (Fred and Jain, 2005), which extends the previous consensus functions by applying other hierarchical clustering algorithms to the pairwise object co-association matrix.
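To make the EAC mechanics concrete, the following is a minimal pure-Python sketch, assuming each partition in the ensemble is given as a list of integer labels. The function names are hypothetical, and the lifetime rule used here (the gap between consecutive single-link merge distances, ignoring the one-cluster level) is a simplification for illustration rather than a transcription of (Fred and Jain, 2005).

```python
from itertools import combinations


def co_association(ensemble):
    """C[i][j]: fraction of partitions that place objects i and j together.

    `ensemble` is a list of partitions, each a list of n cluster labels.
    """
    n, m = len(ensemble[0]), len(ensemble)
    return [[sum(p[i] == p[j] for p in ensemble) / m for j in range(n)]
            for i in range(n)]


def _find(parent, x):
    # Union-find root lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def eac_single_link(ensemble):
    """Single-link on D = 1 - C, cut at the level with the highest lifetime."""
    n = len(ensemble[0])
    c = co_association(ensemble)
    # Single-link merges coincide with the minimum-spanning-tree edges of D,
    # so Kruskal's algorithm produces them in dendrogram (merge) order.
    edges = sorted((1.0 - c[i][j], i, j) for i, j in combinations(range(n), 2))
    parent = list(range(n))
    merges = []
    for d, i, j in edges:
        ri, rj = _find(parent, i), _find(parent, j)
        if ri != rj:
            parent[ri] = rj
            merges.append((d, i, j))
    # After m merges there are n - m clusters, stable for thresholds in
    # [t[m], t[m + 1]); the interval length is that level's lifetime.
    t = [0.0] + [d for d, _, _ in merges]
    lifetimes = [t[m + 1] - t[m] for m in range(n - 1)]
    best = max(range(n - 1), key=lambda m: lifetimes[m])
    # Replay only the first `best` merges to obtain the consensus partition.
    parent = list(range(n))
    for _, i, j in merges[:best]:
        parent[_find(parent, i)] = _find(parent, j)
    roots = {}
    return [roots.setdefault(_find(parent, x), len(roots)) for x in range(n)]
```

For instance, `eac_single_link([[0,0,0,1,1,1], [0,0,0,1,1,1], [0,0,1,1,2,2], [1,1,1,0,0,0]])` returns `[0, 0, 0, 1, 1, 1]`. The largest lifetime (0.5) lies between the merge distances 0.25 and 0.75, so the number of clusters is inferred rather than supplied.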

The clustering of high-dimensional data is the main motivation of the work presented in (Fern and Brodley, 2003). In this scenario, Random Projection (RP) is an efficient dimensionality reduction technique, although it often gives rise to highly unstable clustering results. To reduce this variability, the authors propose creating cluster ensembles by compiling the partitions resulting from distinct RP runs and combining them with a consensus function very similar to EAC, as it applies an agglomerative clustering algorithm to an object similarity matrix.
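The random-projection ensemble idea can be sketched as follows. This is an illustration under stated assumptions, not the method of (Fern and Brodley, 2003): plain k-means stands in for the base clusterer, the fraction of runs in which two objects share a cluster stands in for the object similarity matrix, and average-link merging stands in for the agglomerative step. All function names are hypothetical.

```python
import random


def random_projection(data, dim, rng):
    """Project each row of `data` onto `dim` random Gaussian directions."""
    d = len(data[0])
    r = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(d)]
    return [[sum(x[f] * r[f][c] for f in range(d)) for c in range(dim)]
            for x in data]


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2
                      for a, b in zip(p, centers[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rp_similarity(data, runs, dim, k, seed=0):
    """Fraction of RP + k-means runs in which two objects share a cluster."""
    rng = random.Random(seed)
    n = len(data)
    sim = [[0.0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans(random_projection(data, dim, rng), k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    sim[i][j] += 1.0 / runs
    return sim


def average_link(sim, k):
    """Merge the most similar clusters (mean pairwise similarity) until k remain."""
    clusters = [[i] for i in range(len(sim))]

    def mean_sim(a, b):
        return (sum(sim[i][j] for i in clusters[a] for j in clusters[b])
                / (len(clusters[a]) * len(clusters[b])))

    while len(clusters) > k:
        a, b = max(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda ab: mean_sim(*ab))
        clusters[a] += clusters.pop(b)
    labels = [0] * len(sim)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels
```

Each individual RP run may split the data differently. Averaging co-membership over many runs before the agglomerative step is what damps that instability.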

One of the two consensus functions presented by Dudoit and Fridlyand (2003), named BagClust2, resembles evidence accumulation, as it builds a pairwise object dissimilarity matrix that is subjected to a partitioning process to obtain the consensus clustering. However, BagClust2 and EAC differ in that the former requires the desired number of clusters to be passed as a parameter to the consensus function (the same holds for BagClust1).
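A BagClust2-style pipeline can be sketched as follows, assuming each bootstrap run is given as a dict that maps the indices of the sampled objects to their cluster labels. Normalizing by the number of runs that contain both objects, and assigning a dissimilarity of 1 to pairs never sampled together, are assumptions of this sketch. The simple alternating k-medoids routine is a stand-in for the partitioning step, not necessarily the algorithm used by Dudoit and Fridlyand (2003). Note that `k` is a required argument, which is precisely the difference from EAC noted above.

```python
def bagclust2_dissimilarity(n, runs):
    """D[i][j] = 1 - (runs co-clustering i and j) / (runs containing both).

    Each run is a dict {object index: cluster label} covering only the objects
    drawn in that bootstrap sample. Pairs never drawn together get D = 1,
    an assumption of this sketch.
    """
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            both = [r for r in runs if i in r and j in r]
            together = sum(r[i] == r[j] for r in both)
            d[i][j] = d[j][i] = 1.0 - together / len(both) if both else 1.0
    return d


def k_medoids(d, k, max_iter=100):
    """Alternating k-medoids on a precomputed dissimilarity matrix.

    The number of clusters k is a required input, unlike in EAC.
    """
    n = len(d)
    # Deterministic start: the most central object, then farthest-first picks.
    medoids = [min(range(n), key=lambda i: sum(d[i]))]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(d[i][m] for m in medoids)))
    labels = []
    for _ in range(max_iter):
        # Assign each object to its nearest medoid.
        labels = [min(range(k), key=lambda c: d[i][medoids[c]]) for i in range(n)]
        # Move each medoid to the member with the smallest total dissimilarity.
        new = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            new.append(min(members, key=lambda m: sum(d[m][x] for x in members))
                       if members else medoids[c])
        if new == medoids:
            break
        medoids = new
    return labels
```

In practice, `runs` would come from clustering bootstrap resamples of the data with some base algorithm. That step is not shown here.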

In (Greene et al., 2004), consensus clustering was conducted by means of variants of the EAC consensus function that use distinct hierarchical clustering algorithms (i.e. single-link, complete-link and average-link) for partitioning the pairwise object co-association matrix, as proposed in (Fred and Jain, 2005). However, the central subject of that work is the analysis of cluster ensemble diversity as a factor determining the quality of the consensus clustering. In this sense, the authors focused on random techniques for introducing diversity into the cluster ensemble, such as random subspacing, random algorithm initialization, random number of clusters or random feature projection.
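The linkage variants can be illustrated with a single routine parameterized by the linkage rule. This hypothetical sketch cuts the dendrogram at a user-supplied `k` for brevity (rather than by lifetime). It does not reproduce the diversity-generation techniques studied by Greene et al. (2004).

```python
def co_association(ensemble):
    """C[i][j]: fraction of partitions that place objects i and j together."""
    n, m = len(ensemble[0]), len(ensemble)
    return [[sum(p[i] == p[j] for p in ensemble) / m for j in range(n)]
            for i in range(n)]


LINKAGES = {
    "single": min,                             # closest pair of objects
    "complete": max,                           # farthest pair of objects
    "average": lambda ds: sum(ds) / len(ds),   # mean over all object pairs
}


def hierarchical_consensus(ensemble, k, linkage="average"):
    """Agglomerate objects on D = 1 - C with the chosen linkage until k clusters remain."""
    c = co_association(ensemble)
    link = LINKAGES[linkage]
    clusters = [[i] for i in range(len(c))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = link([1.0 - c[i][j] for i in clusters[a] for j in clusters[b]])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)
    labels = [0] * len(c)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

For example, with the ensemble `[[0,0,0,1,1,1], [0,0,0,1,1,1], [0,0,1,1,2,2], [1,1,1,0,0,0]]` and `k=2`, all three linkages return `[0, 0, 0, 1, 1, 1]`. They can diverge on less clear-cut co-association matrices, which is what makes comparing them worthwhile.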

A related work is the Majority Rule consensus function of (Goder and Filkov, 2008), which is also based on clustering the pairwise object co-dissociation matrix. This can be done either by simply setting a dissimilarity threshold, as in the first version of EAC (Fred, 2001), or by applying the average-link hierarchical clustering algorithm, as in the latest versions of EAC (Fred and Jain, 2005).
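The threshold-based variant amounts to linking every pair of objects that are co-clustered in more than half of the partitions (equivalently, whose co-dissociation is below 0.5) and taking the connected components. The sketch below assumes a strict majority threshold of 0.5. The function name is hypothetical, and this is not a transcription of (Goder and Filkov, 2008) or (Fred, 2001).

```python
def majority_rule(ensemble, threshold=0.5):
    """Link objects co-clustered in more than `threshold` of the partitions
    and return the connected components as the consensus partition."""
    n = len(ensemble[0])
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            share = sum(p[i] == p[j] for p in ensemble) / len(ensemble)
            if share > threshold:  # co-dissociation below 1 - threshold
                parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(x), len(roots)) for x in range(n)]
```

With the ensemble `[[0,0,1,1,2], [0,0,1,1,1], [0,1,1,1,2]]`, objects 0 and 1 share a cluster in two of three partitions and objects 2 and 3 in all three, so the result is `[0, 0, 1, 1, 2]`. As in EAC, no number of clusters is supplied.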

Moreover, there exist several consensus functions that make indirect use of pairwise object co-association (or co-dissociation) matrices, although the way the consensus clustering is obtained differs from that of EAC. Examples include some graph partition-based consensus functions, such as CSPA (Strehl and Ghosh, 2002) and BALLS (Gionis, Mannila, and Tsaparas, 2007), the Iterative Pairwise Consensus (IPC) (Nguyen and Caruana, 2007) (a consensus function based on cluster centroids in which objects are iteratively reassigned to the clusters of the consensus partition according to their similarity), or the CC Pivot

