
Chapter 2. Cluster ensembles and consensus clustering

consensus function (Goder and Filkov, 2008), which obtains the consensus partition by iteratively pivoting on the object dissimilarity matrix.

2.2.4 Consensus functions based on categorical clustering

A different approach to consensus clustering is the one related to categorical clustering, which basically consists of transforming the contents of the cluster ensemble into quantitative features that represent the objects, and subsequently clustering them according to this novel representation, thus obtaining the consensus partition. The QMI (Quadratic Mutual Information) consensus function of (Topchy, Jain, and Punch, 2003) posed the problem of combining the partitions contained in a hard cluster ensemble in an information theoretic framework, and consists of applying the k-means clustering algorithm on this new feature space, which forces the user to set the desired number of clusters k in advance.
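As an illustration of this re-representation step, the following sketch assumes a hard cluster ensemble stored as an (l x n) integer label matrix, encodes the labels of each base partition as one-hot features, and runs k-means on the concatenated result. The function name is hypothetical, and the plain one-hot encoding is a simplification: the actual QMI criterion corresponds to a specific transformation of such indicator features rather than this direct encoding.

```python
import numpy as np
from sklearn.cluster import KMeans

def consensus_by_label_features(ensemble_labels, k, random_state=0):
    """ensemble_labels: (l, n) array, row i = labels assigned by the i-th base clustering."""
    # Re-represent every object by concatenating one-hot encodings of the labels
    # it received in each of the l base partitions.
    blocks = [np.eye(row.max() + 1)[row] for row in ensemble_labels]
    features = np.hstack(blocks)                 # shape (n, total number of base clusters)
    # Cluster the objects in this new feature space; k must be fixed in advance.
    return KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(features)

# Example: three base clusterings of six objects combined into k = 2 consensus clusters.
ensemble = np.array([[0, 0, 0, 1, 1, 1],
                     [1, 1, 0, 0, 0, 0],
                     [0, 0, 1, 1, 2, 2]])
print(consensus_by_label_features(ensemble, k=2))
```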

In (Punera and Ghosh, 2007), a novel fuzzy consensus function based on the Information Theoretic K-means (ITK) algorithm was presented. Its rationale follows an approach similar to that of (Topchy, Jain, and Punch, 2003). In this case, though, consensus clustering is conducted on soft cluster ensembles (i.e. the compilation of the outcomes of multiple fuzzy clustering processes), so each object in the data set is represented by means of the concatenated posterior cluster membership probability distributions corresponding to each of the l fuzzy partitions in the cluster ensemble. Thus, using the Kullback-Leibler divergence (KLD) between those probability distributions as a measure of the distance between objects, the k-means algorithm is applied so as to obtain the consensus clustering solution. Note that the ITK consensus function is capable of combining fuzzy partitions with variable numbers of clusters, while producing a crisp consensus clustering solution. Moreover, this consensus function allows assigning distinct weights to each clustering in the cluster ensemble, which lets the user express his or her confidence in the quality of individual clusterings.
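A minimal sketch of this idea is given below: a simplified KL-divergence k-means over the concatenated posterior memberships, in the spirit of ITK but not the authors' implementation (it also ignores the per-clustering weights mentioned above). The function name and the example data are illustrative assumptions.

```python
import numpy as np

def kl_kmeans_consensus(soft_ensemble, k, n_iter=100, eps=1e-12, seed=0):
    """soft_ensemble: list of (n, k_i) arrays, each row a posterior distribution over
    the clusters of one fuzzy partition. Returns a crisp consensus labelling."""
    # Each object is represented by the concatenation of its l posterior distributions.
    X = np.hstack(soft_ensemble) + eps
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        # Assignment step: each object joins the centroid minimizing KL(object || centroid).
        div = np.sum(X[:, None, :] * (np.log(X[:, None, :]) - np.log(centroids[None])), axis=2)
        new_labels = div.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: the KL centroid of a group of distributions is their arithmetic mean.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

# Example: two soft partitions (with 2 and 3 clusters respectively) of four objects.
P1 = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
P2 = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.1, 0.1, 0.8]])
print(kl_kmeans_consensus([P1, P2], k=2))
```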

2.2.5 Consensus functions based on probabilistic approaches

Consensus clustering has also been approached from a probabilistic perspective. One of the pioneering works in this direction was the Expectation-Maximization (EM) consensus function proposed in (Topchy, Jain, and Punch, 2004), where a probabilistic model of the consensus clustering solution is defined in the space of the contributing clusters. This model is based on a finite mixture of multinomial distributions, each component of which corresponds to a cluster of the combined clustering, which is obtained as the solution to a maximum likelihood problem solved by means of the EM algorithm. In contrast with other consensus functions, the authors highlight the low computational complexity of the proposed method and its ability to combine partitions with different numbers of clusters.
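The following sketch illustrates this mixture-of-multinomials EM scheme, assuming the hard cluster ensemble is stored as an (l x n) label matrix and that the observed labels are conditionally independent given the consensus cluster. The function name, initialization, and the simple smoothing in the M-step are illustrative choices, not the exact parameterization of the original paper.

```python
import numpy as np

def em_consensus(ensemble_labels, k, n_iter=100, seed=0):
    """ensemble_labels: (l, n) integer label matrix of a hard cluster ensemble.
    Returns a crisp consensus labelling of the n objects."""
    l, n = ensemble_labels.shape
    sizes = [int(ensemble_labels[i].max()) + 1 for i in range(l)]
    rng = np.random.default_rng(seed)
    pi = np.full(k, 1.0 / k)                                      # mixture weights
    theta = [rng.dirichlet(np.ones(s), size=k) for s in sizes]    # per-partition categoricals
    for _ in range(n_iter):
        # E-step: responsibilities of the k consensus clusters for every object.
        log_r = np.tile(np.log(pi), (n, 1))
        for i in range(l):
            log_r += np.log(theta[i][:, ensemble_labels[i]]).T
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights and the categorical parameters of each component.
        pi = resp.mean(axis=0)
        for i in range(l):
            counts = np.zeros((k, sizes[i]))
            for c in range(sizes[i]):
                counts[:, c] = resp[ensemble_labels[i] == c].sum(axis=0)
            theta[i] = (counts + 1e-6) / (counts + 1e-6).sum(axis=1, keepdims=True)
    return resp.argmax(axis=1)

# Example: combining three hard partitions of six objects into k = 2 consensus clusters.
ensemble = np.array([[0, 0, 0, 1, 1, 1],
                     [1, 1, 0, 0, 0, 0],
                     [0, 0, 1, 1, 2, 2]])
print(em_consensus(ensemble, k=2))
```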

Another probabilistic approach to the consensus clustering problem was presented in (Long, Zhang, and Yu, 2005). The central matter in that work was finding a solution to the cluster correspondence problem (which, as mentioned earlier, is due to the symbolic identification of clusters caused by the unsupervised nature of the clustering problem). In particular, the goal was to derive a correspondence matrix that disambiguates the clusters of each individual clustering in the cluster ensemble (represented as a probabilistic or binary membership matrix depending on whether the cluster ensemble is soft or hard) with regard

