TESI DOCTORAL - La Salle

2.2. Related work on consensus functions

to a hypothetical consensus clustering solution membership matrix. The goal is to find a correspondence matrix that yields the best projection of each individual clustering on the space defined by the consensus clustering solution. From a practical viewpoint, both the correspondence and consensus clustering matrices are derived simultaneously, using an EM-like approach.
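The alternating scheme described above can be sketched as follows. This is only an illustrative reconstruction under stated assumptions (least-squares correspondence matrices, averaging as the consensus update); the function and variable names are invented for the sketch and are not from the original work.

```python
import numpy as np

def em_like_consensus(memberships, k, iters=50, seed=0):
    """Illustrative sketch: alternate between (a) fitting a correspondence
    matrix W_i that best projects each partition's membership matrix U_i
    (n objects x k_i clusters) onto the consensus space, and (b)
    re-estimating the consensus membership matrix M (n x k) as the average
    of the projected partitions."""
    rng = np.random.default_rng(seed)
    n = memberships[0].shape[0]
    M = rng.random((n, k))
    M /= M.sum(axis=1, keepdims=True)        # rows are soft memberships
    for _ in range(iters):
        # (a) least-squares correspondence matrix for each partition
        Ws = [np.linalg.lstsq(U, M, rcond=None)[0] for U in memberships]
        # (b) consensus = average of projected partitions, renormalized
        M = np.mean([U @ W for U, W in zip(memberships, Ws)], axis=0)
        M = np.clip(M, 1e-12, None)
        M /= M.sum(axis=1, keepdims=True)
    return M
```

The final consensus labels can be read off as the row-wise argmax of `M`.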

The elegant proposal of Lange and Buhmann (2005) introduced a consensus function named Probabilistic Label Aggregation (PLA), which operates on soft cluster ensembles (although it also works on crisp ones). Its rationale is as follows: given a single fuzzy partition, a pairwise object co-association matrix is created by simply multiplying the membership probabilities matrix by its own transpose. Repeating this process on all the partitions in the soft cluster ensemble and aggregating (and subsequently normalizing) the resulting matrices gives rise to a joint probability matrix of finding two objects in the same cluster. The authors then propose subjecting this joint probability matrix to a non-negative matrix factorization process that yields estimates for class-likelihoods and class-posteriors, upon which the consensus clustering solution is based. This factorization process is posed as an optimization problem which is solved by applying the EM algorithm. Besides the elegance of the proposed solution, this work also stands out for supporting an out-of-sample extension that makes it possible to assign previously unseen objects to classes of the consensus clustering solution. Moreover, the proposed method also allows combining weighted partitions, i.e., it gives the user the chance to assign different degrees of relevance to the cluster ensemble partitions.
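The co-association construction at the heart of PLA can be sketched as follows; this is a minimal illustration of the joint probability matrix (including the weighted-partition option), with the subsequent NMF/EM factorization step of Lange and Buhmann omitted, and with all names invented for the sketch.

```python
import numpy as np

def soft_coassociation(soft_partitions, weights=None):
    """Builds the pairwise joint-probability matrix PLA starts from:
    for each soft membership matrix P (n objects x k clusters, rows
    summing to 1), P @ P.T gives the probability that two objects fall
    in the same cluster; averaging over the ensemble (optionally with
    user-supplied partition weights) yields the joint matrix."""
    if weights is None:
        weights = np.ones(len(soft_partitions))
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()                  # normalize the aggregation
    n = soft_partitions[0].shape[0]
    S = np.zeros((n, n))
    for w, P in zip(weights, soft_partitions):
        S += w * (P @ P.T)                    # per-partition co-association
    return S
```

For crisp partitions encoded as one-hot membership matrices, each `P @ P.T` term reduces to the classical binary co-association matrix.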

A closely related proposal is the application of Non-Negative Matrix Factorization (NMF) to the consensus clustering problem presented in (Li, Ding, and Jordan, 2007). In contrast to (Lange and Buhmann, 2005), the aim is to combine crisp partitions, which imposes a series of constraints on the optimization problem; it is solved via symmetric NMF, which, from an algorithmic viewpoint, is implemented by means of multiplicative update rules. Moreover, the same approach is employed for conducting semi-supervised clustering, a problem that lies beyond the scope of this work.
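The symmetric NMF idea can be sketched as follows. This is a hedged illustration in the spirit of Li, Ding, and Jordan (2007): it uses a standard multiplicative update rule for symmetric NMF, not necessarily the exact variant or constraint handling of the paper, and the function name is invented.

```python
import numpy as np

def symmetric_nmf(S, k, iters=200, seed=0):
    """Factor a symmetric, non-negative ensemble co-association matrix S
    (n x n) as S ~ H H^T with H >= 0, using a standard multiplicative
    update rule; the consensus label of object i is the argmax of
    row i of H."""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    H = rng.random((n, k))
    for _ in range(iters):
        num = S @ H
        den = H @ (H.T @ H) + 1e-12           # avoid division by zero
        # damped multiplicative rule commonly used for symmetric NMF
        H *= 0.5 * (1.0 + num / den)
    return H.argmax(axis=1)
```

On a clean block-diagonal co-association matrix, the factorization recovers the blocks, i.e., the consensus clusters.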

2.2.6 Consensus functions based on reinforcement learning

Reinforcement learning has also been applied to the construction of consensus clustering solutions (Agogino and Tumer, 2006). In that work, the average φ(NMI) of the consensus clustering solution with respect to the cluster ensemble is regarded as the reward that must be maximized by the actions of the agents. In this case, each agent casts a vote indicating which cluster each object should be assigned to (i.e., it operates on hard cluster ensembles). The application of a majority voting scheme on these votes yields the consensus clustering solution, which is iteratively refined as the agents learn how to vote so as to maximize the average φ(NMI). The authors highlight the ease with which their approach combines clusterings in distributed scenarios, which makes it especially suitable in failure-prone domains.
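The reward-driven refinement loop can be caricatured as follows. This sketch replaces the full multi-agent learner of Agogino and Tumer (2006) with greedy single-vote hill climbing on the average NMI reward; it is an assumption-laden simplification for illustration, and all names are invented.

```python
import numpy as np

def nmi(a, b):
    """Normalized mutual information between two hard label vectors."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    ca, cb = np.unique(a), np.unique(b)
    C = np.array([[np.sum((a == x) & (b == y)) for y in cb] for x in ca]) / n
    pa, pb = C.sum(axis=1), C.sum(axis=0)     # marginal label distributions
    nz = C > 0
    mi = np.sum(C[nz] * np.log(C[nz] / np.outer(pa, pb)[nz]))
    ha = -np.sum(pa * np.log(pa))
    hb = -np.sum(pb * np.log(pb))
    return mi / np.sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0

def rl_consensus(ensemble, k, rounds=300, seed=0):
    """Greedy stand-in for the reward-maximizing agents: each 'action'
    re-votes one object's cluster, and the move is kept only if it raises
    the average NMI of the consensus against the ensemble."""
    rng = np.random.default_rng(seed)
    n = len(ensemble[0])
    labels = rng.integers(0, k, n)            # initial consensus votes
    reward = np.mean([nmi(labels, p) for p in ensemble])
    for _ in range(rounds):
        i, c = rng.integers(n), rng.integers(k)
        trial = labels.copy()
        trial[i] = c
        r = np.mean([nmi(trial, p) for p in ensemble])
        if r > reward:                        # keep reward-improving actions
            labels, reward = trial, r
    return labels
```

The real method distributes the voting across learning agents, which is what yields the robustness in failure-prone, distributed settings noted above.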

2.2.7 Consensus functions based on interpreting object similarity as data

The work by Kuncheva, Hadjitodorov, and Todorova (2006) introduced three consensus functions based on interpreting object similarity as data. That is, each object is represented by n features, where the jth feature of the ith object corresponds to the co-association

