29.04.2013 Views

TESI DOCTORAL - La Salle



Chapter 6. Voting based consensus functions for soft cluster ensembles

their version for soft cluster ensembles becomes pretty straightforward, as it simply requires substituting the object co-association matrix derived from a hard cluster ensemble, OEλ, by its fuzzy counterpart, OEΛ. Notice that, despite taking object co-association matrices derived from a soft cluster ensemble as their input, all these consensus functions output a hard consensus clustering solution λc.
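To make the substitution concrete, the following is a minimal sketch of how a co-association matrix might be computed from a soft cluster ensemble. It assumes, as an illustration rather than as the definition used in this work, that the co-association of two objects within one clustering is the dot product of their membership probability vectors, averaged over the ensemble. The function name `soft_coassociation` and the data layout (one row of membership probabilities per object) are hypothetical.

```python
def soft_coassociation(ensemble):
    """Average pairwise co-association over a soft cluster ensemble.

    ensemble: list of clusterings; each clustering is a list with one row
    per object, and each row holds that object's membership probabilities.
    Assumption: co-association within one clustering is the dot product
    of the two objects' membership vectors.
    """
    n_objects = len(ensemble[0])
    co = [[0.0] * n_objects for _ in range(n_objects)]
    for memberships in ensemble:
        for i in range(n_objects):
            for j in range(n_objects):
                co[i][j] += sum(
                    a * b for a, b in zip(memberships[i], memberships[j])
                )
    # Normalize by the number of clusterings in the ensemble.
    return [[value / len(ensemble) for value in row] for row in co]


# A hard ensemble is the special case of one-hot membership rows.
hard_ensemble = [
    [[1, 0], [1, 0], [0, 1]],
    [[0, 1], [1, 0], [1, 0]],
]
soft_ensemble = [
    [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]],
]
hard_co = soft_coassociation(hard_ensemble)
soft_co = soft_coassociation(soft_ensemble)
```

Under this assumption, one-hot memberships reproduce the usual hard co-association values (the fraction of clusterings in which two objects share a cluster), which is why the same downstream consensus functions can consume either matrix.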

As mentioned in the introduction of this section, we have established an analogy between hard and soft consensus functions based on the similarities between object co-association matrices, assuming that the results of fuzzy clustering are expressed in terms of membership probabilities. However, we consider that the present study is quite generic, as it could be extended to the case in which fuzzy clustering results are expressed in any other form, by transforming them into membership probabilities prior to computing the corresponding object co-association matrices.
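As one illustration of such a transformation, the sketch below converts the distances from an object to each cluster centroid into membership probabilities. The choice of a softmax over negative distances, the function name `distances_to_memberships`, and the `temperature` parameter are all assumptions made for this example; the text does not prescribe any particular transformation.

```python
import math


def distances_to_memberships(distances, temperature=1.0):
    """Map object-to-centroid distances to membership probabilities.

    Assumption: a softmax over negative distances, so that closer
    centroids receive higher probability. `temperature` (hypothetical)
    controls how sharply probability concentrates on the closest one.
    """
    logits = [-d / temperature for d in distances]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


memberships = distances_to_memberships([0.5, 2.0, 3.0])
uniform = distances_to_memberships([1.0, 1.0])
```

The resulting vectors are non-negative and sum to one, so they can be used wherever membership probabilities are expected.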

6.3 Voting based consensus functions

In this section, we put forward a family of novel consensus functions especially devised for application to soft cluster ensembles. These consensus functions are inspired by voting strategies, which have also been a source of inspiration for the development of systems for combining supervised classifiers (van Erp, Vuurpijl, and Schomaker, 2002), search engines (Aslam and Montague, 2001), or word sense disambiguation systems (Buscaldi and Rosso, 2007). A distinguishing factor of the consensus functions we propose in this section is that they yield fuzzy consensus clustering solutions, whereas other soft consensus functions found in the literature output crisp consensus clusterings, despite being applied to soft cluster ensembles (Punera and Ghosh, 2007).

In fact, voting is a formal way of combining the opinions of several voters into a single decision (e.g. the election of a president). Therefore, it seems quite logical that voting strategies can be readily applied to combining the outcomes of multiple decision systems, using the voting strategy that best fits the way these decisions are expressed.

Given that a clusterer is an unsupervised classifier, the most natural parallel we can draw is with the voting based combination of supervised classifiers (also known as classifier ensembles). In this case, each classifier is a voter, the possible categories objects can be filed under are the candidates, and an election is the classification of an object (van Erp, Vuurpijl, and Schomaker, 2002). The voting strategy employed for combining the votes, and thus obtaining the winner of the election (i.e. the resulting classification of the object by the ensemble of classifiers), depends on how votes are expressed, be it an assignment to a single class (i.e. single-label classification (Sebastiani, 2002)), or a ranked or scored list of classes. The former case calls for the application of unweighted voting methods such as plurality or majority voting, whereas the latter makes it possible to apply more sophisticated voting strategies such as confidence and ranking voting methods (van Erp, Vuurpijl, and Schomaker, 2002).
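The difference between these vote formats can be illustrated with a small sketch. It shows one unweighted method (plurality voting over single labels), one ranking method (the Borda count) and one confidence method (the sum rule over scores). These are standard textbook instances chosen for illustration and are not necessarily the exact variants adopted later in this chapter; the function names and the toy votes are hypothetical.

```python
from collections import Counter


def plurality_vote(labels):
    """Each voter names one class; the most voted class wins.
    Ties are broken in favor of the class seen first."""
    return Counter(labels).most_common(1)[0][0]


def borda_count(rankings):
    """Each voter ranks all classes, best first. A class earns
    (n - 1 - position) points per ranking; the highest total wins."""
    points = Counter()
    for ranking in rankings:
        n = len(ranking)
        for position, candidate in enumerate(ranking):
            points[candidate] += n - 1 - position
    return points.most_common(1)[0][0]


def sum_rule(score_lists):
    """Each voter gives a confidence score per class; scores are
    summed across voters and the highest total wins."""
    totals = Counter()
    for scores in score_lists:
        for candidate, score in scores.items():
            totals[candidate] += score
    return max(totals, key=totals.get)


# Three classifiers voting on the class of a single object.
scores = [
    {"a": 0.60, "b": 0.40},
    {"a": 0.30, "b": 0.70},
    {"a": 0.55, "b": 0.45},
]
# Reducing each score list to its top class discards confidence.
top_labels = [max(s, key=s.get) for s in scores]
plurality_winner = plurality_vote(top_labels)
confidence_winner = sum_rule(scores)
ranking_winner = borda_count(
    [["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]]
)
```

In this toy example, plurality voting elects class `a` (two of three top votes), whereas the sum rule elects class `b`, because the second classifier supports `b` with much higher confidence than the others support `a`. This is the kind of information that richer vote formats make available.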

Nevertheless, prior to the application of any voting strategy on soft cluster ensembles, there is a crucial problem to be solved. It stems from the inherently unsupervised nature of clustering processes, which causes clusters to be ambiguously identified. Therefore, it is necessary to perform a cluster alignment (or disambiguation) process before voting. Notice that this is not an issue of concern when applying voting strategies on supervised
