TESI DOCTORAL - La Salle

Positional voting

Chapter 6. Voting based consensus functions for soft cluster ensembles

Positional (aka ranking) voting methods rank the candidates according to the confidence scores emitted by the voters. Thus, fine-grained information regarding preference differences between candidates is ignored, although problems in scaling the voters' confidence scores are avoided; that is, positional voting is useful in situations where confidence values are hard to scale correctly (van Erp, Vuurpijl, and Schomaker, 2002).

As an aid for describing the positional voting methods that constitute the core of our consensus functions, equation (6.38) defines Λ_i (the ith component of the soft cluster ensemble E) in terms of its columns, represented by the vectors λ_ij (∀j = 1, ..., n).

$$
\mathbf{E} = \begin{pmatrix} \Lambda_1 \\ \Lambda_2 \\ \vdots \\ \Lambda_l \end{pmatrix}
\quad \text{where} \quad
\Lambda_i = \begin{pmatrix} \lambda_{i1} & \lambda_{i2} & \dots & \lambda_{in} \end{pmatrix}
\qquad (6.38)
$$
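To make the structure of equation (6.38) concrete, the following sketch builds a soft cluster ensemble E as the row-wise stack of its l components; the values and the names `Lambda_1`, `Lambda_2`, and `E` are illustrative assumptions, not the thesis's data.

```python
import numpy as np

# Toy soft cluster ensemble with l = 2 components, k = 3 clusters and
# n = 3 objects (illustrative values only). Column j of each component
# is the vector lambda_ij of equation (6.38).
Lambda_1 = np.array([[0.8, 0.7, 0.5],
                     [0.1, 0.2, 0.3],
                     [0.1, 0.1, 0.2]])
Lambda_2 = np.array([[0.6, 0.5, 0.4],
                     [0.3, 0.3, 0.2],
                     [0.1, 0.2, 0.4]])

# As in equation (6.38), E stacks its l components row-wise.
E = np.vstack([Lambda_1, Lambda_2])   # shape (l*k, n) = (6, 3)
```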

In this work, we propose employing two positional voting strategies (namely the Borda and the Condorcet voting methods) for deriving the consensus clustering solution, which gives rise to the following consensus functions:

– BordaConsensus (BC): the Borda voting method (Borda, 1781) computes the mean rank of each candidate over all voters, reranking the candidates according to their mean rank. This process results in a grading of all the n objects with respect to each of the k clusters, which is embodied in a k × n Borda voting matrix B_E. The grading is conducted as follows: firstly, for each object (election), the clusters (candidates) are ranked according to their degree of association with it (from the most to the least strongly associated). Then, the top-ranked cluster receives k points, the second-ranked cluster receives k − 1 points, and so on. After iterating this procedure across the l cluster ensemble components, the grading matrix B_E is obtained. The whole process is described in algorithm 6.3. Notice that the Rank procedure orders the clusters from the most to the least strongly associated to each object, yielding a ranking vector r, which is a list of the k clusters ordered according to their degree of association with the object under consideration (i.e. its first component, r(1), identifies the most strongly associated cluster, and so on). Thus, the Rank procedure must take into account whether the scalar values contained in λ_ab are directly or inversely proportional to the strength of association between objects and clusters.
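The grading step above can be sketched as follows. This is a minimal illustration of the Borda point-assignment procedure, not the thesis's Algorithm 6.3 verbatim; the function name `borda_consensus` and the `higher_is_stronger` flag (which mirrors the Rank procedure's caveat about directly versus inversely proportional scores) are assumptions.

```python
import numpy as np

def borda_consensus(ensemble, higher_is_stronger=True):
    """Sketch of the Borda grading step: `ensemble` is a list of l
    components, each a k x n array whose (a, b) entry scores the
    association of object b with cluster a. Returns the k x n Borda
    voting matrix B_E."""
    k, n = ensemble[0].shape
    B_E = np.zeros((k, n))
    for component in ensemble:
        # Flip the sign when scores are inversely proportional to the
        # strength of association (cf. the Rank procedure's caveat).
        scores = component if higher_is_stronger else -component
        for j in range(n):                      # one "election" per object
            order = np.argsort(-scores[:, j])   # clusters, strongest first
            for rank, cluster in enumerate(order):
                B_E[cluster, j] += k - rank     # k points, k-1, ..., 1
    return B_E
```

As a quick sanity check, every column of the returned matrix sums to l·k(k+1)/2, since each election distributes the same total number of points.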

When applied to our toy example, the resulting Borda grading matrix is the one presented in equation (6.39).

$$
\mathbf{B}_{\mathbf{E}} = \begin{pmatrix}
6 & 6 & 5 & 4 & 4 & 4 & 4 & 4 & 4 \\
3 & 3 & 2 & 2 & 2 & 2 & 4 & 6 & 6 \\
3 & 3 & 5 & 6 & 6 & 6 & 4 & 2 & 2
\end{pmatrix}
\qquad (6.39)
$$

According to Borda voting, the higher the value of the (i, j)th entry of B_E, the more likely the jth object belongs to the ith cluster. Thus, again, dividing each column of matrix B_E by its L1-norm transforms it into a cluster membership probability matrix.
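The column-wise L1 normalization can be sketched directly on the toy grading matrix of equation (6.39); the variable names `B_E` and `P` are illustrative.

```python
import numpy as np

# Borda grading matrix of the toy example, equation (6.39).
B_E = np.array([[6, 6, 5, 4, 4, 4, 4, 4, 4],
                [3, 3, 2, 2, 2, 2, 4, 6, 6],
                [3, 3, 5, 6, 6, 6, 4, 2, 2]], dtype=float)

# Dividing each column by its L1-norm yields a cluster membership
# probability matrix: column j is a distribution over the k clusters.
P = B_E / np.abs(B_E).sum(axis=0, keepdims=True)
```

Since all entries of B_E are positive, each column of P sums to one, so it can be read as a probability distribution over clusters for the corresponding object.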
