TESI DOCTORAL - La Salle

Appendix A. Experimental setup

Data set name   |dfA| = 1   |dfA| = 10   |dfA| = 19   |dfA| = 28
CAL500              102        1020         1938         2856
IsoLetters          111        1110         2109         3108
InternetAds         183        1830         3477         5124
Corel               123        1230         2337         3444

Table A.5: Cluster ensemble sizes l corresponding to distinct algorithmic diversity configurations for the multimodal data sets.

A.5 Consensus functions

In this section, we briefly describe the consensus functions employed in the experimental section of this thesis, placing special emphasis on specific implementation details where necessary. Moreover, we present the time complexity of each consensus function for a given cluster ensemble size (l), number of objects in the data set (n) and number of clusters (k).

The first seven consensus functions are employed in experiments considering both hard and soft cluster ensembles (i.e. chapters 3 to 6). In turn, the last one (VMA) is only applied on soft cluster ensembles, that is, in chapter 6.

The Matlab source code of the first three consensus functions is available for download at http://www.strehl.com, whereas the remaining ones were implemented ad hoc for this work. For a more theoretical description of these consensus functions, see section 2.2.

– CSPA (Cluster-based Similarity Partitioning Algorithm): this consensus function shares much of the rationale of the Evidence Accumulation consensus function (see below), as it is based on deriving a pairwise object similarity measure from the cluster ensemble and applying a similarity-based clustering algorithm on it, in this case the METIS graph partitioning algorithm (Karypis and Kumar, 1998). Its computational complexity is O(n²kl) (Strehl and Ghosh, 2002).
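As a rough illustration of the similarity-derivation step that CSPA (and EAC) rely on, the following is a minimal Python sketch, assuming the ensemble is given as l hard labelings of n objects; the subsequent METIS partitioning step is not reproduced here, and the function name `cocluster_similarity` is ours, not from the original Matlab code:

```python
import numpy as np

def cocluster_similarity(labelings):
    """Pairwise object similarity: the fraction of the ensemble's
    partitions in which two objects fall in the same cluster."""
    labelings = np.asarray(labelings)   # shape (l, n): l partitions of n objects
    l, n = labelings.shape
    sim = np.zeros((n, n))
    for labels in labelings:
        # add 1 for every pair of objects sharing a cluster in this partition
        sim += (labels[:, None] == labels[None, :]).astype(float)
    return sim / l

# Tiny example: two partitions of four objects
ensemble = [[0, 0, 1, 1],
            [0, 1, 1, 1]]
S = cocluster_similarity(ensemble)
```

In this toy ensemble, objects 2 and 3 always co-cluster (similarity 1), whereas objects 0 and 1 co-cluster in only one of the two partitions (similarity 0.5). CSPA then hands this n × n similarity matrix to a graph partitioner, which is where the O(n²) factor of its complexity comes from.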

– HGPA (HyperGraph Partitioning Algorithm): this clustering combiner exploits a hypergraph representation of the cluster ensemble, re-partitioning the data by finding a hyperedge separator that cuts a minimal number of hyperedges, yielding k unconnected components of approximately the same size; this makes HGPA an inappropriate consensus function when clusters are highly imbalanced. The hypergraph partition is conducted by means of the HMETIS package (Karypis et al., 1997). Its time complexity is O(nkl) (Strehl and Ghosh, 2002).

– MCLA (Meta-CLustering Algorithm): as in HGPA, each cluster corresponds to a hyperedge of the hypergraph representing the cluster ensemble. Subsequently, related hyperedges are detected by grouping them using the METIS graph-based clustering algorithm (Karypis and Kumar, 1998). Next, related hyperedges are collapsed and each object is assigned to the collapsed hyperedge in which it participates most strongly. Its computational complexity is O(nk²l²) (Strehl and Ghosh, 2002).
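The hyperedge-grouping step above compares clusters across partitions; a minimal sketch of that comparison, assuming hard labelings and using binary Jaccard similarity between cluster indicator vectors (as in Strehl and Ghosh's formulation), is shown below. The actual meta-clustering of hyperedges is done by METIS and is not reproduced; `hyperedge_jaccard` is an illustrative name of ours:

```python
import numpy as np

def hyperedge_jaccard(labelings):
    """Jaccard similarity between every pair of clusters (hyperedges)
    across all partitions of the ensemble: the weights of the
    meta-graph that MCLA subsequently partitions."""
    # one binary indicator row per cluster of every partition
    H = np.vstack([(np.asarray(labels) == c).astype(int)
                   for labels in labelings
                   for c in np.unique(labels)])
    inter = H @ H.T                                  # pairwise intersection sizes
    sizes = H.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter  # |A| + |B| - |A ∩ B|
    return inter / union

ensemble = [[0, 0, 1, 1],
            [0, 1, 1, 1]]
J = hyperedge_jaccard(ensemble)
```

Here the four hyperedges are {0,1}, {2,3}, {0} and {1,2,3}; for instance, {2,3} and {1,2,3} overlap in two objects out of a three-object union, giving similarity 2/3. Note that kl hyperedges yield a (kl)² similarity matrix, which is the source of the k²l² factor in MCLA's complexity.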

– EAC (Evidence Accumulation): this is a fairly direct implementation of the consensus function presented in (Fred and Jain, 2002a). It consists in the computation of the pairwise object co-association matrix and the subsequent application of the single-link hierarchical clustering algorithm on it.

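The EAC pipeline just described can be sketched as follows, assuming hard labelings and using SciPy's single-link implementation with a fixed number of consensus clusters k (rather than the thesis's ad hoc code); `eac_consensus` is an illustrative name of ours:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def eac_consensus(labelings, k):
    """Evidence Accumulation: build the co-association matrix,
    then cut a single-link dendrogram into k consensus clusters."""
    labelings = np.asarray(labelings)            # shape (l, n)
    l, n = labelings.shape
    coassoc = np.zeros((n, n))
    for labels in labelings:
        coassoc += (labels[:, None] == labels[None, :])
    coassoc /= l                                 # co-association in [0, 1]
    dist = 1.0 - coassoc                         # similarity -> distance
    np.fill_diagonal(dist, 0.0)
    # condensed distance vector expected by scipy's linkage
    Z = linkage(squareform(dist, checks=False), method='single')
    return fcluster(Z, t=k, criterion='maxclust')

ensemble = [[0, 0, 1, 1],
            [0, 0, 1, 1],
            [0, 1, 1, 1]]
labels = eac_consensus(ensemble, k=2)
```

On this toy ensemble, objects 0 and 1 co-associate in two of the three partitions while 2 and 3 always do, so the k = 2 cut recovers the groups {0, 1} and {2, 3}. Fred and Jain's original proposal selects the cut by cluster lifetime instead of a fixed k; this sketch uses the simpler maxclust criterion.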
