29.04.2013 Views

TESI DOCTORAL - La Salle


Appendix A. Experimental setup

g. slink: traditional single-link criterion function (for agglomerative hierarchical clustering only).
h. wslink: cluster-weighted single-link criterion function (for agglomerative hierarchical clustering only).
i. clink: traditional complete-link criterion function (for agglomerative hierarchical clustering only).
j. wclink: cluster-weighted complete-link criterion function (for agglomerative hierarchical clustering only).
k. upgma: traditional unweighted pair-group method with arithmetic means criterion function (for agglomerative hierarchical clustering only).
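
To make the three traditional criteria concrete, the following is a minimal sketch of how slink, clink and upgma score a candidate merge of two clusters. It assumes the criteria are evaluated on pairwise similarities (so single link takes the largest cross-cluster similarity and complete link the smallest); the function names, the toy similarity matrix and the cluster index lists are illustrative and are not part of CLUTO. The cluster-weighted variants (wslink, wclink) are not covered here.

```python
from itertools import product


def pair_similarities(sim, cluster_a, cluster_b):
    """Similarities of every cross-cluster pair (one object from each cluster)."""
    return [sim[i][j] for i, j in product(cluster_a, cluster_b)]


def slink(sim, cluster_a, cluster_b):
    # Single link: score of the most similar cross-cluster pair.
    return max(pair_similarities(sim, cluster_a, cluster_b))


def clink(sim, cluster_a, cluster_b):
    # Complete link: score of the least similar cross-cluster pair.
    return min(pair_similarities(sim, cluster_a, cluster_b))


def upgma(sim, cluster_a, cluster_b):
    # UPGMA: arithmetic mean over all cross-cluster pairs.
    values = pair_similarities(sim, cluster_a, cluster_b)
    return sum(values) / len(values)


# Hypothetical symmetric similarity matrix for four objects.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.4, 0.3],
    [0.2, 0.4, 1.0, 0.8],
    [0.1, 0.3, 0.8, 1.0],
]
cluster_a, cluster_b = [0, 1], [2, 3]

scores = {
    "slink": slink(sim, cluster_a, cluster_b),
    "clink": clink(sim, cluster_a, cluster_b),
    "upgma": upgma(sim, cluster_a, cluster_b),
}
```

In an agglomerative procedure, the pair of clusters with the highest score under the chosen criterion would be merged at each step.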

Finally, the similarity measures that can be employed by the clustering algorithms implemented in CLUTO are the following (Zhao and Karypis, 2001):

i. cos: the similarity between objects is computed using the cosine function.
ii. corr: the similarity between objects is computed using the correlation coefficient.
iii. dist: the similarity between objects is computed to be inversely proportional to the Euclidean distance (for graph-based clustering only).
iv. jacc: the similarity between objects is computed using the extended Jaccard coefficient (for graph-based clustering only).
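
As an illustration of the four measures listed above, the sketch below computes each of them for two dense attribute vectors. It is not CLUTO's implementation: the function names are hypothetical, the extended Jaccard coefficient is written in its usual form x·y / (‖x‖² + ‖y‖² − x·y), and the mapping 1 / (1 + distance) used for dist is an assumption, since the text only states that the similarity is inversely proportional to the Euclidean distance.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cos_sim(x, y):
    # Cosine of the angle between x and y (undefined for all-zero vectors).
    return dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))


def corr_sim(x, y):
    # Pearson correlation coefficient: cosine of the mean-centred vectors.
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return cos_sim([a - mean_x for a in x], [b - mean_y for b in y])


def dist_sim(x, y):
    # ASSUMPTION: 1 / (1 + d) is one possible decreasing function of the
    # Euclidean distance d; the exact form used by CLUTO is not given here.
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return 1.0 / (1.0 + d)


def jacc_sim(x, y):
    # Extended Jaccard coefficient for real-valued vectors.
    xy = dot(x, y)
    return xy / (dot(x, x) + dot(y, y) - xy)


# Two hypothetical objects described by three attributes.
x = [1.0, 0.0, 1.0]
y = [1.0, 1.0, 0.0]
similarities = {
    "cos": cos_sim(x, y),
    "corr": corr_sim(x, y),
    "dist": dist_sim(x, y),
    "jacc": jacc_sim(x, y),
}
```

All four functions return larger values for more similar objects, which is why they can be interchanged within a given clustering strategy where the combination is allowed.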

For further details on the distinct implementations of the clustering strategies, the formal definitions of the criterion functions and similarity measures, or the procedure used to optimize the criterion functions, the interested reader is referred to (Zhao and Karypis, 2001; Zhao and Karypis, 2003b).

As the reader may infer from the previous enumerations, not all the clustering strategy-criterion function-similarity measure combinations are possible in CLUTO. Table A.1 presents which triplets are allowed (denoted by the symbol), which are not allowed (denoted by ×), and which have been employed in our experiments (denoted by •), namely 28 of the 68 combinations allowed by CLUTO.

In the experiments, each specific algorithm is identified by the clustering strategy-similarity measure-criterion function triplet employed, e.g. agglo-cos-slink (agglomerative clustering using the single-link criterion and measuring object proximity with the cosine similarity), graph-jacc-i2 (graph-based clustering using the internal criterion function #2 and the extended Jaccard coefficient), etc.
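
The naming scheme can be illustrated with a small parser that splits an identifier into its three components. This is only a sketch: the class and function names are hypothetical, and the checks encode just the restrictions stated in the enumerations of this appendix (the similarity measures and criterion functions marked as valid for one strategy only), not the full content of Table A.1.

```python
from typing import NamedTuple

# Restrictions taken from the enumerations in this appendix.
SIMILARITIES = {"cos", "corr", "dist", "jacc"}
GRAPH_ONLY_SIMILARITIES = {"dist", "jacc"}
AGGLO_ONLY_CRITERIA = {"slink", "wslink", "clink", "wclink", "upgma"}


class AlgorithmId(NamedTuple):
    strategy: str
    similarity: str
    criterion: str


def parse_algorithm_id(identifier: str) -> AlgorithmId:
    """Split 'strategy-similarity-criterion' and apply the stated restrictions."""
    parts = identifier.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected three components, got {identifier!r}")
    strategy, similarity, criterion = parts
    if similarity not in SIMILARITIES:
        raise ValueError(f"unknown similarity measure {similarity!r}")
    if similarity in GRAPH_ONLY_SIMILARITIES and strategy != "graph":
        raise ValueError(f"{similarity!r} is for graph-based clustering only")
    if criterion in AGGLO_ONLY_CRITERIA and strategy != "agglo":
        raise ValueError(f"{criterion!r} is for agglomerative clustering only")
    return AlgorithmId(strategy, similarity, criterion)


first = parse_algorithm_id("agglo-cos-slink")
second = parse_algorithm_id("graph-jacc-i2")
```

An identifier such as agglo-jacc-slink would be rejected by this sketch, because jacc is listed as available for graph-based clustering only.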

A.2 Data sets

In this work we have applied clustering processes on a total of sixteen data sets, twelve unimodal and four multimodal. In this section, we present their main features, such as their origin, the number of objects they contain (denoted throughout this work by n), the number (d) and meaning of their attributes, and the expected number of categories (k).
