29.04.2013 Views

TESI DOCTORAL - La Salle

A.1. The CLUTO clustering package

following paragraphs:

1. direct: this method computes the desired k-way clustering solution by simultaneously finding all k clusters.

2. rb: repeated-bisecting clustering process, in which the desired k-way clustering solution is computed by performing a sequence of k − 1 repeated bisections of the data set. At each bisecting step, one of the obtained clusters is selected and bisected further, so that each partial 2-way clustering solution optimizes the selected clustering criterion function locally.

3. rbr: a refined repeated-bisecting method that performs a global optimization of the clustering solution obtained by the rb algorithm.

4. agglo: agglomerative hierarchical clustering that locally optimizes the selected criterion function, stopping the agglomeration process when k clusters are obtained.

5. bagglo: biased agglomerative clustering, which applies the agglo clustering method to an augmented representation of the objects. This representation is created by concatenating the d original attributes of each object with √n new features, which are proportional to the similarity between that object and its cluster centroid according to a √n-way partitional clustering solution initially computed on the data set by means of the rb algorithm.

6. graph: graph-based clustering, in which the data set is modelled as a nearest-neighbor graph (each object is a vertex connected to the vertices representing its most similar objects) that is partitioned into k clusters according to one of the graph criterion functions.
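The rb method in item 2 can be illustrated with a minimal Python sketch. It is not CLUTO's implementation: the cosine similarity, the spherical 2-means used for each bisection, the random seeding, and the rule of always splitting the largest cluster are all assumptions made for illustration, and the names `bisect` and `repeated_bisection` are hypothetical. It also assumes non-zero objects with non-negative features and k no larger than the number of objects.

```python
import numpy as np


def bisect(X, rng, n_iter=20):
    """Split the unit-length rows of X into two non-empty groups.

    Illustrative 2-way spherical k-means; CLUTO's own bisection differs.
    """
    # Assumption: two distinct objects picked at random act as initial seeds.
    seeds = X[rng.choice(len(X), size=2, replace=False)].copy()
    side = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign every object to the seed it is most similar to.
        side = np.argmax(X @ seeds.T, axis=1)
        if side.min() == side.max():
            # Keep both halves non-empty: move the object that is least
            # similar to the only occupied seed into the other half.
            worst = np.argmin(X @ seeds[side[0]])
            side[worst] = 1 - side[0]
        for s in (0, 1):
            centroid = X[side == s].sum(axis=0)
            seeds[s] = centroid / np.linalg.norm(centroid)
    return side


def repeated_bisection(X, k, seed=0):
    """Return k cluster labels obtained through k - 1 successive bisections."""
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # cosine via dot product
    rng = np.random.default_rng(seed)
    labels = np.zeros(len(X), dtype=int)
    for new_label in range(1, k):
        sizes = np.bincount(labels, minlength=new_label)
        target = int(np.argmax(sizes))  # assumption: split the largest cluster
        idx = np.flatnonzero(labels == target)
        side = bisect(X[idx], rng)
        labels[idx[side == 1]] = new_label
    return labels


if __name__ == "__main__":
    data = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0],
            [0, 0.9, 0.1], [0, 0, 1], [0.1, 0, 0.9]]
    print(repeated_bisection(data, k=3))
```

Because each bisection is optimized only locally, the resulting partition depends on the seeds and on the cluster-selection rule, which is the limitation the rbr method in item 3 addresses through a subsequent global refinement.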

An enumeration of the eleven criterion functions implemented in the CLUTO software package follows (Zhao and Karypis, 2001):

a. i1: internal criterion function that maximizes the sum of the average pairwise similarities between the objects assigned to each cluster, weighted according to the cluster's size. Maximizing it is equivalent to minimizing the sum of squared distances between the objects in the same cluster, as in traditional k-means (Zhao and Karypis, 2001).

b. i2: internal criterion function that maximizes the similarity between each object and the centroid of the cluster it is assigned to.

c. e1: external criterion function that minimizes the proximity between each cluster’s centroid and the common centroid of the rest of the data set.

d. h1: hybrid criterion function that simultaneously maximizes i1 and minimizes e1.

e. g1: MinMaxCut criterion function applied to the graph obtained by computing pairwise object similarities, partitioning the objects into groups by minimizing the edge-cut of each partition (for graph-based clustering only).

f. g1p: Normalized Cut criterion function applied to the graph obtained by viewing the objects and their features as a bipartite graph, simultaneously partitioning the objects and their features (for graph-based clustering only).
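The vector-space criteria in items a to d can be evaluated for a given partition with the following Python sketch. It rests on several assumptions that the text above does not state: similarity is the cosine between unit-length object vectors, features are non-negative, there are at least two clusters, e1 is weighted by cluster size, and h1 is taken as the ratio i1/e1. The function name `criterion_values` and the flag `rest_only` are hypothetical; with `rest_only=True` the reference centroid is that of the rest of the data set, as described in item c, while `rest_only=False` uses the centroid of the whole data set instead.

```python
import numpy as np


def criterion_values(X, labels, rest_only=True):
    """Evaluate i1, i2, e1 and h1 for one partition (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Unit-length rows, so that dot products are cosine similarities.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    D = X.sum(axis=0)  # composite (summed) vector of the whole data set
    i1 = i2 = e1 = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        n_r = len(members)
        D_r = members.sum(axis=0)  # composite vector of cluster c
        norm_r = np.linalg.norm(D_r)
        # Size-weighted average pairwise similarity: n_r * ||D_r||^2 / n_r^2.
        i1 += norm_r ** 2 / n_r
        # Sum of object-to-centroid cosines, which equals ||D_r||.
        i2 += norm_r
        # Cosine between the cluster centroid and the reference centroid.
        ref = D - D_r if rest_only else D
        e1 += n_r * (D_r @ ref) / (norm_r * np.linalg.norm(ref))
    # Assumption: h1 combines both goals as a ratio to be maximized.
    h1 = i1 / e1 if e1 > 0 else float("inf")
    return {"i1": float(i1), "i2": float(i2), "e1": float(e1), "h1": float(h1)}


if __name__ == "__main__":
    data = [[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]
    print(criterion_values(data, [0, 0, 1, 1]))  # groups similar objects
    print(criterion_values(data, [0, 1, 0, 1]))  # mixes dissimilar objects
```

Comparing the two calls gives a feel for the direction of each criterion: a partition that groups similar objects should score higher on i1 and i2 and lower on e1 than one that mixes them.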
