10.07.2015 Views

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee


3.3 Unsupervised Learning

3.3.3 Density based Clustering

K-means and hierarchical clustering algorithms generally perform well when grouping large data sets, yet they are awkward at discovering clusters of arbitrary shape and, moreover, their performance is sensitive to outliers in the data. To address these issues, density based clustering strategies have been proposed, of which DBSCAN is a representative one.

The DBSCAN algorithm was first introduced by Ester et al. [89]. The authors grouped objects by recognizing their density. Given a pre-defined density threshold Eps, areas whose density exceeds the threshold are considered qualified clusters, and the others are treated as outliers or noise. Because it relies on the inherent density of clusters, DBSCAN can discover clusters of arbitrary shape. Figure 3.17 illustrates some example datasets in which arbitrarily shaped clusters are identified by the DBSCAN algorithm [89].

Fig. 3.17. Some clusters identified by DBSCAN in three datasets [89]

There are two main parameters in DBSCAN: Eps, the radius that bounds the neighborhood region of an object; and MinPts, the minimum number of objects that must exist in the object's neighborhood region. The key idea of the DBSCAN algorithm is that, given the two parameters, the neighborhood (determined by Eps) of each object in a cluster has to contain at least MinPts objects. In other words, the density of the cluster should exceed some threshold.
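The density condition on a single object can be sketched in Python as follows. This is a minimal illustration only, assuming two-dimensional points and Euclidean distance; the helper names `get_neighbors` and `is_dense` are hypothetical and are not taken from [89]. The neighborhood is assumed to include the object itself, and the test uses `>=` so that it is the complement of the `|N| < MinPts` noise check in Algorithm 3.9.

```python
import math

def get_neighbors(points, i, eps):
    """Return the indices of all points within distance eps of points[i].

    Assumption: the neighborhood includes the point itself.
    """
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_dense(points, i, eps, min_pts):
    """True if the Eps-neighborhood of points[i] holds at least min_pts objects."""
    return len(get_neighbors(points, i, eps)) >= min_pts

# Three nearby points and one isolated point (made-up example data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]

print(get_neighbors(points, 0, eps=1.5))        # [0, 1, 2]
print(is_dense(points, 0, eps=1.5, min_pts=3))  # True
print(is_dense(points, 3, eps=1.5, min_pts=3))  # False
```

With Eps = 1.5 and MinPts = 3, the first point has three objects in its neighborhood and passes the test, while the isolated point has only itself and fails.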
The pseudo code of DBSCAN is presented in Algorithm 3.9 [89, 2].

Algorithm 3.9: The DBSCAN algorithm
Input: A set of objects D, the radius Eps, the threshold number MinPts of neighbor objects
Output: Clusters
C = ∅;
For each unvisited object o in D do
    o.visited = true;
    N = GetNeighbors(o, Eps);
    If |N| < MinPts then
        o.attribute = noise;
    else
        C = next cluster;
        ExpandCluster(o, N, C, Eps, MinPts);
    end
end
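The main loop of Algorithm 3.9 can be sketched as runnable Python. This is a hedged illustration rather than the authors' implementation: it assumes two-dimensional points, Euclidean distance, and a brute-force neighbor search. The body of `expand_cluster` is not given in the text above, so the version here follows the commonly described formulation (grow the cluster through every dense neighbor) and should be read as an assumption. It also assumes that a cluster is started only when the object is not noise. The names `dbscan`, `expand_cluster`, `get_neighbors` and `NOISE` are chosen for this example.

```python
import math

NOISE = -1  # label for objects that belong to no cluster (example convention)

def get_neighbors(points, i, eps):
    """Indices of all points within distance eps of points[i], itself included."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def expand_cluster(points, labels, visited, i, neighbors, cluster_id, eps, min_pts):
    """Assumed formulation: grow cluster_id outward from the dense object i."""
    labels[i] = cluster_id
    queue = list(neighbors)
    while queue:
        j = queue.pop()
        if not visited[j]:
            visited[j] = True
            j_neighbors = get_neighbors(points, j, eps)
            # Only dense objects pass the cluster on to their own neighbors.
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)
        # Unlabeled objects and former noise objects join the cluster.
        if labels[j] is None or labels[j] == NOISE:
            labels[j] = cluster_id

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    visited = [False] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if visited[i]:
            continue
        visited[i] = True
        neighbors = get_neighbors(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE
        else:
            cluster_id += 1  # "C = next cluster"
            expand_cluster(points, labels, visited, i, neighbors,
                           cluster_id, eps, min_pts)
    return labels

# Two square groups and one isolated point (made-up example data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The brute-force neighbor search keeps the sketch short but costs a full scan per query; a spatial index would be the usual choice for large data sets.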
