Multilevel Graph Clustering with Density-Based Quality Measures

4.1 Methods and Data

For each graph the number of vertices and edges, the weighted mean vertex degree (mean wdeg), and the global connection density in the degree volume model (wgt. density) are printed. In addition, the source of the graph is indicated in the last column, and Table A.2 provides web addresses for this data.

For various reasons, subsets of the collection will also be used. For example, some individual evaluation steps compare a large number of configurations; to keep computation times feasible, a reduced graph set is employed. The second column marks the graphs of the reduced set with R. In addition, the graphs used in the scalability analysis are marked with S and those used in the comparison to reference algorithms with C.

Many graphs of the collection are directed and contain parallel edges, but the modularity measure is defined for simple, undirected graphs. The graphs were therefore pre-processed with the following aim: between each adjacent pair of vertices lies exactly one edge in each direction, and their weights equal the sum of the original edge weights between both vertices. This pre-processing is acceptable because the focus of this work lies on the evaluation of algorithms and not on the interpretation of individual clustering results. Other publications, however, may have used different normalization strategies.

The pre-processing is accomplished in four steps (a code sketch is given at the end of this section): First, parallel edges are removed and their weight is added to the first edge. Then missing inverse edges are added with zero weight; self-edges are used as their own inverse edge. In the third pass the edge weights are made symmetric by taking the sum of each edge and its inverse edge, ignoring self-edges. Finally, in disconnected graphs the largest connected component is chosen. These graphs are labeled with the suffix main.

4.1.3 Efficiency and Scalability

In order to compare the efficiency of different configurations, information about the computational expense is necessary. Trade-offs between quality and runtime can then be identified in combination with the measured differences in mean modularity.

Unfortunately, it is not advisable to compare average computation times: there is no shared time scale between the graphs, and the averages would be strongly influenced by the few largest graphs. Therefore only the times measured on a single graph are compared. Throughout the evaluation the graph DIC28 main is used; for some aspects the graph Lederberg main is considered as well. All timings were measured on a 3.00 GHz Intel(R) Pentium(R) 4 CPU with 1 GB of main memory. As with the mean modularity, comparing the runtime with and without refinement provides a significance scale.

Computation times measured on different graphs can be used to study the scalability of the algorithms. By nature, the runtime depends on the number of vertices and edges, so plotting the measured times against the number of vertices visualizes these dependencies. In practice the absolute times are of little interest, since they also depend on startup time, implementation style, and the runtime environment. Instead, the progression of the runtime curves of the different algorithms is compared.
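The following is a minimal sketch of the four pre-processing steps described in Section 4.1 above. It is not the implementation evaluated in this work; it assumes the NetworkX library and an edge attribute named weight, both of which are assumptions made purely for illustration.

import networkx as nx

def preprocess(multigraph: nx.MultiDiGraph) -> nx.Graph:
    # Step 1: merge parallel edges by summing their weights.
    directed = nx.DiGraph()
    directed.add_nodes_from(multigraph.nodes())
    for u, v, data in multigraph.edges(data=True):
        w = data.get("weight", 1.0)
        if directed.has_edge(u, v):
            directed[u][v]["weight"] += w
        else:
            directed.add_edge(u, v, weight=w)

    # Steps 2 and 3: a missing inverse edge is treated as having weight
    # zero, so the symmetric weight of a vertex pair is simply
    # weight(u, v) + weight(v, u). Self-edges act as their own inverse
    # and keep their original weight.
    undirected = nx.Graph()
    undirected.add_nodes_from(directed.nodes())
    for u, v, data in directed.edges(data=True):
        w = data["weight"]
        if u == v or not undirected.has_edge(u, v):
            undirected.add_edge(u, v, weight=w)
        else:
            undirected[u][v]["weight"] += w

    # Step 4: keep only the largest connected component of the
    # resulting undirected graph (the "main" graph).
    largest = max(nx.connected_components(undirected), key=len)
    return undirected.subgraph(largest).copy()

A graph processed this way would then carry the suffix main, as in DIC28 main or Lederberg main.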
