Multilevel Graph Clustering with Density-Based Quality Measures
2.4 Fundamental Clustering Strategies

… such situations. For example, this can be achieved by using random selection criteria or by adding a little randomness to the selector.

One often used method is simulated annealing [67, 68, 39, 38]. Typically a clustering similar to the current clustering is constructed, e.g. by randomly moving a vertex to another cluster. The main idea is to direct the search by not accepting every generated clustering. Better clusterings are always accepted, but worse clusterings are accepted only with a probability that depends on the modularity decrease ∆Q. Often the Metropolis criterion exp(∆Q/T) is applied. The parameter T controls the temperature of the algorithm. In the beginning the temperature is high and many modularity-decreasing moves are accepted, which enables a widespread search. Later the temperature is lowered, increasing the portion of modularity-increasing moves. With zero temperature only better clusterings are accepted, and thus the algorithm finishes in a local optimum.

Duch and Arenas [23] proposed a recursive bisection algorithm. The two clusters are computed by applying extremal optimization to an initial random bisection. Extremal optimization can be interpreted as a biased simulated annealing: the moved vertices are not chosen randomly but selected based on their current contribution to the modularity. This selection criterion is called vertex fitness. The idea is that moving low-fitness vertices also improves the clustering quality.

2.4.2 Constructing Components

This subsection presents various heuristics to construct clusterings. They use the selection criteria presented above to direct the construction. Dissection is one of the oldest strategies and is often used in the analysis of social networks; it observes how graphs and clusters fall apart. More details are presented in the next paragraph. The second paragraph describes agglomeration methods, which observe how large clusters can be built from smaller clusters. The last paragraph presents refinement methods that try to improve given clusterings by moving vertices between clusters.

Dissection
Dissection algorithms repeatedly remove vertices or edges from the graph and observe the emerging connectivity components. Because clusterings of the vertex set are sought, it is more common to remove edges. Removing an edge can split a cluster into at most two parts, so the produced hierarchical clustering is a binary tree.

Girvan [32] proposed a dissection algorithm which removes edges that lie between clusters and are least central to any cluster. These are identified by counting how many shortest paths between arbitrary vertex pairs pass through each edge. Fewer edges are expected between clusters, and thus more paths should pass through them. After each removal all betweenness values are recalculated. Later Girvan and Newman [59] proposed the random-walk betweenness as an improved selector. In the same paper, modularity was introduced as a measure for the optimal number of clusters.

Agglomeration
Agglomeration methods grow clusters by merging smaller clusters. The process begins with each vertex placed in a separate cluster. In each step a pair of clusters is selected and merged. Greedy methods only merge pairs which increase the modularity.
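The move-and-accept scheme described above for simulated annealing can be illustrated with a small sketch. This is only an illustration of the general Metropolis acceptance rule, not a reimplementation of any of the cited algorithms [67, 68, 39, 38]; the helpers modularity and move_random_vertex, as well as the geometric cooling schedule, are assumptions made for the example.

```python
import math
import random

def anneal(clustering, modularity, move_random_vertex,
           t_start=1.0, t_end=1e-3, cooling=0.99):
    """Metropolis-style search over clusterings (illustrative sketch).

    clustering         -- initial clustering (any representation)
    modularity(c)      -- assumed helper returning the quality Q of c
    move_random_vertex -- assumed helper returning a copy of c with one
                          vertex moved to another cluster
    """
    current, q_current = clustering, modularity(clustering)
    best, q_best = current, q_current
    t = t_start
    while t > t_end:
        candidate = move_random_vertex(current)
        delta_q = modularity(candidate) - q_current   # modularity change
        # Better clusterings are always accepted; worse ones only with
        # probability exp(delta_q / t), the Metropolis criterion.
        if delta_q >= 0 or random.random() < math.exp(delta_q / t):
            current, q_current = candidate, q_current + delta_q
            if q_current > q_best:
                best, q_best = current, q_current
        t *= cooling   # lower the temperature; fewer worsening moves pass
    return best
```

As the temperature drops, the acceptance probability for modularity-decreasing moves shrinks towards zero, so the search gradually narrows to a local optimum, matching the behaviour described in the text.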
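The edge-betweenness dissection attributed to Girvan [32] can be sketched as follows. The sketch uses the networkx library purely for illustration and assumes an undirected input graph; it shows the general scheme of repeated removal and recomputation, not the authors' implementation.

```python
import networkx as nx

def betweenness_dissection(graph):
    """Repeatedly remove the edge with the highest shortest-path
    betweenness and record the connected components after each split.
    Returns the sequence of clusterings from coarse to fine."""
    g = graph.copy()
    hierarchy = [list(nx.connected_components(g))]
    while g.number_of_edges() > 0:
        # Betweenness values are recalculated after every removal.
        betweenness = nx.edge_betweenness_centrality(g)
        u, v = max(betweenness, key=betweenness.get)   # most "between" edge
        g.remove_edge(u, v)
        components = list(nx.connected_components(g))
        if len(components) > len(hierarchy[-1]):       # a cluster fell apart
            hierarchy.append(components)
    return hierarchy
```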
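A deliberately naive sketch of the greedy agglomeration step, again assuming a modularity(clustering) helper; practical implementations track the merge gains ∆Q incrementally instead of recomputing the modularity for every candidate pair, but the logic of "merge the pair with the largest improvement" is the same.

```python
def greedy_agglomeration(vertices, modularity):
    """Start with singleton clusters and repeatedly merge the pair of
    clusters whose union increases modularity the most.  Stops when no
    merge improves the quality.  Clusterings are lists of frozensets."""
    clustering = [frozenset([v]) for v in vertices]
    q = modularity(clustering)
    while len(clustering) > 1:
        best_gain, best_merge = 0.0, None
        for i in range(len(clustering)):
            for j in range(i + 1, len(clustering)):
                merged = (clustering[:i] + clustering[i + 1:j]
                          + clustering[j + 1:]
                          + [clustering[i] | clustering[j]])
                gain = modularity(merged) - q
                if gain > best_gain:
                    best_gain, best_merge = gain, merged
        if best_merge is None:       # no modularity-increasing merge left
            break
        clustering, q = best_merge, q + best_gain
    return clustering
```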
