Multilevel Graph Clustering with Density-Based Quality Measures

3 The Multi-Level Refinement Algorithm

not overlap for a fixed number of steps. Hence, to obtain comparable distributions, the probability of visiting a vertex in at most $t$ steps, $P_u^{\le t}(v) = \sum_{i=1}^{t} P_u^{i}(v)$, is used as in [40].

Random Walk Distance  Latapy and Pons [64] propose short random walks to define a measure of vertex distance. They postulate that if two vertices lie in the same community (optimum cluster), the probabilities of reaching each other, $P_u^t(v)$ and $P_v^t(u)$, are high and both vertices have a similar neighborhood with $P_u^t(w) \simeq P_v^t(w)$. Thus a low distance is expected between both distributions. They observe from the stationary distribution that random walkers prefer high-degree vertices. To remove this influence they propose the following Euclidean distance weighted by $\deg(w)$:

RW distance:  $$d(u, v) = \sqrt{\sum_{w} \frac{\left(P_u(w) - P_v(w)\right)^2}{\deg(w)}} \qquad (3.8)$$

To account for bipartite situations, the implemented random walk distance selector uses $d_{\le 3}(u, v)$, the distance between the sums over walks of length at most 3.

During merge operations the distance of merged edges is updated by taking the maximum. This is known as complete-linkage in the literature. It corresponds to using the largest distance between all vertices of both vertex groups and is consistent with the aim of avoiding merge errors. The opposite extreme, for example, is using the shortest distance (single-linkage). There it is possible to merge very distant vertices by chaining a sequence of intermediate vertices of much lower distance.

Random Walk Reachability  The random walk reachability selection quality is similar to the weight density $f(a, b)/\mathrm{Vol}(a, b)$ but uses modified edge weights. Random walks are used to compute weights which become weaker for edges crossing low-density cuts and stronger for edges in high-density groups. It is assumed that vertices in the same optimum cluster are reachable over many paths.
Thus, as a starting point, the probability of visiting one end-vertex starting from the other end-vertex without returning to the start vertex is used. This is very similar to the escape probability of Harel and Koren [40].

To obtain actual edge weights these probabilities are scaled with the start vertex degree. Applying the weighted symmetry of inverse paths yields:

RW reachability:  $$r_t(u, v) = \frac{\deg(u)\,P_u^t(v) + \deg(v)\,P_v^t(u)}{2} \qquad (3.9)$$

Thus after the first step the weight $r_1(u, v) = \deg(u)\,f(u, v)/\deg(u)$ equals the original edge weight. Additional random walk steps add weight to the edge in proportion to the strength of alternate paths to this neighbor. The paths returning to the start vertex are suppressed by setting $P_u^{\ge 1}(u) = 0$ in each step. To account for local bipartite situations, the probabilities are again summed over a small number of steps, giving $r_{\le t}(u, v)$. Because this reachability is so similar to normal weights, the whole calculation can be applied to its own result again, leading to a reinforcing feedback [40].
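The two random-walk qualities of Eqs. (3.8) and (3.9) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the graph representation (a dict of weighted neighbor dicts), the function names, the toy graph `GRAPH` and the choice to always sum over steps are assumptions made for this example. The walk keeps two probability vectors and swaps them after each step, which is one simple way to avoid reallocating memory.

```python
from math import sqrt

# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by the
# bridge edge (2,3). graph[u][v] is the edge weight f(u, v).
GRAPH = {
    0: {1: 1.0, 2: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0, 5: 1.0},
    4: {3: 1.0, 5: 1.0},
    5: {3: 1.0, 4: 1.0},
}


def degree(graph, u):
    """Weighted degree deg(u): the sum of the weights of all incident edges."""
    return sum(graph[u].values())


def visit_probabilities(graph, source, steps, suppress_return=False):
    """Return P_source^{<=steps}(v), the visit probabilities summed over steps."""
    old = {v: 0.0 for v in graph}
    new = {v: 0.0 for v in graph}
    total = {v: 0.0 for v in graph}
    old[source] = 1.0
    for _ in range(steps):
        for v in new:
            new[v] = 0.0
        for u, p in old.items():
            if p == 0.0:
                continue  # only vertices that carry probability are processed
            deg_u = degree(graph, u)
            for v, weight in graph[u].items():
                new[v] += p * weight / deg_u
        if suppress_return:
            new[source] = 0.0  # drop walks that return to the start vertex
        for v, p in new.items():
            total[v] += p
        old, new = new, old  # reuse both vectors by swapping them
    return total


def rw_distance(graph, u, v, steps=3):
    """Degree-weighted Euclidean distance of Eq. (3.8) on summed walks."""
    pu = visit_probabilities(graph, u, steps)
    pv = visit_probabilities(graph, v, steps)
    # Isolated vertices (degree 0) are skipped to avoid a division by zero.
    return sqrt(sum((pu[w] - pv[w]) ** 2 / degree(graph, w)
                    for w in graph if graph[w]))


def rw_reachability(graph, u, v, steps=1):
    """Symmetrized, degree-scaled reachability of Eq. (3.9) on summed walks."""
    pu = visit_probabilities(graph, u, steps, suppress_return=True)
    pv = visit_probabilities(graph, v, steps, suppress_return=True)
    return (degree(graph, u) * pu[v] + degree(graph, v) * pv[u]) / 2.0
```

On this toy graph, `rw_reachability(GRAPH, 2, 3, steps=1)` reproduces the original weight of the bridge edge. With two steps the triangle edge (0, 1) gains weight from its alternate path through vertex 2, while the bridge edge gains nothing. Likewise, `rw_distance` is smaller for the two triangle vertices 0 and 1 than for the bridge end-vertices 2 and 3.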
As selection quality the density $r_{t,i}(u, v)/\mathrm{Vol}(u, v)$ is used. Here $r_{t,i}$ is the weight from $i$ iterative applications of the reachability computation with random walks of at most $t$ steps. When vertices are merged, the edge weights $r_{t,i}$ of merged edges are summed and the densities of all incident vertex pairs are recomputed.

Implementation Notes  The visit probabilities $P_u^{\le t}(v)$ are computed separately for each source vertex $u$ using two vectors storing $P_u^{t}(v)$ and the new $P_u^{t+1}(v)$ of each target vertex $v$. A random walk step iterates over all edges and transfers a portion of the visit probability from the edge's start-vertex in the old vector to the end-vertex in the new vector. The selection qualities are computed from the intermediate vectors between the steps. To avoid expensive memory management, the two vectors are reused by swapping source and target vector between steps.

The worst-case time complexity of the reachability selector is $O(|E||V|it)$, as for each vertex and each step all edges have to be visited. The implementation tries to improve the speed by only processing edges whose start-vertex has a nonzero visit probability. Thus the computation time depends on the typical size of neighborhoods. Still, on complete graphs the worst case is reached after one step.

The complexity of the random walk distance selector is even worse. Here the worst case is $O(|E|^2|V|t)$, as for each pair of adjacent vertices two probability vectors have to be computed. On large graphs it is impossible to store and reuse these vectors for all vertices. Hence the implementation tries to save time by reusing the vector of one vertex and processing all of its neighbor vertices in a row.

3.3.3 Spectral Methods

Random walks as described above collect semi-global information about the density structure of the graph. But the relation to modularity clustering is very indirect and holds just for one specific volume model.
Some applications may require other volume models, and random walks do not work well on some kinds of graphs. Thus the search for non-local selection qualities that are fully compatible with the modularity continues.

The approach of this section is based on the spectral methods described by Newman [57]. The modularity computation is rewritten in matrix-vector form and the matrix is replaced by its decomposition into eigenvalues and eigenvectors. The eigenvectors of the strongest positive eigenvalues are then used to define vertex vectors. These describe the contribution of each vertex in a higher-dimensional space where the modularity is improved by maximizing the length of the vector sum in each cluster.

The eigenvalues and eigenvectors are calculated at the beginning of each coarsening level. Given the vectors of two adjacent vertices, the spectral length and spectral length difference selectors analyze the length of vertex vectors. In contrast, the spectral angle selector uses the directions of these vectors. The following subsections present the mathematical derivation of the vertex vectors and conclude with a more detailed description of the selection qualities and implementation notes.
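As a rough illustration of the spectral idea, the following Python sketch builds the standard modularity matrix $B_{ij} = A_{ij} - k_i k_j / (2m)$ and extracts only its leading eigenpair by shifted power iteration. Several points are assumptions made for this example and are not taken from the text above: the restriction to a single eigenvector (the text uses several), the scaling of the coordinates by the square root of the eigenvalue, the dense matrix representation, and all names including the toy matrix `ADJ`.

```python
from math import sqrt

# Hypothetical toy graph as a symmetric adjacency matrix: two triangles
# {0,1,2} and {3,4,5} joined by the bridge edge (2,3).
ADJ = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]


def modularity_matrix(adj):
    """B[i][j] = A[i][j] - k_i * k_j / (2m); assumes at least one edge."""
    n = len(adj)
    k = [sum(row) for row in adj]
    two_m = sum(k)
    return [[adj[i][j] - k[i] * k[j] / two_m for j in range(n)]
            for i in range(n)]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and eigenvector of a symmetric matrix."""
    n = len(matrix)
    # By the Gershgorin bound no eigenvalue is below -shift, so the shifted
    # matrix has no negative eigenvalues and power iteration converges to
    # the algebraically largest eigenvalue.
    shift = max(sum(abs(x) for x in row) for row in matrix)
    v = [float(i + 1) for i in range(n)]  # deterministic start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) + shift * v[i]
             for i in range(n)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unit vector v with the unshifted matrix.
    eigenvalue = sum(v[i] * matrix[i][j] * v[j]
                     for i in range(n) for j in range(n))
    return eigenvalue, v


def vertex_coordinates(adj):
    """One-dimensional vertex vectors x_i = sqrt(lambda) * u_i (assumed scaling)."""
    eigenvalue, u = leading_eigenpair(modularity_matrix(adj))
    if eigenvalue <= 0.0:
        return [0.0] * len(adj)  # no positive eigenvalue: nothing to gain
    return [sqrt(eigenvalue) * x for x in u]
```

For `ADJ` the coordinates of the vertices of one triangle share a sign that is opposite to that of the other triangle, so the sum over a whole triangle is longer than the sum over a group that mixes both triangles. A practical implementation would more likely use a sparse eigensolver and several eigenvectors.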
