2 Graph Clustering

clustering really describes and makes it difficult to compare different selectors for a task. Another important question is which global aim the selector actually pursues. It is possible that the local decisions optimize a different property than the desired one. A small analogy to statistical mechanics is visible here: in Lattice-Boltzmann fluid simulations, for example, statistical methods are necessary to prove that the simple discrete model with microscopic (local) collision rules results in the desired macroscopic (global) flows.

Spectral methods replace the graph by a matrix and clusters by vectors. The clustering quality measure is then reformulated in terms of matrix-vector products. This makes it possible to apply matrix diagonalization and map the problem into the eigenvector space. Similar to principal component analysis, a few of the strongest eigenvalues and their eigenvectors are used to derive vertex vectors. Selection criteria are then computed from their length, distance, or direction (a small sketch of such an embedding follows at the end of this section).

The vertex vectors effectively position the vertices in a vector space. Other methods to compute such positions exist. In graph drawing, for example, spring-embedder algorithms that minimize an energy model are often used. With an appropriate model, vertex distances are related to the cluster structure. This transforms graph clustering into a geometric clustering problem. For the modularity measure such layouts are computable from the LinLog energy model [62]. In energy minima of this model, the distance between two clusters is roughly inversely proportional to their inter-cluster density. However, despite the global information it provides, this method is practically unusable because computing the layouts is very expensive.

Linear Programming
Linear programming methods reformulate the clustering quality measure as a linear target function. The input to this function is a vector of pairwise vertex distances: for example, vertices in the same cluster have distance zero and vertices in different clusters have distance one. The transitivity, reflexivity, and symmetry of clusterings are enforced through linear constraints similar to triangle inequalities.

Additional constraints then limit the vertex distances to the range from zero to one. This is known as fractional linear programming. Using interior-point methods, exact fractional solutions can be found in polynomial time. Agarwal and Kempe [2] proposed an agglomeration algorithm to round the distances to zero and one; essentially, the distances are used as a merge selector. As a side product, the fractional solution provides an upper bound for the highest possible modularity of a graph: because the set of possible clusterings is a subset of all allowed distance vectors, no clustering can have a better modularity than the fractional solution (a sketch of this relaxation also follows at the end of this section).

Brandes et al. [13] used a similar method to derive a binary linear program in which the vertex distances are restricted to zero and one. Finding an optimum solution is still NP-complete and requires exponential time, but good solvers exist for this search given the target function and constraints. This allowed finding the globally optimal clusterings of graphs with up to 100 vertices.

Randomization
All optimization heuristics may become trapped in suboptimal clusterings. Randomizing the decisions a little is one method to avoid or escape
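To make the spectral approach more concrete, the following is a minimal sketch of one common formulation, assuming an undirected networkx graph: vertices are embedded using the leading eigenvectors of the modularity matrix B = A - dd^T / 2m. The function name spectral_vertex_vectors and the example graph are illustrative and not taken from this thesis.

```python
# Minimal sketch: vertex vectors from the leading eigenvectors of the
# modularity matrix (one common spectral formulation; the thesis's exact
# formulation may differ).  Assumes an undirected networkx graph.
import numpy as np
import networkx as nx

def spectral_vertex_vectors(G, k=2):
    A = nx.to_numpy_array(G)                     # adjacency matrix
    deg = A.sum(axis=1)
    m = deg.sum() / 2.0                          # number of edges
    B = A - np.outer(deg, deg) / (2.0 * m)       # modularity matrix
    vals, vecs = np.linalg.eigh(B)               # symmetric eigendecomposition
    top = np.argsort(vals)[::-1][:k]             # k strongest eigenvalues
    scale = np.sqrt(np.clip(vals[top], 0.0, None))
    return vecs[:, top] * scale                  # one row (vector) per vertex

G = nx.karate_club_graph()
vectors = spectral_vertex_vectors(G, k=2)
# Selection criteria can now be computed from the rows, e.g. the distance or
# angle between the vectors of two vertices that might be merged.
```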

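The fractional relaxation can be sketched in a similar spirit. Assuming the same modularity matrix B, maximizing modularity over pairwise distances x_uv in [0, 1] subject to triangle-inequality constraints is equivalent to minimizing the sum of B_uv * x_uv over all pairs u < v, so the LP optimum divided by -m yields the upper bound mentioned above. The sketch below uses scipy's linprog and is only illustrative; the formulation of Agarwal and Kempe may differ in detail.

```python
# Minimal sketch of the fractional LP relaxation: pairwise "distances"
# x_uv in [0, 1] with triangle inequalities as relaxed transitivity.
# Maximizing modularity is equivalent to minimizing sum_{u<v} B_uv * x_uv,
# so -optimum / m is an upper bound on the modularity of any clustering.
# Names and the example graph are illustrative, not taken from this thesis.
import itertools
import numpy as np
import networkx as nx
from scipy.optimize import linprog

def modularity_lp_bound(G):
    A = nx.to_numpy_array(G)
    deg = A.sum(axis=1)
    m = deg.sum() / 2.0
    B = A - np.outer(deg, deg) / (2.0 * m)        # modularity matrix
    n = len(G)
    pairs = list(itertools.combinations(range(n), 2))
    index = {p: i for i, p in enumerate(pairs)}

    c = np.array([B[u, v] for u, v in pairs])     # objective coefficients

    rows, rhs = [], []                            # x_uw <= x_uv + x_vw
    for u, v, w in itertools.permutations(range(n), 3):
        if u < w:
            row = np.zeros(len(pairs))
            row[index[(u, w)]] = 1.0
            row[index[tuple(sorted((u, v)))]] -= 1.0
            row[index[tuple(sorted((v, w)))]] -= 1.0
            rows.append(row)
            rhs.append(0.0)

    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(0.0, 1.0)] * len(pairs), method="highs")
    return -res.fun / m                           # upper bound on modularity

G = nx.krackhardt_kite_graph()
print(modularity_lp_bound(G))
# The fractional distances (res.x inside the function) could also drive an
# agglomerative rounding that merges the closest pairs first, in the spirit
# of Agarwal and Kempe.
```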