12.07.2015 Views

Multilevel Graph Clustering with Density-Based Quality Measures


3.3 Merge Selectors

As selection quality the density $r_{t,i}(u,v)/\mathrm{Vol}(u,v)$ is used. Here $r_{t,i}$ is the weight obtained from $i$ iterative applications of the reachability computation with random walks of at most $t$ steps. When vertices are merged, the edge weights $r_{t,i}$ of the merged edges are summed and the densities of all incident vertex pairs are recomputed.

Implementation Notes

The visit probabilities $P_u^{\le t}(v)$ are computed separately for each source vertex $u$ using two vectors that store the current $P_u^{t}(v)$ and the new $P_u^{t+1}(v)$ of each target vertex $v$. A random walk step iterates over all edges and transfers a portion of the visit probability from the edge's start vertex in the old vector to its end vertex in the new vector. The selection qualities are computed from the intermediate vectors between the steps. To avoid expensive memory management, the two vectors are reused by swapping the source and target vectors between steps.

The worst-case time complexity of the reachability selector is $O(|E||V|it)$, as all edges have to be visited for each vertex and each step. The implementation tries to improve the speed by processing edges only when their start vertex has a nonzero visit probability. Thus the computation time depends on the typical size of neighborhoods. Still, on complete graphs the worst case is reached after one step.

The complexity of the random walk distance selector is even worse. Here the worst case is $O(|E|^2|V|t)$, as two probability vectors have to be computed for each pair of adjacent vertices. On large graphs it is impossible to store and reuse these vectors for all vertices. Hence the implementation tries to save time by reusing the vector of one vertex while processing all of its neighbor vertices in a row.

3.3.3 Spectral Methods

Random walks as described above collect semi-global information about the density structure of the graph. But the relation to modularity clustering is very indirect and holds just for one specific volume model. Some applications may require other volume models, and random walks do not work well on some kinds of graphs. Thus the search for non-local selection qualities which are fully compatible with the modularity continues.

The approach of this section is based on the spectral methods described by Newman [57]. The modularity computation is rewritten in a matrix-vector form and the matrix is replaced by its decomposition into eigenvalues and eigenvectors. The eigenvectors of the strongest positive eigenvalues are then used to define vertex vectors. These describe the contribution of each vertex in a higher-dimensional space where the modularity is improved by maximizing the length of the vector sum in each cluster.

The eigenvalues and eigenvectors are calculated at the beginning of each coarsening level. Given the vectors of two adjacent vertices, the spectral length and spectral length difference selectors analyze the lengths of the vertex vectors. In contrast, the spectral angle selector uses the directions of these vectors. The following subsections present the mathematical derivation of the vertex vectors and conclude with a more detailed description of the selection qualities and implementation notes.
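Looking back at the implementation notes for the reachability selector, the two-vector random walk step might be sketched as follows. This is only an illustration under stated assumptions, not the implementation described in the text: the graph is assumed to be a list of weighted adjacency lists, the transition probability along an edge is assumed to be its weight divided by the weighted degree of the start vertex, and the per-step probabilities are assumed to be accumulated by plain summation. The name random_walk_visits is hypothetical.

```python
def random_walk_visits(adj, source, max_steps):
    """Propagate visit probabilities from `source` for `max_steps` steps.

    adj[u] is a list of (v, weight) pairs; vertices are numbered 0..n-1.
    Returns (final, accumulated): the distribution after the last step and
    the sum of the distributions of all steps (an assumed accumulation rule).
    """
    n = len(adj)
    # Weighted degree of every vertex, used to normalize outgoing edges.
    degree = [sum(w for _, w in edges) for edges in adj]

    old = [0.0] * n          # probabilities before the current step
    new = [0.0] * n          # probabilities after the current step
    old[source] = 1.0
    accumulated = [0.0] * n

    for _ in range(max_steps):
        # Clear the target buffer in place instead of allocating a new one.
        for v in range(n):
            new[v] = 0.0
        for u in range(n):
            p = old[u]
            # Only process edges whose start vertex carries probability.
            # Vertices without outgoing weight simply drop their share.
            if p == 0.0 or degree[u] == 0.0:
                continue
            for v, w in adj[u]:
                new[v] += p * w / degree[u]
        for v in range(n):
            accumulated[v] += new[v]
        # Swap the buffers so the result becomes the input of the next step.
        old, new = new, old

    return old, accumulated
```

On the path graph 0 - 1 - 2 with unit weights, starting at vertex 0, two steps give the final distribution [0.5, 0.0, 0.5]: the first step moves all probability to vertex 1, and the second splits it evenly between both neighbors. Skipping vertices with zero probability keeps the early steps cheap on sparse graphs, whereas on a complete graph every vertex is already active after one step.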
