Brandenburgische Technische Universität Cottbus
Institut für Informatik
Lehrstuhl Software-Systemtechnik

Diploma Thesis

A Multi-Level Algorithm for Modularity Graph Clustering

Randolf Rotta
June 30, 2008

1st Reviewer: Prof. Claus Lewerentz
2nd Reviewer: Prof. Klaus Meer
Advisor: Dr. Andreas Noack


Declaration of Authorship

I hereby declare that I have written this diploma thesis independently and without unauthorized assistance. All aids and sources used are listed completely in the bibliography, and all passages taken literally or in substance from these sources are marked as such.

Cottbus, June 2008

Acknowledgements

My special thanks go to my advisor Dr. Andreas Noack; only his valuable, focused comments made this work possible. Further thanks go to the chairs of Software Systems Engineering and Theoretical Computer Science for providing the computing equipment used in the evaluation of the algorithms. I am also greatly indebted to my parents and to Dr. Romain Gengler for their patience and motivating support. Furthermore, I thank my many fellow students for numerous stimulating and insightful discussions.

Finally, thanks are due to the many authors of helpful tools and programs. These include the GNU Autotools, the GNU Compiler Collection, and the C++ Boost libraries, in particular the Spirit parser and scanner framework for file processing, the IOStreams package for transparent compression, and the Boost Graph Library for practical algorithms and many ideas. The graphs for the evaluation come from collections by Mark Newman, Alex Arenas, Uri Alon, and the Pajek project around Vladimir Batagelj. The analysis of the experimental data was made possible by the GNU R Project for Statistical Computing, the iGraph library, and the GGobi data visualization system.


Contents

List of Figures
List of Tables

1 Introduction
  1.1 An Introductory Example
  1.2 Objectives and Outline

2 Graph Clustering
  2.1 Basic Definitions
    2.1.1 Graphs
    2.1.2 Attributed Graphs
    2.1.3 Graph Clusterings and Quality Measures
  2.2 The Modularity Measure of Newman
  2.3 Density-Based Clustering Quality Measures
    2.3.1 Volume and Density
    2.3.2 Bias and the Null Model
    2.3.3 Derived Quality Measures
    2.3.4 Volume Models
  2.4 Fundamental Clustering Strategies
    2.4.1 Directing Components
    2.4.2 Constructing Components
    2.4.3 Meta-Strategies

3 The Multi-Level Refinement Algorithm
  3.1 The Multi-Level Scheme
  3.2 Graph Coarsening
    3.2.1 Greedy Grouping
    3.2.2 Greedy Matching
  3.3 Merge Selectors
    3.3.1 Local Merge Selectors
    3.3.2 Random Walks
    3.3.3 Spectral Methods
  3.4 Cluster Refinement
    3.4.1 Design Space of Local Refinement Algorithms
    3.4.2 Greedy Refinement
    3.4.3 Kernighan-Lin Refinement
  3.5 Further Implementation Notes
    3.5.1 Index Spaces and Graphs


    3.5.2 Data Management

4 Evaluation
  4.1 Methods and Data
    4.1.1 Configuration Space
    4.1.2 Effectiveness
    4.1.3 Efficiency and Scalability
  4.2 Effectiveness of the Graph Coarsening
    4.2.1 Match Fraction
    4.2.2 Coarsening Methods
    4.2.3 Reduction Factor
  4.3 Effectiveness of the Merge Selectors
    4.3.1 Random Walk Distance
    4.3.2 Random Walk Reachability
    4.3.3 Comparison of Merge Selectors
  4.4 Effectiveness of the Cluster Refinement
    4.4.1 Greedy Refinement
    4.4.2 Kernighan-Lin Refinement
  4.5 Scalability
  4.6 Comparison to Reference Algorithms
  4.7 Comparison to Published Results
    4.7.1 The Graphs and Clusterings
    4.7.2 Summary

5 Results and Future Work
  5.1 Results of this Work
  5.2 By-Products
  5.3 Directions for Future Work
    5.3.1 Pre-Coarsening
    5.3.2 Study of Merge Selectors
    5.3.3 Linear Programming
    5.3.4 Multi-Pass Clustering and Randomization
    5.3.5 High-Level Refinement Search

Bibliography

A The Benchmark Graph Collection

B Clustering Results


List of Figures

1.1 Graph of the Mexican political elite
2.1 Example Volume Models
2.2 Graph of US Airports
2.3 Recursive Subdivision and Hierarchical Clusterings
3.1 Operations for local cluster modification
3.2 The Multi-Level Scheme
3.3 The Multi-Level Algorithm
3.4 Coarsening Method: Greedy Grouping
3.5 Merging two Vertices and their Edges
3.6 Coarsening Method: Greedy Matching
3.7 Contribution of Neighbor Vertices to the Visit Probability
3.8 Spectral vertex vectors and two cluster vectors
3.9 Dependencies when Moving a Vertex
3.10 Refinement Method: Complete Greedy
3.11 Refinement Method: Sorted Greedy
3.12 Refinement Method: basic Kernighan-Lin
3.13 Kernighan-Lin Refinement Creating Clusters on Graph Epa main
3.14 Effective Search Depth of Kernighan-Lin Refinement
3.15 Index Spaces: Class and Concept Diagram
3.16 Index Maps: Class and Concept Diagram
3.17 C++ Example: Calculation of Vertex Degrees
4.1 Mean Modularity by Match Fraction (reduced set)
4.2 Modularity and Runtime by Reduction Factor
4.3 The Random Walk Distance
4.4 The Random Walk Reachability
4.5 Mean Modularity of the Merge Selectors (large set)
4.6 Runtime of the Merge Selectors
4.7 Mean Modularity by Refinement Method (reduced set)
4.8 Mean Modularity by Refinement Method (large set)
4.9 Runtime by Graph Size
4.10 Clustering Results and Runtime of the Reference Algorithms
4.11 Reference Graphs (karate and dolphins)
4.12 Reference Graphs (polBooks and afootball)
4.13 Reference Graphs (polBooks and celegans metabolic)
4.14 Reference Graphs (circuit s838 and email)


List of Tables

2.1 Overview of Modularity Graph Clustering Algorithms
2.2 Overview of Clustering and Partitioning Algorithms
3.1 Proposed Merge Selection Qualities
3.2 Classification of Refinement Algorithms
3.3 Hierarchical Naming Convention for File Names
4.1 The Configuration Space
4.2 Mean Modularity by Match Fraction
4.3 Mean Modularity by Reduction Factor
4.4 Runtime by Reduction Factor
4.5 Mean Modularity with Random Walk Distance
4.6 Mean Modularity with Random Walk Reachability
4.7 Mean Modularity of Merge Selectors
4.8 Mean Modularity by Refinement Method (reduced set)
4.9 Mean Modularity by Refinement Method (large set)
4.10 Clustering Results of Reference Algorithms
4.11 Comparison to Published Results
A.1 The Benchmark Graph Collection
A.2 References to the Graph Sources
B.1 Random Walk Distance by Graph
B.2 Random Walk Reachability, Length 2
B.3 Random Walk Reachability, Length 3
B.4 Clustering Results from the Refinement Phase
B.5 Runtime Measurements
B.6 Runtime of the Reference Algorithms


1 Introduction

Since the rise of computers and related information technologies it has become easy to collect vast amounts of data. Combined with technological advances in linking persons, discussions, and documents, various relational information is readily available for analysis. These networks are able to provide insight into the function of whole societies. Thus network analysis is nowadays an important tool also outside scientific communities, for example in market analysis and politics.

The exploration and utilization of collected data requires elaborate analysis strategies supported by computational methods. A common abstraction for relational networks are graphs. In these, entities like persons are represented by vertices. Graphs describe the observed relationships between entities by edges connecting related vertices. In addition, the edges may be attributed with weights to describe the strength of connections.

Graph clustering is the problem and process of grouping vertices into disjoint sets called clusters. Discovering such clusters allows analyzing the shared properties within groups and the differences between groups. Certainly many different clusterings are possible. In order to find practically useful ones, a quality measure is necessary. This measure leads optimization algorithms to interesting clusterings. A few years ago Mark Newman introduced a specific quality measure called modularity [60]. It was developed for the analysis of social networks but also proved to be useful in other scientific areas.

1.1 An Introductory Example

An example drawing of such a network is given in Figure 1.1. It is based on a study by Jorge Gil-Mendieta and Samuel Schmidt [30] about the network of influential Mexican politicians between 1910 and 1994. In the graph each politician is represented by a vertex, and edges connect pairs of politicians based on friendships, business partnerships, marriages, and the like. In the figure vertices are displayed as boxes and circles with a size proportional to their number of connections. The placement of the vertices was computed with the LinLog energy model [62], and edges are drawn as straight lines connecting related vertices.

In the figure the vertices are colored according to a clustering of the politicians. The shown clustering was constructed automatically by the algorithm developed in this work by optimizing the modularity measure. Three major clusters are visible. These roughly correspond to three generational changes and to differences in military vs. civil background. Similar groups were also discovered by Gil-Mendieta and Schmidt, who manually derived groups by studying the importance and influence of single persons in the network using various special-purpose centrality indices.


Figure 1.1: Graph of the Mexican political elite [30] from 1910 to 1994. Each box represents a Mexican politician. The family name and the main year of political activity are printed for each. The names of actual presidents are printed in red. The lines connect related politicians based on friendships, business partnerships, marriages, and the like. Finally, the color of the boxes shows the best clustering computed with the algorithms presented in this work.


1.2 Objectives and Outline

The main objective of this work is the implementation and evaluation of effective and efficient clustering algorithms for the modularity quality measure of Newman [60]. Optimizing the modularity is an NP-complete problem [13]. Therefore it is unlikely that a fast, polynomial-time algorithm finds a clustering of optimal modularity in all graphs. Instead, relatively good clusterings have to suffice.

The major design goal for the clustering algorithm is effectiveness: the developed algorithm shall be able to find clusterings of very good modularity compared to alternative methods. Secondly, efficiency is a concern. Complicated and expensive algorithms should be replaced by simpler alternatives in case they are not able to produce significantly better clusterings.

This work focuses on multi-level refinement heuristics. In the past these were successfully applied to similar problems like minimum-cut partitioning [41, 43, 78]. However, modularity clustering differs significantly from these problems. The optimization aim is, for example, more complex because the number of clusters is not known in advance but has to be discovered by the algorithm. To date no detailed study exists on the adaptation of multi-level refinement to modularity clustering. In this context a comparison of the effectiveness and efficiency to other clustering algorithms is also necessary.

Chapter 2 introduces the mathematical formulation of the modularity clustering aim and discusses related concepts. This analysis provides the basis for the later development of strategies specific to the modularity. The chapter concludes in Section 2.4 with a study of existing clustering algorithms, from which fundamental strategies and components are derived. The developed multi-level algorithm will combine several particularly useful components to benefit from their advantages.

Based on the insights into the mathematical structure and the fundamental clustering strategies, Chapter 3 presents the developed family of algorithms. The algorithms are based on multi-level refinement methods adapted to the special needs of modularity clustering. The algorithm operates in two phases. First a hierarchy of coarsened graphs is produced. Then this hierarchy is used in the refinement of an initial clustering.

The coarsening phase involves the selection of pairs of clusters to be merged. A sub-goal of this work is the exploration of alternative selection criteria in order to improve the effectiveness and the overall clustering results. This includes criteria derived from random walks and spectral methods, as presented in Section 3.3. In addition, the implementation should incorporate appropriate data structures and algorithms to efficiently support these merge selectors.

The second major component of the algorithm are cluster refinement heuristics. These try to improve an initial clustering by exploring similar clusterings and selecting one of them. Here such neighbor clusterings are generated by moving single vertices. The objective of Section 3.4 is the exploration of heuristics for the search of local optima, i.e. clusterings without any neighbor clusterings of better modularity. Here efficiency is an important concern because many traditional heuristics, like Kernighan-Lin refinement [45, 24], cannot be efficiently implemented for the modularity measure.


In order to assess the practical usefulness of the developed algorithm, an empirical evaluation is carried out in Chapter 4. The experimental measurements are based on a set of benchmark graphs collected from a wide range of application domains. First the different configurations of the algorithm are compared regarding their efficiency and effectiveness in finding good clusterings. Based on this evaluation, an efficient and effective default configuration and a few effective variations are chosen. These are then compared against a range of reference algorithms and clustering results found in the literature.


2 Graph Clustering

This chapter introduces graph clustering with the focus on Newman's modularity as optimization aim. It provides the necessary foundations to develop and compare clustering algorithms. For this purpose a mathematical analysis of the modularity and related models is as necessary as the study of existing algorithms.

This chapter is organized as follows. First graphs and clusterings are formally defined together with helpful mathematical notations. The second section introduces the modularity measure in the form used throughout this work. In order to substantiate its relationship to other quality measures, the third section presents an axiomatic derivation from the density concept. The chapter concludes with a review and summary of fundamental clustering strategies.

2.1 Basic Definitions

This section formally introduces graphs, weights, and clusterings. In the first subsection different types of graphs are discussed. Undirected graphs will be used in the derivation of quality measures, while symmetric, directed graphs are used as internal representation in the implementation. The second subsection on attributed graphs defines vertex and edge properties and basic computations on edge weights. The final subsection formally introduces clusterings and quality measures.

2.1.1 Graphs

An undirected graph G = (V, E) is a set of vertices V combined with a set of edges E. Two vertices u, v ∈ V are said to be adjacent if they are connected by an edge e ∈ E. In that case, u and v are called end-vertices of the edge e, and the edge is incident to both vertices. Each edge connects at most two different vertices and is called a self-edge if it connects a single vertex just to itself.

Let E(X, Y) ⊆ E be the subset of edges connecting vertices in X ⊆ V with vertices in Y ⊆ V in an undirected graph. The edges incident to vertex v are E(v) := E({v}, V), and two vertices u, v are adjacent if E(u, v) is non-empty. Conversely, let V(e) ⊆ V be the set of end-vertices incident to edge e ∈ E. For example, the neighborhood of a vertex is the set of all adjacent vertices including itself and is expressed by N(v) := V(E(v)) ∪ {v}. The subgraph induced by the vertices X ⊆ V is G[X] := (X, E(X, X)). A graph is bipartite if the vertices can be dissected into two disjoint sets A, B such that all edges lie between both sets, i.e. E(A, B) = E.

In directed graphs all edges are oriented and connect a start-vertex with an end-vertex. A directed graph is symmetric if for each edge an edge in the opposing direction also exists. In that case the graph topology is equal to an undirected graph. In symmetric, directed graphs an enumeration of its incident edges is stored for each vertex, and these edges point to the other end-vertex.


This form is used as internal representation for undirected graphs because it is easier to traverse.

An undirected graph is called simple if each pair of vertices is connected by at most one edge. Otherwise several parallel edges may exist connecting the same end-vertices; such graphs are called multigraphs. Similarly, a directed graph is simple if it has at most one edge per direction. Throughout this work only simple graphs are used.

2.1.2 Attributed Graphs

Commonly vertices and edges are attributed with additional properties. A vertex property is a function mapping each vertex to its property value. Edge properties are defined analogously.

The simplest properties are textual labels like VertexName : V → Σ*. They can be used in visual graph representations to identify and describe vertices and edges in more detail. Another frequently used type are numerical properties like size, volume, weight, color, or the spatial position of vertices and the distance between adjacent vertices. Throughout this work mainly the abstract vertex weight w : V → Q and edge weight f : E → Q are used. When not otherwise noted these weights are non-negative.

To simplify calculations, the edge weight between a pair of vertices is denoted by

$$f(u,v) := \sum_{e \in E(u,v)} f(e)$$

and between vertex sets X, Y ⊆ V by

$$f(X,Y) := \sum_{e \in E(X,Y)} f(e).$$

Therefore the edge weight between non-adjacent vertices (no connecting edge) is zero. In undirected graphs all edge weights are symmetric with f(u, v) = f(v, u).

In case edge weights ω(u, v) between arbitrary vertex pairs are required, the weight-induced graph (V, ω) with E(u, v) ⇔ ω(u, v) ≠ 0 can be used. Note, however, that while non-zero edge weights indicate connected vertices, the opposite is not necessarily true, because the edges of a graph define neighborhood relationships where nevertheless some edges might have a zero weight. In cases where the edge weights of the graph are more important than the single edges, the notation (V, f) will be used.

The total edge weight f(X) of a set of vertices X ⊆ V is the weight of all edges having both end-vertices in X. Some care is necessary not to count edges twice in undirected graphs. For disjoint subsets X ∩ Y = ∅ the equation f(X, Y) = f(X ∪ Y) − f(X) − f(Y) holds. Therefore the total edge weight in X is:

$$f(X) = \sum_{u \neq v \in X} \tfrac{1}{2} f(u,v) + \sum_{v \in X} f(v,v) \qquad (2.1)$$

The degree of a vertex is the number of incident edges |E({v}, V)|. The weighted degree deg_f(v) is the sum over the weights f(e) of the incident edges. Self-edges are counted twice in order to obtain Σ_v deg_f(v) = 2 f(V). From here on only the weighted degree is used and hence the subscript is dropped. In summary, the degree is calculated by:

$$\deg(v) = \sum_{x \in V} f(v,x) + f(v,v) \qquad (2.2)$$
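As a small illustration of these weight conventions, the following sketch computes the weighted degrees deg(v) and the total edge weight f(V) from a list of undirected weighted edges, counting self-edges twice in the degree so that the degrees sum to 2 f(V). This is a minimal sketch added for this section, not code from the thesis implementation; the names Edge, weightedDegrees, and totalWeight are illustrative assumptions.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// One undirected edge {u, v} with weight f(e); u == v denotes a self-edge.
struct Edge { std::size_t u, v; double f; };

// Weighted degrees deg(v) = sum_x f(v, x) + f(v, v), i.e. self-edges count twice (eq. 2.2).
std::vector<double> weightedDegrees(std::size_t n, const std::vector<Edge>& edges) {
    std::vector<double> deg(n, 0.0);
    for (const Edge& e : edges) {
        deg[e.u] += e.f;
        deg[e.v] += e.f;  // for a self-edge (u == v) this adds f(e) a second time
    }
    return deg;
}

// Total edge weight f(V): every edge counted exactly once.
double totalWeight(const std::vector<Edge>& edges) {
    double sum = 0.0;
    for (const Edge& e : edges) sum += e.f;
    return sum;
}

int main() {
    // Tiny example graph with 4 vertices and one self-edge at vertex 3.
    std::vector<Edge> edges = {{0,1,1.0}, {1,2,2.0}, {0,2,1.0}, {2,3,1.0}, {3,3,0.5}};
    std::vector<double> deg = weightedDegrees(4, edges);
    double sumDeg = 0.0;
    for (double d : deg) sumDeg += d;
    // The degrees sum to 2 f(V): here 11 = 2 * 5.5.
    std::cout << "f(V) = " << totalWeight(edges) << ", sum of degrees = " << sumDeg << "\n";
    return 0;
}
```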


2.1.3 Graph Clusterings and Quality Measures

A clustering C partitions the vertices of a graph into non-empty, disjoint subsets covering the whole set of vertices. The clustering consists of the clusters i ∈ C with their vertices C_i ⊆ V. Hence the number of clusters is |C|. The cluster containing vertex v is denoted by C(v) ∈ C, and thus C(v) = i ⇔ v ∈ C_i. To improve readability, C[v] := C_{C(v)} directly refers to the vertices of the cluster containing vertex v.

The term clustering is mostly used in the context of (social) network analysis because there the aim is to group vertices together. The load-balancing community mostly uses the term partitioning to emphasize the search for balanced dissections.

A clustering quality measure Q(C) maps clusterings of a graph to rational numbers called clustering quality. This way the quality measure defines a semi-order on the set of clusterings, with clustering C being at least as good as clustering D if Q(C) ≥ Q(D). This enables the easy comparison of clusterings and the search for high-quality clusterings, although several different clusterings may be of the same quality.

Two quality measures Q_1, Q_2 are said to be ranking equivalent when they order all clusterings equally. This is the case when for all pairs of clusterings C, D it holds that Q_1(C) ≤ Q_1(D) ⇔ Q_2(C) ≤ Q_2(D). Ranking equivalence is invariant under addition of constants α ∈ Q and multiplication with positive constants 0 < β ∈ Q:

$$Q(C) \le Q(D) \iff \alpha + \beta\, Q(C) \le \alpha + \beta\, Q(D).$$

This allows the normalization of quality measures against other graph properties, which is necessary before comparing qualities between different graphs. Often the maximum quality over all clusterings or upper limits are used for normalization.

2.2 The Modularity Measure of Newman

This section briefly introduces the clustering quality measure called modularity, which is used throughout this work. The modularity was developed by Newman for community detection in social networks [58, 59]. Here the modularity definition from [57] is used, and it is connected to a volume and density model. The section concludes with basic properties of this quality measure. An axiomatic derivation of the modularity is presented in the next section. Proofs for the NP-completeness of modularity optimization and other properties can be found in [13]. The modularity has a resolution limit: on growing graphs small clusters become invisible [25, 5].

Looking at the edges covered by a cluster, Newman observes the necessity to make a "judgment about when a particular density of edges is significant enough to define a community" [57]. Hence more edges should be in clusters than expected from a random redistribution of all edges. This is achieved by maximizing

$$\sum_{C(u)=C(v)} \bigl( f(u,v) - P(u,v) \bigr),$$

where f(u, v) is the actual edge weight and P(u, v) the expected weight, called the null model.


The model chosen by Newman has the following properties: the weights are symmetric with P(u, v) = P(v, u), the total weight equals the total weight of the original graph with P(V) = f(V), and the vertex degree is reproduced for all vertices by deg_P(v) = deg_f(v). Putting everything together yields P(u, v) := deg(u) deg(v)/(2 f(V)). As a last step Newman normalized the modularity against the total edge weight f(V) and obtained:

$$Q(C) := \frac{1}{f(V)} \sum_{C(u)=C(v)} \left( f(u,v) - \frac{\deg(u)\deg(v)}{2 f(V)} \right) \qquad (2.3)$$

$$Q(C) = \sum_{C(u)=C(v)} \left( \frac{f(u,v)}{f(V)} - \frac{\deg(u)\deg(v)}{2 f(V)^2} \right) \qquad (2.4)$$

More insight into this measure can be gained by using the volume Vol(u, v) with Vol(u, v) = deg(u) deg(v) and Vol(v, v) = deg(v)²/2. Then the null model equals the volume scaled to the total edge weight: P(u, v) = f(V) Vol(u, v)/Vol(V). Here Vol(V) is the total volume and equals (2 f(V))²/2 because Σ_v deg(v) = 2 f(V). As the above scaling factor looks similar to a density (mass per volume), it is called connection density and denoted by ρ(V) = f(V)/Vol(V). In this context the null model is the local volume multiplied by the global density, and the modularity is rewritten as:

$$Q(C) = \frac{1}{f(V)} \sum_{C(u)=C(v)} \bigl( f(u,v) - \rho(V)\,\mathrm{Vol}(u,v) \bigr) \qquad (2.5)$$

$$Q(C) = \sum_{C(u)=C(v)} \left( \frac{f(u,v)}{f(V)} - \frac{\mathrm{Vol}(u,v)}{\mathrm{Vol}(V)} \right) \qquad (2.6)$$

Now it is visible that the modularity is optimized by maximizing the fraction of edges lying in clusters while minimizing the fraction of expected edges. As a consequence, the modularity of a clustering with just one cluster is zero. Moreover, in a graph where the actual edge weights are distributed exactly like the null model, all clusterings have zero modularity. A further useful property of the modularity is its additivity, as it can be written as a sum over edge weights or over the contributions of vertices or clusters:

$$Q(C) = \sum_{C(u)=C(v)} \left( \frac{f(u,v)}{f(V)} - \frac{\mathrm{Vol}(u,v)}{\mathrm{Vol}(V)} \right) \qquad (2.7)$$

$$Q(C) = \sum_{v \in V} \sum_{u \in C[v]} \left( \frac{f(u,v)}{f(V)} - \frac{\mathrm{Vol}(u,v)}{\mathrm{Vol}(V)} \right) \qquad (2.8)$$

$$Q(C) = \sum_{i \in C} \left( \frac{f(C_i)}{f(V)} - \frac{\mathrm{Vol}(C_i)}{\mathrm{Vol}(V)} \right) \qquad (2.9)$$
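To make the per-cluster form (2.9) concrete, the following sketch computes the modularity of a given clustering from the cluster-internal edge weights f(C_i) and the summed cluster degrees deg(C_i), using Vol(C_i) = deg(C_i)²/2 and Vol(V) = (2 f(V))²/2. It is a simplified sketch for illustration only, not the thesis implementation; the names Edge and modularity are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

struct Edge { std::size_t u, v; double f; };  // undirected weighted edge

// Modularity via equation (2.9): Q = sum_i ( f(C_i)/f(V) - Vol(C_i)/Vol(V) )
// with Vol(C_i) = deg(C_i)^2 / 2 and Vol(V) = (2 f(V))^2 / 2.
double modularity(const std::vector<Edge>& edges, const std::vector<std::size_t>& cluster) {
    std::size_t k = 0;                       // number of clusters
    for (std::size_t c : cluster) k = std::max(k, c + 1);
    std::vector<double> intra(k, 0.0);       // f(C_i): edge weight inside cluster i
    std::vector<double> degSum(k, 0.0);      // deg(C_i): summed weighted degrees
    double fV = 0.0;                         // f(V): total edge weight
    for (const Edge& e : edges) {
        fV += e.f;
        degSum[cluster[e.u]] += e.f;
        degSum[cluster[e.v]] += e.f;
        if (cluster[e.u] == cluster[e.v]) intra[cluster[e.u]] += e.f;
    }
    double q = 0.0;
    for (std::size_t i = 0; i < k; ++i)
        q += intra[i] / fV - (degSum[i] * degSum[i]) / (4.0 * fV * fV);
    return q;
}

int main() {
    // Two triangles joined by a single bridge edge; the natural two-cluster solution has Q > 0.
    std::vector<Edge> edges = {{0,1,1},{1,2,1},{0,2,1}, {3,4,1},{4,5,1},{3,5,1}, {2,3,1}};
    std::vector<std::size_t> cluster = {0,0,0, 1,1,1};
    std::cout << "Q = " << modularity(edges, cluster) << "\n";  // approx. 0.357
    return 0;
}
```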


2.3 Density-Based Clustering Quality Measures

The popular modularity measure of Newman was introduced in the previous section. However, algorithms proposed in the literature often optimize different but related quality measures. In order to gain some insight into these, an axiomatic derivation is presented in this section. Further connections, differences, and possible interpretations are outlined. The whole discussion underlines why the modularity is used and where its properties come from.

The main purpose of the clustering quality measure is to direct the search for good vertex clusters. The measures of interest here are based on the following properties:

• The optimization should lead to clusters with strong internal connectivity and sparse connections between clusters. This is the intuition behind "a group of people" and "a subsystem".

• Secondly, the strength of connectivity should be relative to an expected, average connectivity. This allows adapting the search to application-specific questions by providing a reference graph of expected edge weights.

• In structureless graphs similar to the expected average, all clusterings should have only low quality. Otherwise the measure is biased towards discovering structure where none is expected.

• Finally, the number of clusters is not known in advance. It is the task of the quality measure to determine this number. Thus the measure has to avoid trivial solutions. For example, measures always leading to bisections are of no use when more than two clusters are possible.

The axiomatic derivation of quality measures is a popular approach. Here, as in most cases, the desired properties (axioms) are more or less designed to fit the target quality measures. In contrast, sometimes axioms are searched for that directly lead to impossible quality measures. For example, in [46] three properties for similarity clustering based on vertex distances are defined: scale-invariance (scaling all distances does not change the cluster structure), richness (for any possible clustering a matching distance function exists), and consistency (increasing inter-cluster distances or decreasing intra-cluster distances does not change the clustering). No quality measure can fulfill all three properties at the same time. Fortunately, the properties chosen above for graph clustering are not that restrictive.

The next subsection introduces the connection density and the basic volume model. Then the relation between biased measures and the density is analyzed. The third subsection explores density-based clustering qualities by starting from the inter-cluster edge weight. This discussion also shows how the widely accepted modularity embeds into this framework. All quality measures will be independent of concrete volume models. Some specific volume models are discussed in the last subsection.

2.3.1 Volume and Density

How can the strength of connectivity in and between clusters be measured? One possibility arises by transferring the density concept from physics to graph clustering.


Then the connection density is the proportion of actual edge weight (mass) to "possible" edge weight (volume). The actual edges are measured by the sum of their weights f(u, v), and the "possible" edge weight is given by the reference graph (V, Vol(u, v)), called the volume model from here on. Hence the connection density is defined as:

$$\rho(A,B) := \frac{f(A,B)}{\mathrm{Vol}(A,B)} \qquad (2.10)$$

$$\rho(A) := \frac{f(A)}{\mathrm{Vol}(A)} \qquad (2.11)$$

As only undirected graphs are considered, it is reasonable to require the volume to be symmetric, Vol(u, v) = Vol(v, u), and positive, Vol(u, v) ≥ 0. Because edges can also be expected where no real edges are, the volume weights are defined on all vertex pairs. The volume has to be non-zero wherever the edge weight is above zero.

2.3.2 Bias and the Null Model

Given a graph (V, f), a uniform density graph (V, f_0) is derived by evenly distributing the global density over all vertex pairs using ρ(V) = f_0(u, v)/Vol(u, v). The uniform density weights f_0(u, v) = ρ(V) Vol(u, v) are called the null model. The scaling by ρ(V) transforms the volume into average edge weights.

A quality measure is called biased when it does not assign the same quality to all clusterings of the graph (V, f_0). This bias is unfavorable because it signals structure where no structure is expected. Conversely, a quality measure is unbiased if applying it to the null model yields a constant quality.

The bias can be removed from a quality measure by either dividing by its null model quality (scaling) or subtracting the null model quality (shifting) [28, 62]. Let Q_{f_0}(C) be the quality of clustering C in the null model. Then unbiased measures are derived as in the equations below. In uniform density graphs the scaled measure has the same quality 1.0 for all clusterings, and shifting yields constant zero quality.

$$Q_{\mathrm{scaled}}(C) := Q(C) / Q_{f_0}(C) \qquad (2.12)$$

$$Q_{\mathrm{shifted}}(C) := Q(C) - Q_{f_0}(C) \qquad (2.13)$$

2.3.3 Derived Quality Measures

Reformulating the first clustering aim in terms of density, vertex partitions with low connection density between clusters and high inner density are searched for. This aim may be approached directly by maximizing the difference or the ratio between intra-cluster density and inter-cluster density:

$$Q_{\mathrm{diff}}(C) := \frac{\sum_{i \in C} f(C_i)}{\sum_{i \in C} \mathrm{Vol}(C_i)} - \frac{\sum_{i \neq j \in C} f(C_i, C_j)}{\sum_{i \neq j \in C} \mathrm{Vol}(C_i, C_j)} \qquad (2.14)$$

$$Q_{\mathrm{ratio}}(C) := \frac{\sum_{i \in C} f(C_i)}{\sum_{i \in C} \mathrm{Vol}(C_i)} \left( \frac{\sum_{i \neq j \in C} f(C_i, C_j)}{\sum_{i \neq j \in C} \mathrm{Vol}(C_i, C_j)} \right)^{-1} \qquad (2.15)$$


Both quality measures are not well studied and probably rarely used. In practice it proved sufficient to concentrate on either the inter- or the intra-cluster edges. To improve readability, let Cut_ω(C) be the total inter-cluster weight ω(u, v) of clustering C and Int_ω(C) the total intra-cluster weight (the interior):

$$\mathrm{Cut}_\omega(C) := \sum_{i \neq j \in C} \omega(C_i, C_j)/2 \qquad (2.16)$$

$$\mathrm{Int}_\omega(C) := \sum_{i \in C} \omega(C_i) \qquad (2.17)$$

Interior and cut are related by Cut_ω(C) = ω(V) − Int_ω(C), where ω(V) is the total weight. The edge cut Cut_f refers to the edge weights f, whereas the cut volume is Cut_Vol. Using this short notation, for example, both measures from above can be written as:

$$Q_{\mathrm{diff}}(C) := \frac{\mathrm{Int}_f(C)}{\mathrm{Int}_{\mathrm{Vol}}(C)} - \frac{\mathrm{Cut}_f(C)}{\mathrm{Cut}_{\mathrm{Vol}}(C)} \qquad (2.18)$$

$$Q_{\mathrm{ratio}}(C) := \frac{\mathrm{Int}_f(C)}{\mathrm{Int}_{\mathrm{Vol}}(C)} \left( \frac{\mathrm{Cut}_f(C)}{\mathrm{Cut}_{\mathrm{Vol}}(C)} \right)^{-1} \qquad (2.19)$$

Traditionally the edge cut is used in load-balancing problems to minimize the communication overhead represented by the edge weights. But without any further constraints, like a predefined number of clusters or balance requirements, edge cut minimization has trivial solutions. For example, placing all vertices in the same partition has zero cut. Additionally, the edge cut is biased with Cut_{f_0}(C) = ρ(V) Cut_Vol(C) and the interior with Int_{f_0}(C) = ρ(V) Int_Vol(C). Removing this bias yields the scaled cut and scaled interior discussed in the next subsection and the shifted cut and shifted interior discussed in the second subsection.

Scaled Cut and Interior  Removing the bias from cut and interior by scaling yields the scaled cut and scaled interior:

$$Q_{\mathrm{scaled\,cut}}(C) := \frac{1}{\rho(V)} \frac{\mathrm{Cut}_f(C)}{\mathrm{Cut}_{\mathrm{Vol}}(C)} = \frac{\mathrm{Cut}_f(C)}{f(V)} \left( \frac{\mathrm{Cut}_{\mathrm{Vol}}(C)}{\mathrm{Vol}(V)} \right)^{-1} \qquad (2.20)$$

$$Q_{\mathrm{scaled\,int}}(C) := \frac{1}{\rho(V)} \frac{\mathrm{Int}_f(C)}{\mathrm{Int}_{\mathrm{Vol}}(C)} = \frac{\mathrm{Int}_f(C)}{f(V)} \left( \frac{\mathrm{Int}_{\mathrm{Vol}}(C)}{\mathrm{Vol}(V)} \right)^{-1} \qquad (2.21)$$

Remembering the definition of density, it is visible that the scaled cut minimizes the inter-cluster density. The complete measure is the inter-cluster density normalized against the global density. It can also be interpreted as the ratio between the cut edges in the input graph and in the null model. The same holds for the scaled interior, which maximizes the intra-cluster density.


In order to optimize the separation between clusters, the scaled cut is minimized. However, this prefers coarse clusterings and bisections. Merging two clusters i ≠ j ∈ C does not increase the inter-cluster density when the following condition holds:

$$\frac{\mathrm{Cut}_f(C)}{\mathrm{Cut}_{\mathrm{Vol}}(C)} \ge \frac{\mathrm{Cut}_f(C) - f(C_i, C_j)}{\mathrm{Cut}_{\mathrm{Vol}}(C) - \mathrm{Vol}(C_i, C_j)} \qquad (2.22)$$

This is the case for ρ(C_i, C_j) ≥ Cut_f(C)/Cut_Vol(C), i.e. when C_i, C_j have a higher inter-cluster density than the overall inter-cluster density. Intuitively, there should always be at least one pair with inter-density higher than the current "average" inter-density, and therefore at least one nontrivial clustering of minimal inter-density exists having only two clusters. Another good reason to distrust the scaled cut is that the lowest inter-cluster density 0.0 can easily be obtained by putting all vertices into the same cluster.

Emphasizing the inner cohesion of clusters by maximizing the scaled interior also tends to trivial solutions: splitting a cluster C_i into C_{i,1}, C_{i,2} does not decrease the overall intra-density in case ρ(C_{i,1}, C_{i,2}) ≤ Int_f(C)/Int_Vol(C). Finding a cluster which can be cut with lower inter-density than the current overall intra-cluster density is very likely. Optimal quality thus leads to very fine-grained clusterings. Placing each vertex into a singleton cluster, except for the vertex pair of highest intra-density, yields an optimal clustering. Adding other clusters can only lower the overall intra-density.

Although the scaled cut and interior are not biased, they have trivial solutions. Hence both are not well suited for graph clustering. The scaled cut is related to random walks and is known as conductance in that context [49].

Shifted Cut and Interior  Shifting the edge cut and interior by their null model bias produces the shifted cut and shifted interior:

$$Q_{\mathrm{shifted\,cut}}(C) := \mathrm{Cut}_f(C) - \rho(V)\,\mathrm{Cut}_{\mathrm{Vol}}(C) \qquad (2.23)$$

$$Q_{\mathrm{shifted\,int}}(C) := \mathrm{Int}_f(C) - \rho(V)\,\mathrm{Int}_{\mathrm{Vol}}(C) \qquad (2.24)$$

Minimizing the shifted cut searches for clusters where the weight of cut edges is lower than the expected weight. Similarly, shifted interior maximization prefers clusters with more internal edge weight than expected. Because of Int_f(C) = f(V) − Cut_f(C), maximizing the shifted interior is ranking equivalent to minimizing the shifted cut. Thus both produce identical clusterings and share the same properties.

Applying the analysis method of the previous section to the shifted cut yields the condition ρ(C_i, C_j) ≥ ρ(V) for successfully merging two clusters and ρ(C_{i,1}, C_{i,2}) ≤ ρ(V) for successfully splitting cluster C_i into C_{i,1} ∪ C_{i,2} = C_i. Hence the shifted cut tends towards cuts with inter-cluster density smaller than the global "average" density. The parameter ρ(V) was introduced through the null model and is independent of the current clustering. It is possible to replace it by αρ(V) in order to control the granularity of the clusterings. But with α going to zero, for example, the shifted cut would develop towards the plain edge cut, which ignores the volume.
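The merge condition ρ(C_i, C_j) ≥ ρ(V) can be checked from a few cluster sums. As a hedged illustration, the following sketch assumes the degree-based volume Vol(u, v) = deg(u) deg(v) used by the modularity, under which ρ(V) = 1/(2 f(V)) and the normalized shifted-interior gain of merging clusters i and j reduces to ΔQ = f(C_i, C_j)/f(V) − deg(C_i) deg(C_j)/(2 f(V)²); the names Cluster and mergeGain are illustrative, not from the thesis.

```cpp
#include <iostream>

// Summary data of one cluster under the degree-based volume Vol(u,v) = deg(u) deg(v).
struct Cluster {
    double degSum;  // deg(C_i): sum of weighted vertex degrees in the cluster
};

// Normalized shifted-interior (modularity) gain of merging clusters i and j:
//   deltaQ = f(C_i, C_j) / f(V) - deg(C_i) * deg(C_j) / (2 f(V)^2).
// It is non-negative exactly when the pair density f(C_i,C_j) / (deg(C_i) deg(C_j))
// is at least the global density rho(V) = 1 / (2 f(V)).
double mergeGain(const Cluster& i, const Cluster& j, double cutWeight, double fV) {
    return cutWeight / fV - (i.degSum * j.degSum) / (2.0 * fV * fV);
}

int main() {
    // Two clusters with deg(C_i) = deg(C_j) = 7 in a graph with f(V) = 7,
    // connected by a single edge of weight 1 (two triangles joined by a bridge).
    Cluster a{7.0}, b{7.0};
    std::cout << "merge gain = " << mergeGain(a, b, 1.0, 7.0) << "\n";  // about -0.357: merging lowers Q
    return 0;
}
```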


A further question is how a quality measure handles clusters composed of disconnected components. Let C′ be the components of a cluster not connected by any edge. By definition the cut edge weight between them is then zero: Cut_f(C′) = 0. Hence splitting the components into separate clusters does not change the cut and interior edge weight. But the interior volume decreases and the cut volume increases. Therefore, conforming with intuition, the quality measured with shifted interior and cut improves through this separation.

As already mentioned, the shifted cut minimizes the cut of actual edges while maximizing the cut of expected edges. Normalizing against the total edge weight f(V) allows another interpretation related to the scaled cut: the normalized shifted cut Q_n-shifted cut(C) minimizes the fraction of inter-cluster edges while maximizing this fraction in the reference graph. Similarly, the normalized shifted interior maximizes the fraction of edges in clusters while minimizing the fraction of reference edges. This is consistent with the desired property to find groups with stronger connections than expected.

$$Q_{\text{n-shifted cut}}(C) := \frac{\mathrm{Cut}_f(C)}{f(V)} - \frac{\mathrm{Cut}_{\mathrm{Vol}}(C)}{\mathrm{Vol}(V)} \qquad (2.25)$$

$$Q_{\text{n-shifted int}}(C) := \frac{\mathrm{Int}_f(C)}{f(V)} - \frac{\mathrm{Int}_{\mathrm{Vol}}(C)}{\mathrm{Vol}(V)} = -Q_{\text{n-shifted cut}}(C) \qquad (2.26)$$

Here it is visible that the normalized version of the shifted interior looks very similar to the modularity measure introduced in Section 2.2. In fact, just the matching volume model needs to be inserted. The condition for successful merges, ρ(C_i, C_j) ≥ ρ(V), also helps to explain the observed resolution limit of the modularity [25]: increasing the size of a graph by inserting vertices and edges most likely lowers the global density. Optimizing the modularity or shifted interior then makes it necessary to merge several previous clusters into one as long as their inter-density is still above the new global density. Recently Arenas et al. proposed a parametric correction for this problem by adding weighted self-edges [5]. Increasing the self-edge weights mostly increases the vertex degrees without changing the graph topology. This results in additional terms in the volume and thus indirectly lowers the global density ρ(V).

Altogether the preceding analysis shows that shifted cut and interior meet the desired properties for clustering qualities: stronger-than-expected internal connectivity is preferred by both measures, they are unbiased on uniform density graphs, and they do not tend to trivial solutions.

2.3.4 Volume Models

Community structure results from the uneven distribution of edges in the input graph. The purpose of the volume model is to describe an even distribution as reference. The simplest method is to assign each possible edge unit volume by Vol(u, v) := 1. But then vertices with a higher degree dominate the found cluster structure, as they have many more edges than the volume model expects. However, it is quite common that, for example, important vertices have a higher number of incident edges than less important vertices.


Figure 2.1: Example Volume Models. Fig. (a) shows two vertices containing three and resp. two atomic vertices. In (b) the vertices contain three and resp. two atomic sources but just one atomic target.

Thus for many applications it is also necessary to reproduce chosen structural properties.

In this section the basic volume model used in this work, called degree multiplicity, is derived from simple principles. The derivation is based on atomic vertices and coarsening invariance. Both are introduced first, and then the degree model is presented as a special case. Finally, the discussion of related models highlights important aspects for flexible implementations.

A volume model for the graph (V, f) is a function Vol : V × V → Q. As the graph is undirected, the volume should be symmetric, Vol(u, v) = Vol(v, u). No negative amounts of vertices can exist, thus the volume is positive, Vol(u, v) ≥ 0, for all vertex pairs. A zero volume would produce infinite density. Therefore the volume has to be nonzero, Vol(u, v) > 0, wherever the edge weight f(u, v) is nonzero. With the normalized quality measures from above, the volume is scale invariant because constant positive factors are canceled out by the normalization and the global density term.

Given vertex weights w(v) ∈ N, each vertex can be interpreted as a collection of w(v) atomic vertices. This is like vertices representing whole subsystems, as shown in Figure 2.1a. Effectively, a vertex with weight n is accounted with n times more edges than it would have with unit weight. Thus the vertex weight describes the importance of the vertex. Counting all possible unit edges between these atomic vertices leads to the family of multiplicity volume models with Vol(u, v) := w(u)w(v) for all vertex pairs u ≠ v. Thanks to the scale invariance, the relaxation to positive rational "multiplicities" w(v) ∈ Q+ is possible. The internal volume of a vertex is Vol(v, v). Several reasonable alternatives exist. For example, in case no self-edges are expected anywhere, it can be zero. But this work also handles self-edges, and Vol(v, v) := w(v)²/2 is used. This also greatly simplifies the computations, as will be visible below.

Total volumes in the multiplicity model can be computed efficiently because the sum over weight products can be replaced by a product of two sums. The inter-cluster volume is computed by summing the vertex weights separately and just multiplying these two sums. A similar approach is used for the intra-cluster volume.


Here each vertex pair is counted twice, and all vertices get a fixed self-edge volume which needs to be compensated. However, inserting the self-edge volume Vol(v, v) := w(v)²/2 removes this compensation and yields:

$$\mathrm{Vol}(A,B) = \sum_{u \in A, v \in B} w(u)\,w(v) = \sum_{u \in A} w(u) \sum_{v \in B} w(v) = w(A)\,w(B) \qquad (2.27)$$

$$\mathrm{Vol}(A) = \frac{1}{2} \Bigl( \sum_{v \in A} w(v) \Bigr)^2 - \frac{1}{2} \sum_{v \in A} w(v)^2 + \sum_{v \in A} \mathrm{Vol}(v,v) = \frac{w(A)^2}{2} \qquad (2.28)$$

This will prove to be an important property for the optimization algorithms of this work, which are based on graph coarsening. A volume model is coarsening invariant if vertices can be merged without changing the volume: let v′ be the vertex produced by merging v_1 and v_2, then Vol(v′, x) = Vol(v_1, x) + Vol(v_2, x) shall hold for all other vertices x, and Vol(v′, v′) = Vol(v_1, v_1) + Vol(v_1, v_2) + Vol(v_2, v_2). For example, the multiplicity model is coarsening invariant by using w(v′) = w(v_1) + w(v_2) for merging the vertex weights.

The degree multiplicity uses the above model to reproduce the degree centrality by using the weighted vertex degree deg(v) as vertex weight. In this special case the global volume is Vol(V) = w(V)²/2 = 2 f(V)², and the global density is ρ(V) = (2 f(V))⁻¹. Therefore the null model derived from this volume is f_0(u, v) = deg(u) deg(v)/(2 f(V)). This is exactly the same null model as used for the modularity, derived from the expected values of random graphs with probabilities proportional to the vertex degree [57].

Other volume models based on the multiplicity are possible. In this work just the degree multiplicity is used, simply because it allows the comparison with other algorithms and implementations. Nevertheless, the study of alternative models highlights demands on the implementation. The following list points out some possibilities:

• Other centrality indices (importance measures) can be used as vertex multiplicity. For example, in the analysis of software systems often the size of the represented components in number of source lines (SLOC) is preferred, as it produces subjectively better results.¹

• When using another self-edge volume it is necessary to store it for each vertex in order to correctly implement the coarsening invariance. Then the vertex weights and the self-edge volume are merged together with the vertices.

• In some graphs vertex classes exist with no edges between vertices of the same class. The volume between vertices of the same class should be zero because no edges can be expected there. This can be achieved by storing separate vertex weights w_1, ..., w_n for each class and replacing the product w(u)w(v) by a sum of products for each pair of classes with expected edges. When merging vertices, simply their weight vectors are summed. For example, bipartite graphs have two classes and w(u)w(v) = w_1(u)w_2(v) + w_2(u)w_1(v).

¹ from private communication with Ralf Kopsch at the Software Systems Engineering Research Group in Cottbus


Figure 2.2: Graph of US Airports connected by direct flights. Airports lying far away were removed from the plot. The graph is called USAir97 in the graph collection.

• For directed graphs the atomic vertex model could be extended by splitting the atomic vertices into two classes, as shown in Fig. 2.1b: atomic sources and targets for directed edges, with multiplicities based on the original in- and out-degrees. This leads to volume models similar to the one proposed for bipartite graphs. This way the graph and volume used for the graph clustering can still be undirected and symmetric, and existing algorithms can be reused. For example, for software systems the source multiplicity of a component could be the number of contained function calls and the target multiplicity the number of provided functions.

Reviewing the above possibilities indicates that the multiplicity model has a mathematical structure similar to inner product spaces. Let K be the set of possible vertex weights w(v) ∈ K. Then calculating the volume w(u) · w(v) ∈ Q is an operation similar to the inner product. The symmetry of the volume equals the commutativity of the implemented operator. Merging vertices uses the addition w(u) + w(v) ∈ K. The coarsening invariance is satisfied by the distributivity of addition and multiplication. However, compared to standard vector spaces, the inner products used for volume models have much more complex semantics.
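To illustrate the multiplicity volume model and its coarsening invariance discussed above, the following sketch stores one weight per vertex, computes Vol(A, B) = w(A) w(B) and Vol(A) = w(A)²/2 from summed weights as in equations (2.27) and (2.28), and merges vertices by simply adding their weights. It is only an illustrative sketch under these assumptions; the class and function names are not taken from the thesis implementation.

```cpp
#include <iostream>
#include <utility>
#include <vector>

// Multiplicity volume model: each vertex carries a weight w(v), e.g. its weighted degree.
// All volumes depend only on summed weights, so merging vertices or whole clusters
// reduces to adding weights (coarsening invariance).
class MultiplicityVolume {
public:
    explicit MultiplicityVolume(std::vector<double> weights) : w(std::move(weights)) {}

    double weightSum(const std::vector<int>& vertices) const {
        double s = 0.0;
        for (int v : vertices) s += w[v];
        return s;
    }
    // Vol(A, B) = w(A) * w(B) for disjoint vertex sets A and B, eq. (2.27).
    double volBetween(const std::vector<int>& A, const std::vector<int>& B) const {
        return weightSum(A) * weightSum(B);
    }
    // Vol(A) = w(A)^2 / 2 with self-edge volume Vol(v,v) = w(v)^2 / 2, eq. (2.28).
    double volInside(const std::vector<int>& A) const {
        double wa = weightSum(A);
        return wa * wa / 2.0;
    }
    // Coarsening: the merged vertex simply carries w(v') = w(v1) + w(v2).
    double mergedWeight(int v1, int v2) const { return w[v1] + w[v2]; }

private:
    std::vector<double> w;
};

int main() {
    MultiplicityVolume model({3.0, 2.0, 1.0, 4.0});
    std::vector<int> A = {0, 1};  // w(A) = 5
    std::vector<int> B = {2, 3};  // w(B) = 5
    std::cout << "Vol(A,B) = " << model.volBetween(A, B) << "\n";  // 25
    std::cout << "Vol(A)   = " << model.volInside(A) << "\n";      // 12.5
    std::cout << "w(v') after merging vertices 0 and 1: " << model.mergedWeight(0, 1) << "\n";  // 5
    return 0;
}
```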


A completely different class of volume models is based on geographic information about the vertices. In such cases vertices lying far apart are expected to have weaker connectivity. For example, the number of direct flights between airports is mainly inversely proportional to their distance. Now consider a graph of airports connected by edges when they have direct flight connections, as shown in Figure 2.2. The edges are weighted by the frequency of direct flights. Graph clustering methods would then indirectly just rediscover the distance as the building component of clusters. However, some airports might have exceptionally strong connections because they, for example, have economic ties or belong to the same country. In order to study such structures it would be necessary to remove the influence of the geographical distance. This might be achieved by integrating this distance into the volume model.

More abstractly, the volume is derived from a suitably chosen vertex (dis)similarity measure. Then weak connections are expected between dissimilar, high-distance vertices. In reverse, similar, low-distance vertices are expected to be strongly connected. In this context density-based graph clustering can be interpreted as a search for vertex groups that are actually stronger connected than their similarity suggests.

2.4 Fundamental Clustering Strategies

This section gives an overview of clustering methods. Many methods combine several basic approaches. These components provide useful ideas for the design of new algorithms. From a short review of the literature, common, often-used components were identified. They can be broadly classified into constructing components, directing components, and meta-strategies. Constructing components generate a clustering; they contain, for example, dissection, subdivision, agglomeration, and refinement algorithms. Directing components control how the clustering is generated; this includes selection criteria and randomization methods. Finally, meta-strategies combine constructing components; examples are recursive subdivision, multi-level refinement, basin hopping, and genetic algorithms. In the following text the literature is shortly summarized. The next three sections describe basic directing and constructing components, and meta-strategies.

Table 2.1 summarizes the graph clustering algorithms for the modularity measure. Most information was derived from the reviews of Danon et al. [18] and Newman [56]. The entries are classified by their basic strategy, and the third column summarizes some details. A few publications appear several times as they incorporate different strategies.

Other algorithms exist which were not designed to optimize the modularity measure. These include algorithms for the minimum-cut partitioning problem, which is an important topic in load balancing of, for example, computational fluid dynamics simulations. In this context a lot of research has already been done on graph clustering algorithms. Therefore this literature might contain useful ideas. A historic review of graph partitioning methods can be found in [14]. Table 2.2 summarizes some related methods.


Article — Basic Strategy

Agglomeration strategies:
[15, 60]   greedy cluster merging by maximal modularity increase
[17]       greedy merging by a modified selector accounting for the cluster size
[76]       greedy merging by modularity increase and cluster size (consolidation ratio)
[65]       greedy merging by modularity increase; pre-coarsening of the graph using random walks
[4]        pre-coarsening by directly merging single-edge vertices with their neighbor
[57]       vector clustering using vertex vectors derived from eigenvectors of the modularity matrix
[80]       k-means clustering or recursive bisection on the random walk matrix
[64]       greedy merging by similarity of visit probability distributions from short random walks
[21]       greedy merging by distance of vertex vectors from the Laplace matrix
[20]       improved by using a modified matrix similar to the random walk matrix

Subdivision strategies:
[58]       recursive bisection using the first eigenvector of the modularity matrix
[80]       recursive bisection based on a few eigenvectors of the random walk matrix
[81]       bisection at the voltage gap, similar to random walk betweenness; replaces matrix inversion by a truncated power series
[69]       recursive spectral k-way subdivision with the normalized Laplacian matrix; mixed with refinement using vertex moves and cluster merges

Dissection strategies:
[59]       removes edges by highest random walk betweenness
[26]       removes edges by highest information centrality
[66]       removes edges by lowest edge-clustering coefficient

Refinement methods:
[57]       Kernighan-Lin refinement of clusterings from spectral methods; recommends greedy refinement
[23]       extremal optimization: moves vertices with bad contribution to modularity; combined with recursive bisection
[67, 68]   simulated annealing with fuzzy clustering (co-appearance in the same cluster), null model derived from a q-Potts spin model
[39, 38]   simulated annealing, spin glass system
[52]       simulated annealing combined with greedy refinement (quenches); basin hopping (move several vertices, apply greedy refinement, and then accept the new clustering as in simulated annealing)
[19]       reduction to minimum cut partitioning with negative edge weights; adapted multi-level refinement from the METIS library
[5]        tabu search: move vertices and prohibit reversing any of the last few moves

Linear programming:
[13]       modularity optimization formulated as an integer linear program
[2]        formulated as a fractional linear program; produces vertex distances in polynomial time, then rounds the distances to {0, 1}

Table 2.1: Overview of Modularity Graph Clustering Algorithms


Article           Basic Strategy
[7]               growing a cluster as shortest-path neighborhood around a start vertex until only a few edges go outside this cluster
[83]              greedy merging of clusters using the mean first-passage time of random walks; the transition probability is biased by the number of common neighbors
[40]              iteratively modifies the edge weights based on short random walks; the weight of "inter-cluster" edges declines while edge weights inside clusters rise; developed for clustering spatial data sets
[32]              removes edges by highest shortest-path betweenness
[75]              clusters from flows in the graph; a generalized form of random walks and other Markov processes
[6]               approximation algorithm for a quality measure similar to inter-cluster density; applies spectral methods and linear programming
[45, 24]          moves vertices to neighbor clusters, with escaping from local optima
[43, 1]           greedy merging by inter-cluster edge weight; already minimizes the edge cut during graph coarsening
[43, 1, 78, 41]   multi-level refinement based on Kernighan-Lin
[40]              uses snapshots of the agglomeration, called multi-scale; applies greedy smoothing as refinement
[74, 71, 73, 72]  evolutionary (genetic) search combined with multi-level min-cut partitioning

Table 2.2: Overview of Clustering and Partitioning Algorithms

2.4.1 Directing Components

Modularity clustering is a computationally difficult problem, and polynomial-time algorithms are unlikely to find the optimum clustering in all cases. Actually, most algorithms are heuristics which try to construct a relatively good clustering from many small decisions. Unbiased heuristics use completely random decisions. In contrast, biased heuristics use a selection criterion to direct the algorithm. In each step this selector chooses one of several available alternatives using information inferred from the graph.

The next paragraph presents commonly used deterministic selection criteria. The second paragraph shows how linear programming can be used to construct selectors, and the last paragraph discusses randomized selectors.

Selection Criteria
Selection criteria can be derived from the global optimization aim. For clustering by modularity, simple selectors directly follow from the mathematical formulation. A more global view might be attained using spectral methods [3, 21, 6, 80, 57]. Other heuristics use centrality or betweenness indices to select single edges or vertices. These are for example derived from shortest paths [32], the information centrality [26], clustering coefficients [66], and random walks [40, 59, 64, 81, 83].

Unfortunately, sometimes just the selector alone is defined as the optimization aim while the global reference measure is ignored. This leaves unclear what the final


clustering really describes and makes it difficult to compare different selectors for a task. Another important question is to which global aim the selector really leads. It is possible that the local decisions optimize another property than desired. Here a small analogy to statistical mechanics is visible: for example, in Lattice-Boltzmann fluid simulations statistical methods are necessary to prove that the simple discrete model with microscopic (local) collision rules results in the desired macroscopic (global) flows.

Spectral methods replace the graph by a matrix and clusters by vectors. Then the clustering quality measure is reformulated with matrix-vector products. This allows applying matrix diagonalization and mapping the problem into the eigenvector space. Similar to principal component analysis, a few strongest eigenvalues and their eigenvectors are used to derive vertex vectors. Then selection criteria are computed from their length, distance, or direction.

The vertex vectors actually position the vertices in a vector space. Other methods to compute such positions exist. For example, for graph drawing often spring-embedder algorithms minimizing an energy model are used. With an appropriate model, vertex distances are related to the cluster structure. This transforms graph clustering into a geometric clustering problem. For the modularity measure such layouts are computable from the LinLog energy model [62]. In energy minima of this model the distance between two clusters is roughly inversely proportional to their inter-cluster density. However, despite the provided global information this method is practically unusable as computing layouts is very expensive.

Linear Programming
Linear programming methods reformulate the clustering quality measure as a linear target function. As input to this function a vector of pairwise vertex distances is given. Vertices of the same cluster have, for example, zero distance and vertices in different clusters have distance one. The transitivity, reflexivity, and symmetry of clusterings is enforced through linear constraints similar to triangle inequalities.

Now additional constraints limit the vertex distances to the range from zero to one. This is known as fractional linear programming. Using interior-point methods, exact fractional solutions can be found in polynomial time. Agarwal and Kempe [2] proposed an agglomeration algorithm to round the distances to zero and one; thus basically the distances are used as a merge selector. As a side product, the fractional solution provides an upper bound for the highest possible modularity of a graph. Because the set of possible clusterings is a subset of all allowed distance vectors, no clustering can have a better modularity than the fractional solution.

Brandes et al. [13] used a similar method to derive a binary linear program. There the vertex distances are restricted to zero and one. Finding an optimum solution still is NP-complete and requires exponential time, but quite good solvers exist implementing this search given the target function and constraints. This allowed finding the globally optimal clusterings of graphs with up to 100 vertices.

Randomization
All optimization heuristics may become trapped in suboptimal clusterings. Randomizing the decisions a little bit is one method to avoid or escape


such situations. For example, this can be achieved by using random selection criteria or by adding a little bit of randomness to the selector.

One often used method is simulated annealing [67, 68, 39, 38]. Typically a clustering similar to the current clustering is constructed, e.g. by randomly moving a vertex to another cluster. The main idea is to direct the search by not accepting each generated clustering. Better clusterings are always accepted, but worse clusterings are only accepted with a probability depending on the modularity decrease ∆Q. Often the Metropolis criterion exp(T⁻¹ ∆Q) is applied. The parameter T controls the temperature of the algorithm. In the beginning the temperature is high and many modularity-decreasing moves are accepted. This enables a widespread search. Later the temperature is lowered, increasing the portion of modularity-increasing moves. With zero temperature only better clusterings are accepted, and thus the algorithm finishes in a local optimum.

Duch and Arenas [23] proposed a recursive bisection algorithm. The two clusters are computed using extremal optimization applied to an initial random bisection. Extremal optimization can be interpreted as a biased simulated annealing. The moved vertices are not chosen randomly but selected based on their current contribution to the modularity. This selection criterion is called vertex fitness. The idea is that moving low-fitness vertices also improves the clustering quality.

2.4.2 Constructing Components

This subsection presents various heuristics to construct clusterings. They use the selection criteria presented above to direct the construction. Dissection is one of the oldest strategies, often used in the analysis of social networks; it observes how graphs and clusters fall apart. More details are presented in the next paragraph. The second paragraph describes agglomeration methods, which observe how large clusters can be put together from smaller clusters. The last paragraph presents refinement methods that try to improve given clusterings by moving vertices between clusters.

Dissection
Dissection algorithms repeatedly remove vertices or edges from the graph and observe the connectivity components that develop. Because clusterings of the vertex set are searched, it is more common to remove edges. Removing an edge can split a cluster into at most two parts; thus the produced hierarchical clustering is a binary tree.

Girvan [32] proposed a dissection algorithm which removes edges which are between clusters and least central to any cluster. These are identified by counting how many shortest paths from arbitrary vertex pairs pass through each edge. Fewer edges are expected between clusters and thus more paths should touch them. After each removal all betweenness values are recalculated. Later, Girvan and Newman [59] proposed the random walk betweenness as an improved selector. In the same paper the modularity was introduced as a measure for the optimal number of clusters.

Agglomeration
Agglomeration methods grow clusters by merging smaller clusters. The process begins with each vertex placed in a separate cluster. In each step a pair of clusters is selected and merged. Greedy methods only merge pairs which increase


the global quality until no such pair is left. The merges are directed by a selector that tries to predict the optimal cluster structure. As the method alone is unable to correct wrong merges later, good predictors are necessary. These should lead to a faithful cluster structure with a high quality. As a side product, the merge history provides a hierarchical clustering.

One of the most used agglomeration methods is due to Ward [79]. He developed a hierarchical clustering method to assign US Air Force personnel to tasks while optimizing multiple objectives. By design, agglomeration is very similar to the graph coarsening used in multi-level graph partitioning algorithms [43, 78, 41].

Mark Newman was the first to use agglomeration to directly optimize the modularity measure [60]. He called it greedy joining and selected clusters by the maximal increase of modularity. Later the algorithm was accelerated by using sorted trees for faster lookups [15]. Recently, Wakita and Tsurumi [76] proposed a different implementation. Like [17], they observed an unbalanced growth behavior with the original selector. They introduced various consolidation ratios as relative sizes of cluster pairs; the quotient of modularity increase and consolidation ratio was then applied as selection criterion.

Cluster Refinement
Refinement methods improve an initial clustering by applying local modifications. They are an ideal way to correct small errors in the clustering. Generally, refinement methods construct several clusterings similar to the current one. The most common type of such a neighborhood is formed by moving single vertices into other clusters. As the next clustering, the best one from this neighborhood is selected.

Greedy refinement accepts only new clusterings with a better quality than the current one. Thus the refinement ends in a local optimum where all clusterings of the neighborhood have worse quality. Kernighan and Lin [45] proposed a very popular method that tries to escape such local optima by accepting small sequences of slightly worse clusterings. Fiduccia and Mattheyses [24] simplified this strategy by just considering single vertex moves instead of constructing and moving whole groups.

Most of these methods were developed for minimum cut graph partitioning. In the context of modularity clustering, simple refinement methods based on greedy and Kernighan-Lin refinement were proposed recently [57, 5, 69]. Unfortunately, no detailed study exists on how these refinements perform in modularity clustering, although the objective is quite different from minimum cut partitioning.

2.4.3 Meta-Strategies

Finally, meta-strategies combine construction methods from the previous subsection. The idea is to exploit the advantages of single components while compensating their disadvantages. The strategies discussed in the next paragraphs are recursive subdivision, multi-level approaches, basin hopping, and genetic algorithms.

Recursive Subdivision and Hierarchical Clusterings
Some algorithms can only find a fixed number of clusters, for example [80, 81, 58]. In order to find more clusters and better clusterings, the algorithms are applied recursively on each previously computed cluster. Subdivisions into exactly two groups are called bisection and


an example recursive bisection is shown in Figure 2.3a. The recursive application results in a hierarchical clustering.

[Figure 2.3: Recursive Subdivision and Hierarchical Clusterings. (a) Recursive Bisection, showing a first and a second bisection; (b) Hierarchical Clustering and Dendrogram. The red dashed line marks the cut through the dendrogram; the corresponding clustering is drawn with red circles.]

Like agglomeration methods, recursive subdivision is unable to correct errors. Moreover, the subdivision methods work with a fixed number of clusters. This imposes a structure on the clusterings which may hide the correct structure [57]. For example, a bisection of a graph with three clusters might falsely split one of the clusters into two parts. Further bisections will not correct this.

Also other algorithms producing hierarchical clusterings exist. For example, the merge steps in agglomeration methods and the connectivity components of dissection methods provide a cluster hierarchy. Such hierarchies are often drawn as a tree called a dendrogram. The root is a cluster containing all vertices and each leaf contains just a single vertex.

To derive a normal, flat clustering, a cut through this tree is searched, as shown in Figure 2.3b. The simplest variants select the level with the best modularity [59]. More elaborate implementations search for better clusterings by shifting the level up and down in subtrees [63]. Of course this search is inherently limited by the underlying tree.

Multi-Level Approaches
Multi-level methods were developed to improve the search range and speed of partition refinement methods by moving small vertex groups at once. These groups and the connections between the groups are represented as a coarse graph. In this graph it is sufficient for the refinement to move just single coarse vertices. Applying this idea recursively leads to multi-level refinement methods like [41, 78]. Karypis and Kumar [43] were the first to propose using agglomeration methods to already optimize the clustering quality during the construction of the coarse graphs. However, up to now no multi-level refinement algorithm exists that directly optimizes the modularity.


Basin Hopping
The basin hopping of Massen and Doye [52] is a high-level search algorithm. The algorithm proceeds by jumping between local optima. Like in simulated annealing, a neighboring local optimum is accepted as the new clustering based on the Metropolis criterion. These neighbor clusterings are found by first performing some random vertex moves to leave the current local optimum. Then greedy refinement, called quenching, is applied to move into a (hopefully new) local optimum. In summary, this strategy combines unbiased search methods with strongly directed local optimization.

Genetic Algorithms
Another form of high-level randomization are genetic algorithms. These store a pool of candidate clusterings. New clusterings are created by combining several old clusterings (crossing) or by modifying a clustering (mutation). The clusterings are rated according to a quality measure, which is called fitness in this context. Just the best, fittest clusterings survive while old ones are removed.

For minimum cut load balancing, Soper et al. [74, 71, 73, 72] developed a particularly good genetic algorithm called evolutionary search. In order to push the pool towards high-quality clusterings, they improve the genetically generated clusterings using greedy multi-level refinement. Crossing and mutation is implemented by temporarily modifying the edge weights of the graph based on the input clusterings. Afterwards the quality of the refined clustering is evaluated on the original graph and used as the fitness of the clustering.
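The basin-hopping loop described above can be sketched in a few lines. The following Python sketch is illustrative only and not the implementation of [52]; the helpers modularity and greedy_refine, the move parameters, and the clustering representation (a vertex-to-cluster dictionary) are assumptions.

    # Sketch of basin hopping: random perturbation, quenching by greedy refinement,
    # and Metropolis acceptance of the resulting local optimum. Illustrative only.
    import math
    import random

    def basin_hopping(vertices, initial_clustering, modularity, greedy_refine,
                      steps=100, perturb=5, temperature=0.01):
        current = greedy_refine(dict(initial_clustering))   # start in a local optimum
        best = dict(current)
        for _ in range(steps):
            candidate = dict(current)
            for _ in range(perturb):                         # random moves to leave the basin
                v = random.choice(vertices)
                candidate[v] = random.choice(list(set(candidate.values())))
            candidate = greedy_refine(candidate)             # "quenching" into a local optimum
            delta = modularity(candidate) - modularity(current)
            # Metropolis criterion: accept improvements always, deteriorations sometimes
            if delta >= 0 or random.random() < math.exp(delta / temperature):
                current = candidate
            if modularity(current) > modularity(best):
                best = dict(current)
        return best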


3 The Multi-Level Refinement Algorithm

This chapter presents the design of a family of algorithms for finding solutions to the modularity clustering problem. The input is an undirected graph with weighted edges, and the algorithms search for a clustering (partitioning) of the vertex set with high modularity. Finding clusterings with maximum modularity is known to be NP-complete [13]. Therefore the presented algorithms are heuristics with polynomially bounded runtime and are unlikely to find global optima.

Nowadays, agglomeration heuristics are quite often employed for modularity clustering. These operate by successively merging adjacent clusters until the clustering quality cannot be improved further. Most of them share one strong disadvantage: the merge decisions are based on insufficient local information, but wrong decisions are not corrected later.

On the other hand, refinement heuristics have the potential to correct these merge errors. In general, refinement heuristics construct a neighborhood of problem solutions similar to the current solution and select the one with the best quality out of these. A situation in which all neighbor solutions have worse quality is called a local optimum. In the context of graph clustering, the problem solutions are clusterings of the graph, and local cluster refinement searches for clusterings with higher modularity by moving single vertices between clusters. Unfortunately, all refinement methods require an initial clustering, and it is difficult to find good clusterings to start with. Furthermore, refinement is not effective in huge graphs where each vertex has only a tiny effect on the modularity. Many very small movements would be necessary to walk in and out of local optima. This makes it very unlikely to find other local optima by walking through a series of low-quality clusterings.

Both refinement problems are tackled by the multi-level approach presented in the next sections. It is based on the min-cut partitioning methods of Karypis [1] and basically widens and accelerates the refinement search by moving whole groups of vertices at once. These groups are constructed using agglomeration methods, which are presented in the section about graph coarsening. They are similar to Newman's greedy joining [60, 15] but use other merge selection criteria. The fourth section explores the available refinement methods. Those are mostly derived from Kernighan, Lin, Fiduccia, and Mattheyses [45, 24] but are adapted to the dynamic number of clusters present in modularity optimization. The employed local operations are summarized in Figure 3.1: merging clusters is exclusively used during graph coarsening, and during refinement vertex movements are applied. The last section completes the chapter with additional insight into key concepts of the implementation.


[Figure 3.1: Operations for local cluster modification (move to other cluster, move to new cluster, merge clusters). The operations move the cluster boundaries, displayed as dashed lines. This changes which edges are in the edge cut; these are highlighted by red lines.]

[Figure 3.2: The Multi-Level Scheme (input graph, coarsening, recursion or initial clustering, refinement).]

3.1 The Multi-Level Scheme

As already mentioned, the refinement heuristic requires a good initial clustering. Unfortunately, the number of clusters is also not known in advance. Assigning all vertices to one big cluster cannot be good because the number of clusters is much too small. Similarly, assigning each vertex to a separate cluster starts with far too many, and there is no good reason why a random clustering should be better. On the other hand, agglomerative clustering methods perform relatively well at finding good clusterings.

To accelerate and widen the refinement, the algorithm initially trades search range against accuracy by moving whole groups of vertices at once (cf. Figure 3.2). Afterwards, smaller groups or single vertices are moved for local improvements. The underlying data structures are unified by contracting these vertex groups into a coarse graph. This approach is applied recursively to produce a hierarchy of bigger groups and coarser graphs. Refinement is applied at each of these coarsening levels, beginning on the coarsest. The question of good vertex groups is coupled to the initial clustering: the coarsest groups already are a clustering and can be used


as starting point. Inversely, the coarsening can be interpreted as an agglomerative method which regularly stores snapshots as coarsening levels.

    Data: original graph, coarsener, refiner
    Result: clustering
    create level[1] from original graph;
    for l from 1 to max_levels do                          // coarsening phase
        clustering ← coarsener(level[l]);
        create level[l+1] from clustering;
        reduce edge and vertex weights level[l] → level[l+1];
        l_max ← l;
        if not successful then break;
    clustering ← vertices of level[l_max];                 // initial clustering phase
    for l from l_max down to 2 do                          // refinement phase
        project clustering onto level[l−1];
        clustering ← refiner(clustering);
    project clustering onto original graph;

Figure 3.3: The Multi-Level Algorithm

This approach is also shown in pseudo-code in Figure 3.3. It is known as multi-level partitioning and is a special form of the Algebraic Multi-Grid method (AMG). To simplify the implementation, the method is reformulated in three phases. This makes it easier to centrally select the implementations used for the coarsener and refiner. In the coarsening phase, vertex groups are created by merging clusters, beginning with singleton clusters, until the number of groups is satisfyingly reduced. From these groups the coarse graph is built. This is repeated until the graph cannot be coarsened further. The initial clustering phase simply uses the vertices of the coarsest graph as the initial clustering. Finally, during the refinement phase the clustering is successively projected onto the next finer graph and refined with vertex movements until the original graph is reached.

Implementation Notes
The vertex groups built during the coarsening phase are clusterings. They are stored as a mapping of vertices to their current cluster. To efficiently update this mapping on cluster merges, the disjoint-set data structure with path compression is used, like in many union-find algorithms. Thus for all practical graph sizes updates and lookups take constant time on average.

After each coarsening pass, the contracted graph is generated by registering a coarse vertex for each cluster. For each edge in the fine graph, an edge between the coarse vertices of its end-vertices is added if not already existing. With the necessary edge lookups this costs O(|V| + |E|²), or O(|V| + |E| log |E|) using a temporary search tree. Because in practice this proved to be a significant difference, the implementation uses a tree for each vertex to quickly search its incident edges.

Simultaneously, a mapping of fine-graph vertices to their coarse vertex and of fine edges to their coarse edge is built. Structurally this is a graph homomorphism respecting vertex adjacency. The homomorphism is stored together with the coarse graph in level[l+1] and allows transferring vertex and edge properties between both graphs.
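As a small illustration of the disjoint-set structure with path compression mentioned in the implementation notes above, the following Python sketch shows the two operations needed for cluster merging. It is illustrative only and not the thesis implementation; union by rank is omitted because the text only mentions path compression.

    # Minimal disjoint-set (union-find) with path compression: maps each vertex to
    # the representative of its current cluster. Illustrative sketch only.
    class DisjointSet:
        def __init__(self, vertices):
            self.parent = {v: v for v in vertices}    # each vertex starts as its own cluster

        def find(self, v):
            root = v
            while self.parent[root] != root:          # walk up to the representative
                root = self.parent[root]
            while self.parent[v] != root:              # path compression: re-link to the root
                self.parent[v], v = root, self.parent[v]
            return root

        def union(self, a, b):
            ra, rb = self.find(a), self.find(b)
            if ra != rb:
                self.parent[rb] = ra                   # merge the cluster of b into that of a
            return ra

    # Usage sketch: ds = DisjointSet(vertices); ds.union(u, v); cluster = ds.find(w)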


Transferring weights from the fine to the coarse graph requires adding up the weights and is therefore called reduction or restriction. Some care is necessary to transfer the weight of edges reduced to self-edges correctly, because two directed edges (one in each direction) are mapped onto one self-edge. Later, clusterings are transferred back from the coarse to the fine graph by assigning all fine vertices to the same cluster as their coarse vertex. This is called projection or prolongation. Thanks to the stored homomorphism, these transfers cost linear time in the size of the finer graph, i.e. O(|V| + |E|).

3.2 Graph Coarsening

Graph coarsening is the first phase of multi-level refinement methods. Its task is to group vertices together and build condensed, smaller graphs from these groups. The overall aim is to find a good initial clustering, which is indirectly given through the coarsest graph, and to build vertex groups that assist the refinement heuristics later. This section presents the implemented coarsening algorithms.

The coarsening is accomplished in two steps: first, an agglomeration heuristic produces a clustering of the vertex set. Then, with this clustering, the next coarsening level is built by contracting all vertices of each cluster into a coarse-graph vertex. The agglomeration heuristic successively merges pairs of clusters starting from one-vertex clusters. Each merge decreases the number of future coarse-graph vertices by one. The maximal number of merges is defined by the reduction factor multiplied with the number of initial vertices. Thus, ideally each coarsening pass reduces the number of vertices by a constant factor, leading to a logarithmic number of levels.

To find good coarse graphs, the process is controlled by a selection strategy: only pairs of adjacent clusters increasing the modularity are considered for merges, and these pairs are further ranked by a selection quality. Various selection qualities are discussed in the subsequent section about merge selectors.

The next subsections discuss the two implemented agglomeration methods, called greedy grouping and greedy matching. The first is based on Newman's greedy joining [60]. Using matchings is a more traditional approach from multi-level graph partitioning [43]. The following list summarizes all parameters controlling the coarsening. Their use is explained together with both algorithms in the next two sections.

coarsening method   The agglomeration method: either "grouping" or "matching".
reduction factor    Multiplied with the number of vertices; gives the number of merge steps to perform on each coarsening level. Indirectly defines the number of coarsening levels.
match fraction      Multiplied with the number of good vertex pairs; excludes the worst ranked pairs from the matching.
selector            The merge selection quality to use.
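To make the contraction into the next coarsening level and the weight reduction described above concrete, the following Python sketch builds a coarse graph from a clustering. It is illustrative only; the data layout (dictionaries keyed by vertex or vertex pair) is an assumption and not the thesis data structure.

    # Sketch: contract a clustering into a coarse graph with summed (reduced) vertex
    # and edge weights. Each undirected fine edge is stored once with u < v, so a
    # fine edge inside a cluster simply adds its weight to one coarse self-edge.
    from collections import defaultdict

    def contract(vertex_weight, edge_weight, cluster):
        # vertex_weight: {v: weight}; edge_weight: {(u, v): weight}; cluster: {v: cluster id}
        coarse_vertex = defaultdict(float)
        coarse_edge = defaultdict(float)
        for v, w in vertex_weight.items():
            coarse_vertex[cluster[v]] += w                 # reduction of vertex weights
        for (u, v), w in edge_weight.items():
            cu, cv = cluster[u], cluster[v]
            if cu == cv:
                coarse_edge[(cu, cu)] += w                 # becomes part of a coarse self-edge
            else:
                key = (min(cu, cv), max(cu, cv))
                coarse_edge[key] += w                      # reduction of edge weights
        return dict(coarse_vertex), dict(coarse_edge)

The cluster mapping itself plays the role of the stored homomorphism: projecting a coarse clustering back to the fine graph just composes this mapping with the coarse clustering.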


    Data: graph, selector, reduction_factor
    Result: clustering
    merge_count ← reduction_factor ∗ num_vertices(graph) + 1;
    while merge_count > 0 do
        // (a, b) ← selector.best_edge() in two steps:
        a ← selector.best_vertex();
        b ← selector.best_neighbor(a);
        if found (a, b) then
            clustering.merge(a, b), update graph and selector;
            merge_count ← merge_count − 1;
        else
            break;

Figure 3.4: Coarsening Method: Greedy Grouping

3.2.1 Greedy Grouping

In each step, greedy grouping searches the globally best ranked pair of clusters and merges both without compromises. Figure 3.4 summarizes the algorithm in pseudo-code. To quickly find the best ranked pair, the clustering is also represented by a temporary coarse graph. There each cluster is a vertex, and the best neighbor of each vertex is stored by the selector. Thus just the vertex with the best ranked neighbor is searched, and then its internally stored best neighbor is retrieved. The coarsening ends if either no good pair is found or enough clusters were merged.

During each merge the best-neighbor information has to be updated. This requires visiting the neighbors of the two merged clusters to update the selection qualities between them. Thus the temporary coarse graph is used to cache merged vertex and edge weights and the merged adjacency lists. Incrementally updating this graph leads to complex data structures and interactions but still is much faster than collecting the information in the original fine graph each time.

Implementation Notes
Computing the selection quality and updating the modularity requires access to edge and vertex weights. With each merge these weights have to be updated. This is achieved by informing the container data structures during the merge operation about elementary operations like merging and moving edges. Additionally, the merge selector is informed in order to update the best merge neighbor, as described in [76]: for finding the best merge neighbor of a vertex, the selection quality to all adjacent vertices has to be evaluated. But a new best neighbor has to be searched only when the selection quality to the old best neighbor decreases. In all other cases it is sufficient to compare the changed selection quality of a vertex pair to the current best neighbor.

With greedy grouping, the next best merge pair has to be found repeatedly. This is done efficiently with a binary heap stored in a supplementary array. The heap is updated when the best merge neighbor of a vertex changes. In this case the merge


selector informs the heap about the new neighbor and selection quality, and the vertex is moved up and down through the heap into its correct position. Empiric tests suggested that the runtime is already acceptable without the heap. Thus it is disabled by default to reduce possible sources of error and regression.

[Figure 3.5: Merging two Vertices and their Edges (labels in the figure: merge edges, move edge, new self-edge).]

A temporary coarse graph is maintained to store edge weights and adjacency information for the merge selector. In this graph each cluster is represented by a vertex. Merging two clusters equals merging their vertices and incident edges. This is known as edge contraction and is the most expensive operation of each merge step because the incident edges of both vertices have to be processed. Let the two involved vertices be the to-be-removed and the target vertex. For example, in Figure 3.5 the vertex b is to be removed and vertex a is the target. Edges to a common neighbor are merged by simply dropping the superfluous edge of the removed vertex. Edges to a vertex not adjacent to the target vertex have to be moved to it.

The merge data structures proposed by Wakita and Tsurumi [76] were implemented. The outgoing edges of each vertex are stored in a doubly-linked list sorted by the end-vertex index. Now both lists can be merged in one pass without searching the matching out-edges of the target vertex. Some special cases arise, however: updating self-edges is difficult because their position is not predictable, thus they are always stored as the first out-edge. Unfortunately, the original paper leaves open how to correct the position of moved edges in the lists of their end-vertices. The lists should be sorted by the end-vertex index, but moving an edge changes the index from the removed to the target vertex. The simplest fix is to reinsert such edges into the list, which costs linear time in the size of the list. Therefore the worst-case runtime O(|V|²) of this coarsening method is reached in case all |V| edges of the removed vertex have to be moved to the target vertex without merging edges.

The overall worst-case time complexity for a complete merge step is O(|V|²): selecting the best pair costs O(|V|) with a linear search or O(log |V|) using the heap. The two clusters are merged in constant time using union-find. Combining the edge lists requires quadratic time O(|V|²) in the worst case. Updating the best-neighbor information also requires a linear search at each adjacent vertex in the worst case, which adds quadratic time O(|V|²). In addition, updating the best-pair heap may cost up to O(|V| log |V|). However, in sparse graphs the number of incident edges often is much smaller than the total number of vertices, and in dense graphs nearly no edges require movement because they point to a common neighbor. Hence the average merge time should be nearly linear instead of quadratic.
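The edge contraction of Figure 3.5 can be sketched compactly when adjacency is kept in per-vertex dictionaries instead of the sorted doubly-linked lists of [76]. The following Python sketch is illustrative only; the data layout and the convention for self-edge weights are assumptions.

    # Sketch of contracting the edge (a, b): vertex b is merged into target vertex a.
    # adj: {v: {neighbor: edge weight}}, kept symmetric; vertex_weight: {v: weight}.
    def contract_edge(adj, vertex_weight, a, b):
        for n, w in adj[b].items():
            if n == a or n == b:
                # the edge between a and b (or a self-edge of b) becomes self-edge weight on a
                adj[a][a] = adj[a].get(a, 0.0) + w
            else:
                adj[a][n] = adj[a].get(n, 0.0) + w     # merge with an existing edge or move it
                del adj[n][b]
                adj[n][a] = adj[n].get(a, 0.0) + w
        adj[a].pop(b, None)
        del adj[b]                                      # vertex b disappears from the coarse graph
        vertex_weight[a] += vertex_weight.pop(b)        # reduce the vertex weights as well
        return a

With dictionaries, merging edges to a common neighbor and moving edges to new neighbors both cost expected constant time per incident edge, at the price of losing the sorted one-pass list merge described above.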


Related Work
Clauset et al. use search trees to store the incident edges [15]. Each edge of the removed vertex is removed from the tree of its end-vertex, and its entry is searched in the list of the target vertex. In case target and end-vertex are not already adjacent, the edge is inserted into both trees. Each vertex can have at most |V| adjacent vertices, thus merging two vertices with this method would cost O(|V| log |V|) time in the worst case.

Greedy grouping is structurally very similar to the minimum spanning tree problem with the selection quality as edge weights. But most selection qualities change their values dynamically depending on the merge steps. Still, some alternative techniques without directly contracting edges but deferring some operations to later steps might be applicable, cf. [27].

3.2.2 Greedy Matching

    Data: graph, selector, match_fraction, reduction_factor
    Result: clustering
    foreach pair of adjacent vertices (a, b) do            // collect possible merges
        if merging a, b would improve quality then
            sq ← selector_quality(a, b);
            add (a, b, sq) to pair_list;
            num_pairs ← num_pairs + 1;
    sort pair_list by decreasing sq;
    good ← match_fraction ∗ num_pairs + 1;
    merge_count ← reduction_factor ∗ num_vertices(graph) + 1;
    foreach pair (a, b) in pair_list do
        if a and b not marked then
            mark a and b;
            merge clusters of a and b;
            merge_count ← merge_count − 1;
            good ← good − 1;
            if merge_count = 0 ∨ good = 0 then break;

Figure 3.6: Coarsening Method: Greedy Matching

The greedy matching method operates by first constructing a matching, i.e. a list of independent vertex pairs; the next coarsening level could be directly generated from these pairs. The pseudo-code is shown in Figure 3.6. In the first phase, all pairs of adjacent vertices are collected and sorted by their selection quality. Pairs not improving the modularity are filtered out in advance. This list is processed beginning with the best ranked pairs. Merged vertices are marked and pairs with


at least one already marked vertex are ignored. When a pair of unmatched vertices is found, they are merged.

Unfortunately, the matching also selects many badly ranked pairs even when their vertices have much better but already matched neighbors. Especially at the end of a long coarsening pass, bad pairs would be grouped together just because all good pairs are already used up. The purpose of the "good" counter is to avoid this by considering only a fraction of the best ranked pairs.

The matching method also has a second downside: it is possible to miss the reduction aim when most vertices have only few common merge partners which are already matched. For example, if a vertex has many neighbor vertices which have no further neighbors besides this central vertex, many non-independent merges are necessary. In that case many coarsening levels will be produced. This situation also occurs in practice, for example with graphs which feature a power-law vertex degree distribution [1].

Constructing the matching adds some complexity to the algorithm compared to the simple-looking greedy grouping method. But thanks to the independence of the merged pairs, no updated selection qualities are required and the dynamic coarse graph can be omitted in favor of static data structures. However, for simplicity this implementation reuses most data structures from the grouping method.

Producing all coarsening levels has time complexity O(log |V| · |E| log |E|) on well-structured graphs: there are around log |V| coarsening levels, and on each level the edges have to be sorted by the selection quality, which costs O(|E| log |E|). The actual merge operations are done in constant time because no complicated updates are necessary.

Related Work
For a theoretical analysis of approximation guarantees, other matchings than the implemented greedy strategy might be more useful. For example, Monien et al. [54] developed more elaborate greedy matching strategies for min-cut partitioning and were able to derive approximation factors. Also, optimal matchings are computable in polynomial time using Edmonds' algorithm [55].

3.3 Merge Selectors

The merge selector is used by the coarsening algorithms to select cluster pairs for merging. The clusters and their relation are encoded in a coarsened graph in which each cluster is represented by a vertex. Each edge in this graph (except self-edges), together with its two end-vertices, embodies such a candidate merge pair. The merge selector ranks these pairs by computing a selection quality, and pairs with high selection quality are selected first.

The selector's task is to lead to a good hierarchy of coarse graphs supporting the subsequent cluster refinement. In a good hierarchy, the coarsest graph provides an initial clustering of high modularity. Secondly, all coarsening levels shall provide evenly grown vertex groups with relatively similar influence on the modularity of the clustering. Otherwise the refinement will only have move options changing the modularity either too much or too little, which would obstruct the search.
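The interface that greedy grouping (Figure 3.4) expects from a merge selector can be sketched as follows. This Python sketch is illustrative only: the cached best-neighbor entries mirror the implementation notes of Section 3.2.1, quality(a, b) stands for any of the selection qualities discussed below, and the greedy filter (only modularity-improving pairs) is omitted for brevity.

    # Sketch of a static merge selector: each coarse vertex caches its best ranked
    # neighbor, and the globally best pair is found by scanning those entries.
    class BestNeighborSelector:
        def __init__(self, neighbors, quality):
            self.neighbors, self.quality = neighbors, quality
            self.best = {}                                  # vertex -> (quality, neighbor)
            for a in neighbors:
                self.update(a)

        def update(self, a):
            # re-evaluate the selection quality of a towards all adjacent clusters
            candidates = [(self.quality(a, b), b) for b in self.neighbors[a]]
            self.best[a] = max(candidates, default=(float("-inf"), None))

        def best_vertex(self):
            # vertex whose cached best neighbor has the highest selection quality
            return max(self.best, key=lambda a: self.best[a][0], default=None)

        def best_neighbor(self, a):
            return self.best[a][1]

After a merge, only the entries of the merged vertex and its neighbors would need to be updated, which is exactly the locality exploited by the implementation.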


Extent        Name                              Description
local         modularity increase               highest increase of modularity
local         weight density                    highest density between both clusters
semi-global   random walk distance (t)          best neighborhood similarity with t random walk steps
semi-global   random walk reachability (t, i)   highest weight density with the reachability of neighbor vertices as weight; with t steps and i iterations
global        spectral length                   approx. best modularity contribution in the spectral decomposition
global        spectral length difference        approx. highest modularity increase in the spectral decomposition
global        spectral angle                    most similar direction of the spectral vertex vectors

Table 3.1: Proposed Merge Selection Qualities

Both aims are pursued by trying to predict which vertex pairs lie in the same cluster of the unknown optimum clustering. In this setting the selection qualities are the predictors. Their prediction quality (success probability) can be roughly measured by comparing the initial ranking of pairs against the best known clustering. The second aim may be fulfilled by proposing pairs well spread over the graph and not concentrated in a single region. But other factors may be more important, like for example the question when merge errors (merging vertices of different optimum clusters) occur during the coarsening. Ultimately, the only reliable measures of success are the modularity clustering qualities reached after coarsening and after refinement.

The presented selection qualities are classifiable by the extent of incorporated information into local, semi-global, and global measures. Local measures use only information about the two adjacent vertices and their connecting edge. Semi-global measures try to incorporate more information from the neighborhood of both vertices, and global measures consider the topology of the whole graph at once.

Non-local selectors can differ in how they are updated during merge steps. Dynamically updated selectors recompute the selection qualities after each merge, whereas static selectors just use simple approximations on the intermediate coarse graphs. All presented selection qualities are static. The key properties are computed once at the beginning of each coarsening pass; during single merge steps, values are merged and updated just locally.

Table 3.1 summarizes the proposed selection qualities. They are discussed in more detail in the following sections, beginning with local selectors, followed by random walks and spectral methods. Each of these sections is subdivided into a short introduction of the key ideas and a presentation of the theoretical background, and concludes with the discussion of the derived selection qualities.


3.3.1 Local Merge Selectors

Studying the change of modularity during merges in the context of greedy agglomeration leads to a constraint depending on local properties. Two selection qualities, modularity increase and weight density, are derived from this constraint. The following sections present the theoretical background and discuss both selection qualities.

Theoretical Background
As mentioned earlier, clusters are represented by temporary coarse vertices. As only adjacent clusters are considered for merging, both vertices of each merge pair are connected by an edge. Therefore the available local information about each pair is the edge weight f(a, b), the volume Vol(a, b), and the expected edge weight ρ(V) Vol(a, b). In addition, properties of both clusters like the interior edge weight f(a, a) and the interior volume Vol(a, a) are locally available.

Merging two clusters a, b changes the modularity Q(C) to Q(C′), and greedy agglomeration requires increasing quality with Q(C′) > Q(C). Inserting the modularity definition (unnormalized, which is simpler to read and ranking-equivalent) Q(C) = Int_f(C) − ρ(V) Int_Vol(C) gives the constraint f(a, b) − ρ(V) Vol(a, b) > 0, because the merge adds f(a, b) to the interior edge weights. As visible, this constraint depends just on local properties and may be used to derive selection qualities.

Modularity Increase
In order to reach a good final modularity, it sounds reasonable to merge the pair with the highest quality increase Q(C′) − Q(C) in each step. The calculation is simplified by removing the clusters where nothing changes. This leaves the contribution of both clusters before and after the merge. These differ just by the removed edge weight and volume between both. Thus the modularity increase is:

    Q(C′) − Q(C) = f(a, b) − ρ(V) Vol(a, b)        (3.1)

This selection quality is also a form of the greedy constraint from above and was already proposed by Newman [60] for his agglomerative clustering algorithm. Afterwards several authors (including himself) observed problems with the growth behavior triggered by this selector and proposed various fixes [17, 76].

The dynamic behavior can be described as follows: the cluster with the largest contribution to modularity is merged with its best neighbor. This further improves the strength of this cluster and the improvements possible by merges with small neighbors. The process goes on until all good neighbors of the optimum clustering are used up, but then continues with bad neighbors until the influence of this huge cluster is low enough again to leave other merge pairs a chance. The whole cycle repeats several times, producing clusters with an exponential distribution of the cluster size.

The proposed corrections involve scaling the selection quality by the inverse cluster sizes to prefer merging either small clusters or pairs of similar size. But the real problem are the merge errors introduced in the second phase of shrinking influence, and in many situations vertices and clusters belong together in spite of their very different size. Therefore these modifications were not implemented, in favor of the selection quality presented in the next section.
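For concreteness, the modularity increase of equation (3.1) and the greedy filter can be sketched as follows. The lookups coarse_edge and coarse_vol and the global density rho are assumptions standing for the cached weights of the temporary coarse graph; this is an illustration, not the thesis implementation.

    # Sketch of the modularity increase selection quality (3.1) on the coarse graph.
    def modularity_increase(coarse_edge, coarse_vol, rho, a, b):
        # f(a, b): edge weight between the two adjacent clusters; rho: global density rho(V)
        return coarse_edge[(a, b)] - rho * coarse_vol(a, b)

    def is_greedy_candidate(coarse_edge, coarse_vol, rho, a, b):
        # greedy constraint: only pairs with a positive increase are merge candidates
        return modularity_increase(coarse_edge, coarse_vol, rho, a, b) > 0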


Weight Density
The greedy constraint for merge pairs was f(a, b) − ρ(V) Vol(a, b) > 0. Selectors are derived from such inequalities by using the left side as selection quality and the right side as filter condition. As the greedy filtering is already done for all selectors, it can be ignored here. Other selection qualities are obtained by transforming the inequality.

Adding the expected edge weight on both sides yields f(a, b) > ρ(V) Vol(a, b). This resembles heavy edge matching [43] and ignores the volume model completely; hence it most probably is not suitable for modularity clustering. But now dividing the constraint by the volume Vol(a, b) results in:

    f(a, b) / Vol(a, b) > ρ(V)        (3.2)

This suggests selecting pairs by the highest inter-cluster density and is implemented as the weight density selector. A nice advantage of this selector is its consistency with the overall density-based approach.

In each merge step these local selection qualities are recalculated from the merged edge and vertex weights. The weights of merged edges are summed and the coarse-vertex weight is the sum over both vertices. From the definition it follows that only the values to adjacent vertices can change.

3.3.2 Random Walks

In addition to using direct information from the vertices, also their neighborhoods can be compared. Ideally, the neighborhoods of vertices in the same optimum cluster are more similar than those of vertices in different clusters, because more edges are inside clusters than between clusters. But the unweighted, graph-theoretic neighborhood ignores edge weights. In extreme cases the neighborhood is the whole graph and thus provides no information.

Note that already the assumption of more edges lying inside optimum clusters than between them does not always hold, because with density-based clustering only the difference to the expected edges matters. A cluster border could lie along a cut where the absolute number of edges is higher than everywhere else but still much lower than expected.

A better alternative is to compute weighted neighborhoods using random walks. Two random walks are started at both vertices and aborted after a small number of steps. The probabilities to visit another vertex are computed deterministically and result in two probability distributions. Basically, the selection quality is then defined as the Euclidean distance of both distributions. The random walk distance selector is based on this idea, refined by ideas from Latapy and Pons [64] as described further below.

On the other hand, the random walk reachability selector based on [40] does not directly compare the neighborhoods. It uses random walks to measure how well a vertex is reachable from another one walking over short paths. This is driven by the


assumption that vertices of the same optimum cluster are connected over several paths while only few paths connect vertices of different clusters.

Theoretical Background
This paragraph introduces a formal definition of random walks and discusses basic properties. The most important one is the connection between the connection density from Section 2.3.3 and the convergence of random walks: random walks converge to a stationary distribution of visit probabilities, but the so-called mixing rate of this convergence is coupled to the minimum scaled cut. This is the lowest inter-cluster density over all bisections of the graph, and a low density relative to the global density leads to slow convergence. Thus it can be deduced that random walks pass local barriers of lower density less often than high-density areas.

Unfortunately, this connection is far from ideal. Firstly, the results apply just to the degree multiplicity volume model; there is no apparent way to adapt random walks to other volume models. And finally, the minimum inter-density (scaled cut) is not the modularity quality measure (shifted cut). The optimum clusterings of both may differ significantly.

A random walk starts at a vertex and walks from vertex to vertex by randomly choosing a neighbor. The transition probability for moving from vertex u to v is proportional to the edge weight connecting both vertices, i.e. p(u, v) = f(u, v)/deg(u). Instead of actually performing walks, just the probability to visit certain vertices is calculated. Let P_u^t(v) be the probability of visiting vertex v after t steps starting from u. It is calculated incrementally from the initial distribution P_u^0(v) = δ_{u,v} with steps P_u^t(v) = Σ_{w∈V} P_u^{t−1}(w) p(w, v).

For example, Figure 3.7 depicts the contributions of the neighbor vertices to the visit probability of the central vertex v. Edges leaving the vertices in other directions are not shown.

[Figure 3.7: Contribution of Neighbor Vertices to the Visit Probability. The visit probability of v collects the terms P^{t−1}(v) p(v, v), P^{t−1}(w_1) p(w_1, v), and P^{t−1}(w_2) p(w_2, v).]

The visit probabilities are not symmetric, i.e. P_{v_0}^t(v_t) ≠ P_{v_t}^t(v_0). But both are related by the start and end vertex degrees with deg(v_0) P_{v_0}^t(v_t) = deg(v_t) P_{v_t}^t(v_0). To show this relation, the visit probability is expanded into single steps:

    P_{v_0}^t(v_t) = p(v_0, v_1) · · · p(v_{t−1}, v_t)        (3.3)

The single transition probabilities contain edge weights, which are symmetric, and factors 1/deg(v_i), which can be shifted to the previous transition. The ratio deg(v_t)/deg(v_0) accounts for the inserted and the removed factor at both ends.
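The deterministic propagation of visit probabilities described above can be sketched directly from the recursion. The Python sketch below is illustrative only; adj is an assumed adjacency map {w: {v: f(w, v)}}, and the skipping of vertices with zero probability mirrors the implementation notes later in this section.

    # Sketch: propagate visit probabilities for `steps` random walk steps starting
    # at `start`, returning P^t and the cumulative P^{<=t} (summed from step 1 to t).
    def visit_probabilities(adj, start, steps):
        deg = {w: sum(nbrs.values()) for w, nbrs in adj.items()}
        prob = {v: 0.0 for v in adj}
        prob[start] = 1.0                                  # P^0_start(v) = delta(start, v)
        cumulative = {v: 0.0 for v in adj}
        for _ in range(steps):
            nxt = {v: 0.0 for v in adj}
            for w, nbrs in adj.items():
                if prob[w] == 0.0:                         # skip vertices without probability
                    continue
                for v, weight in nbrs.items():
                    nxt[v] += prob[w] * weight / deg[w]    # transfer P^{t-1}(w) * p(w, v)
                prob = prob                                # (no-op, kept for readability)
            prob = nxt
            for v in adj:
                cumulative[v] += prob[v]
        return prob, cumulative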


The probability distribution P^t is also a vector over all vertices. In this context a random walk step is the matrix-vector product P^t = M P^{t−1} with the transition matrix M_{v,w} = p(w, v). The distribution P*(v) = deg(v)/(2 f(V)) is the unique stationary distribution with P* = M P*. The proof uses (M P*)(v) = Σ_w P*(w) p(w, v). Inserting the definitions, the vertex degrees deg(w) cancel out. This leaves Σ_w f(w, v)/(2 f(V)), which equals P*(v) thanks to the symmetry of the edge weights.

In non-bipartite, connected graphs, long random walks converge to the stationary distribution regardless of the starting point. Thus the visit probability of a vertex becomes proportional to its degree. The mixing rate measures the speed of convergence to this limit.

The evolution of the probability distribution can be expressed in terms of the symmetric matrix N = D^{−1/2} M D^{1/2} using eigenvalue decomposition. The eigenvalues are 1 = λ_1 > λ_2 ≥ . . . ≥ λ_n > −1 with eigenvectors v_1, . . . , v_n. Then the visit probability in spectral form is:

    P_i^t(j) = P*(j) + Σ_{k=2}^{n} λ_k^t v_{k,i} v_{k,j} √(deg(j)/deg(i))        (3.4)

If all eigenvalues are near zero, i.e. max(|λ_2|, |λ_n|) is small, the second term quickly disappears. In that case the distribution converges fast and the mixing rate is high.

The next step uses a known relation between the eigenvalue gap 1 − λ_2 and graph conductance. The conductance of a bisection S, V − S of a graph G = (V, E) is defined as:

    Φ(S) := f(S, V − S) / (2 f(V) P*(S) P*(V − S))        (3.5)

The numerator is the probability of switching between both sets using a random walk started on the stationary distribution, and the denominator is the same probability in a sequence of independently, randomly chosen vertices. Inserting the definition of P* and using the clustering C = {S, V − S}, this formula equals:

    Φ(S) = Cut_f(C) / (ρ(V) Cut_Vol(C))        (3.6)

The conductance of a graph is defined as the minimum over all possible bisections. This, however, is nothing else than the minimum normalized inter-cluster density, which is also called the scaled cut:

    Φ = min_C ρ(Cut(C)) / ρ(V)        (3.7)

Now the discrete version of Cheeger's inequality from differential geometry states that Φ²/8 ≤ 1 − λ_2 ≤ Φ. For a proof see [49]. Therefore low inter-density leads to a small eigenvalue gap and thus to slow mixing.

In bipartite graphs the vertices are divided into two groups and all edges run between both groups. Thus with each step the visit probability completely swaps between both. This is also visible in the spectral form, where the most negative eigenvalue λ_n is −1. Taking two adjacent vertices, their visit probability distributions will


not overlap for a fixed number of steps. Hence, to obtain comparable distributions, the probability to visit a vertex in at most t steps, P_u^{≤t}(v) = Σ_{i=1}^{t} P_u^i(v), is used as in [40].

Random Walk Distance
Latapy and Pons [64] propose short random walks to define a measure of vertex distance. They postulate that if two vertices lie in the same community (optimum cluster), the probabilities of reaching each other, P_u^t(v) and P_v^t(u), are high, and both vertices have a similar neighborhood with P_u^t(w) ≃ P_v^t(w). Thus a low distance is expected between both distributions. They observe from the stationary distribution that high-degree vertices are preferred by random walkers. To remove this influence they propose the following Euclidean distance weighted by deg(w):

    d(u, v) = √( Σ_w (P_u(w) − P_v(w))² / deg(w) )        (3.8)

To account for bipartite situations, the implemented random walk distance selector uses d^{≤3}(u, v), the distance of the sums over walks of length at most 3.

During merge operations the distance of merged edges is updated by taking the maximum. This is also known as complete linkage in the literature. It corresponds to using the largest distance between all vertices of both vertex groups and is consistent with the aim to avoid merge errors. The extreme opposite, for example, is using the shortest distance (single linkage). There it is possible to merge very distant vertices by chaining a sequence of intermediate vertices of much lower distance.

Random Walk Reachability
The random walk reachability selection quality is similar to the weight density f(a, b)/Vol(a, b) but uses modified edge weights. Random walks are used to compute weights which become weaker for edges crossing low-density cuts and stronger for edges in high-density groups. It is assumed that vertices in the same optimum cluster are reachable over many paths. Thus, as starting point, the probability to visit one end-vertex starting from the other end-vertex without returning back to the start vertex is used. This is very similar to the escape probability of Harel and Koren [40].

To obtain actual edge weights, these probabilities are scaled with the start vertex degree. Applying the weighted symmetry of inverse paths yields:

    r^t(u, v) = ( deg(u) P_u^t(v) + deg(v) P_v^t(u) ) / 2        (3.9)

Thus after the first step the weight r^1(u, v) = deg(u) f(u, v)/deg(u) equals the original edge weight. Additional random walk steps add weight to the edge proportional to the strength of alternative paths to this neighbor. The paths returning back to the start vertex are suppressed by setting P_u^{≥1}(u) = 0 in each step. To account for local bipartite situations, again the probabilities are summed over a small number of steps with r^{≤t}(u, v). Because this reachability is so similar to normal weights, the whole calculation can be applied to the result again, leading to a reinforcing feedback [40].
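Building on the visit_probabilities helper sketched earlier, the two random-walk quantities derived above can be written down as follows. This is an illustrative Python sketch only: the suppression of returns to the start vertex, the summation r^{≤t}, and the iterated reapplication are omitted.

    # Sketch of the random walk distance (3.8) and reachability weight (3.9).
    import math

    def random_walk_distance(adj, u, v, steps=3):
        deg = {w: sum(nbrs.values()) for w, nbrs in adj.items()}
        _, p_u = visit_probabilities(adj, u, steps)        # cumulative P_u^{<=t}
        _, p_v = visit_probabilities(adj, v, steps)        # cumulative P_v^{<=t}
        return math.sqrt(sum((p_u[w] - p_v[w]) ** 2 / deg[w] for w in adj))

    def random_walk_reachability(adj, u, v, steps=3):
        deg = {w: sum(nbrs.values()) for w, nbrs in adj.items()}
        p_u, _ = visit_probabilities(adj, u, steps)        # P_u^t
        p_v, _ = visit_probabilities(adj, v, steps)        # P_v^t
        return 0.5 * (deg[u] * p_u[v] + deg[v] * p_v[u])

In the selectors, the distance of merged edges is combined with the maximum (complete linkage), while the reachability weight takes the place of f(u, v) in a density of the form r(u, v)/Vol(u, v).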


As selection quality, the density r^{t,i}(u, v)/Vol(u, v) is used. Here r^{t,i} is the weight from i iterative applications of the reachability computation with random walks of at most t steps. While merging vertices, the edge weights r^{t,i} of merged edges are summed and the densities of all incident vertex pairs are recomputed.

Implementation Notes
The visit probabilities P_u^{≤t}(v) are computed separately for each source vertex u using two vectors storing P_u^t(v) and the new P_u^{t+1}(v) of each target vertex v. A random walk step requires iterating over all edges and transferring a portion of the visit probability from the edge's start-vertex in the old vector to the end-vertex in the new vector. The selection qualities are computed from the intermediate vectors between the steps. In order to avoid expensive memory management, the two vectors are reused by swapping source and target vector between each step.

The worst-case time complexity for the reachability selector is O(|E| |V| i t), as for each vertex and each step all edges have to be visited. The implementation tries to improve the speed by only processing edges when their start-vertex has a non-zero visit probability. Thus the computation time depends on the typical size of the neighborhoods. Still, on complete graphs the worst case is reached after one step.

The complexity of the random walk distance selector is even worse. Here the worst case is O(|E|² |V| t), as for each pair of adjacent vertices two probability vectors have to be computed. On large graphs it is impossible to store and reuse these vectors for all vertices. Hence the implementation tries to save time by reusing the vector of one vertex and processing all its neighbor vertices in a row.

3.3.3 Spectral Methods

Random walks as described above collect semi-global information about the density structure of the graph. But the relation to modularity clustering is very indirect and holds just for one specific volume model. Some applications may require other volume models, and random walks do not work well on some kinds of graphs. Thus the search for non-local selection qualities which are fully compatible with the modularity continues.

The approach of this section is based on the spectral methods described by Newman [57]. The modularity computation is rewritten in a matrix-vector form and the matrix is replaced by its decomposition into eigenvalues and eigenvectors. The eigenvectors of the strongest positive eigenvalues are then used to define vertex vectors. These describe the contribution of each vertex in a higher-dimensional space where the modularity is improved by maximizing the length of the vector sum in each cluster.

The eigenvalues and eigenvectors are calculated at the beginning of each coarsening level. Given the vectors of two adjacent vertices, the spectral length and spectral length difference selectors analyze the length of the vertex vectors. On the other hand, the spectral angle selector uses the directions of these vectors. The following subsections present the mathematical derivation of the vertex vectors and conclude with a more detailed description of the selection qualities and implementation notes.


[Figure 3.8: Spectral vertex vectors and two cluster vectors; the cluster vector $\tilde{X}_2 = \tilde{x}_u + \tilde{x}_v$ is the sum of its vertex vectors $\tilde{x}_u$ and $\tilde{x}_v$.]

Theoretical Background  The modularity can also be described as a sum over all vertex pairs using the Kronecker delta $\delta(C(u), C(v))$ as a filter, with $\delta(C(u), C(v)) = 1$ only for pairs of the same cluster and zero elsewhere. This representation of the modularity is:

\[ Q(C) = \sum_{u,v} \left[ \frac{f(u,v)}{f(V)} - \frac{\operatorname{Vol}(u,v)}{\operatorname{Vol}(V)} \right] \delta(C(u), C(v)) \qquad (3.10) \]

The inner part of the sum is replaced by the modularity matrix $M_{u,v} = f(u,v)/f(V) - \operatorname{Vol}(u,v)/\operatorname{Vol}(V)$. For each cluster $i \in C$ the delta is separately replaced by a binary vector $s_i$ over all vertices with $[s_i]_v = \delta(C(v), i)$. The contribution of cluster $i \in C$ thus is $s_i^T M s_i$. The spectral decomposition of the modularity matrix is $M = U D U^T$ with the matrix of eigenvectors $U = (u_1 | \ldots | u_n)$ and the diagonal matrix of eigenvalues $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. This is inserted into the modularity formula and the calculation is separated per eigenvalue:

\[ Q = \sum_{i \in C} s_i^T M s_i = \sum_{i \in C} (U^T s_i)^T D\, (U^T s_i) = \sum_{i \in C} \sum_{j=1}^{n} \lambda_j (u_j^T s_i)^2 \qquad (3.11) \]

The previous step exploits that $u_j^T s_i = \sum_{v \in C_i} U_{v,j}$ is a scalar. As the next step, the eigenvalues shall be moved into the sums contained in the squares by taking the square roots $\sqrt{\lambda_j}$. Unfortunately this is difficult with negative eigenvalues. Thus the eigenvalues and the inner sum are split up into positive and negative eigenvalues. Without loss of generality $\lambda_1 \ge \cdots \ge \lambda_p > 0 \ge \lambda_{p+1} \ge \cdots \ge \lambda_n$, and thus:

\[ Q = \sum_{i \in C} \left( \sum_{j=1}^{p} \left(\sqrt{\lambda_j}\, u_j^T s_i\right)^2 - \sum_{j=p+1}^{n} \left(\sqrt{-\lambda_j}\, u_j^T s_i\right)^2 \right) = \sum_{i \in C} \left( \|X_i\|_2^2 - \|Y_i\|_2^2 \right) \qquad (3.12) \]

The last step replaced the sums by squared Euclidean norms of the cluster vectors $X_i$ and $Y_i$. This is achieved by moving the eigenvalues further down into the eigenvectors, forming vertex vectors $x_v = (\ldots, \sqrt{\lambda_j}\, U_{v,j}, \ldots)^T$ with $j = 1, \ldots, p$. The vertex vectors $y_v$ are analogously constructed from all negative eigenvalues and their eigenvectors. The former scalar product with $s_i$ is transformed into $X_i = \sum_{v \in C_i} x_v$ and $Y_i = \sum_{v \in C_i} y_v$, respectively.

Therefore the cluster vectors $X_i$ are formed by adding the vectors $x_v$ of all vertices contained in the cluster. It becomes visible that the modularity is maximized by grouping vertices together with maximal $X_i$ and minimal $Y_i$ lengths.


In the vertex vectors the eigenvectors are weighted with the square root of their eigenvalues. Thus only eigenvectors with a large eigenvalue have a significant influence. Moreover it is assumed that using just the positive eigenvalues suffices for the maximization. Thus all negative and weak positive eigenvalues are dropped from now on. This is controlled by the two parameters in the list below. The resulting approximate vertex vectors are denoted by $\tilde{x}_v$. An example of the relation between vertex and cluster vectors is shown in Figure 3.8.

spectral ratio  Multiplied with the largest eigenvalue, defines the lower limit for accepted eigenvalues.

spectral max ev  The maximal number of eigenvalues and eigenvectors to compute.

Another important property besides the length of the vertex and cluster vectors is their direction, because in the optimum clustering the cluster vectors have to be orthogonal. This follows from the increase of vector length (and modularity) when merging two adjacent clusters whose vectors are separated by an angle below 90 degrees. Thus in $m$ dimensions at most $m + 1$ clusters are representable. Hence in practice it is necessary to compute as many eigenvectors as expected clusters.

Spectral Length and Length Difference  From the spectral analysis of the previous section it follows that the modularity is improved by maximizing the length of each cluster vector $\tilde{X}_i = \sum_{v \in C_i} \tilde{x}_v$. The first idea was to select cluster pairs $a, b$ with the longest vector sum $\|\tilde{X}_a + \tilde{X}_b\|_2$. Hence this selection quality is called spectral length.

On second sight this selection quality is related to the modularity increase selector, as was already pointed out by Newman in [57]. The modularity increase in spectral decomposition is the contribution of the new cluster, $\|X_a + X_b\|_2^2 - \|Y_a + Y_b\|_2^2$, minus the previous contribution of both clusters, $\|X_a\|_2^2 + \|X_b\|_2^2 - \|Y_a\|_2^2 - \|Y_b\|_2^2$. Using just the positive eigenvalues results in the spectral length difference selector:

\[ 2 \tilde{X}_a^T \tilde{X}_b = \|\tilde{X}_a + \tilde{X}_b\|_2^2 - \left( \|\tilde{X}_a\|_2^2 + \|\tilde{X}_b\|_2^2 \right) \qquad (3.13) \]

The relation of both selection qualities to the modularity increase selector also inherits its disadvantages in the growth behavior. Looking at a long vector, an adjacent long vector is ranked higher than all short vectors even when it points in a nearly orthogonal direction. Since the vector length roughly correlates with the vertex degree, high-degree vertices will be grouped together early, leading to merge errors. Measuring the change in vector length instead reduces this domination but is nevertheless similar to the modularity increase selector. It is not directly obvious why dropping negative and weak eigenvalues should provide a better insight into the selection problem.

Spectral Angle  Instead of the misleading vector length, the direction of the vectors can be used. The spectral angle selector prefers pairs of adjacent clusters where the cluster vectors have the same direction, i.e. small angles. The angle is computed using the cosine:

\[ \cos(\tilde{X}_a, \tilde{X}_b) = \frac{\tilde{X}_a^T \tilde{X}_b}{\|\tilde{X}_a\|_2\, \|\tilde{X}_b\|_2} \qquad (3.14) \]


A conversion into angles is not necessary because orthogonal vectors have zero cosine and small angles are near the value 1, which suffices for the ranking.

Implementation Notes  For the computation of the eigenvalues and eigenvectors the C++ interface of ARPACK [48] is used. Since the modularity matrix is symmetric and only the largest eigenvalues are searched, it suffices to provide an implementation of the matrix-vector product $Mx$.

The modularity matrix $M$ itself is dense and would require $O(|V|^2)$ multiplications per evaluation. Fortunately the product can be computed much faster by exploiting the inner structure of the matrix and the multiplicity volume model: $M = f(V)^{-1} A - \operatorname{Vol}(V)^{-1} w w^T$ with the adjacency matrix $A_{u,v} = f(u,v)$ and the vector of vertex weights $w$. For example, in the degree multiplicity model the weights are $w(v) = \deg(v)$ and $[w w^T]_{u,v} = \deg(u)\deg(v) = \operatorname{Vol}(u,v)$. The required matrix-vector product thus becomes $Mx = f(V)^{-1} A x - \operatorname{Vol}(V)^{-1} w (w^T x)$. The adjacency matrix is sparse when the graph is sparse, and the volume term reduces to a scaled vector because $w^T x$ is a scalar. Therefore just $O(|E| + 2|V|)$ multiplications are required.

All vertex vectors have the same size and thus are stored in a simple memory pool without any overhead. On the outside a specialized mapping from vertices to the position of their vector is used. It transparently wraps the vectors into more convenient temporary objects on access.

On merges the vectors of both coarse vertices are simply added. Then the selection qualities of all incident edges are recalculated.
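The following small, self-contained C++ sketch illustrates how the rank-one structure keeps this product cheap; it is independent of the thesis code and of ARPACK and evaluates $y = Mx$ from a sparse edge list and the weight vector without materializing the dense matrix. The data types, the helper names, and the placeholder normalization constants fV and volV in the toy example are assumptions; in the actual implementation these values come from the graph and the chosen volume model.

#include <cstddef>
#include <cstdio>
#include <vector>

// One stored edge of the sparse symmetric adjacency: the weight f(u,v).
struct Entry { int u, v; double f; };

// y = M x for the modularity matrix M = A / f(V) - w w^T / Vol(V),
// evaluated without ever forming the dense matrix.
void modularityMatVec(const std::vector<Entry>& edges,
                      const std::vector<double>& w,   // vertex weights, e.g. degrees
                      double fV, double volV,
                      const std::vector<double>& x,
                      std::vector<double>& y) {
    y.assign(x.size(), 0.0);
    // Sparse part: y += A x / f(V); each stored edge acts in both directions.
    for (const Entry& e : edges) {
        y[e.u] += e.f * x[e.v] / fV;
        if (e.u != e.v) y[e.v] += e.f * x[e.u] / fV;
    }
    // Rank-one part: y -= w (w^T x) / Vol(V); w^T x is a single scalar.
    double wx = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) wx += w[i] * x[i];
    for (std::size_t i = 0; i < w.size(); ++i) y[i] -= w[i] * wx / volV;
}

int main() {
    // Toy path graph 0-1-2 with unit edge weights; w holds the degrees.
    std::vector<Entry> edges = {{0, 1, 1.0}, {1, 2, 1.0}};
    std::vector<double> w = {1.0, 2.0, 1.0};
    double fV = 2.0, volV = 16.0;   // placeholder totals; real values follow the volume model
    std::vector<double> x = {1.0, 0.0, -1.0}, y;
    modularityMatVec(edges, w, fV, volV, x, y);
    for (double yi : y) std::printf("% .4f ", yi);
    std::printf("\n");
    return 0;
}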


3.4 Cluster Refinement

Cluster refinement is the key ingredient of the third multi-level phase. Its task is improving the initial clustering by searching for better clusterings using sequences of local modifications. This section explores the design space of refinement methods and discusses the implemented algorithms.

Two fundamental types of refinement can be distinguished: Greedy refinement only allows operations improving the quality of the clustering. Search refinement, on the other hand, crosses phases of low-quality clusterings in the hope of finding better clusterings later. A clustering in which a chosen greedy heuristic cannot improve the quality further is called a local optimum. A local optimum with the best quality of all local optima is called a global optimum. More than one global optimum may exist.

Searching global optima commonly faces some problems. Again the NP-completeness of modularity clustering hides in these problems and enforces that not all of them are solvable. The following observations were identified from the literature [22, 77, 52]:

foothill problem  The search gets stuck in a low-quality local optimum, as is always the case with pure greedy heuristics. Some mechanism is necessary to leave the local optimum, for example sequences of quality-decreasing moves or crossing different locally optimal clusterings.

basin problem  Many low-quality local optima lie in a bigger valley with better optima being far away. In order to reach them, several local optima of low quality have to be passed. This is also known as funnels in the context of simulated annealing and similar methods.

plateau problem  The search space has nearly the same quality everywhere except for some very local peaks. Thus there is no local information about the right direction of search, and most heuristics would be as good as random search. Monte-Carlo methods or a mathematical derivation of search directions might help here.

ridge problem  The direction of motion in the search space differs from the true direction of quality improvements. This might happen, for example, with heuristic selection criteria as a trade-off against computational effort. Heuristics for leaving local optima can also exhibit this property. It may be detected by checking with other selection criteria.

multi-level reduction problem  The globally optimal clustering of the original graph is most likely not a local optimum in coarsened graphs because it is not representable once vertices of different clusters are merged together. Inversely, a global optimum of a coarse graph does not necessarily lead to the real global optimum of the original graph after projection and refinement. Improvements might be possible by considering multiple candidate clusterings or applying several multi-level passes with different coarsening hierarchies.

This leads to apparently conflicting objectives. As there is no reason to accept obviously improvable clusterings, it is necessary to fully exploit local optima without stopping half way. On the other hand, exploring a wider range of local optima requires leaving local optima and even avoiding already explored optima. Therefore the first important step towards reliable refinement algorithms is the design of greedy refinement and search algorithms. Later, search and greedy refinement may be combined to effectively explore a wide search space without missing local optima.

The next section explores the design space of refinement methods based on local operations like moving single vertices. After classifying concrete algorithms, the second section discusses the implemented greedy algorithms. The third section presents the adaptation of the well-known Kernighan-Lin refinement to modularity clustering.

3.4.1 Design Space of Local Refinement Algorithms

This section explores components of simple refinement strategies with just two basic operations: moving a single vertex to another cluster, and moving a vertex into a new cluster. The analysis is based on often-used implementations like greedy refinement [44], Kernighan-Lin refinement [45, 24, 58], simulated annealing [39, 38, 68] and extremal optimization [23, 11, 10], looking for common components and differences.


Substantial differences to the classic k-way partition refinement (k equal-sized clusters, minimum edge cut) exist: The number of clusters is not predetermined and no balance constraints on their size are given. Instead, determining the right number of clusters has to be achieved by the algorithm. Clusters may be created and emptied as necessary. The modularity clustering quality is more complex than the minimum edge cut because the volume introduces global dependencies. Due to the multi-level approach it suffices to move single vertices instead of constructing vertex groups like in the original Kernighan-Lin algorithm.

The algorithms are dissected into the components listed below. The next paragraphs discuss these components in more detail and comment on their dependencies. The section concludes with an overview of how basic refinement algorithms fit into this design space.

Vertex Ranking  Which vertices should be selected for movement?

Target Ranking  To where should selected vertices be moved?

Search Mode  Which vertices and targets are evaluated before selecting one of them?

Ranking Updates  Is the vertex ranking updated after each move or fixed?

On Modularity Decrease  What to do with selected moves not improving modularity?

Vertex and Target Ranking  The decision which vertices are moved to which cluster is split into two aspects. The vertex ranking directs the selection of vertices and the target ranking selects the cluster. The vertex ranking is used as a predictor, similar to the merge selector in the coarsening phase, and should direct the refinement into good, modularity-increasing directions.

Given a vertex $v$, its current cluster $C(v)$, and a target cluster $j \in C \cup \{\emptyset\}$, the change of modularity $\Delta_{v,j} Q(C) = Q(C') - Q(C)$ is calculated as shown in the equations below. The cluster $\emptyset$ is used for new, empty clusters. By definition, edge weights and volumes involving this cluster are zero. The maxmod vertex ranking sorts the vertices by their best increase of modularity, $\operatorname{maxmod}(v) = \max_j \Delta_{v,j} Q(C)$. The single implemented target ranking is the maxmod ranking, which selects the neighbor cluster with maximal increase of modularity, $\arg\max_j \Delta_{v,j} Q(C)$.

\[ \Delta_{v,j} Q(C) = \frac{f(v, C_j - v) - f(v, C[v] - v)}{f(V)} - \frac{\operatorname{Vol}(v, C_j - v) - \operatorname{Vol}(v, C[v] - v)}{\operatorname{Vol}(V)} \qquad (3.15) \]

\[ \Delta_{v,\emptyset} Q(C) = - \frac{f(v, C[v] - v)}{f(V)} + \frac{\operatorname{Vol}(v, C[v] - v)}{\operatorname{Vol}(V)} \qquad (3.16) \]

The first term depends just on the adjacent vertices, as shown in Figure 3.9. To compute the edge weights $f(v, C_j - v)$, all incident edges of $v$ are visited and their weight is summed in an array grouped by the cluster of the end-vertex.


[Figure 3.9: Dependencies when moving a vertex $v$ from $C(v)$ to $j$ — only the edge weights to the current cluster, $f(v, C[v] - v)$, and to the target cluster, $f(v, C_j - v)$, are involved; all other clusters remain unchanged.]

Thus computing the maxmod ranking costs linear time in the number of incident edges and the number of clusters. On average this is $O(|E|/|V| + |C|)$.

As visible, the change of modularity depends through the volume term on the global clustering structure and changes with nearly every vertex move: The volume is computed according to the volume model with $\operatorname{Vol}(v, C_j - v) = (w(v) - 1)\, w(C_j)$. The latter value is the sum over all vertex weights in the cluster. To evaluate the volume in constant time, this value is stored per cluster and iteratively updated each time a vertex moves out of or into the cluster. This global dependency is the key point which makes modularity refinement more expensive than the standard min-cut refinement.

Instead of ranking vertices by the expensive-to-compute modularity increase, good candidate vertices may also be identified by their current contribution to the modularity, $f(v, C[v]) - \rho(V) \operatorname{Vol}(v, C[v])$. Vertices with a low or negative contribution have a good chance of not belonging to their current cluster. Moving a vertex does not modify the contribution of its self-edge. Therefore omitting self-edges leaves $\operatorname{mod}(v) = f(v, C[v] - v) - \rho(V) \operatorname{Vol}(v, C[v] - v)$. This computation can be done in constant time by storing $f(v, C[v] - v)$ for each vertex and iteratively updating it when adjacent vertices are moved.

Based on the contribution to modularity, several vertex rankings were derived. It can be assumed that the placement of high-degree vertices has a higher influence on the modularity. To suppress this effect the modularity contribution can be divided by the vertex degree. Or, to stay consistent with the density model, it can be scaled by the inter-volume between the vertex and its cluster. Since these vertex rankings are structurally and semantically similar to the vertex fitness used in extremal optimization [23], they are called fitness measures. In contrast to other vertex rankings, here lower values are preferred. In summary these are:

mod-fitness  The modularity contribution $\operatorname{mod}(v) = f(v, C[v] - v) - \rho(V) \operatorname{Vol}(v, C[v] - v)$.

eo-fitness  The vertex fitness used in extremal optimization, $\operatorname{eof}(v) = \operatorname{mod}(v)/\deg(v)$.

density-fitness  The density of connections to the current cluster, $\rho(v, C[v] - v) = f(v, C[v] - v)/\operatorname{Vol}(v, C[v] - v)$. This is ranking-equivalent to $\operatorname{mod}(v)/[\deg(v) \deg(C[v] - v)]$ in the degree multiplicity volume model.
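As an illustration of how the maxmod target ranking can be evaluated from local information plus the per-cluster weight sums, the following self-contained C++ sketch groups the incident edge weights of a vertex by neighbor cluster and scores each candidate cluster along the lines of Equations (3.15) and (3.16). It is not the thesis implementation: the Graph type, the helper names and the toy normalization constants are assumptions, and the simplified degree-based volume term ignores the exact self-pair handling of the multiplicity volume models, so the returned gains are only proportional to the true modularity change.

#include <cstdio>
#include <unordered_map>
#include <vector>

struct Edge { int to; double w; };
using Graph = std::vector<std::vector<Edge>>;

// Best target cluster and (scaled) modularity gain for moving vertex v,
// following the shape of Equation (3.15) with Vol(v, S) ~ deg(v) * deg(S).
struct Move { int target; double gain; };

Move bestMove(const Graph& g, const std::vector<int>& cluster,
              const std::vector<double>& deg,         // weighted vertex degrees
              const std::vector<double>& clusterDeg,  // summed degrees per cluster
              double fV, double volV, int v) {
    // Sum the edge weights of v grouped by the cluster of the other end-vertex.
    std::unordered_map<int, double> toCluster;
    for (const Edge& e : g[v])
        if (e.to != v) toCluster[cluster[e.to]] += e.w;

    int own = cluster[v];
    auto it = toCluster.find(own);
    double fOwn = (it != toCluster.end()) ? it->second : 0.0;
    double volOwn = deg[v] * (clusterDeg[own] - deg[v]);

    Move best{own, 0.0};                      // gain 0 means: stay in the current cluster
    for (const auto& [j, fj] : toCluster) {
        if (j == own) continue;
        double volJ = deg[v] * clusterDeg[j];
        double gain = (fj - fOwn) / fV - (volJ - volOwn) / volV;
        if (gain > best.gain) best = {j, gain};
    }
    // Moving v into a new, empty cluster, cf. Equation (3.16).
    double gainEmpty = -fOwn / fV + volOwn / volV;
    if (gainEmpty > best.gain) best = {-1, gainEmpty};   // -1 marks a new cluster
    return best;
}

int main() {
    // Toy graph: two triangles 0-1-2 and 3-4-5 joined by the edge 2-3,
    // with vertex 2 intentionally misplaced into the right-hand cluster.
    Graph g(6);
    auto link = [&g](int a, int b) { g[a].push_back({b, 1.0}); g[b].push_back({a, 1.0}); };
    link(0, 1); link(1, 2); link(0, 2); link(3, 4); link(4, 5); link(3, 5); link(2, 3);
    std::vector<int> cluster = {0, 0, 1, 1, 1, 1};
    std::vector<double> deg(6, 0.0), clusterDeg(2, 0.0);
    double fV = 0.0;
    for (int v = 0; v < 6; ++v) {
        for (const Edge& e : g[v]) deg[v] += e.w;
        clusterDeg[cluster[v]] += deg[v];
        fV += deg[v];
    }
    double volV = fV * fV;   // assumed normalization of the toy volume model
    Move m = bestMove(g, cluster, deg, clusterDeg, fV, volV, 2);
    std::printf("move vertex 2 -> cluster %d, gain %.4f\n", m.target, m.gain);
    return 0;
}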


Search Mode  The search mode controls how vertices are selected based on the vertex ranking. It plays a similar role to the grouping and matching algorithms in graph coarsening. The most accurate search method is to always consider all possible operations: In each step all possible moves of all vertices are evaluated and the globally best modularity increase is selected. This is a costly operation because evaluating all moves requires visiting all edges in each move step. For $k$ vertex moves the $k|V|$ maxmod evaluations lead to $O(k|E| + k|V| \max|C|)$ average run-time, where $\max|C|$ is the highest number of clusters that occurred during the moves.

Fortunately this method can be divided into two stages by first selecting the best vertex according to a vertex ranking and then evaluating further operations only for that vertex (mode: best vertex). In combination with the maxmod vertex ranking this is equal to the previous method. Instead, the vertex ranking should be replaced by a constant-time predictor. Then performing $k$ vertex moves requires at most $k|V|$ evaluations of the vertex ranking and one maxmod target ranking per moved vertex. Thus the average run-time will be in $O(k|V| + k|E|/|V| + k \max|C|)$.

Most predictors are likely to also select vertices which cannot be moved for a modularity increase. In greedy refinement these vertices must be skipped. In order to not select these vertices again, all processed vertices are marked and ignored later (mode: best unmoved). This search mode thus selects one of the remaining vertices until no vertex is left. The whole process is restarted in case some early selected vertices became movable after later moves.

Ranking Updates  An algorithm could simply iterate over a fixed ordering of the vertices or update the ranking after each move. The updated ranking is implemented by simply visiting all vertices in each move step. Alternatively a heap can be used to efficiently retrieve the best-ranked vertex. However, empirical comparisons suggested that all variants are similarly fast. Therefore visiting all vertices is the preferred method.

On Modularity Decrease  In case the selected operation does not improve the modularity, the following three choices are possible:

skip  Mark the vertex as moved but leave it in the current cluster.

abort  Abort the refinement.

accept  Perform the operation anyway.

Greedy heuristics are characterized by not accepting non-improving operations. Whether to directly abort the refinement depends on the search method. In case other vertices still allow modularity-increasing moves, the currently selected vertex should be skipped. Otherwise the refinement can be aborted.

Classification of Basic Refinement Algorithms  Table 3.2 provides an overview and classification of refinement methods building on the components described in the previous subsections.


Algorithm                Search Mode    Vertex Ranking   Mod. Decrease                                        Impl.

greedy refinement
  complete greedy        best           maxmod           abort                                                X
  sorted-maxmod          best unmoved   maxmod           abort
  sorted-mod             best unmoved   mod-fitness      skip                                                 X
  sorted-eo              best unmoved   eo-fitness       skip                                                 X
  sorted-dens            best unmoved   dens-fitness     skip                                                 X

deterministic search
  KL-maxmod              best unmoved   maxmod           accept                                               X
  KL-mod                 best unmoved   mod-fitness      accept
  KL-eo                  best unmoved   eo-fitness       accept
  KL-dens                best unmoved   dens-fitness     accept

randomized search
  simulated annealing    all            random           random target ranking; accepts with a probability
                                                         depending on the modularity increase
  spin glass system      all            random           random target ranking with an energy model;
                                                         accepts like simulated annealing
  extremal optimization  best+random    eo-fitness       accept

Table 3.2: Classification of Refinement Algorithms


Specific observations about greedy and Kernighan-Lin refinement are discussed in the next section.

The implemented algorithms are marked in the table. Other algorithms are included to show how they align within this design space. Sorted greedy refinement with maxmod vertex ranking (sorted-maxmod) is not implemented because it is as slow as complete greedy but also restricted in its search. Thus it has no advantages over the other greedy algorithms. The three fitness-based Kernighan-Lin algorithms KL-mod, KL-eo, and KL-dens are not considered further because they do not reliably find local optima. Their vertex ranking combined with accepting quality-decreasing moves prevents this. Finally, randomized algorithms are generally excluded for the same reason. They are interesting just in combination with greedy refinement. But then too many combinations exist to be discussed in the scope of this work.

3.4.2 Greedy Refinement

Data: graph, clustering, selector
Result: clustering
repeat
    v ← selector: find best maxmod vertex;
    j ← selector: find best target cluster for v;
    if move v → C_j is improving modularity then
        move v to cluster j and update selector;
until move was not improving;

Figure 3.10: Refinement Method: Complete Greedy

Greedy refinement algorithms are characterized by accepting only vertex moves that increase modularity. The complete greedy algorithm, as displayed in Figure 3.10, uses the maximal modularity increase maxmod as vertex selector. This enforces the globally best move in each iteration. Selecting and moving vertices is repeated until the modularity is not improved further. In each iteration $|V|$ vertex rankings have to be evaluated, each requiring $O(|E|/|V| + \max|C|)$ time on average. Selecting the best target cluster and updating the cluster weights $w(C(v))$, $w(C_j)$ costs linear time in the number of incident edges. Moving the vertex is done in constant time by updating the mapping $C(v)$, which is stored in an array. Thus $k$ moves require $O(k|E| + k|V| \max|C|)$ time.

To improve search speed, the sorted greedy algorithm splits vertex and move selection into separate steps. Figure 3.11 displays the algorithm in pseudo-code. The inner loop selects all vertices once, sorted by their vertex ranking. Since the ranking changes with each move, all vertices are re-visited in each inner iteration. With constant-time rankings like mod-fitness this costs $O(|V|^2)$ time for the complete inner loop. Again the most expensive part is selecting the target cluster for the current vertex. The inner loop does this exactly once for each vertex, which costs $O(|E| + |V| \max|C|)$. Moving a vertex and updating cluster weights and vertex fitness costs $O(|E|/|V|)$ on average.


Data: graph, clustering, selector
Result: clustering
repeat                                              // outer loop
    mark all vertices as unmoved;
    while unmoved vertices exist do                 // inner loop
        v ← selector: find best ranked, unmoved vertex;
        j ← selector: find best target cluster for v;
        mark v as moved;
        if move v → C_j is improving modularity then
            move v to cluster j and update selector;
until no improving move found;

Figure 3.11: Refinement Method: Sorted Greedy

Therefore the worst-case time for all iterations of the inner loop is in $O(|V|^2 + |E| + |V| \max|C|)$.

In case vertices were moved, the outer loop restarts the refinement. This processes vertices which were visited early but became movable with a modularity increase only after later moves of other vertices. In practice only a small number of outer iterations is necessary. These restarts ensure that sorted greedy always finds a local optimum: If at least one improving move exists, its vertex will be visited and moved even if it is not the best-ranked vertex. Higher-ranked vertices are simply skipped. When improvements were found, the refinement is restarted until no single improving move exists. Nevertheless the found optimum depends on the vertex ranking. The variants may end up in different nearby local optima.

3.4.3 Kernighan-Lin Refinement

The central idea of Kernighan-Lin refinement is to escape local optima by moving vertices with the least modularity decrease in case no improvements are possible. Selecting the least modularity-decreasing moves is like a careful depth-first walk into the surrounding clustering space along a ridge while avoiding clusterings of very low modularity.

The basic algorithm is presented in the following subsection. Unfortunately it is not very effective considering its long run-time. To improve this, the next subsections analyze two aspects of the dynamic behavior: the creation of clusters and the effective search depth. Based on the results, the algorithm is improved by restricting the search depth.

Basic Algorithm  The basic algorithm is shown in Figure 3.12. The best clustering found during the refinement is called the peak clustering and is stored separately. A new peak clustering would have to be stored whenever the clustering after a modularity-increasing move is better than the last peak. To save some work, just the last clustering in a series of modularity-increasing moves is stored.


Data: graph, clustering, selector
Result: clustering
current ← clustering;
repeat                                              // outer loop
    start ← current;
    peak ← current;
    mark all vertices as unmoved;
    while unmoved vertices exist do                 // inner loop
        v ← selector: find best ranked, unmoved vertex;
        j ← selector: find best target cluster for v;
        mark v as moved;
        if modularity is decreasing ∧ current better than peak then
            peak ← current;
        move v to cluster j and update selector;
    if peak better than current then
        current ← peak;
until current not better than start;
clustering ← current;

Figure 3.12: Refinement Method: Basic Kernighan-Lin

This situation is checked at modularity-decreasing moves in the inner loop. In order to not directly revert modularity-decreasing moves, processed vertices are marked and ignored in later inner iterations. In case the inner loop found an improved clustering, it is contained in either the current or the peak clustering. This is checked after the inner loop and the best one is returned to the outer loop. Altogether the outer loop restarts the refinement on the best intermediate clustering until no further improvements are found.

Only the Kernighan-Lin algorithm with maxmod vertex ranking fully moves into local optima. Its inner loop starts like greedy refinement and performs modularity-increasing moves until reaching a local optimum, and suppressed moves of marked vertices are performed by the second outer iteration. Unfortunately this is not shared by the other vertex rankings. When vertices with no modularity-increasing moves are proposed before reaching a local optimum, they are moved. Thus it becomes unlikely to reach local optima.

Similar to greedy refinement with maxmod vertex ranking, the time required for a complete run of the inner loop is $O(|V|(|E| + |V| \max|C|))$. But in contrast to complete greedy refinement, this algorithm does not abort in local optima. This makes the basic algorithm very expensive, which is also confirmed by empirical experience.

Creation of Clusters  The creation of clusters is an aspect specific to modularity clustering because the optimal number of clusters is not known. Initial observations with the basic algorithm revealed an excessive creation of new clusters in the inner loop.


[Figure 3.13: Kernighan-Lin Refinement Creating Clusters on Graph Epa main — (a) Modularity and Cluster Count, (b) Dynamics of Kernighan-Lin Refinement]

For example, the scatter plot in Figure 3.13a shows the relation between modularity and the number of clusters at the end of the inner loops. In addition, the position of peak clusterings is shown by blue circles. Moreover, Figure 3.13b shows the beginning of multi-level Kernighan-Lin refinement on the same graph. The black line plots the modularity and the red line the number of clusters. Again blue circles mark the cluster counts at peak clusterings. Both graphics underline that all peak clusterings are found with a small number of clusters, while the algorithm tends to create a huge number of clusters.

This behavior can be explained as follows: After all vertices fitting well into another cluster have been moved, the algorithm continues with vertices very well connected to their current cluster. Often moving them into a new cluster decreases the modularity less than moving them into other existing clusters. Thus a cluster is created for that vertex.

The excessive creation of clusters brings two big problems: Firstly, the search for better clusterings is quickly directed to low-modularity clusterings and explores only a few with cluster counts near the optimum. This wastes time in uninteresting parts of the clustering space. But more importantly, the time complexity of the whole algorithm is coupled to the number of clusters. Thus high cluster counts significantly impair the performance.

Effective Search Depth  The moves performed by the algorithm are interpretable as a depth-first search into the space of possible clusterings, with the vertex and target ranking determining the direction. The complete execution of the inner loop is very time consuming. However, already after a small number of moves the search is far away from optimal clusterings and cannot return because of the marked vertices.

In order to safely abort the inner loop earlier, the number of moves between peak clusterings is analyzed.


For this purpose let the effective search depth be the maximal number of moves from a peak clustering to a clustering with equal or better modularity that may occur in real-world graphs. Of course this depth depends on the number of vertices.

[Figure 3.14: Effective Search Depth of Kernighan-Lin Refinement — (a) Detail of the Dynamics on Graph Epa main, (b) Observed Effective Search Depth]

A typical situation with about 1500 vertices is displayed in Figure 3.14a. The modularity of the clusterings is shown by the black line. Blue vertical lines mark the moves where peak clusterings were found. At the beginning of the inner loop all 12 moves directly increasing the modularity were executed. From this first peak clustering a series of 62 modularity-decreasing and -increasing moves was necessary to find better clusterings and the second peak. In this example that represents an effective search depth of around 60 moves.

In order to find a simple lower bound for the effective search depth, the basic algorithm was applied with multi-level refinement to a selection of 17 real-world graphs. At each peak clustering the observed search depth together with the number of vertices at the current coarsening level was recorded. The scatter plot in Figure 3.14b visualizes the dependency between vertex count and search depth. A logarithmic scale is used for the vertex count. It is visible that at extreme values the depth grows nearly linearly with the logarithm of the vertex count. Now a lower bound is obtained by taking the maximal quotient. This yielded a factor of about 20. Thus it is moderately safe to abort the inner loop after around $20 \log_{10} |V|$ moves with a modularity below the last peak clustering. This bound is also shown by the dashed line.

Based on these observations, the basic algorithm is improved by restricting the search depth. The parameter search depth controls the accepted search depth. The inner loop is terminated after $\log_{10} |V|$ times search depth moves with a modularity lower than at the last peak clustering or the beginning of the inner loop. Factors around 20 should be used based on the measurements. Due to this early termination it is unnecessary to also restrict the creation of clusters.
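The following small C++ sketch illustrates only the termination rule just described, not the full Kernighan-Lin refinement: it replays a hypothetical sequence of modularity values and reports where an inner loop with the restricted search depth would stop. The function and variable names, the trace and the parameter values are illustrative assumptions; in the thesis implementation the counter is driven by real vertex moves and the factor is set to about 20 (25 in the default configuration).

#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Given the modularity after each (hypothetical) move of the inner loop,
// report after which move the loop would be cut off. The limit is
// searchDepth * log10(|V|) consecutive moves below the best modularity
// seen so far (the last peak).
int abortAfter(const std::vector<double>& modularityPerMove,
               std::size_t vertexCount, double searchDepth) {
    double limit = searchDepth * std::log10(static_cast<double>(vertexCount));
    double peak = -1.0;          // best modularity seen so far
    int movesBelowPeak = 0;
    for (std::size_t m = 0; m < modularityPerMove.size(); ++m) {
        if (modularityPerMove[m] > peak) {
            peak = modularityPerMove[m];
            movesBelowPeak = 0;  // a new peak resets the counter
        } else if (++movesBelowPeak > limit) {
            return static_cast<int>(m);   // the inner loop would terminate here
        }
    }
    return -1;                   // the limit was never exceeded
}

int main() {
    // Hypothetical trace: a short climb, then a long non-improving stretch.
    std::vector<double> trace = {0.40, 0.42, 0.45, 0.44, 0.43, 0.43, 0.42,
                                 0.41, 0.41, 0.40, 0.40, 0.39, 0.39, 0.38};
    // With 100 vertices and search depth 3 the limit is 3 * log10(100) = 6 moves.
    std::printf("abort at move %d\n", abortAfter(trace, 100, 3.0));
    return 0;
}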


3.5 Further Implementation Notes

This section provides additional hints on the implementation. These details are not directly important for the functioning of the clustering algorithm but describe crucial components enabling and driving its implementation. Class diagrams and sample code give a small insight into its structure and appearance. The next subsection introduces the concept of index spaces and shows how graphs and data are managed in the program. The second subsection describes how the data is stored externally.

3.5.1 Index Spaces and Graphs

Graph algorithms commonly deal with two different aspects: the adjacency structure of the graph, and properties like vertex and edge weights. For navigation through the graph the structure is used. On the other hand, calculations typically involve weights retrieved from the current vertex or edge. For example, the implementation of the volume model stores vertex weights. At the same time the merge selector stores the selection quality of each pair of adjacent vertices. These properties represent different aspects of the algorithm and should be managed separately. Often it is necessary to construct temporary properties in sub-algorithms, not visible to the outside. In addition, a noteworthy observation is that graph algorithms handle huge numbers of 100,000 and more small objects for vertices, edges and temporary properties. All of these have no own life-cycle and do not need any garbage collection, as their existence is coupled to the graph and the algorithm phases.

[Figure 3.15: Index Spaces: Class and Concept Diagram — the IndexSpace concept with iteration and key/index conversion, and the implementations RangeSpace, GrowableSpace, DynamicSpace, RecycleSpace, BitSubspace and FilterSubspace]


As a solution, the implementation of the presented algorithms uses the concept of index spaces and index maps. For similar approaches see [29, 9]. Index spaces carry the structure without considering additional properties. This is similar to a collection of array indices. Each index abstractly represents an object. Specific properties of these objects are managed by index maps. These allow reading and writing the data stored for an object using its index as access key. This approach basically is an external variant of the decorator pattern. It is external because it does not wrap the single objects when extending them.

Figure 3.15 contains a class diagram of the implemented index spaces and their interfaces. Basic spaces allow iterating over the indices using the methods begin and end like in the C++ Standard Template Library (STL). The size in number of entries can be queried using size. Grow-able and dynamic spaces allow creating and removing indices. Subspaces mask parts of an index space. This can be used, for example, to hide removed entries or to iterate over all entries with a common property. For this purpose all index spaces hold a reference called super space to their superset. The super space of a root space points to itself.

The class RangeSpace is the simplest, most compact implementation. It describes a continuous range of indices from a lower to an upper bound. It is used when this growing behavior suffices. In case it is necessary to remove indices from the space, the class RecycleSpace is used. It recycles removed indices later when new ones are created. Internally a BitSubspace is used which marks removed indices in a bitset. The class FilterSubspace allows operating easily with subsets defined by a filter function. The filter is provided at instantiation as a function object.

[Figure 3.16: Index Maps: Class and Concept Diagram — the readable and writable index map concepts with the implementations ChunkIndexMap and ConstantDummyMap]

Index maps, as shown in Figure 3.16, provide access to specific data stored for the objects referred to by indices. Several maps can be created on the same index space and independently destroyed when they are no longer in use. The implementations use C++ templates to define the type of data to store.


BOOST_FOREACH(Vertex v, vertices(g)) {
    put(deg, v, 0.0);
    BOOST_FOREACH(Edge e, out_edges(v, g)) {
        if (v == target(e, g))                 // self-edge: counts twice towards the degree
            at(deg, v) += 2 * get(ewgt, e);
        else
            at(deg, v) += get(ewgt, e);
    }
}

Figure 3.17: C++ Example: Calculation of Vertex Degrees. Inputs are the graph g and the edge weights ewgt. The index map deg will contain the computed degrees.

The index maps supply automatic memory management and are resized when the underlying index space grows. To implement this, the map holds a reference to its index space. An observer is registered at the space to inform the map when new storage has to be allocated. In order to implement the access to data efficiently, just continuous ranges of integer indices are used in this implementation. Index maps provide storage space for indices from a lower to an upper bound. The lower bound is always zero and the upper bound grows with the creation of new entries.

The only real implementation is ChunkIndexMap. It stores the data not in a single huge array but in a sequence of fixed-size arrays called chunks. Access to entries is performed in two steps by splitting the numeric index into a chunk number and the inner offset. In practice this does not cost much but makes memory allocation much easier. When the index space grows, no resizing of arrays and moving of data is necessary. A new chunk is simply allocated when necessary. In addition it is easier to find memory space for fixed-size chunks than for huge continuous arrays. The other implementation, ConstantDummyMap, provides read-only access to a fixed value. It is used in some specializations of algorithms to represent unit edge weights.

Graphs are implemented using a combination of index spaces and maps. They contain a separate space for the vertices and one for the edges. A map over the edges is used to store the start- and end-vertex indices. Separate implementations for directed and undirected (symmetric) graphs exist. Both allow iteration over the outgoing edges of a vertex and insertion and removal of edges. For this purpose two additional index maps provide access to sibling edges like in a doubly-linked list and a reference from each vertex to the start of its edge list. The undirected variant is used during graph coarsening and enforces for each edge the existence of its inverse edge. This enables the implementation of a vertex merge operation similar to edge contraction.

Access to the graphs is mainly provided in the style of the Boost Graph Library (BGL) [70]. Besides basic graph algorithms it provides abstract concepts for accessing and manipulating graphs. The key point is the use of external adapters. Instead of defining a single interface to graphs, the concepts define separate, external functions. With this strategy the functionality can be extended without the necessity to extend existing interfaces and replace proxy classes.


New functions are simply added which receive the graph object as an argument. Based on the argument types, the compiler then exploits function overloading to select the correct implementation.

Another advantage of this strategy is better readability. Many small details can be hidden behind the adapter functions. Therefore such adapters are provided for the own graph implementations and index maps. The C++ code in Figure 3.17 shows a small example that calculates the weighted degree of all vertices. Using a for-each construct, the outer loop iterates over all vertices of the graph. At each vertex the outgoing edges are visited by the inner loop and their weights are summed up.

In summary, the index spaces and maps with their abstract access by indices are conceptually very similar to graphs and property maps in Boost. Here the concept is extended by the explicit integration of index spaces. Without them it is not possible to directly describe the dependencies between property maps and the structures they attribute. The Boost Graph Library already supplies the user with basic implementations for various graph types. However, it uses the simple collections (vectors, maps) of the Standard Template Library, which do not scale well to big graphs. Here the chunk-based implementation of index maps provides a scalable alternative.

3.5.2 Data Management

The management of graphs, clusterings and other data faces some special requirements in this project. Most importantly, the evaluation will handle many graphs and produce huge amounts of clusterings. Thus no duplicate information like the graph structure and vertex names should be stored in each clustering. Besides the actual graph data, quite some structured meta data emerges. This includes information about the used algorithms, parameters, source references and comments. This data should be easily accessible to computers and humans. And finally, the data should be easy to import into other software like GNU R for post-processing.


In consequence, simple text files similar to the CSV (comma-separated values) format are used. Each file stores exactly one table. The columns are separated by tabs (\t) and rows by a single newline (\n) character. Strings may be delimited by quotation marks. The value NA may be used for missing values. The first row contains the table header with column titles, and the first column is always used as index. To save storage space all files are transparently compressed using the gzip format.

Separate files are used for different data. For example, graphs are stored in three files containing vertex data, edge data, and meta data. For each vertex its name or original identifier is stored. The edge table contains the pairs of end-vertex indices and the edge weights. Additional files are used to store clusterings and the like.

In order to retrieve data easily, a hierarchical naming convention is employed. The dot is universally used as separator in filenames. The first component names the graph and the following components differentiate data sets. The naming scheme is best explained by example, as in the table below.

Filename                        Description
gname.meta.gz                   meta-information about the graph gname (source, content, type, ...)
gname.vertices.gz               the vertex data (vertex name)
gname.edges.gz                  the edge data (start-vertex, end-vertex, weight)
gname.cluster.algname.gz        the clustering computed with algorithm algname
gname.cluster.algname.log       free-form log file of the algorithm
gname.cluster.algname.meta.gz   configuration of the algorithm (parameters, cluster count, modularity, ...)
gname.cluster_best.gz           copy of the best found clustering
gname.cluster_info.gz           table comparing the clusterings (algorithm, modularity, runtime, ...)
gname.vertex-degree.gz          weighted and unweighted vertex degrees

Table 3.3: Hierarchical Naming Convention for File Names

Meta data is organized in a similar fashion. Here character strings are used as row indices and the dot is used as hierarchical separator.
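As an illustration of these conventions, the following self-contained C++ sketch writes a clustering as a tab-separated table using the hierarchical naming scheme of Table 3.3. The column names, the omitted gzip compression (the real files are transparently compressed, e.g. via Boost IOStreams) and the helper function are assumptions made for the example; the actual file layout of the thesis tool may differ in details.

#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Write a clustering as a tab-separated table: header row, first column as
// index, one row per vertex. Compression is omitted here, so the ".gz"
// suffix of the naming convention is left out as well.
bool writeClustering(const std::string& graphName,
                     const std::string& algorithmName,
                     const std::vector<int>& clusterOfVertex) {
    // Hierarchical name: <graph>.cluster.<algorithm>  (cf. Table 3.3)
    std::string filename = graphName + ".cluster." + algorithmName;
    std::ofstream out(filename);
    if (!out) return false;
    out << "vertex\tcluster\n";                       // header row with column titles
    for (std::size_t v = 0; v < clusterOfVertex.size(); ++v)
        out << v << "\t" << clusterOfVertex[v] << "\n";
    return true;
}

int main() {
    std::vector<int> clustering = {0, 0, 0, 1, 1, 1};
    return writeClustering("toygraph", "sgrd", clustering) ? 0 : 1;
}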


4 Evaluation

This chapter evaluates the family of algorithms developed in the previous chapter. The algorithms are composed of different components like the coarsening method, the merge selector and so on. Specific algorithms are configured using a set of parameters.

The key aspects of the evaluation are effectiveness, reliability, simplicity, and efficiency. An algorithm is effective when it finds better clusterings in terms of the modularity measure compared to the alternatives. This is strongly related to reliability, i.e. finding these clusterings systematically and not just sometimes by random chance. Simplicity and efficiency are concerned with the question whether complicated or expensive components of the algorithm significantly improve the results. This includes the study of scalability to check how the runtime of components develops with the graph size. Expensive components with just small or no improvements could be omitted to simplify future implementations. At the same time it might be possible to gain similar improvements with less effort using other heuristics.

Not all of these aspects can be treated in full depth in the scope of this work. Instead the evaluation mostly concentrates on the effectiveness compared to other configurations and reference algorithms.

The chapter is organized as follows. The next section summarizes the configuration space and introduces the main evaluation methods used in this work. The following three sections study the effectiveness of the graph coarsening, the merge selectors and the refinement. Efficiency aspects are also discussed where appropriate. The fourth section discusses experimental results on the scalability of the algorithms. And the last two sections compare the presented multi-level refinement method against other clustering algorithms and results collected from the literature.

4.1 Methods and Data

This section centrally discusses various aspects of the employed evaluation methods. The first subsection summarizes the algorithm components and describes the configuration space spanned by the parameters. The second subsection introduces the mean modularity method used to study the effectiveness of clustering algorithms. In this context the collection of benchmark graphs is also presented. Finally, the last subsection contains some notes on the evaluation of computation times and scalability.

4.1.1 Configuration Space

The multi-level algorithm is configured by the parameters listed in Table 4.1. This includes, for example, the coarsening and refinement methods and the configuration of merge selectors.


parameter           component    description and values                                  default

coarsening method   coarsening   method to merge cluster pairs;                          greedy grouping
                                 greedy grouping, greedy matching
reduction factor    coarsening   number of clusters to merge in each                     10%
                                 coarsening level; 5%-50%
match fraction      coarsening   number of best ranked pairs to consider                 50%
                                 in matching; 50%-100%
merge selector      coarsening   ranking of cluster pairs; modularity increase,          weight density
                                 weight density, RW distance, RW reachability,
                                 spectral length, spectral length difference,
                                 spectral angle
RW steps            selector     length of random walks                                  2
RW iterations       selector     iterative applications of reachability                  3
spectral ratio      selector     number of eigenvectors to use,                          20%
                                 cut-off value for $\lambda_j/\lambda_1$
spectral max ev     selector     maximal number of eigenvectors                          30
refinement method   refinement   method to move vertices; complete greedy,               sorted greedy
                                 sorted greedy, Kernighan-Lin
vertex selector     refinement   ranking of vertices in sorted greedy refinement;        density fitness
                                 mod-fitness, eo-fitness, density fitness
search depth        refinement   when to abort the inner-loop search in                  25
                                 Kernighan-Lin, multiplied by $\log_{10}|V|$

Table 4.1: The Configuration Space
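For readers who prefer code to tables, the defaults above can also be summarized as a plain configuration record. The following C++ sketch is purely illustrative — the thesis software does not define such a struct — and simply restates the default values of Table 4.1; the field names and types are assumptions.

#include <cstdio>
#include <string>

// Hypothetical configuration record mirroring the defaults of Table 4.1.
struct ClusteringConfig {
    std::string coarseningMethod = "greedy grouping";
    double      reductionFactor  = 0.10;   // merge 10% of the clusters per level
    double      matchFraction    = 0.50;   // best-ranked pairs considered in matching
    std::string mergeSelector    = "weight density";
    int         rwSteps          = 2;      // length of random walks
    int         rwIterations     = 3;      // iterative reachability applications
    double      spectralRatio    = 0.20;   // cut-off lambda_j / lambda_1
    int         spectralMaxEv    = 30;     // maximal number of eigenvectors
    std::string refinementMethod = "sorted greedy";
    std::string vertexSelector   = "density fitness";
    int         searchDepth      = 25;     // Kernighan-Lin abort factor, times log10|V|
};

int main() {
    ClusteringConfig cfg;                  // all members take their default values
    std::printf("merge selector: %s\n", cfg.mergeSelector.c_str());
    return 0;
}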


The parameters are grouped into a coarsening and a refinement phase depending on what they directly influence. Some merge selectors are further configurable. Their parameters are placed in the component called selector. Note that not all parameters are meaningful in all combinations. For example, the match fraction is used only by greedy matching. The last column contains the default values used throughout the evaluation where nothing different is stated.

The parameters controlling the spectral merge selectors and the search depth are fixed and their influence will not be evaluated further, for the following reasons. The vertex vectors used in the spectral merge selectors are always approximations. Using more eigenvectors should improve the accuracy, but requires more memory and time to compute them. Thus the maximal number of eigenvectors is limited to 30. In graphs with fewer than 30 clusters, fewer eigenvectors contain meaningful information anyway. By assuming that the positive eigenvalues quickly decay to zero, only the few eigenvectors with eigenvalues larger than 20% of the largest eigenvalue are used. The search depth is only used by the Kernighan-Lin refinement. Previous experiments on refinement algorithms in Section 3.4.3 showed that it can be restricted to $20 \log_{10} |V|$ moves without missing better clusterings. For safety the factor 25 will be used.

4.1.2 Effectiveness

The main tool used in this evaluation is the comparison of mean modularities. For this purpose, clusterings produced by specific algorithm configurations on a set of graphs are collected. The mean modularity of a configuration over the graphs is calculated using the arithmetic mean.

Of course, graphs with high modularity influence the absolute value of the mean modularity the most. But the modularity measure is normalized to the range from zero to one. Thus all modularity values will be in the same range regardless of differing graph properties. In addition, the absolute values are unimportant here, as algorithms and configurations are compared. The differences between mean modularities are most influenced by graphs with high modularity variations between the algorithms.

In many places the results gained with and without refinement will also be compared. Without refinement it becomes visible how much the parameters influence the raw graph coarsening. On the other hand, results with refinement indicate how well coarsening with the observed parameters supports the refinement algorithms later. Kernighan-Lin refinement is expected to be more resistant against difficulties, as it can escape from local optima. Hence just sorted greedy refinement with the density-fitness vertex selector is used for these comparisons. This reference configuration will be abbreviated sgrd in most places.

As a useful side effect, the modularity improvements with refinement over no refinement provide a significance scale. Fluctuations of the modularity much smaller than this scale are probably not of much interest. Similarly, improvements of different coarsening configurations which are easily compensated by greedy refinement are not important either.

For many questions of the evaluation, tables containing mean modularities will be produced. However, it is difficult to compare these raw values. Therefore, where possible, the mean modularity results will be plotted and graphically visualized.


The significance scale also becomes visible in such figures by including the results gained with and without refinement.

It can be expected that not all graphs are equally well suited for the study and comparison of clustering algorithms. Obviously some may not contain any meaningful cluster structure. Structural properties like the number of clusters, inhomogeneous cluster sizes, and the degree of separation will influence the algorithms. Most importantly, approximation algorithms and heuristics directly or implicitly exploit structural properties of the graphs to make their local decisions.

Therefore it is advisable to not blindly use a few graphs for evaluation but to also study their structural properties. Unfortunately, still only little is known about the influence of various structural properties on the modularity clustering problem. In order to avoid pitfalls, the effectiveness of clustering methods is studied here using a large collection of real-world graphs from different application domains.

The next paragraph shortly comments on why no randomly generated graphs were used in this study. Finally, the second paragraph introduces the benchmark graphs.

On Random Graphs  Using graph generators it is very easy to acquire huge amounts of benchmark graphs with roughly known structural properties. In the past quite a few studies on clustering algorithms, for example [18], used simple generated graphs. Still, random graphs should be used with caution for the following reasons:

• Depending on the generator, the expected and the actual number of edges in the generated graph do not match. In that case density and optimal quality vary even with identical parameters.

• The random graph model determines the vertex degree distribution and many other structural properties. For example, placing edges with equal probability between vertices (Bernoulli random graph) produces a binomial degree distribution. On the other hand, certainly only few graphs in real-world applications will have such a structure. At the same time, in some application domains it is still difficult to generate graphs with properties close to the real structures.

• Some volume models depend on the vertex degree, which again depends on where the edges are placed. This interdependency makes it very difficult to produce random graphs with specific intra- and inter-cluster densities.

• The intended clustering of generated graphs appears to be known, but another clustering might well have a better quality for the chosen quality measure. On smaller random graphs the fluctuations may dominate the clusterings.

The Real-World Benchmark Graphs  Many researchers and institutions have published typical graphs of their research areas on the Internet. In the context of modularity clustering, for example, Mark Newman, Alex Arenas, and Andreas Noack collected and published some interesting graphs. Besides these small, individual collections, the Pajek project [8] brought together quite many graphs from different disciplines.

These collections were used to compile a set of benchmark graphs. The complete set is listed in Table A.1 in the appendix.


For each graph the number of vertices and edges, the weighted mean vertex degree (mean wdeg) and the global connection density in the degree volume model (wgt. density) are printed. In addition the source of the graph is indicated in the last column, and Table A.2 provides web addresses for this data.

For various reasons, subsets of the collection will also be used. For example, some single evaluation steps will compare a big number of configurations. To keep computation times feasible, a reduced graph set will be employed. The second column marks the graphs of the reduced set with R. In addition, the graphs from the scalability analysis are marked with S and those from the comparison to reference algorithms with C.

Many graphs of the collection are directed and contain parallel edges. But the modularity measure is defined for simple, undirected graphs. Therefore the graphs were pre-processed with the following aim: Between each adjacent pair of vertices lies exactly one edge in each direction, and their weight equals the sum of the original edge weights between both vertices. This pre-processing is allowed as the focus of this work lies on the evaluation of algorithms and not on the interpretation of single clustering results. However, other publications may have used different normalization strategies.

The pre-processing is accomplished in four steps: First, parallel edges are removed and their weight is added to the first edge. Then missing inverse edges are added with zero weight. Self-edges are used as their own inverse edge. In the third pass the edge weights are made symmetric by taking the sum of each edge and its inverse edge, ignoring self-edges. Finally, in disconnected graphs the largest connected component is chosen. These graphs are labeled with the suffix main.

4.1.3 Efficiency and Scalability

In order to compare the efficiency of different configurations, information about the computational expenses is necessary. Then trade-offs between quality and runtime are identified in combination with the measured differences in the mean modularity.

Unfortunately it is not advisable to compare average computation times. There is no shared time scale between the graphs. The average values would be strongly influenced by the few largest graphs. Therefore just the measured times from a single graph are compared. Throughout the evaluation the graph DIC28 main is used. For some aspects the graph Lederberg main is also considered. All timings were measured on a 3.00 GHz Intel(R) Pentium(R) 4 CPU with 1 GB main memory. Similar to the mean modularity, comparing the runtime with and without refinement provides a significance scale.

Computation times measured on different graphs can be used to study the scalability of the algorithms. By nature the runtime depends on the number of vertices and edges. Plotting the measured times against the number of vertices will visualize these dependencies. In practice the absolute times are uninteresting as they also depend on startup time, implementation style, and the runtime environment. Instead the progression of the runtime curves of different algorithms is compared.


Figure 4.1: Mean Modularity by Match Fraction (reduced set) [plot: mean modularity vs. match fraction [%]; curves M-sgrd 10, M-sgrd 30, M-none 10, M-none 30]

4.2 Effectiveness of the Graph Coarsening

This section has three aims. First, the best value for the parameter match fraction used by the matching algorithm is determined. Then greedy grouping is compared against the best greedy matching variant. Finally, the influence of the reduction factor on coarsening and refinement is studied.

4.2.1 Match Fraction

The match fraction controls how many of the best-ranked cluster pairs are considered when constructing the matchings. All other pairs are simply ignored in order to avoid selecting pairs ranked very low just because all better pairs are blocked by previous matches.

The greedy matching is evaluated in the default configuration. Two different reduction factors, 10% and 30%, and the match fraction values 10%, 25%, 50%, 75%, and 100% are used; with 100% no pairs are ignored. The parameters are compared with and without refinement. In summary, the two basic algorithms are greedy matching with refinement (M-sgrd) and without refinement (M-none).

Table 4.2 compares the mean modularities obtained with the 20 configurations. The reduced set is used as benchmark graphs. The rows contain the two algorithms combined with both reduction factors, and the columns list the match fractions. The maximum value of each row is marked in order to highlight the best match fraction. In addition, Figure 4.1 plots the four rows.

As visible in the table and the figure, on average the best modularities are obtained with the 50% match fraction. Thus this value is used as the default in the following evaluations. Altogether, the influence of this parameter appears to be negligible.


                 10%       25%       50%       75%       100%
    M-none 10%   0.56318   0.56436   0.56595   0.56590   0.56572
    M-sgrd 10%   0.59582   0.59573   0.59592   0.59566   0.59567
    M-none 30%   0.56415   0.56336   0.56431   0.56441   0.56252
    M-sgrd 30%   0.59516   0.59513   0.59453   0.59432   0.59432

Table 4.2: Mean Modularity by Match Fraction. Columns contain the match fraction and rows contain the algorithm and reduction factor.

Figure 4.2: Modularity and Runtime by Reduction Factor [(a) Modularity by Reduction Factor (reduced set): mean modularity vs. reduction factor for G-none, G-sgrd, M-none, M-sgrd; (b) Runtime and Modularity on DIC28 main: mean modularity and runtime [s] vs. reduction factor [%]]

4.2.2 Coarsening Methods

Next greedy grouping and greedy matching are compared. To stay independent of specific reduction factors, the values 5%, 10%, 30%, 50%, and 100% are used. The factor controls the number of coarsening levels and thus indirectly influences the later refinement. The value 100% produces just one coarsening level. However, factors above 50% cannot be used with greedy matching. Again the default parameters are used. In summary, the four algorithms are greedy grouping with refinement (G-sgrd) and without refinement (G-none), and greedy matching with refinement (M-sgrd) and without (M-none).

Table 4.3 compares the mean modularities obtained with the 18 configurations on the reduced graph set. The rows contain the four algorithms and the columns the reduction factors. The maximum value of each column is highlighted to make the best algorithm visible. Additionally, Figure 4.2a visualizes these values.

The second table, Table 4.4a, compares the runtime of the configurations on the largest graph of the set (DIC28 main). Furthermore, the number of actually produced coarsening levels is printed in parentheses.


              5%        10%       30%       50%       100%
    G-none    0.56849   0.56849   0.56849   0.56846   0.56846
    G-sgrd    0.59689   0.59677   0.59617   0.59557   0.58852
    M-none    0.56616   0.56595   0.56431   0.55364
    M-sgrd    0.59613   0.59592   0.59453   0.59154

Table 4.3: Mean modularity by Reduction Factor. The columns are the reduction factor and rows list the algorithms. Empty cells are impossible combinations.

              5%               10%             30%             50%             100%
    G-none    50.084s (123)    26.579s (60)    17.285s (19)    16.785s (10)    17.152s (2)
    G-sgrd    136.239s (123)   80.033s (60)    40.683s (19)    33.463s (10)    30.903s (2)
    M-none    45.77s (123)     22.2s (60)      11.614s (20)    11.235s (17)
    M-sgrd    139.139s (123)   72.939s (60)    36.83s (20)     32.077s (17)

(a) Runtime and Coarsening Levels on DIC28 main

              5%               10%             30%             50%             100%
    G-none    30.289s (129)    13.518s (61)    6.85s (18)      5.604s (10)     4.558s (2)
    G-sgrd    40.514s (129)    19.513s (61)    9.851s (18)     8.261s (10)     7.656s (2)
    M-none    56.541s (200)    47.313s (200)   41.586s (200)   39.505s (200)
    M-sgrd    62.994s (200)    49.97s (200)    43.598s (200)   43.89s (200)

(b) Runtime and Coarsening Levels on Lederberg main

Table 4.4: Runtime by Reduction Factor. The columns are the reduction factor and rows list the coarsening method. The cells contain the runtime in seconds and the number of coarsening levels. Empty cells are impossible combinations.


As another extreme, the times measured on the graph Lederberg main are listed in Table 4.4b. While the mean degree of this graph is around 10, it contains some high-degree vertices, with one extreme vertex having 1103 incident edges. Here the matching algorithm reached the maximal number of coarsening levels (200) regardless of the reduction factor.

Concerning grouping vs. matching, the table and plot of mean modularities show that greedy grouping always performed slightly better than matching. Moreover, the behavior of the matching algorithm on the graph Lederberg main confirmed that it is unreliable. Hence greedy grouping should be used as the coarsening method.

4.2.3 Reduction Factor

In order to find the best reduction factor, the mean modularity Table 4.3 of the previous subsection is reused. In addition, Figure 4.2b plots the mean modularities of G-sgrd and the runtime on the graph DIC28 main against the reduction factors.

Without refinement the reduction factor has no real influence on greedy grouping (G-none). Just a small influence is visible with greedy matching (M-none), perhaps because with more coarsening levels it selects fewer bad cluster pairs. Combined with greedy refinement, smaller reduction factors improve the clusterings. However, halving the reduction factor from 10% to 5% doubles the runtime and the number of coarsening levels while the difference in mean modularity is very small. Therefore, as a trade-off between quality and runtime, the reduction factor 10% is a good choice.

4.3 Effectiveness of the Merge Selectors

The merge selector is used by the coarsening methods to rank cluster pairs in the selection process. This section experimentally compares the developed merge selectors to find the best one. First the best parameters for the random walk distance and reachability selectors are determined. Then these are compared with the other merge selectors.

4.3.1 Random Walk Distance

The random walk distance selector has one parameter RW length controlling the length of the random walks. Here the lengths 1–5 are studied. The clusterings are compared using greedy grouping with refinement, named RWdist-sgrd, and without, named RWdist-none. All other parameters are set to their default values.

Table 4.5 summarizes the mean modularities obtained with the 10 configurations on the reduced graph set. The columns contain the length of the random walks and the rows contain the two basic algorithm combinations. The best mean modularity of each row is highlighted to make good lengths visible. The same values are also plotted in Figure 4.3a. As a reference, the dashed lines mark the mean modularity produced by the standard weight density selector with refinement (WD-sgrd) and without (WD-none). Appendix B.1 contains a detailed table of the clusterings produced on each graph.
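To recall what the walk length means for these selectors: starting at a vertex, each step of a random walk follows an incident edge with probability proportional to its weight, and the selectors are built from the resulting visit probabilities (cf. Section 3.3.2). The following is a minimal sketch of these probabilities, using a dense transition matrix for clarity; it is not the implementation of this work, and all names are illustrative.

    import numpy as np

    def visit_probabilities(W, length):
        """W: dense (n x n) symmetric edge weight matrix of a connected graph;
        returns the matrix whose entry (i, j) is the probability that a random
        walk of `length` steps started at vertex i ends at vertex j."""
        degrees = W.sum(axis=1)                    # weighted vertex degrees
        P = W / degrees[:, None]                   # one-step transition probabilities
        return np.linalg.matrix_power(P, length)   # `length`-step probabilities

A practical implementation would of course restrict such computations to the neighborhood of each vertex rather than working on a dense matrix; even then, the cost grows noticeably with the walk length, which is consistent with the runtime observations reported below.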


Figure 4.3: The Random Walk Distance [(a) Modularity with Random Walk Distance: mean modularity vs. length of random walks for RWdist-sgrd and RWdist-none, with reference lines WD-sgrd and WD-none; (b) Runtime with Random Walk Distance on DIC28_main: runtime [s] vs. random walk length]

                   1         2         3         4         5
    RWdist-none    0.48510   0.54798   0.54424   0.55213   0.55495
    RWdist-sgrd    0.53202   0.57045   0.56610   0.57051   0.57226

Table 4.5: Mean modularity with Random Walk Distance. Columns contain the length of random walks and rows contain the algorithm.


Figure 4.4: The Random Walk Reachability [(a) Modularity with Random Walk Reachability: mean modularity vs. length of random walks for RWreach-sgrd and RWreach-none with 1, 3, and 5 iterations, with reference lines WD-sgrd and WD-none; (b) Runtime with Random Walk Reachability on DIC28_main: time [s] vs. length of random walks]

Finally, Figure 4.3b compares the runtime on the graph DIC28 main. The values with and without refinement are plotted to provide a time scale relative to the refinement costs.

The first table and figure show that the random walk distance selector becomes slightly better with longer random walks. However, the mean modularities are very low compared to the weight density selector: with refinement, RWdist is barely as good as weight density is already without any refinement. In fact, RWdist found better clusterings only on the two smallest graphs (SouthernWomen and dolphins). At the same time, computing this selector is extremely expensive compared to the refinement. Therefore it is removed from the further evaluation.

4.3.2 Random Walk Reachability

The random walk reachability modifies the edge weights using short random walks. This modification operator can be applied to its own results, and the parameter RW iterations controls this number of applications. Here the values 1, 3, and 5 are considered. Again the parameter RW length defines the length of the random walks, and the values 1–4 are used. The combination with length one and just one iteration is equal to the original weight density selector; because iterated application does not change the weights in this case, the other iteration counts were excluded for this combination. The remaining setup is identical to the previous evaluations, and the two algorithms are named RWreach-sgrd and RWreach-none.

Table 4.6 summarizes the mean modularities produced by the 20 configurations on the reduced set of graphs. The columns contain the length of the walks and the rows the two algorithms RWreach-sgrd and RWreach-none combined with the number of iterations.


                      1         2         3         4
    RWreach-none 1    0.56852   0.57150   0.57218   0.57380
    RWreach-sgrd 1    0.59678   0.59616   0.59663   0.59676
    RWreach-none 3              0.57530   0.57672   0.57468
    RWreach-sgrd 3              0.59658   0.59609   0.59760
    RWreach-none 5              0.57720   0.57102   0.55942
    RWreach-sgrd 5              0.59732   0.59464   0.59016

Table 4.6: Mean modularity with Random Walk Reachability. Columns contain the length of random walks and rows contain the algorithm combined with the number of iterations. Empty cells are impossible combinations.

Figure 4.4a shows a plot of the mean modularities vs. random walk length. Again the dashed lines mark the mean modularity produced by the weight density selector. Appendices B.2 and B.3 contain two detailed tables of the modularities obtained on each graph with walks of length 2 and 3.

In addition, Figure 4.4b shows the runtime measured on the graph DIC28 main. Again the runtime difference between the sorted greedy refinement and no refinement provides a visual relation to the other components of the algorithm.

In the mean modularity plot some small improvements are visible without refinement. However, with refinement these completely disappear. Three iterations seem to be slightly better than just one iteration; using five iterations has a negative impact when combined with longer walks.

In summary, no clear tendency or improvement was measurable. Perhaps the benchmark graphs used here are structurally unsuited for these merge selectors. In the further evaluation, walks of length two with three iterations (RW reachability (2,3)) are simply chosen as the default. This combination can still be computed quickly and does not really degrade the mean modularity.

4.3.3 Comparison of Merge Selectors

To find the best merge selector in terms of effectiveness and efficiency, this section compares the mean modularities produced with each merge selector and their runtime. The larger graph collection is used to obtain more reliable mean values. As a side effect, this makes it possible to verify whether the results from the reduced graph set are representative. For the same reason greedy matching is considered again, checking whether it works better with merge selectors other than the weight density.

The studied configurations are composed of three components: merge selector, coarsening method, and refinement method, using the naming scheme Selector-Coarsening-Refinement. Cluster pairs are selected either by modularity increase as used in the greedy joining algorithm of Newman (MI), or by the weight density (WD), random walk reachability (RWR), spectral length (SPL), spectral length difference (SPLD), or the spectral angle (SPA). As coarsening method, greedy grouping (G) and greedy matching (M) are used. The third component is the refinement algorithm; like before, no refinement (none) and sorted greedy refinement (sgrd) are used.


Figure 4.5: Mean Modularity of the Merge Selectors (large set) [mean modularity by merge selector (SPL, SPLD, SPA, MI, WD, RWR) for G-none, M-none, G-sgrd, M-sgrd]

Figure 4.6: Runtime of the Merge Selectors [(a) Runtime by Merge Selector on DIC28_main; (b) Runtime by Merge Selector on Lederberg_main; runtime [s] on a logarithmic scale for G-none, M-none, G-sgrd, M-sgrd]


            G-none    M-none    G-sgrd    M-sgrd
    SPL     0.43653   0.44014   0.48521   0.51962
    SPLD    0.45479   0.48348   0.49926   0.53231
    SPA     0.52405   0.52252   0.54041   0.54341
    MI      0.52963   0.51985   0.53968   0.54369
    WD      0.52305   0.51866   0.54608   0.54529
    RWR     0.52971   0.52768   0.54718   0.54744

Table 4.7: Mean Modularity of Merge Selectors. Columns contain the algorithms and rows the merge selectors.

Table 4.7 displays the mean modularities of all 24 configurations on the large graph set. The rows contain the six merge selectors and the columns the four algorithm variants. The rows are sorted by their overall mean modularity. Figure 4.5 shows a plot of the same data.

In order to compare the runtime of the configurations, the times measured on the graphs DIC28 main and Lederberg main are shown in Figures 4.6a and 4.6b. A logarithmic scale is used to make the runtimes easier to distinguish.

The first two spectral merge selectors, SPL and SPLD, did not perform well in terms of runtime and clustering results. This could also be expected from their mathematical derivation as approximations of the modularity increase. However, the modularity values on these two show that the matching (M) method is less sensitive to bad selectors than grouping (G). All other merge selectors, SPA, WD, MI, and RWR, yielded very similar clustering results when compared to the modularity improvement possible by refinement. This is an interesting observation because the modularity increase selector (MI) was expected to produce worse results than the weight density (WD). Very small modularity improvements over the other selectors were obtained by the random walk reachability (RWR). On the better merge selectors, greedy matching (M) turned out to be as good as greedy grouping (G) when refinement was used; without refinement, grouping produced slightly better results. On DIC28 main, matching with refinement was slightly faster than grouping. But on the graph Lederberg main, with its extremely skewed distribution of vertex degrees, greedy grouping was always considerably faster.

Altogether, the weight density selector (WD) often had the lowest runtime while still producing good mean modularities. Therefore it will be used as the default merge selector. On the large graph set the grouping and matching methods were comparably good in terms of mean modularity; this result differs slightly from the measurements on the reduced graph set.

4.4 Effectiveness of the Cluster Refinement

This section compares the multi-level refinement algorithms against raw graph coarsening without refinement. The first subsection concentrates on the greedy refinement variants to select the best one in terms of modularity improvement and search speed. The second subsection then compares the best greedy refinement against the


presented Kernighan-Lin method.

Figure 4.7: Mean Modularity by Refinement Method (reduced set) [(a) Modularity by Refinement Method: mean modularity for none, SGR-mod, SGR-eo, CGR, SGR-density, KL; (b) Runtime by Refinement Method on DIC28 main: runtime [s] for the same methods]

4.4.1 Greedy Refinement

The complete greedy refinement method moves the vertex with the best modularity increase in each move step until no further improvement is possible. However, it is computationally expensive to find the best vertex because it requires evaluating all possible moves of all vertices in each step. For this reason the sorted greedy refinement was introduced. It saves search time by trying to move the vertices in a specific order according to a vertex selector. In the following evaluation the best of the available vertex selectors is determined. Finally, it is studied whether the complete greedy refinement can be replaced by the sorted variant without degrading the results too much.

The three sorted greedy refinement methods are SGR-density using the density-fitness vertex selector, SGR-mod using the mod-fitness, and SGR-eo using the eo-fitness. The complete greedy refinement is named CGR. To provide a significance scale, Kernighan-Lin refinement (KL) and no refinement (none) are included. All other parameters are set to their default values, i.e. greedy grouping by weight density is used for the coarsening.

Table 4.8 summarizes the mean modularity produced by each refinement algorithm on the reduced graph set. The second column displays the runtime of the algorithms on the graph DIC28 main. The rows are sorted by their mean modularity. In addition to the table, the bar plot in Figure 4.7a visually compares the mean modularities; the runtime is plotted in Figure 4.7b.


                   mean modularity   time DIC28 main [s]   mod. DIC28 main
    none           0.56849           24.11900              0.80154
    SGR-mod        0.59661           75.93500              0.84754
    SGR-eo         0.59664           74.89400              0.84727
    CGR            0.59672           798.17800             0.84758
    SGR-density    0.59677           76.71600              0.84747
    KL             0.59792           4672.12100            0.84781

Table 4.8: Mean Modularity by Refinement Method (reduced set). The columns contain the mean modularity on the reduced set, the runtime on the graph DIC28 main, and the modularity obtained on DIC28 main.

                       none      SGR-density   CGR       KL
    mean modularity    0.52305   0.54608       0.54610   0.54810

Table 4.9: Mean Modularity by Refinement Method (large set)

Between the three sorted greedy variants (SGR-density, SGR-mod, and SGR-eo) no significant differences in modularity are visible. Their clustering results are also comparable to complete greedy refinement (CGR). On the graph DIC28 main the complete greedy refinement was about 10 times slower than any sorted greedy refinement. This is in agreement with the higher worst-case complexity of the complete greedy refinement. Therefore, sorted greedy refinement with the density-fitness vertex selector (SGR-density) is chosen as the default greedy refinement method.

4.4.2 Kernighan-Lin Refinement

This subsection analyzes how much the refinement variants improve the clustering results compared to no refinement. This includes the question whether Kernighan-Lin refinement performs significantly better than greedy refinement. The algorithms are configured as in the previous subsection; considered are the variants none, SGR-density, CGR, and KL. The evaluation is applied to the large graph set to obtain more reliable mean modularity values. Table 4.9 lists the produced mean modularity values and Figure 4.8 provides a bar plot of the same values. Appendix B.4 contains a table of the individual modularity values obtained by the algorithms on each graph.

On the large graph set the mean modularity was improved with sorted greedy refinement (SGR-density) by 4.4% compared to no refinement 1. This range provides the significance scale already used in the previous evaluations. In comparison, Kernighan-Lin refinement (KL) improved the results by 4.79%. The mean improvement of Kernighan-Lin refinement over sorted greedy refinement was 0.37%. The runtime and modularity values of the graph DIC28 main from the previous subsection show that the Kernighan-Lin refinement was about 10 times slower than sorted greedy refinement, while there it improved the modularity by just 0.04%. As on the reduced set, on the large graph set sorted greedy refinement (SGR-density) was as good as the complete greedy refinement (CGR).

1 Improvements are computed from the ratio max/min of the two mean modularities, i.e. (max/min − 1) · 100%.
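For reference, the following sketch outlines the control flow of the sorted greedy refinement compared above: vertices are visited in an order given by a vertex selector and are moved to the neighboring cluster with the largest positive modularity gain. It is a simplified Python illustration, not the implementation of this work; the gain computation and the fitness function are passed in as placeholders, and all names are chosen for the example (cf. Section 3.4).

    def sorted_greedy_refinement(adj, clusters, fitness, modularity_gain):
        """adj: dict mapping each vertex to a dict of neighbor -> edge weight;
        clusters: dict mapping each vertex to its cluster id (modified in place);
        fitness(v): vertex selector that determines the visiting order;
        modularity_gain(v, c): modularity increase of moving v into cluster c."""
        improved = True
        while improved:
            improved = False
            for v in sorted(adj, key=fitness):           # order given by the selector
                candidates = {clusters[u] for u in adj[v]} - {clusters[v]}
                best_gain, best_cluster = 0.0, clusters[v]
                for c in candidates:
                    gain = modularity_gain(v, c)
                    if gain > best_gain:
                        best_gain, best_cluster = gain, c
                if best_cluster != clusters[v]:          # accept only improving moves
                    clusters[v] = best_cluster
                    improved = True
        return clusters

The complete greedy refinement differs in that it re-evaluates all vertices in every step and always executes the globally best move, which is consistent with the roughly tenfold runtime difference observed above.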


Figure 4.8: Mean Modularity by Refinement Method (large set) [bar plot: mean modularity for none, SGR-density, CGR, KL]

Altogether, Kernighan-Lin refinement reliably improves the clusterings by a small amount but requires considerably more runtime. Therefore it should be used in place of sorted greedy refinement only when the best possible clusterings are sought. Similar or better modularity improvements might easily be achievable by other refinement methods in less time.

4.5 Scalability

The purpose of this section is to study experimentally how well the runtime of the algorithms scales with the graph size. The considered configurations are the multi-level Kernighan-Lin refinement (KL), sorted greedy refinement by density-fitness (SGR-density), complete greedy refinement (CGR), and the raw graph coarsening without refinement (none). For all other parameters the default values are used. The runtimes of all graphs and algorithms have to be measured on the same computer; thus only a subset of 24 graphs from the large graph collection is used. These graphs are also marked in the graph table in the appendix.

Figure 4.9a shows the total runtime of the algorithms versus the number of vertices. In addition, Figure 4.9b shows the computation time of the coarsening phase, which equals the configuration none; the time spent on sorted greedy refinement alone is shown as well. In order to show the influence of the edges, the names of three larger graphs and their edge counts are annotated in the figure. The complete runtime measurements for each graph can be found in the appendix in Table B.5.

Of course the runtime depends not only on the number of vertices but also on the edges. On the other hand, the number of edges mostly scales with the vertices because nearly all graphs have a similar mean vertex degree. The graphs eatRS and hep-th-new main are the two largest graphs of the collection. Both have more than three times more edges than other graphs of similar vertex count (for example


Figure 4.9: Runtime by Graph Size [(a) Runtime by Graph Size: runtime [s] vs. vertices for none, SGR-density, CGR, KL; (b) Coarsening and Greedy Refinement Time by Graph Size: coarsening and SGR-density without coarsening; annotated graphs: eatRS (305501 edges), hep-th-new_main (352059 edges), DIC28_main (71014 edges)]

DIC28 main). In the second figure it is visible that coarsening and refinement also take around three times longer on these graphs. Kernighan-Lin and complete greedy refinement depend even more strongly on the number of edges. Both coarsening and sorted greedy refinement were equally fast on most graphs and scale much better with the graph size than complete greedy and Kernighan-Lin refinement.

4.6 Comparison to Reference Algorithms

This section compares the developed algorithm to other publicly available algorithms. Many of these are implemented as a proof of concept and only work on unweighted graphs without self-edges. Therefore the evaluation is based on the 19 graphs of the collection usable with all implementations. Information about the graphs can be found in Appendix A.1. First the reference algorithms are introduced briefly. Then the modularity of the clusterings produced by the algorithms and their runtime are compared.

The following algorithms are compared here; for each a short name is given to be used in tables and summaries. One of the most popular algorithms is the fast greedy joining (fgj) of Clauset, Newman, and Moore [15]. 2 It is an agglomeration method selecting cluster pairs by highest modularity increase. Recently the modified greedy joining of Wakita and Tsurumi [76] also became available. 3 It selects cluster pairs by highest modularity increase multiplied by a consolidation ratio min(|C_i|/|C_j|, |C_j|/|C_i|). The wakita HN variant uses the number of vertices as cluster size |C_i|, and the wakita HE variant the number of edges leaving the cluster.

2 available at http://cs.unm.edu/~aaron/research/fastmodularity.htm
3 available at http://www.is.titech.ac.jp/~wakita/en/


                      walktrap   leadingev  wakita HE  wakita HN  fgj        ML-none    spinglass  ML-sgrd    ML-KL
    Chain8            0.31633    0.31633    0.37755    0.31633    0.35714    0.35714    0.37755    0.35714    0.37755
    Star9             -0.21875   0.00000    -0.078125  -0.078125  -0.00781   -0.00781   -0.00781   0.00000    0.00000
    K55               0.00000    0.30000    0.30000    0.30000    0.00000    0.30000    0.30000    0.30000    0.30000
    Tree15            0.47449    0.50510    0.51276    0.50510    0.50510    0.50510    0.50510    0.50510    0.51276
    ModMath main      0.34641    0.42287    0.31712    0.35488    0.41051    0.42018    0.44880    0.42905    0.44880
    SouthernWomen     0.00000    0.21487    0.22775    0.24157    0.31467    0.31530    0.33121    0.31972    0.33601
    karate            0.39908    0.37763    0.35947    0.41880    0.38067    0.40869    0.41979    0.41979    0.41979
    mexican power     0.22869    0.29984    0.33772    0.31949    0.33089    0.32811    0.35119    0.34776    0.35952
    Grid66            0.47833    0.55000    0.42222    0.49236    0.49597    0.51319    0.53250    0.55000    0.54125
    Sawmill           0.40153    0.44745    0.49102    0.48660    0.55008    0.55008    0.53863    0.55008    0.55008
    dolphins          0.44043    0.48940    0.48687    0.44838    0.49549    0.51786    0.52852    0.52587    0.52587
    polBooks          0.49927    0.39922    0.46585    0.49138    0.50197    0.50362    0.52640    0.52724    0.52561
    adjnoun           0.17640    0.22153    0.24365    0.25766    0.29349    0.28428    0.31336    0.31078    0.31078
    sandi main        0.78157    0.78667    0.80995    0.82054    0.82729    0.82083    0.81619    0.82773    0.82773
    USAir97           0.28539    0.26937    0.34439    0.31526    0.32039    0.34353    0.36597    0.36824    0.36824
    circuit s838      0.70256    0.70215    0.76499    0.79488    0.80472    0.79041    0.81481    0.81551    0.81551
    CSphd main        0.88971    0.74219    0.91382    0.92001    0.92470    0.92484    0.90584    0.92558    0.92558
    Erdos02           0.62534    0.59221    0.66991    0.67756    0.67027    0.68484    0.70985    0.71592    0.71611
    DIC28 main        0.70285    0.70923    0.73570    0.74179    0.78874    0.80154    0.81747    0.84747    0.84781
    average           0.39630    0.43927    0.45803    0.46445    0.47180    0.49272    0.50502    0.50753    0.51100

Table 4.10: Clustering Results of Reference Algorithms

A few other algorithms are available through the igraph library of Csárdi and Nepusz [16]. 4 Of these, the following were used. The spinglass algorithm of Reichardt and Bornholdt [67] optimizes modularity by simulated annealing on a physical energy model; the implementation requires an upper bound on the number of clusters, and in this evaluation the upper bound 120 was used for all graphs. A spectral method is Newman's [58] recursive bisection based on the first eigenvector of the modularity matrix (leadingev). Finally, an agglomeration method based on random walks is the walktrap algorithm of Pons and Latapy [64].

For comparison, the standard configuration with three refinement variants was chosen. All use greedy grouping by weight density with a 10% reduction factor. The variant without any refinement (ML-none) is similar to the other pure agglomeration methods, namely fgj, wakita-HN, and wakita-HE. The two other refinement variants are sorted greedy refinement by density-fitness (ML-sgrd) and the improved Kernighan-Lin refinement (ML-KL).

Table 4.10 lists the modularity of the clusterings found by the implementations. The graphs are sorted by number of vertices and the algorithms by mean modularity. The last row shows the arithmetic mean modularity for each algorithm. For each graph the algorithms with the best result are marked with a bold font.
A similar table comparing the runtime of the algorithms is included in Appendix B.6. Figure 4.10 summarizes the mean modularity obtained by each algorithm and the measured runtime.

It is visible that on average greedy grouping by density (ML-none) is better than all other agglomeration methods (fgj, wakita_HE, wakita_HN) even without refinement, although on a few graphs the other agglomeration methods perform slightly better. Looking at the multi-level refinement methods (ML-sgrd, ML-KL), only the spinglass algorithm of Reichardt and Bornholdt produces comparably good clusterings.

4 available at http://igraph.sourceforge.net/


Figure 4.10: Clustering Results and Runtime of the Reference Algorithms [(a) Modularity by Algorithm: average modularity 0.3963 (walktrap), 0.4393 (leadingev), 0.4580 (wakita_HE), 0.4644 (wakita_HN), 0.4718 (fgj), 0.4927 (ML-none), 0.5050 (spinglass), 0.5075 (ML-sgrd), 0.5110 (ML-KL); (b) Runtime of the Algorithms vs. Graph Size: runtime [s] vs. vertex count]


However, it is much slower. On nearly all graphs it is outperformed by the multi-level Kernighan-Lin refinement (ML-KL) in terms of modularity and runtime. The three slowest implementations were spinglass, ML-KL, and leadingev; all other algorithms had a relatively constant, low runtime.

4.7 Comparison to Published Results

This section compares the presented multi-level refinement method against other clustering algorithms. For many of these algorithms, clustering results are published in the papers presenting the algorithm. Because of their size, clusterings are printed directly only for very small graphs; commonly just the modularity value of the clusterings is published.

The following discussion is based on the algorithms listed in Section 2.4 about fundamental clustering methods. First some general problems of this evaluation method are discussed. Then for each graph the modularity values found in articles are presented. The section concludes with a small summary table comparing the best found values to the own results.

Only modularity values published together with the original algorithms were considered, and for each value the source article is cited. The modularity values are compared against two multi-level Kernighan-Lin refinement algorithms. Both use greedy grouping with a 10% reduction factor and Kernighan-Lin refinement. As merge selector, the weight density is employed by the ML-KL-density variant and the random walk reachability (2,3) by the ML-KL-rw variant. The first variant is the default configuration identified by the previous evaluation. The other variant was chosen in addition because, for the degree volume model used here, it is in many cases able to find better clusterings.

The comparison of printed modularity values involves some general problems. Often only a few digits are printed to save space; with just three digits it is difficult to check whether the same or merely similar clusterings were found. In this regard, also printing the number of clusters would be helpful. The calculation of modularity may be done slightly differently: several variants of the modularity measure exist, and small variations in the handling of self-edges are possible. In addition, unweighted edges might have been used instead of the available weighted version. Finally, small differences can arise in preprocessing and graph conversion through different strategies to obtain undirected, symmetric graphs. Here only the last problem is addressed, by excluding graphs whose published number of vertices and edges differs from the own version. This applies to most graphs in [64, 25, 69].

4.7.1 The Graphs and Clusterings

In the following paragraphs each graph is presented briefly, and thereafter the clustering results are discussed. For the smaller graphs example pictures are also printed. The layout was computed with the LinLog energy model [62] 5. The best clustering computed by any configuration of the own multi-level algorithm is shown as the vertex color.

5 available at http://www.informatik.tu-cottbus.de/~an/GD/


Figure 4.11: Reference Graphs (karate and dolphins) [(a) karate, (b) dolphins]

karate
This is Zachary's karate club study [82], maybe the most studied social network. A picture of the graph colored with the optimum clustering is shown in Figure 4.11a. During the two-year study, 34 members of a karate club were observed and the number of contacts outside of the club was counted. As in most other studies of this network, the simple unweighted version of the network is used here. The instructor of the club (vertex #1) resigned because of conflicts with the club administrator (vertex #34) about lesson fees. His supporters followed him, forming a new organization with the members 1 to 9, 11 to 14, 17, 18, 20, 22. Person 9 joined this faction although he was a supporter of the club president, because he was only three weeks away from the test for black belt when the split occurred, and had to follow the instructor to retain this rank.

The greedy joining algorithm of Newman found 2 clusters with modularity 0.381 [60]. By modifying the merge selector, Danon found a clustering with modularity 0.4087 [17]. The spectral clustering algorithms of Donetti et al. yielded modularity 0.412 [21] and 0.419 [20] in the improved version. Newman's recursive spectral bisection also found modularity 0.419 [58]. Surprisingly, the algorithms based on random walks only found clusterings with 0.3937 [65] and 0.38 [64]. Extremal Optimization, with 4 clusters of modularity 0.4188 [23], fell short of the optimum. As verified by [2], the global optimum clustering has 4 clusters with modularity 0.4197. The same clustering was found by both multi-level algorithms.
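As an aside, the modularity values compared in this section can be recomputed from a published clustering with a few lines of code. The sketch below uses the common unweighted, degree-based formulation; as noted above, the handling of edge weights and self-edges varies slightly between publications, so this is an illustration rather than the exact definition used elsewhere, and all names are chosen for the example.

    from collections import defaultdict

    def modularity(edges, clusters):
        """edges: iterable of (u, v) pairs of a simple undirected graph;
        clusters: dict mapping each vertex to its cluster id."""
        m = 0
        intra = defaultdict(int)    # number of intra-cluster edges per cluster
        degree = defaultdict(int)   # summed vertex degrees per cluster
        for u, v in edges:
            m += 1
            degree[clusters[u]] += 1
            degree[clusters[v]] += 1
            if clusters[u] == clusters[v]:
                intra[clusters[u]] += 1
        if m == 0:
            return 0.0
        return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

Recomputing values in this way also makes the definition differences discussed in the introduction of this section visible, for example when a published value cannot be reproduced exactly.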


Figure 4.12: Reference Graphs (polBooks and afootball) [(a) polBooks, (b) afootball]

dolphins
The dolphin social network is an undirected graph of frequent associations between 62 dolphins in a community residing in Doubtful Sound in New Zealand [51, 50]. For a picture see Figure 4.11b.

This graph appeared only recently in the context of modularity clustering, so very few published results exist. With Fractional Linear Programming, a clustering of modularity 0.529 and the upper bound 0.531 were found [2]. The exact Integer Linear Programming also gave an optimum clustering with modularity 0.529 [13]. Using multi-level refinement, 5 clusters with modularity 0.52586 (ML-KL-density) and 0.52772 (ML-KL-rw) were found. The best clustering found with experimental configurations had modularity 0.52851, and the same clustering was found by the spinglass algorithm in the previous section.

polBooks
A network of books about recent US politics sold by the online bookseller Amazon.com. Edges between books represent frequent co-purchasing of books by the same buyers (Fig. 4.12a). The network was compiled by Valdis Krebs [47].

With Fractional Linear Programming a clustering of modularity 0.5272 and an upper bound of 0.528 were found [2]. This result was confirmed by the exact Integer Linear Programming method with modularity 0.527 [13]. The multi-level algorithm ML-KL-rw and very many other configurations found 5 clusters with modularity 0.52723. Surprisingly, the ML-KL-density variant only found 0.52560 with 6 clusters, although greedy refinement in the same configuration found the better clustering.

afootball
A real-world graph with known group structure is the game schedule of Division I of the United States college football for the 2000 season [31]. The 115 teams are grouped into conferences of 8-12 teams and some independent teams.


Figure 4.13: Reference Graphs (jazz and celegans metabolic) [(a) jazz, (b) celegans metabolic]

On average, seven intra- and four inter-conference games are played, and inter-conference games between geographically close teams are more likely. A picture of the graph is shown in Figure 4.12b.

Newman's greedy joining algorithm found 6 clusters with modularity 0.546 [60], whereas the spectral methods of White et al. found 11 clusters with 0.602 [80]. Similarly, the random walk based agglomeration of Pons found modularity 0.60 [64]. With Fractional Linear Programming and rounding, a clustering of 0.6046 and the upper bound 0.606 were found [2]. The random walk based multi-level clustering ML-KL-rw comes close to this bound with 10 clusters and modularity 0.60582. The simpler variant ML-KL-density only reached modularity 0.59419 with 7 clusters.

jazz
A network of 198 Jazz musicians compiled by Gleiser and Danon [33]. Such graphs are interesting for the social sciences as Jazz musicians very often work together in many different small groups (Fig. 4.13a). This graph has also become very popular for the comparison of clustering methods.

The best clustering was reported for Extremal Optimization with 5 clusters of modularity 0.4452 [23]. Fractional Linear Programming with rounding gave modularity 0.445 [2] and an upper bound of 0.446. Spectral clustering methods yielded modularity 0.437 [21], 0.444 [20], and 0.442 [58]. The worst modularity, 0.4409 [17], was produced by Danon's greedy joining with modified merge selector. Both multi-level algorithms found 4 clusters, with modularity 0.44514 (ML-KL-rw) and 0.44487 (ML-KL-density).


Figure 4.14: Reference Graphs (circuit s838 and email) [(a) circuit s838, (b) email]

celegans metabolic
A metabolic network for the nematode Caenorhabditis elegans [42, 23]. The graph describes chemical reactions as well as the regulatory interactions that guide these reactions (Fig. 4.13b).

The first published clustering result for this graph was found by Extremal Optimization with 12 clusters of modularity 0.4342 [23]. Greedy joining with pre-coarsening by short random walks found 7 clusters with modularity 0.4164 [65]. Newman's recursive spectral bisection yielded 0.435 [58]. The recursive bisection algorithm with Quadratic Programming of Agarwal and Kempe [2] found the best clustering published to date, with modularity 0.450. The two multi-level refinement algorithms gave similar results with 0.44992 (ML-KL-rw) and 0.45090 (ML-KL-density). The best clustering with experimental configurations had modularity 0.45136.

circuit s838
The electric circuit s838 st.txt of Uri Alon's collection of complex networks [53]. This graph is an undirected version with 512 vertices connected by 819 edges (Fig. 4.14a).

The only published clustering has modularity 0.815 with 13 clusters [69]. Both multi-level configurations found clusterings of similar modularity. The random walk based greedy grouping ML-KL-rw is, at 0.815507, slightly worse than ML-KL-density with the weight density selector, which reached 0.815513. However, better clusterings exist; for example, with experimental configurations a clustering with 12 clusters and modularity 0.81719 was found.

email
The network of e-mail interchanges between members of the University Rovira i Virgili in Tarragona, Spain [36, 37]. The graph contains 1133 users of the university e-mail system, including faculty, researchers, technicians, managers,


administrators, and graduate students. Two users are connected by an edge if both exchanged e-mails with each other (Fig. 4.14b). Thus the graph is undirected and unweighted.

In recent years this graph, too, has become popular for the evaluation of graph clustering methods. The first published result is for Extremal Optimization, which found 15 clusters and modularity 0.5738 [23]. Danon's greedy joining with modified merge selector only yielded modularity 0.5569 [17]. The recursive spectral bisection of Newman found a clustering with modularity 0.572 [58]. The best published clustering, with modularity 0.579, was found by recursive bisection with Quadratic Programming [2]. The multi-level algorithm ML-KL-density produced 11 clusters with modularity 0.57767. A better clustering with 9 clusters and modularity 0.58137 was obtained from the random walk variant ML-KL-rw.

hep-th main
A scientist collaboration network in high-energy physics. The graph contains the collaborations of scientists posting preprints on the high-energy theory archive at http://www.arxiv.org between 1995 and 1999. The 5835 vertices represent authors and edges connect co-authors. The edges are weighted as described in [61].

The only published clustering is from a recursive spectral bisection method; it produced 114 clusters with modularity 0.707 [21]. The multi-level algorithm ML-KL-density found 57 clusters with modularity 0.85523. A better clustering was found with the random-walk variant ML-KL-rw: it contains 59 clusters and has modularity 0.85629.

Erdos02
This is the year 2002 version of the collaboration network around Erdös [34]. The vertices represent authors and the unweighted edges connect co-authors. Only authors with a co-author distance from Erdös of up to 2 are included.

The only published clustering was found with greedy joining combined with random walk based pre-coarsening; it contains 20 clusters and has modularity 0.6817 [65]. The multi-level algorithm ML-KL-density produced 39 clusters with modularity 0.71611. A slightly better clustering with the same number of clusters and modularity 0.71639 was found by the random-walk variant ML-KL-rw.

PGPgiantcompo
The giant component of the network of users of the Pretty-Good-Privacy algorithm for secure information interchange, compiled by Arenas et al. [12, 35]. The component contains 10680 vertices; edges connect users trusting each other.

The modified greedy joining of Danon found the worst clustering with modularity 0.7462 [17]. Extremal Optimization found 365 clusters with modularity 0.8459 [23]. The best published modularity, 0.855, was obtained by Newman's spectral bisection [58]. The better of the two multi-level algorithms was ML-KL-rw, which found 96 clusters with modularity 0.88462; the greedy grouping by density ML-KL-density yielded 102 clusters with 0.88392. Finally, the best clustering found in own experiments so far has 94 clusters and modularity 0.885450.

cond-mat-2003 main
A collaboration network of scientists posting preprints on the condensed matter archive at http://www.arxiv.org. This version contains data


from 1995 to 2003. Vertices represent authors and edges connect co-authors. The edges are weighted as described in [61]. Here only the largest component, containing 27519 vertices, is used.

Extremal Optimization discovered 647 clusters with modularity 0.6790 [23]. Later, much better clusterings were found: Newman's recursive spectral bisection yielded 0.723 [58], and greedy joining combined with pre-coarsening found 44 clusters with modularity 0.7251 [65]. Of the own algorithms, the multi-level refinement ML-KL-density produced 77 clusters with 0.81377. A better clustering with 78 clusters and modularity 0.81586 was found by ML-KL-rw. However, even better clusterings were discovered in experiments, for example one with 82 clusters and modularity 0.81718.

    graph                  |V|     |E|      article  modularity     ML-KL-density   ML-KL-rw
    karate                 34      78       [2]      0.4197 (4)     0.41978 (4)     0.41978 (4)
    dolphins               62      159      [2]      0.529          0.52586 (5)     0.52772 (5)
    polBooks               105     441      [2]      0.5272         0.52560 (6)     0.52723 (5)
    afootball              115     613      [2]      0.6046         0.59419 (7)     0.60582 (10)
    jazz                   198     2742     [23]     0.4452 (5)     0.44487 (4)     0.44514 (4)
    celegans metabolic     453     2040     [2]      0.450          0.45090 (9)     0.44992 (8)
    circuit s838           512     819      [69]     0.815 (13)     0.81551 (15)    0.81550 (15)
    email                  1133    5451     [2]      0.579          0.57767 (11)    0.58137 (9)
    hep-th main            5835    13815    [21]     0.707 (114)    0.85523 (57)    0.85629 (59)
    Erdos02                6927    11850    [65]     0.6817 (20)    0.71611 (39)    0.71639 (39)
    PGPgiantcompo          10680   24316    [58]     0.855          0.88392 (102)   0.88462 (96)
    cond-mat-2003 main     27519   116181   [65]     0.7251 (44)    0.81377 (77)    0.81586 (78)

Table 4.11: Comparison to Published Results. The values in parentheses are the numbers of clusters.

4.7.2 Summary

Table 4.11 summarizes the clustering results. For each graph the best modularity found in the literature and the modularity computed with the two multi-level algorithms ML-KL-density and ML-KL-rw are given. The graphs are sorted by the number of vertices, and the best modularity on each graph is marked in bold.

On the smaller graphs, optimal or at least very good clusterings were obtained with the Linear Programming methods from [2] and [13]. In these cases the multi-level algorithms found comparably good clusterings. Only few published results for big graphs exist; there the multi-level algorithm ML-KL-rw performed much better than the other methods. Still, the clusterings found by experimental configurations show that on some graphs better results are possible. But these were found more by fortunate circumstances than by a reliable, systematic search.


5 Results and Future Work

The objective of this work was to develop and evaluate a multi-level refinement algorithm for the modularity graph clustering problem. In this context the main focus was on the derivation and evaluation of better merge selection criteria for the graph coarsening, on implementing more efficient data structures for coarsening and refinement, and on researching and evaluating applicable refinement algorithms.

The following sections discuss the results of this work. First, immediate results, like the default configuration derived from the evaluation, are presented. The second section briefly presents by-products developed during the course of this work, and the last section comments on possible directions for future work.

5.1 Results of this Work

Merge selection criteria direct the selection of cluster pairs in the coarsening phase of the multi-level algorithm. The selected clusters are merged until the clustering is not improved any further. This merge process produces the initial clustering and the vertex groups for the later refinement. Therefore merge selectors play a crucial role in the overall algorithm.

In this work a range of semi-global and global merge selectors was presented along with simple local selectors. The first group of selectors uses the visit probabilities of short random walks, and the second group applies spectral methods to analyze the global structure of the graph. The implemented variants were experimentally compared using mean modularities over a large collection of graphs. Of the seven developed selectors, four performed nearly equally well in terms of mean modularity. These are, in order from best to worst results, the random walk reachability, the weight density, the modularity increase, and the spectral angle. Comparing runtime and simplicity, the weight density proved to produce good results in a short time while being very simple to calculate.

Two coarsening algorithms were implemented. The greedy grouping method works like the greedy joining of Newman [60] by directly merging the selected pair of clusters in each step. Efficient implementations of this method require a dynamic graph data structure to quickly retrieve and update selection qualities; the implemented structure is similar to the algorithm of Wakita [76]. The second method is greedy matching and resembles the graph coarsening methods traditionally used with multi-level graph algorithms. It constructs a matching of some vertices using only the best-ranked pairs. The experimental comparison showed that greedy matching, when combined with refinement, is less sensitive to bad merge selectors than greedy grouping. However, the grouping method proved to be more reliable and produced slightly better results on average.
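To illustrate the control flow of the greedy grouping summarized above, a deliberately naive sketch is given below: the best-ranked cluster pair according to the merge selector is merged as long as doing so improves the quality measure. The actual method additionally observes the reduction factor of the multi-level scheme and relies on a dynamic graph data structure with updatable priorities (cf. Section 3.2.1); the callbacks and all names below are placeholders chosen for the example.

    def greedy_grouping(best_pair, merge, quality_gain):
        """best_pair(): returns the currently best-ranked adjacent cluster pair
        according to the merge selector, or None if no pair is left;
        merge(a, b): contracts the two clusters in the coarsened graph;
        quality_gain(a, b): quality (e.g. modularity) increase of merging a and b."""
        while True:
            pair = best_pair()
            if pair is None:
                return
            a, b = pair
            if quality_gain(a, b) <= 0:   # simple stopping rule: stop once the
                return                    # best-ranked merge no longer improves
            merge(a, b)

Greedy matching differs in that it replaces the single best pair by a matching built from the top-ranked fraction of pairs, so several merges are applied per coarsening level.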


For developing effective and efficient refinement algorithms, the design space of simple methods was analyzed. The main focus was on algorithms moving single vertices between clusters. In the domain of greedy refinement methods this analysis made it possible to develop a more efficient greedy algorithm. It is named sorted greedy refinement and selects the vertices to move using a simple move selector instead of the expensive-to-calculate modularity increase. Three move selectors based on the vertex fitness concept were proposed. In addition, a hill-climbing refinement algorithm in the style of Kernighan and Lin [45] was implemented.

All refinement algorithms were able to improve the clustering results considerably compared to the raw coarsening phase. The sorted greedy refinement produced results as good as the traditional greedy refinement; at the same time it was much faster and its runtime scaled better with the graph size. The move selectors had an insignificant influence on the results. The Kernighan-Lin refinement was improved in its runtime, and it also produced slightly better results than greedy refinement. However, its bad scalability still makes it unusable for large graphs; for example, on graphs with about 25,000 vertices it was around 60 times slower than the sorted greedy refinement. 1

The presented algorithm reflects the state of the art of multi-level refinement methods. In comparison to other clustering algorithms, the multi-level refinement was competitive in both clustering quality and runtime. However, studying clusterings produced with other configurations shows that improvements of the reliability are still possible, as sometimes small variations of the parameters lead to better clusterings.

Summarizing the results, the following default configuration was identified. For the graph coarsening phase, greedy grouping by the weight density merge selector should be used with a reduction factor of 10%. For the refinement phase, the sorted greedy refinement with density-fitness should be used. When the computation time is not a concern, Kernighan-Lin refinement could alternatively be used to find slightly better clusterings.

5.2 By-Products

As a by-product, the study of the mathematical properties of the modularity measure presented in Chapter 2 shed light on the relationship between modularity and other clustering quality measures. In this context the concept of connection density relative to a volume model was introduced, and a mathematical connection between density and modularity was established. The volume model describes expected edge weights derived from a vertex similarity measure. This enables the quick and easy adaptation of the modularity measure, together with the clustering algorithms, to specific application domains.

The implementation of the presented clustering algorithm is relatively flexible and configurable. The algorithm's components, like merge selectors and refinement algorithms, can be replaced without much effort. At the same time, the use of index spaces and maps supports the generic implementation of graph algorithms. This

1 Measured on the graph DIC28 main, cf. Table B.5


strategy turned out to be usable, and it is possible to transfer the key concepts into other programming languages.

Over the course of this project a small website was built up collecting information about clustering algorithms, benchmark graphs, and related topics. 2 The site contains a rich set of data about the graph collection, including various eigenvalue spectra and other graph statistics. Clustering results are summarized for each graph, and clustering methods can be compared globally. Such a website could be used in the future as a reference for the evaluation of algorithms by providing a common collection of graphs and a tool chain to compare clusterings to older results. For example, in and around the load-balancing community the collection of Walshaw et al. [74], containing graphs and partitioning results, helped to track the current state of the art and provided inspiration for new combined optimization strategies.

5.3 Directions for Future Work

In this section a few ideas for potential improvements of the clustering algorithm are proposed. The first three subsections are mostly related to the graph coarsening phase, and the last two discuss high-level meta-strategies.

5.3.1 Pre-Coarsening

Performance and quality might be improved by applying a pre-coarsening to the initial graph: The vertex degrees of many real-world networks follow a power-law distribution. In that case most vertices have only a single edge attaching them to a more central vertex. These vertices are called hairs and will almost certainly be grouped into the same cluster as their single neighbor. Particularly in unweighted graphs, maximum modularity clusterings cannot contain clusters consisting of a single one-edge vertex [13]. Thus a special first coarsening pass should merge them in advance. This will improve the information available to the merge selector. Additionally, the greedy matching may become more effective and reliable because it is no longer obstructed by single-edge vertices. Similar preprocessing methods were already proposed by Arenas et al. [4] for the size reduction of graphs.

An implementation would collect the unweighted vertex degrees first. All hair vertices have unit degree and are thus easy to find. They are merged with their single neighbor vertex if this increases the modularity. In addition, some two-edge vertices might be reduced: these bridge vertices lie between two neighbors and could be merged with the better of both neighbors.

Furthermore, this pre-coarsening is a strong counter-example to merge selectors based on vertex size or consolidation ratios. These try to prefer vertex pairs of similar size in order to grow clusters more evenly. But merging hairs with their much bigger neighbor is an expected and necessary behavior in spite of the huge size difference.

2 presently at http://goya.informatik.tu-cottbus.de/~clustering/
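A minimal sketch of the proposed hair pre-coarsening is given below, assuming an adjacency-set representation without self-edges. The merge_gain callback stands for the modularity increase of the merge and, like all other names, is a placeholder chosen for the illustration.

    def merge_hairs(adj, clusters, merge_gain=None):
        """adj: dict mapping each vertex to the set of its neighbors;
        clusters: dict mapping each vertex to its cluster id (modified in place)."""
        for v, neighbors in adj.items():
            if len(neighbors) != 1:
                continue                        # only unit-degree "hair" vertices
            (u,) = tuple(neighbors)             # the single neighbor
            if merge_gain is None or merge_gain(v, u) > 0:
                clusters[v] = clusters[u]       # absorb the hair into its neighbor
        return clusters

The two-edge bridge vertices mentioned above could be handled analogously by comparing the gains of merging with either of the two neighbors.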


5.3.2 Study of Merge Selectors

During the course of this work a measure for the prediction quality of merge selectors was developed. It determines, among the vertex pairs rated highest by the selector, the percentage of pairs whose vertices actually lie in the same final cluster. For this purpose a reference clustering is necessary.

However, it turned out that this prediction quality alone does not say much about the general performance of a merge selector. In practice it is also important to study when and where coarsening errors, i.e. the grouping of vertices that belong to different clusters, occur. For example, the spectral angle merge selector might perform well on large, fine-grained graphs but fail on the coarser coarsening levels. On the fine levels all information is spread non-locally, while the coarsening aggregates structural information and may therefore produce good local information later on. In that case the weight density selector should be used on the coarser levels instead.

5.3.3 Linear Programming

It is possible to formulate the clustering problem as a linear or quadratic program [13, 2]. Instead of classic rounding techniques, the computed distances could be used as merge selection quality. This would enable multi-level refinement on the rounding results.

The fractional linear program is solvable in polynomial time, but it requires |V|² space for the distance matrix and cubic space for the transitivity constraints. Thus it becomes impracticable already for medium-sized graphs unless the constraints are replaced by a more compact implicit representation. However, for the presented multi-level refinement, approximate distances between adjacent vertices would suffice. Perhaps such approximations could be computed faster using other representations of the optimization aim. For example, in [6] an embedding into higher-dimensional unit spheres under the squared Euclidean norm was used for similar quality measures.

5.3.4 Multi-Pass Clustering and Randomization

A meta-strategy similar to evolutionary search [72] is multi-pass clustering. Because the refinement corrects coarsening errors, the computed clustering contains valuable information. In a second pass this information can be fed back into the coarsening by ignoring all vertex pairs that cross previous cluster boundaries. This effectively produces a corrected coarsening hierarchy and allows further improvements by the refinement heuristics. Coarsening and refinement are repeated until the clustering does not improve any further. The multi-pass search may be widened by applying some kind of randomization during the coarsening. A sketch of this feedback loop is given below.
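The following is a minimal sketch of this feedback loop. The single-pass multi-level algorithm and the quality measure are supplied by the caller; the convention that the previous clustering (empty on the first pass) is passed to the coarsening, which may then veto merges across old cluster boundaries, is a hypothetical interface chosen only for illustration.

```cpp
#include <functional>
#include <utility>
#include <vector>

using Clustering = std::vector<int>;    // cluster id per vertex

// Multi-pass clustering sketch: feed the previous result back into the
// coarsening until the quality measure no longer improves.
Clustering multiPass(
    const std::function<Clustering(const Clustering& previous)>& coarsenAndRefine,
    const std::function<double(const Clustering&)>& quality) {
    Clustering best = coarsenAndRefine(Clustering{});   // first pass: no feedback
    double bestQ = quality(best);
    for (;;) {
        Clustering next = coarsenAndRefine(best);        // respect old boundaries
        const double q = quality(next);
        if (q <= bestQ) break;                           // stop when no improvement
        best = std::move(next);
        bestQ = q;
    }
    return best;
}
```

Randomizing the coarsening inside the supplied coarsenAndRefine step, as suggested above, would widen this search.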


5.3.5 High-Level Refinement Search

In the domain of cluster refinement several improvements might be possible. For example, restarting the Kernighan-Lin search on intermediate clusterings would allow the search depth to be increased. In this context a slight randomization would help to break ties and to avoid cycling repeatedly through the same movement sequence. A too strong randomization, however, again prevents the detection of local optima.

Still, the Kernighan-Lin approach is very slow, because in modularity clustering the quality improvements of vertex moves are difficult to compute. On the other hand, a very fast greedy refinement method was developed in this work. Similarly, a fast randomized method for leaving local optima could be developed. Combining both would allow walking between local optima as in the basin hopping method [52]; a sketch of such a perturb-and-refine loop is given below. However, some open questions remain as to how this can be combined effectively with the multi-level strategy. For example, currently a lot of information about the graph is lost between the coarsening levels, because only the last best clustering is projected onto the finer level.
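To illustrate the suggested combination, the following is a minimal sketch of a perturb-and-refine loop in the spirit of basin hopping. The greedy refinement, the quality measure, and the randomized perturbation are supplied by the caller; the acceptance rule, the fixed seed, and the iteration budget are illustrative assumptions, not part of the algorithms developed in this work.

```cpp
#include <functional>
#include <random>
#include <utility>
#include <vector>

using Clustering = std::vector<int>;    // cluster id per vertex

// Basin-hopping-style search sketch: repeatedly perturb the current clustering,
// descend into the nearest local optimum with the fast greedy refinement, and
// keep the result if it is at least as good as the best one found so far.
Clustering perturbAndRefine(
    Clustering start,
    const std::function<double(const Clustering&)>& quality,
    const std::function<void(Clustering&, std::mt19937&)>& perturb,
    const std::function<void(Clustering&)>& greedyRefine,
    int maxHops = 50) {
    std::mt19937 rng(12345);
    greedyRefine(start);
    Clustering best = std::move(start);
    double bestQ = quality(best);
    for (int hop = 0; hop < maxHops; ++hop) {
        Clustering trial = best;
        perturb(trial, rng);            // e.g. move a few random vertices
        greedyRefine(trial);            // walk down to the next local optimum
        const double q = quality(trial);
        if (q >= bestQ) {               // accept equal or better optima
            best = std::move(trial);
            bestQ = q;
        }
    }
    return best;
}
```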


Bibliography

[1] A. Abou-Rjeili and G. Karypis. Multilevel algorithms for partitioning power-law graphs. 20th International Parallel and Distributed Processing Symposium, page 10, 2006.
[2] Gaurav Agarwal and David Kempe. Modularity-maximizing network communities via mathematical programming. arXiv:0710.2533, October 2007.
[3] Charles J. Alpert, Andrew B. Kahng, and So-Zen Yao. Spectral partitioning with multiple eigenvectors. Discrete Applied Mathematics, 90:3–26, 1999.
[4] A. Arenas, J. Duch, A. Fernandez, and S. Gomez. Size reduction of complex networks preserving modularity. New Journal of Physics, 9:176, June 2007.
[5] A. Arenas, A. Fernández, and S. Gómez. Analysis of the structure of complex networks at different resolution levels. New Journal of Physics, 10:053039, 2008.
[6] Sanjeev Arora, Satish Rao, and Umesh Vazirani. Expander flows, geometric embeddings and graph partitioning. Proceedings of the Thirty-Sixth Annual ACM Symposium on Theory of Computing, pages 222–231, 2004.
[7] James P. Bagrow and Erik M. Bollt. Local method for detecting communities. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 72:046108–10, October 2005.
[8] Vladimir Batagelj and Andrej Mrvar. Pajek datasets, 2006.
[9] M. Besch and H. W. Pohl. Dependence-free clustering of shift-invariant data structures. Proceedings of the Third International Euro-Par Conference on Parallel Processing, pages 338–341, 1997.
[10] Stefan Boettcher and Allon G. Percus. Extremal optimization for graph partitioning. Physical Review E, 64:026114, July 2001.
[11] Stefan Boettcher and Allon G. Percus. Optimization with extremal dynamics. Physical Review Letters, 86:5211, June 2001.
[12] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas. Models of social networks based on social distance attachment. Physical Review E, 70(5):56122, 2004.
[13] Ulrik Brandes, Daniel Delling, Marco Gaertler, Robert Görke, Martin Hoefer, Zoran Nikoloski, and Dorothea Wagner. On modularity clustering. IEEE Trans. on Knowl. and Data Eng., 20:172–188, 2008.
[14] B. L. Chamberlain. Graph partitioning algorithms for distributing workloads of parallel computations. University of Washington Technical Report UW-CSE-98-10, 3, 1998.
[15] Aaron Clauset, M. E. J. Newman, and Cristopher Moore. Finding community structure in very large networks. Physical Review E, 70:066111, December 2004.
[16] Gábor Csárdi and Tamás Nepusz. The igraph software package for complex network research. InterJournal Complex Systems, 1695, 2006.
[17] L. Danon, A. Díaz-Guilera, and A. Arenas. Effect of size heterogeneity on community identification in complex networks. Arxiv preprint physics/0601144, 2006.
[18] Leon Danon, Albert Díaz-Guilera, Jordi Duch, and Alex Arenas. Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, 2005:P09008, 2005.
[19] Hristo Djidjev. A scalable multilevel algorithm for graph clustering and community structure detection. Workshop on Algorithms and Models for the Web Graph, Lecture Notes in Computer Science, LA-UR-06-6261, 2006.
[20] L. Donetti and M. A. Munoz. Improved spectral algorithm for the detection of network communities. physics/0504059, April 2005.
[21] Luca Donetti and Miguel A. Muñoz. Detecting network communities: a new systematic and efficient algorithm. Journal of Statistical Mechanics: Theory and Experiment, 2004:P10012, 2004.
[22] Jonathan Doye and David Wales. On the thermodynamics of global optimization. cond-mat/9709019, September 1997. Phys. Rev. Lett. 80, 1357 (1998).
[23] Jordi Duch and Alex Arenas. Community detection in complex networks using extremal optimization. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 72:027104–4, 2005.
[24] C. M. Fiduccia and R. M. Mattheyses. A linear-time heuristic for improving network partitions. In Design Automation, 1982. 19th Conference on, pages 175–181, 1982.
[25] Santo Fortunato and Marc Barthelemy. Resolution limit in community detection. Proceedings of the National Academy of Sciences, 104(1):36, 2007.
[26] Santo Fortunato, Vito Latora, and Massimo Marchiori. Method to find community structures based on information centrality. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 70:056104–13, November 2004.
[27] H. N. Gabow, Z. Galil, and T. H. Spencer. Efficient implementation of graph algorithms using contraction. Journal of the Association for Computing Machinery, 36(3):540–572, 1989.
[28] M. Gaertler, R. Görke, and D. Wagner. Significance-driven graph clustering. Proceedings of the 3rd International Conference on Algorithmic Aspects in Information and Management (AAIM'07), Lecture Notes in Computer Science, June 2007.
[29] J. Gerlach and P. Gottschling. A generic C++ framework for parallel mesh-based scientific applications. Proceedings of the 6th International Workshop on High-Level Parallel Programming Models and Supportive Environments, pages 45–54, 2001.
[30] Jorge Gil-Mendieta and Samuel Schmidt. The political network in Mexico. Social Networks, 18:355–381, October 1996.
[31] M. Girvan and J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821, 2002.
[32] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99:7821–7826, June 2002.
[33] P. Gleiser and L. Danon. Community structure in jazz. Advances in Complex Systems, 6(4):565–573, 2003.
[34] Jerry Grossman. The Erdös number project, 2002.
[35] X. Guardiola, R. Guimera, A. Arenas, A. Diaz-Guilera, D. Streib, and L. A. N. Amaral. Macro- and micro-structure of trust networks. Arxiv preprint cond-mat/0206240, 2002.
[36] R. Guimera, L. Danon, A. Diaz-Guilera, F. Giralt, and A. Arenas. Self-similar community structure in a network of human interactions. Physical Review E, 68:065103, December 2003.
[37] R. Guimera, L. Danon, A. Diaz-Guilera, F. Giralt, and A. Arenas. The real communication network behind the formal chart: Community structure in organizations. Journal of Economic Behavior & Organization, 61:653–667, December 2006.
[38] Roger Guimera and Luis A. Nunes Amaral. Functional cartography of complex metabolic networks. q-bio/0502035, February 2005. Nature 433, 895–900 (2005).
[39] Roger Guimerà, Marta Sales-Pardo, and Luís A. Nunes Amaral. Modularity from fluctuations in random graphs and complex networks. Physical Review E, 70:025101, 2004.
[40] David Harel and Yehuda Koren. On clustering using random walks. Lecture Notes in Computer Science, 2245:18–41, 2001.
[41] Bruce Hendrickson and Robert W. Leland. A multilevel algorithm for partitioning graphs. Proc. Supercomputing, 95:285, 1995.
[42] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A. L. Barabasi. The large-scale organization of metabolic networks. cond-mat/0010278, October 2000. Nature 407, 651–654 (2000).
[43] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359, 1998.
[44] George Karypis and Vipin Kumar. Parallel multilevel k-way partitioning scheme for irregular graphs. SIAM Review, 41(2):278–300, 1999.
[45] B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49(2):291–307, 1970.
[46] J. Kleinberg. An impossibility theorem for clustering. Advances in Neural Information Processing Systems 15: Proceedings of the 2002 Conference, 2003.
[47] Valdis Krebs. A network of books about recent US politics sold by the online bookseller amazon.com. http://www.orgnet.com/.
[48] R. Lehoucq, K. Maschhoff, D. Sorenson, and C. Yang. ARPACK: An efficient portable large scale eigenvalue package. http://www.caam.rice.edu/software/arpack/, 1996.
[49] László Lovász. Random walks on graphs: A survey. Combinatorics, Paul Erdos is Eighty, 2:1–46, 1993.
[50] David Lusseau. Evidence for social role in a dolphin social network. q-bio/0607048, July 2006.
[51] David Lusseau, Karsten Schneider, Oliver J. Boisseau, Patti Haase, Elisabeth Slooten, and Steve M. Dawson. The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behavioral Ecology and Sociobiology, 54:396–405, 2003.
[52] Claire P. Massen and Jonathan P. K. Doye. Identifying communities within energy landscapes. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 71:046101–12, April 2005.
[53] Ron Milo, Shalev Itzkovitz, Nadav Kashtan, Reuven Levitt, Shai Shen-Orr, Inbal Ayzenshtat, Michal Sheffer, and Uri Alon. Superfamilies of evolved and designed networks. Science, 303:1538–1542, March 2004.
[54] Burkhard Monien, Robert Preis, and Ralf Diekmann. Quality matching and local improvement for multilevel graph-partitioning. Parallel Computing, 26:1609–1634, November 2000.
[55] M. Müller-Hannemann and A. Schwartz. Implementing weighted b-matching algorithms: towards a flexible software design. J. Exp. Algorithmics, 4:7, 1999.
[56] M. E. J. Newman. Detecting community structure in networks. The European Physical Journal B - Condensed Matter and Complex Systems, 38:321–330, March 2004.
[57] M. E. J. Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3):036104, 2006.
[58] M. E. J. Newman. Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23):8577, 2006.
[59] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Physical Review E, 69(2):026113, 2004.
[60] M. E. J. Newman. Fast algorithm for detecting community structure in networks. Physical Review E, 69:066133, June 2004.
[61] M. E. J. Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2):404, 2001.
[62] Andreas Noack. Unified quality measures for clusterings, layouts, and orderings of graphs, and their application as software design criteria. PhD thesis, Brandenburgische Technische Universität Cottbus, 2007.
[63] P. Pons. Post-processing hierarchical community structures: Quality improvements and multi-scale view. ArXiv Computer Science e-prints, cs.DS/0608050, 2006.
[64] Pascal Pons and Matthieu Latapy. Computing communities in large networks using random walks (long version). physics/0512106, December 2005.
[65] Josep M. Pujol, Javier Bejar, and Jordi Delgado. Clustering algorithm for determining community structure in large networks. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 74:016107–9, July 2006.
[66] Filippo Radicchi, Claudio Castellano, Federico Cecconi, Vittorio Loreto, and Domenico Parisi. Defining and identifying communities in networks. Proceedings of the National Academy of Sciences, 101:2658–2663, March 2004.
[67] Jörg Reichardt and Stefan Bornholdt. Statistical mechanics of community detection. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 74(1):016110–14, 2006.
[68] Jörg Reichardt and Stefan Bornholdt. Detecting fuzzy community structures in complex networks with a Potts model. Physical Review Letters, 93:218701, November 2004.
[69] Jianhua Ruan and Weixiong Zhang. Identifying network communities with a high resolution. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 77:016104–12, 2008.
[70] J. Siek, L. Q. Lee, and A. Lumsdaine. The Boost Graph Library: User Guide and Reference Manual. Addison-Wesley, 2002.
[71] A. J. Soper and C. Walshaw. A generational scheme for partitioning graphs. In L. Spector et al., editor, Proc. Genetic & Evolutionary Comput. Conf. (GECCO-2001), pages 607–614. Morgan Kaufmann, San Francisco, 2001.
[72] A. J. Soper, C. Walshaw, and M. Cross. A combined evolutionary search and multilevel approach to graph partitioning. In D. Whitley et al., editor, Proc. Genetic & Evolutionary Comput. Conf. (GECCO-2000), pages 674–681. Morgan Kaufmann, San Francisco, 2000.
[73] A. J. Soper, C. Walshaw, and M. Cross. A combined evolutionary search and multilevel optimisation approach to graph partitioning. Tech. Rep. 00/IM/58, Comp. Math. Sci., Univ. Greenwich, London SE10 9LS, UK, April 2000.
[74] A. J. Soper, C. Walshaw, and M. Cross. A combined evolutionary search and multilevel optimisation approach to graph partitioning. J. Global Optimization, 29(2):225–241, 2004.
[75] Stijn van Dongen. Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, May 2000.
[76] Ken Wakita and Toshiyuki Tsurumi. Finding community structure in mega-scale social networks. cs/0702048, February 2007.
[77] David Wales and Jonathan Doye. Global optimization by basin-hopping and the lowest energy structures of Lennard-Jones clusters containing up to 110 atoms. cond-mat/9803344, March 1998. J. Phys. Chem. A 101, 5111–5116 (1997).
[78] C. Walshaw and M. Cross. Mesh partitioning: a multilevel balancing and refinement algorithm. SIAM J. Sci. Comput., 22(1):63–80, 2000.
[79] J. H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
[80] Scott White and Padhraic Smyth. A spectral clustering approach to finding communities in graphs. SIAM Data Mining Conference, 2005.
[81] Wu and Huberman. Finding communities in linear time: a physics approach. The European Physical Journal B - Condensed Matter and Complex Systems, 38:331–338, March 2004.
[82] W. W. Zachary. An information flow model for conflict and fission in small groups. Journal of Anthropological Research, 33:452–473, 1977.
[83] Zhou and Lipowsky. Network Brownian motion: A new method to measure vertex-vertex proximity and to identify communities and subcommunities. Computational Science - ICCS 2004, 2004.


A The Benchmark Graph Collection

Table A.1 lists the benchmark graphs used. The graphs are sorted by their number of vertices. For each graph the number of vertices and edges, the weighted mean vertex degree (mean wdeg), and the global connection density in the degree volume model (wgt. density) are given. Graphs with the suffix main are just the largest connectivity component of the original graph. The second column marks the graphs used in the reduced set (R), in the scalability analysis (S), and in the comparison to reference algorithms (C); a dash means the graph belongs to none of these subsets. The last column gives a short source reference. Links to the corresponding websites can be found in Table A.2.

Table A.1: The Benchmark Graph Collection

graph  subset  vertices  edges  mean wdeg  wgt. density  source
Chain8  S C  8  7  1.75000  0.071429  ANoack-generated
Star9  C  9  8  1.77778  0.06250  ANoack-generated
K55  S C  10  25  5.00000  0.02000  RRotta
Student government  -  11  32  7.45455  0.012195  pajek-social
Tree15  C  15  14  1.86667  0.035714  ANoack-generated
ModMath main  C  30  61  4.06667  0.0081967  pajek-social
SouthernWomen  R C  32  89  5.56250  0.005618  pajek-social
Hi-tech main  S  33  91  8.90909  0.0034014  pajek-social
karate  C  34  78  4.58824  0.0064103  Newman
football  -  35  118  16.85714  0.0016949  pajek
mexican power  C  35  117  6.68571  0.0042735  pajek
Grid66  C  36  60  3.33333  0.0083333  ANoack-generated
morse  -  36  666  1324.38889  2.2390e-05  anoack
Sawmill  C  36  62  3.44444  0.0080645  pajek-social
Food  -  45  990  507.82222  4.376e-05  ANoack
dolphins  R C  62  159  5.12903  0.0031447  pajek-social
WorldImport1999  -  66  2145  132361.51897  1.1447e-07  ANoack
prison alon  R  67  142  5.43284  0.0027473  UriAlon
lesmis  S  77  254  21.29870  0.00060976  Newman
world trade  -  80  875  1644039.85000  7.6032e-09  pajek-econ
polBooks  R C  105  441  8.40000  0.0011338  pajek-mixed
adjnoun  R C  112  425  7.58929  0.0011765  Newman-ling
afootball  -  115  613  10.71304  0.00081169  Newman
baywet  -  128  2075  54.05277  0.00014453  pajek-bio
jazz  R  198  2742  55.39394  9.1174e-05  Arenas
A99m main  S  233  325  2.79828  0.0015337  pajek-social
SmallW main  S  233  994  17.06438  0.00025151  pajek-citation
A01 main  S  249  635  5.15663  0.00077882  pajek-citation
sandi main  R C  253  307  2.42688  0.0016287  pajek-citation
celegansneural  R S  297  2148  59.37374  5.6709e-05  Newman-bio
USAir97  R C  332  2126  12.80723  0.00023518  pajek
8Clusters  -  400  24063  238.63000  1.0476e-05  ANoack-generated
WorldCities main  -  413  7518  81.80145  2.96e-05  pajek-mixed
celegans metabolic  R  453  2040  20.24283  0.00010931  Arenas-bio
circuit s838  R C  512  819  3.19922  0.0006105  UriAlon
Roget main  S  994  3641  10.17807  9.8853e-05  pajek-ling
CSphd main  S C  1025  1043  2.03512  0.00047939  pajek-social
email  -  1133  5451  19.24448  4.5863e-05  Arenas
polBlogs main  R  1222  16717  31.23977  2.6197e-05  pajek-mixed
NDyeast main  S  1458  1993  2.70302  0.00025664  NotreDame-bio
Yeast main  R  2224  7049  6.14119  7.5576e-05  pajek-bio
SciMet main  S  2678  10369  7.75541  4.8151e-05  pajek-citation
ODLIS main  S  2898  16381  12.70842  2.7156e-05  pajek-ling
DutchElite main  -  3621  4310  2.38111  0.00011598  pajek-social
geom main  -  3621  9461  10.91964  2.5291e-05  pajek-citation
Epa main  R S  4253  8897  4.21020  5.5847e-05  pajek-web
eva main  -  4475  4654  2.08402  0.00010725  pajek-econ
USpowerGrid  R  4941  6594  5.33819  3.7913e-05  pajek-mixed
hep-th main  S  5835  13815  4.68711  3.6564e-05  Newman-coauth
Erdos02  S C  6927  11850  3.42139  4.2194e-05  pajek-social
Lederberg main  R S  8212  41436  10.10801  1.2048e-05  pajek-citation
PairsP  R S  10617  63786  115.39098  8.1627e-07  pajek-ling
PGPgiantcompo  S  10680  24316  4.55805  2.0542e-05  Arenas-social
astro-ph main  R S  14845  119652  4.49610  1.4982e-05  Newman-coauth
eatRS  S  23219  305501  67.90641  6.3464e-07  pajek-ling
DIC28 main  R S C  24831  71014  5.71979  7.0409e-06  pajek-ling
hep-th-new main  S  27400  352059  25.73161  1.4184e-06  pajek-citation
cond-mat-2003 main  S  27519  116181  4.41826  8.2246e-06  Newman-coauth

Table A.2: References to the Graph Sources

source  web address
ANoack-generated  http://www-sst.informatik.tu-cottbus.de/~an/GD/
pajek-citation  http://vlado.fmf.uni-lj.si/pub/networks/data/
pajek-social  http://vlado.fmf.uni-lj.si/pub/networks/data/
Newman-ling  http://www-personal.umich.edu/~mejn/netdata/
Newman  http://www-personal.umich.edu/~mejn/netdata/
Newman-coauth  http://www-personal.umich.edu/~mejn/netdata/
pajek-bio  http://vlado.fmf.uni-lj.si/pub/networks/data/bio/foodweb/foodweb.htm
Arenas-bio  http://deim.urv.cat/~aarenas/data/welcome.htm
Newman-bio  http://www-personal.umich.edu/~mejn/netdata/
UriAlon  http://www.weizmann.ac.il/mcb/UriAlon/
pajek-ling  http://vlado.fmf.uni-lj.si/pub/networks/data/
Arenas  http://deim.urv.cat/~aarenas/data/welcome.htm
pajek-web  http://vlado.fmf.uni-lj.si/pub/networks/data/mix/mixed.htm
pajek-econ  http://vlado.fmf.uni-lj.si/pub/networks/data/econ/Eva/Eva.htm
ANoack  http://www-sst.informatik.tu-cottbus.de/~an/GD/
pajek  http://vlado.fmf.uni-lj.si/pub/networks/data/sport/football.htm
RRotta  http://goya.informatik.tu-cottbus.de/~clustering/
anoack  http://www.informatik.tu-cottbus.de/~an/GD/
NotreDame-bio  http://vlado.fmf.uni-lj.si/pub/networks/data/ND/NDnets.htm
Arenas-social  http://deim.urv.cat/~aarenas/data/welcome.htm
pajek-mixed  http://vlado.fmf.uni-lj.si/pub/networks/data/mix/mixed.htm


B Clustering Results

Table B.1: Random Walk Distance by Graph

graph  1  2  3  4  5  WD-sgrd
SouthernWomen  0.33449  0.31530  0.31530  0.33184  0.33449  0.31972
dolphins  0.46220  0.52680  0.46642  0.46642  0.46642  0.52587
prison alon  0.56098  0.61317  0.61739  0.61667  0.60873  0.62077
polBooks  0.52694  0.52694  0.52694  0.52694  0.52694  0.52724
adjnoun  0.24411  0.27143  0.30637  0.28173  0.26923  0.31078
jazz  0.44403  0.44403  0.44424  0.44403  0.44447  0.44468
sandi main  0.70997  0.80704  0.80215  0.81677  0.80646  0.82773
celegansneural  0.48148  0.47932  0.47936  0.48499  0.50224  0.50295
USAir97  0.35050  0.35093  0.35246  0.35804  0.36082  0.36824
celegans metabolic  0.42627  0.43558  0.42427  0.42970  0.43405  0.45070
circuit s838  0.67174  0.81058  0.76911  0.77549  0.80461  0.81551
polBlogs main  0.43162  0.43219  0.43219  0.43219  0.43219  0.43237
Yeast main  0.54016  0.57924  0.58575  0.59310  0.58996  0.62179
Epa main  0.58271  0.63287  0.63498  0.63672  0.64941  0.66843
USpowerGrid  0.84679  0.90077  0.86988  0.90234  0.87756  0.93795
Lederberg main  0.61540  0.64291  0.66511  0.66392  0.68129  0.70339
PairsP  0.50587  0.58476  0.58308  0.58951  0.58975  0.65067
astro-ph main  0.65154  0.71070  0.70995  0.71056  0.71071  0.76261
DIC28 main  0.72153  0.77391  0.77093  0.77864  0.78366  0.84747

Table B.2: Random Walk Reachability, Length 2

graph  RWreach-sgrd 1  RWreach-sgrd 3  RWreach-sgrd 5  WD-sgrd
SouthernWomen  0.31972  0.31972  0.33449  0.31972
dolphins  0.52852  0.52852  0.52852  0.52587
prison alon  0.62077  0.61464  0.61288  0.62077
polBooks  0.52724  0.52724  0.52724  0.52724
adjnoun  0.30194  0.30359  0.30750  0.31078
jazz  0.44487  0.44452  0.44487  0.44468
sandi main  0.82760  0.82779  0.82779  0.82773
celegansneural  0.50295  0.50382  0.50353  0.50295
USAir97  0.36824  0.36603  0.36824  0.36824
celegans metabolic  0.44966  0.44989  0.44182  0.45070
circuit s838  0.80524  0.81521  0.81551  0.81551
polBlogs main  0.43237  0.43228  0.43228  0.43237
Yeast main  0.62207  0.62328  0.62593  0.62179
Epa main  0.66783  0.66795  0.66568  0.66843
USpowerGrid  0.93717  0.93882  0.93919  0.93795
Lederberg main  0.70267  0.70323  0.70402  0.70339
PairsP  0.65476  0.65361  0.65386  0.65067
astro-ph main  0.76248  0.76329  0.76409  0.76261
DIC28 main  0.85097  0.85162  0.85158  0.84747

Table B.3: Random Walk Reachability, Length 3

graph  RWreach-sgrd 1  RWreach-sgrd 3  RWreach-sgrd 5  WD-sgrd
SouthernWomen  0.31972  0.32635  0.33184  0.31972
dolphins  0.52852  0.52680  0.52680  0.52587
prison alon  0.61288  0.61464  0.61898  0.62077
polBooks  0.52724  0.52724  0.52694  0.52724
adjnoun  0.30122  0.29690  0.27580  0.31078
jazz  0.44468  0.44487  0.44424  0.44468
sandi main  0.82779  0.82779  0.82779  0.82773
celegansneural  0.50295  0.49997  0.49783  0.50295
USAir97  0.36824  0.36603  0.36094  0.36824
celegans metabolic  0.45105  0.44075  0.43418  0.45070
circuit s838  0.81377  0.81520  0.81447  0.81551
polBlogs main  0.43242  0.43233  0.43225  0.43237
Yeast main  0.62524  0.62475  0.62710  0.62179
Epa main  0.67258  0.66818  0.66667  0.66843
USpowerGrid  0.93768  0.93921  0.93937  0.93795
Lederberg main  0.70331  0.70382  0.70362  0.70339
PairsP  0.65233  0.65434  0.65346  0.65067
astro-ph main  0.76328  0.76365  0.76440  0.76261
DIC28 main  0.85106  0.85285  0.85152  0.84747

Table B.4: Clustering Results from the Refinement Phase

graph  none  SGR-density  CGR  KL
Chain8  0.3571429 (2)  0.3571429 (2)  0.3571429 (2)  0.3775510 (3)
Star9  -0.0078125 (2)  0.0000000 (1)  0.0000000 (1)  0.0000000 (1)
K55  0.3000000 (2)  0.3000000 (2)  0.3000000 (2)  0.3000000 (2)
Student government  0.1299822 (3)  0.1817371 (2)  0.1817371 (2)  0.1817371 (2)
Tree15  0.5051020 (5)  0.5051020 (5)  0.5051020 (5)  0.5127551 (4)
ModMath main  0.4201827 (4)  0.4290513 (4)  0.4290513 (4)  0.4488041 (4)
SouthernWomen  0.3153011 (2)  0.3197197 (3)  0.3197197 (3)  0.3360056 (3)
Hi-tech main  0.2932112 (4)  0.3164654 (3)  0.3164422 (4)  0.3167893 (3)
karate  0.4086949 (4)  0.4197896 (4)  0.4197896 (4)  0.4197896 (4)
football  0.3560299 (4)  0.3602930 (5)  0.3602930 (5)  0.3603160 (5)
mexican power  0.3281102 (4)  0.3477610 (5)  0.3477610 (5)  0.3595222 (4)
Grid66  0.5131944 (4)  0.5500000 (4)  0.5500000 (4)  0.5412500 (5)
morse  0.2238226 (5)  0.2260019 (5)  0.2260019 (5)  0.2276623 (4)
Sawmill  0.5500780 (4)  0.5500780 (4)  0.5500780 (4)  0.5500780 (4)
Food  0.3989169 (4)  0.4022119 (4)  0.4022119 (4)  0.4022119 (4)
dolphins  0.5178593 (4)  0.5258692 (5)  0.5258692 (5)  0.5258692 (5)
WorldImport1999  0.2746734 (3)  0.2750381 (3)  0.2750381 (3)  0.2750381 (3)
prison alon  0.6158375 (9)  0.6207735 (9)  0.6207735 (9)  0.6207735 (9)
lesmis  0.5662983 (6)  0.5666880 (6)  0.5666880 (6)  0.5666880 (6)
world trade  0.3416073 (3)  0.3458804 (3)  0.3458804 (3)  0.3458804 (3)
polBooks  0.5036225 (4)  0.5272366 (5)  0.5272366 (5)  0.5256066 (6)
adjnoun  0.2842824 (7)  0.3107820 (6)  0.3107820 (6)  0.3107820 (6)
afootball  0.5768034 (7)  0.5878718 (7)  0.5878718 (7)  0.5941993 (7)
baywet  0.3616026 (4)  0.3630110 (4)  0.3630111 (4)  0.3630111 (4)
jazz  0.3915476 (4)  0.4446760 (4)  0.4446760 (4)  0.4448713 (4)
A99m main  0.6994806 (12)  0.7093229 (12)  0.7093229 (12)  0.7139006 (11)
SmallW main  0.4444636 (3)  0.4489310 (3)  0.4489310 (3)  0.4494608 (3)
A01 main  0.5760183 (10)  0.6163954 (11)  0.6195992 (11)  0.6321622 (12)
sandi main  0.8208257 (15)  0.8277276 (15)  0.8277276 (15)  0.8277276 (15)
celegansneural  0.4740477 (5)  0.5029479 (5)  0.5029479 (5)  0.5029479 (5)
USAir97  0.3435344 (6)  0.3682440 (6)  0.3682440 (6)  0.3682440 (6)
8Clusters  0.2844054 (8)  0.2940509 (8)  0.2940509 (8)  0.2940509 (8)
WorldCities main  0.1162580 (5)  0.1772768 (4)  0.1773516 (4)  0.1789302 (4)
celegans metabolic  0.4006559 (9)  0.4507027 (9)  0.4507027 (9)  0.4509054 (9)
circuit s838  0.7904052 (15)  0.8155133 (15)  0.8155118 (15)  0.8155133 (15)
Roget main  0.5371181 (14)  0.5800064 (14)  0.5801273 (14)  0.5868690 (16)
CSphd main  0.9248402 (33)  0.9255843 (33)  0.9255843 (33)  0.9255843 (33)
email  0.5293695 (11)  0.5771250 (12)  0.5771250 (12)  0.5776780 (11)
polBlogs main  0.4144859 (3)  0.4323714 (13)  0.4323714 (13)  0.4324173 (13)
NDyeast main  0.8135195 (34)  0.8209006 (32)  0.8211389 (33)  0.8225031 (31)
Yeast main  0.5718771 (23)  0.6217924 (23)  0.6202976 (22)  0.6232274 (22)
SciMet main  0.5485112 (14)  0.6206436 (18)  0.6210574 (18)  0.6224638 (16)
ODLIS main  0.4607628 (12)  0.5134514 (12)  0.5128609 (12)  0.5137963 (12)
DutchElite main  0.8389162 (49)  0.8493005 (48)  0.8482894 (48)  0.8492394 (47)
geom main  0.7250140 (36)  0.7470261 (43)  0.7470261 (43)  0.7472464 (42)
Epa main  0.6270148 (28)  0.6684264 (27)  0.6685486 (26)  0.6709372 (27)
eva main  0.9333784 (53)  0.9357163 (52)  0.9357164 (52)  0.9360090 (50)
USpowerGrid  0.9328997 (41)  0.9379491 (39)  0.9382380 (39)  0.9381809 (39)
hep-th main  0.8425086 (58)  0.8551498 (59)  0.8548411 (57)  0.8552368 (57)
Erdos02  0.6848450 (31)  0.7159228 (38)  0.7159229 (38)  0.7161150 (39)
Lederberg main  0.6508633 (20)  0.7033853 (22)  0.7033853 (22)  0.7033671 (22)
PairsP  0.6145969 (35)  0.6506679 (35)  0.6510081 (33)  0.6520954 (31)
PGPgiantcompo  0.8631499 (66)  0.8832731 (94)  0.8832732 (94)  0.8839220 (102)
astro-ph main  0.7300281 (49)  0.7623166 (58)  0.7621238 (58)  0.7631902 (58)
eatRS  0.4466266 (26)  0.4999899 (25)  0.4998523 (26)  0.5005763 (22)
DIC28 main  0.8015399 (58)  0.8474654 (105)  0.8475762 (102)  0.8478136 (104)
hep-th-new main  0.5798309 (18)  0.6663347 (31)  0.6663266 (31)  0.6667373 (30)
cond-mat-2003 main  0.7891245 (70)  0.8135988 (76)  0.8134295 (75)  0.8137741 (77)
average  0.52305183903  0.54608087341  0.54609771377  0.54810369879

Table B.5: Runtime Measurements. All times are in seconds.

graph  vertices  edges  none  SGR-density  CGR  KL
Chain8  8  7  0.28800  0.32700  0.29000  0.35100
K55  10  25  0.42300  0.40100  0.37800  0.43800
Hi-tech main  33  91  0.90900  0.93300  0.87800  1.12900
lesmis  77  254  1.16100  1.20900  1.15600  1.28100
A99m main  233  325  1.39100  1.52100  1.43500  2.49300
SmallW main  233  994  1.83700  1.96800  1.89600  3.08400
A01 main  249  635  1.55100  1.76400  1.71000  3.35600
celegansneural  297  2148  2.02100  2.21100  2.18600  5.16600
Roget main  994  3641  2.75000  3.20600  3.85200  24.25000
CSphd main  1025  1043  1.66000  2.04300  1.96100  22.72700
NDyeast main  1458  1993  2.03800  2.80200  3.32300  37.48900
SciMet main  2678  10369  4.75600  6.44700  14.37800  84.12400
ODLIS main  2898  16381  6.35300  8.02400  13.80900  79.36200
Epa main  4253  8897  4.54300  7.70500  30.41500  201.82000
hep-th main  5835  13815  5.27600  11.26100  27.64100  423.34600
Erdos02  6927  11850  6.04900  11.31000  34.10100  359.59600
Lederberg main  8212  41436  14.61700  20.20700  83.13300  392.03800
PairsP  10617  63786  22.13300  33.34600  127.07900  769.91300
PGPgiantcompo  10680  24316  9.00400  22.69000  55.56700  1255.66100
astro-ph main  14845  119652  29.20600  52.03700  365.39200  1917.59600
eatRS  23219  305501  192.21500  236.66900  2115.29200  4003.69600
DIC28 main  24831  71014  24.79100  76.99900  816.51200  4755.91700
hep-th-new main  27400  352059  143.37100  186.87100  1608.20900  4553.72100
cond-mat-2003 main  27519  116181  33.39300  76.34600  1019.13700  4158.93400

Table B.6: Runtime of the Reference Algorithms. All time values are printed in seconds.

graph  walktrap  leadingev  wakita HE  wakita HN  fgj  ML-none  spinglass  ML-sgrd  ML-KL
Chain8  1e-05  0.00100  0.12300  0.12400  0.00600  0.25300  1.85300  0.27100  0.31200
Star9  0.00100  0.05700  0.12400  0.12300  0.00500  0.29000  3.02000  0.30900  0.36000
K55  1e-05  0.00100  0.17300  0.12300  0.00500  0.32800  1.26400  0.34900  0.38000
Tree15  1e-05  0.00700  0.17400  0.12300  0.00600  0.40500  5.02400  0.43400  0.50100
ModMath main  0.00100  0.02200  0.17300  0.17400  0.00700  0.78900  6.92700  0.84800  1.00800
SouthernWomen  0.00100  0.02900  0.17300  0.17300  0.00700  0.90400  7.43200  0.97100  1.16200
karate  0.00100  0.03300  0.17400  0.17300  0.00700  0.82900  7.73500  0.89600  1.06900
mexican power  0.00200  0.03000  0.17300  0.17300  0.00700  0.88300  6.25000  0.94300  1.12900
Grid66  0.00100  0.04400  0.17400  0.17300  0.00700  0.86500  5.37300  0.93300  1.10300
Sawmill  0.00100  0.03500  0.17400  0.17300  0.00700  0.86600  6.41200  0.93100  1.09600
dolphins  0.00200  0.06000  0.17400  0.17300  0.00800  1.11300  11.57400  1.19900  0.93000
polBooks  0.00300  0.13100  0.17500  0.17400  0.01300  1.34800  18.45100  1.46500  1.39700
adjnoun  0.00500  0.22600  0.17400  0.17500  0.01700  1.25500  30.18600  1.36300  1.40800
sandi main  0.00500  1.13800  0.22300  0.17400  0.01900  1.22000  48.13700  1.34900  2.37400
USAir97  0.03400  1.10400  0.22300  0.22500  0.04200  2.04600  116.79000  2.23900  5.03300
circuit s838  0.01500  1.84800  0.22300  0.22400  0.05100  1.61200  137.68800  1.84300  5.88600
CSphd main  0.04300  2.51300  0.22400  0.22400  0.08900  1.59400  219.04600  1.99000  22.37300
Erdos02  4.66500  54.29700  2.22800  2.18700  3.52000  5.71400  1959.89200  10.89800  351.05700
DIC28 main  28.73900  810.58300  2.56000  1.54400  27.60200  24.19300  7213.12900  75.86800  4581.66900
average  0.0029568  0.14823  0.23281  0.21555  0.02285  1.10230  22.44932  1.30958  2.71063
