12.07.2015 Views

Multilevel Graph Clustering with Density-Based Quality Measures



lated by:

$$\deg(v) = \sum_{x \in V} f(v, x) + f(v, v) \tag{2.2}$$

2.1.3 Graph Clusterings and Quality Measures

A clustering $C$ partitions the vertices of a graph into non-empty, disjoint subsets covering the whole set of vertices. The clustering consists of the clusters $i \in C$ with their vertices $C_i \subseteq V$. Hence the number of clusters is $|C|$. The cluster containing vertex $v$ is denoted by $C(v) \in C$, and thus $C(v) = i \Leftrightarrow v \in C_i$. To improve readability, $C[v] := C_{C(v)}$ refers directly to the vertices of the cluster containing vertex $v$.

The term clustering is mostly used in the context of (social) network analysis because there the aim is to group vertices together. The load balancing community mostly uses the term partitioning to emphasize the search for balanced dissections.

A clustering quality measure $Q(C)$ maps clusterings of a graph to rational numbers $\mathbb{Q}$, called the clustering quality. In this way the quality measure defines a semi-order on the set of clusterings, with clustering $C$ being at least as good as clustering $D$ if $Q(C) \geq Q(D)$. This enables the easy comparison of clusterings and the search for high-quality clusterings, although several different clusterings may have the same quality.

Two quality measures $Q_1, Q_2$ are said to be ranking equivalent when they order all clusterings equally. This is the case when, for all pairs of clusterings $C, D$, it holds that $Q_1(C) \leq Q_1(D) \Leftrightarrow Q_2(C) \leq Q_2(D)$. Ranking equivalence is invariant under addition of constants $\alpha \in \mathbb{Q}$ and multiplication with positive constants $0 < \beta \in \mathbb{Q}$: $Q(C) \leq Q(D) \iff \alpha + \beta\, Q(C) \leq \alpha + \beta\, Q(D)$. This allows the normalization of quality measures against other graph properties, which is necessary before comparing qualities between different graphs. Often the maximum quality over all clusterings or upper limits are used for normalization.

2.2 The Modularity Measure of Newman

This section briefly introduces the clustering quality measure called modularity, which is used throughout this work. The modularity was developed by Newman for community detection in social networks [58, 59]. Here the modularity definition from [57] is used, and it is connected to a volume and density model. The section concludes with basic properties of this quality measure. An axiomatic derivation of the modularity is presented in the next section. Proofs for the NP-completeness of modularity optimization and other properties can be found in [13]. The modularity has a resolution limit: on growing graphs, small clusters become invisible [25, 5].

Looking at the edges covered by a cluster, Newman observes the necessity to make a "judgment about when a particular density of edges is significant enough to define a community" [57]. Hence more edges should be in clusters than expected from a random redistribution of all edges. This is achieved by maximizing $\sum_{C(u)=C(v)} \bigl(f(u, v) - P(u, v)\bigr)$, where $f(u, v)$ is the actual edge weight and $P(u, v)$
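As a small numerical illustration of the ranking equivalence from Section 2.1.3, the following Python sketch compares a quality measure with an affinely rescaled copy of itself. Everything in it is an assumption made for illustration: the toy graph, the candidate clusterings, and the stand-in measure `coverage` (the fraction of edge weight inside clusters), which is not the modularity. The check also runs only over the listed candidates, not over all clusterings of the graph.

```python
from fractions import Fraction

# Hypothetical toy graph: two triangles joined by the edge (c, d).
# Each undirected edge weight f(u, v) is stored once.
edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("c", "d"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
}

def coverage(clustering):
    """Stand-in quality measure (not modularity): share of edge weight inside clusters."""
    # cluster_of plays the role of C(v): it maps a vertex to its cluster index.
    cluster_of = {v: i for i, members in enumerate(clustering) for v in members}
    inside = sum(w for (u, v), w in edges.items() if cluster_of[u] == cluster_of[v])
    return Fraction(inside, sum(edges.values()))

# A few candidate clusterings, each a list of disjoint vertex sets covering V.
clusterings = [
    [{"a", "b", "c"}, {"d", "e", "f"}],
    [{"a", "b"}, {"c", "d"}, {"e", "f"}],
    [{"a", "b", "c", "d", "e", "f"}],
    [{"a"}, {"b"}, {"c"}, {"d"}, {"e"}, {"f"}],
]

def ranking_equivalent(q1, q2, candidates):
    """Check Q1(C) <= Q1(D)  <=>  Q2(C) <= Q2(D) for all pairs of candidates."""
    return all(
        (q1(c) <= q1(d)) == (q2(c) <= q2(d))
        for c in candidates
        for d in candidates
    )

alpha, beta = Fraction(-1, 3), Fraction(5, 2)  # any alpha, any beta > 0

def rescaled(clustering):
    return alpha + beta * coverage(clustering)

def negated(clustering):
    # beta = -1 violates the positivity requirement and reverses the order.
    return -coverage(clustering)

print(ranking_equivalent(coverage, rescaled, clusterings))
print(ranking_equivalent(coverage, negated, clusterings))
```

Exact rationals from `fractions.Fraction` are used so that the comparisons mirror the definition over $\mathbb{Q}$ without floating-point rounding. The negated variant shows why the multiplicative constant must be positive.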
