Multilevel Graph Clustering with Density-Based Quality Measures

4.1 Methods and Data

For each graph the number of vertices and edges, the weighted mean vertex degree (mean wdeg), and the global connection density in the degree volume model (wgt. density) are printed. In addition, the source of the graph is indicated in the last column, and Table A.2 provides web addresses for this data.

For various reasons, subsets of the collection will also be used. For example, some individual evaluation steps compare a large number of configurations; to keep computation times feasible, a reduced graph set is employed. The second column marks the graphs of the reduced set with R. In addition, the graphs used in the scalability analysis are marked with S and those used in the comparison to reference algorithms with C.

Many graphs of the collection are directed and contain parallel edges, but the modularity measure is defined for simple, undirected graphs. The graphs were therefore pre-processed with the following aim: between each adjacent pair of vertices lies exactly one edge in each direction, and their weights equal the sum of the original edge weights between both vertices. This pre-processing is acceptable because the focus of this work lies on the evaluation of algorithms and not on the interpretation of individual clustering results. Other publications, however, may have used different normalization strategies.

The pre-processing is accomplished in four steps (a code sketch is given at the end of this section): First, parallel edges are removed and their weight is added to the first edge. Then missing inverse edges are added with zero weight; self-edges are used as their own inverse edge. In the third pass the edge weights are made symmetric by taking the sum of each edge and its inverse edge, ignoring self-edges. Finally, in disconnected graphs the largest connected component is chosen. These graphs are labeled with the suffix main.

4.1.3 Efficiency and Scalability

In order to compare the efficiency of different configurations, information about the computational expense is necessary. Trade-offs between quality and runtime can then be identified in combination with the measured differences in mean modularity.

Unfortunately, it is not advisable to compare average computation times: there is no shared time scale between the graphs, and the averages would be strongly influenced by the few largest graphs. Therefore only the times measured on a single graph are compared. Throughout the evaluation the graph DIC28 main is used; for some aspects the graph Lederberg main is considered as well. All timings were measured on a 3.00 GHz Intel(R) Pentium(R) 4 CPU with 1 GB of main memory. As with the mean modularity, comparing the runtime with and without refinement provides a significance scale.

Computation times measured on different graphs can be used to study the scalability of the algorithms. By nature, the runtime depends on the number of vertices and edges, so plotting the measured times against the number of vertices visualizes these dependencies. In practice the absolute times are of little interest, since they also depend on startup time, implementation style, and the runtime environment. Instead, the progression of the runtime curves of the different algorithms is compared.
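The following is a minimal sketch of the four pre-processing steps described in Section 4.1 above. It is not the implementation evaluated in this work; it assumes the NetworkX library and an edge attribute named weight, both of which are assumptions made purely for illustration.

import networkx as nx

def preprocess(multigraph: nx.MultiDiGraph) -> nx.Graph:
    # Step 1: merge parallel edges by summing their weights.
    directed = nx.DiGraph()
    directed.add_nodes_from(multigraph.nodes())
    for u, v, data in multigraph.edges(data=True):
        w = data.get("weight", 1.0)
        if directed.has_edge(u, v):
            directed[u][v]["weight"] += w
        else:
            directed.add_edge(u, v, weight=w)

    # Steps 2 and 3: a missing inverse edge is treated as having weight
    # zero, so the symmetric weight of a vertex pair is simply
    # weight(u, v) + weight(v, u). Self-edges act as their own inverse
    # and keep their original weight.
    undirected = nx.Graph()
    undirected.add_nodes_from(directed.nodes())
    for u, v, data in directed.edges(data=True):
        w = data["weight"]
        if u == v or not undirected.has_edge(u, v):
            undirected.add_edge(u, v, weight=w)
        else:
            undirected[u][v]["weight"] += w

    # Step 4: keep only the largest connected component of the
    # resulting undirected graph (the "main" graph).
    largest = max(nx.connected_components(undirected), key=len)
    return undirected.subgraph(largest).copy()

A graph processed this way would then carry the suffix main, as in DIC28 main or Lederberg main.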
