
286 Int. J. Bio-Inspired Computation, Vol. 3, No. 5, 2011

A distributed multilevel ant-colony algorithm for the multi-way graph partitioning

K. Tashkova*, P. Korošec and J. Šilc
Computer Systems Department,
Jožef Stefan Institute,
Jamova cesta 39, SI-1000 Ljubljana, Slovenia
E-mail: katerina.taskova@ijs.si
E-mail: peter.korosec@ijs.si
E-mail: jurij.silc@ijs.si
*Corresponding author

Abstract: The graph-partitioning problem arises as a fundamental problem in many important scientific and engineering applications. A variety of optimisation methods are used for solving this problem, and among them the meta-heuristics stand out for their efficiency and robustness. Here, we address the performance of the distributed multilevel ant-colony algorithm (DMACA), a meta-heuristic approach for solving the multi-way graph partitioning problem that is based on the ant-colony optimisation paradigm and integrated with a multilevel procedure. The basic idea of the DMACA consists of parallel, independent runs enhanced with cooperation in the form of a solution exchange among the concurrent searches. The objective of the DMACA is to reduce the overall computation time while preserving the quality of the solutions obtained by the sequential version. The experimental evaluation on two-way and four-way partitioning with 1% and 5% imbalance confirms that, with respect to the sequential version, the DMACA obtains statistically equally good solutions at a 99% confidence level within a reduced overall computation time.

Keywords: ant-colony optimisation; bio-inspired computation; distributed computing; graph partitioning; multilevel approach.

Reference to this paper should be made as follows: Tashkova, K., Korošec, P. and Šilc, J. (2011) 'A distributed multilevel ant-colony algorithm for the multi-way graph partitioning', Int. J. Bio-Inspired Computation, Vol. 3, No. 5, pp.286–296.

Biographical notes: K. Tashkova is a PhD student at the Jožef Stefan International Postgraduate School, Ljubljana, Slovenia. She received her BS in Electrical Engineering from the 'St. Cyril and Methodius' University, Skopje, Macedonia, in 2005. Since 2007, she has been a Young Researcher at the Computer Systems Department, Jožef Stefan Institute, Ljubljana, Slovenia. Her current areas of research include numerical optimisation, mathematical modelling and equation discovery.

P. Korošec is a Researcher at the Jožef Stefan Institute, Ljubljana, and an Assistant Professor at the University of Primorska, Koper, Slovenia. His current areas of research include combinatorial and numerical optimisation with ant-based meta-heuristics, and distributed computing.

J. Šilc is the Deputy Head of the Department of Computer Systems at the Jožef Stefan Institute, Ljubljana, Slovenia, and an Assistant Professor at the Jožef Stefan Postgraduate School, Ljubljana, Slovenia. His research interests include processor architecture, parallel computing, and combinatorial and numerical optimisation.

1 Introduction

The problem of finding a partitioning of a given graph G into several subgraphs with respect to constraints (determined by the specific application), while minimising a given objective function, is the most general formulation of the graph-partitioning problem. It arises as a fundamental problem in many important scientific and engineering applications, like parallel computation, sparse matrix-vector multiplication, sparse Gaussian elimination, VLSI design, image segmentation, telephone-network design, air-traffic management, data clustering, the physical mapping of DNA and many others (Alpert and Kahng, 1995; Jain et al., 1999; Simon, 1991; Ucar et al., 2007; Toril et al., 2010).

The most common formulation of this problem is known as the multi-way graph partitioning problem. It consists of finding a partitioning of the given graph into k subgraphs in such a way that the sum of the vertex weights is almost equal in each subgraph, while the number of edges crossing between the subgraphs is minimised. The multi-way graph partitioning problem is most probably NP-hard (Garey et al., 1974). Typically, this problem is too difficult to be

Copyright © 2011 Inderscience Enterprises Ltd.


288 K. Tashkova et al.

Based on the formal definition given by Bichot (2007), the graph-partitioning problem can be formulated as follows. Given a graph G = (V, E), where V is the set of vertices and E is the set of edges. If edges or vertices are not weighted, then all of them are set to a unit weight. For each vertex v_i ∈ V, let w(v_i) be its weight, and for each edge e = (v_i, v_j) ∈ E, let w(v_i, v_j) be its weight. Find a partition π_k of k subsets V_1, …, V_k of V such that: ∪_{i=1}^{k} V_i = V; ∀i, j ∈ {1, …, k}, i ≠ j, V_i ∩ V_j = ∅; the constraint C(V_i) is true; and the cost function cut_size(π_k) is minimised. Let cut be the cut between two distinct parts:

    cut(V_i, V_j) = Σ_{u ∈ V_i, v ∈ V_j} w(u, v),    (1)

i.e., the sum of the weights of the edges between part V_i and part V_j for i ≠ j. Let W be the weight of a part V_i:

    W(V_i) = Σ_{v ∈ V_i} w(v),    (2)

i.e., the sum of the weights of the vertices of part V_i. The k-way graph-partitioning problem is defined with the constraint C_k:

    ∀i, W(V_i) < β ⌈W(V)/k⌉,    (3)

where the function ⌈x⌉ returns the smallest integer greater than or equal to x, and the imbalance factor β is low (from 1.0 to 1.1).
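A minimal sketch of these definitions, with an illustrative edge-weight dictionary (the function and variable names are ours, not the paper's):

```python
# Sketch of the cut between two parts and the balance constraint C_k
# from equations (1)-(3), for a graph given as an edge-weight dict.
import math

def cut_size(parts, edge_weights):
    """Sum of weights of edges whose endpoints lie in different parts."""
    total = 0
    for (u, v), w in edge_weights.items():
        pu = next(i for i, p in enumerate(parts) if u in p)
        pv = next(i for i, p in enumerate(parts) if v in p)
        if pu != pv:
            total += w
    return total

def balanced(parts, vertex_weight, beta):
    """Constraint C_k: every part weighs less than beta * ceil(W(V)/k)."""
    total = sum(vertex_weight.values())
    bound = beta * math.ceil(total / len(parts))
    return all(sum(vertex_weight[v] for v in p) < bound for p in parts)

# Unit-weight example: a 4-cycle split into two halves cuts two edges.
edges = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 0): 1}
parts = [{0, 1}, {2, 3}]
print(cut_size(parts, edges))                           # 2
print(balanced(parts, {v: 1 for v in range(4)}, 1.05))  # True
```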
The k-way graph-partitioning problem uses the cost function:

    cut_size(π_k) = Σ_{i<j} cut(V_i, V_j).    (4)

3 Multilevel ant-colony algorithm

The MACA is an algorithm for k-way graph partitioning that combines the ant-colony optimisation paradigm with a multilevel technique (Walshaw and Cross, 2001) in a way that provides more efficient behaviour and higher flexibility when dealing with real-world and large-scale problems. The MACA is a recursive-like approach that combines four basic methods: graph partitioning (Solver, i.e., the method based on the ant-colony optimisation paradigm), graph contraction (Coarsening), graph expansion (Refinement) and vertex arrangement (Bucket_Sorting). Algorithm 1 outlines the top-level MACA pseudo code.

In order to be able to present the distributed version of the MACA, a brief description of the particular methods is given in this section. Further details about these methods can be found in Korošec et al. (2004).

Algorithm 1 MACA

1: Graph[0] = Initialisation(Parameters)
2: for l = 0 to L – 1 do
3:   Graph[l + 1] = Coarsening(Graph[l])
4: end for
5: for l = L down to 0 do
6:   BestLevelPartition = Solver(Graph[l])
7:   if l > 0 then
8:     Graph[l – 1] = Refinement(Graph[l])
9:     Bucket_Initialisation()
10:  end if
11: end for
12: BestPartition = BestLevelPartition

3.1 Solver

The main idea of the solver, i.e., the algorithm for k-way graph partitioning, is very simple (Langham and Grant, 1999). It uses k colonies of ants (artificial agents), which are mediated by pheromone trails and a local heuristic, to perform probabilistic moves on a grid (which represents the ants' habitat), while competing for food (initially randomly placed on the grid cells) that is represented by the vertices of the graph. The result of the foraging behaviour of the k colonies is the food stored in k nests, i.e., they decompose the graph into k subgraphs.

3.2 Multilevel framework

The multilevel framework (Barnard and Simon, 1994), as presented in Algorithm 1 and Figure 1, combines a level-based coarsening strategy with a level-based refinement method (in reverse order) to promote faster convergence and scaling to larger problems.

Figure 1 The three phases of multilevel k-way graph partitioning
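The coarsen-solve-refine scheme of Algorithm 1 can be sketched as follows; the coarsening, solver and refinement bodies below are stand-in stubs (halving a vertex list, splitting it in two), not the paper's actual methods:

```python
# Minimal sketch of the multilevel scheme: coarsen L times, partition the
# coarsest graph, then expand and re-optimise level by level.

def coarsen(graph):
    # stub contraction: keep every second vertex of a vertex list
    return graph[::2]

def solve(graph, hint=None):
    # stub solver: split the vertices into two halves
    # (in the MACA this would be the ant-colony search, seeded by `hint`)
    half = len(graph) // 2
    return [set(graph[:half]), set(graph[half:])]

def refine(partition, finer_graph):
    # stub expansion: re-partition the finer graph starting from `partition`
    return solve(finer_graph, hint=partition)

def multilevel(graph, levels):
    hierarchy = [graph]
    for _ in range(levels):
        hierarchy.append(coarsen(hierarchy[-1]))
    partition = solve(hierarchy[-1])          # partition the coarsest graph
    for finer in reversed(hierarchy[:-1]):    # expand back to the original
        partition = refine(partition, finer)
    return partition

parts = multilevel(list(range(16)), levels=3)
print(sorted(len(p) for p in parts))          # [8, 8]
```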


A distributed multilevel ant-colony algorithm for the multi-way graph partitioning 289

Coarsening is a graph contraction procedure that is iterated L times (on L levels). A coarser graph G_{l+1}(V_{l+1}, E_{l+1}) is obtained from a graph G_l(V_l, E_l) by finding the largest independent subset of graph edges and then collapsing them. On the other hand, refinement is a graph expansion procedure that is applied to a partitioned graph G_l and expands it onto its parent graph G_{l–1}. The idea behind this is to solve the problem iteratively, step by step, starting with a very condensed problem representation (the smallest graph), which is then partitioned with the solver. The obtained solution is then expanded to the next-level graph (bigger in size) and its partitioning is further refined with a new iteration of optimisation.
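The contraction step above — picking an independent set of edges (a matching) and collapsing each matched pair into one coarse vertex — can be sketched as follows; note that a single greedy pass gives a maximal, not a maximum, matching:

```python
# Sketch of matching-based coarsening: collapse an independent edge set.

def coarsen(num_vertices, edges):
    matched, matching = set(), []
    for u, v in edges:                      # greedy pass over the edges
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    coarse_of, next_id = {}, 0
    for u, v in matching:                   # each matched pair collapses
        coarse_of[u] = coarse_of[v] = next_id
        next_id += 1
    for v in range(num_vertices):           # unmatched vertices survive alone
        if v not in coarse_of:
            coarse_of[v] = next_id
            next_id += 1
    # coarse edges: fine edges whose endpoints land in distinct coarse vertices
    coarse_edges = {(min(coarse_of[u], coarse_of[v]),
                     max(coarse_of[u], coarse_of[v]))
                    for u, v in edges if coarse_of[u] != coarse_of[v]}
    return next_id, sorted(coarse_edges)

# A path 0-1-2-3 collapses to two coarse vertices joined by one edge.
n, e = coarsen(4, [(0, 1), (1, 2), (2, 3)])
print(n, e)                                 # 2 [(0, 1)]
```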
In this way, we expand the graph to its original size, and on every level l of the expansion we run the solver.

Large graph problems, and the multilevel process itself, induce a rapid increase in the number of vertices in a single grid cell as the number of levels goes up. To overcome this problem, the MACA employs a method based on the basic bucket-sort idea (Fiduccia and Mattheyses, 1982) that accelerates and improves the algorithm's convergence by choosing the most 'promising' vertex from a given cell. Inside the cell, all vertices with a particular gain g are put together in a 'bucket' ranked g, and all non-empty buckets, implemented as doubly-linked lists of vertices, are organised in a 2-3 tree (Bayer and McCreight, 1972). Additionally, the MACA keeps a separate 2-3 tree for each colony on every grid cell that has vertices, in order to achieve even faster searches.

4 Distributed multilevel ant-colony algorithm

An initial study on the parallelisation of the MACA (Tashkova et al., 2008) examined two distributed versions of the MACA.

The first one was based on the parallel interactive colony approach, which, by definition, implied a master/slave implementation and synchronised communication.
A disadvantage of this version was the synchronisation/communication overhead, since an information exchange across the concurrent processors was initiated every time a piece of food was taken or dropped at a new position. Furthermore, the master kept and updated its own local grid matrix of temporal food positions (playing the role of a shared memory) in order to maintain normal and consistent slave activities.

To avoid this communication and still exploit some level of parallelism, the second version distributes the MACA based on the idea of parallel, independent runs (Stützle, 1998) enhanced with cooperation in the form of a solution exchange among the concurrent searches. In this paper, we consider the second approach in a slightly modified version of the one initially introduced as the SIDMACA in Tashkova et al. (2008), and we refer to it simply as the DMACA. The DMACA modifies the SIDMACA search method with respect to the number of iterations, the imbalance setting and the buffer size used for communication, in the way described in the following paragraphs.

The DMACA is basically an approach that allows the exchange of the best temporal solution at the end of every level of the multilevel optimisation process. This exchange requires that the parallel executions of the MACA instances on the available processors be synchronised once per level. This means that the master processor is responsible for synchronising the work of all the slave processors that execute a copy of the DMACA, by managing the information exchange and communication process. The slave processors execute the instances of the DMACA code, signal when the current level of optimisation is finished and send the best partition to the master. When all the slaves finish the current level, the master determines the best solution and broadcasts it to the slaves. In order to proceed with the next level of optimisation, the slave processors first have to update their local memory structures (grid matrix) and afterwards perform a partition expansion (refinement). The main idea of the DMACA is outlined in Algorithm 2.

Algorithm 2 DMACA

Master:
1: Start_All_Slaves()
2: repeat
3:   while all slaves not finished level do
4:     Receive_From_Slave(SlaveBestLevelPartition)
5:     BestLevelPartitions = Add(SlaveBestLevelPartition)
6:   end while
7:   BestLevelPartition = Calculate(BestLevelPartitions)
8:   Broadcast_To_Slaves(BestLevelPartition)
9:   if last level finished then
10:    BestPartition = BestLevelPartition
11:  end if
12: until last level finished
13: Stop_All_Slaves()

Slave:
1: Receive_From_Master(Parameters)
2: Graph[0] = Initialisation(Parameters)
3: for l = 0 to L – 1 do
4:   Graph[l + 1] = Coarsening(Graph[l])
5: end for
6: for l = L down to 0 do
7:   SlaveBestLevelPartition = Solver(Graph[l])
8:   Send_To_Master(SlaveBestLevelPartition)
9:   Receive_From_Master(BestLevelPartition)
10:  Update(Graph[l], BestLevelPartition)
11:  if l > 0 then
12:    Graph[l – 1] = Refinement(Graph[l])


290 K. Tashkova et al.

13:    Bucket_Initialisation()
14:  end if
15: end for

The first characteristic of the MACA is that it does not rigidly fix the total number of iterations per level. More precisely, if the number of iterations per level is set to some arbitrary number m, the search procedure stops when in the last successive m iterations no improvement over the best solution is obtained. Second, it employs a limited search for the optimal subgraph inside a given constant imbalance (only) on the last three levels.

Furthermore, the initial distributed version SIDMACA inherited the above-described search procedure completely unmodified from the original MACA, meaning that the comparison in the previous study was not based on a fixed number of iterations. In order to obtain an appropriate comparison of parallel performance, the DMACA employs a fixed number of iterations per corresponding level l, according to the formula

    iter[l] = m                  for l = 0, 1
    iter[l] = iter[l – 1] + m    for l = 2, …, L – 2    (5)
    iter[l] = 2·iter[l – 1]      for l = L – 1.

This scaling of the number of iterations is, in a way, proportional to the size of the graphs on the different levels. An intuitive explanation comes from the fact that at the higher levels, associated with the smaller (contracted) graphs, the search needs fewer iterations than at the lower levels with the larger graphs.
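The schedule in formula (5) can be sketched directly; here index 0 denotes the level that receives the fewest iterations (the coarsest graph, on our reading of the surrounding text), and the finest level gets double the previous count:

```python
# Sketch of the per-level iteration schedule of formula (5):
# m iterations on the first two levels, then +m per level,
# doubling at the last (finest) level.

def iteration_schedule(m, L):
    iters = []
    for l in range(L):
        if l <= 1:
            iters.append(m)                # l = 0, 1
        elif l <= L - 2:
            iters.append(iters[-1] + m)    # l = 2, ..., L-2
        else:
            iters.append(2 * iters[-1])    # l = L-1, the original graph
    return iters

# With the paper's m = 200 and, say, L = 6 levels:
print(iteration_schedule(200, 6))          # [200, 200, 400, 600, 800, 1600]
```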
Consequently, at the final stage, when we obtain the original graph, we allow the search the largest number of iterations for the final refinement of the graph partitioning. Moreover, the imbalance factor was scaled by levels as well: starting from level L – 3, a constraint on the imbalance was introduced, and its value was decreased down to a specified threshold (in our case β = 1.01 or β = 1.05) at the first level. There was no limitation on the imbalance at the last two levels, as the graphs are very small in size.

Finally, the buffer used for communication in the SIDMACA version was allocated in advance. It was fixed at a value big enough to support the transfer of the best solution of the largest (in terms of the number of vertices) graph tested. To avoid unnecessary communication overhead when partitioning differently sized graphs, in the DMACA the buffer was dynamically allocated according to the size of the graph.

5 Experimental evaluation

The proposed DMACA was applied to a set of benchmark graphs, and the results from the experimental evaluation on two-way and four-way graph partitioning are presented and discussed in this section.

5.1 Performance measures

The quality of the graph partitioning is described by the cut-size measure and the imbalance factor.
Since the imbalance of the obtained solutions is kept in a predefined range of values, we report on the quality in terms of the number of cut edges, cut_size(π_k).

A statistical significance test was performed to check the difference in the quality of the solutions obtained with the MACA and the DMACA. We used pairwise comparisons with the signed-rank test proposed by Wilcoxon (1945) and multiple comparisons with the dynamic post-hoc procedure proposed by Bergmann and Hommel (1988). Based on these procedures, with a chosen significance level α (in our case 0.01), we make a decision about the null hypothesis that 'there is no difference in performance between the compared methods'. If the p-value is smaller than α, the null hypothesis is rejected; otherwise it is not rejected. Here, the p-value is determined according to Friedman's statistic using the cut-size results.

Finally, the effectiveness of the parallel algorithm is, in our case, given by the speed-up measures

    S_a(n) = t_S / t_P(n)    (6)

and

    S_r(n) = t_P(1) / t_P(n),    (7)

where t_S is the time to solve a problem with the sequential code (MACA), and t_P(1) and t_P(n) are the times to solve the same problem with the parallel code (DMACA) on a single processor and on n processors, respectively.
Following the study of Barr and Hickman (1993), the speed-up results were calculated based on the mean value of the time for the serial code, while the final result is presented as the harmonic mean of the speed-up values over all the runs.

5.2 Setup

The DMACA is implemented in Borland Delphi, using the TCP/IP protocol for the server/client communication, based on the open-source library Indy Sockets 10. All the experiments were performed on an eight-node cluster connected via a Gigabit switch, where each node consists of two AMD Opteron 1.8 GHz processors, 2 GB of RAM, and the Windows XP operating system.

The benchmark graphs used in the experimental analysis were taken from The Graph Partitioning Archive, available online at http://staffweb.cms.gre.ac.uk/~c.walshaw/partition/. Their description, in terms of the number of graph vertices and the number of graph edges, is presented in Table 1, while the best available solutions for the particular graphs up to June 2010 are presented in Table 2.
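The speed-up measures (6)-(7) and the harmonic-mean reporting can be sketched as follows; the run times below are made-up illustrative numbers, not measurements from the paper:

```python
# Sketch of absolute/relative speed-up, eqs. (6)-(7), summarised with the
# harmonic mean over runs as recommended by Barr and Hickman (1993).
from statistics import harmonic_mean

def absolute_speedup(t_serial_mean, t_parallel_n):
    return t_serial_mean / t_parallel_n      # S_a(n), eq. (6)

def relative_speedup(t_parallel_1, t_parallel_n):
    return t_parallel_1 / t_parallel_n       # S_r(n), eq. (7)

serial_mean = 120.0                          # mean serial time over runs (made up)
parallel_times = [32.0, 30.0, 40.0]          # per-run times on n processors (made up)
speedups = [absolute_speedup(serial_mean, t) for t in parallel_times]
print(round(harmonic_mean(speedups), 2))     # 3.53
```

The harmonic mean is the conservative choice here: it is dominated by the slowest runs, so one lucky fast run cannot inflate the reported speed-up.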


A distributed multilevel ant-colony algorithm for the multi-way graph partitioning 291

Table 1 Benchmark graphs

                                     Degree
Graph name    |V|      |E|       min   max   avg.
add20         2,395    7,462     1     123   6.23
data          2,851    15,093    3     17    10.59
uk            4,824    6,837     1     3     2.83
bcsstk33      8,738    291,583   19    140   66.74
crack         10,240   30,380    3     9     5.93
wing_nodal    10,937   75,488    5     28    13.80
vibrobox      12,328   165,250   8     120   26.81
4elt          15,606   45,878    3     10    5.88
memplus       17,758   54,196    1     573   6.10
cs4           22,499   43,858    2     4     3.90

The total number of ants per colony was set to 120. The number of ants per sub-colony was determined from the number n of processors as 1/n of the total number of ants. With regard to the imbalance, we performed two sets of experiments, for β = 1.01 and β = 1.05. All the experiments were run 30 times.

Since the original MACA method and the proposed DMACA have slightly different search settings, in order to give them an equal chance in the experimental evaluation, we defined the MACA with a scaled number of iterations and a scaled imbalance, as described in the previous section when the DMACA was introduced. The number of iterations m was set to 200.

5.3 Results

Reporting results from experiments with parallel algorithms is not a straightforward task (Barr and Hickman, 1993).
Moreover, in the case of stochastic algorithms (like the DMACA), the repeatability of the algorithm's outcome is questionable, making the performance-evaluation procedure even more difficult. The standard way of reporting results, with the mean value and the corresponding variance of the best found solutions over all the performed executions (runs), is not always sufficient (the mean value can be far away from the best obtained solution). Because of the general practice of reporting the best obtained solutions in the field of multi-way graph partitioning, we report the best value for the cut-size measure obtained from 30 runs. The results on cut-size performance obtained with the MACA and the DMACA are presented in Tables 2 and 3, respectively.
The relative distance dist [%] of the DMACA solutions with regard to the best available solutions is also calculated and given in Table 3.

Compared with the currently available best solutions for the given graphs, the best solutions obtained with the DMACA are worse than the ones given in Table 2, except for the two-way partitioning of the uk graph in the case of imbalance β = 1.01, where we get the same solution. According to Table 3, the relative distances of the best solutions obtained by the DMACA for the two-way partitioning problem are smallest for the crack (less than 2.2%), wing_nodal (less than 3.8%) and uk (less than 5.6%) graphs. The largest deviations from the best available solutions in the case of the two-way partitioning problem are observed for the add20 (up to 30.4%) and 4elt (up to 42.3%) graphs.
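The relative distance reported in Table 3 is simply the percentage by which a cut-size exceeds the best available solution; illustrated here on a value taken from the text (uk, two-way, β = 1.05: best available 18, DMACA 19):

```python
# dist [%]: how far a cut-size lies above the best available solution.

def relative_distance(cut, best):
    return 100.0 * (cut - best) / best

print(round(relative_distance(19, 18), 1))   # 5.6
```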
In the case of the four-way partitioning problem, the relative distances of the best solutions obtained by the DMACA are smallest for the add20 (less than 2.4%) and wing_nodal (less than 4.5%) graphs; the solutions for the crack, vibrobox and memplus graphs are worse by up to 8.6% than the best available, while the largest deviation of 40.5% is observed for the solution of the uk graph.

Table 2 Best available (June 2010) and best cut-size values for the benchmark graphs obtained with the MACA

              Best known value for cut-size measure       MACA
              β = 1.01         β = 1.05         β = 1.01         β = 1.05
Graph name    k = 2   k = 4    k = 2   k = 4    k = 2   k = 4    k = 2   k = 4
add20         594     1,177    550     1,157    718     1,165    698     1,192
data          188     383      181     368      208     432      208     432
uk            19      42       18      40       19      56       19      56
bcsstk33      10,097  21,508   9,914   20,584   10,814  23,003   10,710  22,744
crack         183     362      182     360      184     375      184     370
wing_nodal    1,696   3,572    1,668   3,536    1,725   3,644    1,709   3,670
vibrobox      10,310  19,199   10,310  18,778   11,256  19,996   11,421  20,099
4elt          138     321      137     315      140     356      139     342
memplus       5,489   9,559    5,267   9,299    6,227   10,076   6,122   10,072
cs4           367     940      365     936      397     1,033    394     1,025


292 K. Tashkova et al.

Table 3 Best values for cut_size(π_k) measure obtained with the DMACA and corresponding relative distance dist in percentage with regard to the best available solution from Table 2

                  β = 1.01                          β = 1.05
                  k = 2           k = 4             k = 2           k = 4
Graph name    n   cut_size dist%  cut_size dist%    cut_size dist%  cut_size dist%
add20         1   701      18.0   1,178    0.1      716      30.2   118      2.2
              2   717      20.7   1,184    0.6      706      28.4   118      2.1
              4   717      20.7   1,184    0.6      717      30.4   118      2.3
              8   717      20.7   1,184    0.6      717      30.4   118      2.4
              16  717      20.7   1,187    0.8      717      30.4   118      2.4
data          1   208      10.6   425      11.0     208      14.9   40       10.6
              2   208      10.6   407      6.3      208      14.9   40       10.9
              4   208      10.6   428      11.7     210      16.0   40       11.1
              8   210      11.7   432      12.8     210      16.0   43       18.5
              16  223      18.6   408      6.5      230      27.1   43       19.0
uk            1   19       0.0    59       40.5     19       5.6    5        35.0
              2   19       0.0    51       21.4     19       5.6    5        27.5
              4   19       0.0    46       9.5      19       5.6    4        20.0
              8   19       0.0    48       14.3     19       5.6    5        30.0
              16  19       0.0    50       19.0     19       5.6    4        22.5
bcsstk33      1   10,719   6.2    23,170   7.7      10,706   8.0    2,285    11.0
              2   10,805   7.0    23,430   8.9      10,874   9.7    2,332    13.3
              4   10,884   7.8    23,156   7.7      10,545   6.4    2,308    12.2
              8   10,555   4.5    23,463   9.1      10,586   6.8    2,324    12.9
              16  10,882   7.8    22,810   6.1      10,922   10.2   2,355    14.4
crack         1   184      0.5    369      1.9      184      1.1    37       3.3
              2   184      0.5    382      5.5      184      1.1    37       2.8
              4   184      0.5    374      3.3      184      1.1    37       3.1
              8   187      2.2    378      4.4      184      1.1    37       5.0
              16  185      1.1    384      6.1      184      1.1    38       8.1
wing_nodal    1   1,722    1.5    3,619    1.3      1,723    3.3    364      2.9
              2   1,715    1.1    3,659    2.4      1,710    2.5    366      3.8
              4   1,713    1.0    3,676    2.9      1,716    2.9    367      3.8
              8   1,718    1.3    3,651    2.2      1,714    2.8    367      3.8
              16  1,736    2.4    3,700    3.6      1,732    3.8    369      4.5
vibrobox      1   11,583   12.3   19,905   3.7      11,528   11.8   2,000    6.5
              2   11,313   9.7    20,198   5.2      11,471   11.3   2,035    8.4
              4   11,376   10.3   20,078   4.6      11,567   12.2   2,003    6.7
              8   11,244   9.1    20,340   5.9      11,226   8.9    2,038    8.6
              16  11,693   13.4   20,244   5.4      11,626   12.8   2,038    8.5
4elt          1   140      1.4    348      8.4      154      12.4   36       16.5
              2   173      25.4   334      4.0      139      1.5    34       7.9
              4   139      0.7    350      9.0      140      2.2    35       14.0
              8   185      34.1   344      7.2      141      2.9    34       9.2
              16  179      29.7   348      8.4      195      42.3   34       9.8


Table 3  Best values for the cut_size(π_k) measure obtained with the DMACA and the corresponding relative distance dist (in %) with regard to the best available solution from Table 2 (continued)

                       β = 1.01                           β = 1.05
                  k = 2            k = 4            k = 2            k = 4
Graph name    n   cut_size  dist   cut_size  dist   cut_size  dist   cut_size  dist
memplus       1    6,207    13.1    10,058    5.2    6,191    17.5    1,005     8.2
              2    6,219    13.3     9,977    4.4    6,198    17.7    1,005     8.1
              4    6,168    12.4    10,071    5.4    6,196    17.6      997     7.2
              8    6,239    13.7    10,030    4.9    6,172    17.2    1,002     7.8
             16    6,206    13.1     9,994    4.6    6,204    17.8    1,001     7.7
cs4           1      391     6.5     1,022    8.7      391     7.1    1,008     7.7
              2      399     8.7     1,040   10.6      403    10.4    1,038    10.9
              4      403     9.8     1,033    9.9      402    10.1    1,032    10.3
              8      406    10.6     1,060   12.8      408    11.8    1,041    11.2
             16      411    12.0     1,041   10.7      422    15.6    1,040    11.1

As our primary goal was not to get the best possible solutions out of the DMACA and compare them with the state-of-the-art algorithms for graph partitioning, but to preserve the quality and improve the execution time of the MACA when distributed in a multiprocessor environment, we did not fine-tune the algorithms' parameters when applied to a specific graph problem.
This means that for all the experiments with all the graphs we used the same settings, regardless of how big or complex the graph was.

Based on the cut-size results, Table 4 presents pairwise comparisons with the Wilcoxon signed-rank test. The test confirms that, in general, there is no significant difference in the quality of the solutions generated with the MACA and the DMACA, except in the case of the two-way partitioning problem with imbalance β = 1.05, where the MACA is significantly better than the DMACA with n = 16 at the 1% significance level (α = 0.01).

Table 4  Pairwise comparisons with the Wilcoxon test (p-value)

                            β = 1.01          β = 1.05
Hypothesis                  k = 2    k = 4    k = 2    k = 4
MACA vs. DMACA n=1          0.44     0.32     0.08     0.36
MACA vs. DMACA n=2          0.94     0.61     0.03     1
MACA vs. DMACA n=4          0.78     0.43     0.20     0.55
MACA vs. DMACA n=8          0.85     0.36     0.72     0.31
MACA vs. DMACA n=16         0.78     0.94     0.01     0.23

Similarly, Table 5 presents the results of multiple comparisons with the Bergmann-Hommel dynamic post-hoc procedure between instances of the DMACA run on different numbers of processors. At the 1% significance level all the hypotheses are retained, meaning that there is no significant difference between the solutions generated with the DMACA on different numbers of processors.

Since the quality of the DMACA solutions is preserved, any speed-up we can gain is beneficial.
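The pairwise tests in Table 4 are based on the Wilcoxon signed-rank statistic. A minimal pure-Python sketch of the statistic follows; the p-values in the table would normally come from a statistics package, the paired values below are illustrative (not the paper's raw data), and the function name is our own:

```python
def wilcoxon_w(x, y):
    """Wilcoxon signed-rank statistic W = min(W+, W-) for paired samples.

    Zero differences are discarded; tied |d| values receive average ranks."""
    d = [a - b for a, b in zip(x, y) if a != b]
    abs_sorted = sorted(abs(v) for v in d)

    def avg_rank(v):
        # average rank of absolute difference v (handles ties)
        first = abs_sorted.index(v) + 1
        count = abs_sorted.count(v)
        return first + (count - 1) / 2.0

    w_plus = sum(avg_rank(abs(v)) for v in d if v > 0)
    w_minus = sum(avg_rank(abs(v)) for v in d if v < 0)
    return min(w_plus, w_minus)

# illustrative paired cut-size values for five graphs
maca  = [718, 208, 19, 10814, 184]
dmaca = [717, 208, 19, 10882, 185]
print(wilcoxon_w(maca, dmaca))  # -> 1.5
```

The two zero differences are dropped, leaving three signed differences to rank, which is what produces the small W value here.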
The mean (harmonic) values of the absolute and relative speed-ups obtained when the DMACA was applied to the 10 graphs are presented in Figure 2.

Table 5  Multiple comparisons with the Bergmann-Hommel procedure (adjusted p-value)

                                 β = 1.01          β = 1.05
Hypothesis                       k = 2    k = 4    k = 2    k = 4
DMACA n=1 vs. DMACA n=2          1        1        1        1
DMACA n=1 vs. DMACA n=4          1        1        1        1
DMACA n=1 vs. DMACA n=8          0.82     1        1        0.24
DMACA n=1 vs. DMACA n=16         0.20     0.05     1        0.24
DMACA n=2 vs. DMACA n=4          1        1        1        1
DMACA n=2 vs. DMACA n=8          0.82     1        1        0.24
DMACA n=2 vs. DMACA n=16         0.20     0.02     1        0.24
DMACA n=4 vs. DMACA n=8          0.82     1        1        0.24
DMACA n=4 vs. DMACA n=16         0.20     0.08     1        0.24
DMACA n=8 vs. DMACA n=16         1        0.05     1        1

In general, the observed speed-ups for the four-way partitioning task are slightly higher than those for the two-way partitioning task, reaching up to 2, 3, 5.3, and 8 when executing the DMACA with n = 2, 4, 8, and 16, respectively. When solving the two-way partitioning task with the DMACA for n = 2, 4, 8, and 16, the speed-up reaches up to 2, 2.6, 4, and 6, respectively.

Based on the speed-up results visualised in Figure 2, Table 6 summarises the performance of the DMACA with respect to the minimum and maximum speed-up values over all test graphs for both partitioning problems. The results show that the maximal speed-ups are obtained for the uk, crack and cs4 graphs.
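The speed-up figures can be reproduced with a few lines, assuming the usual definitions (absolute speed-up: versus the sequential MACA; relative speed-up: versus the DMACA on one processor) and the harmonic-mean aggregation stated above. The run times below are illustrative, not taken from the paper, and the helper names are our own:

```python
def harmonic_mean(values):
    """Harmonic mean, the aggregation used for the mean speed-up figures."""
    return len(values) / sum(1.0 / v for v in values)

def speedups(t_serial, t_single, t_parallel):
    """Absolute speed-up (vs. the sequential run) and relative speed-up
    (vs. the distributed run on one processor)."""
    return t_serial / t_parallel, t_single / t_parallel

# illustrative run times in seconds
s_abs, s_rel = speedups(t_serial=100.0, t_single=110.0, t_parallel=27.5)
print(round(s_abs, 2), s_rel)  # -> 3.64 4.0

# aggregating two per-graph speed-ups
print(round(harmonic_mean([2.27, 1.94]), 2))  # -> 2.09
```

The harmonic mean penalises outliers on the high side, which is why it is the conventional choice for averaging speed-up ratios.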
While the uk graph is the smallest of these three graphs, and cs4 is the biggest of all the tested graphs, all three have, on average, a relatively small number of connections per vertex. Moreover, the minimal speed-ups observed when executing the DMACA for n = 2, 4, 8, and 16 on the bcsstk33 graph, together with the relatively small speed-ups for the vibrobox graph, reveal that the DMACA code is potentially weak on graphs with a high degree of connections per vertex and on graphs of bigger size, in terms of the number of vertices. This stems from the bucket-sorting procedure in the MACA, which the DMACA code inherits completely unmodified. In this procedure, the food (vertices) inside a grid cell is sorted into buckets of particular gain, organised in 2-3 trees, for every colony separately. The procedure is triggered every time food is taken by an ant: a bigger graph means more food for foraging, and consequently more frequent calls to the procedure. In addition, a more densely connected graph, like bcsstk33, means a bigger 2-3 tree to search and update. All of this is maintained by every processor that executes an instance of the DMACA.

Figure 2  Observed DMACA speed-ups for the two-way and four-way partitioning tasks on the benchmark graphs (add20, data, uk, bcsstk33, crack, wing_nodal, vibrobox, 4elt, memplus, cs4), constrained to 1% (triangle marker) and 5% (square marker) imbalance

Note: Solid lines with black markers correspond to the absolute speed-up values, while the dashed lines with white markers correspond to the relative speed-up values.
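The update cost described above can be illustrated with a simplified gain-bucket store. The paper keeps the buckets in 2-3 trees, one per colony; the dict-of-sets used below is purely illustrative, and all names are our own, but it shows why every piece of food taken by an ant triggers re-bucketing work that grows with graph size and density:

```python
from collections import defaultdict

class GainBuckets:
    """Simplified gain-bucket store: vertices grouped by their current gain.

    The DMACA maintains these buckets in 2-3 trees per colony; a dict of
    sets is used here only for illustration."""

    def __init__(self):
        self.buckets = defaultdict(set)   # gain -> set of vertices
        self.gain_of = {}                 # vertex -> current gain

    def insert(self, vertex, gain):
        self.buckets[gain].add(vertex)
        self.gain_of[vertex] = gain

    def update(self, vertex, new_gain):
        # called whenever an ant takes food and neighbouring gains change;
        # with a balanced tree each update costs O(log n), and dense graphs
        # such as bcsstk33 trigger many such updates
        old = self.gain_of[vertex]
        self.buckets[old].discard(vertex)
        if not self.buckets[old]:
            del self.buckets[old]
        self.insert(vertex, new_gain)

    def best(self):
        """A vertex with maximal gain (ties broken arbitrarily)."""
        top = max(self.buckets)
        return next(iter(self.buckets[top]))

gb = GainBuckets()
gb.insert("v1", 3); gb.insert("v2", 5); gb.insert("v3", 5)
gb.update("v2", -1)   # v2's gain drops after a move
print(gb.best())      # -> v3
```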


Table 6  Absolute and relative speed-up obtained with the DMACA

                     k = 2                                     k = 4
Speed-up   n    Best          Mean ± StD    Worst          Best          Mean ± StD    Worst
S_a(n)     2    2.27 uk       1.75 ± 0.23   1.39 vibrobox  1.94 add20    1.59 ± 0.19   1.31 data
           4    2.56 uk       1.97 ± 0.27   1.50 bcsstk33  2.77 crack    2.09 ± 0.33   1.49 bcsstk33
           8    3.62 uk       2.83 ± 0.51   1.74 bcsstk33  4.51 crack    3.33 ± 0.75   1.65 bcsstk33
          16    5.27 crack    3.84 ± 0.92   1.88 bcsstk33  6.94 crack    4.93 ± 1.39   1.65 bcsstk33
S_r(n)     2    2.11 cs4      1.86 ± 0.25   1.41 bcsstk33  2.33 cs4      1.80 ± 0.32   1.25 vibrobox
           4    2.56 data     2.11 ± 0.31   1.42 bcsstk33  3.13 data     2.36 ± 0.44   1.53 bcsstk33
           8    3.97 crack    3.03 ± 0.63   1.64 bcsstk33  5.26 cs4      3.78 ± 0.99   1.69 bcsstk33
          16    6.01 crack    4.12 ± 1.39   1.78 bcsstk33  8.26 cs4      5.62 ± 1.75   1.65 bcsstk33

Note: Statistics are calculated from the results obtained for all graphs and both imbalance factors.

A general observation is that the parallel performance of the system with respect to speed-up over the serial MACA is poor compared to the theoretically expected speed-up of n when using n processors.
This is to some extent expected, since the MACA was originally developed for single-processor execution.

6 Conclusions

This paper addressed the distributed multilevel ant-colony algorithm for multi-way graph partitioning, which is based on the idea of parallel, independent runs enhanced with cooperation in the form of a solution exchange among the concurrent searches. Driven by the primary goals of parallel computation, the objective of the paper was not to find the optimum solution in terms of quality, but to find reasonably good solutions in shorter computation times. The experimental evaluation on two-way and four-way partitioning of benchmark graphs, using an eight-node cluster with distributed memory, showed that the distributed algorithm can obtain the same quality as the sequential algorithm, while reducing the overall computation time.

A high degree of graph connectivity can noticeably degrade the parallel performance of the distributed algorithm in terms of speed-up. This is mainly because of the computationally demanding updates in the memory structures used by the bucket-sorting procedure and maintained by every processor on all levels.
Consequently, the bucket-sorting procedure combined with the multilevel process can result in high time consumption.

Since the proposed distributed implementation suffers from increased communication and local memory updates, as initially discussed by Tashkova et al. (2008), a logical next step will be to test a corresponding shared-memory implementation.

References

Alpert, C.J. and Kahng, A.B. (1995) 'Recent directions in netlist partitioning: a survey', Integration, Vol. 19, Nos. 1–2, pp.1–81.

Bahreininejad, A., Topping, B.H.V. and Khan, A.I. (1996) 'Finite element mesh partitioning using neural networks', Advances in Engineering Software, Vol. 27, Nos. 1–2, pp.103–115.

Baños, R., Gil, C., Ortega, J. and Montoya, F.G. (2003) 'Multilevel heuristic algorithm for graph partitioning', Lecture Notes in Computer Science, Vol. 2611, pp.143–153.

Barnard, S.T. and Simon, H.D. (1994) 'Fast multilevel implementation of recursive spectral bisection for partitioning unstructured problems', Concurrency and Computation: Practice and Experience, Vol. 6, No. 2, pp.101–117.

Barr, R.S. and Hickman, B.L. (1993) 'Reporting computational experiments with parallel algorithms: issues, measures, and experts' opinion', ORSA Journal on Computing, Vol. 5, No. 1, pp.2–18.

Bayer, R. and McCreight, E.M. (1972) 'Organization and maintenance of large ordered indexes', Acta Informatica, Vol. 1, No. 3, pp.173–189.

Bergmann, B. and Hommel, G. (1988) 'Improvements of general multiple test procedures for redundant systems of hypotheses', in Bauer, P., Hommel, G. and Sonnemann, E. (Eds.): Multiple Hypothesenprüfung – Multiple Hypotheses Testing, pp.100–115, Springer-Verlag.

Bichot, C-E. (2007) 'A new method, the fusion fission, for the relaxed k-way graph partitioning problem, and comparisons with some multilevel algorithms', Journal of Mathematical Modelling and Algorithms, Vol. 6, No. 3, pp.319–344.

Dorigo, M. (1992) 'Optimization, learning and natural algorithms', PhD thesis, Dipartimento di Elettronica, Politecnico di Milano, Milan, Italy.

Dorigo, M. and Stützle, T. (2004) Ant Colony Optimization, The MIT Press, Cambridge, MA.

Fiduccia, C.M. and Mattheyses, R.M. (1982) 'A linear time heuristic for improving network partitions', Proceedings of the 19th IEEE Design Automation Conference, pp.175–181, Las Vegas, NV.

Flynn, M.J. (1972) 'Some computer organizations and their effectiveness', IEEE Transactions on Computers, Vol. 21, No. 9, pp.948–960.

Garey, M.R., Johnson, D.S. and Stockmeyer, L. (1974) 'Some simplified NP-complete problems', Proceedings of the 6th Annual ACM Symposium on Theory of Computing, pp.47–63, Seattle, WA.

Hendrickson, B. and Leland, R. (1995) 'A multilevel algorithm for partitioning graphs', Proceedings of Supercomputing '95, San Diego, CA.


Jain, A.K., Murty, M.N. and Flynn, P.J. (1999) 'Data clustering: a review', ACM Computing Surveys, Vol. 31, No. 3, pp.264–323.

Kadłuczka, P. and Wala, K. (1995) 'Tabu search and genetic algorithms for the generalized graph partitioning problem', Control and Cybernetics, Vol. 24, No. 4, pp.459–476.

Karypis, G. and Kumar, V. (1998) 'Multilevel k-way partitioning scheme for irregular graphs', Journal of Parallel and Distributed Computing, Vol. 48, No. 1, pp.96–129.

Kaveh, A. and Shojaee, S. (2008) 'Optimal domain decomposition via p-median methodology using ACO and hybrid ACGA', Finite Elements in Analysis and Design, Vol. 44, No. 8, pp.505–512.

Kernighan, B.W. and Lin, S. (1970) 'An efficient heuristic procedure for partitioning graphs', The Bell System Technical Journal, Vol. 49, No. 2, pp.291–307.

Korošec, P., Šilc, J. and Robič, B. (2004) 'Solving the mesh-partitioning problem with an ant-colony algorithm', Parallel Computing, Vol. 30, Nos. 5–6, pp.785–801.

Langham, A.E. and Grant, P.W. (1999) 'Using competing ant colonies to solve k-way partitioning problems with foraging and raiding strategies', Lecture Notes in Computer Science, Vol. 1674, pp.621–625.

Randall, M. and Lewis, A. (2002) 'A parallel implementation of ant colony optimization', Journal of Parallel and Distributed Computing, Vol. 62, No. 9, pp.1421–1432.

Schloegel, K., Karypis, G. and Kumar, V. (2003) 'Graph partitioning for high performance scientific simulation', in Dongarra, J., Foster, I., Fox, G., Gropp, W., Kennedy, K., Torczon, L. and White, A. (Eds.): Sourcebook of Parallel Computing, pp.491–541, Morgan Kaufmann Publishers, San Francisco, CA.

Simon, H.D. (1991) 'Partitioning of unstructured problems for parallel processing', Computing Systems in Engineering, Vol. 2, Nos. 2–3, pp.135–148.

Soper, A.J., Walshaw, C. and Cross, M. (2000) 'A combined evolutionary search and multilevel approach to graph partitioning', Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2000), pp.674–681, Las Vegas, NV.

Stützle, T. (1998) 'Parallelization strategies for ant colony optimization', Lecture Notes in Computer Science, Vol. 1498, pp.722–731.

Tashkova, K., Korošec, P. and Šilc, J. (2008) 'A distributed multilevel ant colonies approach', Informatica, Vol. 32, No. 3, pp.307–317.

Toril, M., Molina-Fernández, I., Wille, V. and Walshaw, C. (2010) 'Analysis of heuristic graph partitioning methods for the assignment of packet control units in GERAN', Wireless Personal Communications, in press, doi: 10.1007/s11277-010-9963-1.

Ucar, D., Neuhaus, I., Ross-MacDonald, P., Tilford, C., Parthasarathy, S., Siemers, N. and Ji, R-R. (2007) 'Construction of a reference gene association network from multiple profiling data: application to data analysis', Bioinformatics, Vol. 23, No. 20, pp.2716–2724.

Walshaw, C. and Cross, M. (2001) 'Mesh partitioning: a multilevel balancing and refinement algorithm', SIAM Journal on Scientific Computing, Vol. 22, No. 1, pp.63–80.

Wilcoxon, F. (1945) 'Individual comparisons by ranking methods', Biometrics Bulletin, Vol. 1, No. 6, pp.80–83.
