
Parallel Combinatorial Optimisation for Finding Ground States of Ising Spin Glasses

Peter Alexander Foster

MSc in High Performance Computing
The University of Edinburgh
Year of Presentation: 2008


Abstract

This dissertation deals with the Ising spin glass ground state problem. An exact approach to this optimisation problem is described, based on combining the Markov chain framework with dynamic programming. Resulting algorithms allow ground states of the aperiodic k^2-spin lattice to be computed in O(k 2^{2k}) time, which is subsequently improved to O(k^2 2^k), thus resembling transfer matrix approaches. Based on parallel matrix/vector multiplication, cost-optimal parallel algorithms for the message passing architecture are described, using collective or alternatively cyclic communications. In addition, a parallel realisation of the Harmony Search heuristic is described. The implementation of both exact and heuristic approaches using MPI is detailed, as is an application framework, which allows spin glass problems to be generated and solved.

Dynamic programming codes are evaluated on a small-scale AMD Opteron based SMP system and a large-scale IBM P575 based cluster, HPCx. On both systems, parallel efficiencies above 90% are obtained on 16 and 256 processors, respectively, when executing the O(k 2^{2k}) algorithm on problem sizes ≥ 14^2 spins. For the improved algorithm, while computationally less expensive, scalability is considerably diminished. Results for the parallel heuristic approach suggest marginal improvements in solution accuracy over serial Harmony Search, under certain conditions. However, the examined optimisation problem appears to be a challenge to obtaining near-optimum solutions, using this heuristic.


Acknowledgements

I sincerely thank my project supervisor, Dr. Adam Carter, for guidance throughout the project, and for commenting on this dissertation prior to submitting it.

In addition, I am grateful for funding awarded by the Engineering and Physical Sciences Research Council.


Table of Contents

1 Introduction
2 The Spin Glass
  2.1 Introduction to magnetic systems
  2.2 Modelling magnetic systems
    2.2.1 Spin interaction models
    2.2.2 Spin models
  2.3 The Ising spin glass
3 Computational Background
  3.1 Ising spin glass ground states and combinatorial optimisation
    3.1.1 Approximate approaches for determining ground states
    3.1.2 Exact methods for determining ground states
  3.2 A dynamic programming approach to spin glass ground states
    3.2.1 Markov chains
    3.2.2 Ising state behaviour as a Markov chain
    3.2.3 The ground state sequence
    3.2.4 Boundary conditions
    3.2.5 An order-n Markov approach to determining ground states
4 Parallelisation Strategies
  4.1 Harmony search
    4.1.1 Harmony search performance
    4.1.2 Existing approaches
    4.1.3 Proposed parallelisation scheme
  4.2 Dynamic programming approaches
    4.2.1 First-order Markov chain approach
    4.2.2 Higher-order Markov chain approach
5 The Project
  5.1 Project description
    5.1.1 Available resources
  5.2 Project preparation
    5.2.1 Initial investigations
    5.2.2 Design and implementation
    5.2.3 Implementation language and tools
    5.2.4 Choice of development model
    5.2.5 Project schedule
    5.2.6 Risk analysis
    5.2.7 Changes to project schedule
    5.2.8 Overview of project tasks
6 Software Implementation
  6.1 Introduction
  6.2 Implementation overview
  6.3 Source code structure
    6.3.1 Library functionality
    6.3.2 Client functionality
7 Performance Evaluation
  7.1 Serial performance
    7.1.1 Dynamic programming
    7.1.2 Harmony search
  7.2 Parallel performance
    7.2.1 Dynamic programming
    7.2.2 Harmony search
8 Conclusion
  8.1 Further work
    8.1.1 Algorithmic approaches
    8.1.2 Existing code
    8.1.3 Performance evaluation
  8.2 Project summary
  8.3 Conclusion
A Project Schedule
B UML Chart
C Markov Properties of Spin Lattice Decompositions
  C.1 First-order property of row-wise decomposition
  C.2 Higher-order property of unit spin decomposition
D The Viterbi Path
  D.1 Evaluating the Viterbi path in terms of system energy
E Software usage
F Source Code Listings


List of Figures

2.1 Types of spin interaction
2.2 Graphs of spin interactions
2.3 Frustrated systems
2.4 Subsystems and associated interaction energy
2.5 Clamping spins to determine interface energy
3.1 Computing total system energy from subsystem interactions
3.2 Example first-order Markov chain with states a, b, c
3.3 Illustrating the principle of optimality. Paths within the dashed circle are known to be optimal. Using this information, optimal paths for a larger subproblem can be computed.
3.4 Sliding a unit-spin window across a lattice
4.1 Using parallelism to improve heuristic performance
4.2 Conceptual illustration of harmony search behaviour within search space
4.3 Parallelisation strategies for population based heuristics
4.4 Harmony search parallelisation scheme
4.5 Graph of subproblem dependencies for an n = 3, m = 2 spin problem
4.6 Parallel matrix operations. Numerals indicate order of vector elements.
5.1 Spin glass structure design
5.2 Software framework design
6.1 Functions provided by spinglass.c
6.2 Schematic of operations performed by get_optimal_prestates() (basic dynamic programming, collective operations). In contrast, when using cyclic communications, processes evaluate different configurations of row i−1, shifting elements in minPath.
6.3 Sliding window for improved dynamic programming
6.4 Schematic of operations performed by get_optimal_prestates() (improved dynamic programming), executed on four processors. The problem instance is a 2×2 spin lattice.
7.1 Execution times for serial dynamic programming (basic algorithm)
7.2 Log execution times for serial dynamic programming (basic algorithm)
7.3 Execution times for serial dynamic programming (improved algorithm)
7.4 Log execution times for serial dynamic programming (improved algorithm)
7.5 Memory consumption for serial dynamic programming (basic algorithm)
7.6 Log memory consumption for serial dynamic programming (basic algorithm)
7.7 Memory consumption for serial dynamic programming (improved algorithm)
7.8 Log memory consumption for serial dynamic programming (improved algorithm)
7.9 Parallel execution time for dynamic programming (basic algorithm, Ness)
7.10 Parallel efficiency for dynamic programming (basic algorithm, Ness)
7.11 Vampir trace summary for dynamic programming (basic algorithm, Ness)
7.12 Parallel execution time for dynamic programming (basic algorithm, cyclic communications, Ness)
7.13 Parallel efficiency for dynamic programming (basic algorithm, cyclic communications, Ness)
7.14 Vampir trace summary for dynamic programming (basic algorithm, cyclic communications, Ness)
7.15 Parallel execution time for dynamic programming (improved algorithm, Ness)
7.16 Parallel efficiency for dynamic programming (improved algorithm, Ness)
7.17 Vampir trace summary for dynamic programming (improved algorithm, Ness)
7.18 Parallel execution time for dynamic programming (basic algorithm, HPCx)
7.19 Parallel efficiency for dynamic programming (basic algorithm, HPCx)
7.20 Parallel execution time for dynamic programming (basic algorithm, cyclic communications, HPCx)
7.21 Parallel efficiency for dynamic programming (basic algorithm, cyclic communications, HPCx)
7.22 Parallel execution time for dynamic programming (improved algorithm, HPCx)
7.23 Parallel efficiency for dynamic programming (improved algorithm, HPCx)
7.24 Summary of parallel efficiencies on HPCx
7.25 Conceptual representation of properties relevant to parallel performance
7.26 Parallel harmony search convergence durations (ZONEEXBLOCK = 100)
7.27 Parallel harmony search energetic aberrance (ZONEEXBLOCK = 100)
7.28 Parallel harmony search convergence durations (ZONEEXBLOCK = 1000)
7.29 Parallel harmony search energetic aberrance (ZONEEXBLOCK = 1000)
7.30 Parallel harmony search convergence durations (ZONEEXBLOCK = 10000)
7.31 Parallel harmony search energetic aberrance (ZONEEXBLOCK = 100000)
7.32 Parallel harmony search solution Hamming distance (ZONEEXBLOCK = 100)
7.33 Parallel harmony search solution Hamming distance (ZONEEXBLOCK = 1000)
7.34 Parallel harmony search solution Hamming distance (ZONEEXBLOCK = 10000)
A.1 Project schedule
B.1 UML class diagram of source code module and header relationships


List of Tables

5.1 Identified project risks
7.1 Mean error µ_e, standard error σ_e and error rate ε_e of serial harmony search ground states for increasing solution memory NVECTORS. Results are based on the ground truth value −30.7214. Error rate is defined as the number of correctly obtained ground state configurations over the total number of algorithm invocations.
7.2 Serial execution times for basic dynamic programming on Ness, for various GCC 4.0 optimisation flags
7.3 Serial execution times for basic dynamic programming on HPCx, for various xlc optimisation flags
7.4 Results for parallel basic dynamic programming on HPCx using 32 processors, for combinations of user space (US) or IP communications in conjunction with the bulkxfer directive


Chapter 1

Introduction

This dissertation describes aspects concerned with obtaining solutions to an optimisation problem, namely finding ground states of the Ising spin glass. Attention is given to parallel approaches, their implementation, and their performance.

The first half of this work is devoted to theoretical aspects: The Ising spin glass is a model relevant to statistical physics and other fields. In Chapter 2, the origins of this model are described. The relation is drawn between the project's physical background and the aforementioned optimisation problem. The Ising spin glass is but one possibility of modelling materials exhibiting glass-like properties; Chapter 2 also exposes its relation to more involved models. In Chapter 3, the theoretical background of optimisation is examined. Existing approaches are reviewed. The two approaches bearing significance to undertaken practical work are detailed, namely dynamic programming and the harmony search heuristic. Parallelisation strategies are described in Chapter 4, based on dynamic programming and harmony search.

Having examined theoretical aspects, practical aspects are then considered: Chapter 5 describes work relevant to project organisation. It includes a description of the project's objectives and identified risks. This chapter is relevant to practical work undertaken during the project. As a result of practical work, implemented software is described in Chapter 6. Software functionality is detailed, in addition to implemented libraries and the source code's structure. In Chapter 7, the implemented codes are evaluated. Experimental procedures are described, alongside parameters used for testing. Results are presented and interpreted. Finally, Chapter 8 concludes the work. The project's objectives are reviewed in relation to undertaken practical work. Also, possibilities for further work are explored.


Chapter 2

The Spin Glass

2.1 Introduction to magnetic systems

The phenomenon of magnetism is ubiquitously harnessed in modern technology; it crucially underpins many applications in areas such as automotive engineering, information processing and telecommunications. While known since antiquity, the scientific process has enabled an increasingly accurate understanding of magnetic phenomena. In current research, investigating the magnetic properties of physical systems remains of great interest in the field of condensed matter physics. One physical system, the spin glass, is the subject of such investigations. It forms the background of work undertaken during the course of this project.

Given a physical system, it is possible to characterise its magnetic properties by examining the relation between interactions occurring between internal subsystems, and the system's external magnetic moment. The system's external magnetic moment is a manifestation of these interactions. More generally, all externally observable magnetic properties are the result of individual subsystems' properties. This concept is applicable both to microscopic and macroscopic systems, for single or multiple subsystems: As an extreme case, one might consider a single electron a system, as it possesses an intrinsic magnetic moment. In contrast, the interactions within a three dimensional crystalline solid, for example, are considerably complex and motivate current investigations. This complexity is chiefly due to magnetic interactions at atomic scale.

At atomic level, the electron effects magnetism not only as a result of its intrinsic field, but also as a consequence of its orbital motion. The former is associated with a binary state, known as spin, which describes the particle's internal angular momentum. It is spin which determines the direction of the electron's intrinsic magnetic moment. In contrast, orbital motion contributes towards the particle's external angular momentum, since it describes the particle's movement about the nucleus. Atomic magnetic fields depend both on orbital configuration and spin alignment, where each electron contributes towards the atom's net magnetic moment.


In general, an electron's state is governed by quantum properties, which are subject to the Pauli exclusion principle [31]. This asserts that for any fermion, such as the electron, particles may not assume identical quantum state simultaneously. This has important consequences for the spin configuration of interacting electrons and therefore influences the magnetic properties of multiatomic systems.

The first implication of the exclusion principle is that for two electrons possessing identical orbital movement, spins must antialign to satisfy state uniqueness. Consequentially, the electrons' intrinsic magnetic moments antialign, causing net cancellation of these fields for the particle pair.

The second implication relates to minimising a system's energy: For interacting electrons with different orbital motion, the Pauli exclusion principle states that parallel spin alignment will be favoured, since it guarantees that orbital movement remains disjoint. Because of electrostatic repulsion, decreasing proximity between electrons lowers the system's energy. It is this relation which allows certain materials to retain a magnetic field, the result of a surplus of aligned spins as opposed to a disordered spin configuration, in a favourable energetic state.

It turns out that the difficulty in determining a system's magnetic properties stems from the complexity of spin interactions: The structure of a specified material may be irregular, resulting in differing ranges between electron orbitals. The type of atomic bonds and electron configurations present in the material is also influential, since these influence the orbital energy of electrons. It was previously mentioned that a system's energy is sought to be minimised. This energy depends on the proximity in which interactions occur and hence behaves characteristically for the examined system.

The energy associated with spin interaction is expressed exactly in the so-called exchange energy, first formulated by Heisenberg [38] and Dirac [20]. Based on consequences of the Pauli exclusion principle for the wavefunction of a system consisting of multiple fermions, the system wavefunction is defined for combinations of aligned and antialigned spins. These wavefunctions are then used to compute the exchange energy

J = 2 \int \Psi_1^*(r_1) \, \Psi_2^*(r_2) \, V_I(r_1, r_2) \, \Psi_2(r_1) \, \Psi_1(r_2) \, dr_1 \, dr_2

where Ψ_1, Ψ_2 are wavefunctions of interacting particles with locations r_1, r_2 on the real line and V_I is the interaction energy.

Using eigenanalysis*, it is furthermore possible to express the contribution towards the system's Hamiltonian arising from spin interaction, which depends on J and the spin operands s_1, s_2 for a pair of spins:

-J \, (s_1 \cdot s_2)    (2.1)

* An explanation is given by Griffiths [31].


Figure 2.1: Types of spin interaction. (a) ferromagnetic; (b) antiferromagnetic.

This object is of fundamental importance for describing the interaction energy of large systems, since these may be described in terms of their underlying interacting subsystems. It is employed in simplified form in models such as the Ising model [47] used in this project. The interaction variable J is commonly known as the coupling constant. Although it assumes a positive real value for spin interactions where parallel alignment is favoured, it is important to note that antiparallel alignment is also favoured in many materials. Bearing this in mind, positive J are associated with ferromagnetic coupling, whilst negative J are associated with antiferromagnetic coupling. Figures 2.1(a), 2.1(b) illustrate these interactions.

2.2 Modelling magnetic systems

As currently described, the simplest type of magnetic interaction is expressed by defining two fundamental operands and an associated coupling constant. Together with the coupling constant, these fundamental operands are evaluated using an interaction operator. The operands are commonly spins, whose state may be described using either a unit vector or an integer, for example.

2.2.1 Spin interaction models

Because spin coupling is a symmetric relation, it is possible to describe interactions occurring amongst multiple spins by considering the set E ⊆ { {s_i, s_j} | s_i, s_j ∈ S, i ≠ j } of pairwise bonds amongst spins in a spin set S, given the weight function w : E → R. This corresponds to an undirected weighted graph. In the graph, the absence of an edge between two spins s_k, s_l is equivalent to the zero coupled edge w({s_k, s_l}) = 0. An example of such a graph is shown in Figure 2.2(a). Given this general case of an undirected graph, there are three specialisations which have been used extensively to investigate the properties of magnetic systems consisting of many spins.


In terms of spin interactions, a comparatively involved model is the so-called Axial Next Nearest Neighbour Interaction (ANNNI) model. Here, spins are arranged conceptually as a lattice in Euclidean n-space, with bond edges defined between neighbouring spins along each dimension. In addition to these bonds, interactions for each spin are extended in a 'next spin but one' fashion along each dimension. That is, interactions are defined by conducting a walk of length l ≤ 2 along the lattice in each dimension, given an initial node. A spin therefore interacts with n ≤ 4d partner spins, as displayed in Figure 2.2(b). This model has been employed extensively in research [57, 17, 56].

Figure 2.2: Graphs of spin interactions. (a) general undirected case; (b) ANNNI model; (c) EA model.

If the ANNNI model is modified by extending the length of the walk to infinity in arbitrary direction, the graph defined by spin interactions E becomes fully connected: E = { {s_i, s_j} | s_i, s_j ∈ S, i ≠ j }. This realisation of lattice interactions is known as the Sherrington-Kirkpatrick model [58], whose Hamiltonian is equal to

H = -\sum_{(i,j)} J_{ij} \, s_i \cdot s_j .

Here, the notation \sum_{(i,j)} indicates the sum over all spin interactions, as described. The Sherrington-Kirkpatrick model is employed by Parisi [54] for the purpose of exploring transition properties of magnetisation, using an approach known as mean field theory.

Given that spin interactions occur over short range, an elementary approach to representing a system considers only nearest neighbour interactions between spins. In a two dimensional lattice model, the graph of spin interactions is then defined as E = { {s_i, s_j} | s_i, s_j ∈ S, d(s_i, s_j) = 1 }, where d(s_i, s_j) is the block distance between spins s_i, s_j. This is illustrated in Figure 2.2(c). The Hamiltonian of such a system is

H = -\sum_{\langle i,j \rangle} J_{ij} \, s_i \cdot s_j

where the notation \sum_{\langle i,j \rangle} indicates the sum over nearest neighbour spin interactions. Due to Edwards and Anderson [22], this model is the subject of work undertaken during the course of this project.

Bonds

The exchange energy between two spins is governed by the magnitude of the coupling constant J. When dealing with multiple interactions, these bond strengths are often selected from a probability distribution.


This distribution is a continuous uniform or Gaussian distribution for many modelling purposes [58, 60, 52]. When dealing with the Sherrington-Kirkpatrick model, the exchange energy distribution often includes the property of exponential decay over spin distance [12]. Another commonly used distribution [36, 21] permits only coupling constants J ∈ {1, −1}, such that both values are equally probable.

Other distributions have also been employed for defining coupling constants, such as the twin peaked Gaussian [15]. Ermilov et al. [23] provide an investigation of the implications for interactions with arbitrarily distributed bonds. In this project, the equally distributed variant of spin coupling is considered.

2.2.2 Spin models

As with the approaches to modelling spin interaction, the spin object itself may be modelled to varying levels of complexity. Most realistically, in a quantum Heisenberg model, each spin is described by its quantum state in three dimensions, so that the Hamiltonian for a two dimensional Edwards-Anderson model becomes†

H = -\frac{1}{2} \sum_{\langle i,j \rangle} \left( J_x \sigma_i^x \sigma_j^x + J_y \sigma_i^y \sigma_j^y + J_z \sigma_i^z \sigma_j^z \right)

where σ_k^x, σ_k^y, σ_k^z are Pauli matrices corresponding to spin s_k.

† cf. Baxter [8]

Alternatively, a classical Heisenberg formulation is also possible, as employed by Ding [19] and Kawamura [42]: Here, spins are represented as three-dimensional real-valued unit vectors, so that exchange energy between spins s_i, s_j is calculated by means of the inner vector product, as described in Equation 2.1. A simplification achieved by discretising spin state exists in the so-called Potts model [63]. Here, a spin may assume a state s_i ∈ {1, . . . , k} where k is the total number of states. The Hamiltonian of a system of spins with nearest-neighbour interaction is expressed as

H = -\sum_{\langle i,j \rangle} J_{ij} \cos\left( \theta(s_i) - \theta(s_j) \right)

with θ(s_i) = 2πs_i / k.

The Potts model may be simplified further, achieved when considering the case of the model when k = 2: Define the Potts correlation function γ(s_i, s_j) = cos(θ(s_i) − θ(s_j)). Given that θ : {1, 2} → {π, 2π}, the mapping

\gamma'(s_i, s_j) = \begin{cases} 1, & s_i = s_j \\ -1, & s_i \neq s_j \end{cases}

is a sufficient definition for the correlation function in the described case. Alternatively, γ(s_i, s_j) = s_i s_j, with s_i, s_j ∈ {1, −1}.


This leads to the definition of system energy as

H = -\sum_{\langle i,j \rangle} J_{ij} \, s_i s_j ,

with s_i, s_j ∈ {−1, 1}.

When combined with nearest neighbour interactions and constant J, this archetypal model of spin interaction is known as the Ising model [11]. As formulated, in the Ising model, a spin's state effects an exchange energy, whose sign is inverted if the spin's neighbour assumes the opposing state. In this respect, the model's spin object is an abstraction of electron state which discards the consequences of orbital movement, considering only intrinsic angular momentum.

While comparatively restrictive, an adaptation of the Ising model has been the subject of intense research in its originating field of statistical physics [8]. In addition to certain applications in investigating the behaviour of neural networks [4] and biological evolution [49], this model has proven popular in examining the properties of materials in the field of condensed matter physics [26]. One application involves investigating the properties of materials collectively known as spin glasses. These possess distinctive properties, which are described in the following.
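Before turning to spin glasses, the energy function just defined can be made concrete. The following C sketch evaluates H = −∑_{⟨i,j⟩} J_ij s_i s_j for a two dimensional n × m lattice with free boundaries; it is a minimal illustration rather than the project's implementation, and the row-major spin layout with separate horizontal and vertical bond arrays is an assumption made here.

    /* Energy of an n x m Ising lattice with free boundaries:
     * H = -sum over nearest-neighbour pairs of J_ij * s_i * s_j.
     * spin: n*m entries in {-1, +1}, row-major order.
     * Jh:   couplings between (i,j) and (i,j+1), n*(m-1) entries.
     * Jv:   couplings between (i,j) and (i+1,j), (n-1)*m entries. */
    double ising_energy(int n, int m, const int *spin,
                        const double *Jh, const double *Jv)
    {
        double H = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m - 1; j++)     /* horizontal bonds */
                H -= Jh[i*(m-1) + j] * spin[i*m + j] * spin[i*m + j + 1];
        }
        for (int i = 0; i < n - 1; i++) {
            for (int j = 0; j < m; j++)         /* vertical bonds */
                H -= Jv[i*m + j] * spin[i*m + j] * spin[(i+1)*m + j];
        }
        return H;
    }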


2.3 The Ising spin glass

Spin glasses are substances which are characterised by structural disorder. This is the case for chemical glasses or certain types of dilute metal alloys. These materials possess highly irregular microscopic structure, which has implications for magnetic interactions between ions. In particular, disorder results in a distribution of ferromagnetic and antiferromagnetic interactions, which are the origin of the phenomenon known as frustration.

The dynamics of spin glasses are such that there exists a critical phase transition temperature, above which the system behaves like a conventional paramagnet or ferromagnet. Below the transition temperature however, a magnetic disorder manifests itself, called the spin glass phase. This magnetic disorder is responsible for the system's unique behaviour.

Frustration, the second component to characteristic behaviour, arises when a system's energetically optimal state is the result of combined interactions which cannot individually assume optimum state. Instead, the global optimum requires certain interactions to be suboptimal. Depending on the constituent interactions, this may imply that there exist multiple state configurations which yield the energetic optimum.

An example of this principle is shown in Figure 2.3(a). Here, three Ising spins s_0, s_1, s_2 ∈ {1, −1} interact in a triangular lattice. Because bonds are not consistently ferromagnetic, it is apparent that some interactions require spins with opposing orientation, to be optimal. This is the case for the antiferromagnetic bond between spins s_1, s_2. For either optimal configuration of the spin pair it is not possible, however, to set s_0 so that optimality of the remaining interactions is satisfied. Similarly, when evaluating the system commencing with pairs s_0, s_1 or s_0, s_2, it is not possible to set the remaining spin so that all interactions are satisfied. It follows that there exists no configuration of this system in which all interactions are optimal.

Figure 2.3: Frustrated systems. (a) three spins; (b) four spin 'plaquette'.

In the n-dimensional lattice Ising realisation of a spin glass, the smallest structure capable of exhibiting frustration is shown in Figure 2.3(b). Considering all 2^4 combinations of positive and negative coupling constants, it can be seen that frustrated interactions occur for odd numbers of antiferromagnetic or ferromagnetic bonds. For larger systems, it is possible to analyse frustration by decomposing the lattice into subsystems of this kind. In this context, the square substructure is termed a plaquette.


Uses of the Ising spin glass

The extent to which the Ising model departs from a realistic representation of magnetic phenomena was previously described. Although the model's accuracy presents a disadvantage, its comparative simplicity lends itself to certain analytical advantages: These advantages are based on the fact that the 'state space' of a single spin is small, which has consequences for evaluating sets of spin systems. Also, since spins interact only over nearest neighbour boundaries, it is trivial to 'decompose' a system into its constituent subsystems, should this be required. Using such a scheme, total exchange energy is the sum of internal subsystem energy and subsystem interaction energy (Figure 2.4). This approach is employed in analytical methods described in following chapters.

Figure 2.4: Subsystems and associated interaction energy

For experimental purposes, it is of interest to examine computationally the behaviour of various realisations of spin glasses. As spin glasses are thermodynamic systems, knowledge of ground state energy is of particular importance towards this aim. Formally, given an n-spin system where S = {s_0, s_1, . . . , s_{n−1}} represents some configuration of these spins,

\operatorname*{argmin}_{S} H(S)

is the system's ground state. The Hamiltonian H(S) describes the energy of system configuration S. In the case of the Ising model with real valued coupling constants, there exists a single ground state configuration, and an equivalent configuration with all spins inverted. For systems with discrete valued coupling constants, a number of degenerate ground states may exist. Provided an algorithm for determining ground states, it may be of interest to examine the effect of system size on ground state energy.

Previous work investigates scaling with regard to a related quantity, the so-called interface energy [15]. For an Ising-like model, interface energy is the absolute difference between ground state energies, obtained when altering the model instance's spin configuration with respect to a certain boundary condition (coupling constants are left unaltered). Figure 2.5 shows an example, again using the two dimensional lattice Ising model. Here, ground state configurations are obtained for two experimental instances: In the first instance, the entire set of spin configurations is considered. In the second instance, spins in the rightmost column are 'clamped': Their state is equal to that of the previously obtained configuration, only inverted. Enforcing this condition in the second instance allows the behaviour of adjacent spins to be examined.

Figure 2.5: Clamping spins to determine interface energy


A closely related aspect deals with exploring the behaviour of spin glass properties in the limit N → ∞, where N is the system size. For certain purposes, it is beneficial to approximate this condition by introducing periodicity into spin interactions. In the Ising model, pairs of boundary spins along dimensions with periodic boundary conditions interact in the manner illustrated in Figure 2.2(b). This can easily be expressed mathematically by applying modular arithmetic to the one dimensional Ising case H = −∑_i J_i s_i s_{i+1}, requiring minor modification for models with d > 1.

In thermodynamic systems, attention must be given to the relation between macroscopic and microscopic properties. To this extent, an important object is the partition function, defined as

Z(T) = \sum_{S} e^{-H(S)/kT} ,

where H(S) is the system energy, T the absolute temperature and k the Boltzmann constant. The sum is over all (microscopic) system configurations S. Using the partition function, it is possible to determine the probability P(S) of a specific state as

P(S) = \frac{e^{-H(S)/kT}}{Z(T)} .

Fortunately, when examining an ensemble at T = 0 K, it turns out that P(S) = 1 iff S is a ground state configuration, otherwise P(S) = 0. This fact has implications for computing ground state energies of Ising spin glasses, the subject of this project.
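For small systems, both definitions can be demonstrated by direct enumeration. The sketch below is illustrative only: energy() stands for any Hamiltonian over n Ising spins, and Boltzmann's constant is absorbed into T (i.e. k = 1). Evaluating it at decreasing T shows the probability mass concentrating on the ground state configuration(s).

    /* Boltzmann probability P(s0) = exp(-H(s0)/T) / Z(T) for an n-spin
     * system, with Z(T) obtained by enumerating all 2^n configurations.
     * Illustrative sketch for small n (n < 32); k is absorbed into T. */
    #include <math.h>

    double boltzmann_prob(int n, double (*energy)(const int *spin, int n),
                          const int *s0, double T)
    {
        double Z = 0.0;
        int spin[32];
        for (unsigned long c = 0; c < (1UL << n); c++) {
            for (int i = 0; i < n; i++)      /* decode bit i into a spin */
                spin[i] = (c >> i) & 1 ? 1 : -1;
            Z += exp(-energy(spin, n) / T);
        }
        return exp(-energy(s0, n) / T) / Z;
    }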


Chapter 3

Computational Background

In the previous chapter, the Ising model was introduced. System energy was described as a type of utility function for evaluating system configurations. The problem of obtaining ground state energy was introduced.

In this chapter, finding ground states of the Ising spin glass is approached as a combinatorial optimisation problem. In this context, existing solutions are examined, in addition to describing two approaches implemented in this project, harmony search and dynamic programming. The latter approach is the consequence of describing spin glass interactions as a Markov chain, which lends itself to a formulation of the most likely sequence of events in the chain, i.e. the Viterbi path [61].

3.1 Ising spin glass ground states and combinatorial optimisation

Formally, any instance of the Ising spin glass defines the energy function E(S) with E : {1, −1}^n → R. Here, S = (s_1, s_2, . . . , s_n) is an n-spin configuration, with each spin s_i ∈ {1, −1}. For convenience, a notation for describing a configuration partitioned into p disjoint subsystems is introduced as S = {S_1, S_2, . . . , S_p}. The real valued co-domain of E(S) corresponds to the total system energy. The total system energy of a partitioned system is

E(S) = \sum_{k=1}^{p} E(S_k) + \sum_{\langle i,j \rangle} J_{ij} \, s_i s_j , \quad s_i \in S_\alpha , \; s_j \in S_\beta ,

where ⟨i, j⟩ denotes nearest neighbour Ising interactions, as described in Chapter 2, and the subsystems S_α, S_β are disjoint. By decomposing spin interactions occurring within the entire system, energy is expressed as the sum of subsystem energy and 'system boundary' energy.


Figure 3.1: Computing total system energy from subsystem interactions

Defining E_b(S_i, S_j) as the system boundary energy between disjoint subsystems S_i, S_j,

E_b(S_i, S_j) = \sum_{\langle q,r \rangle} J_{qr} \, s_q s_r , \quad s_q \in S_i , \; s_r \in S_j ,

the total system energy can be defined as

E(S) = \sum_{k=1}^{p} E(S_k) + \sum_{\langle i,j \rangle} E_b(S_i, S_j)

where ⟨i, j⟩ denotes nearest neighbour interactions between subsystems, in analogy to nearest neighbour interactions between individual spins. An example of system decomposition is presented in Figure 3.1, for a system with cyclic boundary interactions. Decomposition is relevant to approaches described in this chapter.

Determining ground states

The ground state configuration of an Ising spin glass is defined as S_min = argmin_S E(S). The domain of the evaluated function E(S) implies that an exhaustive search of the system's state space requires 2^{|S|} individual evaluations. Such a brute force approach might be implemented using a depth-first traversal of the state space.

Clearly, using this method is only practicable for the very smallest of problem instances, as the search space grows exponentially with the number of spins in the system. Therefore, it is of interest to examine the possibility of restricting the search space, consequently reducing the complexity of obtaining solutions to problem instances.
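For reference, the brute force approach can be written in a few lines. The sketch below assumes a generic energy function rather than the project's own code, and makes the O(2^n) cost explicit.

    /* Exhaustive ground state search: evaluates all 2^n spin
     * configurations (n < 32) and returns the minimum energy found,
     * storing the minimising configuration in best[]. */
    #include <float.h>

    double ground_state_brute(int n,
                              double (*energy)(const int *spin, int n),
                              int *best)
    {
        double Emin = DBL_MAX;
        int spin[32];
        for (unsigned long c = 0; c < (1UL << n); c++) {
            for (int i = 0; i < n; i++)
                spin[i] = (c >> i) & 1 ? 1 : -1;
            double E = energy(spin, n);
            if (E < Emin) {                    /* record new optimum */
                Emin = E;
                for (int i = 0; i < n; i++) best[i] = spin[i];
            }
        }
        return Emin;
    }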


The fact that the upper bound of search space size grows exponentially suggests that the ground state problem belongs to the class of NP problems. Due to Barahona [6], it is shown that in fact, certain cases of the problem are NP-complete, such as the two dimensional lattice where every spin interacts with an external magnetic field, and the three dimensional lattice model. Istrail generalises the proof of NP-completeness to any model where interactions are represented as a non-planar graph [16].

Fortunately, NP-completeness does not extend to planar instances of the Ising model; a polynomial time bound is shown by Barahona for the two dimensional, finite sized model. This fact implies that obtaining ground states is not intractable for this case of the model, and motivates the development of efficient algorithms which obtain exact solutions. The latter are defined as solutions equivalent to those generated from an exhaustive search.

3.1.1 Approximate approaches for determining ground states

Regardless of NP-completeness, formulation of the ground state problem as a combinatorial optimisation problem allows a second approach to be considered, involving the class of metaheuristic algorithms. Although these algorithms are typically only guaranteed to search exhaustively as time goes towards infinity, many have been shown to produce optimal or near-optimal solutions to a wide number of problems, provided sufficient execution time. It is therefore of immediate interest to investigate the performance of these algorithms, in the context of the Ising spin glass.

By common definition, a metaheuristic is a heuristic applicable to solving a broad class of problems [28]. In practice, this is achieved by defining a set of 'black-box' procedures, i.e. routines specific to the problem. When dealing with combinatorial optimisation problems, these routines typically include a utility function, whose purpose it is to evaluate candidate solutions selected from the state space. Utility is then used to compare solutions amongst one another.

To be of practical use for problems with large state spaces, a heuristic must arrive at a solution by considering some subset of this space its search space. The metaheuristic approach often achieves this by random sampling [28], which may cause the algorithm to produce suboptimal results. To apply a metaheuristic effectively, it may therefore be necessary to evaluate performance against different combinations of algorithm parameters. Generating sufficient numbers of samples may motivate parallel algorithmic approaches. Also, although it has been shown that the performance of optimisation algorithms remains constant over the class of all optimisation problems [62], there may be significant performance differences between algorithms when applied to a specific problem class. It is hence of interest to examine diverse metaheuristic approaches in conjunction with the described optimisation problem.


Evolutionary algorithms

One class of metaheuristic is inspired by biological evolution. Here, a population of candidate solutions is created and subsequently evolved in an iterative process, where individual 'parent' solutions are selected stochastically in order to generate 'offspring' solutions. The process of selection is designed to favour solutions which exhibit high 'fitness', the latter evaluated using a utility function. In a further biological analogy, offspring are generated by combining solution parameters from both parents, prior to randomised modification (mutation). These new solutions are then added to the population, which is typically maintained in order to stay in equilibrium. The process is then repeated, terminating either on completing a specified number of iterations, or when a convergence criterion is fulfilled.

Evolutionary algorithmic approaches applicable to combinatorial optimisation are known as genetic algorithms [50]. The approach here involves representing a solution by the set of parameters supplied to the target function as a string. After evaluating solution fitness as previously described, crossover is typically realised as a manipulation of substrings: For example, one might generate offspring as a combination of permuted substrings from parent strings. Correspondingly, mutation might be realised as a permutation of substring elements from a single solution. It is evident that the multitude of possibilities in which selection, crossover and mutation may be implemented has the potential to cause deviations in the optimisation process' performance.

Genetic algorithms have been applied to the spin glass ground state problem by Gropengiesser [32], who considers two variants of the basic evolution procedure. In the first, the population is initialised to multiple instances of a single solution, to which mutation is then applied iteratively. Using a local search heuristic, mutations conducive to lowering the system energy are accepted. In the second variant, the former regime is augmented with random parent selection and crossover, such that every child solution replaces one of its parents. Results show that performance is affected strongly by the method of admitting new candidate solutions to the population, following mutation.

As one might expect, approaches incorporating local minimisation techniques have been shown to improve optimisation performance, as implemented by Hempel et al. [40], using a so-called hybrid genetic algorithm. This is in comparison to an early investigation by Sutton [59], using a general evolutionary approach. Houdayer and Martin [41] report good performance for the Ising model with discrete ±J bond distribution, using a Genetic Renormalisation algorithm. Here, domain specific knowledge is incorporated into the optimisation process by recursively partitioning the graph of spin interactions, in resemblance to the description at the beginning of this chapter. A local optimisation process is then applied to the partitioned system.


Given the nature of the project, of special interest are methods of parallelising genetic algorithms. In the general context of evolutionary computing, Cantu-Paz [14] describes a coarse grained approach known as the 'island' method. In the distributed memory paradigm, processes are arranged in a toroidal grid, each executing the algorithm in parallel. After each iteration, a subpopulation of local solutions is selected based on fitness, and exported to neighbouring processes asynchronously. As an alternative, a fine grained scheme may also be used, where crossover is allowed to take place between solutions residing at different processes.

Simulated annealing

Simulated annealing is a technique readily applicable to calculating ground states, as it is based on the principles in statistical physics which underpin the Ising model. The technique is derived from the Metropolis-Hastings algorithm [37], in which a probability distribution is sampled indirectly by means of a first-order Markov chain. That is, the distribution of a generated sample is sufficiently defined by the value of its predecessor. In simulated annealing, a candidate solution S in the state space is associated with the probability

P(S) \propto e^{-H(S)/(kT)} ,

the state probability of a canonical ensemble, which was introduced in Chapter 2.

Optimisation is performed by initialising a random solution configuration and sampling proximate configurations in the state space by stochastic parameter modification: Specifically for the Ising model, this would involve perturbing spins by inverting their state. The new configuration is accepted if the perturbation resulted in lower system energy; otherwise the state is accepted with probability e^{−∆H/(kT)}, where ∆H is the change in system energy. Of importance is the value of temperature T, which is initialised to a certain value and decreased monotonically towards zero according to a specific annealing schedule, as the algorithm progresses.

In Chapter 2, it was mentioned that as T approaches zero, P(S) = 1 iff S is a ground state. A consequence of this fact for the optimisation process is that if T is initialised to a finite temperature and decreased sufficiently slowly, the algorithm is guaranteed to arrive at the system's globally optimal state [51]. In practice, execution time is restricted to a fraction of that required for an exhaustive search, so that the annealing process becomes an approximation.

Simulated annealing was first applied to the spin glass problem by Kirkpatrick, Gelatt and Vecchi [44]. It is important to note that the choice of annealing schedule significantly affects the algorithm's ability to arrive at an optimal solution. This is because temperature influences the amount of selectivity involved as the state space is explored. Conversely, it follows that the solution landscape particular to a problem usually affects the accuracy of solutions obtained by the algorithm using a particular schedule.
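A minimal sketch of simulated annealing for the Ising case follows. The single-spin-flip proposal, the geometric cooling schedule and the helper delta_H() are assumptions made for illustration; they are not prescribed by the algorithm, and the schedule in particular would need tuning, as discussed above. Boltzmann's constant is again absorbed into T.

    /* Simulated annealing sketch for an n-spin Ising system.
     * delta_H(spin, n, i) must return the energy change caused by
     * flipping spin i; cooling is geometric: T <- alpha * T. */
    #include <stdlib.h>
    #include <math.h>

    void anneal(int *spin, int n,
                double (*delta_H)(const int *spin, int n, int i),
                double T0, double alpha, int steps)
    {
        double T = T0;
        for (int t = 0; t < steps; t++) {
            int i = rand() % n;                /* propose a random flip */
            double dH = delta_H(spin, n, i);
            /* Metropolis criterion: always accept downhill moves,
             * accept uphill moves with probability exp(-dH/T). */
            if (dH <= 0.0 || (double)rand() / RAND_MAX < exp(-dH / T))
                spin[i] = -spin[i];
            T *= alpha;                        /* annealing schedule */
        }
    }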


Ram et al. describe an approach to parallelising the algorithm [55]. Clustering simulated annealing is based on the observation that a good initial solution typically reduces the number of iterations required for the algorithm to converge. After executing the algorithm on multiple processing elements with different initial states, an exchange of partial results takes place to determine the most favourable solution. This result is then redistributed to all processing elements, in order to repeat the process for a set number of iterations, after which the final solution is determined.

Harmony search

A recently developed optimisation algorithm is due to Geem [27]. Known as harmony search, this algorithm has been applied to a number of optimisation problems such as structural design [45] and data mining [25]. Harmony search can be considered an evolutionary algorithm, as it maintains a population of candidate solutions, which compete with one another for permanency and influence generation of successive candidates.

Inspired by the improvisational process exhibited by musicians playing in an ensemble, harmony search iteratively evolves new solutions as a composite of existing solutions. As with genetic algorithms, a utility function determines whether a newly generated solution is included in the candidate set. In addition to devising a probabilistic scheme for combining parameters from existing solutions, new solutions are modified according to a certain probability. This is designed to improve exploration of the state space, similar to genetic mutation.

Formally, the algorithm defines an ordered set σ = (σ^1, σ^2, . . . , σ^m) of m candidate solutions, where each candidate is an n-tuple σ^k = (σ^k_1, σ^k_2, . . . , σ^k_n). Algorithm parameters are the memory selection rate P_mem, the so-called pitch adjustment rate P_adj and the distance bandwidth β ∈ R. Random variables X ∈ {1, 2, . . . , m} and Y ∈ [0, 1) are also defined. Using a termination criterion such as the number of completed iterations, the algorithm performs the following steps on the set of initially random candidates:

• Generate: σ^ν = (τ(1), τ(2), . . . , τ(n)), where

\tau(i) = \begin{cases} \sigma^X_i , & Y \le P_{mem} \\ \text{random parameter value} , & Y > P_{mem} \end{cases}

• Update: For 1 ≤ i ≤ n, σ^ν_i ← σ^ν_i + β iff Y ≤ P_adj.

• Replace:
  – w ← argmax_w {σ^w}
  – σ^ν ← min{σ^w, σ^ν}
  – σ^w ← σ^ν

In the first step, the algorithm generates a new candidate, whose parameters are selected at random both from existing solutions in the population and from a probability distribution. In a further stochastic procedure using random variable Y, solution parameters are modified. This step is of particular significance for continuous optimisation problems; it may be preferable to omit it in other cases.


Finally, the population is updated by replacing its worst solution, if the generated candidate is of higher utility. The process is then repeated, using the updated population.

An application of harmony search to the discrete Ising ground state problem is trivial, by assigning each solution the ordered set of spins defined at the beginning of this chapter, i.e. σ^k = (s_1, s_2, . . . , s_n). Because the set of solution parameter values is discrete and small, the effect of modifying solutions due to distance bandwidth β can be consolidated into the algorithm's 'generation' step. The process thus consists solely of generating and conditionally replacing existing solutions in memory, governed by parameters m (the candidate population size) and P_mem (the memory selection rate). Work undertaken for this project examines the performance of this algorithm for finding Ising spin glass ground states.
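A compact sketch of this Ising specialisation is given below. It is illustrative rather than the project's implementation: the population is held as an array of spin vectors with cached energies, the pitch adjustment step is omitted as discussed, and harmony_step() is a hypothetical name.

    /* One iteration of harmony search specialised to Ising spins.
     * pop: m candidate spin vectors of length n; E[k] caches the energy
     * of pop[k]. With probability Pmem a parameter is copied from a
     * random candidate; otherwise it is drawn uniformly from {-1, +1}. */
    #include <stdlib.h>

    void harmony_step(int **pop, double *E, int m, int n, double Pmem,
                      double (*energy)(const int *spin, int n), int *trial)
    {
        for (int i = 0; i < n; i++) {
            if ((double)rand() / RAND_MAX <= Pmem)
                trial[i] = pop[rand() % m][i];      /* memory selection */
            else
                trial[i] = (rand() & 1) ? 1 : -1;   /* random value */
        }
        double Et = energy(trial, n);
        int w = 0;                          /* index of worst candidate */
        for (int k = 1; k < m; k++)
            if (E[k] > E[w]) w = k;
        if (Et < E[w]) {                    /* conditional replacement */
            for (int i = 0; i < n; i++) pop[w][i] = trial[i];
            E[w] = Et;
        }
    }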


3.1.2 Exact methods for determining ground states

Graph theoretic methods

Returning to the spin glass as an exactly solvable model, it is necessary to examine the graph representation of spin interactions more closely. An undirected graph G = (V, E) is described by a set of vertices V = {v_1, v_2, . . . , v_n} and edges E ⊆ {{v_i, v_j} | v_i, v_j ∈ V}. Given an Ising spin glass model S = {s_1, s_2, . . . , s_n}, let S = V and E = {{s_i, s_j} | J_ij > 0}, where J_ij is the bond strength between spins s_i, s_j. The set of vertices is partitioned into subsets S^+, S^− such that S^+ = {s_i | s_i = 1}, S^− = {s_i | s_i = −1}.

Grötschel et al. [29] provide a description of a method which is the basis of algorithms developed by Barahona et al. [7]. Here, the system's Hamiltonian is described in terms of S^+ and S^− as

H(S) = -\sum_{\langle i,j \rangle \in E(S^+)} J_{ij} s_i s_j - \sum_{\langle i,j \rangle \in E(S^-)} J_{ij} s_i s_j - \sum_{\langle i,j \rangle \in \delta(S^+)} J_{ij} s_i s_j

where E(T) = {{s_i, s_j} | s_i, s_j ∈ T} and δ(T) = {{s_i, s_j} | s_i ∈ T, s_j ∈ S \ T}. Considering the effect of opposing spin interactions, the Hamiltonian can be rewritten as

H(S) = -\sum_{\langle i,j \rangle \in E(S^+)} J_{ij} - \sum_{\langle i,j \rangle \in E(S^-)} J_{ij} + \sum_{\langle i,j \rangle \in \delta(S^+)} J_{ij} ,

from which it follows

H(S) + \sum_{\langle i,j \rangle \in S} J_{ij} = 2 \sum_{\langle i,j \rangle \in \delta(S^+)} J_{ij} .

The ground state energy can now be formulated in terms of the function δ as

H_{min} = \min_{S^+ \subseteq S} \left\{ 2 \sum_{\langle i,j \rangle \in \delta(S^+)} J_{ij} - \sum_{\langle i,j \rangle \in S} J_{ij} \right\} .

Because the co-domain of δ consists of edges which define a cut of the graph of spin interactions, i.e. a partition of nodes into two disjoint sets, obtaining ground states is now described in graph theoretical terms as a cut optimisation: As formulated, ground state energy is expressed as the minimum cut of a weighted graph. Equivalently, the problem can be formulated as a maximisation, if the signs of interaction energies are inverted.

Hadlock [34] shows further that finding a maximum cut of a planar graph is equivalent to determining a maximum weighted matching of a graph, for which there exist polynomial time algorithms. Bieche et al. [10] and Barahona [6] follow this approach, where a graph is constructed based on interactions between spin plaquettes. A recent similar approach due to Pardella and Liers [53] allows very large systems to be solved exactly.

De Simone et al. employ a method known as 'branch-and-cut'. Here, the cut optimisation problem is initially expressed as an integer programming problem. In integer programming, the objective is to determine max{u^T x | Ax ≤ b}, where the components of vector x ∈ Z^n are determined subject to constraints defined by vector b and matrix A. During execution, branch-and-cut specifically employs the linear relaxation of the programming problem, where it is permitted that x ∈ R^n. This relaxation is combined with the branch and bound algorithm, which is invoked when a non-integral solution of x is determined. Substituting the non-integral component with integers, the problem is divided using a further algorithm, which recursively generates a tree of subproblems. By maintaining bounds on solution utility, it is possible to identify partial solutions which are guaranteed to be suboptimal. Since these are not required to be subdivided further, the search tree is pruned. Liers et al. [46] describe the branch-and-cut algorithm in detail, which permits tractable computation of spin glass models consisting of 50^2 spins without periodic boundaries.

Transfer matrix

A technique applicable to various problems in statistical mechanics is the transfer matrix method [8]. The requirement is as described at the beginning of this chapter, where a system is described in terms of adjacently interacting subsystems. Using the definition of system state probability, a matrix describing interactions is defined as A = [p_ij] where p_ij = P(S^i_{k+1} | S^j_k), given subsystems S_{k+1}, S_k assuming states S^i_{k+1} ∈ 2^{S_{k+1}}, S^j_k ∈ 2^{S_k}. Conditional independence from other subsystems is assumed, i.e. P(S^i_{k+1} | S^j_k) = P(S^i_{k+1} | S^j_k, S_1, S_2, . . . , S_p). Here, the notation 2^S denotes the set of all spin configurations of system S.

By implications of conditional state probability, given an initial subsystem it is possible to evaluate the state of successive subsystems via a series of matrix multiplications. Problems such as determining the partition function can be solved using eigenanalysis, an example of which is given in [15]. The transfer matrix approach due to Onsager allows the partition function of the two-dimensional Ising model to be formulated [39].
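As a standard textbook illustration of the technique (the one dimensional chain with constant coupling J, zero field and periodic boundaries, rather than anything developed in this dissertation), the transfer matrix and the resulting partition function for an N-spin chain are

A = \begin{pmatrix} e^{\beta J} & e^{-\beta J} \\ e^{-\beta J} & e^{\beta J} \end{pmatrix} , \qquad Z = \operatorname{tr}(A^N) = \lambda_+^N + \lambda_-^N ,

where β = 1/kT and λ_± = e^{βJ} ± e^{−βJ} (i.e. 2 cosh βJ and 2 sinh βJ) are the eigenvalues of A. Eigenanalysis of the transfer matrix thus yields the partition function directly, in the manner referred to above.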


In the next section, the framework of Markov chain theory is used to examine in detail probabilistic interactions within the Ising spin glass. The Markov transition matrix is equivalent to the transfer matrix, hence it follows that methods for computing system properties are closely related. The chosen approach exposes a dynamic programming formulation of the ground state problem with implications for further parallelisation.

3.2 A dynamic programming approach to spin glass ground states

A system S is described by a set of states S = {S^1, S^2, . . . , S^n}, for example spin configurations S = {S^i | S^i ∈ 2^S}. Again, 2^S denotes the set of all system configurations. Residing in state S^τ, the system undergoes a series of non-deterministic state transitions, such that each successive system configuration S^{τ'} is determined from the assignment S^{τ'} = t(S^σ). The map t : 2^S → 2^S is defined using a vector of random variables v = (v_{S^1}, v_{S^2}, . . . , v_{S^n}), where v_{S^i} is a random successor state the system may assume when in state S^i. The probability mass function of these random variables is defined as

f_{v_{S^i}}(S^j) = P(v_{S^i} = S^j \mid S^i) .

Given an initial distribution of states, it may be of interest to determine the most likely sequence of states. For this purpose, it is useful to examine the system in terms of its Markov properties.

3.2.1 Markov chains

Define a sequence of states C = (S^{x_1}, S^{x_2}, . . . , S^{x_m}). The sequence is said to fulfil the first-order Markov property if the value of any single state sufficiently determines the probability distribution of the state's successor in the sequence, i.e.

\forall i \quad P(S^{x_{i+1}} \mid S^{x_i}) = P(S^{x_{i+1}} \mid S^{x_i}, S^{x_{i-1}}, \ldots, S^{x_1}) .

Formulating the probabilities of state transitions in matrix form is convenient for evaluating the behaviour of the sequence after finite or infinite state emissions: Define the transition matrix between sequence elements i, i + 1 as

M_{i,i+1} = \begin{bmatrix} P(S^1 \mid S^1) & P(S^1 \mid S^2) & \cdots & P(S^1 \mid S^n) \\ P(S^2 \mid S^1) & P(S^2 \mid S^2) & \cdots & P(S^2 \mid S^n) \\ \vdots & \vdots & \ddots & \vdots \\ P(S^n \mid S^1) & P(S^n \mid S^2) & \cdots & P(S^n \mid S^n) \end{bmatrix} ,

where P(S^τ | S^σ) denotes the probability of the emission S^τ as the (i+1)th element in the chain after the ith emission, S^σ. It follows that the probability distribution of states d' = [P(S^1), P(S^2), . . . , P(S^n)]^T after m sequence emissions can be evaluated as

d' = \left( \prod_{k=1}^{m} M_{k,k+1} \right) d \quad (3.1)

where vector d is the initial state distribution. If for all k, M_{k,k+1} = M_{k−1,k}, the Markov chain is termed time-homogeneous. Such a chain may be represented by a directed, weighted graph as shown in Figure 3.2, where nodes represent states and labelled edges represent transition probabilities. A detailed discussion of further Markov chain properties is provided by Meyn and Tweedie [48].

Figure 3.2: Example first-order Markov chain with states a, b, c
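Equation 3.1 amounts to repeated matrix-vector multiplication. The C sketch below handles the time-homogeneous case, applying a single matrix M to the distribution m times; the row-major, column-stochastic storage convention M[i*n + j] = P(S^i | S^j) is an assumption made for illustration.

    /* Evolve a state distribution d of length n through m emissions of
     * a time-homogeneous chain: d' = M^m d. M is stored row-major,
     * with M[i*n + j] = P(S^i | S^j). tmp is scratch space of length n. */
    void evolve_chain(int n, const double *M, double *d, int m, double *tmp)
    {
        for (int step = 0; step < m; step++) {
            for (int i = 0; i < n; i++) {
                tmp[i] = 0.0;
                for (int j = 0; j < n; j++)
                    tmp[i] += M[i*n + j] * d[j];   /* (M d)_i */
            }
            for (int i = 0; i < n; i++) d[i] = tmp[i];
        }
    }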
It follows that the probability distribution of states d′ = [P(S_1), P(S_2), ..., P(S_n)]^T after m sequence emissions can be evaluated as

    d′ = ( ∏_{k=1}^{m} M_{k,k+1} ) d,                                  (3.1)

where the vector d is the initial state distribution. If M_{k,k+1} = M_{k-1,k} for all k, the Markov chain is termed time-homogeneous. Such a chain may be represented by a directed, weighted graph as shown in Figure 3.2, where nodes represent states and labelled edges represent transition probabilities. A detailed discussion of further Markov chain properties is provided by Meyn and Tweedie [48].

[Figure 3.2: Example first-order Markov chain with states a, b, c; edges are labelled with the transition probabilities P(b|a), P(c|b), P(a|b), P(b|c).]

By the current definition, state emission is governed by an amount of 'memory', in that preceding sequence values influence state output at any given point in the sequence. The first-order Markov chain, where states are conditionally dependent on a single, immediate predecessor, is the simplest instance of a Markov process.

When extending the amount of chain memory, i.e. increasing the number of preceding states which determine the distribution of output states, the order-n Markov chain must be considered. A generalisation of the archetypal first-order model, the distribution of an emitted state depends on its n immediate predecessors in the sequence. Following the definition of the first-order model, the requirement for an order-n chain is

    ∀_i  P(S_{x_i} | S_{x_{i-1}}, S_{x_{i-2}}, ..., S_{x_{i-n}}) = P(S_{x_i} | S_{x_{i-1}}, S_{x_{i-2}}, ..., S_{x_1}),

i.e. knowledge of the preceding n states sufficiently defines the probability of state S_{x_i} in the sequence. Both models have implications for algorithm design.

3.2.2 Ising state behaviour as a Markov chain

In the context of the previously described Markov model, the following approach examines Ising interactions within the two-dimensional lattice without boundary conditions. Initially, the lattice is partitioned into rows, as shown in Figure 3.1. Clearly, interactions between individual rows occur in nearest-neighbour fashion, namely along a single dimension.
That is, for an n × m spin system, the partition is defined as S = {S_1, S_2, ..., S_n} with S_i ∈ {1, -1}^m, 1 ≤ i ≤ n. The energy of the system is

    ∑_{i=1}^{n} H(S_i) + ∑_{i=2}^{n} H_b(S_{i-1}, S_i) = H(S_1) + ∑_{i=2}^{n} [ H(S_i) + H_b(S_{i-1}, S_i) ],

where H(S_i) is the Hamiltonian of subsystem S_i and H_b(S_i, S_j) is the boundary energy between subsystems S_i, S_j, as previously defined.

Since ∪_i S_i = S, the entire lattice's state is sufficiently described by the states of its constituent rows. Recall that, because this is a statistical mechanical model, state is probabilistic, with P(S) ∝ e^{-H(S)/(kT)}. Using the described partitioning scheme, it turns out that subsystem state probability fulfils the property of a first-order Markov chain (cf. Appendix C).

3.2.3 The ground state sequence

Given the Markov property under the chosen representation of Ising interactions, the implications of the ground state for the chain of states (S_1^{x_1}, S_2^{x_2}, ..., S_n^{x_n}) are next examined. Formally, the probability P_gnd of obtaining the ground state energy min_{S ∈ S} {H(S)} is

    P_gnd ∝ exp( -(1/kT) min_{S ∈ S} {H(S)} )
          ∝ max_{S ∈ S} { exp( -(1/kT) H(S) ) },

from which it is clear that P_gnd must be maximised in order to infer the ground state configuration. This configuration is given by the sequence

    argmax_{(S_1, S_2, ..., S_n)} { P(S_1) ∏_{i=2}^{n} P(S_i | S_{i-1}) },

which is the most likely sequence of emitted states in a first-order Markov chain.

This result is of significance for obtaining an algorithm for computing ground states, because there exists a well-known approach due to Viterbi [61]. The basis of the Viterbi algorithm is the observation that the optimal state for the first symbol emission in the chain is simply argmin_{S_1} H(S_1). Augmenting the size of the considered subproblems, optimum solutions are determined successively, until the size of the set of considered problems equals that of the originally specified problem. At this point, the optimisation is complete.
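To make the row-based formulation concrete, the following C sketch evaluates the subsystem Hamiltonian H(S_i) and the boundary energy H_b(S_{i-1}, S_i). It assumes rows are encoded as m-bit masks (a set bit denoting spin +1) and that the caller supplies the horizontal couplings jh and vertical couplings jv; all identifiers are illustrative rather than part of the framework described in later chapters.

    /* Spin value in {-1, +1} at column k of a bitmask-encoded row. */
    static int spin(unsigned row, int k)
    {
        return ((row >> k) & 1u) ? 1 : -1;
    }

    /* Intra-row Hamiltonian H(S_i), open horizontal boundaries;
     * jh[k] is the bond between columns k and k+1 (assumed layout). */
    static double row_energy(unsigned row, const double *jh, int m)
    {
        double h = 0.0;
        int k;
        for (k = 0; k < m - 1; k++)
            h -= jh[k] * spin(row, k) * spin(row, k + 1);
        return h;
    }

    /* Boundary energy H_b(S_{i-1}, S_i) between adjacent rows;
     * jv[k] is the vertical bond in column k (assumed layout). */
    static double bond_energy(unsigned prev, unsigned cur, const double *jv, int m)
    {
        double h = 0.0;
        int k;
        for (k = 0; k < m; k++)
            h -= jv[k] * spin(prev, k) * spin(cur, k);
        return h;
    }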


The probability of the most likely sequence of emissions (S_1^{μ_1}, S_2^{μ_2}, ..., S_n^{μ_n}), known as the Viterbi path, can be obtained from the recurrent formulation

    P_viterbi(S_i) = { P(S_i)                                                  i = 1
                     { max_{S_{i-1}} { P(S_i | S_{i-1}) P_viterbi(S_{i-1}) }   i > 1,

by evaluating max_{S_n} {P_viterbi(S_n)}. It follows that the actual sequence can be formulated as

    viterbi(i) = { argmax_{S_i} {P_viterbi(S_i)}                    i = 1
                 { argmax_{S_i} {P_viterbi(S_i)} + viterbi(i-1)     i > 1,

determined by evaluating viterbi(n). In this case, the '+' operator denotes symbol concatenation, so that (S_1^{μ_1}, S_2^{μ_2}, ..., S_n^{μ_n}) = S_1^{μ_1} + S_2^{μ_2} + ... + S_n^{μ_n}.

It is important to note that the recursive definition of the Viterbi path differs from the (suboptimal) approach of optimising every conditional probability P(S_i | S_{i-1}) individually. Instead, the path is defined as the optimum of incrementally grown subproblems, where the subproblems themselves are defined as optimal. Schematically depicted in Figure 3.3, this approach is an application of the principle of optimality due to Bellman [9]. Consequently, the Viterbi algorithm is an instance of the dynamic programming problem, recursively defined for all x ∈ X as

    V(x) = max_{y ∈ Γ(x)} { F(x, y) + γ V(y) },

where Γ is a map and 0 ≤ γ ≤ 1 is the so-called discount factor. The function V(x) is known as the value function, and is optimised using F(x, y).

[Figure 3.3: Illustrating the principle of optimality. Paths within the dashed circle are known to be optimal. Using this information, optimal paths for a larger subproblem can be computed.]

The concrete algorithm for computing the Viterbi path probability avoids the overhead and backtracking suggested by the aforementioned recursive formulation. It involves an iterative loop to increment the size of the considered system:

    opt[] := 1
    for i := 1 to n
        for S_i^j ∈ S_i
            p_max := -∞
            for S_{i-1}^k ∈ S_{i-1}
                p := P(S_i^j | S_{i-1}^k) * opt[k]
                if p > p_max
                    p_max := p
            optNew[j] := p_max
        opt := optNew

In the listing, S_i^j denotes configuration j of subsystem S_i, according to the previous convention (for i = 1, a single dummy predecessor with opt-value unity is assumed, so that P(S_1^j | S_0) = P(S_1^j)). The array opt[] records the optimum path probability for the preceding subsystems S_1, S_2, ..., S_i at every iteration i of the algorithm. Elements of the array are initially set to unity. A second array optNew[] is used to store updated path probabilities, which are copied to opt[] after each iteration of the outer loop. Although the values of optimal state emissions are discarded in this pseudocode, it is possible to retain them by storing them in an associative data structure. An implementation of this approach is presented in Chapter 6.

Examining the algorithm's time complexity, it is apparent that execution time is proportional to the product of the three loops' lengths, since these assume a nested structure. That is,

    t(n) ∝ n |2^{S_1}|^2,

where n is the number of subsystems and 2^{S_1} is the set of configurations of subsystem S_1. It follows that if the spin lattice has dimensions n × m,

    t(n, m) ∝ n 2^{2m},

which is O(n 2^{2m}).

By further observation, it turns out that the Viterbi path can also be used to evaluate system energy (cf. Appendix D). This provides a dynamic programming solution for the two-dimensional lattice without boundary conditions, namely

    H_min(S_i) = { H(S_i)                                                          i = 1
                 { min_{S_{i-1}} { H(S_i) + H_b(S_i, S_{i-1}) + H_min(S_{i-1}) }   i > 1.    (3.2)
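As an illustration, the recurrence of Equation 3.2 might be realised in C as follows, reusing the hedged row_energy() and bond_energy() helpers sketched above; the flat, row-major bond-array layout is likewise an assumption of the sketch.

    #include <stdlib.h>
    #include <string.h>
    #include <float.h>

    /* Ground state energy of the open-boundary n x m lattice via
     * Equation 3.2. jh and jv hold n*m horizontal and vertical bonds in
     * row-major order (an assumed layout). */
    double ground_state_energy(int n, int m, const double *jh, const double *jv)
    {
        unsigned q = 1u << m, s, t;
        int i;
        double best;
        double *hmin = malloc(q * sizeof *hmin);
        double *hnew = malloc(q * sizeof *hnew);

        for (s = 0; s < q; s++)            /* base case: H_min(S_1) = H(S_1) */
            hmin[s] = row_energy(s, jh, m);

        for (i = 1; i < n; i++) {          /* recurrence for rows 2..n */
            for (s = 0; s < q; s++) {      /* current row configuration */
                double m_best = DBL_MAX;
                for (t = 0; t < q; t++) {  /* predecessor row configuration */
                    double h = row_energy(s, jh + i * m, m)
                             + bond_energy(t, s, jv + i * m, m) + hmin[t];
                    if (h < m_best)
                        m_best = h;
                }
                hnew[s] = m_best;
            }
            memcpy(hmin, hnew, q * sizeof *hmin);
        }

        best = hmin[0];                    /* minimise over the final row */
        for (s = 1; s < q; s++)
            if (hmin[s] < best)
                best = hmin[s];
        free(hmin);
        free(hnew);
        return best;
    }

For clarity, the energies are recomputed for every configuration pair; precomputing them per row recovers the O(n 2^{2m}) operation count stated above, up to the O(m) cost of a single energy evaluation.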
3.2.4 Boundary conditions

It is of interest to examine the effects of introducing cyclic boundary conditions on state optimality under the described approach. As the latter involves partitioning the spin lattice into rows, it is possible to differentiate between energetic contributions occurring within the subsystems S_1, S_2, ..., S_n, and energetic contributions occurring between these. It is apparent that horizontal boundary conditions have an effect on subsystem energy, whereas vertical boundary conditions affect subsystem interactions.

The first effect is caused by horizontal boundary interactions, as these involve spins located at the outermost positions of each spin row.
The Hamiltonian H(S_i) thus effectively includes a further term, accounting for the additional pairwise interaction. The Hamiltonian of the entire lattice remains ∑_{i=1}^{n} H(S_i) + ∑_{i=2}^{n} H_b(S_i, S_{i-1}), which sufficiently accounts for horizontal boundary interactions within the system. Since the recursive formulation of ground state energy in Equation 3.2 also computes the sum of all subsystem Hamiltonians and their interactions, the existing dynamic programming formulations and algorithms can be left unmodified. It follows that the algorithmic complexity of computing ground states does not increase for the case with cyclic boundaries along a single dimension.

In contrast, the vertical cyclic boundary condition results in pairwise interactions between the subsystems S_1, S_n, i.e. the initial and ultimate spin rows. Here, each row constituent spin s_j ∈ S_k (k ∈ {1, n}) potentially has a non-zero bond interaction with its neighbour s'_j ∈ S_{k'} (k' ∈ {1, n} \ {k}). Consequently, the Hamiltonian for the entire lattice is given by ∑_{i=1}^{n} H(S_i) + ∑_{i=2}^{n} H_b(S_i, S_{i-1}) + H_b(S_1, S_n), where the final term is the interaction energy between the two boundary subsystems in question. Here, it follows that the existing solution does not yield the ground state energy, as the recursive formulation does not include the additional term. Configuration optimality is therefore not guaranteed for the case with cyclic boundaries along both lattice dimensions.

As a modification of the original dynamic programming solution, it is conjectured that the ground state configuration can be determined by evaluating the set of problem instances in which both boundary rows are assigned spin configurations in advance, i.e.

    H'_min = min_{S_1, S_n} { H_min(S_n, S_n, S_1) },

with

    H_min(S_n, S_i, S_1) = { H(S_i) + H_b(S_1, S_n)                                                     i = 1
                           { min_{S_{i-1}} { H(S_i) + H_b(S_i, S_{i-1}) + H_min(S_n, S_{i-1}, S_1) }    i > 1.

Adapting the previous algorithm, this formulation implies that the execution time t'(n) is

    t'(n) ∝ |2^{S_1}| t(n),

where n is the number of subsystems, 2^{S_1} is the set of configurations of S_1 and t(n) is the execution time of the previously specified algorithm. Therefore,

    t'(n, m) ∝ 2^m (n 2^{2m}) ∝ n 2^{3m},

which is O(n 2^{3m}), where the system consists of n × m spins.
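In code, the 2^m-fold repetition amounts to an outer loop over clamped boundary-row configurations. The sketch below assumes a hypothetical variant ground_state_energy_clamped() of the earlier sketch, which fixes S_n and adds the H_b(S_1, S_n) term to the i = 1 base case.

    #include <float.h>

    double ground_state_energy_clamped(int n, int m, const double *jh,
                                       const double *jv, unsigned boundary);

    /* O(n 2^{3m}) scheme for the vertical cyclic boundary: re-run the
     * open-boundary dynamic program once per boundary-row configuration. */
    double ground_state_energy_cyclic(int n, int m,
                                      const double *jh, const double *jv)
    {
        unsigned q = 1u << m, b;
        double best = DBL_MAX;
        for (b = 0; b < q; b++) {          /* clamp the boundary row S_n := b */
            double h = ground_state_energy_clamped(n, m, jh, jv, b);
            if (h < best)
                best = h;
        }
        return best;
    }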


The proof of the conjecture is by induction. Since interactions within the system occur in a regular lattice, the two adjacent boundary subsystems can be chosen arbitrarily, so the recursive formulation becomes

    H_min(S_j, S_i, S_{j+1}) = { H(S_i) + H_b(S_i, S_{i-1})                                                     i = j + 1
                               { min_{S_{i-1}} { H(S_i) + H_b(S_i, S_{i-1}) + H_min(S_j, S_{i-1}, S_{j+1}) }    otherwise,

with subsystems S_0, S_1, ..., S_{n-1}, boundary subsystems S_j, S_{j+1} and subsystem interactions mod n. It follows that the ground state energy is defined as

    H'_min = min_{S_j, S_{j+1}} { H_min(S_j, S_n, S_{j+1}) }.

Choosing boundary subsystems S_{j+1}, S_{j+2}, the formulation further becomes

    H_min(S_{j+1}, S_i, S_{j+2}) = { H(S_i) + H_b(S_i, S_{i-1})                                                         i = j + 2
                                   { min_{S_{i-1}} { H(S_i) + H_b(S_i, S_{i-1}) + H_min(S_{j+1}, S_{i-1}, S_{j+2}) }    otherwise,

which clearly is the optimal sequence of emitted states, given states S_{j+1}, S_{j+2}. As the ground state configuration can be deduced from min_{S_{j+1}, S_{j+2}} { H_min(S_{j+1}, S_n, S_{j+2}) }, the sequence remains optimal also for this case. Therefore, the sequence is optimal for all 0 ≤ j < n.


3.2.5 A higher-order Markov chain approach

In place of the row-wise partition, the lattice may alternatively be decomposed into its individual spins, ordered in row-major fashion, with a window of spins slid across the lattice as shown in Figure 3.4.

[Figure 3.4: Sliding a unit-spin window across a lattice]

Formally, the Hamiltonian is expressed as

    H(S) = ∑_{i=0}^{nm-1} [ H_b(S_i, S_{i-1}) + H_b(S_i, S_{i-m}) ],

where H_b(S_i, S_{i-m}) is the interaction energy between S_i and its vertical predecessor; similarly, H_b(S_i, S_{i-1}) is the interaction due to the horizontal predecessor S_{i-1}. Subsystem indices are computed mod (nm), in order to evaluate interactions occurring across lattice boundaries. Here, it indeed turns out that a higher-order formulation of system state is possible (cf. Appendix C), namely

    P(S) = ∏_{i=0}^{nm-1} P(S_i | S_{i-1}, S_{i-2}, ..., S_{i-m-1}),

from which the ground state probability can be formulated as

    P_viterbi(S_i, S_{i-1}, ..., S_{i-m}) =
        { P(S_i, S_{i-1}, ..., S_{i-m})                                                            i ≤ m
        { max_{S_{i-m-1}} { P(S_i | S_{i-1}, ..., S_{i-m-1}) P_viterbi(S_{i-1}, ..., S_{i-m-1}) }  i > m,

for the lattice without cyclic boundary interactions. As previously described, this probability can be used to determine the actual ground state configuration, and can be reformulated to determine the ground state energy. It follows that the algorithm for obtaining solutions to this dynamic programming problem is also a modification of the previous approach:

    opt[] := 1
    for i := m to n*m
        for (S_i^{j_0}, S_{i-1}^{j_1}, ..., S_{i-m}^{j_m}) ∈ (S_i, S_{i-1}, ..., S_{i-m})
            if i > m
                p_max := -∞
                for S_{i-m-1}^k ∈ S_{i-m-1}
                    p := P(S_i^{j_0} | S_{i-1}^{j_1}, ..., S_{i-m-1}^k) * opt[(S_{i-1}^{j_1}, ..., S_{i-m-1}^k)]
                    if p > p_max
                        p_max := p
                optNew[(S_i^{j_0}, S_{i-1}^{j_1}, ..., S_{i-m}^{j_m})] := p_max
            else
                p := P(S_i^{j_0}, S_{i-1}^{j_1}, ..., S_{i-m}^{j_m})
                optNew[(S_i^{j_0}, S_{i-1}^{j_1}, ..., S_{i-m}^{j_m})] := p
        opt := optNew

The pseudocode consists of three nested loops, the outermost of which is responsible for calculating the probability P(S_i | S_{i-1}, S_{i-2}, ..., S_{i-m-1}) for iteratively increasing i. The loop thus effectively specifies a sliding window of size m + 1, which is moved across the lattice in the fashion previously described. For each position of the window, all spin configurations are evaluated, using the associative data structure opt[] to obtain the probabilities of preceding window configurations. These are referenced by the tuple (S_i^{j_0}, S_{i-1}^{j_1}, ..., S_{i-m}^{j_m}), which represents a window configuration. The algorithm is for the case without cyclic boundary conditions; therefore the window is not required to precede position i = m + 1, and at this position the window configuration probability is unconditional.

Adapting the algorithm for calculating ground state energy, the statement

    p := P(S_i^{j_0} | S_{i-1}^{j_1}, ..., S_{i-m-1}^k) * opt[(S_{i-1}^{j_1}, ..., S_{i-m-1}^k)]

becomes a summation of subsystem energies, and the optimisation proceeds by determining energetically minimal preceding window states for each position of the window on the system lattice. In this form, the algorithm performs identically to the transfer matrix optimisation scheme described in [15]. It follows that the described scheme must have equivalent computational complexity.

An analysis confirms this assumption: given that the lattice consists of n × m spins, the algorithm's execution time is proportional to

    t(n, m) ∝ (nm - m - 1) |2^{(S_1, S_2, ..., S_{m+2})}|,

where 2^{(S_1, S_2, ..., S_m)} denotes the set of configurations of the tuple (S_1, S_2, ..., S_m). Therefore,

    t(n, m) ∝ (nm - m - 1) 2^{m+2} + 2^{m+1},

the additive 2^{m+1} term accounting for the unconditional evaluation of the initial window, which is

    O((nm - m - 1) 2^{m+2}) = O(nm 2^m).

Although not considered in further detail, the opportunity presents itself to further modify this algorithm to account for cyclic boundary interactions within the spin lattice. This entails invoking the algorithm for specified configurations of the spin tuple (S_1, S_2, ..., S_{1+m}), similar to the algorithm employing a row-wise lattice decomposition. This is conjectured to increase the algorithmic complexity to O(nm 2^m · 2^m) = O(nm 2^{2m}), since there are O(2^m) possible configurations of the specified spin tuple.
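In an implementation, the associative structure opt[] need not be a general dictionary: a window of m + 1 binary spins can serve directly as an (m+1)-bit array index. The following hedged C sketch of a single window step (energy version) illustrates this; window_bond_energy(), returning the interaction energy contributed by the newest spin at position i, is an assumed helper.

    double window_bond_energy(int i, unsigned w, unsigned pred); /* assumed */

    /* One position of the sliding window, energy formulation. Bit 0 of a
     * window key holds the newest spin S_i; the two compatible predecessor
     * windows are obtained by shifting and choosing either value for the
     * spin S_{i-m-1} that leaves the window. */
    void window_step(int i, int m, const double *opt, double *opt_new)
    {
        unsigned nkeys = 1u << (m + 1), w, p0, p1;
        for (w = 0; w < nkeys; w++) {
            double h0, h1;
            p0 = w >> 1;                   /* leaving spin taken as -1 */
            p1 = (w >> 1) | (1u << m);     /* leaving spin taken as +1 */
            h0 = window_bond_energy(i, w, p0) + opt[p0];
            h1 = window_bond_energy(i, w, p1) + opt[p1];
            opt_new[w] = (h0 < h1) ? h0 : h1;   /* min over S_{i-m-1} */
        }
    }

Note that the minimisation over S_{i-m-1} involves only two candidates per window, an observation which anticipates the sparsity exploited in Section 4.2.2.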


In the following chapter, parallelisation strategies are described for the harmony search heuristic, the first-order Markov chain solution and, as an extension, the aforementioned higher-order modification.


Chapter 4

Parallelisation Strategies

To be of practical use, a computational solution to a given problem must admit implementation on a machine architecture such that the algorithm completes within a reasonable amount of time. While computational complexity provides a means of qualitatively evaluating problem tractability, the properties of the machine determine the amount of time required for solving a particular problem instance.

To reduce machine execution time, one approach applicable to physical architectures is to increase the processing rate of machine instructions. This may be achieved in practice by increasing the machine's CPU clock rate, improving memory bandwidth, and augmenting the architecture with additional features such as registers, caches and pipelining. In general terms, this requires no conceptual modification to the algorithm, although the algorithm's performance is usually amenable to optimisation for the respective architecture.

The second approach to increasing machine performance involves parallelisation. Here, performance is improved by distributing computation among a set of processing elements. With the exception of algorithms exhibiting implicit parallelism in operations on data structures, in combination with vector processing architectures, it is necessary to adapt the algorithm and devise a scheme for achieving this distribution. For message passing architectures, this includes defining explicit communication operations.

In the following, the potential for implementing parallel versions of harmony search and the dynamic programming methods is considered, with regard to MIMD architectures.

4.1 Harmony search

In the previous chapter, harmony search was described as a probabilistic algorithm employing an evolutionary strategy for both discrete and continuous optimisation. As such, it performs a heuristic evaluation of the problem state space, i.e. search is non-exhaustive. Since improving performance motivates parallelisation, it is necessary to examine the heuristic for the purpose of defining its performance-relevant characteristics.


For any heuristic algorithm, performance can on the one hand be quantified by the accuracy of the search process. The latter is influenced by the algorithm's state space traversal policy, in particular by the size of the search space. It follows that performance can be improved by enlarging the search space, since in the limit of the search space approaching the entire state space, solution optimality is guaranteed.

On the other hand, it may be of interest to restrict the heuristic's execution time, as previously described for the general class of halting algorithms. In this case, the task is to increase the rate at which the search is performed.

Using parallelism to improve either of these characteristics, it is apparent that the distribution of computation among processors bears similarity to the concepts of strong scaling and weak scaling commonly encountered in parallel performance analysis. Whereas strong scaling implies increasing the number of processing elements while keeping the problem size constant (therefore shrinking the fraction of computation assigned to each processor), weak scaling increases the problem size with the number of processors (therefore keeping the fraction of computation assigned to each processor constant). Similarly, in the case of the heuristic, parallelism can either be applied for the purpose of distributing a search space of constant size (strong scaling), or for increasing the size of the search space (weak scaling). Using a tree model, an example of this relationship is shown in Figure 4.1.

[Figure 4.1: Using parallelism to improve heuristic performance]

4.1.1 Harmony search performance

The evolutionary strategy used by harmony search for combinatorial optimisation consists of initial candidate generation, followed by iterative randomised candidate recombination (including randomised mutation) and solution replacement. The algorithm is probabilistic, hence the search is a random walk, whose average length is influenced by the memory choosing rate (Figure 4.2(a)). The number of solution vectors also influences the search, such that for NVECTORS=1 the optimisation becomes greedy: a single solution is retained, which is only replaced when a solution of higher utility is found.


For larger NVECTORS, i.e. maintaining a larger set of candidate solutions, the search process becomes biased, reminiscent of rejection-sampling algorithms: here, the random walk is effectively centred around 'islands' represented by candidate solutions, since these affect the composition of future candidates generated by the algorithm (Figure 4.2(b)). As the algorithm progresses, the positions of candidates in the solution space progress monotonically from their initial random positions towards local optima, since the least favourable solutions are replaced by fitter candidates upon generation. Informally, it is easy to see that increasing the value of NVECTORS offers the benefit of a more diverse set of solutions from which to initiate the search. This is likely to improve the probability of obtaining an optimal solution. More significantly, this diversity allows for a large state space from which random walks are initiated to generate further candidates; as the algorithm progresses, an increasingly large set of local optima is held. Upon termination, it follows as a conjecture that there is greater potential for the solution set to hold diverse local optima. For various applications of harmony search, solution accuracy is indeed shown to improve when increasing NVECTORS, as shown in [43, 45]. Considering parallelism, it therefore suggests itself to apply 'weak scaling' to increase the number of solution vectors, in order to enlarge the algorithm's search space. This might be achieved by assigning a set of solution vectors to each processor, such that each executes the harmony search algorithm on its allocated vectors. In this case, further consideration must be given to exchanging solutions between processors.

[Figure 4.2: Conceptual illustration of harmony search behaviour within the search space; (a) decreasing the memory choosing rate (the walk radius increases), (b) increasing NVECTORS]

In contrast, a type of 'strong scaling' might be achieved by maintaining a set of solution vectors replicated among the processing elements. Here, parallelisation assists in improving the rate at which successive solutions are generated from the vectors held in memory, and hence at which the search is conducted. This is in comparison to a single processor executing the algorithm and updating solutions stored in a set of equivalent size. Each processor executes the harmony search algorithm, potentially generating an updated solution vector at each iteration of the process. Upon replacing the solution vector in question, its value is communicated to every other processor, so that these continue to operate on replicated solution vectors. A likely consequence of this method is that convergence is obtained more quickly, due to the increased rate of generating solutions.


[Figure 4.3: Parallelisation strategies for population based heuristics; (a) master-slave, (b) coarse-grained (migrating solutions), (c) fine-grained (migrate and select)]

4.1.2 Existing approaches

Parallelisation methods for metaheuristic algorithms were briefly mentioned in Chapter 3. These are considered in more detail here, in order to assess their potential adaptation for harmony search.

Cantu-Paz [14] provides an overview of parallelisation schemes for evolutionary algorithms. Although these are discussed specifically in the context of genetic algorithms, they are also applicable to other evolutionary heuristics, such as those introduced by Koza for generating software programs [5]. Cantu-Paz discerns between three classes of approach, known as global master-slave, fine-grained and coarse-grained, respectively. These differ in the way the evolutionary process is distributed amongst processors, and in the extent to which solutions are communicated amongst them.

Depicted schematically in Figure 4.3(a), the master-slave approach implements a single population; offspring are generated from potentially any parent solutions in the population (termed panmixia). This is achieved by assigning the population to a single master processor, allowing slave processors to access and modify individual solutions. Slave processors may be tasked with evaluating solution fitness, whereas the master is responsible for selection and crossover.


It is possible to consider both a synchronous variant, where solutions are retrieved and modified in discrete generations, and an asynchronous variant, where a slave may initiate a retrieval in advance of its peers. Either is suited to implementation on shared-memory or message passing architectures; however, it is noted that the heterogeneous organisation of processes into master and slaves makes the approach generally less suitable for massively parallel architectures.

In the coarse-grained approach (Figure 4.3(b)), the evolutionary process is no longer panmictic. The set of solutions which forms the population is partitioned among processors, so that optimisation progresses primarily within semi-isolated 'demes' [14]. To allow evolution to progress globally, demes exchange a proportion of their population with neighbours in a predefined graph topology. This allows solutions of high utility to propagate across the graph, which promotes convergence towards a common, global solution. On the other hand, the insularity of subpopulations permits a high degree of diversity, allowing multiple local optima to be approached independently, thereby preventing early convergence. Previous work includes investigations based on coarse-grained approaches, using both fixed toroidal or hypercubic topologies and dynamic topologies. The distributed approach makes this technique particularly attractive for implementation on message passing architectures.

The fine-grained approach, shown in Figure 4.3(c), is also based on distributing the solution population amongst processors. In contrast, however, the exchange of solutions occurs more frequently during the evolutionary process: instead of periodically initiating migration between subpopulations, selection itself takes place between processor-assigned demes, which in the most extreme case consist of a single solution. Depending on the specified network topology, it may be practicable to select from all subpopulations within a certain vicinity of the initiating deme, which results in an overlapping selection scheme. Cantu-Paz notes that if this vicinity is equal to the network diameter for all nodes, evolution regains panmixia. Suited to massively parallel architectures due to its scalability, this approach appears to be especially effective because of its flexibility.

Aside from evolutionary algorithms, a potentially relevant approach to parallelising a heuristic is presented by Ram et al. [55]. Here, the simulated annealing algorithm is executed independently by multiple processors, each of which initialises the search with a random configuration. This allows parallel exploration of the search space, in analogy to the effect achieved by executing an evolutionary process such as a genetic algorithm using disjoint subpopulations: since annealing proceeds independently, the process executed by each processor potentially converges towards a different local optimum. To counteract excessive state space exploration, the most promising solution is periodically determined and exchanged between processors. Akin to migrating solutions between demes, this promotes global convergence towards a single solution. The number of algorithm iterations required for convergence is hence reduced. In their implementation, Ram et al. employ a collective exchange scheme for communicating solutions between the individual annealing processes.
However, the neighbourhood exchange scheme described by Cantu-Paz is equally applicable.

4.1.3 Proposed parallelisation scheme

In the described approaches, parallelism is applied with the intention of enhancing the explorative or exploitative properties of heuristics: whereas the coarse-grained evolutionary approach improves exploration alone through parallel selection, the remaining approaches include an element of parallel search exploitation, by propagating promising solutions in order to accelerate solution convergence. The method used by Ram et al. can be viewed as a simplification of the coarse-grained evolutionary approach, where the graph defining solution exchanges is fully connected.

Having stated the motivation for parallelising harmony search, the opportunity is given to apply the described approaches to this heuristic. Given that harmony search is an evolutionary algorithm, distributed state space exploration and exploitation are readily adapted from parallel genetic algorithms.

Figure 4.4 schematically depicts the proposed parallelisation scheme. Here, optimisation takes place in distributed fashion, so that the heuristic is executed by multiple processors, each assigned a set of solution vectors. To allow solutions to be exchanged between processors, the latter are arranged in a ring. Periodically, processors send solutions to their successors, while receiving solutions from their predecessors. This reflects the behaviour of the aforementioned fine-grained approach. In addition, however, processors are organised into a twofold hierarchy, in which subordinate processors are not directly involved in the cyclic exchange of solutions. Instead, these exchange solutions using collective operations, based on the scheme described by Ram et al. Subordinate processors are grouped in such a way that each subgroup includes a 'ring exchange' processor. It follows that collective exchanges consider solutions obtained through the cyclic exchange process.

[Figure 4.4: Harmony search parallelisation scheme, combining cyclic exchange between ring processors with collective exchange within processor subgroups]

Although the proposed scheme is comparatively involved, it allows the behaviour of the heuristic to be altered by introducing a bias towards search space exploration or, conversely, search space exploitation: if the size of subgroups is equal to the total number of processors, communication is restricted to collective solution exchanges, so that rapid convergence is promoted. In this case, effectively only a single subgroup exists. Provided that communication occurs at short intervals, ensuring that similar solution vectors are held in memory, it is speculated that the algorithm will exhibit the described 'strong scaling' behaviour when increasing the number of processors. On the other hand, for unit subgroup size, collective solution exchanges are absent from the distributed search process. As a consequence, the ring-based approach is reinstated. Here, the expectation is that the heuristic will emphasise explorative search, and therefore exhibit 'weak scaling' behaviour when increasing the number of processors.

It is apparent that there is a multitude of parameters which influence parallel optimisation, in addition to the memory choosing rate and the number of solution vectors defined by serial harmony search.


These include the total number of processors involved in the search, and the size of subgroups. Also of significance is the rate at which solutions are exchanged, both for the ring and for the collective subgroup operations. Finally, the latter two operations must be defined in detail; these may, for example, involve selecting solutions at random, or communicating the most promising solutions.

The following describes a pseudocode prototype of a parallel harmony search algorithm for obtaining Ising spin glass ground states, using the message passing model:

     1  Solution[] solutions := initialise_random_solutions(NVECTORS);
     2
     3  for (i := 1; !has_converged(); i++) {
     4      Solution solution = new Solution;
     5
     6      float highest_energy = compute_highest_energy(solutions);
     7      int highest_energy_vector = compute_highest_energy_vector(solutions);
     8
     9      for (j := 1; j <= NSPINS; j++) {
    10          if (rand() < MEMORY_CHOOSING_RATE) {
    11              solution.spins[j] := solutions[rand()].spins[j];
    12          } else {
    13              solution.spins[j] := random_spin();
    14          }
    15      }
    16      if (compute_energy(solution) < highest_energy) {
    17          solutions[highest_energy_vector] := solution;
    18      }
    19      if (i mod RINGEXBLOCK = 0) {
    20          msg_send(solutions[rand()], (proc_id + 1) mod N_PROCESSORS);
    21          msg_rcv(rcv_solution);
    22          copy_min(rcv_solution, solutions[rand()]);
    23      }
    24      if (i mod ZONEEXBLOCK = 0) {
    25          reduce_min_zone(solutions[highest_energy_vector]);
    26      }
    27  }
As with serial harmony search, the algorithm consists of an iterative loop, whose purpose is to generate successive solutions and evaluate their utility. The proposed algorithm involves terminating the loop when the most favourable configurations held by the processes have identical energies. Although a more obvious approach might involve a less stringent termination criterion, it is thought that under this scheme, the number of iterations until termination provides a reasonable means of evaluating solution exploitation. Within the loop, solutions with random spins are generated, based on the configuration of existing solutions (lines 9–15), and replaced (lines 16–18). The constants NVECTORS and MEMORY_CHOOSING_RATE control the number of retained solution vectors and the memory choosing rate, respectively. Following this, each loop iteration contains communication instructions for the processors involved in ring exchange of solutions: lines 20 and 21 swap random solution vectors between processors, following which the function copy_min() on line 22 copies the value of the energetically more favourable argument to its complementary argument. In this way, energetically favourable solutions are propagated within a ring of search processes. There are (N_PROCESSORS ÷ ZONE_SIZE) such processors in the ring.

In addition, solutions are periodically exchanged between subgroups of processes, using the collective operation reduce_min_zone(). This performs a reduction based on the most favourable of the argument solutions. As defined, the operation involves the highest-energy solutions held by each search process. The operation is executed at a rate determined by the constant ZONEEXBLOCK. Subgroup size is influenced by the value of the constant ZONE_SIZE. When this equals N_PROCESSORS, there exists a single group for which collective operations are defined, whereas ring communications are without effect. Conversely, for unit ZONE_SIZE, all processes are involved in ring communications, whereas collective operations are without effect.
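The two exchange operations map naturally onto MPI. The following hedged C sketch shows one possible realisation, assuming a solution travels as its energy followed by nspins spin values, and that ring_comm and zone_comm are sub-communicators created beforehand (e.g. with MPI_Comm_split); all identifiers other than the MPI calls are assumptions of this sketch, not the implementation described in Chapter 6.

    #include <stdlib.h>
    #include <string.h>
    #include <mpi.h>

    /* Cyclic exchange: send one solution to the ring successor, receive
     * one from the predecessor, and keep the lower-energy of the pair. */
    void ring_exchange(double *sol, int nspins, MPI_Comm ring_comm)
    {
        int rank, size;
        double *rcv;
        MPI_Comm_rank(ring_comm, &rank);
        MPI_Comm_size(ring_comm, &size);
        rcv = malloc((nspins + 1) * sizeof *rcv);
        MPI_Sendrecv(sol, nspins + 1, MPI_DOUBLE, (rank + 1) % size, 0,
                     rcv, nspins + 1, MPI_DOUBLE, (rank + size - 1) % size, 0,
                     ring_comm, MPI_STATUS_IGNORE);
        if (rcv[0] < sol[0])               /* copy_min: keep the lower energy */
            memcpy(sol, rcv, (nspins + 1) * sizeof *sol);
        free(rcv);
    }

    /* Collective exchange: locate the zone-wide minimum-energy solution
     * with a MINLOC reduction, then replicate it within the zone. */
    void reduce_min_zone(double *sol, int nspins, MPI_Comm zone_comm)
    {
        struct { double energy; int rank; } in, out;
        int rank;
        MPI_Comm_rank(zone_comm, &rank);
        in.energy = sol[0];
        in.rank = rank;
        MPI_Allreduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MINLOC, zone_comm);
        MPI_Bcast(sol, nspins + 1, MPI_DOUBLE, out.rank, zone_comm);
    }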
4.2 Dynamic programming approaches

In the previous chapter, exact solutions to the ground state problem were presented, based on modelling spin interactions as Markov chains. These in turn were used to arrive at dynamic programming formulations of the respective optimisation problems. The run-time complexities are lower than the 2^{nm} bound required for finding the ground states of the n × m spin lattice using brute force; nevertheless, they are high enough to merit investigating parallelisation strategies.
4.2.1 First-order Markov chain approach

Parallelisation is based on an approach by Grama et al. [30], where a dynamic programming problem which is serial and monadic is decomposed into a tabular arrangement of solutions to subproblems of increasing size. The order of operations required to solve the problem is equivalent to the order of the individual scalar multiplications and additions required for a series of matrix/vector multiplications. The parallelisation approach is therefore given by parallel matrix/vector multiplication, which is well studied.

A dynamic programming problem is monadic if its optimisation equation contains a single recursive term. That is, given the function c = g(f(x_1), f(x_2), ..., f(x_n)), which assigns a cost to the solution constructed from subproblems x_1, x_2, ..., x_n, monadicity exists when g is composed of terms of the form f(j) ⊗ a(j, x), where ⊗ is an associative operator. In this form, each solution depends on a single subproblem.

Furthermore, a dynamic programming problem is serial if there are no cycles in the graph of dependencies between subproblems. More formally, the graph G = (V, E) is defined by the set of nodes V, where each node represents a subproblem. An edge between nodes exists if the optimisation equation contains a recursive term indicating a dependency between the corresponding subproblems.

Examining the optimisation equation for lattice ground state energy (without cyclic boundary conditions),

    H_min(S_i) = { H(S_i)                                                          i = 1
                 { min_{S_{i-1}} { H(S_i) + H_b(S_i, S_{i-1}) + H_min(S_{i-1}) }   i > 1,

it is apparent that the equation is monadic. To establish the serial property, the graph of subproblem dependencies is visualised (Figure 4.5(a)). As depicted, rows of nodes represent the states of subsystems S_i, which characterise the values of subproblems. Since there are n subsystems, there are n|2^{S_1}| nodes in the graph. Since a subproblem may assume as many values as there are values of its preceding dependency, the graph has a trellis-like structure consisting of bipartite graph segments. Because this organisation into individual levels is acyclic, the dynamic programming problem is serial.

The graph is modified to include information on system energy. Given the pair of nodes associated with subsystem configurations S_i^k, S_{i-1}^l, define the weight function w(S_i^k, S_{i-1}^l) = w_i^{k,l} = H(S_i^k) + H_b(S_i^k, S_{i-1}^l), for 1 < i ≤ n. Further, define an additional node α, such that the set of graph edges is extended to E′ = E ∪ {(α, S_1^k) | 1 ≤ k ≤ q} for q subsystem configurations. For i = 1, the weight function is defined as w(α, S_1^k) = H(S_1^k). Minimising system energy is then equivalent to obtaining min_k p(α, S_n^k), where p(α, S_n^k) is the minimum path between the nodes α and S_n^k.


[Figure 4.5: Graph of subproblem dependencies for an n = 3, m = 2 spin problem; (a) first-order, with n levels of 2^m nodes, (b) higher-order, with (n-1)m levels of 2^{m+1} nodes]

A further observation is that the minimum paths p(α, S_i^k), 1 ≤ k ≤ q, are expressed as

    p(α, S_i^1) = min { w_i^{1,1} + p(α, S_{i-1}^1), w_i^{1,2} + p(α, S_{i-1}^2), ..., w_i^{1,q} + p(α, S_{i-1}^q) },
    p(α, S_i^2) = min { w_i^{2,1} + p(α, S_{i-1}^1), w_i^{2,2} + p(α, S_{i-1}^2), ..., w_i^{2,q} + p(α, S_{i-1}^q) },
    ...
    p(α, S_i^q) = min { w_i^{q,1} + p(α, S_{i-1}^1), w_i^{q,2} + p(α, S_{i-1}^2), ..., w_i^{q,q} + p(α, S_{i-1}^q) },

for i > 1. For i = 1, p(α, S_i^k) = w(α, S_i^k). In an analogy to matrix/vector multiplication, where addition is substituted by minimisation and multiplication is substituted by addition, the equations are equivalent to

    p_i = M_{i,i-1} × p_{i-1},

where p_i = [p(α, S_i^1), p(α, S_i^2), ..., p(α, S_i^q)]^T. For i > 1, the matrix is defined as

    M_{i,i-1} = [ w_i^{1,1}  w_i^{1,2}  ...  w_i^{1,q}
                  w_i^{2,1}  w_i^{2,2}  ...  w_i^{2,q}
                  ...
                  w_i^{q,1}  w_i^{q,2}  ...  w_i^{q,q} ],

otherwise

    M_{i,i-1} = [ w(α, S_i^1)  w(α, S_i^1)  ...  w(α, S_i^1)
                  w(α, S_i^2)  w(α, S_i^2)  ...  w(α, S_i^2)
                  ...
                  w(α, S_i^q)  w(α, S_i^q)  ...  w(α, S_i^q) ].
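To make the min-plus correspondence concrete, a serial C sketch of the operation p_i = M_{i,i-1} × p_{i-1} follows; the accessor weight(k, l), returning w_i^{k,l}, is an assumption of the sketch.

    #include <float.h>

    /* Min-plus 'matrix/vector product': addition is replaced by
     * minimisation, multiplication by addition. */
    void minplus_matvec(int q, double (*weight)(int k, int l),
                        const double *p_prev, double *p_cur)
    {
        int k, l;
        for (k = 0; k < q; k++) {
            double best = DBL_MAX;
            for (l = 0; l < q; l++) {
                double cand = weight(k, l) + p_prev[l];
                if (cand < best)
                    best = cand;
            }
            p_cur[k] = best;
        }
    }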


[Figure 4.6: Parallel matrix operations; (a) basic, (b) improved. Numerals indicate the order of vector elements at each step.]

Using a sequence of n matrix/vector operations, it is now possible to compute the minimum paths p(α, S_i^k), by initialising p to a q-component zero vector: the first operation, M_{1,0} × p_0, yields the minimum paths p(α, S_1^k) for 1 ≤ k ≤ q. Retaining the value of the resulting vector as the argument for the next matrix/vector operation, the minimum paths p(α, S_2^k) for 1 ≤ k ≤ q are computed. The process is continued until the minimum paths p(α, S_n^k) have been computed. The minimum vector component then corresponds to the ground state energy.

Matrix operation parallelisation

A simple approach to parallelising the matrix/vector operation is shown in Figure 4.6(a). Here, the matrix is distributed in such a way that each processor stores the values of q/p rows, where p is the number of processors. Each is responsible for computing the same fraction of the components of the resulting vector. It follows that the latter is assembled from the partial results computed by each processor. In the message passing model, this can be achieved using a gather operation. For the required purpose, it is necessary for each processor subsequently to access all components of the resulting vector; therefore, it is practical to gather collectively.


The algorithm is described in the following pseudocode, where M_{i,i-1}^{k,l} denotes the component in row 1 ≤ k ≤ q, column 1 ≤ l ≤ q of matrix M_{i,i-1}:

    Float[] p
    Float[] p'
    for k := (proc_id * q/p + 1) to ((proc_id + 1) * q/p)
        Float minval := ∞
        for l := 1 to q
            if p[l] + M_{i,i-1}^{k,l} < minval
                minval := p[l] + M_{i,i-1}^{k,l}
        p'[k] := minval
    all_gather(p', p')

In the pseudocode, the outer loop is responsible for iterating through matrix rows. For each row, elements are added to the vector components stored in p. The minimum sum becomes a component of the vector p'. Matrix rows are assigned to processors based on the processor identifier proc_id, whose value is in the range [0, number of processors). The computation concludes with the collective operation all_gather().

Examining the algorithm's computational complexity, it can be seen that execution time is t(q) ∝ (q/p) q. Since determining the ground state energy requires n iterations of the algorithm, where n is the number of rows in the spin lattice, the total execution time is t(n, q) ∝ n q²/p. Considering that the lattice contains m = log₂ q spin columns, the execution time expressed in terms of lattice size is O((n/p) 2^{2m}), which is cost optimal in comparison to the serial algorithm presented in Chapter 3.

Memory efficient matrix/vector computation

Alternatively, it is possible to perform the desired matrix/vector computation using a parallel algorithm with reduced memory requirements for the vectors p, p'. In resemblance to Cannon's algorithm [13], it can be observed that although all processors access the vector p in its entirety, individual components need not be accessed simultaneously, as in the approach described above. Instead, the vector can be distributed between processors, so that each holds q/p components. Computation commences with each processor performing additions of matrix elements associated with its allocated vector components. After the latter have been processed, all processors perform a cyclic shift of vector components, which allows the minimisation operation to progress further. This procedure is repeated until the processors have completed the minimisation operation on their assigned rows. The approach is illustrated in Figure 4.6(b), for which the modified pseudocode is:

    Float[] p
    Float[] p'
    for k := (proc_id * q/p + 1) to ((proc_id + 1) * q/p)
        Float minval := ∞
        for l := 1 to q
            if (l mod q/p) = 1
                cyclic_shift(p)
            if p[(l-1) mod q/p + 1] + M_{i,i-1}^{k,l} < minval
                minval := p[(l-1) mod q/p + 1] + M_{i,i-1}^{k,l}
        p'[(k-1) mod q/p + 1] := minval


Here, the previously defined loop has been adapted to index the components of the distributed vectors. Since the result vector p' becomes an operand in successive iterations of the algorithm, performing a collective operation on p' is not necessary; this vector is thus distributed identically to p.

In Chapter 3, a serial algorithm was presented for the ground state energy of the lattice with cyclic boundary conditions. This involved evaluating the boundaryless ground state energy H_min for all configurations of the boundary subsystems S_1, S_n. To adapt the parallel matrix algorithm for this problem, define the weight function between the nodes α, S_1^k as w(α, S_1^k) = H(S_1^k) + H_b(S_1^k, S_n^l), for boundary subsystem configuration S_n^l. The ground state energy can then be obtained by performing the described series of matrix operations for all configurations of subsystem S_n. For each configuration S_n^k, the final result vector contains the minimum path lengths p_n = [p(α, S_n^1) ... p(α, S_n^k) ... p(α, S_n^q)]^T, of which the relevant component is retained. The ground state energy is the minimum of these retained components. The complexity of the entire computation is O((n/p) 2^{3m}), executed on p processors, for an n-row, m-column lattice. In comparison to the serial algorithm, this is cost optimal.
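A hedged MPI realisation of the gather-based variant might look as follows; weight() is the same assumed accessor as before, q is assumed divisible by the process count, and the code is a sketch rather than the implementation of Chapter 6.

    #include <stdlib.h>
    #include <float.h>
    #include <mpi.h>

    /* Each process computes q/nprocs components of p'; the complete
     * vector is then replicated everywhere with MPI_Allgather. */
    void parallel_minplus_matvec(int q, double (*weight)(int k, int l),
                                 const double *p_prev, double *p_cur,
                                 MPI_Comm comm)
    {
        int rank, nprocs, k, l, rows;
        double *part;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &nprocs);
        rows = q / nprocs;
        part = malloc(rows * sizeof *part);
        for (k = 0; k < rows; k++) {           /* this process's matrix rows */
            double best = DBL_MAX;
            for (l = 0; l < q; l++) {
                double cand = weight(rank * rows + k, l) + p_prev[l];
                if (cand < best)
                    best = cand;
            }
            part[k] = best;
        }
        MPI_Allgather(part, rows, MPI_DOUBLE, p_cur, rows, MPI_DOUBLE, comm);
        free(part);
    }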
4.2.2 Higher-order Markov chain approach

It remains to develop a parallel solution to the approach based on the higher-order Markov chain. For this model, it was formulated that the ground state probability is

    P_viterbi(S_i, S_{i-1}, ..., S_{i-m}) =
        { P(S_i, S_{i-1}, ..., S_{i-m})                                                            i ≤ m
        { max_{S_{i-m-1}} { P(S_i | S_{i-1}, ..., S_{i-m-1}) P_viterbi(S_{i-1}, ..., S_{i-m-1}) }  i > m,

where m is the number of lattice columns. By the relation between state probability and energy, in analogy to the approach based on row-wise lattice decomposition shown in Chapter 3, it was shown that

    H_min(S_i, S_{i-1}, ..., S_{i-m}) =
        { H(S_i, S_{i-1}, ..., S_{i-m})                                                             i ≤ m
        { min_{S_{i-m-1}} { H_b(S_i, (S_{i-1}, ..., S_{i-m-1})) + H_min(S_{i-1}, ..., S_{i-m-1}) }  i > m,

where H(S_i, S_{i-1}, ..., S_{i-m}) is the energy of the ordered set of subsystems (S_i, S_{i-1}, ..., S_{i-m}) and H_b(S_i, (S_{i-1}, ..., S_{i-m-1})) is the interaction energy between system S_i and the ordered set (S_{i-1}, ..., S_{i-m-1}). Examining this optimisation equation, it can be seen that it is monadic, since it contains a single recursive term. As each level of recursion effects a unit decrease
of the indices of the tuple (S_i, S_{i-1}, ..., S_{i-m}), there are no cyclic dependencies between subproblems. The dynamic programming formulation is therefore also serial. Considering this similarity, the opportunity is given to adapt the parallel matrix-based computation to solve this dynamic programming problem. To achieve this, the weighted graph of subproblems is re-established, with an edge connecting two nodes if the recursive formulation indicates a dependency. For an n × m spin lattice, there are (n-1) m 2^{m+1} nodes in the graph, because each tuple (S_i, S_{i-1}, ..., S_{i-m}) has 2^{m+1} configurations and a solution is constructed from (n-1) m subproblems. A given subproblem corresponds to a certain position of the sliding window on the lattice, as described in Chapter 3. The function

    w((S_i, S_{i-1}, ..., S_{i-m}), (S_{i-1}, S_{i-2}, ..., S_{i-m-1})) = H_b(S_i, (S_{i-1}, ..., S_{i-m-1})),

defined for i > m, describes the weight of an edge. As before, the graph is extended with an additional node α, so that the set of edges is defined as E′ = E ∪ {(α, (S_1, S_2, ..., S_{m+1})) | for all configurations of (S_1, ..., S_{m+1})}. For i ≤ m, define the weight function w(α, (S_i, S_{i-1}, ..., S_{i-m})) = H(S_i, S_{i-1}, ..., S_{i-m}). This results in a trellis-like graph, shown in Figure 4.5(b). Minimising system energy is equivalent to obtaining

    min_{(S_{nm}, S_{nm-1}, ..., S_{nm-m})} { p(α, (S_{nm}, S_{nm-1}, ..., S_{nm-m})) },

where the function p is the minimum path between two nodes in the graph.

Previously, matrices of edge weights between trellis segments were used to compute minimum paths, for which the parallel matrix operation was presented. From the optimisation equation and Figure 4.5(b), it is observed that each node at a given level is connected to only two nodes at the preceding level. This is because there are two configurations of the tuple (S_{i-1}, S_{i-2}, ..., S_{i-m-1}) for any specified tuple (S_i, S_{i-1}, ..., S_{i-m}). Assigning infinite weights to unconnected nodes between trellis levels, it follows that the matrices are sparse with regard to infinite-valued elements.

Provided that this sparseness can be exploited, an adaptation of the existing parallel algorithm will execute in t(n, m) ∝ (n-1) m (1/p) 2^{m+1} time on p processors, since each matrix contains 2^{m+1} rows, each with two finite elements, distributed between the processors. With a total of (n-1) m matrix operations, the ground state energy of the lattice without cyclic boundary conditions can be obtained in O((nm/p) 2^m) time. This is cost optimal in comparison to the serial algorithm described in Chapter 3. Using bit string representations of spin tuples in combination with shift operations, an approach which considers matrix sparseness is described in Chapter 6.
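A hedged sketch of how the sparsity might be exploited follows, reusing the window_bond_energy() helper assumed in Chapter 3 and a block distribution of window keys over processes; it illustrates the idea only, and is not necessarily the realisation adopted in Chapter 6.

    double window_bond_energy(int i, unsigned w, unsigned pred); /* assumed */

    /* One level of the higher-order trellis: each process updates its
     * contiguous block of (m+1)-bit window keys, examining only the two
     * finite-weight predecessors per key, at O(2^m / p) cost per process.
     * The vector opt is assumed replicated on every process, e.g. via an
     * all_gather step as in the first-order scheme. */
    void sparse_level(int i, int m, int proc_id, int nprocs,
                      const double *opt, double *opt_new)
    {
        unsigned nkeys = 1u << (m + 1);
        unsigned block = nkeys / nprocs;       /* assumes p divides 2^(m+1) */
        unsigned w, lo = proc_id * block, hi = lo + block;
        for (w = lo; w < hi; w++) {
            unsigned p0 = w >> 1;              /* leaving spin taken as -1 */
            unsigned p1 = (w >> 1) | (1u << m);/* leaving spin taken as +1 */
            double h0 = window_bond_energy(i, w, p0) + opt[p0];
            double h1 = window_bond_energy(i, w, p1) + opt[p1];
            opt_new[w] = h0 < h1 ? h0 : h1;    /* min over S_{i-m-1} */
        }
    }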


Chapter 5

The Project

In previous chapters, the theoretical background to the ground state optimisation problem was described. Having described the two approaches identified for solving this problem, this chapter deals with the practical work undertaken towards their implementation and evaluation.

5.1 Project description

The purpose of the project is to conduct a practical investigation into parallel algorithms for determining ground states of the Ising spin glass. Specifically, the project deals with the two-dimensional Edwards-Anderson model, i.e. the Ising model with lattice-aligned spins, in which spins are able to assume two discrete states.

Investigations deal with a method for obtaining spin glass ground states exactly. The method is based on the transfer matrix method, in which the statistical-mechanical properties of the lattice system are used to obtain solutions. It follows that one project objective is to develop a parallel algorithm based on the transfer matrix method. As an additional objective, the project includes investigating an alternative parallel algorithm, with which solutions to the ground state problem are obtained heuristically. The performance of both parallel algorithms is to be evaluated; in the case of the heuristic, this entails evaluating solution accuracy.

Investigation requires that the algorithms are developed in software. The software should be self-contained: from the user's perspective, it should offer sufficient functionality to be useful as a research tool, allowing various types of problem instance to be solved using the implemented algorithms. The software should be able to execute on a wide range of MIMD multiprocessing architectures.

5.1.1 Available resources

There are two computing resources available for the project. The first of these, Ness, is a shared memory multiprocessor system [2].


It has a total of 32 back-end processors, partitioned into two interconnected groups. This configuration allows a single job to request at most 16 processors. The system is constructed from 64-bit AMD Opteron processors, which have a clock frequency of 2.6GHz. Jobs are submitted to the back-end from a dual-processor front-end, which executes the Sun Grid Engine scheduling system. The back-end has 32 × 2GB of RAM. The system is based on the Linux operating system, providing Fortran, C and Java programming environments. Both shared memory and message passing programming models are supported, using the OpenMP and MPI programming interfaces. Ness does not implement a budget system for CPU time; however, access to queues is restricted according to the amount of requested computation time.

Also available is the supercomputing resource HPCx [3]. This consists of a cluster of IBM P575 shared memory nodes, each containing 16 processors and 32GB of RAM. For executing jobs, the system provides 160 compute nodes. Nodes are constructed from Power5 processors, which have a clock frequency of 1.5GHz. The processor architecture allows for a theoretical peak performance of 6.0Gflop/s. Inter-node communication is supported using IBM High Performance Switch interconnects. These provide a maximum unidirectional inter-node bandwidth of 2GB/s, at MPI latencies of 4–6µs [24]. Based on the AIX operating system, the serial and parallel programming environments are similar to those provided on Ness. The job scheduler, LoadLeveler, provides queues for serial and parallel jobs, using a budget system for CPU time.

5.2 Project preparation

Before commencing the project, an initial phase was designated for project preparation. This consisted of investigating the problem background and defining the project's aims. Potential approaches to solving the spin glass problem were identified and implemented as prototype software. Project process activities were carried out, consisting of a risk analysis and scheduling. A software development model was decided upon.

5.2.1 Initial investigations

Access to an existing serial transfer matrix code was provided before commencing the project preparation phase. This gave the potential for a code-level analysis of parallelism; this approach was considered an alternative to basing an implementation on the mathematical formulation of the optimisation problem, which was subsequently undertaken. With a view to implementing the parallel approach described by Grama et al. [30], initial work consisted of investigating the exact optimisation technique described in Chapter 3.

The harmony search algorithm was identified as a potential secondary approach, to compare with the envisaged exact ground state solver. After initialising a CVS repository for project source code and experiment data, a serial implementation of the heuristic was evaluated, in order to assess the algorithm's suitability for further parallelisation.


The evaluation consisted of determining solution accuracy, based on ground states obtained for a collection of random spin glasses using an implementation of a brute force algorithm. As discussed in Chapter 7, the results suggest that solution accuracy might be increased using a parallel implementation of the algorithm.

5.2.2 Design and implementation

A basic software framework was developed to facilitate the collation of performance data. This framework consisted of a set of utilities implementing rudimentary functionality for creating spin glass problem instances and evaluating their energy. Based on this, a design for a more extensive framework was created, supporting the following client operations on a spin glass API:

• Initialisation of spin lattices with specific boundary conditions
• Destruction of spin lattices
• Calculation of system energy
• Bond randomisation

Also, a spin glass data structure was designed. Shown in Figure 5.1, this consists of instance variables storing the height and width of the spin lattice. The values of the spins themselves are stored in an associative, array-like data structure, as are the values of the coupling constants. The former are stored two-dimensionally in row-major fashion, while the latter require an additional dimension: in the design, two two-dimensional arrays store vertical and horizontal bonds, again using a row-major storage scheme. To record whether a spin is clamped to a specific state, the data structure includes a further array. Finally, the initial values of spins are stored; this records the actual state to which a spin is clamped, allowing the primary spin array to be reserved for computation.

[Figure 5.1: Spin glass structure design (spinglass.h): int xSize; int ySize; Spin[] spins; double[] weights; boolean[] clamps; Spin[] initialSpins]
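In C, the design of Figure 5.1 might be rendered as follows; the representation of the Spin type and the use of int for the boolean clamp flags are assumptions of this sketch, since ANSI C 89 provides no dedicated boolean type.

    typedef signed char Spin;     /* assumed representation of a +1/-1 spin */

    typedef struct {
        int     xSize, ySize;     /* lattice width and height */
        Spin   *spins;            /* current spin values, row major */
        double *weights;          /* vertical and horizontal bonds, row major */
        int    *clamps;           /* non-zero where a spin is clamped */
        Spin   *initialSpins;     /* states to which clamped spins are held */
    } SpinGlass;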


A schema of the framework is shown in Figure 5.2. This includes an interface for performing input/output operations: it allows representations of coupling constants to be read from files; similarly, a function allows the clamping state of spins to be read. These operations are complemented by functionality for writing representations to file.

[Figure 5.2: Software framework design: an IO interface (readBonds, writeBonds, readClamps, writeClamps), a SpinGlass API (spinGlass_new, spinGlass_remove, spinGlass_energy), and Solver clients such as a transfer matrix solver, together with the writebonds and writeclamps utilities]

The IO operations are required by the two utilities writebonds and writeclamps, which facilitate the creation of spin glass problem instances. These are responsible for writing data to files, which are subsequently read by the solver utilities. The format of clamping state files is specified as a UNIX UTF-8 encoded text file containing the symbols '1' and '0'. These provide a representation of whether a spin is clamped, such that a string encodes the state of a lattice row. Strings consist of the aforementioned symbols, separated by whitespace. Spin clamps are stored in the file as consecutive strings, separated by line feed characters. The file format for spin coupling constants is similar: here, the symbols are floating point numbers in decimal notation, again separated by whitespace and line feed characters. The format reflects the design of the spin glass data structure, in that two consecutive blocks retain the values of vertical and horizontal bonds. The format specifies that these blocks are separated by a single blank line.

Figure 5.2 also shows the design of the spin glass API. This exports functionality to client solvers, which themselves implement a simple interface for solving spin glass instances. A solver uses the IO interface to construct a spin glass instance from bond and clamp state files. Thereafter, it invokes its implementation of a ground state algorithm, which utilises further API operations to evaluate spin glass energy. Finally, the spin glass instance is destroyed, after an output of the determined solution has been generated.
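As an illustration of the bond file format, a hypothetical instance with ±1 couplings might be encoded as follows, the first block holding vertical bonds and the second horizontal bonds; the block dimensions depend on the lattice size and boundary conditions, and the values shown are illustrative only.

    1.0 -1.0
    -1.0 1.0

    -1.0 1.0
    1.0 -1.0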


Figure 5.2 also shows the design of the spin glass API. This exports functionality to client solvers, which themselves implement a simple interface for solving spin glass instances. A solver uses the IO interface to construct a spin glass instance from bond and clamp state files. Thereafter, it invokes its implementation of a ground state algorithm. The latter utilises further API operations to evaluate spin glass energy. Finally, the spin glass instance is destroyed, after an output of the determined solution has been generated.

5.2.3 Implementation language and tools

During the course of software design, the choice of implementation language and tools was considered. The C language was selected due to its widespread use as a development language on high performance systems, and the availability of compilers both on the two computation resources and on development machines. To ensure portability, ANSI C 89 was selected as the implementation standard.

To expedite software development, it was decided to implement the software using the GLib library [1]. This is a cross-platform collection of utility functions which implement general purpose data structures, parsers etc. Macros and type definitions are provided which potentially reduce the number of required pointer casts in a code. This in turn reduces cast errors and debugging time.

A build management system, the GNU autotools suite, was also selected. Widely used in conjunction with the C and C++ programming languages on UNIX based systems, it allows makefiles to be generated semi-automatically and configured for different target systems. This was considered useful for providing an application package for a variety of systems.

Given that one of the available computing resources, HPCx, is a clustered system, the MPI message passing library was chosen for parallel development. For this reason, the algorithms described in Chapter 4 are given for the message passing model. Although a hybrid shared memory/message passing approach using MPI and e.g. OpenMP would have been possible, this was considered beyond the scope of the project.

5.2.4 Choice of development model

For the choice of software development model, multiple factors were taken into account. These included the amount of time available, the required functionality and the overall software complexity. Intuitively, implementation can be realised in two phases, each relating to one of the two algorithms. From previous experience and design requirements, it was assumed that each of the implementation tasks would involve a relatively small amount of written code. Instead, implementation effort was assumed to focus on the distribution of data, communication patterns and algorithm correctness. Therefore, it was thought that the approach of applying staged delivery to each phase would be advantageous to the project. Following the design of the framework's overall architecture with multiple ground state solvers, this approach involves discrete design/implementation/testing activities associated with one release for each ground state solver. Developing each ground state solver is associated with iteratively augmenting software functionality.

5.2.5 Project schedule

The devised project schedule is shown in Appendix A. Based on an available time frame of 16 weeks, the schedule accounts for all project deliverables, implementation goals and exploratory aims. Therefore both a practical component, consisting of software development and evaluation, and the project report and presentation are included.


Risk                           Type                      Impact    Likelihood  Action
Data loss                      Schedule                  High      Low         Avoid
Lack of time                   Schedule, Scope           High      Moderate    Reduce
Unavailable testing resources  Schedule, Quality, Scope  High      Low         Avoid
Algorithmic complexity         Scope, Schedule           Moderate  Moderate    Avoid

Table 5.1: Identified project risks

The practical component is split into two distinct phases. Each of these corresponds to the development and evaluation of the dynamic programming and harmony search based ground state solvers. A development/evaluation iteration is comprised of tasks for designing, implementing, debugging and testing software, before gathering performance data. Following development and evaluation, tasks are specified for producing the report and presentation. A single week is left unallocated for making amendments to the produced work.

The implementation, debugging and testing tasks required for software development are scheduled in parallel, as it was thought that this best reflects the nature of the chosen development model, where functionality is integrated iteratively. Evaluation tasks are interleaved with software development, so as to minimise the effects of unavailable resources, should these have occurred.

5.2.6 Risk analysis

To assess the chance of the project's successful completion, potentially detrimental factors were considered. Such factors include those affecting the project plan and scheduling, software quality and software scope. Table 5.1 lists risks identified during project preparation by type, estimated impact, likelihood of occurring and proposed action.

Judging from the product of impact and likelihood of occurrence, the most significant risk is lack of time. As the time frame for completing the project and required deliverables was short, this was conceivable. To counteract this, care was taken to define project goals rigorously to avoid feature creep; furthermore, all tasks were scheduled within a 15 week time frame, allowing for a further week as float time.

The remaining risks were avoided by ensuring sufficient computing time on parallel machines (pertaining to unavailable resources), backups and software version control (pertaining to data loss) and sufficient background research (pertaining to the sophistication of algorithms). As a fallback action in the event of not being able to implement the researched transfer matrix scheme, the possibility remained of performing a code-level analysis of an existing serial transfer matrix solver code. As a caveat, this approach would have offered less insight into the underpinnings of parallelism in the transfer matrix method.


5.2.7 Changes to project schedule

A number of changes were made to the project schedule. These concerned both the order of scheduled tasks and their estimated duration.

Most significantly, developing the parallel harmony search solver proved to require less time than envisaged in the project schedule; it claimed only two schedule weeks in comparison to the four weeks assigned during preparation. As a result, it was possible to implement a more advanced exact parallel solver, as previously described.

Also, the original decision to designate performance evaluation to a single task for each of the two solver types proved impractical. Instead, data were gathered separately for each computing resource, with subtasks for each variant of the exact solver. Separating evaluation between machines was prompted by the fact that implementing experiments on HPCx was delayed, due to compilation issues with the required version of the GLib library.

Furthermore, after devising the original project schedule, the communicated date for the presentation proved to be after the date for the remaining deliverables. The time gained was allocated to completing the project report.

5.2.8 Overview of project tasks

The following provides a description of tasks undertaken during the project, as an account of the extent to which the project schedule was adhered to.

In weeks 1 and 2, the ideas presented in Chapter 3 were developed into a basic serial exact ground state solver code. The parallelisation method using collective operations, discussed in Chapter 4, was also implemented. In both cases, the algorithms were based on the spin lattice without boundary conditions.

In week 2, timing data were collected for the previously implemented serial solver. In addition, scaling data for the parallel solver were collected on the Ness computing resource. Work commenced on implementing the improved parallel ground state solver using cyclic communication patterns, also described in Chapter 4. The improved parallel ground state solver was completed in week 3. In week 4, further scaling performance data were collected on Ness for this code. The remaining time in week 4 was used to conduct a code review, based on the entirety of the implemented software.

In week 5, work commenced on developing the harmony search ground state solver. Both serial and parallel code was completed in week 6, during which the dynamic programming code was modified to support solving systems with cyclic boundary conditions. In week 6, performance data for the dynamic programming code were collected on the HPCx machine.

In week 7, further performance data were gathered on HPCx. This was to evaluate the dynamic programming code with cyclic communication patterns. Also, routines were developed for evaluating harmony search performance, which was subsequently evaluated in week 8.


In weeks 9 and 10, a further modification to the exactly solving dynamic programming approach was implemented, based on the higher-order Markov chain theory described in Chapter 3. This was for the spin glass model without cyclic boundary conditions. In week 10, performance data were gathered for this algorithm.

The remaining time was used to complete the project report and perform a final revision of all deliverables.


Chapter 6

Software Implementation

6.1 Introduction

The implemented software is a framework for experimenting with two-dimensional lattice spin glass ground state problems. It consists of utilities which assist with generating spin glass instances, which may subsequently be solved using either exact or heuristic based solver utilities. The latter provide information on both the energy and the spin configuration of ground states. While aimed primarily at generating solutions using parallel algorithms, it is also possible to reconfigure the software to use serial computation only.

The software is implemented in the C programming language. The GNU C compiler was used on the development system. To increase C90 standard conformity, the compiler flags -ansi -pedantic were used. Development took place predominantly on a 32 bit single processor Linux system, on which gcc 4.1.2 and gdb 6.6 were installed. The MPI implementation was MPICH2, version 1.0.6. To assist with debugging, the Valgrind suite was used to check for memory leaks.

The version control system CVS was used extensively during implementation. Based on a central repository stored on the Ness machine, version control was used as a means of retrieving the entire code base and synchronising code modifications between machines.

The build management system used for the software is the GNU autotools suite. This is used to automatically configure the software prior to compiling it on the target architecture. Instructions on how this can be achieved are given in Appendix E.

In the following, an overview of the software framework is given.

6.2 Implementation overview

From the user's perspective, the framework consists of a set of binary executables. These are:

• genbonds


• genclamps
• sbforce
• dpsolver
• dpsolverfast
• hmsolver

The two utilities genbonds and genclamps are used to generate random coupling constants and to specify the clamping state of spins in the lattice, respectively. As implemented, the utilities produce character based representations as described in the design in Chapter 5. The utilities write to the standard output. Using UNIX shell redirection, this output can be stored in files, in preparation for invoking a ground state solver on the data. Using these utilities therefore facilitates creating instance data. Both genbonds and genclamps use standard command line options for specifying spin lattice dimensions and related parameters. For example, lattice dimensions are specified using --xSize=x --ySize=y, for a system with x rows and y columns.
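For example, a hypothetical session creating and solving an 8 × 8 instance might read as follows (file names are illustrative, and the solver's option names for the bond and clamp files are shown schematically; the actual flags are those documented with each utility):

    genbonds --xSize=8 --ySize=8 > bonds.dat
    genclamps --xSize=8 --ySize=8 > clamps.dat
    dpsolver --bonds=bonds.dat --clamps=clamps.dat > solution.dat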


The remaining executables correspond to implementations of algorithms described in Chapters 3 and 4: for testing purposes, the sbforce utility implements a simple exhaustive search, hmsolver the harmony search algorithm in its parallel realisation. Similarly, dpsolver and dpsolverfast provide exact solvers based on the dynamic programming approaches. As before, all of these executables use command line parameters for specifying options. In this case, the most significant parameters are those for specifying bond and clamp configuration files. These utilities write solutions to standard output.

From the perspective of implementation, the software is constructed using a modular approach. Based on the design described in the previous chapter, there exist various library modules, which provide functionality such as IO and spin glass manipulation. These are utilised by client modules, which include the implementations of ground state solvers. By means of C headers, client modules are able to reference APIs. API implementations are used to generate separate binary executables through the linking process.

Appendix B includes a UML class schema of the relationships between source code modules and headers. As shown, source code modules reference various headers, in which arrays.h, gstatefinder.h, io.h, random.h and spinglass.h are defined. Their purpose is as follows:

• arrays.h Specifies multidimensional array operations
• gstatefinder.h Specifies the interface to be implemented by ground state solvers
• io.h Defines IO operations
• random.h Defines randomisation functions
• spinglass.h Defines the spin glass data structure and operations

As shown in Figure B.1, multidimensional arrays are used by the dynamic programming based solvers, as befits the algorithms' requirements for associative data structures. The IO header is used by module main.c, which implements an entry point for all executables. Furthermore, gstatefinder.h is included by main.c, bforce_gstate_finder.c, dp_gstate_finder.c and harmony_gstate_finder.c, the latter three implementing exhaustive search, dynamic programming and harmony search, respectively. Whereas dp_gstate_finder.c implements the basic exact optimisation algorithm described in Chapter 3, a further module dp_gstate_finder_fast.c provides an implementation of the improved dynamic programming algorithm, described in the same chapter.

6.3 Source code structure

Following the description of source module and header purposes, the following provides a more detailed description of the implementation. This is given at function level for a selection of the code base, to illustrate core functionality.

6.3.1 Library functionality

arrays.h

As previously mentioned, the implementation of the exactly solving algorithm requires access to multidimensional arrays. Given the restriction in C to defining single-dimensional dynamic arrays, next to using static arrays, it is necessary to use pointer arithmetic and casts to implement multidimensional arrays. Confining the implementation to source module arrays.c, functions are provided for constructing and destroying arrays in two and three dimensions of arbitrary size. Returning pointer types, the constructor functions allow data elements to be accessed using conventional array syntax, while preserving memory contiguity. These functions are invoked repeatedly by dp_gstate_finder.c and dp_gstate_finder_fast.c. While a less involved approach might have offered increased performance, implementing the dynamic programming algorithm otherwise was considered too cumbersome, given the allocated time for software development.

As an alternative, the header defines macros which emulate a multidimensional array, based on performing arithmetic on a single pointer. Although syntactically less convenient, this approach requires fewer dereferencing operations to access an element. For performance reasons, the approach is utilised by the spin glass library functions in spinglass.c.
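The following sketch illustrates both techniques (function and macro names are assumed here; the actual interface of arrays.h may differ). The constructor builds a row-pointer table over one contiguous block, so a[i][j] syntax works while the data remain contiguous; the macro variant indexes a flat block directly:

    /* Sketch of a contiguous 2D array constructor (hypothetical names;
       error checking omitted for brevity) */
    #include <stdlib.h>

    double **array2d_new(int rows, int cols)
    {
        int i;
        double **a = malloc(rows * sizeof(double *));
        double *block = malloc((size_t)rows * cols * sizeof(double));
        for (i = 0; i < rows; i++)
            a[i] = block + (size_t)i * cols;  /* a[i][j] addresses contiguous data */
        return a;
    }

    void array2d_free(double **a)
    {
        free(a[0]);  /* frees the contiguous data block */
        free(a);     /* frees the row-pointer table */
    }

    /* Macro alternative: one dereference instead of two */
    #define ARRAY2(a, i, j, cols) ((a)[(size_t)(i) * (cols) + (j)])

The macro form corresponds to the single-pointer access scheme used by the spin glass functions in spinglass.c.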


io.h

Header io.h defines six functions, responsible for reading and writing files containing representations of spin state, clamping state and coupling constants. The three functions responsible for reading from file are of the form *read(char *fileName, int *xSize, int *ySize). Of the parameters, which are all passed by reference, the value of fileName is read upon invoking the function, whereas xSize and ySize retain the spin lattice dimensions after the function call has completed. The function returns a pointer to the state data read from file.

Complementary functions for writing to file are of the form write(struct SpinGlass *spinGlass, char *fileName). Here, the parameters consist of a pointer to an instance of the spin glass abstract data type (described in the previous chapter), and the name of the file to write to. The function return type is void.

The file-reading functions in io.c are implemented using a single static function, GQueue *parse_file(). As the name suggests, this provides simple parsing capabilities, using a loop to iterate through string tokens obtained from the standard library function strtok(). Recording and verifying the counts of symbols on each line, tokens are added to a queue. This queue is returned by the function. Dequeuing the elements stored in the queue, the aforementioned reading functions then construct data structures representing spin glass parameters.

spinglass.h

The spin glass data structure is defined in header spinglass.h. Using a C struct type, the following fields are defined:

    struct SpinGlass {
        gint xSize;
        gint ySize;
        Spin *spins;
        gdouble *weights;
        gboolean *clamps;
        Spin *initialSpins;
    };

As given by the design description in Chapter 5, the structure specifies variables* for storing the lattice dimensions. An enumeration type defines the Spin type; the pointer field spins is used to reference a memory block storing the state of spins. The enumeration defines the integer states UP=1 and DOWN=-1. Spins' states are stored using a row-major scheme. This matches the access method using a single pointer, defined in arrays.h. Coupling constants, clamping states and the field initialSpins store states similarly. The latter field provides an account of spin state distinct from the field spins, the latter storing the state of spins while performing optimisation. Using two separate fields allows lattice configurations to be compared before and after optimisation.

* GLib specifies wrappers for standard C types; motivation for their use is discussed in the GLib documentation [1].
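To make the row-major scheme concrete, the spin at row i and column j would be addressed as in the following sketch (the accessor name is hypothetical; it assumes, per the genbonds convention above, that xSize counts rows and ySize counts columns):

    /* Hypothetical accessor illustrating the row-major storage scheme */
    Spin spin_at(const struct SpinGlass *sg, gint i, gint j)
    {
        return sg->spins[i * sg->ySize + j];
    }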


Figure 6.1: Functions provided by spinglass.c: (a) row_energy(), (b) interrow_energy(), (c) ensemble_delta()

Header functions in spinglass.h are grouped into four categories, associated with allocating memory for the data type, computing lattice energy, writing lattice properties to file, and miscellaneous activities. All functions operate on the spin glass data structure, which is passed by reference from a caller function.

The purpose of the memory related functions is as described in the design: these ensure that the spin glass structure is initialised and terminated correctly. The constructor function is of the form *spinglass_alloc(gint xSize, gint ySize, Spin *initialSpins, gdouble *weights, gboolean *clamps); it requires as parameters the lattice dimensions, initial spin configuration, coupling constants, and clamping states. The function returns a pointer to a newly allocated data structure (fields are assigned according to the supplied parameters). To assist in freeing memory after use, the function spinglass_free() is implemented.

Lattice energy is computed using a collection of five functions. The simplest of these is defined as spinglass_energy(struct SpinGlass *spinGlass), which returns as a floating point number the energy arising from all interactions in the lattice. For convenience, spinglass(struct SpinGlass *spinGlass, Spin *conf) returns the energy due to the coupling constants specified in *spinGlass, however the configuration is given as a separate array *conf. A comparison between the remaining three energy calculating functions is given in Figure 6.1: the spinglass_row_energy() function determines the energy of a spin row (considering horizontal bonds), whereas interrow_energy() uses vertical bonds to calculate the interaction energy between adjacent rows. With ensemble_delta(), the energetic contribution between a single spin and its predecessors in the horizontal and vertical dimensions is calculated.
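As a sketch of what the full-lattice energy evaluation involves (not the project's actual code: the bond layout, with the vertical block preceding the horizontal block following the storage design of Chapter 5, and the sign convention E = -Σ J_ij s_i s_j are assumptions here):

    /* Sketch of nearest-neighbour energy evaluation for a lattice without
       cyclic boundaries (assumed layout: vertical bonds first, then
       horizontal bonds, both row-major; assumed convention E = -sum J*s*s) */
    gdouble energy_sketch(const struct SpinGlass *sg)
    {
        gint i, j;
        gint rows = sg->xSize, cols = sg->ySize;
        const gdouble *v = sg->weights;                      /* (rows-1) x cols */
        const gdouble *h = sg->weights + (rows - 1) * cols;  /* rows x (cols-1) */
        gdouble e = 0.0;

        for (i = 0; i + 1 < rows; i++)        /* vertical bonds */
            for (j = 0; j < cols; j++)
                e -= v[i * cols + j]
                     * sg->spins[i * cols + j] * sg->spins[(i + 1) * cols + j];

        for (i = 0; i < rows; i++)            /* horizontal bonds */
            for (j = 0; j + 1 < cols; j++)
                e -= h[i * (cols - 1) + j]
                     * sg->spins[i * cols + j] * sg->spins[i * cols + j + 1];

        return e;
    }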


The file output functions in spinglass.c are used to implement the output functions in io.c. The functions are of the form write(struct SpinGlass *spinglass, FILE *file), i.e. the arguments include a pointer to a spin glass structure and a file pointer. If required, this allows spin glass properties to be easily echoed to screen, using the file pointer stdout.

Finally, miscellaneous functions include get_random_spins() (used to generate random spin configurations, while considering spin clamping state), has_vertical_boundary() (used to determine whether cyclic boundary interactions are present along the lattice's vertical dimension), and correlate(). The latter is used to compare spin configurations between spin glass structures, in terms of differing spin state.

6.3.2 Client functionality

Having described the library functionality provided by the software, attention is now given to the code modules utilising this functionality. These include the entry point module main.c, and more importantly, the modules implementing the optimisation algorithms. Note that the code base includes additional modules for the utilities genbonds and genclamps. These do not make use of library functions; as their implementation is trivial, they are not considered in further detail. The source code for all algorithms is provided in Appendix F.

main.c

Module main.c uses the standard argument processing library provided by GLib to implement execution parameter parsing for the solver utilities. This requires a number of auxiliary data types and structures, which are defined as static global and local variables in the module's main() function. The latter is responsible for reading file name arguments associated with specific flags, describing the locations of the coupling constant and clamping state files. Also, a file describing a spin configuration to compare the solution to may be specified.

After parsing program arguments, the presence of required and optional parameters is verified. A local function init() then initialises a spin glass data structure, using the previously described function spinglass_alloc(). Optimisation is then initiated by invoking the header defined function find_ground_states(). After the solution has been obtained, spinglass_correlate() performs a comparison, should the related flag have been specified. After deallocating the data structure, init() and main() terminate. Since each optimisation algorithm implements find_ground_states() in its own module and links with main.c, the main() function is provided by the same module for all utilities. This promotes code reuse and facilitates extending the code base with new algorithms.

bforce_gstate_finder.c

To generate ground truth data for testing purposes, module bforce_gstate_finder.c implements a brute force ground state solver. The solver is based on an infix traversal of state space. This is achieved using a function find_ground_states(), which is called recursively. A conditional statement restricts recursion depth, based on a variable whose value represents the position of a window on the spin lattice. For each invocation of the function, the state of the spin under the window is flipped. Before and after flipping the spin state, recursive calls are performed, in each case advancing the window by one spin. The base case effects evaluation of system energy. If the system energy is found to be lower than the recorded minimum, energy and configuration are output before updating the minimum. Since the search is exhaustive, the ground state configuration is eventually output.
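In outline, the recursion might look like the following sketch (simplified: the names are hypothetical, and the real module works through the SpinGlass structure and outputs configurations as described above):

    /* Sketch of exhaustive enumeration by recursive spin flipping */
    static gdouble minEnergy = G_MAXDOUBLE;

    static void enumerate(struct SpinGlass *sg, gint pos, gint nSpins)
    {
        if (pos == nSpins) {               /* base case: evaluate energy */
            gdouble e = spinglass_energy(sg);
            if (e < minEnergy) {
                minEnergy = e;             /* record/output new minimum */
            }
            return;
        }
        enumerate(sg, pos + 1, nSpins);    /* spin at pos in current state */
        sg->spins[pos] = -sg->spins[pos];  /* flip spin under the window */
        enumerate(sg, pos + 1, nSpins);    /* spin at pos in flipped state */
    }

Since both recursive calls are made for every window position, all 2^N configurations of an N-spin lattice are eventually visited.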


harmony_gstate_finder.c

The serial and parallel harmony search algorithms were described in Chapters 3 and 4. The serial algorithm consists of initial random solution generation (characterised by the parameter NVECTORS), followed by an iterative process in which low-utility solutions are replaced. Replacement is based on combining the components of stored solutions, using randomisation. The latter is controlled by the memory choosing rate parameter. The parallelisation strategy involves a collection of harmony search processes which exchange solutions between each other, using a hierarchical system of nearest-neighbour and collective communication patterns.

Excepting the number of processes, module harmony_gstate_finder.c defines all parameters controlling the behaviour of harmony search using preprocessor directives. These parameters include the number of solutions held by a process (NVECTORS), the memory choosing rate, the number of iterations before performing a collective communication operation, and the size of the subgroups involved in collective communications.

In addition to the module's entry function find_ground_states(), the implementation consists of seven static functions, responsible for initialising and finalising message passing communications, collectively evaluating solution energy, and verifying the algorithm's state of convergence.

When the entry function is invoked, the implementation begins by allocating memory for a single solution vector *neighbourSpins, which is used to store data from nearest-neighbour ring communications. After initialising communications, solution vectors are generated randomly and assigned to the elements of an array Spin *spins[NVECTORS]. The latter is the collection of solution vectors used during the heuristic process. The actual heuristic consists of a loop executed directly after the aforementioned solution generation, which is of the form:

    for (i = 1; get_stabilised_status() == FALSE; i++) {
        /* Create new vector */

        /* Compute highest energy vector */

        /* Set vector components */

        /* Replace vector in memory, if new vector is of higher fitness */

        /* Perform communication operations */
    }
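The "set vector components" step follows the recombination rule described above; a sketch (with hypothetical variable names, and the memory choosing rate assumed to be a preprocessor constant) might read:

    /* Sketch: each component is copied from a randomly chosen stored
       vector with probability MEMORY_CHOOSING_RATE, else randomised */
    for (j = 0; j < xSize * ySize; j++) {
        if (g_random_double() < MEMORY_CHOOSING_RATE)
            newSpins[j] = spins[g_random_int_range(0, NVECTORS)][j];
        else
            newSpins[j] = g_random_boolean() ? UP : DOWN;
    }

Clamped spins would presumably be held at their initial state in addition, per the clamping scheme described in Chapter 5.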


As shown in the loop skeleton, the loop's execution is controlled by get_stabilised_status(), responsible for evaluating the state of convergence. Within the loop body, memory for a new solution vector is allocated; like all other solution vectors, the memory block consists of xSize × ySize elements of type Spin, where xSize × ySize are the dimensions of the spin lattice. After determining the solution vector with the highest energy, the values of the new solution vector's components are set from existing vectors, according to the algorithm described in Chapter 3. Following this, the new solution's energy is determined. The highest energy solution is replaced, if comparison yields that the new solution's energy is lower. Communication routines are then executed, after which the process begins anew.

The hierarchical communication scheme is implemented using two separate conditional statements, responsible for performing nearest-neighbour ring communications and collective operations:

    1  if (Solver_ProcID % ZONE_SIZE == 0) {
    2      gint random = g_random_int_range(0, NVECTORS);
    3      MPI_Sendrecv(spins[random], 1, Type_Array,
                        (Solver_ProcID + ZONE_SIZE) % Solver_NProcs, 0,
                        neighbourSpins, 1, Type_Array, MPI_ANY_SOURCE,
                        MPI_ANY_TAG, COMM, MPI_STATUS_IGNORE);
    4      reduction_function(neighbourSpins, spins[random], NULL, NULL);
    5  }
    6
    7  if (i % ZONEEXBLOCK == 0) {
    8      reduce_minimal_spin_vector(spins[maxVector], Solver_Zone);
    9  }

The exchange begins by processes selecting solutions at random (line 2) and sending them to their neighbours. Ring communication is performed using the send/receive operation in line 3, where each process with ID Solver_ProcID sends to the process with ID ((Solver_ProcID + ZONE_SIZE) mod Solver_NProcs). Here, Solver_NProcs is the total number of processes and ZONE_SIZE is the number of processes in a subgroup. In this way, ZONE_SIZE controls the number of processes involved in ring communications. Every random solution is received into the memory block referenced by *neighbourSpins. Whether this is committed to a process' solution set spins[] depends on the result of applying reduction_function(). The latter performs identically to the copy_min() function in Chapter 4, copying the energetically minimal argument to its complement. Consequentially, line 4 is responsible for accepting or rejecting solutions received in the ring exchange operation. Line 7 performs the aforementioned collective operation; this involves each subgroup performing a reduction on their least favourable solutions, using the communicator Solver_Zone. The communicator refers to all processes in a subgroup, based on the instruction

    MPI_Comm_split(COMM, Solver_ProcID / ZONE_SIZE, 0, &Solver_Zone);

which partitions the set of all processes, such that processes with equal Solver_ProcID / ZONE_SIZE share the same subgroup.


The function reduce_minimal_spin_vector() is itself based on the MPI_Allreduce() operation, using reduction_function() as a custom reduction operator. The frequency of reduction is controlled by the value of the constant ZONEEXBLOCK (cf. line 7 above).

After the optimisation loop has terminated, the function find_ground_states() performs a number of operations to finalise the optimisation, such as determining the most favourable solution held hitherto in the solution set among processes. The obtained configuration data are copied to the spins field of the spin glass data structure, and the solution is output by invoking the function spinglass_write_spins(). Memory for storing solution vectors is deallocated, following which MPI communications are terminated.

To complete the description of the harmony search module, it remains to detail the function which controls the heuristic's termination, get_stabilised_status(). Like the collective operation used for exchanging solutions between processes, this is based on reduction operations, used to determine whether the most favourable solutions held by processes have equal energy. This is achieved with the instructions

    compute_lowest_energy(&minEnergy, &minVector);
    MPI_Allreduce(&minEnergy, &globalMinEnergy, 1, MPI_DOUBLE, MPI_MIN, COMM);
    if (minEnergy == globalMinEnergy) localHasOptimum = TRUE;
    MPI_Allreduce(&localHasOptimum, &allHaveOptimum, 1, MPI_INT, MPI_LAND, COMM);

the first of which determines the lowest energy locally, the second the lowest energy globally, followed by a further reduction to determine whether all processes possess solutions with energies corresponding to that of the globally most favourable solution. This implements the termination condition described in Chapter 4.

dp_gstate_finder.c

In Chapter 3, it was established that the ground state energy of the Ising spin glass can be obtained using an algorithm consisting of nested loops. Based on formulating ground state energy as a dynamic programming problem, approaches to parallelisation inspired by those used for matrix/vector multiplication were presented in Chapter 4. The basic O(nm·2^(2m)) time serial algorithm for computing the ground state energy of the lattice without cyclic boundary conditions leads to two parallel variants, using a collective communication operation between processes, or alternatively a cyclic shift operation. The latter was shown to be more memory efficient. To account for cyclic boundary conditions in more than one dimension, the algorithm is required to be executed for all configurations of an arbitrary spin row (cf. Theorem 3.3). In the collective variant, the basic algorithm for systems without cyclic boundary conditions is given by the following pseudocode:


    Float[] p
    Float[] p′

    for k := (proc_id · q/p + 1) to ((proc_id + 1) · q/p)
        Float minval := ∞
        for l := 1 to q
            if p[l] + M^(i,i−1)[k,l] < minval
                minval := p[l] + M^(i,i−1)[k,l]
        p′[k] := minval
    all_gather(p′, p′)

which is executed n times for an n × m spin lattice, using vector p′ as argument p in successive iterations of the algorithm, and matrices M^(i,i−1) to store the interaction energies between configurations of spin rows i, i−1. The latter are evaluated in the ith iteration of the algorithm. The all_gather() operation combines the vector distributed among the p processors into a single vector. Upon termination, vector p′ contains ground state energies for all configurations of the nth spin row, from which the ground state energy can be obtained for the entire lattice by determining the minimum vector component.

As described, the algorithm is capable only of computing ground state energy; implicit information on the actual ground state configuration is discarded. To enable this information to be computed, it is necessary to retain at each iteration of the algorithm the value of l yielding the assignment p′[k] := minval, for all values of k. This corresponds to retaining the optimal configuration of row i − 1 for each of the q configurations of row i, with 1 < i ≤ n. This requires a two-dimensional array.

Module dp_gstate_finder.c implements the basic dynamic programming algorithm, suited for both serial and parallel execution. Both parallel variants, based on collective and cyclic shift operations, are implemented. To promote code reuse, this is achieved by using preprocessor directives for conditional compilation.

Similar to the implementation of harmony search, in addition to the entry function find_ground_states(), the module consists of six static functions. These are responsible for initialising and finalising message passing, computing ground state energy, manipulating spin rows and applying the obtained ground state configuration to the spin glass data structure.

Given the parallel algorithm in either of its variants, a problem the implementation must address is how to distribute the set of configurations a spin row may assume among processes. This amounts to distributing the rows of matrices M^(i,i−1) among processes, where each matrix row accounts for a unique configuration of spin row i. As spins assume binary state, a simple approach is to represent spin subsystems as bit strings, e.g. assigning spin values +1 ↦ 1, −1 ↦ 0. Exploiting the fact that processes are addressed using integer numbers in MPI, the bit string representation can be split into a prefix and a suffix, where the prefix is given by the process number. For an m-spin subsystem and p processors, prefixes consist of ⌈log₂ p⌉ bits, suffixes of m − ⌈log₂ p⌉ bits. Providing the number of processes is a power of 2, it is possible to enumerate all possible spin configurations by each process considering its process number prefix, and all suffixes 0 ≤ k < 2^(m − log₂ p). This is the approach implemented in dp_gstate_finder.c.
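In code, the enumeration might look like the following sketch (variable names are hypothetical; prefixBits corresponds to ⌈log₂ p⌉):

    /* Sketch: process proc_id enumerates all row configurations whose
       high-order prefixBits equal its MPI rank */
    guint suffixBits = m - prefixBits;
    guint k, config;
    for (k = 0; k < (1u << suffixBits); k++) {
        config = ((guint)proc_id << suffixBits) | k;
        /* adjust_spin_row() maps the bit string 'config' onto spin row i */
    }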


When find_ground_states() is invoked, the implementation begins by initialising message passing, following which the function get_minimum_path() is invoked. This is responsible for initiating a series of further function calls, based on a loop which iterates through each row in the lattice. After allocating memory for an array *minPath, get_minimum_path() allocates **minPathConf, the two-dimensional array used to record optimal subsystem configurations. The aforementioned loop then commences; for each spin row i, the function

    get_optimal_prestates(spinGlass, minPathPartial, minPathConf[i], i, trellisCols, 0);

is invoked, which performs the parallel matrix/vector operation previously described in pseudocode. The arguments are the spin glass data structure to optimise, a memory block corresponding to vector p, the matrix row to hold the optimal states of row i−1, the current spin row, and the total number of elements in p. The final argument is used to enforce a particular configuration of the final spin row. In the absence of cyclic boundary conditions, its value is not significant. Should the spin glass indeed possess cyclic boundary conditions, the loop over spin rows is repeated for all configurations of this row, and the lowest obtained energy is accepted as the ground state energy.

Using conditional compilation based on the constant CYCLIC_EXCHANGE, two implementations of get_optimal_prestates() are provided, to account for both variants of the parallel algorithm. If CYCLIC_EXCHANGE is left undefined, a further constant USE_MPI allows control over whether message passing communications are used. If the latter is left undefined, the optimisation proceeds serially.

Both implementations of get_optimal_prestates() are based on the pseudocode designs previously discussed, using control flow instructions for dealing with spin rows when i=1, for which cyclic boundary interactions must be considered. In contrast to the presented pseudocode, the elements of matrices M^(i,i−1) are not stored explicitly in a data structure. Instead, loop variables are used to determine matrix elements on demand, which are computed by invoking the functions defined in spinglass.h on the spin glass instance. To this end, of importance is the function adjust_spin_row(), which modifies a spin glass instance according to the bit string representation of a spin row.

The collective implementation of get_optimal_prestates() begins by allocating the array *minPathNew, which is equivalent to vector p′ in the pseudocode, with elements distributed among processes. Elements of *minPathNew are assigned values based on the elements in *minPath and the interaction energies arising from the examined spin rows. Having completed this evaluation, the distributed vector elements are combined and reassigned to *minPath, using the instruction

    MPI_Allgather(minPathNew, trellisCols / Solver_NProcs, MPI_DOUBLE,
                  minPath, trellisCols / Solver_NProcs, MPI_DOUBLE, COMM);

where trellisCols/Solver_NProcs is the number of vector components stored by each process, COMM is the global communicator and MPI_DOUBLE is the data type of the vector elements.
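Putting the pieces together, the collective variant for a single spin row might be sketched as follows (interaction_energy() stands in for the on-demand evaluation via adjust_spin_row() and the spinglass.h energy functions; all names are illustrative, not the module's actual ones):

    /* Sketch of the collective variant for one spin row */
    static void optimal_prestates_sketch(struct SpinGlass *sg, gdouble *minPath,
                                         gint *minPathConfRow, gint row,
                                         gint trellisCols)
    {
        gint perProc = trellisCols / Solver_NProcs;  /* q/p components each */
        gint first = Solver_ProcID * perProc;
        gdouble *minPathNew = g_malloc(perProc * sizeof(gdouble));
        gint k, l;

        for (k = first; k < first + perProc; k++) {
            gdouble minval = G_MAXDOUBLE;
            for (l = 0; l < trellisCols; l++) {
                /* M[k][l]: energy of configuration k of row i against
                   configuration l of row i-1, evaluated on demand */
                gdouble cand = minPath[l] + interaction_energy(sg, row, k, l);
                if (cand < minval) {
                    minval = cand;
                    minPathConfRow[k] = l;  /* argmin, for backtracking */
                }
            }
            minPathNew[k - first] = minval;
        }
        /* combine the distributed segments of p' into the full vector p */
        MPI_Allgather(minPathNew, perProc, MPI_DOUBLE,
                      minPath, perProc, MPI_DOUBLE, COMM);
        g_free(minPathNew);
    }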


Figure 6.2: Schematic of operations performed by get_optimal_prestates() (basic dynamic programming, collective operations). In contrast, when using cyclic communications, processes evaluate different configurations of row i − 1, shifting elements in minPath. [Panels: configurations of rows i−1 and i; determine optimum states of row i−1, for row i; gather results held in minPathNew]

A schematic depiction of the optimisation process for a single invocation of get_optimal_prestates() is shown in Figure 6.2.

Similar in its operation, the realisation of get_optimal_prestates() using cyclic shift operations between processes distributes vector p′ among processes using the array *minPathNew. However, instead of assigning all components of vector p to each process, these too are distributed, in *minPath. This requires multiple communication operations as optimisation progresses for a single spin row. Here, elements in *minPath are examined in parallel by each process; however, since each process only retains a fraction of the components in p, it is necessary to perform a cyclic shift of data. It turns out that as iteration through the elements in *minPath progresses, it is possible to communicate elements residing at neighbouring processes in advance. This suggests a nonblocking communication scheme, which is implemented in the software module. The nonblocking communication scheme utilises MPI_Issend(), MPI_Wait() and MPI_Recv() instructions inserted into the optimisation loops (cf. Appendix F).

After get_optimal_prestates() has been invoked for all spin rows, it remains to obtain the ground state energy from *minPath and the corresponding ground state configuration from **minPathConf. Since the latter stores the optimal configurations of preceding spin rows, for each spin row, the ground state configuration can be recovered. This is achieved by determining the optimum configuration of the final spin row, and traversing through matrix rows, referencing preceding subsystem configurations. The function set_optimal_config() performs this activity. It is invoked by get_minimum_path(), following which the ground state configuration is output using spinglass_write_spins().
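The backtracking step can be sketched as follows (assumed semantics: minPathConf[i][k] holds the optimal configuration of row i−1, given configuration k of row i; names are illustrative):

    /* Sketch of ground state recovery from the recorded argmins */
    static void optimal_config_sketch(gint **minPathConf, const gdouble *minPath,
                                      gint nRows, gint trellisCols, gint *config)
    {
        gint i, k, best = 0;
        for (k = 1; k < trellisCols; k++)  /* optimum of the final row */
            if (minPath[k] < minPath[best])
                best = k;
        config[nRows - 1] = best;
        for (i = nRows - 1; i > 0; i--)    /* follow recorded predecessors */
            config[i - 1] = minPathConf[i][config[i]];
    }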


Figure 6.3: Sliding window for improved dynamic programming

dp_gstate_finder_fast.c

In Chapter 3, an improved serial algorithm for computing ground states was presented. In contrast to the previous algorithm, instead of considering interacting spin rows in the lattice, subsystems are taken to be positions of a 'sliding window'. This window covers spin rows horizontally, such that the total number of spins it contains is equal to the number of columns in the lattice plus one. As with the row-wise approach, optimisation is achieved by comparing adjacent subsystems. Here, adjacent subsystems are those obtained by advancing the sliding window by one spin (Figure 6.3).

In Chapter 4, it was suggested that the matrix/vector approach can be used to arrive at an improved parallel algorithm. As previously, matrices retain the interaction energies between adjacent subsystems. However, as a caveat of the sliding window approach, interacting subsystems must share spin configurations in the overlapping region between window positions. This means that for every subsystem configuration, it is only necessary to evaluate interactions with two configurations of the preceding subsystem.

The module dp_gstate_finder_fast.c implements the improved algorithm for obtaining ground states, for the lattice without cyclic boundary conditions. Similar in structure to dp_gstate_finder.c, the module consists of a function get_minimum_path(), which is responsible for performing the main optimisation. Given a spin glass instance, it proceeds to invoke get_optimal_prestates() in a loop which iterates through all subsystems in the lattice.

Two main differences arise from the 'sliding window' approach to subsystems. Firstly, adjusting spin configurations based on bit strings requires a 'leading spin' to be referenced in the spin lattice, instead of a spin row. For this reason, the module implements the function adjust_spin_ensemble(), whose arguments include the problem instance and the referential spin. Secondly, the interaction between subsystems involves the energy introduced by a single spin interacting with its vertical and horizontal neighbours (Figure 6.1(c)). Therefore, the function get_optimal_prestates() utilises the library function spinglass_ensemble_delta().

Invoking get_optimal_prestates() serves the same purpose as previously, namely to record the optimal energy for subsystems of increasing size, recording configuration data in a two-dimensional array. Again, this is achieved using an argument *minPath which corresponds to vector p in the pseudocode algorithm. After the function has returned, this array stores data equivalent to vector


p′. The computation performed by get_optimal_prestates() is shown in Figure 6.4. Here, the elements corresponding to vector p′ are computed in parallel, such that the interactions between each corresponding subsystem configuration and the preceding subsystem, in both of its two states, are compared. Given the irregular pattern in which elements in *minPath are accessed, the approach using a collective operation to combine the elements of the resulting array *minPathNew is favourable.

The method of determining which configurations of the preceding subsystem to evaluate involves manipulating the subsystem's bit string representation. Given a bit string where the most significant bit describes the leading spin's state, conducting a left arithmetic shift reveals the permissible configurations of the preceding subsystem (the least significant bit may assume 1 or 0). Figure 6.4 illustrates bit strings corresponding to subsystem configurations, for a 2 × 2 spin lattice.
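In code, the two predecessor configurations compatible with a window configuration c over m+1 spins might be obtained as in this sketch (hypothetical names):

    /* Sketch: the left shift reveals the two permissible predecessors */
    guint mask  = (1u << (m + 1)) - 1u;    /* keep m+1 bits           */
    guint pred0 = (c << 1) & mask;         /* least significant bit 0 */
    guint pred1 = ((c << 1) | 1u) & mask;  /* least significant bit 1 */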


Once optimisation has completed, as with dp_gstate_finder.c, it remains to restore the ground state configuration from the data stored in **minPathConf. Again, this is achieved using a function set_optimal_config(). In this case, each row of **minPathConf yields information on the optimum state of one spin. The final row is used to infer the state of an entire subsystem. The entire ground state configuration can then be output.

Figure 6.4: Schematic of operations performed by get_optimal_prestates() (improved dynamic programming), executed on four processors. The problem instance is a 2 × 2 spin lattice. [Panels: minPath over configurations 000–111 of window position i−1; determine optimum states of position i−1, for position i; gather results held in minPathNew across processes P1–P4]


Chapter 7

Performance Evaluation

So far, approaches to solving spin glass ground state problems have been presented. These include exactly solving methods based on dynamic programming, and the harmony search heuristic. Both approaches are implemented in software, suited for serial and parallel execution using MPI. The dynamic programming implementation incorporates two variants, which are referred to as the basic and improved algorithms. Previous complexity analysis showed that the improved algorithm requires less run time than its counterpart.

In examining techniques for parallelising these exact and heuristic algorithms, further alternatives were described in Chapter 4. In the case of the dynamic programming algorithms, approaches based on collective and cyclic communication patterns were given. The latter are implemented using nonblocking synchronous send operations in MPI. Both collective and cyclic variants are applicable to the basic dynamic programming algorithm, whereas the improved dynamic programming algorithm relies solely on collective communications.

In this chapter, the aforementioned solver implementations are examined in terms of their performance. Data are presented against varying parameters and interpreted. For the parallel exact solvers, a comparison is given between the attainable performance on the Ness and HPCx machines.

7.1 Serial performance

In the development process, serial versions of the ground state solvers were implemented prior to their parallel analogues. For the exact algorithms, besides facilitating an incremental development strategy, this allowed an initial evaluation of performance, in order to gauge the possible behaviour of parallel dynamic programming. Similarly, performance data for serial harmony search were examined, in particular to assess the accuracy of solutions generated by the algorithm.


Figure 7.1: Execution times for serial dynamic programming (basic algorithm). [Plot: Time (s) against Spins]

7.1.1 Dynamic programming

Execution time data for serial dynamic programming were gathered on Ness. The experimental procedure involved invoking both variants of the algorithm on the machine's back-end, against varying problem sizes. Timing data were recorded using the shell's time command. While offering limited accuracy and resolution, this method was deemed sufficient, considering the magnitude of the execution times. The source code was compiled using the gcc compiler, supplying the -O2 optimisation flag. Random problem instances were generated as square lattice k-spin systems without cyclic boundary conditions.

Basic algorithm

Results for basic dynamic programming are shown in Figure 7.1. As shown, problem instances were generated for systems of up to 14² spins. As one would expect, execution time rises monotonically, such that the recorded time for 14² spins is approximately 42 min. Considering the ascertainments made in Chapter 3 about the algorithm's asymptotic behaviour, the graph appears to confirm an exponential relationship between system size and execution time.

To examine the run time behaviour more closely, the data are visualised as a logarithmic plot (Figure 7.2). Here, it is apparent that execution time cannot be accurately approximated with the function f(k) = α e^(βk): its logarithm, ln f(k) = ln α + βk, corresponds to a straight line, which the plotted data do not follow. Also, the plot shows near-constant values for the first three data points. This is likely to result from the limited timing resolution.


Figure 7.2: Log execution times for serial dynamic programming (basic algorithm), with curve fit. [Plot: lg(Time) (s) against Spins]

In Chapter 3, the algorithm's asymptotic complexity was shown to be O(√k · 2^(2√k)), for a square lattice k-spin system without cyclic boundary interactions. From this fact, it is clear that a more accurate model of execution time must consider an exponential relationship with respect to the root of the system size. The function f(k) = α e^(β√k) is thought to be an adequate approximation.

Figure 7.2 includes a fit of the function ln f(k) = ln α + β√k to the log-plotted data points. The first three data points are excluded from the fit. The fit was obtained using the Marquardt-Levenberg algorithm implemented in Gnuplot. With asymptotic standard errors of 0.9365% and 0.8656% respectively, values of α = 1.77111 × 10⁻⁶ and β = 1.50197 were computed. The value β/ln 2 = 2.1667 bears similarity to the theoretical value of 2 in the exponential term of the algorithm's asymptotic complexity. The greater value may be attributed to approximation using the constant α.
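To spell out the comparison with the theoretical exponent (a one-line check, using the fitted β above):

    \[
    f(k) = \alpha e^{\beta\sqrt{k}} = \alpha\, 2^{(\beta/\ln 2)\sqrt{k}},
    \qquad
    \frac{\beta}{\ln 2} = \frac{1.50197}{0.69315} \approx 2.167,
    \]

which is the quantity compared against the factor 2 in the exponent of 2^(2√k).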


Improved algorithm

Results for improved dynamic programming are shown in Figure 7.3. Here, problem instances were generated in the range of k = [4, 361] spins. Comparison with Figure 7.1 reveals that, as expected, execution times are lower. As a practical advantage, this allowed the algorithm's performance to be evaluated against larger problem instances during experimentation.

Figure 7.3: Execution times for serial dynamic programming (improved algorithm). [Plot: Time (s) against Spins]

Figure 7.4: Log execution times for serial dynamic programming (improved algorithm), with curve fit. [Plot: lg(Time) (s) against Spins]

A log plot of these data is shown in Figure 7.4. As before, this representation reveals near-constant execution time for the first data points in the series. A unique feature is the data point at k = 49, which is an outlier in what appears to be another exponential curve against √k. It is speculated that the outlier is due to caching effects: the Opteron 1218 processor on Ness has a 64 KiB L1 data cache, which is likely to be sufficient for containing the optimisation data held in


**minPathConf and *minPath (cf. Chapter 6): the former requires 6 × 7 × 2⁸ × 4 bytes = 42 KiB, the latter 2⁸ × 4 bytes = 1 KiB. The spin glass data structure is estimated to require less than 1 KiB, yielding a total of less than 64 KiB (considering the size of additional memory blocks).

Fitting the log plot to the function used for analysing basic dynamic programming, ln f(k) = ln α + β√k, allows further comparison of the two algorithms. Using the same procedure for producing the fit, the obtained values are α = 1.0845 × 10⁻⁵, β = 1.2275, with asymptotic standard errors of 0.8924% and 0.9401%, respectively. The value of β is close to the theoretical value of 1 in the exponential term of the algorithm's complexity function; compared to basic dynamic programming, execution time is observed to grow at a slower rate, as expected.

Memory consumption

Brief experiments were conducted to assess the memory consumed by the dynamic programming implementations. Considering resident memory values, as reported by the top process utility, data were recorded by initiating computation using increasingly large problem sizes. For both algorithms, as the allocated memory remains constant for the majority of the computation, it was not necessary to execute until termination.

Figure 7.5: Memory consumption for serial dynamic programming (basic algorithm). [Plot: Resident memory consumption (KiB) against Spins]

Plots of memory consumption are shown in Figures 7.5 and 7.6. For basic dynamic programming, the data reveal that to avoid swapping on a machine with 4 GiB (e.g. Ness), the maximum problem size is a 24 × 24 spin lattice. With improved dynamic programming, the maximum problem size decreases to 19 × 19 spins. This behaviour is expected, since **minPathConf contains O(√k · 2^√k) vs. O(k · 2^√k) elements, for a k-spin square lattice. Again using a log plot approach (Figures 7.7 and 7.8), the data are fit to the function f(k) = β k^α 2^(√k), whose logarithm is ln β + α ln k + √k ln 2.


Figure 7.6: Log memory consumption for serial dynamic programming (basic algorithm), with curve fit. [Plot: lg(Resident memory consumption) (KiB) against Spins]

For basic dynamic programming, the obtained values are α = −9.46851, β = 40.42 (asymptotic standard errors 1.401% and 1.924%, respectively). The values for improved dynamic programming are α = −6.76659, β = 27.1801 (asymptotic standard errors 2.092% and 2.844%). Comparing the two values of β, it is apparent that between the two variants of dynamic programming, there exists a trade-off between execution time and memory efficiency: in terms of execution time, improved dynamic programming is preferable, whereas for memory consumption, the basic algorithm is preferable.

7.1.2 Harmony search

Serial harmony search was evaluated by comparing solutions generated by the heuristic to ground truth, based on a 6 × 6 spin problem instance with uniformly distributed bonds in the range [−1, 1). Ground truth was obtained by conducting an exhaustive search on the problem instance. While varying the number of solution vectors used, the search was executed multiple times. The results were used to compute minimum error, mean error and standard error values. Totalling 80 executions for each value of NVECTORS, the results are presented in Table 7.1.

As shown, standard and mean error values improve monotonically when increasing the algorithm's memory capacity. No improvement in error rate is given when increasing memory to NVECTORS=50; the algorithm's ability to find the exact ground state decreases under the specified parameter value. Despite this, µ_e and σ_e suggest that a large NVECTORS benefits solution quality in general. This is in agreement with the behaviour of 'solution exploration' described in Chapter 3. Exploring the algorithm's behaviour against large NVECTORS is indeed the motivation behind developing parallel harmony search.


Figure 7.7: Memory consumption for serial dynamic programming (improved algorithm). [Plot: Resident memory consumption (KiB) against Spins]

Figure 7.8: Log memory consumption for serial dynamic programming (improved algorithm), with curve fit. [Plot: lg(Resident memory consumption) (KiB) against Spins]

        NVECTORS = 1   NVECTORS = 2   NVECTORS = 10   NVECTORS = 50
µ_e     1.84           1.55           0.97            0.83
σ_e     0.83           0.77           0.77            0.61
ε_e     0.06           0.10           0.14            0.10

Table 7.1: Mean error µ_e, standard error σ_e and error rate ε_e of serial harmony search ground states, for increasing solution memory NVECTORS. Results are based on the ground truth value −30.7214. The error rate is defined as the number of correctly obtained ground state configurations over the total number of algorithm invocations.


Optimisation flags                                   Execution time
-O0                                                  10.682s
-O1                                                  10.542s
-O2                                                   6.354s
-O3                                                   6.340s
-O3 -funroll-loops                                    4.043s
-O3 -funroll-loops -ftree-loop-im                     4.043s
-O3 -funroll-loops -ftree-loop-im -funswitch-loops    4.043s

Table 7.2: Serial execution times for basic dynamic programming on Ness, for various GCC 4.0 optimisation flags

7.2 Parallel performance

The architecture of the Ness and HPCx machines was described in Chapter 5. In the following, the method and results of the performance assessment are presented for the implemented parallel algorithms. As with the serial algorithms, the results are interpreted.

7.2.1 Dynamic programming

Since the dynamic programming algorithms are deterministic, the opportunity is given to assess parallel performance in terms of execution time. That is, given parallel execution time T_p on p processors, and serial execution time T_s, it is possible to describe performance in terms of the parallel efficiency T_s/(p T_p).

In preparation for the experiments on Ness, serial execution time was measured against various combinations of gcc compiler flags, based on the basic dynamic programming algorithm and an 11 × 11 test spin problem. Using the -O3 optimisation level with the flag -funroll-loops, for automated loop unrolling, offered the greatest gain in performance over unoptimised code. Timing data are shown in Table 7.2. This behaviour is not surprising, since the code is heavily reliant on loops for processing spin glass data structures. In contrast, rudimentary analysis of the source code reveals few cases where performance would likely benefit from loop-invariant motion (pertaining to the other optimisation flags used).

On HPCx, the same test spin problem was used to assess execution time on the machine's serial job node. Here, the effect of target architecture optimisation was considered, using the xlc_r re-entrant compiler, version 8.0. For all tests, 64-bit compilation was enabled using the -q64 flag. Timing data are listed in Table 7.3. The set of compiler flags used for the parallel performance evaluation was -qhot -qarch=pwr5 -O5 -Q -qstrict.

The parallel environment on HPCx allows control over a number of settings [3], potentially influencing distributed application performance. Specifically, the settings affect the protocol used for communicating between shared memory nodes, including the use of remote direct memory access (RDMA).


In preparation for experiments on Ness, serial execution time was measured against various combinations of gcc compiler flags, based on the basic dynamic programming algorithm and an 11 × 11 test spin problem. Using the -O3 optimisation level with the flag -funroll-loops for automated loop unrolling offered the greatest gain in performance over unoptimised code. Timing data are shown in Table 7.2. This behaviour is not surprising, since the code is heavily reliant on loops for processing spin glass data structures. In contrast, rudimentary analysis of the source code reveals few cases where performance would likely benefit from loop-invariant motion (pertaining to the other optimisation flags used).

On HPCx, the same test spin problem was used to assess execution time on the machine's serial job node. Here, the effect of target architecture optimisation was considered, using the xlc_r re-entrant compiler, version 8.0. For all tests, 64-bit compilation was enabled using the -q64 flag. Timing data are listed in Table 7.3. The set of compiler flags used for parallel performance evaluation was -qhot -qarch=pwr5 -O5 -Q -qstrict.

    Optimisation flags                    Execution time
    -g                                    91.50s
    -qhot -qarch=pwr4 -O3                 19.99s
    -qhot -qarch=pwr4 -O4                 19.29s
    -qhot -qarch=pwr5 -O4                 19.23s
    -qhot -qarch=pwr5 -O5                 18.38s
    -qhot -qarch=pwr5 -O5 -Q              18.26s
    -qhot -qarch=pwr5 -O5 -Q -qstrict     17.95s

Table 7.3: Serial execution times for basic dynamic programming on HPCx, for various xlc optimisation flags

    Communications directive   Execution time
    US,bulkxfer                12.68s
    US                         14.12s
    IP,bulkxfer                14.32s
    IP                         14.26s

Table 7.4: Results for parallel basic dynamic programming on HPCx using 32 processors, for combinations of user space (US) or IP communications in conjunction with the bulkxfer directive

The parallel environment on HPCx allows control over a number of settings [3], potentially influencing distributed application performance. Specifically, settings affect the protocol used for communicating between shared memory nodes, including the use of remote direct memory access (RDMA). Using a requested processor count of 32 to examine the effect of inter-node communications, timing data were recorded using combinations of two LoadLeveler directives, network.MPI and bulkxfer. The former is responsible for defining the aforementioned protocol, while the latter controls direct memory access. Table 7.4 shows results obtained for message passing using IP and user space protocols, in conjunction with RDMA. Given these results, it was decided to utilise the user space protocol for computations involving multiple nodes.

Performance on Ness

Results for basic dynamic programming on Ness are shown in Figures 7.9, 7.10 and 7.11. From Figure 7.9, it is apparent that execution time generally diminishes as the processor count is increased, with the exception of the smaller 10 × 10 and 11 × 11 spin instances. The log plot indicates that for larger problem instances, execution time becomes increasingly linear in relation to the number of processors. This is in agreement with the theoretical O(√k 2^{2√k}/p) execution time for a k-spin square lattice.

The parallel efficiency plot in Figure 7.10 confirms near-linear scaling for large problem instances. Good scalability is observable for problem instances larger than 13 × 13 spins, for which efficiency exceeds 90% on 16 processors. Efficiency drops approximately linearly for instances larger than 11 × 11 spins. As an extreme case, performance drops sharply for the 10 × 10 instance, at a rate decreasing against p.


[Figure 7.9: Parallel execution time for dynamic programming (basic algorithm, Ness). Log-scale plot: time (s) against processors, for 10×10 to 15×15 spin instances]

[Figure 7.10: Parallel efficiency for dynamic programming (basic algorithm, Ness). Plot: parallel efficiency against processors, for 10×10 to 15×15 spin instances]


[Figure 7.11: Vampir trace summary for dynamic programming (basic algorithm, Ness). Plot: application time / total execution time against processors, for 10×10 to 15×15 spin instances]

To interpret these results, it is recalled that the basic dynamic programming algorithm requires a sequence of √k blocking, collective gather operations to complete computation. For each of these operations, each processor contributes 2^{√k} elements. After the ground state energy has been obtained from the array *minPath, the ground state configuration is recovered from **minPathConf through a similar sequence of √k gather operations.

Clearly, scalability is affected by the size of problem instances, since this influences the number and size of messages sent between processors. If the cost of a single collective gather is approximated as t_gather = p(T_0 + m/B), where p is the number of processors, T_0 the message initialisation cost, m the message size and B the bandwidth, it follows that for constant message size, overall cost relates linearly to p. This serves as a possible explanation for the linear reduction in parallel efficiency observed for the majority of problem instances in Figure 7.10. The increase in efficiency for larger problem instances can then be attributed to the fact that computing the ground state energy requires ∝ m²/p operations per processor (cf. Chapter 4). Consequently, for constant p, the fraction m/(m²/p) diminishes as m is increased; communication costs thus become less significant as the problem size increases. It is speculated that the 10 × 10 spin lattice causes severe imbalance between communication and computation, so that the amount of computation is closely approximated by a constant, regardless of p.

Figure 7.11 shows the fraction T_c/T_m of parallel computation time over communication time. These data were gathered by re-linking compiled source code with the Vampir library and recording summary data as reported by the application's trace utility. Time spent on tracer API calls is omitted. As a general trend, it is observed from the plot that increasing the number of processors does indeed increase the proportion of time spent on communication. For the 14 × 14 and 15 × 15 lattices, T_c/T_m does not decrease monotonically with p. This may be due to the accuracy of trace data, which indicate a non-monotonic relation between lattice sizes and scalability.
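Making the cost argument above explicit, under the same approximations: with per-processor computation t_comp ∝ m²/p and per-gather communication t_comm ∝ m, the communication fraction behaves as

\[
\frac{t_{\mathrm{comm}}}{t_{\mathrm{comp}}} \;\propto\; \frac{m}{m^2/p} \;=\; \frac{p}{m},
\]

growing linearly in p for fixed message size, but decaying as the problem (and hence m) grows.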


[Figure 7.12: Parallel execution time for dynamic programming (basic algorithm, cyclic communications, Ness). Log-scale plot: time (s) against processors, for 10×10 to 15×15 spin instances]

Having examined performance of basic dynamic programming using collective operations, a similar procedure is given for the approach based on cyclic communication. In Figures 7.12, 7.13 and 7.14, plots of execution time, parallel efficiency and the fraction T_c/T_m are shown. From Figure 7.12, it is again observed that increasing the processor count causes execution time to diminish, with the exception of the 10 × 10 lattice. For the latter, performance degrades more profoundly than with the collective variant of the algorithm, to the extent that execution time on 16 processors exceeds that obtained on a single processor. For larger processor counts and the remaining problem instances, performance appears to degrade uniformly; this effect is shown more clearly in Figure 7.13. Here, parallel efficiency fluctuates in the range of [1, 4] processors, before decreasing monotonically for each examined problem instance. Significantly, scalability does not improve monotonically as lattice size is increased. Nevertheless, it is possible to group problem instances into two categories, such that the smaller 10 × 10 and 11 × 11 lattices result in parallel efficiency in the range [.4, .5] on four processors, with the remainder attaining [.8, .99] efficiency. Increasing the processor count to 16, parallel efficiency drops to [.4, .5] and [.01, .2] for the respective groups. From Figure 7.14, it is observed that communication costs become significant for all problem sizes as the processor count increases: for p = 16, the fraction T_c/T_m lies in the range [.4, .5] for all examined lattices, except the 10 × 10 lattice, for which the fraction is further diminished due to communication costs.


[Figure 7.13: Parallel efficiency for dynamic programming (basic algorithm, cyclic communications, Ness). Plot: parallel efficiency against processors, for 10×10 to 15×15 spin instances]

[Figure 7.14: Vampir trace summary for dynamic programming (basic algorithm, cyclic communications, Ness). Plot: application time / total execution time against processors, for 10×10 to 15×15 spin instances]


[Figure 7.15: Parallel execution time for dynamic programming (improved algorithm, Ness). Log-scale plot: time (s) against processors, for 10×10 to 22×22 spin instances]

Comparing the two variants' performance, it is observed that using collective communications reduces execution time on few processors. This suggests that in this case, collective communication costs are less expensive than cyclic operations. Also, it is recalled that the cyclic variant of the algorithm requires additional conditional statements, which increases the number of branch instructions in the code. Scalability is significantly reduced, indicating that problem instances significantly larger than 15 × 15 spins are required to obtain favourable efficiency at p > 16 processors. It is possible that sufficiently large problem instances might expose the cyclic approach as advantageous; these are however not explored, due to restricted experimental time scales. For the examined problem sizes, reduced scalability is thought to be influenced by synchronisation overhead, such that the amount of computation within the nested loops (cf. Chapter 4) is not sufficient to merit overlapping communications.

Results for improved dynamic programming executed on Ness are shown in Figures 7.15, 7.16 and 7.17. For all examined problem instances, parallel execution times behave similarly as observed for the 10 × 10 lattice using basic dynamic programming: here, increasing the processor count causes performance to degrade severely for smaller lattices, such that parallel efficiency drops to around 20% at p = 4 processors. Larger lattices result in slightly enhanced parallel efficiency; however, increasing to p = 16 causes near-uniform degradation to around 10%.

Figure 7.17 shows performance degradation from the perspective of computation and communication time. The fraction T_c/T_m behaves as expected in relation to Figure 7.16, indicating that performance degradation is due to communication costs. In comparison to basic dynamic programming using cyclic communications, the effect of increasing processors is further pronounced, such that T_c/T_m is reduced to under 20% at 16 processors.

Comparing basic and improved variants of the algorithm, it appears there exists a trade-off between scalability and algorithmic complexity. Whereas basic dynamic programming has higher algorithmic complexity, results show favourable scalability up to 16 processors. In contrast, improved dynamic programming is a more efficient algorithm in terms of complexity; however, scalability is considerably diminished on Ness for the examined problem sizes. A possible explanation for this behaviour is provided by the number of communication operations, which is O(k) for the improved variant, versus O(√k) required for the basic variant, for a k-spin lattice. Given that communication takes place every O(2^{2√k}) instructions for the basic (collective) algorithm, versus every O(2^{√k}) instructions for the improved algorithm, it is clear that the ratio of computation against communication is lower for the improved algorithm. Since communications are non-blocking in both cases, it follows that for improved dynamic programming, a greater proportion of execution time is due to communication operations. As a consequence, this reduces scalability.
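To give this granularity argument a concrete scale (the lattice size is chosen purely for illustration): for k = 256 spins, i.e. a 16 × 16 lattice, the improved algorithm communicates every O(2^{√k}) = O(2^{16}) ≈ 6.6 × 10^4 instructions, whereas the basic algorithm communicates every O(2^{2√k}) = O(2^{32}) ≈ 4.3 × 10^9 instructions, a factor of 2^{16} more computation per communication operation.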


[Figure 7.16: Parallel efficiency for dynamic programming (improved algorithm, Ness). Plot: parallel efficiency against processors, for 10×10 to 22×22 spin instances]

[Figure 7.17: Vampir trace summary for dynamic programming (improved algorithm, Ness). Plot: application time / total execution time against processors, for 10×10 to 22×22 spin instances]


[Figure 7.18: Parallel execution time for dynamic programming (basic algorithm, HPCx). Log-scale plot: time (s) against processors, for 11×11 to 16×16 spin instances]

Performance on HPCx

Plots of performance data on HPCx for basic dynamic programming using collective communications are shown in Figures 7.18 and 7.19. Because of how the machine's resources are grouped into logical partitions and their implication for time budgeting, the processor count was scaled as 16 · 2^n, albeit to greater magnitude than on Ness. For small problem sizes, behaviour is as observed on Ness, where increasing the processor count effects little improvement in execution time. Scalability improves as problem size is increased, to the extent that parallel efficiency is greater than 95% for lattices with 15 × 15 and 16 × 16 spins solved on 256 processors. A distinct feature is observed for the 15 × 15 lattice, where super-linear speedup appears to occur in the range of [16, 128] processors.


[Figure 7.19: Parallel efficiency for dynamic programming (basic algorithm, HPCx). Plot: parallel efficiency against processors, for 11×11 to 16×16 spin instances]

In Figures 7.20 and 7.21, results for the algorithm variant using cyclic communications are shown. In comparison to the collective approach, performance again improves as problem size is increased. However, the obtained parallel efficiency is around 60% at 256 processors, for a 16 × 16 spin lattice. This decline in performance is similar to that observed on Ness. In contrast, on HPCx, increasing parallel efficiency reflects the ordering of problem sizes more accurately. Fluctuations observed on Ness are not present; for all examined problem instances execution time decreases monotonically against the number of processors. As with the collective variant, parallel efficiency obtained for the 15 × 15 lattice exceeds that for the 16 × 16 lattice, on 16 and 32 processors. In contrast, scaling performance is not sufficient for super-linear speedup, as previously noted.

Results for improved dynamic programming on HPCx are shown in Figures 7.22 and 7.23. Here, performance drops rapidly for all explored problem sizes, such that executing on 16 processors reduces parallel efficiency to below 50%. Increasing the number of processors, efficiency tails off further; at 256 processors, it is less than 10%. Significantly, in resemblance to the aforementioned results, the largest examined problem instance does not result in the most scalable computation: the 22 × 22 lattice falls behind the 18 × 18 and 20 × 20 instances in terms of parallel efficiency. This phenomenon is observed for all evaluated processor counts.

Concluding from performance data on HPCx, the three algorithm variants exhibit varying degrees of scalability. From most to least scalable, the algorithms are ordered as:

• Basic algorithm using collective communications
• Basic algorithm using cyclic communications
• Improved algorithm using collective communications


[Figure 7.20: Parallel execution time for dynamic programming (basic algorithm, cyclic communications, HPCx). Log-scale plot: time (s) against processors, for 11×11 to 16×16 spin instances]

[Figure 7.21: Parallel efficiency for dynamic programming (basic algorithm, cyclic communications, HPCx). Plot: parallel efficiency against processors, for 11×11 to 16×16 spin instances]


[Figure 7.22: Parallel execution time for dynamic programming (improved algorithm, HPCx). Log-scale plot: time (s) against processors, for 12×12 to 22×22 spin instances]

[Figure 7.23: Parallel efficiency for dynamic programming (improved algorithm, HPCx). Plot: parallel efficiency against processors, for 12×12 to 22×22 spin instances]


[Figure 7.24: Summary of parallel efficiencies on HPCx. Plot: parallel efficiency against processors for a 16×16 lattice, comparing improved, basic collective and basic cyclic dynamic programming]

This ordering is as observed on Ness; however, scalability is higher on HPCx for each of the variants. This is attributed to lower communication costs on HPCx, resulting from the higher message passing bandwidth available on the machine. A summary of the algorithms' parallel efficiency on HPCx is shown in Figure 7.24, based on a 16 × 16 lattice.

7.2.2 Harmony search

The parallel harmony search algorithm introduced in Chapter 4 is based on a combination of two types of communication operation. Considering additional algorithm parameters, the algorithm exhibits a high degree of flexibility; this leads to a potentially large set of algorithm variants, which must be considered when examining performance. To restrict the space of algorithm variants, it was decided to confine the behaviour of communication operations: cyclic operations are based on exchanging random solution vectors between processes, such that favourable solutions are retained, while collective operations take place between process groups of specified size. Cyclic operations are executed every iteration of the harmony search algorithm, while collective operations are executed periodically. A sketch of the cyclic operation is given below.
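The following is a minimal sketch of one cyclic exchange under the assumptions above; utility() and worst_index() are hypothetical stand-ins for the implementation's solution evaluation and bookkeeping, and error handling is omitted.

    #include <stdlib.h>
    #include <string.h>
    #include <mpi.h>

    /* Hypothetical helpers: utility() evaluates a solution vector (higher is
     * better); worst_index() returns the index of the least favourable vector. */
    double utility(const double *vec, int vecLen);
    int worst_index(double **solutions, int nVectors, int vecLen);

    /* Sketch of one cyclic exchange: each process passes a randomly chosen
     * solution vector to its ring successor and receives one from its
     * predecessor, retaining the received vector only if it is favourable. */
    void ring_exchange(double **solutions, int nVectors, int vecLen, MPI_Comm comm) {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        int next = (rank + 1) % size;
        int prev = (rank + size - 1) % size;

        double *outgoing = solutions[rand() % nVectors];
        double *incoming = malloc(vecLen * sizeof *incoming);

        /* Combined send/receive avoids deadlock around the ring */
        MPI_Sendrecv(outgoing, vecLen, MPI_DOUBLE, next, 0,
                     incoming, vecLen, MPI_DOUBLE, prev, 0,
                     comm, MPI_STATUS_IGNORE);

        int w = worst_index(solutions, nVectors, vecLen);
        if (utility(incoming, vecLen) > utility(solutions[w], vecLen))
            memcpy(solutions[w], incoming, vecLen * sizeof *incoming);

        free(incoming);
    }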


The question arises how to assess the heuristic's parallel performance. For a deterministic algorithm, such as the exact dynamic programming based solver, performance is characterised by scalability. Scalability is quantified in terms of the algorithm's execution time against the number of processors on which it is executed. From the latter, measures such as speedup and parallel efficiency can be computed. This leads to a two-dimensional space (Figure 7.25(a)), which may be explored experimentally; for a given problem size, it may for example be of interest to approximate the function which maps the number of processors to execution time. In the case of heuristic algorithms, however, an additional dimension is significant for characterising performance, namely the accuracy of generated solutions. As a result, the space in which performance is evaluated is three-dimensional (Figure 7.25(b)). Experimental exploration may involve assessing the relation between accuracy and execution time, for a given number of processors. Another possibility might involve approximating the boundary surface in the space, provided such a surface exists.

[Figure 7.25: Conceptual representation of properties relevant to parallel performance. (a) Non-heuristic: time against processors; (b) Heuristic: time against processors and accuracy]

From the discussion in Chapter 3, it is evident that quantifying solution accuracy is non-trivial: it is necessary to define a measure to compare solutions with one another. An obvious approach is to use the utility function, if defined by the heuristic. However, it might prove advantageous to employ a measure more reflective of the problem's solution landscape, for example considering the distribution of solution utility values.

In the following description of an attempt at performance evaluation, parallel harmony search was executed on a number of test instances, while varying the number of processes and a selection of algorithm parameters. As previously explained, the algorithm possesses a significant number of parameters. Given the specified communication strategies, these include the number of solution vectors NVECTORS, the memory choosing rate, and the rate of performing collective operations ZONEEXBLOCK.

Experiment series are based on three lattice sizes of 12 × 12, 14 × 14 and 16 × 16 spins. For each size, five instances were generated, using random uniform bond distributions in the range [−1, 1). The procedure for every configuration of parameters and process count involved executing the algorithm on each lattice instance five times.


Result data were then collected and mean values computed. A single data point used in visualisation corresponds to the mean result obtained for a given lattice size instance.

Evidently, using several problem instances multiplies the number of times the parallel algorithm must be invoked. As a compromise to reduce the number of invocations, the two parameters NVECTORS and the memory choosing rate were held constant. More importantly, the three-dimensional space to explore is adapted, such that execution time is replaced by the number of loop iterations executed by harmony search. This is thought to better reflect the performance property of state space exploitation, described in Chapter 3. An advantage of the parallel algorithm's design is that it terminates when all processes hold identical solution vectors (cf. Chapter 4). Consequently, the aforementioned performance property can be seen as a 'dependent variable' reflecting solution exploitation, which need not be considered when permuting algorithm parameters. Effectively, this allows performance assessment to be divided between exploring the relations of number of processes against accuracy, and number of processes against algorithm iterations.

Experiments were carried out on Ness, using up to 16 processors. The size of processor subgroups ZONESIZE was varied in the range [1, 16], so that the number of processors lies in the range [ZONESIZE, 16] for each experiment. The parameter ZONEEXBLOCK was variably assigned the values 10², 10³ and 10⁴. For each lattice instance, solution accuracy was characterised in terms of energetic aberrance from ground truth data obtained using dynamic programming. Also, solution configurations were compared using the Hamming distance [35]†. Finally, the number of algorithm iterations was recorded.

† The implemented algorithm takes the complement of spin configurations into account, where all spin states are inverted.

Performance results

In Figure 7.27, performance data for ZONEEXBLOCK = 10² are shown, against varying processor numbers, lattice sizes and ZONESIZE. Quantitatively, the plot corresponds to the series of experiments where collective operations are performed frequently among processes. As the algorithm is defined, solutions are exchanged at a constant rate between process groups. The latter however vary in size with the parameter ZONESIZE, as previously mentioned. Given a subgroup size, the smallest collection of processes consists of a single subgroup; in general the processor count must be a multiple of ZONESIZE. For this reason, curves in the plot vary in length. As an example of reading the plot, consider the curves s16, which range from 4 to 16 processes. These correspond to invoking the algorithm with a subgroup size of 4. As a special case, for each plot there exist two curves per lattice size in the range [1, 16]. These correspond to subgroup sizes of 1 and 2.

Figure 7.27 describes ∆_E, the difference between ground truth and mean solution energies, against processors p.


[Figure 7.26: Parallel harmony search convergence durations (ZONEEXBLOCK = 100). Log-scale plot: iterations against processors, for s12, s14, s16]

[Figure 7.27: Parallel harmony search energetic aberrance (ZONEEXBLOCK = 100). Plot: ∆_E against processors, for s12, s14, s16]


[Figure 7.28: Parallel harmony search energetic aberrance (ZONEEXBLOCK = 1000). Plot: ∆_E against processors, for s12, s14, s16]

On initial consideration, it is observed that increasing the processor count reduces aberrance in some cases: accuracy for one of the 16 × 16 spin lattice series improves from around −160 to −60 at 16 processors. It turns out that this series corresponds to the parameter value ZONESIZE = 1. Similar improvements occur for the 12 × 12 and 14 × 14 lattices, from −120 and −85 to −35 and −17, respectively. However, increasing ZONESIZE to 2 effects an increase in solution accuracy in all cases, such that little improvement in accuracy is observed when increasing p.

Comparing Figures 7.27, 7.28 and 7.29 allows insight to be gained into the effect of increasing the frequency of collective exchanges within processor subgroups. For increasing ZONEEXBLOCK, the effect of p becomes less significant: with the exception of experiment series conducted for ZONESIZE = 1, all processor counts yield energetic aberrances in the approximate range [−10, −20]. For ZONESIZE = 1, behaviour is consistent for all values of ZONEEXBLOCK, to the extent that increasing p effects a significant increase in solution accuracy, as observed for ZONEEXBLOCK = 10².

From the previous observations, two conclusions can be drawn with regard to solution exploration. Firstly, it appears that increasing the value of ZONEEXBLOCK causes solution exploration to improve, given that accuracy as characterised by ∆_E improves. This is in agreement with the assumption made in Chapter 4, where solution exploration and exploitation were described as opposing qualities in the search process. Assuming that collectively exchanging solutions benefits solution exploitation, an obvious consequence of reducing the frequency of this operation is increased solution accuracy. Secondly, from the increase in solution accuracy between subgroups sized 1 and 2, it is concluded that contrary to prior expectation, the ring-based scheme of exchanging solutions contains an element of solution exploitation.


[Figure 7.29: Parallel harmony search energetic aberrance (ZONEEXBLOCK = 10000). Plot: ∆_E against processors, for s12, s14, s16]

In increasing the size of subgroups, more opportunity is evidently given for diverse solution 'islands', since there exist processes only participating in infrequent collective operations. A possible explanation for the increase in accuracy against p is the circumference of the ring in which processes exchange solutions: for large circumferences, propagating a solution across the ring becomes increasingly involved. This also improves solution diversity.

Figures 7.26, 7.30 and 7.31 show performance results in terms of algorithm iterations until convergence. The scheme is identical to that used to visualise solution aberrance. In Figure 7.26, results for ZONEEXBLOCK = 100 (where collective operations occur frequently) show that increasing p above ZONESIZE causes a reduction in execution time for all lattice and process subgroup sizes. As previously observed, an exception are the series executed for unit ZONESIZE, where the number of iterations increases against the processor count. Also, maximum execution times occur for ZONESIZE = 16.

These results are interpreted as follows: firstly, the reduction in execution times against p is attributed to the solution exploitation property of ring-based communications: as p is increased, so does the number of processor subgroups. Since the latter exchange solutions frequently, convergence is promoted between those processes involved in ring communications. Convergence between remaining processors is affected by the rate of subgroup communications. Secondly, when no cyclic communications take place, it follows that convergence is only promoted by collective communications, which in all experiments occur infrequently in comparison to cyclic communications. This serves as an explanation for peak execution times when ZONESIZE = p. Thirdly, for unit ZONESIZE execution times are comparatively short, which is attributed to the absence of processes exempt from cyclic communications. Since the latter occur frequently, convergence is promoted especially rapidly.


[Figure 7.30: Parallel harmony search convergence durations (ZONEEXBLOCK = 1000). Log-scale plot: iterations against processors, for s12, s14, s16]

[Figure 7.31: Parallel harmony search convergence durations (ZONEEXBLOCK = 10000). Log-scale plot: iterations against processors, for s12, s14, s16]


[Figure 7.32: Parallel harmony search solution Hamming distance (ZONEEXBLOCK = 100). Plot: Hamming distance against processors, for s12, s14, s16]

Figures 7.32, 7.33 and 7.34 plot the Hamming distances of generated solutions against processors, for all conducted experiment series. This metric is designed to expose accuracy in terms of the number of dissimilar spin states in solutions generated by the heuristic. Increasing the number of processors to 16 appears to decrease Hamming distance slightly, for all lattice instances. It is observed that distances are approximately equal to k/2, where k is the number of spins. This suggests that the distribution of spin configurations against system energy might be uniform. Considering this, the metric does not appear expressive of solution accuracy.

Overall, results indicate that parallel harmony search does improve solution accuracy. However, it must be considered that the improvements shown in Figures 7.27, 7.28 and 7.29 are marginal. Also, it is noted that comparatively good performance is achieved on few processors, provided algorithm parameters are selected carefully. Cyclic communications were observed to contain a significant element of solution exploitation. Unsurprisingly, considering the latter, the lowest energetic aberrance is achieved when communications are minimised. The attempt to quantify accuracy in terms of Hamming distance highlights the difficulty of obtaining solutions heuristically: the spin glass problem appears to have a rough solution landscape, which poses a difficulty for finding ground states using harmony search. In all conducted experiment series, only suboptimal solutions were found.
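As noted in the footnote to the experiment description, the implemented metric is complement-aware. A minimal sketch of such a distance follows; the spin encoding as integer arrays is assumed here purely for illustration.

    /* Complement-aware Hamming distance between two k-spin configurations:
     * since a configuration and its spin-flipped complement have identical
     * energy, the smaller of the two distances is returned. */
    int spin_hamming(const int *a, const int *b, int k) {
        int d = 0, i;
        for (i = 0; i < k; i++)
            if (a[i] != b[i]) d++;        /* dissimilar spin states */
        return d < k - d ? d : k - d;     /* distance to complement is k - d */
    }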


[Figure 7.33: Parallel harmony search solution Hamming distance (ZONEEXBLOCK = 1000). Plot: Hamming distance against processors, for s12, s14, s16]

[Figure 7.34: Parallel harmony search solution Hamming distance (ZONEEXBLOCK = 10000). Plot: Hamming distance against processors, for s12, s14, s16]


Because of their fundamental differences, a comparison between the examined exact approaches and harmony search is difficult to achieve. Whereas dynamic programming places exact demands on computation due to its deterministic nature, the heuristic is flexible in terms of resources, albeit at the expense of accuracy. All dynamic programming approaches were shown to benefit from high bandwidth communications as found on HPCx. The codes are thus suited for execution on non-vector supercomputer machines with many processors. In contrast, depending on algorithm parameters, heuristic execution performance on a commodity cluster system with low latency Gigabit Ethernet may prove adequate. This is estimated from a 153s execution time on Ness, corresponding to around 20000 iterations of harmony search on 16 processors, for a 256 spin lattice. Guest [33] provides an overview of message passing performance on commodity systems, which suggests reasonable bandwidth would be obtained.


Chapter 8

Conclusion

In the previous chapters, the implemented parallel optimisation software was described and experimental results presented. Given the project's scope, there exist numerous possibilities for conducting further work. Based on theoretical and practical aspects described in this dissertation, the following discusses such possibilities briefly, before concluding.

8.1 Further work

In Chapter 2, the spin glass problem was introduced. There, it was established that the Ising spin glass is a simplification of spin interaction. The two objects defining the exchange energy between spins are the spins themselves and the coupling constants. In general, the graph of spin interactions can be arbitrary. Spins assume state whose representation can vary in complexity, from the classical or quantum Heisenberg formulation of state to the binary Ising formulation. Coupling constants may be chosen from arbitrary distributions, such as a discrete or continuous Gaussian.

8.1.1 Algorithmic approaches

Considering that the project is concerned with the Ising spin glass, the opportunity presents itself to explore the behaviour of more involved models. As an intermediate model between Heisenberg and Ising formulations, one might implement the Potts model, where spins assume discrete state. Provided that the model of spin interactions is left unaltered, this model appears comparatively simple to implement: applying the framework of subsystems and subsystem interactions to the Potts model, it is apparent that the total energy of a system is still the sum of subsystem energies and the interaction energies between them. However, for a p-state model, the number of states a k-spin subsystem can assume is p^k, instead of 2^k. The consequence of greater diversity is that the computational complexity of basic dynamic programming increases to O(n p^{2m}) for an n × m lattice. Similarly, improved dynamic programming has a complexity of O(nm p^m).
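As an illustrative count, consider a single subsystem row of m = 12 spins: in the Ising case there are 2^{12} = 4096 row configurations and 2^{24} ≈ 1.7 × 10^7 row-to-row transitions to examine, whereas a 3-state Potts model yields 3^{12} = 531441 configurations and 3^{24} ≈ 2.8 × 10^{11} transitions.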


A further ramification of spin state concerns the algorithm's implementation, which is based on bit string representations of subsystems. Clearly, allowing more than binary state requires the code to be redesigned. A possible approach might involve representing subsystems as linked lists of integers. A likely consequence of this for all algorithms would be reduced performance from additional memory operations.

One might also consider extending the algorithms to higher dimensions. While this is trivial in the case of the heuristic, the dynamic programming approaches require the notion of a subsystem to be extended into higher dimensions: whereas basic dynamic programming is based on a sequence of interacting spin rows for the square lattice, it is necessary to consider a sequence of interacting lattices for the cubic lattice. The relation is analogous between hypercubes of d and d + 1 dimensions. As a caveat, the algorithms become computationally expensive: the basic algorithm requires O(n² 2^{2n^{d−1}}) time for an n^d-spin Ising hypercubic lattice, since there are n (d − 1)-dimensional subsystems in the lattice. For the improved algorithm, the sliding window approach is based on a sequence of (d − 2)-dimensional subsystems, yielding a time complexity of O(n^d 2^{n^{d−1}}). It is assumed that both algorithms' parallel performance will degrade, since higher-dimensional data are required to be communicated between processes. This places greater requirements on message passing bandwidth.

Another possibility for further work involves applying the framework described in Chapter 3 to more general models of spin interaction: for an arbitrary graph of interacting spins, the probability of spin configuration (s_1, s_2, \ldots, s_n) can be expressed as

\[
P(s_1, s_2, \ldots, s_n) = \prod_{i=1}^{n} P(s_i \mid \Pi_i),
\]

where Π_i is the set of precursor spins associated with spin s_i. The task is then to arrive at a formulation of the optimum spin configuration, as shown in Chapter 3. It is believed that the resulting dynamic programming problem must be both non-serial and polyadic, since the graph may contain cycles, and since a spin is permitted to have multiple ancestors. This is likely to have consequences for the complexity of the corresponding optimisation algorithm.

Of particular interest is the algorithm described by Pardella and Liers [53]. This provides a polynomial time solution to the planar spin glass problem, allowing ground states to be determined exactly, for problem instances far larger than those examined in this project. The approach is based on combining the cut optimisation problem with the notion of 'Kasteleyn cities', i.e. complete graphs which are subgraphs in the dual lattice representing plaquette frustrations in the spin lattice. Pardella and Liers apply the algorithm to a 3000 × 3000 lattice, which represents an improvement over previous graph theoretical approaches [46]. Parallelisation of cut optimisation might be achieved using the approach described by Diaz, Gibbons et al. [18].


8.1.2 Existing code

Next to implementing additional algorithms for spin glass optimisation, further work might be conducted on the existing code base. Possible additional features include augmenting functionality to allow algorithm parameters to be controlled at runtime, or implementing further bond distributions. Unlike basic dynamic programming, the improved dynamic programming algorithm does not support lattices with periodic boundary conditions. This can be implemented by adapting the approach described in Chapter 3, where the algorithm is invoked repeatedly, for different configurations of boundary spins.

More pertinent is the optimisation of the existing code's performance. Considering the project's scope, it was decided to adopt a design promoting code maintainability, described in Chapters 5 and 6. Given additional time, it would be of interest to examine the cost of pointer operations, replacing them where possible by static arrays. Also, although state-of-the-art compilers were used during development and evaluation, the potential exists for optimising kernel code segments: in the function get_optimal_prestates(), one might for example consider manual function inlining or loop unrolling. Similar treatment for the harmony search module is conceivable.

As implemented, the codes use MPI for achieving message passing parallelism. Although the algorithms are indeed based on the message passing architecture, one might consider a shared memory approach: given the method of state space decomposition, where configurations of spin subsystems are distributed equally among processes, the parallel for directive as implemented e.g. in OpenMP appears an obvious instrument in implementing shared memory versions of the algorithms (a minimal sketch is given below).

8.1.3 Performance evaluation

In Chapter 7, performance data were gathered for dynamic programming and harmony search algorithms. Scalability of the exact algorithms was examined on two machines. Further experimental work might be concerned with evaluating scalability on other machines, such as commodity clusters or the Blue Gene architecture, if available. A more detailed examination of performance on existing architectures might consider the implications of message passing latency and bandwidth, especially with regard to the dynamic programming code using asynchronous communications. It is also of interest to examine the scalability of harmony search: due to time constraints, undertaken work considered only the algorithm's accuracy. Additionally, one might consider the effect of processor count and communication frequency on algorithm iterations (ideally, the latter should remain constant). Finally, there exists the potential to experiment with alternative communication strategies, as proposed in this work.
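As referenced in Section 8.1.2, the following is a minimal sketch of such a shared memory variant, modelled on the loop structure of get_optimal_prestates() in Appendix F; spinglass_copy() is a hypothetical deep copy, required because adjust_spin_row() mutates the lattice, and the remaining names follow Appendix F.

    #include <glib.h>
    #include "spinglass.h"

    /* Sketch: shared memory variant of the basic dynamic programming inner
     * loop. Configurations of the current spin row are divided among threads;
     * each thread evaluates its block on a private lattice copy. */
    static void get_optimal_prestates_omp(struct SpinGlass *spinGlass,
                                          gdouble *minPathNew,
                                          gint row, gint previousRow,
                                          gint64 trellisCols) {
        #pragma omp parallel
        {
            struct SpinGlass *local = spinglass_copy(spinGlass); /* hypothetical */
            gint64 j;

            #pragma omp for schedule(static)
            for (j = 0; j < trellisCols; j++) {
                adjust_spin_row(local, row, j, TRUE);
                minPathNew[j] = spinglass_row_energy(local, row)
                              + spinglass_interrow_energy(local, previousRow);
            }

            spinglass_free(local);
        }
    }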


8.2 Project summary

During the course of the project, software was developed to compute ground states of the Ising spin glass. The software includes implementations of serial and parallel optimisation algorithms. The latter include parallel dynamic programming algorithms, available in two variants. The first of these allows lattice instances with arbitrary boundary conditions to be solved, while the second is computationally more efficient. Performance was examined, indicating good scalability for the first variant. In contrast, scalability is limited for the second variant. Also, a further algorithm was examined. This implements a parallel ground state optimiser, based on the harmony search heuristic. Performance was examined in terms of solution accuracy and algorithm convergence.

In Chapter 5, the project's goals were described. These consisted of developing an exact ground state solver based on the transfer matrix method. As an additional objective, investigation was to include an alternative, heuristic parallel algorithm. The performance of both algorithms was to be examined. It was intended that the software should be self-contained, offering sufficient functionality to be useful as a research tool.

In the light of undertaken work, the project's goals are considered fulfilled to a considerable extent: implemented software includes variants of exact optimisation algorithms. In theoretical work, the dynamic programming approach was shown to offer identical performance to transfer matrix based methods; therefore both approaches are considered computationally equivalent. The described harmony search heuristic was also implemented. Both dynamic programming and harmony search are implemented as message passing codes. Performance was investigated as proposed, examining the scalability of the dynamic programming codes, and the accuracy of parallel harmony search. Although it remains of interest to examine the scalability of the alternative code, overall the project is considered a success.

8.3 Conclusion

In this dissertation, the Ising spin glass was introduced as a combinatorial optimisation problem. The theoretical background was discussed, identifying and developing solutions to the problem. A description of undertaken project work was provided. Implemented software was described and experimental results were presented. Finally, possibilities for further work were identified.


Appendix A

Project Schedule


[Figure A.1: Project schedule. Gantt-style chart over weeks 1 to 16: detailed design, implementation, debugging, testing and performance evaluation for each of the two codes, followed by report, presentation, and submission/corrections]


Appendix B

UML Chart


[Figure B.1: UML class diagram of source code module and header relationships. Modules: main.c, arrays.c/arrays.h, io.c/io.h, random.c/random.h, spinglass.c/spinglass.h, gstatefinder.h, bforce_gstate_finder.c, dp_gstate_finder.c, dp_gstate_finder_fast.c, harmony_gstate_finder.c]


Appendix C

Markov Properties of Spin Lattice Decompositions

C.1 First-order property of row-wise decomposition

Using a row-wise decomposition strategy of spin rows, system state probability is expressed as

\[
P(S) = \frac{1}{Z(T)} \exp\left(-\frac{1}{kT}\left(H(S_1) + \sum_{i=2}^{n} H(S_i) + H_b(S_{i-1}, S_i)\right)\right)
     = \frac{1}{Z(T)} \exp\left(-\frac{1}{kT} H(S_1)\right) \prod_{i=2}^{n} \exp\left(-\frac{1}{kT}\left(H(S_i) + H_b(S_{i-1}, S_i)\right)\right).
\tag{C.1}
\]

The partition function is expanded in a similar manner to account for subsystems, as

\[
Z(T) = \sum_{S \in \mathcal{S}} \exp\left(-\frac{1}{kT} H(S)\right)
     = \sum_{S_1} \exp\left(-\frac{1}{kT} H(S_1)\right) \prod_{i=2}^{n} \left( \sum_{S_i} \exp\left(-\frac{1}{kT}\left(H(S_i) + H_b(S_{i-1}, S_i)\right)\right) \right)
     = \prod_{i=1}^{n} Z_i(T),
\]

with

\[
Z_i(T) =
\begin{cases}
\sum_{S_i} \exp\left(-\frac{1}{kT} H(S_i)\right) & i = 1 \\
\sum_{S_i} \exp\left(-\frac{1}{kT}\left(H(S_i) + H_b(S_{i-1}, S_i)\right)\right) & 1 < i \leq n.
\end{cases}
\]

Substituting Z(T) in Equation C.1, state is defined as

\[
P(S) = \frac{1}{Z_1(T)} \exp\left(-\frac{1}{kT} H(S_1)\right) \prod_{i=2}^{n} \frac{1}{Z_i(T)} \exp\left(-\frac{1}{kT}\left(H(S_i) + H_b(S_{i-1}, S_i)\right)\right)
     = P(S_1) \prod_{i=2}^{n} P(S_i \mid S_{i-1}),
\]


which shows that the chosen approach fulfils the property of a first-order Markov chain; the conditional probability P(S_i | S_{i-1}) is due to the dependence of row S_i on its predecessor's configuration.

C.2 Higher-order property of unit spin decomposition

Applying an analogous approach to determining system state probability, P(S) is expressed as

\[
P(S) = \frac{1}{Z(T)} \exp\left(-\frac{1}{kT} \sum_{i=0}^{nm-1} H_b(S_i, S_{i-1}) + H_b(S_i, S_{i-m})\right)
     = \frac{1}{Z(T)} \prod_{i=0}^{nm-1} \exp\left(-\frac{1}{kT}\left(H_b(S_i, S_{i-1}) + H_b(S_i, S_{i-m})\right)\right).
\]

With Z(T) = \prod_{i=0}^{nm-1} Z_i(T) and Z_i(T) = \sum_{S_i} \exp\left(-\frac{1}{kT}\left(H_b(S_i, S_{i-1}) + H_b(S_i, S_{i-m})\right)\right), it follows that

\[
P(S) = \prod_{i=0}^{nm-1} P(S_i \mid S_{i-1}, S_{i-m}).
\]

It is recalled that ground state information can be obtained by optimising P(S). For this particular model, the ground state configuration is obtained by maximising P(S), i.e.

\[
\operatorname*{argmax}_{S_0, S_1, \ldots, S_{nm-1}} \left\{ \prod_{i=0}^{nm-1} P(S_i \mid S_{i-1}, S_{i-m}) \right\}.
\]

Next, it is necessary to adapt the Viterbi path formulation, in order to arrive at a recursive expression of ground state energy for the higher-order Markov model. Disregarding cyclic boundary interactions in the model, and noting that P(S_i | S_{i-1}, S_{i-m}) = P(S_i) for i = 0, a prototypical approach is

\[
P_{\mathrm{viterbi}}(S_i) =
\begin{cases}
\max_{S_i} \left\{ P(S_i) \right\} & i = 1 \\
\max_{S_{i-1}, S_{i-m}} \left\{ P(S_i \mid S_{i-1}, S_{i-m}) \, Q_{\mathrm{viterbi}}(S_{i-1}) \, Q_{\mathrm{viterbi}}(S_{i-m}) \right\} & i > 1.
\end{cases}
\]

Unfortunately, there exists a caveat against recursively stating

\[
P_{\mathrm{viterbi}}(S_i) = \max_{S_{i-1}, S_{i-m}} \left\{ P(S_i \mid S_{i-1}, S_{i-m}) \, P_{\mathrm{viterbi}}(S_{i-1}) \, P_{\mathrm{viterbi}}(S_{i-m}) \right\},
\]

because by definition, the probability of subsystem S_i assuming a given state is conditionally dependent on subsystems S_{i-1} and S_{i-m}, which in turn are both conditionally dependent on subsystem S_{i-m-1}. This ordering requires that when evaluating the terms P_viterbi(S_{i-1}) and P_viterbi(S_{i-m}), identical sets of subsystem configurations are considered.


The mapping Q_viterbi must reflect this behaviour in terms of P_viterbi.

A solution to the dependency problem of vertical and horizontal predecessor spins can be obtained by increasing the order of the Markov model to m + 1. As a result, system state probability is given by the product

\[
P(S) = \prod_{i=0}^{nm-1} P(S_i \mid S_{i-1}, S_{i-2}, \ldots, S_{i-m-1}),
\]

from which ground state probability can be formulated as

\[
P_{\mathrm{viterbi}}(S_i, S_{i-1}, \ldots, S_{i-m}) =
\begin{cases}
P(S_i, S_{i-1}, \ldots, S_{i-m}) & i \leq m \\
\max_{S_{i-m-1}} \left\{ P(S_i \mid S_{i-1}, \ldots, S_{i-m-1}) \, P_{\mathrm{viterbi}}(S_{i-1}, \ldots, S_{i-m-1}) \right\} & i > m.
\end{cases}
\]


Appendix D

The Viterbi Path

D.1 Evaluating the Viterbi path in terms of system energy

It is of interest to examine the behaviour of system state probability, which is present in the recursive formulation of the Viterbi path, and evaluated in the described pseudocode algorithm. Taking the natural logarithm of the state probability, it is observed that

\[
\ln(P(S)) = \ln\left(\frac{1}{Z(T)} \exp\left(-\frac{1}{kT} H(S)\right)\right) = \ln\left(\frac{1}{Z(T)}\right) - \frac{H(S)}{kT} \propto -H(S).
\]

Using this result, the natural logarithm of the conditionally dependent state probability P(S_i | S_{i-1}) is

\[
\ln(P(S_i \mid S_{i-1})) = \ln\left(\frac{P(S_i, S_{i-1})}{P(S_{i-1})}\right) = \ln(P(S_i, S_{i-1})) - \ln(P(S_{i-1}))
\propto -\left(H(S_i) + H(S_{i-1}) + H_b(S_i, S_{i-1})\right) + H(S_{i-1})
\propto -\left(H(S_i) + H_b(S_i, S_{i-1})\right),
\]

which allows system probability to be evaluated quantitatively in terms of its Hamiltonian. This in turn permits reformulation of the dynamic programming optimisation problem:

\[
\ln(P_{\mathrm{viterbi}}(S_i)) =
\begin{cases}
\max_{S_i} \left\{ \ln(P(S_i)) \right\} & i = 1 \\
\max_{S_{i-1}} \left\{ \ln(P(S_i \mid S_{i-1})) + \ln(P_{\mathrm{viterbi}}(S_{i-1})) \right\} & i > 1
\end{cases}
\]

\[
\ln(P_{\mathrm{viterbi}}(S_i)) =
\begin{cases}
c \min_{S_i} \left\{ H(S_i) \right\} & i = 1 \\
\min_{S_{i-1}} \left\{ H(S_i) + H_b(S_i, S_{i-1}) + c \ln(P_{\mathrm{viterbi}}(S_{i-1})) \right\} & i > 1,
\end{cases}
\]

with c ∈ ℝ. It is trivial to apply the same approach to the recursive function viterbi(i), which evaluates to the actual sequence of emitted states in the Viterbi path, and to the described pseudocode algorithm.


Setting c = 1, the evaluated optimal sequence remains the Viterbi path. Further substitution yields

\[
H_{\min}(S_i) =
\begin{cases}
\min_{S_i} \left\{ H(S_i) \right\} & i = 1 \\
\min_{S_{i-1}} \left\{ H(S_i) + H_b(S_i, S_{i-1}) + H_{\min}(S_{i-1}) \right\} & i > 1,
\end{cases}
\tag{D.1}
\]

which is the Hamiltonian of the system (S_1, S_2, \ldots, S_i), whose states are equal to those emitted by the Viterbi algorithm. Since the Viterbi path corresponds to the most probable system state, H_min is the system's ground state. This provides a solution to the ground state problem for the two-dimensional lattice without vertical or horizontal boundary interactions.
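To illustrate recursion (D.1), a minimal serial sketch follows. The per-configuration values of H_min are tabulated rather than computed recursively; rowE() and interE() are hypothetical stand-ins for the energy routines of Appendix F, and recovery of the configuration itself is omitted.

    #include <stdlib.h>
    #include <string.h>
    #include <float.h>

    /* Ground state energy of an n-row lattice with m spins per row (m <= 63),
     * following Equation D.1. Row configurations are bit strings; rowE(c)
     * returns H(S_i) for configuration c, interE(c, cPrev) returns H_b. */
    double ground_state_energy(int n, int m,
                               double (*rowE)(unsigned long long),
                               double (*interE)(unsigned long long, unsigned long long)) {
        unsigned long long confs = 1ULL << m, c, cPrev;
        double *hMin = malloc(confs * sizeof *hMin);
        double *hNew = malloc(confs * sizeof *hNew);
        double best = DBL_MAX;
        int row;

        for (c = 0; c < confs; c++)          /* base case, i = 1 */
            hMin[c] = rowE(c);

        for (row = 1; row < n; row++) {      /* recursive case, i > 1 */
            for (c = 0; c < confs; c++) {
                double min = DBL_MAX;
                for (cPrev = 0; cPrev < confs; cPrev++) {
                    double e = rowE(c) + interE(c, cPrev) + hMin[cPrev];
                    if (e < min) min = e;
                }
                hNew[c] = min;
            }
            memcpy(hMin, hNew, confs * sizeof *hMin);
        }

        for (c = 0; c < confs; c++)          /* minimise over the final row */
            if (hMin[c] < best) best = hMin[c];

        free(hMin);
        free(hNew);
        return best;
    }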


Appendix E

Software usage

The following provides instructions on how to install and use the software described in this dissertation.

Requirements

The software requires the library glib-2.0 to be installed. By default, this library is expected to reside in the directory /usr/lib, with headers located at /usr/include/glib-2.0 and /usr/lib/glib-2.0/include. These settings may be changed by modifying the file Makefile.am. An implementation of MPI, such as MPICH2, is also required.

Configure and compile

The software is delivered as a compressed tarball with the .tar.gz file name extension. It is unpacked by issuing

tar xvzf ising.tar.gz

at the command prompt. Following this, it is necessary to initiate configuration by issuing

./configure

from within the package's root directory. Environment variables are used to specify configuration options, including the compiler used (which defaults to mpicc). For example, to disable optimisation, the necessary commands are:

export CFLAGS=-O0; ./configure


Provided configuration was successful, compilation is initiated using

make

Usage

Upon completion, the source directory contains the binaries genbonds, genclamps, sbforce, dpsolver, dpsolverfast and hmsolver, whose purpose is described in Chapter 6. Most significantly, the solver utilities dpsolver, dpsolverfast and hmsolver operate on spin bond configuration files, which are generated using genbonds. To generate a sample 12 × 12 spin configuration file BONDS, the required command is

./genbonds -x 12 -y 12 > BONDS

which is solved, e.g. using improved dynamic programming on a single process, by invoking

./dpsolverfast -b BONDS

Multiprocessing is enabled either by invoking mpiexec directly, or by using one of the Sun Grid Engine scripts located inside the source root directory. All utilities support the -? flag for displaying a list of command line options.
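For example, assuming an MPICH2-style launcher, a parallel run of the basic solver might be started as follows (the process count of four is illustrative):

mpiexec -n 4 ./dpsolver -b BONDS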


Appendix F

Source Code Listings

/*
 * File: main.c
 *
 * Implements common entry point for ground state solver utilities.
 * Responsible for processing command line options and initiating computation
 */

#include <stdio.h>
#include <stdlib.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "io.h"
#include "gstatefinder.h"

/* These store values of command line arguments */
static gchar *spinConfig = NULL;
static gchar *bondConfig = NULL;
static gchar *clampConfig = NULL;
static gchar *compSpinConfig = NULL;

/* Data structure for command line processing.
 * Specifies properties of command line options */
static GOptionEntry entries[] = {
    { "spin-initial-config", 's', G_OPTION_FLAG_OPTIONAL_ARG, G_OPTION_ARG_FILENAME,
      &spinConfig, "Initial spin configuration file", "spinConfig" },
    { "bond-config", 'b', 0, G_OPTION_ARG_FILENAME,
      &bondConfig, "Initial bond configuration file", "bondConfig" },
    { "clamp-config", 'c', G_OPTION_FLAG_OPTIONAL_ARG, G_OPTION_ARG_FILENAME,
      &clampConfig, "Initial spin clamp configuration file", "clampConfig" },
    { "spin-comparison-config", 'x', G_OPTION_FLAG_OPTIONAL_ARG, G_OPTION_ARG_FILENAME,
      &compSpinConfig, "Spin configuration to compare result with", "compSpinConfig" },
    { NULL }
};

static void initialise_computation();

int main(int argc, char *argv[]) {

    /* Initialise data structure for argument processing */
    GError *error = NULL;
    GOptionContext *context;

    context = g_option_context_new("- Calculate spin glass ground states");
    g_option_context_add_main_entries(context, entries, NULL);
    /* Parse arguments */
    g_option_context_parse(context, &argc, &argv, &error);

    /* Handling of required arguments */
    if (bondConfig == NULL) {
        g_fprintf(stderr, "Please specify an input bond configuration file.\n");
        exit(EXIT_FAILURE);
    }
    if (clampConfig != NULL && spinConfig == NULL) {
        g_fprintf(stderr, "Specifying a clamp configuration file requires the use of an initial spin configuration file.\n");
        exit(EXIT_FAILURE);
    }

    initialise_computation();

    g_option_context_free(context);
    return (EXIT_SUCCESS);
}

void initialise_computation() {
    gint xSize, ySize, xSize1, ySize1;

    /* Used to construct spin glass structure */
    gdouble *weights = NULL;
    gboolean *clamps = NULL;
    Spin *spins = NULL;
    Spin *compSpins = NULL;

    struct SpinGlass *spinGlass;

    /* Read weights from previously obtained file name */
    weights = read_weights(bondConfig, &xSize, &ySize);

    if (clampConfig != NULL) {
        /* Read spin clamps from previously obtained file name */
        clamps = read_clamps(clampConfig, &xSize1, &ySize1);

        /* Check that sizes of spin and clamp matrices match */
        if (xSize != xSize1 || ySize != ySize1) {
            g_fprintf(stderr, "Error: Bond and clamp matrix sizes do not match. Aborting\n");
            exit(EXIT_FAILURE);
        }
    }

    if (spinConfig != NULL) {
        /* Read initial spin configuration from previously obtained file name */
        spins = read_spins(spinConfig, &xSize1, &ySize1);

        if (xSize != xSize1 || ySize != ySize1) {
            g_fprintf(stderr, "Error: Bond and spin configuration matrix sizes do not match. Aborting\n");
            exit(EXIT_FAILURE);
        }
    }

    if (compSpinConfig != NULL) {
        /* Read comparison spin configuration from previously obtained file name */
        compSpins = read_spins(compSpinConfig, &xSize1, &ySize1);

        if (xSize != xSize1 || ySize != ySize1) {
            g_fprintf(stderr, "Error: Reference spin configuration and bond matrix sizes do not match. Aborting\n");
            exit(EXIT_FAILURE);
        }
    }

    /* Initialise spin glass */
    spinGlass = spinglass_alloc(xSize, ySize, spins, weights, clamps);

    /* Compute ground state */
    find_ground_states(spinGlass);

    if (compSpins != NULL) {
        /* Compare resulting configuration to specified reference configuration */
        gint distance;
        struct SpinGlass *spinGlass2 = spinglass_alloc(xSize, ySize, compSpins, NULL, NULL);
        distance = spinglass_correlate(spinGlass, spinGlass2);

        g_printf("Correlation distance: %d\n", distance);
        spinglass_free(spinGlass2);
    }

    spinglass_free(spinGlass);
}


/*
 * File: dp_gstate_finder.c
 *
 * Implements serial and parallel basic dynamic programming algorithms
 */

#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "arrays.h"
#include "gstatefinder.h"

/* CYCLIC_EXCHANGE defines cyclic communication patterns */
#define CYCLIC_EXCHANGE

/* USE_MPI defines parallel code */
#define USE_MPI
#ifdef USE_MPI
#include <mpi.h>
#endif

/* Defines data type for message passing */
#define T_INT MPI_LONG_LONG_INT

/* Constant alias */
#define IGNORE_BITMASK TRUE

/* Communications data */
static gint SolverNProcs = 1;
static gint SolverProcID = 0;
#define COMM MPI_COMM_WORLD
static guint64 SolverProcessorMask = 0;
/* Communications data */

/* Adjust row of spins according to bit string representation
 * spinGlass (write)        the spin glass structure to manipulate
 * row                      specifies the spin row in the range [0,NROWS)
 * conf                     the bit string representation of a spin row
 * ignoreBitmask            if TRUE, the process ID does not influence the bit string */
static void adjust_spin_row(struct SpinGlass *spinGlass, gint row, tint conf, gboolean ignoreBitmask);

/* Determine ground state and configuration of a spin glass instance
 * spinGlass                spin glass instance */
static void get_minimum_path(struct SpinGlass *spinGlass);

/* Determine optimum configurations of spin row row-1, for all configurations of row row
 * spinGlass (read/write)   spin glass instance
 * minPath (read/write)     stores minimum path (i.e. ground state energy) of subsystem before and after incrementing row row
 * minPathConf (read/write) stores optimum configurations of rows
 * row                      row of the spin lattice to process
 * trellisCols              number of spin row configurations
 * finalRowConf             used to specify final row's configuration, if cyclic boundary conditions are present */
static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, tint *minPathConf, gint row, tint trellisCols, tint finalRowConf);

/* Set the configuration of spin rows, based on optimum configurations
 * spinGlass (write)        spin glass to manipulate
 * minPathConf (read)       stores optimum spin row configurations
 * conf                     optimum configuration of ultimate spin row */
static void set_optimal_config(struct SpinGlass *spinGlass, tint **minPathConf, tint conf);

/* Initialise message passing communications */
static void init_comms(struct SpinGlass *spinGlass);

/* Terminate message passing communications */
static void term_comms(void);


gdouble find_ground_states(struct SpinGlass *spinGlass) {

    gdouble energy = 0;

    if (spinGlass->ySize > 63) {
        g_fprintf(stderr, "Error: The specified spin lattice exceeds a count of 63 columns\n");
    }

    init_comms(spinGlass);

    get_minimum_path(spinGlass);

    term_comms();

    /* Master process outputs spin glass ground state */
    if (SolverProcID == 0) {
        energy = spin_glass_energy(spinGlass);
        g_printf("Energy: %E\n", energy);
        spin_glass_write_spins(spinGlass, stdout);
    }

    return energy;
}

static void adjust_spin_row(struct SpinGlass *spinGlass, gint row, tint conf, gboolean ignoreBitmask) {


    gint i;
    Spin spin;

#ifdef USE_MPI
    /* Row configuration is dependent on processor ID, which is a bit prefix */
    if (!ignoreBitmask) conf = conf | SolverProcessorMask;
#endif

    for (i = 0; i < spinGlass->ySize; i++) {
        if (conf % 2 != 0) spin = UP;
        else spin = DOWN;

        /* Set state of spin i within row */
        ArrayAccess2D(spinGlass->spins, spinGlass->ySize, row, i) = spin;

        conf = conf >> 1;
    }
}

/* Collective/serial variant */
#ifndef CYCLIC_EXCHANGE
static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, tint *minPathConf, gint row, tint trellisCols, tint finalRowConf) {
    tint j;
    tint k;

    /* Stores updated minimum path data */
    gdouble *minPathNew = g_new0(gdouble, trellisCols / SolverNProcs);

    gint previousRow;

    if (row == 0) {
        previousRow = (spinGlass->xSize) - 1;

        /* Set preceding row configuration */
        adjust_spin_row(spinGlass, previousRow, finalRowConf, IGNORE_BITMASK);

        for (j = 0; j < trellisCols / SolverNProcs; j++) {
            minPathConf[j] = finalRowConf;

            /* Set current spin row configuration */
            adjust_spin_row(spinGlass, row, j, !IGNORE_BITMASK);

            /* Calculate energetic contribution */
            minPathNew[j] = spin_glass_row_energy(spinGlass, row) + spin_glass_inter_row_energy(spinGlass, previousRow);
        }

    } else {
        previousRow = row - 1;

        for (j = 0; j < trellisCols / SolverNProcs; j++) {
            gdouble path = G_MAXDOUBLE;
            tint conf = 0;

            /* Set current spin row configuration */
            adjust_spin_row(spinGlass, row, j, !IGNORE_BITMASK);

            for (k = 0; k < trellisCols; k++) {
                gdouble interRowEnergy; /* Energetic contribution of current and previous row */
                gdouble rowEnergy;      /* Energetic contribution of current row */

                /* Set preceding spin row configuration */
                adjust_spin_row(spinGlass, previousRow, k, IGNORE_BITMASK);

                /* Calculate energetic contributions */
                interRowEnergy = spin_glass_inter_row_energy(spinGlass, previousRow);
                rowEnergy = spin_glass_row_energy(spinGlass, row);

                if (minPath[k] + interRowEnergy + rowEnergy < path) {
                    path = minPath[k] + interRowEnergy + rowEnergy;
                    conf = k;
                }
            }

            /* Record optimum paths to examined state */
            minPathConf[j] = conf;
            minPathNew[j] = path;
        }
    }

#ifdef USE_MPI
    /* Exchange minimum paths */
    MPI_Allgather(minPathNew, trellisCols / SolverNProcs, MPI_DOUBLE, minPath, trellisCols / SolverNProcs, MPI_DOUBLE, COMM);
#else
    for (j = 0; j < trellisCols; j++) minPath[j] = minPathNew[j];
#endif

    g_free(minPathNew);
}
#endif

/* Cyclic variant */
#ifdef CYCLIC_EXCHANGE
static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, tint *minPathConf, gint row, tint trellisCols, tint finalRowConf) {

    tint j, k;

    /* Compute neighbour process ID */
    gint leftNeighbour = (SolverProcID - 1 + SolverNProcs) % SolverNProcs;

    /* Stores updated minimum path data */
    gdouble *minPathNew = g_new0(gdouble, trellisCols / SolverNProcs);
    gdouble *buffer = g_new0(gdouble, trellisCols / SolverNProcs);


    gint previousRow;

    if (row == 0) {
        previousRow = (spinGlass->xSize) - 1;

        /* Set preceding row configuration */
        adjust_spin_row(spinGlass, previousRow, finalRowConf, IGNORE_BITMASK);

        for (j = 0; j < trellisCols / SolverNProcs; j++) {
            minPathConf[j] = finalRowConf; /* Theoretically redundant */

            /* Set current spin row configuration */
            adjust_spin_row(spinGlass, row, j, !IGNORE_BITMASK);

            /* Calculate energetic contribution */
            minPathNew[j] = spin_glass_row_energy(spinGlass, row) + spin_glass_inter_row_energy(spinGlass, previousRow);
        }
    } else {
        MPI_Request request;
        previousRow = row - 1;

        /* Iterate through subset of current row's states */
        for (j = 0; j < trellisCols / SolverNProcs; j++) {
            gdouble path = G_MAXDOUBLE;
            tint conf = 0;

            /* Set spin row configuration */
            adjust_spin_row(spinGlass, row, j, !IGNORE_BITMASK);

            /* Iterate through *all* states of preceding spin row */
            for (k = 0; k < trellisCols; k++) {
                gdouble interRowEnergy;
                gdouble rowEnergy;

                /* Set previous row configuration ID */
                tint cID = (SolverProcID * (trellisCols / SolverNProcs) + k) % trellisCols;

                /* Initiate neighbour rotation of minPath */
                if (k == 0) MPI_Issend(minPath, trellisCols / SolverNProcs, MPI_DOUBLE, leftNeighbour, 0, COMM, &request);

                /* Set preceding spin row configuration */
                adjust_spin_row(spinGlass, previousRow, cID, IGNORE_BITMASK);

                /* Calculate energetic contributions */
                interRowEnergy = spin_glass_inter_row_energy(spinGlass, previousRow);
                rowEnergy = spin_glass_row_energy(spinGlass, row);

                if (k % (trellisCols / SolverNProcs) == 0 && k != 0) {
                    /* Receive data */
                    MPI_Recv(buffer, trellisCols / SolverNProcs, MPI_DOUBLE, (SolverProcID + 1) % SolverNProcs, MPI_ANY_TAG, COMM, MPI_STATUS_IGNORE);
                    MPI_Wait(&request, MPI_STATUS_IGNORE);
                    memcpy(minPath, buffer, trellisCols / SolverNProcs * sizeof(gdouble));
                    /* ...receive data */
                    /* Send data */
                    MPI_Issend(minPath, trellisCols / SolverNProcs, MPI_DOUBLE, leftNeighbour, 0, COMM, &request);
                    /* Send data */
                }

                if (minPath[k % (trellisCols / SolverNProcs)] + interRowEnergy + rowEnergy < path) {
                    path = minPath[k % (trellisCols / SolverNProcs)] + interRowEnergy + rowEnergy;
                    conf = cID;
                }
            }

            minPathConf[j] = conf;
            minPathNew[j] = path;

            /* Receive data */
            MPI_Recv(buffer, trellisCols / SolverNProcs, MPI_DOUBLE, (SolverProcID + 1) % SolverNProcs, MPI_ANY_TAG, COMM, MPI_STATUS_IGNORE);
            MPI_Wait(&request, MPI_STATUS_IGNORE);
            memcpy(minPath, buffer, trellisCols / SolverNProcs * sizeof(gdouble));
        }
    }

    for (j = 0; j < trellisCols / SolverNProcs; j++) minPath[j] = minPathNew[j];

    /* Free memory */
    g_free(minPathNew);
    g_free(buffer);
}
#endif

static void get_minimum_path(struct SpinGlass *spinGlass) {
    tint j;
    guint i;

    guint trellisRows = spinGlass->xSize;
    tint trellisCols = 1 << (spinGlass->ySize);

    gdouble path = G_MAXDOUBLE;
    tint conf = 0;

    /* Stores minimum path to currently examined subsystem for each of its states */
#ifdef CYCLIC_EXCHANGE
    gdouble *minPathPartial = g_new0(gdouble, trellisCols / SolverNProcs);
    gdouble *minPath = g_new0(gdouble, trellisCols); /* Stores minimum path data of a subsystem in a subset of its states */
#else


    gdouble *minPath = g_new0(gdouble, trellisCols);
    gdouble *minPathPartial = minPath;
#endif

    tint **minPathConf = array_new_2D(trellisRows, trellisCols / SolverNProcs); /* Stores optimal configurations of preceding subsystem, given subsystem i in state j */

    if (!spin_glass_has_vertical_boundary(spinGlass)) {
        for (i = 0; i < trellisRows; i++) {
            get_optimal_prestates(spinGlass, minPathPartial, minPathConf[i], i, trellisCols, 0); /* Last argument is zero, since we don't care about vertical boundary */
        }

#ifdef CYCLIC_EXCHANGE
        MPI_Allgather(minPathPartial, trellisCols / SolverNProcs, MPI_DOUBLE, minPath, trellisCols / SolverNProcs, MPI_DOUBLE, COMM);
#endif

        /* Get minimum path */
        for (j = 0; j < trellisCols; j++) {
            if (minPath[j] < path) {
                path = minPath[j];
                conf = j;
            }
        }
        set_optimal_config(spinGlass, minPathConf, conf);

    } else {
        tint **retainedMinPathConf = array_new_2D(trellisRows, trellisCols / SolverNProcs);

        for (j = 0; j < trellisCols; j++) {
            for (i = 0; i < trellisRows; i++) {
                /* Last argument corresponds to fixed spin for boundary interaction */
                get_optimal_prestates(spinGlass, minPathPartial, minPathConf[i], i, trellisCols, j);
            }

#ifdef CYCLIC_EXCHANGE
            MPI_Allgather(minPathPartial, trellisCols / SolverNProcs, MPI_DOUBLE, minPath, trellisCols / SolverNProcs, MPI_DOUBLE, COMM);
#endif

            /* Track energy */
            if (minPath[j] < path) {
                path = minPath[j];
                conf = j;
                /* Retain states stored in minConf */
                memcpy(&(retainedMinPathConf[0][0]), &(minPathConf[0][0]), trellisRows * (trellisCols / SolverNProcs) * sizeof(tint));
            }
        }

        set_optimal_config(spinGlass, retainedMinPathConf, conf);
        array_free_2D(retainedMinPathConf);
    }

    g_free(minPath);
    array_free_2D(minPathConf);
#ifdef CYCLIC_EXCHANGE
    g_free(minPathPartial);
#endif
}

static void set_optimal_config(struct SpinGlass *spinGlass, tint **minPathConf, tint conf) {
    gint i;
    guint trellisRows = spinGlass->xSize;
    tint trellisCols = 1 << (spinGlass->ySize);

#ifdef USE_MPI
    tint *minPathConfRow = g_new0(tint, trellisCols); /* Used to store exchanged (complete) row configuration data */
#endif

    /* Iterate through spin rows in reverse */
    for (i = trellisRows - 1; i >= 0; i--) {
        /* Set row configuration */
        adjust_spin_row(spinGlass, i, conf, IGNORE_BITMASK);

        /* Reference optimum configuration of preceding spin row */
#ifdef USE_MPI
        MPI_Allgather(minPathConf[i], trellisCols / SolverNProcs, T_INT, minPathConfRow, trellisCols / SolverNProcs, T_INT, COMM);
        conf = minPathConfRow[conf];
#else
        conf = minPathConf[i][conf];
#endif
    }

#ifdef USE_MPI
    g_free(minPathConfRow);
#endif
}

static void init_comms(struct SpinGlass *spinGlass) {

#ifdef USE_MPI
    gdouble binaryPlaces;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(COMM, &SolverNProcs);
    MPI_Comm_rank(COMM, &SolverProcID);

    /* Check processor count is a power of two or unity */
    if (SolverNProcs > 1 && SolverNProcs % 2 != 0) {


        g_fprintf(stderr, "The processor count must be a power of two. Aborting.\n");
        exit(EXIT_FAILURE);
    }

    /* Create processor mask */
    SolverProcessorMask = SolverProcID;
    binaryPlaces = (log((gdouble) SolverNProcs) / log(2.0));
    /* NB: the listing is truncated here in the source; the shift below restores
     * the evident intent of placing the process ID in the high-order bits of a
     * row configuration */
    SolverProcessorMask = SolverProcessorMask << (spinGlass->ySize - (gint) binaryPlaces);
#endif
}

static void term_comms(void) {
#ifdef USE_MPI
    MPI_Finalize();
#endif
}
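To illustrate the encoding manipulated by adjust_spin_row() above: bit i of a row configuration determines the spin in column i, consumed least significant bit first, and under MPI the process ID is ORed in as a high-order prefix so that each process enumerates a disjoint subset of the 2^ySize row configurations. A minimal standalone sketch of the decoding, assuming a hypothetical 4-column row:

    #include <stdio.h>

    /* Sketch: decode a 4-column row configuration in the manner of
     * adjust_spin_row(), one spin per bit, least significant bit first.
     * The width of 4 columns and the value 0x5 are illustrative only. */
    int main(void) {
        unsigned long long conf = 0x5; /* binary 0101: columns 0 and 2 are UP */
        int i;

        for (i = 0; i < 4; i++) {
            printf("column %d: %s\n", i, (conf % 2 != 0) ? "UP" : "DOWN");
            conf = conf >> 1;
        }

        return 0;
    }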


/*
 * File: dp_gstate_finder_fast.c
 *
 * Implements serial and parallel improved dynamic programming algorithms
 */

#include <stdlib.h>
#include <math.h>
#include <string.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "arrays.h"
#include "gstatefinder.h"

/* USE_MPI defines parallel code */
#define USE_MPI
#ifdef USE_MPI
#include <mpi.h>
#endif

/* Defines data type for message passing */
#define T_INT MPI_LONG_LONG_INT

/* Constant alias */
#define IGNORE_BITMASK TRUE

/* Communications data */
static gint SolverNProcs = 1;
static gint SolverProcID = 0;
#define COMM MPI_COMM_WORLD
static tint SolverProcessorMask = 0;
/* Communications data */

/* Adjust group of spins according to bit string representation
 * spinGlass (write)        the spin glass structure to manipulate
 * leadingSpin              specifies sliding window position in the range [ySize, xSize*ySize)
 * conf                     the bit string representation of a spin group
 * ignoreBitmask            if TRUE, the process ID does not influence the bit string */
static void adjust_spin_ensemble(struct SpinGlass *spinGlass, gint leadingSpin, tint conf, gboolean ignoreBitmask);

/* Determine ground state and configuration of a spin glass instance
 * spinGlass                spin glass instance */
static void get_minimum_path(struct SpinGlass *spinGlass);

/* Determine optimum configurations of spin group leadingSpin-1, for all configurations of group leadingSpin
 * spinGlass (read/write)   spin glass instance
 * minPath (read/write)     stores minimum path (i.e. ground state energy) of subsystem before and after incrementing by spin leadingSpin
 * minPathConf (read/write) stores optimum configurations of spin groups
 * leadingSpin              position of sliding window in the range [ySize, xSize*ySize)
 * trellisCols              number of spin group configurations */
static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, tint *minPathConf, gint leadingSpin, tint trellisCols);

/* Set the configuration of spin groups, based on optimum configurations
 * spinGlass (write)        spin glass to manipulate
 * minPathConf (read)       stores optimum spin group configurations
 * conf                     optimum configuration of spin group at ultimate sliding window position */
static void set_optimal_config(struct SpinGlass *spinGlass, tint **minPathConf, tint conf);

/* Initialise message passing communications */
static void init_comms(struct SpinGlass *spinGlass);

/* Terminate message passing communications */
static void term_comms(void);


gdouble find_ground_states(struct SpinGlass *spinGlass) {

    gdouble energy = 0;

    if (spinGlass->ySize > 63) {
        g_fprintf(stderr, "Error: The specified spin lattice exceeds a count of 63 columns\n");
    }

    init_comms(spinGlass);

    get_minimum_path(spinGlass);

    term_comms();

    /* Master process outputs spin glass ground state */
    if (SolverProcID == 0) {
        energy = spin_glass_energy(spinGlass);
        g_printf("Energy: %E\n", energy);
        spin_glass_write_spins(spinGlass, stdout);
    }

    return energy;
}


static void adjust_spin_ensemble(struct SpinGlass *spinGlass, gint leadingSpin, tint conf, gboolean ignoreBitmask) {
    gint i;
    Spin spin;

#ifdef USE_MPI
    /* Window configuration is dependent on processor ID, which is a bit prefix */
    if (!ignoreBitmask) conf = conf | SolverProcessorMask;
#endif

    /* The window comprises ySize+1 spins */
    for (i = 0; i <= spinGlass->ySize; i++) {
        if (conf % 2 != 0) spin = UP;
        else spin = DOWN;

        /* Set spin at position i within sliding window */
        spinGlass->spins[leadingSpin - (spinGlass->ySize) + i] = spin;

        conf = conf >> 1;
    }
}

static void get_optimal_prestates(struct SpinGlass *spinGlass, gdouble *minPath, tint *minPathConf, gint leadingSpin, tint trellisCols) {
    tint j;
    tint k;

    /* Stores updated minimum path data */
    gdouble *minPathNew = g_new0(gdouble, trellisCols / SolverNProcs);

    if (leadingSpin == spinGlass->ySize) {
        /* spinGlass->ySize corresponds to the first spin in the second row of the lattice */

        for (j = 0; j < trellisCols / SolverNProcs; j++) {
            /* Set current spin group configuration */
            adjust_spin_ensemble(spinGlass, leadingSpin, j, !IGNORE_BITMASK);

            /* Calculate energetic contribution */
            minPathNew[j] = spin_glass_ensemble_delta(spinGlass, leadingSpin) + spin_glass_row_energy(spinGlass, 0);
        }

    } else {
        for (j = 0; j < trellisCols / SolverNProcs; j++) {
            gdouble path = G_MAXDOUBLE;
            gdouble ensembleEnergy;
            tint confIndex, conf = 0;

            /* Set current spin ensemble configuration */
            adjust_spin_ensemble(spinGlass, leadingSpin, j, !IGNORE_BITMASK);
            ensembleEnergy = spin_glass_ensemble_delta(spinGlass, leadingSpin);

            /* Calculate index for accessing preceding ensemble configuration */


            /* NB: the source listing is damaged from this point to the head of
             * get_minimum_path; it is restored here to mirror the collective
             * variant in dp_gstate_finder.c */
            confIndex = ((j | SolverProcessorMask) << 1) & (trellisCols - 1);

            /* Iterate through the two configurations of the preceding group,
             * which differ only in the spin leaving the sliding window */
            for (k = 0; k < 2; k++) {
                if (minPath[confIndex | k] + ensembleEnergy < path) {
                    path = minPath[confIndex | k] + ensembleEnergy;
                    conf = confIndex | k;
                }
            }

            /* Record optimum path to examined state */
            minPathConf[j] = conf;
            minPathNew[j] = path;
        }
    }

#ifdef USE_MPI
    /* Exchange minimum paths */
    MPI_Allgather(minPathNew, trellisCols / SolverNProcs, MPI_DOUBLE, minPath, trellisCols / SolverNProcs, MPI_DOUBLE, COMM);
#else
    for (j = 0; j < trellisCols; j++) minPath[j] = minPathNew[j];
#endif

    g_free(minPathNew);
}

static void get_minimum_path(struct SpinGlass *spinGlass) {
    tint j;
    guint i;

    guint trellisRows = (spinGlass->xSize - 1) * spinGlass->ySize;
    tint trellisCols = 1 << (spinGlass->ySize + 1);

    gdouble path = G_MAXDOUBLE;
    tint conf = 0;

    gdouble *minPath = g_new0(gdouble, trellisCols);

    tint **minPathConf = array_new_2D(trellisRows, trellisCols / SolverNProcs); /* Stores optimal configurations of preceding subsystem, given subsystem i in state j */

    for (i = 0; i < trellisRows; i++) {
        get_optimal_prestates(spinGlass, minPath, minPathConf[i], spinGlass->ySize + i, trellisCols);
    }
    /* Find optimum configuration of spin group at ultimate sliding window position */
    for (j = 0; j < trellisCols; j++) {
        if (minPath[j] < path) {
            path = minPath[j];
            conf = j;
        }
    }


    set_optimal_config(spinGlass, minPathConf, conf);

    g_free(minPath);
    array_free_2D(minPathConf);
}

static void set_optimal_config(struct SpinGlass *spinGlass, tint **minPathConf, tint conf) {
    gint i;
    guint trellisRows = (spinGlass->xSize - 1) * spinGlass->ySize;
    tint trellisCols = 1 << (spinGlass->ySize + 1);

#ifdef USE_MPI
    tint *minPathConfRow = g_new0(tint, trellisCols); /* Used to store exchanged (complete) row configuration data */
#endif

    for (i = trellisRows - 1; i > 0; i--) {

        /* Set spinGlass spin according to leading spin configuration */
        gint spinVal = conf >> (spinGlass->ySize);
        gint leadingSpin = (spinGlass->xSize * spinGlass->ySize - 1) - (trellisRows - 1 - i);
        if (spinVal != 0) (spinGlass->spins)[leadingSpin] = UP;
        else (spinGlass->spins)[leadingSpin] = DOWN;

#ifdef USE_MPI
        MPI_Allgather(minPathConf[i], trellisCols / SolverNProcs, T_INT, minPathConfRow, trellisCols / SolverNProcs, T_INT, COMM);
        conf = minPathConfRow[conf];
#else
        conf = minPathConf[i][conf];
#endif
    }

    /* Set ensemble configuration due to first leading spin */
    adjust_spin_ensemble(spinGlass, spinGlass->ySize, conf, IGNORE_BITMASK);


#ifdef USE_MPI
    g_free(minPathConfRow);
#endif
}

static void init_comms(struct SpinGlass *spinGlass) {

#ifdef USE_MPI
    gdouble binaryPlaces;

    MPI_Init(NULL, NULL);
    MPI_Comm_size(COMM, &SolverNProcs);
    MPI_Comm_rank(COMM, &SolverProcID);


    /* Check processor count is a power of two or unity */
    if (SolverNProcs > 1 && SolverNProcs % 2 != 0) {
        g_fprintf(stderr, "The processor count must be a power of two. Aborting.\n");
        exit(EXIT_FAILURE);
    }

    /* Create processor mask */
    SolverProcessorMask = SolverProcID;
    binaryPlaces = (log((gdouble) SolverNProcs) / log(2.0));
    /* NB: the listing is truncated here in the source; the shift below restores
     * the evident intent of placing the process ID in the high-order bits of a
     * window configuration */
    SolverProcessorMask = SolverProcessorMask << ((spinGlass->ySize + 1) - (gint) binaryPlaces);
#endif
}

static void term_comms(void) {
#ifdef USE_MPI
    MPI_Finalize();
#endif
}
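The improved solver enumerates configurations of a sliding window of ySize+1 spins, so advancing the window by one spin discards the window's oldest spin (bit 0) and admits one new leading spin; each window configuration therefore has exactly two predecessors, obtained by shifting the shared spins up by one bit position. A small standalone sketch of this indexing, consistent with the reconstruction in get_optimal_prestates() above and assuming ySize = 3:

    #include <stdio.h>

    /* Sketch: enumerate the two predecessor window configurations of each
     * configuration j in the improved DP scheme, assuming ySize = 3, i.e. a
     * 4-spin window with 16 configurations. */
    int main(void) {
        const int ySize = 3;
        const unsigned trellisCols = 1u << (ySize + 1);
        unsigned j;

        for (j = 0; j < trellisCols; j++) {
            unsigned confIndex = (j << 1) & (trellisCols - 1);
            printf("conf %2u: predecessors %2u and %2u\n", j, confIndex, confIndex | 1u);
        }

        return 0;
    }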


/*
 * File: harmony_gstate_finder.c
 *
 * Implements parallel harmony ground state solver
 */

#include <glib.h>
#include <glib/gprintf.h>
#include <mpi.h>
#include <string.h>

#include "spinglass.h"
#include "gstatefinder.h"
#include "random.h"


/* Serial algorithm parameters */
#define NVECTORS 10
#define MEMORY_CHOOSING_RATE 0.95

/* Parallel algorithm parameters */
#define ITERBLOCK 100
#define ZONEEXBLOCK 100

/* Common spin glass data */
static struct SpinGlass *spinGlass;
static Spin *spins[NVECTORS];
static gint xSize;
static gint ySize;
/* Common spin glass data */

/* Communications data */
#define COMM MPI_COMM_WORLD
#define ZONE_SIZE 16
static MPI_Datatype Type_Array;
static MPI_Op Reduction_Op;
static MPI_Comm SolverZone;
static gint SolverProcID = 0;
static gint SolverNProcs = 1;
/* Communications data */

/* Determine highest energy spin glass held by this process
 * highestEnergy (write)    the energy of the obtained solution vector
 * vectorNum (write)        the index of the solution vector as stored in the array spins[] */
static void compute_highest_energy(gdouble *highestEnergy, gint *vectorNum);

/* Determine lowest energy spin glass held by this process
 * lowestEnergy (write)     the energy of the obtained solution vector
 * vectorNum (write)        the index of the solution vector as stored in the array spins[] */
static void compute_lowest_energy(gdouble *lowestEnergy, gint *vectorNum);

/* Determine the algorithm's convergence status, based on solution vectors held by each process
 * returns TRUE, if the algorithm has converged */
static gboolean get_stabilised_status(void);

/* Collectively obtain energetically minimal solution vector held by processes
 * spinVector (read/write)  specifies solution vector to perform reduction on, based on energy
 * comm (read)              MPI communicator to specify processes involved in reduction */
static void reduce_minimal_spin_vector(Spin *spinVector, MPI_Comm comm);

/* Defines operation, on which reduction is based
 * vector1, vector2 (read/write) operation arguments
 * length (read)            length of vectors
 * dataType (read)          data type used for communications */
static void reduction_function(Spin *vector1, Spin *vector2, gint *length, MPI_Datatype *dataType);

/* Initialise message passing communications */
static void init_comms(void);

/* Terminate message passing communications */
static void term_comms(void);

gdouble find_ground_states(struct SpinGlass *paramSpinGlass) {
    gint i, j;

    /* Used to store energy and identifier of highest energy vector in memory */
    gdouble highestEnergy;
    gint maxVector;

    /* Used to store energy and identifier of lowest energy vector in memory */
    gdouble minEnergy;
    gint minVector;

    /* Used for communicating spin vectors */
    Spin *neighbourSpins = g_new(Spin, paramSpinGlass->xSize * paramSpinGlass->ySize);

    /* Store spin glass globally */
    spinGlass = paramSpinGlass;
    xSize = paramSpinGlass->xSize;
    ySize = paramSpinGlass->ySize;

    init_comms();

    /* Initialise by generating random vectors */
    for (i = 0; i < NVECTORS; i++) {
        spins[i] = spin_glass_get_random_spins(spinGlass);
    }

    /* NB: the loop header below is damaged in the source; it is restored here
     * as iterating until the solution vectors have stabilised, with the
     * convergence test performed every ITERBLOCK iterations */
    for (i = 0; i % ITERBLOCK != 0 || !get_stabilised_status(); i++) {


        Spin *newSpins = g_new(Spin, xSize * ySize);

        /* Compute initial highest energy vector */
        compute_highest_energy(&highestEnergy, &maxVector);

        /* Set vector components */
        for (j = 0; j < xSize * ySize; j++) {
            if (spinGlass->clamps != NULL && (spinGlass->clamps)[j]) {
                /* Clamping condition */
                newSpins[j] = spinGlass->spins[j];
            } else if (rand_continuous(0, 1) < MEMORY_CHOOSING_RATE) {
                /* Memory selection condition */
                newSpins[j] = spins[g_random_int_range(0, NVECTORS)][j];
            } else if (rand_cointoss()) {
                newSpins[j] = UP;
            } else {
                newSpins[j] = DOWN;
            }
        }

        /* Replace vector in memory, if the new vector is fitter */
        if (spin_glass_energy_conf(spinGlass, newSpins) < highestEnergy) {

            g_free(spins[maxVector]); /* Free previous vector */
            spins[maxVector] = newSpins;
        } else {
            g_free(newSpins);
        }

        if (SolverProcID % ZONE_SIZE == 0) {
            /* Periodic exchange of spin vectors between neighbouring zones */
            /* Highest energy vector is replaced by random vector */
            gint random = g_random_int_range(0, NVECTORS);
            MPI_Sendrecv(spins[random], 1, Type_Array, (SolverProcID + ZONE_SIZE) % SolverNProcs, 0, neighbourSpins, 1, Type_Array, MPI_ANY_SOURCE, MPI_ANY_TAG, COMM, MPI_STATUS_IGNORE);
            reduction_function(neighbourSpins, spins[random], NULL, NULL);
        }

        /* Zone internal vector exchange */
        if (i % ZONEEXBLOCK == 0) {
            reduce_minimal_spin_vector(spins[maxVector], SolverZone);
        }
    }

    /* Determine minimum vector, copy configuration back to original structure */
    compute_lowest_energy(&minEnergy, &minVector);
    reduce_minimal_spin_vector(spins[minVector], COMM);
    memcpy(spinGlass->spins, spins[minVector], sizeof(Spin) * xSize * ySize);

    /* Master process outputs solution */
    if (SolverProcID == 0) {
        printf("Stabilised after %d iterations.\n", i);
        g_printf("Energy: %E\n", minEnergy);


        spin_glass_write_spins(spinGlass, stdout);
    }

    term_comms();

    /* NB: the remainder of this function and the two following function bodies
     * are damaged in the source listing; they are restored here from their
     * interface comments and the surviving fragments */
    for (i = 0; i < NVECTORS; i++) {
        g_free(spins[i]);
    }
    g_free(neighbourSpins);

    return minEnergy;
}

static void compute_highest_energy(gdouble *highestEnergy, gint *vectorNum) {
    gint i;

    *highestEnergy = -G_MAXDOUBLE;

    for (i = 0; i < NVECTORS; i++) {
        gdouble energy = spin_glass_energy_conf(spinGlass, spins[i]);

        if (energy > *highestEnergy) {
            *highestEnergy = energy;
            *vectorNum = i;
        }
    }
}

static gboolean get_stabilised_status(void) {
    /* ... (implementation lost in the source; per its interface comment, it
     * determines collectively whether the solution vectors held by the
     * processes have converged) ... */
}

static void compute_lowest_energy(gdouble *lowestEnergy, gint *vectorNum) {
    gint i;

    *lowestEnergy = G_MAXDOUBLE;

    for (i = 0; i < NVECTORS; i++) {
        gdouble energy = spin_glass_energy_conf(spinGlass, spins[i]);

        if (energy < *lowestEnergy) {


            *lowestEnergy = energy;
            *vectorNum = i;
        }
    }
}

static void reduce_minimal_spin_vector(Spin *spinVector, MPI_Comm comm) {
    Spin *newSpins = g_new(Spin, xSize * ySize);

    MPI_Allreduce(spinVector, newSpins, 1, Type_Array, Reduction_Op, comm);
    memcpy(spinVector, newSpins, xSize * ySize * sizeof(Spin));
    g_free(newSpins);
}

static void reduction_function(Spin *vector1, Spin *vector2, gint *length, MPI_Datatype *dataType) {
    gdouble energy1, energy2;

    energy1 = spin_glass_energy_conf(spinGlass, vector1);
    energy2 = spin_glass_energy_conf(spinGlass, vector2);

    /* Operation condition */
    if (energy1 < energy2) {
        memcpy(vector2, vector1, xSize * ySize * sizeof(Spin));
    }
}

static void init_comms(void) {
    MPI_Datatype spinType;

    MPI_Init(NULL, NULL);

    MPI_Comm_size(COMM, &SolverNProcs);
    MPI_Comm_rank(COMM, &SolverProcID);
    if (SolverProcID == 0) printf("NProcs: %d, zone size: %d\n", SolverNProcs, ZONE_SIZE);

    /* Split communicator */
    MPI_Comm_split(COMM, SolverProcID / ZONE_SIZE, 0, &SolverZone);

    /* Initialise reduction operation */
    MPI_Op_create((MPI_User_function *) reduction_function, 1, &Reduction_Op);
    MPI_Type_vector(1, sizeof(Spin), sizeof(Spin), MPI_BYTE, &spinType);
    MPI_Type_vector(xSize, ySize, ySize, spinType, &Type_Array);
    MPI_Type_commit(&Type_Array);
}

static void term_comms(void) {
    MPI_Comm_free(&SolverZone);
    MPI_Type_free(&Type_Array);
    MPI_Finalize();
}
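The vector reduction above relies on a user-defined MPI operation created with MPI_Op_create(), whose second argument (1) declares the operation commutative; reduction_function() keeps whichever spin vector has the lower energy. The following self-contained sketch demonstrates the same mechanism in miniature, reducing a single double per rank with a hand-written minimum operation (the per-rank values are arbitrary):

    #include <mpi.h>
    #include <stdio.h>

    /* Sketch: a commutative user-defined reduction keeping the smaller of two
     * values, mirroring the energy-based vector reduction above. */
    static void min_fn(void *in, void *inout, int *len, MPI_Datatype *type) {
        double *a = (double *) in;
        double *b = (double *) inout;
        int i;

        for (i = 0; i < *len; i++) {
            if (a[i] < b[i]) b[i] = a[i];
        }
    }

    int main(int argc, char **argv) {
        MPI_Op op;
        double local, global;
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        local = (double) ((rank * 7) % 5); /* arbitrary per-rank value */

        MPI_Op_create(min_fn, 1, &op);
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, op, MPI_COMM_WORLD);

        if (rank == 0) printf("minimum: %f\n", global);

        MPI_Op_free(&op);
        MPI_Finalize();
        return 0;
    }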


/*
 * File: spinglass.h
 *
 * Specifies spin glass operation interface and spin glass data structure
 */

#include <glib.h>
#include <stdio.h>

#ifndef SPINGLASS_H
#define SPINGLASS_H

/* Constants for spin glass IO */
#define STR_SPIN_UP "+"
#define STR_SPIN_DOWN "-"
#define STR_CLAMPED "1"
#define STR_UNCLAMPED "0"
#define WEIGHT_FMT "%lf"

/* Spin data type */
typedef enum Spin {
    UP = 1,
    DOWN = -1
} Spin;

/* Spin glass structure */
struct SpinGlass {
    /* Lattice dimensions */
    gint xSize;
    gint ySize;

    /* Vector of spin states */
    Spin *spins;

    /* Stores coupling constants. Data are stored as two row major mappings of spins to vectors,
     * such that vertical bonds precede horizontal bonds. */
    gdouble *weights;
    /* Stores clamping states similarly */
    gboolean *clamps;
    /* Stores initial spin configuration */
    Spin *initialSpins;
};

/* Construct a new spin glass structure
 * xSize                  lattice rows
 * ySize                  lattice columns
 * initialSpins (read)    vector of initial spin states. If NULL, a vector of UP spins is allocated
 * weights (read)         vector of bonds. If NULL, zero weights are initialised
 * clamps (read)          vector of clamping states.
 * returns                spin glass data structure */


struct SpinGlass * spin_glass_alloc(gint xSize, gint ySize, Spin *initialSpins, gdouble *weights, gboolean *clamps);

/* Destruct a spin glass structure. Performs deep deallocation.
 * spinGlass (read)       spin glass data structure */
void spin_glass_free(struct SpinGlass *spinGlass);

/* Determine total energy of spin glass
 * spinGlass (read)       spin glass data structure, whose spin states and bonds are referenced
 * returns                total energy, accounting for cyclic boundary interactions */
gdouble spin_glass_energy(struct SpinGlass *spinGlass);

/* Determine total energy of spin glass using alternative spin vector
 * spinGlass (read)       spin glass data structure, whose bonds are referenced
 * conf (read)            vector of spins whose states are referenced
 * returns                total energy, accounting for cyclic boundary interactions */
gdouble spin_glass_energy_conf(struct SpinGlass *spinGlass, Spin *conf);

/* Determine energy of spin row
 * row                    spin row in range [0,NROWS)
 * spinGlass (read)       spin glass data structure
 * returns                total energy, accounting for cyclic boundary interactions */
gdouble spin_glass_row_energy(struct SpinGlass *spinGlass, gint row);

/* Determine energy resulting from vertical interactions between two rows row, row+1
 * spinGlass (read)       spin glass data structure
 * row                    row in spin lattice, in the range [0,NROWS)
 * returns                row energy, accounting for cyclic boundary interactions */
gdouble spin_glass_inter_row_energy(struct SpinGlass *spinGlass, gint row);

/* Determine energy between spin and its neighbours immediately above and to the left of it
 * spinGlass (read)       spin glass data structure
 * leadingSpin            spin position in the range [0,XSIZE*YSIZE), with row major enumeration */
gdouble spin_glass_ensemble_delta(struct SpinGlass *spinGlass, gint leadingSpin);

/* Write spin states to file
 * file (read)            file to write to
 * spinGlass (read)       spin glass data structure */
void spin_glass_write_spins(struct SpinGlass *spinGlass, FILE *file);

/* Write spin states to file
 * conf (read)            spin configuration vector to output
 * spinGlass (read)       used to specify lattice dimensions
 * file (read)            file to write to */
void spin_glass_write_spins_conf(struct SpinGlass *spinGlass, Spin *conf, FILE *file);

/* Write coupling constants to file
 * spinGlass (read)       spin glass data structure
 * file (read)            file to write to */
void spin_glass_write_weights(struct SpinGlass *spinGlass, FILE *file);


/* Write clamping states to file
 * spinGlass (read)       spin glass data structure
 * file (read)            file to write to */
void spin_glass_write_clamps(struct SpinGlass *spinGlass, FILE *file);

/* Generate random spins based on uniform distribution, accounting for clamped spins
 * spinGlass (read)       used to specify lattice dimensions and clamping states
 * returns                vector of spins storing lattice configuration */
Spin * spin_glass_get_random_spins(struct SpinGlass *spinGlass);

/* Determine whether spin glass has cyclic vertical boundary interactions
 * spinGlass (read)       spin glass data structure
 * returns                TRUE if condition present */
gboolean spin_glass_has_vertical_boundary(struct SpinGlass *spinGlass);

/* Compare spin states of two spin glasses
 * spinGlass1 (read)      spin glass data structure
 * spinGlass2 (read)      spin glass data structure
 * returns                minimum number of differing spins, considering spinGlass1's inversion */
gint spin_glass_correlate(struct SpinGlass *spinGlass1, struct SpinGlass *spinGlass2);

#endif /* SPINGLASS_H */
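For orientation, a minimal usage sketch of this interface follows; the 4 by 4 lattice and the unit ferromagnetic bonds are illustrative assumptions only:

    #include <stdio.h>
    #include <glib.h>

    #include "spinglass.h"

    /* Sketch: allocate a small spin glass with uniform bonds, query its total
     * energy and write the spin states to stdout. */
    int main(void) {
        gint i;
        gdouble *weights = g_new0(gdouble, 4 * 4 * 2);
        struct SpinGlass *spinGlass;

        for (i = 0; i < 4 * 4 * 2; i++) weights[i] = 1.0; /* ferromagnetic bonds */

        /* NULL spins yield an all-UP configuration; NULL clamps leave all spins free */
        spinGlass = spin_glass_alloc(4, 4, NULL, weights, NULL);

        printf("Energy: %f\n", spin_glass_energy(spinGlass));
        spin_glass_write_spins(spinGlass, stdout);

        spin_glass_free(spinGlass); /* also frees the weights vector */
        return 0;
    }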


/*
 * File: spinglass.c
 *
 * Implements spin glass operation interface
 */

#include <stdio.h>
#include <string.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "arrays.h"
#include "random.h"

struct SpinGlass * spin_glass_alloc(gint xSize, gint ySize, Spin *initialSpins, gdouble *weights, gboolean *clamps) {
    gint i;

    struct SpinGlass *spinGlass = g_new(struct SpinGlass, 1);

    spinGlass->xSize = xSize;
    spinGlass->ySize = ySize;
    if (xSize < 2 || ySize < 2) {
        g_fprintf(stderr, "Warning: Tried to construct spin glass with dimensions %d by %d\n", xSize, ySize);
    }

    /* Allocate spin matrix */
    if (initialSpins == NULL) {
        spinGlass->spins = g_new(Spin, xSize * ySize);
        /* Assign default values */
        for (i = 0; i < xSize * ySize; i++) (spinGlass->spins)[i] = UP;
        spinGlass->initialSpins = NULL;
    } else {
        spinGlass->spins = initialSpins;
        /* Set initial spins */
        spinGlass->initialSpins = g_new(Spin, xSize * ySize);
        memcpy(spinGlass->initialSpins, spinGlass->spins, sizeof(Spin) * xSize * ySize);
    }

    /* Allocate bond weight matrix - stores vertical bonds, then horizontal bonds */
    if (weights == NULL) spinGlass->weights = g_new0(gdouble, xSize * ySize * 2);
    else spinGlass->weights = weights;

    spinGlass->clamps = clamps;

    return spinGlass;
}

void spin_glass_free(struct SpinGlass *spinGlass) {
    /* Free all fields */
    if (spinGlass->spins != NULL) g_free(spinGlass->spins);


    if (spinGlass->initialSpins != NULL) g_free(spinGlass->initialSpins);
    if (spinGlass->weights != NULL) g_free(spinGlass->weights);
    if (spinGlass->clamps != NULL) g_free(spinGlass->clamps);

    g_free(spinGlass);
}

gdouble spin_glass_row_energy(struct SpinGlass *spinGlass, gint row) {
    gint i;
    gdouble energy = 0;

    gdouble weight;     /* Bond weight */
    Spin spin0, spin1;  /* Neighbour spins */

    gint xSize = spinGlass->xSize;
    gint ySize = spinGlass->ySize;
    Spin *spins = spinGlass->spins;
    gdouble *weights = spinGlass->weights;

    /* Iterate through row spins */
    /* NB: the loop body is damaged in the source; it is restored here to
     * accumulate the horizontal bond energies of the row, cyclically, and to
     * penalise violated clamps, consistent with the surviving fragments */
    for (i = 0; i < ySize; i++) {
        spin0 = ArrayAccess2D(spins, ySize, row, i);
        spin1 = ArrayAccess2D(spins, ySize, row, (i + 1) % ySize);
        weight = ArrayAccess3D(weights, ySize, xSize, row, i, 1);

        energy += weight * spin0 * spin1;

        if (spinGlass->clamps != NULL) {
            gboolean clamp = ArrayAccess2D(spinGlass->clamps, ySize, row, i);
            if (clamp == TRUE && spin0 != ArrayAccess2D(spinGlass->initialSpins, ySize, row, i)) {
                energy = -G_MAXDOUBLE;
            }
        }
    }

    return -1 * energy;
}

gdouble spin_glass_inter_row_energy(struct SpinGlass *spinGlass, gint row) {
    gint i;
    gdouble energy = 0;

    gdouble weight;
    Spin spin0, spin1;

    gint xSize = spinGlass->xSize;
    gint ySize = spinGlass->ySize;
    Spin *spins = spinGlass->spins;
    gdouble *weights = spinGlass->weights;


    /* Iterate through row spins, accumulating energy */
    /* NB: the loop body is damaged in the source; it is restored here to
     * accumulate the vertical bond energies between rows row and row+1,
     * cyclically in the row direction */
    for (i = 0; i < ySize; i++) {
        spin0 = ArrayAccess2D(spins, ySize, row, i);
        spin1 = ArrayAccess2D(spins, ySize, (row + 1) % xSize, i);
        weight = ArrayAccess3D(weights, ySize, xSize, row, i, 0);

        energy += weight * spin0 * spin1;
    }

    return -1 * energy;
}

gdouble spin_glass_ensemble_delta(struct SpinGlass *spinGlass, gint leadingSpin) {
    gdouble energy = 0;

    Spin spin0, spin1;
    Spin *spins = spinGlass->spins;
    gdouble *weights = spinGlass->weights;

    gint row = leadingSpin / spinGlass->ySize;
    gint column = leadingSpin % spinGlass->ySize;

    gint ySize = spinGlass->ySize;
    gint xSize = spinGlass->xSize;
    gdouble weight;

    if (row > 0) {
        /* Calculate vertical component */
        spin0 = ArrayAccess2D(spins, ySize, row, column);
        spin1 = ArrayAccess2D(spins, ySize, row - 1, column);
        weight = ArrayAccess3D(weights, ySize, xSize, row - 1, column, 0);
        energy += weight * spin0 * spin1;
    }

    if (column > 0) {
        /* Calculate horizontal component */
        spin0 = ArrayAccess2D(spins, ySize, row, column - 1);
        spin1 = ArrayAccess2D(spins, ySize, row, column);
        weight = ArrayAccess3D(weights, ySize, xSize, row, column - 1, 1);
        energy += weight * spin0 * spin1;
    }

    return -1 * energy;
}

gdouble spin_glass_energy(struct SpinGlass *spinGlass) {

    gdouble energy = 0;

    gint i;
    /* Total energy is sum of rows' energies and row interactions */


    for (i = 0; i < spinGlass->xSize; i++) {
        energy += spin_glass_inter_row_energy(spinGlass, i);
        energy += spin_glass_row_energy(spinGlass, i);
    }

    return energy;
}

gdouble spin_glass_energy_conf(struct SpinGlass *spinGlass, Spin *conf) {
    gdouble energy;

    Spin *currentSpins = spinGlass->spins;
    spinGlass->spins = conf;
    energy = spin_glass_energy(spinGlass);
    spinGlass->spins = currentSpins;

    return energy;
}

void spin_glass_write_spins(struct SpinGlass *spinGlass, FILE *file) {
    gint i, j;
    Spin spin;

    /* Iterate through spins and format output */
    for (i = 0; i < spinGlass->xSize; i++) {
        for (j = 0; j < spinGlass->ySize; j++) {
            spin = ArrayAccess2D(spinGlass->spins, spinGlass->ySize, i, j);
            if (spin == UP) {
                g_fprintf(file, "%s", STR_SPIN_UP);
            } else {
                g_fprintf(file, "%s", STR_SPIN_DOWN);
            }

            g_fprintf(file, "%s", " ");
        }

        g_fprintf(file, "%s", "\n");
    }
}

void spin_glass_write_spins_conf(struct SpinGlass *spinGlass, Spin *conf, FILE *file) {
    Spin *currentSpins = spinGlass->spins;
    spinGlass->spins = conf;
    spin_glass_write_spins(spinGlass, file);
    spinGlass->spins = currentSpins;
}

void spin_glass_write_weights(struct SpinGlass *spinGlass, FILE *file) {
    gint i, j, k;
    gdouble weight;

    /* Iterate through weights and format output */
    for (k = 0; k < 2; k++) {


        for (i = 0; i < spinGlass->xSize; i++) {
            for (j = 0; j < spinGlass->ySize; j++) {
                weight = ArrayAccess3D(spinGlass->weights, spinGlass->ySize, spinGlass->xSize, i, j, k);
                g_fprintf(file, WEIGHT_FMT " ", weight);
            }

            g_fprintf(file, "%s", "\n");
        }

        g_fprintf(file, "%s", "\n");
    }
}

void spin_glass_write_clamps(struct SpinGlass *spinGlass, FILE *file) {
    gint i, j;
    gboolean clamp;

    /* Iterate through clamps and format output */
    for (i = 0; i < spinGlass->xSize; i++) {
        for (j = 0; j < spinGlass->ySize; j++) {
            clamp = ArrayAccess2D(spinGlass->clamps, spinGlass->ySize, i, j);
            if (clamp) {
                g_fprintf(file, "%s", STR_CLAMPED);
            } else {
                g_fprintf(file, "%s", STR_UNCLAMPED);
            }

            g_fprintf(file, "%s", " ");
        }

        g_fprintf(file, "%s", "\n");
    }
}

Spin * spin_glass_get_random_spins(struct SpinGlass *spinGlass) {
    gint total = spinGlass->xSize * spinGlass->ySize;
    gint i;

    /* Allocate spins */
    Spin *spins = g_new(Spin, total);

    /* Assign spin values */
    for (i = 0; i < total; i++) {
        if (spinGlass->clamps != NULL && (spinGlass->clamps)[i]) {
            /* Clamped status */
            spins[i] = (spinGlass->spins)[i];
        } else {
            /* Assign random spin values */
            gboolean randomVal = rand_cointoss();
            if (randomVal == TRUE) {
                spins[i] = UP;
            } else {
                spins[i] = DOWN;
            }


        }
    }

    return spins;
}

gboolean spin_glass_has_vertical_boundary(struct SpinGlass *spinGlass) {
    gboolean hasVerticalBoundary = FALSE;

    gint ySize = spinGlass->ySize;
    gint xSize = spinGlass->xSize;

    gint i;
    /* Iterate through spins in ultimate row, checking for non-zero vertical bond weights */
    for (i = 0; i < ySize; i++) {
        gdouble weight = ArrayAccess3D(spinGlass->weights, ySize, xSize, xSize - 1, i, 0);
        if (weight != 0) hasVerticalBoundary = TRUE;
    }

    return hasVerticalBoundary;
}

gint spin_glass_correlate(struct SpinGlass *spinGlass1, struct SpinGlass *spinGlass2) {
    gint i, j, k;
    gint finalDistance = G_MAXINT;
    gint distance;

    /* NB: the loop heads below are damaged in the source; they are restored to
     * compare the lattices both directly (k == 0) and under inversion of
     * spinGlass1 (k == 1), per the interface comment */
    for (k = 0; k < 2; k++) {
        distance = 0;

        for (i = 0; i < spinGlass1->xSize; i++) {
            for (j = 0; j < spinGlass1->ySize; j++) {
                Spin spin1 = ArrayAccess2D(spinGlass1->spins, spinGlass1->ySize, i, j);
                Spin spin2 = ArrayAccess2D(spinGlass2->spins, spinGlass2->ySize, i, j);
                if (k == 0) {
                    if (spin1 != spin2) distance++;
                } else {
                    if (spin1 == spin2) distance++;
                }
            }
        }
        if (distance < finalDistance) finalDistance = distance;
    }

    return finalDistance;
}
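The listings above depend on the ArrayAccess2D and ArrayAccess3D macros from arrays.h, which is not reproduced in this excerpt. From the call sites they behave as row-major index maps, along the lines of the following illustrative reconstructions (the definitions in arrays.h itself may differ):

    /* Sketch: row-major 2D access, e.g. ArrayAccess2D(spins, ySize, row, col) */
    #define ArrayAccess2D(array, ySize, row, col) \
        ((array)[(row) * (ySize) + (col)])

    /* Sketch: two stacked row-major planes; plane 0 holds vertical bonds and
     * plane 1 horizontal bonds, matching the weights layout in spinglass.h */
    #define ArrayAccess3D(array, ySize, xSize, row, col, plane) \
        ((array)[(plane) * (xSize) * (ySize) + (row) * (ySize) + (col)])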


/*
 * File: io.h
 *
 * Specifies IO operation interface
 */

#include "spinglass.h"

#ifndef IO_H
#define IO_H

/* For file input routines using fgets() */
#define MAX_LINE_LEN 100000

/* Read spin configuration from file
 * fileName (read)        file name from which to initiate reading
 * xSize (write)          number of rows in the obtained configuration
 * ySize (write)          number of columns in the obtained configuration
 * returns                vector of spins, stored in row major order */
Spin * read_spins(gchar *fileName, gint *xSize, gint *ySize);

/* Read spin clamping state from file
 * fileName (read)        file name from which to initiate reading
 * xSize (write)          number of rows in the obtained configuration
 * ySize (write)          number of columns in the obtained configuration
 * returns                vector of spin clamp states, stored in row major order */
gboolean * read_clamps(gchar *fileName, gint *xSize, gint *ySize);

/* Read spin bond configuration from file
 * fileName (read)        file name from which to initiate reading
 * xSize (write)          number of rows in the obtained configuration
 * ySize (write)          number of columns in the obtained configuration
 * returns                vector of spin bonds, stored in row major order
 *                        data for vertical bonds precede those for horizontal bonds */
gdouble * read_weights(gchar *fileName, gint *xSize, gint *ySize);


/* Write spin configuration to file
 * spinGlass (read)       data structure storing spin glass data
 * fileName (read)        file name to write data to */
void write_spins(struct SpinGlass *spinGlass, gchar *fileName);

/* Write spin clamping state to file
 * spinGlass (read)       data structure storing spin glass data
 * fileName (read)        file name to write data to */
void write_clamps(struct SpinGlass *spinGlass, gchar *fileName);

/* Write spin bond configuration to file
 * spinGlass (read)       data structure storing spin glass data
 * fileName (read)        file name to write data to */
void write_weights(struct SpinGlass *spinGlass, gchar *fileName);

#endif /* IO_H */


/*
 * File: io.c
 *
 * Implements IO operations specified in io.h
 *
 */

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <glib.h>
#include <glib/gprintf.h>

#include "spinglass.h"
#include "io.h"

/* Parses a file, adding tokens to a queue
 * fileName (read)  file name to read from
 * xSize    (write) number of token rows contained in the file
 * ySize    (write) number of token columns contained in the file
 * returns queue containing parsed tokens */
static GQueue *parse_file(gchar *fileName, gint *xSize, gint *ySize) {
    gint nRows = 0;
    gint nCols = 0;
    gint nColCheck = 0;

    GQueue *tokenQueue = g_queue_new();

    FILE *file = fopen(fileName, "r");
    gchar line[MAX_LINE_LEN + 1];
    if (file != NULL) {

        /* Read lines until end of file, process if non zero length */
        while (NULL != fgets(line, MAX_LINE_LEN, file)) {
            if (strlen(line) > 0 && line[0] != '\n') {
                gchar *token;
                nRows++;

                nColCheck = 0;
                /* Tokenise lines */
                token = strtok(line, " \t\n");
                while (token != NULL) {
                    gchar *tokenMem = g_malloc(strlen(token) + 1);
                    strcpy(tokenMem, token);

                    nColCheck++;

                    /* Add token to queue */
                    g_queue_push_tail(tokenQueue, tokenMem);
                    token = strtok(NULL, " \t\n");
                }

                /* Check for matching row lengths */
                if (nCols == 0) nCols = nColCheck;


                if (nColCheck != nCols) {
                    g_fprintf(stderr, "Error: The input data matrix does not contain rows of equal lengths.\n");
                    exit(-1);
                }
            }
        }
    } else {
        g_fprintf(stderr, "An error occurred while opening the file %s.\n", fileName);
        exit(-1);
    }

    fclose(file);

    *xSize = nRows;
    *ySize = nCols;

    return tokenQueue;
}

Spin *read_spins(gchar *fileName, gint *xSize, gint *ySize) {
    /* Retrieve tokens in file */
    GQueue *tokenQueue = parse_file(fileName, xSize, ySize);
    Spin *spins = g_new(Spin, (*xSize) * (*ySize));

    int i = 0;

    /* Process tokens */
    while (g_queue_get_length(tokenQueue) > 0) {
        /* Get token */
        gchar *token = g_queue_pop_head(tokenQueue);

        /* Check whether strings assume expected values */
        if (strcmp(token, STR_SPIN_UP) == 0) {
            spins[i] = UP;
        } else if (strcmp(token, STR_SPIN_DOWN) == 0) {
            spins[i] = DOWN;
        } else {
            g_fprintf(stderr, "Error: Unrecognised spin data.\n");
            exit(-1);
        }

        g_free(token);
        i++;
    }

    g_queue_free(tokenQueue);
    return spins;
}


void write_spins(struct SpinGlass *spinGlass, gchar *fileName) {
    /* Open file, delegate to spinglass module */


    FILE *file = fopen(fileName, "w");

    if (file != NULL) {
        spinglass_write_spins(spinGlass, file);
    } else {
        g_fprintf(stderr, "An error occurred while opening the file %s.", fileName);
    }

    fclose(file);
}

gboolean *read_clamps(gchar *fileName, gint *xSize, gint *ySize) {
    /* Retrieve tokens in file */
    GQueue *tokenQueue = parse_file(fileName, xSize, ySize);
    gboolean *clamps = g_new(gboolean, (*xSize) * (*ySize));

    int i = 0;

    /* Process tokens */
    while (g_queue_get_length(tokenQueue) > 0) {
        /* Get token */
        gchar *token = g_queue_pop_head(tokenQueue);

        /* Check whether strings assume expected values */
        if (strcmp(token, STR_CLAMPED) == 0) {
            clamps[i] = TRUE;
        } else if (strcmp(token, STR_UNCLAMPED) == 0) {
            clamps[i] = FALSE;
        } else {
            g_fprintf(stderr, "Error: Unrecognised clamping data.\n");
            exit(-1);
        }

        g_free(token);
        i++;
    }

    g_queue_free(tokenQueue);
    return clamps;
}

void write_clamps(struct SpinGlass *spinGlass, gchar *fileName) {
    /* Open file, delegate to spinglass module */
    FILE *file = fopen(fileName, "w");

    if (file != NULL) {
        spinglass_write_clamps(spinGlass, file);
    } else {
        g_fprintf(stderr, "An error occurred while opening the file %s.", fileName);
    }

    fclose(file);
}


gdouble *read_weights(gchar *fileName, gint *xSize, gint *ySize) {
    gint nRows, nCols;
    gint i = 0;

    /* Retrieve tokens in file */
    GQueue *tokenQueue = parse_file(fileName, &nRows, &nCols);
    gdouble *weights = g_new(gdouble, (nRows * nCols));

    /* Account for vertical and horizontal weights stored in file */
    *xSize = nRows / 2;
    *ySize = nCols;

    /* Simple check for matching vertical/horizontal bond numbers */
    if (nRows % 2 == 1) {
        g_fprintf(stderr, "Odd number of data rows detected when reading bond file. Should be even.\n");
        exit(-1);
    }

    /* Process tokens */
    while (g_queue_get_length(tokenQueue) > 0) {
        /* Get token */
        gchar *token = g_queue_pop_head(tokenQueue);
        gdouble weightVal = 0;

        /* Convert to double */
        if (sscanf(token, WEIGHT_FMT, &weightVal) != 1) {
            g_fprintf(stderr, "Error: Unrecognised bond data.\n");
            exit(-1);
        }

        weights[i++] = weightVal;

        g_free(token);
    }

    g_queue_free(tokenQueue);
    return weights;
}

void write_weights(struct SpinGlass *spinGlass, gchar *fileName) {
    /* Open file, delegate to spinglass module */
    FILE *file = fopen(fileName, "w");

    if (file != NULL) {
        spinglass_write_weights(spinGlass, file);
    } else {
        g_fprintf(stderr, "An error occurred while opening the file %s.", fileName);
    }

    fclose(file);
}
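To make the layout expected by read_weights() concrete, consider a 2 x 2 lattice: the bond file would then hold four rows of two whitespace-separated values, for example (the values below are arbitrary, and it is assumed that WEIGHT_FMT, defined in spinglass.h, parses plain decimal numbers):

     1.0   -1.0
     0.5    0.5
    -1.0    1.0
     0.0    2.0

The first two rows hold the vertical bonds, the last two the horizontal bonds; read_weights() would report xSize = 2 and ySize = 2 for this input.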


/*
 * File: arrays.h
 *
 * Specifies array operation interface
 * and defines macros for array operations
 *
 */

#include <glib.h>

#ifndef ARRAYS_H
#define ARRAYS_H

/* Emulates two-dimensional array access
 * array      pointer to data
 * rowlength  length of a row (number of columns)
 * i, j       array indices */
#define ArrayAccess2D(array, rowlength, i, j) ((array)[(i) * (rowlength) + (j)])

/* Emulates three-dimensional array access
 * array                  pointer to data
 * rowlength, columnlength  extents of one plane
 * i, j, k                array indices */
#define ArrayAccess3D(array, rowlength, columnlength, i, j, k) ((array)[(columnlength) * (rowlength) * (k) + (i) * (rowlength) + (j)])

/* Array data types */
typedef guint64 t_int;
typedef gdouble t_double;

/* Construct two-dimensional array. Data contiguity is ensured
 * nRows     number of rows
 * nColumns  number of columns
 * returns pointer to allocated data */
t_int **array_new_2D(t_int nRows, t_int nColumns);

/* Destruct two-dimensional array previously allocated with array_new_2D()
 * array  the array to destruct */
void array_free_2D(t_int **array);

/* Construct three-dimensional array. Data contiguity is ensured
 * nZ        size of third dimension
 * nRows     number of rows
 * nColumns  number of columns
 * returns pointer to allocated data */
t_double ***array_new_3D(t_int nZ, t_int nRows, t_int nColumns);

/* Destruct three-dimensional array previously allocated with array_new_3D()
 * array  the array to destruct */
void array_free_3D(t_double ***array);

int array_utest(void);

#endif /* ARRAYS_H */
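Both macros flatten multi-dimensional indices onto contiguous storage: ArrayAccess2D(a, rowlength, i, j) expands to a[(i)*(rowlength)+(j)], and ArrayAccess3D additionally offsets by whole planes of rowlength * columnlength elements. The fragment below, written purely for illustration, shows the resulting element placement:

    #include <glib.h>
    #include "arrays.h"

    int main(void) {
        /* A flat buffer viewed as a 3 x 4 matrix; rowlength is 4 */
        gdouble matrix[3 * 4] = {0};
        ArrayAccess2D(matrix, 4, 2, 3) = 1.5;        /* element (2,3) -> matrix[11] */

        /* A flat buffer viewed as 2 planes of 3 x 4; plane k=1, row i=1, column j=2 */
        gdouble cube[2 * 3 * 4] = {0};
        ArrayAccess3D(cube, 4, 3, 1, 2, 1) = 2.5;    /* -> cube[3*4*1 + 1*4 + 2] = cube[18] */

        g_print("%g %g\n", matrix[11], cube[18]);    /* prints 1.5 2.5 */
        return 0;
    }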


/*
 * File: arrays.c
 *
 * Implements array operation interface specified in arrays.h
 *
 */

#include <glib.h>
#include <stdio.h>

#include "arrays.h"

t_int **array_new_2D(t_int nRows, t_int nColumns) {
    gint i;

    /* Allocate pointer block */
    t_int **array = g_malloc(nRows * sizeof(t_int *));
    /* Allocate contiguous data block */
    array[0] = g_malloc(nRows * nColumns * sizeof(t_int));
    /* Assign data offsets */
    for (i = 1; i < nRows; i++) {
        array[i] = array[0] + i * nColumns;
    }

    return array;
}

void array_free_2D(t_int **array) {
    /* Free data block, then pointer block */
    g_free(array[0]);
    g_free(array);
}

t_double ***array_new_3D(t_int nZ, t_int nRows, t_int nColumns) {
    gint i, k;

    /* Allocate plane pointer block */
    t_double ***array = g_malloc(nZ * sizeof(t_double **));
    /* Allocate row pointer block and contiguous data block */
    array[0] = g_malloc(nZ * nRows * sizeof(t_double *));
    t_double *data = g_malloc(nZ * nRows * nColumns * sizeof(t_double));
    /* Assign plane and row offsets */
    for (k = 0; k < nZ; k++) {
        array[k] = array[0] + k * nRows;
        for (i = 0; i < nRows; i++) {
            array[k][i] = data + (k * nRows + i) * nColumns;
        }
    }

    return array;
}

void array_free_3D(t_double ***array) {
    /* Free data block, row pointer block, then plane pointer block */
    g_free(array[0][0]);
    g_free(array[0]);
    g_free(array);
}


int array_utest(void) {

    gint i, j, k;
    t_int **array = array_new_2D(10, 10);
    t_double ***array2 = array_new_3D(5, 32, 32);

    /* Exercise element access across the full extent of both arrays */
    for (i = 0; i < 10; i++) {
        for (j = 0; j < 10; j++) {
            array[i][j] = i * 10 + j;
        }
    }

    for (k = 0; k < 5; k++) {
        for (i = 0; i < 32; i++) {
            for (j = 0; j < 32; j++) {
                array2[k][i][j] = (t_double) (k + i + j);
            }
        }
    }

    /* Verify values written via pointer indexing against contiguous storage */
    for (i = 0; i < 10; i++) {
        for (j = 0; j < 10; j++) {
            if (array[0][i * 10 + j] != (t_int) (i * 10 + j)) return 1;
        }
    }

    array_free_2D(array);
    array_free_3D(array2);

    return 0;
}


/*
 * File: random.h
 *
 * Defines interface for random number generation
 *
 */

#include <glib.h>

/* Generate continuously distributed random double in the range [lower, upper)
 * lower  lower limit
 * upper  upper limit */
gdouble rand_continuous(gdouble lower, gdouble upper);

/* Generate equally distributed random boolean */
gboolean rand_cointoss(void);


/*
 * File: random.c
 *
 * Implements interface for random number generation
 *
 */

#include <stdio.h>
#include <glib.h>
#include "random.h"

gdouble rand_continuous(gdouble lower, gdouble upper) {
    return g_random_double_range(lower, upper);
}

gboolean rand_cointoss(void) {
    gboolean value = g_random_boolean();
    return value;
}


/*
 * File: bforce_gstatefinder.c
 *
 * Implements brute force ground state finder
 *
 */

#include <glib.h>
#include <glib/gprintf.h>
#include <stdio.h>
#include "spinglass.h"
#include "gstatefinder.h"

static void find_ground_states_brute_force(gint leadingSpin, gdouble *minEnergy, struct SpinGlass *spinGlass);

gdouble find_ground_states(struct SpinGlass *spinGlass) {
    gint nSpins = spinGlass->xSize * spinGlass->ySize;
    gdouble minEnergy = G_MAXDOUBLE;

    /* Initiate brute force evaluation */
    find_ground_states_brute_force(nSpins, &minEnergy, spinGlass);

    return minEnergy;
}

/* Recursive brute force ground state evaluation
 * leadingSpin  spin 'window' position, used to specify the state to be flipped
 *              and to detect the base case
 * minEnergy    (read/write) records the current minimum energy. For each
 *              invocation of the function, states are output if their energy
 *              does not exceed the value currently held by this variable
 * spinGlass    (read/write) spin glass data structure whose spins are
 *              manipulated during search */
static void find_ground_states_brute_force(gint leadingSpin, gdouble *minEnergy, struct SpinGlass *spinGlass) {
    /* Base case */
    if (leadingSpin == 0) {
        /* Compute energy */
        gdouble energy = spinglass_energy(spinGlass);

        if (energy < *minEnergy) {
            *minEnergy = energy;
        }

        if (energy == *minEnergy) {
            g_printf("\nLeaf node with energy %E\n", energy);
            g_printf("Is current ground state\n");
            spinglass_write_spins(spinGlass, stdout);
        }

    } else {
        /* Recurse with current spin orientation */
        find_ground_states_brute_force(leadingSpin - 1, minEnergy, spinGlass);


        /* Flip spin down */
        spinGlass->spins[leadingSpin - 1] *= DOWN;
        /* Recurse with flipped spin */
        find_ground_states_brute_force(leadingSpin - 1, minEnergy, spinGlass);
    }
}
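Since each spin is toggled in place and never restored, the two recursive calls traverse the state space in a reflected order reminiscent of a Gray code, visiting each of the 2^n configurations exactly once. The standalone fragment below, which mirrors the recursion shape of find_ground_states_brute_force() without the SpinGlass machinery, makes the traversal explicit:

    #include <stdio.h>

    #define N_SPINS 3

    static int spins[N_SPINS] = {1, 1, 1};   /* all spins up initially */

    /* Same recursion shape as find_ground_states_brute_force(): visit the
     * subtree once, flip the leading spin, then visit the subtree again */
    static void enumerate(int leadingSpin) {
        int i;
        if (leadingSpin == 0) {
            for (i = 0; i < N_SPINS; i++) printf("%+d ", spins[i]);
            printf("\n");
        } else {
            enumerate(leadingSpin - 1);
            spins[leadingSpin - 1] *= -1;    /* analogous to '*= DOWN' */
            enumerate(leadingSpin - 1);
        }
    }

    int main(void) {
        enumerate(N_SPINS);   /* prints all 2^3 = 8 spin configurations */
        return 0;
    }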


/*
 * File: gstatefinder.h
 *
 * Specifies interface for ground state solvers
 */

#include "spinglass.h"

#ifndef GSTATEFINDER_H
#define GSTATEFINDER_H

/* Determine ground states of spin glass
 * spinGlass (read) the spin glass to evaluate
 * returns the ground state energy */
gdouble find_ground_states(struct SpinGlass *spinGlass);

#endif /* GSTATEFINDER_H */


