
Monte Carlo Methods
in Statistical Mechanics:
Foundations and New Algorithms

Alan D. Sokal
Department of Physics
New York University
4 Washington Place
New York, NY 10003
USA
E-mail: SOKAL@NYU.EDU

Lectures at the Cargèse Summer School on
"Functional Integration: Basics and Applications"
September 1996

These notes are an updated version of lectures given at the Cours de Troisième Cycle de la Physique en Suisse Romande (Lausanne, Switzerland) in June 1989. We thank the Troisième Cycle de la Physique en Suisse Romande and Professor Michel Droz for kindly giving permission to reprint these notes.


Note to the Reader

The following notes are based on my course "Monte Carlo Methods in Statistical Mechanics: Foundations and New Algorithms" given at the Cours de Troisième Cycle de la Physique en Suisse Romande (Lausanne, Switzerland) in June 1989, and on my course "Multi-Grid Monte Carlo for Lattice Field Theories" given at the Winter College on Multilevel Techniques in Computational Physics (Trieste, Italy) in January–February 1991.

The reader is warned that some of this material is out-of-date (this is particularly true as regards reports of numerical work). For lack of time, I have made no attempt to update the text, but I have added footnotes marked "Note Added 1996" that correct a few errors and give additional bibliography.

My first two lectures at Cargèse 1996 were based on the material included here. My third lecture described the new finite-size-scaling extrapolation method of [97, 98, 99, 100, 101, 102, 103].

1 Introduction<br />

The goal of these lectures is to give an introduction to current research on Monte Carlo methods in statistical mechanics and quantum field theory, with an emphasis on:

1) the conceptual foundations of the method, including the possible dangers and misuses, and the correct use of statistical error analysis; and

2) new Monte Carlo algorithms for problems in critical phenomena and quantum field theory, aimed at reducing or eliminating the "critical slowing-down" found in conventional algorithms.

These lectures are aimed at a mixed audience of theoretical, computational and mathematical physicists, some of whom are currently doing or want to do Monte Carlo studies themselves, others of whom want to be able to evaluate the reliability of published Monte Carlo work.

Before embarking on 9 hours of lectures on Monte Carlo methods, let me offer a warning:

Monte Carlo is an extremely bad method; it should be used only when all alternative methods are worse.

Why is this so? Firstly, all numerical methods are potentially dangerous, compared to analytic methods: there are more ways to make mistakes. Secondly, as numerical methods go, Monte Carlo is one of the least efficient; it should be used only on those intractable problems for which all other numerical methods are even less efficient.



Let me be more precise about this latter point. Virtually all Monte Carlo methods have the property that the statistical error behaves as

    error ∼ 1 / √(computational budget)

(or worse); this is an essentially universal consequence of the central limit theorem. It may be possible to improve the proportionality constant in this relation by a factor of 10^6 or more (this is one of the principal subjects of these lectures), but the overall 1/√n behavior is inescapable. This should be contrasted with traditional deterministic numerical methods, whose rate of convergence is typically something like 1/n^4 or e^{−n} or e^{−2n}. Therefore, Monte Carlo methods should be used only on those extremely difficult problems in which all alternative numerical methods behave even worse than 1/√n.

Consider, for example, the problem of numerical integration in d dimensions, and let us compare Monte Carlo integration with a traditional deterministic method such as Simpson's rule. As is well known, the error in Simpson's rule with n nodal points behaves asymptotically as n^{−4/d} (for smooth integrands). In low dimension (d ≲ 8) this is much better than the Monte Carlo rate n^{−1/2}, but in high dimension (d ≳ 8) it is much worse. So it is not surprising that Monte Carlo is the method of choice for performing high-dimensional integrals. It is still a bad method: with an error proportional to n^{−1/2}, it is difficult to achieve more than 4 or 5 digits of accuracy. But numerical integration in high dimension is very difficult; though Monte Carlo is bad, all other known methods are worse.¹
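The n^{−1/2} behavior is easy to see empirically. The sketch below (an illustrative smooth test integrand, not an example from the text) performs plain Monte Carlo integration in d = 6 and reports the central-limit-theorem estimate of the standard error, which roughly halves when the sample size is quadrupled.

```python
import math
import random

def mc_integrate(f, d, n, rng):
    """Plain Monte Carlo estimate of the integral of f over [0,1]^d,
    together with its CLT standard-error estimate."""
    total = total_sq = 0.0
    for _ in range(n):
        fx = f([rng.random() for _ in range(d)])
        total += fx
        total_sq += fx * fx
    mean = total / n
    var = total_sq / n - mean * mean
    return mean, math.sqrt(var / n)   # error shrinks like n**-0.5

# Smooth test integrand with a known answer:
# integral of exp(x_1 + ... + x_d) over [0,1]^d equals (e - 1)^d.
d = 6
f = lambda x: math.exp(sum(x))
exact = (math.e - 1.0) ** d

rng = random.Random(0)
for n in (10_000, 40_000):
    est, err = mc_integrate(f, d, n, rng)
    print(f"n={n:6d}  estimate={est:8.3f}  std.error={err:.3f}  (exact {exact:.3f})")
# A 10-point-per-axis Simpson grid in d = 6 would already need 10^6 nodes.
```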

In summary, Monte Carlo methods should be used only when neither analytic methods nor deterministic numerical methods are workable (or efficient). One general domain of application of Monte Carlo methods will be, therefore, to systems with many degrees of freedom, far from the perturbative regime. But such systems are precisely the ones of greatest interest in statistical mechanics and quantum field theory!

It is appropriate to close this introduction with a general methodological observation, ably articulated by Wood and Erpenbeck [3]:

    ... these [Monte Carlo] investigations share some of the features of ordinary experimental work, in that they are susceptible to both statistical and systematic errors. With regard to these matters, we believe that papers should meet much the same standards as are normally required for experimental investigations. We have in mind the inclusion of estimates of statistical error, descriptions of experimental conditions (i.e. parameters of the calculation), relevant details of apparatus (program) design, comparisons with previous investigations, discussion of systematic errors, etc. Only if these are provided will the results be trustworthy guides to improved theoretical understanding.

¹ This discussion of numerical integration is grossly oversimplified. Firstly, there are deterministic methods better than Simpson's rule, and there are also sophisticated Monte Carlo methods whose asymptotic error (on smooth integrands) behaves as n^{−p} with p strictly greater than 1/2 [1, 2]. Secondly, for all these algorithms (except standard Monte Carlo), the asymptotic behavior as n → ∞ may be irrelevant in practice, because it is achieved only at ridiculously large values of n. For example, to carry out Simpson's rule with even 10 nodes per axis (a very coarse mesh) requires n = 10^d, which is unachievable for d > 10.

2 Dynamic Monte Carlo Methods: General Theory

All Monte Carlo work has the same general structure: given some probability measure π on some configuration space S, we wish to generate many random samples from π. How is this to be done?

Monte Carlo methods can be classified as static or dynamic. Static methods are those that generate a sequence of statistically independent samples from the desired probability distribution π. These techniques are widely used in Monte Carlo numerical integration in spaces of not-too-high dimension [2]. But they are unfeasible for most applications in statistical physics and quantum field theory, where π is the Gibbs measure of some rather complicated system (extremely many coupled degrees of freedom).

The idea of dynamic Monte Carlo methods is to invent a stochastic process with state space S having π as its unique equilibrium distribution. We then simulate this stochastic process on the computer, starting from an arbitrary initial configuration; once the system has reached equilibrium, we measure time averages, which converge (as the run time tends to infinity) to π-averages. In physical terms, we are inventing a stochastic time evolution for the given system. Let us emphasize, however, that this time evolution need not correspond to any real "physical" dynamics: rather, the dynamics is simply a numerical algorithm, and it is to be chosen, like all numerical algorithms, on the basis of its computational efficiency.
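As a minimal illustration of the dynamic approach, the sketch below (a hypothetical two-state example, not from the text) builds by hand a transition matrix with a prescribed stationary distribution π and checks that the time average of an observable converges to the π-average.

```python
import random

# Target distribution on S = {0, 1}: pi = (0.75, 0.25).
pi = [0.75, 0.25]

# Hand-built transition matrix with pi as stationary distribution
# (it even satisfies detailed balance: pi_x p_xy = pi_y p_yx).
p = [[0.9, 0.1],
     [0.3, 0.7]]
for y in (0, 1):   # stationarity check: sum_x pi_x p_xy = pi_y
    assert abs(sum(pi[x] * p[x][y] for x in (0, 1)) - pi[y]) < 1e-12

def step(x, rng):
    """One transition of the chain from state x."""
    return 0 if rng.random() < p[x][0] else 1

rng = random.Random(42)
x = 0                       # arbitrary initial configuration
n, total = 200_000, 0
for t in range(n):
    x = step(x, rng)
    total += x              # observable f(x) = x; its pi-average is pi[1] = 0.25
print(total / n)            # long-time average, close to 0.25
```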

In practice, the stochastic process is always taken to be a Markov process. So our first order of business is to review some of the general theory of Markov chains.² For simplicity let us assume that the state space S is discrete (i.e. finite or countably infinite). Much of the theory for general state space can be guessed from the discrete theory by making the obvious replacements (sums by integrals, matrices by kernels), although the proofs tend to be considerably harder.

Loosely speaking, a Markov chain (= discrete-time Markov process) with state space S is a sequence of S-valued random variables X_0, X_1, X_2, ... such that successive transitions X_t → X_{t+1} are statistically independent ("the future depends on the past only through the present"). More precisely, a Markov chain is specified by two ingredients:

² The books of Kemeny and Snell [4] and Iosifescu [5] are excellent references on the theory of Markov chains with finite state space. At a somewhat higher mathematical level, the books of Chung [6] and Nummelin [7] deal with the cases of countable and general state space, respectively.



• The initial distribution α. Here α is a probability distribution on S, and the process will be defined so that IP(X_0 = x) = α_x.

• The transition probability matrix P = {p_xy}_{x,y∈S} = {p(x → y)}_{x,y∈S}. Here P is a matrix satisfying p_xy ≥ 0 for all x, y and Σ_y p_xy = 1 for all x. The process will be defined so that IP(X_{t+1} = y | X_t = x) = p_xy.

The Markov chain is then completely specified by the joint probabilities

    IP(X_0 = x_0, X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = α_{x_0} p_{x_0 x_1} p_{x_1 x_2} ··· p_{x_{n−1} x_n}.    (2.1)

This product structure expresses the fact that "successive transitions are independent".

Next we define the n-step transition probabilities

    p^{(n)}_{xy} = IP(X_{t+n} = y | X_t = x).    (2.2)

Clearly p^{(0)}_{xy} = δ_{xy}, p^{(1)}_{xy} = p_xy, and in general {p^{(n)}_{xy}} are the matrix elements of P^n.
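This identity is easy to check numerically (the 3-state matrix below is an arbitrary illustrative example): summing over the intermediate state reproduces the matrix square.

```python
import numpy as np

# An arbitrary 3-state transition matrix (each row sums to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Two-step probabilities by explicit summation over the intermediate
# state z:  p2[x, y] = sum_z p_xz p_zy, i.e. exactly the matrix product.
p2 = np.zeros((3, 3))
for x in range(3):
    for y in range(3):
        p2[x, y] = sum(P[x, z] * P[z, y] for z in range(3))

print(np.allclose(p2, np.linalg.matrix_power(P, 2)))  # True
```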

A Markov chain is said to be irreducible if from each state it is possible to get to each other state: that is, for each pair x, y ∈ S, there exists an n ≥ 0 for which p^{(n)}_{xy} > 0. We shall be concerned almost exclusively with irreducible Markov chains.

For each state x, we define the period of x (denoted d_x) to be the greatest common divisor of the numbers n > 0 for which p^{(n)}_{xx} > 0. If d_x = 1, the state x is called aperiodic. It can be shown that, in an irreducible chain, all states have the same period, so we can speak of the chain having period d. Moreover, the state space can then be partitioned into subsets S_1, S_2, ..., S_d around which the chain moves cyclically, i.e. p^{(n)}_{xy} = 0 whenever x ∈ S_i, y ∈ S_j with j − i ≢ n (mod d). Finally, it can be shown that a chain is irreducible and aperiodic if and only if, for each pair x, y, there exists N_{xy} such that p^{(n)}_{xy} > 0 for all n ≥ N_{xy}.

We now come to the fundamental topic in the theory of Markov chains, which is the problem of convergence to equilibrium. A probability measure π = {π_x}_{x∈S} is called a stationary distribution (or invariant distribution or equilibrium distribution) for the Markov chain P in case

    Σ_x π_x p_xy = π_y   for all y.    (2.3)

A stationary probability distribution need not exist; but if it does, then a lot more follows:

Theorem 1. Let P be the transition probability matrix of an irreducible Markov chain of period d. If a stationary probability distribution π exists, then it is unique, and π_x > 0 for all x. Moreover,

    lim_{n→∞} p^{(nd+r)}_{xy} = { d π_y   if x ∈ S_i, y ∈ S_j with j − i ≡ r (mod d)
                                { 0       if x ∈ S_i, y ∈ S_j with j − i ≢ r (mod d)    (2.4)

for all x, y. In particular, if P is aperiodic, then

    lim_{n→∞} p^{(n)}_{xy} = π_y.    (2.5)



This theorem shows that the Markov chain converges as t → ∞ to the equilibrium distribution π, irrespective of the initial distribution α. Moreover, under the conditions of this theorem much more can be proven: for example, a strong law of large numbers, a central limit theorem, and a law of the iterated logarithm. For statements and proofs of all these theorems, we refer the reader to Chung [6].
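The conclusion (2.5) can be watched directly: for an aperiodic irreducible chain, every row of P^n approaches π. (The matrix below is an arbitrary illustrative example, and π is computed as the left eigenvector of P with eigenvalue 1.)

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

# Stationary distribution: left eigenvector of P with eigenvalue 1,
# normalized to sum to 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()

Pn = np.linalg.matrix_power(P, 50)
print(pi)
print(Pn[0])   # every row of P^50 is (numerically) equal to pi
```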

We can now see how to set up a dynamic Monte Carlo method for generating samples from the probability distribution π. It suffices to invent a transition probability matrix P = {p_xy} = {p(x → y)} satisfying the following two conditions:

(A) Irreducibility.³ For each pair x, y ∈ S, there exists an n ≥ 0 for which p^{(n)}_{xy} > 0.

(B) Stationarity of π. For each y ∈ S,

    Σ_x π_x p_xy = π_y.    (2.6)

Then Theorem 1 (together with its more precise counterparts) shows that simulation of the Markov chain P constitutes a legitimate Monte Carlo method for estimating averages with respect to π. We can start the system in any state x, and the system is guaranteed to converge to equilibrium as t → ∞ [at least in the averaged sense (2.4)]. Long-time averages of any observable f will converge with probability 1 to π-averages (strong law of large numbers), and will do so with fluctuations of size n^{−1/2} (central limit theorem). In practice we will discard the data from the initial transient, i.e. before the system has come close to equilibrium, but in principle this is not necessary (the bias is of order n^{−1}, hence asymptotically much smaller than the statistical fluctuations).

So far, so good! But while this is a correct Monte Carlo algorithm, it may or may not be an efficient one. The key difficulty is that the successive states X_0, X_1, ... of the Markov chain are correlated, perhaps very strongly, so the variance of estimates produced from the dynamic Monte Carlo simulation may be much higher than in static Monte Carlo (independent sampling). To make this precise, let f = {f(x)}_{x∈S} be a real-valued function defined on the state space S (i.e. a real-valued observable) that is square-integrable with respect to π. Now consider the stationary Markov chain (i.e. start the system in the stationary distribution π, or equivalently, "equilibrate" it for a very long time prior to observing the system). Then {f_t} ≡ {f(X_t)} is a stationary stochastic process with mean

    μ_f ≡ ⟨f_t⟩ = Σ_x π_x f(x)    (2.7)

³ We avoid the term "ergodicity" because of its multiple and conflicting meanings. In the physics literature, and in the mathematics literature on Markov chains with finite state space, "ergodic" is typically used as a synonym for "irreducible" [4, Section 2.4] [5, Chapter 4]. However, in the mathematics literature on Markov chains with general state space, "ergodic" is used as a synonym for "irreducible, aperiodic and positive Harris recurrent" [7, p. 114] [8, p. 169].


and unnormalized autocorrelation function⁴

    C_ff(t) ≡ ⟨f_s f_{s+t}⟩ − μ_f²
            = Σ_{x,y} f(x) [π_x p^{(|t|)}_{xy} − π_x π_y] f(y).    (2.8)

The normalized autocorrelation function is then

    ρ_ff(t) ≡ C_ff(t) / C_ff(0).    (2.9)

Typically ρ_ff(t) decays exponentially (∼ e^{−|t|/τ}) for large t; we define the exponential autocorrelation time

    τ_{exp,f} = limsup_{t→∞}  t / (−log |ρ_ff(t)|)    (2.10)

and

    τ_exp = sup_f τ_{exp,f}.    (2.11)

Thus, τ_exp is the relaxation time of the slowest mode in the system. (If the state space is infinite, τ_exp might be +∞!)
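The quantities (2.8)–(2.9) can be estimated directly from a time series. A minimal sketch, using a synthetic AR(1) process as an illustrative stand-in (not an example from the text); its exact normalized autocorrelation is ρ(t) = φ^|t|, so the estimates can be checked:

```python
import random

# Stationary AR(1) series x_{t+1} = phi*x_t + gaussian noise,
# whose exact normalized autocorrelation is rho(t) = phi**|t|.
phi = 0.8
rng = random.Random(1)
n = 200_000
x = [0.0]
for _ in range(n - 1):
    x.append(phi * x[-1] + rng.gauss(0.0, 1.0))

mean = sum(x) / n

def C(t):
    """Sample estimate of the unnormalized autocorrelation C_ff(t)."""
    return sum((x[s] - mean) * (x[s + t] - mean) for s in range(n - t)) / (n - t)

c0 = C(0)
for t in (1, 2, 5):
    print(t, C(t) / c0, phi ** t)   # estimated rho(t) vs the exact phi**t
```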

An equivalent definition, which is useful for rigorous analysis, involves considering the spectrum of the transition probability matrix P considered as an operator on the Hilbert space l²(π).⁵ It is not hard to prove the following facts about P:

(a) The operator P is a contraction. (In particular, its spectrum lies in the closed unit disk.)

(b) 1 is a simple eigenvalue of P, as well as of its adjoint P*, with eigenvector equal to the constant function 1.

(c) If the Markov chain is aperiodic, then 1 is the only eigenvalue of P (and of P*) on the unit circle.

(d) Let R be the spectral radius of P acting on the orthogonal complement of the constant functions:

    R ≡ inf { r : spec(P ↾ 1^⊥) ⊆ {λ : |λ| ≤ r} }.    (2.12)

Then R = e^{−1/τ_exp}.

Facts (a)–(c) are a generalized Perron–Frobenius theorem [9]; fact (d) is a consequence of a generalized spectral radius formula [10, Propositions 2.3–2.5].

⁴ In the statistics literature, this is called the autocovariance function.

⁵ l²(π) is the space of complex-valued functions on S that are square-integrable with respect to π: ‖f‖ ≡ (Σ_x π_x |f(x)|²)^{1/2} < ∞. The inner product is given by (f, g) ≡ Σ_x π_x \overline{f(x)} g(x).


The rate of convergence to equilibrium from an initial nonequilibrium distribution can be bounded above in terms of R (and hence τ_exp). More precisely, let ν be a probability measure on S, and let us define its deviation from equilibrium in the l² sense,

    d₂(ν) ≡ ‖ν/π − 1‖_{l²(π)} = sup_{‖f‖_{l²(π)} ≤ 1} | ∫ f dν − ∫ f dπ |.    (2.13)

Then, clearly,

    d₂(ν P^t) ≤ ‖P^t ↾ 1^⊥‖ d₂(ν).    (2.14)

And by the spectral radius formula,

    ‖P^t ↾ 1^⊥‖ ≈ R^t = exp(−t/τ_exp)    (2.15)

asymptotically as t → ∞, with equality for all t if P is self-adjoint (see below).

On the other hand, for a given observable f we define the integrated autocorrelation time

    τ_{int,f} = (1/2) Σ_{t=−∞}^{∞} ρ_ff(t)
              = 1/2 + Σ_{t=1}^{∞} ρ_ff(t)    (2.16)

[The factor of 1/2 is purely a matter of convention; it is inserted so that τ_{int,f} ≈ τ_{exp,f} if ρ_ff(t) ∼ e^{−|t|/τ} with τ ≫ 1.] The integrated autocorrelation time controls the statistical error in Monte Carlo measurements of ⟨f⟩. More precisely, the sample mean

    f̄ ≡ (1/n) Σ_{t=1}^{n} f_t    (2.17)

has variance

    var(f̄) = (1/n²) Σ_{r,s=1}^{n} C_ff(r − s)    (2.18)
            = (1/n) Σ_{t=−(n−1)}^{n−1} (1 − |t|/n) C_ff(t)    (2.19)
            ≈ (1/n) (2 τ_{int,f}) C_ff(0)   for n ≫ τ    (2.20)

Thus, the variance of f̄ is a factor 2τ_{int,f} larger than it would be if the {f_t} were statistically independent. Stated differently, the number of "effectively independent samples" in a run of length n is roughly n/(2τ_{int,f}).
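In practice τ_{int,f} is estimated from the run itself by summing the empirical ρ_ff(t) up to a self-consistently chosen window; the cutoff constant below is an illustrative choice, and the AR(1) test series (with exactly known τ_{int} = (1/2)(1+φ)/(1−φ)) is a stand-in, not an example from the text.

```python
import random

phi = 0.5
exact_tau_int = 0.5 * (1 + phi) / (1 - phi)   # = 1.5 for this series

rng = random.Random(7)
n = 100_000
x = [0.0]
for _ in range(n - 1):
    x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
mean = sum(x) / n
var = sum((v - mean) ** 2 for v in x) / n

def rho(t):
    """Empirical normalized autocorrelation at lag t."""
    num = sum((x[s] - mean) * (x[s + t] - mean) for s in range(n - t)) / (n - t)
    return num / var

tau = 0.5
t = 1
while t <= 6 * tau:        # grow the window until it is ~6 tau_int wide
    tau += rho(t)
    t += 1

print(tau, exact_tau_int)
# The effective sample size of the run is then roughly n / (2 * tau).
```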

It is sometimes convenient to measure the integrated autocorrelation time in terms of the equivalent pure exponential decay that would produce the same value of Σ_{t=−∞}^{∞} ρ_ff(t), namely

    τ̃_{int,f} ≡ [ log( (2τ_{int,f} + 1) / (2τ_{int,f} − 1) ) ]^{−1}.    (2.21)


This quantity has the nice feature that a sequence of uncorrelated data has τ̃_{int,f} = 0 (but τ_{int,f} = 1/2). Of course, τ̃_{int,f} is ill-defined if τ_{int,f} < 1/2, as can occasionally happen in cases of anticorrelation.

In summary, the autocorrelation times τ_exp and τ_{int,f} play different roles in Monte Carlo simulations. τ_exp places an upper bound on the number of iterations n_disc which should be discarded at the beginning of the run, before the system has attained equilibrium; for example, n_disc ≈ 20 τ_exp is usually more than adequate. On the other hand, τ_{int,f} determines the statistical errors in Monte Carlo measurements of ⟨f⟩, once equilibrium has been attained.

Most commonly it is assumed that τ_exp and τ_{int,f} are of the same order of magnitude, at least for "reasonable" observables f. But this is not true in general. In fact, in statistical-mechanical problems near a critical point, one usually expects the autocorrelation function ρ_ff(t) to obey a dynamic scaling law [11] of the form

    ρ_ff(t; β) ≈ |t|^{−a} F( (β − β_c) |t|^{b} )    (2.22)

valid in the region

    |t| ≫ 1,   |β − β_c| ≪ 1,   |β − β_c| |t|^{b} bounded.    (2.23)

Here a, b > 0 are dynamic critical exponents, F is a suitable scaling function, β is some "temperature-like" parameter, and β_c is the critical point. Now suppose that F is continuous and strictly positive, with F(x) decaying rapidly (e.g. exponentially) as |x| → ∞. Then it is not hard to see that

    τ_{exp,f} ∼ |β − β_c|^{−1/b}    (2.24)
    τ_{int,f} ∼ |β − β_c|^{−(1−a)/b}    (2.25)
    ρ_ff(t; β = β_c) ∼ |t|^{−a}    (2.26)

so that τ_{exp,f} and τ_{int,f} have different critical exponents unless a = 0.⁶ Actually, this should not be surprising: replacing "time" by "space", we see that τ_{exp,f} is the analogue of a correlation length, while τ_{int,f} is the analogue of a susceptibility, and (2.24)–(2.26) are the analogue of the well-known scaling law γ = (2 − η)ν; clearly γ ≠ ν in general! So it is crucial to distinguish between the two types of autocorrelation time.
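A toy calculation makes the gap between the two autocorrelation times concrete: give a slow mode a small spectral weight and a fast mode most of the weight (the numbers below are illustrative, not from the text), and the two times differ wildly.

```python
import math

# Two-mode toy autocorrelation function: a slow mode carrying 1% of the
# weight and a fast mode carrying 99% of it.
w_slow, lam_slow = 0.01, 0.99
w_fast, lam_fast = 0.99, 0.20
rho = lambda t: w_slow * lam_slow ** abs(t) + w_fast * lam_fast ** abs(t)

tau_exp = -1.0 / math.log(lam_slow)                       # slowest mode: ~99.5
tau_int = 0.5 + sum(rho(t) for t in range(1, 10_000))     # ~1.74

print(tau_exp, tau_int)
# tau_exp is ~99.5 while tau_int is ~1.74: the slow mode dominates the
# relaxation time, yet contributes almost nothing to the statistical error.
```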

Returning to the general theory, we note that one convenient way of satisfying condition (B) is to satisfy the following stronger condition:

(B′) For each pair x, y ∈ S,

    π_x p_xy = π_y p_yx.    (2.27)

[Summing (B′) over x, we recover (B).] (B′) is called the detailed-balance condition; a Markov chain satisfying (B′) is called reversible.⁷ (B′) is equivalent to the self-adjointness of P as an operator on the space l²(π). In this case, the spectrum of P is real and lies in the closed interval [−1, 1]; we define

    λ_min = inf spec(P ↾ 1^⊥)    (2.28)
    λ_max = sup spec(P ↾ 1^⊥)    (2.29)

From (2.12) we have

    τ_exp = −1 / log max(|λ_min|, λ_max).    (2.30)

⁶ Our discussion of this topic in [12] is incorrect.
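A quick numerical illustration, using a standard Metropolis-style construction (which is not derived in this chunk) for an arbitrary 4-state target π: the resulting matrix satisfies detailed balance, and its spectrum is indeed real and contained in [−1, 1].

```python
import numpy as np

# Metropolis-type transition matrix for an illustrative target pi:
# propose uniformly among the other states, accept with min(1, pi_y/pi_x).
pi = np.array([0.1, 0.2, 0.3, 0.4])
S = len(pi)
P = np.zeros((S, S))
for x in range(S):
    for y in range(S):
        if x != y:
            P[x, y] = (1.0 / (S - 1)) * min(1.0, pi[y] / pi[x])
    P[x, x] = 1.0 - P[x].sum()      # rejected proposals stay put

# Detailed balance pi_x p_xy = pi_y p_yx  <=>  P self-adjoint on l^2(pi),
# hence a real spectrum in [-1, 1].
flux = pi[:, None] * P
assert np.allclose(flux, flux.T)

eig = np.linalg.eigvals(P)
print(np.sort(eig.real))   # all real, lying in [-1, 1], with top eigenvalue 1
```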

For many purposes, only the spectrum near +1 matters, so it is useful to define the modified exponential autocorrelation time

    τ′_exp = { −1/log λ_max   if λ_max > 0
             { 0              if λ_max ≤ 0    (2.31)

Now let us apply the spectral theorem to the operator P: it follows that the autocorrelation function ρ_ff(t) has a spectral representation

    ρ_ff(t) = ∫_{λ_{min,f}}^{λ_{max,f}} λ^{|t|} dσ_ff(λ)    (2.32)

with a nonnegative spectral weight dσ_ff(λ) supported on an interval [λ_{min,f}, λ_{max,f}] ⊆ [λ_min, λ_max]. Clearly

    τ_{exp,f} = −1 / log max(|λ_{min,f}|, λ_{max,f})    (2.33)

and we can define

    τ′_{exp,f} = { −1/log λ_{max,f}   if λ_{max,f} > 0
                 { 0                  if λ_{max,f} ≤ 0    (2.34)

On the other hand,

    τ_{int,f} = (1/2) ∫_{λ_{min,f}}^{λ_{max,f}} (1 + λ)/(1 − λ) dσ_ff(λ).    (2.35)

It follows that

    τ_{int,f} ≤ (1/2) (1 + e^{−1/τ′_{exp,f}}) / (1 − e^{−1/τ′_{exp,f}})
             ≤ (1/2) (1 + e^{−1/τ′_exp}) / (1 − e^{−1/τ′_exp}) ≈ τ′_exp.    (2.36)

⁷ For the physical significance of this term, see Kemeny and Snell [4, section 5.3] or Iosifescu [5, section 4.5].


Moreover, since λ ↦ (1 + λ)/(1 − λ) is a convex function, Jensen's inequality implies that

    τ_{int,f} ≥ (1/2) (1 + ρ_ff(1)) / (1 − ρ_ff(1)).    (2.37)

If we define the initial autocorrelation time

    τ_{init,f} = { −1/log ρ_ff(1)   if ρ_ff(1) ≥ 0
                 { undefined         if ρ_ff(1) < 0    (2.38)

then these inequalities can be summarized conveniently as

    τ_{init,f} ≤ τ̃_{int,f} ≤ τ′_{exp,f} ≤ τ′_exp.    (2.39)

Conversely, it is not hard to see that

    sup_{f∈l²(π)} τ_{init,f} = sup_{f∈l²(π)} τ̃_{int,f} = sup_{f∈l²(π)} τ′_{exp,f} = τ′_exp;    (2.40)

it suffices to choose f so that its spectral weight dσ_ff is supported in a very small interval near λ_max.

F<strong>in</strong>ally, let us make a remark about transition probabilities P that are \built upout<br />

of" other transition probabilities P1P2:::Pn:<br />

a) If P1P2:::Pn satisfy the stationarity condition (resp. the detailed-balance condition)<br />

for ,then so does any convex comb<strong>in</strong>ation P = P n i=1 iPi. Here i 0<br />

<strong>and</strong> P n i=1 i =1.<br />

b) If P1P2:::Pn satisfy the stationarity condition for ,then so does the product<br />

P = P1P2 Pn. (Note, however, that P does not <strong>in</strong> general satisfy the detailedbalance<br />

condition, even if the <strong>in</strong>dividual Pi do. 8 )<br />

Algorithmically,the convex comb<strong>in</strong>ation amounts tochoos<strong>in</strong>g r<strong>and</strong>omly, with probabilities<br />

f ig, from among the \elementary operations" Pi. (It is crucial here that the i<br />

are constants, <strong>in</strong>dependent ofthe current con guration of the system only <strong>in</strong> this case<br />

does P leave stationary <strong>in</strong> general.) Similarly,the product corresponds to perform<strong>in</strong>g<br />

sequentially the operations P1P2:::Pn.<br />

8 Recall that ifA <strong>and</strong> B are self-adjo<strong>in</strong>t operators, then AB is self-adjo<strong>in</strong>t if <strong>and</strong> only if A <strong>and</strong> B<br />

commute.<br />

10
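These two constructions are easy to check numerically on a toy state space. The sketch below is my own illustration (the helper names are not from the notes): it builds two $\pi$-reversible chains by Metropolizing random proposals, then verifies that a convex combination is still reversible, while a product remains stationary but in general loses reversibility.

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])                 # target distribution on 3 states
rng = np.random.default_rng(1)

def metropolize(prop):
    """Metropolis-Hastings: turn a strictly positive proposal matrix into a
    pi-reversible transition matrix."""
    n = len(pi)
    P = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            if x != y:
                P[x, y] = prop[x, y] * min(1.0, pi[y] * prop[y, x] / (pi[x] * prop[x, y]))
        P[x, x] = 1.0 - P[x].sum()             # probability of the null transition
    return P

def random_proposal():
    M = rng.random((3, 3)) + 0.1               # strictly positive entries
    return M / M.sum(axis=1, keepdims=True)    # row-stochastic

def stationary(P):                             # does pi P = pi hold?
    return np.allclose(pi @ P, pi)

def reversible(P):                             # does pi_x p_xy = pi_y p_yx hold?
    F = pi[:, None] * P
    return np.allclose(F, F.T)

P1, P2 = metropolize(random_proposal()), metropolize(random_proposal())
P_mix  = 0.3 * P1 + 0.7 * P2                   # (a) convex combination
P_prod = P1 @ P2                               # (b) product

assert reversible(P1) and reversible(P2)
assert stationary(P_mix) and reversible(P_mix) # (a): reversibility survives
assert stationary(P_prod)                      # (b): stationarity survives...
print(reversible(P_prod))                      # ...but reversibility typically does not
```

Here `P1 @ P2` corresponds to applying $P_1$ first and then $P_2$, i.e. to the sequential composition described above.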


3 Statistical Analysis of Dynamic Monte Carlo Data

Many published Monte Carlo studies contain statements like:

    We ran for a total of 100000 iterations, discarding the first 50000 iterations (for equilibration) and then taking measurements once every 100 iterations.

It is important to emphasize that unless further information is given (namely, the autocorrelation time of the algorithm), such statements have no value whatsoever. Is a run of 100000 iterations long enough? Are 50000 iterations sufficient for equilibration? That depends on how big the autocorrelation time is. The purpose of this lecture is to give some practical advice for choosing the parameters of a dynamic Monte Carlo simulation, and to give an introduction to the statistical theory that puts this advice on a sound mathematical footing.

There are two fundamental, and quite distinct, issues in dynamic Monte Carlo simulation:

Initialization bias. If the Markov chain is started in a distribution that is not equal to the stationary distribution $\pi$, then there is an "initial transient" in which the data do not reflect the desired equilibrium distribution $\pi$. This results in a systematic error (bias).

Autocorrelation in equilibrium. The Markov chain, once it reaches equilibrium, provides correlated samples from $\pi$. This correlation causes the statistical error (variance) to be a factor $2\tau_{\mathrm{int},f}$ larger than in independent sampling.

Let us discuss these issues in turn.

Initialization bias. Often the Markov chain is started in some chosen configuration $x$; the initial distribution is then the point mass $\delta_x$. For example, in an Ising model, $x$ might be the configuration with "all spins up"; this is sometimes called an ordered or cold start. Alternatively, the Markov chain might be started in a random configuration chosen according to some simple probability distribution. For example, in an Ising model, we might initialize the spins randomly and independently, with equal probabilities of up and down; this is sometimes called a random or hot start. In all these cases, the initial distribution is clearly not equal to the equilibrium distribution $\pi$. Therefore, the system is initially "out of equilibrium". Theorem 1 guarantees that the system approaches equilibrium as $t \to \infty$, but we need to know something about the rate of convergence to equilibrium.

Using the exponential autocorrelation time $\tau_{\mathrm{exp}}$, we can set an upper bound on the amount of time we have to wait before equilibrium is "for all practical purposes" attained. For example, if we wait a time $20\,\tau_{\mathrm{exp}}$, then the deviation from equilibrium (in the $l^2$ sense) will be at most $e^{-20}$ ($\approx 2 \times 10^{-9}$) times the initial deviation from equilibrium. There are two difficulties with this bound. Firstly, it is usually impossible to apply in practice, since we almost never know $\tau_{\mathrm{exp}}$ (or a rigorous upper bound for it). Secondly, even if we can apply it, it may be overly conservative; indeed, there exist perfectly reasonable algorithms in which $\tau_{\mathrm{exp}} = +\infty$ (see Sections 7 and 8).


Lack<strong>in</strong>g rigorous knowledge of the autocorrelation time exp, weshould try to estimate<br />

itboth theoretically <strong>and</strong> empirically. Tomake aheuristic theoretical estimate of<br />

exp, weattempt to underst<strong>and</strong> the physical mechanism(s) caus<strong>in</strong>g slowconvergence to<br />

equilibrium but it is always possible that wehave overlooked one or more such mechanisms,<br />

<strong>and</strong> have therefore grossly underestimated exp. To make a rough empirical<br />

estimate of exp, wemeasure the autocorrelation function Cff(t) for a suitably large<br />

setofobservables f [see below] but there is always the danger that ourchosen set of<br />

observables has failed to <strong>in</strong>cludeonethat has strong enoughoverlap with the slowest<br />

mode, aga<strong>in</strong> lead<strong>in</strong>g to a gross underestimate of exp.<br />

On the other h<strong>and</strong>, the actual rate ofconvergence to equilibrium from a given <strong>in</strong>itial<br />

distribution may be much faster than the worst-case estimate given by exp. So it<br />

is usual to determ<strong>in</strong>e empirically when \equilibrium" has been achieved, by plott<strong>in</strong>g<br />

selected observables as a function of time <strong>and</strong>not<strong>in</strong>g when the <strong>in</strong>itial transient appears<br />

to end. More sophisticated statistical tests for <strong>in</strong>itialization bias can also be employed<br />

[13].<br />

In all empirical methods of determ<strong>in</strong><strong>in</strong>g when \equilibrium" has been<br />

achieved, a serious danger is the possibilityofmetastability. That is, it could appear that<br />

equilibriumhas been achieved, when <strong>in</strong> fact the system has only settled down to a longlived<br />

(metastable) region of con guration space that may be very far from equilibrium.<br />

The only sure- re protection aga<strong>in</strong>st metastabilityisaproof of an upper bound on exp<br />

(or more generally, onthe deviation from equilibriumasafunction of the elapsedtime<br />

t). The next-best protection is a conv<strong>in</strong>c<strong>in</strong>g heuristic argument that metastability is<br />

unlikely (i.e. that exp is not too large) but asmentioned before, even if one rules out<br />

several potential physical mechanisms formetastability,itisvery di cult to be certa<strong>in</strong><br />

that onehas not overlooked others. If one cannot rule out metastability ontheoretical<br />

grounds, then it is helpful at least to haveanidea of what the possible metastable regions<br />

look like then one can perform several runs with di erent <strong>in</strong>itial conditions typical of<br />

each ofthe possible metastable regions, <strong>and</strong> test whether the answers are consistent.<br />

For example, near a rst-order phase transition, most <strong>Monte</strong> <strong>Carlo</strong> methods su er from<br />

metastability associated with transitions between con gurations typical of the dist<strong>in</strong>ct<br />

pure phases. We can try <strong>in</strong>itial conditions typical of each ofthese phases (e.g. for many<br />

models, a \hot" start <strong>and</strong> a \cold" start). Consistency between these runs does not<br />

guarantee that metastability isabsent, but itdoesgive <strong>in</strong>creased con dence. Plots of<br />

observables as a function of time are also useful <strong>in</strong>dicators of possible metastability.<br />

But when all is said <strong>and</strong> done, no purely empirical estimateof from a run of length<br />

n can be guaranteed to beeven approximately correct. What we can say is that if<br />

estimated n, then either estimated or else > n.<br />

Once we know (or guess) the time needed to atta<strong>in</strong> \equilibrium", what dowedo<br />

with it? The answer is clear: we discard the data fromthe <strong>in</strong>itial transient, up to some<br />

time ndisc, <strong>and</strong> <strong>in</strong>clude onlythe subsequent data <strong>in</strong>ouraverages. In pr<strong>in</strong>ciple, this is<br />

(asymptotically) unnecessary, because the systematic errors from this <strong>in</strong>itial transient<br />

will be of order =n, while thestatistical errors will be of order ( =n) 1=2 .But <strong>in</strong> practice,<br />

the coe cientof =n <strong>in</strong> the systematic error may be fairly large, if the <strong>in</strong>itial distribution<br />

12


is very far from equilibrium. By throw<strong>in</strong>g away the data from the <strong>in</strong>itial transient, we<br />

lose noth<strong>in</strong>g, <strong>and</strong> avoid a potentially large systematic error.<br />

Autocorrelation <strong>in</strong> equilibrium. As expla<strong>in</strong>ed <strong>in</strong> the preced<strong>in</strong>g lecture, the variance<br />

of the sample mean f <strong>in</strong>adynamic <strong>Monte</strong> <strong>Carlo</strong>method is a factor 2 <strong>in</strong>tf higher than<br />

it would be <strong>in</strong> <strong>in</strong>dependent sampl<strong>in</strong>g. Otherwise put, a run oflength n conta<strong>in</strong>s only<br />

n=2 <strong>in</strong>tf \e ectively <strong>in</strong>dependent data po<strong>in</strong>ts".<br />

This has several implications for <strong>Monte</strong> <strong>Carlo</strong> work. On the oneh<strong>and</strong>, it means<br />

that the the computational e ciency of the algorithm is determ<strong>in</strong>ed pr<strong>in</strong>cipally by its<br />

autocorrelation time. More precisely, ifonewishes to compare two alternative <strong>Monte</strong><br />

<strong>Carlo</strong> algorithms for the same problem, then the better algorithm is the one that has<br />

the smaller autocorrelation time, when time is measured <strong>in</strong> units of computer (CPU)<br />

time. [In general there may arise tradeo s between \physical" autocorrelation time (i.e.<br />

measured <strong>in</strong> iterations) <strong>and</strong> computational complexity per iteration.] So accurate<br />

measurements ofthe autocorrelation time are essential to evaluat<strong>in</strong>g the computational<br />

e ciency of compet<strong>in</strong>g algorithms.<br />

On the other h<strong>and</strong>, even for a xed algorithm, knowledge of <strong>in</strong>tf is essential for<br />

determ<strong>in</strong><strong>in</strong>g run lengths | is a run of 100000 sweeps long enough? | <strong>and</strong> for sett<strong>in</strong>g<br />

error bars on estimates of hfi. Roughly speak<strong>in</strong>g, error bars will be of order ( =n) 1=2 <br />

so if we want 1% accuracy, then we need a run oflength 10000 ,<strong>and</strong>soon. Above<br />

all, there is a basic self-consistency requirement: the runlength n must be than the<br />

estimates of produced by that samerun, otherwise none of the results fromthat run<br />

should be believed. Of course, while self-consistency is a necessary condition for the<br />

trustworth<strong>in</strong>ess of <strong>Monte</strong> <strong>Carlo</strong> data, it is not a su cient condition there is always the<br />

danger of metastability.<br />

Already we can draw a conclusion about the relative importance of <strong>in</strong>itialization<br />

bias <strong>and</strong> autocorrelation as di culties <strong>in</strong> dynamic <strong>Monte</strong> <strong>Carlo</strong> work. Let us assume<br />

that the time for <strong>in</strong>itial convergence to equilibrium is comparable to (orat least not too<br />

much larger than) the equilibrium autocorrelation time <strong>in</strong>tf (for the observables f of<br />

<strong>in</strong>terest) | thisisoften but notalways the case. Then <strong>in</strong>itialization bias is a relatively<br />

trivial problem compared to autocorrelation <strong>in</strong> equilibrium. To elim<strong>in</strong>ate <strong>in</strong>itialization<br />

bias, it su ces to discard 20 of the data atthe beg<strong>in</strong>n<strong>in</strong>g ofthe run but toachieve<br />

a reasonably small statistical error, it is necessary to make arunoflength 1000 or<br />

more. So the data that must be discarded at the beg<strong>in</strong>n<strong>in</strong>g, ndisc, isanegligible fraction<br />

of the total run length n. This estimate alsoshows that the exact value of ndisc is not<br />

particularly delicate: anyth<strong>in</strong>g between 20 <strong>and</strong> n=5 will elim<strong>in</strong>ate essentially all<br />

<strong>in</strong>itialization bias while pay<strong>in</strong>g less than a 10% price <strong>in</strong> the nal error bars.<br />

In this rema<strong>in</strong>der of this lecture I would like to discuss <strong>in</strong> more detail the statistical<br />

analysis of dynamic <strong>Monte</strong> <strong>Carlo</strong> data (assumed to be already \<strong>in</strong> equilibrium"), with<br />

emphasis on how toestimate the autocorrelation time <strong>in</strong>tf <strong>and</strong> how tocompute valid<br />

error bars. What is<strong>in</strong>volved here is a branch ofmathematical statistics called timeseries<br />

analysis. An excellent exposition can be found <strong>in</strong>the books of of Priestley [14]<br />

<strong>and</strong> Anderson [15].<br />

13


Let $\{f_t\}$ be a real-valued stationary stochastic process with mean

    \mu \;\equiv\; \langle f_t \rangle ,        (3.1)

unnormalized autocorrelation function

    C(t) \;\equiv\; \langle f_s\, f_{s+t} \rangle \;-\; \mu^2 ,        (3.2)

normalized autocorrelation function

    \rho(t) \;\equiv\; C(t)/C(0) ,        (3.3)

and integrated autocorrelation time

    \tau_{\mathrm{int}} \;=\; \frac{1}{2} \sum_{t=-\infty}^{\infty} \rho(t) .        (3.4)

Our goal is to estimate $\mu$, $C(t)$, $\rho(t)$ and $\tau_{\mathrm{int}}$ based on a finite (but large) sample $f_1, \ldots, f_n$ from this stochastic process.

The "natural" estimator of $\mu$ is the sample mean

    \bar{f} \;\equiv\; \frac{1}{n} \sum_{i=1}^{n} f_i .        (3.5)

This estimator is unbiased (i.e. $\langle \bar{f} \rangle = \mu$) and has variance

    \mathrm{var}(\bar{f}) \;=\; \frac{1}{n} \sum_{t=-(n-1)}^{n-1} \Bigl( 1 - \frac{|t|}{n} \Bigr)\, C(t)        (3.6)

                          \;\approx\; \frac{1}{n}\, (2\tau_{\mathrm{int}})\, C(0) \qquad \text{for } n \gg \tau .        (3.7)

Thus, even if we are interested only in the static quantity $\mu$, it is necessary to estimate the dynamic quantity $\tau_{\mathrm{int}}$ in order to determine valid error bars for $\mu$.
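Relation (3.7) is easy to check numerically on a process whose autocorrelation time is known exactly. The sketch below is my own illustration (not part of the notes): it uses a stationary Gaussian AR(1) process $f_{t+1} = r f_t + \text{noise}$, for which $\rho(t) = r^{|t|}$ and hence $\tau_{\mathrm{int}} = \frac{1}{2}(1+r)/(1-r)$.

```python
import numpy as np

rng = np.random.default_rng(42)
r, n, reps = 0.9, 4_000, 300

# Exact values for the AR(1) process, normalized so that Var(f_t) = C(0) = 1:
tau_int = 0.5 * (1 + r) / (1 - r)        # = (1/2) * sum_t r^|t| = 9.5 here
C0 = 1.0

means = []
for _ in range(reps):
    f = np.empty(n)
    f[0] = rng.normal()                               # start already in equilibrium
    noise = rng.normal(0.0, np.sqrt(1 - r**2), size=n)  # keeps Var(f_t) = 1
    for t in range(1, n):
        f[t] = r * f[t - 1] + noise[t]
    means.append(f.mean())

var_empirical = np.var(means)
var_predicted = 2 * tau_int * C0 / n     # eq. (3.7)
print(var_empirical / var_predicted)     # close to 1, up to sampling noise
```

With $2\tau_{\mathrm{int}} = 19$, each run of 4000 correlated points is worth only about 210 independent samples, exactly as (3.7) predicts.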

The \natural" estimator of C(t) is<br />

bC(t)<br />

if the mean is known, <strong>and</strong><br />

bC(t)<br />

1<br />

n ;jtj<br />

1<br />

n ;jtj<br />

n X;jtj<br />

i =1<br />

n X;jtj<br />

i =1<br />

(fi ; )(fi + jtj ; ) (3.8)<br />

(fi ; f)(fi + jtj ; f) (3.9)<br />

if the mean is unknown. We emphasize the conceptual dist<strong>in</strong>ction between the autocorrelation<br />

function C(t), which foreach t is a number, <strong>and</strong>the estimator b C(t) or<br />

bC(t), which foreach t is a r<strong>and</strong>om variable. As will become clear, this dist<strong>in</strong>ction is<br />

14


also of practical importance. b C(t) isanunbiased estimator of C(t), <strong>and</strong> b C(t) is almost<br />

unbiased (the bias is of order 1=n) [15, p. 463]. Their variances <strong>and</strong> covariances are [15,<br />

pp. 464{471] [14, pp. 324{328]<br />

1X<br />

var( b C(t)) = 1<br />

n m=;1<br />

+ o 1<br />

n<br />

cov( b C(t) b C(u)) = 1<br />

n<br />

1X<br />

h C(m) 2 + C(m + t)C(m ; t)+ (t m m + t) i<br />

[C(m)C(m + u ; t)+C(m + u)C(m ; t)<br />

m=;1<br />

+ (t m m + u)] + o 1<br />

n<br />

where t u 0<strong>and</strong> is the connected 4-po<strong>in</strong>t autocorrelation function<br />

(3.10)<br />

(3.11)<br />

(r st) h(fi ; )(fi+r ; )(fi+s ; )(fi+t ; )i<br />

; C(r)C(t ; s) ; C(s)C(t ; r) ; C(t)C(s ; r) : (3.12)<br />

To lead<strong>in</strong>g order <strong>in</strong> 1=n, the behavior of b C is identical to that of b C.<br />

The \natural" estimator of (t) is<br />

if the mean is known, <strong>and</strong><br />

b(t) b C(t)= b C(0) (3.13)<br />

b(t)<br />

bC(t)= b C(0) (3.14)<br />

if the mean is unknown. The variances <strong>and</strong> covariances of b(t) <strong>and</strong> b(t) canbe<br />

computed (for large n) from (3.11) we omitthe detailed formulae.<br />

The \natural" estimator of <strong>in</strong>t would seem to be<br />

b <strong>in</strong>t<br />

? 1<br />

2<br />

n X;<br />

1<br />

t = ;(n ; 1)<br />

b(t) (3.15)<br />

(or the analogous th<strong>in</strong>g with b), but this is wrong! The estimator de ned <strong>in</strong> (3.15) has a<br />

variance that doesnotgoto zero as the sample size n goes to <strong>in</strong> nity [14, pp. 420{431],<br />

so it is clearly a very bad estimator of <strong>in</strong>t. Roughly speak<strong>in</strong>g, this is because the sample<br />

autocorrelations b(t) for jtj conta<strong>in</strong> much \noise" but little \signal" <strong>and</strong> there are<br />

so many ofthem (order n) that the noise adds up to atotal variance of order 1. (For a<br />

more detailed discussion, see [14, pp. 432{437].) The solution is to cuto the sum<strong>in</strong><br />

(3.15) us<strong>in</strong>g a \w<strong>in</strong>dow" (t) whichis 1forjtj < but 0 for jtj :<br />

b <strong>in</strong>t<br />

1<br />

2<br />

n X;<br />

1<br />

t = ;(n ; 1)<br />

15<br />

(t) b(t) : (3.16)


This reta<strong>in</strong>s most of the \signal" but discards most of the \noise". A good choice is the<br />

rectangular w<strong>in</strong>dow<br />

(<br />

1<br />

(t) =<br />

0<br />

if jtj M<br />

if jtj >M<br />

(3.17)<br />

where M is a suitably chosen cuto . This cuto <strong>in</strong>troduces a bias<br />

bias(b <strong>in</strong>t) =; 1<br />

2<br />

X<br />

jtj>M<br />

(t)+o 1<br />

n<br />

: (3.18)<br />

On the other h<strong>and</strong>, the variance of b<strong>in</strong>t can be computed from (3.11) after some algebra,<br />

one obta<strong>in</strong>s<br />

var(b<strong>in</strong>t) 2(2M +1)<br />

n<br />

2<br />

<strong>in</strong>t (3.19)<br />

where wehavemadetheapproximation M n. 9 Thechoice of M is thus a tradeo<br />

between bias <strong>and</strong> variance: the bias can be made small by tak<strong>in</strong>g M large enough so<br />

that (t) isnegligible for jtj >M(e.g. M =afewtimes usually su ces), while<br />

the variance is kept small by tak<strong>in</strong>g M to be no larger than necessary consistent with<br />

this constra<strong>in</strong>t. We have found the follow<strong>in</strong>g \automatic w<strong>in</strong>dow<strong>in</strong>g" algorithm [16] to<br />

be convenient: choose M to bethe smallest <strong>in</strong>teger such that M<br />

were roughly a pure exponential, then it would su ce to take c<br />

c b<strong>in</strong>t(M). If (t)<br />

4(s<strong>in</strong>cee ;4 < 2%).<br />

However, <strong>in</strong> many cases (t) is expected to have an asymptotic or pre-asymptotic decay<br />

slower than exponential, so it is usually prudent totake c at least 6, <strong>and</strong> possibly as<br />

large as 10.<br />

Wehave foundthis automatic w<strong>in</strong>dow<strong>in</strong>g procedure to work well <strong>in</strong> practice, provided<br />

that a su cient quantity ofdata isavailable (n > 1000 ). However, at present we<br />

have very little underst<strong>and</strong><strong>in</strong>g ofthe conditions under which this w<strong>in</strong>dow<strong>in</strong>g algorithm<br />

may produce biased estimates of <strong>in</strong>t or of its own error bars. Further theoretical <strong>and</strong><br />

experimental study of the w<strong>in</strong>dow<strong>in</strong>g algorithm | e.g. experiments onvarious exactlyknown<br />

stochastic processes, with various run lengths | would be highly desirable.<br />
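The windowed estimator (3.16) with the rectangular window (3.17) and the automatic windowing rule fits in a few lines of code. The sketch below is my own illustration (function names and the AR(1) test series are mine, not from the notes); it uses the double-hat estimators, i.e. it assumes the mean is unknown, and reports the error bar from (3.19).

```python
import numpy as np

def rho_hat(f, tmax):
    """Sample autocorrelations, eqs. (3.9) and (3.14), for t = 0, ..., tmax."""
    f = np.asarray(f, dtype=float)
    n = len(f)
    d = f - f.mean()
    C = np.array([d[: n - t] @ d[t:] / (n - t) for t in range(tmax + 1)])
    return C / C[0]

def tau_int(f, c=6.0, tmax=1000):
    """Windowed estimator (3.16)-(3.17) with the automatic windowing rule:
    M = smallest integer such that M >= c * tau_hat_int(M)."""
    rho = rho_hat(f, tmax)
    tau = 0.5                        # the t = 0 term of (1/2) * sum_t rho(t)
    for M in range(1, tmax + 1):
        tau += rho[M]                # the window now covers |t| <= M
        if M >= c * tau:
            err = tau * np.sqrt(2.0 * (2 * M + 1) / len(f))   # eq. (3.19)
            return tau, err, M
    raise RuntimeError("window never closed; the run is far too short")

# Demonstration on an AR(1) series with exactly known tau_int = 9.5:
rng = np.random.default_rng(0)
r, n = 0.9, 50_000
f = np.empty(n)
f[0] = rng.normal()
noise = rng.normal(0.0, np.sqrt(1 - r**2), size=n)
for t in range(1, n):
    f[t] = r * f[t - 1] + noise[t]
tau, err, M = tau_int(f)
print(tau, err, M)                   # tau should lie within a few err of 9.5
```

For this series the rule closes the window at $M \approx c\,\tau \approx 60$, so the bias term (3.18), of order $r^{M+1}/(1-r)$, is completely negligible compared to the statistical error.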

4 Conventional Monte Carlo Algorithms for Spin Models

In this lecture we describe the construction of dynamic Monte Carlo algorithms for models in statistical mechanics and quantum field theory. Recall our goal: given a probability measure $\pi$ on the state space $S$, we wish to construct a transition matrix $P = \{p_{xy}\}$ satisfying:

9 We have also assumed that the only strong peaks in the Fourier transform of $C(t)$ are at zero frequency. This assumption is valid if $C(t) \ge 0$, but could fail if there are strong anticorrelations.


(A) Irreducibility. For each pair $x, y \in S$, there exists an $n \ge 0$ for which $p^{(n)}_{xy} > 0$.

(B) Stationarity of $\pi$. For each $y \in S$,

    \sum_{x} \pi_x\, p_{xy} \;=\; \pi_y .        (4.1)

A sufficient condition for (B), which is often more convenient to verify, is:

(B') Detailed balance for $\pi$. For each pair $x, y \in S$: $\pi_x\, p_{xy} = \pi_y\, p_{yx}$.

A very general method for constructing transition matrices satisfying detailed balance for a given distribution $\pi$ was introduced in 1953 by Metropolis et al. [17], with a slight extension two decades later by Hastings [18]. The idea is the following: Let $P^{(0)} = \{p^{(0)}_{xy}\}$ be an arbitrary irreducible transition matrix on $S$. We call $P^{(0)}$ the proposal matrix; we shall use it to generate proposed moves $x \to y$ that will then be accepted or rejected with probabilities $a_{xy}$ and $1 - a_{xy}$, respectively. If a proposed move is rejected, then we make a "null transition" $x \to x$. Therefore, the transition matrix $P = \{p_{xy}\}$ of the full algorithm is

    p_{xy} \;=\; p^{(0)}_{xy}\, a_{xy} \qquad \text{for } x \ne y

    p_{xx} \;=\; p^{(0)}_{xx} \;+\; \sum_{y \ne x} p^{(0)}_{xy}\, (1 - a_{xy})        (4.2)

where of course we must have $0 \le a_{xy} \le 1$ for all $x, y$. It is easy to see that $P$ satisfies detailed balance for $\pi$ if and only if

    \frac{a_{xy}}{a_{yx}} \;=\; \frac{\pi_y\, p^{(0)}_{yx}}{\pi_x\, p^{(0)}_{xy}}        (4.3)

for all pairs $x \ne y$. But this is easily arranged: just set

    a_{xy} \;=\; F\!\left( \frac{\pi_y\, p^{(0)}_{yx}}{\pi_x\, p^{(0)}_{xy}} \right)        (4.4)

where $F \colon [0, +\infty] \to [0, 1]$ is any function satisfying

    \frac{F(z)}{F(1/z)} \;=\; z \qquad \text{for all } z .        (4.5)

The choice suggested by Metropolis et al. is

    F(z) \;=\; \min(z, 1) ;        (4.6)

this is the maximal function satisfying (4.5). Another choice sometimes used is

    F(z) \;=\; \frac{z}{1+z} .        (4.7)


Of course, it is still necessary to check that $P$ is irreducible; this is usually done on a case-by-case basis.

Note that if the proposal matrix $P^{(0)}$ happens to already satisfy detailed balance for $\pi$, then we have $\pi_y\, p^{(0)}_{yx} / \pi_x\, p^{(0)}_{xy} = 1$, so that $a_{xy} = 1$ (if we use the Metropolis choice of $F$) and $P = P^{(0)}$. On the other hand, no matter what $P^{(0)}$ is, we obtain a matrix $P$ that satisfies detailed balance for $\pi$. So the Metropolis-Hastings procedure can be thought of as a prescription for minimally modifying a given transition matrix $P^{(0)}$ so that it satisfies detailed balance for $\pi$.

Many textbooks and articles describe the Metropolis-Hastings procedure only in the special case in which the proposal matrix $P^{(0)}$ is symmetric, namely $p^{(0)}_{xy} = p^{(0)}_{yx}$. In this case (4.4) reduces to

    a_{xy} \;=\; F\!\left( \frac{\pi_y}{\pi_x} \right) .        (4.8)

In statistical mechanics we have

    \pi_x \;=\; \frac{e^{-\beta E_x}}{\sum_y e^{-\beta E_y}} \;=\; \frac{e^{-\beta E_x}}{Z}        (4.9)

and hence

    \frac{\pi_y}{\pi_x} \;=\; e^{-\beta (E_y - E_x)} .        (4.10)

Note that the partition function $Z$ has disappeared from this expression; this is crucial, as $Z$ is almost never explicitly computable! Using the Metropolis acceptance probability $F(z) = \min(z, 1)$, we obtain the following rules for acceptance or rejection:

If $\Delta E \equiv E_y - E_x \le 0$, then we accept the proposal always (i.e. with probability 1).

If $\Delta E > 0$, then we accept the proposal with probability $e^{-\beta \Delta E}$ ($< 1$). That is, we choose a random number $r$ uniformly distributed on $[0, 1]$, and we accept the proposal if $r \le e^{-\beta \Delta E}$.
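The two rules above collapse into a one-line test. A sketch (my own, not code from the notes), together with an empirical check that the acceptance frequency for uphill moves really is $e^{-\beta \Delta E}$:

```python
import math, random

def metropolis_accept(dE, beta):
    """Metropolis rule F(z) = min(z, 1) applied to z = exp(-beta*dE):
    accept always if dE <= 0, otherwise with probability exp(-beta*dE)."""
    return dE <= 0 or random.random() <= math.exp(-beta * dE)

# Empirical check: for dE = beta = 1 the acceptance frequency should
# approach exp(-1) ~ 0.368.
random.seed(1)
freq = sum(metropolis_accept(1.0, 1.0) for _ in range(20_000)) / 20_000
print(freq)
```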

But there is noth<strong>in</strong>g special about P (0) be<strong>in</strong>g symmetric any proposal matrix P (0) is<br />

perfectly legitimate, <strong>and</strong>the Metropolis-Hast<strong>in</strong>gs procedure is de ned quite generally<br />

by (4.4).<br />

Let us emphasize once more that the Metropolis-Hast<strong>in</strong>gs procedure is a general<br />

technique it produces an <strong>in</strong> nite family of di erent algorithms depend<strong>in</strong>g onthe choice<br />

of the proposal matrix P (0) .Intheliterature the term \Metropolis algorithm" is often<br />

used to denote the algorithm result<strong>in</strong>g fromsomeparticular commonly-used choice of<br />

P (0) ,butit is important nottobemisled. ToseetheMetropolis-Hast<strong>in</strong>gs procedure <strong>in</strong> action, let us consider a typical statisticalmechanical<br />

model, the Is<strong>in</strong>g model: Oneachsiteiof some nited-dimensional lattice,<br />

we place a r<strong>and</strong>om variable i tak<strong>in</strong>g the values 1. The Hamiltonian is<br />

H( ) =; X<br />

18<br />

hiji<br />

i j (4.11)


where the sum runs over all nearest-neighbor pairs $\langle ij \rangle$. The corresponding Gibbs measure is

    \pi(\sigma) \;=\; Z^{-1} \exp[-\beta H(\sigma)]        (4.12)

where $Z$ is a normalization constant (the partition function). Two different proposal matrices are in common use:

1) S<strong>in</strong>gle-sp<strong>in</strong>- ip (Glauber) dynamics. Fix some site i. Theproposal is to ip i,<br />

hence<br />

Here P (0)<br />

i<br />

where<br />

p (0)<br />

i (f g!f 0 g) = 1 if 0 i = ; i <strong>and</strong> 0 j = j for all j 6= i<br />

0 otherwise<br />

is symmetric, so the acceptance probability is<br />

(4.13)<br />

ai(f g!f 0 g) = m<strong>in</strong>(e ; E 1) (4.14)<br />

E E(f 0 g) ; E(f g) = 2 i<br />

X<br />

j n:n: of i<br />

j : (4.15)<br />

So E is easily computed by compar<strong>in</strong>gthe status of i <strong>and</strong> itsneighbors.<br />

This de nes a transition matrix Pi <strong>in</strong> which onlythe sp<strong>in</strong>atsite i is touched. The<br />

full \s<strong>in</strong>gle-sp<strong>in</strong>- ip Metropolis algorithm" <strong>in</strong>volves sweep<strong>in</strong>g through the entire lattice<br />

<strong>in</strong> either a r<strong>and</strong>om or periodic fashion, i.e. either<br />

or<br />

P = 1<br />

V<br />

X<br />

i<br />

Pi (r<strong>and</strong>om site updat<strong>in</strong>g) (4.16)<br />

P = Pi1Pi2 Pi V (sequential site updat<strong>in</strong>g) (4.17)<br />

(here V is the volume). In the former case, the transition matrix P satis es detailed<br />

balance for . In the latter case, P does not <strong>in</strong> general satisfy detailed balance for ,<br />

but itdoessatisfy stationarity for ,whichisallthat reallymatters.<br />

It is easy to checkthat P is irreducible,except<strong>in</strong>the caseofsequential siteupdat<strong>in</strong>g<br />

with =0. 10<br />
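As a concrete sketch (my own illustration, not code from these notes), one sequential single-spin-flip Metropolis sweep for the 2d Ising model with periodic boundary conditions might look like this:

```python
import math, random

def metropolis_sweep(spins, beta, L):
    """One sequential sweep of single-spin-flip Metropolis for the 2d Ising
    model (4.11) on an L x L periodic lattice; spins[i][j] is +1 or -1."""
    for i in range(L):
        for j in range(L):
            # Energy change (4.15) for flipping spins[i][j]:
            nn = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j] +
                  spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
            dE = 2 * spins[i][j] * nn
            # Metropolis rule (4.6): accept if dE <= 0, else with prob e^{-beta*dE}
            if dE <= 0 or random.random() < math.exp(-beta * dE):
                spins[i][j] = -spins[i][j]

random.seed(0)
L, beta = 16, 0.3                      # high temperature (beta_c ~ 0.4407 in 2d)
spins = [[random.choice([-1, 1]) for _ in range(L)] for _ in range(L)]
for _ in range(200):                   # a short run; real runs need >> tau sweeps
    metropolis_sweep(spins, beta, L)
m = abs(sum(map(sum, spins))) / L**2   # magnetization per spin
print(m)                               # small in the disordered phase
```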

2) Pair-<strong>in</strong>terchange (Kawasaki) dynamics. Fix a pair of sites i j. The proposal is<br />

to <strong>in</strong>terchange i <strong>and</strong> j, hence<br />

p (0)<br />

(ij) (f g!f 0 g) = 1 if 0 i = j, 0 j = i <strong>and</strong> 0 k = k for all k 6= i j<br />

0 otherwise<br />

(4.18)<br />

10 Note Added 1996: This statementiswrong!!! For the Metropolis acceptance probability F (z) =<br />

m<strong>in</strong>(z1), one may construct examples of nonergodicity for sequential site updat<strong>in</strong>g at any [104, 105]:<br />

the idea is to construct con gurations for which every proposed update (work<strong>in</strong>g sequentially) has<br />

E =0<strong>and</strong>isthus accepted. Of course, the ergodicity can be restored by tak<strong>in</strong>g any F (z) < 1, such<br />

as the choice F (z) =z=(1 + z).<br />

19


The rest of the formulae are analogous to those for single-spin-flip dynamics. The overall algorithm is again constructed by a random or periodic sweep over a suitable set of pairs $i, j$, usually taken to be nearest-neighbor pairs. It should be noted that this algorithm is not irreducible, as it conserves the total magnetization $M = \sum_i \sigma_i$. But it is irreducible on subspaces of fixed $M$ (except for sequential updating with $\beta = 0$).
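For nearest-neighbor pairs, the Metropolis acceptance step works exactly as before, with $\Delta E$ now the energy change of the interchange. A sketch (my own illustration, not from the notes) that also demonstrates the exact conservation of $M$:

```python
import math, random

def kawasaki_step(spins, beta, L):
    """Propose interchanging a random nearest-neighbor pair of spins
    (Kawasaki dynamics) on an L x L periodic lattice; accept with the
    Metropolis probability. The total magnetization is conserved exactly."""
    i, j = random.randrange(L), random.randrange(L)
    # Pick the neighbor to the right or below (this covers every n.n. pair):
    if random.random() < 0.5:
        k, l = (i + 1) % L, j
    else:
        k, l = i, (j + 1) % L
    if spins[i][j] == spins[k][l]:
        return                          # interchange would change nothing

    def local_field(a, b, exclude):
        nbrs = [((a + 1) % L, b), ((a - 1) % L, b),
                (a, (b + 1) % L), (a, (b - 1) % L)]
        return sum(spins[c][d] for c, d in nbrs if (c, d) != exclude)

    # Energy change of swapping two unequal spins; the bond between the
    # two sites is unchanged, so each spin sees only its *other* neighbors:
    dE = (2 * spins[i][j] * local_field(i, j, (k, l)) +
          2 * spins[k][l] * local_field(k, l, (i, j)))
    if dE <= 0 or random.random() < math.exp(-beta * dE):
        spins[i][j], spins[k][l] = spins[k][l], spins[i][j]

random.seed(2)
L, beta = 8, 0.5
spins = [[random.choice([-1, 1]) for _ in range(L)] for _ in range(L)]
M0 = sum(map(sum, spins))
for _ in range(5000):
    kawasaki_step(spins, beta, L)
print(sum(map(sum, spins)) == M0)      # True: M is conserved exactly
```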

A very different approach to constructing transition matrices satisfying detailed balance for $\pi$ is the heat-bath method. This is best illustrated in a specific example. Consider again the Ising model (4.12), and focus on a single site $i$. Then the conditional probability distribution of $\sigma_i$, given all the other spins $\{\sigma_j\}_{j \ne i}$, is

    P(\sigma_i \,|\, \{\sigma_j\}_{j \ne i}) \;=\; \mathrm{const}(\{\sigma_j\}_{j \ne i}) \, \exp\Bigl[ \beta\, \sigma_i \sum_{j\ \mathrm{n.n.\ of}\ i} \sigma_j \Bigr] .        (4.19)

(Note that this conditional distribution is precisely that of a single Ising spin $\sigma_i$ in an "effective magnetic field" produced by the fixed neighboring spins $\sigma_j$.) The heat-bath algorithm updates $\sigma_i$ by choosing a new spin value $\sigma'_i$, independent of the old value $\sigma_i$, from the conditional distribution (4.19); all the other spins $\{\sigma_j\}_{j \ne i}$ remain unchanged.^{11} As in the Metropolis algorithm, this operation is carried out over the whole lattice, either randomly or sequentially.
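For the Ising case, sampling from (4.19) is elementary: with $h = \sum_{j\,\mathrm{n.n.\,of}\,i} \sigma_j$, the new spin is $+1$ with probability $e^{\beta h}/(e^{\beta h} + e^{-\beta h}) = 1/(1 + e^{-2\beta h})$. A sketch (my own illustration, not from the notes):

```python
import math, random

def heatbath_update(spins, beta, i, j, L):
    """Heat-bath update of site (i, j) for the 2d Ising model on an L x L
    periodic lattice: draw the new spin from the conditional distribution
    (4.19), independently of the old value."""
    h = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j] +
         spins[i][(j + 1) % L] + spins[i][(j - 1) % L])   # effective field
    p_up = 1.0 / (1.0 + math.exp(-2.0 * beta * h))        # P(new spin = +1)
    spins[i][j] = 1 if random.random() < p_up else -1

random.seed(0)
L, beta = 8, 0.6                  # low temperature: the system stays ordered
spins = [[1 for _ in range(L)] for _ in range(L)]
for _ in range(100):              # sequential sweeps
    for i in range(L):
        for j in range(L):
            heatbath_update(spins, beta, i, j, L)
m = abs(sum(map(sum, spins))) / L**2
print(m)                          # stays near 1 for beta = 0.6 > beta_c
```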

Analogous algorithms can be developed for more complicated models, e.g. $P(\varphi)$ models, $\sigma$-models and lattice gauge theories. In each case, we focus on a single field variable (holding all the other variables fixed), and give it a new value, independent of the old value, chosen from the appropriate conditional distribution. Of course, the feasibility of this algorithm depends on our ability to construct an efficient subroutine for generating the required single-site (or single-link) random variables. But even if this algorithm is not always the most efficient one in practice, it serves as a clear standard of comparison, which is useful in the development of more sophisticated algorithms.

A more general version of the heat-bath idea is called partial resampling: here we focus on a set of variables rather than only one, and the new values need not be independent of the old values. That is, we divide the variables of the system, call them {φ}, into two subsets, call them {ψ} and {χ}. For fixed values of the {χ} variables, π induces a conditional probability distribution of {ψ} given {χ}, call it P({ψ}|{χ}). Then any algorithm for updating {ψ} with {χ} fixed that leaves invariant all of the distributions P({ψ}|{χ}) will also leave π invariant. One possibility is to use an independent resampling of {ψ}: we throw away the old values {ψ}, and take {ψ'} to be a new random variable chosen from the probability distribution P({ψ}|{χ}), independent of the old values. Independent resampling might also be called block heat-bath updating. On the other hand, if {ψ} is a large set of variables, independent resampling is probably unfeasible; but we are free to use any updating that leaves invariant the appropriate conditional distributions. Of course, in this generality "partial resampling" includes all dynamic Monte Carlo algorithms (we could just take {ψ} to be the entire system), but it is in many cases conceptually useful to focus on some subset of variables. The partial-resampling idea will be at the heart of the multi-grid Monte Carlo method (Section 5) and the embedding algorithms (Section 6).

^11 The alert reader will note that in the Ising case the heat-bath algorithm is equivalent to the single-spin-flip Metropolis algorithm with the choice F(z) = z/(1 + z) of acceptance function. But this correspondence does not hold for more complicated models.
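A toy illustration of partial resampling (my own sketch, not from the notes): take the "system" to be a bivariate Gaussian with correlation r, let {ψ} and {χ} each be a single coordinate, and alternately resample each coordinate from its exact conditional distribution. Each half-step leaves the joint distribution invariant, so the chain equilibrates to the target:

```python
import numpy as np

r = 0.6  # target correlation (arbitrary choice)

def gibbs_chain(n_steps, rng):
    """Alternating independent resampling of the two blocks of a
    bivariate Gaussian N(0, [[1, r], [r, 1]]).  Conditionally on one
    coordinate, the other is distributed as N(r * other, 1 - r**2)."""
    psi, chi = 0.0, 0.0
    samples = np.empty((n_steps, 2))
    for t in range(n_steps):
        psi = rng.normal(r * chi, np.sqrt(1.0 - r ** 2))  # resample psi | chi
        chi = rng.normal(r * psi, np.sqrt(1.0 - r ** 2))  # resample chi | psi
        samples[t] = psi, chi
    return samples
```

Taking {ψ} to be a larger block of a lattice field, with its exact conditional distribution, is precisely the "block heat-bath" updating mentioned above.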

We have now defined a rather large class of dynamic Monte Carlo algorithms: the single-spin-flip Metropolis algorithm, the single-site heat-bath algorithm, and so on. How well do these algorithms perform? Away from phase transitions, they perform rather well. However, near a phase transition, the autocorrelation time τ grows rapidly. In particular, near a critical point (second-order phase transition), the autocorrelation time typically diverges as

    τ ∼ min(L, ξ)^z   (4.20)

where L is the linear size of the system, ξ is the correlation length of an infinite-volume system at the same temperature, and z is a dynamic critical exponent. This phenomenon is called critical slowing-down; it severely hampers the study of critical phenomena by Monte Carlo methods. Most of the remainder of these lectures will be devoted, therefore, to describing recent progress in inventing new Monte Carlo algorithms with radically reduced critical slowing-down.

The critical slowing-down of the conventional algorithms arises fundamentally from the fact that their updates are local: in a single step of the algorithm, "information" is transmitted from a given site only to its nearest neighbors. Crudely one might guess that this "information" executes a random walk around the lattice. In order for the system to evolve to an "essentially new" configuration, the "information" has to travel a distance of order ξ, the (static) correlation length. One would guess, therefore, that τ ∼ ξ² near criticality, i.e. that the dynamic critical exponent z equals 2. This guess is correct for the Gaussian model (free field).^12 For other models, we have a situation analogous to the theory of static critical phenomena: the dynamic critical exponent is a nontrivial number that characterizes a rather large class of algorithms (a so-called "dynamic universality class"). In any case, for most models of interest, the dynamic critical exponent for local algorithms is close to 2 (usually somewhat higher) [21]. Accurate measurements of dynamic critical exponents are, however, very difficult (even more difficult than measurements of static critical exponents) and require enormous quantities of Monte Carlo data: run lengths of ≳ 10000 τ, when τ is itself getting large!

^12 Indeed, for the Gaussian model this random-walk picture can be made rigorous: see [19] combined with [20, Section 8].

We can now make a rough estimate of the computer time needed to study the Ising model near its critical point, or quantum chromodynamics near the continuum limit. Each sweep of the lattice takes a time of order L^d, where d is the spatial (or space-"time") dimensionality of the model. And we need ∼ ξ^z sweeps in order to get one "effectively independent" sample. So this means a computer time of order L^d ξ^z ≳ ξ^{d+z}.^13 For high-precision statistics one might want 10^6 "independent" samples. The reader is invited to plug in ξ = 100, d = 3 (or d = 4 if you're an elementary-particle physicist) and get depressed. It should be emphasized that the factor ξ^d is inherent in all Monte Carlo algorithms for spin models and field theories (but not for self-avoiding walks, see Section 7). The factor ξ^z could, however, conceivably be reduced or eliminated by a more clever algorithm.
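To make the arithmetic explicit (a back-of-envelope sketch; the 10^6-sample target, L ∼ ξ, and z = 2 are the values quoted in the text):

```python
def mc_cost(xi, d, z=2, samples=10 ** 6):
    """Total work in site-updates: (sites per sweep, L^d with L ~ xi)
    x (sweeps per independent sample, xi^z) x (number of samples)."""
    return samples * xi ** (d + z)

cost_3d = mc_cost(100, d=3)  # xi = 100, d = 3
cost_4d = mc_cost(100, d=4)  # xi = 100, d = 4
```

That is about 10^16 site-updates in d = 3 and 10^18 in d = 4, which is what makes the factor ξ^z worth attacking.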

What is to be done? Our knowledge of the physics of critical slowing-down tells us that the slow modes are the long-wavelength modes, if the updating is purely local. The natural solution is therefore to speed up those modes by some sort of collective-mode (nonlocal) updating. It is necessary, then, to identify physically the appropriate collective modes, and to devise an efficient computational algorithm for speeding up those modes. These two goals are unfortunately in conflict: it is very difficult to devise collective-mode algorithms that are not so nonlocal that their computational cost outweighs the reduction in critical slowing-down. Specific implementations of the collective-mode idea are thus highly model-dependent. At least three such algorithms have been invented so far:

- Fourier acceleration [24]
- Multi-grid Monte Carlo (MGMC) [25, 20, 26]
- The Swendsen-Wang algorithm [27] and its generalizations

Fourier acceleration <strong>and</strong> MGMC are very similar <strong>in</strong> spirit (though quite di erent technically).<br />

Their performance is thus probably qualitatively similar, <strong>in</strong> the sensethat they<br />

probably work well for the same models <strong>and</strong> work badlyforthe samemodels. In the<br />

next lecture we givean<strong>in</strong>troduction to the MGMCmethod <strong>in</strong> the follow<strong>in</strong>g lecture we<br />

discuss algorithms of Swendsen-Wang type.<br />

^13 Clearly one must take L ≳ ξ in order to avoid severe finite-size effects. Typically one approaches the critical point with L ≈ cξ, where c ≈ 2-4, and then uses finite-size scaling [22, 23] to extrapolate to the infinite-volume limit. Note Added 1996: Recently, radical advances have been made in applying finite-size scaling to Monte Carlo simulations (see [97, 98] and especially [99, 100, 101, 102, 103]); the preceding two sentences can now be seen to be far too pessimistic. For reliable extrapolation to the infinite-volume limit, L and ξ must both be ≫ 1, but the ratio L/ξ can in some cases be as small as 10^{-3} or even smaller (depending on the model and on the quality of the data). However, reliable extrapolation requires careful attention to systematic errors arising from correction-to-scaling terms. In practice, reliable infinite-volume values (with both statistical and systematic errors of order a few percent) can be obtained at ξ ∼ 10^5 from lattices of size L ∼ 10^2, at least in some models [101, 102, 103].


5 Multi-Grid Algorithms

The phenomenon of critical slowing-down is not confined to Monte Carlo simulations: very similar difficulties were encountered long ago by numerical analysts concerned with the numerical solution of partial differential equations. An ingenious solution, now called the multi-grid (MG) method, was proposed in 1964 by the Soviet numerical analyst Fedorenko [28]: the idea is to consider, in addition to the original ("fine-grid") problem, a sequence of auxiliary "coarse-grid" problems that approximate the behavior of the original problem for excitations at successively longer length scales (a sort of "coarse-graining" procedure). The local updates of the traditional algorithms are then supplemented by coarse-grid updates. To a present-day physicist, this philosophy is remarkably reminiscent of the renormalization group, so it is all the more remarkable that it was invented two years before the work of Kadanoff [29] and seven years before the work of Wilson [30]! After a decade of dormancy, multi-grid was revived in the mid-1970's [31], and was shown to be an extremely efficient computational method. In the 1980's, multi-grid methods have become an active area of research in numerical analysis, and have been applied to a wide variety of problems in classical physics [32, 33]. Very recently [25] it was shown how a stochastic generalization of the multi-grid method, multi-grid Monte Carlo (MGMC), can be applied to problems in statistical, and hence also Euclidean quantum, physics.

In this lecture we begin by giving a brief introduction to the deterministic multi-grid method; we then explain the stochastic analogue.^14 But it is worth indicating now the basic idea behind this generalization. There is a strong analogy between solving lattice systems of equations (such as the discrete Laplace equation) and making Monte Carlo simulations of lattice random fields. Indeed, given a Hamiltonian H(φ), the deterministic problem is that of minimizing H(φ), while the stochastic problem is that of generating random samples from the Boltzmann-Gibbs probability distribution e^{-βH(φ)}. The statistical-mechanical problem reduces to the deterministic one in the zero-temperature limit β → +∞.

Many (but not all) of the deterministic iterative algorithms for minimizing H(φ) can be generalized to stochastic iterative algorithms, that is, dynamic Monte Carlo methods, for generating random samples from e^{-βH(φ)}. For example, the Gauss-Seidel algorithm for minimizing H and the heat-bath algorithm for random sampling from e^{-βH} are very closely related. Both algorithms sweep successively through the lattice, working on one site x at a time. The Gauss-Seidel algorithm updates φ_x so as to minimize the Hamiltonian H(φ) = H(φ_x, {φ_y}_{y≠x}) when all the other fields {φ_y}_{y≠x} are held fixed at their current values. The heat-bath algorithm gives φ_x a new random value chosen from the probability distribution exp[-βH(φ_x, {φ_y}_{y≠x})], with all the fields {φ_y}_{y≠x} again held fixed. As β → +∞ the heat-bath algorithm approaches Gauss-Seidel. A similar correspondence holds between MG and MGMC.

^14 For an excellent introduction to the deterministic multi-grid method, see Briggs [34]; more advanced topics are covered in the book of Hackbusch [32]. Both MG and MGMC are discussed in detail in [20].
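The correspondence can be made concrete for the one-dimensional Gaussian model (5.2) (a sketch under my own conventions, not code from the notes: zero Dirichlet data, sequential sweeps). Conditionally on its neighbors, φ_x is Gaussian with mean (φ_{x-1} + φ_{x+1} + f_x)/2 and variance 1/(2β); Gauss-Seidel installs the mean, heat-bath samples around it:

```python
import numpy as np

def cond_mean(phi, f, x):
    """Conditional mean of phi_x given its neighbours (d = 1, zero
    Dirichlet data: phi = 0 outside the chain)."""
    left = phi[x - 1] if x > 0 else 0.0
    right = phi[x + 1] if x + 1 < len(phi) else 0.0
    return 0.5 * (left + right + f[x])

def gauss_seidel_sweep(phi, f):
    for x in range(len(phi)):
        phi[x] = cond_mean(phi, f, x)          # minimize H in phi_x

def heat_bath_sweep(phi, f, beta, rng):
    sigma = 1.0 / np.sqrt(2.0 * beta)          # conditional std. dev.
    for x in range(len(phi)):
        phi[x] = rng.normal(cond_mean(phi, f, x), sigma)
```

As β → +∞ the conditional standard deviation 1/√(2β) → 0 and a heat-bath sweep becomes exactly a Gauss-Seidel sweep.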


Before entering into details, let us emphasize that although the multi-grid method and the block-spin renormalization group (RG) are based on very similar philosophies (dealing with a single length scale at a time), they are in fact very different. In particular, the conditional coarse-grid Hamiltonian employed in the MGMC method is not the same as the renormalized Hamiltonian given by a block-spin RG transformation. The RG transformation computes the marginal, not the conditional, distribution of the block means: that is, it integrates over the complementary degrees of freedom, while the MGMC method fixes these degrees of freedom at their current (random) values. The conditional Hamiltonian employed in MGMC is given by an explicit finite expression, while the marginal (RG) Hamiltonian cannot be computed in closed form. The failure to appreciate these distinctions has unfortunately led to much confusion in the literature.^15

5.1 Deterministic Multi-Grid

In this section we give a pedagogical introduction to multi-grid methods in the simplest case, namely the solution of deterministic linear or nonlinear systems of equations.

Consider, for purposes of exposition, the lattice Poisson equation -Δφ = f in a region Λ ⊂ Z^d with zero Dirichlet data. Thus, the equation is

    (-Δφ)_x ≡ 2d φ_x - Σ_{x': |x-x'|=1} φ_{x'} = f_x   (5.1)

for x ∈ Λ, with φ_x ≡ 0 for x ∉ Λ. This problem is equivalent to minimizing the quadratic Hamiltonian

    H(φ) = (1/2) Σ_{⟨xy⟩} (φ_x - φ_y)² - Σ_x f_x φ_x .   (5.2)

More generally, we may wish to solve a linear system

    Aφ = f   (5.3)

where for simplicity we shall assume A to be symmetric and positive-definite. This problem is equivalent to minimizing

    H(φ) = (1/2) ⟨φ, Aφ⟩ - ⟨f, φ⟩ .   (5.4)

Later we shall consider also non-quadratic Hamiltonians.

Our goal is to devise a rapidly convergent iterative method for solving numerically the linear system (5.3). We shall restrict attention to first-order stationary linear iterations of the general form

    φ^{(n+1)} = M φ^{(n)} + N f   (5.5)

where φ^{(0)} is an arbitrary initial guess for the solution. Obviously, we must demand at the very least that the true solution φ ≡ A^{-1} f be a fixed point of (5.5); imposing this condition for all f, we conclude that

    N = (I - M) A^{-1} .   (5.6)

The iteration (5.5) is thus completely specified by its iteration matrix M. Moreover, (5.5)-(5.6) imply that the error e^{(n)} ≡ φ^{(n)} - φ satisfies

    e^{(n+1)} = M e^{(n)} .   (5.7)

That is, the iteration matrix is the amplification matrix for the error. It follows easily that the iteration (5.5) is convergent for all initial vectors φ^{(0)} if and only if the spectral radius ρ(M) ≡ lim_{n→∞} ||M^n||^{1/n} is < 1; and in this case the convergence is exponential with asymptotic rate at least ρ(M), i.e.

    ||φ^{(n)} - φ|| ≤ K n^p ρ(M)^n   (5.8)

for some K, p < ∞ (K depends on φ^{(0)}).

Now let us return to the specific system (5.1). One simple iterative algorithm arises by solving (5.1) repeatedly for φ_x:

    φ_x^{(n+1)} = (1/2d) [ Σ_{x': |x-x'|=1} φ_{x'}^{(n)} + f_x ] .   (5.9)

(5.9) is called the Jacobi iteration. It is convenient to consider also a slight generalization of (5.9): let 0 < ω ≤ 1, and replace only a fraction ω of φ_x by the Jacobi update, keeping a fraction 1 - ω of the old value:

    φ_x^{(n+1)} = (1 - ω) φ_x^{(n)} + (ω/2d) [ Σ_{x': |x-x'|=1} φ_{x'}^{(n)} + f_x ] ;

this is the damped Jacobi iteration, with iteration matrix M_{DJ,ω}. On the L × L square (d = 2) with zero Dirichlet data, M_{DJ,ω} is diagonalized by Fourier sine modes, with eigenvalues

    λ_{p1 p2} = 1 - ω [1 - (cos p1 + cos p2)/2]   (5.12)

where p1, p2 = π/(L+1), 2π/(L+1), ..., Lπ/(L+1). The spectral radius of M_{DJ,ω} is the eigenvalue of largest magnitude, namely

    ρ(M_{DJ,ω}) = λ_{π/(L+1), π/(L+1)} = 1 - ω [1 - cos(π/(L+1))] = 1 - O(L^{-2}) .   (5.13)

It follows that O(L²) iterations are needed for the damped Jacobi iteration to converge adequately. This represents an enormous computational labor when L is large.
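The estimate (5.13) is easy to check numerically (a sketch of my own, in one dimension, where the damped Jacobi iteration matrix for the chain of length L obeys the same formula ρ = 1 - ω[1 - cos(π/(L+1))]):

```python
import numpy as np

L, omega = 32, 0.5
A = 2 * np.eye(L) - np.eye(L, k=1) - np.eye(L, k=-1)  # 1-d lattice Laplacian
M = np.eye(L) - (omega / 2.0) * A                     # damped Jacobi iteration matrix
rho = np.max(np.abs(np.linalg.eigvals(M)))
predicted = 1.0 - omega * (1.0 - np.cos(np.pi / (L + 1)))
```

For L = 32 this gives ρ ≈ 0.9977, i.e. roughly 1/(1 - ρ) = O(L²) iterations per digit of accuracy.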

It is easy to see what is going on here: the slow modes (λ_p ≈ 1) are the long-wavelength modes (p1, p2 ≈ 0). [If ω ≈ 1, then some modes with wavenumber p = (p1, p2) ≈ (π, π) have eigenvalue λ_p ≈ -1 and so also are slow. This phenomenon can be avoided by taking ω significantly less than 1; for simplicity we shall henceforth take ω = 1/2, which makes λ_p ≥ 0 for all p.] It is also easy to see physically why the long-wavelength modes are slow. The key fact is that the (damped) Jacobi iteration is local: in a single step of the algorithm, "information" is transmitted only to nearest neighbors. One might guess that this "information" executes a random walk around the lattice; and for the true solution to be reached, "information" must propagate from the boundaries to the interior (and back and forth until "equilibrium" is attained). This takes a time of order L², in agreement with (5.13). In fact, this random-walk picture can be made rigorous [19].

This is an example of a critical phenomenon, in precisely the same sense that the term is used in statistical mechanics. The Laplace operator A = -Δ is critical, inasmuch as its Green function A^{-1} has long-range correlations (power-law decay in dimension d > 2, or growth in d ≤ 2). This means that the solution of Poisson's equation in one region of the lattice depends strongly on the solution in distant regions of the lattice; "information" must propagate globally in order for "equilibrium" to be reached. Put another way, excitations at many length scales are significant, from one lattice spacing at the smallest to the entire lattice at the largest. The situation would be very different if we were to consider instead the Helmholtz-Yukawa equation (-Δ + m²)φ = f with m > 0: its Green function has exponential decay with characteristic length ∼ m^{-1}, so that regions of the lattice separated by distances ≫ m^{-1} are essentially decoupled. In this case, "information" need only propagate a distance of order min(m^{-1}, L) in order for "equilibrium" to be reached. This takes a time of order min(m^{-2}, L²), an estimate which can be confirmed rigorously by computing the obvious generalization of (5.12)-(5.13). On the other hand, as m → 0 we recover the Laplace operator with its attendant difficulties: m = 0 is a critical point. We have here an example of critical slowing-down in classical physics.

The general structure of a remedy should now be obvious to physicists reared on the renormalization group: don't try to deal with all length scales at once, but define instead a sequence of problems in which each length scale, beginning with the smallest and working towards the largest, can be dealt with separately. An algorithm of precisely this form was proposed in 1964 by the Soviet numerical analyst Fedorenko [28], and is now called the multi-grid method.


Note first that the only slow modes in the damped Jacobi iteration are the long-wavelength modes (provided that ω is not near 1): as long as, say, max(p1, p2) ≥ π/2, we have 0 ≤ λ_p ≤ 3/4 (for ω = 1/2), independent of L. It follows that the short-wavelength components of the error e^{(n)} ≡ φ^{(n)} - φ can be effectively killed by a few (say, five or ten) damped Jacobi iterations. The remaining error has primarily long-wavelength components, and so is slowly varying in x-space. But a slowly varying function can be well represented on a coarser grid: if, for example, we were told e_x^{(n)} only at even values of x, we could nevertheless reconstruct with high accuracy the function e_x^{(n)} at all x by, say, linear interpolation. This suggests an improved algorithm for solving (5.1): perform a few damped Jacobi iterations on the original grid, until the (unknown) error is smooth in x-space; then set up an auxiliary coarse-grid problem whose solution will be approximately this error (this problem will turn out to be a Poisson equation on the coarser grid); perform a few damped Jacobi iterations on the coarser grid; and then transfer (interpolate) the result back to the original (fine) grid and add it in to the current approximate solution.

There are two advantages to performing the damped Jacobi iterations on the coarse grid. Firstly, the iterations take less work, because there are fewer lattice points on the coarse grid (2^{-d} times as many for a factor-of-2 coarsening in d dimensions). Secondly, with respect to the coarse grid the long-wavelength modes no longer have such long wavelength: their wavelength has been halved (i.e. their wavenumber has been doubled). This suggests that those modes with, say, max(p1, p2) ≥ π/4 can be effectively killed by a few damped Jacobi iterations on the coarse grid. And then we can transfer the remaining (smooth) error to a yet coarser grid, and so on recursively. These are the essential ideas of the multi-grid method.

Let us now give a precise definition of the multi-grid algorithm. For simplicity we shall restrict attention to problems defined in variational form^16: thus, the goal is to minimize a real-valued function ("Hamiltonian") H(φ), where φ runs over some N-dimensional real vector space U. We shall treat quadratic and non-quadratic Hamiltonians on an equal footing. In order to specify the algorithm we must specify the following ingredients:

1) A sequence of coarse-grid spaces U_M ≡ U, U_{M-1}, U_{M-2}, ..., U_0. Here dim U_l ≡ N_l and N = N_M > N_{M-1} > N_{M-2} > ... > N_0.

2) Prolongation (or "interpolation") operators p_{l,l-1}: U_{l-1} → U_l for 1 ≤ l ≤ M.

3) Basic (or "smoothing") iterations S_l: U_l × H_l → U_l for 0 ≤ l ≤ M. Here H_l is a space of "possible Hamiltonians" defined on U_l; we discuss this in more detail below. The role of S_l is to take an approximate minimizer φ'_l of the Hamiltonian H_l and compute a new (hopefully better) approximate minimizer φ''_l = S_l(φ'_l, H_l). [For the present we can imagine that S_l consists of a few iterations of damped Jacobi for the Hamiltonian H_l.] Most generally, we shall use two smoothing iterations, S_l^{pre} and S_l^{post}; they may be the same, but need not be.

4) Cycle control parameters (integers) γ_l ≥ 1 for 1 ≤ l ≤ M, which control the number of times that the coarse grids are visited.

We discuss these ingredients in more detail below.

^16 In fact, the multi-grid method can be applied to the solution of linear or nonlinear systems of equations, whether or not these equations come from a variational principle. See, for example, [32] and [20, Section 2].

The multi-grid algorithm is then de ned recursively as follows:<br />

procedure mgm(l ' Hl)<br />

comment This algorithm takes an approximate m<strong>in</strong>imizer ' of the Hamiltonian<br />

Hl, <strong>and</strong>overwrites it with abetter approximate m<strong>in</strong>imizer.<br />

' S pre<br />

l (' Hl)<br />

if l>0 then<br />

endif<br />

compute Hl;1( ) Hl(' + pll;1 )<br />

0<br />

for j =1 until l do mgm(l ; 1 Hl;1)<br />

' ' + pll;1<br />

' Spost l (' Hl)<br />

end<br />
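A compact executable transcription of mgm for the quadratic case H(φ) = ½⟨φ, Aφ⟩ - ⟨f, φ⟩ (a sketch, not Sokal's code; the specific choices are mine: one space dimension, piecewise-linear prolongation, damped Jacobi with ω = ½ as both S^pre and S^post, γ_l = 1, and the coarse-grid Hamiltonian (5.14), which for a quadratic H amounts to the Galerkin product pᵀAp with right-hand side pᵀ(f - Aφ)):

```python
import numpy as np

def laplacian(n):
    return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

def prolongation(n_fine, n_coarse):
    """Piecewise-linear interpolation p_{l,l-1}; coarse site j sits at
    fine site 2j+1 (zero Dirichlet data, n_fine = 2*n_coarse + 1)."""
    p = np.zeros((n_fine, n_coarse))
    for j in range(n_coarse):
        p[2 * j, j] = 0.5
        p[2 * j + 1, j] = 1.0
        p[2 * j + 2, j] = 0.5
    return p

def mgm(phi, A, f, gamma=1, n_smooth=2, omega=0.5):
    """One cycle of the recursive multi-grid procedure for
    H(phi) = (1/2)<phi, A phi> - <f, phi>."""
    d = np.diag(A)
    for _ in range(n_smooth):                  # pre-smoothing (damped Jacobi)
        phi = phi + omega * (f - A @ phi) / d
    n_coarse = (len(f) - 1) // 2
    if n_coarse >= 1:                          # coarse-grid correction
        p = prolongation(len(f), n_coarse)
        A_c = p.T @ A @ p                      # quadratic part of H_{l-1}, cf. (5.14)
        f_c = p.T @ (f - A @ phi)
        psi = np.zeros(n_coarse)
        for _ in range(gamma):
            psi = mgm(psi, A_c, f_c, gamma, n_smooth, omega)
        phi = phi + p @ psi
    for _ in range(n_smooth):                  # post-smoothing
        phi = phi + omega * (f - A @ phi) / d
    return phi
```

Each cycle reduces the error by a factor that is independent of the lattice size, in contrast to the 1 - O(L^{-2}) of (5.13).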

Here is what is going on: we wish to minimize the Hamiltonian H_l, and are given as input an approximate minimizer φ. The algorithm consists of three steps:

1) Pre-smoothing. We apply the basic iteration (e.g. a few sweeps of damped Jacobi) to the given approximate minimizer. This produces a better approximate minimizer in which the high-frequency (short-wavelength) components of the error have been reduced significantly. Therefore, the error, although still large, is smooth in x-space (whence the name "smoothing iteration").

2) Coarse-grid correction. We want to move rapidly towards the minimizer φ̄ of H_l, using coarse-grid updates. Because of the pre-smoothing, the error φ - φ̄ is a smooth function in x-space, so it should be well approximated by fields in the range of the prolongation operator p_{l,l-1}. We will therefore carry out a coarse-grid update in which φ is replaced by φ + p_{l,l-1} ψ, where ψ lies in the coarse-grid subspace U_{l-1}. A sensible goal is to attempt to choose ψ so as to minimize H_l; that is, we attempt to minimize

    H_{l-1}(ψ) ≡ H_l(φ + p_{l,l-1} ψ) .   (5.14)

To carry out this approximate minimization, we use a few (γ_l) iterations of the best algorithm we know, namely, multi-grid itself! And we start at the best approximate minimizer we know, namely ψ = 0! The goal of this coarse-grid correction step is to reduce significantly the low-frequency components of the error in φ (hopefully without creating large new high-frequency error components).

3) Post-smoothing. We apply, for good measure, a few more sweeps of the basic smoother. (This would protect against any high-frequency error components which may inadvertently have been created by the coarse-grid correction step.)

The foregoing constitutes, of course, a single step of the multi-grid algorithm. In practice this step would be repeated several times, as in any other iteration, until the error has been reduced to an acceptably small value. The advantage of multi-grid over the traditional (e.g. damped Jacobi) iterative methods is that, with a suitable choice of the ingredients p_{l,l-1}, S_l and so on, only a few (maybe five or ten) iterations are needed to reduce the error to a small value, independent of the lattice size L. This contrasts favorably with the behavior (5.13) of the damped Jacobi method, in which O(L²) iterations are needed.

The multi-grid algorithm is thus a general framework; the user has considerable freedom in choosing the specific ingredients, which must be adapted to the specific problem. We now discuss briefly each of these ingredients; more details can be found in Chapter 3 of the book of Hackbusch [32].

Coarse grids. Most commonly one uses a uniform factor-of-2 coarsening between each grid Λ_l and the next coarser grid Λ_{l-1}. The coarse-grid points could be either a subset of the fine-grid points (Fig. 1) or a subset of the dual lattice (Fig. 2). These schemes have obvious generalizations to higher-dimensional cubic lattices. In dimension d = 2, another possibility is a uniform factor-of-√2 coarsening (Fig. 3); note that the coarse grid is again a square lattice, rotated by 45°. Figs. 1-3 are often referred to as "standard coarsening", "staggered coarsening", and "red-black (or checkerboard) coarsening", respectively. Coarsenings by a larger factor (e.g. 3) could also be considered, but are generally disadvantageous. Note that each of the above schemes works also for periodic boundary conditions provided that the linear size L_l of the grid Λ_l is even. For this reason it is most convenient to take the linear size L ≡ L_M of the original (finest) grid Λ_M to be a power of 2, or at least a power of 2 times a small integer. Other definitions of coarse grids (e.g. anisotropic coarsening) are occasionally appropriate.

Prolongation operators. For a coarse grid as in Fig. 2, a natural choice of prolongation operator is piecewise-constant injection:

  (p_{l,l-1} φ_{l-1})_{x₁±1/2, x₂±1/2} = (φ_{l-1})_{x₁,x₂}   for all x in the coarse grid   (5.15)

(illustrated here for d = 2). It can be represented in an obvious shorthand notation by the stencil

  [ 1  1 ]
  [ 1  1 ]   (5.16)
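As a concrete illustration (our own sketch, not part of the original notes), piecewise-constant injection in d = 2 is a single Kronecker product in NumPy; the function name is of our choosing:

```python
import numpy as np

def inject(phi_coarse):
    """Piecewise-constant injection (5.15)/(5.16) in d = 2: each coarse-grid
    value is copied onto the 2 x 2 block of fine-grid sites surrounding it."""
    return np.kron(phi_coarse, np.ones((2, 2)))

phi_coarse = np.arange(4.0).reshape(2, 2)
phi_fine = inject(phi_coarse)   # 4 x 4 field, constant on each 2 x 2 block
```

Each coarse-grid site thus controls one rigid block of fine-grid sites, which is exactly the structure exploited by the block updates discussed later.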

For a coarse grid as in Fig. 1, a natural choice is piecewise-linear interpolation, one


Figure 1: Standard coarsening (factor-of-2) in dimension d = 2. Dots are fine-grid sites and crosses are coarse-grid sites.

Figure 2: Staggered coarsening (factor-of-2) in dimension d = 2.


Figure 3: Red-black coarsening (factor-of-√2) in dimension d = 2.

example of which is the nine-point prolongation

  [ 1/4  1/2  1/4 ]
  [ 1/2   1   1/2 ]
  [ 1/4  1/2  1/4 ]   (5.17)

Higher-order interpolations (e.g. quadratic or cubic) can also be considered. All these prolongation operators can easily be generalized to higher dimensions.

We have ignored here some important subtleties concerning the treatment of the boundaries in defining the prolongation operator. Fortunately we shall not have to worry much about this problem, since most applications in statistical mechanics and quantum field theory use periodic boundary conditions.
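With periodic boundary conditions the nine-point prolongation (5.17) can be sketched as follows; this is an illustrative NumPy implementation of our own (zero-insertion followed by a stencil convolution), not code from the original work:

```python
import numpy as np

def ninepoint_prolongation(phi_coarse):
    """Piecewise-linear (bilinear) interpolation, stencil (5.17), periodic BC."""
    n = phi_coarse.shape[0]
    up = np.zeros((2 * n, 2 * n))
    up[::2, ::2] = phi_coarse               # inject coarse values at even sites
    # convolve with the nine-point stencil (5.17)
    w = {(0, 0): 1.0,
         (1, 0): 0.5, (-1, 0): 0.5, (0, 1): 0.5, (0, -1): 0.5,
         (1, 1): 0.25, (1, -1): 0.25, (-1, 1): 0.25, (-1, -1): 0.25}
    fine = np.zeros_like(up)
    for (di, dj), c in w.items():
        fine += c * np.roll(up, (di, dj), axis=(0, 1))
    return fine

# Bilinear interpolation reproduces constant fields exactly:
fine_const = ninepoint_prolongation(np.ones((4, 4)))
```

The constant-field check is the simplest sanity test: a prolongation that fails to reproduce constants would damage exactly the smooth modes that the coarse grid is supposed to handle.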

Coarse-grid Hamiltonians. What does the coarse-grid Hamiltonian H_{l-1} look like? If the fine-grid Hamiltonian H_l is quadratic,

  H_l(φ) = (1/2)⟨φ, A_l φ⟩ - ⟨f_l, φ⟩   (5.18)

then so is the coarse-grid Hamiltonian H_{l-1}:

  H_{l-1}(ψ) ≡ H_l(φ + p_{l,l-1} ψ)   (5.19)
            = (1/2)⟨ψ, A_{l-1} ψ⟩ - ⟨d, ψ⟩ + const   (5.20)

where

  A_{l-1} ≡ p*_{l,l-1} A_l p_{l,l-1}   (5.21)
  d ≡ p*_{l,l-1}(f - A_l φ)   (5.22)



The coarse-grid problem is thus also a linear equation, whose right-hand side is just the "coarse-graining" of the residual r ≡ f - A_l φ; this coarse-graining is performed using the adjoint of the interpolation operator p_{l,l-1}. The exact form of the coarse-grid operator A_{l-1} depends on the fine-grid operator A_l and on the choice of interpolation operator p_{l,l-1}. For example, if A_l is the nearest-neighbor Laplacian and p_{l,l-1} is piecewise-constant injection, then it is easily checked that A_{l-1} is also a nearest-neighbor Laplacian (multiplied by an extra 2^{d-1}). On the other hand, if p_{l,l-1} is piecewise-linear interpolation, then A_{l-1} will have nearest-neighbor and next-nearest-neighbor terms (but nothing worse than that).
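The claim about the coarsened Laplacian can be checked numerically. The sketch below (our own construction) applies the Galerkin recipe A_{l-1} = p* A_l p with piecewise-constant injection and confirms that the result is again a nearest-neighbor Laplacian, with the factor 2^{d-1} appearing for d = 2:

```python
import numpy as np

def laplacian_1d(n):
    """Periodic nearest-neighbor Laplacian: (A phi)_x = 2 phi_x - phi_{x-1} - phi_{x+1}."""
    S = np.roll(np.eye(n), 1, axis=1)    # periodic shift
    return 2.0 * np.eye(n) - S - S.T

L = 8
p1 = np.repeat(np.eye(L // 2), 2, axis=0)      # piecewise-constant injection in d = 1
Ac1 = p1.T @ laplacian_1d(L) @ p1              # Galerkin coarse operator, cf. (5.21)

I, Ic = np.eye(L), np.eye(L // 2)
A2 = np.kron(laplacian_1d(L), I) + np.kron(I, laplacian_1d(L))   # d = 2 Laplacian
p2 = np.kron(p1, p1)                           # d = 2 injection (2 x 2 blocks)
Ac2 = p2.T @ A2 @ p2
```

In d = 1 the coarse operator coincides exactly with the nearest-neighbor Laplacian on the coarse grid (2^{d-1} = 1); in d = 2 the same Laplacian reappears multiplied by 2, since each coarse bond is crossed by 2^{d-1} fine bonds.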

Clearly, the point is to choose classes of Hamiltonians 𝓗_l with the property that if H_l ∈ 𝓗_l and φ ∈ U_l, then the coarse-grid Hamiltonian H_{l-1} defined by (5.14) necessarily lies in 𝓗_{l-1}. In particular, it is convenient (though not in principle necessary) to choose all the Hamiltonians to have the same "functional form"; this functional form must be one which is stable under the coarsening operation (5.14). For example, suppose that the Hamiltonian H_l is a φ⁴ theory with nearest-neighbor gradient term and possibly site-dependent coefficients:

  H_l(φ) = (β/2) Σ_{|x-x′|=1} (φ_x - φ_{x′})² + Σ_x V_x(φ_x)   (5.23)

where

  V_x(φ_x) = λ φ_x⁴ + ξ_x φ_x³ + A_x φ_x² + h_x φ_x .   (5.24)

Suppose, further, that the prolongation operator p_{l,l-1} is piecewise-constant injection (5.15). Then the coarse-grid Hamiltonian H_{l-1}(ψ) ≡ H_l(φ + p_{l,l-1} ψ) can easily be computed: it is

  H_{l-1}(ψ) = (β′/2) Σ_{|y-y′|=1} (ψ_y - ψ_{y′})² + Σ_y V′_y(ψ_y) + const   (5.25)

where

  V′_y(ψ_y) = λ′ ψ_y⁴ + ξ′_y ψ_y³ + A′_y ψ_y² + h′_y ψ_y   (5.26)

and

  β′ = 2^{d-1} β   (5.27)
  λ′ = 2^d λ   (5.28)
  ξ′_y = Σ_{x∈B_y} (4λ φ_x + ξ_x)   (5.29)
  A′_y = Σ_{x∈B_y} (6λ φ_x² + 3ξ_x φ_x + A_x)   (5.30)
  h′_y = Σ_{x∈B_y} (4λ φ_x³ + 3ξ_x φ_x² + 2A_x φ_x + h_x)   (5.31)

Here B_y is the block consisting of those 2^d sites of grid l which are affected by interpolation from the coarse-grid site y (see Figure 2). Note that the coarse-grid


Hamiltonian H_{l-1} has the same functional form as the "fine-grid" Hamiltonian H_l: it is specified by the coefficients β′, λ′, ξ′_y, A′_y and h′_y. The step "compute H_{l-1}" therefore means to compute these coefficients. Note also the importance of allowing in (5.23) for φ³ and φ terms and for site-dependent coefficients: even if these are not present in the original Hamiltonian H ≡ H_M, they will be generated on coarser grids. Finally, we emphasize that the coarse-grid Hamiltonian H_{l-1} depends implicitly on the current value of the fine-lattice field φ ∈ U_l; although our notation suppresses this dependence, it should be kept in mind.

Basic (smoothing) iterations. We have already discussed the damped Jacobi iteration as one possible smoother. Note that in this method only the "old" values φ^(n) are used on the right-hand side of (5.9)/(5.10), even though for some of the terms the "new" value φ^(n+1) may already have been computed. An alternative algorithm is to use at each stage on the right-hand side the "newest" available value. This algorithm is called the Gauss-Seidel iteration.^17 Note that the Gauss-Seidel algorithm, unlike the Jacobi

algorithm, depends on the ordering of the grid points. For example, if a 2-dimensional grid is swept in lexicographic order (1,1), (2,1), ..., (L,1), (1,2), (2,2), ..., (L,2), ..., (1,L), (2,L), ..., (L,L), then the Gauss-Seidel iteration becomes

  φ^(n+1)_{x₁,x₂} = (1/4) [φ^(n)_{x₁+1,x₂} + φ^(n+1)_{x₁-1,x₂} + φ^(n)_{x₁,x₂+1} + φ^(n+1)_{x₁,x₂-1} + f_{x₁,x₂}] .   (5.32)

Another convenient ordering is the red-black (or checkerboard) ordering, in which the "red" sublattice {x : x₁ + ⋯ + x_d is even} is swept first, followed by the "black" sublattice {x : x₁ + ⋯ + x_d is odd}. Note that the ordering of the grid points within each sublattice is irrelevant [for the usual nearest-neighbor Laplacian (5.1)], since the matrix A does not couple sites of the same color. This means that red-black Gauss-Seidel is particularly well suited to vector or parallel computation. Note that the red-black ordering makes sense with periodic boundary conditions only if the linear size L_l of the grid l is even.

It turns out that Gauss-Seidel is a better smoother than damped Jacobi (even if the latter is given its optimal ω). Moreover, Gauss-Seidel is easier to program and requires only half the storage space (no need for separate storage of "old" and "new" values). The only reason we introduced damped Jacobi at all is that it is easier to understand and to analyze.

Many other smoothing iterations can be considered, and can be advantageous in anisotropic or otherwise singular problems [32, Section 3.3 and Chapters 10-11]. But we shall stick to ordinary Gauss-Seidel, usually with red-black ordering.
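A red-black Gauss-Seidel sweep for the model problem, in illustrative NumPy form (our own sketch; periodic boundary conditions, solving 4φ_x = Σ neighbors + f as in (5.32)):

```python
import numpy as np

def redblack_gs_sweep(phi, f):
    """One red-black Gauss-Seidel sweep for the 2d nearest-neighbor Laplacian
    with periodic BC: update the red sublattice, then the black one.  Since
    neighbors of a red site are all black (and vice versa), each half-sweep
    can be done as a single vectorized operation."""
    ix, iy = np.indices(phi.shape)
    for parity in (0, 1):                       # red first, then black
        mask = (ix + iy) % 2 == parity
        nb = (np.roll(phi, 1, 0) + np.roll(phi, -1, 0) +
              np.roll(phi, 1, 1) + np.roll(phi, -1, 1))
        phi[mask] = 0.25 * (nb + f)[mask]

rng = np.random.default_rng(1)
L = 8
f = rng.standard_normal((L, L))
f -= f.mean()                 # periodic problem is solvable only for zero-mean source
phi = np.zeros((L, L))
for _ in range(500):
    redblack_gs_sweep(phi, f)
nb = (np.roll(phi, 1, 0) + np.roll(phi, -1, 0) +
      np.roll(phi, 1, 1) + np.roll(phi, -1, 1))
residual = np.max(np.abs(4.0 * phi - nb - f))
```

The vectorized half-sweeps are precisely why red-black ordering suits parallel hardware: all sites of one color can be updated simultaneously.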

Thus, S^pre_l and S^post_l will consist, respectively, of m₁ and m₂ iterations of the Gauss-Seidel algorithm. The balance between pre-smoothing and post-smoothing is usually not very crucial; only the total m₁ + m₂ seems to matter much. Indeed, one (but not both!) of m₁ or m₂ could be zero, i.e. either the pre-smoothing or the post-smoothing could be omitted entirely. Increasing m₁ and m₂ improves the convergence rate of the multi-grid iteration, but at the expense of increased computational labor per iteration. The optimal tradeoff seems to be achieved in most cases with m₁ + m₂ between about 2 and 4. The coarsest grid 0 is a special case: it usually has so few grid points (perhaps only one!) that S₀ can be an exact solver.

^17 It is amusing to note that "Gauss did not use a cyclic order of relaxation, and ... Seidel specifically recommended against using it" [36, p. 44n]. See also [37].

The variational point of view gives special insight into the Gauss-Seidel algorithm, and shows how to generalize it to non-quadratic Hamiltonians. When updating site x, the new value φ′_x is chosen so as to minimize the Hamiltonian H(φ) = H(φ_x, {φ_y}_{y≠x}) when all the other fields {φ_y}_{y≠x} are held fixed at their current values. The natural generalization of Gauss-Seidel to non-quadratic Hamiltonians is to adopt this variational definition: φ′_x should be the absolute minimizer of H(φ) = H(φ_x, {φ_y}_{y≠x}). [If the absolute minimizer is non-unique, then one such minimizer is chosen by some arbitrary rule.] This algorithm is called nonlinear Gauss-Seidel with exact minimization (NLGSEM) [38]. This definition of the algorithm presupposes, of course, that it is feasible to carry out the requisite exact one-dimensional minimizations. For example, for a φ⁴ theory it would be necessary to compute the absolute minimum of a quartic polynomial in one variable. In practice these one-dimensional minimizations might themselves be carried out iteratively, e.g. by some variant of Newton's method.
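For a quartic single-site potential the exact one-dimensional minimization reduces to finding the real roots of the cubic derivative and comparing the candidate values. A minimal sketch of our own (using numpy.roots rather than the Newton iteration mentioned above):

```python
import numpy as np

def minimize_quartic(lam, xi, a, h):
    """Global minimizer of q(t) = lam*t^4 + xi*t^3 + a*t^2 + h*t  (lam > 0):
    the stationary points are the real roots of
    q'(t) = 4*lam*t^3 + 3*xi*t^2 + 2*a*t + h,
    and the absolute minimum is attained at one of them."""
    roots = np.roots([4.0 * lam, 3.0 * xi, 2.0 * a, h])
    real = roots[np.abs(roots.imag) < 1e-10].real      # keep the real roots
    q = lambda t: lam * t**4 + xi * t**3 + a * t**2 + h * t
    return min(real, key=q)

# Double-well example q(t) = t^4 - 2 t^2: minima at t = +1 and t = -1.
t_star = minimize_quartic(1.0, 0.0, -2.0, 0.0)
```

When the two wells are exactly degenerate, as here, either minimizer may be returned; this is the "arbitrary rule" mentioned in the text for non-unique absolute minimizers.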

Cycle control parameters and computational labor. Usually the parameters γ_l are all taken to be equal, i.e. γ_l = γ for 1 ≤ l ≤ M. Then one iteration of the multi-grid algorithm at level M comprises one visit to grid M, γ visits to grid M-1, γ² visits to grid M-2, and so on. Thus, γ determines the degree of emphasis placed on the coarse-grid updates. (γ = 0 would correspond to the pure Gauss-Seidel iteration on the finest grid alone.)

We can now estimate the computational labor required for one iteration of the multi-grid algorithm. Each visit to a given grid involves m₁ + m₂ Gauss-Seidel sweeps on that grid, plus some computation of the coarse-grid Hamiltonian and the prolongation. The work involved is proportional to the number of lattice points on that grid. Let W_l be the work required for these operations on grid l. Then, for grids defined by a factor-of-2 coarsening in d dimensions, we have

  W_l ≈ 2^{-d(M-l)} W_M   (5.33)

so that the total work for one multi-grid iteration is

  work(MG) = Σ_{l=M}^{0} γ^{M-l} W_l ≈ W_M Σ_{l=M}^{0} (γ 2^{-d})^{M-l} ≤ W_M (1 - γ 2^{-d})^{-1}   if γ < 2^d.   (5.34)

Thus, for γ < 2^d, one entire multi-grid iteration costs only a bounded multiple of the work of a single sweep (plus a little auxiliary computation) on the finest grid alone, irrespective of the total number of levels. The most common choices are γ = 1 (which is called the V-cycle) and γ = 2 (the W-cycle).

Convergence proofs. For certain classes of Hamiltonians H (primarily quadratic ones) and suitable choices of the coarse grids, prolongations, smoothing iterations and cycle control parameters, it can be proven rigorously^18 that the multi-grid iteration matrices M_l satisfy a uniform bound

  ||M_l|| ≤ C < 1   (5.35)

valid irrespective of the total number of levels. Thus, a fixed number of multi-grid iterations (maybe five or ten) are sufficient to reduce the error to a small value, independent of the lattice size L. In other words, critical slowing-down has been completely eliminated.

The rigorous convergence proofs are somewhat arcane, so we cannot describe them here in any detail, but certain general features are worth noting. The convergence proofs are most straightforward when linear or higher-order interpolation and restriction are used, and γ > 1 (e.g. the W-cycle). When either low-order interpolation (e.g. piecewise-constant) or γ = 1 (the V-cycle) is used, the convergence proofs become much more delicate. Indeed, if both piecewise-constant interpolation and a V-cycle are used, then the uniform bound (5.35) has not yet been proven, and it is most likely false! To some extent these features may be artifacts of the current methods of proof, but we suspect that they do also reflect real properties of the multi-grid method, and so the convergence proofs may serve as guidance for practice. For example, in our work we have used piecewise-constant interpolation (so as to preserve the simple nearest-neighbor coupling on the coarse grids), and thus for safety we stick to the W-cycle. There is in any case much room for further research, both theoretical and experimental.

To recapitulate, the extraordinary efficiency of the multi-grid method arises from the combination of two key features:

1) The convergence estimate (5.35). This means that only O(1) iterations are needed, independent of the lattice size L.

2) The work estimate (5.34). This means that each iteration requires only a computational labor of order L^d (the fine-grid lattice volume).

It follows that the complete solution of the minimization problem, to any specified accuracy ε, requires a computational labor of order L^d.

Unigrid point of view. Let us look again at the multi-grid algorithm from the variational standpoint. One natural class of iterative algorithms for minimizing H are the

^18 For a detailed exposition of multi-grid convergence proofs, see [32, Chapters 6-8, 10, 11], [40] and the references cited therein. The additional work needed to handle the piecewise-constant interpolation can be found in [41].


so-called directional methods: let p₀, p₁, ... be a sequence of "direction vectors" in U, and define φ^(n+1) to be that vector of the form φ^(n) + λ p_n which minimizes H. The algorithm thus travels "downhill" from φ^(n) along the line φ^(n) + λ p_n until reaching the minimum of H, then switches to direction p_{n+1} starting from this new point φ^(n+1), and so on. For a suitable choice of the direction vectors p₀, p₁, ..., this method can be proven to converge to the global minimum of H [38, pp. 513-520].
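For the quadratic Hamiltonian (5.18) the line minimization has a closed form, and choosing unit vectors as directions reproduces Gauss-Seidel exactly. An illustrative sketch of our own:

```python
import numpy as np

def line_minimize(phi, A, f, p):
    """Exact minimization of H(phi + t p) for H = (1/2)<phi,A phi> - <f,phi>:
    dH/dt = 0 gives t = <p, f - A phi> / <p, A p>."""
    t = p @ (f - A @ phi) / (p @ A @ p)
    return phi + t * p

n = 8
S = np.roll(np.eye(n), 1, axis=1)
A = 2.5 * np.eye(n) - S - S.T          # SPD: periodic Laplacian plus a mass term
f = np.cos(2 * np.pi * np.arange(n) / n)
phi = np.zeros(n)
for _ in range(50):                    # one Gauss-Seidel sweep = n coordinate steps
    for x in range(n):
        phi = line_minimize(phi, A, f, np.eye(n)[x])
residual_dir = np.max(np.abs(A @ phi - f))
```

Replacing the unit vectors here by prolonged coarse-grid unit vectors gives exactly the unigrid interpretation of multi-grid discussed next.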

Now, some iterative algorithms for minimizing H(φ) can be recognized as special cases of the directional method. For example, the Gauss-Seidel iteration is a directional method in which the direction vectors are chosen to be unit vectors e₁, e₂, ..., e_N (i.e. vectors which take the value 1 at a single grid point and zero at all others), where N = dim U. [One step of the Gauss-Seidel iteration corresponds to N steps of the directional method.] Similarly, it is not hard to see [39] that the multi-grid iteration with the variational choices of restriction and coarse-grid operators, and with Gauss-Seidel smoothing at each level, is itself a directional method: some of the direction vectors are the unit vectors e^(M)₁, e^(M)₂, ..., e^(M)_{N_M} of the fine-grid space, but other direction vectors are the images in the fine-grid space of the unit vectors of the coarse-grid spaces, i.e. they are p_{M,l} e^(l)₁, p_{M,l} e^(l)₂, ..., p_{M,l} e^(l)_{N_l}. The exact order in which these direction vectors are interleaved depends on the parameters m₁, m₂ and γ which define the cycling structure of the multi-grid algorithm. For example, if m₁ = 1, m₂ = 0 and γ = 1, the order of the direction vectors is {M}, {M-1}, ..., {0}, where {l} denotes the sequence p_{M,l} e^(l)₁, p_{M,l} e^(l)₂, ..., p_{M,l} e^(l)_{N_l}. If m₁ = 0, m₂ = 1 and γ = 1, the order is {0}, {1}, ..., {M}. The reader is invited to work out other cases.

Thus, the multi-grid algorithm (for problems defined in variational form) is a directional method in which the direction vectors include both "single-site modes" {M} and also "collective modes" {M-1}, {M-2}, ..., {0} on all length scales. For example, if p_{l,l-1} is piecewise-constant injection, then the direction vectors are characteristic functions χ_B (i.e. functions which are 1 on the block B and zero outside B), where the sets B are successively single sites, cubes of side 2, cubes of side 4, and so on. Similarly, if p_{l,l-1} is linear interpolation, then the direction vectors are triangular waves of various widths.

The multi-grid algorithm has thus an alternative interpretation as a collective-mode algorithm working solely in the fine-grid space U. We emphasize that this "unigrid" viewpoint [39] is mathematically fully equivalent to the recursive definition given earlier. But it gives, we think, an important additional insight into what the multi-grid algorithm is really doing.

For example, for the simple model problem (Poisson equation in a square), we know that the "correct" collective modes are sine waves, in the sense that these modes diagonalize the Laplacian, so that in this basis the Jacobi or Gauss-Seidel algorithm would give the exact solution in a single iteration (M_Jacobi = M_GS = 0). On the other hand, the multi-grid method uses square-wave (or triangular-wave) updates, which are not exactly the "correct" collective modes. Nevertheless, the multi-grid convergence proofs [32, 40, 41] assure us that they are "close enough": the norm of the multi-grid iteration matrix M_l is bounded away from 1, uniformly in the lattice size, so that an accurate


solution is reached in a very few MG iterations (in particular, critical slowing-down is completely eliminated). This viewpoint also explains why MG convergence is more delicate for piecewise-constant interpolation than for piecewise-linear: the point is that a sine wave (or other slowly varying function) can be approximated to arbitrary accuracy (in energy norm) by piecewise-linear functions but not by piecewise-constant functions.

We remark that McCormick and Ruge [39] have advocated the "unigrid" idea not just as an alternate point of view on the multi-grid algorithm, but as an alternate computational procedure. To be sure, the unigrid method is somewhat simpler to program, and this could have pedagogical advantages. But one of the key properties of the multi-grid method, namely the O(L^d) computational labor per iteration, is sacrificed in the unigrid scheme. Instead of (5.33)-(5.34) one has

<strong>and</strong> hence<br />

work(UG) WM<br />

l=M<br />

Wl WM (5.36)<br />

0X<br />

M;l<br />

MWM if =1<br />

M WM if >1<br />

S<strong>in</strong>ce M log2 L <strong>and</strong> WM Ld ,weobta<strong>in</strong><br />

(<br />

d L log L if =1<br />

work(UG)<br />

Ld+log2 if >1<br />

(5.37)<br />

(5.38)<br />

For a V-cycle the additional factor of log L is perhaps not terribly harmful, but for a W-cycle the additional factor of L is a severe drawback (though not as severe as the O(L²) critical slowing-down of the traditional algorithms). Thus, we do not advocate the use of unigrid as a computational method if there is a viable multi-grid alternative. Unigrid could, however, be of interest in cases where true multi-grid is unfeasible, as may occur for non-Abelian lattice gauge theories.
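The work estimates (5.34) and (5.37) are easy to check by direct summation; the following throwaway script of ours normalizes W_M = 1 and counts the γ^{M-l} visits to each level:

```python
def visits(level, M, gamma):
    # grid `level` is visited gamma^(M - level) times per cycle
    return gamma ** (M - level)

def work_mg(M, d, gamma):
    # multi-grid: work per level shrinks as 2^{-d(M-l)}, cf. (5.33)-(5.34)
    return sum(visits(l, M, gamma) * 2.0 ** (-d * (M - l)) for l in range(M + 1))

def work_ug(M, gamma):
    # unigrid: every "coarse" update still costs a full fine-grid sweep, cf. (5.36)
    return sum(visits(l, M, gamma) for l in range(M + 1))

M, d = 10, 2
print(work_mg(M, d, 1))   # V-cycle: below the bound (1 - 2^-d)^-1 = 4/3
print(work_mg(M, d, 2))   # W-cycle: below the bound (1 - 2*2^-d)^-1 = 2
print(work_ug(M, 1))      # unigrid V-cycle: (M + 1) full sweeps' worth of work
print(work_ug(M, 2))      # unigrid W-cycle: exponential in M, i.e. a power of L
```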

Multi-grid algorithms can also be devised for some models in which the state space is a nonlinear manifold, such as nonlinear σ-models and lattice gauge theories [20, Sections 3-5]. The simplest case is the XY model: both the fine-grid and coarse-grid field variables are angles, and the interpolation operator is piecewise-constant (with angles added modulo 2π). Thus, a coarse-grid variable ψ_y specifies the angle by which the 2^d spins in the block B_y are to be simultaneously rotated. A similar strategy can be employed for nonlinear σ-models taking values in a group G (the so-called "principal chiral models"): the coarse-grid variable ψ_y simultaneously left-multiplies the 2^d spins in the block B_y. For nonlinear σ-models taking values in a nonlinear manifold M on which a group G acts [e.g. the n-vector model with M = S^{n-1} and G = SO(n)], the coarse-grid-correction moves are still simultaneous rotations; this means that while the fine-grid fields lie in M, the coarse-grid fields all lie in G. Similar ideas can be applied to lattice gauge theories; the key requirement is to respect the geometric (parallel-transport) properties of the theory. Unfortunately, the resulting algorithms appear to be practical only in the abelian case. (In the non-abelian case, the coarse-grid Hamiltonian becomes too complicated.) Much more work needs to be done on devising good interpolation operators for non-abelian lattice gauge theories.
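For the XY model the coarse-grid-correction move described above amounts to adding one coarse angle to each 2×2 block of fine angles, modulo 2π. A sketch under our own naming conventions:

```python
import numpy as np

def block_rotate(theta, psi):
    """Coarse-grid-correction move for the 2d XY model: add the coarse-grid
    angle psi[y1, y2] to all four fine-grid spins of block B_y (mod 2*pi)."""
    return np.mod(theta + np.kron(psi, np.ones((2, 2))), 2.0 * np.pi)

rng = np.random.default_rng(2)
theta = rng.uniform(0.0, 2.0 * np.pi, (8, 8))     # fine-grid angles
psi = rng.uniform(0.0, 2.0 * np.pi, (4, 4))       # coarse-grid rotation angles
theta_new = block_rotate(theta, psi)

# Relative angles inside a block (hence the within-block interaction energy
# -cos(theta_x - theta_x')) are unchanged by the simultaneous rotation:
d_old = np.cos(theta[0, 0] - theta[0, 1])
d_new = np.cos(theta_new[0, 0] - theta_new[0, 1])
```

Only the bonds crossing block boundaries change energy under such a move, which is what makes the coarse-grid Hamiltonian tractable in the abelian case.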

5.2 Multi-Grid <strong>Monte</strong> <strong>Carlo</strong><br />

Classical equilibrium statistical mechanics is a natural generalization of classical statics (for problems posed in variational form): in the latter we seek to minimize a Hamiltonian H(φ), while in the former we seek to generate random samples from the Boltzmann-Gibbs probability distribution e^{-βH(φ)}. The statistical-mechanical problem reduces to the deterministic one in the zero-temperature limit β → +∞.

Likewise, many (but not all) of the deterministic iterative algorithms for minimizing H(φ) can be generalized to stochastic iterative algorithms, that is, dynamic Monte Carlo methods, for generating random samples from e^{-βH(φ)}. For example, the stochastic generalization of the Gauss-Seidel algorithm (or more generally, nonlinear Gauss-Seidel with exact minimization) is the single-site heat-bath algorithm; and the stochastic generalization of multi-grid is multi-grid Monte Carlo.

Let us explain these correspondences in more detail. In the Gauss-Seidel algorithm, the grid points are swept in some order, and at each stage the Hamiltonian is minimized as a function of a single variable φ_x, with all other variables {φ_y}_{y≠x} being held fixed. The single-site heat-bath algorithm has the same general structure, but the new value φ′_x is chosen randomly from the conditional distribution of e^{-βH(φ)} given {φ_y}_{y≠x}, i.e. from the one-dimensional probability distribution

  P(φ′_x) dφ′_x = const × exp[-βH(φ′_x, {φ_y}_{y≠x})] dφ′_x   (5.39)

(where the normalizing constant depends on {φ_y}_{y≠x}). It is not difficult to see that this operation leaves invariant the Gibbs distribution e^{-βH(φ)}. As β → +∞ it reduces to the Gauss-Seidel algorithm.
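For the Gaussian model with nearest-neighbor Laplacian, the conditional distribution (5.39) is itself Gaussian, with the Gauss-Seidel value (5.32) as its mean and variance 1/(4β); a heat-bath sweep is thus a noisy Gauss-Seidel sweep. An illustrative sketch of our own (d = 2, periodic boundary conditions):

```python
import numpy as np

def heat_bath_sweep(phi, f, beta, rng):
    """Lexicographic heat-bath sweep for H = (1/2) sum_<xx'> (phi_x - phi_x')^2
    - sum_x f_x phi_x on a periodic L x L grid.  The conditional (5.39) at each
    site is Gaussian with mean (neighbors + f)/4 and std 1/sqrt(4*beta); at
    beta = +infinity the draw collapses onto the mean, i.e. Gauss-Seidel (5.32)."""
    L = phi.shape[0]
    std = 1.0 / np.sqrt(4.0 * beta)
    for x1 in range(L):
        for x2 in range(L):
            nb = (phi[(x1 + 1) % L, x2] + phi[x1 - 1, x2]
                  + phi[x1, (x2 + 1) % L] + phi[x1, x2 - 1])
            phi[x1, x2] = rng.normal(0.25 * (nb + f[x1, x2]), std)

rng = np.random.default_rng(3)
L = 4
f = rng.standard_normal((L, L))
f -= f.mean()                      # periodic problem needs a zero-mean source
phi = np.zeros((L, L))
for _ in range(200):               # beta = inf: each sweep is exactly Gauss-Seidel
    heat_bath_sweep(phi, f, np.inf, rng)
nb = (np.roll(phi, 1, 0) + np.roll(phi, -1, 0)
      + np.roll(phi, 1, 1) + np.roll(phi, -1, 1))
residual_gs = np.max(np.abs(4.0 * phi - nb - f))
```

At finite β the same sweep instead equilibrates the field toward the Gibbs distribution rather than toward the minimizer.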

It is useful to visualize geometrically the action of the Gauss-Seidel and heat-bath algorithms within the space U of all possible field configurations. Starting at the current field configuration φ, the Gauss-Seidel and heat-bath algorithms propose to move the system along the line in U consisting of configurations of the form φ′ = φ + t e_x (-∞ < t < +∞). Two generalizations of this picture now suggest themselves:


1) The "fibers" used by the algorithm need not be lines, but can be higher-dimensional linear or even nonlinear manifolds.

2) The new configuration φ′ need not be chosen independently of the old configuration φ (as in the heat-bath algorithm); rather, it can be selected by any updating procedure which leaves invariant the conditional probability distribution of e^{-βH(φ)} restricted to the fiber.

The multi-grid Monte Carlo (MGMC) algorithm is a partial-resampling algorithm in which the "fibers" are the sets of field configurations that can be obtained one from another by a coarse-grid-correction step, i.e. the sets of fields φ + p_{l,l-1}ψ with φ fixed and ψ varying over U_{l-1}. These fibers form a family of parallel affine subspaces in U_l, of dimension N_{l-1} = dim U_{l-1}.

The ingredients of the MGMC algorithm are identical to those of the deterministic MG algorithm, with one exception: the deterministic smoothing iteration S_l is replaced by a stochastic smoothing iteration (for example, single-site heat-bath). That is, S_l(β, H_l) is a stochastic updating procedure φ_l → φ′_l that leaves invariant the Gibbs distribution e^{-βH_l}:

  ∫ dφ_l e^{-βH_l(φ_l)} P_{S_l(β,H_l)}(φ_l → φ′_l) = dφ′_l e^{-βH_l(φ′_l)} .   (5.40)

The MGMC algorithm is then defined as follows:

procedure mgmc(l, φ, β, H_l)

comment This algorithm updates the field φ in such a way as to leave
        invariant the probability distribution e^{-βH_l}.

  φ ← S^pre_l(φ, β, H_l)
  if l > 0 then
    compute H_{l-1}(ψ) ≡ H_l(φ + p_{l,l-1}ψ)
    ψ ← 0
    for j = 1 until γ_l do mgmc(l-1, ψ, β, H_{l-1})
    φ ← φ + p_{l,l-1}ψ
  endif
  φ ← S^post_l(φ, β, H_l)

end
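The procedure above can be turned into running code for the simplest case, a one-dimensional Gaussian field with a mass term, carrying the Hamiltonian around as the pair (A, f). This is our own minimal sketch, not the implementation used in the work described here; at β = +∞ the noise vanishes and the cycle must reduce to the deterministic multi-grid solver for A φ = f, which the demo checks:

```python
import numpy as np

def heat_bath_sweep(phi, A, f, beta, rng):
    # Single-site heat-bath for H = (1/2)<phi, A phi> - <f, phi>: the conditional
    # of phi_x is Gaussian with the Gauss-Seidel value as its mean,
    # variance 1/(beta * A_xx).
    for x in range(len(phi)):
        mean = (f[x] - A[x] @ phi + A[x, x] * phi[x]) / A[x, x]
        phi[x] = rng.normal(mean, 1.0 / np.sqrt(beta * A[x, x]))

def mgmc(phi, A, f, beta, rng, gamma=2, m1=2, m2=2):
    # One cycle of procedure mgmc(l, phi, beta, H_l), with piecewise-constant
    # injection and the Hamiltonian represented by the pair (A, f).
    for _ in range(m1):                              # pre-smoothing
        heat_bath_sweep(phi, A, f, beta, rng)
    n = len(phi)
    if n > 1:
        p = np.repeat(np.eye(n // 2), 2, axis=0)     # prolongation p_{l,l-1}
        A_c = p.T @ A @ p                            # coarse operator, cf. (5.21)
        d_c = p.T @ (f - A @ phi)                    # coarse source,   cf. (5.22)
        psi = np.zeros(n // 2)
        for _ in range(gamma):                       # gamma = 2: W-cycle
            mgmc(psi, A_c, d_c, beta, rng, gamma, m1, m2)
        phi += p @ psi                               # coarse-grid correction
    for _ in range(m2):                              # post-smoothing
        heat_bath_sweep(phi, A, f, beta, rng)

# Demo: beta = +infinity makes every heat-bath draw deterministic, so the
# cycle becomes the deterministic multi-grid minimizer of H.
n = 16
S = np.roll(np.eye(n), 1, axis=1)
A = 2.5 * np.eye(n) - S - S.T        # periodic Laplacian plus a mass term (SPD)
rng = np.random.default_rng(0)
f = rng.standard_normal(n)
phi = np.zeros(n)
for _ in range(25):
    mgmc(phi, A, f, np.inf, rng)
```

At finite β the identical code samples (approximately, after equilibration) from e^{-βH}; only the meaning of the smoother changes, exactly as in the text.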

The alert reader will note that this algorithm is identical to the deterministic MG algorithm presented earlier; only the meaning of S_l is different.

The validity of the MGMC algorithm is proven inductively, starting at level 0 and working upwards. That is, if mgmc(l−1, ·, H_{l−1}) is a stochastic updating procedure that leaves invariant the probability distribution e^{−H_{l−1}}, then mgmc(l, ·, H_l) leaves invariant e^{−H_l}. Note that the coarse-grid-correction step of the MGMC algorithm



differs from the heat-bath algorithm in that the new configuration φ' is not chosen independently of the old configuration φ; to do so would be impractical, since the fiber has such high dimension. Rather, φ' (or, equivalently, ψ) is chosen by a valid updating procedure: namely, MGMC itself!
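In the Gaussian case the coarse-grid computation in the pseudocode above can be made explicit: if H_l(φ) = ½(φ, Aφ) − (f, φ), then H_{l−1}(ψ) ≡ H_l(φ + p_{l,l−1}ψ) is again quadratic, with matrix pᵀAp (the Galerkin product) and source pᵀ(f − Aφ), up to an irrelevant constant. A minimal numpy sketch; the 1-d massive Laplacian and the piecewise-constant prolongation are chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                     # fine-grid dimension N_l
A = np.diag([2.0] * n) - np.diag([1.0] * (n - 1), 1) - np.diag([1.0] * (n - 1), -1)
A += 0.1 * np.eye(n)                      # massive 1-d Laplacian: SPD
f = rng.standard_normal(n)
H = lambda x: 0.5 * x @ A @ x - f @ x     # H_l

p = np.kron(np.eye(n // 2), np.ones((2, 1)))   # piecewise-constant p_{l,l-1}
phi = rng.standard_normal(n)

A_c = p.T @ A @ p                         # coarse matrix (Galerkin product)
f_c = p.T @ (f - A @ phi)                 # coarse source
H_c = lambda psi: 0.5 * psi @ A_c @ psi - f_c @ psi

# H_l(phi + p psi) differs from H_{l-1}(psi) only by the constant H_l(phi)
psi = rng.standard_normal(n // 2)
assert np.isclose(H(phi + p @ psi) - H(phi), H_c(psi))
```

So on each level the recursion only needs to form pᵀAp and pᵀ(f − Aφ); no fine-grid information beyond that enters the coarse-grid visits.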

The MGMC algorithm also has an alternate interpretation, the unigrid viewpoint, in which the fibers are one-dimensional and the resamplings are independent. More precisely, the fibers are lines of the form φ' = φ + t χ_B (−∞ < t < ∞)


critical exponent z = 1.4 ± 0.3 for the MGMC algorithm (in either V-cycle or W-cycle), compared to z = 2.1 ± 0.3 for the heat-bath algorithm. Thus, critical slowing-down is significantly reduced but is still very far from being eliminated. On the other hand, below the critical temperature, τ is very small (τ ≈ 1–2), uniformly in L, and (at least for the W-cycle) critical slowing-down appears to be completely eliminated. This very different behavior in the two phases can be understood physically: in the low-temperature phase the main excitations are spin waves, which are well handled by MGMC (as in the Gaussian model); but near the critical temperature the important excitations are widely separated vortex-antivortex pairs, which are apparently not easily created by the MGMC updates.

For the O(4) nonlinear σ-model in two dimensions, which is asymptotically free, preliminary data [42] show a very strong reduction, but not the total elimination, of critical slowing-down. For a W-cycle we find that z ≈ 0.6 (I emphasize that these data are very preliminary!). Previously, Goodman and I had argued heuristically [20, Section 9.3] that for asymptotically free theories with a continuous symmetry group, MGMC (with a W-cycle) should completely eliminate critical slowing-down except for a possible logarithm. But our reasoning may now need to be re-examined!^19

5.3 Stochastic Linear Iterations for the Gaussian Model

In this section we analyze an important class of Markov chains, the stochastic linear iterations for Gaussian models [20, Section 8].^20 This class includes, among others, the single-site heat-bath algorithm (with deterministic sweep of the sites), the stochastic SOR algorithm [44, 45] and the multi-grid Monte Carlo algorithm (all, of course, in the Gaussian case only). We show that the behavior of the stochastic algorithm is completely determined by the behavior of the corresponding deterministic algorithm for solving linear equations. In particular, we show that the exponential autocorrelation time τ_exp of the stochastic linear iteration is equal to the relaxation time of the corresponding deterministic linear iteration.

Consider any quadratic Hamiltonian

    H(φ) = ½ (φ, Aφ) − (f, φ)   (5.41)

where A is a symmetric positive-definite matrix. The corresponding Gaussian measure

    dμ(φ) = const × e^{−½(φ,Aφ) + (f,φ)} dφ   (5.42)

has mean A^{−1}f and covariance matrix A^{−1}. Next consider any first-order stationary linear stochastic iteration of the form

    φ^{(n+1)} = M φ^{(n)} + N f + Q ξ^{(n)}   (5.43)

^19 Note Added 1996: For work on multi-grid Monte Carlo covering the period 1992–96, see [106, 107, 108, 103] and the references cited therein.

^20 Some of this material appears also in [43].



where M, N and Q are fixed matrices and the ξ^{(n)} are independent Gaussian random vectors with mean zero and covariance matrix C. The iteration (5.43) has a unique stationary distribution if and only if the spectral radius ρ(M) ≡ lim_{n→∞} ‖M^n‖^{1/n} is < 1; and in this case the stationary distribution is the desired Gaussian measure (5.42) for all f if and only if

    N = (I − M) A^{−1}   (5.44a)
    Q C Q^T = A^{−1} − M A^{−1} M^T   (5.44b)

(here ^T denotes transpose).

The reader will note the close analogy with the deterministic linear problem (5.3)–(5.6). Indeed, (5.44a) is identical with (5.6); and if we take the "zero-temperature limit" in which H is replaced by λH with λ → +∞, then the Gaussian measure (5.42) approaches a delta function concentrated at the unique minimum of H (namely, the solution of the linear equation Aφ = f), and the "noise" term disappears (Q → 0), so that the stochastic iteration (5.43) turns into the deterministic iteration (5.5). That is:

(a) The linear deterministic problem is the zero-temperature limit of the Gaussian stochastic problem; and the first-order stationary linear deterministic iteration is the zero-temperature limit of the first-order stationary linear stochastic iteration. Therefore, any stochastic linear iteration for generating samples from the Gaussian measure (5.42) gives rise to a deterministic linear iteration for solving the linear equation (5.3), simply by setting Q = 0.

(b) Conversely, the stochastic problem and iteration are the nonzero-temperature generalizations of the deterministic ones. In principle this means that a deterministic linear iteration for solving (5.3) can be generalized to a stochastic linear iteration for generating samples from (5.42), if and only if the matrix A^{−1} − M A^{−1} M^T is positive-semidefinite: just choose a matrix Q satisfying (5.44b). In practice, however, such an algorithm is computationally tractable only if the matrix Q has additional nice properties, such as sparsity (or triangularity with a sparse inverse).

Examples. 1. Single-site heat bath (with deterministic sweep of the sites) = stochastic Gauss-Seidel. On each visit to site i, φ_i is replaced by a new value φ'_i chosen independently from the conditional distribution of (5.42) with {φ_j}_{j≠i} fixed at their current values: that is, φ'_i is Gaussian with mean (f_i − Σ_{j≠i} a_ij φ_j)/a_ii and variance 1/a_ii. When updating φ_i at sweep n+1, the variables φ_j with j < i have already been updated, while those with j > i have not. The resulting update is a stochastic iteration of the form (5.43) with

    M = −(D + L)^{−1} L^T   (5.46a)
    N = (D + L)^{−1}   (5.46b)
    Q = (D + L)^{−1} D^{1/2}   (5.46c)

where D and L are the diagonal and strictly lower-triangular parts of the matrix A, respectively, and the noise covariance is C = I. It is straightforward to verify that (5.44a,b) are satisfied.^21 The single-site heat-bath algorithm is clearly the stochastic generalization of the Gauss-Seidel algorithm.
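The verification of (5.44a,b) for these matrices is a few lines of linear algebra, and can also be checked numerically. A sketch (random SPD test matrix, noise covariance C = I):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
B0 = rng.standard_normal((n, n))
A = B0 @ B0.T + n * np.eye(n)       # symmetric positive-definite test matrix

D = np.diag(np.diag(A))
L = np.tril(A, -1)                  # strictly lower-triangular part of A
DL_inv = np.linalg.inv(D + L)

M = -DL_inv @ L.T                   # (5.46a)
N = DL_inv                          # (5.46b)
Q = DL_inv @ np.sqrt(D)             # (5.46c), with C = I

A_inv = np.linalg.inv(A)
# (5.44a): N = (I - M) A^{-1}
assert np.allclose(N, (np.eye(n) - M) @ A_inv)
# (5.44b): Q C Q^T = A^{-1} - M A^{-1} M^T
assert np.allclose(Q @ Q.T, A_inv - M @ A_inv @ M.T)
# convergence: spectral radius of the Gauss-Seidel matrix is < 1
assert max(abs(np.linalg.eigvals(M))) < 1
```

The identity behind the (5.44b) check is exact: multiplying A^{−1} − M A^{−1} M^T on the left by (D+L) and on the right by (D+L)^T, and using A = D + L + L^T, reduces it to D, which is exactly (D+L) Q Q^T (D+L)^T.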

2. Stochastic SOR. For models which are Gaussian (or, more generally, "multi-Gaussian"), Adler [44] and Whitmer [45] have shown that the successive over-relaxation (SOR) iteration admits a stochastic generalization, namely

    φ_i^{(n+1)} = (1 − ω) φ_i^{(n)} + ω a_ii^{−1} ( f_i − Σ_{j<i} a_ij φ_j^{(n+1)} − Σ_{j>i} a_ij φ_j^{(n)} ) + ( ω(2 − ω)/a_ii )^{1/2} ξ_i^{(n)}   (5.47)

where 0 < ω < 2 and the ξ_i^{(n)} are independent Gaussian random variables with mean zero and unit variance.


3. Discretized Langevin equation. A third example is obtained by discretizing the Langevin equation associated with the measure (5.42), where η is Gaussian white noise with covariance matrix C, using a small time step ε. The result is an iteration of the form (5.43) with

    M = I − (ε/2) C A   (5.50a)
    N = (ε/2) C   (5.50b)
    Q = ε^{1/2} I   (5.50c)

This satisfies (5.44a) exactly, but satisfies (5.44b) only up to an error of order ε. If C = D^{−1}, these M, N correspond to a damped Jacobi iteration with ω = ε/2 ≪ 1.
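The order-ε statement can be made quantitative: with M = I − (ε/2)CA, N = (ε/2)C and Q = ε^{1/2}I (the forms read off here), condition (5.44a) holds exactly, while the (5.44b) mismatch works out to (ε²/4)CAC, i.e. of relative order ε compared with the leading εC terms. A numpy sketch with an illustrative SPD matrix, checking that halving ε quarters the mismatch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
B0 = rng.standard_normal((n, n))
A = B0 @ B0.T + n * np.eye(n)          # SPD precision matrix
C = np.diag(1.0 / np.diag(A))          # C = D^{-1}: the damped-Jacobi case
A_inv = np.linalg.inv(A)

def mismatch(eps):
    M = np.eye(n) - 0.5 * eps * C @ A
    N = 0.5 * eps * C
    Q = np.sqrt(eps) * np.eye(n)
    # (5.44a) holds exactly for every eps:
    assert np.allclose(N, (np.eye(n) - M) @ A_inv)
    # norm of the (5.44b) violation, analytically (eps^2/4) C A C:
    return np.linalg.norm(Q @ C @ Q.T - (A_inv - M @ A_inv @ M.T))

e1, e2 = mismatch(1e-2), mismatch(5e-3)
assert abs(e1 / e2 - 4.0) < 0.1        # absolute mismatch scales like eps^2
```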

It is straightforward to analyze the dynamic behavior of the stochastic linear iteration (5.43). Using (5.43) and (5.44) to express φ^{(n)} in terms of the independent random variables φ^{(0)}, ξ^{(0)}, ξ^{(1)}, …, ξ^{(n−1)}, we find after a bit of manipulation that

    ⟨φ^{(n)}⟩ = M^n ⟨φ^{(0)}⟩ + (I − M^n) A^{−1} f   (5.51)

and

    cov(φ^{(s)}, φ^{(t)}) = M^s cov(φ^{(0)}, φ^{(0)}) (M^T)^t
                          + { [A^{−1} − M^s A^{−1} (M^T)^s] (M^T)^{t−s}   if s ≤ t
                            { M^{s−t} [A^{−1} − M^t A^{−1} (M^T)^t]      if s ≥ t   (5.52)

Now let us either start the stochastic process in equilibrium,

    ⟨φ^{(0)}⟩ = A^{−1} f   (5.53a)
    cov(φ^{(0)}, φ^{(0)}) = A^{−1}   (5.53b)

or else let it relax to equilibrium by taking s, t → +∞ with s − t fixed. Either way, we conclude that in equilibrium (5.43) defines a Gaussian stationary stochastic process with mean A^{−1}f and autocovariance matrix

    cov(φ^{(s)}, φ^{(t)}) = { A^{−1} (M^T)^{t−s}   if s ≤ t
                            { M^{s−t} A^{−1}       if s ≥ t   (5.54)

Moreover, since the stochastic process is Gaussian, all higher-order time-dependent correlation functions are determined in terms of the mean and autocovariance. Thus, the matrix M determines the autocorrelation functions of the Monte Carlo algorithm.
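As a consistency check, the equal-time stationary covariance (5.53b) is the unique fixed point of the covariance recursion Σ_{n+1} = M Σ_n M^T + Q C Q^T that follows from (5.43), and by (5.44b) that fixed point is A^{−1}. A numpy sketch using the stochastic Gauss-Seidel matrices of Example 1 (illustrative SPD A, C = I):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
B0 = rng.standard_normal((n, n))
A = B0 @ B0.T + n * np.eye(n)               # SPD precision matrix
D = np.diag(np.diag(A))
L = np.tril(A, -1)
DL_inv = np.linalg.inv(D + L)
M = -DL_inv @ L.T                           # stochastic Gauss-Seidel, C = I
Q = DL_inv @ np.sqrt(D)

Sigma = np.zeros((n, n))                    # start from a point mass
for _ in range(500):                        # Sigma -> M Sigma M^T + Q Q^T
    Sigma = M @ Sigma @ M.T + Q @ Q.T
assert np.allclose(Sigma, np.linalg.inv(A), atol=1e-10)
```

Since ρ(M) < 1, the deviation from the fixed point contracts geometrically, so a few hundred iterations reach the equilibrium covariance A^{−1} to machine precision.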

Another way to state these relationships is to recall [46, 47] that the Hilbert space L²(μ) is isomorphic to the bosonic Fock space F(U) built on the "energy Hilbert space" (U, A): the "n-particle states" are the homogeneous Wick polynomials of degree n in the shifted field φ̃ = φ − A^{−1}f. (If U is one-dimensional, these are just the Hermite polynomials.) Then the transition probability P(φ^{(n)} → φ^{(n+1)}) induces on the Fock space an operator

    P = Γ(M^T) ≡ I ⊕ M^T ⊕ (M^T ⊗ M^T) ⊕ ⋯   (5.55)


that is the second quantization of the operator M^T on the energy Hilbert space (see [20, Section 8] for details). It follows from (5.55) that

    ‖Γ(M)^n ↾ 1^⊥‖_{L²(μ)} = ‖M^n‖_{(U,A)}   (5.56)

and hence that

    ρ(Γ(M) ↾ 1^⊥) = ρ(M).   (5.57)

Moreover, P is self-adjoint on L²(μ) [i.e. satisfies detailed balance] if and only if M is self-adjoint with respect to the energy inner product, i.e.

    M A^{−1} = A^{−1} M^T   (5.58)

and in this case

    ρ(Γ(M) ↾ 1^⊥) = ‖Γ(M) ↾ 1^⊥‖_{L²(μ)} = ρ(M) = ‖M‖_{(U,A)}.   (5.59)

In summary, we have shown that the dynamic behavior of any stochastic linear iteration is completely determined by the behavior of the corresponding deterministic linear iteration. In particular, the exponential autocorrelation time τ_exp (slowest decay rate of any autocorrelation function) is given by

    exp(−1/τ_exp) = ρ(M)   (5.60)

and this decay rate is achieved by at least one observable which is linear in the field φ. In other words, the (worst-case) convergence rate of the Monte Carlo algorithm is precisely equal to the (worst-case) convergence rate of the corresponding deterministic iteration.
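These statements can be illustrated numerically for a concrete self-adjoint iteration. The sketch below (illustrative damped-Jacobi choice M = I − ωCA with C = D^{−1}, ω tuned so ρ(M) < 1) verifies the detailed-balance condition (5.58) and checks that the autocovariance (5.54) decays at rate ρ(M), i.e. with the τ_exp of (5.60):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 8
B0 = rng.standard_normal((n, n))
A = B0 @ B0.T + n * np.eye(n)                  # SPD precision matrix
C = np.diag(1.0 / np.diag(A))                  # damped Jacobi: M = I - w C A
w = 1.0 / max(abs(np.linalg.eigvals(C @ A)))   # guarantees rho(M) < 1
M = np.eye(n) - w * C @ A
A_inv = np.linalg.inv(A)
assert np.allclose(M @ A_inv, A_inv @ M.T)     # detailed balance, eq. (5.58)

rho = max(abs(np.linalg.eigvals(M)))
assert rho < 1
# the autocovariance (5.54), A^{-1} (M^T)^t, decays at rate rho(M),
# which by (5.60) corresponds to tau_exp = -1 / log(rho):
t = 400
r = np.linalg.norm(A_inv @ np.linalg.matrix_power(M.T, t), 2)
assert abs(r ** (1.0 / t) - rho) < 0.05 * rho
```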

In particular, for Gaussian MGMC, the convergence proofs for deterministic multigrid [32, 40, 41] combined with the arguments of the present section prove rigorously that critical slowing-down is completely eliminated (at least for a W-cycle). That is, the autocorrelation time of the MGMC method is bounded as criticality is approached (empirically τ ≈ 1–2).

6 Swendsen-Wang Algorithms

A very different type of collective-mode algorithm was proposed two years ago^22 by Swendsen and Wang [27] for Potts spin models. Since then, there has been an explosion of work trying to understand why this algorithm works so well (and why it does not work

^22 Note Added 1996: Now nearly ten years ago! The great interest in algorithms of Swendsen-Wang type (frequently called cluster algorithms) has not abated: see the references cited in the "Notes Added" below, plus many others. See also [109, 110] for reviews that are slightly more up-to-date than the present notes (1990 rather than 1989).



even better), and trying to improve or generalize it. The basic idea behind all algorithms of Swendsen-Wang type is to augment the given model by means of auxiliary variables, and then to simulate this augmented model. In this lecture we describe the Swendsen-Wang (SW) algorithm and review some of the proposed variants and generalizations.

Let us first recall that the q-state Potts model [48, 49] is a generalization of the Ising model in which each spin σ_i can take q distinct values rather than just two (σ_i = 1, 2, …, q); here q is an integer ≥ 2. Neighboring spins prefer to be in the same state, and pay an energy price if they are not. The Hamiltonian is therefore

    H(σ) = Σ_{⟨ij⟩} J_ij (1 − δ_{σ_i σ_j})   (6.1)

with J_ij ≥ 0 for all i, j ("ferromagnetism"), and the partition function is

    Z = Σ_{{σ}} exp[−H(σ)]
      = Σ_{{σ}} exp[ Σ_{⟨ij⟩} J_ij (δ_{σ_i σ_j} − 1) ]
      = Σ_{{σ}} Π_{⟨ij⟩} [ (1 − p_ij) + p_ij δ_{σ_i σ_j} ]   (6.2)

where we have defined p_ij = 1 − exp(−J_ij). The Gibbs measure μ_Potts(σ) is, of course,

    μ_Potts(σ) = Z^{−1} exp[ Σ_{⟨ij⟩} J_ij (δ_{σ_i σ_j} − 1) ]
               = Z^{−1} Π_{⟨ij⟩} [ (1 − p_ij) + p_ij δ_{σ_i σ_j} ].   (6.3)

We now use the deep identity

    a + b = Σ_{n=0}^{1} [ a δ_{n,0} + b δ_{n,1} ]   (6.4)

on each bond ⟨ij⟩; that is, we introduce on each bond ⟨ij⟩ an auxiliary variable n_ij taking the values 0 and 1, and obtain

    Z = Σ_{{σ}} Σ_{{n}} Π_{⟨ij⟩} [ (1 − p_ij) δ_{n_ij,0} + p_ij δ_{n_ij,1} δ_{σ_i σ_j} ].   (6.5)

Let us now take seriously the {n} as dynamical variables: we can think of n_ij as an occupation variable for the bond ⟨ij⟩ (1 = occupied, 0 = empty). We therefore define the Fortuin-Kasteleyn-Swendsen-Wang (FKSW) model to be a joint model having q-state Potts spins σ_i at the sites and occupation variables n_ij on the bonds, with joint probability distribution

    μ_FKSW(σ, n) = Z^{−1} Π_{⟨ij⟩} [ (1 − p_ij) δ_{n_ij,0} + p_ij δ_{n_ij,1} δ_{σ_i σ_j} ].   (6.6)



Finally, let us see what happens if we sum over the {σ} at fixed {n}. Each occupied bond ⟨ij⟩ imposes a constraint that the spins σ_i and σ_j must be in the same state, but otherwise the spins are unconstrained. We therefore group the sites into connected clusters (two sites are in the same cluster if they can be joined by a path of occupied bonds); then all the spins within a cluster must be in the same state (all q values are equally probable), and distinct clusters are independent. It follows that

    Z = Σ_{{n}} ( Π_{⟨ij⟩: n_ij=1} p_ij ) ( Π_{⟨ij⟩: n_ij=0} (1 − p_ij) ) q^{C(n)}   (6.7)

where C(n) is the number of connected clusters (including one-site clusters) in the graph whose edges are the bonds having n_ij = 1. The corresponding probability distribution,

    μ_RC(n) = Z^{−1} ( Π_{⟨ij⟩: n_ij=1} p_ij ) ( Π_{⟨ij⟩: n_ij=0} (1 − p_ij) ) q^{C(n)},   (6.8)

is called the random-cluster model with parameter q. This is a generalized bond-percolation model, with non-local correlations coming from the factor q^{C(n)}; for q = 1 it reduces to ordinary bond percolation. Note, by the way, that in the random-cluster model (unlike the Potts and FKSW models), q is merely a parameter; it can take any positive real value, not just 2, 3, …. So the random-cluster model defines, in some sense, an analytic continuation of the Potts model to non-integer q; ordinary bond percolation corresponds to the "one-state Potts model".

We have already verified the following facts about the FKSW model:

a) Z_Potts = Z_FKSW = Z_RC.

b) The marginal distribution of μ_FKSW on the Potts variables {σ} (integrating out the {n}) is precisely the Potts model μ_Potts(σ).

c) The marginal distribution of μ_FKSW on the bond occupation variables {n} (integrating out the {σ}) is precisely the random-cluster model μ_RC(n).

The conditional distributions of μ_FKSW are also simple:

d) The conditional distribution of the {n} given the {σ} is as follows: independently for each bond ⟨ij⟩, one sets n_ij = 0 in case σ_i ≠ σ_j, and sets n_ij = 0, 1 with probability 1 − p_ij, p_ij, respectively, in case σ_i = σ_j.

e) The conditional distribution of the {σ} given the {n} is as follows: independently for each connected cluster, one sets all the spins σ_i in the cluster to the same value, chosen equiprobably from {1, 2, …, q}.



These facts can be used for both analytic and numerical purposes. For example, by using facts (b), (c) and (e) we can prove an identity relating expectations in the Potts model to connection probabilities in the random-cluster model:

    ⟨δ_{σ_i σ_j}⟩_{Potts,q} = ⟨δ_{σ_i σ_j}⟩_{FKSW,q}              [by (b)]
                           = ⟨E(δ_{σ_i σ_j} | {n})⟩_{FKSW,q}
                           = ⟨[(q − 1) γ_ij + 1]/q⟩_{FKSW,q}      [by (e)]
                           = ⟨[(q − 1) γ_ij + 1]/q⟩_{RC,q}        [by (c)]   (6.9)

Here

    γ_ij(n) = { 1 if i is connected to j
              { 0 if i is not connected to j   (6.10)

and E( · | {n}) denotes conditional expectation given {n}.^23 For the Ising model with the usual convention σ_i = ±1, (6.9) can be written more simply as

    ⟨σ_i σ_j⟩_{Ising} = ⟨γ_ij⟩_{RC,q=2}.   (6.11)

Similar identities can be proven for higher-order correlation functions, and can be employed to prove Griffiths-type correlation inequalities for the Potts model [51, 52].
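Since all the sums involved are finite, these identities can be verified by brute force on a tiny graph. A sketch (4-site cycle with uniform coupling; all parameter choices purely illustrative) checking Z_Potts = Z_RC and the Ising identity (6.11):

```python
import itertools
import math

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # 4-site cycle (illustrative)
J, q = 0.7, 2
p = 1 - math.exp(-J)

def roots(occ):
    """Union-find representatives of the 4 sites under occupied edges."""
    parent = list(range(4))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for (i, j), n in zip(edges, occ):
        if n:
            parent[find(i)] = find(j)
    return [find(i) for i in range(4)]

spins = list(itertools.product(range(q), repeat=4))
bonds = list(itertools.product((0, 1), repeat=len(edges)))
# Potts weights exp[-H] and random-cluster weights, eqs. (6.2) and (6.8)
w_potts = {s: math.exp(sum(J * ((s[i] == s[j]) - 1) for i, j in edges)) for s in spins}
w_rc = {occ: math.prod(p if n else 1 - p for n in occ) * q ** len(set(roots(occ)))
        for occ in bonds}

Z_potts, Z_rc = sum(w_potts.values()), sum(w_rc.values())
assert abs(Z_potts - Z_rc) < 1e-12 * Z_potts          # fact (a)

# Ising check of (6.11): <sigma_0 sigma_2> = <gamma_02>_{RC, q=2}
lhs = sum(w * (2 * s[0] - 1) * (2 * s[2] - 1) for s, w in w_potts.items()) / Z_potts
rhs = sum(w * (roots(occ)[0] == roots(occ)[2]) for occ, w in w_rc.items()) / Z_rc
assert abs(lhs - rhs) < 1e-12
```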

On the other hand, Swendsen and Wang [27] exploited facts (b)–(e) to devise a radically new type of Monte Carlo algorithm. The Swendsen-Wang algorithm (SW) simulates the joint model (6.6) by alternately applying the conditional distributions (d) and (e): that is, by alternately generating new bond occupation variables (independent of the old ones) given the spins, and new spin variables (independent of the old ones) given the bonds. Each of these operations can be carried out in a computer time of order volume: for generating the bond variables this is trivial, and for generating the spin variables it relies on efficient (linear-time) algorithms for computing the connected clusters.^24 It is trivial that the SW algorithm leaves invariant the Gibbs measure (6.6), since any product of conditional probability operators has this property. It is also easy to see that the algorithm is ergodic, in the sense that every configuration {σ, n} having
^23 For an excellent introduction to conditional expectations, see [50].

^24 Determining the connected components of an undirected graph is a classic problem of computer science. The depth-first-search and breadth-first-search algorithms [53] have a running time of order V, while the Fischer-Galler-Hoshen-Kopelman algorithm (in one of its variants) [54] has a worst-case running time of order V log V, and an observed mean running time of order V in percolation-type problems [55]. Both of these algorithms are non-vectorizable. Shiloach and Vishkin [56] have invented a SIMD parallel algorithm, and we have very recently vectorized it for the Cyber 205, obtaining a speedup of a factor of 11 over scalar mode. We are currently carrying out a comparative test of these three algorithms, as a function of lattice size and bond density [57]. In view of the extraordinary performance of the SW algorithm (see below) and the fact that virtually all its CPU time is spent finding connected components, we feel that the desirability of finding improved algorithms for this problem is self-evident.



d = 2 Ising Model

      L        χ               τ_int,E
     64        1575 (10)       5.25 (0.30)
    128        5352 (53)       7.05 (0.67)
    256       17921 (109)      6.83 (0.40)
    512       59504 (632)      7.99 (0.81)

Table 1: Susceptibility χ and autocorrelation time τ_int,E (E = energy, the slowest mode) for the two-dimensional Ising model at criticality, using the Swendsen-Wang algorithm. Standard error is shown in parentheses.

nonzero FKSW-measure is accessible from every other. So the SW algorithm is at least a correct algorithm for simulating the FKSW model. It is also an algorithm for simulating the Potts and random-cluster models, since expectations in these two models are equal to the corresponding expectations in the FKSW model.
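The alternation of conditionals (d) and (e) fits in a few dozen lines. A minimal sketch for a q-state Potts model on an L × L periodic lattice, using a dict-based lattice and union-find cluster labeling (all parameter choices illustrative, and the cluster search far from the optimized routines of footnote 24):

```python
import math
import random

def sw_sweep(spins, L, q, J, rng):
    """One Swendsen-Wang update: occupy each satisfied bond with
    probability p = 1 - exp(-J) [fact (d)], then assign each cluster
    a fresh equiprobable spin value [fact (e)]."""
    p = 1.0 - math.exp(-J)
    parent = {s: s for s in spins}
    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]   # path compression
            s = parent[s]
        return s
    for (x, y) in spins:
        for nb in (((x + 1) % L, y), (x, (y + 1) % L)):
            if spins[(x, y)] == spins[nb] and rng.random() < p:
                parent[find((x, y))] = find(nb)
    new = {}
    for s in spins:                         # one fresh value per cluster root
        r = find(s)
        if r not in new:
            new[r] = rng.randrange(1, q + 1)
        spins[s] = new[r]
    return spins

rng = random.Random(0)
L, q, J = 8, 3, 1.0                         # illustrative parameters
spins = {(x, y): rng.randrange(1, q + 1) for x in range(L) for y in range(L)}
for _ in range(10):
    sw_sweep(spins, L, q, J, rng)
assert set(spins.values()) <= set(range(1, q + 1))

# J -> infinity: every satisfied bond is occupied, so an ordered start
# forms a single cluster and stays uniform after the update
spins2 = {(x, y): 1 for x in range(L) for y in range(L)}
sw_sweep(spins2, L, q, 50.0, rng)
assert len(set(spins2.values())) == 1
```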

Historical remark. The random-cluster model was introduced in 1969 by Fortuin and Kasteleyn [58]; they derived the identity Z_Potts = Z_RC, along with the correlation-function identity (6.9) and some generalizations. These relations were rediscovered several times during the subsequent two decades [59]. Surprisingly, however, no one seems to have noticed the joint probability distribution μ_FKSW that underlay all these identities; this was discovered implicitly by Swendsen and Wang [27], and was made explicit by Edwards and Sokal [60].

It is certainly plausible that the SW algorithm might have less critical slowing-down than the conventional (single-spin-update) algorithms: the reason is that a local move in one set of variables can have highly nonlocal effects in the other. For example, setting n_b = 0 on a single bond may disconnect a cluster, causing a big subset of the spins in that cluster to be flipped simultaneously. In some sense, therefore, the SW algorithm is a collective-mode algorithm in which the collective modes are chosen by the system rather than imposed from the outside as in multi-grid. (The miracle is that this is done in a way that preserves the correct Gibbs measure.)

How well does the SW algorithm perform? In at least some cases, the performance is nothing short of extraordinary. Table 1 shows some preliminary data [61] on a two-dimensional Ising model at the bulk critical temperature. These data are consistent with the estimate τ_SW ∼ L^{0.35} [27].^25 By contrast, the conventional single-spin-flip

^25 But precisely because τ rises so slowly with L, good estimates of the dynamic critical exponent will require the use of extremely large lattices. Even with lattices up to L = 512, we are unable to distinguish convincingly between z ≈ 0.35 and z ≈ 0. Note Added 1996: For more recent and



Estimates of z_SW

            q = 1    q = 2          q = 3           q = 4
    d = 1   0        0              0               0
    d = 2   0        0.35           0.55 ± 0.03     1 (exact??)
    d = 3   0        0.75           --              --
    d = 4   0        1 (exact?)     --              --

Table 2: Current best estimates of the dynamic critical exponent z for the Swendsen-Wang algorithm. Estimates are taken from [27] for d = 2, 3, q = 2; [62] for d = 2, q = 3, 4; and [63] for d = 4, q = 2. Error bar is a 95% confidence interval.

algorithms for the two-dimensional Ising model have τ_conv ∼ L^{2.1} [21]. So the advantage of Swendsen-Wang over conventional algorithms (for this model) grows asymptotically like L^{1.75}. To be sure, one iteration of the Swendsen-Wang algorithm may be a factor of 10 more costly in CPU time than one iteration of a conventional algorithm (the exact factor depends on the efficiency of the cluster-finding subroutine). But the SW algorithm wins already for modest values of L.

For other Potts models, the performance of the SW algorithm is less spectacular than for the two-dimensional Ising model, but it is still very impressive. In Table 2 we give the current best estimates of the dynamic critical exponent z_SW for q-state Potts models in d dimensions, as a function of q and d.^26 All these exponents are much lower than the z > 2 observed in the single-spin-flip algorithms.

Although the SW algorithm performs impressively well, we understand very little about why these exponents take the values they do. Some cases are easy. If q = 1, then all spins are in the same state (the only state!), and all bonds are thrown independently, so the autocorrelation time is zero. (Here the SW algorithm just reduces to the standard static algorithm for independent bond percolation.) If d = 1 (more generally, if the lattice is a tree), the SW dynamics is exactly soluble: the behavior of each bond is independent of each other bond, and τ_exp → −1/log(1 − 1/q) < ∞ as J → +∞. But the remainder of our understanding is very murky. Two principal insights have been obtained so far:

a) A calculation yielding z_SW = 1 in a mean-field (Curie-Weiss) Ising model [64].

much more precise data, see [111, 112]. These data are consistent with τ_SW ∼ L^{0.24}, but they are also consistent with τ_SW ∼ log² L [112]. It is extremely difficult to distinguish a small power from a logarithm.

^26 Note Added 1996: For more recent data, see [111, 112] for d = 2, q = 2; [113] for d = 2, q = 3; and [112, 114] for d = 2, q = 4. The situation for d = 3, q = 2 is extremely unclear: exponent estimates by different workers (using different methods) disagree wildly.



This suggests (but of course does not prove) that z_SW = 1 for Ising models (q = 2) in dimension d ≥ 4.

b) A rigorous proof that z_SW ≥ α/ν [62]. This bound, while valid for all d and q, is extremely far from sharp for the Ising models in dimensions 3 and higher. But it is reasonably good for the 3- and 4-state Potts models in two dimensions, and in the latter case it may even be sharp.^27

But much remains to be understood!

The Potts model with $q$ large behaves very differently. Instead of a critical point, the model undergoes a first-order phase transition: in two dimensions, this occurs when $q > 4$, while in three or more dimensions, it is believed to occur already when $q \ge 3$ [49]. At a first-order transition, both the conventional algorithms and the Swendsen-Wang algorithm have an extremely severe slowing-down (much more severe than the slowing-down at a critical point): right at the transition temperature, we expect $\tau \sim \exp(cL^{d-1})$. This is because sets of configurations typical of the ordered and disordered phases are separated by free-energy barriers of order $L^{d-1}$, i.e. by sets of intermediate configurations that contain interfaces of surface area $\sim L^{d-1}$ and therefore have an equilibrium probability $\sim \exp(-cL^{d-1})$.$^{28}$

Wolff [65] has proposed an interesting modification of the SW algorithm, in which one builds only a single cluster (starting at a randomly chosen site) and flips it. Clearly, one step of the single-cluster SW algorithm makes less change in the system than one step of the standard SW algorithm, but it also takes much less work. If one enumerates the cluster using depth-first search or breadth-first search, then the CPU time is proportional to the size of the cluster; and by the Fortuin-Kasteleyn identity (6.9), the mean cluster size is equal to the susceptibility:

\[
\sum_j \langle \gamma_{ij} \rangle \;=\; \sum_j \left\langle \frac{q\,\delta_{\sigma_i \sigma_j} - 1}{q - 1} \right\rangle . \qquad (6.12)
\]

So the relevant quantity in the single-cluster SW algorithm is the dynamic critical exponent measured in CPU units:
\[
z_{1\text{-}cluster}^{CPU} \;=\; z_{1\text{-}cluster} \;-\; \left( d - \frac{\gamma}{\nu} \right) . \qquad (6.13)
\]

27 Note Added 1996: See [112, 113, 114] for more recent data on the possible sharpness of the Li-Sokal bound for two-dimensional models with $q = 2, 3, 4$. These data appear to be consistent with sharpness modulo a logarithm, i.e. $\tau_{SW}/C_H \sim \log L$.

28 Note Added 1996: Great progress has been made in recent years in Monte Carlo methods for systems undergoing a first-order phase transition. The key idea is to simulate a suitably chosen non-Boltzmann probability distribution ("umbrella", "multicanonical", ...) and then apply reweighting methods. The exponential slowing-down $\tau \sim \exp(cL^{d-1})$ resulting from barrier penetration is replaced by a power-law slowing-down $\tau \sim L^p$ resulting from diffusion, provided that the distribution to be simulated is suitably chosen. See [115, 116, 117, 118, 119] for reviews.



The value of the single-cluster algorithm is that the probability of choosing a cluster is proportional to its size (since we pick a random site), so the work is concentrated preferentially on larger clusters, and we think that it is these clusters which are most important near the critical point. So it would not be surprising if $z_{1\text{-}cluster}^{CPU}$ were smaller than $z_{SW}$. Preliminary measurements indicate that $z_{1\text{-}cluster}^{CPU}$ is about the same as $z_{SW}$ for the two-dimensional Ising model, but is significantly smaller for the three- and four-dimensional Ising models [66]. But a convincing theoretical understanding of this behavior is lacking.

Several other generalizations of the SW algorithm for Potts models have been proposed.$^{29}$ One is a multi-grid extension of the SW algorithm: the idea is to carry out only a partial FKSW transformation, but then to apply this concept recursively [67]. This algorithm may have a dynamic critical exponent that is smaller than that of standard SW (but the claims that $z = 0$ are in my opinion unconvincing).$^{30}$ A second generalization, which works only in two dimensions, augments the SW algorithm by transformations to the dual lattice [68]. This algorithm appears to achieve a modest improvement in critical slowing-down in the scaling region $|\beta - \beta_c| \lesssim L^{-1/\nu}$.

F<strong>in</strong>ally, the SW algorithm can be generalized <strong>in</strong> a straightforward manner to Potts<br />

lattice gauge theories (more precisely, lattice gauge theories with a nite abelian gauge<br />

group G <strong>and</strong> Potts ( -function) action). Prelim<strong>in</strong>ary results for the three-dimensional<br />

Z2 gauge theory yield a dynamic critical exponent roughly equal to that ofthe ord<strong>in</strong>ary<br />

SW algorithm for the three-dimensional Is<strong>in</strong>g model (to which the gauge theory is dual)<br />

[69].<br />

In the past year there has been a flurry of papers trying to generalize the SW algorithm to non-Potts models. Interesting proposals for a direct generalization of the SW algorithm were made by Edwards and Sokal [60] and by Niedermayer [70]. But the most promising ideas at present seem to be the embedding algorithms proposed by Brower and Tamayo [71] for one-component $\varphi^4$ models and by Wolff [65, 72] for multi-component $O(n)$-invariant models.

The idea of the embedding algorithms is to find Ising-like variables underlying a general spin variable, and then to update the resulting Ising model using the ordinary SW algorithm (or the single-cluster variant). For one-component spins, this embedding is the obvious decomposition into magnitude and sign. Consider, therefore, a one-

29 Note Added 1996: In addition to the generalizations discussed below, see [120, 112] for SW-type algorithms for the Ashkin-Teller model; see [121] for a clever SW-type algorithm for the fully frustrated Ising model; and see [122, 123, 124, 100] for a very interesting SW-type algorithm for antiferromagnetic Potts models, based on the "embedding" idea to be described below.

30 Note Added 1996: For at least some versions of the multi-grid SW algorithm, it can be proven [125] that the bound $z_{MGSW} \ge \alpha/\nu$ holds. Thus, the critical slowing-down is not completely eliminated in any model in which the specific heat is divergent.



component model with Hamiltonian
\[
H(\varphi) \;=\; -\beta \sum_{\langle xy \rangle} \varphi_x \varphi_y \;+\; \sum_x V(\varphi_x) \qquad (6.14)
\]
where $\beta \ge 0$ and $V(\varphi) = V(-\varphi)$. We write
\[
\varphi_x \;=\; \varepsilon_x \, |\varphi_x| \qquad (6.15)
\]

where the signs $\{\varepsilon\}$ are Ising variables. For fixed values of the magnitudes $\{|\varphi|\}$, the conditional probability distribution of the $\{\varepsilon\}$ is given by an Ising model with ferromagnetic (though space-dependent) couplings $J_{xy} \equiv \beta |\varphi_x| |\varphi_y|$. Therefore, the $\{\varepsilon\}$ model can be updated by the Swendsen-Wang algorithm. (Heat-bath or MGMC sweeps must also be performed, in order to update the magnitudes.) For the two-dimensional $\varphi^4$ model, Brower and Tamayo [71] find a dynamic critical behavior identical to that of the SW algorithm for the two-dimensional Ising model, just as one would expect based on the idea that the "important" collective modes in the $\varphi^4$ model are spin flips.
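The magnitude-sign embedding just described is simple enough to sketch in code. The following is a minimal illustration (our own sketch, not the Brower-Tamayo implementation): it freezes the magnitudes $|\varphi_x|$ on a small periodic square lattice and performs one Swendsen-Wang sweep of the sign field, activating bonds between aligned signs with probability $1 - e^{-2J_{xy}}$, where $J_{xy} = \beta |\varphi_x||\varphi_y|$. The function name and the union-find cluster bookkeeping are our own choices.

```python
import math
import random

def embedding_sw_sweep(phi, beta, rng=random):
    """One embedding (magnitude-sign) Swendsen-Wang update for a
    one-component model.  `phi` is an L x L list of lists of real field
    values on a periodic square lattice; a new configuration is returned.
    The magnitudes |phi_x| are held fixed, and the signs are updated via
    the induced Ising model with couplings J_xy = beta*|phi_x|*|phi_y|."""
    L = len(phi)
    mag = [[abs(v) for v in row] for row in phi]
    eps = [[1 if v >= 0 else -1 for v in row] for row in phi]

    # Union-find bookkeeping for cluster formation
    parent = list(range(L * L))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Throw FK bonds: only between aligned signs, with prob 1 - exp(-2 J_xy)
    for x in range(L):
        for y in range(L):
            for nx, ny in ((x, (y + 1) % L), ((x + 1) % L, y)):
                if eps[x][y] == eps[nx][ny]:
                    p = 1.0 - math.exp(-2.0 * beta * mag[x][y] * mag[nx][ny])
                    if rng.random() < p:
                        ri, rj = find(x * L + y), find(nx * L + ny)
                        if ri != rj:
                            parent[ri] = rj

    # Flip each cluster's sign independently with probability 1/2
    flip = {}
    new_phi = [[0.0] * L for _ in range(L)]
    for x in range(L):
        for y in range(L):
            r = find(x * L + y)
            if r not in flip:
                flip[r] = rng.random() < 0.5
            s = -eps[x][y] if flip[r] else eps[x][y]
            new_phi[x][y] = s * mag[x][y]
    return new_phi
```

Note that the update changes only signs: the magnitude field is an invariant of the sweep, which is why separate heat-bath or MGMC sweeps are needed to update it.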

Wolff's embedding algorithm for $n$-component models ($n \ge 2$) is equally simple. Consider an $O(n)$-invariant model with Hamiltonian
\[
H(\sigma) \;=\; -\beta \sum_{\langle xy \rangle} \sigma_x \cdot \sigma_y \;+\; \sum_x V(|\sigma_x|) \qquad (6.16)
\]
with $\beta \ge 0$. Now fix a unit vector $r \in \mathbb{R}^n$; then any spin vector $\sigma_x \in \mathbb{R}^n$ can be written uniquely (except for a set of measure zero) in the form
\[
\sigma_x \;=\; \sigma_x^\perp \;+\; \varepsilon_x |\sigma_x \cdot r| \, r \qquad (6.17)
\]
where
\[
\sigma_x^\perp \;=\; \sigma_x - (\sigma_x \cdot r) \, r \qquad (6.18)
\]
is the component of $\sigma_x$ perpendicular to $r$, and
\[
\varepsilon_x \;=\; \mathrm{sgn}(\sigma_x \cdot r) \qquad (6.19)
\]
takes the values $\pm 1$. Therefore, for fixed values of the $\{\sigma^\perp\}$ and $\{|\sigma \cdot r|\}$, the probability distribution of the $\{\varepsilon\}$ is given by an Ising model with ferromagnetic couplings $J_{xy} \equiv \beta |\sigma_x \cdot r| |\sigma_y \cdot r|$. The algorithm is then: choose at random a unit vector $r$; fix the $\{\sigma^\perp\}$ and $\{|\sigma \cdot r|\}$ at their current values, and update the $\{\varepsilon\}$ by the standard Swendsen-Wang algorithm (or the single-cluster variant). Flipping $\varepsilon_x$ corresponds to reflecting $\sigma_x$ in the hyperplane perpendicular to $r$.
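The single-cluster variant of this update can be sketched as follows (our own minimal illustration for $n = 3$ unit spins, not Wolff's code): choose a random direction $r$, grow a cluster from a random seed site using the bond probability $1 - e^{-2\beta(\sigma_x \cdot r)(\sigma_y \cdot r)}$ (zero when the projections have opposite signs), and reflect every spin in the cluster through the hyperplane perpendicular to $r$.

```python
import math
import random
from collections import deque

def wolff_update(sigma, beta, rng=random):
    """One single-cluster Wolff embedding update for an O(n) model on an
    L x L periodic lattice.  `sigma` is an L x L list of unit vectors
    (tuples); it is modified in place.  Returns the cluster size."""
    L = len(sigma)
    n = len(sigma[0][0])
    # Random reflection direction r, uniform on the unit sphere
    r = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in r))
    r = [c / norm for c in r]
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    proj = [[dot(sigma[x][y], r) for y in range(L)] for x in range(L)]

    seed = (rng.randrange(L), rng.randrange(L))
    cluster = {seed}
    queue = deque([seed])
    while queue:
        x, y = queue.popleft()
        for nb in (((x + 1) % L, y), ((x - 1) % L, y),
                   (x, (y + 1) % L), (x, (y - 1) % L)):
            if nb in cluster:
                continue
            # Bonds are activated only between positively correlated
            # projections, with probability 1 - exp(-2*beta*(s_x.r)(s_y.r)).
            p = 1.0 - math.exp(min(0.0,
                    -2.0 * beta * proj[x][y] * proj[nb[0]][nb[1]]))
            if rng.random() < p:
                cluster.add(nb)
                queue.append(nb)

    # Reflect each cluster spin: sigma -> sigma - 2 (sigma . r) r
    for x, y in cluster:
        s = sigma[x][y]
        sigma[x][y] = tuple(s[i] - 2.0 * proj[x][y] * r[i] for i in range(n))
    return len(cluster)
```

The breadth-first growth makes the CPU time per update proportional to the cluster size, as discussed earlier for the single-cluster SW algorithm; the reflection preserves the spin lengths exactly.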

At first thought it may seem strange (and somehow "unphysical") to try to find Ising-like (i.e. discrete) variables in a model with a continuous symmetry group. However, upon reflection (pardon the pun) one sees what is going on: if the spin configuration is slowly varying (e.g. a long-wavelength spin wave), then the clusters tend to break along the surfaces where $J_{xy}$ is small, hence where $\sigma \cdot r \approx 0$. Then flipping $\varepsilon_x$ on some clusters



Figure 4: Action of the Wolff embedding algorithm on a long-wavelength spin wave. For simplicity, both spin space ($\sigma$) and physical space ($x$) are depicted as one-dimensional.

corresponds to a "soft" change near the cluster boundaries but a "hard" change in the cluster interiors, i.e. a long-wavelength collective mode (Figure 4). So it is conceivable that the algorithm could have very small (or even zero!) critical slowing-down in models where the important large-scale collective modes are spin waves.

Even more strikingly, consider a two-dimensional $XY$-model configuration consisting of a widely separated vortex-antivortex pair: in the continuum limit this is given by
\[
\theta(z) \;=\; \mathrm{Im} \log(z - a) \;-\; \mathrm{Im} \log(z + a) \qquad (6.20)
\]
where $z \equiv x_1 + i x_2$ and $\sigma \equiv (\cos\theta, \sin\theta)$. Then the surface $\sigma \cdot r = 0$ is an ellipse passing directly through the vortex and antivortex.$^{31}$ Reflecting the spins inside this ellipse produces a configuration $\{\sigma^{new}\}$ that is a continuous map of the doubly punctured plane into the semicircle $|\sigma| = 1$, $\sigma \cdot r \ge 0$. Such a map necessarily has zero winding number around the points $\pm a$. So the Wolff update has destroyed the vorticity!$^{32}$

Therefore, the key collective modes in the two-dimensional $XY$ and $O(n)$ models, namely spin waves and (for the $XY$ case) vortices, are well "encoded" by the Ising variables $\{\varepsilon\}$. So it is quite plausible that critical slowing-down could be eliminated or almost eliminated. In fact, tests of this algorithm on the two-dimensional $XY$ ($n = 2$), classical Heisenberg ($n = 3$) and $O(4)$ models are consistent with the complete absence of critical slowing-down [72, 73].

In view of the extraordinary success of the Wolff algorithm for spin models, it is tempting to try to extend it to lattice gauge theories with continuous gauge group [for example, $U(1)$, $SU(N)$ or $SO(N)$]. But I have nothing to report at present!$^{33}$

31 With $r = (\cos\alpha, \sin\alpha)$, the equation of this ellipse is $x_1^2 + (x_2 + a \tan\alpha)^2 = a^2 \sec^2\alpha$.

32 This picture of the action of the Wolff algorithm on vortex-antivortex pairs was developed in discussions with Richard Brower and Robert Swendsen.

33 Note Added 1996: See [126] for a general theory of Wolff-type embedding algorithms as applied to nonlinear $\sigma$-models (spin models taking values in a Riemannian manifold $M$). Roughly speaking, we argue that a Wolff-type embedding algorithm can work well, in the sense of having a dynamic



7 Algorithms for the Self-Avoiding Walk$^{34}$

An $N$-step self-avoiding walk $\omega$ on a lattice $\mathcal{L}$ is a sequence of distinct points $\omega_0, \omega_1, \ldots, \omega_N$ in $\mathcal{L}$ such that each point is a nearest neighbor of its predecessor. Let $c_N$ [resp. $c_N(x)$] be the number of $N$-step SAWs starting at the origin and ending anywhere [resp. ending at $x$]. Let $\langle \omega_N^2 \rangle$ be the mean-square end-to-end distance of an $N$-step SAW. These quantities are believed to have the asymptotic behavior
\[
c_N \;\sim\; \mu^N N^{\gamma - 1} \qquad (7.1)
\]
\[
c_N(x) \;\sim\; \mu^N N^{\alpha_{sing} - 2} \qquad (x \text{ fixed} \neq 0) \qquad (7.2)
\]
\[
\langle \omega_N^2 \rangle \;\sim\; N^{2\nu} \qquad (7.3)
\]
as $N \to \infty$; here $\gamma$, $\alpha_{sing}$ and $\nu$ are critical exponents, while $\mu$ (the connective constant of the lattice) is the analogue of a critical temperature. The SAW has direct application in polymer physics [74], and is indirectly relevant to ferromagnetism and quantum field theory by virtue of its equivalence with the $n \to 0$ limit of the $n$-vector model [75].
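For small $N$, the counts $c_N$ can be obtained by exact enumeration, which is a useful check on any SAW Monte Carlo code. A simple backtracking counter for the square lattice (our own illustrative sketch, not part of the original notes):

```python
def count_saws(n):
    """Exact enumeration of c_N, the number of N-step self-avoiding walks
    on the square lattice starting at the origin."""
    def extend(walk, visited, steps_left):
        if steps_left == 0:
            return 1
        x, y = walk[-1]
        total = 0
        # Try all four nearest-neighbor continuations, skipping visited sites
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt not in visited:
                visited.add(nxt)
                walk.append(nxt)
                total += extend(walk, visited, steps_left - 1)
                walk.pop()
                visited.remove(nxt)
        return total
    return extend([(0, 0)], {(0, 0)}, n)
```

The first few values are $c_1, \ldots, c_5 = 4, 12, 36, 100, 284$, and the ratios $c_{N+1}/c_N$ approach the connective constant $\mu \approx 2.638$ of the square lattice. The cost grows like $\mu^N$, which is exactly why Monte Carlo methods are needed for large $N$.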

The SAW has some advantages over spin systems for Monte Carlo work: Firstly, one can work directly with SAWs on an infinite lattice; there are no systematic errors due to finite-volume corrections. Secondly, there is no $L^d$ (or $\xi^d$) factor in the computational work, so one can go closer to criticality. Thus, the SAW is an exceptionally advantageous "laboratory" for the numerical study of critical phenomena.

Different aspects of the SAW can be probed in three different ensembles$^{35}$:

Free-endpoint grand canonical ensemble (variable $N$, variable $x$)
Fixed-endpoint grand canonical ensemble (variable $N$, fixed $x$)
Canonical ensemble (fixed $N$, variable $x$)

In the remainder of this section we survey some typical Monte Carlo algorithms for these ensembles.

Free-endpoint grand canonical ensemble. Here the configuration space $S$ is the set of all SAWs, of arbitrary length, starting at the origin and ending anywhere. The grand

critical exponent $z$ significantly less than 2, only if $M$ is a discrete quotient of a product of spheres: for example, a sphere $S^{N-1}$ or a real projective space $RP^{N-1} \equiv S^{N-1}/Z_2$. In particular, we do not expect Wolff-type embedding algorithms to work well for $\sigma$-models taking values in $SU(N)$ for $N \ge 3$.

34 Note Added 1996: An extensive and up-to-date review of Monte Carlo algorithms for the self-avoiding walk can be found in [127]. A briefer version is [128].

35 The proper terminology for these ensembles is unclear to me. Perhaps the grand canonical and canonical ensembles ought to be called "canonical" and "microcanonical", respectively, reserving the term "grand canonical" for ensembles of many SAWs. Note Added 1996: I now prefer calling these the "variable-length" and "fixed-length" ensembles, respectively, and avoiding the "-canonical" terms entirely; this avoids ambiguity.



partition function is
\[
\Xi(\beta) \;=\; \sum_{N=0}^{\infty} \beta^N c_N \qquad (7.4)
\]
and the Gibbs measure is
\[
\pi(\omega) \;=\; \Xi(\beta)^{-1} \, \beta^{|\omega|} . \qquad (7.5)
\]
The "monomer activity" $\beta$ is a user-chosen parameter satisfying $0 < \beta < \beta_c \equiv \mu^{-1}$. As $\beta$ approaches the critical activity $\beta_c$, the average walk length $\langle N \rangle$ tends to infinity. The connective constant $\mu$ and the critical exponent $\gamma$ can be estimated from the Monte Carlo data, using the method of maximum likelihood [76].

A dynamic <strong>Monte</strong> <strong>Carlo</strong> algorithm for this ensemble was proposed by Berretti <strong>and</strong><br />

Sokal [76]. The algorithm's elementary moves are as follows: either one attempts to<br />

appendanew step to thewalk, with equal probability<strong>in</strong>eachoftheq possible directions<br />

(here q is the coord<strong>in</strong>ation number of the lattice) or else one deletes the last step from<br />

thewalk. In theformer case, one must checkthatthe proposed new walkisself-avoid<strong>in</strong>g<br />

if it isn't, then the attempted move is rejected <strong>and</strong> theold con guration is counted aga<strong>in</strong><br />

<strong>in</strong> the sample (\null transition"). If an attempt is madetodeleteastep from an alreadyempty<br />

walk, then a null transition is also made. The relative probabilitiesof N =+1<br />

<strong>and</strong> N = ;1 attempts arechosen to be<br />

IP( N =+1attempt) =<br />

IP( N = ;1 attempt) =<br />

Therefore, the transition probability matrix is<br />

p(! ! ! 0 ) =<br />

8<br />

><<br />

>:<br />

q<br />

1+q<br />

1<br />

1+q<br />

1+q SAW(! 0 ) if ! ! 0<br />

1<br />

1+q if ! 0 ! or ! = ! 0 = <br />

1+q A(!) if ! = !0 = <br />

(7.6)<br />

(7.7)<br />

(7.8)<br />

where<br />

SAW(! 0 ) = 1 if !0 0<br />

is a SAW<br />

if ! 0 is not a SAW<br />

(7.9)<br />

Here ! ! 0 denotes thatthewalk ! 0 is a one-step extension of !<strong>and</strong>A(!)isthenumber of non-self-avoid<strong>in</strong>g walks ! 0 with ! ! 0 . Itiseasilyveri ed that this transition matrix<br />

satis es detailed balance for the Gibbs distribution ,i.e.<br />

(!) p(! ! ! 0 ) = (! 0 ) p(! 0 ! !) (7.10)<br />

for all $\omega, \omega' \in S$. It is also easily verified that the algorithm is ergodic (= irreducible): to get from a walk $\omega$ to another walk $\omega'$, it suffices to use $\Delta N = -1$ moves to transform $\omega$ into the empty walk $\varnothing$, and then use $\Delta N = +1$ moves to build up the walk $\omega'$.

Let us now analyze the critical slowing-down of the Berretti-Sokal algorithm. We can argue heuristically that
\[
\tau \;\sim\; \langle N \rangle^2 . \qquad (7.11)
\]



To see this, consider the quantity $N(t) = |\omega|(t)$, the number of steps in the walk at time $t$. This quantity executes, crudely speaking, a random walk (with drift) on the nonnegative integers; the average time to go from some point $N$ to the point 0 (i.e. the empty walk) is of order $N^2$. Now, each time the empty walk is reached, all memory of the past is erased; future walks are then independent of past ones. Thus, the autocorrelation time ought to be of order $\langle N^2 \rangle$, or equivalently $\langle N \rangle^2$.

This heuristic argument can be turned into a rigorous proof of a lower bound $\tau \ge \mathrm{const} \times \langle N \rangle^2$ [10]. However, as an argument for an upper bound of the same form, it is not entirely convincing, as it assumes without proof that the slowest mode is the one represented by $N(t)$. With considerably more work, it is possible to prove an upper bound on $\tau$ that is only slightly weaker than the heuristic prediction: $\tau \le \mathrm{const} \times \langle N \rangle^{1+\gamma}$ [10, 77].$^{36}$ (Note that the critical exponent $\gamma$ is believed to equal $43/32$ in dimension $d = 2$, $\approx 1.16$ in $d = 3$, and 1 in $d \ge 4$.) In fact, we suspect [78] that the true behavior is $\tau \sim \langle N \rangle^p$ for some exponent $p$ strictly between 2 and $1 + \gamma$. A deeper understanding of the dynamic critical behavior of the Berretti-Sokal algorithm would be of definite value.

It is worth comparing the computational work required for SAW versus Ising simulations: $\tau \sim \langle N \rangle^2 \sim \xi^{2/\nu} \approx \xi^{3.4}$ for the $d = 3$ SAW, versus $\xi^{d+z} \approx \xi^{5.0}$ (resp. $\xi^{3.8}$) for the $d = 3$ Ising model using the Metropolis (resp. Swendsen-Wang) algorithm. This vindicates our assertion that the SAW is an advantageous model for Monte Carlo studies of critical phenomena.

Fixed-endpoint grand canonical ensemble. The configuration space $S$ is the set of all SAWs, of arbitrary length, starting at the origin and ending at the fixed site $x$ ($\neq 0$). The ensemble is as in the free-endpoint case, with $c_N$ replaced by $c_N(x)$. The connective constant $\mu$ and the critical exponent $\alpha_{sing}$ can be estimated from the Monte Carlo data, using the method of maximum likelihood.

A dynamic Monte Carlo algorithm for this ensemble was proposed by Berg and Foerster [79] and Aragão de Carvalho, Caracciolo and Fröhlich (BFACF) [75, 80]. The elementary moves are local deformations of the chain, with $\Delta N = 0, \pm 2$. The critical slowing-down in the BFACF algorithm is quite subtle. On the one hand, Sokal and Thomas [81] have proven the surprising result that $\tau_{exp} = +\infty$ for all $\beta \neq 0$ (see Section 8). On the other hand, numerical experiments [12, 82] show that $\tau_{int,N} \sim \langle N \rangle^{3.0 \pm 0.4}$ (in $d = 2$). Clearly, the BFACF dynamics is not well understood at present: further work, both theoretical and numerical, is needed.

In addition, Caracciolo, Pelissetto and Sokal [82] are studying a "hybrid" algorithm that combines local (BFACF) moves with non-local (cut-and-paste) moves. The critical slowing-down, measured in CPU units, appears to be reduced slightly compared to the pure BFACF algorithm: $\tau \sim \langle N \rangle^{2.3}$ in $d = 2$.

Canonical ensemble. Algorithms for this ensemble, based on local deformations of the chain, have been used by polymer physicists for more than 25 years [83, 84]. So the recent proof [85] that all such algorithms are nonergodic (= not irreducible) comes as a

36 All these proofs are discussed in Section 8.



slight embarrassment. Fortunately, there does exist a non-local fixed-$N$ algorithm which is ergodic: the "pivot" algorithm, invented by Lal [86] and independently reinvented by MacDonald et al. [87] and by Madras [16]. The elementary move is as follows: choose at random a pivot point $k$ along the walk ($1 \le k \le N - 1$); choose at random a non-identity element of the symmetry group of the lattice (rotation or reflection); then apply the symmetry-group element to $\omega_{k+1}, \ldots, \omega_N$, using $\omega_k$ as a pivot. The resulting walk is accepted if it is self-avoiding; otherwise it is rejected and the walk $\omega$ is counted once more in the sample. It can be proven [16] that this algorithm is ergodic and preserves the equal-weight probability distribution.
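A pivot move for the square lattice can be sketched as follows (our own minimal implementation; a production code would use the hash-table tricks of [16] for the self-avoidance test rather than rebuilding a set each time):

```python
import random

# The seven non-identity lattice symmetries of Z^2 (rotations and
# reflections), as 2x2 integer matrices ((a, b), (c, d)) acting on (dx, dy).
SYMMETRIES = [((0, -1), (1, 0)), ((-1, 0), (0, -1)), ((0, 1), (-1, 0)),
              ((1, 0), (0, -1)), ((-1, 0), (0, 1)), ((0, 1), (1, 0)),
              ((0, -1), (-1, 0))]

def pivot_step(walk, rng=random):
    """One elementary move of the pivot algorithm for a fixed-length SAW on
    the square lattice.  Returns the new walk (the old walk on rejection)."""
    n = len(walk) - 1                        # number of steps N
    k = rng.randrange(1, n)                  # pivot point, 1 <= k <= N-1
    (a, b), (c, d) = rng.choice(SYMMETRIES)
    px, py = walk[k]
    head = walk[:k + 1]
    tail = []
    for x, y in walk[k + 1:]:                # rotate/reflect the tail about walk[k]
        dx, dy = x - px, y - py
        tail.append((px + a * dx + b * dy, py + c * dx + d * dy))
    proposal = head + tail
    # Accept only if the proposal is self-avoiding
    return proposal if len(set(proposal)) == len(proposal) else walk
```

Since lattice symmetries preserve nearest-neighbor adjacency, the proposal is automatically a walk; only self-avoidance needs to be checked.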

At first thought the pivot algorithm sounds terrible (at least it did to me): for $N$ large, nearly all the proposed moves will get rejected. This is in fact true: the acceptance fraction behaves $\sim N^{-p}$ as $N \to \infty$, where $p \approx 0.19$ in $d = 2$ [16]. On the other hand, the pivot moves are very radical: after very few (5 or 10) accepted moves the SAW will have reached an "essentially new" conformation. One conjectures, therefore, that $\tau \sim N^p$. Actually it is necessary to be a bit more careful: for global observables $f$ (such as the end-to-end distance $\omega_N^2$) one expects $\tau_{int,f} \sim N^p$, but local observables (such as the angle between the 17th and 18th bonds of the walk) are expected to evolve a factor of $N$ more slowly: $\tau_{int,f} \sim N^{1+p}$. Thus, the slowest mode is expected to behave as $\tau_{exp} \sim N^{1+p}$. For the pivot algorithm applied to ordinary random walk, one can calculate the dynamical behavior exactly [16]: for global observables $f$ the autocorrelation function behaves roughly like

\[
\rho_{ff}(t) \;\approx\; \frac{1}{n} \sum_{i=1}^{n} \left( 1 - \frac{i}{n} \right)^{t} \qquad (7.12)
\]
from which it follows that
\[
\tau_{int,f} \;\sim\; \log N \qquad (7.13)
\]
\[
\tau_{exp,f} \;\sim\; N \qquad (7.14)
\]

| <strong>in</strong> agreement with our heuristic argument modulo logarithms. For the SAW, it is<br />

found numerically [16] that <strong>in</strong>tf N 0:20 <strong>in</strong> d = 2, also <strong>in</strong> close agreement withthe<br />

heuristic argument.<br />

A careful analysis of the computational complexity of the pivot algorithm [36] shows that one "effectively independent" sample (at least as regards global observables) can be produced in a computer time of order $N$. This is a factor $\sim N$ more efficient than the Berretti-Sokal algorithm, a fact which opens up exciting prospects for high-precision Monte Carlo studies of critical phenomena in the SAW. Thus, with a modest computational effort (300 hours on a Cyber 170-730), Madras and I found $\nu = 0.7496 \pm 0.0007$ (95% confidence limits) for 2-dimensional SAWs of lengths $200 \le N \le 10000$ [16]. We hope to carry out soon a convincing numerical test of hyperscaling in the three-dimensional SAW.$^{37}$

37 Note Added 1996: This study has now been carried out [129], and yields $\nu = 0.5877 \pm 0.0006$ (subjective 68% confidence limits), based on SAWs of length up to 80000 steps. Proper treatment



8 Rigorous Analysis of Dynamic Monte Carlo Algorithms

In this final lecture, we discuss techniques for proving rigorous lower and upper bounds on the autocorrelation times of dynamic Monte Carlo algorithms. This topic is of primary interest, of course, to mathematical physicists: it constitutes the first steps toward a rigorous theory of dynamic critical phenomena, along lines parallel to the well-established rigorous theory of static critical phenomena. But these proofs are, I believe, also of some importance for practical Monte Carlo work, as they give insight into the physical basis of critical slowing-down and may point towards improved algorithms with reduced critical slowing-down.

There is a big difference between the techniques used for proving lower and upper bounds, and it is easy to understand this physically. To prove a lower bound on the autocorrelation time, it suffices to find one physical reason why the dynamics should be slow, i.e. to find one "slow mode". This physical insight can often be converted directly into a rigorous proof, using the variational method described below. On the other hand, to prove an upper bound on the autocorrelation time, it is necessary to understand all conceivable physical reasons for slowness, and to prove that none of them cause too great a slowing-down. This is extremely difficult to do, and has been carried to completion in very few cases.

We shall be obliged to restrict attention to reversible Markov chains [i.e. those satisfying the detailed-balance condition (2.27)], as these give rise to self-adjoint operators. Non-reversible Markov chains, corresponding to non-self-adjoint operators, are much more difficult to analyze rigorously.

The principal method for proving lower bounds on τ_exp (and in fact on τ_int,f) is the<br />
variational (or Rayleigh-Ritz) method. Let us recall some of the theory of reversible<br />
Markov chains from Section 2: Let π be a probability measure, and let P = {p_xy} be a<br />
transition probability matrix that satisfies detailed balance with respect to π. Now P<br />
acts naturally on functions (observables) according to<br />
<br />
(Pf)(x) ≡ Σ_y p_xy f(y) .   (8.1)<br />
<br />
In particular, when acting on the space l²(π) of π-square-integrable functions, the operator<br />
P is a self-adjoint contraction. Its spectrum therefore lies in the interval [−1, 1].<br />
Moreover, P has an eigenvalue 1 with eigenvector equal to the constant function 1.<br />

Let Π be the orthogonal projector in l²(π) onto the constant functions. Then, for each<br />
real-valued observable f ∈ l²(π), we have<br />
<br />
C_ff(t) = (f, (P^|t| − Π) f)<br />

= (f, (I − Π) P^|t| (I − Π) f)   (8.2)<br />
<br />
where (g, h) ≡ ⟨ḡh⟩ denotes the inner product in l²(π). By the spectral theorem, this<br />
can be written as<br />
<br />
C_ff(t) = ∫₋₁¹ λ^|t| dν_ff(λ)   (8.3)<br />

where dν_ff is a positive measure on [−1, 1]. It follows that<br />
<br />
τ_int,f = (1/2) ∫₋₁¹ [(1+λ)/(1−λ)] dν_ff(λ) / ∫₋₁¹ dν_ff(λ)<br />
        ≥ (1/2) [1 + ρ_ff(1)] / [1 − ρ_ff(1)]   (8.4)<br />
<br />
by Jensen's inequality (since the function λ ↦ (1+λ)/(1−λ) is convex).^38<br />

Our method will be to compute explicitly a lower bound on the normalized autocorrelation<br />
function at time lag 1,<br />
<br />
ρ_ff(1) = C_ff(1) / C_ff(0) ,   (8.5)<br />
<br />
for a suitably chosen trial observable f. Equivalently, we compute an upper bound on<br />
the Rayleigh quotient<br />
<br />
(f, (I − P) f) / (f, (I − Π) f) = [C_ff(0) − C_ff(1)] / C_ff(0) = 1 − ρ_ff(1) .   (8.6)<br />

The crux of the matter is to pick a trial observable f that has a strong enough overlap<br />
with the "slowest mode". A useful formula is<br />
<br />
(f, (I − P) f) = (1/2) Σ_{x,y} π_x p_xy |f(x) − f(y)|² .   (8.7)<br />
<br />
That is, the numerator of the Rayleigh quotient is half the mean-square change in f in<br />
a single time step.<br />
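None of the following appears in the original notes: it is a small numerical sketch, assuming NumPy, that checks the identity (8.7) and the Jensen bound (8.4) on a hypothetical 3-state reversible chain. The measure `pi`, the rate matrix `S` and the observable `f` are arbitrary choices of ours.

```python
import numpy as np

# Hypothetical 3-state reversible chain: S_xy plays the role of pi_x * p_xy,
# and choosing S symmetric is exactly the detailed-balance condition.
pi = np.array([0.2, 0.3, 0.5])
S = np.array([[0.00, 0.06, 0.10],
              [0.06, 0.00, 0.12],
              [0.10, 0.12, 0.00]])
P = S / pi[:, None]
P += np.diag(1.0 - P.sum(axis=1))        # holding probabilities on the diagonal

f = np.array([1.0, -2.0, 0.5])           # arbitrary trial observable
fc = f - pi @ f                          # centering removes the Pi-component
C0 = pi @ fc**2                          # C_ff(0) = (f,(I - Pi)f)
C1 = fc @ (pi[:, None] * P) @ fc         # C_ff(1)
rho1 = C1 / C0

# (8.7): (f,(I - P)f) = (1/2) sum_{x,y} pi_x p_xy |f(x) - f(y)|^2
dirichlet = 0.5 * np.sum(pi[:, None] * P * (f[:, None] - f[None, :])**2)

# Exact tau_int,f from the spectral measure of (8.3): symmetrize P, expand f,
# and average (1/2)(1 + lambda)/(1 - lambda) with weights c2 ~ d nu_ff.
A = np.sqrt(pi)[:, None] * P / np.sqrt(pi)[None, :]
lam, vec = np.linalg.eigh(A)
c2 = (vec.T @ (np.sqrt(pi) * fc))**2
mask = c2 > 1e-14                        # drops the lambda = 1 mode (weight 0)
tau_int = 0.5 * np.sum(c2[mask] * (1 + lam[mask]) / (1 - lam[mask])) / c2.sum()
jensen = 0.5 * (1 + rho1) / (1 - rho1)   # right-hand side of (8.4)
print(tau_int, jensen)
```

The symmetric matrix `S` guarantees detailed balance by construction, and the spectral weights `c2` are a discrete version of the measure dν_ff of (8.3).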

Example 1. Single-site heat-bath algorithm. Let us consider, for simplicity, a translation-invariant<br />
spin model in a periodic box of volume V. Let P_i = {P_i(φ → φ′)} be the<br />
transition probability matrix associated with applying the heat-bath operation at site<br />
i.^39 Then the operator P_i has the following properties:<br />

^38 It also follows from (8.3) that ρ_ff(t) ≥ ρ_ff(1)^|t| for even values of t. Moreover, this holds for odd<br />
values of t if dν_ff is supported on λ ≥ 0 (though not necessarily otherwise); this is the case for the<br />
heat-bath and Swendsen-Wang algorithms, in which P ≥ 0. Therefore, the decay as t → ∞ of ρ_ff(t)<br />
is also bounded below in terms of ρ_ff(1).<br />
<br />
^39 Probabilistically, P_i is the conditional expectation E(·|{φ_j}_{j≠i}). Analytically, P_i is the orthogonal<br />
projection in l²(π) onto the linear subspace consisting of functions depending only on {φ_j}_{j≠i}.<br />



(a) P_i is an orthogonal projection. (In particular, P_i² = P_i and 0 ≤ P_i ≤ I.)<br />
<br />
(b) P_i 1 = 1. (In particular, P_i Π = Π P_i = Π.)<br />
<br />
(c) P_i f = f if f depends only on {φ_j}_{j≠i}. (In fact a stronger property holds: P_i(fg) =<br />
f P_i g if f depends only on {φ_j}_{j≠i}. But we shall not need this.)<br />

For simplicity we consider only random site updating. Then the transition matrix of the<br />
heat-bath algorithm is<br />
<br />
P = (1/V) Σ_i P_i .   (8.8)<br />
<br />
We shall use the notation of the Ising model, but the same proof applies to much more<br />
general models.<br />

As explained in Section 4, the critical slowing-down of the local algorithms (such as<br />
single-site heat-bath) arises from the fact that large regions in which the net magnetization<br />
is positive or negative tend to move slowly. In particular, the total magnetization of<br />
the lattice fluctuates slowly. So it is natural to expect that the total magnetization<br />
M = Σ_i σ_i will be a "slow mode". Indeed, let us compute an upper bound on the<br />
Rayleigh quotient (8.6) with f = M. The denominator is<br />
<br />
(M, (I − Π) M) = ⟨M²⟩ − ⟨M⟩² = Vχ   (8.9)<br />

(since ⟨M⟩ = 0 by symmetry). The numerator is<br />
<br />
(M, (I − P) M) = (1/V) Σ_{i,j,k} (σ_i, (I − P_j) σ_k)<br />
               = (1/V) Σ_i (σ_i, (I − P_i) σ_i)<br />
               ≤ (1/V) Σ_i (σ_i, σ_i)<br />
               = (1/V) Σ_i ⟨σ_i²⟩ = 1 .   (8.10)<br />

Here we first used properties (a) and (c) to deduce that (σ_i, (I − P_j) σ_k) = 0 unless<br />
i = j = k, and then used property (a) to bound (σ_i, (I − P_i) σ_i). It follows from (8.9)<br />
and (8.10) that<br />
<br />
1 − ρ_MM(1) ≤ 1/(Vχ)   (8.11)<br />
<br />
and hence that<br />
<br />
τ_int,M ≥ Vχ − 1/2 .   (8.12)<br />



Note, however, that here time is measured in hits of a single site. If we measure time<br />
in hits per site, then we conclude that<br />
<br />
τ_int,M^(HPS) ≥ χ − 1/(2V) .   (8.13)<br />
<br />
This proves a lower bound [88, 89, 90] on the dynamic critical exponent z, namely<br />
z ≥ γ/ν. (Here γ and ν are the static critical exponents for the susceptibility and<br />
correlation length, respectively. Note that by the usual static scaling law, γ/ν = 2 − η,<br />
and η for most models is very close to zero. So this argument almost makes rigorous<br />
(as a lower bound) the heuristic argument that z ≈ 2.)<br />

Virtually the same argument applies, in fact, to any reversible single-site algorithm<br />
(e.g. Metropolis). The only difference is that the property 0 ≤ P_i ≤ I is replaced by<br />
−I ≤ P_i ≤ I, so the bound on the numerator is a factor of 2 larger. Hence<br />
<br />
τ_int,M^(HPS) ≥ χ/2 − 1/(2V) .   (8.14)<br />
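The bound (8.10)-(8.11) is easy to check by exact enumeration. The sketch below (ours, not from the notes) builds the full random-site heat-bath transition matrix for a hypothetical V = 4 periodic Ising chain and verifies 1 − ρ_MM(1) ≤ 1/(Vχ); the names `states`, `p_up`, etc. are our own.

```python
import itertools
import math
import numpy as np

# Hypothetical tiny system: V = 4 Ising spins on a periodic chain, with the
# full 2^V-state random-site heat-bath transition matrix built explicitly.
V, beta = 4, 0.7
states = list(itertools.product([-1, 1], repeat=V))
w = np.array([math.exp(beta * sum(s[i] * s[(i + 1) % V] for i in range(V)))
              for s in states])
pi = w / w.sum()                                   # Boltzmann distribution

index = {s: k for k, s in enumerate(states)}
P = np.zeros((len(states), len(states)))
for k, s in enumerate(states):
    for i in range(V):                             # random site updating ...
        h = s[(i - 1) % V] + s[(i + 1) % V]        # ... local field at site i
        p_up = math.exp(beta * h) / (2.0 * math.cosh(beta * h))
        for new, prob in ((1, p_up), (-1, 1.0 - p_up)):
            t = list(s)
            t[i] = new
            P[k, index[tuple(t)]] += prob / V      # heat-bath resampling

M = np.array([float(sum(s)) for s in states])      # total magnetization
C0 = pi @ M**2 - (pi @ M)**2                       # = V * chi, eq. (8.9)
C1 = M @ (pi[:, None] * P) @ M - (pi @ M)**2
chi = C0 / V
print(1.0 - C1 / C0, 1.0 / (V * chi))              # (8.11): left <= right
```

Here C0 − C1 is exactly the numerator (M, (I − P) M), which (8.10) guarantees is at most 1.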

Example 2. Swendsen-Wang algorithm. Recall that the SW algorithm simulates the<br />
joint model (6.6) by alternately applying the conditional distributions of {n} given {σ},<br />
and {σ} given {n}; that is, by alternately generating new bond occupation variables<br />
(independent of the old ones) given the spins, and new spin variables (independent of<br />
the old ones) given the bonds. The transition matrix P_SW = P_SW({σ, n} → {σ′, n′}) is<br />
therefore a product<br />
<br />
P_SW = P_bond P_spin   (8.15)<br />
<br />
where P_bond is the conditional expectation operator E(·|{σ}), and P_spin is E(·|{n}).<br />
We shall use the variational method, with f chosen to be the bond density<br />
<br />
f = N ≡ Σ_{⟨ij⟩} n_ij .   (8.16)<br />
<br />
To lighten the notation, let us write δ_b ≡ δ_{σ_i,σ_j} for a bond b = ⟨ij⟩. We then have<br />
<br />
E(n_b | {σ}) = p_b δ_b   (8.17)<br />
<br />
E(n_b n_b′ | {σ}) = p_b p_b′ δ_b δ_b′   for b ≠ b′ .   (8.18)<br />
<br />
It follows from (8.17) that<br />
<br />
⟨n_b(t=0) n_b′(t=1)⟩ = ⟨n_b E(E(n_b′ | {σ}) | {n})⟩<br />
                     = ⟨n_b E(n_b′ | {σ})⟩<br />
                     = ⟨E(n_b | {σ}) E(n_b′ | {σ})⟩<br />
                     = p_b p_b′ ⟨δ_b δ_b′⟩ .   (8.19)<br />



The corresponding truncated (connected) correlation function is clearly<br />
<br />
⟨n_b(t=0) ; n_b′(t=1)⟩ = p_b p_b′ ⟨δ_b ; δ_b′⟩   (8.20)<br />
<br />
where ⟨A ; B⟩ ≡ ⟨AB⟩ − ⟨A⟩⟨B⟩. We have thus expressed a dynamic correlation function<br />
of the SW algorithm in terms of a static correlation function of the Potts model.<br />
Now let us compute the same quantity at time lag 0: by (8.18), for b ≠ b′ we have<br />
<br />
⟨n_b(t=0) n_b′(t=0)⟩ = p_b p_b′ ⟨δ_b δ_b′⟩   (8.21)<br />
<br />
and hence<br />
<br />
⟨n_b(t=0) ; n_b′(t=0)⟩ = p_b p_b′ ⟨δ_b ; δ_b′⟩ .   (8.22)<br />

On the other hand, for b = b′ we clearly have<br />
<br />
⟨n_b(t=0) ; n_b(t=0)⟩ = ⟨n_b⟩ − ⟨n_b⟩² = p_b ⟨δ_b⟩ − p_b² ⟨δ_b⟩² .   (8.23)<br />

Consider now the usual case in which<br />
<br />
p_b = p for b ∈ B, and p_b = 0 otherwise,   (8.24)<br />
<br />
for some family of bonds B. Combining (8.20), (8.22) and (8.23) and summing over<br />
b, b′, we obtain<br />
<br />
⟨N(t=0) ; N(t=1)⟩ = p² ⟨E ; E⟩   (8.25)<br />
<br />
⟨N(t=0) ; N(t=0)⟩ = p² ⟨E ; E⟩ − p(1−p)⟨E⟩   (8.26)<br />

where E ≡ −Σ_{b∈B} δ_b ≤ 0 is the energy. Hence the normalized autocorrelation function<br />
at time lag 1 is exactly<br />
<br />
ρ_NN(1) ≡ ⟨N(t=0) ; N(t=1)⟩ / ⟨N(t=0) ; N(t=0)⟩ = 1 − [−(1−p)ℰ] / [p C_H − (1−p)ℰ]   (8.27)<br />
<br />
where ℰ ≡ V⁻¹⟨E⟩ is the mean energy and C_H ≡ V⁻¹⟨E ; E⟩ is the specific heat (V is<br />
the volume). At criticality, p → p_crit > 0 and ℰ → ℰ_crit < 0, so<br />
<br />
ρ_NN(1) ≥ 1 − const/C_H .   (8.28)<br />

We now remark that although P_SW = P_bond P_spin is not self-adjoint, the modified<br />
transition matrix P′_SW ≡ P_spin P_bond P_spin is self-adjoint (and positive-semidefinite), and<br />
N has the same autocorrelation function for both:<br />
<br />
(N, (P_bond P_spin)^t N) = (N, (P_spin P_bond P_spin)^t N) .   (8.29)<br />



It follows that<br />
<br />
ρ_NN(t) ≥ ρ_NN(1)^|t| ≥ (1 − const/C_H)^|t|   (8.30)<br />
<br />
and hence<br />
<br />
τ_int,N ≥ const × C_H .   (8.31)<br />
<br />
This proves a lower bound [62] on the dynamic critical exponent z_SW, namely z_SW ≥ α/ν.<br />
(Here α and ν are the static critical exponents for the specific heat and correlation<br />
length, respectively.) For the two-dimensional Potts models with q = 2, 3, 4, it is<br />
known exactly (but non-rigorously!) that α/ν = 0, 2/5, 1, respectively, with multiplicative<br />
logarithmic corrections for q = 4 [91]. The bound on z_SW may conceivably be sharp (or<br />
sharp up to a logarithm) for q = 4 [62].<br />
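As a check of the algebra above (ours, not part of the original notes), the identities (8.25)-(8.26) can be verified exactly on a tiny graph by enumerating one full Swendsen-Wang step. The triangle graph, the value of beta, and all variable names below are our own toy choices, assuming NumPy.

```python
import itertools
import numpy as np

# Hypothetical tiny case: q = 2 Potts (Ising) model on a triangle, with every
# spin and bond configuration of one Swendsen-Wang step enumerated exactly.
q, beta = 2, 0.6
V, bonds = 3, [(0, 1), (1, 2), (0, 2)]
p = 1.0 - np.exp(-beta)                          # SW occupation probability

spins = list(itertools.product(range(q), repeat=V))

def delta(s):                                    # delta_b for every bond b
    return np.array([1.0 if s[i] == s[j] else 0.0 for (i, j) in bonds])

w = np.array([np.exp(beta * delta(s).sum()) for s in spins])
pz = w / w.sum()                                 # Potts marginal pi(sigma)

EN = sum(pz[a] * p * delta(s).sum() for a, s in enumerate(spins))   # <N>
NN0 = NN1 = 0.0                                  # <N(0)N(0)>, <N(0)N(1)>
for a, s in enumerate(spins):
    d = delta(s)
    for n in itertools.product([0, 1], repeat=len(bonds)):
        # P(n | sigma): independent Bernoulli(p * delta_b) bond variables
        pn = np.prod([p * d[b] if n[b] else 1.0 - p * d[b]
                      for b in range(len(bonds))])
        if pn == 0.0:
            continue
        Nn = float(sum(n))
        NN0 += pz[a] * pn * Nn**2
        # sigma' is uniform over the colorings consistent with the bonds n
        cons = [t for t in spins
                if all(not n[b] or t[i] == t[j]
                       for b, (i, j) in enumerate(bonds))]
        for t in cons:                           # E[N(1)|sigma'] = p*sum delta
            NN1 += pz[a] * pn / len(cons) * Nn * p * delta(t).sum()

E = np.array([-delta(s).sum() for s in spins])   # energy E = -sum_b delta_b
meanE = pz @ E
varE = pz @ E**2 - meanE**2                      # <E;E>
print(NN1 - EN**2, p**2 * varE)                  # both sides of (8.25)
print(NN0 - EN**2, p**2 * varE - p * (1 - p) * meanE)   # both sides of (8.26)
```

Both printed pairs agree to machine precision, because (8.25)-(8.26) are exact identities, not asymptotic statements.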

Example 3. Berretti-Sokal algorithm for SAWs. Let us consider the observable<br />
<br />
f(ω) = |ω| (= N) ,   (8.32)<br />
<br />
the total number of bonds in the walk. We have argued heuristically that τ ∼ ⟨N⟩²; we<br />
can now make this argument rigorous as a lower bound. Indeed, it suffices to use (8.7):<br />
since the maximum change of |ω| in a single time step is 1, it follows that<br />
<br />
(f, (I − P) f) ≤ 1/2 .   (8.33)<br />
<br />
On the other hand, the denominator of the Rayleigh quotient is<br />
<br />
(f, (I − Π) f) = ⟨N²⟩ − ⟨N⟩² .   (8.34)<br />
<br />
Assuming the usual scaling behavior (7.1) for the c_N, we have<br />
<br />
⟨N⟩ ≈ γ/(1 − β/β_c)   (8.35)<br />
<br />
⟨N²⟩ ≈ γ(γ+1)/(1 − β/β_c)²   (8.36)<br />
<br />
asymptotically as β ↑ β_c = μ⁻¹, so that ⟨N²⟩ − ⟨N⟩² ≈ ⟨N⟩²/γ, and hence<br />
<br />
τ_int,N ≥ const × ⟨N⟩² .   (8.37)<br />
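The logic of (8.33)-(8.37) can be illustrated on a caricature of the Berretti-Sokal chain (our own toy, not the actual self-avoiding-walk dynamics): a birth-death chain in the walk length N with stationary measure π(N) ∝ x^N and ±1 Metropolis moves. Since N changes by at most 1 per step, the same chain of inequalities gives τ_int,N ≥ 2 Var(N) − 1/2, which the sketch checks against the exact spectral value.

```python
import numpy as np

# Caricature (ours) of the Berretti-Sokal length dynamics: birth-death chain
# on N = 0..Nmax with pi(N) ~ x^N and Metropolis +/-1 proposals, so that the
# observable f(N) = N changes by at most 1 per step, as in (8.33).
Nmax, x = 80, 0.9
pi = x ** np.arange(Nmax + 1)
pi /= pi.sum()

P = np.zeros((Nmax + 1, Nmax + 1))
for N in range(Nmax + 1):
    if N < Nmax:
        P[N, N + 1] = 0.5 * min(1.0, x)          # propose +1, Metropolis accept
    if N > 0:
        P[N, N - 1] = 0.5 * min(1.0, 1.0 / x)    # propose -1
    P[N, N] = 1.0 - P[N].sum()

f = np.arange(Nmax + 1, dtype=float)
fc = f - pi @ f
var = pi @ fc**2                                 # (8.34): <N^2> - <N>^2
num = 0.5 * np.sum(pi[:, None] * P * (f[:, None] - f[None, :])**2)  # <= 1/2

# exact tau_int,N from the spectrum, as in (8.3)-(8.4)
A = np.sqrt(pi)[:, None] * P / np.sqrt(pi)[None, :]
lam, vec = np.linalg.eigh(A)
c2 = (vec.T @ (np.sqrt(pi) * fc))**2
mask = c2 > 1e-12 * c2.sum()
tau_int = 0.5 * np.sum(c2[mask] * (1 + lam[mask]) / (1 - lam[mask])) / c2.sum()
print(num, var, tau_int, 2.0 * var - 0.5)        # tau_int dominates the bound
```

The bound 2 Var(N) − 1/2 follows by inserting 1 − ρ(1) = num/var ≤ 1/(2 Var) into the Jensen inequality (8.4).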

A very different approach to proving lower bounds on the autocorrelation time τ_exp<br />
is the minimum hitting-time argument [81]. Consider a Markov chain with transition<br />
matrix P satisfying detailed balance for some probability measure π. If A and B are<br />
subsets of the state space S, let T_AB be the minimum time for getting from A to B with<br />
nonzero probability, i.e.<br />
<br />
T_AB ≡ min{n : p^(n)_xy > 0 for some x ∈ A, y ∈ B} .   (8.38)<br />
<br />
Then the theorem asserts that if T_AB is large and this is not "justified" by the rarity of<br />
A and/or B in the equilibrium distribution π, then the autocorrelation time τ_exp must<br />
be large. More precisely:<br />



Theorem 2  Consider a Markov chain with transition matrix P satisfying detailed balance<br />
for the probability measure π. Let T_AB be defined as in (8.38). Then<br />
<br />
τ_exp ≥ sup_{A,B⊂S} 2(T_AB − 1) / [− log(π(A) π(B))] .<br />
<br />
Proof. Let A, B ⊂ S, and let n < T_AB …<br />
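To see the theorem in action, here is a numerical sketch of ours (not from the notes), assuming NumPy: for a lazy nearest-neighbour walk on {0, …, L} with A = {0} and B = {L}, the minimum hitting time is T_AB = L, and the theorem's ratio must lower-bound τ_exp, computed here from the second-largest eigenvalue modulus.

```python
import numpy as np

# Hypothetical chain: lazy nearest-neighbour walk on {0,...,L}; P is symmetric,
# so the uniform measure is invariant and the chain is reversible.
L = 20
P = np.zeros((L + 1, L + 1))
for x in range(L + 1):
    for y in (x - 1, x + 1):
        if 0 <= y <= L:
            P[x, y] = 0.25
    P[x, x] = 1.0 - P[x].sum()          # lazy holding probability
pi = np.full(L + 1, 1.0 / (L + 1))

# T_AB = min{n : p^(n)_{xy} > 0 for some x in A, y in B}, computed directly
Q, T_AB = np.eye(L + 1), None
for n in range(1, 5 * L):
    Q = Q @ P
    if Q[0, L] > 0.0:
        T_AB = n
        break

bound = 2 * (T_AB - 1) / (-np.log(pi[0] * pi[L]))
lam2 = np.sort(np.abs(np.linalg.eigvalsh(P)))[-2]   # second-largest |eigenvalue|
tau_exp = -1.0 / np.log(lam2)
print(T_AB, bound, tau_exp)
```

The bound here is far from sharp (it is O(L/log L) against a relaxation time of order L²), but it is rigorous and needs no spectral information at all.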


As noted at the beginning of this lecture, it is much more difficult to prove upper<br />
bounds on the autocorrelation time (or even to prove that τ_exp < ∞), and few<br />
nontrivial results have been obtained so far.<br />
<br />
The only general method (to my knowledge) for proving upper bounds on the autocorrelation<br />
time is Cheeger's inequality [77, 92, 93], the basic idea of which is to search<br />
for "bottlenecks" in the state space.^40 Consider first the rate of probability flow, in<br />
the stationary Markov chain, from a set A to its complement A^c, normalized by the<br />
invariant probabilities of A and A^c:<br />
<br />
k(A) ≡ [Σ_{x∈A, y∈A^c} π_x p_xy] / [π(A) π(A^c)] = (χ_A, P χ_{A^c})_{l²(π)} / [π(A) π(A^c)] .   (8.45)<br />
<br />
Now look for the worst such decomposition into A and A^c:<br />
<br />
k ≡ inf_{A: 0 < π(A) < 1} k(A) .   (8.46)<br />
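A brute-force illustration of (8.45)-(8.46) (our own toy, assuming NumPy): for a random walk on two triangles joined by one weak edge, the bottleneck ratio k can be found by enumerating all subsets A. The easy half of the bottleneck idea, gap ≤ k(A), is just the variational bound (8.6)-(8.7) with the indicator χ_A as trial observable; the converse bound (Cheeger's inequality proper) is the nontrivial direction.

```python
import itertools
import numpy as np

# Toy reversible chain with an obvious bottleneck: random walk on two
# triangles {0,1,2} and {3,4,5} joined by one weak edge (2,3).
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.01                      # the bottleneck edge
deg = W.sum(axis=1)
P = W / deg[:, None]                          # random walk on the graph
pi = deg / deg.sum()                          # reversible for this P

sym = np.sqrt(pi)[:, None] * P / np.sqrt(pi)[None, :]
gap = 1.0 - np.sort(np.linalg.eigvalsh(sym))[-2]   # spectral gap 1 - lambda_2

ks = []                                       # k(A) for every proper subset A
for r in range(1, 6):
    for A in itertools.combinations(range(6), r):
        Ac = [x for x in range(6) if x not in A]
        flow = sum(pi[x] * P[x, y] for x in A for y in Ac)
        ks.append(flow / (pi[list(A)].sum() * pi[Ac].sum()))
k = min(ks)
print(gap, k)      # gap <= k(A) for every A, hence gap <= k
```

The optimal A is of course one of the two triangles, and the tiny value of k there is what forces the long relaxation time.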


space is a tree: then A can always be chosen so that it is connected to A^c by a single<br />
bond. Using this fact, Lawler and Sokal [77] used Cheeger's inequality to prove<br />
<br />
τ′_exp ≤ const × ⟨N⟩^{2γ}   (8.48)<br />
<br />
for the Berretti-Sokal algorithm for SAWs.<br />

A very different proof of an upper bound on τ′_exp in the Berretti-Sokal algorithm was<br />
given by Sokal and Thomas [10]. Their method is to study in detail the exponential<br />
moments of the hitting times from an arbitrary walk ω to the empty walk. Using a<br />
sequence of identities, they are able to write an algebraic inequality for such a moment<br />
in terms of itself; this inequality says roughly that the moment is either small or huge<br />
(where "huge" includes the possibility of +∞) but cannot lie in-between (there is a<br />
"forbidden interval"). Then, by a continuity argument, they are able to rule out the<br />
possibility that the moment is huge. So it must be small. The final result is<br />
<br />
τ′_exp ≤ const × ⟨N⟩^{1+γ} ,   (8.49)<br />
<br />
slightly better than the Lawler-Sokal bound.^41<br />

For spin models and lattice field theories, almost nothing is known about upper<br />
bounds on the autocorrelation time, except at high temperature. For the single-site<br />
heat-bath dynamics, it is easy to show that τ_exp < ∞ (uniformly in the volume) above<br />
the Dobrushin uniqueness temperature; indeed, this is precisely what the standard proof<br />
of the Dobrushin uniqueness theorem [94, 95] does. One expects the same result to hold<br />
for all temperatures above critical, but this remains an open problem, despite recent<br />
progress by Aizenman and Holley [96].^42<br />

Acknowledgments<br />
<br />
Many of the ideas reported here have grown out of joint work with my colleagues<br />
Alberto Berretti, Frank Brown, Sergio Caracciolo, Robert Edwards, Jonathan Goodman,<br />
Tony Guttmann, Greg Lawler, Xiao-Jian Li, Neal Madras, Andrea Pelissetto, Larry<br />
Thomas and Dan Zwanziger. I thank them for many pleasant and fruitful collaborations.<br />
In addition, I would like to thank Mal Kalos for introducing me to Monte Carlo work<br />
and for patiently answering my first naive questions.<br />

The research of the author was supported in part by the U.S. National Science Foundation<br />
grants DMS-8705599 and DMS-8911273, the U.S. Department of Energy contract<br />
DE-FG02-90ER40581, the John von Neumann National Supercomputer Center†<br />
<br />
^41 Note Added 1996: See also [137], which recovers the bound (8.49) by a much simpler argument<br />
using the Poincaré inequality [131].<br />
<br />
^42 Note Added 1996: There has been great progress on this problem in recent years: see [138, 139,<br />
140, 141, 142, 143] and references cited therein.<br />
<br />
† Deceased.<br />



grant NAC-705, and the Pittsburgh Supercomputing Center grant PHY890035P. Acknowledgment<br />
is also made to the donors of the Petroleum Research Fund, administered<br />
by the American Chemical Society, for partial support of this research.<br />

References<br />

[1] N. Bakhvalov, Méthodes Numériques (Mir, Moscow, 1976), Chapter 5.<br />
[2] J.M. Hammersley and D.C. Handscomb, Monte Carlo Methods (Methuen, London, 1964), Chapter 5.<br />
[3] W.W. Wood and J.J. Erpenbeck, Ann. Rev. Phys. Chem. 27, 319 (1976).<br />
[4] J.G. Kemeny and J.L. Snell, Finite Markov Chains (Springer, New York, 1976).<br />
[5] M. Iosifescu, Finite Markov Processes and Their Applications (Wiley, Chichester, 1980).<br />
[6] K.L. Chung, Markov Chains with Stationary Transition Probabilities, 2nd ed. (Springer, New York, 1967).<br />
[7] E. Nummelin, General Irreducible Markov Chains and Non-Negative Operators (Cambridge Univ. Press, Cambridge, 1984).<br />
[8] D. Revuz, Markov Chains (North-Holland, Amsterdam, 1975).<br />
[9] Z. Sidak, Czechoslovak Math. J. 14, 438 (1964).<br />
[10] A.D. Sokal and L.E. Thomas, J. Stat. Phys. 54, 797 (1989).<br />
[11] P.C. Hohenberg and B.I. Halperin, Rev. Mod. Phys. 49, 435 (1977).<br />
[12] S. Caracciolo and A.D. Sokal, J. Phys. A19, L797 (1986).<br />
[13] D. Goldsman, Ph.D. thesis, School of Operations Research and Industrial Engineering, Cornell University (1984); L. Schruben, Oper. Res. 30, 569 (1982) and 31, 1090 (1983).<br />
[14] M.B. Priestley, Spectral Analysis and Time Series, 2 vols. (Academic, London, 1981), Chapters 5-7.<br />
[15] T.W. Anderson, The Statistical Analysis of Time Series (Wiley, New York, 1971).<br />
[16] N. Madras and A.D. Sokal, J. Stat. Phys. 50, 109 (1988).<br />



[17] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller, J. Chem. Phys. 21, 1087 (1953).<br />
[18] W.K. Hastings, Biometrika 57, 97 (1970).<br />
[19] J. Goodman and N. Madras, Lin. Alg. Appl. 216, 61 (1995).<br />
[20] J. Goodman and A.D. Sokal, Phys. Rev. D40, 2035 (1989).<br />
[21] G.F. Mazenko and O.T. Valls, Phys. Rev. B24, 1419 (1981); C. Kalle, J. Phys. A17, L801 (1984); J.K. Williams, J. Phys. A18, 49 (1985); R.B. Pearson, J.L. Richardson and D. Toussaint, Phys. Rev. B31, 4472 (1985); S. Wansleben and D.P. Landau, J. Appl. Phys. 61, 3968 (1987).<br />
[22] M.N. Barber, in Phase Transitions and Critical Phenomena, vol. 8, ed. C. Domb and J.L. Lebowitz (Academic Press, London, 1983).<br />
[23] K. Binder, J. Comput. Phys. 59, 1 (1985).<br />
[24] G. Parisi, in Progress in Gauge Field Theory (1983 Cargèse lectures), ed. G. 't Hooft et al. (Plenum, New York, 1984); G.G. Batrouni, G.R. Katz, A.S. Kronfeld, G.P. Lepage, B. Svetitsky and K.G. Wilson, Phys. Rev. D32, 2736 (1985); C. Davies, G. Batrouni, G. Katz, A. Kronfeld, P. Lepage, P. Rossi, B. Svetitsky and K. Wilson, J. Stat. Phys. 43, 1073 (1986); J.B. Kogut, Nucl. Phys. B275 [FS17], 1 (1986); S. Duane, R. Kenway, B.J. Pendleton and D. Roweth, Phys. Lett. 176B, 143 (1986); E. Dagotto and J.B. Kogut, Phys. Rev. Lett. 58, 299 (1987) and Nucl. Phys. B290 [FS20], 451 (1987).<br />
[25] J. Goodman and A.D. Sokal, Phys. Rev. Lett. 56, 1015 (1986).<br />
[26] R.G. Edwards, J. Goodman and A.D. Sokal, Nucl. Phys. B354, 289 (1991).<br />
[27] R.H. Swendsen and J.-S. Wang, Phys. Rev. Lett. 58, 86 (1987).<br />
[28] R.P. Fedorenko, Zh. Vychisl. i Mat. Fiz. 4, 559 (1964) [USSR Comput. Math. and Math. Phys. 4, 227 (1964)].<br />
[29] L.P. Kadanoff, Physics 2, 263 (1966).<br />
[30] K.G. Wilson, Phys. Rev. B4, 3174, 3184 (1971).<br />
[31] A. Brandt, in Proceedings of the Third International Conference on Numerical Methods in Fluid Mechanics (Paris, July 1972), ed. H. Cabannes and R. Temam (Lecture Notes in Physics #18) (Springer, Berlin, 1973); R.A. Nicolaides, J. Comput. Phys. 19, 418 (1975) and Math. Comp. 31, 892 (1977); A. Brandt, Math. Comp. 31, 333 (1977); W. Hackbusch, in Numerical Treatment of Differential Equations (Oberwolfach, July 1976), ed. R. Bulirsch, R.D. Grigorieff and J. Schröder (Lecture Notes in Mathematics #631) (Springer, Berlin, 1978).<br />



[32] W. Hackbusch, Multi-Grid Methods and Applications (Springer, Berlin, 1985).<br />
[33] W. Hackbusch and U. Trottenberg, editors, Multigrid Methods (Lecture Notes in Mathematics #960) (Springer, Berlin, 1982); Proceedings of the First Copper Mountain Conference on Multigrid Methods, ed. S. McCormick and U. Trottenberg, Appl. Math. Comput. 13, no. 3-4, 213-470 (1983); D.J. Paddon and H. Holstein, editors, Multigrid Methods for Integral and Differential Equations (Clarendon Press, Oxford, 1985); Proceedings of the Second Copper Mountain Conference on Multigrid Methods, ed. S. McCormick, Appl. Math. Comput. 19, no. 1-4, 1-372 (1986); W. Hackbusch and U. Trottenberg, editors, Multigrid Methods II (Lecture Notes in Mathematics #1228) (Springer, Berlin, 1986); S.F. McCormick, editor, Multigrid Methods (SIAM, Philadelphia, 1987); Proceedings of the Third Copper Mountain Conference on Multigrid Methods, ed. S. McCormick (Dekker, New York, 1988).<br />
[34] W.L. Briggs, A Multigrid Tutorial (SIAM, Philadelphia, 1987).<br />
[35] R.S. Varga, Matrix Iterative Analysis (Prentice-Hall, Englewood Cliffs, N.J., 1962).<br />
[36] H.R. Schwarz, H. Rutishauser and E. Stiefel, Numerical Analysis of Symmetric Matrices (Prentice-Hall, Englewood Cliffs, N.J., 1973).<br />
[37] A.M. Ostrowski, Rendiconti Mat. e Appl. 13, 140 (1954) at p. 141n [A. Ostrowski, Collected Mathematical Papers, vol. 1 (Birkhäuser, Basel, 1983), p. 169n].<br />
[38] J.M. Ortega and W.C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables (Academic Press, New York-London, 1970).<br />
[39] S.F. McCormick and J. Ruge, Math. Comp. 41, 43 (1983).<br />
[40] J. Mandel, S. McCormick and R. Bank, in Multigrid Methods, ed. S.F. McCormick (SIAM, Philadelphia, 1987), Chapter 5.<br />
[41] J. Goodman and A.D. Sokal, unpublished.<br />
[42] R.G. Edwards, S.J. Ferreira, J. Goodman and A.D. Sokal, Nucl. Phys. B380, 621 (1992).<br />
[43] S.L. Adler, Phys. Rev. Lett. 60, 1243 (1988).<br />
[44] S.L. Adler, Phys. Rev. D23, 2091 (1981).<br />
[45] C. Whitmer, Phys. Rev. D29, 306 (1984).<br />
[46] E. Nelson, in Constructive Quantum Field Theory, ed. G. Velo and A. Wightman (Lecture Notes in Physics #25) (Springer, Berlin, 1973).<br />



[47] B. Simon, The P(φ)₂ Euclidean (Quantum) Field Theory (Princeton Univ. Press, Princeton, N.J., 1974), Chapter I.<br />
[48] R.B. Potts, Proc. Camb. Phil. Soc. 48, 106 (1952).<br />
[49] F.Y. Wu, Rev. Mod. Phys. 54, 235 (1982); 55, 315 (E) (1983).<br />
[50] K. Krickeberg, Probability Theory (Addison-Wesley, Reading, Mass., 1965), Sections 4.2 and 4.5.<br />
[51] R.H. Schonmann, J. Stat. Phys. 52, 61 (1988).<br />
[52] A.D. Sokal, unpublished.<br />
[53] E.M. Reingold, J. Nievergelt and N. Deo, Combinatorial Algorithms: Theory and Practice (Prentice-Hall, Englewood Cliffs, N.J., 1977), Chapter 8; A. Gibbons, Algorithmic Graph Theory (Cambridge University Press, 1985), Chapter 1; S. Even, Graph Algorithms (Computer Science Press, Potomac, Maryland, 1979), Chapter 3.<br />
[54] B.A. Galler and M.J. Fischer, Commun. ACM 7, 301, 506 (1964); D.E. Knuth, The Art of Computer Programming, vol. 1, 2nd ed. (Addison-Wesley, Reading, Massachusetts, 1973), pp. 353-355, 360, 572; J. Hoshen and R. Kopelman, Phys. Rev. B14, 3438 (1976).<br />
[55] K. Binder and D. Stauffer, in Applications of the Monte Carlo Method in Statistical Physics, ed. K. Binder (Springer-Verlag, Berlin, 1984), section 8.4; D. Stauffer, Introduction to Percolation Theory (Taylor & Francis, London, 1985).<br />
[56] Y. Shiloach and U. Vishkin, J. Algorithms 3, 57 (1982).<br />
[57] R.G. Edwards, X.-J. Li and A.D. Sokal, Sequential and vectorized algorithms for computing the connected components of an undirected graph, unpublished (and uncompleted) notes.<br />
[58] P.W. Kasteleyn and C.M. Fortuin, J. Phys. Soc. Japan 26 (Suppl.), 11 (1969); C.M. Fortuin and P.W. Kasteleyn, Physica 57, 536 (1972); C.M. Fortuin, Physica 58, 393 (1972); C.M. Fortuin, Physica 59, 545 (1972).<br />
[59] C.-K. Hu, Phys. Rev. B29, 5103 (1984); T.A. Larsson, J. Phys. A19, 2383 (1986) and A20, 2239 (1987).<br />
[60] R.G. Edwards and A.D. Sokal, Phys. Rev. D38, 2009 (1988).<br />
[61] A.D. Sokal, in Computer Simulation Studies in Condensed Matter Physics: Recent Developments (Athens, Georgia, 15-26 February 1988), ed. D.P. Landau, K.K. Mon and H.-B. Schüttler (Springer-Verlag, Berlin-Heidelberg, 1988).<br />



[62] X.-J. Li and A.D. Sokal, Phys. Rev. Lett. 63, 827 (1989).<br />
[63] W. Klein, T. Ray and P. Tamayo, Phys. Rev. Lett. 62, 163 (1989).<br />
[64] T.S. Ray, P. Tamayo and W. Klein, Phys. Rev. A39, 5949 (1989).<br />
[65] U. Wolff, Phys. Rev. Lett. 62, 361 (1989).<br />
[66] U. Wolff, Phys. Lett. B228, 379 (1989); P. Tamayo, R.C. Brower and W. Klein, J. Stat. Phys. 58, 1083 (1990) and 60, 889 (1990).<br />
[67] D. Kandel, E. Domany, D. Ron, A. Brandt and E. Loh, Phys. Rev. Lett. 60, 1591 (1988); D. Kandel, E. Domany and A. Brandt, Phys. Rev. B40, 330 (1989).<br />
[68] R.G. Edwards and A.D. Sokal, Swendsen-Wang algorithm augmented by duality transformations for the two-dimensional Potts model, in preparation.<br />
[69] R. Ben-Av, D. Kandel, E. Katznelson, P.G. Lauwers and S. Solomon, J. Stat. Phys. 58, 125 (1990); R.C. Brower and S. Huang, Phys. Rev. B41, 708 (1990).<br />
[70] F. Niedermayer, Phys. Rev. Lett. 61, 2026 (1988).<br />
[71] R.C. Brower and P. Tamayo, Phys. Rev. Lett. 62, 1087 (1989).<br />
[72] U. Wolff, Nucl. Phys. B322, 759 (1989); Phys. Lett. B222, 473 (1989); Nucl. Phys. B334, 581 (1990).<br />
[73] R.G. Edwards and A.D. Sokal, Phys. Rev. D40, 1374 (1989).<br />
[74] P.G. de Gennes, Scaling Concepts in Polymer Physics (Cornell Univ. Press, Ithaca, NY, 1979).<br />
[75] C. Aragão de Carvalho, S. Caracciolo and J. Fröhlich, Nucl. Phys. B215 [FS7], 209 (1983).<br />
[76] A. Berretti and A.D. Sokal, J. Stat. Phys. 40, 483 (1985).<br />
[77] G. Lawler and A.D. Sokal, Trans. Amer. Math. Soc. 309, 557 (1988).<br />
[78] G. Lawler and A.D. Sokal, in preparation.<br />
[79] B. Berg and D. Foerster, Phys. Lett. 106B, 323 (1981).<br />
[80] C. Aragão de Carvalho and S. Caracciolo, J. Physique 44, 323 (1983).<br />
[81] A.D. Sokal and L.E. Thomas, J. Stat. Phys. 51, 907 (1988).<br />
[82] S. Caracciolo, A. Pelissetto and A.D. Sokal, J. Stat. Phys. 60, 1 (1990).<br />



[83] M. Delbrück, in Mathematical Problems in the Biological Sciences (Proc. Symp. Appl. Math., vol. 14), ed. R.E. Bellman (American Math. Soc., Providence RI, 1962).

[84] P.H. Verdier and W.H. Stockmayer, J. Chem. Phys. 36, 227 (1962).

[85] N. Madras and A.D. Sokal, J. Stat. Phys. 47, 573 (1987).

[86] M. Lal, Molec. Phys. 17, 57 (1969).

[87] B. MacDonald et al., J. Phys. A18, 2627 (1985).

[88] K. Kawasaki, Phys. Rev. 148, 375 (1966).

[89] B.I. Halperin, Phys. Rev. B8, 4437 (1973).

[90] R. Alicki, M. Fannes and A. Verbeure, J. Stat. Phys. 41, 263 (1985).

[91] M.P.M. den Nijs, J. Phys. A12, 1857 (1979); J.L. Black and V.J. Emery, Phys. Rev. B23, 429 (1981); B. Nienhuis, J. Stat. Phys. 34, 731 (1984).

[92] A. Sinclair and M. Jerrum, Information and Computation 82, 93 (1989).

[93] R. Brooks, P. Doyle and R. Kohn, in preparation.

[94] L.N. Vasershtein, Prob. Inform. Transm. 5, 64 (1969).

[95] O.E. Lanford III, in Statistical Mechanics and Mathematical Problems (Lecture Notes in Physics #20), ed. A. Lenard (Springer, Berlin, 1973).

[96] M. Aizenman and R. Holley, in Percolation Theory and Ergodic Theory of Infinite Particle Systems, ed. H. Kesten (Springer, New York, 1987).

ADDED BIBLIOGRAPHY (DECEMBER 1996):

[97] M. Lüscher, P. Weisz and U. Wolff, Nucl. Phys. B359, 221 (1991).

[98] J.-K. Kim, Phys. Rev. Lett. 70, 1735 (1993); Nucl. Phys. B (Proc. Suppl.) 34, 702 (1994); Phys. Rev. D50, 4663 (1994); Europhys. Lett. 28, 211 (1994); Phys. Lett. B345, 469 (1995).

[99] S. Caracciolo, R.G. Edwards, S.J. Ferreira, A. Pelissetto and A.D. Sokal, Phys. Rev. Lett. 74, 2969 (1995).

[100] S.J. Ferreira and A.D. Sokal, Phys. Rev. B51, 6720 (1995).

[101] S. Caracciolo, R.G. Edwards, A. Pelissetto and A.D. Sokal, Phys. Rev. Lett. 75, 1891 (1995).



[102] G. Mana, A. Pelissetto and A.D. Sokal, Phys. Rev. D54, R1252 (1996).

[103] G. Mana, A. Pelissetto and A.D. Sokal, hep-lat/9610021, submitted to Phys. Rev. D.

[104] C.-R. Hwang and S.-J. Sheu, in Stochastic Models, Statistical Methods, and Algorithms in Image Analysis (Lecture Notes in Statistics #74), ed. P. Barone, A. Frigessi and M. Piccioni (Springer, Berlin, 1992).

[105] B. Gidas, private communication (1991).

[106] T. Mendes and A.D. Sokal, Phys. Rev. D53, 3438 (1996).

[107] G. Mana, T. Mendes, A. Pelissetto and A.D. Sokal, Nucl. Phys. B (Proc. Suppl.) 47, 796 (1996).

[108] T. Mendes, A. Pelissetto and A.D. Sokal, Nucl. Phys. B477, 203 (1996).

[109] A.D. Sokal, Nucl. Phys. B (Proc. Suppl.) 20, 55 (1991).

[110] J.-S. Wang and R.H. Swendsen, Physica A167, 565 (1990).

[111] C.F. Baillie and P.D. Coddington, Phys. Rev. Lett. 68, 962 (1992) and private communication.

[112] J. Salas and A.D. Sokal, J. Stat. Phys. 85, 297 (1996).

[113] J. Salas and A.D. Sokal, Dynamic critical behavior of the Swendsen-Wang algorithm: The two-dimensional 3-state Potts model revisited, hep-lat/9605018, J. Stat. Phys. 87, no. 1/2 (April 1997).

[114] J. Salas and A.D. Sokal, Logarithmic corrections and finite-size scaling in the two-dimensional 4-state Potts model, hep-lat/9607030, submitted to J. Stat. Phys.

[115] G.M. Torrie and J.P. Valleau, Chem. Phys. Lett. 28, 578 (1974); J. Comput. Phys. 23, 187 (1977); J. Chem. Phys. 66, 1402 (1977).

[116] B.A. Berg and T. Neuhaus, Phys. Lett. B267, 249 (1991); Phys. Rev. Lett. 68, 9 (1992).

[117] B.A. Berg, Int. J. Mod. Phys. C4, 249 (1993).

[118] A.D. Kennedy, Nucl. Phys. B (Proc. Suppl.) 30, 96 (1993).

[119] W. Janke, in Computer Simulations in Condensed Matter Physics VII, eds. D.P. Landau, K.K. Mon and H.-B. Schüttler (Springer-Verlag, Berlin, 1994).

[120] S. Wiseman and E. Domany, Phys. Rev. E 48, 4080 (1993).



[121] D. Kandel, R. Ben-Av and E. Domany, Phys. Rev. Lett. 65, 941 (1990) and Phys. Rev. B45, 4700 (1992).

[122] J.-S. Wang, R.H. Swendsen and R. Kotecký, Phys. Rev. Lett. 63, 109 (1989).

[123] J.-S. Wang, R.H. Swendsen and R. Kotecký, Phys. Rev. B42, 2465 (1990).

[124] M. Lubin and A.D. Sokal, Phys. Rev. Lett. 71, 1778 (1993).

[125] X.-J. Li and A.D. Sokal, Phys. Rev. Lett. 67, 1482 (1991).

[126] S. Caracciolo, R.G. Edwards, A. Pelissetto and A.D. Sokal, Nucl. Phys. B403, 475 (1993).

[127] A.D. Sokal, in Monte Carlo and Molecular Dynamics Simulations in Polymer Science, ed. K. Binder (Oxford University Press, New York, 1995).

[128] A.D. Sokal, Nucl. Phys. B (Proc. Suppl.) 47, 172 (1996).

[129] B. Li, N. Madras and A.D. Sokal, J. Stat. Phys. 80, 661 (1995).

[130] A.D. Sokal, Europhys. Lett. 27, 661 (1994).

[131] P. Diaconis and D. Stroock, Ann. Appl. Prob. 1, 36 (1991).

[132] A.J. Sinclair, Randomised Algorithms for Counting and Generating Combinatorial Structures (Birkhäuser, Boston, 1993).

[133] P. Diaconis and L. Saloff-Coste, Ann. Appl. Prob. 3, 696 (1993).

[134] P. Diaconis and L. Saloff-Coste, Ann. Appl. Prob. 6, 695 (1996).

[135] P. Diaconis and L. Saloff-Coste, J. Theoret. Prob. 9, 459 (1996).

[136] M. Jerrum and A. Sinclair, in Approximation Algorithms for NP-Hard Problems, ed. D.S. Hochbaum (PWS Publishing, Boston, 1996).

[137] D. Randall and A.J. Sinclair, in Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms (ACM Press, New York, 1994).

[138] D.W. Stroock and B. Zegarlinski, J. Funct. Anal. 104, 299 (1992).

[139] D.W. Stroock and B. Zegarlinski, Commun. Math. Phys. 144, 303 (1992).

[140] D.W. Stroock and B. Zegarlinski, Commun. Math. Phys. 149, 175 (1992).

[141] S.L. Lu and H.T. Yau, Commun. Math. Phys. 156, 399 (1993).

[142] F. Martinelli and E. Olivieri, Commun. Math. Phys. 161, 447, 487 (1994).

[143] D.W. Stroock and B. Zegarlinski, J. Stat. Phys. 81, 1007 (1995).

