Monte Carlo Methods in Statistical Mechanics:
Foundations and New Algorithms

Alan D. Sokal
Department of Physics
New York University
4 Washington Place
New York, NY 10003
USA
E-mail: SOKAL@NYU.EDU

Lectures at the Cargèse Summer School on
"Functional Integration: Basics and Applications"
September 1996

These notes are an updated version of lectures given at the Cours de Troisième Cycle de la Physique en Suisse Romande (Lausanne, Switzerland) in June 1989. We thank the Troisième Cycle de la Physique en Suisse Romande and Professor Michel Droz for kindly giving permission to reprint these notes.
Note to the Reader

The following notes are based on my course "Monte Carlo Methods in Statistical Mechanics: Foundations and New Algorithms" given at the Cours de Troisième Cycle de la Physique en Suisse Romande (Lausanne, Switzerland) in June 1989, and on my course "Multi-Grid Monte Carlo for Lattice Field Theories" given at the Winter College on Multilevel Techniques in Computational Physics (Trieste, Italy) in January-February 1991.

The reader is warned that some of this material is out-of-date (this is particularly true as regards reports of numerical work). For lack of time, I have made no attempt to update the text, but I have added footnotes marked "Note Added 1996" that correct a few errors and give additional bibliography.

My first two lectures at Cargèse 1996 were based on the material included here. My third lecture described the new finite-size-scaling extrapolation method of [97, 98, 99, 100, 101, 102, 103].
1 Introduction

The goal of these lectures is to give an introduction to current research on Monte Carlo methods in statistical mechanics and quantum field theory, with an emphasis on:

1) the conceptual foundations of the method, including the possible dangers and misuses, and the correct use of statistical error analysis; and

2) new Monte Carlo algorithms for problems in critical phenomena and quantum field theory, aimed at reducing or eliminating the "critical slowing-down" found in conventional algorithms.

These lectures are aimed at a mixed audience of theoretical, computational and mathematical physicists — some of whom are currently doing or want to do Monte Carlo studies themselves, others of whom want to be able to evaluate the reliability of published Monte Carlo work.

Before embarking on 9 hours of lectures on Monte Carlo methods, let me offer a warning:

Monte Carlo is an extremely bad method; it should be used only when all alternative methods are worse.

Why is this so? Firstly, all numerical methods are potentially dangerous, compared to analytic methods; there are more ways to make mistakes. Secondly, as numerical methods go, Monte Carlo is one of the least efficient; it should be used only on those intractable problems for which all other numerical methods are even less efficient.
Let me be more precise about this latter point. Virtually all Monte Carlo methods have the property that the statistical error behaves as

    error ∼ 1/√(computational budget)

(or worse); this is an essentially universal consequence of the central limit theorem. It may be possible to improve the proportionality constant in this relation by a factor of 10^6 or more — this is one of the principal subjects of these lectures — but the overall 1/√n behavior is inescapable. This should be contrasted with traditional deterministic numerical methods, whose rate of convergence is typically something like 1/n^4 or e^{-n} or e^{-2n}. Therefore:

Monte Carlo methods should be used only on those extremely difficult problems in which all alternative numerical methods behave even worse than 1/√n.
Consider, for example, the problem of numerical integration in d dimensions, and let us compare Monte Carlo integration with a traditional deterministic method such as Simpson's rule. As is well known, the error in Simpson's rule with n nodal points behaves asymptotically as n^{-4/d} (for smooth integrands). In low dimension this is far faster than the Monte Carlo rate n^{-1/2}, but in high dimension (d > 8) it is much worse. So it is not surprising that Monte Carlo is the method of choice for performing high-dimensional integrals. It is still a bad method: with an error proportional to n^{-1/2}, it is difficult to achieve more than 4 or 5 digits accuracy. But numerical integration in high dimension is very difficult; though Monte Carlo is bad, all other known methods are worse.^1
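As an illustrative sketch (my own addition, not part of the original notes), the n^{-1/2} behavior can be observed directly by estimating a simple high-dimensional integral with plain Monte Carlo; the integrand, dimension, and sample sizes below are arbitrary choices:

```python
import random

def mc_integrate(f, d, n, rng):
    """Plain Monte Carlo estimate of the integral of f over the unit cube [0,1]^d."""
    total = 0.0
    for _ in range(n):
        x = [rng.random() for _ in range(d)]
        total += f(x)
    return total / n

def f(x):
    # Integrand: sum_i x_i^2 over [0,1]^10; the exact integral is 10/3.
    return sum(xi * xi for xi in x)

rng = random.Random(42)
for n in (100, 10000, 100000):
    est = mc_integrate(f, d=10, n=n, rng=rng)
    print(n, est, abs(est - 10.0 / 3.0))
```

Each hundredfold increase in n buys roughly one extra digit of accuracy, exactly the slow 1/√n convergence described above.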
In summary, Monte Carlo methods should be used only when neither analytic methods nor deterministic numerical methods are workable (or efficient). One general domain of application of Monte Carlo methods will be, therefore, to systems with many degrees of freedom, far from the perturbative regime. But such systems are precisely the ones of greatest interest in statistical mechanics and quantum field theory!

It is appropriate to close this introduction with a general methodological observation, ably articulated by Wood and Erpenbeck [3]:

    ...these [Monte Carlo] investigations share some of the features of ordinary experimental work, in that they are susceptible to both statistical and systematic errors. With regard to these matters, we believe that papers should meet much the same standards as are normally required for experimental investigations. We have in mind the inclusion of estimates of statistical error, descriptions of experimental conditions (i.e. parameters of the calculation),
^1 This discussion of numerical integration is grossly oversimplified. Firstly, there are deterministic methods better than Simpson's rule; and there are also sophisticated Monte Carlo methods whose asymptotic behavior (on smooth integrands) behaves as n^{-p} with p strictly greater than 1/2 [1, 2]. Secondly, for all these algorithms (except standard Monte Carlo), the asymptotic behavior as n → ∞ may be irrelevant in practice, because it is achieved only at ridiculously large values of n. For example, to carry out Simpson's rule with even 10 nodes per axis (a very coarse mesh) requires n = 10^d, which is unachievable for d > 10.
    relevant details of apparatus (program) design, comparisons with previous investigations, discussion of systematic errors, etc. Only if these are provided will the results be trustworthy guides to improved theoretical understanding.
2 Dynamic Monte Carlo Methods: General Theory

All Monte Carlo work has the same general structure: given some probability measure π on some configuration space S, we wish to generate many random samples from π. How is this to be done?

Monte Carlo methods can be classified as static or dynamic. Static methods are those that generate a sequence of statistically independent samples from the desired probability distribution π. These techniques are widely used in Monte Carlo numerical integration in spaces of not-too-high dimension [2]. But they are unfeasible for most applications in statistical physics and quantum field theory, where π is the Gibbs measure of some rather complicated system (extremely many coupled degrees of freedom).

The idea of dynamic Monte Carlo methods is to invent a stochastic process with state space S having π as its unique equilibrium distribution. We then simulate this stochastic process on the computer, starting from an arbitrary initial configuration; once the system has reached equilibrium, we measure time averages, which converge (as the run time tends to infinity) to π-averages. In physical terms, we are inventing a stochastic time evolution for the given system. Let us emphasize, however, that this time evolution need not correspond to any real "physical" dynamics: rather, the dynamics is simply a numerical algorithm, and it is to be chosen, like all numerical algorithms, on the basis of its computational efficiency.
In practice, the stochastic process is always taken to be a Markov process. So our first order of business is to review some of the general theory of Markov chains.^2 For simplicity let us assume that the state space S is discrete (i.e. finite or countably infinite). Much of the theory for general state space can be guessed from the discrete theory by making the obvious replacements (sums by integrals, matrices by kernels), although the proofs tend to be considerably harder.

Loosely speaking, a Markov chain (= discrete-time Markov process) with state space S is a sequence of S-valued random variables X_0, X_1, X_2, ... such that successive transitions X_t → X_{t+1} are statistically independent ("the future depends on the past only through the present"). More precisely, a Markov chain is specified by two ingredients:

^2 The books of Kemeny and Snell [4] and Iosifescu [5] are excellent references on the theory of Markov chains with finite state space. At a somewhat higher mathematical level, the books of Chung [6] and Nummelin [7] deal with the cases of countable and general state space, respectively.
• The initial distribution α = {α_x}_{x∈S}. Here α is a probability distribution on S, and the process will be defined so that IP(X_0 = x) = α_x.

• The transition probability matrix P = {p_xy}_{x,y∈S} = {p(x → y)}_{x,y∈S}. Here P is a matrix satisfying p_xy ≥ 0 for all x, y and Σ_y p_xy = 1 for all x. The process will be defined so that IP(X_{t+1} = y | X_t = x) = p_xy.

The Markov chain is then completely specified by the joint probabilities

    IP(X_0 = x_0, X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = α_{x_0} p_{x_0 x_1} p_{x_1 x_2} ⋯ p_{x_{n-1} x_n}.   (2.1)

This product structure expresses the fact that "successive transitions are independent".
Next we define the n-step transition probabilities

    p^(n)_xy = IP(X_{t+n} = y | X_t = x).   (2.2)

Clearly p^(0)_xy = δ_xy, p^(1)_xy = p_xy, and in general {p^(n)_xy} are the matrix elements of P^n.
A Markov chain is said to be irreducible if from each state it is possible to get to each other state: that is, for each pair x, y ∈ S, there exists an n ≥ 0 for which p^(n)_xy > 0. We shall be concerned almost exclusively with irreducible Markov chains.
For each state x, we define the period of x (denoted d_x) to be the greatest common divisor of the numbers n > 0 for which p^(n)_xx > 0. If d_x = 1, the state x is called aperiodic. It can be shown that, in an irreducible chain, all states have the same period, so we can speak of the chain having period d. Moreover, the state space can then be partitioned into subsets S_1, S_2, ..., S_d around which the chain moves cyclically, i.e. p^(n)_xy = 0 whenever x ∈ S_i, y ∈ S_j with j − i ≢ n (mod d). Finally, it can be shown that a chain is irreducible and aperiodic if and only if, for each pair x, y, there exists N_xy such that p^(n)_xy > 0 for all n ≥ N_xy.
We now come to the fundamental topic in the theory of Markov chains, which is the problem of convergence to equilibrium. A probability measure π = {π_x}_{x∈S} is called a stationary distribution (or invariant distribution or equilibrium distribution) for the Markov chain P in case

    Σ_x π_x p_xy = π_y   for all y.   (2.3)

A stationary probability distribution need not exist; but if it does, then a lot more follows:
Theorem 1  Let P be the transition probability matrix of an irreducible Markov chain of period d. If a stationary probability distribution π exists, then it is unique, and π_x > 0 for all x. Moreover,

    lim_{n→∞} p^(nd+r)_xy = { d π_y   if x ∈ S_i, y ∈ S_j with j − i ≡ r (mod d)
                            { 0       if x ∈ S_i, y ∈ S_j with j − i ≢ r (mod d)   (2.4)

for all x, y. In particular, if P is aperiodic, then

    lim_{n→∞} p^(n)_xy = π_y.   (2.5)
This theorem shows that the Markov chain converges as t → ∞ to the equilibrium distribution π, irrespective of the initial distribution. Moreover, under the conditions of this theorem much more can be proven — for example, a strong law of large numbers, a central limit theorem, and a law of the iterated logarithm. For statements and proofs of all these theorems, we refer the reader to Chung [6].
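As a small numerical illustration (my own addition; the two-state chain below is an arbitrary toy example), one can watch every row of P^n converge to the stationary distribution π, as (2.5) asserts:

```python
def mat_mult(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# A toy irreducible, aperiodic two-state chain (hypothetical example).
P = [[0.9, 0.1],
     [0.5, 0.5]]

# Its stationary distribution, from pi_0 * 0.1 = pi_1 * 0.5 and pi_0 + pi_1 = 1.
pi = [5.0 / 6.0, 1.0 / 6.0]

# Check stationarity (2.3): sum_x pi_x p_xy = pi_y for all y.
for y in range(2):
    assert abs(sum(pi[x] * P[x][y] for x in range(2)) - pi[y]) < 1e-12

# Power iteration: the rows of P^n approach pi as n grows (2.5).
Pn = P
for _ in range(50):
    Pn = mat_mult(Pn, P)
row_error = max(abs(Pn[x][y] - pi[y]) for x in range(2) for y in range(2))
print(row_error)
```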
We can now see how to set up a dynamic Monte Carlo method for generating samples from the probability distribution π. It suffices to invent a transition probability matrix P = {p_xy} = {p(x → y)} satisfying the following two conditions:

(A) Irreducibility.^3 For each pair x, y ∈ S, there exists an n ≥ 0 for which p^(n)_xy > 0.

(B) Stationarity of π. For each y ∈ S,

    Σ_x π_x p_xy = π_y.   (2.6)
Then Theorem 1 (together with its more precise counterparts) shows that simulation of the Markov chain P constitutes a legitimate Monte Carlo method for estimating averages with respect to π. We can start the system in any state x, and the system is guaranteed to converge to equilibrium as t → ∞ [at least in the averaged sense (2.4)]. Long-time averages of any observable f will converge with probability 1 to π-averages (strong law of large numbers), and will do so with fluctuations of size ∼ n^{-1/2} (central limit theorem). In practice we will discard the data from the initial transient, i.e. before the system has come close to equilibrium, but in principle this is not necessary (the bias is of order n^{-1}, hence asymptotically much smaller than the statistical fluctuations).
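A minimal sketch of this recipe in action (my own illustration; the chain, observable, and run lengths are arbitrary choices): simulate the chain, discard an initial transient, and compare the time average of f against the exact π-average.

```python
import random

def step(P, x, rng):
    """Take one step of the Markov chain with transition matrix P from state x."""
    u = rng.random()
    cum = 0.0
    for y, p in enumerate(P[x]):
        cum += p
        if u < cum:
            return y
    return len(P) - 1  # guard against floating-point round-off

# Toy chain with stationary distribution pi = (5/6, 1/6).
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = [5.0 / 6.0, 1.0 / 6.0]
f = [2.0, -1.0]                               # an arbitrary observable f(x)
exact = sum(pi[x] * f[x] for x in range(2))   # the pi-average of f

rng = random.Random(1)
x = 0                      # arbitrary initial state
n_disc, n = 1000, 200000   # discarded transient, then measured steps
for _ in range(n_disc):
    x = step(P, x, rng)
total = 0.0
for _ in range(n):
    x = step(P, x, rng)
    total += f[x]
print(total / n, exact)
```

The time average lands within a few multiples of n^{-1/2} of the exact value, as the central limit theorem promises.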
So far, so good! But while this is a correct Monte Carlo algorithm, it may or may not be an efficient one. The key difficulty is that the successive states X_0, X_1, ... of the Markov chain are correlated — perhaps very strongly — so the variance of estimates produced from the dynamic Monte Carlo simulation may be much higher than in static Monte Carlo (independent sampling). To make this precise, let f = {f(x)}_{x∈S} be a real-valued function defined on the state space S (i.e. a real-valued observable) that is square-integrable with respect to π. Now consider the stationary Markov chain (i.e. start the system in the stationary distribution π, or equivalently, "equilibrate" it for a very long time prior to observing the system). Then {f_t} ≡ {f(X_t)} is a stationary stochastic process with mean

    μ_f ≡ ⟨f_t⟩ = Σ_x π_x f(x)   (2.7)
^3 We avoid the term "ergodicity" because of its multiple and conflicting meanings. In the physics literature, and in the mathematics literature on Markov chains with finite state space, "ergodic" is typically used as a synonym for "irreducible" [4, Section 2.4] [5, Chapter 4]. However, in the mathematics literature on Markov chains with general state space, "ergodic" is used as a synonym for "irreducible, aperiodic and positive Harris recurrent" [7, p. 114] [8, p. 169].
and unnormalized autocorrelation function^4

    C_ff(t) ≡ ⟨f_s f_{s+t}⟩ − μ_f²   (2.8)
           = Σ_{x,y} f(x) [π_x p^(|t|)_xy − π_x π_y] f(y).

The normalized autocorrelation function is then

    ρ_ff(t) ≡ C_ff(t)/C_ff(0).   (2.9)
Typically ρ_ff(t) decays exponentially (∼ e^{−|t|/τ}) for large t; we define the exponential autocorrelation time

    τ_exp,f = limsup_{t→∞} t / (−log |ρ_ff(t)|)   (2.10)

and

    τ_exp = sup_f τ_exp,f.   (2.11)

Thus, τ_exp is the relaxation time of the slowest mode in the system. (If the state space is infinite, τ_exp might be +∞!)
An equivalent definition, which is useful for rigorous analysis, involves considering the spectrum of the transition probability matrix P considered as an operator on the Hilbert space l²(π).^5 It is not hard to prove the following facts about P:

(a) The operator P is a contraction. (In particular, its spectrum lies in the closed unit disk.)

(b) 1 is a simple eigenvalue of P, as well as of its adjoint P*, with eigenvector equal to the constant function 1.

(c) If the Markov chain is aperiodic, then 1 is the only eigenvalue of P (and of P*) on the unit circle.

(d) Let R be the spectral radius of P acting on the orthogonal complement of the constant functions:

    R ≡ inf { r : spec(P ↾ 1⊥) ⊂ {λ : |λ| ≤ r} }.   (2.12)

Then R = e^{−1/τ_exp}.

Facts (a)-(c) are a generalized Perron-Frobenius theorem [9]; fact (d) is a consequence of a generalized spectral radius formula [10, Propositions 2.3-2.5].
^4 In the statistics literature, this is called the autocovariance function.
^5 l²(π) is the space of complex-valued functions on S that are square-integrable with respect to π: ||f|| ≡ (Σ_x π_x |f(x)|²)^{1/2} < ∞. The inner product is given by (f, g) ≡ Σ_x π_x f(x)* g(x).
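Fact (d) can be checked by hand on a tiny example (my own illustration; the two-state chain is an arbitrary choice). For a 2×2 transition matrix the eigenvalues are 1 and λ₂ = p₀₀ + p₁₁ − 1, so R = |λ₂| and τ_exp = −1/log R; the deviation p^(n)_xy − π_y then decays exactly like λ₂^n:

```python
import math

# Arbitrary two-state example chain.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = [5.0 / 6.0, 1.0 / 6.0]

lam2 = P[0][0] + P[1][1] - 1.0   # second eigenvalue (trace minus 1)
R = abs(lam2)
tau_exp = -1.0 / math.log(R)

def mat_mult(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# p^(n)_00 - pi_0 should equal pi_1 * lam2^n for this chain.
Pn = [row[:] for row in P]
for n in range(2, 21):
    Pn = mat_mult(Pn, P)
dev = Pn[0][0] - pi[0]
ratio = dev / (lam2 ** 20)   # should be the constant pi_1 = 1/6
print(tau_exp, ratio)
```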
The rate of convergence to equilibrium from an initial nonequilibrium distribution can be bounded above in terms of R (and hence τ_exp). More precisely, let μ be a probability measure on S, and let us define its deviation from equilibrium in the l² sense,

    d₂(μ, π) ≡ ||μ/π − 1||_{l²(π)} = sup_{||f||_{l²(π)} ≤ 1} | ∫ f dμ − ∫ f dπ |.   (2.13)

Then, clearly,

    d₂(μP^t, π) ≤ ||P^t ↾ 1⊥|| d₂(μ, π).   (2.14)

And by the spectral radius formula,

    ||P^t ↾ 1⊥|| ≈ R^t = e^{−t/τ_exp}   (2.15)

asymptotically as t → ∞, with equality for all t if P is self-adjoint (see below).
On the other hand, for a given observable f we define the integrated autocorrelation time

    τ_int,f = (1/2) Σ_{t=−∞}^{∞} ρ_ff(t)   (2.16)
            = 1/2 + Σ_{t=1}^{∞} ρ_ff(t)

[The factor of 1/2 is purely a matter of convention; it is inserted so that τ_int,f ≈ τ_exp,f if ρ_ff(t) ∼ e^{−|t|/τ} with τ ≫ 1.] The integrated autocorrelation time controls the statistical error in Monte Carlo measurements of ⟨f⟩. More precisely, the sample mean

    f̄ ≡ (1/n) Σ_{t=1}^{n} f_t   (2.17)

has variance

    var(f̄) = (1/n²) Σ_{r,s=1}^{n} C_ff(r − s)   (2.18)
            = (1/n) Σ_{t=−(n−1)}^{n−1} (1 − |t|/n) C_ff(t)   (2.19)
            ≈ (1/n) (2 τ_int,f) C_ff(0)   for n ≫ τ   (2.20)
Thus, the variance of f̄ is a factor 2τ_int,f larger than it would be if the {f_t} were statistically independent. Stated differently, the number of "effectively independent samples" in a run of length n is roughly n/(2τ_int,f).
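In practice τ_int,f must be estimated from the run itself. Here is a minimal sketch (my own illustration, not from the notes), using a synthetic AR(1) time series for which the exact answer is known — ρ(t) = φ^|t| gives τ_int = (1/2)(1+φ)/(1−φ) — and a simple fixed truncation window for the sum in (2.16):

```python
import random

def tau_int(series, window):
    """Estimate the integrated autocorrelation time by summing sample
    autocorrelations rho(t) for t = 1..window (fixed-cutoff version)."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((x - mean) ** 2 for x in series) / n
    tau = 0.5
    for t in range(1, window + 1):
        ct = sum((series[s] - mean) * (series[s + t] - mean)
                 for s in range(n - t)) / (n - t)
        tau += ct / c0
    return tau

# Synthetic correlated data: AR(1) with phi = 0.8, so tau_int = 0.5*1.8/0.2 = 4.5.
phi = 0.8
rng = random.Random(7)
x, series = 0.0, []
for _ in range(100000):
    x = phi * x + rng.gauss(0.0, 1.0)
    series.append(x)

est = tau_int(series, window=50)
print(est)
```

A fixed window is the crudest possible choice: too small a window biases the estimate down, too large a window adds noise. Careful analyses choose the cutoff self-consistently (e.g. the smallest window exceeding several times the running estimate of τ).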
It is sometimes convenient to measure the integrated autocorrelation time in terms of the equivalent pure exponential decay that would produce the same value of Σ_{t=−∞}^{∞} ρ_ff(t), namely

    τ̃_int,f ≡ [ log( (2τ_int,f + 1)/(2τ_int,f − 1) ) ]^{−1}.   (2.21)

This quantity has the nice feature that a sequence of uncorrelated data has τ̃_int,f = 0 (but τ_int,f = 1/2). Of course, τ̃_int,f is ill-defined if τ_int,f < 1/2, as can occasionally happen in cases of anticorrelation.
In summary, the autocorrelation times τ_exp and τ_int,f play different roles in Monte Carlo simulations. τ_exp places an upper bound on the number of iterations n_disc which should be discarded at the beginning of the run, before the system has attained equilibrium; for example, n_disc ≈ 20 τ_exp is usually more than adequate. On the other hand, τ_int,f determines the statistical errors in Monte Carlo measurements of ⟨f⟩, once equilibrium has been attained.
Most commonly it is assumed that τ_exp and τ_int,f are of the same order of magnitude, at least for "reasonable" observables f. But this is not true in general. In fact, in statistical-mechanical problems near a critical point, one usually expects the autocorrelation function ρ_ff(t) to obey a dynamic scaling law [11] of the form

    ρ_ff(t; β) ≈ |t|^{−a} F( (β − β_c) |t|^b )   (2.22)

valid in the region

    |t| ≫ 1,  |β − β_c| ≪ 1,  |β − β_c| |t|^b bounded.   (2.23)

Here a, b > 0 are dynamic critical exponents and F is a suitable scaling function; β is some "temperature-like" parameter, and β_c is the critical point. Now suppose that F is continuous and strictly positive, with F(x) decaying rapidly (e.g. exponentially) as |x| → ∞. Then it is not hard to see that

    τ_exp,f ∼ |β − β_c|^{−1/b}   (2.24)
    τ_int,f ∼ |β − β_c|^{−(1−a)/b}   (2.25)
    ρ_ff(t; β = β_c) ∼ |t|^{−a}   (2.26)

so that τ_exp,f and τ_int,f have different critical exponents unless a = 0.^6 Actually, this should not be surprising: replacing "time" by "space", we see that τ_exp,f is the analogue of a correlation length, while τ_int,f is the analogue of a susceptibility; and (2.24)-(2.26) are the analogue of the well-known scaling law γ = (2 − η)ν — clearly γ ≠ ν in general! So it is crucial to distinguish between the two types of autocorrelation time.
Returning to the general theory, we note that one convenient way of satisfying condition (B) is to satisfy the following stronger condition:

(B′) For each pair x, y ∈ S,   π_x p_xy = π_y p_yx.   (2.27)

^6 Our discussion of this topic in [12] is incorrect.
[Summing (B′) over x, we recover (B).] (B′) is called the detailed-balance condition; a Markov chain satisfying (B′) is called reversible.^7 (B′) is equivalent to the self-adjointness of P as an operator on the space l²(π). In this case, the spectrum of P is real and lies in the closed interval [−1, 1]; we define

    λ_min = inf spec(P ↾ 1⊥)   (2.28)
    λ_max = sup spec(P ↾ 1⊥)   (2.29)

From (2.12) we have

    τ_exp = −1 / log max(|λ_min|, λ_max).   (2.30)

For many purposes, only the spectrum near +1 matters, so it is useful to define the modified exponential autocorrelation time

    τ′_exp = { −1/log λ_max   if λ_max > 0
             { +∞             if λ_max ≤ 0   (2.31)
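One standard way of constructing a matrix satisfying (B′) — not introduced at this point of the notes, so treat this as a forward-looking sketch — is the Metropolis rule: propose a move from a symmetric proposal matrix and accept it with probability min(1, π_y/π_x). The three-state target below is an arbitrary choice:

```python
# Target distribution on a 3-state space (arbitrary example).
pi = [0.5, 0.3, 0.2]
n = len(pi)

# Symmetric proposal: pick one of the other states uniformly.
prop = [[0.0 if y == x else 1.0 / (n - 1) for y in range(n)] for x in range(n)]

# Metropolis transition matrix: accept x -> y with prob min(1, pi_y / pi_x);
# rejected proposals stay at x, which fixes the diagonal entry.
P = [[0.0] * n for _ in range(n)]
for x in range(n):
    for y in range(n):
        if y != x:
            P[x][y] = prop[x][y] * min(1.0, pi[y] / pi[x])
    P[x][x] = 1.0 - sum(P[x])

# Check detailed balance (B'): pi_x p_xy = pi_y p_yx for every pair.
db = max(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) for x in range(n) for y in range(n))

# Check stationarity (B), which follows by summing (B') over x.
st = max(abs(sum(pi[x] * P[x][y] for x in range(n)) - pi[y]) for y in range(n))
print(db, st)
```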
Now let us apply the spectral theorem to the operator P: it follows that the autocorrelation function ρ_ff(t) has a spectral representation

    ρ_ff(t) = ∫_{λ_min,f}^{λ_max,f} λ^{|t|} dσ_ff(λ)   (2.32)

with a nonnegative spectral weight dσ_ff(λ) supported on an interval [λ_min,f, λ_max,f] ⊆ [λ_min, λ_max]. Clearly

    τ_exp,f = −1 / log max(|λ_min,f|, λ_max,f)   (2.33)

and we can define

    τ′_exp,f = { −1/log λ_max,f   if λ_max,f > 0
               { 0                if λ_max,f ≤ 0   (2.34)

On the other hand,

    τ_int,f = (1/2) ∫_{λ_min,f}^{λ_max,f} (1 + λ)/(1 − λ) dσ_ff(λ).   (2.35)

It follows that

    τ_int,f ≤ (1/2) (1 + e^{−1/τ′_exp,f}) / (1 − e^{−1/τ′_exp,f})
            ≤ (1/2) (1 + e^{−1/τ′_exp}) / (1 − e^{−1/τ′_exp}) ≈ τ′_exp.   (2.36)
^7 For the physical significance of this term, see Kemeny and Snell [4, section 5.3] or Iosifescu [5, section 4.5].
Moreover, since λ ↦ (1 + λ)/(1 − λ) is a convex function, Jensen's inequality implies that

    τ_int,f ≥ (1/2) (1 + ρ_ff(1)) / (1 − ρ_ff(1)).   (2.37)

If we define the initial autocorrelation time

    τ_init,f = { −1/log ρ_ff(1)   if ρ_ff(1) ≥ 0
               { undefined        if ρ_ff(1) < 0   (2.38)

then these inequalities can be summarized conveniently as

    τ_init,f ≤ τ̃_int,f ≤ τ′_exp,f ≤ τ′_exp.   (2.39)

Conversely, it is not hard to see that

    sup_{f∈l²(π)} τ_init,f = sup_{f∈l²(π)} τ̃_int,f = sup_{f∈l²(π)} τ′_exp,f = τ′_exp;   (2.40)

it suffices to choose f so that its spectral weight dσ_ff is supported in a very small interval near λ_max.
Finally, let us make a remark about transition probabilities P that are "built up out of" other transition probabilities P_1, P_2, ..., P_n:

a) If P_1, P_2, ..., P_n satisfy the stationarity condition (resp. the detailed-balance condition) for π, then so does any convex combination P = Σ_{i=1}^{n} λ_i P_i. Here λ_i ≥ 0 and Σ_{i=1}^{n} λ_i = 1.

b) If P_1, P_2, ..., P_n satisfy the stationarity condition for π, then so does the product P = P_1 P_2 ⋯ P_n. (Note, however, that P does not in general satisfy the detailed-balance condition, even if the individual P_i do.^8)

Algorithmically, the convex combination amounts to choosing randomly, with probabilities {λ_i}, from among the "elementary operations" P_i. (It is crucial here that the λ_i are constants, independent of the current configuration of the system; only in this case does P leave π stationary in general.) Similarly, the product corresponds to performing sequentially the operations P_1, P_2, ..., P_n.

^8 Recall that if A and B are self-adjoint operators, then AB is self-adjoint if and only if A and B commute.
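A quick numerical check of remarks a) and b) (my own sketch; the two pair-update matrices below are arbitrary examples, each reversible for the same π): the convex combination and the product both leave π stationary, but the product generally violates detailed balance.

```python
def mat_mult(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

pi = [0.5, 0.3, 0.2]

# Two reversible "elementary" transition matrices for pi (Metropolis-style
# updates touching only a pair of states each; arbitrary illustrative choices).
P1 = [[0.4, 0.6, 0.0],          # updates the pair {0, 1}
      [1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0]]
P2 = [[1.0, 0.0, 0.0],          # updates the pair {1, 2}
      [0.0, 1.0 / 3.0, 2.0 / 3.0],
      [0.0, 1.0, 0.0]]

def stationary_defect(P):
    return max(abs(sum(pi[x] * P[x][y] for x in range(3)) - pi[y]) for y in range(3))

def db_defect(P):
    return max(abs(pi[x] * P[x][y] - pi[y] * P[y][x])
               for x in range(3) for y in range(3))

# Both elementary matrices satisfy detailed balance (hence stationarity).
assert db_defect(P1) < 1e-12 and db_defect(P2) < 1e-12

mix = [[0.5 * P1[x][y] + 0.5 * P2[x][y] for y in range(3)] for x in range(3)]
prod = mat_mult(P1, P2)

print(stationary_defect(mix), db_defect(mix))    # both ~0: mixture is reversible
print(stationary_defect(prod), db_defect(prod))  # stationary, detailed balance fails
```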
3 Statistical Analysis of Dynamic Monte Carlo Data

Many published Monte Carlo studies contain statements like:

    We ran for a total of 100000 iterations, discarding the first 50000 iterations (for equilibration) and then taking measurements once every 100 iterations.

It is important to emphasize that unless further information is given — namely, the autocorrelation time of the algorithm — such statements have no value whatsoever. Is a run of 100000 iterations long enough? Are 50000 iterations sufficient for equilibration? That depends on how big the autocorrelation time is. The purpose of this lecture is to give some practical advice for choosing the parameters of a dynamic Monte Carlo simulation, and to give an introduction to the statistical theory that puts this advice on a sound mathematical footing.
There are two fundamental, and quite distinct, issues in dynamic Monte Carlo simulation:

• Initialization bias. If the Markov chain is started in a distribution that is not equal to the stationary distribution π, then there is an "initial transient" in which the data do not reflect the desired equilibrium distribution π. This results in a systematic error (bias).

• Autocorrelation in equilibrium. The Markov chain, once it reaches equilibrium, provides correlated samples from π. This correlation causes the statistical error (variance) to be a factor 2τ_int,f larger than in independent sampling.

Let us discuss these issues in turn.
Initialization bias. Often the Markov chain is started in some chosen configuration x; then μ = δ_x. For example, in an Ising model, x might be the configuration with "all spins up"; this is sometimes called an ordered or cold start. Alternatively, the Markov chain might be started in a random configuration chosen according to some simple probability distribution μ. For example, in an Ising model, we might initialize the spins randomly and independently, with equal probabilities of up and down; this is sometimes called a random or hot start. In all these cases, the initial distribution μ is clearly not equal to the equilibrium distribution π. Therefore, the system is initially "out of equilibrium". Theorem 1 guarantees that the system approaches equilibrium as t → ∞, but we need to know something about the rate of convergence to equilibrium.

Using the exponential autocorrelation time τ_exp, we can set an upper bound on the amount of time we have to wait before equilibrium is "for all practical purposes" attained. For example, if we wait a time 20τ_exp, then the deviation from equilibrium (in the l² sense) will be at most e⁻²⁰ (≈ 2 × 10⁻⁹) times the initial deviation from equilibrium. There are two difficulties with this bound. Firstly, it is usually impossible to apply in practice, since we almost never know τ_exp (or a rigorous upper bound for it). Secondly, even if we can apply it, it may be overly conservative; indeed, there exist perfectly reasonable algorithms in which τ_exp = +∞ (see Sections 7 and 8).
Lacking rigorous knowledge of the autocorrelation time τ_exp, we should try to estimate it both theoretically and empirically. To make a heuristic theoretical estimate of τ_exp, we attempt to understand the physical mechanism(s) causing slow convergence to equilibrium; but it is always possible that we have overlooked one or more such mechanisms, and have therefore grossly underestimated τ_exp. To make a rough empirical estimate of τ_exp, we measure the autocorrelation function C_ff(t) for a suitably large set of observables f [see below]; but there is always the danger that our chosen set of observables has failed to include one that has strong enough overlap with the slowest mode, again leading to a gross underestimate of τ_exp.

On the other hand, the actual rate of convergence to equilibrium from a given initial distribution may be much faster than the worst-case estimate given by τ_exp. So it is usual to determine empirically when "equilibrium" has been achieved, by plotting selected observables as a function of time and noting when the initial transient appears to end. More sophisticated statistical tests for initialization bias can also be employed [13].
In all empirical methods of determining when "equilibrium" has been achieved, a serious danger is the possibility of metastability. That is, it could appear that equilibrium has been achieved, when in fact the system has only settled down to a long-lived (metastable) region of configuration space that may be very far from equilibrium. The only sure-fire protection against metastability is a proof of an upper bound on τ_exp (or more generally, on the deviation from equilibrium as a function of the elapsed time t). The next-best protection is a convincing heuristic argument that metastability is unlikely (i.e. that τ_exp is not too large); but as mentioned before, even if one rules out several potential physical mechanisms for metastability, it is very difficult to be certain that one has not overlooked others. If one cannot rule out metastability on theoretical grounds, then it is helpful at least to have an idea of what the possible metastable regions look like; then one can perform several runs with different initial conditions typical of each of the possible metastable regions, and test whether the answers are consistent. For example, near a first-order phase transition, most Monte Carlo methods suffer from metastability associated with transitions between configurations typical of the distinct pure phases. We can try initial conditions typical of each of these phases (e.g. for many models, a "hot" start and a "cold" start). Consistency between these runs does not guarantee that metastability is absent, but it does give increased confidence. Plots of observables as a function of time are also useful indicators of possible metastability.
But when all is said and done, no purely empirical estimate of τ from a run of length n can be guaranteed to be even approximately correct. What we can say is that if τ_estimated ≪ n, then either τ ≈ τ_estimated or else τ > n.
Once we know (or guess) the time needed to attain "equilibrium", what do we do with it? The answer is clear: we discard the data from the initial transient, up to some time n_disc, and include only the subsequent data in our averages. In principle, this is (asymptotically) unnecessary, because the systematic errors from this initial transient will be of order τ/n, while the statistical errors will be of order (τ/n)^{1/2}. But in practice, the coefficient of τ/n in the systematic error may be fairly large, if the initial distribution is very far from equilibrium. By throwing away the data from the initial transient, we lose nothing, and avoid a potentially large systematic error.
Autocorrelation in equilibrium. As explained in the preceding lecture, the variance of the sample mean f̄ in a dynamic Monte Carlo method is a factor 2τ_int,f higher than it would be in independent sampling. Otherwise put, a run of length n contains only n/(2τ_int,f) "effectively independent data points".
This has several implications for Monte Carlo work. On the one hand, it means that the computational efficiency of the algorithm is determined principally by its autocorrelation time. More precisely, if one wishes to compare two alternative Monte Carlo algorithms for the same problem, then the better algorithm is the one that has the smaller autocorrelation time, when time is measured in units of computer (CPU) time. [In general there may arise tradeoffs between "physical" autocorrelation time (i.e. measured in iterations) and computational complexity per iteration.] So accurate measurements of the autocorrelation time are essential to evaluating the computational efficiency of competing algorithms.
On the other hand, even for a fixed algorithm, knowledge of τ_int,f is essential for determining run lengths (is a run of 100000 sweeps long enough?) and for setting error bars on estimates of ⟨f⟩. Roughly speaking, error bars will be of order (τ/n)^{1/2}; so if we want 1% accuracy, then we need a run of length ≈ 10000τ, and so on. Above all, there is a basic self-consistency requirement: the run length n must be much larger than the estimates of τ produced by that same run; otherwise none of the results from that run should be believed. Of course, while self-consistency is a necessary condition for the trustworthiness of Monte Carlo data, it is not a sufficient condition; there is always the danger of metastability.
Already we can draw a conclusion about the relative importance of initialization bias and autocorrelation as difficulties in dynamic Monte Carlo work. Let us assume that the time for initial convergence to equilibrium is comparable to (or at least not too much larger than) the equilibrium autocorrelation time τ_int,f (for the observables f of interest); this is often but not always the case. Then initialization bias is a relatively trivial problem compared to autocorrelation in equilibrium. To eliminate initialization bias, it suffices to discard ≈ 20τ of the data at the beginning of the run; but to achieve a reasonably small statistical error, it is necessary to make a run of length ≈ 1000τ or more. So the data that must be discarded at the beginning, n_disc, is a negligible fraction of the total run length n. This estimate also shows that the exact value of n_disc is not particularly delicate: anything between ≈ 20τ and ≈ n/5 will eliminate essentially all initialization bias while paying less than a 10% price in the final error bars.
In the remainder of this lecture I would like to discuss in more detail the statistical analysis of dynamic Monte Carlo data (assumed to be already "in equilibrium"), with emphasis on how to estimate the autocorrelation time τ_int,f and how to compute valid error bars. What is involved here is a branch of mathematical statistics called time-series analysis. An excellent exposition can be found in the books of Priestley [14] and Anderson [15].
Let {f_t} be a real-valued stationary stochastic process with mean

    μ ≡ ⟨f_t⟩ ,  (3.1)

unnormalized autocorrelation function

    C(t) ≡ ⟨f_s f_{s+t}⟩ − μ² ,  (3.2)

normalized autocorrelation function

    ρ(t) ≡ C(t)/C(0) ,  (3.3)

and integrated autocorrelation time

    τ_int = (1/2) Σ_{t=−∞}^{∞} ρ(t) .  (3.4)
Our goal is to estimate μ, C(t), ρ(t) and τ_int based on a finite (but large) sample f_1, ..., f_n from this stochastic process.
The "natural" estimator of μ is the sample mean

    f̄ ≡ (1/n) Σ_{i=1}^{n} f_i .  (3.5)

This estimator is unbiased (i.e. ⟨f̄⟩ = μ) and has variance

    var(f̄) = (1/n) Σ_{t=−(n−1)}^{n−1} (1 − |t|/n) C(t)  (3.6)
            ≈ (1/n) (2τ_int) C(0)   for n ≫ τ .  (3.7)
Thus, even if we are interested only in the static quantity μ, it is necessary to estimate the dynamic quantity τ_int in order to determine valid error bars for μ.
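The factor 2τ_int in (3.7) is easy to verify on a process whose autocorrelations are known exactly. The following sketch (my illustration, not from the text: an AR(1) process f_t = φ f_{t−1} + ε_t, for which ρ(t) = φ^{|t|} and hence τ_int = (1/2)(1+φ)/(1−φ)) compares the empirical variance of the sample mean over many independent chains with the prediction (3.7):

```python
import numpy as np

# AR(1) toy check of var(mean) ≈ (2 tau_int) C(0) / n.
# For f_t = phi f_{t-1} + eps_t:  rho(t) = phi^|t|, so
# tau_int = (1/2)(1+phi)/(1-phi) and C(0) = 1/(1-phi^2).
rng = np.random.default_rng(1)
phi, n, n_chains = 0.8, 4000, 2000
tau_int = 0.5 * (1 + phi) / (1 - phi)      # = 4.5
c0 = 1.0 / (1 - phi**2)                    # equilibrium variance C(0)

eps = rng.standard_normal((n_chains, n + 200))
f = np.zeros_like(eps)
for t in range(1, eps.shape[1]):
    f[:, t] = phi * f[:, t - 1] + eps[:, t]
f = f[:, 200:]                             # discard the initial transient

var_mean = f.mean(axis=1).var()
print(var_mean, 2 * tau_int * c0 / n)      # the two should agree within a few %
```

The run length n = 4000 ≈ 900τ_int here, comfortably satisfying the self-consistency requirement discussed below.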
The "natural" estimator of C(t) is

    Ĉ°(t) ≡ (1/(n − |t|)) Σ_{i=1}^{n−|t|} (f_i − μ)(f_{i+|t|} − μ)  (3.8)

if the mean μ is known, and

    Ĉ(t) ≡ (1/(n − |t|)) Σ_{i=1}^{n−|t|} (f_i − f̄)(f_{i+|t|} − f̄)  (3.9)

if the mean is unknown. We emphasize the conceptual distinction between the autocorrelation function C(t), which for each t is a number, and the estimators Ĉ°(t) and Ĉ(t), which for each t are random variables. As will become clear, this distinction is also of practical importance. Ĉ°(t) is an unbiased estimator of C(t), and Ĉ(t) is almost unbiased (the bias is of order 1/n) [15, p. 463]. Their variances and covariances are [15, pp. 464–471] [14, pp. 324–328]

    var(Ĉ°(t)) = (1/n) Σ_{m=−∞}^{∞} [ C(m)² + C(m+t) C(m−t) + κ(t, m, m+t) ] + o(1/n)  (3.10)

    cov(Ĉ°(t), Ĉ°(u)) = (1/n) Σ_{m=−∞}^{∞} [ C(m) C(m+u−t) + C(m+u) C(m−t) + κ(t, m, m+u) ] + o(1/n)  (3.11)

where t, u ≥ 0 and κ is the connected 4-point autocorrelation function

    κ(r, s, t) ≡ ⟨(f_i − μ)(f_{i+r} − μ)(f_{i+s} − μ)(f_{i+t} − μ)⟩
               − C(r) C(t−s) − C(s) C(t−r) − C(t) C(s−r) .  (3.12)

To leading order in 1/n, the behavior of Ĉ is identical to that of Ĉ°.
The "natural" estimator of ρ(t) is

    ρ̂°(t) ≡ Ĉ°(t)/Ĉ°(0)  (3.13)

if the mean μ is known, and

    ρ̂(t) ≡ Ĉ(t)/Ĉ(0)  (3.14)

if the mean is unknown. The variances and covariances of ρ̂°(t) and ρ̂(t) can be computed (for large n) from (3.11); we omit the detailed formulae.
The "natural" estimator of τ_int would seem to be

    τ̂_int =? (1/2) Σ_{t=−(n−1)}^{n−1} ρ̂°(t)  (3.15)

(or the analogous thing with ρ̂), but this is wrong! The estimator defined in (3.15) has a variance that does not go to zero as the sample size n goes to infinity [14, pp. 420–431], so it is clearly a very bad estimator of τ_int. Roughly speaking, this is because the sample autocorrelations ρ̂(t) for |t| ≫ τ contain much "noise" but little "signal"; and there are so many of them (order n) that the noise adds up to a total variance of order 1. (For a more detailed discussion, see [14, pp. 432–437].) The solution is to cut off the sum in (3.15) using a "window" λ(t) which is ≈ 1 for |t| ≲ τ but ≈ 0 for |t| ≫ τ:

    τ̂_int ≡ (1/2) Σ_{t=−(n−1)}^{n−1} λ(t) ρ̂°(t) .  (3.16)
This retains most of the "signal" but discards most of the "noise". A good choice is the rectangular window

    λ(t) = { 1  if |t| ≤ M
           { 0  if |t| > M  (3.17)

where M is a suitably chosen cutoff. This cutoff introduces a bias

    bias(τ̂_int) = −(1/2) Σ_{|t|>M} ρ(t) + o(1/n) .  (3.18)
On the other hand, the variance of τ̂_int can be computed from (3.11); after some algebra, one obtains

    var(τ̂_int) ≈ (2(2M+1)/n) τ_int²  (3.19)

where we have made the approximation τ ≪ M ≪ n.⁹ The choice of M is thus a tradeoff between bias and variance: the bias can be made small by taking M large enough so that ρ(t) is negligible for |t| > M (e.g. M = a few times τ usually suffices), while the variance is kept small by taking M to be no larger than necessary consistent with this constraint. We have found the following "automatic windowing" algorithm [16] to be convenient: choose M to be the smallest integer such that M ≥ c τ̂_int(M). If ρ(t) were roughly a pure exponential, then it would suffice to take c ≈ 4 (since e⁻⁴ < 2%). However, in many cases ρ(t) is expected to have an asymptotic or pre-asymptotic decay slower than exponential, so it is usually prudent to take c at least 6, and possibly as large as 10.
We have found this automatic windowing procedure to work well in practice, provided that a sufficient quantity of data is available (n ≳ 1000τ). However, at present we have very little understanding of the conditions under which this windowing algorithm may produce biased estimates of τ_int or of its own error bars. Further theoretical and experimental study of the windowing algorithm (e.g. experiments on various exactly-known stochastic processes, with various run lengths) would be highly desirable.
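For concreteness, here is one possible implementation of the windowed estimator (3.16)–(3.17) with the automatic choice of M described above. The function name and the FFT-based evaluation of Ĉ(t) are implementation choices of this sketch, not prescriptions from the text; it is exercised on an AR(1) process whose τ_int = (1/2)(1+φ)/(1−φ) is known exactly:

```python
import numpy as np

def autocorr_time(f, c=6.0):
    """Windowed estimate of tau_int for the 1-d time series f.

    Returns (tau_hat, M), where M is the smallest integer with
    M >= c * tau_hat(M), as in the automatic windowing rule.
    """
    f = np.asarray(f, dtype=float)
    n = len(f)
    df = f - f.mean()
    # sample autocovariance C-hat(t) via FFT with zero padding, eq. (3.9)
    fft = np.fft.rfft(df, 2 * n)
    acov = np.fft.irfft(fft * np.conjugate(fft))[:n] / (n - np.arange(n))
    rho = acov / acov[0]                      # rho-hat(t)
    tau = 0.5 + np.cumsum(rho[1:])            # tau_hat(M) = 1/2 + sum_{t=1}^M rho(t)
    window = np.arange(1, n) >= c * tau       # automatic windowing condition
    M = np.argmax(window) + 1 if window.any() else n - 1
    return tau[M - 1], M

# Illustrative AR(1) chain with exact tau_int = (1/2)(1.8/0.2) = 4.5:
rng = np.random.default_rng(2)
phi, n = 0.8, 200_000
f = np.zeros(n)
for t in range(1, n):
    f[t] = phi * f[t - 1] + rng.standard_normal()
tau_hat, M = autocorr_time(f)
print(tau_hat, M)    # tau_hat should come out close to 4.5
```

Note that the run length n ≈ 44000τ here, so the self-consistency requirement n ≫ τ̂_int is amply satisfied; with much shorter runs the estimate becomes unreliable, as warned above.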
4 Conventional Monte Carlo Algorithms for Spin Models
In this lecture we describe the construction of dynamic Monte Carlo algorithms for models in statistical mechanics and quantum field theory. Recall our goal: given a probability measure π on the state space S, we wish to construct a transition matrix P = {p_xy} satisfying:
⁹ We have also assumed that the only strong peaks in the Fourier transform of C(t) are at zero frequency. This assumption is valid if C(t) ≥ 0, but could fail if there are strong anticorrelations.
(A) Irreducibility. For each pair x, y ∈ S, there exists an n ≥ 0 for which p^(n)_xy > 0.

(B) Stationarity of π. For each y ∈ S,

    Σ_x π_x p_xy = π_y .  (4.1)

A sufficient condition for (B), which is often more convenient to verify, is:

(B′) Detailed balance for π. For each pair x, y ∈ S, π_x p_xy = π_y p_yx.
A very general method for constructing transition matrices satisfying detailed balance for a given distribution π was introduced in 1953 by Metropolis et al. [17], with a slight extension two decades later by Hastings [18]. The idea is the following: Let P^(0) = {p^(0)_xy} be an arbitrary irreducible transition matrix on S. We call P^(0) the proposal matrix; we shall use it to generate proposed moves x → y that will then be accepted or rejected with probabilities a_xy and 1 − a_xy, respectively. If a proposed move is rejected, then we make a "null transition" x → x. Therefore, the transition matrix P = {p_xy} of the full algorithm is

    p_xy = p^(0)_xy a_xy                              for x ≠ y
    p_xx = p^(0)_xx + Σ_{y≠x} p^(0)_xy (1 − a_xy)  (4.2)

where of course we must have 0 ≤ a_xy ≤ 1 for all x, y. It is easy to see that P satisfies detailed balance for π if and only if

    a_xy / a_yx = (π_y p^(0)_yx) / (π_x p^(0)_xy)  (4.3)

for all pairs x ≠ y. But this is easily arranged: just set

    a_xy = F( (π_y p^(0)_yx) / (π_x p^(0)_xy) )  (4.4)

where F: [0, +∞] → [0, 1] is any function satisfying

    F(z) / F(1/z) = z   for all z .  (4.5)

The choice suggested by Metropolis et al. is

    F(z) = min(z, 1) ;  (4.6)

this is the maximal function satisfying (4.5). Another choice sometimes used is

    F(z) = z / (1 + z) .  (4.7)
Of course, it is still necessary to check that P is irreducible; this is usually done on a case-by-case basis.

Note that if the proposal matrix P^(0) happens to already satisfy detailed balance for π, then we have (π_y p^(0)_yx) / (π_x p^(0)_xy) = 1, so that a_xy = 1 (if we use the Metropolis choice of F) and P = P^(0). On the other hand, no matter what P^(0) is, we obtain a matrix P that satisfies detailed balance for π. So the Metropolis-Hastings procedure can be thought of as a prescription for minimally modifying a given transition matrix P^(0) so that it satisfies detailed balance for π.
Many textbooks and articles describe the Metropolis-Hastings procedure only in the special case in which the proposal matrix P^(0) is symmetric, namely p^(0)_xy = p^(0)_yx. In this case (4.4) reduces to

    a_xy = F(π_y / π_x) .  (4.8)

In statistical mechanics we have

    π_x = e^{−βE_x} / Z   where Z = Σ_y e^{−βE_y} ,  (4.9)

and hence

    π_y / π_x = e^{−β(E_y − E_x)} .  (4.10)

Note that the partition function Z has disappeared from this expression; this is crucial, as Z is almost never explicitly computable! Using the Metropolis acceptance probability F(z) = min(z, 1), we obtain the following rules for acceptance or rejection:

• If ΔE ≡ E_y − E_x ≤ 0, then we accept the proposal always (i.e. with probability 1).

• If ΔE > 0, then we accept the proposal with probability e^{−βΔE} (< 1). That is, we choose a random number r uniformly distributed on [0, 1], and we accept the proposal if r ≤ e^{−βΔE}.
But there is nothing special about P^(0) being symmetric; any proposal matrix P^(0) is perfectly legitimate, and the Metropolis-Hastings procedure is defined quite generally by (4.4).
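The general rule (4.4), with the Metropolis choice F(z) = min(z, 1), can be sketched in a few lines for a small discrete target π and a deliberately asymmetric proposal matrix (the 4-state target and proposal below are illustrative, not from the text):

```python
import numpy as np

# Minimal Metropolis-Hastings sketch: asymmetric proposal P0, general
# acceptance a_xy = F(pi_y P0[y,x] / (pi_x P0[x,y])) with F(z) = min(z, 1).
rng = np.random.default_rng(3)
pi = np.array([0.1, 0.2, 0.3, 0.4])
P0 = np.array([[0.2, 0.5, 0.2, 0.1],      # arbitrary irreducible proposal
               [0.3, 0.1, 0.4, 0.2],
               [0.1, 0.3, 0.2, 0.4],
               [0.4, 0.2, 0.3, 0.1]])

def step(x):
    y = rng.choice(4, p=P0[x])            # propose a move x -> y
    a = min(1.0, pi[y] * P0[y, x] / (pi[x] * P0[x, y]))
    return y if rng.random() < a else x   # null transition on rejection

counts = np.zeros(4)
x = 0
for _ in range(200_000):
    x = step(x)
    counts[x] += 1
print(counts / counts.sum())   # visit frequencies converge to pi
```

Note that the asymmetric proposal requires the full Hastings ratio, including the factor P0[y, x]/P0[x, y]; dropping it (as in the symmetric special case (4.8)) would here produce the wrong stationary distribution.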
Let us emphasize once more that the Metropolis-Hastings procedure is a general technique; it produces an infinite family of different algorithms, depending on the choice of the proposal matrix P^(0). In the literature the term "Metropolis algorithm" is often used to denote the algorithm resulting from some particular commonly-used choice of P^(0), but it is important not to be misled.

To see the Metropolis-Hastings procedure in action, let us consider a typical statistical-mechanical model, the Ising model: On each site i of some finite d-dimensional lattice, we place a random variable σ_i taking the values ±1. The Hamiltonian is

    H(σ) = − Σ_{⟨ij⟩} σ_i σ_j  (4.11)
where the sum runs over all nearest-neighbor pairs ⟨ij⟩. The corresponding Gibbs measure is

    π(σ) = Z⁻¹ exp[−βH(σ)]  (4.12)

where Z is a normalization constant (the partition function). Two different proposal matrices are in common use:
1) Single-spin-flip (Glauber) dynamics. Fix some site i. The proposal is to flip σ_i, hence

    p^(0)_i({σ} → {σ′}) = { 1  if σ′_i = −σ_i and σ′_j = σ_j for all j ≠ i
                          { 0  otherwise  (4.13)

Here P^(0)_i is symmetric, so the acceptance probability is

    a_i({σ} → {σ′}) = min(e^{−βΔE}, 1)  (4.14)

where

    ΔE ≡ E({σ′}) − E({σ}) = 2σ_i Σ_{j n.n. of i} σ_j .  (4.15)

So ΔE is easily computed by comparing the status of σ_i and its neighbors.
This defines a transition matrix P_i in which only the spin at site i is touched. The full "single-spin-flip Metropolis algorithm" involves sweeping through the entire lattice in either a random or periodic fashion, i.e. either

    P = (1/V) Σ_i P_i   (random site updating)  (4.16)

or

    P = P_{i_1} P_{i_2} ⋯ P_{i_V}   (sequential site updating)  (4.17)

(here V is the volume). In the former case, the transition matrix P satisfies detailed balance for π. In the latter case, P does not in general satisfy detailed balance for π, but it does satisfy stationarity for π, which is all that really matters.

It is easy to check that P is irreducible, except in the case of sequential site updating with β = 0.¹⁰
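A minimal sketch of the single-spin-flip Metropolis algorithm (4.13)–(4.15) with sequential site updating, for the 2-dimensional Ising model on a periodic L×L lattice (the lattice size, β and number of sweeps below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
L, beta = 16, 0.3                           # beta below the 2-d critical point
sigma = rng.choice([-1, 1], size=(L, L))    # "hot" (random) start

def sweep(sigma, beta):
    """One sequential sweep: apply P_i at every site in turn."""
    L = sigma.shape[0]
    for i in range(L):
        for j in range(L):
            nn = (sigma[(i+1) % L, j] + sigma[(i-1) % L, j] +
                  sigma[i, (j+1) % L] + sigma[i, (j-1) % L])
            dE = 2 * sigma[i, j] * nn       # eq. (4.15)
            if dE <= 0 or rng.random() < np.exp(-beta * dE):
                sigma[i, j] = -sigma[i, j]  # accept the proposed flip
    return sigma

for _ in range(100):
    sigma = sweep(sigma, beta)
print(abs(sigma.mean()))   # magnetization per spin; small in the disordered phase
```

Only the local field nn is needed per update; the energy difference never requires the partition function Z, exactly as emphasized after (4.10).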
2) Pair-interchange (Kawasaki) dynamics. Fix a pair of sites i, j. The proposal is to interchange σ_i and σ_j, hence

    p^(0)_(ij)({σ} → {σ′}) = { 1  if σ′_i = σ_j, σ′_j = σ_i and σ′_k = σ_k for all k ≠ i, j
                             { 0  otherwise  (4.18)
¹⁰ Note Added 1996: This statement is wrong!!! For the Metropolis acceptance probability F(z) = min(z, 1), one may construct examples of nonergodicity for sequential site updating at any β [104, 105]: the idea is to construct configurations for which every proposed update (working sequentially) has ΔE = 0 and is thus accepted. Of course, the ergodicity can be restored by taking any F(z) < 1, such as the choice F(z) = z/(1 + z).
The rest of the formulae are analogous to those for single-spin-flip dynamics. The overall algorithm is again constructed by a random or periodic sweep over a suitable set of pairs i, j, usually taken to be nearest-neighbor pairs. It should be noted that this algorithm is not irreducible, as it conserves the total magnetization M = Σ_i σ_i. But it is irreducible on subspaces of fixed M (except for sequential updating with β = 0).
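A sketch of Kawasaki dynamics (4.18) with random pair updating, including a check that the total magnetization is conserved. For clarity this illustration recomputes the energy from scratch at each step; in practice one would of course use the local ΔE, as in (4.15):

```python
import numpy as np

rng = np.random.default_rng(6)
L, beta = 8, 0.4
sigma = rng.choice([-1, 1], size=(L, L))
M0 = sigma.sum()                            # conserved total magnetization

def energy(sigma):
    """H = - sum over nearest-neighbor pairs (periodic boundaries)."""
    return -(np.sum(sigma * np.roll(sigma, 1, axis=0)) +
             np.sum(sigma * np.roll(sigma, 1, axis=1)))

for _ in range(5000):
    i, j = rng.integers(L), rng.integers(L)
    di, dj = [(1, 0), (-1, 0), (0, 1), (0, -1)][rng.integers(4)]
    k, l = (i + di) % L, (j + dj) % L       # a random nearest neighbor
    if sigma[i, j] == sigma[k, l]:
        continue                            # interchange of equal spins: null move
    old = energy(sigma)
    sigma[i, j], sigma[k, l] = sigma[k, l], sigma[i, j]
    dE = energy(sigma) - old
    if dE > 0 and rng.random() >= np.exp(-beta * dE):
        sigma[i, j], sigma[k, l] = sigma[k, l], sigma[i, j]   # reject: swap back

assert sigma.sum() == M0                    # M is conserved, as claimed
```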
A very different approach to constructing transition matrices satisfying detailed balance for π is the heat-bath method. This is best illustrated in a specific example. Consider again the Ising model (4.12), and focus on a single site i. Then the conditional probability distribution of σ_i, given all the other spins {σ_j}_{j≠i}, is

    P(σ_i | {σ_j}_{j≠i}) = const({σ_j}_{j≠i}) × exp[ βσ_i Σ_{j n.n. of i} σ_j ] .  (4.19)

(Note that this conditional distribution is precisely that of a single Ising spin σ_i in an "effective magnetic field" produced by the fixed neighboring spins σ_j.) The heat-bath algorithm updates σ_i by choosing a new spin value σ′_i, independent of the old value σ_i, from the conditional distribution (4.19); all the other spins {σ_j}_{j≠i} remain unchanged.¹¹ As in the Metropolis algorithm, this operation is carried out over the whole lattice, either randomly or sequentially.
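For the Ising model the conditional distribution (4.19) is a two-point distribution, so the heat-bath update can be written down directly: the new spin is +1 with probability e^{βh}/(e^{βh} + e^{−βh}), where h is the sum of the neighboring spins. A sketch (lattice size and β illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
L, beta = 16, 0.3
sigma = rng.choice([-1, 1], size=(L, L))

def heat_bath_sweep(sigma, beta):
    """Sequential sweep; each new spin is drawn from (4.19), ignoring the old one."""
    L = sigma.shape[0]
    for i in range(L):
        for j in range(L):
            h = (sigma[(i+1) % L, j] + sigma[(i-1) % L, j] +
                 sigma[i, (j+1) % L] + sigma[i, (j-1) % L])
            p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * h))   # conditional P(+1)
            sigma[i, j] = 1 if rng.random() < p_up else -1
    return sigma

for _ in range(100):
    sigma = heat_bath_sweep(sigma, beta)
print(abs(sigma.mean()))
```

Note that, unlike the Metropolis update, there is no accept/reject step: the draw from the exact conditional distribution is always "accepted".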
Analogous algorithms can be developed for more complicated models, e.g. P(φ) models, σ-models and lattice gauge theories. In each case, we focus on a single field variable (holding all the other variables fixed), and give it a new value, independent of the old value, chosen from the appropriate conditional distribution. Of course, the feasibility of this algorithm depends on our ability to construct an efficient subroutine for generating the required single-site (or single-link) random variables. But even if this algorithm is not always the most efficient one in practice, it serves as a clear standard of comparison, which is useful in the development of more sophisticated algorithms.
A more general version of the heat-bath idea is called partial resampling: here we focus on a set of variables rather than only one, and the new values need not be independent of the old values. That is, we divide the variables of the system, call them {φ}, into two subsets, call them {ψ} and {χ}. For fixed values of the {χ} variables, π induces a conditional probability distribution of {ψ} given {χ}, call it P({ψ} | {χ}). Then any algorithm for updating {ψ} with {χ} fixed that leaves invariant all of the distributions P( · | {χ}) will also leave invariant π. One possibility is to use an independent resampling of {ψ}: we throw away the old values {ψ}, and take {ψ′} to be a new random variable chosen from the probability distribution P( · | {χ}), independent of the old values. Independent resampling might also be called block heat-bath updating. On the other hand, if {ψ} is a large set of variables, independent resampling is probably
11 The alert reader will note that in the Ising case the heat-bath algorithm is equivalent to the single-spin-flip Metropolis algorithm with the choice F(z) = z/(1+z) of acceptance function. But this correspondence does not hold for more complicated models.
unfeasible, but we are free to use any updating that leaves invariant the appropriate conditional distributions. Of course, in this generality "partial resampling" includes all dynamic Monte Carlo algorithms (we could just take {ψ} to be the entire system), but it is in many cases conceptually useful to focus on some subset of variables. The partial-resampling idea will be at the heart of the multi-grid Monte Carlo method (Section 5) and the embedding algorithms (Section 6).
We have now defined a rather large class of dynamic Monte Carlo algorithms: the single-spin-flip Metropolis algorithm, the single-site heat-bath algorithm, and so on. How well do these algorithms perform? Away from phase transitions, they perform rather well. However, near a phase transition, the autocorrelation time τ grows rapidly. In particular, near a critical point (second-order phase transition), the autocorrelation time typically diverges as

    τ ∼ min(L, ξ)^z    (4.20)

where L is the linear size of the system, ξ is the correlation length of an infinite-volume system at the same temperature, and z is a dynamic critical exponent. This phenomenon is called critical slowing-down; it severely hampers the study of critical phenomena by Monte Carlo methods. Most of the remainder of these lectures will be devoted, therefore, to describing recent progress in inventing new Monte Carlo algorithms with radically reduced critical slowing-down.
The critical slowing-down of the conventional algorithms arises fundamentally from the fact that their updates are local: in a single step of the algorithm, "information" is transmitted from a given site only to its nearest neighbors. Crudely one might guess that this "information" executes a random walk around the lattice. In order for the system to evolve to an "essentially new" configuration, the "information" has to travel a distance of order ξ, the (static) correlation length. One would guess, therefore, that τ ∼ ξ² near criticality, i.e. that the dynamic critical exponent z equals 2. This guess is correct for the Gaussian model (free field).¹² For other models, we have a situation analogous to the theory of static critical phenomena: the dynamic critical exponent is a nontrivial number that characterizes a rather large class of algorithms (a so-called "dynamic universality class"). In any case, for most models of interest, the dynamic critical exponent for local algorithms is close to 2 (usually somewhat higher) [21]. Accurate measurements of dynamic critical exponents are, however, very difficult (even more difficult than measurements of static critical exponents) and require enormous quantities of Monte Carlo data: run lengths of 10000 τ, when τ is itself getting large!
We can now make a rough estimate of the computer time needed to study the Ising model near its critical point, or quantum chromodynamics near the continuum limit. Each sweep of the lattice takes a time of order L^d, where d is the spatial (or space-"time") dimensionality of the model. And we need ∼ ξ^z sweeps in order to get one "effectively
12 Indeed, for the Gaussian model this random-walk picture can be made rigorous: see [19] combined with [20, Section 8].
independent" sample. So this means a computer time of order L^d ξ^z ≳ ξ^{d+z}.¹³ For high-precision statistics one might want 10^6 "independent" samples. The reader is invited to plug in ξ = 100, d = 3 (or d = 4 if you're an elementary-particle physicist) and get depressed. It should be emphasized that the factor ξ^d is inherent in all Monte Carlo algorithms for spin models and field theories (but not for self-avoiding walks, see Section 7). The factor ξ^z could, however, conceivably be reduced or eliminated by a more clever algorithm.
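The arithmetic behind this estimate is easily made explicit. The sketch below merely spells out the scaling argument above (taking z = 2 for a local algorithm, L of order ξ, and one unit of work per site update); apart from ξ = 100 and the 10^6-sample target, the numerical choices are illustrative:

```python
def local_mc_cost(xi, d, z=2, n_samples=10**6):
    """Rough operation count: volume L^d per sweep (with L of order xi),
    times ~xi^z sweeps per effectively independent sample, times the
    number of samples wanted."""
    L = xi
    return (L ** d) * (xi ** z) * n_samples

print(local_mc_cost(100, d=3))   # Ising-like, d = 3: 10^16 site updates
print(local_mc_cost(100, d=4))   # QCD-like, d = 4: 10^18 site updates
```

At 10^16 to 10^18 elementary updates, the source of the depression invited above is plain.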
What is to be done? Our knowledge of the physics of critical slowing-down tells us that the slow modes are the long-wavelength modes, if the updating is purely local. The natural solution is therefore to speed up those modes by some sort of collective-mode (nonlocal) updating. It is necessary, then, to identify physically the appropriate collective modes, and to devise an efficient computational algorithm for speeding up those modes. These two goals are unfortunately in conflict: it is very difficult to devise collective-mode algorithms that are not so nonlocal that their computational cost outweighs the reduction in critical slowing-down. Specific implementations of the collective-mode idea are thus highly model-dependent. At least three such algorithms have been invented so far:
- Fourier acceleration [24]
- Multi-grid Monte Carlo (MGMC) [25, 20, 26]
- The Swendsen-Wang algorithm [27] and its generalizations
Fourier acceleration and MGMC are very similar in spirit (though quite different technically). Their performance is thus probably qualitatively similar, in the sense that they probably work well for the same models and work badly for the same models. In the next lecture we give an introduction to the MGMC method; in the following lecture we discuss algorithms of Swendsen-Wang type.
13 Clearly one must take L ≳ ξ in order to avoid severe finite-size effects. Typically one approaches the critical point with L ∼ cξ, where c ∼ 2-4, and then uses finite-size scaling [22, 23] to extrapolate to the infinite-volume limit. Note Added 1996: Recently, radical advances have been made in applying finite-size scaling to Monte Carlo simulations (see [97, 98] and especially [99, 100, 101, 102, 103]); the preceding two sentences can now be seen to be far too pessimistic. For reliable extrapolation to the infinite-volume limit, L and ξ must both be ≫ 1, but the ratio L/ξ can in some cases be as small as 10^{-3} or even smaller (depending on the model and on the quality of the data). However, reliable extrapolation requires careful attention to systematic errors arising from correction-to-scaling terms. In practice, reliable infinite-volume values (with both statistical and systematic errors of order a few percent) can be obtained at ξ ∼ 10^5 from lattices of size L ∼ 10^2, at least in some models [101, 102, 103].
5 Multi-Grid Algorithms<br />
The phenomenon of critical slowing-down is not confined to Monte Carlo simulations: very similar difficulties were encountered long ago by numerical analysts concerned with the numerical solution of partial differential equations. An ingenious solution, now called the multi-grid (MG) method, was proposed in 1964 by the Soviet numerical analyst Fedorenko [28]: the idea is to consider, in addition to the original ("fine-grid") problem, a sequence of auxiliary "coarse-grid" problems that approximate the behavior of the original problem for excitations at successively longer length scales (a sort of "coarse-graining" procedure). The local updates of the traditional algorithms are then supplemented by coarse-grid updates. To a present-day physicist, this philosophy is remarkably reminiscent of the renormalization group, so it is all the more remarkable that it was invented two years before the work of Kadanoff [29] and seven years before the work of Wilson [30]! After a decade of dormancy, multi-grid was revived in the mid-1970's [31], and was shown to be an extremely efficient computational method. In the 1980's, multi-grid methods have become an active area of research in numerical analysis, and have been applied to a wide variety of problems in classical physics [32, 33]. Very recently [25] it was shown how a stochastic generalization of the multi-grid method, multi-grid Monte Carlo (MGMC), can be applied to problems in statistical, and hence also Euclidean quantum, physics.
In this lecture we begin by giving a brief introduction to the deterministic multi-grid method; we then explain the stochastic analogue.¹⁴ But it is worth indicating now the basic idea behind this generalization. There is a strong analogy between solving lattice systems of equations (such as the discrete Laplace equation) and making Monte Carlo simulations of lattice random fields. Indeed, given a Hamiltonian H(φ), the deterministic problem is that of minimizing H(φ), while the stochastic problem is that of generating random samples from the Boltzmann-Gibbs probability distribution e^{−βH(φ)}. The statistical-mechanical problem reduces to the deterministic one in the zero-temperature limit β → +∞.

Many (but not all) of the deterministic iterative algorithms for minimizing H(φ) can be generalized to stochastic iterative algorithms, that is, dynamic Monte Carlo methods, for generating random samples from e^{−βH(φ)}. For example, the Gauss-Seidel algorithm for minimizing H and the heat-bath algorithm for random sampling from e^{−βH} are very closely related. Both algorithms sweep successively through the lattice, working on one site x at a time. The Gauss-Seidel algorithm updates φ_x so as to minimize the Hamiltonian H(φ) = H(φ_x, {φ_y}_{y≠x}) when all the other fields {φ_y}_{y≠x} are held fixed at their current values. The heat-bath algorithm gives φ_x a new random value chosen from the probability distribution ∝ exp[−βH(φ_x, {φ_y}_{y≠x})], with all the fields {φ_y}_{y≠x} again held fixed. As β → +∞, the heat-bath algorithm approaches Gauss-Seidel. A similar correspondence holds between MG and MGMC.
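The Gauss-Seidel/heat-bath correspondence is easy to exhibit for the one-dimensional massless Gaussian model, where the single-site conditional distribution can be computed by hand: it is a Gaussian whose mean is exactly the Gauss-Seidel value and whose variance is 1/(2β). The following sketch (an illustration, not code from these notes) implements one sweep that covers both limits:

```python
import random

def sweep(phi, beta):
    """One sequential sweep for H(phi) = (1/2) sum_x (phi_x - phi_{x+1})^2
    with Dirichlet ends phi_0 = phi_{N+1} = 0.  The conditional law of
    phi_x given its neighbors is Gaussian with mean equal to the
    Gauss-Seidel value and variance 1/(2*beta); beta = infinity is
    exactly Gauss-Seidel."""
    n = len(phi)
    for x in range(n):
        left = phi[x - 1] if x > 0 else 0.0
        right = phi[x + 1] if x < n - 1 else 0.0
        mean = 0.5 * (left + right)           # the Gauss-Seidel update
        if beta == float('inf'):
            phi[x] = mean                     # zero-temperature limit
        else:
            phi[x] = random.gauss(mean, (2.0 * beta) ** -0.5)

random.seed(1)
phi = [1.0] * 10
for _ in range(300):
    sweep(phi, beta=float('inf'))             # pure Gauss-Seidel
print(max(abs(v) for v in phi) < 1e-6)        # converges to the minimizer phi = 0
```

At finite β the very same sweep generates (correlated) samples from e^{−βH}; the only change is the added Gaussian noise of width (2β)^{−1/2}, which vanishes in the β → +∞ limit.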
14 For an excellent introduction to the deterministic multi-grid method, see Briggs [34]; more advanced topics are covered in the book of Hackbusch [32]. Both MG and MGMC are discussed in detail in [20].
Before entering into details, let us emphasize that although the multi-grid method and the block-spin renormalization group (RG) are based on very similar philosophies (dealing with a single length scale at a time), they are in fact very different. In particular, the conditional coarse-grid Hamiltonian employed in the MGMC method is not the same as the renormalized Hamiltonian given by a block-spin RG transformation. The RG transformation computes the marginal, not the conditional, distribution of the block means: that is, it integrates over the complementary degrees of freedom, while the MGMC method fixes these degrees of freedom at their current (random) values. The conditional Hamiltonian employed in MGMC is given by an explicit finite expression, while the marginal (RG) Hamiltonian cannot be computed in closed form. The failure to appreciate these distinctions has unfortunately led to much confusion in the literature.¹⁵
5.1 Deterministic Multi-Grid
In this section we give a pedagogical introduction to multi-grid methods in the simplest case, namely the solution of deterministic linear or nonlinear systems of equations.

Consider, for purposes of exposition, the lattice Poisson equation −Δφ = f in a region Λ ⊂ Z^d with zero Dirichlet data. Thus, the equation is

    (−Δφ)_x ≡ 2d φ_x − Σ_{x': |x−x'|=1} φ_{x'} = f_x    (5.1)

for x ∈ Λ, with φ_x ≡ 0 for x ∉ Λ. This problem is equivalent to minimizing the quadratic Hamiltonian

    H(φ) = (1/2) Σ_{⟨xy⟩} (φ_x − φ_y)² − Σ_x f_x φ_x .    (5.2)

More generally, we may wish to solve a linear system

    Aφ = f    (5.3)

where for simplicity we shall assume A to be symmetric and positive-definite. This problem is equivalent to minimizing

    H(φ) = (1/2) ⟨φ, Aφ⟩ − ⟨f, φ⟩ .    (5.4)

Later we shall consider also non-quadratic Hamiltonians.
Our goal is to devise a rapidly convergent iterative method for solving numerically the linear system (5.3). We shall restrict attention to first-order stationary linear iterations of the general form

    φ^(n+1) = M φ^(n) + N f    (5.5)
15 For further discussion, see [20, Section 10.1].
where φ^(0) is an arbitrary initial guess for the solution. Obviously, we must demand at the very least that the true solution φ̄ ≡ A⁻¹f be a fixed point of (5.5); imposing this condition for all f, we conclude that

    N = (I − M) A⁻¹ .    (5.6)

The iteration (5.5) is thus completely specified by its iteration matrix M. Moreover, (5.5)-(5.6) imply that the error e^(n) ≡ φ^(n) − φ̄ satisfies

    e^(n+1) = M e^(n) .    (5.7)

That is, the iteration matrix is the amplification matrix for the error. It follows easily that the iteration (5.5) is convergent for all initial vectors φ^(0) if and only if the spectral radius ρ(M) ≡ lim_{n→∞} ||M^n||^{1/n} is < 1; and in this case the convergence is exponential with asymptotic rate at least ρ(M), i.e.

    ||φ^(n) − φ̄|| ≤ K n^p ρ(M)^n    (5.8)

for some K, p < ∞ (K depends on φ^(0)).
Now let us return to the specific system (5.1). One simple iterative algorithm arises by solving (5.1) repeatedly for φ_x:

    φ_x^(n+1) = (1/2d) [ Σ_{x': |x−x'|=1} φ_{x'}^(n) + f_x ] .    (5.9)
(5.9) is called the Jacobi iteration. It is convenient to consider also a slight generalization of (5.9): let 0 < ω ≤ 1 and define

    φ_x^(n+1) = (1 − ω) φ_x^(n) + (ω/2d) [ Σ_{x': |x−x'|=1} φ_{x'}^(n) + f_x ] ,    (5.10)

which is called the damped Jacobi iteration with damping parameter ω; its iteration matrix is

    M_DJ,ω = I − (ω/2d) A .    (5.11)

Consider for concreteness dimension d = 2, on an L × L grid. The eigenvectors of M_DJ,ω are products of sine waves, with eigenvalues

    λ_p = 1 − ω [1 − (cos p₁ + cos p₂)/2]    (5.12)

where p₁, p₂ = π/(L+1), 2π/(L+1), ..., Lπ/(L+1). The spectral radius of M_DJ,ω is the eigenvalue of largest magnitude, namely

    ρ(M_DJ,ω) = λ_{π/(L+1), π/(L+1)} = 1 − ω [1 − cos(π/(L+1))] = 1 − O(L⁻²) .    (5.13)

It follows that O(L²) iterations are needed for the damped Jacobi iteration to converge adequately. This represents an enormous computational labor when L is large.
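The O(L²) behavior is easy to observe numerically. The sketch below (an illustration, using the 1d analogue of (5.1) with f = 0, so that φ itself is the error) starts from the slowest sine mode and counts damped-Jacobi sweeps until the error drops below a fixed tolerance; doubling L multiplies the count by roughly four:

```python
import math

def sweeps_to_converge(L, omega=0.5, tol=1e-2):
    """Damped Jacobi for the 1d lattice Poisson equation with f = 0 and
    Dirichlet ends: the exact solution is phi = 0, so phi is the error.
    Start from the slowest mode sin(pi*x/(L+1)) and count sweeps until
    the max-norm of the error falls below tol."""
    phi = [math.sin(math.pi * (x + 1) / (L + 1)) for x in range(L)]
    n = 0
    while max(abs(v) for v in phi) > tol:
        nxt = []
        for x in range(L):
            left = phi[x - 1] if x > 0 else 0.0
            right = phi[x + 1] if x < L - 1 else 0.0
            nxt.append((1 - omega) * phi[x] + omega * 0.5 * (left + right))
        phi = nxt
        n += 1
    return n

n16, n32 = sweeps_to_converge(16), sweeps_to_converge(32)
print(n16, n32)   # the second count is roughly 4 times the first
```

Since the starting vector is an exact eigenvector of the iteration matrix, the decay per sweep is exactly the eigenvalue 1 − ω[1 − cos(π/(L+1))] of (5.13), in its 1d form.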
It is easy to see what is going on here: the slow modes (λ_p ≈ 1) are the long-wavelength modes (p₁, p₂ ≪ 1). [If ω ≈ 1, then some modes with wavenumber p = (p₁, p₂) ≈ (π, π) have eigenvalue λ_p ≈ −1 and so also are slow. This phenomenon can be avoided by taking ω significantly less than 1; for simplicity we shall henceforth take ω = 1/2, which makes λ_p ≥ 0 for all p.] It is also easy to see physically why the long-wavelength modes are slow. The key fact is that the (damped) Jacobi iteration is local: in a single step of the algorithm, "information" is transmitted only to nearest neighbors. One might guess that this "information" executes a random walk around the lattice; and for the true solution to be reached, "information" must propagate from the boundaries to the interior (and back and forth until "equilibrium" is attained). This takes a time of order L², in agreement with (5.13). In fact, this random-walk picture can be made rigorous [19].
This is an example of a critical phenomenon, in precisely the same sense that the term is used in statistical mechanics. The Laplace operator A = −Δ is critical, inasmuch as its Green function A⁻¹ has long-range correlations (power-law decay in dimension d > 2, or growth in d ≤ 2). This means that the solution of Poisson's equation in one region of the lattice depends strongly on the solution in distant regions of the lattice; "information" must propagate globally in order for "equilibrium" to be reached. Put another way, excitations at many length scales are significant, from one lattice spacing at the smallest to the entire lattice at the largest. The situation would be very different if we were to consider instead the Helmholtz-Yukawa equation (−Δ + m²)φ = f with m > 0: its Green function has exponential decay with characteristic length ∼ m⁻¹, so that regions of the lattice separated by distances ≫ m⁻¹ are essentially decoupled. In this case, "information" need only propagate a distance of order min(m⁻¹, L) in order for "equilibrium" to be reached. This takes a time of order min(m⁻², L²), an estimate which can be confirmed rigorously by computing the obvious generalization of (5.12)-(5.13). On the other hand, as m → 0 we recover the Laplace operator with its attendant difficulties: m = 0 is a critical point. We have here an example of critical slowing-down in classical physics.
The general structure of a remedy should now be obvious to physicists reared on the renormalization group: don't try to deal with all length scales at once, but define instead a sequence of problems in which each length scale, beginning with the smallest and working towards the largest, can be dealt with separately. An algorithm of precisely this form was proposed in 1964 by the Soviet numerical analyst Fedorenko [28], and is now called the multi-grid method.
Note first that the only slow modes in the damped Jacobi iteration are the long-wavelength modes (provided that ω is not near 1): as long as, say, max(p₁, p₂) ≥ π/2, we have 0 ≤ λ_p ≤ 3/4 (for ω = 1/2), independent of L. It follows that the short-wavelength components of the error e^(n) ≡ φ^(n) − φ̄ can be effectively killed by a few (say, five or ten) damped Jacobi iterations. The remaining error has primarily long-wavelength components, and so is slowly varying in x-space. But a slowly varying function can be well represented on a coarser grid: if, for example, we were told e_x^(n) only at even values of x, we could nevertheless reconstruct with high accuracy the function e_x^(n) at all x by, say, linear interpolation. This suggests an improved algorithm for solving (5.1): perform a few damped Jacobi iterations on the original grid, until the (unknown) error is smooth in x-space; then set up an auxiliary coarse-grid problem whose solution will be approximately this error (this problem will turn out to be a Poisson equation on the coarser grid); perform a few damped Jacobi iterations on the coarser grid; and then transfer (interpolate) the result back to the original (fine) grid and add it in to the current approximate solution.

There are two advantages to performing the damped Jacobi iterations on the coarse grid. Firstly, the iterations take less work, because there are fewer lattice points on the coarse grid (2⁻ᵈ times as many for a factor-of-2 coarsening in d dimensions). Secondly, with respect to the coarse grid the long-wavelength modes no longer have such long wavelength: their wavelength has been halved (i.e. their wavenumber has been doubled). This suggests that those modes with, say, max(p₁, p₂) ≥ π/4 can be effectively killed by a few damped Jacobi iterations on the coarse grid. And then we can transfer the remaining (smooth) error to a yet coarser grid, and so on recursively. These are the essential ideas of the multi-grid method.
Let us now give a precise definition of the multi-grid algorithm. For simplicity we shall restrict attention to problems defined in variational form¹⁶: thus, the goal is to minimize a real-valued function ("Hamiltonian") H(φ), where φ runs over some N-dimensional real vector space U. We shall treat quadratic and non-quadratic Hamiltonians on an equal footing. In order to specify the algorithm we must specify the following ingredients:

1) A sequence of coarse-grid spaces U_M ≡ U, U_{M−1}, U_{M−2}, ..., U_0. Here dim U_l ≡ N_l and N = N_M > N_{M−1} > N_{M−2} > ... > N_0.

2) Prolongation (or "interpolation") operators p_{l,l−1}: U_{l−1} → U_l for 1 ≤ l ≤ M.

3) Basic (or "smoothing") iterations S_l: U_l × H_l → U_l for 0 ≤ l ≤ M. Here H_l is a space of "possible Hamiltonians" defined on U_l; we discuss this in more detail below. The role of S_l is to take an approximate minimizer φ'_l of the Hamiltonian H_l and compute a new (hopefully better) approximate minimizer φ''_l = S_l(φ'_l, H_l). [For the present we can imagine that S_l consists of a few iterations of damped Jacobi for the Hamiltonian H_l.] Most generally, we shall use two smoothing iterations, S_l^pre and S_l^post; they may be the same, but need not be.

4) Cycle control parameters (integers) γ_l ≥ 1 for 1 ≤ l ≤ M, which control the number of times that the coarse grids are visited.

We discuss these ingredients in more detail below.

16 In fact, the multi-grid method can be applied to the solution of linear or nonlinear systems of equations, whether or not these equations come from a variational principle. See, for example, [32] and [20, Section 2].
The multi-grid algorithm is then defined recursively as follows:

procedure mgm(l, φ, H_l)
    comment This algorithm takes an approximate minimizer φ of the Hamiltonian H_l, and overwrites it with a better approximate minimizer.
    φ ← S_l^pre(φ, H_l)
    if l > 0 then
        compute H_{l−1}( · ) ≡ H_l(φ + p_{l,l−1} · )
        ψ ← 0
        for j = 1 until γ_l do mgm(l−1, ψ, H_{l−1})
        φ ← φ + p_{l,l−1} ψ
    endif
    φ ← S_l^post(φ, H_l)
end
Here is what is going on: We wish to minimize the Hamiltonian H_l, and are given as input an approximate minimizer φ. The algorithm consists of three steps:

1) Pre-smoothing. We apply the basic iteration (e.g. a few sweeps of damped Jacobi) to the given approximate minimizer. This produces a better approximate minimizer in which the high-frequency (short-wavelength) components of the error have been reduced significantly. Therefore, the error, although still large, is smooth in x-space (whence the name "smoothing iteration").

2) Coarse-grid correction. We want to move rapidly towards the true minimizer φ̄ of H_l, using coarse-grid updates. Because of the pre-smoothing, the error φ − φ̄ is a smooth function in x-space, so it should be well approximated by fields in the range of the prolongation operator p_{l,l−1}. We will therefore carry out a coarse-grid update in which φ is replaced by φ + p_{l,l−1}ψ, where ψ lies in the coarse-grid subspace U_{l−1}. A sensible goal is to attempt to choose ψ so as to minimize H_l; that is, we attempt to minimize

    H_{l−1}(ψ) ≡ H_l(φ + p_{l,l−1}ψ) .    (5.14)

To carry out this approximate minimization, we use a few (namely, γ_l) iterations of the best algorithm we know: multi-grid itself! And we start at the best approximate minimizer we know, namely ψ = 0! The goal of this coarse-grid correction step is to reduce significantly the low-frequency components of the error in φ (hopefully without creating large new high-frequency error components).

3) Post-smoothing. We apply, for good measure, a few more sweeps of the basic smoother. (This would protect against any high-frequency error components which may inadvertently have been created by the coarse-grid correction step.)
The foregoing constitutes, of course, a single step of the multi-grid algorithm. In practice this step would be repeated several times, as in any other iteration, until the error has been reduced to an acceptably small value. The advantage of multi-grid over the traditional (e.g. damped Jacobi) iterative methods is that, with a suitable choice of the ingredients p_{l,l−1}, S_l and so on, only a few (maybe five or ten) iterations are needed to reduce the error to a small value, independent of the lattice size L. This contrasts favorably with the behavior (5.13) of the damped Jacobi method, in which O(L²) iterations are needed.
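A complete working instance of this recursion fits in a few dozen lines. The following sketch (an illustration, not code from these notes) solves the 1d version of (5.1) in the variational form described above: damped-Jacobi smoothing, piecewise-linear prolongation, and the exact coarse-grid Hamiltonian (5.14), which in the quadratic case means the Galerkin operator p*Ap; for tridiag(−1, 2, −1) this is just half of the same matrix, so only a scale factor c need be carried down. The grid sizes, sweep counts, and γ = 1 (V-cycle) choice are illustrative:

```python
def apply_T(phi):
    """T = tridiag(-1, 2, -1) with zero Dirichlet ends."""
    n = len(phi)
    return [2 * phi[i] - (phi[i - 1] if i > 0 else 0.0)
                       - (phi[i + 1] if i < n - 1 else 0.0) for i in range(n)]

def jacobi(phi, f, c, omega=0.5, sweeps=3):
    """Damped Jacobi smoothing for (c*T) phi = f."""
    n = len(phi)
    for _ in range(sweeps):
        nxt = []
        for i in range(n):
            nb = (phi[i - 1] if i > 0 else 0.0) + (phi[i + 1] if i < n - 1 else 0.0)
            nxt.append((1 - omega) * phi[i] + omega * 0.5 * (nb + f[i] / c))
        phi[:] = nxt

def restrict(r):
    """p^T for linear interpolation p (coarse point j sits at fine point 2j+1)."""
    return [r[2 * j + 1] + 0.5 * (r[2 * j] + r[2 * j + 2])
            for j in range((len(r) - 1) // 2)]

def prolong(e, nf):
    """Piecewise-linear interpolation onto a fine grid of nf interior points."""
    out = [0.0] * nf
    for j, v in enumerate(e):
        out[2 * j + 1] += v
        out[2 * j] += 0.5 * v
        out[2 * j + 2] += 0.5 * v
    return out

def vcycle(phi, f, c):
    """One multi-grid cycle (gamma = 1) for (c*T) phi = f; the Galerkin
    coarse operator p^T (c*T) p equals (c/2)*T, so only c changes."""
    n = len(phi)
    if n == 1:                                   # coarsest grid: solve exactly
        phi[0] = f[0] / (2.0 * c)
        return
    jacobi(phi, f, c)                            # pre-smoothing
    r = [f[i] - c * a for i, a in enumerate(apply_T(phi))]
    e = [0.0] * ((n - 1) // 2)
    vcycle(e, restrict(r), c / 2.0)              # coarse-grid correction
    for i, v in enumerate(prolong(e, n)):
        phi[i] += v
    jacobi(phi, f, c)                            # post-smoothing

def residual_after(n, cycles=12):
    f, phi = [1.0] * n, [0.0] * n
    for _ in range(cycles):
        vcycle(phi, f, 1.0)
    return max(abs(f[i] - a) for i, a in enumerate(apply_T(phi)))

print(residual_after(2**7 - 1) < 1e-8, residual_after(2**9 - 1) < 1e-8)
```

The point of the final line is that the same small number of cycles suffices for both lattice sizes; damped Jacobi alone would need roughly 16 times more iterations on the larger grid.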
The multi-grid algorithm is thus a general framework; the user has considerable freedom in choosing the specific ingredients, which must be adapted to the specific problem. We now discuss briefly each of these ingredients; more details can be found in Chapter 3 of the book of Hackbusch [32].
Coarse grids. Most commonly one uses a uniform factor-of-2 coarsening between each grid and the next coarser one. The coarse-grid points could be either a subset of the fine-grid points (Fig. 1) or a subset of the dual lattice (Fig. 2). These schemes have obvious generalizations to higher-dimensional cubic lattices. In dimension d = 2, another possibility is a uniform factor-of-√2 coarsening (Fig. 3); note that the coarse grid is again a square lattice, rotated by 45°. Figs. 1-3 are often referred to as "standard coarsening", "staggered coarsening", and "red-black (or checkerboard) coarsening", respectively. Coarsenings by a larger factor (e.g. 3) could also be considered, but are generally disadvantageous. Note that each of the above schemes works also for periodic boundary conditions provided that the linear size L_l of the grid at level l is even. For this reason it is most convenient to take the linear size L ≡ L_M of the original (finest) grid to be a power of 2, or at least a power of 2 times a small integer. Other definitions of coarse grids (e.g. anisotropic coarsening) are occasionally appropriate.
Prolongation operators. For a coarse grid as in Fig. 2, a natural choice of prolongation operator is piecewise-constant injection:

    (p_{l,l−1} φ_{l−1})_{x₁ ± 1/2, x₂ ± 1/2} = (φ_{l−1})_{x₁, x₂}  for all x in the coarse grid    (5.15)

(illustrated here for d = 2). It can be represented in an obvious shorthand notation by the stencil

    [ 1  1 ]
    [ 1  1 ] .    (5.16)
For a coarse grid as in Fig. 1, a natural choice is piecewise-linear interpolation, one
Figure 1: Standard coarsening (factor-of-2) in dimension d = 2. Dots are fine-grid sites and crosses are coarse-grid sites.
Figure 2: Staggered coarsening (factor-of-2) in dimension d = 2.
Figure 3: Red-black coarsening (factor-of-√2) in dimension d = 2.
example of which is the nine-point prolongation

    [ 1/4  1/2  1/4 ]
    [ 1/2   1   1/2 ]   (5.17)
    [ 1/4  1/2  1/4 ]
Higher-order interpolations (e.g. quadratic or cubic) can also be considered. All these prolongation operators can easily be generalized to higher dimensions.
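Both prolongations are easy to code in d = 2. Here is a minimal NumPy sketch (our own; factor-of-2 grids and periodic boundary conditions assumed, function names ours):

```python
import numpy as np

def prolong_constant(psi):
    """Piecewise-constant injection (5.15)/(5.16): each coarse value is
    copied into its 2 x 2 block of fine-grid sites."""
    return np.kron(psi, np.ones((2, 2)))

def prolong_linear(psi):
    """Nine-point (bilinear) prolongation (5.17) for standard coarsening
    with periodic boundary conditions: coarse points are copied, edge
    points get weight 1/2 from two neighbors, centers 1/4 from four."""
    n = psi.shape[0]
    fine = np.zeros((2*n, 2*n))
    fine[::2, ::2] = psi
    fine[1::2, ::2] = 0.5*(psi + np.roll(psi, -1, 0))
    fine[::2, 1::2] = 0.5*(psi + np.roll(psi, -1, 1))
    fine[1::2, 1::2] = 0.25*(psi + np.roll(psi, -1, 0) + np.roll(psi, -1, 1)
                             + np.roll(np.roll(psi, -1, 0), -1, 1))
    return fine
```

Both operators map constant coarse fields to constant fine fields, which is the minimal requirement for representing slowly varying modes.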
We have ignored here some important subtleties concerning the treatment of the boundaries in defining the prolongation operator. Fortunately we shall not have to worry much about this problem, since most applications in statistical mechanics and quantum field theory use periodic boundary conditions.
Coarse-grid Hamiltonians. What does the coarse-grid Hamiltonian H_{l-1} look like? If the fine-grid Hamiltonian H_l is quadratic,

    H_l(φ) = (1/2) ⟨φ, A_l φ⟩ − ⟨f_l, φ⟩   (5.18)

then so is the coarse-grid Hamiltonian H_{l-1}:

    H_{l-1}(ψ) ≡ H_l(φ + p_{l,l-1} ψ)   (5.19)
             = (1/2) ⟨ψ, A_{l-1} ψ⟩ − ⟨d, ψ⟩ + const   (5.20)

where

    A_{l-1} ≡ p*_{l,l-1} A_l p_{l,l-1}   (5.21)
    d ≡ p*_{l,l-1} (f − A_l φ)   (5.22)
The coarse-grid problem is thus also a linear equation, whose right-hand side is just the "coarse-graining" of the residual r ≡ f − A_l φ; this coarse-graining is performed using the adjoint of the interpolation operator p_{l,l-1}. The exact form of the coarse-grid operator A_{l-1} depends on the fine-grid operator A_l and on the choice of interpolation operator p_{l,l-1}. For example, if A_l is the nearest-neighbor Laplacian and p_{l,l-1} is piecewise-constant injection, then it is easily checked that A_{l-1} is also a nearest-neighbor Laplacian (multiplied by an extra 2^{d-1}). On the other hand, if p_{l,l-1} is piecewise-linear interpolation, then A_{l-1} will have nearest-neighbor and next-nearest-neighbor terms (but nothing worse than that).
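The factor 2^{d-1} can be checked directly: with piecewise-constant injection, only the fine bonds crossing a block boundary survive in the quadratic form, and each coarse bond collects 2^{d-1} of them. A small numerical check (our own; d = 2, periodic boundary conditions, dense matrices for clarity):

```python
import numpy as np

def nn_laplacian(L):
    """Periodic nearest-neighbor Laplacian on an L x L grid, as a dense
    matrix acting on row-major-flattened fields."""
    I = np.eye(L)
    S = np.roll(I, 1, axis=0)                  # cyclic shift
    lap1d = 2*I - S - S.T
    return np.kron(lap1d, I) + np.kron(I, lap1d)

def pc_injection(Lc):
    """Matrix of piecewise-constant injection, coarse (Lc x Lc) ->
    fine (2Lc x 2Lc): each coarse value is copied into a 2 x 2 block."""
    P1 = np.kron(np.eye(Lc), np.ones((2, 1)))
    return np.kron(P1, P1)

Lc = 4
A_fine = nn_laplacian(2*Lc)
P = pc_injection(Lc)
A_coarse = P.T @ A_fine @ P                    # Galerkin product (5.21)
# In d = 2 this is again a nearest-neighbor Laplacian, times 2^{d-1} = 2:
assert np.allclose(A_coarse, 2*nn_laplacian(Lc))
```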
Clearly, the point is to choose classes of Hamiltonians 𝓗_l with the property that if H_l ∈ 𝓗_l and φ ∈ U_l, then the coarse-grid Hamiltonian H_{l-1} defined by (5.14) necessarily lies in 𝓗_{l-1}. In particular, it is convenient (though not in principle necessary) to choose all the Hamiltonians to have the same "functional form"; this functional form must be one which is stable under the coarsening operation (5.14). For example, suppose that the Hamiltonian H_l is a φ⁴ theory with nearest-neighbor gradient term and possibly site-dependent coefficients:

    H_l(φ) = (β/2) Σ_{|x−x'|=1} (φ_x − φ_{x'})² + Σ_x V_x(φ_x)   (5.23)

where

    V_x(φ_x) = λ φ_x⁴ + μ_x φ_x³ + A_x φ_x² + h_x φ_x .   (5.24)
Suppose, further, that the prolongation operator p_{l,l-1} is piecewise-constant injection (5.15). Then the coarse-grid Hamiltonian H_{l-1}(ψ) ≡ H_l(φ + p_{l,l-1} ψ) can easily be computed: it is

    H_{l-1}(ψ) = (β'/2) Σ_{|y−y'|=1} (ψ_y − ψ_{y'})² + Σ_y V'_y(ψ_y) + const   (5.25)

where

    V'_y(ψ_y) = λ' ψ_y⁴ + μ'_y ψ_y³ + A'_y ψ_y² + h'_y ψ_y   (5.26)

and

    β' = 2^{d−1} β   (5.27)
    λ' = 2^d λ   (5.28)
    μ'_y = Σ_{x∈B_y} (4λ φ_x + μ_x)   (5.29)
    A'_y = Σ_{x∈B_y} (6λ φ_x² + 3 μ_x φ_x + A_x)   (5.30)
    h'_y = Σ_{x∈B_y} (4λ φ_x³ + 3 μ_x φ_x² + 2 A_x φ_x + h_x − β (Δφ)_x)   (5.31)

where (Δφ)_x ≡ Σ_{x': |x−x'|=1} (φ_{x'} − φ_x) denotes the lattice Laplacian; the last term in (5.31) arises from the cross term in the gradient part of (5.23), just as d in (5.22) involves A_l φ. Here B_y is the block consisting of those 2^d sites of grid Λ_l which are affected by interpolation from the coarse-grid site y ∈ Λ_{l-1} (see Figure 2). Note that the coarse-grid
Hamiltonian H_{l-1} has the same functional form as the "fine-grid" Hamiltonian H_l: it is specified by the coefficients β', λ', μ'_y, A'_y and h'_y. The step "compute H_{l-1}" therefore means to compute these coefficients. Note also the importance of allowing in (5.23) for φ³ and φ terms and for site-dependent coefficients: even if these are not present in the original Hamiltonian H ≡ H_M, they will be generated on coarser grids. Finally, we emphasize that the coarse-grid Hamiltonian H_{l-1} depends implicitly on the current value of the fine-lattice field φ ∈ U_l; although our notation suppresses this dependence, it should be kept in mind.
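The coarsening formulas can be verified numerically by comparing H_l(φ + p_{l,l-1}ψ) with the coarse-grid form at random φ and ψ. A sketch of such a check (our own; it assumes d = 2, periodic boundary conditions, and that the gradient sum in (5.23) counts each nearest-neighbor pair once):

```python
import numpy as np

rng = np.random.default_rng(1)
d, Lc = 2, 4
L = 2 * Lc                                  # fine grid is L x L, periodic
beta, lam = 1.3, 0.3
mu, A, h = rng.normal(size=(3, L, L))       # site-dependent coefficients in (5.24)
phi = rng.normal(size=(L, L))               # current fine-grid field
psi = rng.normal(size=(Lc, Lc))             # coarse-grid field

def hamiltonian(fld, beta, lam, mu, A, h):
    """(5.23)-(5.24), with the gradient sum running over each bond once."""
    grad = sum(((fld - np.roll(fld, -1, ax))**2).sum() for ax in (0, 1))
    return 0.5*beta*grad + (lam*fld**4 + mu*fld**3 + A*fld**2 + h*fld).sum()

def blocksum(a):                            # sum over the 2^d sites of each block B_y
    return a.reshape(Lc, 2, Lc, 2).sum(axis=(1, 3))

lap = sum(np.roll(phi, s, ax) for s in (1, -1) for ax in (0, 1)) - 2*d*phi

# Coarse coefficients (5.27)-(5.31):
beta_c = 2**(d-1) * beta
lam_c  = 2**d * lam
mu_c   = blocksum(4*lam*phi + mu)
A_c    = blocksum(6*lam*phi**2 + 3*mu*phi + A)
h_c    = blocksum(4*lam*phi**3 + 3*mu*phi**2 + 2*A*phi + h - beta*lap)

# Check H_l(phi + p psi) = H_{l-1}(psi), where const = H_l(phi):
lhs = hamiltonian(phi + np.kron(psi, np.ones((2, 2))), beta, lam, mu, A, h)
rhs = hamiltonian(psi, beta_c, lam_c, mu_c, A_c, h_c) \
      + hamiltonian(phi, beta, lam, mu, A, h)
assert np.isclose(lhs, rhs)
```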
Basic (smoothing) iterations. We have already discussed the damped Jacobi iteration as one possible smoother. Note that in this method only the "old" values φ^{(n)} are used on the right-hand side of (5.9)/(5.10), even though for some of the terms the "new" value φ^{(n+1)} may already have been computed. An alternative algorithm is to use at each stage on the right-hand side the "newest" available value. This algorithm is called the Gauss-Seidel iteration.^17 Note that the Gauss-Seidel algorithm, unlike the Jacobi algorithm, depends on the ordering of the grid points. For example, if a 2-dimensional grid is swept in lexicographic order (1,1), (2,1), ..., (L,1), (1,2), (2,2), ..., (L,2), ..., (1,L), (2,L), ..., (L,L), then the Gauss-Seidel iteration becomes

    φ^{(n+1)}_{x_1,x_2} = (1/4) [φ^{(n)}_{x_1+1,x_2} + φ^{(n+1)}_{x_1−1,x_2} + φ^{(n)}_{x_1,x_2+1} + φ^{(n+1)}_{x_1,x_2−1} + f_{x_1,x_2}] .   (5.32)
Another convenient ordering is the red-black (or checkerboard) ordering, in which the "red" sublattice Λ_r = {x ∈ Λ : x_1 + ⋯ + x_d is even} is swept first, followed by the "black" sublattice Λ_b = {x ∈ Λ : x_1 + ⋯ + x_d is odd}. Note that the ordering of the grid points within each sublattice is irrelevant [for the usual nearest-neighbor Laplacian (5.1)], since the matrix A does not couple sites of the same color. This means that red-black Gauss-Seidel is particularly well suited to vector or parallel computation. Note that the red-black ordering makes sense with periodic boundary conditions only if the linear size L_l of the grid Λ_l is even.
It turns out that Gauss-Seidel is a better smoother than damped Jacobi (even if the latter is given its optimal ω). Moreover, Gauss-Seidel is easier to program and requires only half the storage space (no need for separate storage of "old" and "new" values). The only reason we introduced damped Jacobi at all is that it is easier to understand and to analyze.

Many other smoothing iterations can be considered, and can be advantageous in anisotropic or otherwise singular problems [32, Section 3.3 and Chapters 10-11]. But we shall stick to ordinary Gauss-Seidel, usually with red-black ordering.
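In NumPy, one red-black sweep for the model problem (5.32) can be written as two vectorized half-sweeps. This sketch (our own; periodic boundary conditions assumed) shows why the ordering parallelizes so well: each half-sweep updates an entire sublattice at once.

```python
import numpy as np

def neighbor_sum(phi):
    """Sum of the four nearest neighbors on a periodic 2-d grid."""
    return (np.roll(phi, 1, 0) + np.roll(phi, -1, 0)
            + np.roll(phi, 1, 1) + np.roll(phi, -1, 1))

def redblack_sweep(phi, f):
    """One red-black Gauss-Seidel sweep (5.32): update all 'red' sites
    (x1 + x2 even) at once, then all 'black' sites.  Within a sublattice
    the order is irrelevant, since the nearest-neighbor Laplacian does
    not couple sites of the same color."""
    x1, x2 = np.indices(phi.shape)
    for color in (0, 1):
        mask = (x1 + x2) % 2 == color
        update = 0.25*(neighbor_sum(phi) + f)   # uses newest available values
        phi[mask] = update[mask]
    return phi
```

Repeated sweeps drive the residual f − Aφ of the discrete Poisson equation toward zero (quickly for rough error modes, slowly for smooth ones, which is exactly why a coarse grid is needed).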
Thus, S^{pre}_l and S^{post}_l will consist, respectively, of m_1 and m_2 iterations of the Gauss-Seidel algorithm. The balance between pre-smoothing and post-smoothing is usually not very crucial; only the total m_1 + m_2 seems to matter much. Indeed, one (but not both!) of m_1 or m_2 could be zero, i.e. either the pre-smoothing or the post-smoothing could be omitted entirely. Increasing m_1 and m_2 improves the convergence rate of the multi-grid iteration, but at the expense of increased computational labor per iteration. The optimal tradeoff seems to be achieved in most cases with m_1 + m_2 between about 2 and 4. The coarsest grid Λ_0 is a special case: it usually has so few grid points (perhaps only one!) that S_0 can be an exact solver.

^17 It is amusing to note that "Gauss did not use a cyclic order of relaxation, and ... Seidel specifically recommended against using it" [36, p. 44n]. See also [37].
The variational point of view gives special insight into the Gauss-Seidel algorithm, and shows how to generalize it to non-quadratic Hamiltonians. When updating site x, the new value φ'_x is chosen so as to minimize the Hamiltonian H(φ) = H(φ_x, {φ_y}_{y≠x}) when all the other fields {φ_y}_{y≠x} are held fixed at their current values. The natural generalization of Gauss-Seidel to non-quadratic Hamiltonians is to adopt this variational definition: φ'_x should be the absolute minimizer of H(φ) = H(φ_x, {φ_y}_{y≠x}). [If the absolute minimizer is non-unique, then one such minimizer is chosen by some arbitrary rule.] This algorithm is called nonlinear Gauss-Seidel with exact minimization (NLGSEM) [38]. This definition of the algorithm presupposes, of course, that it is feasible to carry out the requisite exact one-dimensional minimizations. For example, for a φ⁴ theory it would be necessary to compute the absolute minimum of a quartic polynomial in one variable. In practice these one-dimensional minimizations might themselves be carried out iteratively, e.g. by some variant of Newton's method.
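For the φ⁴ theory the one-dimensional minimization can in fact be done in closed form: the stationarity condition is a cubic equation, and one picks the real root of lowest energy. A minimal sketch (our own; the function name and the numpy root-finding route are our choices):

```python
import numpy as np

def min_quartic(a4, a3, a2, a1):
    """Absolute minimizer of a4*t^4 + a3*t^3 + a2*t^2 + a1*t (a4 > 0):
    the stationarity condition is a cubic, so we take the real root of
    lowest energy."""
    roots = np.roots([4*a4, 3*a3, 2*a2, a1])
    real = roots[np.isclose(roots.imag, 0.0)].real
    energies = a4*real**4 + a3*real**3 + a2*real**2 + a1*real
    return real[np.argmin(energies)]

# NLGSEM update for the phi^4 Hamiltonian (5.23)-(5.24) at site x
# (with the bond sum counting each pair once): minimize over t = phi_x
#     beta*d*t^2 - beta*(sum of neighbors)*t + V_x(t),
# i.e. call min_quartic(lam, mu_x, A_x + beta*d, h_x - beta*nbsum).
```

For the symmetric double well t⁴ − 2t², for example, the routine returns one of the two degenerate minimizers t = ±1.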
Cycle control parameters and computational labor. Usually the parameters γ_l are all taken to be equal, i.e. γ_l = γ for 1 ≤ l ≤ M. Then one iteration of the multi-grid algorithm at level M comprises one visit to grid M, γ visits to grid M−1, γ² visits to grid M−2, and so on. Thus, γ determines the degree of emphasis placed on the coarse-grid updates. (γ = 0 would correspond to the pure Gauss-Seidel iteration on the finest grid alone.)

We can now estimate the computational labor required for one iteration of the multi-grid algorithm. Each visit to a given grid involves m_1 + m_2 Gauss-Seidel sweeps on that grid, plus some computation of the coarse-grid Hamiltonian and the prolongation. The work involved is proportional to the number of lattice points on that grid. Let W_l be the work required for these operations on grid l. Then, for grids defined by a factor-of-2 coarsening in d dimensions, we have

    W_l ≈ 2^{−d(M−l)} W_M   (5.33)

so that the total work for one multi-grid iteration is

    work(MG) = Σ_{l=M}^{0} γ^{M−l} W_l ≈ W_M Σ_{l=M}^{0} (γ 2^{−d})^{M−l} ≤ W_M (1 − γ 2^{−d})^{−1}   if γ < 2^d .   (5.34)

Thus, provided that γ < 2^d, the work for one entire multi-grid iteration is comparable to that of a few Gauss-Seidel sweeps
(plus a little auxiliary computation) on the finest grid alone, irrespective of the total number of levels. The most common choices are γ = 1 (which is called the V-cycle) and γ = 2 (the W-cycle).
Convergence proofs. For certain classes of Hamiltonians H (primarily quadratic ones) and suitable choices of the coarse grids, prolongations, smoothing iterations and cycle control parameters, it can be proven rigorously^18 that the multi-grid iteration matrices M_l satisfy a uniform bound

    ‖M_l‖ ≤ C < 1   (5.35)

valid irrespective of the total number of levels. Thus, a fixed number of multi-grid iterations (maybe five or ten) are sufficient to reduce the error to a small value, independent of the lattice size L. In other words, critical slowing-down has been completely eliminated.
The rigorous convergence proofs are somewhat arcane, so we cannot describe them here in any detail, but certain general features are worth noting. The convergence proofs are most straightforward when linear or higher-order interpolation and restriction are used, and γ > 1 (e.g. the W-cycle). When either low-order interpolation (e.g. piecewise-constant) or γ = 1 (the V-cycle) is used, the convergence proofs become much more delicate. Indeed, if both piecewise-constant interpolation and a V-cycle are used, then the uniform bound (5.35) has not yet been proven, and it is most likely false! To some extent these features may be artifacts of the current methods of proof, but we suspect that they do also reflect real properties of the multi-grid method, and so the convergence proofs may serve as guidance for practice. For example, in our work we have used piecewise-constant interpolation (so as to preserve the simple nearest-neighbor coupling on the coarse grids), and thus for safety we stick to the W-cycle. There is in any case much room for further research, both theoretical and experimental.
To recapitulate, the extraordinary efficiency of the multi-grid method arises from the combination of two key features:

1) The convergence estimate (5.35). This means that only O(1) iterations are needed, independent of the lattice size L.

2) The work estimate (5.34). This means that each iteration requires only a computational labor of order L^d (the fine-grid lattice volume).

It follows that the complete solution of the minimization problem, to any specified accuracy ε, requires a computational labor of order L^d.
Unigrid point of view. Let us look again at the multi-grid algorithm from the variational standpoint. One natural class of iterative algorithms for minimizing H are the

^18 For a detailed exposition of multi-grid convergence proofs, see [32, Chapters 6-8, 10, 11], [40] and the references cited therein. The additional work needed to handle the piecewise-constant interpolation can be found in [41].
so-called directional methods: let p_0, p_1, ... be a sequence of "direction vectors" in U, and define φ^{(n+1)} to be that vector of the form φ^{(n)} + λ p_n (λ ∈ ℝ) which minimizes H. The algorithm thus travels "downhill" from φ^{(n)} along the line φ^{(n)} + λ p_n until reaching the minimum of H, then switches to direction p_{n+1} starting from this new point φ^{(n+1)}, and so on. For a suitable choice of the direction vectors p_0, p_1, ..., this method can be proven to converge to the global minimum of H [38, pp. 513-520].
Now, some iterative algorithms for minimizing H(φ) can be recognized as special cases of the directional method. For example, the Gauss-Seidel iteration is a directional method in which the direction vectors are chosen to be unit vectors e_1, e_2, ..., e_N (i.e. vectors which take the value 1 at a single grid point and zero at all others), where N = dim U. [One step of the Gauss-Seidel iteration corresponds to N steps of the directional method.] Similarly, it is not hard to see [39] that the multi-grid iteration with the variational choices of restriction and coarse-grid operators, and with Gauss-Seidel smoothing at each level, is itself a directional method: some of the direction vectors are the unit vectors e^{(M)}_1, e^{(M)}_2, ..., e^{(M)}_{N_M} of the fine-grid space, but other direction vectors are the images in the fine-grid space of the unit vectors of the coarse-grid spaces, i.e. they are p_{M,l} e^{(l)}_1, p_{M,l} e^{(l)}_2, ..., p_{M,l} e^{(l)}_{N_l}. The exact order in which these direction vectors are interleaved depends on the parameters m_1, m_2 and γ which define the cycling structure of the multi-grid algorithm. For example, if m_1 = 1, m_2 = 0 and γ = 1, the order of the direction vectors is {M}, {M−1}, ..., {0}, where {l} denotes the sequence p_{M,l} e^{(l)}_1, p_{M,l} e^{(l)}_2, ..., p_{M,l} e^{(l)}_{N_l}. If m_1 = 0, m_2 = 1 and γ = 1, the order is {0}, {1}, ..., {M}. The reader is invited to work out other cases.
Thus, the multi-grid algorithm (for problems defined in variational form) is a directional method in which the direction vectors include both "single-site modes" {M} and also "collective modes" {M−1}, {M−2}, ..., {0} on all length scales. For example, if p_{l,l−1} is piecewise-constant injection, then the direction vectors are characteristic functions χ_B (i.e. functions which are 1 on the block B and zero outside B), where the sets B are successively single sites, cubes of side 2, cubes of side 4, and so on. Similarly, if p_{l,l−1} is linear interpolation, then the direction vectors are triangular waves of various widths.

The multi-grid algorithm thus has an alternative interpretation as a collective-mode algorithm working solely in the fine-grid space U. We emphasize that this "unigrid" viewpoint [39] is mathematically fully equivalent to the recursive definition given earlier. But it gives, we think, an important additional insight into what the multi-grid algorithm is really doing.
For example, for the simple model problem (Poisson equation in a square), we know that the "correct" collective modes are sine waves, in the sense that these modes diagonalize the Laplacian, so that in this basis the Jacobi or Gauss-Seidel algorithm would give the exact solution in a single iteration (M_Jacobi = M_GS = 0). On the other hand, the multi-grid method uses square-wave (or triangular-wave) updates, which are not exactly the "correct" collective modes. Nevertheless, the multi-grid convergence proofs [32, 40, 41] assure us that they are "close enough": the norm of the multi-grid iteration matrix M_l is bounded away from 1, uniformly in the lattice size, so that an accurate solution is reached in a very few MG iterations (in particular, critical slowing-down is completely eliminated). This viewpoint also explains why MG convergence is more delicate for piecewise-constant interpolation than for piecewise-linear: the point is that a sine wave (or other slowly varying function) can be approximated to arbitrary accuracy (in energy norm) by piecewise-linear functions but not by piecewise-constant functions.
We remark that McCormick and Ruge [39] have advocated the "unigrid" idea not just as an alternate point of view on the multi-grid algorithm, but as an alternate computational procedure. To be sure, the unigrid method is somewhat simpler to program, and this could have pedagogical advantages. But one of the key properties of the multi-grid method, namely the O(L^d) computational labor per iteration, is sacrificed in the unigrid scheme. Instead of (5.33)-(5.34) one has

    W_l ≈ W_M   (5.36)

and hence

    work(UG) ≈ Σ_{l=M}^{0} γ^{M−l} W_l ≈ { M W_M     if γ = 1
                                         { γ^M W_M   if γ > 1   (5.37)

Since M ≈ log₂ L and W_M ∼ L^d, we obtain

    work(UG) ∼ { L^d log L        if γ = 1
               { L^{d + log₂ γ}   if γ > 1   (5.38)

For a V-cycle the additional factor of log L is perhaps not terribly harmful, but for a W-cycle the additional factor of L is a severe drawback (though not as severe as the O(L²) critical slowing-down of the traditional algorithms). Thus, we do not advocate the use of unigrid as a computational method if there is a viable multi-grid alternative. Unigrid could, however, be of interest in cases where true multi-grid is unfeasible, as may occur for non-Abelian lattice gauge theories.
Multi-grid algorithms can also be devised for some models in which the state space is a nonlinear manifold, such as nonlinear σ-models and lattice gauge theories [20, Sections 3-5]. The simplest case is the XY model: both the fine-grid and coarse-grid field variables are angles, and the interpolation operator is piecewise-constant (with angles added modulo 2π). Thus, a coarse-grid variable ψ_y specifies the angle by which the 2^d spins in the block B_y are to be simultaneously rotated. A similar strategy can be employed for nonlinear σ-models taking values in a group G (the so-called "principal chiral models"): the coarse-grid variable ψ_y simultaneously left-multiplies the 2^d spins in the block B_y. For nonlinear σ-models taking values in a nonlinear manifold M on which a group G acts [e.g. the n-vector model with M = S^{n−1} and G = SO(n)], the coarse-grid-correction moves are still simultaneous rotations; this means that while the fine-grid fields lie in M, the coarse-grid fields all lie in G. Similar ideas can be applied to lattice gauge theories; the key requirement is to respect the geometric (parallel-transport) properties of the theory. Unfortunately, the resulting algorithms appear to be practical only in the abelian case. (In the non-abelian case, the coarse-grid Hamiltonian becomes too complicated.) Much more work needs to be done on devising good interpolation operators for non-abelian lattice gauge theories.
5.2 Multi-Grid Monte Carlo

Classical equilibrium statistical mechanics is a natural generalization of classical statics (for problems posed in variational form): in the latter we seek to minimize a Hamiltonian H(φ), while in the former we seek to generate random samples from the Boltzmann-Gibbs probability distribution e^{−βH(φ)}. The statistical-mechanical problem reduces to the deterministic one in the zero-temperature limit β → +∞.

Likewise, many (but not all) of the deterministic iterative algorithms for minimizing H(φ) can be generalized to stochastic iterative algorithms, that is, dynamic Monte Carlo methods, for generating random samples from e^{−βH(φ)}. For example, the stochastic generalization of the Gauss-Seidel algorithm (or more generally, nonlinear Gauss-Seidel with exact minimization) is the single-site heat-bath algorithm; and the stochastic generalization of multi-grid is multi-grid Monte Carlo.
Let us explain these correspondences in more detail. In the Gauss-Seidel algorithm, the grid points are swept in some order, and at each stage the Hamiltonian is minimized as a function of a single variable φ_x, with all other variables {φ_y}_{y≠x} being held fixed. The single-site heat-bath algorithm has the same general structure, but the new value φ'_x is chosen randomly from the conditional distribution of e^{−βH(φ)} given {φ_y}_{y≠x}, i.e. from the one-dimensional probability distribution

    P(φ'_x) dφ'_x = const × exp[−βH(φ'_x, {φ_y}_{y≠x})] dφ'_x   (5.39)

(where the normalizing constant depends on {φ_y}_{y≠x}). It is not difficult to see that this operation leaves invariant the Gibbs distribution e^{−βH(φ)}. As β → +∞ it reduces to the Gauss-Seidel algorithm.
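For a Gaussian model the conditional distribution (5.39) is itself Gaussian, so the heat-bath update can be written in closed form. A sketch of our own for the two-dimensional massive Gaussian model H = (1/2)Σ_{⟨xx'⟩}(φ_x − φ_{x'})² + (m²/2)Σ_x φ_x² with periodic boundary conditions (the mass term m² > 0 is added so that the conditional means are well defined even for the zero mode):

```python
import numpy as np

def heatbath_sweep(phi, beta, m2, rng):
    """Single-site heat-bath for the 2-d massive Gaussian model on a
    periodic L x L grid.  Conditioned on its neighbors, phi_x is
    Gaussian with mean (sum of neighbors)/(2d + m2) and variance
    1/(beta*(2d + m2)); as beta -> infinity the update reduces to
    Gauss-Seidel."""
    L = phi.shape[0]
    prec = 4.0 + m2                    # 2d + m^2 with d = 2
    for x1 in range(L):                # lexicographic sweep
        for x2 in range(L):
            nb = (phi[(x1+1) % L, x2] + phi[(x1-1) % L, x2]
                  + phi[x1, (x2+1) % L] + phi[x1, (x2-1) % L])
            phi[x1, x2] = nb/prec + rng.normal(0.0, 1.0/np.sqrt(beta*prec))
    return phi
```

Setting β very large makes the noise term negligible and recovers the deterministic Gauss-Seidel sweep, illustrating the β → +∞ correspondence above.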
It is useful to visualize geometrically the action of the Gauss-Seidel and heat-bath algorithms within the space U of all possible field configurations. Starting at the current field configuration φ, the Gauss-Seidel and heat-bath algorithms propose to move the system along the line in U consisting of configurations of the form φ' = φ + t δ_x (−∞ < t < ∞), where δ_x is the configuration that is 1 at site x and 0 elsewhere: Gauss-Seidel moves to the point on this line that minimizes H, while heat-bath resamples from the conditional distribution of e^{−βH(φ)} restricted to this line. This picture suggests two generalizations (partial resampling):

1) The "fibers" used by the algorithm need not be lines, but can be higher-dimensional linear or even nonlinear manifolds.

2) The new configuration φ' need not be chosen independently of the old configuration φ (as in the heat-bath algorithm); rather, it can be selected by any updating procedure which leaves invariant the conditional probability distribution of e^{−βH(φ)} restricted to the fiber.
The multi-grid Monte Carlo (MGMC) algorithm is a partial-resampling algorithm in which the "fibers" are the sets of field configurations that can be obtained one from another by a coarse-grid-correction step, i.e. the sets of fields φ + p_{l,l−1}ψ with φ fixed and ψ varying over U_{l−1}. These fibers form a family of parallel affine subspaces in U_l, of dimension N_{l−1} = dim U_{l−1}.

The ingredients of the MGMC algorithm are identical to those of the deterministic MG algorithm, with one exception: the deterministic smoothing iteration S_l is replaced by a stochastic smoothing iteration (for example, single-site heat-bath). That is, S_l(·, H_l) is a stochastic updating procedure φ_l → φ'_l that leaves invariant the Gibbs distribution e^{−βH_l}:

    ∫ dφ_l e^{−βH_l(φ_l)} P_{S_l(·, H_l)}(φ_l → φ'_l) = dφ'_l e^{−βH_l(φ'_l)} .   (5.40)
The MGMC algorithm is then defined as follows:

    procedure mgmc(l, φ, H_l)
        comment This algorithm updates the field φ in such a way as to leave
            invariant the probability distribution e^{−βH_l}.
        φ ← S^{pre}_l(φ, H_l)
        if l > 0 then
            compute H_{l−1}( · ) ≡ H_l(φ + p_{l,l−1} · )
            ψ ← 0
            for j = 1 until γ_l do mgmc(l−1, ψ, H_{l−1})
            φ ← φ + p_{l,l−1} ψ
        endif
        φ ← S^{post}_l(φ, H_l)
    end
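The procedure above can be transliterated almost line for line. The following sketch (our own) implements MGMC for a one-dimensional quadratic Hamiltonian H_l(φ) = (1/2)⟨φ, A_lφ⟩ − ⟨f_l, φ⟩, with single-site heat-bath smoothing, piecewise-constant injection, and the Galerkin recursions (5.21)-(5.22) for the coarse-grid Hamiltonian:

```python
import numpy as np

def heatbath(phi, A, f, beta, rng):
    """One heat-bath sweep for H(phi) = (1/2)<phi, A phi> - <f, phi>:
    site i is resampled from its conditional Gaussian (mean = the
    Gauss-Seidel value, precision = beta*A_ii)."""
    for i in range(len(phi)):
        mean = (f[i] - A[i] @ phi + A[i, i]*phi[i]) / A[i, i]
        phi[i] = mean + rng.normal(0.0, 1.0/np.sqrt(beta*A[i, i]))
    return phi

def prolong(nc):
    """Piecewise-constant injection: coarse site y -> fine sites 2y, 2y+1."""
    return np.kron(np.eye(nc), np.ones((2, 1)))

def mgmc(phi, A, f, beta, rng, gamma=2, m1=1, m2=1):
    """One MGMC iteration (W-cycle by default), updating phi in place."""
    n = len(phi)
    for _ in range(m1):
        heatbath(phi, A, f, beta, rng)          # pre-smoothing
    if n > 2:
        P = prolong(n // 2)
        A_c = P.T @ A @ P                       # (5.21)
        d_c = P.T @ (f - A @ phi)               # (5.22)
        psi = np.zeros(n // 2)
        for _ in range(gamma):
            mgmc(psi, A_c, d_c, beta, rng, gamma, m1, m2)
        phi += P @ psi                          # coarse-grid correction
    for _ in range(m2):
        heatbath(phi, A, f, beta, rng)          # post-smoothing
    return phi
```

In the β → +∞ limit the sweeps become Gauss-Seidel and the iteration reduces to the deterministic MG algorithm, so it solves Aφ = f; at finite β it leaves e^{−βH} invariant.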
The alert reader will note that this algorithm is identical to the deterministic MG algorithm presented earlier; only the meaning of S_l is different.

The validity of the MGMC algorithm is proven inductively, starting at level 0 and working upwards. That is, if mgmc(l−1, ·, H_{l−1}) is a stochastic updating procedure that leaves invariant the probability distribution e^{−βH_{l−1}}, then mgmc(l, ·, H_l) leaves invariant e^{−βH_l}. Note that the coarse-grid-correction step of the MGMC algorithm differs from the heat-bath algorithm in that the new configuration φ' is not chosen independently of the old configuration φ; to do so would be impractical, since the fiber has such high dimension. Rather, φ' (or what is equivalent, ψ) is chosen by a valid updating procedure, namely, MGMC itself!
The MGMC algorithm also has an alternate interpretation, the unigrid viewpoint, in which the fibers are one-dimensional and the resamplings are independent. More precisely, the fibers are lines of the form φ' = φ + t χ_B (−∞ < t < ∞), where the sets B range over single sites, cubes of side 2, cubes of side 4, and so on.
For the two-dimensional XY model, numerical experiments find a dynamic critical exponent z = 1.4 ± 0.3 for the MGMC algorithm (in either V-cycle or W-cycle), compared to z = 2.1 ± 0.3 for the heat-bath algorithm. Thus, critical slowing-down is significantly reduced but is still very far from being eliminated. On the other hand, below the critical temperature, τ is very small (τ ≈ 1-2), uniformly in L, and (at least for the W-cycle) critical slowing-down appears to be completely eliminated. This very different behavior in the two phases can be understood physically: in the low-temperature phase the main excitations are spin waves, which are well handled by MGMC (as in the Gaussian model); but near the critical temperature the important excitations are widely separated vortex-antivortex pairs, which are apparently not easily created by the MGMC updates.
For the O(4) nonlinear σ-model in two dimensions, which is asymptotically free,
preliminary data [42] show a very strong reduction, but not the total elimination, of
critical slowing-down. For a W-cycle we find that z ≈ 0.6 (I emphasize that these data
are very preliminary!). Previously, Goodman and I had argued heuristically [20, Section
9.3] that for asymptotically free theories with a continuous symmetry group, MGMC
(with a W-cycle) should completely eliminate critical slowing-down except for a possible
logarithm. But our reasoning may now need to be re-examined!^{19}
5.3 Stochastic Linear Iterations for the Gaussian Model

In this section we analyze an important class of Markov chains, the stochastic linear
iterations for Gaussian models [20, Section 8].^{20} This class includes, among others, the
single-site heat-bath algorithm (with deterministic sweep of the sites), the stochastic
SOR algorithm [44, 45] and the multi-grid Monte Carlo algorithm (all, of course,
in the Gaussian case only). We show that the behavior of the stochastic algorithm is
completely determined by the behavior of the corresponding deterministic algorithm for
solving linear equations. In particular, we show that the exponential autocorrelation time
τ_exp of the stochastic linear iteration is equal to the relaxation time of the corresponding
linear iteration.
Consider any quadratic Hamiltonian
\[ H(\varphi) \;=\; \tfrac{1}{2}(\varphi, A\varphi) \,-\, (f, \varphi) \tag{5.41} \]
where A is a symmetric positive-definite matrix. The corresponding Gaussian measure
\[ d\mu(\varphi) \;=\; \mathrm{const} \times e^{-\frac{1}{2}(\varphi, A\varphi) + (f, \varphi)} \, d\varphi \tag{5.42} \]
has mean A^{-1} f and covariance matrix A^{-1}. Next consider any first-order stationary
linear stochastic iteration of the form
\[ \varphi^{(n+1)} \;=\; M \varphi^{(n)} \,+\, N f \,+\, Q \xi^{(n)} \tag{5.43} \]
^{19} Note Added 1996: For work on multi-grid Monte Carlo covering the period 1992-96, see [106,
107, 108, 103] and the references cited therein.
^{20} Some of this material appears also in [43].
where M, N and Q are fixed matrices and the \xi^{(n)} are independent Gaussian random
vectors with mean zero and covariance matrix C. The iteration (5.43) has a unique
stationary distribution if and only if the spectral radius \rho(M) \equiv \lim_{n\to\infty} \|M^n\|^{1/n} is
< 1; and in this case the stationary distribution is the desired Gaussian measure (5.42)
for all f if and only if
\[ N \;=\; (I - M) A^{-1} \tag{5.44a} \]
\[ Q C Q^T \;=\; A^{-1} \,-\, M A^{-1} M^T \tag{5.44b} \]
(here {}^T denotes transpose).
The reader will note the close analogy with the deterministic linear problem (5.3)-
(5.6). Indeed, (5.44a) is identical with (5.6); and if we take the "zero-temperature
limit" in which H is replaced by βH with β → +∞, then the Gaussian measure (5.42)
approaches a delta function concentrated at the unique minimum of H (namely, the
solution of the linear equation Aφ = f), and the "noise" term disappears (Q → 0), so
that the stochastic iteration (5.43) turns into the deterministic iteration (5.5). That is:
(a) The linear deterministic problem is the zero-temperature limit of the Gaussian
stochastic problem; and the first-order stationary linear deterministic iteration is the
zero-temperature limit of the first-order stationary linear stochastic iteration. Therefore,
any stochastic linear iteration for generating samples from the Gaussian measure (5.42)
gives rise to a deterministic linear iteration for solving the linear equation (5.3), simply
by setting Q = 0.
(b) Conversely, the stochastic problem and iteration are the nonzero-temperature
generalizations of the deterministic ones. In principle this means that a deterministic
linear iteration for solving (5.3) can be generalized to a stochastic linear iteration for
generating samples from (5.42), if and only if the matrix A^{-1} - M A^{-1} M^T is positive-semidefinite:
just choose a matrix Q satisfying (5.44b). In practice, however, such an
algorithm is computationally tractable only if the matrix Q has additional nice properties,
such as sparsity (or triangularity with a sparse inverse).
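To make conditions (5.44a,b) concrete, here is a small numerical check (not part of the original notes; the matrix A and the splitting are arbitrary illustrative choices): taking C = I and Q from the Cholesky factor of A^{-1} - M A^{-1} M^T, the stationary covariance V, which solves the discrete Lyapunov equation V = M V M^T + Q Q^T, should come out equal to A^{-1}.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric positive-definite matrix A (illustrative choice)
n = 6
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)

# Gauss-Seidel splitting: M = -(D+L)^{-1} L^T, D diagonal, L strictly lower
D = np.diag(np.diag(A))
L = np.tril(A, k=-1)
M = -np.linalg.solve(D + L, L.T)
assert max(abs(np.linalg.eigvals(M))) < 1       # rho(M) < 1: convergent iteration

# Condition (5.44b) with C = I: choose Q with Q Q^T = A^{-1} - M A^{-1} M^T
Ainv = np.linalg.inv(A)
QQt = Ainv - M @ Ainv @ M.T
Q = np.linalg.cholesky(QQt)                     # exists since QQt is positive definite here

# Stationary covariance V solves V = M V M^T + Q Q^T; check that V = A^{-1} works
V = Ainv
assert np.allclose(V, M @ V @ M.T + Q @ Q.T)
```

The same check fails (Cholesky raises an error) whenever A^{-1} - M A^{-1} M^T is not positive semidefinite, in line with the "if and only if" above.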
Examples. 1. Single-site heat bath (with deterministic sweep of the sites) = stochastic
Gauss-Seidel. On each visit to site i, \varphi_i is replaced by a new value \varphi'_i chosen independently
from the conditional distribution of (5.42) with \{\varphi_j\}_{j \neq i} fixed at their current
values: that is, \varphi'_i is Gaussian with mean (f_i - \sum_{j \neq i} a_{ij}\varphi_j)/a_{ii} and variance 1/a_{ii}. When
updating \varphi_i at sweep n+1, the variables \varphi_j with j < i have already been updated, while
those with j > i have not; so one full sweep is an iteration of the form (5.43) with
\[ M \;=\; -(D + L)^{-1} L^T \tag{5.46a} \]
\[ N \;=\; (D + L)^{-1} \tag{5.46b} \]
\[ Q \;=\; (D + L)^{-1} D^{1/2} \tag{5.46c} \]
where D and L are the diagonal and lower-triangular parts of the matrix A, respectively.
It is straightforward to verify that (5.44a,b) are satisfied.^{21} The single-site heat-bath
algorithm is clearly the stochastic generalization of the Gauss-Seidel algorithm.
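This equivalence can be verified mechanically (a numerical sketch, not from the original notes; A and f are arbitrary illustrative choices): a heat-bath sweep, run with a fixed vector of unit normals, should coincide with the affine map (5.43) built from the matrices (5.46).

```python
import numpy as np

rng = np.random.default_rng(1)

def heat_bath_sweep(phi, A, f, xi):
    """One deterministic-order heat-bath sweep: site i receives a Gaussian with
    mean (f_i - sum_{j != i} a_ij phi_j)/a_ii and variance 1/a_ii; xi[i] is the
    unit normal used at site i.  Sites j < i use their already-updated values."""
    phi = phi.copy()
    for i in range(len(phi)):
        mean = (f[i] - A[i] @ phi + A[i, i] * phi[i]) / A[i, i]
        phi[i] = mean + xi[i] / np.sqrt(A[i, i])
    return phi

n = 5
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)        # symmetric positive-definite (illustrative)
f = rng.standard_normal(n)

# Matrices of the equivalent affine iteration (5.43), per (5.46a-c)
D = np.diag(np.diag(A))
Lo = np.tril(A, k=-1)
M = -np.linalg.solve(D + Lo, Lo.T)     # (5.46a)
N = np.linalg.inv(D + Lo)              # (5.46b)
Q = N @ np.sqrt(D)                     # (5.46c)

phi = rng.standard_normal(n)
xi = rng.standard_normal(n)
assert np.allclose(heat_bath_sweep(phi, A, f, xi), M @ phi + N @ f + Q @ xi)
```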
2. Stochastic SOR. For models which are Gaussian (or more generally, "multi-Gaussian"),
Adler [44] and Whitmer [45] have shown that the successive over-relaxation
(SOR) iteration admits a stochastic generalization, namely
\[ \varphi^{(n+1)}_i \;=\; (1-\omega)\,\varphi^{(n)}_i \;+\; \omega\, a_{ii}^{-1} \Big( f_i \,-\, \sum_{j<i} a_{ij}\varphi^{(n+1)}_j \,-\, \sum_{j>i} a_{ij}\varphi^{(n)}_j \Big) \;+\; \left( \frac{\omega(2-\omega)}{a_{ii}} \right)^{1/2} \xi^{(n)}_i \tag{5.47} \]
where 0 < \omega < 2.
where \xi is Gaussian white noise with covariance matrix C, using a small time step \epsilon.
The result is an iteration of the form (5.43) with
\[ M \;=\; I \,-\, \tfrac{\epsilon}{2}\, C A \tag{5.50a} \]
\[ N \;=\; \tfrac{\epsilon}{2}\, C \tag{5.50b} \]
\[ Q \;=\; \epsilon^{1/2}\, I \tag{5.50c} \]
This satisfies (5.44a) exactly, but satisfies (5.44b) only up to an error of order \epsilon. If
C = D^{-1}, these M, N correspond to a damped Jacobi iteration with \omega = \epsilon/2 \ll 1.
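The order-\epsilon statement can be checked numerically (a sketch, not from the original notes; C = I and the matrix A are illustrative choices): with the matrices (5.50), condition (5.44a) holds exactly, while the defect in (5.44b) is proportional to \epsilon^2 in absolute terms, i.e. of order \epsilon relative to the O(\epsilon) noise term, so halving \epsilon quarters the defect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)        # symmetric positive-definite (illustrative)
C = np.eye(n)                      # noise covariance (illustrative: C = I)

def lyapunov_error(eps):
    """Norm of the defect in condition (5.44b) for the Langevin matrices (5.50)."""
    M = np.eye(n) - (eps / 2) * C @ A
    Q = np.sqrt(eps) * np.eye(n)
    Ainv = np.linalg.inv(A)
    return np.linalg.norm(Q @ C @ Q.T - (Ainv - M @ Ainv @ M.T))

# (5.44a) is exact for every eps
eps = 1e-2
M = np.eye(n) - (eps / 2) * C @ A
N = (eps / 2) * C
assert np.allclose(N, (np.eye(n) - M) @ np.linalg.inv(A))

# the defect in (5.44b) scales as eps^2: halving eps quarters it
assert abs(lyapunov_error(1e-2) / lyapunov_error(5e-3) - 4.0) < 1e-6
```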
It is straightforward to analyze the dynamic behavior of the stochastic linear iteration
(5.43). Using (5.43) and (5.44) to express \varphi^{(n)} in terms of the independent random
variables \varphi^{(0)}, \xi^{(0)}, \xi^{(1)}, \ldots, \xi^{(n-1)}, we find after a bit of manipulation that
\[ \langle \varphi^{(n)} \rangle \;=\; M^n \langle \varphi^{(0)} \rangle \,+\, (I - M^n) A^{-1} f \tag{5.51} \]
and
\[ \mathrm{cov}(\varphi^{(s)}, \varphi^{(t)}) \;=\; M^s\, \mathrm{cov}(\varphi^{(0)}, \varphi^{(0)})\, (M^T)^t \;+\; \begin{cases} [A^{-1} - M^s A^{-1} (M^T)^s]\,(M^T)^{t-s} & \text{if } s \le t \\ M^{s-t}\,[A^{-1} - M^t A^{-1} (M^T)^t] & \text{if } s \ge t \end{cases} \tag{5.52} \]
Now let us either start the stochastic process in equilibrium
\[ \langle \varphi^{(0)} \rangle \;=\; A^{-1} f \tag{5.53a} \]
\[ \mathrm{cov}(\varphi^{(0)}, \varphi^{(0)}) \;=\; A^{-1} \tag{5.53b} \]
or else let it relax to equilibrium by taking s, t \to +\infty with s - t fixed. Either way,
we conclude that in equilibrium (5.43) defines a Gaussian stationary stochastic process
with mean A^{-1} f and autocovariance matrix
\[ \mathrm{cov}(\varphi^{(s)}, \varphi^{(t)}) \;=\; \begin{cases} A^{-1} (M^T)^{t-s} & \text{if } s \le t \\ M^{s-t} A^{-1} & \text{if } s \ge t \end{cases} \tag{5.54} \]
Moreover, since the stochastic process is Gaussian, all higher-order time-dependent correlation
functions are determined in terms of the mean and autocovariance. Thus, the
matrix M determines the autocorrelation functions of the Monte Carlo algorithm.
Another way to state these relationships is to recall [46, 47] that the Hilbert space
L^2(\mu) is isomorphic to the bosonic Fock space \mathcal{F}(U) built on the "energy Hilbert space"
(U, A): the "n-particle states" are the homogeneous Wick polynomials of degree n in
the shifted field \tilde\varphi = \varphi - A^{-1} f. (If U is one-dimensional, these are just the Hermite
polynomials.) Then the transition probability P(\varphi^{(n)} \to \varphi^{(n+1)}) induces on the Fock
space an operator
\[ P \;=\; \Gamma(M^T) \;\equiv\; I \,\oplus\, M^T \,\oplus\, (M^T \otimes M^T) \,\oplus\, \cdots \tag{5.55} \]
that is the second quantization of the operator M^T on the energy Hilbert space (see [20,
Section 8] for details). It follows from (5.55) that
\[ \| \Gamma(M)^n \restriction \mathbf{1}^\perp \|_{L^2(\mu)} \;=\; \| M^n \|_{(U,A)} \tag{5.56} \]
and hence that
\[ \rho(\Gamma(M) \restriction \mathbf{1}^\perp) \;=\; \rho(M) \,. \tag{5.57} \]
Moreover, P is self-adjoint on L^2(\mu) [i.e. satisfies detailed balance] if and only if M is
self-adjoint with respect to the energy inner product, i.e.
\[ M A^{-1} \;=\; A^{-1} M^T \tag{5.58} \]
and in this case
\[ \rho(\Gamma(M) \restriction \mathbf{1}^\perp) \;=\; \| \Gamma(M) \restriction \mathbf{1}^\perp \|_{L^2(\mu)} \;=\; \rho(M) \;=\; \| M \|_{(U,A)} \,. \tag{5.59} \]
In summary, we have shown that the dynamic behavior of any stochastic linear
iteration is completely determined by the behavior of the corresponding deterministic
linear iteration. In particular, the exponential autocorrelation time \tau_{exp} (slowest decay
rate of any autocorrelation function) is given by
\[ \exp(-1/\tau_{exp}) \;=\; \rho(M) \tag{5.60} \]
and this decay rate is achieved by at least one observable which is linear in the field
\varphi. In other words, the (worst-case) convergence rate of the Monte Carlo algorithm is
precisely equal to the (worst-case) convergence rate of the corresponding deterministic
iteration.
In particular, for Gaussian MGMC, the convergence proofs for deterministic multi-grid
[32, 40, 41] combined with the arguments of the present section prove rigorously
that critical slowing-down is completely eliminated (at least for a W-cycle). That is,
the autocorrelation time of the MGMC method is bounded as criticality is approached
(empirically \tau \approx 1-2).
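The practical import of (5.60) can be illustrated on a small example (a numerical sketch, not from the original notes; lattice sizes and boundary conditions are arbitrary choices): for the one-dimensional massless free field with Dirichlet boundaries, the spectral radius of the stochastic Gauss-Seidel (heat-bath) matrix is cos^2(π/(L+1)), so \tau_{exp} grows like L^2, i.e. z = 2, in agreement with the heat-bath slowing-down quoted earlier.

```python
import numpy as np

def gauss_seidel_tau(L_size):
    """tau_exp = -1/log rho(M) for the stochastic Gauss-Seidel (heat-bath)
    iteration on the 1D massless free field (tridiagonal Laplacian,
    Dirichlet boundaries)."""
    A = 2 * np.eye(L_size) - np.eye(L_size, k=1) - np.eye(L_size, k=-1)
    D = np.diag(np.diag(A))
    Lo = np.tril(A, k=-1)
    M = -np.linalg.solve(D + Lo, Lo.T)          # (5.46a)
    rho = max(abs(np.linalg.eigvals(M)))        # = cos^2(pi/(L+1)) here
    return -1.0 / np.log(rho)

# rho(M) -> 1 as L grows, so tau_exp ~ (L/pi)^2: doubling L roughly
# quadruples the autocorrelation time (z = 2)
taus = {L: gauss_seidel_tau(L) for L in (8, 16, 32)}
```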
6 Swendsen-Wang Algorithms

A very different type of collective-mode algorithm was proposed two years ago^{22} by
Swendsen and Wang [27] for Potts spin models. Since then, there has been an explosion
of work trying to understand why this algorithm works so well (and why it does not work
even better), and trying to improve or generalize it. The basic idea behind all algorithms
of Swendsen-Wang type is to augment the given model by means of auxiliary variables,
and then to simulate this augmented model. In this lecture we describe the Swendsen-
Wang (SW) algorithm and review some of the proposed variants and generalizations.
^{22} Note Added 1996: Now nearly ten years ago! The great interest in algorithms of Swendsen-
Wang type (frequently called cluster algorithms) has not abated: see the references cited in the "Notes
Added" below, plus many others. See also [109, 110] for reviews that are slightly more up-to-date than
the present notes (1990 rather than 1989).
Let us first recall that the q-state Potts model [48, 49] is a generalization of the
Ising model in which each spin \sigma_i can take q distinct values rather than just two (\sigma_i =
1, 2, \ldots, q); here q is an integer \ge 2. Neighboring spins prefer to be in the same state,
and pay an energy price if they are not. The Hamiltonian is therefore
\[ H(\sigma) \;=\; \sum_{\langle ij \rangle} J_{ij}\, (1 - \delta_{\sigma_i \sigma_j}) \tag{6.1} \]
with J_{ij} \ge 0 for all i, j ("ferromagnetism"), and the partition function is
\[ Z \;=\; \sum_{\{\sigma\}} \exp[-H(\sigma)] \;=\; \sum_{\{\sigma\}} \exp\Big[ \sum_{\langle ij \rangle} J_{ij}\, (\delta_{\sigma_i \sigma_j} - 1) \Big] \;=\; \sum_{\{\sigma\}} \prod_{\langle ij \rangle} \big[ (1 - p_{ij}) + p_{ij}\, \delta_{\sigma_i \sigma_j} \big] \tag{6.2} \]
where we have defined p_{ij} = 1 - \exp(-J_{ij}). The Gibbs measure \pi_{Potts}(\sigma) is, of course,
\[ \pi_{Potts}(\sigma) \;=\; Z^{-1} \exp\Big[ \sum_{\langle ij \rangle} J_{ij}\, (\delta_{\sigma_i \sigma_j} - 1) \Big] \;=\; Z^{-1} \prod_{\langle ij \rangle} \big[ (1 - p_{ij}) + p_{ij}\, \delta_{\sigma_i \sigma_j} \big] \tag{6.3} \]
We now use the deep identity
\[ a + b \;=\; \sum_{n=0}^{1} \big[ a\, \delta_{n,0} + b\, \delta_{n,1} \big] \tag{6.4} \]
on each bond \langle ij \rangle; that is, we introduce on each bond \langle ij \rangle an auxiliary variable n_{ij}
taking the values 0 and 1, and obtain
\[ Z \;=\; \sum_{\{\sigma\}} \sum_{\{n\}} \prod_{\langle ij \rangle} \big[ (1 - p_{ij})\, \delta_{n_{ij},0} \,+\, p_{ij}\, \delta_{n_{ij},1}\, \delta_{\sigma_i \sigma_j} \big] \,. \tag{6.5} \]
Let us now take seriously the \{n\} as dynamical variables: we can think of n_{ij} as an
occupation variable for the bond \langle ij \rangle (1 = occupied, 0 = empty). We therefore define
the Fortuin-Kasteleyn-Swendsen-Wang (FKSW) model to be a joint model having q-state
Potts spins \sigma_i at the sites and occupation variables n_{ij} on the bonds, with joint
probability distribution
\[ \mu_{FKSW}(\sigma, n) \;=\; Z^{-1} \prod_{\langle ij \rangle} \big[ (1 - p_{ij})\, \delta_{n_{ij},0} \,+\, p_{ij}\, \delta_{n_{ij},1}\, \delta_{\sigma_i \sigma_j} \big] \,. \tag{6.6} \]
Finally, let us see what happens if we sum over the \{\sigma\} at fixed \{n\}. Each occupied
bond \langle ij \rangle imposes a constraint that the spins \sigma_i and \sigma_j must be in the same state,
but otherwise the spins are unconstrained. We therefore group the sites into connected
clusters (two sites are in the same cluster if they can be joined by a path of occupied
bonds); then all the spins within a cluster must be in the same state (all q values are
equally probable), and distinct clusters are independent. It follows that
\[ Z \;=\; \sum_{\{n\}} \Big( \prod_{\langle ij \rangle:\, n_{ij}=1} p_{ij} \Big) \Big( \prod_{\langle ij \rangle:\, n_{ij}=0} (1 - p_{ij}) \Big)\, q^{C(n)} \tag{6.7} \]
where C(n) is the number of connected clusters (including one-site clusters) in the graph
whose edges are the bonds having n_{ij} = 1. The corresponding probability distribution,
\[ \mu_{RC}(n) \;=\; Z^{-1} \Big( \prod_{\langle ij \rangle:\, n_{ij}=1} p_{ij} \Big) \Big( \prod_{\langle ij \rangle:\, n_{ij}=0} (1 - p_{ij}) \Big)\, q^{C(n)} \tag{6.8} \]
is called the random-cluster model with parameter q. This is a generalized bond-percolation
model, with non-local correlations coming from the factor q^{C(n)}; for q = 1
it reduces to ordinary bond percolation. Note, by the way, that in the random-cluster
model (unlike the Potts and FKSW models), q is merely a parameter; it can take any
positive real value, not just 2, 3, \ldots. So the random-cluster model defines, in some sense,
an analytic continuation of the Potts model to non-integer q; ordinary bond percolation
corresponds to the "one-state Potts model".
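The identity Z_{Potts} = Z_{RC} implicit in (6.7) can be checked by brute-force enumeration on a small graph (a sketch, not from the original notes; the 4-cycle, q and J below are arbitrary illustrative choices):

```python
import itertools
import math

# A small illustrative graph: 4-cycle with uniform coupling J, p = 1 - e^{-J}
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
nsites, q, J = 4, 3, 0.7
p = 1 - math.exp(-J)

def n_clusters(occupied):
    """C(n): number of connected components, counting one-site clusters."""
    parent = list(range(nsites))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for (i, j), n in zip(edges, occupied):
        if n:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(nsites)})

# Potts partition function, per (6.2)
Z_potts = sum(
    math.exp(sum(J * ((s[i] == s[j]) - 1) for (i, j) in edges))
    for s in itertools.product(range(q), repeat=nsites))

# Random-cluster partition function, per (6.7)
Z_rc = sum(
    (p ** sum(n)) * ((1 - p) ** (len(edges) - sum(n))) * q ** n_clusters(n)
    for n in itertools.product((0, 1), repeat=len(edges)))

assert math.isclose(Z_potts, Z_rc)
```

Replacing the integer q in `Z_rc` by any positive real value still yields a well-defined sum, which is exactly the analytic continuation mentioned above.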
We have already verified the following facts about the FKSW model:
a) Z_{Potts} = Z_{FKSW} = Z_{RC}.
b) The marginal distribution of \mu_{FKSW} on the Potts variables \{\sigma\} (integrating out
the \{n\}) is precisely the Potts model \pi_{Potts}(\sigma).
c) The marginal distribution of \mu_{FKSW} on the bond occupation variables \{n\} (integrating
out the \{\sigma\}) is precisely the random-cluster model \mu_{RC}(n).
The conditional distributions of \mu_{FKSW} are also simple:
d) The conditional distribution of the \{n\} given the \{\sigma\} is as follows: independently
for each bond \langle ij \rangle, one sets n_{ij} = 0 in case \sigma_i \neq \sigma_j, and sets n_{ij} = 0, 1 with
probability 1 - p_{ij}, p_{ij}, respectively, in case \sigma_i = \sigma_j.
e) The conditional distribution of the \{\sigma\} given the \{n\} is as follows: independently
for each connected cluster, one sets all the spins \sigma_i in the cluster to the same value,
chosen equiprobably from \{1, 2, \ldots, q\}.
These facts can be used for both analytic and numerical purposes. For example, by
using facts (b), (c) and (e) we can prove an identity relating expectations in the Potts
model to connection probabilities in the random-cluster model:
\[ \begin{aligned}
\langle \delta_{\sigma_i \sigma_j} \rangle_{Potts,q} \;&=\; \langle \delta_{\sigma_i \sigma_j} \rangle_{FKSW,q} && \text{[by (b)]} \\
&=\; \langle E(\delta_{\sigma_i \sigma_j} \mid \{n\}) \rangle_{FKSW,q} \\
&=\; \Big\langle \frac{(q-1)\, \gamma_{ij} + 1}{q} \Big\rangle_{FKSW,q} && \text{[by (e)]} \\
&=\; \Big\langle \frac{(q-1)\, \gamma_{ij} + 1}{q} \Big\rangle_{RC,q} && \text{[by (c)]}
\end{aligned} \tag{6.9} \]
Here
\[ \gamma_{ij}(n) \;=\; \begin{cases} 1 & \text{if } i \text{ is connected to } j \\ 0 & \text{if } i \text{ is not connected to } j \end{cases} \tag{6.10} \]
and E(\,\cdot \mid \{n\}) denotes conditional expectation given \{n\}.^{23} For the Ising model with
the usual convention \sigma = \pm 1, (6.9) can be written more simply as
\[ \langle \sigma_i \sigma_j \rangle_{Ising} \;=\; \langle \gamma_{ij} \rangle_{RC,\, q=2} \,. \tag{6.11} \]
Similar identities can be proven for higher-order correlation functions, and can be employed
to prove Griffiths-type correlation inequalities for the Potts model [51, 52].
On the other hand, Swendsen and Wang [27] exploited facts (b)-(e) to devise a
radically new type of Monte Carlo algorithm. The Swendsen-Wang algorithm (SW)
simulates the joint model (6.6) by alternately applying the conditional distributions (d)
and (e): that is, by alternately generating new bond occupation variables (independent
of the old ones) given the spins, and new spin variables (independent of the old ones)
given the bonds. Each of these operations can be carried out in a computer time of
order volume: for generating the bond variables this is trivial, and for generating the
spin variables it relies on efficient (linear-time) algorithms for computing the connected
clusters.^{24} It is trivial that the SW algorithm leaves invariant the Gibbs measure (6.6),
since any product of conditional probability operators has this property. It is also easy
to see that the algorithm is ergodic, in the sense that every configuration \{\sigma, n\} having
^{23} For an excellent introduction to conditional expectations, see [50].
^{24} Determining the connected components of an undirected graph is a classic problem of computer
science. The depth-first-search and breadth-first-search algorithms [53] have a running time of order
V, while the Fischer-Galler-Hoshen-Kopelman algorithm (in one of its variants) [54] has a worst-case
running time of order V log V, and an observed mean running time of order V in percolation-type
problems [55]. Both of these algorithms are non-vectorizable. Shiloach and Vishkin [56] have invented
a SIMD parallel algorithm, and we have very recently vectorized it for the Cyber 205, obtaining a
speedup of a factor of 11 over scalar mode. We are currently carrying out a comparative test of these
three algorithms, as a function of lattice size and bond density [57]. In view of the extraordinary
performance of the SW algorithm (see below) and the fact that virtually all its CPU time is spent
finding connected components, we feel that the desirability of finding improved algorithms for this
problem is self-evident.
d = 2 Ising Model

  L        \chi           \tau_{int,E}
  64      1575  (10)      5.25 (0.30)
 128      5352  (53)      7.05 (0.67)
 256     17921 (109)      6.83 (0.40)
 512     59504 (632)      7.99 (0.81)

Table 1: Susceptibility \chi and autocorrelation time \tau_{int,E} (E = energy, slowest mode) for
the two-dimensional Ising model at criticality, using the Swendsen-Wang algorithm. Standard
error is shown in parentheses.
nonzero FKSW-measure is accessible from every other. So the SW algorithm is at
least a correct algorithm for simulating the FKSW model. It is also an algorithm for
simulating the Potts and random-cluster models, since expectations in these two models
are equal to the corresponding expectations in the FKSW model.
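The two conditional resamplings (d) and (e) translate directly into code. The following sketch (illustrative, not from the original notes: uniform coupling p on an arbitrary edge list, with a simple union-find standing in for the cluster-finding algorithms of footnote 24) performs one SW sweep:

```python
import random

def sw_sweep(spins, edges, p, q, rng):
    """One Swendsen-Wang sweep.  Step (d): each bond (i,j) with agreeing
    spins is occupied with probability p (and is never occupied otherwise).
    Step (e): each connected cluster gets a fresh uniform spin in 0..q-1."""
    n = len(spins)
    parent = list(range(n))           # union-find forest over the sites
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for (i, j) in edges:              # step (d), merging clusters as we go
        if spins[i] == spins[j] and rng.random() < p:
            parent[find(i)] = find(j)
    new_spin = {}
    for i in range(n):                # step (e): one fresh value per cluster root
        r = find(i)
        if r not in new_spin:
            new_spin[r] = rng.randrange(q)
        spins[i] = new_spin[r]
    return spins
```

With p = 1 - e^{-J}, repeated calls generate the SW Markov chain for the uniform-coupling Potts model.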
Historical remark. The random-cluster model was introduced in 1969 by Fortuin
and Kasteleyn [58]; they derived the identity Z_{Potts} = Z_{RC}, along with the correlation-
function identity (6.9) and some generalizations. These relations were rediscovered
several times during the subsequent two decades [59]. Surprisingly, however, no one
seems to have noticed the joint probability distribution \mu_{FKSW} that underlay all these
identities; this was discovered implicitly by Swendsen and Wang [27], and was made
explicit by Edwards and Sokal [60].
It is certainly plausible that the SW algorithm might have less critical slowing-down
than the conventional (single-spin-update) algorithms: the reason is that a local move in
one set of variables can have highly nonlocal effects in the other. For example, setting
n_b = 0 on a single bond may disconnect a cluster, causing a big subset of the spins in
that cluster to be flipped simultaneously. In some sense, therefore, the SW algorithm
is a collective-mode algorithm in which the collective modes are chosen by the system
rather than imposed from the outside as in multi-grid. (The miracle is that this is done
in a way that preserves the correct Gibbs measure.)
How well does the SW algorithm perform? In at least some cases, the performance
is nothing short of extraordinary. Table 1 shows some preliminary data [61] on a two-dimensional
Ising model at the bulk critical temperature. These data are consistent
with the estimate \tau_{SW} \sim L^{0.35} [27].^{25} By contrast, the conventional single-spin-flip
^{25} But precisely because \tau rises so slowly with L, good estimates of the dynamic critical exponent
will require the use of extremely large lattices. Even with lattices up to L = 512, we are unable to
distinguish convincingly between z \approx 0.35 and z \approx 0. Note Added 1996: For more recent and
Estimates of z_{SW}

         q = 1    q = 2         q = 3           q = 4
d = 1      0        0              0               0
d = 2      0      0.35        0.55 \pm 0.03    1 (exact??)
d = 3      0      0.75            --              --
d = 4      0      1 (exact?)      --              --

Table 2: Current best estimates of the dynamic critical exponent z for the Swendsen-
Wang algorithm. Estimates are taken from [27] for d = 2, 3, q = 2; [62] for d = 2,
q = 3, 4; and [63] for d = 4, q = 2. Error bar is a 95% confidence interval.
algorithms for the two-dimensional Ising model have \tau_{conv} \sim L^{2.1} [21]. So the advantage
of Swendsen-Wang over conventional algorithms (for this model) grows asymptotically
like L^{1.75}. To be sure, one iteration of the Swendsen-Wang algorithm may be a factor
of 10 more costly in CPU time than one iteration of a conventional algorithm (the
exact factor depends on the efficiency of the cluster-finding subroutine). But the SW
algorithm wins already for modest values of L.
For other Potts models, the performance of the SW algorithm is less spectacular
than for the two-dimensional Ising model, but it is still very impressive. In Table 2 we
give the current best estimates of the dynamic critical exponent z_{SW} for q-state Potts
models in d dimensions, as a function of q and d.^{26}
All these exponents are much lower than the z \gtrsim 2 observed in the single-spin-flip
algorithms.
Although the SW algorithm performs impressively well, we understand very little
about why these exponents take the values they do. Some cases are easy. If q = 1, then
all spins are in the same state (the only state!), and all bonds are thrown independently,
so the autocorrelation time is zero. (Here the SW algorithm just reduces to the standard
static algorithm for independent bond percolation.) If d = 1 (more generally, if the
lattice is a tree), the SW dynamics is exactly soluble: the behavior of each bond is
independent of each other bond, and \tau_{exp} \to -1/\log(1 - 1/q) < \infty as \beta \to +\infty. But
the remainder of our understanding is very murky. Two principal insights have been
obtained so far:
a) A calculation yielding z_{SW} = 1 in a mean-field (Curie-Weiss) Ising model [64].
much more precise data, see [111, 112]. These data are consistent with \tau_{SW} \sim L^{0.24}, but they are
also consistent with \tau_{SW} \sim \log^2 L [112]. It is extremely difficult to distinguish a small power from a
logarithm.
^{26} Note Added 1996: For more recent data, see [111, 112] for d = 2, q = 2; [113] for d = 2, q = 3;
and [112, 114] for d = 2, q = 4. The situation for d = 3, q = 2 is extremely unclear: exponent estimates
by different workers (using different methods) disagree wildly.
This suggests (but of course does not prove) that z_{SW} = 1 for Ising models
(q = 2) in dimension d \ge 4.
b) A rigorous proof that z_{SW} \ge \alpha/\nu [62]. This bound, while valid for all d and q, is
extremely far from sharp for the Ising models in dimensions 3 and higher. But it
is reasonably good for the 3- and 4-state Potts models in two dimensions, and in
the latter case it may even be sharp.^{27}
But much remains to be understood!
The Potts model with q large behaves very differently. Instead of a critical point,
the model undergoes a first-order phase transition: in two dimensions, this occurs when
q > 4, while in three or more dimensions, it is believed to occur already when q \ge 3
[49]. At a first-order transition, both the conventional algorithms and the Swendsen-
Wang algorithm have an extremely severe slowing-down (much more severe than the
slowing-down at a critical point): right at the transition temperature, we expect \tau \sim
\exp(c L^{d-1}). This is because sets of configurations typical of the ordered and disordered
phases are separated by free-energy barriers of order L^{d-1}, i.e. by sets of intermediate
configurations that contain interfaces of surface area \sim L^{d-1} and therefore have an
equilibrium probability \sim \exp(-c L^{d-1}).^{28}
Wolff [65] has proposed an interesting modification of the SW algorithm, in which one builds only a single cluster (starting at a randomly chosen site) and flips it. Clearly, one step of the single-cluster SW algorithm makes less change in the system than one step of the standard SW algorithm, but it also takes much less work. If one enumerates the cluster using depth-first search or breadth-first search, then the CPU time is proportional to the size of the cluster; and by the Fortuin-Kasteleyn identity (6.9), the mean cluster size is equal to the susceptibility:

\[ \sum_j \langle \gamma_{ij} \rangle \;=\; \sum_j \left\langle \frac{q\,\delta_{\sigma_i \sigma_j} - 1}{q - 1} \right\rangle \,. \qquad (6.12) \]

So the relevant quantity in the single-cluster SW algorithm is the dynamic critical exponent measured in CPU units:

\[ z_{1\text{-cluster},\,CPU} \;=\; z_{1\text{-cluster}} \;-\; \left( d - \frac{\gamma}{\nu} \right) \,. \qquad (6.13) \]
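To make the "CPU time proportional to cluster size" point concrete, here is a minimal sketch of a single-cluster update for the q-state Potts model with Hamiltonian H = −β Σ_{⟨xy⟩} δ(σ_x, σ_y) on a small periodic square lattice. The lattice representation (a dict of spins) and the function name are invented for this illustration; the cluster is grown site by site, so the work done is exactly the returned cluster size.

```python
import math
import random

def wolff_step(spin, L, q, beta, rng=random):
    """One single-cluster update for the q-state Potts model on an L x L
    periodic square lattice.  Returns the cluster size, to which the CPU
    cost of the move is proportional."""
    p_add = 1.0 - math.exp(-beta)            # Fortuin-Kasteleyn bond probability
    seed = (rng.randrange(L), rng.randrange(L))
    old = spin[seed]
    new = (old + rng.randrange(1, q)) % q    # a uniformly random color != old
    cluster = {seed}
    stack = [seed]
    while stack:                             # depth-first cluster growth
        x, y = stack.pop()
        for nbr in ((x + 1) % L, y), ((x - 1) % L, y), (x, (y + 1) % L), (x, (y - 1) % L):
            if nbr not in cluster and spin[nbr] == old and rng.random() < p_add:
                cluster.add(nbr)
                stack.append(nbr)
    for site in cluster:                     # flip the whole cluster at once
        spin[site] = new
    return len(cluster)
```

At β = 0 every bond is closed and the cluster is a single site; at large β the cluster engulfs essentially the whole lattice, which is the regime where the single-cluster and full-SW updates do comparable work.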
^{27} Note Added 1996: See [112, 113, 114] for more recent data on the possible sharpness of the Li-Sokal bound for two-dimensional models with q = 2, 3, 4. These data appear to be consistent with sharpness modulo a logarithm, i.e. τ_SW / C_H ∼ log L.

^{28} Note Added 1996: Great progress has been made in recent years in Monte Carlo methods for systems undergoing a first-order phase transition. The key idea is to simulate a suitably chosen non-Boltzmann probability distribution ("umbrella", "multicanonical", ...) and then apply reweighting methods. The exponential slowing-down τ ∼ exp(cL^{d−1}) resulting from barrier penetration is replaced by a power-law slowing-down τ ∼ L^p resulting from diffusion, provided that the distribution to be simulated is suitably chosen. See [115, 116, 117, 118, 119] for reviews.
The value of the single-cluster algorithm is that the probability of choosing a cluster is proportional to its size (since we pick a random site), so the work is concentrated preferentially on larger clusters, and we think that it is these clusters which are most important near the critical point. So it would not be surprising if z_{1-cluster, CPU} were smaller than z_SW. Preliminary measurements indicate that z_{1-cluster, CPU} is about the same as z_SW for the two-dimensional Ising model, but is significantly smaller for the three- and four-dimensional Ising models [66]. But a convincing theoretical understanding of this behavior is lacking.
Several other generalizations of the SW algorithm for Potts models have been proposed.^{29} One is a multi-grid extension of the SW algorithm: the idea is to carry out only a partial FKSW transformation, but then to apply this concept recursively [67]. This algorithm may have a dynamic critical exponent that is smaller than that of standard SW (but the claims that z = 0 are in my opinion unconvincing).^{30} A second generalization, which works only in two dimensions, augments the SW algorithm by transformations to the dual lattice [68]. This algorithm appears to achieve a modest improvement in critical slowing-down in the scaling region |β − β_c| ∼ L^{−1/ν}.

Finally, the SW algorithm can be generalized in a straightforward manner to Potts lattice gauge theories (more precisely, lattice gauge theories with a finite abelian gauge group G and Potts (δ-function) action). Preliminary results for the three-dimensional Z_2 gauge theory yield a dynamic critical exponent roughly equal to that of the ordinary SW algorithm for the three-dimensional Ising model (to which the gauge theory is dual) [69].
In the past year there has been a flurry of papers trying to generalize the SW algorithm to non-Potts models. Interesting proposals for a direct generalization of the SW algorithm were made by Edwards and Sokal [60] and by Niedermayer [70]. But the most promising ideas at present seem to be the embedding algorithms proposed by Brower and Tamayo [71] for one-component φ⁴ models and by Wolff [65, 72] for multi-component O(n)-invariant models.
^{29} Note Added 1996: In addition to the generalizations discussed below, see [120, 112] for SW-type algorithms for the Ashkin-Teller model; see [121] for a clever SW-type algorithm for the fully frustrated Ising model; and see [122, 123, 124, 100] for a very interesting SW-type algorithm for antiferromagnetic Potts models, based on the "embedding" idea to be described below.

^{30} Note Added 1996: For at least some versions of the multi-grid SW algorithm, it can be proven [125] that the bound z_{MGSW} ≥ α/ν holds. Thus, the critical slowing-down is not completely eliminated in any model in which the specific heat is divergent.

The idea of the embedding algorithms is to find Ising-like variables underlying a general spin variable, and then to update the resulting Ising model using the ordinary SW algorithm (or the single-cluster variant). For one-component spins, this embedding is the obvious decomposition into magnitude and sign. Consider, therefore, a one-component model with Hamiltonian

\[ H(\varphi) \;=\; -\beta \sum_{\langle xy \rangle} \varphi_x \varphi_y \;+\; \sum_x V(\varphi_x) \qquad (6.14) \]

where β ≥ 0 and V(φ) = V(−φ). We write

\[ \varphi_x \;=\; \varepsilon_x \, |\varphi_x| \qquad (6.15) \]
where the signs {ε} are Ising variables. For fixed values of the magnitudes {|φ|}, the conditional probability distribution of the {ε} is given by an Ising model with ferromagnetic (though space-dependent) couplings J_xy = β |φ_x| |φ_y|. Therefore, the {ε} model can be updated by the Swendsen-Wang algorithm. (Heat-bath or MGMC sweeps must also be performed, in order to update the magnitudes.) For the two-dimensional φ⁴ model, Brower and Tamayo [71] find a dynamic critical behavior identical to that of the SW algorithm for the two-dimensional Ising model, just as one would expect based on the idea that the "important" collective modes in the φ⁴ model are spin flips.
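The magnitude/sign decomposition can be made concrete in a few lines. This sketch (with invented helper names) computes the signs ε and the induced couplings J_xy = β|φ_x||φ_y|; the bond probability 1 − e^{−2J} quoted in the second helper is the standard SW activation probability for an Ising pair with coupling J, which the subsequent SW step on the {ε} would use.

```python
import math

def embedded_ising_couplings(phi, edges, beta):
    """Decompose a one-component field phi_x = eps_x * |phi_x| and return
    (eps, J), where J[(x, y)] = beta * |phi_x| * |phi_y| are the ferromagnetic
    couplings of the conditional Ising model for the signs."""
    eps = {x: (1 if v >= 0 else -1) for x, v in phi.items()}
    J = {(x, y): beta * abs(phi[x]) * abs(phi[y]) for x, y in edges}
    return eps, J

def sw_bond_probability(J_xy):
    """Swendsen-Wang bond-activation probability for an Ising pair with
    coupling J_xy (a bond may be opened only between parallel signs)."""
    return 1.0 - math.exp(-2.0 * J_xy)
```

Note that the couplings are automatically nonnegative, however the field values are distributed, which is what makes the embedded model ferromagnetic and the FK/SW machinery applicable.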
Wolff's embedding algorithm for n-component models (n ≥ 2) is equally simple. Consider an O(n)-invariant model with Hamiltonian

\[ H(\sigma) \;=\; -\beta \sum_{\langle xy \rangle} \sigma_x \cdot \sigma_y \;+\; \sum_x V(|\sigma_x|) \qquad (6.16) \]

with β ≥ 0. Now fix a unit vector r ∈ ℝⁿ; then any spin vector σ_x ∈ ℝⁿ can be written uniquely (except for a set of measure zero) in the form

\[ \sigma_x \;=\; \sigma_x^{\perp} \,+\, \varepsilon_x |\sigma_x \cdot r| \, r \qquad (6.17) \]

where

\[ \sigma_x^{\perp} \;=\; \sigma_x - (\sigma_x \cdot r)\, r \qquad (6.18) \]

is the component of σ_x perpendicular to r, and

\[ \varepsilon_x \;=\; \mathrm{sgn}(\sigma_x \cdot r) \qquad (6.19) \]

takes the values ±1. Therefore, for fixed values of the {σ^⊥} and {|σ · r|}, the probability distribution of the {ε} is given by an Ising model with ferromagnetic couplings J_xy = β |σ_x · r| |σ_y · r|. The algorithm is then: Choose at random a unit vector r; fix the {σ^⊥} and {|σ · r|} at their current values, and update the {ε} by the standard Swendsen-Wang algorithm (or the single-cluster variant). Flipping ε_x corresponds to reflecting σ_x in the hyperplane perpendicular to r.
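A small sketch of the decomposition (6.17)-(6.19), with invented helper names: it checks that flipping ε_x is exactly the reflection σ_x → σ_x − 2(σ_x · r)r in the hyperplane perpendicular to r.

```python
def decompose(sigma, r):
    """Split sigma into (sigma_perp, eps, |sigma.r|) with respect to the
    unit vector r, as in (6.17)-(6.19)."""
    dot = sum(s * c for s, c in zip(sigma, r))
    eps = 1 if dot >= 0 else -1
    perp = [s - dot * c for s, c in zip(sigma, r)]
    return perp, eps, abs(dot)

def recompose(perp, eps, mag, r):
    """Inverse of decompose: sigma = sigma_perp + eps * |sigma.r| * r."""
    return [p + eps * mag * c for p, c in zip(perp, r)]

def reflect(sigma, r):
    """Reflection in the hyperplane perpendicular to r, which is what
    flipping eps amounts to: sigma -> sigma - 2 (sigma.r) r."""
    dot = sum(s * c for s, c in zip(sigma, r))
    return [s - 2.0 * dot * c for s, c in zip(sigma, r)]
```

Note that the reflection leaves |σ_x| and σ_x^⊥ unchanged, so the move indeed updates only the embedded Ising variable.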
At first thought it may seem strange (and somehow "unphysical") to try to find Ising-like (i.e. discrete) variables in a model with a continuous symmetry group. However, upon reflection (pardon the pun) one sees what is going on: if the spin configuration is slowly varying (e.g. a long-wavelength spin wave), then the clusters tend to break along the surfaces where J_xy is small, hence where σ · r ≈ 0. Then flipping ε_x on some clusters corresponds to a "soft" change near the cluster boundaries but a "hard" change in the cluster interiors, i.e. a long-wavelength collective mode (Figure 4). So it is conceivable that the algorithm could have very small (or even zero!) critical slowing-down in models where the important large-scale collective modes are spin waves.

Figure 4: Action of the Wolff embedding algorithm on a long-wavelength spin wave. For simplicity, both spin space (σ) and physical space (x) are depicted as one-dimensional.
Even more strikingly, consider a two-dimensional XY-model configuration consisting of a widely separated vortex-antivortex pair: in the continuum limit this is given by

\[ \theta(z) \;=\; \mathrm{Im} \log(z - a) \;-\; \mathrm{Im} \log(z + a) \qquad (6.20) \]

where z ≡ x₁ + ix₂ and σ ≡ (cos θ, sin θ). Then the surface σ · r = 0 is an ellipse passing directly through the vortex and antivortex.^{31} Reflecting the spins inside this ellipse produces a configuration {σ^{new}} that is a continuous map of the doubly punctured plane into the semicircle |σ| = 1, σ · r ≥ 0. Such a map necessarily has zero winding number around the points ±a. So the Wolff update has destroyed the vorticity!^{32}
Therefore, the key collective modes in the two-dimensional XY and O(n) models, namely spin waves and (for the XY case) vortices, are well "encoded" by the Ising variables {ε}. So it is quite plausible that critical slowing-down could be eliminated or almost eliminated. In fact, tests of this algorithm on the two-dimensional XY (n = 2), classical Heisenberg (n = 3) and O(4) models are consistent with the complete absence of critical slowing-down [72, 73].

In view of the extraordinary success of the Wolff algorithm for spin models, it is tempting to try to extend it to lattice gauge theories with continuous gauge group [for example, U(1), SU(N) or SO(N)]. But I have nothing to report at present!^{33}
^{31} With r = (cos ψ, sin ψ), the equation of this ellipse is x₁² + (x₂ + a tan ψ)² = a² sec² ψ.

^{32} This picture of the action of the Wolff algorithm on vortex-antivortex pairs was developed in discussions with Richard Brower and Robert Swendsen.

^{33} Note Added 1996: See [126] for a general theory of Wolff-type embedding algorithms as applied to nonlinear σ-models (spin models taking values in a Riemannian manifold M). Roughly speaking, we argue that a Wolff-type embedding algorithm can work well, in the sense of having a dynamic
7 Algorithms for the Self-Avoiding Walk^{34}
An N-step self-avoiding walk (SAW) ω on a lattice L is a sequence of distinct points ω₀, ω₁, ..., ω_N in L such that each point is a nearest neighbor of its predecessor. Let c_N [resp. c_N(x)] be the number of N-step SAWs starting at the origin and ending anywhere [resp. ending at x]. Let ⟨ω_N²⟩ be the mean-square end-to-end distance of an N-step SAW. These quantities are believed to have the asymptotic behavior

\[ c_N \;\sim\; \mu^N N^{\gamma - 1} \qquad (7.1) \]
\[ c_N(x) \;\sim\; \mu^N N^{\alpha_{\rm sing} - 2} \qquad (x \text{ fixed} \neq 0) \qquad (7.2) \]
\[ \langle \omega_N^2 \rangle \;\sim\; N^{2\nu} \qquad (7.3) \]

as N → ∞; here γ, α_sing and ν are critical exponents, while μ (the connective constant of the lattice) is the analogue of a critical temperature. The SAW has direct application in polymer physics [74], and is indirectly relevant to ferromagnetism and quantum field theory by virtue of its equivalence with the n → 0 limit of the n-vector model [75].
The SAW has some advantages over spin systems for Monte Carlo work: Firstly, one can work directly with SAWs on an infinite lattice; there are no systematic errors due to finite-volume corrections. Secondly, there is no L^d (or ξ^d) factor in the computational work, so one can go closer to criticality. Thus, the SAW is an exceptionally advantageous "laboratory" for the numerical study of critical phenomena.
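For orientation, the first few c_N on the square lattice are easy to obtain by brute-force enumeration (c₁ = 4, c₂ = 12, c₃ = 36, c₄ = 100, c₅ = 284), and the ratios c_{N+1}/c_N then crudely approximate the connective constant μ ≈ 2.638. This is an illustrative sketch only (the function name is invented); the running time grows like μ^N, so it is useless beyond small N.

```python
def count_saws(n):
    """Count n-step self-avoiding walks on Z^2 starting at the origin,
    by depth-first enumeration (exponential in n -- illustration only)."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    def extend(last, visited, remaining):
        if remaining == 0:
            return 1
        x, y = last
        total = 0
        for dx, dy in steps:
            nxt = (x + dx, y + dy)
            if nxt not in visited:          # self-avoidance check
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)
        return total
    return extend((0, 0), {(0, 0)}, n)
```

For example, count_saws(5) / count_saws(4) = 2.84, already within ten percent of μ.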
Different aspects of the SAW can be probed in three different ensembles^{35}:

- Free-endpoint grand canonical ensemble (variable N, variable x)
- Fixed-endpoint grand canonical ensemble (variable N, fixed x)
- Canonical ensemble (fixed N, variable x)

In the remainder of this section we survey some typical Monte Carlo algorithms for these ensembles.
^{33 (continued)} critical exponent z significantly less than 2, only if M is a discrete quotient of a product of spheres: for example, a sphere S^{N−1} or a real projective space RP^{N−1} ≡ S^{N−1}/Z₂. In particular, we do not expect Wolff-type embedding algorithms to work well for σ-models taking values in SU(N) for N ≥ 3.

^{34} Note Added 1996: An extensive and up-to-date review of Monte Carlo algorithms for the self-avoiding walk can be found in [127]. A briefer version is [128].

^{35} The proper terminology for these ensembles is unclear to me. Perhaps the grand canonical and canonical ensembles ought to be called "canonical" and "microcanonical", respectively, reserving the term "grand canonical" for ensembles of many SAWs. Note Added 1996: I now prefer calling these the "variable-length" and "fixed-length" ensembles, respectively, and avoiding the "-canonical" terms entirely; this avoids ambiguity.

Free-endpoint grand canonical ensemble. Here the configuration space S is the set of all SAWs, of arbitrary length, starting at the origin and ending anywhere. The grand
partition function is

\[ \Xi(\beta) \;=\; \sum_{N=0}^{\infty} \beta^N c_N \qquad (7.4) \]

and the Gibbs measure is

\[ \pi(\omega) \;=\; \Xi(\beta)^{-1} \, \beta^{|\omega|} \,. \qquad (7.5) \]

The "monomer activity" β is a user-chosen parameter satisfying 0 < β < β_c ≡ μ^{−1}. As β approaches the critical activity β_c, the average walk length ⟨N⟩ tends to infinity. The connective constant μ and the critical exponent γ can be estimated from the Monte Carlo data, using the method of maximum likelihood [76].
A dynamic Monte Carlo algorithm for this ensemble was proposed by Berretti and Sokal [76]. The algorithm's elementary moves are as follows: either one attempts to append a new step to the walk, with equal probability in each of the q possible directions (here q is the coordination number of the lattice), or else one deletes the last step from the walk. In the former case, one must check that the proposed new walk is self-avoiding; if it isn't, then the attempted move is rejected and the old configuration is counted again in the sample ("null transition"). If an attempt is made to delete a step from an already-empty walk, then a null transition is also made. The relative probabilities of ΔN = +1 and ΔN = −1 attempts are chosen to be
\[ \mathbb{P}(\Delta N = +1 \text{ attempt}) \;=\; \frac{q\beta}{1 + q\beta} \qquad (7.6) \]
\[ \mathbb{P}(\Delta N = -1 \text{ attempt}) \;=\; \frac{1}{1 + q\beta} \qquad (7.7) \]

Therefore, the transition probability matrix is

\[ p(\omega \to \omega') \;=\; \begin{cases} \dfrac{\beta}{1+q\beta}\, \chi_{\rm SAW}(\omega') & \text{if } \omega \lessdot \omega' \\[1ex] \dfrac{1}{1+q\beta} & \text{if } \omega' \lessdot \omega \text{, or } \omega = \omega' = \varnothing \\[1ex] \dfrac{\beta}{1+q\beta}\, A(\omega) & \text{if } \omega = \omega' \neq \varnothing \end{cases} \qquad (7.8) \]

where

\[ \chi_{\rm SAW}(\omega') \;=\; \begin{cases} 1 & \text{if } \omega' \text{ is a SAW} \\ 0 & \text{if } \omega' \text{ is not a SAW} \end{cases} \qquad (7.9) \]
Here ω ⋖ ω′ denotes that the walk ω′ is a one-step extension of ω, and A(ω) is the number of non-self-avoiding walks ω′ with ω ⋖ ω′. It is easily verified that this transition matrix satisfies detailed balance for the Gibbs distribution π, i.e.

\[ \pi(\omega)\, p(\omega \to \omega') \;=\; \pi(\omega')\, p(\omega' \to \omega) \qquad (7.10) \]

for all ω, ω′ ∈ S. It is also easily verified that the algorithm is ergodic (= irreducible): to get from a walk ω to another walk ω′, it suffices to use ΔN = −1 moves to transform ω into the empty walk ∅, and then use ΔN = +1 moves to build up the walk ω′.
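The elementary move just described can be sketched as follows (Z² only, so q = 4; the function name is invented, and a production code would maintain a hash set of occupied sites instead of the O(N) membership test used here).

```python
import random

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))   # Z^2: coordination number q = 4
Q = len(STEPS)

def berretti_sokal_step(walk, beta, rng=random):
    """One elementary move of the Berretti-Sokal algorithm on Z^2.
    `walk` is a list of lattice points starting at the origin and is
    modified in place; null transitions leave it unchanged."""
    if rng.random() < Q * beta / (1.0 + Q * beta):   # Delta N = +1 attempt
        x, y = walk[-1]
        dx, dy = STEPS[rng.randrange(Q)]             # uniform over the q directions
        nxt = (x + dx, y + dy)
        if nxt not in walk:                          # self-avoidance check;
            walk.append(nxt)                         # else: null transition
    elif len(walk) > 1:                              # Delta N = -1 attempt
        walk.pop()                                   # (empty walk -> null transition)
```

By construction the move proposes a specific one-step extension with probability β/(1 + qβ) and a deletion with probability 1/(1 + qβ), which is exactly the detailed-balance condition (7.10) for π(ω) ∝ β^{|ω|}.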
Let us now analyze the critical slowing-down of the Berretti-Sokal algorithm. We can argue heuristically that

\[ \tau \;\sim\; \langle N \rangle^2 \,. \qquad (7.11) \]

To see this, consider the quantity N(t) ≡ |ω|(t), the number of steps in the walk at time t. This quantity executes, crudely speaking, a random walk (with drift) on the nonnegative integers; the average time to go from some point N to the point 0 (i.e. the empty walk) is of order N². Now, each time the empty walk is reached, all memory of the past is erased; future walks are then independent of past ones. Thus, the autocorrelation time ought to be of order ⟨N²⟩, or equivalently ⟨N⟩².
This heuristic argument can be turned into a rigorous proof of a lower bound τ ≥ const × ⟨N⟩² [10]. However, as an argument for an upper bound of the same form, it is not entirely convincing, as it assumes without proof that the slowest mode is the one represented by N(t). With considerably more work, it is possible to prove an upper bound on τ that is only slightly weaker than the heuristic prediction: τ ≤ const × ⟨N⟩^{1+γ} [10, 77].^{36} (Note that the critical exponent γ is believed to equal 43/32 in dimension d = 2, ≈ 1.16 in d = 3, and 1 in d ≥ 4.) In fact, we suspect [78] that the true behavior is τ ∼ ⟨N⟩^p for some exponent p strictly between 2 and 1 + γ. A deeper understanding of the dynamic critical behavior of the Berretti-Sokal algorithm would be of definite value.
It is worth comparing the computational work required for SAW versus Ising simulations: τ ∼ ⟨N⟩² ∼ ξ^{2/ν} ≈ ξ^{3.4} for the d = 3 SAW, versus ξ^{d+z} ≈ ξ^{5.0} (resp. ξ^{3.8}) for the d = 3 Ising model using the Metropolis (resp. Swendsen-Wang) algorithm. This vindicates our assertion that the SAW is an advantageous model for Monte Carlo studies of critical phenomena.
Fixed-endpoint grand canonical ensemble. The configuration space S is the set of all SAWs, of arbitrary length, starting at the origin and ending at the fixed site x (≠ 0). The ensemble is as in the free-endpoint case, with c_N replaced by c_N(x). The connective constant μ and the critical exponent α_sing can be estimated from the Monte Carlo data, using the method of maximum likelihood.

A dynamic Monte Carlo algorithm for this ensemble was proposed by Berg and Foerster [79] and Aragão de Carvalho, Caracciolo and Fröhlich (BFACF) [75, 80]. The elementary moves are local deformations of the chain, with ΔN = 0, ±2. The critical slowing-down in the BFACF algorithm is quite subtle. On the one hand, Sokal and Thomas [81] have proven the surprising result that τ_exp = +∞ for all β ≠ 0 (see Section 8). On the other hand, numerical experiments [12, 82] show that τ_int,N ∼ ⟨N⟩^{3.0±0.4} (in d = 2). Clearly, the BFACF dynamics is not well understood at present; further work, both theoretical and numerical, is needed.
In addition, Caracciolo, Pelissetto and Sokal [82] are studying a "hybrid" algorithm that combines local (BFACF) moves with non-local (cut-and-paste) moves. The critical slowing-down, measured in CPU units, appears to be reduced slightly compared to the pure BFACF algorithm: τ ∼ ⟨N⟩^{2.3} in d = 2.
Canonical ensemble. Algorithms for this ensemble, based on local deformations of the chain, have been used by polymer physicists for more than 25 years [83, 84]. So the recent proof [85] that all such algorithms are nonergodic (= not irreducible) comes as a slight embarrassment. Fortunately, there does exist a non-local fixed-N algorithm which is ergodic: the "pivot" algorithm, invented by Lal [86] and independently reinvented by MacDonald et al. [87] and by Madras [16]. The elementary move is as follows: choose at random a pivot point k along the walk (1 ≤ k ≤ N − 1); choose at random a non-identity element of the symmetry group of the lattice (rotation or reflection); then apply the symmetry-group element to ω_{k+1}, ..., ω_N, using ω_k as a pivot. The resulting walk is accepted if it is self-avoiding; otherwise it is rejected and the walk ω is counted once more in the sample. It can be proven [16] that this algorithm is ergodic and preserves the equal-weight probability distribution.

^{36} All these proofs are discussed in Section 8.
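A sketch of the elementary pivot move on Z², with invented names; the seven non-identity lattice symmetries are applied to the displacement of each site after the pivot point.

```python
import random

# The seven non-identity symmetries of Z^2 (rotations and reflections),
# each acting on a displacement (dx, dy) from the pivot point.
SYMS = [
    lambda dx, dy: (-dy, dx),   # rotate  90
    lambda dx, dy: (-dx, -dy),  # rotate 180
    lambda dx, dy: (dy, -dx),   # rotate 270
    lambda dx, dy: (dx, -dy),   # reflect in x-axis
    lambda dx, dy: (-dx, dy),   # reflect in y-axis
    lambda dx, dy: (dy, dx),    # reflect in diagonal
    lambda dx, dy: (-dy, -dx),  # reflect in anti-diagonal
]

def pivot_step(walk, rng=random):
    """One attempted pivot move: rotate/reflect the part of the walk after
    a random pivot point; accept iff the result is self-avoiding.
    Returns the new walk (identical to the old walk on rejection)."""
    n = len(walk) - 1                       # number of steps N
    k = rng.randrange(1, n)                 # pivot point, 1 <= k <= N-1
    g = rng.choice(SYMS)                    # non-identity lattice symmetry
    px, py = walk[k]
    new_tail = []
    for x, y in walk[k + 1:]:
        dx, dy = g(x - px, y - py)
        new_tail.append((px + dx, py + dy))
    proposal = walk[:k + 1] + new_tail
    if len(set(proposal)) == len(proposal): # self-avoiding => accept
        return proposal
    return walk                             # reject: count the old walk again
```

The O(N) self-avoidance test shown here is the naive one; the more careful implementation discussed below brings the cost per accepted move down to roughly O(N).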
At first thought the pivot algorithm sounds terrible (at least it did to me): for N large, nearly all the proposed moves will get rejected. This is in fact true: the acceptance fraction behaves ∼ N^{−p} as N → ∞, where p ≈ 0.19 in d = 2 [16]. On the other hand, the pivot moves are very radical: after very few (5 or 10) accepted moves the SAW will have reached an "essentially new" conformation. One conjectures, therefore, that τ ∼ N^p. Actually it is necessary to be a bit more careful: for global observables f (such as the end-to-end distance ω_N²) one expects τ_int,f ∼ N^p, but local observables (such as the angle between the 17th and 18th bonds of the walk) are expected to evolve a factor of N more slowly: τ_int,f ∼ N^{1+p}. Thus, the slowest mode is expected to behave as τ_exp ∼ N^{1+p}. For the pivot algorithm applied to the ordinary random walk one can calculate the dynamical behavior exactly [16]: for global observables f the autocorrelation function behaves roughly like
\[ \rho_{ff}(t) \;\approx\; \frac{1}{n} \sum_{i=1}^{n} \left( 1 - \frac{i}{n} \right)^{|t|} \qquad (7.12) \]

from which it follows that

\[ \tau_{{\rm int},f} \;\sim\; \log N \qquad (7.13) \]
\[ \tau_{{\rm exp},f} \;\sim\; N \qquad (7.14) \]

in agreement with our heuristic argument modulo logarithms. For the SAW, it is found numerically [16] that τ_int,f ∼ N^{0.20} in d = 2, also in close agreement with the heuristic argument.
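Summing the geometric series in (7.12) term by term gives the closed form τ_int,f = H_n − 1/2, where H_n is the n-th harmonic number and n is of order N; this grows like log n even though the slowest term (1 − 1/n)^t decays only on a time scale of order n, which is the content of (7.13)-(7.14). The little numerical check below (function names invented) confirms the closed form.

```python
def tau_int(n, tmax=1000000):
    """tau_int = 1/2 + sum_{t>=1} rho(t), summed numerically, for the model
    autocorrelation function rho(t) = (1/n) * sum_{i=1..n} (1 - i/n)^t."""
    total = 0.5
    for t in range(1, tmax):
        rho = sum((1.0 - i / n) ** t for i in range(1, n + 1)) / n
        total += rho
        if rho < 1e-15:                      # negligible tail: stop summing
            break
    return total

def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))
```

For example, tau_int(20) agrees with H₂₀ − 1/2 ≈ 3.098 to high precision.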
A careful analysis of the computational complexity of the pivot algorithm [36] shows that one "effectively independent" sample (at least as regards global observables) can be produced in a computer time of order N. This is a factor ∼ N more efficient than the Berretti-Sokal algorithm, a fact which opens up exciting prospects for high-precision Monte Carlo studies of critical phenomena in the SAW. Thus, with a modest computational effort (300 hours on a Cyber 170-730), Madras and I found ν = 0.7496 ± 0.0007 (95% confidence limits) for 2-dimensional SAWs of lengths 200 ≤ N ≤ 10000 [16]. We hope to carry out soon a convincing numerical test of hyperscaling in the three-dimensional SAW.^{37}
^{37} Note Added 1996: This study has now been carried out [129], and yields ν = 0.5877 ± 0.0006 (subjective 68% confidence limits), based on SAWs of length up to 80000 steps. Proper treatment
8 Rigorous Analysis of Dynamic Monte Carlo Algorithms

In this final lecture, we discuss techniques for proving rigorous lower and upper bounds on the autocorrelation times of dynamic Monte Carlo algorithms. This topic is of primary interest, of course, to mathematical physicists: it constitutes the first steps toward a rigorous theory of dynamic critical phenomena, along lines parallel to the well-established rigorous theory of static critical phenomena. But these proofs are, I believe, also of some importance for practical Monte Carlo work, as they give insight into the physical basis of critical slowing-down and may point towards improved algorithms with reduced critical slowing-down.
There is a big difference between the techniques used for proving lower and upper bounds, and it is easy to understand this physically. To prove a lower bound on the autocorrelation time, it suffices to find one physical reason why the dynamics should be slow, i.e. to find one "slow mode". This physical insight can often be converted directly into a rigorous proof, using the variational method described below. On the other hand, to prove an upper bound on the autocorrelation time, it is necessary to understand all conceivable physical reasons for slowness, and to prove that none of them cause too great a slowing-down. This is extremely difficult to do, and has been carried to completion in very few cases.
We shall be obliged to restrict attention to reversible Markov chains [i.e. those satisfying the detailed-balance condition (2.27)], as these give rise to self-adjoint operators. Non-reversible Markov chains, corresponding to non-self-adjoint operators, are much more difficult to analyze rigorously.
The principal method for proving lower bounds on τ_exp (and in fact on τ_int,f) is the variational (or Rayleigh-Ritz) method. Let us recall some of the theory of reversible Markov chains from Section 2: Let π be a probability measure, and let P = {p_xy} be a transition probability matrix that satisfies detailed balance with respect to π. Now P acts naturally on functions (observables) according to

\[ (Pf)(x) \;\equiv\; \sum_y p_{xy} \, f(y) \,. \qquad (8.1) \]

In particular, when acting on the space l²(π) of π-square-integrable functions, the operator P is a self-adjoint contraction. Its spectrum therefore lies in the interval [−1, 1]. Moreover, P has an eigenvalue 1 with eigenvector equal to the constant function 1. Let Π be the orthogonal projector in l²(π) onto the constant functions. Then, for each real-valued observable f ∈ l²(π), we have
\[ C_{ff}(t) \;=\; (f, (P^{|t|} - \Pi) f) \;=\; (f, (I - \Pi) P^{|t|} (I - \Pi) f) \qquad (8.2) \]

^{37 (continued)} of corrections to scaling is crucial in obtaining this estimate. We also show that the interpenetration ratio Ψ approaches its limiting value Ψ* = 0.2471 ± 0.0003 from above, contrary to the prediction of the two-parameter renormalization-group theory. We have critically reexamined this theory and shown where the error lies [130, 129].
where (g, h) ≡ ⟨ḡh⟩_π denotes the inner product in l²(π). By the spectral theorem, this can be written as
\[ C_{ff}(t) \;=\; \int_{-1}^{1} \lambda^{|t|} \, d\rho_{ff}(\lambda) \qquad (8.3) \]

where dρ_ff is a positive measure. It follows that

\[ \tau_{{\rm int},f} \;=\; \frac{1}{2} \, \frac{\int_{-1}^{1} \frac{1+\lambda}{1-\lambda} \, d\rho_{ff}(\lambda)}{\int_{-1}^{1} d\rho_{ff}(\lambda)} \;\geq\; \frac{1}{2} \, \frac{1 + \rho_{ff}(1)}{1 - \rho_{ff}(1)} \qquad (8.4) \]

by Jensen's inequality (since the function λ ↦ (1 + λ)/(1 − λ) is convex).^{38}
Our method will be to compute explicitly a lower bound on the normalized autocorrelation function at time lag 1,

\[ \rho_{ff}(1) \;=\; \frac{C_{ff}(1)}{C_{ff}(0)} \,, \qquad (8.5) \]

for a suitably chosen trial observable f. Equivalently, we compute an upper bound on the Rayleigh quotient

\[ \frac{(f, (I - P) f)}{(f, (I - \Pi) f)} \;=\; \frac{C_{ff}(0) - C_{ff}(1)}{C_{ff}(0)} \;=\; 1 - \rho_{ff}(1) \,. \qquad (8.6) \]
The crux of the matter is to pick a trial observable f that has a strong enough overlap with the "slowest mode". A useful formula is

\[ (f, (I - P) f) \;=\; \frac{1}{2} \sum_{x,y} \pi_x \, p_{xy} \, |f(x) - f(y)|^2 \,. \qquad (8.7) \]

That is, the numerator of the Rayleigh quotient is half the mean-square change in f in a single time step.
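The identity (8.7) and the bound (8.4) are easy to check numerically on a toy three-state reversible chain; the numbers below are an arbitrary example constructed to satisfy detailed balance, and the variable names are invented.

```python
pi = [0.2, 0.3, 0.5]                       # stationary distribution
P  = [[0.7, 0.3, 0.0],                     # reversible transition matrix:
      [0.2, 0.5, 0.3],                     # pi_x p_xy = pi_y p_yx holds here
      [0.0, 0.18, 0.82]]
f  = [1.0, 0.0, -1.0]                      # trial observable

mean = sum(p * v for p, v in zip(pi, f))
C0 = sum(p * v * v for p, v in zip(pi, f)) - mean ** 2          # C_ff(0)
C1 = sum(pi[x] * P[x][y] * f[x] * f[y]                          # C_ff(1)
         for x in range(3) for y in range(3)) - mean ** 2
dirichlet = 0.5 * sum(pi[x] * P[x][y] * (f[x] - f[y]) ** 2      # RHS of (8.7)
                      for x in range(3) for y in range(3))

assert abs((C0 - C1) - dirichlet) < 1e-12  # identity (8.7)
rho1 = C1 / C0                             # rho_ff(1), as in (8.5)
tau_bound = 0.5 * (1 + rho1) / (1 - rho1)  # Jensen lower bound (8.4)
```

Any trial f with a large single-step Dirichlet form relative to its variance gives a weak bound; the art, as stated above, is choosing f to overlap the slow mode.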
Example 1. S<strong>in</strong>gle-site heat-bath algorithm. Let us consider, for simplicity, a translation<strong>in</strong>variant<br />
sp<strong>in</strong>model <strong>in</strong> a periodic box ofvolume V . Let Pi = fPi(' ! ' 0 )g be the<br />
transition probability matrix associated with apply<strong>in</strong>g the heat-bath operation at site<br />
i. 39 Then the operator Pi has the follow<strong>in</strong>g properties:<br />
38 It also follows from (8.3) that ρ_ff(t) ≥ ρ_ff(1)^{|t|} for even values of t. Moreover, this holds for odd values of t if dρ_ff is supported on λ ≥ 0 (though not necessarily otherwise); this is the case for the heat-bath and Swendsen-Wang algorithms, in which P ≥ 0. Therefore, the decay as t → ∞ of ρ_ff(t) is also bounded below in terms of ρ_ff(1).

39 Probabilistically, P_i is the conditional expectation E(· | {φ_j}_{j≠i}). Analytically, P_i is the orthogonal projection in l²(π) onto the linear subspace consisting of functions depending only on {φ_j}_{j≠i}.
(a) P_i is an orthogonal projection. (In particular, P_i² = P_i and 0 ≤ P_i ≤ I.)

(b) P_i 1 = 1. (In particular, P_i Π = Π P_i = Π.)

(c) P_i f = f if f depends only on {φ_j}_{j≠i}. (In fact a stronger property holds: P_i(fg) = f P_i g if f depends only on {φ_j}_{j≠i}. But we shall not need this.)

For simplicity we consider only random site updating. Then the transition matrix of the heat-bath algorithm is

    P = (1/V) Σ_i P_i.        (8.8)

We shall use the notation of the Ising model, but the same proof applies to much more general models.
As explained in Section 4, the critical slowing-down of the local algorithms (such as single-site heat-bath) arises from the fact that large regions in which the net magnetization is positive or negative tend to move slowly. In particular, the total magnetization of the lattice fluctuates slowly. So it is natural to expect that the total magnetization M = Σ_i σ_i will be a "slow mode". Indeed, let us compute an upper bound on the Rayleigh quotient (8.6) with f = M. The denominator is

    (M, (I − Π) M) = ⟨M²⟩ − ⟨M⟩² = Vχ        (8.9)

(since ⟨M⟩ = 0 by symmetry). The numerator is
    (M, (I − P) M) = (1/V) Σ_{i,j,k} (σ_i, (I − P_j) σ_k)
                   = (1/V) Σ_i (σ_i, (I − P_i) σ_i)
                   ≤ (1/V) Σ_i (σ_i, σ_i)
                   = (1/V) Σ_i ⟨σ_i²⟩
                   = 1.        (8.10)

Here we first used properties (a) and (c) to deduce that (σ_i, (I − P_j) σ_k) = 0 unless i = j = k, and then used property (a) to bound (σ_i, (I − P_i) σ_i). It follows from (8.9) and (8.10) that

    1 − ρ_MM(1) ≤ 1/(Vχ)        (8.11)

and hence that

    τ_int,M ≥ Vχ − 1/2.        (8.12)
Note, however, that here time is measured in hits of a single site. If we measure time in hits per site, then we conclude that

    τ_int,M^(HPS) ≥ χ − 1/(2V).        (8.13)

This proves a lower bound [88, 89, 90] on the dynamic critical exponent z, namely z ≥ γ/ν. (Here γ and ν are the static critical exponents for the susceptibility and correlation length, respectively. Note that by the usual static scaling law, γ/ν = 2 − η, and for most models η is very close to zero. So this argument almost makes rigorous (as a lower bound) the heuristic argument that z ≈ 2.)
Virtually the same argument applies, in fact, to any reversible single-site algorithm (e.g. Metropolis). The only difference is that the property 0 ≤ P_i ≤ I is replaced by −I ≤ P_i ≤ I, so the bound on the numerator is a factor of 2 larger. Hence

    τ_int,M^(HPS) ≥ χ/2 − 1/(2V).        (8.14)
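The inequality (8.11) can be verified exactly, by brute-force linear algebra, on a lattice small enough to enumerate. The sketch below is an illustration whose details (3×3 periodic Ising lattice, β = 0.4) are my own choices, not from the text: it builds the random-site heat-bath matrix P = (1/V) Σ_i P_i over all 2⁹ configurations and checks 1 − ρ_MM(1) ≤ 1/(Vχ):

```python
import numpy as np

L, V, beta = 3, 9, 0.4
nbrs = lambda r, c: [((r-1) % L, c), ((r+1) % L, c), (r, (c-1) % L), (r, (c+1) % L)]

def spins(x):
    # bit k of x encodes the spin at site k: bit 1 -> +1, bit 0 -> -1
    return np.array([1 if (x >> k) & 1 else -1 for k in range(V)]).reshape(L, L)

def energy(s):
    # H = -sum_<ij> s_i s_j, each periodic bond counted once
    return -sum(s[r, c] * (s[(r+1) % L, c] + s[r, (c+1) % L])
                for r in range(L) for c in range(L))

states = [spins(x) for x in range(2 ** V)]
w = np.exp([-beta * energy(s) for s in states])
pi = w / w.sum()                                  # Boltzmann distribution

# Heat-bath with random site updating: P = (1/V) * sum_i P_i
P = np.zeros((2 ** V, 2 ** V))
for x, s in enumerate(states):
    for k in range(V):
        r, c = divmod(k, L)
        h = sum(s[rr, cc] for rr, cc in nbrs(r, c))
        p_up = 1.0 / (1.0 + np.exp(-2 * beta * h))   # P(s_k = +1 | rest)
        P[x, x | (1 << k)] += p_up / V
        P[x, x & ~(1 << k)] += (1 - p_up) / V

M = np.array([s.sum() for s in states])           # total magnetization
C0 = pi @ (M * M)                                 # <M^2> = V*chi  (<M> = 0 by symmetry)
C1 = pi @ (M * (P @ M))                           # <M(0) M(1)>
rho1 = C1 / C0
# Check (8.11): 1 - rho_MM(1) <= 1/(V*chi)
assert (1 - rho1) * C0 <= 1 + 1e-9
```

The assertion is exactly the statement (M, (I − P) M) ≤ 1 from (8.10), so it holds at any β; changing β or the lattice size (within enumerable range) leaves it intact.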
Example 2. Swendsen-Wang algorithm. Recall that the SW algorithm simulates the joint model (6.6) by alternately applying the conditional distributions of {n} given {σ}, and {σ} given {n}; that is, by alternately generating new bond occupation variables (independent of the old ones) given the spins, and new spin variables (independent of the old ones) given the bonds. The transition matrix P_SW = P_SW({σ, n} → {σ′, n′}) is therefore a product

    P_SW = P_bond P_spin        (8.15)

where P_bond is the conditional expectation operator E(· | {σ}), and P_spin is E(· | {n}).

We shall use the variational method, with f chosen to be the bond density

    f = N ≡ Σ_{⟨ij⟩} n_ij.        (8.16)

To lighten the notation, let us write σ_b ≡ δ_{σ_i σ_j} for a bond b = ⟨ij⟩. We then have
    E(n_b | {σ}) = p_b σ_b        (8.17)

    E(n_b n_b′ | {σ}) = p_b p_b′ σ_b σ_b′   for b ≠ b′.        (8.18)

It follows from (8.17) that

    ⟨n_b(t=0) n_b′(t=1)⟩ = ⟨n_b E(E(n_b′ | {σ}) | {n})⟩
                         = ⟨n_b E(n_b′ | {σ})⟩
                         = ⟨E(n_b | {σ}) E(n_b′ | {σ})⟩
                         = p_b p_b′ ⟨σ_b σ_b′⟩.        (8.19)
The corresponding truncated (connected) correlation function is clearly

    ⟨n_b(t=0); n_b′(t=1)⟩ = p_b p_b′ ⟨σ_b; σ_b′⟩        (8.20)

where ⟨A; B⟩ ≡ ⟨AB⟩ − ⟨A⟩⟨B⟩. We have thus expressed a dynamic correlation function of the SW algorithm in terms of a static correlation function of the Potts model.

Now let us compute the same quantity at time lag 0: by (8.18), for b ≠ b′ we have

    ⟨n_b(t=0) n_b′(t=0)⟩ = p_b p_b′ ⟨σ_b σ_b′⟩        (8.21)

and hence

    ⟨n_b(t=0); n_b′(t=0)⟩ = p_b p_b′ ⟨σ_b; σ_b′⟩.        (8.22)

On the other hand, for b = b′ we clearly have

    ⟨n_b(t=0); n_b(t=0)⟩ = ⟨n_b⟩ − ⟨n_b⟩² = p_b ⟨σ_b⟩ − p_b² ⟨σ_b⟩².        (8.23)
Consider now the usual case in which

    p_b = p for b ∈ B, and p_b = 0 otherwise        (8.24)

for some family of bonds B. Combining (8.20), (8.22) and (8.23) and summing over b, b′, we obtain

    ⟨N(t=0); N(t=1)⟩ = p² ⟨E; E⟩        (8.25)

    ⟨N(t=0); N(t=0)⟩ = p² ⟨E; E⟩ − p(1−p) ⟨E⟩        (8.26)

where E ≡ −Σ_{b∈B} σ_b ≤ 0 is the energy. Hence the normalized autocorrelation function at time lag 1 is exactly

    ρ_NN(1) ≡ ⟨N(t=0); N(t=1)⟩ / ⟨N(t=0); N(t=0)⟩
            = 1 − [−(1−p) 𝓔] / [p C_H − (1−p) 𝓔]        (8.27)

where 𝓔 ≡ V⁻¹⟨E⟩ is the mean energy and C_H ≡ V⁻¹⟨E; E⟩ is the specific heat (V is the volume). At criticality, p → p_crit > 0 and 𝓔 → 𝓔_crit < 0, so

    ρ_NN(1) ≥ 1 − const/C_H.        (8.28)
We now remark that although P_SW = P_bond P_spin is not self-adjoint, the modified transition matrix P′_SW ≡ P_spin P_bond P_spin is self-adjoint (and positive-semidefinite), and N has the same autocorrelation function for both:

    (N, (P_bond P_spin)^t N) = (N, (P_spin P_bond P_spin)^t N).        (8.29)

It follows that

    ρ_NN(t) ≥ ρ_NN(1)^{|t|} ≥ (1 − const/C_H)^{|t|}        (8.30)

and hence

    τ_int,N ≥ const × C_H.        (8.31)
This proves a lower bound [62] on the dynamic critical exponent z_SW, namely z_SW ≥ α/ν. (Here α and ν are the static critical exponents for the specific heat and correlation length, respectively.) For the two-dimensional Potts models with q = 2, 3, 4, it is known exactly (but non-rigorously!) that α/ν = 0, 2/5, 1, with multiplicative logarithmic corrections for q = 4 [91]. The bound on z_SW may conceivably be sharp (or sharp up to a logarithm) for q = 4 [62].
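Since (8.27) is an exact identity in equilibrium, it can be tested against a direct Swendsen-Wang simulation: estimate ρ_NN(1) from the time series of N and compare with the static prediction p⟨E; E⟩ / [p⟨E; E⟩ − (1−p)⟨E⟩] computed from the same run. The sketch below is a minimal SW implementation for the q = 2 Potts model; the 4×4 lattice, β = 0.5 and sweep counts are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
L, V, beta = 4, 16, 0.5
p = 1.0 - np.exp(-beta)                      # SW bond-occupation probability

# periodic nearest-neighbour bonds of an L x L lattice
bonds = []
for r in range(L):
    for c in range(L):
        i = r * L + c
        bonds.append((i, ((r + 1) % L) * L + c))
        bonds.append((i, r * L + (c + 1) % L))

def find(parent, i):                         # union-find with path compression
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

spins = np.zeros(V, dtype=int)               # q = 2 Potts spins in {0, 1}
Ns, Es = [], []
for sweep in range(20000):
    Es.append(-sum(spins[i] == spins[j] for i, j in bonds))   # E = -sum_b sigma_b
    # bond update: occupy each satisfied bond independently with probability p
    parent = list(range(V))
    nbonds = 0
    for i, j in bonds:
        if spins[i] == spins[j] and rng.random() < p:
            nbonds += 1
            parent[find(parent, i)] = find(parent, j)
    Ns.append(nbonds)                        # N recorded just after the bond update
    # spin update: each cluster gets a fresh uniformly random spin
    newspin = rng.integers(0, 2, size=V)
    spins = np.array([newspin[find(parent, k)] for k in range(V)])

N = np.array(Ns[1000:], dtype=float)         # discard initial transient
E = np.array(Es[1000:], dtype=float)
rho1 = np.cov(N[:-1], N[1:])[0, 1] / N.var()
# exact identity (8.27): rho_NN(1) = p<E;E> / (p<E;E> - (1-p)<E>)
rho_pred = p * E.var() / (p * E.var() - (1 - p) * E.mean())
assert abs(rho1 - rho_pred) < 0.05
```

With ~2×10⁴ sweeps the two estimates agree to better than the 0.05 tolerance; the identity holds at any β, so the chosen coupling is not special.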
Example 3. Berretti-Sokal algorithm for SAWs. Let us consider the observable

    f(ω) = |ω| (≡ N),        (8.32)

the total number of bonds in the walk. We have argued heuristically that τ ~ ⟨N⟩²; we can now make this argument rigorous as a lower bound. Indeed, it suffices to use (8.7): since the maximum change of |ω| in a single time step is 1, it follows that

    (f, (I − P) f) ≤ 1/2.        (8.33)
On the other h<strong>and</strong>, the denom<strong>in</strong>ator of the Rayleigh quotient is<br />
(f (I ; )f) =hN 2 i;hNi 2 : (8.34)<br />
Assum<strong>in</strong>g the usual scal<strong>in</strong>g behavior (7.1) for the cN, wehave<br />
hNi<br />
hN 2 i<br />
asymptotically as " c = ;1 ,<strong>and</strong>hence<br />
1 ;<br />
( +1)<br />
1 ;<br />
(8.35)<br />
(8.36)<br />
<strong>in</strong>tN const hNi 2 : (8.37)<br />
A very different approach to proving lower bounds on the autocorrelation time τ_exp is the minimum hitting-time argument [81]. Consider a Markov chain with transition matrix P satisfying detailed balance for some probability measure π. If A and B are subsets of the state space S, let T_AB be the minimum time for getting from A to B with nonzero probability, i.e.

    T_AB ≡ min{n: p^{(n)}_xy > 0 for some x ∈ A, y ∈ B}.        (8.38)

Then the theorem asserts that if T_AB is large and this is not "justified" by the rarity of A and/or B in the equilibrium distribution π, then the autocorrelation time τ_exp must be large. More precisely:
Theorem 2 Consider a Markov chain with transition matrix P satisfying detailed balance for the probability measure π. Let T_AB be defined as in (8.38). Then

    τ_exp ≥ sup_{A,B⊆S} 2(T_AB − 1) / [−log(π(A) π(B))].

Proof. Let A, B ⊆ S, and let n < T_AB. […]
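Theorem 2 can be tested numerically on a chain with an obvious geometric obstruction. The sketch below is an illustration whose details (a lazy nearest-neighbour walk on a path of 10 states, endpoints as A and B) are my own choices: it computes T_AB by taking powers of P, computes the exact τ_exp from the subdominant eigenvalue, and checks the bound:

```python
import numpy as np

# Lazy nearest-neighbour random walk on the path {0, ..., 9} (reflecting ends),
# reversible with respect to the uniform distribution.
K = 10
P = np.zeros((K, K))
for n in range(K):
    if n > 0:
        P[n, n - 1] = 0.25
    if n < K - 1:
        P[n, n + 1] = 0.25
    P[n, n] = 1.0 - P[n].sum()
pi = np.full(K, 1.0 / K)

# T_AB for A = {0}, B = {K-1}: smallest n with (P^n)[0, K-1] > 0
Pn, T = np.eye(K), 0
while Pn[0, K - 1] == 0:
    Pn = Pn @ P
    T += 1

# exact tau_exp from the second-largest eigenvalue modulus
lam = np.sort(np.abs(np.linalg.eigvals(P)))[-2]
tau_exp = -1.0 / np.log(lam)

bound = 2 * (T - 1) / (-np.log(pi[0] * pi[K - 1]))
assert T == K - 1
assert tau_exp >= bound
```

Here T_AB = 9 and π(A)π(B) = 10⁻², so the bound is about 3.5, while the exact τ_exp of this lazy walk is roughly an order of magnitude larger: the bound is valid but, as with most such arguments, not sharp.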
As noted at the beginning of this lecture, it is much more difficult to prove upper bounds on the autocorrelation time, or even to prove that τ_exp < ∞, and few nontrivial results have been obtained so far.

The only general method (to my knowledge) for proving upper bounds on the autocorrelation time is Cheeger's inequality [77, 92, 93], the basic idea of which is to search for "bottlenecks" in the state space.40 Consider first the rate of probability flow, in the stationary Markov chain, from a set A to its complement A^c, normalized by the invariant probabilities of A and A^c:
    k(A) ≡ [Σ_{x∈A, y∈A^c} π_x p_xy] / [π(A) π(A^c)]
         = (χ_A, P χ_{A^c})_{l²(π)} / [π(A) π(A^c)].        (8.45)

Now look for the worst such decomposition into A and A^c:

    k ≡ inf_{A: 0 < π(A) < 1} k(A).        (8.46)
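For a chain with a handful of states, k can be computed by exhaustive search over subsets and compared with the exact spectral gap. The sketch below is an illustrative random Metropolis chain (my own construction); the upper bound gap ≤ k(A) follows for every A by using the trial function χ_A in (8.6)-(8.7), while the quadratic lower bound is quoted with the constant 1/8 as an assumption matching the Lawler-Sokal form of Cheeger's inequality [77]:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)

# Small reversible chain: Metropolis with uniform proposals on random weights.
n = 8
pi = rng.random(n); pi /= pi.sum()
P = np.zeros((n, n))
for x in range(n):
    for y in range(n):
        if x != y:
            P[x, y] = min(1.0, pi[y] / pi[x]) / (n - 1)
    P[x, x] = 1.0 - P[x].sum()

def k_of(A):
    Ac = [y for y in range(n) if y not in A]
    flow = sum(pi[x] * P[x, y] for x in A for y in Ac)
    return flow / (pi[list(A)].sum() * pi[Ac].sum())

# brute-force the Cheeger constant (8.46) over all nontrivial subsets
k = min(k_of(A) for r in range(1, n) for A in combinations(range(n), r))

# spectral gap: symmetrize P in l^2(pi) and take the second-largest eigenvalue
D = np.diag(np.sqrt(pi))
lams = np.sort(np.linalg.eigvalsh(D @ P @ np.linalg.inv(D)))
gap = 1.0 - lams[-2]

assert gap <= k + 1e-12           # chi_A as trial function in (8.6)-(8.7) gives gap <= k(A)
assert gap >= k ** 2 / 8 - 1e-12  # Cheeger-type lower bound on the gap
```

The first assertion is exact (the Rayleigh quotient of χ_A equals k(A)); the second is the bottleneck bound that makes Cheeger's inequality useful for proving upper bounds on autocorrelation times.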
…space is a tree: then A can always be chosen so that it is connected to A^c by a single bond. Using this fact, Lawler and Sokal [77] used Cheeger's inequality to prove

    τ_exp ≤ const × ⟨N⟩^{2γ}        (8.48)

for the Berretti-Sokal algorithm for SAWs.
A very different proof of an upper bound on τ_exp in the Berretti-Sokal algorithm was given by Sokal and Thomas [10]. Their method is to study in detail the exponential moments of the hitting times from an arbitrary walk ω to the empty walk. Using a sequence of identities, they are able to write an algebraic inequality for such a moment in terms of itself; this inequality says roughly that the moment is either small or huge (where "huge" includes the possibility of +∞) but cannot lie in-between (there is a "forbidden interval"). Then, by a continuity argument, they are able to rule out the possibility that the moment is huge. So it must be small. The final result is

    τ_exp ≤ const × ⟨N⟩^{1+γ},        (8.49)

slightly better than the Lawler-Sokal bound.41
For spin models and lattice field theories, almost nothing is known about upper bounds on the autocorrelation time, except at high temperature. For the single-site heat-bath dynamics, it is easy to show that τ_exp < ∞ (uniformly in the volume) above the Dobrushin uniqueness temperature; indeed, this is precisely what the standard proof of the Dobrushin uniqueness theorem [94, 95] does. One expects the same result to hold for all temperatures above critical, but this remains an open problem, despite recent progress by Aizenman and Holley [96].42
Acknowledgments

Many of the ideas reported here have grown out of joint work with my colleagues Alberto Berretti, Frank Brown, Sergio Caracciolo, Robert Edwards, Jonathan Goodman, Tony Guttmann, Greg Lawler, Xiao-Jian Li, Neal Madras, Andrea Pelissetto, Larry Thomas and Dan Zwanziger. I thank them for many pleasant and fruitful collaborations. In addition, I would like to thank Mal Kalos for introducing me to Monte Carlo work and for patiently answering my first naive questions.
The research of the author was supported in part by the U.S. National Science Foundation grants DMS-8705599 and DMS-8911273, the U.S. Department of Energy contract DE-FG02-90ER40581, the John Von Neumann National Supercomputer Center† grant NAC-705, and the Pittsburgh Supercomputing Center grant PHY890035P. Acknowledgment is also made to the donors of the Petroleum Research Fund, administered by the American Chemical Society, for partial support of this research.

41 Note Added 1996: See also [137], which recovers the bound (8.49) by a much simpler argument using the Poincaré inequality [131].

42 Note Added 1996: There has been great progress on this problem in recent years: see [138, 139, 140, 141, 142, 143] and references cited therein.

† Deceased.
References

[1] N. Bakhvalov, Méthodes Numériques (MIR, Moscou, 1976), Chapter 5.

[2] J.M. Hammersley and D.C. Handscomb, Monte Carlo Methods (Methuen, London, 1964), Chapter 5.

[3] W.W. Wood and J.J. Erpenbeck, Ann. Rev. Phys. Chem. 27, 319 (1976).

[4] J.G. Kemeny and J.L. Snell, Finite Markov Chains (Springer, New York, 1976).

[5] M. Iosifescu, Finite Markov Processes and Their Applications (Wiley, Chichester, 1980).

[6] K.L. Chung, Markov Chains with Stationary Transition Probabilities, 2nd ed. (Springer, New York, 1967).

[7] E. Nummelin, General Irreducible Markov Chains and Non-Negative Operators (Cambridge Univ. Press, Cambridge, 1984).

[8] D. Revuz, Markov Chains (North-Holland, Amsterdam, 1975).

[9] Z. Sidak, Czechoslovak Math. J. 14, 438 (1964).

[10] A.D. Sokal and L.E. Thomas, J. Stat. Phys. 54, 797 (1989).

[11] P.C. Hohenberg and B.I. Halperin, Rev. Mod. Phys. 49, 435 (1977).

[12] S. Caracciolo and A.D. Sokal, J. Phys. A19, L797 (1986).

[13] D. Goldsman, Ph.D. thesis, School of Operations Research and Industrial Engineering, Cornell University (1984); L. Schruben, Oper. Res. 30, 569 (1982) and 31, 1090 (1983).

[14] M.B. Priestley, Spectral Analysis and Time Series, 2 vols. (Academic, London, 1981), Chapters 5-7.

[15] T.W. Anderson, The Statistical Analysis of Time Series (Wiley, New York, 1971).

[16] N. Madras and A.D. Sokal, J. Stat. Phys. 50, 109 (1988).
[17] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller, J. Chem. Phys. 21, 1087 (1953).

[18] W.K. Hastings, Biometrika 57, 97 (1970).

[19] J. Goodman and N. Madras, Lin. Alg. Appl. 216, 61 (1995).

[20] J. Goodman and A.D. Sokal, Phys. Rev. D40, 2035 (1989).

[21] G.F. Mazenko and O.T. Valls, Phys. Rev. B24, 1419 (1981); C. Kalle, J. Phys. A17, L801 (1984); J.K. Williams, J. Phys. A18, 49 (1985); R.B. Pearson, J.L. Richardson and D. Toussaint, Phys. Rev. B31, 4472 (1985); S. Wansleben and D.P. Landau, J. Appl. Phys. 61, 3968 (1987).

[22] M.N. Barber, in Phase Transitions and Critical Phenomena, vol. 8, ed. C. Domb and J.L. Lebowitz (Academic Press, London, 1983).

[23] K. Binder, J. Comput. Phys. 59, 1 (1985).

[24] G. Parisi, in Progress in Gauge Field Theory (1983 Cargèse lectures), ed. G. 't Hooft et al. (Plenum, New York, 1984); G.G. Batrouni, G.R. Katz, A.S. Kronfeld, G.P. Lepage, B. Svetitsky and K.G. Wilson, Phys. Rev. D32, 2736 (1985); C. Davies, G. Batrouni, G. Katz, A. Kronfeld, P. Lepage, P. Rossi, B. Svetitsky and K. Wilson, J. Stat. Phys. 43, 1073 (1986); J.B. Kogut, Nucl. Phys. B275 [FS17], 1 (1986); S. Duane, R. Kenway, B.J. Pendleton and D. Roweth, Phys. Lett. 176B, 143 (1986); E. Dagotto and J.B. Kogut, Phys. Rev. Lett. 58, 299 (1987) and Nucl. Phys. B290 [FS20], 451 (1987).

[25] J. Goodman and A.D. Sokal, Phys. Rev. Lett. 56, 1015 (1986).

[26] R.G. Edwards, J. Goodman and A.D. Sokal, Nucl. Phys. B354, 289 (1991).

[27] R.H. Swendsen and J.-S. Wang, Phys. Rev. Lett. 58, 86 (1987).

[28] R.P. Fedorenko, Zh. Vychisl. i Mat. Fiz. 4, 559 (1964) [USSR Comput. Math. and Math. Phys. 4, 227 (1964)].

[29] L.P. Kadanoff, Physics 2, 263 (1966).

[30] K.G. Wilson, Phys. Rev. B4, 3174, 3184 (1971).

[31] A. Brandt, in Proceedings of the Third International Conference on Numerical Methods in Fluid Mechanics (Paris, July 1972), ed. H. Cabannes and R. Temam (Lecture Notes in Physics #18) (Springer, Berlin, 1973); R.A. Nicolaides, J. Comput. Phys. 19, 418 (1975) and Math. Comp. 31, 892 (1977); A. Brandt, Math. Comp. 31, 333 (1977); W. Hackbusch, in Numerical Treatment of Differential Equations (Oberwolfach, July 1976), ed. R. Bulirsch, R.D. Griegorieff and J. Schröder (Lecture Notes in Mathematics #631) (Springer, Berlin, 1978).
[32] W. Hackbusch, Multi-Grid Methods and Applications (Springer, Berlin, 1985).

[33] W. Hackbusch and U. Trottenberg, editors, Multigrid Methods (Lecture Notes in Mathematics #960) (Springer, Berlin, 1982); Proceedings of the First Copper Mountain Conference on Multigrid Methods, ed. S. McCormick and U. Trottenberg, Appl. Math. Comput. 13, no. 3-4, 213-470 (1983); D.J. Paddon and H. Holstein, editors, Multigrid Methods for Integral and Differential Equations (Clarendon Press, Oxford, 1985); Proceedings of the Second Copper Mountain Conference on Multigrid Methods, ed. S. McCormick, Appl. Math. Comput. 19, no. 1-4, 1-372 (1986); W. Hackbusch and U. Trottenberg, editors, Multigrid Methods II (Lecture Notes in Mathematics #1228) (Springer, Berlin, 1986); S.F. McCormick, editor, Multigrid Methods (SIAM, Philadelphia, 1987); Proceedings of the Third Copper Mountain Conference on Multigrid Methods, ed. S. McCormick (Dekker, New York, 1988).

[34] W.L. Briggs, A Multigrid Tutorial (SIAM, Philadelphia, 1987).

[35] R.S. Varga, Matrix Iterative Analysis (Prentice-Hall, Englewood Cliffs, N.J., 1962).

[36] H.R. Schwarz, H. Rutishauser and E. Stiefel, Numerical Analysis of Symmetric Matrices (Prentice-Hall, Englewood Cliffs, N.J., 1973).

[37] A.M. Ostrowski, Rendiconti Mat. e Appl. 13, 140 (1954) at p. 141n [A. Ostrowski, Collected Mathematical Papers, vol. 1 (Birkhäuser, Basel, 1983), p. 169n].

[38] J.M. Ortega and W.C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables (Academic Press, New York-London, 1970).

[39] S.F. McCormick and J. Ruge, Math. Comp. 41, 43 (1983).

[40] J. Mandel, S. McCormick and R. Bank, in Multigrid Methods, ed. S.F. McCormick (SIAM, Philadelphia, 1987), Chapter 5.

[41] J. Goodman and A.D. Sokal, unpublished.

[42] R.G. Edwards, S.J. Ferreira, J. Goodman and A.D. Sokal, Nucl. Phys. B380, 621 (1992).

[43] S.L. Adler, Phys. Rev. Lett. 60, 1243 (1988).

[44] S.L. Adler, Phys. Rev. D23, 2091 (1981).

[45] C. Whitmer, Phys. Rev. D29, 306 (1984).

[46] E. Nelson, in Constructive Quantum Field Theory, ed. G. Velo and A. Wightman (Lecture Notes in Physics #25) (Springer, Berlin, 1973).
[47] B. Simon, The P(φ)₂ Euclidean (Quantum) Field Theory (Princeton Univ. Press, Princeton, N.J., 1974), Chapter I.

[48] R.B. Potts, Proc. Camb. Phil. Soc. 48, 106 (1952).

[49] F.Y. Wu, Rev. Mod. Phys. 54, 235 (1982); 55, 315 (E) (1983).

[50] K. Krickeberg, Probability Theory (Addison-Wesley, Reading, Mass., 1965), Sections 4.2 and 4.5.

[51] R.H. Schonmann, J. Stat. Phys. 52, 61 (1988).

[52] A.D. Sokal, unpublished.

[53] E.M. Reingold, J. Nievergelt and N. Deo, Combinatorial Algorithms: Theory and Practice (Prentice-Hall, Englewood Cliffs, N.J., 1977), Chapter 8; A. Gibbons, Algorithmic Graph Theory (Cambridge University Press, 1985), Chapter 1; S. Even, Graph Algorithms (Computer Science Press, Potomac, Maryland, 1979), Chapter 3.

[54] B.A. Galler and M.J. Fischer, Commun. ACM 7, 301, 506 (1964); D.E. Knuth, The Art of Computer Programming, vol. 1, 2nd ed. (Addison-Wesley, Reading, Massachusetts, 1973), pp. 353-355, 360, 572; J. Hoshen and R. Kopelman, Phys. Rev. B14, 3438 (1976).

[55] K. Binder and D. Stauffer, in Applications of the Monte Carlo Method in Statistical Physics, ed. K. Binder (Springer-Verlag, Berlin, 1984), section 8.4; D. Stauffer, Introduction to Percolation Theory (Taylor & Francis, London, 1985).

[56] Y. Shiloach and U. Vishkin, J. Algorithms 3, 57 (1982).

[57] R.G. Edwards, X.-J. Li and A.D. Sokal, Sequential and vectorized algorithms for computing the connected components of an undirected graph, unpublished (and uncompleted) notes.

[58] P.W. Kasteleyn and C.M. Fortuin, J. Phys. Soc. Japan 26 (Suppl.), 11 (1969); C.M. Fortuin and P.W. Kasteleyn, Physica 57, 536 (1972); C.M. Fortuin, Physica 58, 393 (1972); C.M. Fortuin, Physica 59, 545 (1972).

[59] C.-K. Hu, Phys. Rev. B29, 5103 (1984); T.A. Larsson, J. Phys. A19, 2383 (1986) and A20, 2239 (1987).

[60] R.G. Edwards and A.D. Sokal, Phys. Rev. D38, 2009 (1988).

[61] A.D. Sokal, in Computer Simulation Studies in Condensed Matter Physics: Recent Developments (Athens, Georgia, 15-26 February 1988), ed. D.P. Landau, K.K. Mon and H.-B. Schüttler (Springer-Verlag, Berlin-Heidelberg, 1988).
[62] X.-J. Li and A.D. Sokal, Phys. Rev. Lett. 63, 827 (1989).

[63] W. Klein, T. Ray and P. Tamayo, Phys. Rev. Lett. 62, 163 (1989).

[64] T.S. Ray, P. Tamayo and W. Klein, Phys. Rev. A39, 5949 (1989).

[65] U. Wolff, Phys. Rev. Lett. 62, 361 (1989).

[66] U. Wolff, Phys. Lett. B228, 379 (1989); P. Tamayo, R.C. Brower and W. Klein, J. Stat. Phys. 58, 1083 (1990) and 60, 889 (1990).

[67] D. Kandel, E. Domany, D. Ron, A. Brandt and E. Loh, Phys. Rev. Lett. 60, 1591 (1988); D. Kandel, E. Domany and A. Brandt, Phys. Rev. B40, 330 (1989).

[68] R.G. Edwards and A.D. Sokal, Swendsen-Wang algorithm augmented by duality transformations for the two-dimensional Potts model, in preparation.

[69] R. Ben-Av, D. Kandel, E. Katznelson, P.G. Lauwers and S. Solomon, J. Stat. Phys. 58, 125 (1990); R.C. Brower and S. Huang, Phys. Rev. B41, 708 (1990).

[70] F. Niedermayer, Phys. Rev. Lett. 61, 2026 (1988).

[71] R.C. Brower and P. Tamayo, Phys. Rev. Lett. 62, 1087 (1989).

[72] U. Wolff, Nucl. Phys. B322, 759 (1989); Phys. Lett. B222, 473 (1989); Nucl. Phys. B334, 581 (1990).

[73] R.G. Edwards and A.D. Sokal, Phys. Rev. D40, 1374 (1989).

[74] P.G. DeGennes, Scaling Concepts in Polymer Physics (Cornell Univ. Press, Ithaca, NY, 1979).

[75] C. Aragão de Carvalho, S. Caracciolo and J. Fröhlich, Nucl. Phys. B215 [FS7], 209 (1983).

[76] A. Berretti and A.D. Sokal, J. Stat. Phys. 40, 483 (1985).

[77] G. Lawler and A.D. Sokal, Trans. Amer. Math. Soc. 309, 557 (1988).

[78] G. Lawler and A.D. Sokal, in preparation.

[79] B. Berg and D. Foerster, Phys. Lett. 106B, 323 (1981).

[80] C. Aragão de Carvalho and S. Caracciolo, J. Physique 44, 323 (1983).

[81] A.D. Sokal and L.E. Thomas, J. Stat. Phys. 51, 907 (1988).

[82] S. Caracciolo, A. Pelissetto and A.D. Sokal, J. Stat. Phys. 60, 1 (1990).
[83] M. Delbrück, in Mathematical Problems in the Biological Sciences (Proc. Symp. Appl. Math., vol. 14), ed. R.E. Bellman (American Math. Soc., Providence, RI, 1962).

[84] P.H. Verdier and W.H. Stockmayer, J. Chem. Phys. 36, 227 (1962).

[85] N. Madras and A.D. Sokal, J. Stat. Phys. 47, 573 (1987).

[86] M. Lal, Molec. Phys. 17, 57 (1969).

[87] B. MacDonald et al., J. Phys. A18, 2627 (1985).

[88] K. Kawasaki, Phys. Rev. 148, 375 (1966).

[89] B.I. Halperin, Phys. Rev. B8, 4437 (1973).

[90] R. Alicki, M. Fannes and A. Verbeure, J. Stat. Phys. 41, 263 (1985).

[91] M.P.M. den Nijs, J. Phys. A12, 1857 (1979); J.L. Black and V.J. Emery, Phys. Rev. B23, 429 (1981); B. Nienhuis, J. Stat. Phys. 34, 731 (1984).

[92] A. Sinclair and M. Jerrum, Information and Computation 82, 93 (1989).

[93] R. Brooks, P. Doyle and R. Kohn, in preparation.

[94] L.N. Vasershtein, Prob. Inform. Transm. 5, 64 (1969).

[95] O.E. Lanford III, in Statistical Mechanics and Mathematical Problems (Lecture Notes in Physics #20), ed. A. Lenard (Springer, Berlin, 1973).

[96] M. Aizenman and R. Holley, in Percolation Theory and Ergodic Theory of Infinite Particle Systems, ed. H. Kesten (Springer, New York, 1987).

ADDED BIBLIOGRAPHY (DECEMBER 1996):

[97] M. Lüscher, P. Weisz and U. Wolff, Nucl. Phys. B359, 221 (1991).

[98] J.-K. Kim, Phys. Rev. Lett. 70, 1735 (1993); Nucl. Phys. B (Proc. Suppl.) 34, 702 (1994); Phys. Rev. D50, 4663 (1994); Europhys. Lett. 28, 211 (1994); Phys. Lett. B345, 469 (1995).

[99] S. Caracciolo, R.G. Edwards, S.J. Ferreira, A. Pelissetto and A.D. Sokal, Phys. Rev. Lett. 74, 2969 (1995).

[100] S.J. Ferreira and A.D. Sokal, Phys. Rev. B51, 6720 (1995).

[101] S. Caracciolo, R.G. Edwards, A. Pelissetto and A.D. Sokal, Phys. Rev. Lett. 75, 1891 (1995).
[102] G. Mana, A. Pelissetto and A.D. Sokal, Phys. Rev. D54, R1252 (1996).
[103] G. Mana, A. Pelissetto and A.D. Sokal, hep-lat/9610021, submitted to Phys. Rev. D.
[104] C.-R. Hwang and S.-J. Sheu, in Stochastic Models, Statistical Methods, and Algorithms in Image Analysis (Lecture Notes in Statistics #74), ed. P. Barone, A. Frigessi and M. Piccioni (Springer, Berlin, 1992).
[105] B. Gidas, private communication (1991).
[106] T. Mendes and A.D. Sokal, Phys. Rev. D53, 3438 (1996).
[107] G. Mana, T. Mendes, A. Pelissetto and A.D. Sokal, Nucl. Phys. B (Proc. Suppl.) 47, 796 (1996).
[108] T. Mendes, A. Pelissetto and A.D. Sokal, Nucl. Phys. B477, 203 (1996).
[109] A.D. Sokal, Nucl. Phys. B (Proc. Suppl.) 20, 55 (1991).
[110] J.-S. Wang and R.H. Swendsen, Physica A167, 565 (1990).
[111] C.F. Baillie and P.D. Coddington, Phys. Rev. Lett. 68, 962 (1992) and private communication.
[112] J. Salas and A.D. Sokal, J. Stat. Phys. 85, 297 (1996).
[113] J. Salas and A.D. Sokal, Dynamic critical behavior of the Swendsen-Wang algorithm: The two-dimensional 3-state Potts model revisited, hep-lat/9605018, J. Stat. Phys. 87, no. 1/2 (April 1997).
[114] J. Salas and A.D. Sokal, Logarithmic corrections and finite-size scaling in the two-dimensional 4-state Potts model, hep-lat/9607030, submitted to J. Stat. Phys.
[115] G.M. Torrie and J.P. Valleau, Chem. Phys. Lett. 28, 578 (1974); J. Comput. Phys. 23, 187 (1977); J. Chem. Phys. 66, 1402 (1977).
[116] B.A. Berg and T. Neuhaus, Phys. Lett. B267, 249 (1991); Phys. Rev. Lett. 68, 9 (1992).
[117] B.A. Berg, Int. J. Mod. Phys. C4, 249 (1993).
[118] A.D. Kennedy, Nucl. Phys. B (Proc. Suppl.) 30, 96 (1993).
[119] W. Janke, in Computer Simulations in Condensed Matter Physics VII, eds. D.P. Landau, K.K. Mon and H.-B. Schüttler (Springer-Verlag, Berlin, 1994).
[120] S. Wiseman and E. Domany, Phys. Rev. E 48, 4080 (1993).
[121] D. Kandel, R. Ben-Av and E. Domany, Phys. Rev. Lett. 65, 941 (1990) and Phys. Rev. B45, 4700 (1992).
[122] J.-S. Wang, R.H. Swendsen and R. Kotecky, Phys. Rev. Lett. 63, 109 (1989).
[123] J.-S. Wang, R.H. Swendsen and R. Kotecky, Phys. Rev. B42, 2465 (1990).
[124] M. Lubin and A.D. Sokal, Phys. Rev. Lett. 71, 1778 (1993).
[125] X.-J. Li and A.D. Sokal, Phys. Rev. Lett. 67, 1482 (1991).
[126] S. Caracciolo, R.G. Edwards, A. Pelissetto and A.D. Sokal, Nucl. Phys. B403, 475 (1993).
[127] A.D. Sokal, in Monte Carlo and Molecular Dynamics Simulations in Polymer Science, ed. K. Binder (Oxford University Press, New York, 1995).
[128] A.D. Sokal, Nucl. Phys. B (Proc. Suppl.) 47, 172 (1996).
[129] B. Li, N. Madras and A.D. Sokal, J. Stat. Phys. 80, 661 (1995).
[130] A.D. Sokal, Europhys. Lett. 27, 661 (1994).
[131] P. Diaconis and D. Stroock, Ann. Appl. Prob. 1, 36 (1991).
[132] A.J. Sinclair, Randomised Algorithms for Counting and Generating Combinatorial Structures (Birkhäuser, Boston, 1993).
[133] P. Diaconis and L. Saloff-Coste, Ann. Appl. Prob. 3, 696 (1993).
[134] P. Diaconis and L. Saloff-Coste, Ann. Appl. Prob. 6, 695 (1996).
[135] P. Diaconis and L. Saloff-Coste, J. Theoret. Prob. 9, 459 (1996).
[136] M. Jerrum and A. Sinclair, in Approximation Algorithms for NP-Hard Problems, ed. D.S. Hochbaum (PWS Publishing, Boston, 1996).
[137] D. Randall and A.J. Sinclair, in Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms (ACM Press, New York, 1994).
[138] D.W. Stroock and B. Zegarlinski, J. Funct. Anal. 104, 299 (1992).
[139] D.W. Stroock and B. Zegarlinski, Commun. Math. Phys. 144, 303 (1992).
[140] D.W. Stroock and B. Zegarlinski, Commun. Math. Phys. 149, 175 (1992).
[141] S.L. Lu and H.T. Yau, Commun. Math. Phys. 156, 399 (1993).
[142] F. Martinelli and E. Olivieri, Commun. Math. Phys. 161, 447, 487 (1994).
[143] D.W. Stroock and B. Zegarlinski, J. Stat. Phys. 81, 1007 (1995).