A Course on Large Deviations with an Introduction to Gibbs Measures.


Firas Rassoul-Agha

Timo Seppäläinen

Department of Mathematics, University of Utah, 155 South 1400 East, Salt Lake City, UT 84112, USA
E-mail address: firas@math.utah.edu

Mathematics Department, University of Wisconsin-Madison, 419 Van Vleck Hall, Madison, WI 53706, USA
E-mail address: seppalai@math.wisc.edu

© Copyright 2010 Firas Rassoul-Agha and Timo Seppäläinen

Last updated: April 24, 2011, 20:54 MDT


2000 Mathematics Subject Classification. Primary 60F10, 82B20

Key words and phrases. large deviations, statistical mechanics, Gibbs measures, relative entropy, variational principle, Ising model, phase transition

To Alla, Maxim, and Kirill

To Celeste, David, Ansa, and Timo


Contents

Preface (and to the reviewers)

Part I. Large Deviations: general theory and i.i.d. processes

Chapter 1. Introduction
§1.1. Information-theoretic entropy
§1.2. Thermodynamic entropy

Chapter 2. Preliminary examples and generalities
§2.1. Informal large deviations
§2.2. Formal large deviations
§2.3. Lower semicontinuity and uniqueness

Chapter 3. More generalities and Cramér’s theorem
§3.1. Weak large deviation principles
§3.2. Cramér’s theorem
§3.3. Limits, deviations, and fluctuations

Chapter 4. Yet some more generalities
§4.1. Contraction principle
§4.2. Varadhan’s theorem and Bryc’s theorem
§4.3. Curie-Weiss model for ferromagnetism

Chapter 5. Convex analysis in large deviation theory
§5.1. Some elementary convex analysis
§5.2. Rate function as a convex conjugate
§5.3. Multidimensional Cramér theorem revisited

Chapter 6. Relative entropy and large deviations for empirical measures
§6.1. Relative entropy
§6.2. Sanov’s theorem
§6.3. Maximum entropy principle

Chapter 7. Large deviations for i.i.d. fields at the process level
§7.1. Setting
§7.2. Specific relative entropy
§7.3. Pressure and the large deviation principle

Part II. Statistical Mechanics

Chapter 8. Formalism for classical lattice systems
§8.1. Finite volume model
§8.2. Potentials and Hamiltonians
§8.3. Specifications
§8.4. Gibbs specifications and phase transition
§8.5. Observables

Chapter 9. Large deviations and equilibrium statistical mechanics
§9.1. Replacing Hamiltonians with averages
§9.2. Thermodynamic limit of the pressure
§9.3. Specific relative entropy
§9.4. Large deviations under Gibbs kernels
§9.5. Dobrushin-Lanford-Ruelle (DLR) variational principle

Chapter 10. Phase transition in the Ising model
§10.1. One-dimensional Ising model
§10.2. Phase transition at low temperature
§10.3. Uniqueness of phase at high temperature
§10.4. Case of no external field
§10.5. Case of nonzero external field

Chapter 11. Percolation approach to phase transition

Part III. Further large deviations topics

Chapter 12. Further asymptotics for i.i.d. random variables
§12.1. Refinement of Cramér’s theorem
§12.2. Moderate deviations

Chapter 13. Large deviations for Markov chains
§13.1. Restricting entropies on product spaces
§13.2. Large deviations

Chapter 14. Convexity criterion for large deviations

Chapter 15. Nonstationary independent variables
§15.1. Generalization of relative entropy and Sanov’s theorem
§15.2. Proof of the large deviation principle

Appendixes

Appendix A. Topics from probability
§A.1. Weak convergence of probability measures
§A.2. Ergodic theorem
§A.3. Stochastic ordering

Appendix B. Topics from analysis
§B.1. Measure-theoretic lemma
§B.2. Minimax theorem

Appendix C. Inequalities
§C.1. Holley’s inequality
§C.2. Griffiths’ inequality
§C.3. Griffiths-Hurst-Sherman inequality

Bibliography

Notation index

Theorems, principles, and models index

Author index

General index


Preface (and to the reviewers)

This book arose from courses on large deviations and related topics given by the authors at the Departments of Mathematics at the Ohio State University (1993), at the University of Wisconsin-Madison (2006), and at the University of Utah (2008).

Our goal has been to create an attractive and exciting collection of material for a semester’s course. This has two implications.

(1) First, we have not aimed at anything like an encyclopedic coverage of different techniques for proving large deviation principles (LDPs). Instead, our treatment centers on one classic line of reasoning: (i) upper bound by an exponential Chebyshev inequality, (ii) lower bound by a change of measure, and (iii) an argument to match the rates from the first two steps. Beyond this technique we do cover Bryc’s theorem, the subadditive method, and also an approach based on the convexity of a local rate function due to Baxter and Jain.

(2) Second, we have not felt obligated to stay within the boundaries of large deviation theory but instead follow the trail of interesting material. Large deviation theory is a natural gateway to statistical mechanics. But we also don’t hesitate to leave large deviation theory completely behind when we study the phase transition of the Ising model.

Here is a brief overview of the contents of the book.

Part I covers core general large deviation theory, the relevant convex analysis, and the large deviations of i.i.d. processes on three levels: Cramér’s theorem, Sanov’s theorem, and the process level LDP for i.i.d. variables indexed by a multidimensional square lattice.

Part II introduces Gibbs measures and proves the Dobrushin-Lanford-Ruelle variational principle that characterizes translation-invariant Gibbs measures in terms of the vanishing of a large deviation rate function and in terms of a minimization problem. After this we proceed to study the phase transition of the Ising model. We also plan to add a section on the Fortuin-Kasteleyn random cluster model and cover the modern percolation approach to phase transitions.

Part III develops the large deviation themes of Part I in several directions. Large deviations of i.i.d. variables are complemented with moderate deviations and with more precise large deviation asymptotics. From i.i.d. processes we generalize to Markov chains and to independent but nonstationary processes. The latter topic gives us the opportunity to introduce an entirely different way of proving large deviation principles, namely the Baxter-Jain theorem. This material has not previously appeared in textbooks. We are also planning a section on large deviations for some flavor of random walk in random environment.

The reader should note that starred exercises are used in the text later on. Also, the book of Dembo and Zeitouni contains thorough historical notes and references, hence we have not attempted to present the historical record.

The ideal background for reading this book would be some familiarity with the language of measure-theoretic probability, and with analysis in general so that the reader is comfortable working with notions such as lower semicontinuity or with linear spaces. In reality our courses have been populated by students with diverse backgrounds, many with less than ideal knowledge of analysis and probability. To make the material accessible to these students we plan to enhance the appendixes to include an overview of the main theorems of measure theory and probability. (And once we do this the frequent references to basic probability textbooks that now interrupt the flow of the prose will disappear.) The actual technical needs for following the book are not deep and it should be possible for an instructor to accommodate students with quick lectures on analytic technicalities whenever needed. We have found that there is great interest in probability theory among students of economics, engineering and sciences. This interest should be encouraged and nurtured with accessible courses.

We are of course greatly indebted to the existing books on the subject, especially those by Amir Dembo and Ofer Zeitouni, Jean-Dominique Deuschel and Daniel Stroock, Frank den Hollander, and Richard Ellis. We thank Jeff Steif for lecture notes that helped shape the proof of Theorem 10.2, Jim Kuelbs for the material for Sections 12.1 and 12.2, and Chuck Newman for a helpful discussion on the liquid-gas phase transition mentioned in Chapter 8. We also thank Davar Khoshnevisan for several valuable suggestions. Support from the National Science Foundation and the Wisconsin Alumni Research Foundation is gratefully acknowledged.

Firas Rassoul-Agha

Timo Seppäläinen

November 23rd, 2010


Part I. Large Deviations: general theory and i.i.d. processes

Chapter 1. Introduction

Imagine the simplest possible experiment involving randomness: tossing a fair coin $n$ times. When $n$ is small, say 3 or 4, there is not much math in there. Once the number of tosses gets large it becomes harder to analyze the situation. However, with a large number of tosses patterns and order start emerging from underneath the randomness: heads appear about 50% of the time and the histogram shows a bell curve. These patterns become more and more pronounced as the number of tosses increases. But from time to time a random fluctuation might break the pattern: perhaps all of a sudden 10,000 tosses of a fair coin give 6000 heads. In fact, we know that there is a chance of $(1/2)^{10,000}$ that all the tosses yield heads. The point is that to understand the system well one cannot just be satisfied with understanding the most likely outcomes. One also needs to understand the odds of the more rare events. But why care about an event that has a chance of $(1/2)^{10,000}$?

Oversimplifying the matter for the sake of illustration, imagine that each day at 9AM 10,000 coins are tossed simultaneously and if there are 9000 or more heads a disaster sweeps the town. Suddenly, such a rare event is not so unimportant. But say that each day the citizens of the town put \$1 each in the catastrophes fund. Is this enough to be able to rebuild the town when disaster strikes? Maybe not. But if the citizens are asked to pay \$1000 they may think that is too much and simply move out of the town. Obviously, the amount they need to pay has to do with how improbable it is to have all coins come up heads as well as with the cost of rebuilding their town. This is of course a caricature of how insurance premiums may be computed.

This is an introductory course on the methods of computing asymptotics of probabilities of rare events: the theory of large deviations. Let us start with one of the most basic computations one can actually carry out.

Example 1.1. Let us consider coin tosses. Let $\{X_n\}$ be an i.i.d. sequence of Bernoulli random variables with success probability $p$ (i.e. each $X_n = 1$ with probability $p$ and 0 otherwise). Denote the partial sum by $S_n = X_1 + \cdots + X_n$. The law of large numbers ([15] or page 73 of [26]) says that $S_n/n$ converges to $p$, almost surely. But at any given $n$ there is a chance of $p^n$ that we get all heads ($S_n = n$) and also a chance of $(1-p)^n$ that we get all tails ($S_n = 0$). In fact, for any $s \in (0, 1)$ there is always a chance that one gets a fraction of heads close to $s$. Let us compute this probability.

Let us write $[x]$ for the integral part of $x \in \mathbb{R}$, i.e. the largest integer less than or equal to $x$. Write

\[
\begin{aligned}
P\{S_n = [ns]\} &= \frac{n!}{[ns]!\,(n-[ns])!}\; p^{[ns]} (1-p)^{n-[ns]} \\
&\sim \frac{n^n\, p^{[ns]} (1-p)^{n-[ns]}}{[ns]^{[ns]}\,(n-[ns])^{n-[ns]}} \sqrt{\frac{n}{2\pi [ns](n-[ns])}},
\end{aligned}
\]

where we have used Stirling’s formula $n! \sim e^{-n} n^n \sqrt{2\pi n}$; see, for example, page 21 of Khoshnevisan’s textbook [26] or page 52 of Feller’s Vol. I [17]. (We say that $a_n \sim b_n$, or $a_n$ is equivalent to $b_n$, when $a_n/b_n \to 1$.) Let us abbreviate

\[
\beta_n = \sqrt{\frac{n}{2\pi [ns](n-[ns])}}, \qquad
\gamma_n = \frac{(ns)^{ns} (n-ns)^{n-ns}}{[ns]^{[ns]} (n-[ns])^{n-[ns]}} \cdot \frac{p^{[ns]} (1-p)^{n-[ns]}}{p^{ns} (1-p)^{n-ns}}.
\]

Then, $P\{S_n = [ns]\}$ is equivalent to

\[
\beta_n \gamma_n \exp\bigl\{ n\log n - ns\log(ns) - n(1-s)\log\bigl(n(1-s)\bigr) + ns\log p + n(1-s)\log(1-p) \bigr\}.
\]

∗ Exercise 1.2. Show that there exists a constant $C$ such that $\frac{1}{C\sqrt{n}} \le \beta_n \le C$ and $\frac{1}{Cn} \le \gamma_n \le Cn$ for large enough $n$.

One then has

\[
\lim_{n\to\infty} \frac{1}{n} \log P\{S_n = [ns]\} = -I_p(s), \quad \text{with} \quad I_p(s) = s\log\frac{s}{p} + (1-s)\log\frac{1-s}{1-p}. \tag{1.1}
\]
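The limit (1.1) can be checked numerically. The following is a minimal sketch, not part of the original text: the function names `rate` and `log_prob` are illustrative choices, and the exact binomial probability is evaluated through the log-gamma function rather than through Stirling’s formula.

```python
import math

def rate(s, p):
    """I_p(s) = s log(s/p) + (1-s) log((1-s)/(1-p)), for 0 < s < 1 and 0 < p < 1."""
    return s * math.log(s / p) + (1 - s) * math.log((1 - s) / (1 - p))

def log_prob(n, s, p):
    """Exact log P{S_n = [ns]} for S_n ~ Binomial(n, p), computed with log-gamma."""
    k = math.floor(n * s)  # [ns], the integral part of ns
    log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_binom + k * math.log(p) + (n - k) * math.log(1 - p)

if __name__ == "__main__":
    p, s = 0.5, 0.6
    for n in (100, 10_000, 1_000_000):
        # The middle column should approach the last column as n grows.
        print(n, log_prob(n, s, p) / n, -rate(s, p))
```

The gap between the two printed columns is of order $(\log n)/n$, which is consistent with the polynomial bounds on $\beta_n$ and $\gamma_n$ in Exercise 1.2.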

This function $I_p$ is continuous on $(0, 1)$ and its limits at 0 and 1 are exactly what we predicted earlier: $I_p(1) = \log\frac{1}{p}$ and $I_p(0) = \log\frac{1}{1-p}$. For $s \notin [0, 1]$ it is natural to set $I_p(s) = \infty$. Figure 1.1 shows what this function looks like.

[Figure 1.1. The rate function for coin tosses. Horizontal axis: $s$, with 0, $p$, and 1 marked; vertical axis: $I(s)$, with 0, $\log\frac{1}{p}$, $\log\frac{1}{1-p}$, and $\infty$ marked.]

The function $I_p$ in (1.1) is called a large deviation rate function. $I_p(s)$ is also called the entropy of the coin yielding heads with probability $s$ relative to the one giving heads with probability $p$. The choice of terminology is not a coincidence. It is indeed related to both information-theoretic and thermodynamic entropy.

For this reason we go on a brief detour to discuss these well-known notions of entropy and to point out the link with the large deviation rate function $I_p$. The so-called relative entropy that appears in large deviation theory will take center stage in Chapters 6–7, and again in Chapter 9 when we discuss statistical mechanics of lattice systems.

1.1. Information-theoretic entropy

Let us regard the outcome of $n$ coin tosses as a long word written in binary language, a sequence of zeros and ones. The question we would like to address is: how much information is there in a given sequence of $n$ zeros and ones? Or: what is the minimal number of bits one would need to encode such a sequence?

The answer to this question has to do with quantifying the amount of uncertainty present in the coin itself. If the coin only gives heads, then one knows exactly what is coming and needs 0 bits to encode the outcome. On the other hand, if the coin is fair, then one cannot predict what will come next in any favorable way and one thus needs exactly $n$ bits to encode a sequence of $n$ zeros and ones; i.e. one needs 1 bit per character. In general, one needs $h$ bits per character with $h \in [0, 1]$. In fact, we have already computed $h$ in Example 1.1. To see how that computation is relevant we just need to view things from a slightly different angle.

A coin that gives heads with probability $s$ is tossed. After $n$ tosses one would typically get about $[ns]$ heads and $[n(1-s)]$ tails. The number of all possible combinations with $[ns]$ heads is $\binom{n}{[ns]}$. If we are to encode this sequence with $[hn]$ zeros and ones, then we have $2^{[hn]}$ possible words to use. One should then have

\[
\binom{n}{[ns]} \le 2^{[hn]}.
\]

Using Stirling’s formula (or directly applying (1.1)) one obtains that the minimal possible $h$ is

\[
h(s) = -s\log_2 s - (1-s)\log_2(1-s) = 1 - \frac{I_{1/2}(s)}{\log 2},
\]

with $I_{1/2}$ given in (1.1).

Note that $h(0) = h(1) = 0$. This makes sense, since if we know $s = 0$ (respectively, $s = 1$) then we know we will only get zeros (respectively, ones). Thus, there is nothing to encode. This is the case of complete order. On the other hand, $h(1/2) = 1$. This too makes sense since there is no information one can extract from a sequence of fair coin tosses and one needs all $n$ bits to encode the sequence. This is the case of complete disorder. For $s \in (0, 1/2)$, one knows that a 1 is less likely to occur than a 0 and hence one should be able to conserve and use fewer than $n$ bits to encode the sequence. However, the above formula says that one cannot do better than $h(s)n$. This, of course, does not tell us what the best encoding algorithm is.
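As a quick sanity check, one can compare the exact count of words with $[ns]$ ones against the entropy $h(s)$ and verify the identity with $I_{1/2}$. The sketch below is illustrative only; the names `h`, `rate_half`, and `bits_per_char` are not from the text.

```python
import math

def h(s):
    """Binary entropy in bits per character, with the convention h(0) = h(1) = 0."""
    if s in (0.0, 1.0):
        return 0.0
    return -s * math.log2(s) - (1 - s) * math.log2(1 - s)

def rate_half(s):
    """I_{1/2}(s) from (1.1) with p = 1/2, for 0 < s < 1 (natural logarithm)."""
    return s * math.log(2 * s) + (1 - s) * math.log(2 * (1 - s))

def bits_per_char(n, s):
    """(1/n) log2 of the number of length-n words with [ns] ones, using exact integers."""
    return math.log2(math.comb(n, math.floor(n * s))) / n

if __name__ == "__main__":
    s = 0.25
    print(h(s), 1 - rate_half(s) / math.log(2))  # two expressions for h(s)
    for n in (100, 1_000, 10_000):
        print(n, bits_per_char(n, s))            # increases toward h(s)
```

The exact count stays slightly below $2^{h(s)n}$, in line with the claim that $h(s)n$ bits suffice asymptotically and that one cannot do better.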

1.2. Thermodynamic entropy

Once again, for the sake of illustration, we will describe an oversimplified system. This section is inspired by Schrödinger’s course [34].

Consider a physical system of $n$ independent identical components. By independent we mean that the components do not communicate with each other. By identical we mean that each of them has the same “mechanism” attached to it, screws, pistons, and what not. Each component can be at an energy level from the set $\{\varepsilon_\ell : \ell \in \mathbb{N}\}$. We submit the system to a heat bath at a fixed absolute temperature $T$ which causes it to have total energy $E$. Let $a_\ell$ be the number of components in state $\varepsilon_\ell$. The system tries to maximize its disorder by choosing $a_\ell$’s so that the number of possible configurations $\frac{n!}{a_1! \cdots a_\ell! \cdots}$ is as large as possible, subject to the constraints $\sum_\ell a_\ell = n$ and $\sum_\ell a_\ell \varepsilon_\ell = E$.

Equivalently, one can maximize the logarithm of the quantity in question and use Lagrange multipliers to achieve this optimization task; see page 266 of Bartle’s textbook [3]. First, one sets the gradient of

\[
\log\frac{n!}{a_1! \cdots a_\ell! \cdots} - \alpha \sum_\ell a_\ell - \beta \sum_\ell a_\ell \varepsilon_\ell
\]

to 0. Here, $\alpha$ and $\beta$ are the Lagrange multipliers. For convenience let us pretend the unknowns $a_\ell$ are continuous variables. We will again use Stirling’s formula in the form $\log n! \sim n(\log n - 1)$, then compute the derivative of the above with respect to $a_\ell$ and get

\[
\log a_\ell + \alpha + \beta\varepsilon_\ell = 0, \quad \text{for all } \ell.
\]

In other words, $a_\ell = C e^{-\beta\varepsilon_\ell}$. Since one has a total number of $n$ components, one gets

\[
a_\ell = \frac{n\, e^{-\beta\varepsilon_\ell}}{\sum_j e^{-\beta\varepsilon_j}}.
\]

The second constraint gives

\[
E = n\, \frac{\sum_\ell \varepsilon_\ell\, e^{-\beta\varepsilon_\ell}}{\sum_\ell e^{-\beta\varepsilon_\ell}}.
\]

The whole state of the system is therefore determined by the energies $\varepsilon_\ell$ and the parameter $\beta$. In principle $\beta$ is a complicated function of $E$. However, it turns out that $\beta$ is physically the more important quantity. We will view everything as a function of $\beta$ and $\{\varepsilon_\ell\}$ as shown in the above displays.
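To make the dependence of $\beta$ on $E$ concrete, here is a minimal numerical sketch. It assumes a finite list of energy levels (the text allows countably many), and the names `occupations`, `total_energy`, and `solve_beta` as well as the bisection bracket are illustrative choices, not part of the original. The bisection relies on the fact that the total energy is nonincreasing in $\beta$.

```python
import math

def occupations(n, beta, energies):
    """a_l = n e^{-beta eps_l} / sum_j e^{-beta eps_j}, for a finite list of levels."""
    weights = [math.exp(-beta * e) for e in energies]
    z = sum(weights)
    return [n * w / z for w in weights]

def total_energy(n, beta, energies):
    """E = sum_l a_l eps_l for the occupation numbers above."""
    return sum(a * e for a, e in zip(occupations(n, beta, energies), energies))

def solve_beta(n, target_energy, energies, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection for beta with total_energy(n, beta) = target_energy.

    Assumes the solution lies in [lo, hi]; the bracket is an arbitrary choice.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total_energy(n, mid, energies) > target_energy:
            lo = mid   # energy too high: beta must be larger
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    energies = [0.0, 1.0, 2.0]   # hypothetical energy levels
    n, E = 1000, 600.0
    beta = solve_beta(n, E, energies)
    a = occupations(n, beta, energies)
    print(beta, a, sum(a), total_energy(n, beta, energies))
```

Both constraints, $\sum_\ell a_\ell = n$ and $\sum_\ell a_\ell \varepsilon_\ell = E$, are then satisfied up to numerical tolerance.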

In thermodynamics, the entropy $S$ of the system is defined by the relation $dQ = T\,dS$, where $dQ$ is the infinitesimal energy that must be given up to the system’s surroundings as unusable heat when we do work on the pistons to alter the system’s energy levels by $d\varepsilon_\ell$. This work is equal to $\sum_\ell a_\ell\, d\varepsilon_\ell$ while the corresponding increase in energy is equal to $dE$. $dQ$ is the difference between the two. But since all quantities will increase as $n$ increases one needs to normalize appropriately. We will hence let $S$ be the average entropy per component, define $U = E/n$ as the average energy per component, let $F = \log \sum_j e^{-\beta\varepsilon_j}$, and write

\[
\begin{aligned}
dS &= \frac{1}{nT}\Bigl( dE - \sum_\ell a_\ell\, d\varepsilon_\ell \Bigr)
= \frac{1}{nT\beta}\Bigl( d(\beta E) - E\, d\beta - \beta \sum_\ell a_\ell\, d\varepsilon_\ell \Bigr) \\
&= \frac{1}{T\beta}\Bigl( d(\beta U) + \frac{\partial F}{\partial \beta}\, d\beta + \sum_\ell \frac{\partial F}{\partial \varepsilon_\ell}\, d\varepsilon_\ell \Bigr)
= \frac{1}{T\beta}\, d(\beta U + F).
\end{aligned}
\tag{1.2}
\]

Abbreviate $G = \beta U + F$ which, by the above display, has to be a function $f(S)$ such that $f'(S) = T\beta$.
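The third equality in (1.2) rests on the identities $\partial F/\partial\beta = -U$ and $\partial F/\partial\varepsilon_\ell = -\beta a_\ell/n$, which follow by differentiating $F = \log\sum_j e^{-\beta\varepsilon_j}$. The sketch below checks them by central finite differences; it assumes a finite list of energy levels, and all function names and the step size are illustrative.

```python
import math

def log_partition(beta, energies):
    """F = log sum_j exp(-beta * eps_j)."""
    return math.log(sum(math.exp(-beta * e) for e in energies))

def mean_energy(beta, energies):
    """U = E/n = sum_l eps_l e^{-beta eps_l} / sum_l e^{-beta eps_l}."""
    weights = [math.exp(-beta * e) for e in energies]
    return sum(w * e for w, e in zip(weights, energies)) / sum(weights)

def occupation_fraction(beta, energies, index):
    """a_l / n = e^{-beta eps_l} / sum_j e^{-beta eps_j}."""
    weights = [math.exp(-beta * e) for e in energies]
    return weights[index] / sum(weights)

def dF_dbeta(beta, energies, step=1e-6):
    """Central difference approximation of dF/d(beta)."""
    return (log_partition(beta + step, energies)
            - log_partition(beta - step, energies)) / (2 * step)

def dF_deps(beta, energies, index, step=1e-6):
    """Central difference approximation of dF/d(eps_index)."""
    up, down = list(energies), list(energies)
    up[index] += step
    down[index] -= step
    return (log_partition(beta, up) - log_partition(beta, down)) / (2 * step)

if __name__ == "__main__":
    energies, beta = [0.0, 0.5, 2.0], 1.3   # hypothetical values
    print(dF_dbeta(beta, energies), -mean_energy(beta, energies))
    print(dF_deps(beta, energies, 1), -beta * occupation_fraction(beta, energies, 1))
```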

Observe that $G$ is an additive functional of the system in the following sense. Suppose we have another system $B$ with $m$ independent and identical components at energies $\{\bar\varepsilon_k\}$. Assume the components of $B$ are also independent of those of the original system, which we call $A$. Now consider a third system consisting of the two systems together and call the new system $AB$. This is a system of $nm$ components at energies $\{\varepsilon_\ell + \bar\varepsilon_k\}$. Observe that if $\beta$ is kept the same for all three systems, then $F_A + F_B = F_{AB}$, where we have put a subscript on the function $F$ to indicate which system it refers to. Since $U = -\frac{\partial F}{\partial \beta}$, the same additive relation holds for the function $G$. In other words,

\[
f_A(S_A) + f_B(S_B) = f_{AB}(S_{AB}).
\]

The key observation now is that due to (1.2) entropy is also an additive functional of the system, when $T$ is kept fixed for all three systems. That is, $dS_{AB} = dS_A + dS_B$. Thus,

\[
f_A(S_A) + f_B(S_B) = f_{AB}(S_A + S_B + c).
\]

Taking derivatives in $S_A$ and in $S_B$ one sees that $f'_A(S_A) = f'_B(S_B)$. Since the system $B$ was chosen arbitrarily we see that $f'(S)$ must be a universal constant, say $1/k$. This implies the familiar $\beta = \frac{1}{kT}$. Moreover, $G$ is proportional to the entropy $S$.

Let us now compute $G$ for a system that can only take one of two energies $\varepsilon_1$ and $\varepsilon_2$. By symmetry, recentering, and a change of units, we can assume that $\varepsilon_1 = 0$ and $\varepsilon_2 = 1$. Then, the total energy $E$ is precisely the number of components at energy 1. This is exactly like flipping a coin to decide which component is at energy 1. The coin falls heads with probability $u = E/n = U = e^{-\beta}/(1 + e^{-\beta})$. One has

\begin{align*}
G = \beta u + F &= u(\beta + F) + (1 - u)F \\
&= -u\log u - (1 - u)\log(1 - u) \\
&= \log 2 - I_{1/2}(u).
\end{align*}

We see that when $p = 1/2$ the rate function $I_p$ in Example 1.1 is, up to a constant, the negative thermodynamic entropy of the two-energy system. (This is where the customary mathematical and physical usage of the term entropy differs: a minus sign!) This link between large deviation theory and statistical mechanics will be explored further in the second part of this course.
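The two-energy identity above can be checked numerically. The following is a minimal sketch, assuming that $I_{1/2}(u) = u\log(2u) + (1-u)\log(2(1-u))$ (the form consistent with the display) and that $F = \log(1 + e^{-\beta})$ for energies 0 and 1. The function names `rate_half` and `two_level_G` are hypothetical and chosen only for this illustration.

```python
import math

def rate_half(u):
    """Assumed form of I_{1/2}(u) = u log(2u) + (1-u) log(2(1-u)), with 0 log 0 = 0."""
    total = 0.0
    for w in (u, 1.0 - u):
        if w > 0.0:
            total += w * math.log(2.0 * w)
    return total

def two_level_G(beta):
    """Return (u, G) for a component with energies 0 and 1 at parameter beta."""
    z = 1.0 + math.exp(-beta)   # e^{-beta*0} + e^{-beta*1}
    F = math.log(z)             # F = log of the sum of Boltzmann weights
    u = math.exp(-beta) / z     # average energy per component, u = U
    return u, beta * u + F      # G = beta*U + F

for beta in (-2.0, 0.0, 0.5, 3.0):
    u, G = two_level_G(beta)
    # The two printed values in each row should agree up to rounding.
    print(beta, G, math.log(2.0) - rate_half(u))
```

At $\beta = 0$ the coin is fair, $u = 1/2$, and both sides equal $\log 2$, the maximal value of $G$.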

In the previous section we saw that $-I_p$ is a linear function (with positive slope) of $h$. Thus, one concludes that the thermodynamic entropy of a physical system is essentially equal to the amount of information needed to fully describe the system or, equivalently, to the amount of uncertainty remaining in it.

After this introductory discussion we begin in the next chapter a systematic study of probabilities of rare events. As in (1.1) in the coin tossing example, these probabilities often decay exponentially in the system size. The identity $(k\beta)^{-1} S = U + \beta^{-1} F$ that represents an entropy-energy balance will reappear several times in various guises. It can be found in Exercise 6.21, as equation (8.5) for the Curie-Weiss model, and in Section 9.5 as part (c) of the Dobrushin-Lanford-Ruelle variational principle for lattice systems.


Chapter 2

Preliminary examples and generalities

2.1. Informal large deviations

One common use of large deviations is to find an estimate good enough for the purpose at hand; e.g. proving a limit theorem. The following is an illustration of such a situation.

Example 2.1. Let $\{X_n\}$ be an i.i.d. sequence with $E[e^{\theta X}] < \infty$ for each $\theta$ close to 0 (i.e. $|\theta| < \delta$ for some $\delta > 0$). Assume $E[X] = 0$. We would like to show that $S_n/n^p \to 0$ $P$-almost surely, for any $p > 1/2$. When $p \ge 1$ this follows from the strong law of large numbers. Let us thus assume $p \in (1/2, 1)$. Next, for $t \ge 0$ Chebyshev's inequality (see, for example, page 15 of Durrett's textbook [15]) implies

\[ P\{S_n \ge \varepsilon n^p\} \le E[e^{tS_n - \varepsilon t n^p}] = \exp\{-\varepsilon t n^p + n\log E[e^{tX}]\}. \]

The exponential moment assumption on $X$ implies that $E[|X|^k]\,t^k/k!$ is summable, for $t \in [0, \delta)$. Recalling that $E[X] = 0$, we see that there exists a $\delta_0 > 0$ and a constant $c$ such that

\[ E[e^{tX}] = E[e^{tX} - tX] \le 1 + \sum_{k=2}^{\infty} \frac{t^k}{k!}\,E[|X|^k] \le 1 + ct^2, \]

when $t \in [0, \delta_0]$. Then, taking $t = \frac{\varepsilon n^p}{2nc}$ and $n$ large enough,

\begin{align*}
P\{S_n \ge \varepsilon n^p\} &\le \exp\{-\varepsilon t n^p + n\log(1 + ct^2)\} \\
&\le \exp\{-\varepsilon t n^p + nct^2\} = \exp\Bigl\{-\frac{\varepsilon^2}{4c}\,n^{2p-1}\Bigr\}.
\end{align*}

Applying this to the sequence $\{-X_n\}$ also gives

\[ P\{S_n \le -\varepsilon n^p\} \le \exp\Bigl\{-\frac{\varepsilon^2}{4c}\,n^{2p-1}\Bigr\}. \]

We have shown that $P\{|S_n| \ge \varepsilon n^p\}$ is summable and the Borel-Cantelli lemma (see, e.g. page 47 of [15] or page 73 of [26]) implies that

\[ P\{\exists n_0 : n \ge n_0 \Rightarrow |S_n/n^p| \le \varepsilon\} = 1, \]

for any $\varepsilon > 0$. Thus, one has

\[ P\{\forall k\ \exists n_0 : n \ge n_0 \Rightarrow |S_n/n^p| \le 1/k\} = 1, \]

which is saying that $S_n/n^p \to 0$, $P$-a.s.
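To see the exponential bound of Example 2.1 in a concrete case, here is a small sketch for fair $\pm 1$ steps. This choice of distribution is an assumption made for illustration and is not part of the example. For such steps $E[e^{tX}] = \cosh t \le e^{t^2/2}$, so the argument goes through with $c = 1/2$ for every $t \ge 0$ and every $n$. The helper names are hypothetical.

```python
import math

def rademacher_tail(n, threshold):
    """Exact P{S_n >= threshold} when S_n is a sum of n independent fair +-1 signs."""
    # S_n = 2H - n with H ~ Binomial(n, 1/2), so S_n >= threshold iff H >= (n + threshold)/2.
    h_min = max(0, math.ceil((n + threshold) / 2))
    count = sum(math.comb(n, h) for h in range(h_min, n + 1))
    return count / 2**n  # integer arithmetic, rounded to a float only at the end

def chernoff_bound(n, eps, p, c=0.5):
    """The bound exp(-eps^2 n^(2p-1) / (4c)); c = 1/2 is the assumed constant for +-1 signs."""
    return math.exp(-eps**2 * n ** (2 * p - 1) / (4 * c))

eps, p = 1.0, 0.75
for n in (100, 400, 1600):
    # The exact tail should sit below the bound in every row.
    print(n, rademacher_tail(n, eps * n**p), chernoff_bound(n, eps, p))
```

Both columns decay like a stretched exponential in $n$, which is what makes the sequence summable and lets Borel-Cantelli apply.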

One can achieve the same result using martingales. In fact, one can then weaken the moment assumption to just $E[|X|^2] < \infty$. Here is how. Since $S_n$ is a $P$-martingale (relative to the filtration $\sigma(X_1, \dots, X_n)$), Doob's inequality (see page 250 of [15] or (8.67) of [26]) gives

\[ P\Bigl\{\max_{k\le n} |S_k| \ge \varepsilon n^p\Bigr\} \le \frac{1}{\varepsilon^2 n^{2p}}\,E[|S_n|^2] = \frac{E[|X|^2]\,n}{\varepsilon^2 n^{2p}} = \frac{c}{\varepsilon^2}\,n^{-(2p-1)}. \]

Pick $r > 0$ such that $r(2p - 1) > 1$. Then,

\[ P\Bigl\{\max_{k\le m^r} |S_k| \ge \varepsilon m^{pr}\Bigr\} \le \frac{c_1}{m^{r(2p-1)}}. \]

Hence, $P\{\max_{k\le m^r} |S_k| \ge \varepsilon m^{pr}\}$ is summable and the Borel-Cantelli lemma implies that $m^{-rp}\max_{k\le m^r} |S_k|$ converges to 0, $P$-a.s.

To get the result for the full sequence observe that given $n$ one can pick $m(n)$ such that $(m(n) - 1)^r \le n < m(n)^r$. Then,

\[ n^{-p}\max_{k\le n} |S_k| \le \Bigl(\frac{m(n)^r}{n}\Bigr)^{p}\, m(n)^{-rp}\max_{k\le m(n)^r} |S_k|. \]

Since $m(n)^r/n$ converges to one we are done.

2.2. Formal large deviations

In the formal setting one seeks precise limits of probabilities of rare events, typically on an exponential scale. The standardized formulation of these limits is called a "large deviation principle" (LDP). In what follows, we describe the setting and lead to the precise formulation.

In a very general setting one has a Hausdorff topological space $\mathcal{X}$; i.e. a topological space where any two points can be separated by disjoint neighborhoods. We equip $\mathcal{X}$ with its Borel $\sigma$-algebra $\mathcal{B}$ and let $\mathcal{M}_1(\mathcal{X})$ be the space of probability measures on the resulting measurable space. We have a sequence $\{\mu_n\}$ of such measures. In Example 1.1 on page 4 this was the sequence of distributions $\mu_n(A) = P\{S_n/n \in A\}$.

We are interested in the weight $\mu_n$ assigns to an outcome $x \in \mathcal{X}$. In Example 1.1 these weights decayed like $e^{-cn}$ for atypical points. This is the kind of situation we want to study, and in particular we wish to compute the constant $c$ exactly. It could happen that $c$ is infinite. This either happens because $x$ can never take place or because the probability actually decays faster than exponentially. On the other hand, $c$ could also be 0 which means that $x$ turns out to be "less rare" than we thought and the probability decays slower than exponentially. Looking for the correct rate of decay is thus part of the problem. Let us say then that, at scale $r_n \nearrow \infty$, the weight $\mu_n$ assigns to $x$ decays like $e^{-c r_n}$. We still would like to compute the constant $c$. In fact, this is a function of $x$ that we will call the rate function and denote by $I(x)$.

This is of course not precise. Often we cannot talk about the weights assigned to individual elements $x$ of the space. Instead we must talk about weights assigned to events, or subsets, of the space. But, thinking informally for just a bit longer, on an exponential scale it makes sense to regard an event $A$ as rare as its least rare outcome. That is, the weight assigned to a set $A$ should be the maximal weight $\mu_n$ assigns to an element of $A$, i.e. $e^{-r_n \inf_A I}$. On a technical level, it is too much to expect actual convergence for all sets $A$ on account of boundary effects. A more reasonable formulation would be to say that for all measurable sets $A$:

\[ -\inf_{x\in A^\circ} I(x) \le \liminf_{n\to\infty} \frac{1}{r_n}\log\mu_n(A) \le \limsup_{n\to\infty} \frac{1}{r_n}\log\mu_n(A) \le -\inf_{x\in\bar A} I(x), \tag{2.1} \]

where $A^\circ$ and $\bar A$ are, respectively, the topological interior and closure of $A$.

Remark 2.2. The limsup for closed sets and liminf for open sets remind us of weak convergence of probability measures where the same boundary issue arises; see Appendix A.1 for the definition of weak convergence.

Example 2.3. Let us revisit the Bernoulli i.i.d. sequence $\{X_n\}$ that we considered in Example 1.1. If we let $\mu_n(A) = P\{S_n/n \in A\}$ and recall the function $I_p$ in (1.1), then $\{\mu_n\}$ satisfy (2.1) with normalization $n$ and rate $I_p$. Indeed, take first an open set $G$. For any point $s \in G \cap [0, 1]$ taking $n$ large enough implies that $[ns]/n \in G$ and thus $P\{S_n/n \in G\} \ge P\{S_n = [ns]\}$. This implies that

\[ \liminf_{n\to\infty} \frac{1}{n}\log P\{S_n/n \in G\} \ge \lim_{n\to\infty} \frac{1}{n}\log P\{S_n = [ns]\} = -I_p(s). \]

This inequality also holds for $s \in G \setminus [0, 1]$, since $I_p(s)$ is infinite then. Now we can take sup over points $s \in G$ and the lower bound in (2.1) follows by taking $G = A^\circ$.
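The limit used in the lower bound can be observed numerically. The sketch below assumes that $I_p$ in (1.1) has the usual form $I_p(s) = s\log\frac{s}{p} + (1-s)\log\frac{1-s}{1-p}$ on $[0,1]$, which is consistent with the identity for $I_{1/2}$ in Section 1.2. It evaluates $\frac{1}{n}\log P\{S_n = [ns]\}$ through log-gamma functions to avoid underflow. The function names are illustrative only.

```python
import math

def rate(s, p):
    """Assumed form of I_p(s) on [0, 1], with the convention 0 log 0 = 0."""
    total = 0.0
    if s > 0.0:
        total += s * math.log(s / p)
    if s < 1.0:
        total += (1.0 - s) * math.log((1.0 - s) / (1.0 - p))
    return total

def log_binom_pmf(n, k, p):
    """log P{S_n = k} for S_n ~ Binomial(n, p), computed on the log scale."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)

p, s = 0.3, 0.6
for n in (10**2, 10**4, 10**6):
    k = math.floor(n * s)  # the integer part [ns]
    # First value approaches the second as n grows.
    print(n, log_binom_pmf(n, k, p) / n, -rate(s, p))
```

The gap between the two columns shrinks roughly like $\log n / n$, the polynomial correction that disappears on the exponential scale.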


For a closed set $F$, one can split it into the union of $F_1 = F \cap (-\infty, p]$ and $F_2 = F \cap [p, \infty)$. Let us first prove the upper bound in (2.1) for $F_1$ and $F_2$ separately.

Let $a = \sup F_1 \le p$ and $b = \inf F_2 \ge p$. (If $F_1$ is empty, set $a = -\infty$, and if $F_2$ is empty, set $b = \infty$.) Assume first that $a \ge 0$. Then

\[ \frac{1}{n}\log P\{S_n/n \in F_1\} \le \frac{1}{n}\log P\{S_n/n \in [0, a]\} = \frac{1}{n}\log \sum_{k=0}^{[na]} P\{S_n = k\}. \]

∗ Exercise 2.4. Prove that $P\{S_n = k\}$ increases with $k \le [na]$.

By the above exercise, one sees that

\[ \limsup_{n\to\infty} \frac{1}{n}\log P\{S_n/n \in F_1\} \le \lim_{n\to\infty} \frac{1}{n}\log\bigl(([na] + 1)\,P\{S_n = [na]\}\bigr) = -I_p(a). \]

This formula is still valid even when $a < 0$. One can prove the upper bound in (2.1) for $F_2$ similarly. Then, one writes

\begin{align*}
\frac{1}{n}\log P\{S_n/n \in F\} &\le \frac{1}{n}\log\bigl(P\{S_n/n \in F_1\} + P\{S_n/n \in F_2\}\bigr) \\
&\le \frac{1}{n}\log 2 + \max\Bigl\{\frac{1}{n}\log P\{S_n/n \in F_1\},\ \frac{1}{n}\log P\{S_n/n \in F_2\}\Bigr\}.
\end{align*}

Moreover, formula (1.1) or Figure 1.1 shows that $I_p$ is decreasing on $[0, p]$ and increasing on $[p, 1]$. Hence, $\inf_{F_1} I_p = I_p(a)$, $\inf_{F_2} I_p = I_p(b)$, and $\inf_F I_p = \min(I_p(a), I_p(b))$.

Finally,

\[ \limsup_{n\to\infty} \frac{1}{n}\log P\{S_n/n \in F\} \le -\min(I_p(a), I_p(b)) = -\inf_F I_p. \]

If we now take $F = \bar A$, the upper bound in (2.1) follows.

We have shown that (2.1) holds with $I_p$ defined in (1.1). This is our first example of a full-fledged large deviation principle.

There are other instances where one can compute the rate function by hand.

Exercise 2.5. Prove (2.1) holds for the law of the sample mean of an i.i.d. sequence of real-valued normal random variables.

Hint: A formal computation should suggest $I(x) = (x - \mu)^2/(2\sigma^2)$, where $\mu$ is the mean and $\sigma^2$ is the variance.
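As a sanity check on the hint (not a proof), one can evaluate the Gaussian tail exactly, since the sample mean of $n$ i.i.d. $N(\mu, \sigma^2)$ variables is $N(\mu, \sigma^2/n)$. The following sketch uses the complementary error function; the function name and the chosen parameter values are assumptions for illustration.

```python
import math

def gaussian_upper_tail_rate(n, x, mu=0.0, sigma=1.0):
    """(1/n) log P{S_n/n >= x} for i.i.d. N(mu, sigma^2) summands."""
    # S_n/n ~ N(mu, sigma^2/n), so the tail equals 0.5 * erfc(z / sqrt(2)) with z as below.
    z = (x - mu) * math.sqrt(n) / sigma
    return math.log(0.5 * math.erfc(z / math.sqrt(2.0))) / n

x = 0.5
for n in (100, 400, 1600):
    # Compare with the candidate value -(x - mu)^2 / (2 sigma^2) = -0.125.
    print(n, gaussian_upper_tail_rate(n, x), -(x**2) / 2.0)
```

For much larger $n$ the tail probability underflows in floating point, so a log-scale asymptotic for `erfc` would be needed there.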

Exercise 2.6. Prove (2.1) holds for the law of the sample mean of an i.i.d. sequence of exponential random variables and compute the rate function explicitly.

Hint: Use Stirling's formula.

Let us continue with some general facts concerning (2.1) and rate functions.

2.3. Lower semicontinuity and uniqueness

Recall the definition of a lower semicontinuous (l.s.c.) function.

Definition 2.7. A function $f : \mathcal{X} \to [-\infty, \infty]$ is lower semicontinuous if $\{f \le c\}$ is closed for all $c \in \mathbb{R}$.

Exercise 2.8. Prove that if $f$ is lower semicontinuous then $\{f = -\infty\}$ is closed.

∗ Exercise 2.9. Prove that if $\mathcal{X}$ is metric then $f$ is lower semicontinuous if, and only if, $\liminf_{y\to x} f(y) \ge f(x)$ for all $x$.

Say we have an arbitrary function $f : \mathcal{X} \to [-\infty, \infty]$ and would like to produce from it a lower semicontinuous one. This is achieved by the so-called lower semicontinuous regularization of $f$, which we will denote by $f_{\mathrm{lsc}} : \mathcal{X} \to [-\infty, \infty]$. It is defined by

\[ f_{\mathrm{lsc}}(x) = \sup\Bigl\{\inf_G f : G \ni x \text{ and } G \text{ is open}\Bigr\}. \tag{2.2} \]

This defines a lower semicontinuous function and in fact the maximal lower semicontinuous minorant of $f$.

Lemma 2.10. $f_{\mathrm{lsc}}$ is lower semicontinuous and $f_{\mathrm{lsc}}(x) \le f(x)$ for all $x$. If $g$ is lower semicontinuous and satisfies $g(x) \le f(x)$ for all $x$, then $g(x) \le f_{\mathrm{lsc}}(x)$ for all $x$.

Proof. $f_{\mathrm{lsc}} \le f$ is clear. To show $f_{\mathrm{lsc}}$ is lower semicontinuous, let $x \in \{f_{\mathrm{lsc}} > c\}$. Then there is an open $G$ containing $x$ and such that $\inf_G f > c$. Hence by the supremum in the definition of $f_{\mathrm{lsc}}$, $f_{\mathrm{lsc}}(y) \ge \inf_G f > c$ for all $y \in G$. Thus $G$ is an open neighborhood of $x$ contained in $\{f_{\mathrm{lsc}} > c\}$. So $\{f_{\mathrm{lsc}} > c\}$ is open.

To show the last claim one just needs to show that $g_{\mathrm{lsc}} = g$. For then

\begin{align*}
g(x) &= \sup\Bigl\{\inf_G g : x \in G \text{ and } G \text{ is open}\Bigr\} \\
&\le \sup\Bigl\{\inf_G f : x \in G \text{ and } G \text{ is open}\Bigr\} = f_{\mathrm{lsc}}(x).
\end{align*}

We already know that $g_{\mathrm{lsc}} \le g$. To show the other direction let $c$ be such that $g(x) > c$. Then, $G = \{g > c\}$ is an open set containing $x$ and $\inf_G g \ge c$. Thus $g_{\mathrm{lsc}}(x) \ge c$. Now increase $c$ to $g(x)$. $\square$
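A short worked example may help fix the definition (2.2). It is an illustration chosen here, not taken from the text: on $\mathcal{X} = \mathbb{R}$ take the indicator of the closed half-line $[0, \infty)$, which fails to be lower semicontinuous only at the origin.

```latex
% Illustrative example (an assumption of this note): f is the indicator of [0, \infty) on \mathbb{R}.
\[
f(x) = \mathbf{1}_{[0,\infty)}(x), \qquad
\{f \le \tfrac12\} = (-\infty, 0) \ \text{is not closed, so } f \text{ is not l.s.c.}
\]
% Every open G containing 0 also contains negative points, where f = 0:
\[
f_{\mathrm{lsc}}(0) = \sup\Bigl\{\inf_G f : G \ni 0 \text{ open}\Bigr\} = 0 < 1 = f(0).
\]
% For x \ne 0 a small enough open interval around x stays on one side of 0, so \inf_G f = f(x):
\[
f_{\mathrm{lsc}} = \mathbf{1}_{(0,\infty)}, \qquad
\{f_{\mathrm{lsc}} \le c\} =
\begin{cases}
\emptyset, & c < 0,\\
(-\infty, 0], & 0 \le c < 1,\\
\mathbb{R}, & c \ge 1,
\end{cases}
\]
% and all of these sublevel sets are closed.
```

Only the value at the single point $x = 0$ was lowered, in line with Lemma 2.10: any lower semicontinuous $g \le f$ must satisfy $g(0) \le 0$.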

The above can be reinterpreted in terms of epigraphs. The epigraph of a function $f$ is the set $\{(x, t) \in \mathcal{X} \times \mathbb{R} : f(x) \le t\}$.

Lemma 2.11. The epigraph of $f_{\mathrm{lsc}}$ is the closure of that of $f$.

Proof. Note that the epigraph of $f_{\mathrm{lsc}}$ is closed. That it contains the epigraph of $f$ (and thus also its closure) is immediate because $f_{\mathrm{lsc}} \le f$. For the other inclusion we need to show that any open set outside the epigraph of $f$ is also outside the epigraph of $f_{\mathrm{lsc}}$. Let $A$ be such a set and let $(x, t) \in A$. There is an open neighborhood $G$ of $x$ and an $\varepsilon > 0$ such that $G \times (t - \varepsilon, t + \varepsilon) \subset A$. So for any $y \in G$ and any $s \in (t - \varepsilon, t + \varepsilon)$, $s < f(y)$. In particular, $t + \varepsilon/2 \le \inf_G f \le f_{\mathrm{lsc}}(x)$. So $(x, t)$ is outside the epigraph of $f_{\mathrm{lsc}}$. $\square$

This regularization is in fact what one gets if one only makes the necessary changes to the values of the function to make it lower semicontinuous.

Exercise 2.12. Assume $\mathcal{X}$ is a metric space. Show that if $x_n \to x$, then $f_{\mathrm{lsc}}(x) \le \liminf f(x_n)$. Prove that for each $x \in \mathcal{X}$ there is a sequence $x_n \to x$ such that $f(x_n) \to f_{\mathrm{lsc}}(x)$. This gives the alternate definition $f_{\mathrm{lsc}}(x) = \liminf_{y\to x} f(y)$.

Now we apply all this to rate functions of large deviation principles. The next lemma shows that rate functions can be assumed to be lower semicontinuous.

Lemma 2.13. Suppose $I$ is a function such that (2.1) holds for all measurable sets $A$. Then, (2.1) continues to hold if $I$ is replaced by $I_{\mathrm{lsc}}$.

Proof. $I_{\mathrm{lsc}} \le I$ and the upper bound is immediate. For the lower bound observe that $\inf_G I_{\mathrm{lsc}} = \inf_G I$ when $G$ is open. $\square$

Due to Lemma 2.13 we will call a $[0, \infty]$-valued function $I$ a rate function only when it is lower semicontinuous. Here is the precise meaning of a large deviation principle (LDP) for the remainder of the text.

Definition 2.14. Let $I : \mathcal{X} \to [0, \infty]$ be a lower semicontinuous function and $r_n \nearrow \infty$ a sequence of positive constants. A sequence of probability measures $\{\mu_n\} \subset \mathcal{M}_1(\mathcal{X})$ is said to satisfy a large deviation principle with rate function $I$ and normalization $r_n$ if the following inequalities hold for all closed $F \subset \mathcal{X}$ and all open $G \subset \mathcal{X}$:

\[ \limsup_{n\to\infty} \frac{1}{r_n}\log\mu_n(F) \le -\inf_F I; \tag{2.3} \]

\[ \liminf_{n\to\infty} \frac{1}{r_n}\log\mu_n(G) \ge -\inf_G I. \tag{2.4} \]

We will abbreviate LDP($\mu_n$, $r_n$, $I$) if all of the above holds. When the sets $\{I \le c\}$ are compact for all $c \in \mathbb{R}$, we say $I$ is a tight rate function.


Remark 2.15. In a large part of the large deviation literature a rate function $I$ is called good when all the sets $\{I \le c\}$ are compact. We prefer the term tight as more descriptive and because of the connection with exponential tightness; see Theorem 3.3 below.

Tightness of a rate function is a very useful property. Here are two examples.

∗ Exercise 2.16. Suppose $\mathcal{X}$ is a Hausdorff topological space and let $E \subset \mathcal{X}$ be a closed set. Assume that the relative topology on $E$ is metrized by the metric $d$. Let $I : E \to [0, \infty]$ be a tight rate function and fix an arbitrary closed set $F \subset E$. Prove that

\[ \lim_{\varepsilon\searrow 0}\ \inf_{F^\varepsilon} I = \inf_F I, \]

where $F^\varepsilon = \{x \in E : \exists y \in F \text{ such that } d(x, y) < \varepsilon\}$.

∗ Exercise 2.17. $\mathcal{X}$ and $E$ as in the above exercise. Suppose $\xi_n$ and $\eta_n$ are $E$-valued random variables defined on $(\Omega, \mathcal{F}, P)$, and for any $\delta > 0$ there exists an $n_0 < \infty$ such that $d(\xi_n(\omega), \eta_n(\omega)) < \delta$ for all $n \ge n_0$ and $\omega \in \Omega$.

(a) Show that if the distributions of $\xi_n$ satisfy the lower large deviation bound (2.4) with some rate function $I : E \to [0, \infty]$, then so do the distributions of $\eta_n$.

(b) Show that if the distributions of $\xi_n$ satisfy the upper large deviation bound (2.3) with some tight rate function $I : E \to [0, \infty]$, then so do the distributions of $\eta_n$.

A topological space is regular if points and closed sets can be separated by disjoint open neighborhoods. In this case, one can even be sure $I$ is unique.

Theorem 2.18. If $\mathcal{X}$ is a regular topological space, then there is at most one (lower semicontinuous) rate function satisfying the large deviation bounds (2.3) and (2.4).

Proof. We will show that $I$ satisfies

\[ I(x) = \sup\Bigl\{-\liminf_{n\to\infty} \frac{1}{r_n}\log\mu_n(B) : x \in B \text{ and } B \text{ is open}\Bigr\}. \]

One direction is easy: by the lower bound (2.4), for every open $B$ containing $x$,

\[ -\liminf_{n\to\infty} \frac{1}{r_n}\log\mu_n(B) \le \inf_B I \le I(x). \]

For the other direction, fix $x$ and choose $c < I(x)$. One can separate $x$ from $\{I \le c\}$ by disjoint neighborhoods. Thus, there exists an open set $G$ containing $x$ and such that $\bar G \subset \{I > c\}$. Then

\begin{align*}
\sup\Bigl\{-\liminf_{n\to\infty} \frac{1}{r_n}\log\mu_n(B) : x \in B \text{ and } B \text{ is open}\Bigr\}
&\ge -\liminf_{n\to\infty} \frac{1}{r_n}\log\mu_n(G) \\
&\ge -\limsup_{n\to\infty} \frac{1}{r_n}\log\mu_n(\bar G) \ge \inf_{\bar G} I \ge c.
\end{align*}

Increasing $c$ to $I(x)$ concludes the proof. $\square$


Chapter 3

More generalities and Cramér's theorem

3.1. Weak large deviation principles

As we have seen in Example 1.1, when proving the lower large deviation bound (2.4) it is enough to consider $G$ that are local neighborhoods. It is also enough to focus on local neighborhoods if one only needs to prove the upper large deviation bound (2.3) for compact sets $F$. This often simplifies the analysis considerably.

For the discussion in this section let $\mathcal{X}$ be a Hausdorff space.

Definition 3.1. A sequence of probability measures $\{\mu_n\} \subset \mathcal{M}_1(\mathcal{X})$ is said to satisfy a weak large deviation principle with lower semicontinuous rate function $I : \mathcal{X} \to [0, \infty]$ and normalization $\{r_n\}$ if the lower large deviation bound (2.4) holds for all open sets $G \subset \mathcal{X}$ and the upper large deviation bound (2.3) holds for all compact sets $F \subset \mathcal{X}$.

With enough control on the tails of the measures $\mu_n$ this is in fact sufficient for the full LDP to hold.

Definition 3.2. We say $\{\mu_n\} \subset \mathcal{M}_1(\mathcal{X})$ is exponentially tight with normalization $r_n$ if for each $b > 0$ there exists a compact set $K_b$ such that $\mu_n(K_b^c) \le e^{-r_n b}$ for all $n \in \mathbb{N}$.

Note that if $\{\mu_n\}$ is exponentially tight with normalization $r_n \nearrow \infty$, then $\{\mu_n\}$ is tight; see Appendix A.1 for the definition of tightness of a family of measures.

Theorem 3.3. Suppose weak LDP($\mu_n$, $r_n$, $I$) holds and $\{\mu_n\}$ is exponentially tight with normalization $r_n$. Then, LDP($\mu_n$, $r_n$, $I$) holds and $I$ is a tight rate function.

Proof. Let $F$ be a closed set. Write

\begin{align*}
\limsup_{n\to\infty} \frac{1}{r_n}\log\mu_n(F)
&\le \limsup_{n\to\infty} \frac{1}{r_n}\log\bigl(\mu_n(F \cap K_b) + \mu_n(K_b^c)\bigr) \\
&\le \max\Bigl\{-b,\ \limsup_{n\to\infty} \frac{1}{r_n}\log\mu_n(F \cap K_b)\Bigr\} \\
&\le \max\Bigl\{-b,\ -\inf_{F\cap K_b} I\Bigr\} \\
&\le \max\Bigl\{-b,\ -\inf_F I\Bigr\}.
\end{align*}

Letting $b$ increase to infinity proves the upper large deviation bound (2.3). On the other hand, the weak LDP already contains the lower large deviation bound (2.4). This, in turn, implies that

\[ \inf_{K_{b+1}^c} I \ge -\liminf_{n\to\infty} \frac{1}{r_n}\log\mu_n(K_{b+1}^c) \ge b + 1. \]

Hence, $\{I \le b\}$ is a closed subset of $K_{b+1}$ and is hence compact. $\square$

Exponential tightness helps compute one more case of large deviations explicitly.

Exercise 3.4. Prove the large deviation principle for the law of the sample mean $S_n/n$ of an i.i.d. sequence of $\mathbb{R}^d$-valued normal random variables (with mean $\mu$ and nonsingular covariance matrix $A$).

Hint: A formal computation yields $I(x) = (x - \mu) \cdot A^{-1}(x - \mu)/2$. Note that this is different from the one-dimensional case in Exercise 2.5 because one cannot use monotonicity of $I$ and split closed sets $F$ into a part below $\mu$ and a part above $\mu$.

We end the section with an important exercise.

∗ Exercise 3.5. For $x \in \mathcal{X}$, define upper and lower local rate functions by

\[ \bar\kappa(x) = -\inf_{G \ni x\,:\,G \text{ open}}\ \liminf_{n\to\infty} \frac{1}{r_n}\log\mu_n(G) \tag{3.1} \]

and

\[ \underline\kappa(x) = -\inf_{G \ni x\,:\,G \text{ open}}\ \limsup_{n\to\infty} \frac{1}{r_n}\log\mu_n(G). \tag{3.2} \]

Show that if $\bar\kappa = \underline\kappa = \kappa$ then the weak LDP holds with rate function $\kappa$.


3.2. Cramér’s theorem 21<br />

3.2. Cramér’s theorem<br />

While proving the large deviation principle in several simple situations (e.g. Example 1.1 and Exercises 2.5, 2.6, and 3.4) one notices similarities in the arguments. Indeed, these results fall under the umbrella of a general large deviation principle for the sample mean $S_n/n$ of i.i.d. random variables. (Recall that $S_n = X_1 + \dots + X_n$.) This result, called Cramér’s theorem, is one of the central results of large deviation theory and certainly the one most frequently applied.<br />

Cramér’s theorem is valid not only for real-valued random variables, but also for $\mathbb{R}^d$-valued random vectors, and even for some infinite dimensional random vectors. The infinite dimensional case will not be discussed in the book. In this section we first develop the upper bound for the one-dimensional case through a series of exercises. Then we state the one-dimensional Cramér’s theorem in full generality, and explore it further in exercises. The notion of convexity makes a somewhat premature but small appearance in this section, although it is not taken up seriously until Chapter 5.<br />

Next we state the multidimensional theorem in full, and prove parts of it. The final discussion of Cramér’s theorem takes place later in Section 5.3. There, armed with some ideas from convex analysis, we prove the multidimensional result under the natural assumption that the moment generating function is finite in a neighborhood of the origin.<br />

Let $\{X_n\}$ be a sequence of i.i.d. real-valued random variables, and write $X$ for another random variable with the same distribution. Recall the moment generating function $M(\theta) = E[e^{\theta X}]$ for $\theta \in \mathbb{R}$. Observe that $M(\theta) > 0$ always and it can happen that $M(\theta) = \infty$. Using Chebyshev’s inequality (page 15 of [15]) write<br />

\[ P\{S_n \ge nb\} \le e^{-n\theta b} E[e^{\theta S_n}] = e^{-n\theta b} M(\theta)^n \quad \text{for } \theta \ge 0, \tag{3.3} \]<br />

\[ P\{S_n \le na\} \le e^{-n\theta a} E[e^{\theta S_n}] = e^{-n\theta a} M(\theta)^n \quad \text{for } \theta \le 0. \tag{3.4} \]<br />

From above we get immediately the upper bounds<br />

\[ \limsup_{n\to\infty} \frac{1}{n} \log P\{S_n \ge nb\} \le -\sup_{\theta \ge 0}\{\theta b - \log M(\theta)\}, \]<br />

\[ \limsup_{n\to\infty} \frac{1}{n} \log P\{S_n \le na\} \le -\sup_{\theta \le 0}\{\theta a - \log M(\theta)\}. \]<br />
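The exponential Chebyshev bound can be checked numerically in a case where everything is computable. The following is a minimal sketch, assuming Bernoulli($p$) summands with the illustrative values $p = 0.5$ and $b = 0.7$ (these choices, the grid over $\theta$, and all function names are assumptions made here, not part of the text). It compares the exact value of $\frac{1}{n}\log P\{S_n \ge nb\}$ with $-\sup_{\theta\ge 0}\{\theta b - \log M(\theta)\}$.<br />

```python
import math

def log_mgf_bernoulli(theta, p):
    """log M(theta) for X ~ Bernoulli(p)."""
    return math.log(1.0 - p + p * math.exp(theta))

def chernoff_exponent(b, p, grid=20001, theta_max=20.0):
    """Grid approximation of sup over theta >= 0 of theta*b - log M(theta)."""
    best = 0.0  # theta = 0 contributes the value 0
    for k in range(grid):
        theta = theta_max * k / (grid - 1)
        best = max(best, theta * b - log_mgf_bernoulli(theta, p))
    return best

def log_binom_pmf(n, k, p):
    """log P{S_n = k} for S_n ~ Binomial(n, p), via log-gamma to avoid underflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def log_upper_tail(n, b, p):
    """Exact log P{S_n >= n*b}, summed with the log-sum-exp trick."""
    k0 = math.ceil(n * b - 1e-9)
    logs = [log_binom_pmf(n, k, p) for k in range(k0, n + 1)]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs))

p, b = 0.5, 0.7
rate = chernoff_exponent(b, p)
for n in (10, 100, 1000):
    # The first number never exceeds the second, and approaches it as n grows.
    print(n, log_upper_tail(n, b, p) / n, -rate)
```

In this example the bound holds for every $n$, not only in the limit, which is exactly what (3.3) asserts before any limit is taken.<br />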



∗ Exercise 3.6. Suppose $X$ has a finite mean $\bar{x} = E[X]$. Prove that if $a \le \bar{x} \le b$, then<br />

\[ \sup_{\theta \ge 0}\{\theta b - \log M(\theta)\} = \sup_{\theta \in \mathbb{R}}\{\theta b - \log M(\theta)\} \quad\text{and}\quad \sup_{\theta \le 0}\{\theta a - \log M(\theta)\} = \sup_{\theta \in \mathbb{R}}\{\theta a - \log M(\theta)\}. \]<br />

Hint: Use Jensen’s inequality (page 14 of [15] or page 40 of [26]) to show that $\theta b - \log M(\theta) \le 0$ for $\theta < 0$ and $\theta a - \log M(\theta) \le 0$ for $\theta > 0$.<br />

Define<br />

\[ I(x) = \sup_{\theta \in \mathbb{R}}\{\theta x - \log M(\theta)\}. \tag{3.5} \]<br />

∗ Exercise 3.7. Prove that $I$ is lower semicontinuous, convex, and that if $\bar{x} = E[X]$ is finite then $I$ achieves its minimum at $\bar{x}$ with $I(\bar{x}) = 0$.<br />

Hint: Note that $I$ is a supremum over lower semicontinuous and convex functions. Show that $I(x) \ge 0$ for all $x$, but by Jensen’s inequality (page 14 of [15] or page 40 of [26]) $I(\bar{x}) \le 0$.<br />

∗ Exercise 3.8. Suppose $M(\theta) < \infty$ in some open neighborhood around the origin. Show that then $\bar{x}$ is the unique zero of $I$: that is, $x \ne \bar{x}$ implies $I(x) > 0$.<br />

Hint: For any $x > \bar{x}$ show that $(\log M(\theta))' < x$ for $\theta$ in some interval $(0, \delta)$.<br />

Exercise 3.9. Check that the formulas for $I$ given in Example 1.1 and Exercises 2.5 and 2.6 match the new definition of $I$ in (3.5).<br />

Exercise 3.7 together with the earlier observations shows that $I(x)$ is nonincreasing for $x < \bar{x}$ and nondecreasing for $x > \bar{x}$. In particular, if $a \le \bar{x} \le b$, then $I(a) = \inf_{x \le a} I(x)$ and $I(b) = \inf_{x \ge b} I(x)$. This proves the upper bound for the sets $F = (-\infty, a]$ and $F = [b, \infty)$ in the case where the mean is finite.<br />

Exercise 3.10. Assume $d = 1$. Prove that the sample mean $S_n/n$ satisfies the upper large deviation bound (2.3) with normalization $n$ and rate $I$ defined in (3.5), with no further assumptions on the distribution.<br />

Hint: The case of finite mean is almost done above. Then consider separately the cases where the mean is infinite and where the mean does not exist.<br />

It turns out that in one dimension the full LDP with rate function $I$ is valid for i.i.d. random variables without any assumptions whatsoever on the distribution. Let us state this theorem in its complete generality, even though we will not prove it. A proof can be found in Dembo and Zeitouni [8].<br />

Cramér’s theorem on $\mathbb{R}$. Let $\{X_n\}$ be a sequence of i.i.d. real-valued random variables. Let $\mu_n$ be the law of the sample mean $S_n/n$. Then, the large deviation principle LDP$(\mu_n, n, I)$ is satisfied with $I$ defined in (3.5).<br />

While Cramér’s theorem is valid in general, it does not give much information unless the variables have exponentially decaying tails. This point is explored in the next exercise.<br />

Exercise 3.11. Let $\{X_i\}$ be an i.i.d. real-valued sequence. Assume $E[X_1^2] < \infty$ but, for all $\varepsilon > 0$, $P\{X_1 > b\} > e^{-\varepsilon b}$ for all large enough $b$. Show that<br />

(a) $\lim_{n\to\infty} \frac{1}{n} \log P\{S_n/n > E[X_1] + \delta\} = 0$ for any $\delta > 0$.<br />

(b) The rate function is identically 0 on $[E(X_1), \infty)$.<br />

Hint: For (a), deduce<br />

\[ P\{S_n/n \ge E[X_1] + \delta\} \ge P\{S_{n-1} \ge (n-1)E[X_1]\}\, P\{X_1 \ge n\delta + E[X_1]\} \]<br />

and apply the central limit theorem (see page 112 of [15] or page 100 of [26]). For (b), first find $M(\theta)$ for $\theta > 0$. Then observe that for $\theta \le 0$ and $x \ge E[X_1]$,<br />

\[ \theta x - \log M(\theta) \le \theta(x - E[X_1]) \le 0. \]<br />

Exercise 3.12. Let $\{X_i\}$ be an i.i.d. real-valued sequence. Prove that the closure of the set $\{I < \infty\}$ is the same as the closure of the convex hull of the support of the law of $X$.<br />

Hint: Let $K$ be the latter set and $y \notin K$. To show that $I(y) = \infty$, find $\theta \in \mathbb{R}$ such that $\theta y - \varepsilon > \sup_{x \in K} x\theta$. For the other direction, take $y$ in the interior of $\{I = \infty\}$. To get $y \notin K$, show first that $\varphi_y(\theta) = \theta y - \log M(\theta)$ converges to infinity as $\theta \to \infty$ or $-\infty$. Assume the former. Show that for some $\varepsilon$, $|x - y| \le \varepsilon$ implies $\varphi_x(\theta) \to \infty$ as $\theta \to \infty$. Then, for $\theta > 0$, $\theta(y - \varepsilon) - \log M(\theta) \le -\log \mu\{x : |x - y| \le \varepsilon\}$. Let $\theta \to \infty$.<br />

The information contained in Cramér’s theorem is quite crude because only the exponentially decaying terms of a full expansion affect the result. In some cases one can easily derive much more precise asymptotics.<br />

Exercise 3.13. Prove that if $\{X_k\}$ are i.i.d. standard normal, then for any $k \in \mathbb{N}$ and $a > 0$<br />

\[ \log P\{S_n \ge an\} \sim -\frac{a^2 n}{2} - \frac{1}{2}\log(2\pi n a^2) + \log\Bigl(1 - \frac{1}{a^2 n} + \frac{1 \times 3}{a^4 n^2} - \cdots + (-1)^k \frac{1 \times 3 \times \cdots \times (2k-1)}{a^{2k} n^k}\Bigr). \]<br />

Hint: Observe that<br />

\[ \frac{d}{dx}\Bigl( e^{-x^2/2} \sum_{j=0}^{n} (-1)^j \bigl(1 \times 3 \times \cdots \times (2j-1)\bigr) x^{-2j-1} \Bigr) \begin{cases} < -e^{-x^2/2} & \text{if $n$ is even,} \\ > -e^{-x^2/2} & \text{if $n$ is odd,} \end{cases} \]<br />

where the empty product for $j = 0$ equals 1.<br />
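The expansion in Exercise 3.13 can be sanity-checked numerically before attempting a proof. The sketch below assumes the illustrative values $a = 1$ and $n = 50$ and uses the fact that $S_n/\sqrt{n}$ is standard normal, so the exact tail is available through the complementary error function. The function names are chosen here for illustration only.<br />

```python
import math

def log_tail_exact(a, n):
    """log P{S_n >= a*n} for i.i.d. standard normals: S_n / sqrt(n) is N(0, 1)."""
    x = a * math.sqrt(n)
    return math.log(0.5 * math.erfc(x / math.sqrt(2.0)))

def log_tail_expansion(a, n, k):
    """Right-hand side of the expansion with correction terms up to order k."""
    u = 1.0 / (a * a * n)
    series, coeff = 1.0, 1.0
    for j in range(1, k + 1):
        coeff *= -(2 * j - 1) * u  # (-1)^j * 1*3*...*(2j-1) / (a^2 n)^j
        series += coeff
    return (-a * a * n / 2.0
            - 0.5 * math.log(2.0 * math.pi * n * a * a)
            + math.log(series))

a, n = 1.0, 50
for k in range(4):
    # The gap between the exact value and the expansion shrinks as k increases.
    print(k, abs(log_tail_exact(a, n) - log_tail_expansion(a, n, k)))
```

The leading term $-a^2 n/2$ is the Cramér rate; the remaining terms are the finer corrections that the large deviation principle does not see.<br />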

Exercise 3.14. Derive Cramér rates for some basic distributions.<br />

(a) For real $\alpha > 0$, the rate $\alpha$ exponential distribution has density $f(x) = \alpha e^{-\alpha x}$ on $\mathbb{R}_+$. Derive the Cramér rate function<br />

\[ I(x) = \alpha x - 1 - \log \alpha x \quad \text{for } x > 0. \]<br />

(b) For real $\lambda > 0$, the mean $\lambda$ Poisson distribution has probability mass function $p(k) = e^{-\lambda} \lambda^k / k!$ for $k \in \mathbb{Z}_+$. Derive the Cramér rate function<br />

\[ I(x) = x \log(x/\lambda) - x + \lambda \quad \text{for } x \ge 0. \]<br />
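The two formulas of Exercise 3.14 can be compared against a brute-force evaluation of the supremum in (3.5). The sketch below is a numerical check only, not a derivation. It assumes the standard moment generating functions $M(\theta) = \alpha/(\alpha - \theta)$ for $\theta < \alpha$ (exponential) and $M(\theta) = \exp(\lambda(e^{\theta} - 1))$ (Poisson), with the illustrative parameters $\alpha = 2$ and $\lambda = 3$ and a plain grid over $\theta$.<br />

```python
import math

def legendre(x, log_mgf, thetas):
    """Grid approximation of sup over theta of theta*x - log M(theta)."""
    return max(t * x - log_mgf(t) for t in thetas)

alpha, lam = 2.0, 3.0

def log_mgf_exponential(theta):
    # Only valid for theta < alpha; the grid below stays inside that range.
    return math.log(alpha) - math.log(alpha - theta)

def log_mgf_poisson(theta):
    return lam * (math.exp(theta) - 1.0)

exp_thetas = [-20.0 + 0.001 * i for i in range(22000)]      # up to 1.999 < alpha
poisson_thetas = [-10.0 + 0.001 * i for i in range(20001)]  # from -10 to 10

def rate_exponential(x):
    return alpha * x - 1.0 - math.log(alpha * x)

def rate_poisson(x):
    return x * math.log(x / lam) - x + lam

for x in (0.25, 0.5, 1.0, 2.0):
    print(x,
          legendre(x, log_mgf_exponential, exp_thetas), rate_exponential(x),
          legendre(x, log_mgf_poisson, poisson_thetas), rate_poisson(x))
```

Both rates vanish at the mean ($x = 1/\alpha$ and $x = \lambda$ respectively), in agreement with Exercise 3.7.<br />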

We turn to discuss Cramér’s theorem in multiple dimensions. When $\{X_n\}$ are $\mathbb{R}^d$-valued, the moment generating function is given by $M(\theta) = E[e^{\theta \cdot X}]$ for $\theta \in \mathbb{R}^d$. Again, $M(\theta) \in (0, \infty]$. Define<br />

\[ I(x) = \sup_{\theta \in \mathbb{R}^d}\{\theta \cdot x - \log M(\theta)\}. \tag{3.6} \]<br />

Exercise 3.15. Check that Exercises 3.7 and 3.8 apply to the multidimensional case as well.<br />

The full LDP of the one-dimensional Cramér theorem does not generalize to multiple dimensions without additional assumptions on the finiteness of the moment generating function; see [10] for counterexamples. The following is proved in [8]; see Corollary 6.1.6 therein.<br />

Cramér’s theorem on $\mathbb{R}^d$. Let $\{X_n\}$ be a sequence of i.i.d. $\mathbb{R}^d$-valued random variables and let $\mu_n$ be the law of the sample mean $S_n/n$. Then without further assumptions weak LDP$(\mu_n, n, I)$ holds with $I$ defined in (3.6). If, moreover, $M(\theta) < \infty$ in a neighborhood of 0, then LDP$(\mu_n, n, I)$ holds and $I$ is a tight rate function.<br />

At this point, we prove the upper bound for compact sets without assumptions on $M$ and then the tightness of $I$ under the assumption that $M$ is finite near the origin. Then we give a simple proof of the lower bound under the restrictive assumption<br />

\[ M(\theta) < \infty \text{ for all } \theta \in \mathbb{R}^d \quad\text{and}\quad |\theta|^{-1} \log M(\theta) \to \infty \text{ as } |\theta| \to \infty. \tag{3.7} \]<br />

Both these proofs allow us to introduce important techniques. Assumption (3.7) ensures that the supremum in (3.6) is achieved. This is precisely the issue that one can overcome without any assumptions when $d = 1$. In Section 5.3 we revisit the theorem and prove its final version where $M$ is assumed to be finite in a neighborhood of the origin.<br />



Proof of the upper bound for compacts and of the tightness of $I$. The upper large deviation bound (2.3) is proved in several steps.<br />

First, take any bounded Borel set $C$ and write, for any $\theta \in \mathbb{R}^d$,<br />

\[ P\{S_n/n \in C\} = E[\mathbf{1}\{S_n/n \in C\}] \le e^{-\inf_{y \in C} n\theta \cdot y} E[e^{\theta \cdot S_n}] = e^{-n \inf_{y \in C} \theta \cdot y} M(\theta)^n. \]<br />

This shows that<br />

\[ \frac{1}{n} \log P\{S_n/n \in C\} \le -\sup_{\theta} \inf_{y \in C}\{\theta \cdot y - \log M(\theta)\}. \]<br />

We now need to interchange the sup and the inf. This can be done if $C$ is a compact convex set.<br />

Minimax theorem on $\mathbb{R}^d$. Let $C \subset \mathbb{R}^d$ be compact and convex. Let $D \subset \mathbb{R}^d$ be convex. Let $f : C \times D \to \mathbb{R}$ be such that for each $y \in C$, $f(y, \cdot)$ is concave and continuous, and for each $\theta \in D$, $f(\cdot, \theta)$ is convex. Then<br />

\[ \sup_{\theta \in D} \inf_{y \in C} f(y, \theta) = \inf_{y \in C} \sup_{\theta \in D} f(y, \theta). \]<br />

This theorem is a special case of the more general minimax theorem proved in Appendix B.2. To have a feeling for the above theorem think of a horse saddle in $\mathbb{R}^3$. We have a smooth function that is convex in one direction and concave in the other. Taking sup in the concave direction and inf in the convex one will result in the saddle point regardless of the order.<br />

Observe that $D = \{\theta : M(\theta) < \infty\}$ is convex and $f(y, \theta) = \theta \cdot y - \log M(\theta)$ satisfies the assumptions of the above theorem. Thus, we have that the upper bound (even without taking any limits) is satisfied for compact convex sets.<br />

Next, let $K$ be any compact set and let $\alpha \le \inf_K I$. Fix $\varepsilon > 0$. Since $I$ is lower semicontinuous, $\{I \le \alpha - \varepsilon\}$ is closed and one can find, for each $x \in K \subset \{I > \alpha - \varepsilon\}$, a closed ball $C_x \subset \{I > \alpha - \varepsilon\}$ centered at $x$, and hence such that $\inf_{C_x} I \ge \alpha - \varepsilon$. One can cover $K$ with a finite collection of such balls $C_{x_1}, \dots, C_{x_N}$. Then, applying the upper bound to these compact convex sets one has<br />

\[ P\{S_n/n \in K\} \le \sum_{i=1}^{N} P\{S_n/n \in C_{x_i}\} \le N \max_{1 \le i \le N} P\{S_n/n \in C_{x_i}\} \le N e^{-n(\alpha - \varepsilon)}. \]<br />

Taking $n \to \infty$ then $\varepsilon \to 0$ then $\alpha \to \inf_K I$, the upper large deviation bound (2.3) follows for the compact set $K$. We have thus verified that the upper bound in the weak LDP$(\mu_n, n, I)$ holds.<br />

Next, we verify exponential tightness under the assumption that $M$ is finite near the origin, after which applying Theorem 3.3 would conclude the proof. To this end, from (3.3) and (3.4) it follows that for any $b > 0$ there exists an $a = a(b) > 0$ such that<br />

\[ P\{|S_n^{(i)}| \ge na\} \le \frac{e^{-bn}}{d} \quad \text{for } i = 1, 2, \dots, d, \text{ and all } n \in \mathbb{N}. \]<br />

Here for $y \in \mathbb{R}^d$, $y^{(i)}$ denotes its $i$-th coordinate. Therefore, Definition 3.2 is satisfied with $r_n = n$ and $K_b = \{y : |y^{(i)}| \le a(b) \text{ for all } i = 1, \dots, d\}$. □<br />

Proof of Cramér’s lower bound under (3.7). This proof introduces the classical method of change of measure for proving LDP lower bounds.<br />

Let $G$ be an open set and fix $x \in G$ and $\varepsilon > 0$ such that $\{y : |y - x| < \varepsilon\} \subset G$.<br />

Hölder’s inequality (see page 15 of [15] or page 39 of [26]) implies that $\log M(\theta)$ is a convex function: taking $p = 1/t$ and $q = 1/(1-t)$,<br />

\[ M(t\theta_1 + (1-t)\theta_2) = E[e^{t\theta_1 \cdot X} e^{(1-t)\theta_2 \cdot X}] \le E[e^{\theta_1 \cdot X}]^{t}\, E[e^{\theta_2 \cdot X}]^{1-t} = M(\theta_1)^{t} M(\theta_2)^{1-t}. \]<br />

Dominated convergence (see page 16 of [15] or page 46 of [26]) implies that $M(\theta)$ is everywhere differentiable. Considering also (3.7) shows that $\theta \cdot x - \log M(\theta)$ is a concave differentiable function that achieves its maximum $I(x)$ at some $\theta_x$. Furthermore, it must be the case that $\nabla M(\theta_x) = x M(\theta_x)$.<br />

Define the probability measure $\nu_x$ on $\mathbb{R}^d$ by<br />

\[ \nu_x(A) = \frac{1}{M(\theta_x)} E[e^{\theta_x \cdot X} \mathbf{1}\{X \in A\}]. \]<br />

Once again, dominated convergence implies that $E[e^{\theta_x \cdot X} |X|] < \infty$ and<br />

\[ E[e^{\theta_x \cdot X} X] = \nabla M(\theta_x) = x M(\theta_x); \]<br />

i.e. $\nu_x$ has mean $x$. Let $Q_x$ be the law of the i.i.d. sequence $\{X_n\}$ with marginals $\nu_x$. Write<br />

\[ P\{S_n/n \in G\} \ge P\{|S_n - nx| < \varepsilon n\} \ge e^{-n\theta_x \cdot x - n\varepsilon|\theta_x|} E[e^{\theta_x \cdot S_n} \mathbf{1}\{|S_n - nx| < \varepsilon n\}] = e^{-n\theta_x \cdot x - n\varepsilon|\theta_x|} M(\theta_x)^n Q_x\{|S_n - nx| < \varepsilon n\}. \]<br />

The law of large numbers implies that $Q_x\{|S_n - nx| < \varepsilon n\} \to 1$. Thus,<br />

\[ \liminf_{n\to\infty} \frac{1}{n} \log P\{S_n/n \in G\} \ge -I(x) - \varepsilon|\theta_x|. \]<br />

Taking $\varepsilon \to 0$ then sup over $x \in G$ proves the lower large deviation bound (2.4). □<br />
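The change of measure can be made fully explicit in a one-dimensional example. The following sketch assumes Bernoulli($p$) variables with the illustrative values $p = 0.3$ and target mean $x = 0.6$ (these values and all names are assumptions made here). In that case the tilting parameter $\theta_x$ has a closed form, the tilted law $\nu_x$ is Bernoulli($x$), and the identity $P\{S_n = k\} = e^{-\theta_x k} M(\theta_x)^n Q_x\{S_n = k\}$ behind the last display of the proof can be verified exactly.<br />

```python
import math

p, x = 0.3, 0.6  # original success probability and target mean (illustrative)

def M(theta):
    """Moment generating function of Bernoulli(p)."""
    return 1.0 - p + p * math.exp(theta)

# theta_x solves M'(theta) = x * M(theta), i.e. the tilted mean equals x.
theta_x = math.log(x * (1.0 - p) / (p * (1.0 - x)))

tilted_mean = p * math.exp(theta_x) / M(theta_x)   # mean of nu_x
rate = theta_x * x - math.log(M(theta_x))          # I(x), the sup is attained at theta_x
entropy = x * math.log(x / p) + (1.0 - x) * math.log((1.0 - x) / (1.0 - p))

def binom_pmf(n, k, q):
    """P{S_n = k} when the summands are i.i.d. Bernoulli(q)."""
    return math.comb(n, k) * q**k * (1.0 - q)**(n - k)

n, k = 40, 24
lhs = binom_pmf(n, k, p)                                            # under P
rhs = math.exp(-theta_x * k) * M(theta_x)**n * binom_pmf(n, k, x)   # under Q_x
print(tilted_mean, rate, entropy, lhs, rhs)
```

Under $Q_x$ the event $\{S_n \approx nx\}$ is typical, so the whole exponential cost $e^{-nI(x)}$ sits in the prefactor $e^{-n\theta_x x} M(\theta_x)^n$.<br />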



3.3. Limits, deviations, and fluctuations<br />

Let $Y_n$ be a sequence of random variables with values in a metric space $(\mathcal{X}, d)$ and let $\mu_n$ be the law of $Y_n$: $\mu_n(B) = P\{Y_n \in B\}$. Naturally an LDP for the sequence $\{\mu_n\}$ is related to the asymptotic behavior of $Y_n$. Suppose LDP$(\mu_n, r_n, I)$ holds and $Y_n \to \bar{y}$ in probability. Then the limit $\bar{y}$ does not represent a deviation. The rate function $I$ recognizes this with the value $I(\bar{y}) = 0$ that follows from the upper bound. For any open neighborhood $G$ of $\bar{y}$ we have $\mu_n(G) \to 1$. Consequently for the closure<br />

\[ 0 \le \inf_{\bar{G}} I \le -\lim_{n\to\infty} r_n^{-1} \log \mu_n(\bar{G}) = 0. \]<br />

Let $G$ shrink down to $\bar{y}$. Lower semicontinuity forces $I(\bar{y}) = 0$.<br />

The reader should recognize though that the zero set of $I$ does not necessarily represent limit values. It may simply be that the probability of a deviation decays slower than exponentially which again leads to $I = 0$. Every rate function satisfies $\inf I = 0$ as can be seen by taking $F = \mathcal{X}$ in the upper large deviation bound (2.3).<br />

On the other hand, we can deduce convergence of the random variables from the LDP if the rate function has good properties. Assume that $I$ is a tight rate function and has a unique zero $I(\bar{y}) = 0$. Let $A = \{y : d(y, \bar{y}) \ge \varepsilon\}$. Compactness and lower semicontinuity ensure that the infimum $u = \inf_A I$ is achieved. Since $\bar{y} \notin A$, it must be that $u > 0$. Then, for $n$ large enough, the upper large deviation bound (2.3) implies<br />

\[ P\{d(Y_n, \bar{y}) \ge \varepsilon\} \le e^{-r_n(\inf_A I - u/2)} = e^{-r_n u/2}. \]<br />

Thus, $Y_n$ converges to $\bar{y}$ in probability. If, moreover, $r_n$ grows fast enough, for example like $n$, then the Borel-Cantelli lemma implies that $Y_n$ converges to $\bar{y}$ a.s.<br />

For i.i.d. variables Cramér’s theorem should also be understood in relation with the central limit theorem (CLT). Consider the case where $M(\theta)$ is finite in a neighborhood of the origin so that there is a finite mean $\bar{x} = E[X]$ and $I(x) > 0$ for $x \ne \bar{x}$ (Exercise 3.8). Then Cramér’s theorem says that order 1 deviations of $S_n/n$ from $\bar{x}$ have exponentially vanishing probability: for each $\delta > 0$ there exists a constant $c > 0$ such that for large $n$<br />

\[ P\{|S_n/n - \bar{x}| \ge \delta\} \le e^{-cn}. \]<br />

By contrast, the CLT tells us that small deviations of order $n^{-1/2}$ converge to a limit distribution: for $r \in \mathbb{R}$,<br />

\[ P\{S_n/n - \bar{x} \ge r n^{-1/2}\} \underset{n\to\infty}{\longrightarrow} \int_r^{\infty} \frac{e^{-s^2/2\sigma^2}}{\sqrt{2\pi\sigma^2}}\, ds \]<br />

where $\sigma^2$ is the variance of $X$. In the jargon of the discipline this distinction is sometimes expressed by saying that the CLT describes fluctuations as opposed to deviations.<br />

There are also results about moderate deviations that fall between large deviations and CLT fluctuations. For example, if $d = 1$ and $M$ is finite in a neighborhood of 0, then for any $\alpha \in (0, 1/2)$ one has<br />

\[ n^{-2\alpha} \log P\{|S_n/n - \bar{x}| \ge \delta n^{-1/2+\alpha}\} \underset{n\to\infty}{\longrightarrow} -\frac{\delta^2}{2\sigma^2}. \]<br />

In Part III of the book in Sections 12.1 and 12.2 we discuss refinements to Cramér’s theorem and moderate deviations.<br />


Chapter 4<br />

Yet some more generalities<br />

4.1. Contraction principle<br />

Sometimes the problem at hand can be formulated as a mapping of another problem on a different space. It is reasonable that an LDP for the latter transfers to one for the former with the rate function at point $y$ being the smallest value of the original rate function at preimages of $y$. This is what the contraction principle (or push-forward principle) is about.<br />

Contraction principle. Suppose $f$ is a continuous function from $\mathcal{X}$ to $\mathcal{Y}$, two Hausdorff topological spaces, and LDP$(\mu_n, r_n, I)$ holds on $\mathcal{X}$. Define $\nu_n = \mu_n \circ f^{-1} \in \mathcal{M}_1(\mathcal{Y})$; i.e. $\nu_n(A) = \mu_n(f^{-1}(A))$. Define also<br />

\[ \tilde{J}(y) = \inf_{f(x) = y} I(x), \]<br />

with the convention that the inf over an empty set is infinite. Let $J$ be the lower semicontinuous regularization of $\tilde{J}$. That is,<br />

\[ J(y) = \sup\Bigl\{ \inf_{G} \tilde{J} : y \in G,\ G \text{ is open} \Bigr\}. \]<br />

(a) LDP$(\nu_n, r_n, J)$ holds on $\mathcal{Y}$.<br />

(b) If $I$ is tight, then $J \equiv \tilde{J}$ is tight as well.<br />

Proof. By Lemma 2.13 it suffices to prove that $\tilde{J}$ satisfies the large deviation bounds (2.3) and (2.4). Take a closed set $F \subset \mathcal{Y}$. Since $f$ is continuous, $f^{-1}(F)$ is closed, and $\nu_n(F) = \mu_n(f^{-1}(F))$. Then<br />

\[ \limsup_{n\to\infty} \frac{1}{r_n} \log \mu_n(f^{-1}(F)) \le -\inf_{x \in f^{-1}(F)} I(x) = -\inf_{y \in F} \inf_{f(x) = y} I(x) = -\inf_{y \in F} \tilde{J}(y). \]<br />

The lower bound is proved similarly and (a) follows.<br />

Assume now that $I$ is tight. Observe that if $\tilde{J}(y) < \infty$, then $f^{-1}(y)$ is nonempty and closed, and the nested nonempty compact sets $\{I \le \tilde{J}(y) + 1/n\} \cap f^{-1}(y)$ have a nonempty intersection. Hence $\tilde{J}(y) = I(x)$ for some $x \in f^{-1}(y)$. Consequently, $\{\tilde{J} \le c\} = f(\{I \le c\})$ is a compact subset of $\mathcal{Y}$. In particular, $\{\tilde{J} \le c\}$ is closed and $\tilde{J}$ is lower semicontinuous and is hence identical to $J$. □<br />

Note that if the rate function $I$ is not tight, then $\tilde{J}$ may fail to be lower semicontinuous (and hence $J \not\equiv \tilde{J}$).<br />

Exercise 4.1. Let $\mathcal{X} = \mathbb{R}$ and $\mu_n(dx) = \varphi_n(x)\,dx$ where<br />

\[ \varphi_n(x) = n x^{-2} e^{1 - n/x}\, \mathbf{1}_{(0,n)}(x). \]<br />

Show that $\{\mu_n\}$ are not tight on $\mathbb{R}$ but LDP$(\mu_n, n, I)$ holds with $I(x) = x^{-1}$ for $x > 0$ and infinite otherwise; see Appendix A.1 for the definition of tightness of measures. Note that $I$ is not a tight rate function.<br />

∗ Exercise 4.2. Let $f : \mathbb{R} \to S^1 = \{y \in \mathbb{C} : |y| = 1\}$ be $f(x) = e^{2\pi i \frac{x}{x+1}}$ and $\nu_n = \mu_n \circ f^{-1}$, with $\mu_n$ defined in the previous exercise. Prove that $\tilde{J}(z) = \inf_{f(x) = z} I(x)$ is not lower semicontinuous and that $\{\nu_n\}$ are tight and converge weakly to $\delta_1$. Prove also that LDP$(\nu_n, n, J)$ holds with $J(e^{2\pi i t}) = \frac{1-t}{t}$ for $t \in (0, 1]$.<br />

The simplest situation when the contraction principle is applied is when $\mathcal{X}$ is a subspace of $\mathcal{Y}$.<br />

∗ Exercise 4.3. Suppose LDP$(\mu_n, r_n, I)$ holds on $\mathcal{X}$ and that $\mathcal{X}$ is a Hausdorff topological space contained in the larger Hausdorff topological space $\mathcal{Y}$. Find $J$ so that LDP$(\mu_n, r_n, J)$ holds on $\mathcal{Y}$. What happens when $I$ is tight on $\mathcal{X}$?<br />

Hint: A natural way to extend $I$ is to simply set it to infinity outside $\mathcal{X}$. However, as demonstrated in the previous exercise, this may not be lower semicontinuous.<br />

Example 4.4. Let $\mathcal{X} = \{0, 1\}$. Let $\{X_n\}$ be an i.i.d. sequence of $\mathcal{X}$-valued random variables with common distribution $p\,\delta_1 + (1 - p)\delta_0$, for $p \in (0, 1)$. (This distribution is called the Bernoulli distribution and is usually denoted by BER($p$).) We have seen in Example 2.3 that if $\mu_n$ is the law of $S_n/n = (X_1 + \cdots + X_n)/n$, then LDP$(\mu_n, n, I_p)$ holds with $I_p$ given by (1.1). Consider now the so-called empirical measures<br />

\[ L_n = \frac{1}{n} \sum_{k=1}^{n} \delta_{X_k}. \]<br />

$L_n$ is a random variable in $\mathcal{Y} = \mathcal{M}_1(\mathcal{X})$. Let $\nu_n$ be its law. These empirical measures usually contain more information than just the sample mean $S_n/n$. In our case however<br />

\[ L_n = \frac{S_n}{n}\,\delta_1 + \Bigl(1 - \frac{S_n}{n}\Bigr)\delta_0. \]<br />

Hence, if one defines $f : [0, 1] \to \mathcal{Y}$ by $f(s) = s\delta_1 + (1 - s)\delta_0$, then $\nu_n = \mu_n \circ f^{-1}$. The contraction principle allows us to conclude a large deviation principle for $\nu_n$. The corresponding rate function is given for $\alpha \in \mathcal{M}_1(\mathcal{X})$ by<br />

\[ H(\alpha) = I_p(s) \quad \text{for } \alpha = s\delta_1 + (1 - s)\delta_0 \text{ with } s \in [0, 1]. \]<br />

We will see in Chapter 6 that $H(\alpha)$ is a relative entropy and that the LDP for the empirical measure holds in general for i.i.d. processes.<br />

4.2. Varadhan’s theorem and Bryc’s theorem<br />

A familiar fact about moment generating functions is that for a bounded measurable function $f : \mathcal{X} \to \mathbb{R}$, a probability measure $\mu$, and a sequence $r_n \to \infty$,<br />

\[ c = \mu\text{-ess sup } f \ge \frac{1}{r_n} \log \int e^{r_n f}\, d\mu \ge \frac{1}{r_n} \log \int_{f > c - \varepsilon} e^{r_n f}\, d\mu \ge \frac{1}{r_n} \log \mu\{f > c - \varepsilon\} + c - \varepsilon. \]<br />

Hence<br />

\[ \lim_{n\to\infty} \frac{1}{r_n} \log \int e^{r_n f}\, d\mu = \mu\text{-ess sup } f. \]<br />

($\mu$-ess sup $f = \inf\{b : \mu(f > b) = 0\}$.) But what happens when $\mu$ is replaced by a sequence $\mu_n$? If $\{\mu_n\}$ satisfies a large deviation principle with normalization $r_n$, then the rate function $I$ comes into the picture. The result is known as Varadhan’s theorem. It is a probabilistic analogue of the well-known Laplace method for asymptotics of integrals illustrated by the next simple exercise.<br />

Exercise 4.5. (Stirling’s formula) Use induction to show that<br />

\[ n! = \int_0^{\infty} e^{-x} x^n\, dx. \]<br />

Observe that $e^{-x} x^n$ has a unique maximum at $x = n$. Prove that<br />

\[ \lim_{n\to\infty} \frac{n!}{\sqrt{2\pi n}\, e^{-n} n^n} = 1. \]<br />

Hint: Show that the main contribution to the integral comes from $x \in [n - \varepsilon, n + \varepsilon]$ and use Taylor’s expansion.<br />
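The limit in Exercise 4.5 is easy to observe numerically. The sketch below works on the log scale to avoid overflow and relies on the identity $\log n! = \log\Gamma(n+1)$, available in Python as `math.lgamma`. The function name `stirling_ratio` is chosen here for illustration.<br />

```python
import math

def stirling_ratio(n):
    """n! / (sqrt(2*pi*n) * exp(-n) * n**n), computed through logarithms."""
    log_factorial = math.lgamma(n + 1)  # log(n!)
    log_stirling = 0.5 * math.log(2.0 * math.pi * n) - n + n * math.log(n)
    return math.exp(log_factorial - log_stirling)

for n in (10, 100, 1000):
    # The ratio decreases towards 1; the excess is close to 1/(12 n).
    print(n, stirling_ratio(n), 1.0 + 1.0 / (12.0 * n))
```

The $1/(12n)$ correction visible in the output is the next term of the Laplace expansion, which the exercise does not ask for.<br />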



Varadhan’s theorem. Suppose LDP$(\mu_n, r_n, I)$ holds, $f : \mathcal{X} \to [-\infty, \infty)$ is a continuous function, and<br />

\[ \lim_{b\to\infty} \limsup_{n\to\infty} \frac{1}{r_n} \log \int_{f \ge b} e^{r_n f}\, d\mu_n = -\infty. \tag{4.1} \]<br />

Then<br />

\[ \lim_{n\to\infty} \frac{1}{r_n} \log \int e^{r_n f}\, d\mu_n = \sup(f - I). \]<br />

Exercise 4.6. Show that condition (4.1) is satisfied when there exists an $\alpha > 1$ such that<br />

\[ \sup_n \Bigl( \int e^{\alpha r_n f}\, d\mu_n \Bigr)^{1/r_n} < \infty. \]<br />

This in turn is satisfied, for example, when $f$ is bounded above.<br />

The general idea behind the proof of the theorem is similar to the one used when $\mu_n = \mu$ is independent of $n$. Namely, if one partitions the space into sets $U_i$ on which $\mu_n(U_i) \sim e^{-r_n I(x_i)}$ for $x_i \in U_i$, then<br />

\[ \frac{1}{r_n} \log \int e^{r_n f}\, d\mu_n \sim \frac{1}{r_n} \log \sum_i e^{r_n f(x_i)} \mu_n(U_i) \sim \frac{1}{r_n} \log \sum_i e^{r_n [f(x_i) - I(x_i)]} \sim \max_i\, [f(x_i) - I(x_i)]. \]<br />
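The heuristic can be tested on a case where the integral is an explicit finite sum. The sketch below assumes $\mu_n$ is the law of $S_n/n$ for fair coin flips, so that $r_n = n$ and the rate function is the Bernoulli(1/2) relative entropy $I(s) = s\log(2s) + (1-s)\log(2(1-s))$. The test function $f(s) = 2s^2$, the grid used for the supremum, and the function names are illustrative assumptions made here.<br />

```python
import math

def rate(s, p=0.5):
    """Bernoulli(p) Cramér rate function on [0, 1], with 0*log(0) = 0."""
    out = 0.0
    if s > 0.0:
        out += s * math.log(s / p)
    if s < 1.0:
        out += (1.0 - s) * math.log((1.0 - s) / (1.0 - p))
    return out

def f(s):
    return 2.0 * s * s

def scaled_log_expectation(n, p=0.5):
    """(1/n) log E[exp(n f(S_n/n))] for S_n ~ Binomial(n, p), via log-sum-exp."""
    logs = []
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1.0 - p))
        logs.append(n * f(k / n) + log_pmf)
    m = max(logs)
    return (m + math.log(sum(math.exp(v - m) for v in logs))) / n

# Grid approximation of sup over s in [0, 1] of f(s) - I(s).
variational = max(f(i / 10000) - rate(i / 10000) for i in range(10001))

for n in (50, 500, 2000):
    print(n, scaled_log_expectation(n), variational)
```

Since $f$ is bounded on $[0, 1]$, condition (4.1) holds trivially in this example (Exercise 4.6).<br />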

The actual proof follows from the next two lemmas.<br />

Lemma 4.7. Let $f : \mathcal{X} \to [-\infty, \infty]$ be a lower semicontinuous function. Assume $\{\mu_n\}$ satisfies the lower large deviation bound (2.4) with rate function $I$ and normalization $r_n$. Then,<br />

\[ \liminf_{n\to\infty} \frac{1}{r_n} \log \int e^{r_n f}\, d\mu_n \ge \sup_{f \wedge I < \infty} (f - I). \]<br />

Proof. Fix $x$ with $f(x) \wedge I(x) < \infty$. If $f(x) = -\infty$ there is nothing to prove, so assume $f(x) > -\infty$. Let $-\infty < c < f(x)$. Then $G = \{f > c\}$ contains $x$, is open, and is such that $\inf_G f \ge c$. Write<br />

\[ \liminf_{n\to\infty} \frac{1}{r_n} \log \int e^{r_n f}\, d\mu_n \ge \liminf_{n\to\infty} \frac{1}{r_n} \log \int_G e^{r_n f}\, d\mu_n \ge c + \liminf_{n\to\infty} \frac{1}{r_n} \log \mu_n(G) \ge c - \inf_G I \ge c - I(x). \]<br />

Now let $c$ grow to $f(x)$ then take sup over $x$. □<br />

Lemma 4.8. Let f : X → [−∞, ∞] be an upper semicontinuous function<br />
(i.e. −f is lower semicontinuous). Assume {µn} satisfies the upper large<br />
deviation bound (2.3) with rate function I and normalization rn. Assume<br />
also that<br />
\[
\lim_{b\to\infty}\ \limsup_{n\to\infty}\frac{1}{r_n}\log\int_{f\ge b} e^{r_n f}\,d\mu_n = -\infty.
\]<br />

Then,<br />

\[
\limsup_{n\to\infty}\frac{1}{r_n}\log\int e^{r_n f}\,d\mu_n \ \le\ \sup_{f\wedge I \,<\, \infty}(f-I).
\]<br />

Varadhan’s theorem also tells us where the largest contributions to the<br />
integrals $\int e^{r_n f}\,d\mu_n$ asymptotically come from.<br />
∗ Exercise 4.9. Suppose LDP(µn, rn, I) holds and f is a bounded continuous<br />
function on X . Define the probability measures νn by<br />
\[
\nu_n(A) = \frac{E^{\mu_n}[e^{r_n f}\,\mathbf{1}_A]}{E^{\mu_n}[e^{r_n f}]}.
\]<br />

Prove that LDP(νn, rn, J) holds, where<br />

J(x) = I(x) − f(x) − inf(I − f).<br />

Hint: For F closed, replacing f by −∞ outside F gives an upper semicontinuous<br />
function. Similarly, for G open, replacing f by −∞ outside G gives<br />
a lower semicontinuous function.<br />
Thus, the minimizers of the new rate function J are the largest contributors<br />
to the integrals $\int e^{r_n f}\,d\mu_n$. These are precisely the maximizers of<br />
f − I.<br />
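The form of J can be motivated by the following outline for a closed set F. It is only a sketch, not a full solution of Exercise 4.9: it assumes the upper bound of Lemma 4.8 applies to f restricted to F as in the hint, and that the full integral has the limit sup(f − I).<br />

```latex
\[
\frac{1}{r_n}\log\nu_n(F)
 = \frac{1}{r_n}\log\int_F e^{r_n f}\,d\mu_n
 - \frac{1}{r_n}\log\int e^{r_n f}\,d\mu_n ,
\]
\[
\limsup_{n\to\infty}\frac{1}{r_n}\log\nu_n(F)
 \ \le\ \sup_{F}(f-I) - \sup_{X}(f-I)
 \ =\ -\inf_{x\in F}\bigl\{ I(x)-f(x)-\inf(I-f) \bigr\}
 \ =\ -\inf_{F} J .
\]
```

The lower bound for open sets would follow the same pattern with Lemma 4.7 in place of Lemma 4.8.<br />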

As we will see in the next example, Varadhan’s theorem can explain<br />
the following interesting situation.<br />
∗ Exercise 4.10. Let Zk be i.i.d. with P{Zk = 1} = P{Zk = 2} = 1/2. Set<br />
$W_n = Z_1 Z_2 \cdots Z_n$. Then $E[W_n] = (3/2)^n$, but show that Wn grows like<br />
$(3/2)^n$ only with exponentially small probability, or more precisely, that for small<br />
enough ε > 0, $P\{W_n > (3/2-\varepsilon)^n\}$ converges to 0 exponentially fast. Show<br />
that the typical growth rate is √2, in the sense that<br />
$P\{(\sqrt{2}-\varepsilon)^n < W_n < (\sqrt{2}+\varepsilon)^n\}$ converges to 1 exponentially fast.<br />
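The claims of the exercise can be explored numerically before proving them. The sketch below is an illustration only: it uses the observation that $W_n = 2^K$ with K binomial(n, 1/2), and the function names and the sample bases 1.45, 1.3, 1.5 are choices made here, not part of the text.<br />

```python
from math import comb, log

def exponent_law(n):
    """Return [P{K = k}] for K ~ Binomial(n, 1/2); note that W_n = 2**K."""
    return [comb(n, k) / 2**n for k in range(n + 1)]

def mean_w(n):
    """E[W_n], computed exactly from the binomial law of K."""
    return sum(p * 2**k for k, p in enumerate(exponent_law(n)))

def prob_w_above(n, base):
    """P{W_n > base**n} = P{K > n * log2(base)}."""
    threshold = n * log(base, 2)
    return sum(p for k, p in enumerate(exponent_law(n)) if k > threshold)

def prob_w_between(n, lo_base, hi_base):
    """P{lo_base**n < W_n < hi_base**n}."""
    lo = n * log(lo_base, 2)
    hi = n * log(hi_base, 2)
    return sum(p for k, p in enumerate(exponent_law(n)) if lo < k < hi)

if __name__ == "__main__":
    for n in (100, 200, 400):
        # 1.45 lies strictly between sqrt(2) and 3/2
        print(n, prob_w_above(n, 1.45), prob_w_between(n, 1.3, 1.5))
```

The first printed probability should shrink as n grows while the second approaches 1, even though mean_w(n) equals $(3/2)^n$.<br />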

Remark 4.11. Another interesting fact about Wn is that $(2/3)^n W_n \to 0$<br />
almost surely (by the strong law of large numbers) while $(2/3)^{np} E[W_n^p] \to \infty$<br />
for any p > 1. This is related to an important phenomenon called intermittency.<br />
For more about this see [? ].<br />

Example 4.12. Let Xk = log Zk, with Zk as above. Then $W_n = e^{S_n}$. What<br />
the above exercise shows is that even though paths {Xk} such that<br />
$(\sqrt{2}-\varepsilon)^n < W_n < (\sqrt{2}+\varepsilon)^n$ have overwhelming probability, they nevertheless do<br />
not contribute the most to the expectation E[Wn]. Taking products seems<br />
to have suppressed the role of these paths while amplifying the importance<br />
of others. This is precisely what Varadhan’s theorem explains:<br />
\[
\log\tfrac{3}{2} = \lim_{n\to\infty}\frac{1}{n}\log E[W_n]
 = \lim_{n\to\infty}\frac{1}{n}\log E[e^{S_n}]
 = \sup_{0\le x\le \log 2}\,(x - I(x)),
\]<br />
where<br />
\[
I(x) = \frac{x}{\log 2}\,\log\frac{x}{\log 2}
 + \frac{\log 2 - x}{\log 2}\,\log\frac{\log 2 - x}{\log 2} + \log 2
\]<br />


is the Cramér rate function for the law of Sn/n; see (1.1) and use the contraction<br />
principle. The above supremum is attained at $x = \tfrac{2}{3}\log 2$ instead of $E[X_1] = \log\sqrt{2}$.<br />
Thus, the paths that contribute the most are the ones for which<br />
$S_n/n \sim \tfrac{2}{3}\log 2$, i.e. $W_n \sim (2^{2/3})^n$. They do occur with an exponentially<br />
small probability, but when they do occur, they have an exponentially large<br />
contribution. The two balance out to produce the mean value $(3/2)^n$.<br />
By Exercise 4.9 with f(x) = x, the tilted measures νn satisfy an LDP<br />
and the rate function has a unique zero at $x = \tfrac{2}{3}\log 2$.<br />
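The variational formula of Example 4.12 is easy to check numerically. The following sketch maximizes x − I(x) over a grid on [0, log 2]; the grid size and the function names are arbitrary choices made for this illustration.<br />

```python
from math import log

LOG2 = log(2.0)

def xlogx(t):
    """t * log(t), extended by continuity to t = 0."""
    return 0.0 if t == 0.0 else t * log(t)

def cramer_rate(s):
    """Rate I(x) of Example 4.12 at x = s * log 2, for 0 <= s <= 1."""
    return xlogx(s) + xlogx(1.0 - s) + LOG2

def maximize_on_grid(num_points=200001):
    """Return (x, value) maximizing x - I(x) over a grid on [0, log 2]."""
    best_x, best_val = 0.0, -cramer_rate(0.0)
    for i in range(num_points):
        s = i / (num_points - 1)      # s = x / log 2 stays inside [0, 1]
        x = s * LOG2
        val = x - cramer_rate(s)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

if __name__ == "__main__":
    x_star, value = maximize_on_grid()
    print(x_star, (2.0 / 3.0) * LOG2)   # location of the maximizer
    print(value, log(1.5))              # value of the supremum
```

Parametrizing by s = x / log 2 keeps the arguments of the logarithms nonnegative at the endpoints of the grid.<br />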

The above exercises and example point to a relationship between large<br />
deviations and statistical mechanics, where one encounters integrals of the<br />
form $\int e^{nf}\,d\mu_n$. The distributions νn in Exercise 4.9 are examples of Gibbs<br />
measures. We will see more of this in the second part of the course.<br />
Section 4.3 provides an appetizer in the context of a mean-field model.<br />
Varadhan’s theorem gives asymptotics of integrals as a consequence of<br />
an LDP. Perhaps not surprisingly, knowing the asymptotics of sufficiently<br />
many integrals is equivalent to an LDP. Let Cb(X ) denote the set of bounded<br />
and continuous functions on X .<br />

Bryc’s theorem. Let {µn} be a sequence of probability measures on a metric<br />
space X . Assume {µn} is exponentially tight with normalization rn.<br />
Suppose the limit<br />
\[
\Gamma(f) = \lim_{n\to\infty}\frac{1}{r_n}\log\int e^{r_n f}\,d\mu_n
\]<br />
exists for all bounded continuous functions f. Then, LDP(µn, rn, I) holds<br />
with the tight rate function<br />
\[
I(x) = \sup_{f\in C_b(X)}\{f(x)-\Gamma(f)\}. \tag{4.2}
\]<br />

The above theorem reminds us again of weak convergence of probability<br />
measures.<br />
Of course, given the LDP, Varadhan’s theorem implies that Γ(f) =<br />
sup(f − I). Note, however, that the relation between Γ and I is not the<br />
convex duality we will see in Chapter 5, where the functions f are continuous<br />
linear functions. Even though until this point we have only seen convex rate<br />
functions, the rate function in (4.2) need not be convex; recall Definition<br />
2.14. In fact, the next section has an LDP with a nonconvex rate function.<br />
Also, Exercise 5.27 shows how simple it is to come up with such a situation.<br />
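The inversion formula (4.2) can be tried out on a toy case. The sketch below assumes a finite space X = {0, 1, 2, 3} with a made-up nonconvex rate function, takes Γ(f) = max(f − I) as given by Varadhan’s theorem, and evaluates the sup in (4.2) only over the hypothetical test functions $f_a = -a\,\mathbf{1}\{y \ne x\}$, which resemble the functions used in the proof of the lower bound.<br />

```python
def gamma(f, rate):
    """On a finite space, Gamma(f) = max_x (f(x) - I(x))."""
    return max(fx - ix for fx, ix in zip(f, rate))

def recover_rate(rate, heights=(0.0, 1.0, 10.0, 100.0, 1000.0)):
    """Lower bound for sup_f {f(x) - Gamma(f)} using f_a = -a * 1{y != x}."""
    n = len(rate)
    recovered = []
    for x in range(n):
        best = float("-inf")
        for a in heights:
            f_a = [0.0 if y == x else -a for y in range(n)]
            best = max(best, f_a[x] - gamma(f_a, rate))
        recovered.append(best)
    return recovered

if __name__ == "__main__":
    rate = [0.0, 0.7, 0.2, 3.0]   # illustrative values, not from the text
    print(recover_rate(rate))
```

Since f(x) − Γ(f) ≤ I(x) for every f, this restricted family already attains the sup once a exceeds I(x), and no convexity of I is needed.<br />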

Proving that the limit Γ(f) exists for all bounded continuous functions<br />
may be too hard to achieve. However, for the LDP to hold one really only<br />
needs the limit to exist for a rich enough class of functions; see for example<br />
Theorem 4.4.10 of [8].<br />
If X is a metric vector space, then a rich enough class of functions that<br />
would ensure the LDP via Bryc’s theorem is the class of concave Lipschitz<br />
functions. For interesting applications (traveling salesman, minimal spanning<br />
trees, free energy of the short-range spin glass model, etc.) where large<br />
deviation principles are proved via this route, see [39].<br />

Proof of Bryc’s theorem. The function I is lower semicontinuous, since it is<br />
a sup over continuous functions. Taking f = 0 shows I(x) ≥ 0. Since {µn}<br />
are exponentially tight, we only need to prove the weak LDP. We start with<br />
the lower bound.<br />
Let G be an open set and fix x ∈ G. There is a continuous function f : X → [0, 1]<br />
such that f(x) = 1 and f vanishes outside G. Take a > 0 and define<br />
fa = a(f − 1). Then,<br />
\[
\int e^{r_n f_a}\,d\mu_n \ \le\ e^{-a r_n} + \mu_n(G).
\]<br />
Thus<br />
\[
\max\Bigl\{\liminf_{n\to\infty}\frac{1}{r_n}\log\mu_n(G),\ -a\Bigr\}
 \ \ge\ \lim_{n\to\infty}\frac{1}{r_n}\log\int e^{r_n f_a}\,d\mu_n = \Gamma(f_a)
 = -\{f_a(x)-\Gamma(f_a)\} \ \ge\ -I(x).
\]<br />
Take a to infinity, then sup over x.<br />

For the upper bound, let C be any measurable set and let f be a bounded<br />
continuous function. Then,<br />
\[
\limsup_{n\to\infty}\frac{1}{r_n}\log\mu_n(C)
 \ \le\ \limsup_{n\to\infty}\frac{1}{r_n}\log\Bigl(e^{-r_n\inf_C f}\int e^{r_n f}\,d\mu_n\Bigr)
 \ \le\ -\inf_C f + \Gamma(f).
\]<br />

Since f is not necessarily convex, a theorem like the one on page 25 cannot<br />
be used. Let us proceed with the proof of the upper bound for an arbitrary<br />
compact set K. Fix<br />
\[
c \ <\ \inf_K I = \inf_{x\in K}\ \sup_{f\in C_b(X)}\{f(x)-\Gamma(f)\}.
\]<br />

Then, for each x ∈ K there exists a bounded continuous function fx such<br />
that fx(x) − Γ(fx) > c. Since fx is continuous, there exists an open neighborhood<br />
of x, say Bx, such that fx(y) − Γ(fx) > c for all y ∈ Bx. One can<br />
then cover K with a finite number of these neighborhoods, say $B_{x_1},\dots,B_{x_N}$.<br />
But then,<br />
\[
\limsup_{n\to\infty}\frac{1}{r_n}\log\mu_n(K)
 \ \le\ \max_{1\le k\le N}\ \limsup_{n\to\infty}\frac{1}{r_n}\log\mu_n(B_{x_k})
 \ \le\ -\min_{1\le k\le N}\Bigl\{\inf_{B_{x_k}} f_{x_k} - \Gamma(f_{x_k})\Bigr\}
 \ \le\ -c.
\]<br />
Taking c up to $\inf_K I$ proves the upper bound for compact sets. The theorem<br />
is proved.<br />



Exercise 4.13. Let ηk > 0 be any sequence converging to 0, and let ρk be<br />
a probability measure supported on the interval [−ηk, ηk]. Let {Xk} be a<br />
sequence of independent random variables such that Xk has distribution ρk.<br />
Let Sn be the partial sum as before and µn the distribution of Sn/n. Show<br />
LDP(µn, n, I) holds with rate function I(0) = 0, I(x) = ∞ for x ≠ 0.<br />
We generalize this in Chapter 15. For example, the above result will continue<br />
to hold under weak convergence ρk → δ0, as long as these distributions<br />
have a common compact support.<br />
But show that if Xk has distribution P{Xk = 0} = 1 − ηk and P{Xk =<br />
k} = ηk, then the above rate function does not work if ηk converges to zero<br />
slowly enough.<br />

4.3. Curie-Weiss model for ferromagnetism<br />

In this section we look at a simple model of spontaneous magnetization to<br />
illustrate the usefulness of Cramér’s theorem and Varadhan’s theorem in<br />
studying models from statistical mechanics.<br />
For each n ∈ N, we have a model of n atoms, i = 1, . . . , n, each of which<br />
has a spin ωi ∈ {−1, 1}. The space of spin configurations is denoted by<br />
$\Omega_n = \{-1, 1\}^n$. The energy of the system is given by the Hamiltonian<br />
\[
H_n(\omega) = -\frac{J}{2n}\sum_{1\le i,j\le n}\omega_i\omega_j - h\sum_{j=1}^{n}\omega_j,
\]<br />
where J > 0 and h ∈ R are constants. (J > 0 corresponds to ferromagnetism<br />
and h is the strength of the external magnetic field.)<br />
Nature prefers lower energy states, so in a ferromagnetic material the<br />
spins tend to be aligned and follow the magnetic field, if there is any (h ≠ 0).<br />
The Gibbs measure for n spins is<br />
\[
\gamma_n(\omega) = \frac{1}{Z_n}\,e^{-\beta H_n(\omega)}\,P_n(\omega),
\]<br />
where $P_n(\omega) = 2^{-n}$ is the a priori measure (ω1, . . . , ωn are i.i.d. fair coin<br />
flips), β > 0 is the inverse temperature, and $Z_n = \int e^{-\beta H_n}\,dP_n$ is the normalization<br />
constant called the partition function.<br />

Remark 4.14. The more realistic Ising model in a finite box $\Lambda \subset \mathbb{Z}^d$ has<br />
Hamiltonian<br />
\[
H^{\mathrm{Ising}}_{\Lambda}(\omega) = -J\sum_{x,y:\,|x-y|_1=1}\omega_x\omega_y - h\sum_{x}\omega_x,
\]<br />
where the summation is over x ∈ Λ and its nearest neighbors y. Curie-Weiss<br />
is called the mean-field approximation of the Ising model because its<br />
Hamiltonian satisfies<br />
\[
H_n(\omega) = -\frac{J}{2}\sum_{i=1}^{n}\omega_i\Bigl(\frac{1}{n}\sum_{j=1}^{n}\omega_j\Bigr) - h\sum_{i=1}^{n}\omega_i,
\]<br />
where the average in parentheses is the “mean field” of the other spins. In<br />
other words, in the Curie-Weiss model each pair of spins interacts with the<br />
same strength regardless of the distance between them. This elimination of<br />
spatial structure makes the analysis much easier. It is noteworthy, however,<br />
that physical experiments show that spatial structure does matter, which<br />
makes the Ising model (Chapter 10), for example, far more accurate.<br />

Observe that if h = 0, $\lim_{\beta\to\infty}\gamma_n = \tfrac{1}{2}(\delta_{\omega\equiv 1} + \delta_{\omega\equiv -1})$. This says that<br />
at low temperature, in the absence of external fields, energy dominates and<br />
one gets complete order in the limit. On the other hand, $\lim_{\beta\to 0}\gamma_n = P_n$,<br />
which says that at high temperature, with no external field present, thermal<br />
motion dominates and one gets complete disorder in the limit. This means<br />
that, at the two extremes β = 0 and β = ∞, the finite volume system<br />
(n < ∞) behaves differently.<br />
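For small n the two extreme regimes can be seen by brute-force enumeration of the Gibbs measure. The sketch below is illustrative only: the function names, the choice n = 4, and the value β = 50 standing in for "large β" are assumptions of this example. It uses the identity $\sum_{i,j}\omega_i\omega_j = S_n^2$.<br />

```python
from itertools import product
from math import exp

def hamiltonian(omega, J=1.0, h=0.0):
    """Curie-Weiss energy H_n = -(J/2n) * S_n**2 - h * S_n."""
    n = len(omega)
    s = sum(omega)
    return -J * s * s / (2 * n) - h * s

def gibbs_measure(n, beta, J=1.0, h=0.0):
    """Return {configuration: gamma_n(configuration)} by full enumeration."""
    configs = list(product((-1, 1), repeat=n))
    weights = [exp(-beta * hamiltonian(w, J, h)) * 2.0 ** (-n) for w in configs]
    z_n = sum(weights)                      # partition function Z_n
    return {w: wt / z_n for w, wt in zip(configs, weights)}

if __name__ == "__main__":
    hot = gibbs_measure(4, beta=0.0)        # uniform measure P_n
    cold = gibbs_measure(4, beta=50.0)      # mass splits between all +1 and all -1
    print(hot[(1, -1, 1, -1)], cold[(1, 1, 1, 1)], cold[(-1, -1, -1, -1)])
```

Enumeration costs $2^n$ operations, so this is only meant for very small systems.<br />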

Our goal is to see if this also happens in the limiting system of infinitely<br />
many spins (n → ∞). More precisely, we want to show that there exists<br />
a critical temperature 1/βc (called the Curie point) such that the infinite<br />
volume model behaves differently for β < βc and β > βc.<br />
To achieve our goal we look at the behavior of the average spin<br />
$S_n/n = \frac{1}{n}\sum_{i=1}^{n}\omega_i$. In fact, this is the same as looking at the Hamiltonian:<br />
\[
-\beta H_n(\omega) = \frac{nJ\beta}{2}\Bigl(\frac{S_n}{n}\Bigr)^{2} + n\beta h\,\frac{S_n}{n}.
\]<br />
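The identity between the double-sum form of the Hamiltonian and its expression through the average spin can be checked on random configurations. This is a small sanity check with hypothetical function names and arbitrary parameter values.<br />

```python
import random

def minus_beta_h_double_sum(omega, beta, J, h):
    """-beta * H_n(omega), computed from the double sum over pairs (i, j)."""
    n = len(omega)
    pair_sum = sum(wi * wj for wi in omega for wj in omega)
    return beta * (J / (2 * n) * pair_sum + h * sum(omega))

def minus_beta_h_average_spin(omega, beta, J, h):
    """The same quantity, written in terms of the average spin m = S_n / n."""
    n = len(omega)
    m = sum(omega) / n
    return n * (J * beta / 2 * m * m + beta * h * m)

if __name__ == "__main__":
    rng = random.Random(0)
    omega = [rng.choice((-1, 1)) for _ in range(10)]
    print(minus_beta_h_double_sum(omega, 1.3, 0.8, -0.4),
          minus_beta_h_average_spin(omega, 1.3, 0.8, -0.4))
```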

Let µn be the law of Sn/n under Pn (i.e. ωi’s are i.i.d.) and let νn be the<br />
law of Sn/n under γn. The latter is the measure we are interested in.<br />
Theorem 4.15. The following holds.<br />
(a) If h = 0 and β ≤ 1/J, then νn converges weakly to δ0. In fact, for<br />
all ε > 0, γn{|Sn/n| ≥ ε} → 0 exponentially fast.<br />
(b) If h = 0 and β > 1/J, then νn converges weakly to $\tfrac{1}{2}(\delta_{m(\beta,0)} + \delta_{-m(\beta,0)})$.<br />
In fact, if A is a closed set such that<br />
A ∩ {m(β, 0), −m(β, 0)} = ∅,<br />
then γn{Sn/n ∈ A} → 0 exponentially fast. Here, z = ±m(β, 0)<br />
are the two nonzero solutions of $\tfrac{1}{2}\log\frac{1+z}{1-z} = J\beta z$.<br />
(c) If h ≠ 0, then νn converges weakly to $\delta_{m(\beta,h)}$. In fact, for all ε > 0,<br />
γn{|Sn/n − m(β, h)| ≥ ε} → 0 exponentially fast. Here, m(β, h) is<br />
the unique solution of $\tfrac{1}{2}\log\frac{1+z}{1-z} = J\beta z + \beta h$ that has the same sign<br />
as h.<br />
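The equations in Theorem 4.15 have no closed-form solution, but m(β, h) is easy to approximate. The sketch below uses bisection on atanh(z) − Jβz − βh, noting that $\tfrac{1}{2}\log\frac{1+z}{1-z} = \operatorname{atanh}(z)$. The function name, the bracketing interval, and the tolerance are choices made for this illustration, and moderate values of Jβ + β|h| (well below 17, say) are assumed so that the bracket contains the root.<br />

```python
from math import atanh

def magnetization(beta, h, J=1.0, tol=1e-12):
    """Approximate m(beta, h) from Theorem 4.15 by bisection."""
    if h < 0:
        # the equation is odd in (z, h), so m(beta, -h) = -m(beta, h)
        return -magnetization(beta, -h, J, tol)
    if h == 0 and J * beta <= 1:
        return 0.0                      # only z = 0 solves the equation

    def g(z):
        return atanh(z) - J * beta * z - beta * h

    lo, hi = 1e-9, 1.0 - 1e-15          # g(lo) < 0 < g(hi) in the cases left
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for beta in (0.5, 1.0, 1.5, 2.0):
        print(beta, magnetization(beta, 0.0), magnetization(beta, 0.1))
```

On (0, 1) the function g is convex with g(0) ≤ 0, which is why a single sign change, and hence bisection, suffices for the positive root.<br />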

Figure 4.1. The Curie-Weiss phase diagram in the (T, h) plane, where T = 1/β. The limit is δm with m > 0 for h > 0 and δm with m < 0 for h < 0; on the line h = 0 it is ½(δ−m + δm) with m > 0 for T < Tc = J and δ0 for T ≥ Tc. Here m = m(β, h) is as in Theorem 4.15.<br />

When there is no external field, the theorem says that if the temperature<br />
is above J then neither −1 nor +1 gains the upper hand and “randomness<br />
beats couplings”. In other words, when we heat up a magnet, it loses its<br />
magnetization. When, on the other hand, the temperature is below J, “coupling<br />
tendency wins over noise” and there are two possible limits for the<br />
sample mean, with equal probability. In other words, when cooled down a<br />
ferromagnetic material gets magnetized even in the absence of an external<br />
magnetic field. Of course, the ferromagnetic material also gets magnetized if<br />
we switch the external magnetic field on. The theorem then says that if the<br />
temperature is low and we switch the magnetic field off, the magnet does<br />
not lose its magnetization. This is known as spontaneous magnetization.<br />
See Figure 4.1 for the phase diagram summarizing these results. For more<br />
on this, see the discussion on page 93.<br />

Proof of Theorem 4.15. By Cramér’s theorem on R (page 23) we know<br />
that LDP(µn, n, I) holds with rate<br />
\[
I(z) = \begin{cases}
\dfrac{1-z}{2}\log(1-z) + \dfrac{1+z}{2}\log(1+z) & \text{if } -1 \le z \le 1,\\[4pt]
\infty & \text{otherwise.}
\end{cases}
\]<br />
Exercise 4.16. Check the above formula for the rate function by direct<br />
computation of (3.6) and also by an application of the contraction principle<br />
to the Bernoulli case in Example 2.3.<br />

Next write<br />
\[
\begin{aligned}
\nu_n(B) = \gamma_n\{S_n/n \in B\}
 &= \frac{1}{Z_n}\int \mathbf{1}_B(S_n/n)\,e^{-\beta H_n}\,dP_n\\
 &= \frac{1}{Z_n}\int \mathbf{1}_B(S_n/n)\,e^{\,n\left[\frac{J\beta}{2}(S_n/n)^2+\beta h(S_n/n)\right]}\,dP_n\\
 &= \frac{1}{Z_n}\int \mathbf{1}_B(z)\,e^{\,n\left[\frac{J\beta}{2}z^2+\beta h z\right]}\,\mu_n(dz).
\end{aligned}
\]<br />
By Exercise 4.9, LDP(νn, n, $\tilde I$) holds with rate<br />
\[
\tilde I(z) = I(z) - \frac{J\beta}{2}z^2 - \beta h z - c,
\quad\text{where } c = \inf_z\Bigl\{I(z) - \frac{J\beta}{2}z^2 - \beta h z\Bigr\}.
\]<br />
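By Varadhan’s theorem the constant c equals $-\lim_n \frac{1}{n}\log Z_n$, and this can be checked numerically. The sketch below computes $\frac{1}{n}\log Z_n$ exactly from the binomial law of the number of +1 spins (in log scale to avoid overflow) and compares it with a grid maximization of $\frac{J\beta}{2}z^2 + \beta h z - I(z)$. Function names, grid size, and parameter values are assumptions of this illustration.<br />

```python
from math import exp, lgamma, log

def log_partition_per_spin(n, beta, J=1.0, h=0.0):
    """(1/n) log Z_n, summing over k = number of +1 spins (log-sum-exp)."""
    terms = []
    for k in range(n + 1):
        z = (2 * k - n) / n                       # average spin S_n / n
        log_prob = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                    - n * log(2.0))               # log P_n{S_n = 2k - n}
        terms.append(log_prob + n * (J * beta / 2 * z * z + beta * h * z))
    top = max(terms)
    return (top + log(sum(exp(t - top) for t in terms))) / n

def xlogx(t):
    return 0.0 if t == 0.0 else t * log(t)

def rate(z):
    """Cramér rate I(z) for fair +/-1 spins, -1 <= z <= 1."""
    return 0.5 * (xlogx(1.0 - z) + xlogx(1.0 + z))

def variational_value(beta, J=1.0, h=0.0, grid=20001):
    """Grid approximation of sup_z {J*beta/2 * z^2 + beta*h*z - I(z)} = -c."""
    best = float("-inf")
    for i in range(grid):
        z = -1.0 + 2.0 * i / (grid - 1)
        best = max(best, J * beta / 2 * z * z + beta * h * z - rate(z))
    return best

if __name__ == "__main__":
    for beta in (0.5, 2.0):
        print(beta, log_partition_per_spin(4000, beta), variational_value(beta))
```

The two printed numbers in each row should be close, with a discrepancy that shrinks as n grows.<br />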

Exercise 4.17. Prove the following about the rate function $\tilde I$ of νn.<br />
(a) If h ≠ 0, then $\tilde I$ is uniquely minimized at z = m(β, h), the unique<br />
solution of $\tfrac{1}{2}\log\frac{1+z}{1-z} = J\beta z + \beta h$ that is of the same sign as h.<br />
(b) If h = 0 and Jβ ≤ 1, then $\tilde I$ is convex and is uniquely minimized<br />
at z = 0. In other words, m(β, 0) = 0.<br />
(c) If h = 0 and Jβ > 1, then $\tilde I$ is not convex and is minimized at<br />
{−m(β, 0), +m(β, 0)}, the two nonzero solutions of $\tfrac{1}{2}\log\frac{1+z}{1-z} = J\beta z$.<br />

Now, if h = 0 and βJ ≤ 1, then for any ε > 0, γn{|Sn/n| ≥ ε} → 0<br />
exponentially fast because by the upper large deviation bound (2.3) for νn with its rate function $\tilde I$,<br />
\[
\limsup_{n\to\infty}\frac{1}{n}\log\gamma_n\{|S_n/n| \ge \varepsilon\} \ \le\ -\inf_{|z|\ge\varepsilon}\tilde I(z) \ <\ 0.
\]<br />
This implies that νn converges weakly to δ0 and concludes the proof of (a).<br />
(c) is proved similarly.<br />
If h = 0, βJ > 1, and A is closed such that A ∩ {m(β, 0), −m(β, 0)} = ∅,<br />
then<br />
\[
\limsup_{n\to\infty}\frac{1}{n}\log\gamma_n\{S_n/n \in A\} \ \le\ -\inf_{A}\tilde I \ <\ 0
\]<br />
and γn{Sn/n ∈ A} → 0 exponentially fast. In particular, for every ε > 0,<br />
\[
\lim_{n\to\infty}\gamma_n\{\,|S_n/n - m(\beta,0)| < \varepsilon \ \text{ or } \ |S_n/n + m(\beta,0)| < \varepsilon\,\} = 1.
\]<br />
But when h = 0, $(\omega_i)_{i=1}^n$ and $(-\omega_i)_{i=1}^n$ have the same distribution under γn. Hence, νn is symmetric<br />
and<br />
\[
\gamma_n\{|S_n/n - m(\beta,0)| < \varepsilon\} = \gamma_n\{|S_n/n + m(\beta,0)| < \varepsilon\}.
\]<br />
Thus, for ε < m(β, 0), where the two events are disjoint, both limits exist and are equal to 1/2. This shows that νn converges<br />
weakly to $\tfrac{1}{2}(\delta_{m(\beta,0)} + \delta_{-m(\beta,0)})$ and concludes the proof of (b).<br />


Chapter 5. Convex analysis in large deviation theory<br />

We have seen in the previous chapters how convex sets and convex functions<br />
occur naturally in the study of large deviations. This chapter will, therefore,<br />
be devoted to the analysis of such objects. Although the results in this<br />
chapter may seem remote, they are an essential tool in large deviation theory.<br />
5.1. Some elementary convex analysis<br />
Recall how Lemmas 2.13 and 2.10 showed how to replace a function by the<br />
“best possible” lower semicontinuous version. Say now we would like to also<br />
require convexity. The natural way to do this would be by taking the sup over<br />
affine minorants. The epigraph of the resulting function will then be the<br />
closure of the convex hull of the epigraph of the original function. Moreover,<br />
if f were convex to begin with, this construction should again give the<br />
lower semicontinuous regularization, defined in (2.2) as the maximal lower<br />
semicontinuous minorant. Let us make all this precise.<br />

The setting is that of two real vector spaces in duality. That is, we<br />
have vector spaces X and Y and a bilinear map 〈·, ·〉 : X × Y → R. The<br />
weak topology σ(X , Y) on X is the minimal topology making the maps<br />
{x ↦→ 〈x, y〉 : y ∈ Y} continuous. Similarly for σ(Y, X ) on Y.<br />
Assumption 5.1. We will assume that for each nonzero x ∈ X there exists<br />
a y ∈ Y such that 〈x, y〉 ≠ 0. We also assume that for each nonzero y ∈ Y<br />
there exists an x ∈ X such that 〈x, y〉 ≠ 0.<br />


With the above assumption in force the topologies σ(X , Y) and σ(Y, X )<br />
make X and Y Hausdorff topological spaces. We will have to check this<br />
assumption every time we put two spaces in duality.<br />
Example 5.2. If $X = Y = \mathbb{R}^d$, then σ(X , Y) = σ(Y, X ) is the Euclidean<br />
topology on $\mathbb{R}^d$.<br />
Example 5.3. If X is any Banach space and Y = X∗, then σ(X , Y) and<br />
σ(Y, X ) are, respectively, the weak and weak∗ topologies from functional<br />
analysis.<br />
Example 5.4. Our most important applications will be when X = M(S)<br />
is the space of real-valued Borel measures on some topological space S and<br />
Y is some vector space of bounded Borel functions on S, e.g. Cb(S) when S<br />
is metric.<br />
In the following proposition we give a few consequences. We leave the<br />
proof of this proposition to the reader, either as an exercise or to be looked<br />
up from a functional analysis text. For example, Chapter 3 in Rudin’s<br />
textbook [33] has a section on weak topologies.<br />

Proposition 5.5. Setting as above.<br />
(a) A base for σ(X , Y) is given by the collection of sets of the type<br />
{x ∈ X : |〈x, yi〉 − 〈x0, yi〉| < ε, i = 1, . . . , m}<br />
for m ∈ N, x0 ∈ X , y1, . . . , ym ∈ Y, and ε > 0. In particular,<br />
σ(X , Y) is locally convex and thus X is locally convex, which means<br />
that every neighborhood of x ∈ X contains a convex neighborhood<br />
of x.<br />
(b) xn → x in σ(X , Y) if and only if 〈xn, y〉 → 〈x, y〉 for all y ∈ Y.<br />
(c) The maps R × X → X : (a, x) ↦→ ax and X × X → X : (x, x′) ↦→<br />
x + x′ are continuous in the appropriate topologies. This says that<br />
X is a topological vector space.<br />
(d) Suppose f : X → R is a continuous linear functional. Then, there<br />
exists a unique y ∈ Y such that f(x) = 〈x, y〉 for all x ∈ X .<br />
This gives an isomorphism between the dual X∗ (the vector space<br />
of continuous linear functionals on X ) and Y.<br />
(e) All of the above also holds if the roles of X and Y are reversed.<br />

Here is a fundamental fact that we will need in what follows.<br />
Hahn-Banach separation theorem. Suppose A and B are nonempty,<br />
disjoint, convex subsets of a locally convex topological real vector space Z.<br />
(a) If A is open there exist z∗ ∈ Z∗ and γ ∈ R such that<br />
\[
\langle z', z^*\rangle \ <\ \gamma \ \le\ \langle z'', z^*\rangle \qquad \forall z' \in A,\ z'' \in B.
\]<br />
(b) If A is compact and B is closed then there exist z∗ ∈ Z∗ and<br />
γ1, γ2 ∈ R such that<br />
\[
\langle z', z^*\rangle \ <\ \gamma_1 \ <\ \gamma_2 \ <\ \langle z'', z^*\rangle \qquad \forall z' \in A,\ z'' \in B.
\]<br />
For the proof see Theorem 3.4 of [33].<br />

Definition 5.6. For any function f : X → [−∞, ∞], define the convex<br />
conjugate f∗ : Y → [−∞, ∞] by<br />
\[
f^*(y) = \sup_{x\in X}\{\langle x, y\rangle - f(x)\}.
\]<br />
Similarly, if g : Y → [−∞, ∞], define g∗ : X → [−∞, ∞] by<br />
\[
g^*(x) = \sup_{y\in Y}\{\langle x, y\rangle - g(y)\}.
\]<br />
Inductively, $f^{**} = (f^*)^*$ is the convex biconjugate and $f^{*n} = (f^{*(n-1)})^*$.<br />
Similarly for g.<br />
Proposition 5.12 below shows why we use the word “convex” in the above<br />
terminology.<br />
$f^{**}$ is in fact the object we are after, as explained in the first paragraph<br />
of this section.<br />

∗ Exercise 5.7. We say f is affine if f(x) = a + 〈x, y〉 for some a ∈ R and<br />
y ∈ Y. Prove that $f^{**}$ is the sup over the affine minorants of f.<br />
Next, recall the definition of a convex function.<br />
Definition 5.8. We say a function f : X → [−∞, ∞] is convex if<br />
f(αx1 + (1 − α)x2) ≤ αf(x1) + (1 − α)f(x2)<br />
for all x1, x2 ∈ X and α ∈ [0, 1] such that the right-hand side is well defined.<br />
A convex function is proper if it maps into (−∞, ∞] and is not identically<br />
∞.<br />

∗ Exercise 5.9. Suppose E is a convex subset of X and let f : E → (−∞, ∞]<br />
be lower semicontinuous. Prove that f is convex on E if and only if for all<br />
x, y ∈ E,<br />
\[
f\Bigl(\frac{x+y}{2}\Bigr) \ \le\ \frac{f(x)+f(y)}{2}.
\]<br />
Hint: First take care of dyadic rationals $\alpha = k2^{-n}$ by induction on n.<br />
Exercise 5.10. Show that a finite convex function on an open interval is<br />
continuous. This is also true more generally in finite-dimensional spaces.<br />
See Theorem 10.1 in [32].<br />

∗ Exercise 5.11. (Fenchel-Young inequality) Prove that for any x ∈ X and<br />
y ∈ Y<br />
\[
\langle x, y\rangle \ \le\ f(x) + f^*(y),
\]<br />
whenever the right-hand side makes sense.<br />
Proposition 5.12. For f arbitrary the following is true.<br />
(a) $f^*$ is convex and lower semicontinuous, where the constants $f^* \equiv \infty$<br />
and $f^* \equiv -\infty$ also qualify as convex lower semicontinuous.<br />
(b) $f^{**}(x) \le f(x)$ for all x ∈ X .<br />
(c) $f^{*n} = f^*$ if n ≥ 1 is odd and $f^{*n} = f^{**}$ if n ≥ 2 is even.<br />
Proof. If f takes the value −∞, then $f^* \equiv \infty$. If f ≡ ∞, then $f^* \equiv -\infty$.<br />
Thus, part (a) is true if f is not proper. If, on the other hand, f is proper,<br />
then $f^*$ is the pointwise sup over a nonempty family of affine continuous<br />
functions y ↦→ 〈x, y〉 − f(x), x ∈ X with f(x) < ∞. Hence the claim of part<br />
(a) is true in this case too. Part (b) follows from Exercise 5.7.<br />
Observe next that if f1 ≤ f2, then $f_1^* \ge f_2^*$. Thus, $f^{**} \le f$ implies that<br />
$f^{*3} \ge f^*$. But part (b) applied to $f^*$ implies $f^{*3} \le f^*$. Hence, $f^{*3} = f^*$ and part (c)<br />
follows by induction. <br />

If f ∗∗ is supposed <strong>to</strong> be a c<strong>on</strong>vex lower semic<strong>on</strong>tinuous regularizati<strong>on</strong> of<br />

f, it should be the case that if f is already c<strong>on</strong>vex <strong>an</strong>d lower semic<strong>on</strong>tinuous,<br />

then f ∗∗ simply recovers f.<br />

Fenchel-Moreau theorem. C<strong>on</strong>sider a functi<strong>on</strong> f : X → (−∞, ∞] that<br />

is not identically ∞. Then, f = f ∗∗ if <strong>an</strong>d <strong>on</strong>ly if f is c<strong>on</strong>vex <strong>an</strong>d lower<br />

semic<strong>on</strong>tinuous.<br />
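To see what the theorem says in a concrete case, one can compute f ∗∗ numerically for a function that is not convex. The following is a rough sketch, assuming X = Y = R, a hypothetical double-well function f(x) = (x² − 1)², and finite grids in place of the suprema (so f is effectively set to ∞ outside [−2, 2]). The function and variable names are illustrative only.

```python
def conjugate(values, points, duals):
    """Discrete Legendre transform: for each d in duals, max over points of p*d - value."""
    return [max(p * d - v for p, v in zip(points, values)) for d in duals]


def double_well(x):
    # Not convex: it has a local maximum at x = 0 between the wells at x = -1 and x = 1.
    return (x * x - 1.0) ** 2


xs = [i / 50.0 for i in range(-100, 101)]   # grid on [-2, 2]
ys = [i / 10.0 for i in range(-300, 301)]   # slopes in [-30, 30]

f_vals = [double_well(x) for x in xs]
f_star = conjugate(f_vals, xs, ys)          # f* evaluated on the slope grid
f_bistar = conjugate(f_star, ys, xs)        # f** evaluated back on the x grid

i0 = xs.index(0.0)
i2 = xs.index(2.0)
gap_at_zero = f_vals[i0] - f_bistar[i0]     # positive: f** lies strictly below f at 0
gap_at_two = f_vals[i2] - f_bistar[i2]      # zero: f** agrees with f where f is "already convex"
```

In this example f ∗∗ flattens the bump between the two wells, so f ≠ f ∗∗ there, while the two functions agree at points such as x = 2 where f has a supporting line.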

Proof. If the equality holds, the previous propositi<strong>on</strong> implies that f is c<strong>on</strong>vex<br />

<strong>an</strong>d lower semic<strong>on</strong>tinuous. Let us now prove the other directi<strong>on</strong>. Again,<br />

the previous propositi<strong>on</strong> implies that f ∗∗ ≤ f. So we need <strong>to</strong> show that<br />

f ∗∗ ≥ f for a c<strong>on</strong>vex lower semic<strong>on</strong>tinuous functi<strong>on</strong> f. Let us start by<br />

showing that f ∗ is proper.<br />

To this end, pick t0 > −∞ <strong>an</strong>d x0 ∈ X such that t0 < f(x0) < ∞. We<br />

then have that<br />

f ∗ (y) ≥ 〈x0, y〉 − f(x0) > −∞<br />

for all y ∈ Y. Thus, we <strong>on</strong>ly need <strong>to</strong> show the existence of y0 ∈ Y such<br />

that f ∗ (y0) < ∞. Since f is c<strong>on</strong>vex lower semic<strong>on</strong>tinuous, its epigraph<br />

{(x, t) ∈ X × R : f(x) ≤ t < ∞} is a closed <strong>an</strong>d c<strong>on</strong>vex set. Since X × R is a



locally c<strong>on</strong>vex <strong>to</strong>pological vec<strong>to</strong>r space, <strong>an</strong>d (x0, t0) is outside the epigraph,<br />

the Hahn-B<strong>an</strong>ach separati<strong>on</strong> theorem implies that there exists a φ ∈ (X ×R) ∗<br />

<strong>an</strong>d γ ∈ R such that<br />

φ(x0, t0) > γ > φ(x, t)<br />

for all (x, t) in the epigraph.<br />

Exercise 5.13. Prove that (X × R) ∗ is isomorphic <strong>to</strong> X ∗ × R (<strong>an</strong>d hence<br />

<strong>to</strong> Y × R).<br />

By the above exercise, there exists a pair (y, s) ∈ Y × R such that<br />

φ(x, t) = 〈x, y〉 + st <strong>an</strong>d<br />

〈x0, y〉 + st0 > γ > 〈x, y〉 + st<br />

for all (x, t) in the epigraph. In particular, since (x0, f(x0)) is in the epigraph,<br />

<strong>on</strong>e has<br />

〈x0, y〉 + st0 > 〈x0, y〉 + sf(x0)<br />

<strong>an</strong>d thus st0 > sf(x0) <strong>an</strong>d s < 0. But then <strong>on</strong>e has 〈x, y/|s|〉 − f(x) < γ/|s|<br />

whenever f(x) < ∞. It follows that<br />

\[ f^*(y/|s|) = \sup_{x:\,f(x)<\infty} \{\langle x, y/|s|\rangle - f(x)\} \le \gamma/|s| < \infty. \]

Hence f ∗ is proper. Consequently f ∗∗ > −∞ and f ∗∗ ≤ f is not identically ∞. We are now ready to prove that f ∗∗ ≥ f.

Suppose there exists a point x0 ∈ X such that f ∗∗ (x0) < f(x0). Then<br />

|f ∗∗ (x0)| < ∞ <strong>an</strong>d we c<strong>an</strong> separate (x0, f ∗∗ (x0)) from the epigraph of f. In<br />

other words, there exists a γ ∈ R <strong>an</strong>d a pair (y, s) ∈ Y × R such that<br />

〈x0, y〉 + sf ∗∗ (x0) > γ > 〈x, y〉 + st<br />

for all (x, t) in the epigraph of f. Now pick x such that f(x) < ∞ <strong>an</strong>d let t<br />

grow <strong>to</strong> infinity. This proves that s ≤ 0. Suppose s < 0. Then,<br />

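\[ f^{**}(x_0) + f^*(-y/s) = f^{**}(x_0) + \sup_{x:\,f(x)<\infty} \{\langle x, y/|s|\rangle - f(x)\} \le f^{**}(x_0) + \gamma/|s| < \langle x_0, y/|s|\rangle, \]

which contradicts the Fenchel-Young inequality applied to f ∗ . Hence s = 0 and

\[ \langle x_0, y\rangle > \gamma > \langle x, y\rangle \]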



for all x such that f(x) < ∞. Observe that this implies that f(x0) c<strong>an</strong>not<br />

be finite. Now let α > 0. Since f ∗ is proper we c<strong>an</strong> choose y0 such that<br />

f ∗ (y0) < ∞. One has<br />

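\[ f^*(y_0+\alpha y) = \sup_x \{\langle x, y_0+\alpha y\rangle - f(x)\} \le \sup_x \{\langle x, y_0\rangle - f(x)\} + \alpha \sup_{x:\,f(x)<\infty} \langle x, y\rangle \le f^*(y_0) + \alpha\gamma. \]

But then

\[ f^{**}(x_0) \ge \langle x_0, y_0+\alpha y\rangle - f^*(y_0+\alpha y) \ge \langle x_0, y_0\rangle - f^*(y_0) + \alpha\big(\langle x_0, y\rangle - \gamma\big). \]

Since 〈x0, y〉 > γ, letting α → ∞ gives f ∗∗ (x0) = ∞, which contradicts the choice of x0.

[…] > γ/s − 〈x, y/s〉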



for all x such that f(x) < ∞ and thus, in fact, for all x. Since the right-hand side is convex and continuous in x, this implies that

t0 ≥ f ∗∗ (x0) ≥ γ/s − 〈x0, y/s〉<br />

which c<strong>on</strong>tradicts the separati<strong>on</strong>. So s = 0 <strong>an</strong>d<br />

〈x, y〉 < γ < 〈x0, y〉<br />

for all x such that f(x) < ∞. Thus f(x) is infinite for all x <strong>with</strong> 〈x, y〉 > γ.<br />

But resetting f ∗∗ <strong>to</strong> infinity <strong>on</strong> this regi<strong>on</strong> creates a c<strong>on</strong>vex lower semic<strong>on</strong>tinuous<br />

minor<strong>an</strong>t of f. Thus f ∗∗ is also infinite <strong>on</strong> this regi<strong>on</strong> <strong>an</strong>d in particular<br />

at x0. This c<strong>on</strong>tradicts the choice of x0. <br />

If f is already c<strong>on</strong>vex, then f ∗∗ is nothing but the lower semic<strong>on</strong>tinuous<br />

regularizati<strong>on</strong> of f, which according <strong>to</strong> Lemma 2.10 is the maximal lower<br />

semic<strong>on</strong>tinuous minor<strong>an</strong>t of f.<br />

Theorem 5.18. C<strong>on</strong>sider f : X → (−∞, ∞] that is not identically infinite.<br />

Recall the lower semicontinuous regularization

\[ f_{\mathrm{lsc}}(x) = \sup\Big\{ \inf_G f : x \in G \text{ and } G \text{ is open} \Big\}. \]

If f is c<strong>on</strong>vex, then flsc = f ∗∗ .<br />

Proof. Note that if f ∗∗ takes the value −∞, then f ∗∗∗ = ∞ <strong>an</strong>d thus<br />

f ∗ = ∞ <strong>an</strong>d f ∗∗ = −∞. By Lemma 5.17, this implies that the closure of<br />

the epigraph of f is all of X ×R <strong>an</strong>d thus, by Lemma 2.11, that the epigraph<br />

of flsc is all of X × R <strong>an</strong>d flsc = −∞ = f ∗∗ . We c<strong>an</strong>, therefore, assume that<br />

f ∗∗ > −∞.<br />

Since f ∗∗ ≤ f <strong>an</strong>d f ∗∗ is lower semic<strong>on</strong>tinuous Lemma 2.10 implies then<br />

that f ∗∗ ≤ flsc. In particular, flsc > −∞.<br />

If we now show that flsc is c<strong>on</strong>vex, then Corollary 5.15 implies that<br />

flsc ≤ f ∗∗ . Let G be <strong>an</strong> open neighborhood of αx1 + (1 − α)x2 such that<br />

infG f > −∞. By the c<strong>on</strong>tinuity of the map (x1, x2) ↦→ αx1 + (1 − α)x2<br />

there exists a neighborhood G1 of x1 <strong>an</strong>d a neighborhood G2 of x2 such that<br />

αG1 + (1 − α)G2 ⊂ G. Then,<br />

\[ \inf_G f \le \inf_{z_1\in G_1,\, z_2\in G_2} f(\alpha z_1 + (1-\alpha)z_2) \le \alpha \inf_{G_1} f + (1-\alpha) \inf_{G_2} f \le \alpha f_{\mathrm{lsc}}(x_1) + (1-\alpha) f_{\mathrm{lsc}}(x_2). \]

Taking supremum over G proves the c<strong>on</strong>vexity of flsc <strong>an</strong>d c<strong>on</strong>cludes the<br />

proof of the theorem. <br />

Let us now go back <strong>to</strong> the picture of f ∗∗ as the supremum of affine<br />

minor<strong>an</strong>ts of f. To be able <strong>to</strong> visualize things, let us look at the case X = R d .<br />

Let f : R d → R be a c<strong>on</strong>vex functi<strong>on</strong> <strong>an</strong>d suppose f ∗ (y) = y · x − f(x) for



some x <strong>an</strong>d y in R d . Then, for all u, y·u−f(u) ≤ y·x−f(x) or, equivalently,<br />

f(u) ≥ f(x) + y · (u − x). This says that the graph {(u, f(u)) : u ∈ R d } ⊂<br />

R d+1 has a t<strong>an</strong>gent hyperpl<strong>an</strong>e at (x, f(x)) <strong>with</strong> normal (y, −1). If f is<br />

differentiable, then y is unique <strong>an</strong>d equal <strong>to</strong> ∇f(x). Otherwise there may<br />

be several t<strong>an</strong>gent hyperpl<strong>an</strong>es at (x, f(x)). This discussi<strong>on</strong> remains valid<br />

in the more general setting of this secti<strong>on</strong>.<br />

Definiti<strong>on</strong> 5.19. Let f : X → [−∞, ∞]. The subdifferential of f at x is<br />

the multivalued mapping ∂f : X → Y defined by<br />

∂f(x) = {y ∈ Y : ∀u ∈ X , f(u) ≥ f(x) + 〈u − x, y〉}.<br />

Exercise 5.20. Prove that if f is not identically ∞ and f(x) = ∞, then ∂f(x) = ∅. Then prove that if f is convex and proper and if x is in the interior of {u : f(u) < ∞}, then ∂f(x) ≠ ∅. Find a counterexample where f(x) < ∞ but ∂f(x) = ∅.

Hint: Use the Hahn-B<strong>an</strong>ach separati<strong>on</strong> theorem.<br />

Exercise 5.21. Let X = R d <strong>an</strong>d assume f is c<strong>on</strong>vex, proper, <strong>an</strong>d differentiable<br />

at x. Prove that ∂f(x) = {∇f(x)}.<br />

Theorem 5.22. Let f : X → (−∞, ∞] be not identically infinite. Then the<br />

following are equivalent.<br />

(a) y ∈ ∂f(x);<br />

(b) f(x) + f ∗ (y) ≤ 〈x, y〉;<br />

(c) f(x) + f ∗ (y) = 〈x, y〉.<br />

If f is also c<strong>on</strong>vex <strong>an</strong>d lower semic<strong>on</strong>tinuous, then (a)-(c) are equivalent <strong>to</strong><br />

(d) x ∈ ∂f ∗ (y);<br />

Proof. (b) implies (c) by the Fenchel-Young inequality (Exercise 5.11). (c)<br />

implies (a) implies (b) by the definiti<strong>on</strong> of f ∗ . Applying the equivalence of<br />

(a) <strong>an</strong>d (c) <strong>to</strong> f ∗ <strong>on</strong>e sees that (d) is equivalent <strong>to</strong> f ∗ (y) + f ∗∗ (x) = 〈x, y〉.<br />

But f = f ∗∗ if f is c<strong>on</strong>vex, proper, <strong>an</strong>d lower semic<strong>on</strong>tinuous. Hence (d) is<br />

equivalent <strong>to</strong> (c). <br />
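A small numerical illustration of Theorem 5.22 may help. The sketch below is an assumption-laden example, not part of the text: it takes X = Y = R, the function f(x) = |x| (whose conjugate is 0 on [−1, 1] and ∞ elsewhere), and tests the subgradient inequality only on a finite set of points u, which is a necessary check rather than a proof. The helper names are hypothetical.

```python
def in_subdifferential(f, x, y, test_points, tol=1e-12):
    """Check the subgradient inequality f(u) >= f(x) + (u - x)*y on a finite set of u."""
    return all(f(u) >= f(x) + (u - x) * y - tol for u in test_points)


def abs_conjugate(y):
    """Conjugate of f(x) = |x|: 0 on [-1, 1] and +infinity elsewhere."""
    return 0.0 if abs(y) <= 1.0 else float("inf")


us = [i / 10.0 for i in range(-50, 51)]

# At the kink x = 0 every slope in [-1, 1] passes the test; slopes +-1.5 do not.
candidates = (-1.5, -1.0, -0.3, 0.0, 0.7, 1.0, 1.5)
at_kink = [y for y in candidates if in_subdifferential(abs, 0.0, y, us)]


def equality_holds(x, y):
    """Condition (c) of Theorem 5.22 for f = |.|: f(x) + f*(y) == x*y."""
    return abs(x) + abs_conjugate(y) == x * y
```

For instance, `equality_holds(0.0, 0.7)` and `equality_holds(2.0, 1.0)` are true, matching ∂f(0) = [−1, 1] and ∂f(2) = {1}, while `equality_holds(2.0, 0.5)` is false.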

Exercise 5.23. Let f be a finite c<strong>on</strong>vex functi<strong>on</strong> <strong>on</strong> (a, b). The following<br />

properties follow quickly from the definiti<strong>on</strong>.<br />

(a) ∂f(x) is a n<strong>on</strong>empty closed interval. f ′ (x) exists if <strong>an</strong>d <strong>on</strong>ly if<br />

∂f(x) is a singlet<strong>on</strong>, <strong>an</strong>d then ∂f(x) = {f ′ (x)}.<br />

(b) Let x ≠ y and α ∈ ∂f(x). Then f(y) = f(x) + α(y − x) implies that α ∈ ∂f(y).



(c) If x < y, α ∈ ∂f(x), and β ∈ ∂f(y) then

\[ \alpha \le \frac{f(y)-f(x)}{y-x} \le \beta. \]

∂f(x) <strong>an</strong>d ∂f(y) have at most <strong>on</strong>e point in comm<strong>on</strong>. If that happens<br />

then this point is <strong>an</strong> endpoint for both sets <strong>an</strong>d over [x, y] the<br />

graph of f is a line segment.<br />

(d) Suppose xj > x, αj ∈ ∂f(xj) <strong>an</strong>d xj → x. Then αj → sup ∂f(x).<br />

5.2. Rate functi<strong>on</strong> as a c<strong>on</strong>vex c<strong>on</strong>jugate<br />

We will now prove a general theorem about the upper large deviati<strong>on</strong> bound<br />

(2.3) for compact sets.<br />

Let X <strong>an</strong>d Y be two vec<strong>to</strong>r spaces in duality <strong>an</strong>d let X be <strong>to</strong>pologized<br />

by σ(X , Y). Let E be a closed c<strong>on</strong>vex subset of X . We are given a sequence<br />

of probability measures {µn} ⊂ M1(E). Define

\[ \bar p(y) = \limsup_{n\to\infty} \frac{1}{r_n} \log \int_E e^{r_n\langle x, y\rangle}\, \mu_n(dx) \in [-\infty, \infty] \quad \text{for } y \in \mathcal{Y} \]

and

\[ \bar p^*(x) = \sup_{y\in\mathcal{Y}} \{\langle x, y\rangle - \bar p(y)\} \in [0, \infty] \quad \text{for } x \in \mathcal{X}. \]

Theorem 5.24. ¯p is convex and ¯p ∗ : X → [0, ∞] is convex and lower semicontinuous. Moreover, for any closed compact subset F ⊂ E,

\[ \limsup_{n\to\infty} \frac{1}{r_n} \log \mu_n(F) \le -\inf_F \bar p^*. \]

Proof. ¯p is c<strong>on</strong>vex by Hölder’s inequality (see page 15 of [15] or page 39<br />

of [26]). N<strong>on</strong>negativity of ¯p ∗ is a c<strong>on</strong>sequence of ¯p(0) = 0 <strong>an</strong>d its lower<br />

semic<strong>on</strong>tinuity <strong>an</strong>d c<strong>on</strong>vexity are satisfied because it is a c<strong>on</strong>vex c<strong>on</strong>jugate.<br />

Let us thus prove the upper bound.<br />

The proof is similar <strong>to</strong> <strong>on</strong>es we have already seen a few times. Let δ > 0<br />

<strong>an</strong>d c < infF ¯p ∗ . For each x ∈ F , there is y ∈ Y such that 〈x, y〉 − ¯p(y) > c.<br />

Let Bx = {u ∈ X : |〈u, y〉 − 〈x, y〉| < δ}, <strong>an</strong> open neighborhood of x. Since<br />

F is compact, cover it <strong>with</strong> Bx1 , . . . , Bxm, <strong>with</strong> corresp<strong>on</strong>ding y1, . . . , ym.<br />

Write

\[ \mu_n(B_{x_i}) = \int_{B_{x_i}} e^{-r_n\langle u, y_i\rangle + r_n\langle u, y_i\rangle}\, \mu_n(du) \le e^{r_n(-\langle x_i, y_i\rangle + \delta)} \int_{B_{x_i}} e^{r_n\langle u, y_i\rangle}\, \mu_n(du) \]

and thus

\[ \limsup_{n\to\infty} \frac{1}{r_n} \log \mu_n(B_{x_i}) \le -\langle x_i, y_i\rangle + \delta + \bar p(y_i) \le -c + \delta. \]



Consequently,

\[ \limsup_{n\to\infty} \frac{1}{r_n} \log \mu_n(F) \le \max_{1\le i\le m} \limsup_{n\to\infty} \frac{1}{r_n} \log \mu_n(B_{x_i}) \le -c + \delta. \]

Now take δ <strong>to</strong> 0 <strong>an</strong>d c <strong>to</strong> infF ¯p ∗ . <br />

If <strong>an</strong> LDP holds, Varadh<strong>an</strong>’s theorem (page 32) gives a sufficient c<strong>on</strong>diti<strong>on</strong><br />

for the existence of the limit

\[ p(y) = \lim_{n\to\infty} \frac{1}{r_n} \log \int e^{r_n\langle x, y\rangle}\, \mu_n(dx). \tag{5.1} \]

This functi<strong>on</strong> is sometimes called the pressure. By the previous theorem,<br />

p ∗ is a c<strong>an</strong>didate for the rate functi<strong>on</strong>. So when is it the case that p ∗ = I?<br />

The next theorem says that this holds when I is c<strong>on</strong>vex.<br />

Theorem 5.25. Assume LDP(µn, rn, I) holds on E and I is convex. If we extend I to X by setting it to ∞ outside E, then I is convex, lower semicontinuous on X , and LDP(µn, rn, I) holds on X . If, moreover,

\[ \sup_n \Big( \int_E e^{r_n\langle x, y\rangle}\, \mu_n(dx) \Big)^{1/r_n} < \infty \quad \text{for all } y \in \mathcal{Y}, \tag{5.2} \]

then the limit in (5.1) exists for all y ∈ Y, and is equal to

\[ p(y) = \sup_{x\in E} \{\langle x, y\rangle - I(x)\}. \]

For the extended I, we have p = I ∗ and I = p ∗ .

∗ Exercise 5.26. Prove the theorem.<br />

Hint: Use the c<strong>on</strong>tracti<strong>on</strong> principle <strong>an</strong>d Varadh<strong>an</strong>’s theorem.<br />

The above theorem gives us a c<strong>on</strong>vex <strong>an</strong>alytic characterizati<strong>on</strong> of the<br />

zeroes of the rate functi<strong>on</strong>:<br />

I(x) = 0 ⇐⇒ p ∗ (x) = 0 ⇐⇒ p ∗ (x) + p(0) = 0 = 〈x, 0〉 ⇐⇒ x ∈ ∂p(0).<br />

In other words, the set of zeroes is the subdifferential of the pressure at 0.<br />
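As a quick worked illustration, consider (as an assumption made only for this example) sample means of i.i.d. Bernoulli(q) variables with rn = n, the setting of Section 5.3, where the pressure has a closed form. There p is differentiable at 0, so by Exercise 5.21 the subdifferential at 0 is a single point:

\[ p(\theta) = \log(1 - q + q e^{\theta}), \qquad p'(\theta) = \frac{q e^{\theta}}{1 - q + q e^{\theta}}, \qquad \partial p(0) = \{p'(0)\} = \{q\}. \]

So in this example the rate function vanishes only at the mean q, in agreement with the law of large numbers.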

We have already seen in Secti<strong>on</strong> 4.3 (Exercise 4.17(c)) that rate functi<strong>on</strong>s<br />

do not have <strong>to</strong> be c<strong>on</strong>vex. Here is <strong>an</strong> exercise showing how easy it is <strong>to</strong><br />

c<strong>on</strong>struct such rate functi<strong>on</strong>s.<br />

Exercise 5.27. Let µn <strong>an</strong>d νn be probability measures <strong>on</strong> X such that<br />

LDP(µn, rn, I) <strong>an</strong>d LDP(νn, rn, J) hold. Prove that for <strong>an</strong>y α ∈ (0, 1),<br />

LDP(αµn + (1 − α)νn, rn, min(I, J)) holds. Observe that the new rate functi<strong>on</strong><br />

may fail <strong>to</strong> be c<strong>on</strong>vex even if I <strong>an</strong>d J are c<strong>on</strong>vex.<br />
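The last remark of the exercise is easy to check on a concrete pair. The sketch below uses two hypothetical quadratic rate functions (the kind that arise for Gaussian sample means centered at 1 and at −1) and tests midpoint convexity, in the sense of Exercise 5.9, for their minimum. The names `I`, `J`, `K` are illustrative.

```python
def I(x):
    return 0.5 * (x - 1.0) ** 2   # convex, zero at x = 1


def J(x):
    return 0.5 * (x + 1.0) ** 2   # convex, zero at x = -1


def K(x):
    return min(I(x), J(x))        # candidate rate function of the mixture


# Midpoint convexity would require K(0) <= (K(-1) + K(1)) / 2.
midpoint_value = K(0.0)
chord_value = 0.5 * (K(-1.0) + K(1.0))
is_midpoint_convex_here = midpoint_value <= chord_value
```

Here K vanishes at both −1 and 1 but is strictly positive at their midpoint, so K cannot be convex even though I and J are.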

The next exercise observes that superexp<strong>on</strong>ential c<strong>on</strong>vergence in the<br />

LDP corresp<strong>on</strong>ds <strong>to</strong> a linear pressure functi<strong>on</strong>al.



Exercise 5.28. Let {ηn} be R d -valued r<strong>an</strong>dom variables <strong>an</strong>d m ∈ R d . Assume<br />

{ηn} are uniformly bounded. Prove that the following are equivalent.<br />

(a) ∀ε > 0, $\lim_{n\to\infty} r_n^{-1} \log P\{|\eta_n - m| \ge \varepsilon\} = -\infty$.

(b) ∀t ∈ R^d, $p(t) = \lim_{n\to\infty} r_n^{-1} \log E[e^{r_n t\cdot\eta_n}] = t \cdot m$.

(c) LDP(P {ηn ∈ ·}, rn, I) holds with I(m) = 0 and I(x) = ∞ for x ≠ m.

Note that boundedness is only needed for (c)⇒(b).

Hint: To prove that (b) implies (a) find finitely many t1, . . . , tk ∈ R^d, constants c1, . . . , ck, and δ > 0 such that $\{x : |x - m| \ge \varepsilon\} \subset \bigcup_{i=1}^{k} \{x : x \cdot t_i \ge c_i\}$ and m · ti + δ ≤ ci for all i.

If (5.2) is violated, then even when all limits exist, Varadh<strong>an</strong>’s theorem<br />

<strong>an</strong>d the c<strong>on</strong>vex c<strong>on</strong>jugate representati<strong>on</strong> of the rate may fail. C<strong>on</strong>sequently,<br />

the general upper bound may not be optimal.<br />

Exercise 5.29. (page 123 of [8]) Let 0 < pn < 1/2, bn an arbitrary sequence of real numbers, and µn ∈ M1(R) such that µn{0} = 1 − 2pn and µn{bn} = µn{−bn} = pn. Prove the following.

(a) If $n^{-1} \log p_n \to -\infty$, then LDP(µn, n, I) holds with the proper convex lower semicontinuous rate I such that I(0) = 0 and I(x) = ∞ for x ≠ 0.

(b) If furthermore $b_n = n^{-1} \log p_n$, then (5.2) does not hold and

\[ p(t) = \lim_{n\to\infty} n^{-1} \log \int e^{ntx}\, \mu_n(dx) \]

exists for t ∈ R, but p ∗ (x) < I(x) for all x ≠ 0, and consequently p ∗ ≠ I and p ≠ I ∗ .

5.3. Multidimensi<strong>on</strong>al Cramér theorem revisited<br />

In this secti<strong>on</strong> we establish the final form of Cramér’s theorem in R d . C<strong>on</strong>vexity<br />

figures prominently in the proof. Let us recall the setting. X <strong>an</strong>d<br />

{Xn} are i.i.d. R d -valued r<strong>an</strong>dom variables, Sn = X1 + · · · + Xn, <strong>an</strong>d<br />

µn(B) = P {Sn/n ∈ B} is the law of the sample me<strong>an</strong>. The moment generating<br />

functi<strong>on</strong> is M(θ) = E[e θ·X ], <strong>an</strong>d<br />

p(θ) = log M(θ)<br />

is a (−∞, ∞]-valued c<strong>on</strong>vex, lower semic<strong>on</strong>tinuous functi<strong>on</strong> <strong>on</strong> R d . Lower<br />

semic<strong>on</strong>tinuity comes from Fa<strong>to</strong>u’s lemma (see, e.g. page 16 of [15] or page<br />

45 of [26]).



Cramér’s theorem <strong>on</strong> R d . Assume M(θ) < ∞ in <strong>an</strong> open neighborhood of<br />

the origin. Then LDP(µn, n, I) holds <strong>with</strong> c<strong>on</strong>vex, tight rate functi<strong>on</strong> I = p ∗ .<br />

Remark 5.30. With more work <strong>on</strong>e c<strong>an</strong> show that weak LDP(µn, n, p ∗ )<br />

holds even <strong>with</strong>out the finiteness assumpti<strong>on</strong> <strong>on</strong> M (Corollary 6.1.6 of [8]).<br />

Proof. This proof gives us the opportunity <strong>to</strong> display <strong>an</strong>other method for<br />

obtaining LDPs, namely subadditivity.<br />

Step 1. We show that there exists a c<strong>on</strong>vex, tight rate functi<strong>on</strong> I such<br />

that LDP(µn, n, I) holds.<br />

This is where subadditivity comes in. We claim that for each open ball<br />

B ⊂ R^d, the limit

\[ j(B) = -\lim_{n\to\infty} n^{-1} \log P\{S_n/n \in B\} \in [0, \infty] \tag{5.3} \]

exists. Any convex open set would in fact work but we need only balls. Let $a_n = -\log P\{S_n/n \in B\}$. We establish two properties for this sequence, namely

\[ \text{either } a_n = \infty\ \ \forall n, \text{ or } \exists N < \infty \text{ such that } a_n < \infty\ \ \forall n \ge N \tag{5.4} \]

and subadditivity

\[ a_{m+n} \le a_m + a_n. \tag{5.5} \]

To check (5.4), suppose there exists k such that P {Sk/k ∈ B} > 0. This<br />

k is kept fixed while we verify (5.4). Since B is the uni<strong>on</strong> of countably m<strong>an</strong>y<br />

closed balls, we c<strong>an</strong> find a closed ball K ⊂ B such that P {Sk/k ∈ K} > 0.<br />

Fix ε > 0 smaller th<strong>an</strong> the dist<strong>an</strong>ce from K <strong>to</strong> the complement of B so that<br />

x ∈ K <strong>an</strong>d |y| < ε imply x + y ∈ B.<br />

For this proof it is c<strong>on</strong>venient <strong>to</strong> use the abbreviati<strong>on</strong> Sm,n = Xm+1 +<br />

Xm+2 + · · · + Xn. Note that Sm,n has the same distributi<strong>on</strong> as Sn−m.<br />

Decompose <strong>an</strong>y n as n = mk + ℓ <strong>with</strong> 0 ≤ ℓ < k, <strong>an</strong>d then write<br />

Smk+ℓ = S0,k + Sk,2k + · · · + S (m−1)k,mk + Smk,mk+ℓ.<br />

The terms above are independent <strong>an</strong>d the first m also identically distributed.<br />

Note also the inequality

\[ \Big| \frac{S_n}{n} - \frac{S_{mk}}{mk} \Big| = \Big| \frac{\ell S_{mk}}{nmk} - \frac{S_{mk,mk+\ell}}{n} \Big| \le \frac{|S_{mk}|}{mn} + \frac{|S_{mk,mk+\ell}|}{n}. \]

∗ Exercise 5.31. Apply an exponential Chebyshev inequality together with the assumption that M(θ) < ∞ in a neighborhood of the origin to show that there exists N0 such that

\[ \max_{0\le \ell<k} P\{|S_\ell| \ge n\varepsilon/2\} \le \frac{1}{2} \quad \text{for all } n \ge N_0. \]



Imagine a large enough radius r such that K ⊂ {x : |x| < r}. Then $(mk)^{-1} S_{mk} \in K$ implies $|(mk)^{-1} S_{mk}| < n\varepsilon/(2k)$ if n > 2kr/ε. Fix any N > N0 ∨ (2kr/ε). Now we estimate the probability from below. First by the choice of ε, and then by independence,

\begin{align*}
P\{S_n/n \in B\} &\ge P\Big\{ \frac{S_{mk}}{mk} \in K,\ \Big| \frac{S_n}{n} - \frac{S_{mk}}{mk} \Big| < \varepsilon \Big\} \\
&\ge P\Big\{ \frac{S_{mk}}{mk} \in K,\ \frac{|S_{mk}|}{mn} < \frac{\varepsilon}{2},\ \frac{|S_{mk,mk+\ell}|}{n} < \frac{\varepsilon}{2} \Big\} \\
&= P\Big\{ \frac{S_{mk}}{mk} \in K,\ \frac{|S_{mk}|}{mk} < \frac{n\varepsilon}{2k} \Big\}\, P\Big\{ \frac{|S_\ell|}{n} < \frac{\varepsilon}{2} \Big\} \\
&\ge P\Big\{ \frac{S_{mk}}{mk} \in K \Big\} \cdot \frac{1}{2} \qquad \text{for } n \ge N.
\end{align*}

To prove (5.4) we use convexity of K and the i.i.d. assumption. Let now n ≥ N.

\begin{align*}
P\{S_n/n \in B\} &\ge \tfrac{1}{2} \cdot P\{(mk)^{-1} S_{mk} \in K\} \\
&\ge \tfrac{1}{2} \cdot P\{k^{-1} S_{0,k} \in K,\ k^{-1} S_{k,2k} \in K,\ \dots,\ k^{-1} S_{(m-1)k,mk} \in K\} \\
&= \tfrac{1}{2} \cdot P\{k^{-1} S_k \in K\}^m > 0.
\end{align*}

Property (5.4) has been verified.<br />

Similarly, (5.5) follows immediately from c<strong>on</strong>vexity of B <strong>an</strong>d the i.i.d.<br />

assumpti<strong>on</strong>:<br />

\begin{align*}
a_{m+n} &= -\log P\{(m+n)^{-1} S_{m+n} \in B\} \\
&\le -\log P\{m^{-1} S_m \in B,\ n^{-1} S_{m,m+n} \in B\} \\
&= -\log P\{S_m/m \in B\} - \log P\{S_n/n \in B\} \\
&= a_m + a_n.
\end{align*}

Note that the inequalities above work even if some probabilities v<strong>an</strong>ish.<br />
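Properties (5.4) and (5.5) can be observed directly in a one-dimensional toy case. The sketch below assumes Bernoulli(0.3) steps and the open interval B = (0.4, 0.8), both chosen only for illustration, and computes a_n exactly from the binomial distribution. It needs Python 3.8 or later for `math.comb`.

```python
import math


def a(n, q=0.3, lo=0.4, hi=0.8):
    """a_n = -log P{ lo < S_n/n < hi } for S_n ~ Binomial(n, q); +inf if the probability is 0."""
    prob = sum(
        math.comb(n, k) * q**k * (1.0 - q) ** (n - k)
        for k in range(n + 1)
        if lo < k / n < hi
    )
    return -math.log(prob) if prob > 0.0 else math.inf


N_MAX = 60
seq = [None] + [a(n) for n in range(1, N_MAX + 1)]   # seq[n] holds a_n

# Property (5.4): a_1 is infinite here, but a_n is finite from n = 2 on.
eventually_finite = all(math.isfinite(seq[n]) for n in range(2, N_MAX + 1))

# Property (5.5): a_{m+n} <= a_m + a_n, with a small tolerance for rounding.
subadditive = all(
    seq[m + n] <= seq[m] + seq[n] + 1e-9
    for m in range(1, N_MAX)
    for n in range(1, N_MAX - m + 1)
)
```

The case a_1 = ∞ shows why (5.4) is stated as "eventually finite": with a single step the sample mean is 0 or 1 and never lands in B, and subadditivity still holds because the right-hand side is then infinite.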

With these properties checked we c<strong>an</strong> apply the next classic fact. In<br />

Secti<strong>on</strong> 7.2 we generalize it suitably <strong>to</strong> a multidimensi<strong>on</strong>al index set.<br />

Fekete’s lemma. Let $(a_n)_{n\ge 1}$ be a sequence in (−∞, ∞] with properties (5.4)–(5.5). Then

\[ \lim_{n\to\infty} \frac{a_n}{n} = \inf_n \frac{a_n}{n} \]

with values in [−∞, ∞].
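A deterministic example makes the statement concrete. The sketch below uses the hypothetical sequence a_n = n + √n, which is subadditive because √(m + n) ≤ √m + √n, and compares a_n/n with the running infimum. The limit and the infimum over all n are both 1 in this case.

```python
import math


def a(n):
    # Hypothetical subadditive sequence: sqrt(m + n) <= sqrt(m) + sqrt(n).
    return n + math.sqrt(n)


N = 2000
ratios = [a(n) / n for n in range(1, N + 1)]

# Spot-check subadditivity a(m + n) <= a(m) + a(n) on a range of pairs.
subadditive = all(
    a(m + n) <= a(m) + a(n) + 1e-9
    for m in range(1, 60)
    for n in range(1, 60)
)

running_inf = min(ratios)   # infimum of a_n / n over n <= N
last_ratio = ratios[-1]     # a_N / N = 1 + 1/sqrt(N), already close to the limit 1
```

Since a_n/n = 1 + 1/√n decreases to 1, the running infimum coincides with the last ratio computed, which is the behavior the lemma predicts in the limit.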

Proof. For the identically infinite case the claimed property is trivially true.<br />

So only the case where the sequence is eventually finite needs proof. The inequality

\[ \liminf_{n\to\infty} \frac{a_n}{n} \ge \inf_n \frac{a_n}{n} \]



needs no proof.<br />

Fix any k ∈ N such that $a_k < \infty$. Fix m0 such that m0k ≥ N for N from assumption (5.4). Consider n > m0k and write n = mk + m0k + ℓ for some 0 ≤ ℓ < k. Subadditivity gives

\[ a_n \le m a_k + a_{m_0k+\ell} \le m a_k + \max_{j:\, m_0k \le j < (m_0+1)k} a_j. \]

The maximum is finite by (5.4) and does not depend on n. Dividing by n and letting n → ∞, so that m/n → 1/k, gives $\limsup_{n\to\infty} a_n/n \le a_k/k$. Taking the infimum over k completes the proof.



From Lemma 4.7 we get one inequality for free:

\[ p(\theta) \ge I^*(\theta). \tag{5.7} \]

For the trickier half observe that for an open ball B and any finite n,

\[ n^{-1} \log P\{S_n/n \in B\} \le -j(B) \le -\inf_{x\in\bar B} I(x). \tag{5.8} \]

The first inequality is the subadditivity. In the second $\bar B$ is the compact closure of B, and the inequality comes from compactness. Take a number $c < \inf_{x\in\bar B} I(x)$. For each $x \in \bar B$ pick an open ball $G_x \ni x$ such that $j(G_x) > c$. Cover $\bar B$ with finitely many of these balls, say G1, . . . , Gm. The simple union bound

\[ P\{S_n/n \in B\} \le \sum_{i=1}^{m} P\{S_n/n \in G_i\} \]

becomes in the limit $j(B) \ge \min_i j(G_i) > c$.

Let us first restrict the calculation of p(θ) to a δ-ball around a point z ∈ R^d and develop an estimate for it.

\begin{align*}
n^{-1} \log E\big[ e^{\theta\cdot S_n},\ n^{-1} S_n \in B(z,\delta) \big]
&\le \theta\cdot z + |\theta|\delta + n^{-1} \log P\{S_n/n \in B(z,\delta)\} \\
&\le \theta\cdot z + |\theta|\delta - \inf_{x\in B(z,\delta)} I(x) \\
&\le \sup_{x\in B(z,\delta)} \{\theta\cdot x - I(x)\} + 2|\theta|\delta.
\end{align*}

Next we go from small balls <strong>to</strong> a large ball of radius R centered at the origin.<br />

We cover the R-ball <strong>with</strong> m δ-balls B(zi, δ). There is a c<strong>on</strong>st<strong>an</strong>t C such that<br />

$m \le (CR/\delta)^d$ for all R > 0. Then

\begin{align*}
n^{-1} \log E\big[ e^{\theta\cdot S_n},\ |n^{-1} S_n| \le R \big]
&\le n^{-1} d \log(CR/\delta) + \max_{1\le i\le m} n^{-1} \log E\big[ e^{\theta\cdot S_n},\ n^{-1} S_n \in B(z_i,\delta) \big] \\
&\le n^{-1} d \log(CR/\delta) + I^*(\theta) + 2|\theta|\delta.
\end{align*}

Define

\[ p(\theta, R) = \limsup_{n\to\infty} n^{-1} \log E\big[ e^{\theta\cdot S_n},\ |n^{-1} S_n| \le R \big]. \]

Then from above we have the bound

\[ p(\theta, R) \le I^*(\theta) + 2|\theta|\delta. \]



It remains <strong>to</strong> argue that p(θ, R) is a good approximati<strong>on</strong> <strong>to</strong> the true p(θ).<br />

Independence makes this easy:

\begin{align*}
n^{-1} \log E\big[ e^{\theta\cdot S_n},\ |n^{-1} S_n| \le R \big]
&\ge n^{-1} \log E\big[ e^{\theta\cdot S_n},\ |X_i| \le R\ \ \forall i \in \{1,\dots,n\} \big] \\
&= n^{-1} \log \prod_{i=1}^{n} E\big[ e^{\theta\cdot X_i}\, \mathbf{1}\{|X_i| \le R\} \big] \\
&= \log E\big[ e^{\theta\cdot X},\ |X| \le R \big].
\end{align*}

Combining the last two displays gives

\[ \log E\big[ e^{\theta\cdot X},\ |X| \le R \big] \le I^*(\theta) + 2|\theta|\delta. \]

Take δ ↘ 0 <strong>an</strong>d R ↗ ∞ <strong>to</strong> get p(θ) ≤ I ∗ (θ). Together <strong>with</strong> (5.7) we<br />

have p = I ∗ . Since I is c<strong>on</strong>vex, lower semic<strong>on</strong>tinuous, <strong>an</strong>d not identically<br />

infinite (the upper bound guar<strong>an</strong>tees that a rate functi<strong>on</strong> has inf I = 0),<br />

I = I ∗∗ = p ∗ . This completes the proof of Cramér’s Theorem.<br />

Let us observe that Step 2 is not needed if we assume M(θ) < ∞ for all<br />

θ ∈ R d . For in this case I = p ∗ follows from Theorem 5.25. Furnishing the<br />

details for this is a good exercise for the reader who sees this material for<br />

the first time. <br />
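For a case where everything is explicit, the identity I = p ∗ can be checked numerically. The sketch below assumes Bernoulli(q) steps with the hypothetical value q = 0.3, approximates the supremum in p ∗ by a maximum over a grid of θ values, and compares the result with the closed form of the Bernoulli rate function (the function Ip recalled at the start of Chapter 6).

```python
import math

q = 0.3  # hypothetical success probability of the Bernoulli steps


def pressure(theta):
    """p(theta) = log E[exp(theta * X)] for X ~ Bernoulli(q)."""
    return math.log(1.0 - q + q * math.exp(theta))


def rate_closed_form(s):
    """I_q(s) = s log(s/q) + (1-s) log((1-s)/(1-q)) for 0 < s < 1."""
    return s * math.log(s / q) + (1.0 - s) * math.log((1.0 - s) / (1.0 - q))


thetas = [i / 1000.0 for i in range(-10000, 10001)]   # grid on [-10, 10]


def rate_numeric(s):
    """Grid approximation of p*(s) = sup_theta { theta*s - p(theta) }."""
    return max(theta * s - pressure(theta) for theta in thetas)


# Triples (s, numerical p*(s), closed-form I_q(s)) at a few sample points.
checks = [(s, rate_numeric(s), rate_closed_form(s)) for s in (0.1, 0.3, 0.5, 0.8)]
```

The grid maximum can only underestimate the supremum, so the numerical values sit slightly below the closed form; at s = q both are zero, consistent with the rate function vanishing at the mean.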

Exercise 5.32. A process {Xn} is exch<strong>an</strong>geable if its distributi<strong>on</strong> is invari<strong>an</strong>t<br />

under finite permutati<strong>on</strong>s of the r<strong>an</strong>dom variables. According <strong>to</strong> de<br />

Finetti’s theorem (page 174), <strong>an</strong> infinite exch<strong>an</strong>geable sequence is a mixture<br />

of i.i.d. sequences. Use this, Cramér’s theorem <strong>an</strong>d either Bryc’s theorem or<br />

Exercise 5.27 <strong>to</strong> derive <strong>an</strong> LDP for the sample me<strong>an</strong> Sn/n of <strong>an</strong> exch<strong>an</strong>geable<br />

process under some suitable hypotheses.


Chapter 6

Relative entropy and large deviations for empirical measures

Exercise 4.4 showed that if {Xk} are i.i.d. Bernoulli random variables, then the laws of the empirical measures

\[ L_n = \frac{1}{n} \sum_{k=1}^{n} \delta_{X_k} \]

satisfy a large deviation principle with the rate function H : M1([0, 1]) → [0, ∞] given by

\[ H(\alpha) = I_p(s) \quad \text{for } \alpha = s\delta_1 + (1-s)\delta_0 \text{ with } s \in [0, 1], \tag{6.1} \]

where $I_p(s) = s \log\frac{s}{p} + (1-s) \log\frac{1-s}{1-p}$. Recall also the direct relation between

Ip <strong>an</strong>d both thermodynamic <strong>an</strong>d informati<strong>on</strong>-theoretic entropies; see<br />

Secti<strong>on</strong>s 1.2 <strong>an</strong>d 1.1.<br />

In this chapter, we will introduce the entropy of <strong>on</strong>e measure relative <strong>to</strong><br />

<strong>an</strong>other, study its properties, <strong>an</strong>d see how it is related <strong>to</strong> large deviati<strong>on</strong>s<br />

for empirical measures. The above H will then be seen <strong>to</strong> be the entropy of<br />

α relative <strong>to</strong> BER(p).<br />

Once <strong>on</strong>e has a large deviati<strong>on</strong> principle for empirical measures, <strong>on</strong>e c<strong>an</strong><br />

use the c<strong>on</strong>tracti<strong>on</strong> principle <strong>to</strong> recover a large deviati<strong>on</strong> principle for the<br />

sample me<strong>an</strong>. Hence, the former has more informati<strong>on</strong> about the process<br />

th<strong>an</strong> the latter.<br />




6.1. Relative entropy<br />

Let M(X ) be the space of finite signed measures <strong>on</strong> a measurable space<br />

(X , B) <strong>an</strong>d let M1(X ) be the subspace of probability measures. Let bB<br />

be the space of bounded measurable functi<strong>on</strong>s. There is a natural duality<br />

between M(X ) and bB:

\[ \langle \nu, g\rangle = \int g\, d\nu, \quad \text{for } g \in b\mathcal{B} \text{ and } \nu \in \mathcal{M}(\mathcal{X}). \]

The resulting weak <strong>to</strong>pologies separate points.<br />

∗ Exercise 6.1. Show that for each g ∈ bB such that g ≢ 0 there exists a ν ∈ M1(X ) such that 〈ν, g〉 ≠ 0. Then show that for each ν ∈ M(X ) such that ν ≢ 0 there exists a g ∈ bB such that 〈ν, g〉 ≠ 0. Assuming that X is a metric space and B its Borel σ-algebra, show that for each ν ∈ M(X ) such that ν ≢ 0 there exists a g ∈ Cb(X ) such that 〈ν, g〉 ≠ 0.

Definition 6.2. For $\nu, \lambda \in \mathcal M_1(\mathcal X)$, the entropy of $\nu$ relative to $\lambda$, $H(\nu \mid \lambda)$, is defined by

\[ H(\nu \mid \lambda) = \begin{cases} \displaystyle\int \varphi \log \varphi \, d\lambda & \text{if } \nu \ll \lambda \text{ and } \varphi = \dfrac{d\nu}{d\lambda}, \\[1ex] \infty & \text{otherwise.} \end{cases} \]

Note that $x \log x \ge -1/e$, so the above integral is well-defined.

Exercise 6.3. Show that the entropy of a measure $\alpha \in \mathcal M_1([0, 1])$ relative to BER($p$) is given by (6.1).

Lemma 6.4. $H(\nu \mid \lambda) \ge 0$ and $H(\nu \mid \lambda) = 0$ if and only if $\nu = \lambda$.

Proof. Jensen's inequality (page 14 of [15] or page 40 of [26]) implies that $H$ is nonnegative. The strict convexity of $x \log x$ implies that equality holds if and only if $\varphi$ is constant $\lambda$-a.s., that is, if and only if $\varphi = 1$ $\lambda$-a.s., which means $\nu = \lambda$. □
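On a finite set the density is $\varphi(x) = \nu(x)/\lambda(x)$ and Definition 6.2 reduces to a finite sum, so Lemma 6.4 can be checked by hand or by machine. The following Python sketch is only an illustration: the function name `relative_entropy` and the sample probability vectors are assumptions made for this example and do not come from the text.

```python
import math


def relative_entropy(nu, lam):
    """H(nu | lam) for probability vectors on a finite set.

    Returns math.inf when nu is not absolutely continuous w.r.t. lam.
    """
    total = 0.0
    for n, l in zip(nu, lam):
        if n == 0.0:
            continue            # convention 0 log 0 = 0
        if l == 0.0:
            return math.inf     # nu charges a lam-null point
        total += n * math.log(n / l)
    return total


lam = [0.5, 0.3, 0.2]                       # hypothetical reference measure
print(relative_entropy(lam, lam))           # entropy of lam relative to itself
print(relative_entropy([0.2, 0.3, 0.5], lam))
print(relative_entropy([0.5, 0.5, 0.0], [1.0, 0.0, 0.0]))
```

The first value illustrates the equality case of Lemma 6.4, the second the strict positivity for $\nu \ne \lambda$, and the third the "otherwise" branch of the definition.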

Let us fix the reference probability measure $\lambda \in \mathcal M_1(\mathcal X)$. Let $p : b\mathcal B \to \mathbb R$ be defined by

\[ p(g) = \log \int e^{g} \, d\lambda. \]

Extend $H(\nu \mid \lambda)$ to all $\nu \in \mathcal M(\mathcal X)$ by setting it to infinity for $\nu \in \mathcal M(\mathcal X) \setminus \mathcal M_1(\mathcal X)$.

Theorem 6.5. $p$ and $H$ are convex conjugates of one another. In particular, $H$ is convex and has the variational representation

\[ H(\nu \mid \lambda) = \sup_{g \in b\mathcal B} \{ E^{\nu}[g] - \log E^{\lambda}[e^{g}] \}. \tag{6.2} \]



Proof. We start by proving that $p = H^*$. To this end, put $d\nu = \dfrac{e^{g}}{E^{\lambda}[e^{g}]}\, d\lambda$, and compute

\[ \langle \nu, g\rangle - H(\nu \mid \lambda) = \frac{E^{\lambda}[g e^{g}]}{E^{\lambda}[e^{g}]} - E^{\lambda}\Big[ \frac{e^{g}}{E^{\lambda}[e^{g}]} \big( g - \log E^{\lambda}[e^{g}] \big) \Big] = \log E^{\lambda}[e^{g}] = p(g). \]

So $H^*(g) \ge p(g)$. On the other hand, let $\nu \in \mathcal M_1(\mathcal X)$ with $\nu \ll \lambda$ and $\varphi = \frac{d\nu}{d\lambda}$ such that $E^{\nu}[|\log \varphi|] < \infty$. Write

\[ \begin{aligned} \langle \nu, g\rangle - H(\nu \mid \lambda) &= E^{\nu}[g - \log \varphi] = E^{\nu}\Big[\log \frac{e^{g}}{\varphi}\Big] \le \log E^{\nu}\Big[\frac{e^{g}}{\varphi}\Big] \\ &= \log E^{\lambda}[e^{g} \mathbf 1\{\varphi > 0\}] \le \log E^{\lambda}[e^{g}] = p(g). \end{aligned} \]

So $H^*(g) \le p(g)$. Next, we show that $H \le p^*$.

Let $\nu \in \mathcal M(\mathcal X)$. Suppose there exists a nonnegative $g \in b\mathcal B$ such that $E^{\nu}[g] < 0$. Then, for $M > 0$,

\[ p^*(\nu) \ge E^{\nu}[-Mg] - \log E^{\lambda}[e^{-Mg}] \ge -M E^{\nu}[g]. \]

Taking $M$ to infinity shows that $p^*(\nu) = \infty$. In the rest, we assume $\nu$ is a nonnegative measure.

Suppose that $\nu(\mathcal X) > 1$. Then

\[ p^*(\nu) \ge E^{\nu}[M] - \log E^{\lambda}[e^{M}] = M(\nu(\mathcal X) - 1) \]

and once again taking $M$ to infinity shows that $p^*(\nu) = \infty$. Similarly, if $\nu(\mathcal X) < 1$, then $p^*(\nu) = \infty$. Thus, we assume $\nu \in \mathcal M_1(\mathcal X)$.

Suppose there exists a measurable set $A$ such that $\lambda(A) = 0 < \nu(A)$. Then

\[ p^*(\nu) \ge E^{\nu}[M \mathbf 1_A] - \log E^{\lambda}[e^{M \mathbf 1_A}] = M \nu(A) \]

and $p^*(\nu) = \infty$. We thus assume $\nu \ll \lambda$.

Suppose $\varphi = \frac{d\nu}{d\lambda}$ and let $\varphi_a^b = a \vee (\varphi \wedge b)$, for $0 < a < 1 < b < \infty$. Then

\[ p^*(\nu) \ge E^{\nu}[\log \varphi_a^b] - \log E^{\lambda}[\varphi_a^b] \ge E^{\nu}[\log \varphi_a^b] - \log E^{\lambda}[\varphi \vee a]. \]

Since $\log \varphi_a^b \ge \log a$, monotone convergence (see page 16 of [15] or page 46 of [26]) implies that $E^{\nu}[\log \varphi_a^b]$ converges, as $b \to \infty$, to $E^{\nu}[\log(\varphi \vee a)] \ge E^{\nu}[\log \varphi] = H(\nu \mid \lambda)$. Thus,

\[ p^*(\nu) \ge H(\nu \mid \lambda) - \log E^{\lambda}[\varphi \vee a]. \]

Since $0 \le \varphi \vee a \le \varphi + 1$, dominated convergence (see page 16 of [15] or page 46 of [26]) implies that $\log E^{\lambda}[\varphi \vee a]$ converges to $\log E^{\lambda}[\varphi] = 0$, as $a \to 0$, and we have proved that $p^* \ge H$.

Finally, $H \le p^* = H^{**} \le H$, where the last inequality follows from part (b) of Proposition 5.12. □
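The variational representation (6.2) can be tested numerically on a finite set, where every function is bounded and measurable. The sketch below is an illustration under assumptions of its own: the names `dual_functional`, `g_star`, and the sample vectors are hypothetical, and $\lambda$ is taken strictly positive so that $g = \log(d\nu/d\lambda)$ is a legitimate bounded test function. It evaluates $E^{\nu}[g] - \log E^{\lambda}[e^{g}]$ at that candidate maximizer and at randomly drawn $g$.

```python
import math
import random


def relative_entropy(nu, lam):
    """H(nu | lam) on a finite set, assuming lam > 0 at every point."""
    return sum(n * math.log(n / l) for n, l in zip(nu, lam) if n > 0.0)


def dual_functional(g, nu, lam):
    """E^nu[g] - log E^lam[e^g], the quantity maximized in (6.2)."""
    mean_g = sum(n * gi for n, gi in zip(nu, g))
    log_moment = math.log(sum(l * math.exp(gi) for l, gi in zip(lam, g)))
    return mean_g - log_moment


lam = [0.5, 0.3, 0.2]   # hypothetical reference measure
nu = [0.2, 0.3, 0.5]    # hypothetical target measure
entropy = relative_entropy(nu, lam)

# Candidate maximizer g = log(d nu / d lam); finite here because nu > 0.
g_star = [math.log(n / l) for n, l in zip(nu, lam)]

# Randomly drawn bounded functions should never beat the entropy.
rng = random.Random(0)
best_random = max(
    dual_functional([rng.uniform(-3.0, 3.0) for _ in lam], nu, lam)
    for _ in range(2000)
)
print(entropy, dual_functional(g_star, nu, lam), best_random)
```

Adding a constant to $g$ leaves the functional unchanged, which is why the maximizer, when it exists, is unique only up to an additive constant.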

In fact, relative entropy is strictly convex.

∗ Exercise 6.6. Show that if $\mu, \nu \in \mathcal M_1(\mathcal X)$ and $\theta \in (0, 1)$, then

\[ H(\theta\mu + (1 - \theta)\nu \mid \lambda) = \theta H(\mu \mid \lambda) + (1 - \theta) H(\nu \mid \lambda) < \infty \]

is equivalent to $\mu = \nu$.

When $\mathcal X$ is a metric space, $C_b(\mathcal X)$ is also in duality with $\mathcal M(\mathcal X)$ and, by Exercise 6.1, the topologies separate points. Restricted to $\mathcal M_1(\mathcal X)$, the topology $\sigma(\mathcal M(\mathcal X), C_b(\mathcal X))$ is the standard weak topology of probability measures and is very convenient to work with; see Appendix A.1. In this case, one can compute $H$ using only bounded continuous functions.

Theorem 6.7. Let $\mathcal X$ be a metric space. Then

\[ H(\nu \mid \lambda) = \sup_{f \in C_b(\mathcal X)} \{ E^{\nu}[f] - \log E^{\lambda}[e^{f}] \}. \tag{6.3} \]

In particular, $p$ and $H$ are convex conjugates in the duality of the spaces $\mathcal M(\mathcal X)$ and $C_b(\mathcal X)$. On the space $\mathcal M_1(\mathcal X)$ of probability measures, $H$ is lower semicontinuous in the weak topology generated by $C_b(\mathcal X)$.

The proof requires the following technical lemma which we prove in Appendix B.1.

Lemma B.1. Let $\mathcal X$ be a metric space and let $\mathcal H$ be a class of bounded functions that contains the space $U_b(\mathcal X)$ of bounded uniformly continuous functions and is closed under uniformly bounded pointwise convergence (i.e. $f_n \in \mathcal H$ for all $n$, $\sup_n \sup_x |f_n(x)| < \infty$, and $f_n(x) \to f(x)$ for all $x \in \mathcal X$ together imply $f \in \mathcal H$). Then $b\mathcal B \subset \mathcal H$.

Proof of Theorem 6.7. Let

\[ C = \sup_{f \in C_b(\mathcal X)} \{ E^{\nu}[f] - \log E^{\lambda}[e^{f}] \} \]

and suppose $C < H(\nu \mid \lambda)$. Set $\mathcal H = \{ f \in b\mathcal B : E^{\nu}[f] - p(f) \le C \}$. This class of bounded functions is closed under uniformly bounded pointwise convergence and contains $C_b(\mathcal X)$. By Lemma B.1 it thus contains all of $b\mathcal B$, and Theorem 6.5 implies $H(\nu \mid \lambda) \le C$, a contradiction. Since $C \le H(\nu \mid \lambda)$ by (6.2), we conclude that $H = p^*$ in the $\sigma(\mathcal M(\mathcal X), C_b(\mathcal X))$-topology. The fact that $p(f) = H^*(f)$ for $f \in C_b(\mathcal X)$ is a special case of Theorem 6.5. □

We will see later how relative entropy $H$ is related to rate functions. A natural question to ask, then, is whether or not sublevel sets are compact. The answer is yes, if $\mathcal X$ is Polish.

Definition 6.8. $\mathcal X$ is called Polish if it has a countable dense set and the topology of $\mathcal X$ has a complete metric.



Proposition 6.9. Assume $\mathcal X$ is Polish. For $c \in \mathbb R$, the sublevel set

\[ A = \{ \nu \in \mathcal M_1(\mathcal X) : H(\nu \mid \lambda) \le c \} \]

is compact in the weak topology generated by $C_b(\mathcal X)$.

To prove the proposition we need to recall the notion of tightness of a family of measures; see Appendix A.1. We also need to recall the definition of uniform integrability.

Definition 6.10. A collection of integrable functions $\mathcal C$ is uniformly integrable if

\[ \lim_{M \to \infty} \sup_{\phi \in \mathcal C} E[|\phi| \mathbf 1\{|\phi| \ge M\}] = 0. \]

∗ Exercise 6.11. Prove that if $\mathcal C$ is uniformly integrable, then for all $\varepsilon > 0$ there exists a $\delta > 0$ such that $P(B) < \delta$ implies $\sup_{\phi \in \mathcal C} E[|\phi| \mathbf 1_B] < \varepsilon$.

∗ Exercise 6.12. Assume there exists a nonnegative function $G : [0, \infty) \to [0, \infty)$ such that $G(x)/x \to \infty$ as $x \to \infty$, and $\sup_{\phi \in \mathcal C} E[G(|\phi|)] < \infty$. Show that $\mathcal C$ is uniformly integrable.

Proof of Proposition 6.9. Take $G(x) = x \log x$. The criterion in the above exercise implies that $\mathcal C = \{ \frac{d\nu}{d\lambda} : \nu \in A \}$ is uniformly integrable with respect to $\lambda$. (Since $x \log x \ge -1/e$, adding the constant $1/e$ makes $G$ nonnegative without affecting the criterion.) Next, we show that $A$ is tight.

Fix $\varepsilon > 0$. By Exercise 6.11, there exists $\delta > 0$ such that $\lambda(E) < \delta$ implies $\sup_{\nu \in A} \nu(E) < \varepsilon$. Now use the regularity of probability measures on Polish spaces (Ulam's theorem; see Exercise A.11) to find a compact set $K \subset \mathcal X$ such that $\lambda(K^c) < \delta$. Then $\sup_{\nu \in A} \nu(K^c) < \varepsilon$, which proves the tightness of the family of measures $A$.

By Prohorov's theorem (page 169), $A$ is relatively compact in the weak topology. Since $H$ is lower semicontinuous, $A$ is closed and is thus compact. □

We conclude this section with some exercises.

Exercise 6.13. For $0 \le f \in L^1(\mathbb R)$ such that $\int f(x)\,dx = 1$, let

\[ H(f) = \int f \log f \, dx. \]

(a) Show that $\gamma(x) = (2\pi\sigma^2)^{-1/2} e^{-x^2/(2\sigma^2)}$ is the unique minimizer of $H(f)$ among $f$ that satisfy $\int x^2 f(x)\,dx = \sigma^2$.

(b) Similarly, show that $e(x) = \lambda e^{-\lambda x}$ is the unique minimizer among $f$ supported on $[0, \infty)$ and satisfying $\int x f(x)\,dx = 1/\lambda$.

Hint: Compute $H(f(x)dx \mid \gamma(x)dx)$ and $H(f(x)dx \mid e(x)dx)$, then $H(\gamma)$ and $H(e)$. Note that the former two quantities are nonnegative.



∗ Exercise 6.14. (Conditional entropy formula) Suppose $\mathcal A$ is a sub-$\sigma$-algebra of $\mathcal B$. Suppose there exist versions of the conditional probabilities $\mu^{\mathcal A}$ and $\lambda^{\mathcal A}$ of $\mu$ and $\lambda$, given $\mathcal A$; see Example 8.5. Let $\mu_{\mathcal A}$ and $\lambda_{\mathcal A}$ be the restrictions of $\mu$ and $\lambda$ to $\mathcal A$. Prove that

\[ H(\mu \mid \lambda) = H(\mu_{\mathcal A} \mid \lambda_{\mathcal A}) + \int H(\mu^{\mathcal A} \mid \lambda^{\mathcal A}) \, d\mu. \]

Hint: Show that $\dfrac{d\mu_{\mathcal A}}{d\lambda_{\mathcal A}} = E^{\lambda}\Big[ \dfrac{d\mu}{d\lambda} \,\Big|\, \mathcal A \Big]$ and $\dfrac{d\mu^{\mathcal A}}{d\lambda^{\mathcal A}} = \dfrac{d\mu}{d\lambda} \Big/ \dfrac{d\mu_{\mathcal A}}{d\lambda_{\mathcal A}}$.

∗ Exercise 6.15. Suppose $\mu(A) = 1$ and $\lambda(A) > 0$. Let $\pi = \lambda(\,\cdot \mid A)$. Show that

\[ H(\mu \mid \lambda) = H(\mu \mid \pi) - \log \lambda(A). \]

Exercise 6.16. Let $S$ be a finite space. Show that there is a maximizing function $f$ in (6.2) if and only if $\nu$ and $\lambda$ have the same support.

∗ Exercise 6.17. This exercise uses relative entropy to prove the Markov chain convergence theorem for an irreducible, aperiodic, finite state space Markov chain. Let $S$ be a finite state space, $P$ the transition matrix, $\pi$ the unique invariant distribution, and $\mu$ an arbitrary initial distribution. We show that $\mu P^n \to \pi$.

(a) Show that $H(\,\cdot \mid \pi)$ is continuous.

(b) Show that in general $H(\nu P \mid \pi) \le H(\nu \mid \pi)$. Show that there exists $m \in \mathbb N$ such that $P^m > 0$ elementwise, and then $H(\nu P^m \mid \pi) < H(\nu \mid \pi)$ if $\nu \ne \pi$.

(c) Let $\nu$ be a limit point of $\mu P^n$. Show that $\nu$ must equal $\pi$.
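The monotonicity in part (b) of Exercise 6.17 is easy to watch on a small chain. The following sketch is an illustration only: the three-state transition matrix `P`, its invariant law `pi`, and the helper names `step` and `entropy_path` are assumptions chosen for the example, not objects from the text. It records $H(\mu P^n \mid \pi)$ along the first few steps starting from a point mass.

```python
import math


def relative_entropy(nu, lam):
    """H(nu | lam) on a finite set, assuming lam > 0 at every point."""
    return sum(n * math.log(n / l) for n, l in zip(nu, lam) if n > 0.0)


def step(nu, P):
    """One step of the chain: the row vector nu times the matrix P."""
    size = len(P)
    return [sum(nu[i] * P[i][j] for i in range(size)) for j in range(size)]


def entropy_path(mu, P, pi, steps):
    """Values H(mu P^n | pi) for n = 0, 1, ..., steps."""
    path = []
    for _ in range(steps + 1):
        path.append(relative_entropy(mu, pi))
        mu = step(mu, P)
    return path


# Hypothetical irreducible, aperiodic chain on three states.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]          # invariant: pi P = pi

path = entropy_path([1.0, 0.0, 0.0], P, pi, 15)
for n, value in enumerate(path):
    print(n, value)
```

For this particular matrix $P^2$ is elementwise positive, so the strict decrease in part (b) already applies with $m = 2$.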

6.2. Sanov's theorem

We are now ready to prove the large deviation principle for empirical measures of a sequence of i.i.d. random variables on a Polish space $S$. The topology that determines open and closed sets in $\mathcal M_1(S)$ is the weak topology generated by $C_b(S)$.

Sanov's theorem. Let $S$ be a Polish space. Let $\{X_n\}$ be a sequence of i.i.d. $S$-valued random variables with common law $\lambda \in \mathcal X = \mathcal M_1(S)$. Let

\[ L_n = \frac{1}{n} \sum_{k=1}^{n} \delta_{X_k} \in \mathcal M_1(S) \]

be the empirical measures. Let $Q_n \in \mathcal M_1(\mathcal X)$ be the law of $L_n$. Then, the family $\{Q_n\}$ is exponentially tight and LDP($Q_n$, $n$, $H$) holds with the tight convex rate function $H(\nu) = H(\nu \mid \lambda)$.



To understand what the above theorem says, let $S = \mathbb R$ and assume $\lambda$ is absolutely continuous relative to the Lebesgue measure on $\mathbb R$; i.e. $\lambda$ has a probability density function $f(x)$ such that $\lambda(B) = \int_B f(x)\,dx$. If one plots the normalized histogram of data points $(X_k)_{k=1}^n$, it is supposed to resemble $f$. Sanov's theorem tells us that, given the null hypothesis that our data is i.i.d. with law $\lambda$, the probability that the normalized histogram will look like the probability density function of a different measure $\nu$ decays like $e^{-cn}$, where $c$ is the entropy of $\nu$ relative to $\lambda$. This is what we saw in the case of $\lambda$ = BER($p$) in Exercise 4.4.
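In the Bernoulli case the decay rate can be computed exactly, because the event that the empirical measure puts mass $k/n$ on the point 1 is a binomial event. The sketch below is an illustration under its own assumptions: the function names and the parameter values $p = 0.5$, $a = 0.7$ are hypothetical choices. It compares $-\frac1n \log P\{L_n(\{1\}) = k/n\}$, with $k$ the integer nearest to $an$, against the relative entropy of BER($a$) with respect to BER($p$).

```python
import math


def bernoulli_relative_entropy(a, p):
    """Entropy of BER(a) relative to BER(p), for 0 < p < 1."""
    def term(x, y):
        return 0.0 if x == 0.0 else x * math.log(x / y)
    return term(a, p) + term(1.0 - a, 1.0 - p)


def log_binomial_pmf(n, k, p):
    """log P{S_n = k} for S_n ~ Binomial(n, p), via log-gamma to avoid overflow."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)


def decay_rate(n, a, p):
    """-(1/n) log P{ L_n puts mass k/n on the point 1 }, k = round(a n)."""
    k = round(a * n)
    return -log_binomial_pmf(n, k, p) / n


p, a = 0.5, 0.7
for n in (100, 10_000, 100_000):
    print(n, decay_rate(n, a, p), bernoulli_relative_entropy(a, p))
```

The gap between the two columns shrinks like $(\log n)/n$, which is the polynomial prefactor that the exponential scale of the large deviation principle ignores.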

Proof. Let $P = \lambda^{\otimes \mathbb N}$. Throughout the proof we have to consider expectations with respect to more than one measure, hence now we write $E^{P}[f] = \int f \, dP$ for expectation under $P$.

Observe that for a bounded continuous function $f : S \to \mathbb R$,

\[ \begin{aligned} \bar p(f) &= \lim_{n \to \infty} \frac{1}{n} \log \int e^{n \langle \nu, f\rangle} \, Q_n(d\nu) = \lim_{n \to \infty} \frac{1}{n} \log E^{P}\Big[ e^{\sum_{k=1}^{n} f(X_k)} \Big] \\ &= \lim_{n \to \infty} \log \int e^{f} \, d\lambda = p(f). \end{aligned} \]

Since $p^* = H$, the general upper bound in Theorem 5.24 implies the upper bound for compact sets. By Theorem 3.3, to get an upper bound for general closed sets it is enough to establish exponential tightness of $\{Q_n\}$. Use again Ulam's theorem (Exercise A.11) to pick a compact set $\Gamma_\ell$ such that $\lambda(\Gamma_\ell^c) < e^{-2\ell^2}$. Then, $A_\ell = \{ \nu : \nu(\Gamma_\ell) \ge 1 - 1/\ell \}$ is closed in the weak topology $\sigma(\mathcal M_1(S), C_b(S))$ because (c) of the portmanteau theorem (Exercise A.3) implies

\[ \nu(\Gamma_\ell) \ge \limsup_{j \to \infty} \nu_j(\Gamma_\ell), \quad \text{if } \nu_j \to \nu. \]

The set $K_L = \cap_{\ell \ge L} A_\ell$ is thus also closed. Since $K_L$ is also tight it is compact (by Prohorov's theorem, page 169). But now one has

\[ \begin{aligned} Q_n(A_\ell^c) &= P\{ L_n(\Gamma_\ell^c) > 1/\ell \} = P\Big\{ \sum_{i=1}^{n} \mathbf 1_{\Gamma_\ell^c}(X_i) > n/\ell \Big\} \\ &\le e^{-2n\ell} E^{P}\Big[ e^{2\ell^2 \sum_{i=1}^{n} \mathbf 1_{\Gamma_\ell^c}(X_i)} \Big] = e^{-2n\ell} \Big( E^{\lambda}\big[ e^{2\ell^2 \mathbf 1_{\Gamma_\ell^c}(X_1)} \big] \Big)^{n} \\ &\le e^{-2n\ell} \big( e^{2\ell^2} e^{-2\ell^2} + 1 \big)^{n} \le e^{-n\ell}. \end{aligned} \]

Thus,

\[ Q_n(K_L^c) \le \sum_{\ell \ge L} e^{-n\ell} \le 2 e^{-nL} \]

and thereby we have verified exponential tightness of $\{Q_n\}$.

Next, we focus our attention on the lower bound for open sets. This is done using a quite standard change of measure argument. Basically, getting a lower bound amounts to computing the "cost" of forcing our process to behave differently. This means the marginal $\lambda$ is replaced by some other marginal $\mu$. Mathematically, replacing one measure by another is done via Radon-Nikodym derivatives, and this way relative entropy enters the calculation. Let us now be precise.

Let $G$ be an open subset of $\mathcal M_1(S)$ and $\mu \in G$. Without loss of generality, we can assume $H(\mu \mid \lambda) < \infty$ (otherwise, $\mu$ does not matter when computing $\inf_G H$). This implies that $\mu \ll \lambda$. Let $\varphi = \frac{d\mu}{d\lambda}$ and let $Q = \mu^{\otimes \mathbb N}$ be the law of the i.i.d. sequence with marginal $\mu$. Let $\mathcal F_n = \sigma(X_1, \dots, X_n)$. Then,

\[ \frac{dQ}{dP}\Big|_{\mathcal F_n}(x_1, \dots, x_n) = \prod_{i=1}^{n} \varphi(x_i) = \Phi_n(x). \]

Here, we used the notation $x = (x_i)_{i \ge 1}$. Now write

\[ \begin{aligned} \frac{1}{n} \log P\{L_n \in G\} &\ge \frac{1}{n} \log E^{P}[\mathbf 1_G(L_n) \mathbf 1\{\Phi_n > 0\}] \\ &= \frac{1}{n} \log E^{Q}[\mathbf 1_G(L_n) \Phi_n^{-1}] \\ &= \frac{1}{n} \log \Big( \frac{1}{Q\{L_n \in G\}} E^{Q}[\mathbf 1_G(L_n) \Phi_n^{-1}] \Big) + \frac{1}{n} \log Q\{L_n \in G\} \\ &\ge \frac{-1}{n Q\{L_n \in G\}} E^{Q}[\mathbf 1_G(L_n) \log \Phi_n] + \frac{1}{n} \log Q\{L_n \in G\}. \end{aligned} \]

In the last step above, we used Jensen's inequality (page 14 of [15] or page 40 of [26]) with the convex function $-\log x$. Now use the fact that $x \log x \ge -1/e$ to write

\[ \begin{aligned} E^{Q}[\mathbf 1_G(L_n) \log \Phi_n] &= E^{Q}[\log \Phi_n] - E^{Q}[\mathbf 1_{G^c}(L_n) \log \Phi_n] \\ &= n E^{\mu}[\log \varphi] - E^{P}[\mathbf 1_{G^c}(L_n) \Phi_n \log \Phi_n] \\ &\le n H(\mu \mid \lambda) + 1/e. \end{aligned} \]

Thus,

\[ \frac{1}{n} \log P\{L_n \in G\} \ge \frac{1}{Q\{L_n \in G\}} \{ -H(\mu \mid \lambda) - 1/(ne) \} + \frac{1}{n} \log Q\{L_n \in G\}. \]

By part (a) of Proposition 5.5 one can find a neighborhood of $\mu$ inside $G$ that is determined only by finitely many functions in $C_b(S)$. By the law of large numbers $Q\{ |E^{L_n}[f] - E^{\mu}[f]| < \varepsilon \} \to 1$ for all such functions $f$ and any $\varepsilon > 0$. Hence, $Q\{L_n \in G\}$ converges to 1. We thus have

\[ \liminf_{n \to \infty} \frac{1}{n} \log Q_n(G) \ge -H(\mu \mid \lambda). \]

Taking sup over $\mu \in G$ finishes the proof. □



As we mentioned earlier, empirical measures contain more information on the process than sample means do. Using the contraction principle one can deduce a version of Cramér's theorem for additive functionals of i.i.d. random variables on a Polish space.

∗ Exercise 6.18. Let $S$ be a Polish space. Let $\{X_n\}$ be a sequence of i.i.d. $S$-valued random variables with common law $\lambda$ and let $\varphi : S \to \mathbb R^d$ be a continuous function. Assume that $E[e^{a|\varphi(X_1)|}] < \infty$ for all $a > 0$. Let $\mu_n$ be the law of the sample mean $S_n/n = n^{-1} \sum_{k=1}^{n} \varphi(X_k)$. Prove, without applying Cramér's theorem, that LDP($\mu_n$, $n$, $I$) holds with the tight convex rate function $I : \mathbb R^d \to [0, \infty)$ given by

\[ I(z) = \inf\{ H(\nu \mid \lambda) : E^{\nu}[\varphi] = z \} = \sup\Big\{ z \cdot \theta - \log \int e^{\varphi(x) \cdot \theta} \, \lambda(dx) : \theta \in \mathbb R^d \Big\}. \]

Hint: Start with $\varphi$ bounded. The first formula for $I$ is a result of the contraction principle. To prove the second equality compute $I^*$. Next, write $\varphi(X) = \varphi(X) \mathbf 1\{|\varphi(X)| \le b\} + \varphi(X) \mathbf 1\{|\varphi(X)| > b\}$. The unbounded part is controlled by the fact that $E[e^{a|\varphi(X)|}, |\varphi(X)| > b] \to 0$ when $b \to \infty$. The lower bound follows immediately. If $I^{(b)}$ is the rate corresponding to the bounded part, then for closed $F \subset \mathbb R^d$ one has the upper bound

\[ \limsup_{n \to \infty} n^{-1} \log P\{S_n/n \in F\} \le - \lim_{\varepsilon \to 0} \lim_{b \to \infty} \inf_{z \in F^{\varepsilon}} I^{(b)}(z), \]

where $F^{\varepsilon} = \{ z + y : z \in F, |y| \le \varepsilon \}$. Using the fact that $E[e^{a|\varphi(X)|}] < \infty$, conclude the upper bound in a similar fashion to Exercise 2.16.

Remark 6.19. It is interesting that Sanov's theorem itself follows from a more general version of Cramér's theorem in Polish spaces; see Theorem 6.1.3, Corollary 6.2.3, and Lemma 6.2.6 of [8].

Looking at Cramér's theorem as a contraction from Sanov's theorem gives valuable insight.

∗ Exercise 6.20. Let $\lambda \in \mathcal M_1(S)$ and let $\{X_k\}$ be i.i.d. with marginal $\lambda$. Fix a bounded continuous function $H : S \to \mathbb R$. (The letter $H$ is to suggest Hamiltonian or energy; not to be confused with relative entropy.) By Exercise 6.18, we know that the large deviation principle holds for the laws of

\[ H_n = \frac{1}{n} \sum_{k=1}^{n} H(X_k) \]

with rate function

\[ p^*(z) = \inf\{ H(\nu \mid \lambda) : \nu \in \mathcal M_1(S),\ E^{\nu}[H] = z \}, \]

where $p^*$ is the convex conjugate of $p(t) = \log E^{\lambda}[e^{tH}] < \infty$, $t \in \mathbb R$.



(a) Prove that whenever $p^*(z) < \infty$, there is a unique $\nu_z \in \mathcal M_1(S)$ such that $E^{\nu_z}[H] = z$ and $p^*(z) = H(\nu_z \mid \lambda)$.

Hint: Use lower semicontinuity and compact sublevel sets (Proposition 6.9) to conclude the existence of the minimizer, and strict convexity (Exercise 6.6) to conclude its uniqueness.

(b) For $\beta \in \mathbb R$, define $\mu_\beta \in \mathcal M_1(S)$ by

\[ \mu_\beta(dx) = \frac{e^{-\beta H(x)}}{e^{p(-\beta)}} \, \lambda(dx). \]

(In statistical mechanics, $\mu_\beta$ is the Gibbs measure at inverse temperature $\beta$.) Prove that $p'(t) = E^{\mu_{-t}}[H]$,

\[ p''(t) = E^{\mu_{-t}}[H^2] - E^{\mu_{-t}}[H]^2 \]

(the variance of $H$ under $\mu_{-t}$), and $\lim_{t \to 0} p'(t) = \int H \, d\lambda$ (high temperature limit).

Define

\[ A = \lambda\text{-ess inf } H \quad \text{and} \quad B = \lambda\text{-ess sup } H; \]

i.e. $A = \sup\{a : \lambda(H < a) = 0\}$ and $B = \inf\{b : \lambda(H > b) = 0\}$. Show that

\[ \lim_{t \to -\infty} p'(t) = A \quad \text{and} \quad \lim_{t \to \infty} p'(t) = B. \]

(c) Assume $H$ is not $\lambda$-a.s. constant so that $p'' > 0$. Prove that for $z \in (A, B)$, there is a unique $\beta = \beta_z$ such that $p'(-\beta) = z$ and $p^*(z) = -z\beta - p(-\beta)$. Furthermore, $\nu_z = \mu_\beta$. In other words, for each energy value $z \in (A, B)$, there is a unique inverse temperature $\beta$ such that $E^{\mu_\beta}[H] = z$, the Gibbs measure $\mu_\beta$ minimizes the entropy $H(\nu \mid \lambda)$ subject to the energy constraint $E^{\nu}[H] = z$, and the minimum entropy is precisely the value $p^*(z)$ of the Cramér rate for $H_n$.
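On a finite state space all the objects in Exercise 6.20 are finite sums, so the claims in part (c) can be checked numerically. The sketch below is an illustration only: the three-point space, the vectors `lam` and `energy`, the target value `z = 1.5`, the bisection bracket, and every function name are assumptions made for this example. It solves $E^{\mu_\beta}[H] = z$ by bisection (the mean energy is decreasing in $\beta$ because its derivative is minus a variance), then compares $H(\mu_\beta \mid \lambda)$ with $-z\beta - p(-\beta)$ and with the entropy of another law that has the same mean energy.

```python
import math


def log_mgf(t, lam, energy):
    """p(t) = log E^lam[e^{tH}]."""
    return math.log(sum(l * math.exp(t * h) for l, h in zip(lam, energy)))


def gibbs(beta, lam, energy):
    """mu_beta(x) = e^{-beta H(x)} lam(x) / e^{p(-beta)}."""
    weights = [l * math.exp(-beta * h) for l, h in zip(lam, energy)]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]


def mean_energy(nu, energy):
    return sum(n * h for n, h in zip(nu, energy))


def relative_entropy(nu, lam):
    """H(nu | lam) on a finite set, assuming lam > 0 at every point."""
    return sum(n * math.log(n / l) for n, l in zip(nu, lam) if n > 0.0)


def solve_beta(z, lam, energy, lo=-50.0, hi=50.0):
    """Bisection for E^{mu_beta}[H] = z; the bracket [lo, hi] is an assumption."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_energy(gibbs(mid, lam, energy), energy) > z:
            lo = mid        # mean too large: increase beta
        else:
            hi = mid
    return 0.5 * (lo + hi)


lam = [0.5, 0.3, 0.2]        # hypothetical reference measure
energy = [0.0, 1.0, 3.0]     # hypothetical energies, so A = 0 and B = 3
z = 1.5

beta = solve_beta(z, lam, energy)
mu_beta = gibbs(beta, lam, energy)
cramer_rate = -z * beta - log_mgf(-beta, lam, energy)
competitor = [0.1, 0.6, 0.3]  # another law with mean energy 1.5

print(beta, relative_entropy(mu_beta, lam), cramer_rate)
print(relative_entropy(competitor, lam))
```

Since the chosen $z$ exceeds the mean energy under $\lambda$, the solution has $\beta < 0$; values of $z$ below that mean would give $\beta > 0$.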

Here is an alternative point of view to the results of the above exercise, with a thermodynamical flavor.

∗ Exercise 6.21. Let the notation be as in the previous exercise.

(a) Set $Z_\beta = E^{\lambda}[e^{-\beta H}]$. (In statistical mechanics $Z_\beta$ is the partition function at inverse temperature $\beta$.) Prove that

\[ -\log Z_\beta = \inf\{ \beta E^{\nu}[H] + H(\nu \mid \lambda) : \nu \in \mathcal M_1(S) \} \]

and that the infimum is uniquely attained at $\mu_\beta$. When $\beta > 0$, the above can be rewritten as

\[ -\beta^{-1} \log Z_\beta = \inf\{ E^{\nu}[H] + \beta^{-1} H(\nu \mid \lambda) : \nu \in \mathcal M_1(S) \}. \]

Hint: Apply Jensen's inequality (page 14 of [15] or page 40 of [26]) to the integral $\int -\log\big( e^{-\beta H} (\tfrac{d\nu}{d\lambda})^{-1} \big) \, d\nu$.

(b) For $z \in (A, B)$, prove that

\[ \sup\{ -H(\nu \mid \lambda) : \nu \in \mathcal M_1(S),\ E^{\nu}[H] = z \} \]

is uniquely attained at $\mu_\beta$, where $\beta$ is uniquely specified by the equation $E^{\mu_\beta}[H] = z$.

Hint: Existence and uniqueness are shown as in the previous exercise. The rest is immediate from (a).
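The variational formula in part (a) of Exercise 6.21 can be probed on a finite space by comparing the functional $\beta E^{\nu}[H] + H(\nu \mid \lambda)$ at the Gibbs measure with its values at randomly generated probability vectors. This is a sketch under assumptions of its own: the space, the vectors `lam` and `energy`, the value `beta = 0.8`, and the names `free_energy` and `random_law` are hypothetical, and a random search of course proves nothing; it only fails to find a counterexample.

```python
import math
import random


def relative_entropy(nu, lam):
    """H(nu | lam) on a finite set, assuming lam > 0 at every point."""
    return sum(n * math.log(n / l) for n, l in zip(nu, lam) if n > 0.0)


def gibbs(beta, lam, energy):
    """mu_beta(x) = e^{-beta H(x)} lam(x) / Z_beta."""
    weights = [l * math.exp(-beta * h) for l, h in zip(lam, energy)]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]


def free_energy(nu, beta, lam, energy):
    """beta * E^nu[H] + H(nu | lam), the functional minimized in part (a)."""
    mean = sum(n * h for n, h in zip(nu, energy))
    return beta * mean + relative_entropy(nu, lam)


def random_law(rng, size):
    """A random probability vector (normalized exponential weights)."""
    weights = [rng.expovariate(1.0) for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


lam = [0.5, 0.3, 0.2]        # hypothetical reference measure
energy = [0.0, 1.0, 3.0]     # hypothetical energies
beta = 0.8

log_z = math.log(sum(l * math.exp(-beta * h) for l, h in zip(lam, energy)))
mu_beta = gibbs(beta, lam, energy)

rng = random.Random(1)
smallest_random = min(
    free_energy(random_law(rng, len(lam)), beta, lam, energy)
    for _ in range(5000)
)
print(-log_z, free_energy(mu_beta, beta, lam, energy), smallest_random)
```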

Gibbs free energy is the amount of thermodynamic energy in a system that can be converted into work at a constant temperature and pressure, while thermodynamic entropy multiplied by temperature is the amount of unusable heat the system gives up when work is applied. Thus, free energy equals the system's energy (or, more precisely, its enthalpy) less its thermodynamic entropy multiplied by its temperature. (Think of a group of toddlers you want to convince to do some project together. You will have to spend an unproductive amount of energy just to make them sit still, and then there is the amount of productive energy that goes into the actual project. The latter is the free energy and the former is the entropy times the temperature, while the total amount of energy you and the children spent is the enthalpy.)

Part (a) of the above exercise characterizes the Gibbs measure by a variational principle. It is a mathematical statement of the thermodynamical principle that "left on its own, nature tends to minimize free energy (or work)". (On their own children will minimize the amount of work they do.)

Part (b) is the Gibbs conditioning principle, which says that "under an energy constraint, nature maximizes entropy", with the understanding that thermodynamic entropy is $-H(\nu \mid \lambda)$; i.e. it corresponds to the negative of relative entropy. (Children will maximize disorder until they exhaust themselves.)

6.3. Maximum entropy principle<br />

In the setting of Sanov's theorem, $L_n \to \lambda$ a.s. as $n \to \infty$. An interesting question is: what happens if we condition the process to behave atypically? For example, if $C$ is a set of probability measures whose closure does not contain $\lambda$, then $P\{L_n \in C\} \to 0$, but what if we condition $L_n$ to remain in $C$? With some technical assumptions we can give a precise answer: $L_n$ converges towards the element(s) of $C$ that minimize $H(\,\cdot \mid \lambda)$. Here is a precise statement.



Maximum entropy principle. Suppose $C \subset \mathcal X = \mathcal M_1(S)$ is closed, convex, and satisfies

\[ \inf_{\nu \in C} H(\nu \mid \lambda) = \inf_{\nu \in C^{\circ}} H(\nu \mid \lambda) < \infty. \]

Then, there is a unique $\tilde\nu \in C$ that minimizes $H(\,\cdot \mid \lambda)$ over $C$. The conditioned laws of $L_n$ converge weakly to a point mass at $\tilde\nu$, that is,

\[ \lim_{n \to \infty} P\{ L_n \in \cdot \mid L_n \in C \} = \delta_{\tilde\nu} \]

in the weak topology of $\mathcal M_1(\mathcal X)$ generated by $C_b(\mathcal X)$. In fact, this convergence is exponential in the sense that for any weak neighborhood $U$ of $\tilde\nu$, there is a $b > 0$ such that

\[ P\{ L_n \in U^c \mid L_n \in C \} \le e^{-nb} \]

for large enough $n$. Furthermore, for any fixed $k$, the conditioned law of $X_k$ converges weakly to $\tilde\nu$, that is,

\[ \lim_{n \to \infty} E[f(X_k) \mid L_n \in C] = E^{\tilde\nu}[f] \]

for all $f \in C_b(S)$.

Remark 6.22. The "entropy" in the name of the principle is thermodynamic entropy. Hence, minimizing $H(\,\cdot \mid \lambda)$ corresponds to maximizing entropy.

As a heuristic principle maximum entropy is used in statistics to solve the following problem: Suppose our belief is that a random variable $X$ of interest obeys a distribution $\lambda$. Then we receive new information about $X$, perhaps by performing an experiment, and $\lambda$ is no longer compatible with the new information. What should be our new best guess for the unknown law of $X$, among the compatible laws (the set $C$)? The answer is the law that is closest to $\lambda$ in "entropy distance". (Quotes are in order because relative entropy is not a metric on probability measures.) The above theorem can be regarded as a theoretical justification for this principle.

Proof of the maximum entropy principle. As in the above exercises, existence and uniqueness of $\tilde\nu$ follow from lower semicontinuity, compact sublevel sets, and strict convexity of $H$. Next, let $U$ be any open neighborhood of $\tilde\nu$ and write

\[ \limsup_{n \to \infty} \frac{1}{n} \log P\{ L_n \in U^c \mid L_n \in C \} = \limsup_{n \to \infty} \Big[ \frac{1}{n} \log P\{ L_n \in U^c \cap C \} - \frac{1}{n} \log P\{ L_n \in C \} \Big]. \]

The assumptions of the theorem imply that $\frac{1}{n} \log P\{L_n \in C\}$ converges to $-H(\tilde\nu \mid \lambda)$. Since $\tilde\nu \notin U^c$ and $U^c$ is closed we have

\[ \limsup_{n \to \infty} \frac{1}{n} \log P\{ L_n \in U^c \mid L_n \in C \} \le - \inf_{\nu \in U^c \cap C} H(\nu \mid \lambda) + H(\tilde\nu \mid \lambda) < 0. \]

For $k \le n$, exchangeability of $X_1, \dots, X_n$ under the conditioning gives

\[ E[f(X_k) \mid L_n \in C] = \frac{E[f(X_k) \mathbf 1\{L_n \in C\}]}{P\{L_n \in C\}} = E\Big[ \frac{1}{n} \sum_{j=1}^{n} f(X_j) \,\Big|\, L_n \in C \Big] = E\{ E^{L_n}[f] \mid L_n \in C \}. \]

Since $f$ is bounded and continuous on $S$, $F(\nu) = \int f \, d\nu$ is a bounded continuous function on $\mathcal X = \mathcal M_1(S)$. Consequently, by weak convergence of the conditional distribution of $L_n$, $E[F(L_n) \mid L_n \in C]$ converges to $\int F \, d\delta_{\tilde\nu} = F(\tilde\nu) = E^{\tilde\nu}[f]$. □

As a striking application of the maximum entropy principle, one can see how the Gibbs measure arises as a limit of conditional probabilities with a clear physical meaning.<br />

∗ Exercise 6.23. (Equivalence of ensembles) With notation as in Exercise 6.20, suppose $z \in (A, B)$ so there is a unique $\beta$ such that $E^{\mu_\beta}[H] = z$. Prove that for any fixed $k$,<br />

\[ \lim_{\delta \to 0} \lim_{n \to \infty} P\{X_k \in \cdot \mid |H_n - z| \le \delta\} = \mu_\beta. \]<br />

Hint: Let $M_\delta = \{\nu : |E^\nu[H] - z| \le \delta\}$ and $m(\delta) = \inf_{\nu \in M_\delta} H(\nu \mid \lambda)$. Show that $m$ is a continuous, decreasing function from $[0, \infty)$ onto $[0, H(\mu_\beta \mid \lambda)]$. Deduce that $C = M_\delta$ satisfies the assumptions of the maximum entropy principle. Let $\nu_\delta$ be the entropy minimizing measure in $M_\delta$ and show that $\nu_\delta \to \mu_\beta$ as $\delta \to 0$.<br />

The above says, roughly, that if a large number of particles governed by the free measure $P = \lambda^{\otimes \mathbb N}$ are constrained to have average energy $H_n = z$ (in a controlled experiment, say), then an individual particle obeys the Gibbs measure $\mu_\beta$ with the temperature $1/\beta$ adjusted to produce the correct expected energy $E^{\mu_\beta}[H] = z$. In statistical mechanics, the measure $P\{\,\cdot \mid H_n = z\}$ is called the microcanonical ensemble, and the Gibbs measure $\mu_\beta$ is the canonical ensemble. These measures were introduced by Josiah Willard Gibbs (1839-1903), who is credited with systematizing equilibrium statistical mechanics after the pioneering work of Maxwell and Boltzmann. The problem of equivalence of ensembles is whether these two ensembles give equivalent results in the infinite particle limit. This exercise gives an affirmative answer in a simple special case: the particles do not interact, and we look only at the marginal distribution of a single particle.<br />
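The equivalence of ensembles described above can be checked by exact counting in a toy case. The sketch below is only an illustration under assumptions that are ours, not the text's: the state space is $\{0, 1, 2\}$, the free measure $\lambda$ is uniform, the energy is $H(x) = x$, and the Gibbs measure is taken with the sign convention $\mu_\beta(x) \propto e^{-\beta H(x)} \lambda(x)$. All function names are hypothetical helpers.

```python
import math

def count_sums(n, values):
    """counts[s] = number of length-n words over `values` whose entries sum to s.

    Assumes `values` are nonnegative integers."""
    counts = [1]
    for _ in range(n):
        new = [0] * (len(counts) + max(values))
        for s, c in enumerate(counts):
            for v in values:
                new[s + v] += c
        counts = new
    return counts

def conditional_marginal(n, values, z, delta):
    """Law of X_1 given |H_n - z| <= delta, for i.i.d. uniform X_i and H(x) = x."""
    full, rest = count_sums(n, values), count_sums(n - 1, values)
    window = [s for s in range(len(full)) if abs(s - z * n) <= delta * n]
    total = sum(full[s] for s in window)
    return {x: sum(rest[s - x] for s in window if 0 <= s - x < len(rest)) / total
            for x in values}

def gibbs(values, beta):
    """Gibbs measure mu_beta(x) proportional to exp(-beta * x) (assumed sign convention)."""
    weights = [math.exp(-beta * x) for x in values]
    return {x: w / sum(weights) for x, w in zip(values, weights)}

def solve_beta(values, z, lo=-50.0, hi=50.0):
    """Bisection for E^{mu_beta}[H] = z; the mean energy is decreasing in beta."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if sum(x * p for x, p in gibbs(values, mid).items()) > z:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

values, z, delta, n = [0, 1, 2], 0.5, 0.01, 400
cond = conditional_marginal(n, values, z, delta)   # microcanonical-type marginal
mu = gibbs(values, solve_beta(values, z))          # canonical ensemble
gap = max(abs(cond[x] - mu[x]) for x in values)
print("conditional law:", cond)
print("Gibbs measure:  ", mu)
print("largest gap:    ", gap)
```

Increasing `n` and then shrinking `delta` should make the printed gap smaller, in line with the order of limits in Exercise 6.23.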

As the last item we present an elegant classic example of equivalence of ensembles.<br />

Maxwell’s principle. For an integer $n \ge 1$ let $X_n = (X_1^{(n)}, \dots, X_n^{(n)}) \in \mathbb R^n$ be uniformly distributed on the $(n-1)$-dimensional sphere of radius $\sigma\sqrt n$. For $k \le n$ let $P_{k,n}$ be the distribution of $(X_1^{(n)}, \dots, X_k^{(n)})$. Then, for each fixed $k$, as $n \to \infty$, $P_{k,n}$ converges weakly to the distribution of $k$ i.i.d. normals with mean 0 and variance $\sigma^2$.<br />

Proof. Let $\{Z_k\}$ be an i.i.d. sequence of standard normal random variables. For $k \le n$ let<br />

\[ Y_k^{(n)} = \frac{\sigma \sqrt n\, Z_k}{\sqrt{Z_1^2 + \cdots + Z_n^2}}. \]<br />

∗ Exercise 6.24. Prove that $Y_n = (Y_1^{(n)}, \dots, Y_n^{(n)})$ has the same distribution as $X_n$.<br />

By the strong law of large numbers $(Z_1^2 + \cdots + Z_n^2)/n$ converges almost surely to $E[Z_1^2] = 1$. The claim follows. □<br />
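The construction in the proof doubles as a simulation recipe. The following sketch (a rough numerical illustration, with sample sizes and the seed chosen arbitrarily by us) draws the first coordinate $Y_1^{(n)}$ of a uniform point on the sphere of radius $\sigma\sqrt n$ and compares its empirical mean, variance, and the fraction of samples within one $\sigma$ of the origin with what a normal law with mean 0 and variance $\sigma^2$ would give.

```python
import math
import random

def sphere_coordinate_sample(n, sigma, rng):
    """First coordinate of a uniform point on the sphere of radius sigma*sqrt(n) in R^n,
    built from i.i.d. standard normals as in the proof above."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in z))
    return sigma * math.sqrt(n) * z[0] / norm

rng = random.Random(0)
sigma, n, trials = 2.0, 200, 4000
xs = [sphere_coordinate_sample(n, sigma, rng) for _ in range(trials)]

mean = sum(xs) / trials
var = sum((x - mean) ** 2 for x in xs) / trials
# For a N(0, sigma^2) variable, P(|X| <= sigma) is about 0.683.
frac_within = sum(abs(x) <= sigma for x in xs) / trials

print("empirical mean:          ", mean)
print("empirical variance:      ", var, "(target sigma^2 =", sigma ** 2, ")")
print("fraction with |x|<=sigma:", frac_within)
```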


Chapter 7. Large deviations for i.i.d. fields at the process level<br />

In the previous chapter we saw how empirical distributions have more information than just the sample mean. However, they still do not capture the full amount of information that the process $(X_k)_{k \ge 0}$ has in it. The object that does so is the sequence<br />

\[ R_n = \frac1n \sum_{k=0}^{n} \delta_{(X_j)_{j \ge k}} \in \mathcal M_1(\mathcal X^{\mathbb Z_+}). \]<br />

In this chapter, we study large deviations for the law of this sequence. We generalize the setting to a random field $(X_i)_{i \in \mathbb Z^d}$ of i.i.d. random variables indexed by the $d$-dimensional cubic lattice and study the large deviations of the corresponding random measure $R_n$. This is called process level large deviations, in contrast with the position level of Sanov’s theorem. Another terminology separates large deviations into level 1 (Cramér’s theorem), level 2 (Sanov’s theorem) and level 3 (process level) large deviations. This more general point of view turns out to be useful for statistical mechanics models, as we will see in the second part of the course.<br />

7.1. Setting<br />

We are given a Polish space $\mathcal X$. Let $d \in \mathbb N$ be fixed and define the space of configurations to be $\Omega = \mathcal X^{\mathbb Z^d}$. A generic element of $\Omega$ is a configuration $\omega = (\omega_i)_{i \in \mathbb Z^d}$ with coordinates $\omega_i \in \mathcal X$. Endowed with the product topology, $\Omega$ is Polish as well. Let $\mathcal F$ be its Borel σ-algebra. Theorem A.12 implies that $\mathcal M_1(\Omega)$, with the weak topology generated by $C_b(\Omega)$, is also Polish.<br />

Call a function $f : \Omega \to \mathbb R$ local if it is a function of only finitely many coordinates $\omega_i$. Let $C_{b,\mathrm{loc}}(\Omega)$ be the space of bounded continuous local functions.<br />

∗ Exercise 7.1. Show that $\mu_k \to \mu$ in $\mathcal M_1(\Omega)$ if and only if all finite-dimensional marginals converge.<br />

Hint: Use the equivalence of (a) and (b) in the portmanteau theorem (Exercise A.3) along with a density argument.<br />

∗ Exercise 7.2. Prove that the topologies on $\mathcal M_1(\Omega)$ generated by $C_b(\Omega)$ and $C_{b,\mathrm{loc}}(\Omega)$ coincide.<br />

Hint: On each $\Omega_\Lambda$, $\Lambda$ finite, use the argument on page 168 to find countably many functions that determine weak convergence. Put all these functions together to form a metric of the type (A.1) that has only local functions. Now use Exercise A.5 and conclude.<br />

We are also given a probability measure $\lambda \in \mathcal M_1(\mathcal X)$. Let $P = \lambda^{\otimes \mathbb Z^d}$ be the product measure on $\Omega$. If we define the coordinate process $(X_i)_{i \in \mathbb Z^d}$ as $X_i(\omega) = \omega_i$, then $P$ makes this process into an i.i.d. random field. With our statistical mechanics applications in mind, we sometimes refer to the $X_i$’s as spins.<br />

The shift group $\{\theta_i : i \in \mathbb Z^d\}$ on $\Omega$ is the group of homeomorphisms defined by $(\theta_i \omega)_j = \omega_{i+j}$ for $\omega \in \Omega$ and $i, j \in \mathbb Z^d$. Let $\mathcal M_\theta(\Omega)$ be the space of shift-invariant probability measures on $\Omega$, that is<br />

\[ \mathcal M_\theta(\Omega) = \{\mu \in \mathcal M_1(\Omega) : \mu \circ \theta_i = \mu \ \ \forall i \in \mathbb Z^d\}. \]<br />

Let $\mathcal I$ be the σ-algebra of shift-invariant Borel sets of $\Omega$, that is<br />

\[ \mathcal I = \{A \in \mathcal F : \theta_i A = A \ \ \forall i \in \mathbb Z^d\}. \tag{7.1} \]<br />

Exercise 7.3. Prove that $\mathcal I$ is a σ-algebra.<br />

Definition 7.4. We say that $\mu \in \mathcal M_\theta(\Omega)$ is ergodic if $\mu(A) \in \{0, 1\}$ for all $A \in \mathcal I$. We denote the set of ergodic probability measures by $\mathcal M_e(\Omega)$.<br />

The ergodic theorem implies that a shift-invariant probability measure is a mixture of ergodic ones. Then the extreme points of the convex set $\mathcal M_\theta(\Omega)$ must be ergodic. (By definition, extreme points of $\mathcal M_\theta(\Omega)$ are measures $P \in \mathcal M_\theta(\Omega)$ such that $P = tP_1 + (1-t)P_2$ with $t \in (0, 1)$ and $P_1, P_2 \in \mathcal M_\theta(\Omega)$ imply $P_1 = P_2 = P$.) The converse is also easily seen to be true. Thus, $\mathcal M_e(\Omega)$ also stands for the set of extreme points of $\mathcal M_\theta(\Omega)$. See Appendix A.2 for more.<br />


Let us write $i = (i_1, i_2, \dots, i_d)$ for points of the indexing lattice $\mathbb Z^d$, and then a sequence of cubes $V_n = \{i \in \mathbb Z^d : -n < i_1, \dots, i_d < n\}$ whose union exhausts $\mathbb Z^d$. The empirical fields $R_n : \Omega \to \mathcal M_1(\Omega)$ are defined by<br />

\[ R_n(\omega) = \frac{1}{|V_n|} \sum_{i \in V_n} \delta_{\theta_i \omega}. \]<br />

By the multidimensional ergodic theorem (page 170), $E^{R_n(\omega)}[g] \to E^P[g]$ for all $g \in L^1(P)$ and $P$-a.e. $\omega$. Our goal is to study large deviations of the laws of $R_n$.<br />

For $\omega \in \Omega$, the periodized configuration $\omega^{(n)}$ is defined by $\omega^{(n)}_i = \omega_i$ for $i \in V_n$, and $\omega^{(n)}_i = \omega^{(n)}_j$ whenever $i_k = j_k \bmod (2n-1)$ for all $k = 1, \dots, d$. The periodic empirical fields $\widetilde R_n : \Omega \to \mathcal M_1(\Omega)$ are then defined by<br />

\[ \widetilde R_n(\omega) = \frac{1}{|V_n|} \sum_{i \in V_n} \delta_{\theta_i \omega^{(n)}}. \tag{7.2} \]<br />

Due to periodization, $\widetilde R_n$ is measurable with respect to $\mathcal F_{V_n}$, the σ-algebra generated by $\{\omega_i : i \in V_n\}$. It is, therefore, easier to work with $\widetilde R_n$ than with $R_n$, which depends on the whole configuration $\omega$. However, $R_n$ and $\widetilde R_n$ come asymptotically close together.<br />

∗ Exercise 7.5. Prove that for all bounded local measurable functions $g$<br />

\[ \sup_\omega \big| E^{\widetilde R_n(\omega)}[g] - E^{R_n(\omega)}[g] \big| \to 0. \]<br />

As a consequence, $\widetilde R_n$ also converges weakly to $P$. By Exercise 2.17 the large deviation principle for $\widetilde R_n$ transfers easily to one for $R_n$, once one has a tight rate function.<br />
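To see the closeness claimed in Exercise 7.5 in the simplest case, here is a small sketch for $d = 1$ with $\pm 1$ spins and the local function $g(\omega) = \omega_0 \omega_1$. These choices, and the helper names `random_spins`, `periodize` and `empirical_mean`, are ours and purely illustrative. For this $g$ only the boundary term $i = n-1$ of the sum over $V_n$ can change under periodization, so the two averages differ by at most $2/(2n-1)$.

```python
import random

def random_spins(seed):
    """A lazily sampled configuration omega: Z -> {-1, +1} with i.i.d. uniform spins."""
    rng, cache = random.Random(seed), {}
    def omega(i):
        if i not in cache:
            cache[i] = rng.choice((-1, 1))
        return cache[i]
    return omega

def periodize(omega, n):
    """omega^(n): agrees with omega on V_n = {-(n-1), ..., n-1}, extended with period 2n-1."""
    return lambda i: omega((i + n - 1) % (2 * n - 1) - (n - 1))

def empirical_mean(config, n, g):
    """E^{R_n(config)}[g] = |V_n|^{-1} * sum over i in V_n of g(theta_i config),
    where (theta_i w)_j = w_{i+j}."""
    total = sum(g(lambda j, i=i: config(i + j)) for i in range(-(n - 1), n))
    return total / (2 * n - 1)

g = lambda w: w(0) * w(1)   # bounded local function with sup norm 1
omega = random_spins(seed=1)
gaps = {}
for n in (5, 50, 500):
    plain = empirical_mean(omega, n, g)                   # uses R_n
    periodic = empirical_mean(periodize(omega, n), n, g)  # uses the periodized field
    gaps[n] = abs(periodic - plain)
    print(n, gaps[n], "bound:", 2 / (2 * n - 1))
```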

7.2. Specific relative entropy<br />

Just as in Sanov’s theorem, the rate function in the large deviation principle for $R_n$ and $\widetilde R_n$, evaluated at the measure $Q \in \mathcal M_\theta(\Omega)$, will be the entropy of $Q$ relative to $P$. However, there is one complication.<br />

Exercise 7.6. Prove that if $Q \in \mathcal M_\theta(\Omega)$ is absolutely continuous relative to $P \in \mathcal M_e(\Omega)$, then $Q = P$.<br />

Hint: Prove that $Q \in \mathcal M_e(\Omega)$ and use Exercise A.15.<br />

The above is due to the fact that we now have a product space and a product measure. To get a nontrivial entropy we take a limit of finite-dimensional entropies normalized by volume.<br />


For $Q \in \mathcal M_1(\Omega)$ and $\Lambda \subset \mathbb Z^d$, let $Q_\Lambda$ be the restriction of $Q$ to $\mathcal F_\Lambda$, the σ-algebra generated by $\omega_\Lambda = (\omega_i)_{i \in \Lambda}$. Let $H_\Lambda(Q \mid P)$ be the entropy of $Q_\Lambda$ relative to $P_\Lambda$. For $\Lambda = V_n$ we abbreviate and write $Q_n$ and $H_n$.<br />

Theorem 7.7. For each $Q \in \mathcal M_\theta(\Omega)$, the specific relative entropy<br />

\[ h(Q \mid P) = \lim_{n \to \infty} \frac{1}{|V_n|} H_n(Q \mid P) \]<br />

exists, and is also given by<br />

\[ h(Q \mid P) = \sup_{\Lambda \in \mathcal R} \frac{1}{|\Lambda|} H_\Lambda(Q \mid P), \tag{7.3} \]<br />

where $\mathcal R$ is the collection of finite rectangles in $\mathbb Z^d$.<br />

The following simple argument is sufficiently elegant and common that it deserves to be presented as a separate lemma.<br />

Fekete’s lemma. Suppose that for $\Lambda \in \mathcal R$ we have $a_\Lambda \in [0, \infty]$ that satisfy<br />

(a) $a_\Lambda + a_\Delta \le a_{\Lambda \cup \Delta}$ whenever $\Lambda \cap \Delta = \emptyset$ and $\Lambda \cup \Delta \in \mathcal R$.<br />

(b) $a_\Lambda = a_{i+\Lambda}$ for all $i \in \mathbb Z^d$ and $\Lambda \in \mathcal R$.<br />

Then<br />

\[ \lim_{n \to \infty} \frac{a_{V_n}}{|V_n|} = \sup_{\Lambda} \frac{a_\Lambda}{|\Lambda|}. \]<br />

Proof. Fix $\Lambda$. Let $\{k + \Lambda : k \in K\}$ be a tiling of $\mathbb Z^d$ by disjoint shifted copies of $\Lambda$. Let $b(n)$ be the number of these copies that are contained in $V_n$. Let $\{k + \Lambda : k \in K'\}$ be the copies that intersect $V_n$ but do not lie entirely inside $V_n$. Set $\Lambda^n_k = V_n \cap (k + \Lambda)$. Now (a) can be applied repeatedly to get the inequality<br />

\[ a_{V_n} \ge b(n)\, a_\Lambda + \sum_{k \in K'} a_{\Lambda^n_k} \ge b(n)\, a_\Lambda. \]<br />

Thus,<br />

\[ \varliminf_{n \to \infty} \frac{a_{V_n}}{|V_n|} \ge a_\Lambda \lim_{n \to \infty} \frac{b(n)}{|V_n|} = \frac{a_\Lambda}{|\Lambda|}. \]<br />

The other direction is trivial. □<br />

Proof of Theorem 7.7. We verify (a) and (b) of the above lemma for $a_\Lambda = H_\Lambda(Q \mid P)$. By the variational characterization (6.3) of relative entropy,<br />

\[ H_\Lambda(Q \mid P) = \sup_{f \in C_b(\mathcal X^\Lambda)} \{ E^{Q_\Lambda}[f] - \log E^{P_\Lambda}[e^f] \}. \]<br />

Note that we make no notational distinction between $Q_\Lambda$ as a measure on the σ-algebra $\mathcal F_\Lambda$ on the space $\Omega$, and as a measure on the space $\mathcal X^\Lambda$. Let $\Lambda$ and $\Delta$ be disjoint, and let $f \in C_b(\mathcal X^\Lambda)$ and $g \in C_b(\mathcal X^\Delta)$. Then, since $f$ and $g$ are independent under $P$,<br />

\[ H_{\Lambda \cup \Delta}(Q \mid P) \ge E^Q[f + g] - \log E^P[e^{f+g}] = (E^Q[f] - \log E^P[e^f]) + (E^Q[g] - \log E^P[e^g]). \]<br />

Taking sup over such $f$ and $g$ gives<br />

\[ H_{\Lambda \cup \Delta}(Q \mid P) \ge H_\Lambda(Q \mid P) + H_\Delta(Q \mid P). \]<br />

The shift invariance part (b) follows also from the variational characterization because for $Q \in \mathcal M_\theta(\Omega)$, $Q_\Lambda$ and $Q_{i+\Lambda}$ coincide as measures on $\mathcal X^\Lambda$. □<br />
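Theorem 7.7 can be watched at work in a case where everything is computable by brute force. In the sketch below (our own illustrative choice, not an example from the text) $d = 1$, $P$ is the product of the uniform law on $\{0, 1\}$, and $Q$ is a stationary two-state Markov chain with transition matrix `p` and stationary law `pi`. For a window of $L$ consecutive sites one has $H_\Lambda(Q \mid P) = H(\pi \mid \lambda) + (L-1)h$ with $h = \sum_{a,b} \pi_a p_{ab} \log(p_{ab}/\lambda_b)$, so the normalized entropies increase to $h$. The cubes $V_n$ correspond to odd lengths $L = 2n-1$.

```python
import itertools
import math

def window_entropy(L, pi, p, lam):
    """H_Lambda(Q | P) for a window of L consecutive sites, by enumerating all words.

    Q is the stationary Markov chain (pi, p); P is the product of the single-site law lam."""
    total = 0.0
    for word in itertools.product(range(len(pi)), repeat=L):
        q = pi[word[0]]
        for a, b in zip(word, word[1:]):
            q *= p[a][b]
        if q > 0:
            r = math.prod(lam[a] for a in word)
            total += q * math.log(q / r)
    return total

pi = [0.75, 0.25]                 # stationary law of p: pi p = pi
p = [[0.9, 0.1], [0.3, 0.7]]
lam = [0.5, 0.5]

# Candidate value of the specific relative entropy h(Q | P) for this Markov chain.
h = sum(pi[a] * p[a][b] * math.log(p[a][b] / lam[b])
        for a in range(2) for b in range(2))

ratios = [window_entropy(L, pi, p, lam) / L for L in range(1, 10)]
for L, r in enumerate(ratios, start=1):
    print(L, r)
print("limit h =", h)
```

The printed ratios never exceed `h`, which is the numerical face of the supremum formula (7.3).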

The specific entropy has nice properties.<br />

Proposition 7.8. $h(\,\cdot \mid P)$ is affine on $\mathcal M_\theta(\Omega)$, lower semicontinuous, and the sublevel sets $\{h \le c\}$ are compact in the weak topology generated by $C_b(\Omega)$. Furthermore, $h(Q \mid P) = 0$ if, and only if, $Q = P$.<br />

Proof. Since each $H_\Lambda$ is convex and lower semicontinuous, their supremum $h$ inherits these properties. Hence, we need to show that $h$ is also concave. To this end, suppose $Q = tQ^1 + (1-t)Q^2$. We want to show that $h(Q \mid P) \ge t\,h(Q^1 \mid P) + (1-t)\,h(Q^2 \mid P)$.<br />

Assume $Q_n \ll P_n$ for all $n$. (Otherwise, $h(Q \mid P) = \infty$ and there is nothing to prove.) Let $f^1_n = \frac{dQ^1_n}{dP_n}$ and $f^2_n = \frac{dQ^2_n}{dP_n}$. Write<br />

\[
\begin{aligned}
H_n(Q \mid P) &= E^Q\Big[\log \frac{dQ_n}{dP_n}\Big] \\
&= t\,E^{Q^1}\big[\log(t f^1_n + (1-t) f^2_n)\big] + (1-t)\,E^{Q^2}\big[\log(t f^1_n + (1-t) f^2_n)\big] \\
&\ge t\,E^{Q^1}\big[\log(t f^1_n)\big] + (1-t)\,E^{Q^2}\big[\log((1-t) f^2_n)\big] \\
&= t\,H_n(Q^1 \mid P) + (1-t)\,H_n(Q^2 \mid P) + t \log t + (1-t) \log(1-t).
\end{aligned}
\]<br />

Divide by $|V_n|$ and let $n$ grow to infinity.<br />

Since the weak topology is metric, to prove compactness of sublevel sets it is enough to show sequential compactness. Thus, fix $c \in \mathbb R$ and let $\{Q^\ell\}$ be a sequence such that $h(Q^\ell \mid P) \le c$ for all $\ell \in \mathbb N$.<br />

Since $H_1 \le h$, Proposition 6.9 implies the existence of a subsequence $\{Q^{\ell(j,1)}\}_{j \in \mathbb N}$ with $h(Q^{\ell(j,1)} \mid P) \le c$ and such that the marginals on $\mathcal X^{V_1}$ converge weakly to a probability measure $\rho_1 \in \mathcal M_1(\mathcal X^{V_1})$. Next, $H_2 \le |V_2| h$ implies the existence of a further subsequence $\{Q^{\ell(j,2)}\}_{j \in \mathbb N}$ with $h(Q^{\ell(j,2)} \mid P) \le c$ and marginals on $\mathcal X^{V_2}$ converging weakly to a probability measure $\rho_2 \in \mathcal M_1(\mathcal X^{V_2})$. Inductively, given $\{Q^{\ell(j,n-1)}\}_{j \in \mathbb N}$, by appeal to compact sublevel sets of relative entropy we extract a further subsequence $\{Q^{\ell(j,n)}\}_{j \in \mathbb N}$ of $\{Q^{\ell(j,n-1)}\}_{j \in \mathbb N}$ for which $h \le c$ continues to hold and the marginals on $\mathcal X^{V_n}$ converge to $\rho_n \in \mathcal M_1(\mathcal X^{V_n})$.<br />

Now, $\{Q^{\ell(j,j)}\}_{j \in \mathbb N}$ is a subsequence of $\{Q^{\ell(j,n)}\}_{j \in \mathbb N}$, for any fixed $n$, and thus $\{Q^{\ell(j,j)}_n\}_{j \in \mathbb N}$ converge weakly to $\rho_n$. This kind of argument is called the diagonal trick.<br />

Observe that if $m > n$, then the restriction of $\rho_m$ to $\mathcal X^{V_n}$ is the weak limit of the restrictions of $Q^{\ell(j,j)}_m$ to $V_n$, which are precisely $Q^{\ell(j,j)}_n$ and thus have limit $\rho_n$. In other words, $\{\rho_n\}$ form a consistent family of finite-dimensional distributions. Kolmogorov’s extension theorem (see page 474 of [15] or page 60 of [26]) implies then that there exists a probability measure $Q \in \mathcal M_1(\Omega)$ such that $Q_n = \rho_n$ for all $n$. But then the finite-dimensional marginals of $\{Q^{\ell(j,j)}\}$ converge to those of $Q$ and thus the sequence itself converges weakly to $Q$. This proves the sequential compactness (and hence the compactness) of $\{h \le c\}$.<br />

The statement about the zeroes of $h$ follows directly from Lemma 6.4 and (7.3). □<br />

Recall that $\mathcal M_e(\Omega)$ is the set of extreme points of the convex set $\mathcal M_\theta(\Omega)$. Because the space of finite measures is infinite-dimensional, an interesting phenomenon happens: $\mathcal M_e(\Omega)$ is in fact dense in $\mathcal M_\theta(\Omega)$. This denseness allows us to apply the ergodic theorem to prove the lower bound in the LDP we are after, quite similarly to the way the ergodic theorem was used to prove the lower bound in Sanov’s theorem.<br />

Lemma 7.9. For $Q \in \mathcal M_\theta(\Omega)$, there are $Q^k \in \mathcal M_e(\Omega)$ that converge weakly to $Q$ and such that $h(Q^k \mid P) \to h(Q \mid P)$.<br />

Proof. Let $\{j + V_n : j \in I_n\}$, $I_n = (2n-1)\mathbb Z^d$, be a covering of $\mathbb Z^d$ by disjoint shifted copies of $V_n$. Let $Q^{n,\otimes}$ be the measure on $\Omega$ which makes the σ-algebras $\mathcal F_{j+V_n}$, $j \in I_n$, independent, but coincides with $Q$ on each such σ-algebra. More precisely, $Q^{n,\otimes}(d\omega) = \bigotimes_{j \in I_n} Q_{j+V_n}(d\omega_{j+V_n})$. Then,<br />

\[ Q^n = \frac{1}{|V_n|} \sum_{i \in V_n} Q^{n,\otimes} \circ \theta_{-i} \]<br />

is shift invariant. We prove that this sequence satisfies the claim of the lemma.<br />

First, we prove the weak convergence to $Q$. To this end, let $g$ be any bounded $\mathcal F_{V_m}$-measurable function and take $n > m$. Then<br />

\[ E^{Q^n}[g] = \frac{1}{|V_n|} \sum_{\substack{i \in V_n \\ i + V_m \subset V_n}} E^{Q^{n,\otimes}}[g \circ \theta_i] + O(c_{m,n}), \]<br />

where $c_{m,n} = |\{i \in V_n : i + V_m \not\subset V_n\}| / |V_n| \to 0$ as $n \to \infty$. But if $i + V_m \subset V_n$, then $E^{Q^{n,\otimes}}[g \circ \theta_i] = E^Q[g]$. Thus,<br />

\[ E^{Q^n}[g] = (1 - c_{m,n})\,E^Q[g] + O(c_{m,n}) \underset{n \to \infty}{\longrightarrow} E^Q[g]. \]<br />

Next, we show that the measures $Q^n$ are ergodic. Let $A$ be in the shift-invariant σ-algebra. Then,<br />

\[ Q^n(A) = \frac{1}{|V_n|} \sum_{i \in V_n} Q^{n,\otimes}(\theta_{-i} A) = Q^{n,\otimes}(A). \]<br />

Exercise A.18 gives a tail measurable event $B$ such that $0 = Q^n(A \Delta B) \ge \frac{1}{|V_n|} Q^{n,\otimes}(A \Delta B)$. Thus, $Q^{n,\otimes}(A) = Q^{n,\otimes}(B)$. By Kolmogorov’s 0-1 law (Exercise A.17), $Q^{n,\otimes}(B) \in \{0, 1\}$. The ergodicity of $Q^n$ has been checked.<br />

Finally, we need to prove the convergence of specific entropies. By the lower semicontinuity of $h$ we know that $h(Q \mid P) \le \varliminf_{n \to \infty} h(Q^n \mid P)$. To prove the other direction we first use convexity to write<br />

\[ h(Q^k \mid P) = \lim_{n \to \infty} \frac{1}{|V_n|} H_n(Q^k \mid P) \le \varlimsup_{n \to \infty} \frac{1}{|V_n|} \frac{1}{|V_k|} \sum_{i \in V_k} H_n(Q^{k,\otimes} \circ \theta_{-i} \mid P). \]<br />

Note that the shift invariance of $P$ implies that<br />

\[ H_n(Q^{k,\otimes} \circ \theta_{-i} \mid P) = H_{i+V_n}(Q^{k,\otimes} \mid P) \le H_{W^i_{n,k}}(Q^{k,\otimes} \mid P), \]<br />

where $W^i_{n,k}$ is the smallest cube containing $i + V_n$ and made up of $V_k$ and disjoint shifted copies of it.<br />

Exercise 7.10. Prove the following.<br />

(a) Let $\mathcal X$ and $\mathcal Y$ be two Polish spaces. Let $\alpha, \mu \in \mathcal M_1(\mathcal X)$ and $\beta, \nu \in \mathcal M_1(\mathcal Y)$. Then $H(\alpha \otimes \beta \mid \mu \otimes \nu) = H(\alpha \mid \mu) + H(\beta \mid \nu)$.<br />

(b) Let $\mathcal X$ be a Polish space and $\lambda, \mu \in \mathcal M_1(\mathcal X)$. Then $h(\mu^{\otimes \mathbb Z^d} \mid \lambda^{\otimes \mathbb Z^d}) = H(\mu \mid \lambda)$.<br />

From (a) in the above exercise $H_{W^i_{n,k}}(Q^{k,\otimes} \mid P) = \frac{|W^i_{n,k}|}{|V_k|} H_k(Q \mid P)$. Since $|W^i_{n,k}| \le (2n + 4k)^d$ and $|V_n| = (2n-1)^d$, one has<br />

\[ h(Q^k \mid P) \le \varlimsup_{n \to \infty} \frac{1}{|V_n|} \frac{1}{|V_k|} \sum_{i \in V_k} H_n(Q^{k,\otimes} \circ \theta_{-i} \mid P) \le \frac{1}{|V_k|} H_k(Q \mid P) \le h(Q \mid P). \tag{7.4} \]<br />

This completes the proof of the lemma. □<br />

We end this section with an exercise that shows how specific relative entropy can in fact be seen as a relative entropy.<br />

∗ Exercise 7.11. Suppose $\mathcal A_1 \subset \mathcal A_2 \subset \cdots$ are σ-algebras generating a σ-algebra $\mathcal A$. Let $\mu$ and $\nu$ be two probability measures on $\mathcal A$. Prove that $H_{\mathcal A_n}(\mu \mid \nu)$ increases to $H(\mu \mid \nu)$ as $n \to \infty$. Here $H_{\mathcal A_n}$ is relative entropy of the restrictions to $\mathcal A_n$.<br />

Hint: Use the variational formulation of relative entropy to see that the limit exists and equals $\sup_n H_{\mathcal A_n}(\mu \mid \nu)$. To identify the limit as $H(\mu \mid \nu)$, prove a suitable analogue of Lemma B.1. Note that it is easier to solve the exercise in the special case $\mathcal A_n = \mathcal F_{V_n}$ than for general σ-algebras because then one can apply Lemma B.1.<br />

Exercise 7.12. Let $Q \in \mathcal M_\theta(\Omega)$. Order $\mathbb Z^d$ lexicographically (i.e. by first coordinate, then second, etc.), and let $U_i = \{k \in \mathbb Z^d : k < i\}$ be the lexicographic past of the site $i$. (Of course, if $d = 1$ this language has a clear meaning, but the mathematics works just as well for any dimension $d$.) Let $Q(\,\cdot \mid \mathcal F_{U_0})$ be a conditional probability measure for $Q$, given the past of 0. Restrict this measure to $\mathcal F_0$ to get $Q_0(\,\cdot \mid \mathcal F_{U_0})$, the conditional distribution of $X_0$ under $Q$, given the past of 0. Let $U_0^* = U_0 \cup \{0\}$. Think of $Q_{U_0} \otimes \lambda$ as a probability measure on $\mathcal F_{U_0^*} = \mathcal F_{U_0} \otimes \mathcal F_0$ in the obvious way. Prove that<br />

\[ h(Q \mid P) = H(Q_{U_0^*} \mid Q_{U_0} \otimes \lambda). \]<br />

Hint: Fix $V_n = \{i^{(1)} > i^{(2)} > \cdots > i^{(s)}\}$, with $s = |V_n|$. Let $V_n^k = \{i^{(\ell)} : k \le \ell \le s\}$. Note that<br />

\[ H_n(Q \mid P) = H_0(Q \mid P) + \sum_{k=1}^{s-1} \big\{ H_{V_n^k - i^{(k)}}(Q \mid P) - H_{V_n^{k+1} - i^{(k)}}(Q \mid P) \big\}. \]<br />

By the conditional entropy formula (Exercise 6.14), the summation term is equal to $H_{V_n^k - i^{(k)}}(Q_{U_0^*} \mid Q_{U_0} \otimes \lambda)$. Now any fixed finite subset of $U_0^*$ is contained in $V_n^k - i^{(k)}$ for all but an asymptotically vanishing fraction of $i^{(k)} \in V_n$.<br />

7.3. Pressure and the large deviation principle<br />

We are now ready to state the main theorem of this chapter. Throughout this section, the setting is the one defined in Section 7.1. For $Q \in \mathcal M_1(\Omega)$ define $I(Q) = h(Q \mid P)$ if $Q \in \mathcal M_\theta(\Omega)$, and $I(Q) = \infty$ otherwise.<br />

Theorem 7.13. Let $\mu_n$ be the laws of the empirical fields $R_n$ under the i.i.d. product measure $P$. Then LDP($\mu_n$, $|V_n|$, $I$) holds and $I$ is tight. The same holds for the periodized empirical fields $\widetilde R_n$.<br />

The rest of this section is dedicated to the proof of this theorem. The proof of the lower bound is very similar to those of Cramér’s and Sanov’s theorems and will use the ergodic theorem instead of the law of large numbers. The upper bound will follow from the general Theorem 5.24 combined with exponential tightness. The first step is to prove that the limit that defines the pressure in (5.1) exists.<br />

Figure 7.1. Cubes $V_m^{(\ell)}$. (Side lengths marked in the figure: $2m-1$ and $2m-1+2r$.)<br />

Proposition 7.14. Setting as in the statement of the theorem. Let $f$ be a bounded measurable local function. Then the limit defining the pressure exists, and is the same for $R_n$ and the periodized empirical process $\widetilde R_n$:<br />

\[
p(f) = \lim_{n \to \infty} \frac{1}{|V_n|} \log E\Big[ e^{\sum_{i \in V_n} f \circ \theta_i} \Big]
= \lim_{n \to \infty} \frac{1}{|V_n|} \log E\Big[ e^{|V_n| E^{R_n(\omega)}[f]} \Big]
= \lim_{n \to \infty} \frac{1}{|V_n|} \log E\Big[ e^{|V_n| E^{\widetilde R_n(\omega)}[f]} \Big].
\]<br />

In particular, this holds for all $f \in C_{b,\mathrm{loc}}(\Omega)$.<br />

Proof. Pick $r$ so that $f$ is $\mathcal F_{V_r}$-measurable. Take two integers $m < n$ and let $V_m^{(\ell)} \subset V_n$, $\ell = 1, \dots, k^d$, be $k^d$ shifted copies of $V_m$ arranged so that there is distance $2r$ between each adjacent pair in each coordinate direction. Let $k$ be as large as possible. In fact, $k = \big[\frac{2n-1}{2m+2r-1}\big]$; see Figure 7.1.<br />

The volume of $V_n$ not covered by the copies of $V_m$ is<br />

\[ |V_n| - k^d |V_m| \le |V_n| - \Big( \frac{n-m-r}{m+r} \Big)^d |V_m| = |V_n|\,\kappa_{n,m}, \]<br />

with<br />

\[ \lim_{m \to \infty} \lim_{n \to \infty} \kappa_{n,m} = \lim_{m \to \infty} \Big( 1 - \frac{|V_m|}{(2m+2r)^d} \Big) = 0. \]<br />

Since $f$ is $\mathcal F_{V_r}$-measurable, the collections $\{f \circ \theta_i : i \in V_m^{(\ell)}\}$ are independent for distinct $\ell$. Write<br />

\[
\begin{aligned}
p_n(f) &= \frac{1}{|V_n|} \log E\Big[ e^{\sum_{i \in V_n} f \circ \theta_i} \Big] \\
&\le \frac{1}{|V_n|} \log E\Big[ e^{\sum_{\ell=1}^{k^d} \sum_{i \in V_m^{(\ell)}} f \circ \theta_i}\, e^{\kappa_{n,m} |V_n| \|f\|_\infty} \Big] \\
&= \kappa_{n,m} \|f\|_\infty + \frac{k^d}{|V_n|} \log E\Big[ e^{\sum_{i \in V_m} f \circ \theta_i} \Big] \\
&\le \kappa_{n,m} \|f\|_\infty + \frac{|V_m|}{(2m+2r-1)^d}\, p_m(f).
\end{aligned}
\tag{7.5}
\]<br />

Taking $n \to \infty$ then $m \to \infty$ one has $\varlimsup_{n \to \infty} p_n(f) \le \varliminf_{m \to \infty} p_m(f)$. This proves the existence of the limit $p(f)$. To prove the second claim, use Exercise 7.5 to write<br />

\[
e^{-|V_n| \varepsilon_n} E\big[ \exp\big( |V_n| E^{R_n(\omega)}[f] \big) \big]
\le E\big[ \exp\big( |V_n| E^{\widetilde R_n(\omega)}[f] \big) \big]
\le E\big[ \exp\big( |V_n| E^{R_n(\omega)}[f] \big) \big]\, e^{|V_n| \varepsilon_n},
\]<br />

where $\varepsilon_n = \sup_\omega |E^{\widetilde R_n(\omega)}[f] - E^{R_n(\omega)}[f]| \to 0$ by Exercise 7.5. □<br />

In fact, one also has an upper bound on the value of the pressure $p$.<br />

Lemma 7.15. If $f$ is a bounded $\mathcal F_{V_m}$-measurable function, then<br />

\[ p(f) \le \frac{1}{|V_m|} \log E\big[ e^{|V_m| f} \big]. \]<br />

Proof. Pick $n$ so that $V_n$ is a union of $r = |V_n|/|V_m|$ disjoint shifted copies of $V_m$. Let $x_1, \dots, x_r$ be the centers of these copies, so $V_n = \bigcup_{k=1}^r (x_k + V_m)$. Then, use the generalized version of Hölder’s inequality (proved by induction) and then independence to write<br />

\[
\begin{aligned}
E\Big[ e^{\sum_{i \in V_n} f \circ \theta_i} \Big]
&= E\Big[ \prod_{j \in V_m} e^{\sum_{k=1}^r f \circ \theta_{x_k + j}} \Big]
\le \prod_{j \in V_m} \Big( E\Big[ e^{|V_m| \sum_{k=1}^r f \circ \theta_{x_k + j}} \Big] \Big)^{1/|V_m|} \\
&= \prod_{j \in V_m} \prod_{k=1}^r \Big( E\big[ e^{|V_m| f} \big] \Big)^{1/|V_m|}
= \Big( E\big[ e^{|V_m| f} \big] \Big)^{|V_n|/|V_m|}.
\end{aligned}
\]<br />

The claim follows immediately. □<br />
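A small computation makes the pressure and the bound of Lemma 7.15 concrete. The sketch below is an illustration under our own assumptions: $d = 1$, spins are uniform on $\{-1, +1\}$, and $f(\omega) = J \omega_0 \omega_1$ with a coupling constant $J$ that we introduce only for this example. Since the bond variables $\omega_i \omega_{i+1}$ are i.i.d. uniform signs, one gets $p_n(f) = \log\cosh J$ for every $n$. The function is $\mathcal F_{V_2}$-measurable with $|V_2| = 3$, so the lemma gives $p(f) \le \frac13 \log\cosh(3J)$. The helper name `finite_volume_pressure` is hypothetical.

```python
import itertools
import math

def finite_volume_pressure(n, J):
    """p_n(f) = |V_n|^{-1} log E[exp(sum_{i in V_n} f o theta_i)] for f(w) = J*w_0*w_1, d = 1,
    computed by enumerating all spin configurations that the sum depends on."""
    sites = 2 * n - 1      # |V_n| for V_n = {-(n-1), ..., n-1}
    spins = sites + 1      # the sum touches coordinates -(n-1), ..., n
    total = 0.0
    for w in itertools.product((-1, 1), repeat=spins):
        energy = sum(J * w[i] * w[i + 1] for i in range(sites))
        total += math.exp(energy)
    return math.log(total / 2 ** spins) / sites

J = 0.7
exact = math.log(math.cosh(J))                  # value of p(f) in this toy model
holder_bound = math.log(math.cosh(3 * J)) / 3   # Lemma 7.15 with m = 2, |V_2| = 3
pressures = {n: finite_volume_pressure(n, J) for n in range(1, 6)}

for n, value in pressures.items():
    print(n, value)
print("log cosh J        =", exact)
print("Hoelder-type bound =", holder_bound)
```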

Put the space $\mathcal M(\Omega)$ of finite Borel measures in duality with $C_{b,\mathrm{loc}}(\Omega)$. We identify the convex conjugate of $p$ in this duality. By Lemma B.1 we get<br />

\[ H(\nu \mid \lambda) = \sup_{f \in C_{b,\mathrm{loc}}(\mathcal X)} \{ E^\nu[f] - \log E^\lambda[e^f] \}. \tag{7.6} \]<br />


7.3. Pressure <strong>an</strong>d the large deviati<strong>on</strong> principle 81<br />

Lemma 7.16. For $Q \in \mathcal{M}_\theta(\Omega)$, $p^*(Q) = h(Q \mid P)$.<br />
Proof. Let $f$ be a bounded continuous $\mathcal{F}_{V_m}$-measurable function. Then<br />
\[ p^*(Q) \ge E^Q[f/|V_m|] - p(f/|V_m|) \ge \bigl(E^Q[f] - \log E[e^f]\bigr)/|V_m|. \]<br />
Taking sup over such $f$ and using (7.6) implies $p^*(Q) \ge H_m(Q \mid P)/|V_m|$, for all $m$, and thus $p^*(Q) \ge h(Q \mid P)$.<br />
Conversely, $\bar f = \sum_{i\in V_n} f\circ\theta_i$ is $\mathcal{F}_{V_{m+n}}$-measurable, hence<br />
\[ H_{m+n}(Q \mid P) \ge E^Q[\bar f] - \log E\Bigl[e^{\sum_{i\in V_n} f\circ\theta_i}\Bigr] = |V_n|\bigl(E^Q[f] - p_n(f)\bigr), \]<br />
where $p_n(f)$ was defined in (7.5). Dividing by $|V_n|$ and taking $n \to \infty$ (note that $|V_{m+n}|/|V_n| \to 1$ for fixed $m$) one finds that $h(Q \mid P) \ge E^Q[f] - p(f)$ and hence $h(Q \mid P) \ge p^*(Q)$.<br />

∗ Exercise 7.17. Prove that if $Q \in \mathcal{M}(\Omega) \setminus \mathcal{M}_\theta(\Omega)$, then $p^*(Q) = \infty$.<br />
Next, we deal with exponential tightness.<br />

Lemma 7.18. For all $b > 0$, there exists a compact set $K_b \subset \mathcal{M}_1(\Omega)$ such that $P\{\widetilde{R}_n \notin K_b\} \le e^{-|V_n| b}$.<br />
Proof. By Sanov’s theorem (page 62), the laws of $L_n = |V_n|^{-1} \sum_{i\in V_n} \delta_{X_i}$ form an exponentially tight family of measures on $\mathcal{M}_1(\mathcal{X})$. Thus, for each $m \in \mathbb{N}$ and $j \in \mathbb{Z}^d$, there is a compact $A_{m,j} \subset \mathcal{M}_1(\mathcal{X})$ such that<br />
\[ P\{L_n \notin A_{m,j}\} \le e^{-|V_n|(m+|j|)}. \]<br />
Since $A_{m,j}$ is compact, Prohorov’s theorem (page 169) implies it is a tight family of probability measures and there is a compact $U_{m,j} \subset \mathcal{X}$ such that $\mu(U_{m,j}^c) < e^{-(m+|j|)}$ for all $\mu \in A_{m,j}$. Define<br />
\[ H_m = \bigl\{Q \in \mathcal{M}_1(\Omega) : Q(\omega : \omega_j \in U_{m,j}^c) \le e^{-(m+|j|)} \ \forall j \in \mathbb{Z}^d\bigr\} \]<br />
and observe that if $Q_\ell \in H_m$ converge weakly to $Q$, then the portmanteau theorem (Exercise A.3) implies<br />
\[ Q(\omega_j \in U_{m,j}^c) \le \liminf_{\ell\to\infty} Q_\ell(\omega_j \in U_{m,j}^c) \le e^{-(m+|j|)}. \]<br />
Thus, $H_m$ is closed. Take $\ell$ large enough such that $\sum_{m\ge\ell-b} \sum_j e^{-(m+|j|)} \le 1$ and define $K_b = \bigcap_{m\ge\ell} H_m$. Then, $K_b$ is a closed tight family of measures and hence, by Prohorov’s theorem, is compact. Indeed, for $\varepsilon > 0$ take $m \ge \ell$ such that $\sum_j e^{-(m+|j|)} < \varepsilon$. By Tychonov’s theorem (see Theorem A3 in [33]), $U_m = \prod_{j\in\mathbb{Z}^d} U_{m,j}$ is compact and, for $Q \in K_b$,<br />
\[ Q(U_m^c) \le \sum_j Q(\omega_j \in U_{m,j}^c) \le \sum_j e^{-(m+|j|)} < \varepsilon. \]



Figure 7.2. The bijection from $j + V_n$ onto $V_n$.<br />
Finally,<br />
\[ P\{\widetilde{R}_n \notin K_b\} \le \sum_{m\ge\ell} P\{\widetilde{R}_n \notin H_m\} \le \sum_{m\ge\ell} \sum_j P\bigl\{\widetilde{R}_n(\omega_j \in U_{m,j}^c) > e^{-(m+|j|)}\bigr\}. \]<br />
But<br />
\[ \widetilde{R}_n(\omega_j \in U_{m,j}^c) = \frac{1}{|V_n|} \sum_{i\in V_n} \delta_{\theta_i\omega^{(n)}}(\omega_j \in U_{m,j}^c) = \frac{1}{|V_n|} \sum_{i\in V_n} \mathbf{1}_{U_{m,j}^c}(\omega^{(n)}_{i+j}) = \frac{1}{|V_n|} \sum_{\tilde\iota\in V_n} \mathbf{1}_{U_{m,j}^c}(\omega_{\tilde\iota}) = L_n(U_{m,j}^c), \]<br />
where we observe that there is a bijection $j + i \mapsto \tilde\iota$ from $j + V_n$ onto $V_n$ such that $\omega^{(n)}_{i+j} = \omega_{\tilde\iota}$; see Figure 7.2.<br />
Since $L_n(U_{m,j}^c) > e^{-(m+|j|)}$ implies $L_n \in A_{m,j}^c$, we have<br />
\[ P\{\widetilde{R}_n \notin K_b\} \le \sum_{m\ge\ell} \sum_j e^{-|V_n|(m+|j|)} \le e^{-|V_n| b} \sum_{m\ge\ell-b} \sum_j e^{-(m+|j|)} \le e^{-|V_n| b}. \]<br />
The lemma is proved.<br />

We are ready to complete the proof of the process level LDP.<br />
Proof of Theorem 7.13. Recall that the topology on $\mathcal{M}_1(\Omega)$ generated by $C_b(\Omega)$ is the same as that generated by $C_{b,\mathrm{loc}}(\Omega)$; see Exercise 7.2. The upper large deviation bound (2.3) for compact sets, for both the laws of $R_n$ and of $\widetilde{R}_n$, hence follows from Theorem 5.24 (with $E = \mathcal{M}_1(\Omega)$), Proposition 7.14, Lemma 7.16, and Exercise 7.17. Then, by Theorem 3.3, exponential



tightness implies the upper bound holds for all closed sets. It remains to prove the lower bound.<br />
By Lemma 7.9, it suffices to show that for open $G \subset \mathcal{M}_\theta(\Omega)$,<br />
\[ \liminf_{n\to\infty} \frac{1}{|V_n|} \log P\{R_n \in G\} \ge -\inf_{Q\in G\cap\mathcal{M}_e(\Omega)} h(Q \mid P). \]<br />
Thus let $Q \in G$ be ergodic. We may assume $h(Q \mid P) < \infty$, otherwise there is nothing to prove. In this case we have, for all $n$, a Radon-Nikodym derivative $f_n = \frac{dQ_n}{dP_n}$ on $\mathcal{F}_n$.<br />

Then, using Jensen’s inequality in the third line,<br />
\[ \begin{aligned} \frac{1}{|V_n|} \log P\{R_n \in G\} &\ge \frac{1}{|V_n|} \log \int \mathbf{1}_G(R_n)\, f_n^{-1}\, dQ_n \\ &= \frac{1}{|V_n|} \log Q\{R_n \in G\} + \frac{1}{|V_n|} \log\Bigl(\frac{1}{Q\{R_n \in G\}} \int_{R_n\in G} f_n^{-1}\, dQ_n\Bigr) \\ &\ge \frac{1}{|V_n|} \log Q\{R_n \in G\} - \frac{1}{|V_n|\, Q\{R_n \in G\}} \int_{R_n\in G} \log f_n\, dQ_n. \end{aligned} \tag{7.7} \]<br />
Now use $x \log x \ge -1/e$, for $x > 0$, to write<br />
\[ \int_{R_n\in G} \log f_n\, dQ_n = \int \log f_n\, dQ_n - \int_{R_n\in G^c} f_n \log f_n\, dP \le H_n(Q \mid P) + 1/e. \]<br />
Thus, (7.7) is bounded below by<br />
\[ \frac{1}{|V_n|} \log Q\{R_n \in G\} - \frac{1}{Q\{R_n \in G\}}\, \frac{1}{|V_n|} H_n(Q \mid P) - \frac{1}{e\,|V_n|\, Q\{R_n \in G\}}. \]<br />

By part (a) of Proposition 5.5 one can find a neighborhood of $Q$ inside $G$ that is determined only by finitely many functions in $C_b(\Omega)$. By the ergodic theorem $Q\{|E^{R_n}[f] - E^Q[f]| < \varepsilon\} \to 1$ for all such functions $f$ and any $\varepsilon > 0$. Hence, $Q\{R_n \in G\} \to 1$ as $n \to \infty$ and the lower bound follows. The lower bound also holds for $\widetilde{R}_n$ as a consequence of Exercises 7.5 and 2.17. The theorem is proved.<br />

In the next part we study Gibbs measures and their connection to large deviation theory. We end this part of the book with an exercise on a phenomenon related to the one observed in Section ??, this time in the context of coin tossing.<br />

Exercise 7.19. This exercise shows that if a coin tossing process is conditioned to yield an abnormally large fraction of 1’s, the entire process behaves (in the limit) like a sequence of tosses from a biased coin.<br />
Let $\Omega = \{0, 1\}^{\mathbb{N}}$ be the sample space for coin tosses, with sample points $\omega = (x_k)_{k\in\mathbb{N}}$ and coordinate variables $X_k(\omega) = x_k$. For $s \in [0, 1]$ let $\nu_s = (1 - s)\delta_0 + s\delta_1$ be the probability measure on $\{0, 1\}$ that gives a 1 with



probability $s$, and $Q_s = \nu_s^{\otimes\mathbb{N}}$ the probability measure on $\Omega$ under which the $\{X_k\}$ are i.i.d. $\nu_s$-distributed. Let $P = Q_{1/2}$ be the fair coin tossing measure, and $S_n = X_1 + \dots + X_n$ the number of 1’s in the first $n$ observations.<br />
Let $1/2 < s \le 1$. Show that $P\{\,\cdot \mid S_n \ge ns\}$ converges weakly to $Q_s$.<br />
Hint: To prove this you can use large deviation theory in a manner analogous to the proof of the maximum entropy principle (page 68). Observe that if $f$ is a bounded local function on $\Omega$, then<br />
\[ E^P[f \mid S_n \ge ns] = E^P\bigl[E^{R_n}[f] \mid S_n \ge ns\bigr] + \text{a small error}. \]<br />
Use Exercise 7.12 to show that $Q_s$ is the unique minimizer of $h(Q \mid P)$ subject to $E^Q[X_0] \ge s$.
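The simplest instance of the claim in Exercise 7.19 can be observed numerically: under the conditioning $S_n \ge ns$, the law of a single toss should approach $\nu_s$. The following is a minimal sketch that computes $P\{X_1 = 1 \mid S_n \ge ns\}$ exactly from binomial counts; the function name is an illustrative assumption, and the computation is only a plausibility check, not a solution of the exercise.<br />

```python
import math

def conditional_first_toss(n, s):
    """P(X_1 = 1 | S_n >= n*s) under fair coin tossing, via exact binomial counts."""
    k = math.ceil(n * s)
    # Sequences with X_1 = 1 and S_n = j: place the remaining j-1 ones among n-1 tosses.
    numerator = sum(math.comb(n - 1, j - 1) for j in range(max(k, 1), n + 1))
    # Sequences with S_n = j; the common factor 2^{-n} cancels in the ratio.
    denominator = sum(math.comb(n, j) for j in range(k, n + 1))
    return numerator / denominator

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, conditional_first_toss(n, 0.7))
```

Since $P\{X_1 = 1 \mid S_n = j\} = j/n$, the returned value equals $E[S_n/n \mid S_n \ge ns]$, which tends to $s$ because the conditional law of $S_n$ concentrates near the threshold.<br />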


Part II. Statistical Mechanics<br />


Chapter 8. Formalism for classical lattice systems<br />

Recall the setting from the previous chapter. Let $\mathcal{X}$ be a Polish space. This will be the single spin space. $\Omega = \mathcal{X}^{\mathbb{Z}^d}$ is the space of spin configurations. Generic elements of $\Omega$ are primarily denoted by $\sigma$, and on occasion by $\omega$ or $\tau$. On $\Omega$ there is an a priori or reference measure $\lambda$ that reflects complete lack of interaction. Hence, we assume that $\lambda$ is a product measure under which the spins $(\sigma_i)_{i\in\mathbb{Z}^d}$ are i.i.d. $\mathcal{X}$-valued random variables. For example, if $\mathcal{X} = \{-1, +1\}$ then a natural choice is the fair coin tossing measure $\lambda = (\tfrac12\delta_{-1} + \tfrac12\delta_{+1})^{\otimes\mathbb{Z}^d}$.<br />

Knowing $\sigma \in \Omega$ means we have perfect knowledge of the microscopic state. This involves too many variables and physical measurements are subject to statistical fluctuations. Knowing the distribution of the microscopic states is thus more reasonable in that it leads to fewer parameters and explains the apparent deterministic behavior of the system. We are hence given a probability measure $\mu \in \mathcal{M}_1(\Omega)$ that we call the macroscopic state.<br />

8.1. Finite volume model<br />

Fix a finite subset $\Lambda \subset \mathbb{Z}^d$ and let $\Omega_\Lambda = \mathcal{X}^\Lambda$. Let $\mathcal{F}_\Lambda$ be the corresponding Borel $\sigma$-algebra. A Hamiltonian $H_\Lambda$ is a bounded $\mathcal{F}_\Lambda$-measurable function that gives the energy $H_\Lambda(\sigma_\Lambda)$ of a configuration $\sigma_\Lambda \in \Omega_\Lambda$. The Gibbs measure $\mu_\Lambda$ describes the equilibrium of the interacting spins in the volume $\Lambda$.<br />




It is justified via equivalence of ensembles (Exercise 6.23) and is defined as<br />
\[ d\mu_\Lambda = \frac{e^{-\beta H_\Lambda}}{Z_\Lambda}\, d\lambda_\Lambda, \tag{8.1} \]<br />
where $Z_\Lambda = E^{\lambda_\Lambda}[e^{-\beta H_\Lambda}]$ is the normalizing constant called the partition function, $\lambda \in \mathcal{M}_1(\Omega)$ is the a priori (or reference) measure, and $\beta > 0$ is the inverse temperature $\beta = 1/T$.<br />

The equilibrium of the interacting spins is also defined by a variational principle: the free energy $-\beta^{-1} \log Z_\Lambda$ (from statistical mechanics) minimizes the Helmholtz free energy (from thermodynamics),<br />
\[ -\beta^{-1} \log Z_\Lambda = \inf_{\nu\in\mathcal{M}_1(\Omega_\Lambda)} \bigl\{E^\nu[H_\Lambda] + \beta^{-1} H(\nu \mid \lambda_\Lambda)\bigr\}, \tag{8.2} \]<br />
and the measure at which the infimum is attained is called the equilibrium measure. These two descriptions of the equilibrium coincide; recall Exercise 6.21.<br />
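The agreement between (8.1) and (8.2) can be checked by enumeration on a small volume. The sketch below assumes three $\pm 1$ spins with a fair-coin reference measure and a toy nearest-neighbour energy; the parameter values, the Hamiltonian, and all function names are illustrative assumptions and do not come from the text.<br />

```python
import itertools
import math

BETA, J = 0.8, 1.0  # illustrative parameters, not from the text
CONFIGS = list(itertools.product((-1, 1), repeat=3))  # Omega_Lambda for three +-1 spins
PRIOR = {c: 1 / len(CONFIGS) for c in CONFIGS}         # lambda_Lambda: fair-coin product measure

def hamiltonian(c):
    """Toy nearest-neighbour energy on a three-site chain (assumed for illustration)."""
    return -J * (c[0] * c[1] + c[1] * c[2])

def gibbs_measure():
    """Return (mu_Lambda, Z_Lambda) following (8.1)."""
    weights = {c: math.exp(-BETA * hamiltonian(c)) * PRIOR[c] for c in CONFIGS}
    z = sum(weights.values())
    return {c: w / z for c, w in weights.items()}, z

def helmholtz(nu):
    """E^nu[H] + beta^{-1} H(nu | lambda), the functional minimized in (8.2)."""
    energy = sum(p * hamiltonian(c) for c, p in nu.items())
    rel_entropy = sum(p * math.log(p / PRIOR[c]) for c, p in nu.items() if p > 0)
    return energy + rel_entropy / BETA

if __name__ == "__main__":
    mu, z = gibbs_measure()
    print("free energy      :", -math.log(z) / BETA)
    print("Helmholtz at mu  :", helmholtz(mu))
    print("Helmholtz at prior:", helmholtz(PRIOR))
```

Evaluating `helmholtz` at the Gibbs measure reproduces $-\beta^{-1}\log Z_\Lambda$, while any other probability vector on the eight configurations gives a value at least as large.<br />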

The above variational principle reflects a fundamental idea in this subject: a balance between two competing microscopic effects. On the one hand, the interaction of the spins, measured by the Hamiltonian, induces an ordering effect. On the other hand, thermal motion, measured by the entropy, has a randomizing effect. At higher temperatures, or small $\beta$, we expect more thermal motion, while at low temperatures, or high $\beta$, we expect more order. We have already observed such a balance in the Curie-Weiss model in Section 4.3 where, in the absence of an external magnetic field, magnetization occurs at low temperature and is lost at high temperature.<br />

We wish to develop this formalism for infinite volume $\Lambda = \mathbb{Z}^d$ in order to observe phase transition. In infinite volume (8.1) and (8.2) do not work directly. In the previous chapter we already ran into the problem that relative entropy becomes somewhat trivial in infinite volume, and we corrected this by taking limits normalized by volume. In a similar vein we introduce next an interaction potential that will describe the ordering effect across microscopic distances on the lattice. This enables us to define infinite volume Gibbs measures.<br />

8.2. Potentials and Hamiltonians<br />
Definition 8.1. $\Phi = \{\Phi_A : A \subset \mathbb{Z}^d \text{ finite},\ \Phi_A : \Omega \to \mathbb{R}\}$ is an absolutely summable, shift-invariant interaction potential if for all $i \in \mathbb{Z}^d$ and $A \subset \mathbb{Z}^d$ the following holds:<br />
(a) $\Phi_A$ is $\mathcal{F}_A$-measurable,<br />
(b) $\Phi_{A+i} = \Phi_A \circ \theta_i$,<br />
(c) $\|\Phi\| = \sum_{A : 0\in A} \|\Phi_A\|_\infty < \infty$, with $\|\Phi_A\|_\infty = \sup |\Phi_A|$.



Let $\mathcal{B}$ be the Banach space of absolutely summable, shift-invariant interaction potentials. The summability in part (c) of the above definition will make all the sums we write from now on well defined.<br />
$\Phi_A(\sigma)$ represents the energy of the interaction of the spins in $A$. We will say that it has finite range $R$ if $\mathrm{diam}(A) > R$ implies $\Phi_A \equiv 0$.<br />
Exercise 8.2. Prove that finite range shift-invariant potentials are dense in $\mathcal{B}$.<br />
We say that $\Phi$ is a two-body or pair potential if $\Phi_A \equiv 0$ when $A$ has more than two points (i.e. $|A| > 2$). A pair potential with range 1 is called a nearest-neighbor potential.<br />
In what follows, we will only consider interaction potentials such that $\Phi \subset C_b(\Omega)$. This is necessary because the large deviation principle we proved in the previous chapter was in the weak topology.<br />
Note that the one-body or self-potential $\Phi_{\{i\}}$, for $i \in \mathbb{Z}^d$, does not contribute to interaction and hence could be subsumed in the a priori measure $\lambda$. Physically, however, the a priori measure is often canonically associated to the single spin space $\mathcal{X}$ (i.e. $\lambda$ is a product measure), while the self-potential describes the effect of external forces on the spin, such as external magnetic fields.<br />

Let $\Lambda \subset \mathbb{Z}^d$ be finite. The Hamiltonian $H^{\mathrm{free}}_\Lambda = H^{\Phi,\mathrm{free}}_\Lambda$ for the volume $\Lambda$ with free boundary condition (i.e. no influence from $\Lambda^c$) is the bounded $\mathcal{F}_\Lambda$-measurable function<br />
\[ H^{\mathrm{free}}_\Lambda = \sum_{A : A\subset\Lambda} \Phi_A. \]<br />
The interaction between spins in $\Lambda$ and $\Lambda^c$ is described by<br />
\[ W_{\Lambda,\Lambda^c} = W^{\Phi}_{\Lambda,\Lambda^c} = \sum_{\substack{A\cap\Lambda\ne\emptyset \\ A\cap\Lambda^c\ne\emptyset}} \Phi_A. \]<br />
Now fix a configuration $\tau_{\Lambda^c} \in \Omega_{\Lambda^c}$ on the sites outside $\Lambda$. The Hamiltonian $H^{\tau_{\Lambda^c}}_\Lambda = H^{\Phi,\tau_{\Lambda^c}}_\Lambda$ for $\Lambda$ with boundary condition $\tau_{\Lambda^c}$ is<br />
\[ H^{\tau_{\Lambda^c}}_\Lambda(\sigma_\Lambda) = H^{\mathrm{free}}_\Lambda(\sigma_\Lambda) + W_{\Lambda,\Lambda^c}(\sigma_\Lambda, \tau_{\Lambda^c}) = \sum_{A\cap\Lambda\ne\emptyset} \Phi_A(\sigma_\Lambda, \tau_{\Lambda^c}). \]<br />
When a boundary condition $\tau_{\Lambda^c}$ has been specified, the Gibbs measure $\pi_\Lambda(\tau, \cdot)$ of the spins in $\Lambda$ is defined in terms of the integral of a bounded test function $f$ on $\Omega_\Lambda$ by<br />
\[ \int_{\Omega_\Lambda} f(\sigma_\Lambda)\, \pi_\Lambda(\tau, d\sigma_\Lambda) = \frac{1}{Z^{\tau_{\Lambda^c}}_\Lambda} \int_{\Omega_\Lambda} f(\sigma_\Lambda)\, e^{-\beta H^{\tau_{\Lambda^c}}_\Lambda(\sigma_\Lambda)}\, \lambda_\Lambda(d\sigma_\Lambda). \tag{8.3} \]
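To make the decomposition $H^{\tau_{\Lambda^c}}_\Lambda = H^{\mathrm{free}}_\Lambda + W_{\Lambda,\Lambda^c}$ and the kernel (8.3) concrete, here is a minimal sketch for a one-dimensional example. It assumes $\pm 1$ spins, a fair-coin reference measure, and the nearest-neighbour pair potential $\Phi_{\{i,i+1\}}(\sigma) = -J\sigma_i\sigma_{i+1}$; these choices, the parameter values, and the function names are illustrative assumptions, not part of the text.<br />

```python
import itertools
import math

BETA, J = 0.5, 1.0  # illustrative values

def pair_energy(a, b):
    """Phi_{i,i+1}(sigma) = -J sigma_i sigma_{i+1}: an assumed nearest-neighbour pair potential."""
    return -J * a * b

def hamiltonian(sigma, tau_left, tau_right):
    """H^tau_Lambda for Lambda = {0, ..., L-1} in Z with boundary spins tau_{-1} and tau_L."""
    # H^free: bonds A contained in Lambda.
    free = sum(pair_energy(sigma[k], sigma[k + 1]) for k in range(len(sigma) - 1))
    # W: bonds A that meet both Lambda and its complement.
    boundary = pair_energy(tau_left, sigma[0]) + pair_energy(sigma[-1], tau_right)
    return free + boundary

def gibbs_kernel(length, tau_left, tau_right):
    """pi_Lambda(tau, .) on Omega_Lambda as in (8.3), with a fair-coin reference measure."""
    configs = list(itertools.product((-1, 1), repeat=length))
    weights = [math.exp(-BETA * hamiltonian(c, tau_left, tau_right)) for c in configs]
    z = sum(weights)  # the uniform factor 2^{-length} of the reference measure cancels in the ratio
    return {c: w / z for c, w in zip(configs, weights)}

if __name__ == "__main__":
    for tau in ((1, 1), (1, -1), (-1, -1)):
        kernel = gibbs_kernel(1, *tau)
        print(tau, sum(c[0] * p for c, p in kernel.items()))
```

For a single site the mean spin under the kernel is $\tanh(\beta J(\tau_{-1} + \tau_1))$, so the boundary condition biases the spin towards alignment with its neighbours.<br />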



Alternatively, for a test function $f : \Omega \to \mathbb{R}$,<br />
\[ \int_\Omega f(\sigma)\, \pi_\Lambda(\tau, d\sigma) = \frac{1}{Z^{\tau_{\Lambda^c}}_\Lambda} \int_{\Omega_\Lambda} f(\sigma_\Lambda, \tau_{\Lambda^c})\, e^{-\beta H^{\tau_{\Lambda^c}}_\Lambda(\sigma_\Lambda)}\, \lambda_\Lambda(d\sigma_\Lambda). \tag{8.4} \]<br />
As before, $Z^{\tau_{\Lambda^c}}_\Lambda = \int_{\Omega_\Lambda} e^{-\beta H^{\tau_{\Lambda^c}}_\Lambda(\sigma_\Lambda)}\, \lambda_\Lambda(d\sigma_\Lambda)$ is the normalization constant called the partition function.<br />
∗Exercise 8.3. Show that the Gibbs kernels defined by (8.4) for a potential satisfying Definition 8.1 have the following shift invariance: for $f \in b\mathcal{F}$ on $\Omega$,<br />
\[ \int_\Omega f(\theta_{-i}\sigma)\, \pi_\Lambda(\tau, d\sigma) = \int_\Omega f(\sigma)\, \pi_{\Lambda+i}(\theta_{-i}\tau, d\sigma). \]<br />

Before going forward with Gibbs measures, we turn to a study of abstract probability-measure-valued functions of the above type.<br />
8.3. Specifications<br />

Given two measurable spaces $(\mathcal{Y}, \mathcal{C})$ and $(\mathcal{Z}, \mathcal{D})$, a stochastic kernel $\pi$ from $(\mathcal{Y}, \mathcal{C})$ to $(\mathcal{Z}, \mathcal{D})$ is a map $\pi : \mathcal{Y} \times \mathcal{D} \to [0, 1]$ such that<br />
(a) $\pi(y, \cdot) \in \mathcal{M}_1(\mathcal{Z}, \mathcal{D})$ for all $y \in \mathcal{Y}$,<br />
(b) $y \mapsto \pi(y, D)$ is $\mathcal{C}$-measurable for all $D \in \mathcal{D}$.<br />
Physically, the stochastic kernel reflects the equilibrium in $\mathcal{Z}$ given a $y \in \mathcal{Y}$. From a dynamical systems point of view, the notion of a stochastic kernel is a generalization of a dynamical system given by a map. A stochastic kernel acts as a linear transformation that pulls functions back and pushes measures forward. The $b\mathcal{D} \to b\mathcal{C}$ map $f \mapsto \pi f$ is defined by<br />
\[ \pi f(y) = \int_{\mathcal{Z}} f(z)\, \pi(y, dz), \qquad y \in \mathcal{Y}, \]<br />
and the $\mathcal{M}(\mathcal{Y}, \mathcal{C}) \to \mathcal{M}(\mathcal{Z}, \mathcal{D})$ map $\mu \mapsto \mu\pi$ is defined by<br />
\[ \mu\pi(D) = \int_{\mathcal{Y}} \pi(y, D)\, \mu(dy), \qquad D \in \mathcal{D}. \]<br />
Sometimes we will write $\pi^y(D)$, $\pi^y f$, and $\pi(f)$ instead of $\pi(y, D)$, $\pi f(y)$, and $\pi f$, respectively.<br />
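On finite spaces a stochastic kernel is simply a row-stochastic matrix, and the two actions above become matrix-vector products. The following sketch illustrates this under that finiteness assumption; the example kernel and the function names are hypothetical.<br />

```python
def act_on_function(kernel, f):
    """(pi f)(y) = sum_z f(z) pi(y, z): pulls a function on Z back to a function on Y."""
    return [sum(p * fz for p, fz in zip(row, f)) for row in kernel]

def act_on_measure(mu, kernel):
    """(mu pi)(z) = sum_y mu(y) pi(y, z): pushes a measure on Y forward to a measure on Z."""
    n_cols = len(kernel[0])
    return [sum(mu[y] * kernel[y][z] for y in range(len(kernel))) for z in range(n_cols)]

# Hypothetical kernel from a 2-point space Y to a 3-point space Z; each row is a probability vector.
KERNEL = [[0.5, 0.25, 0.25],
          [0.0, 0.5, 0.5]]

if __name__ == "__main__":
    mu = [0.4, 0.6]        # a probability measure on Y
    f = [1.0, 2.0, 4.0]    # a bounded function on Z
    print(act_on_function(KERNEL, f))
    print(act_on_measure(mu, KERNEL))
```

The two actions are dual to each other: integrating $\pi f$ against $\mu$ gives the same number as integrating $f$ against $\mu\pi$.<br />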

This action has a clear interpretation in the Markov chain setting.<br />
Example 8.4. The transition probability $P(x, dy)$ of a time-homogeneous Markov chain is a stochastic kernel on the state space of the chain.<br />
Another very basic example is conditional probability.<br />



Example 8.5. Let $\mathcal{X}$ be Polish and $\mathcal{B}$ be its Borel $\sigma$-algebra. Let $\mathcal{A} \subset \mathcal{B}$ be a smaller $\sigma$-algebra and $\mu \in \mathcal{M}_1(\mathcal{X})$. Then, there always exists a conditional probability measure for $\mu$, given $\mathcal{A}$, which is a stochastic kernel $\pi$ from $(\mathcal{X}, \mathcal{A})$ to $(\mathcal{X}, \mathcal{B})$ such that $E^\mu[fg] = E^\mu[f\, \pi g]$ for all bounded $\mathcal{A}$-measurable functions $f$ and all bounded $\mathcal{B}$-measurable functions $g$. See pages 33 and 230 of [15].<br />

Stochastic kernels can be composed. Indeed, if $\pi$ is a kernel from $(\mathcal{Y}, \mathcal{C})$ to $(\mathcal{Z}, \mathcal{D})$ and $\rho$ is a kernel from $(\mathcal{Z}, \mathcal{D})$ to $(\mathcal{W}, \mathcal{E})$, then<br />
\[ \pi\rho(y, E) = \int \rho(z, E)\, \pi(y, dz) \]<br />
is a kernel from $(\mathcal{Y}, \mathcal{C})$ to $(\mathcal{W}, \mathcal{E})$.<br />
Exercise 8.6. Check that the above composition does give a stochastic kernel and that it agrees with $\pi\rho f = \pi(\rho f)$ and $\mu(\pi\rho) = (\mu\pi)\rho$.<br />
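In the finite setting the composition of kernels is a matrix product, and the identities of Exercise 8.6 reduce to associativity of matrix multiplication. The sketch below checks them numerically for two hypothetical kernels; it illustrates the finite case only and is not a proof of the exercise.<br />

```python
def compose(pi, rho):
    """(pi rho)(y, e) = sum_z rho(z, e) pi(y, z); kernels are row-stochastic nested lists."""
    return [[sum(pi_row[z] * rho[z][e] for z in range(len(rho))) for e in range(len(rho[0]))]
            for pi_row in pi]

def apply_to_function(kernel, f):
    """(kernel f)(y) = sum_z f(z) kernel(y, z)."""
    return [sum(p * v for p, v in zip(row, f)) for row in kernel]

def apply_to_measure(mu, kernel):
    """(mu kernel)(z) = sum_y mu(y) kernel(y, z)."""
    return [sum(mu[y] * kernel[y][z] for y in range(len(kernel))) for z in range(len(kernel[0]))]

PI = [[0.5, 0.5], [0.1, 0.9]]               # hypothetical kernel Y -> Z
RHO = [[0.2, 0.3, 0.5], [0.6, 0.4, 0.0]]    # hypothetical kernel Z -> W

if __name__ == "__main__":
    print(compose(PI, RHO))
```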

Going back to our original setting, we now can define what we mean by a specification.<br />

Definition 8.7. A specification is a family of stochastic kernels $\Pi = \{\pi_\Lambda : \Lambda \subset \mathbb{Z}^d \text{ finite}\}$ such that<br />
(a) $\pi_\Lambda$ is a stochastic kernel from $(\Omega, \mathcal{F}_{\Lambda^c})$ to $(\Omega, \mathcal{F})$.<br />
(b) $\pi_\Lambda$ is $\mathcal{F}_{\Lambda^c}$-proper; i.e. if $B$ is $\mathcal{F}_{\Lambda^c}$-measurable, then $\pi_\Lambda(\sigma, B) = \mathbf{1}_B(\sigma)$, for all $\sigma \in \Omega$, or $\pi_\Lambda \mathbf{1}_B = \mathbf{1}_B$.<br />
(c) If $\Delta \subset \Lambda \subset \mathbb{Z}^d$ are finite, then $\pi_\Lambda \pi_\Delta = \pi_\Lambda$.<br />
Note that the restriction of $\pi_\Lambda(\sigma, \cdot)$ to $\mathcal{F}_\Lambda$ can be physically thought of as the equilibrium probability for the volume $\Lambda$, given the boundary condition $\sigma_{\Lambda^c}$ outside $\Lambda$.<br />
Exercise 8.8. Let $\Delta \subset \Lambda \subset \mathbb{Z}^d$ be finite. Prove that (b) implies that $\pi_\Delta \pi_\Lambda = \pi_\Lambda$.<br />
∗ Exercise 8.9. Prove that (b) is equivalent to $\pi_\Lambda(fg) = g\, \pi_\Lambda f$ for $g \in b\mathcal{F}_{\Lambda^c}$ and $f \in b\mathcal{F}$.<br />
Exercise 8.10. Assume that $\pi_\Lambda f = E[f \mid \mathcal{F}_{\Lambda^c}]$ for some unknown probability $P$. Show that condition (c) is then the usual consistency condition for conditional expectations; i.e.<br />
\[ E[f \mid \mathcal{F}_{\Lambda^c}] = E\bigl[\, E[f \mid \mathcal{F}_{\Delta^c}] \bigm| \mathcal{F}_{\Lambda^c}\bigr]. \]<br />
The physical interpretation of (c) is that equilibrium in a volume $\Lambda$ is compatible with the equilibria of all its subvolumes $\Delta$.
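Conditions (b) and (c) can be verified by enumeration in a finite toy model. The sketch below replaces $\mathbb{Z}^d$ by a chain of five sites with $\pm 1$ spins, a uniform reference measure, and nearest-neighbour energy $-J\sigma_i\sigma_{i+1}$ on each bond; this finite stand-in, the parameter values, and the function names are assumptions made for illustration only.<br />

```python
import itertools
import math

N_SITES = 5
EDGES = [(i, i + 1) for i in range(N_SITES - 1)]  # nearest-neighbour bonds of a finite chain
BETA, J = 0.7, 1.0                                 # illustrative parameters

def energy(sigma, region):
    """Sum of the bond energies -J sigma_i sigma_j over bonds that meet the region."""
    return -J * sum(sigma[i] * sigma[j] for i, j in EDGES if i in region or j in region)

def apply_kernel(region, f):
    """Return the function pi_region f: resample the spins in region given those outside."""
    region = sorted(region)

    def pi_f(tau):
        num = den = 0.0
        for spins in itertools.product((-1, 1), repeat=len(region)):
            sigma = list(tau)
            for site, s in zip(region, spins):
                sigma[site] = s
            sigma = tuple(sigma)
            w = math.exp(-BETA * energy(sigma, region))
            num += f(sigma) * w
            den += w
        return num / den

    return pi_f

if __name__ == "__main__":
    f = lambda s: s[1] * s[2] + s[0] * s[3]
    tau = (1, -1, 1, 1, -1)
    print(apply_kernel({1, 2, 3}, apply_kernel({2}, f))(tau), apply_kernel({1, 2, 3}, f)(tau))
```

The two printed numbers agree, which is the consistency $\pi_\Lambda\pi_\Delta = \pi_\Lambda$ for $\Delta = \{2\} \subset \Lambda = \{1, 2, 3\}$; a function of the spins outside $\Lambda$ is left unchanged by $\pi_\Lambda$, which is properness.<br />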



Definition 8.11. For a given specification $\Pi$, define the set<br />
\[ \mathcal{G}^\Pi = \bigl\{\mu \in \mathcal{M}_1(\Omega) : E^\mu[f \mid \mathcal{F}_{\Lambda^c}] = \pi_\Lambda f,\ \forall f \in b\mathcal{F},\ \forall \Lambda \subset \mathbb{Z}^d \text{ finite}\bigr\}. \]<br />
This is the set of Gibbs measures of $\Pi$. It consists of probability measures admitted by (or consistent with) $\Pi$.<br />
A natural question to ask is whether $\mathcal{G}^\Pi$ is empty or not. To answer this question we will first re-express and weaken the criterion of belonging to $\mathcal{G}^\Pi$.<br />

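Lemma 8.12. The following are equivalent.<br />
(a) $\mu \in \mathcal{G}^\Pi$,<br />
(b) $\mu\pi_\Lambda = \mu$, for all $\Lambda \subset \mathbb{Z}^d$ finite,<br />
(c) $\mu\pi_{V_n} = \mu$, for all $n$, where $V_n = \{i \in \mathbb{Z}^d : -n < i_1, \dots, i_d < n\}$.<br />
Proof. The implication (a) $\Rightarrow$ (b) follows from $E^\mu(E^\mu[f \mid \mathcal{F}_{\Lambda^c}]) = E^\mu[f]$. (b) trivially implies (c). To prove that (c) implies (b) suppose $\Lambda \subset V_n$. Then<br />
\[ \mu\pi_\Lambda(A) = \mu\pi_{V_n}\pi_\Lambda(A) = \mu(\pi_{V_n}\pi_\Lambda)(A) = \mu\pi_{V_n}(A) = \mu(A). \]<br />
Finally, (b) implies (a) because if $g \in b\mathcal{F}_{\Lambda^c}$ and $f \in b\mathcal{F}$, then<br />
\[ E^\mu[fg] = E^\mu[\pi_\Lambda(fg)] = E^\mu[g\, \pi_\Lambda f]. \]<br />
Exercise 8.13. Prove that<br />
\[ \begin{aligned} \mathcal{G}^\Pi &= \bigl\{\mu \in \mathcal{M}_1(\Omega) : E^\mu[f \mid \mathcal{F}_{\Lambda^c}] = \pi_\Lambda f,\ \forall f \in C_b(\Omega),\ \forall \Lambda \subset \mathbb{Z}^d \text{ finite}\bigr\} \\ &= \bigl\{\mu \in \mathcal{M}_1(\Omega) : E^\mu[f \mid \mathcal{F}_{\Lambda^c}] = \pi_\Lambda f,\ \forall f \in C_{b,\mathrm{loc}}(\Omega),\ \forall \Lambda \subset \mathbb{Z}^d \text{ finite}\bigr\}. \end{aligned} \]<br />
Hint: Note that the proof of the above lemma works word for word with these alternate definitions of $\mathcal{G}^\Pi$.<br />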

Definition 8.14. We say that the specification $\Pi$ is Feller-continuous if for all finite $\Lambda \subset \mathbb{Z}^d$, $\pi_\Lambda$ is a Feller-continuous kernel; i.e. $\pi_\Lambda f \in C_b(\Omega)$ for all $f \in C_b(\Omega)$.<br />
Theorem 8.15. Suppose $\Pi$ is a Feller-continuous specification. Let $\Gamma_n$ be any sequence of finite subsets of $\mathbb{Z}^d$ that exhausts $\mathbb{Z}^d$, in the sense that any finite $\Lambda \subset \mathbb{Z}^d$ is included in $\Gamma_n$ for large enough $n$. Let $\nu_n \in \mathcal{M}_1(\Omega)$ be any sequence. If $\nu_n\pi_{\Gamma_n}$ converge weakly to some $\mu \in \mathcal{M}_1(\Omega)$, then $\mu \in \mathcal{G}^\Pi$.<br />
Proof. Fix a finite $\Lambda \subset \mathbb{Z}^d$ and an $f \in C_b(\Omega)$. Then $\pi_\Lambda f \in C_b(\Omega)$ and<br />
\[ E^\mu[\pi_\Lambda f] = \lim_{n\to\infty} E^{\nu_n}[\pi_{\Gamma_n}\pi_\Lambda f] = \lim_{n\to\infty} E^{\nu_n}[\pi_{\Gamma_n} f] = E^\mu[f]. \]<br />
As a consequence of the above theorem, one has the existence of Gibbs measures when $\mathcal{X}$ is compact and the specification is Feller-continuous.



8.4. Gibbs specifications and phase transition<br />
After discussing specifications in general, let us return to specifications $\Pi = \{\pi_\Lambda : \Lambda \subset \mathbb{Z}^d \text{ finite}\}$ defined as in (8.3) by an interaction potential $\Phi$ and a given inverse temperature parameter $\beta > 0$. This type of specification is called a Gibbs specification. We always assume that our interaction potentials are continuous: $\Phi \subset C_b(\Omega)$. When there is no confusion, dependence on $\Phi$ is sometimes omitted from the notation.<br />
Exercise 8.16. Prove that if the interaction potential is continuous, then $\Pi = \{\pi_\Lambda : \Lambda \subset \mathbb{Z}^d \text{ finite}\}$ defined by (8.3) is a Feller-continuous specification.<br />
$\mathcal{G}^\Pi$ is the set of Gibbs measures for the potential $\Phi$ at inverse temperature $\beta$. The equations $\mu\pi_\Lambda = \mu$ defining $\mathcal{G}^\Pi$ are called the DLR equations. This idea was introduced by Dobrushin [12] in 1968 and by Lanford and Ruelle [28] in 1969.<br />

What makes these models interesting is that they are realistic enough to account for the physical phenomenon known as phase transition. Physically, a phase transition is an abrupt change in the physical properties of the system, like water boiling into steam.<br />
More specifically, we are given a family of absolutely summable shift-invariant interaction potentials $\{\Phi^\alpha\} \subset \mathcal{B}$. As mentioned above, $\Phi^\alpha$ is continuous for all $\alpha$. The index $\alpha$ refers to the thermodynamic variables, other than the temperature; e.g. the external magnetic field (denoted by $h$) or the pressure (denoted by $P$), etc. Since all thermodynamic quantities can be expressed in terms of the free energy and its derivatives, one way to characterize the occurrence of a phase transition at $(\beta, \alpha)$ is by the formation of singularities in the free energy (i.e. one of its derivatives becomes discontinuous).<br />

∗ Exercise 8.17. Observe that the finite volume Curie-Weiss model (Section 4.3) does not have a phase transition at any $\beta > 0$ and $h \in \mathbb{R}$. Prove that the infinite volume limit of the Curie-Weiss model (Theorem 4.15) exhibits a phase transition at points $(\beta, h)$ with $h = 0$ and $\beta \ge 1/J$.<br />
Hint: Use the analytic implicit function theorem (page 34 of [19]) to show that the magnetization $m(\beta, h)$ is analytic when $h \ne 0$ or $\beta < 1/J$. Also, show that $m$ has a jump when $\beta > 1/J$ and $h$ crosses $0$ and that $m(\beta, 0) \sim \sqrt{3(J\beta - 1)}$ as $\beta \searrow 1/J$. Next, apply Varadhan’s theorem (page 32) to find that the limit of the free energy per spin is given by<br />
\[ -p(\beta, h) = -\lim_{n\to\infty} \frac{1}{n\beta} \log \int e^{-\beta H_n}\, dP_n = \beta^{-1} I(m) - \frac{J}{2} m^2 - hm. \tag{8.5} \]<br />
Use the equation $m$ solves to show that $m = -\frac{\partial p}{\partial h}$, when $h \ne 0$ or $\beta < 1/J$.
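The behavior described in the hint can be explored numerically. The sketch below assumes the standard Curie-Weiss mean-field equation $m = \tanh(\beta(Jm + h))$, which the text refers to only indirectly as "the equation $m$ solves", and finds its nonnegative root for $h \ge 0$ by bisection; the function name and the numerical tolerances are illustrative choices.<br />

```python
import math

def magnetization(beta, J, h):
    """Nonnegative solution of m = tanh(beta * (J * m + h)) for h >= 0, by bisection.

    The self-consistency equation is the standard Curie-Weiss relation, used here
    as an assumption; for h < 0 the solution is minus the value at -h by symmetry.
    """
    if h == 0 and beta * J <= 1:
        return 0.0  # only the trivial root exists
    g = lambda m: math.tanh(beta * (J * m + h)) - m
    # g is positive just to the right of lo and negative at hi = 1.
    lo, hi = (1e-9 if h == 0 else 0.0), 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    for beta in (1.001, 1.01, 1.1, 1.5):
        print(beta, magnetization(beta, 1.0, 0.0), math.sqrt(3 * (beta - 1.0)))
```

For $\beta$ slightly above $1/J$ the computed root is close to $\sqrt{3(J\beta - 1)}$, and for $\beta > 1/J$ the limits of $m(\beta, h)$ as $h \to 0^+$ and $h \to 0^-$ differ in sign, which is the jump mentioned in the hint.<br />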



As observed in the above exercise, there are no singularities in the free energy of a system of a finite size. The infinite volume limit, however, can lead to singularities. The convergence to the thermodynamic limit is fast, so that the phase behavior is apparent already on a relatively small volume, even though the singularities are smoothed out by the system’s finite size.<br />
Another definition of phase transition is used in the context of Gibbs measures and was introduced by Dobrushin [12].<br />
Definition 8.18. We say that a phase transition occurs at $(\beta, \alpha)$ when $\mathcal{G}^{\Pi^{\beta,\alpha}}$ has more than one element.<br />
In other words, phase transition is characterized by nonuniqueness of the Gibbs measure. This is reasonable, since it means the system has several possible equilibria to choose from. That this definition is physically sound will be clearly illustrated in the next chapter, when we study the Ising model. In fact, Theorem 10.2 confirms that this definition does correspond to the physical phenomenon of phase transition. For now, however, let us just describe things using the Curie-Weiss model even though, strictly speaking, this model does not correspond to any Gibbs specification and thus does not fit into the setting in this chapter.<br />

To make our illustration even clearer, we will describe the more familiar liquid-gas phase transition instead of the magnetization phase transition. This is done by simply reinterpreting +1 as “the site is occupied by a water molecule” and −1 as “the site is empty”. The empirical mean $S_n/n$ is then linearly related to the density of particles. A positive mean indicates a liquid phase while a negative mean characterizes the gas phase.<br />
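The linear relation between the empirical mean and the particle density can be made explicit with a small sketch. It assumes the density is the fraction of occupied sites, so that density = (1 + S_n/n)/2; the function names are hypothetical and chosen only for this illustration.

```python
def empirical_mean(spins):
    """Empirical mean S_n / n of a configuration of +1/-1 spins."""
    return sum(spins) / len(spins)


def density_from_mean(m):
    """Fraction of occupied sites, assuming +1 = occupied and -1 = empty."""
    if not -1.0 <= m <= 1.0:
        raise ValueError("empirical mean must lie in [-1, 1]")
    return (1.0 + m) / 2.0


# Three occupied sites out of four.
spins = [+1, +1, -1, +1]
m = empirical_mean(spins)
rho = density_from_mean(m)
```

Under this convention a positive mean corresponds to a density above 1/2 (liquid) and a negative mean to a density below 1/2 (gas).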

The parameter β is still the inverse temperature, while the parameter h is now called the chemical potential. The more familiar physical quantity is pressure. The relation between pressure and chemical potential is complicated, but increasing h corresponds to increasing the pressure. The phase diagram in Figure 4.1 is then transformed into the one in Figure 8.1.<br />

If looked at qualitatively (i.e. not looking at the numbers, or the precise equation of the curve), Figure 8.1 corresponds to the phase diagram for both the Curie-Weiss and the two-dimensional Ising models, after the chemical potential h is replaced by the pressure. (This explains why instead of a straight line h = 0, we get a curve.)<br />

On the other hand, when looked at quantitatively (i.e. now taking the numbers into account), this figure corresponds to the liquid-gas part of the experimental phase diagram for water. (We leave out the liquid-solid phase transition, etc.) This figure shows, for example, that if pressure is held constant and temperature is increased, water will turn into steam at a critical



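[Figure 8.1: phase diagram in the (T, P) plane. Horizontal axis: temperature T (°C), with ticks 0.01, 100, and $T_c = 374$. Vertical axis: pressure P (bar), with ticks 0.006, 1, and $P_c = 220.6$. The liquid region is labeled $\delta_m$, $m > 0$; the gas region is labeled $\delta_m$, $m < 0$; the transition curve is labeled $\tfrac12(\delta_{-m} + \delta_m)$, $m > 0$; the remaining label is $\delta_0$.]<br />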

Figure 8.1. The phase diagram for the liquid-gas phase transition in water. $T = 1/\beta$ and $m = m(\beta, h)$ is as in Theorem 4.15. The solid line represents the values of temperature and pressure at which a phase transition occurs; i.e. it corresponds to $\beta > \beta_c$ and $h = 0$.<br />

temperature; e.g. if the pressure is equal to 1 bar, then water boils at the familiar 100 °C.<br />

Now we can see how Definition 8.18 agrees with the physical phase transition. Along the line of phase transition, the average density takes one of two values, with equal probability. In the Ising model, this corresponds to having two Gibbs measures. Off this line, the average density is unique. This corresponds to uniqueness of the Gibbs measure in the Ising model; see Theorem 10.2.<br />

It is noteworthy that Exercise 8.17 shows that at critical pressure $P_c$ and critical temperature $T_c$ the free energy develops a singularity. However, Theorem 4.15 states that the average density is unique at these values. In fact, we will see in the next chapter that the Gibbs measure is unique for the Ising model at these critical values when d = 2 and is expected to be unique also when d = 3. The two definitions of phase transition seem then to coincide everywhere but at this critical point. This is easily fixed by slightly changing Definition 8.18 and saying instead that there is no phase transition at $(\beta, \alpha)$ if and only if the Gibbs measure is unique for values in a whole neighborhood of $(\beta, \alpha)$.<br />
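The picture of "several possible equilibria" can be illustrated numerically in the Curie-Weiss setting. The sketch below assumes the standard mean-field self-consistency equation m = tanh(β(Jm + h)) with J = 1, so that the critical value is β_c = 1; the normalization used in Theorem 4.15 may differ, and the function name is hypothetical.

```python
import math


def spontaneous_magnetization(beta, J=1.0, h=0.0, iterations=2000):
    """Largest solution of m = tanh(beta * (J * m + h)), found by iteration.

    Starting from m = 1 the iterates decrease monotonically to the largest
    fixed point, because the map is increasing and maps 1 below itself.
    """
    m = 1.0
    for _ in range(iterations):
        m = math.tanh(beta * (J * m + h))
    return m


# Above the critical temperature (beta < 1) the only solution at h = 0 is m = 0.
high_temperature = spontaneous_magnetization(beta=0.5)

# Below it (beta > 1) there is a positive solution m*, and by symmetry -m* too.
low_temperature = spontaneous_magnetization(beta=2.0)
```

At h = 0 and β = 2 the two values ±m* play the role of the two coexisting phases, while at β = 0.5 the equilibrium value m = 0 is unique.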

Remark 8.19. With the above modification, uniqueness of the Gibbs distribution does coincide with the other definition of phase transition in many important models (like the Ising model). However, it is not always a good definition for phase transition in a common physical way of thinking, and will not coincide with the other definition in a general setting. For example,



the plane rotator model, where Ising spins are replaced by vectors on the unit circle, exhibits a Kosterlitz-Thouless type of phase transition in which above a certain $\beta_c$, correlations no longer decay exponentially but only like inverse powers; see [21]. On the other hand, it is also known (at least partly rigorously, but based on older physics arguments) that there is a unique Gibbs state; see [7], [29], and [20].<br />

8.5. Observables<br />

Microscopically, one can only observe finitely many spins. Hence, a microscopic observable is defined as a real-valued $\mathcal{F}_\Lambda$-measurable function on $\Omega$, for some finite $\Lambda \subset \mathbb{Z}^d$.<br />

Macroscopic observations, on the other hand, should not depend on any finite collection of spins. Consequently, a macroscopic observable must be measurable with respect to the tail σ-algebra $\mathcal{T} = \bigcap \mathcal{F}_{\Lambda^c}$, where the intersection is over all finite $\Lambda \subset \mathbb{Z}^d$. This is nicely expressed in the next lemma.<br />

Lemma 8.20. Let $\Pi = \{\pi_\Lambda : \Lambda \subset \mathbb{Z}^d \text{ finite}\}$ be an arbitrary specification and let $f \in b\mathcal{F}$. Then $f \in b\mathcal{T}$ if and only if $\pi_\Lambda f = f$ for all finite $\Lambda \subset \mathbb{Z}^d$.<br />

Proof. Fix $\Lambda \subset \mathbb{Z}^d$ finite. If $f \in b\mathcal{T}$, then $f \in b\mathcal{F}_{\Lambda^c}$ and Exercise 8.9 implies that $\pi_\Lambda f = f$. Conversely, if $f \in b\mathcal{F}$ is such that $\pi_\Lambda f = f$, then $f \in b\mathcal{F}_{\Lambda^c}$. If this is true for all finite Λ, then $f \in b\mathcal{T}$.<br />

There is clearly a separation of microscopic and macroscopic scales. More precisely, if f is both a microscopic and a macroscopic observable, then it is of course constant.<br />

Some people require that a macroscopic observable be shift invariant; i.e. that it be $\mathcal{I}$-measurable (see (7.1) for the definition of $\mathcal{I}$).<br />

Example 8.21. If f is a microscopic observable then the average<br />
\[ \lim_{n\to\infty} \frac{1}{|V_n|} \sum_{i\in V_n} f\circ\theta_i \]<br />
(whenever it exists) is both $\mathcal{T}$- and $\mathcal{I}$-measurable.<br />

For a measure μ to qualify as a macrostate, macroscopic observables should be deterministic under μ, because only microscopic quantities are allowed to fluctuate. In other words, μ should satisfy a 0-1 law on the relevant σ-algebra.<br />

If we require macroscopic observables to be tail-measurable, then $\mathcal{T}$ should be trivial under μ, but if we also require the observables to be shift invariant, then μ should just be ergodic. (Exercise A.18 is relevant.)<br />



In either case, the set of observable macrostates coincides with the set of extreme Gibbs measures. (Recall the definition of an extreme point from page 72.) Thus, extreme Gibbs measures are the physically reasonable objects, and their study is central to statistical mechanics.<br />

Theorem 8.22. Let Π be an arbitrary specification corresponding to an absolutely summable shift-invariant interaction potential Φ.<br />

(a) $\mu \in \mathcal{G}^\Pi$ is $\mathcal{T}$-trivial if, and only if, μ is extreme in $\mathcal{G}^\Pi$.<br />

(b) $\mu \in \mathcal{G}^\Pi \cap \mathcal{M}_\theta(\Omega)$ is $\mathcal{I}$-trivial (and hence ergodic) if, and only if, μ is extreme in $\mathcal{G}^\Pi \cap \mathcal{M}_\theta(\Omega)$.<br />

Proof. To prove (a) first consider $\mu \in \mathcal{G}^\Pi$ such that for some $\mathcal{T}$-measurable A we have $0 < \mu(A) < 1$. Define $\nu_1(B) = \mu(B\,|\,A)$ and $\nu_2(B) = \mu(B\,|\,A^c)$. Then $\nu_1 \ne \nu_2$ and $\mu = t\nu_1 + (1-t)\nu_2$, where $t = \mu(A) \in (0,1)$. Moreover, $\nu_1$ and $\nu_2$ belong to $\mathcal{G}^\Pi$. Indeed, if $\Lambda \subset \mathbb{Z}^d$ is finite then<br />
\[ \begin{aligned} \nu_1\pi_\Lambda(B) &= \int \pi_\Lambda(\sigma, B)\,\nu_1(d\sigma) = \mu(A)^{-1}\int \mathbf{1}_A(\sigma)\,\pi_\Lambda\mathbf{1}_B(\sigma)\,\mu(d\sigma) \\ &= \mu(A)^{-1}\int \pi_\Lambda(\sigma, A\cap B)\,\mu(d\sigma) = \mu(B\,|\,A) = \nu_1(B). \end{aligned} \]<br />
Above, we used Exercise 8.9 to write that $\mathbf{1}_A\,\pi_\Lambda\mathbf{1}_B = \pi_\Lambda\mathbf{1}_{A\cap B}$. A similar computation holds for $\nu_2$. Thus, μ is not extreme.<br />

For the other direction suppose $\mathcal{T}$ is trivial under μ but $\mu = t\nu_1 + (1-t)\nu_2$, for $\nu_1$ and $\nu_2$ in $\mathcal{G}^\Pi$ and $t \in (0,1]$. Since $\nu_1 \ll \mu$, $\mathcal{T}$ is also trivial under $\nu_1$. Then, by the backwards-martingale convergence theorem (see Theorem 6.1 on page 265 of [15]) we have that for any local f, $\pi_{V_n} f$ converges, as $V_n$ grows to $\mathbb{Z}^d$, to $E^\mu[f]$, μ-a.s., and to $E^{\nu_1}[f]$, $\nu_1$-a.s. But again $\nu_1 \ll \mu$ implies that the limits agree, so $\mu = \nu_1$ and μ is extreme.<br />

Now we prove part (b). Let $\mu \in \mathcal{G}^\Pi \cap \mathcal{M}_\theta(\Omega)$ be such that for some $\mathcal{I}$-measurable set A we have $0 < \mu(A) < 1$. By Exercise A.18 there exists a $\mathcal{T}$-measurable set $\widetilde{A}$ such that $\mu(A \,\triangle\, \widetilde{A}) = 0$. Thus, defining $\nu_1$ and $\nu_2$ as above we have $\mu = t\nu_1 + (1-t)\nu_2$ with $\nu_1, \nu_2 \in \mathcal{G}^\Pi \cap \mathcal{M}_\theta(\Omega)$ and $t \in (0,1)$. In other words, μ is not extreme. The other direction is a consequence of the ergodic theorem and is left as an exercise; see Exercise A.16.<br />

Gibbs measures can be written as a convex mixture of observable ones. One way to see this is via Choquet’s theorem (see [31]). We give an alternate approach.<br />

Exercise 8.23. (Decomposition for Gibbs measures) Use limits of $\pi_{\Lambda_n}$ as $\Lambda_n \nearrow \mathbb{Z}^d$ to construct a kernel with values in extreme Gibbs measures and that gives versions of $\mu(\,\cdot\,|\,\mathcal{T})$ simultaneously for all $\mu \in \mathcal{G}^\Pi$. Apply



Exercise A.25 to prove that every $\mu \in \mathcal{G}^\Pi$ has a unique probability measure $Q_\mu$ supported on the extremes of $\mathcal{G}^\Pi$ such that $\mu = \int \nu\, Q_\mu(d\nu)$.<br />

The set of stationary Gibbs measures can be empty even when the set of Gibbs measures is not; see (11.46) in [22]. Sufficient conditions for existence of stationary Gibbs measures are given in Section 5.2 of [22].<br />

Exercise 8.24. (Decomposition for stationary Gibbs measures) Let $\mu \in \mathcal{G}^\Pi \cap \mathcal{M}_\theta(\Omega)$. Let κ be the kernel from Exercise A.26. Show that for any $\Lambda \subset \mathbb{Z}^d$ finite, $\kappa(\omega)\pi_\Lambda = \kappa(\omega)$ μ-a.s. Use Exercise A.25 to conclude the existence of a unique probability measure $Q_\mu$ supported on the extremes of $\mathcal{G}^\Pi \cap \mathcal{M}_\theta(\Omega)$ such that $\mu = \int \nu\, Q_\mu(d\nu)$.<br />

The above expresses the heuristic idea that even though macrostates are the only states a system can be in, an experimenter will not know which one it is in and thus will consider the system to be in a weighted average of macrostates.


Chapter 9<br />

Large deviations and equilibrium statistical mechanics<br />

9.1. Replacing Hamiltonians with averages<br />

Large deviation theory is concerned with fluctuations of empirical averages of microscopic quantities. Hence, it is intuitively reasonable that there is a connection with equilibrium statistical mechanics. This connection can be made precise as follows.<br />

Let Φ be an absolutely summable shift-invariant continuous interaction potential. The inverse temperature β will be fixed throughout this chapter and thus we will set β = 1. To see β explicitly, one can simply replace each Φ by βΦ. Define the microscopic observable<br />
\[ f_\Phi = \sum_{A:\,0\in A} \frac{\Phi_A}{|A|} \in C_b(\Omega). \]<br />
This is the energy contribution of the origin. $f_\Phi\circ\theta_i$ is then seen as the energy contribution of site i.<br />

Now, the energy contribution of a volume is the sum of the energy contributions of each of its sites. Of course, one can have different boundary conditions, but the boundary contribution should be of the same order as the size of the boundary itself. When the lattice is $\mathbb{Z}^d$, the boundary of Λ has size $o(|\Lambda|)$. This is summarized in Lemma 9.1.<br />

99


100 9. <strong>Large</strong> deviati<strong>on</strong>s <strong>an</strong>d equilibrium statistical mech<strong>an</strong>ics<br />

For $\Delta \subset \Lambda \subset \mathbb{Z}^d$ finite, define $\Lambda(\Delta) = \{i \in \Lambda : i + \Delta \subset \Lambda\}$ and<br />
\[ b_{\Delta,\Lambda} = |\Lambda| \sum_{A:\,0\in A\not\subset\Delta} \|\Phi_A\| + \|\Phi\|\,|\Lambda\setminus\Lambda(\Delta)|. \]<br />
Note that for any finite $\Delta \subset \mathbb{Z}^d$,<br />
\[ \lim_{\Lambda\nearrow\mathbb{Z}^d} \frac{b_{\Delta,\Lambda}}{|\Lambda|} = \sum_{A:\,0\in A\not\subset\Delta} \|\Phi_A\|, \]<br />
which can be made arbitrarily small by taking Δ large.<br />

Lemma 9.1 below reveals how large deviation theory sneaks in. It shows how one can replace the Hamiltonian, in the definition of Gibbs measures, by a volume average of shifts of $f_\Phi$.<br />

Lemma 9.1. For $\Delta \subset \Lambda \subset \mathbb{Z}^d$ finite,<br />
\[ \sup_{\sigma\in\Omega} \Bigl| H^{\rm free}_\Lambda(\sigma_\Lambda) - \sum_{i\in\Lambda} f_\Phi(\theta_i\sigma) \Bigr| \le b_{\Delta,\Lambda} \tag{9.1} \]<br />
and<br />
\[ \sup_{\sigma\in\Omega} \bigl| H^{\rm free}_\Lambda(\sigma_\Lambda) - H^{\sigma_{\Lambda^c}}_\Lambda(\sigma_\Lambda) \bigr| \le b_{\Delta,\Lambda}. \tag{9.2} \]<br />

Proof. Write<br />
\[ H^{\rm free}_\Lambda = \sum_{A\subset\Lambda} \Phi_A = \sum_{A\subset\Lambda}\sum_{i\in A} \frac{\Phi_A}{|A|} = \sum_{i\in\Lambda}\,\sum_{A:\,i\in A\subset\Lambda} \frac{\Phi_A}{|A|}. \]<br />
But<br />
\[ \sum_{A:\,i\in A} \frac{\Phi_A}{|A|} = \sum_{B:\,0\in B} \frac{\Phi_{i+B}}{|B|} = f_\Phi\circ\theta_i. \]<br />
Thus,<br />
\[ \begin{aligned} \Bigl| H^{\rm free}_\Lambda(\sigma_\Lambda) - \sum_{i\in\Lambda} f_\Phi\circ\theta_i\sigma \Bigr| &= \Bigl| \sum_{i\in\Lambda}\,\sum_{A:\,i\in A\not\subset\Lambda} \frac{\Phi_A}{|A|} \Bigr| \\ &\le \sum_{i\in\Lambda(\Delta)}\,\sum_{A:\,i\in A\not\subset\Lambda} \frac{\|\Phi_A\|}{|A|} + \sum_{i\in\Lambda\setminus\Lambda(\Delta)}\,\sum_{A:\,i\in A\not\subset\Lambda} \frac{\|\Phi_A\|}{|A|} \\ &\le \sum_{i\in\Lambda(\Delta)}\,\sum_{\substack{B:\,0\in B \\ i+B\not\subset\Lambda}} \frac{\|\Phi_B\|}{|B|} + \sum_{i\in\Lambda\setminus\Lambda(\Delta)}\,\sum_{A:\,i\in A} \frac{\|\Phi_A\|}{|A|} \\ &\le \sum_{i\in\Lambda(\Delta)}\,\sum_{B:\,0\in B\not\subset\Delta} \frac{\|\Phi_B\|}{|B|} + \sum_{i\in\Lambda\setminus\Lambda(\Delta)}\,\sum_{B:\,0\in B} \frac{\|\Phi_B\|}{|B|} \\ &\le b_{\Delta,\Lambda}, \end{aligned} \]<br />



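and (9.1) is proved. Next write<br />
\[ \begin{aligned} \bigl| H^{\rm free}_\Lambda(\sigma_\Lambda) - H^{\sigma_{\Lambda^c}}_\Lambda(\sigma_\Lambda) \bigr| &\le \sum_{\substack{A\cap\Lambda\ne\emptyset \\ A\cap\Lambda^c\ne\emptyset}} \|\Phi_A\| \\ &\le \sum_{i\in\Lambda(\Delta)}\,\sum_{A:\,i\in A\not\subset\Lambda} \|\Phi_A\| + \sum_{i\in\Lambda\setminus\Lambda(\Delta)}\,\sum_{A:\,i\in A} \|\Phi_A\| \\ &\le |\Lambda(\Delta)| \sum_{B:\,0\in B\not\subset\Delta} \|\Phi_B\| + |\Lambda\setminus\Lambda(\Delta)| \sum_{B:\,0\in B} \|\Phi_B\| \\ &\le b_{\Delta,\Lambda}, \end{aligned} \]<br />
proving (9.2).<br />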
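The content of (9.1) can be checked by brute force in a small example. The sketch below assumes a one-dimensional nearest-neighbor model with $\Phi_{\{i\}}(\sigma) = -h\sigma_i$ and $\Phi_{\{i,i+1\}}(\sigma) = -J\sigma_i\sigma_{i+1}$; the values of `J` and `h` and all function names are hypothetical. In this model the difference between the free Hamiltonian and the sum of site contributions comes only from the two bonds that cross the boundary, so it stays bounded while the volume grows.

```python
import itertools

J, h = 1.0, 0.3  # hypothetical coupling and field, chosen for illustration


def f_phi(sigma, i):
    """Energy contribution of site i: sum over A containing i of Phi_A / |A|."""
    return -h * sigma[i] - 0.5 * J * (sigma[i - 1] * sigma[i] + sigma[i] * sigma[i + 1])


def h_free(sigma, sites):
    """Free-boundary Hamiltonian: sum of Phi_A over sets A inside the volume."""
    inside = set(sites)
    single = sum(-h * sigma[i] for i in sites)
    pairs = sum(-J * sigma[i] * sigma[i + 1] for i in sites if i + 1 in inside)
    return single + pairs


def max_gap(n):
    """sup over sigma of |H_free - sum_i f_phi(theta_i sigma)| for Lambda = {1..n}."""
    sites = range(1, n + 1)  # indices 0 and n + 1 lie outside the volume
    worst = 0.0
    for sigma in itertools.product((-1, 1), repeat=n + 2):
        gap = abs(h_free(sigma, sites) - sum(f_phi(sigma, i) for i in sites))
        worst = max(worst, gap)
    return worst
```

For every volume size the maximal gap equals J in this example, so after dividing by $|\Lambda| = n$ the error vanishes, which is the way Lemma 9.1 is used in the sequel.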

9.2. Thermodynamic limit of the pressure<br />

Recall that λ is a fixed i.i.d. product measure on Ω. We defined the volumes $V_n = \{i \in \mathbb{Z}^d : -n < i_1, \dots, i_d < n\}$ and we let $\tau \in \Omega \cup \{\mathrm{free}\}$ be a given boundary condition. The partition function was defined by<br />
\[ Z^\tau_\Lambda = \int e^{-H^\tau_\Lambda(\sigma_\Lambda)}\,\lambda(d\sigma_\Lambda). \]<br />
A quantity that encodes all the thermodynamic properties of the system is the pressure, defined as the limit (if it exists)<br />
\[ P(\Phi) = \lim_{n\to\infty} \frac{1}{|V_n|} \log Z^\tau_{V_n} \]<br />
of the finite volume pressure $P^\tau_\Lambda(\Phi) = \frac{1}{|\Lambda|} \log Z^\tau_\Lambda$.<br />

The term “pressure” makes sense in the grand-canonical ensemble of a lattice gas ($X = \{0, 1\}$ and $\Phi_A(\sigma) = -J(A)\prod_{i\in A}\sigma_i$ for some shift-invariant positive function J). But, for example, for ferromagnetic models it is simply the same as “minus the free energy per site.”<br />

By (9.1), the fact that the limit $P(\Phi)$ exists follows from the large deviation principle for empirical fields (Theorem 7.13) and Varadhan’s theorem (page 32). The fact that this limit is independent of the boundary condition τ is simply a result of (9.2). As a consequence, one also has a variational principle defining the pressure $P(\Phi)$ in terms of specific relative entropy. Recall the definition of $p(f)$ in (5.1) (and Proposition 7.14).<br />

Theorem 9.2. There exists an infinite volume pressure $P(\Phi)$ such that<br />

(a) $\displaystyle \lim_{n\to\infty}\, \sup_{\tau\in\Omega\cup\{\mathrm{free}\}} |P^\tau_{V_n} - P(\Phi)| = 0$;<br />

(b) $\displaystyle P(\Phi) = p(-f_\Phi) = \lim_{n\to\infty} \frac{1}{|V_n|} \log E^\lambda\bigl[e^{-\sum_{i\in V_n} f_\Phi\circ\theta_i}\bigr]$;<br />



(c) $\displaystyle -P(\Phi) = \inf_{\nu\in\mathcal{M}_\theta(\Omega)} \{E^\nu[f_\Phi] + h(\nu\,|\,\lambda)\}$.<br />

Proof. Theorem 7.13 gives us a large deviation principle for the laws of the empirical fields under the i.i.d. measure λ. Then, taking $E = \mathcal{M}_1(\Omega)$, $X = \mathcal{M}(\Omega)$, and $Y = C_b(\Omega)$, Theorem 5.25 implies the existence of the limit defining $p(-f_\Phi)$ in part (b), as well as the variational formula in part (c), provided we use (b) as the definition of $P(\Phi)$. Part (a) simply follows from (9.1) and (9.2). Indeed, take $\Delta = V_m$ and $\Lambda = V_n$, for n > m. Then<br />
\[ \sup_{\tau\in\Omega\cup\{\mathrm{free}\}}\, \sup_{\sigma\in\Omega} \Bigl| -H^\tau_{V_n}(\sigma_{V_n}) + \sum_{i\in V_n} f_\Phi\circ\theta_i\sigma \Bigr| \le 2 b_{V_m,V_n}. \]<br />
In general<br />
\[ \log E^\lambda[e^f] = \log E^\lambda[e^g e^{f-g}] \le \log\bigl( E^\lambda[e^g]\, e^{\|f-g\|_\infty} \bigr) = \log E^\lambda[e^g] + \|f-g\|_\infty. \]<br />
Thus,<br />
\[ \sup_{\tau\in\Omega\cup\{\mathrm{free}\}} \Bigl| \log Z^\tau_{V_n} - \log E^\lambda\bigl[e^{-\sum_{i\in V_n} f_\Phi\circ\theta_i}\bigr] \Bigr| \le 2 b_{V_m,V_n}. \]<br />
Now divide by $|V_n|$ and take n → ∞ then m → ∞.<br />
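Part (a) of Theorem 9.2 can be observed numerically in a model where the pressure is known in closed form. The sketch below assumes the one-dimensional nearest-neighbor Ising interaction (coupling `J`, field `h`, β = 1) with λ the uniform product measure on {−1, +1}; the finite-volume partition function is computed exactly by summing out one spin at a time, and the limit is compared with the logarithm of the largest transfer-matrix eigenvalue, minus log 2 for the normalization of λ. All names are hypothetical.

```python
import math


def log_partition(n, J, h, boundary=None):
    """log Z for n sites; boundary is None (free), +1, or -1 on both sides."""
    spins = (-1, 1)
    tau = 0 if boundary is None else boundary  # a zero spin switches the bond off
    # v[s]: total weight of the sites summed so far, with the last spin equal to s.
    v = {s: 0.5 * math.exp(h * s + J * tau * s) for s in spins}
    for _ in range(n - 1):
        v = {t: 0.5 * math.exp(h * t) * sum(v[s] * math.exp(J * s * t) for s in spins)
             for t in spins}
    return math.log(sum(v[s] * math.exp(J * s * tau) for s in spins))


def finite_pressure(n, J, h, boundary=None):
    """Finite-volume pressure (1/n) log Z."""
    return log_partition(n, J, h, boundary) / n


def exact_pressure(J, h):
    """log of the largest transfer-matrix eigenvalue, minus log 2."""
    top = math.exp(J) * math.cosh(h) + math.sqrt(
        math.exp(2 * J) * math.sinh(h) ** 2 + math.exp(-2 * J))
    return math.log(top) - math.log(2.0)
```

With J = 1 and h = 0.3, the free, plus, and minus boundary conditions give finite-volume pressures that differ by at most 2J/n and all approach `exact_pressure(J, h)` as n grows, in line with the uniformity in τ asserted in part (a).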

9.3. Specific relative entropy<br />

Define the finite-volume Gibbs measures with free boundary conditions by<br />
\[ \pi^{\rm free}_\Lambda(d\sigma_\Lambda) = \frac{e^{-H^{\rm free}_\Lambda(\sigma_\Lambda)}}{Z^{\rm free}_\Lambda}\,\lambda_\Lambda(d\sigma_\Lambda). \]<br />

Theorem 9.3. For $\nu \in \mathcal{M}_\theta(\Omega)$, the limit<br />
\[ h(\nu\,|\,\Phi) = \lim_{n\to\infty} \frac{1}{|V_n|} H(\nu_{V_n}\,|\,\pi^{\rm free}_{V_n}) \]<br />
exists and is given by<br />
\[ h(\nu\,|\,\Phi) = E^\nu[f_\Phi] + h(\nu\,|\,\lambda) + P(\Phi). \tag{9.3} \]<br />
Furthermore, for any Gibbs measure $\gamma \in \mathcal{G}^\Phi$,<br />
\[ h(\nu\,|\,\gamma) = \lim_{n\to\infty} \frac{1}{|V_n|} H(\nu_{V_n}\,|\,\gamma_{V_n}) \]<br />
exists and $h(\nu\,|\,\gamma) = h(\nu\,|\,\Phi)$.<br />

Remark 9.4. Theorem 9.2 says that the free energy $-P(\Phi)$ is the smallest value of the average energy contribution of 0 less the physical entropy, where the minimization takes place over all shift-invariant probability measures. Theorem 9.3 then says that this minimum is attained at any shift-invariant Gibbs measure. This is analogous to (8.5) in the Curie-Weiss model and to part (a) of Exercise 6.21.<br />
part (a) of Exercise 6.21.



∗ Exercise 9.5. Use (9.3) to prove that $h(\,\cdot\,|\,\Phi)$ is affine, lower semicontinuous, and has compact sublevel sets.<br />

Proof of Theorem 9.3. Recall that when the index is the volume $V_n$, we abbreviate things by simply using index n. Let $g \in b\mathcal{F}_n$. Then<br />
\[ \begin{aligned} E^{\nu_n}[g] - \log \pi^{\rm free}_n(e^g) &= E^{\nu_n}[g] - \log E^{\lambda_n}\bigl[e^{g - H^{\rm free}_n}\bigr] + \log Z^{\rm free}_n \\ &= E^{\nu_n}[g - H^{\rm free}_n] - \log E^{\lambda_n}\bigl[e^{g - H^{\rm free}_n}\bigr] + E^{\nu_n}[H^{\rm free}_n] + \log Z^{\rm free}_n. \end{aligned} \]<br />
Since $H^{\rm free}_n \in b\mathcal{F}_n$, taking the supremum over g on both sides and using the variational characterization of relative entropy, we conclude that<br />
\[ \frac{1}{|V_n|} H(\nu_n\,|\,\pi^{\rm free}_n) = \frac{1}{|V_n|} H(\nu_n\,|\,\lambda_n) + \frac{1}{|V_n|} E^{\nu_n}[H^{\rm free}_n] + P^{\rm free}_{V_n}. \]<br />
The first term on the right-hand side converges to $h(\nu\,|\,\lambda)$ by the definition of specific entropy. The third term converges to $P(\Phi)$, by Theorem 9.2. The second term is equal to $E^\nu[H^{\rm free}_n/|V_n|]$ which, by (9.1), has the same limit as $|V_n|^{-1}\sum_{i\in V_n} E^\nu[f_\Phi\circ\theta_i] = E^\nu[f_\Phi]$. This gives the existence of $h(\nu\,|\,\Phi)$ as well as (9.3).<br />

Next, observe that if $\gamma \in \mathcal{G}^\Phi$, $g \in b\mathcal{F}_n$, and $\pi_n$ is the Gibbs specification corresponding to Φ and volume $V_n$, then<br />
\[ \begin{aligned} \log E^{\gamma_n}[e^g] &= \log E^\gamma[\pi_n(e^g)] = \log \int \frac{E^{\lambda_n}\bigl[e^{g - H^\tau_n}\bigr]}{E^{\lambda_n}\bigl[e^{-H^\tau_n}\bigr]}\,\gamma(d\tau) \\ &= \log \int \frac{E^{\lambda_n}\bigl[e^{g - H^{\rm free}_n}\, e^{H^{\rm free}_n - H^\tau_n}\bigr]}{E^{\lambda_n}\bigl[e^{-H^{\rm free}_n}\, e^{H^{\rm free}_n - H^\tau_n}\bigr]}\,\gamma(d\tau) \\ &\le \log \frac{E^{\lambda_n}\bigl[e^{g - H^{\rm free}_n}\bigr]}{E^{\lambda_n}\bigl[e^{-H^{\rm free}_n}\bigr]} + 2 \sup_\tau \|H^{\rm free}_n - H^\tau_n\|_\infty \\ &\le \log \pi^{\rm free}_n(e^g) + 2 b_{V_m,V_n}, \end{aligned} \]<br />
where we used (9.2) in the last line, with m < n. Consequently,<br />
\[ E^{\nu_n}[g] - \log \pi^{\rm free}_n(e^g) \le E^{\nu_n}[g] - \log E^{\gamma_n}[e^g] + 2 b_{V_m,V_n} \]<br />
and<br />
\[ H(\nu_n\,|\,\pi^{\rm free}_n) \le H(\nu_n\,|\,\gamma_n) + 2 b_{V_m,V_n}. \]<br />
The lower bound can be shown similarly and one has<br />
\[ |H(\nu_n\,|\,\gamma_n) - H(\nu_n\,|\,\pi^{\rm free}_n)| \le 2 b_{V_m,V_n}. \]<br />
Thus, $h(\nu\,|\,\gamma)$ exists and equals $h(\nu\,|\,\Phi)$.<br />
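The proof above passes from inequalities for $E^{\nu_n}[g] - \log\pi(e^g)$ to inequalities for relative entropies by taking a supremum over g. The sketch below illustrates this variational characterization, $H(\nu\,|\,\pi) = \sup_g \{E^\nu[g] - \log E^\pi[e^g]\}$, on a three-point space. The two probability vectors are arbitrary illustrative choices and the function names are hypothetical.

```python
import math
import random


def relative_entropy(nu, pi):
    """H(nu | pi) for probability vectors on a finite set (pi strictly positive)."""
    return sum(a * math.log(a / b) for a, b in zip(nu, pi) if a > 0)


def dual_value(g, nu, pi):
    """E^nu[g] - log E^pi[e^g] for a function g given as a list of values."""
    mean_g = sum(a * x for a, x in zip(nu, g))
    log_mgf = math.log(sum(b * math.exp(x) for b, x in zip(pi, g)))
    return mean_g - log_mgf


nu = [0.5, 0.3, 0.2]
pi = [0.2, 0.2, 0.6]

# The supremum is attained at g = log(d nu / d pi).
g_opt = [math.log(a / b) for a, b in zip(nu, pi)]
entropy = relative_entropy(nu, pi)
attained = dual_value(g_opt, nu, pi)

# Any other g gives a value that does not exceed the entropy.
rng = random.Random(0)
trial_values = [dual_value([rng.uniform(-3, 3) for _ in nu], nu, pi) for _ in range(1000)]
```

Because each test function g gives a lower bound on the entropy, a bound that holds for every g with a constant error term, such as $2b_{V_m,V_n}$ above, transfers directly to the entropies.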



9.4. Large deviations under Gibbs kernels<br />

Using Varadhan’s theorem and (9.2), one can deduce a uniform large deviation principle for the empirical fields under Gibbs kernels, and thus also under Gibbs measures. Recall the periodized empirical fields $\widetilde{R}_n$, defined in (7.2).<br />

Theorem 9.6. For any Borel-measurable set $A \subset \mathcal{M}_\theta(\Omega)$,<br />
\[ \begin{aligned} -\inf_{\nu\in A^\circ} h(\nu\,|\,\Phi) &\le \varliminf_{n\to\infty}\, \inf_{\tau\in\Omega\cup\{\mathrm{free}\}} \frac{1}{|V_n|} \log \pi^\tau_{V_n}\{\widetilde{R}_n \in A\} \\ &\le \varlimsup_{n\to\infty}\, \sup_{\tau\in\Omega\cup\{\mathrm{free}\}} \frac{1}{|V_n|} \log \pi^\tau_{V_n}\{\widetilde{R}_n \in A\} \\ &\le -\inf_{\nu\in\overline{A}} h(\nu\,|\,\Phi). \end{aligned} \tag{9.4} \]<br />
Also, if $\gamma \in \mathcal{G}^\Phi$, then the distributions $\gamma\{R_n \in \cdot\,\}$ satisfy a large deviation principle with rate function $h(\,\cdot\,|\,\gamma)$.<br />

Proof. Note that<br />
\[ \gamma\{\widetilde{R}_n \in A\} = \int \pi_{V_n}(\tau, \{\widetilde{R}_n \in A\})\,\gamma(d\tau). \]<br />
So if we replace $R_n$ by $\widetilde{R}_n$, the last statement follows from (9.4) and $h(\,\cdot\,|\,\gamma) = h(\,\cdot\,|\,\Phi)$. It then follows for $R_n$ from Exercises 7.5 and 2.17 and the tightness of the rate function $h(\,\cdot\,|\,\gamma)$. Let us thus prove (9.4).<br />

Define the probability measures $\rho_n$ on $\mathcal{M}_\theta(\Omega)$ by<br />
\[ \rho_n(A) = \frac{1}{c_n} \int \mathbf{1}\{\widetilde{R}_n \in A\}\, e^{-|V_n| E^{\widetilde{R}_n}[f_\Phi]}\,d\lambda, \]<br />
where $c_n = E^\lambda\bigl[e^{-|V_n| E^{\widetilde{R}_n}[f_\Phi]}\bigr]$. Since Theorem 7.13 gives a large deviation principle for the laws of $\widetilde{R}_n$ under the i.i.d. measure λ with rate function $h(\,\cdot\,|\,\lambda)$, Exercise 4.9 implies that the large deviation principle holds for $\rho_n$ with normalization $|V_n|$ and rate function<br />
\[ \begin{aligned} I(\nu) &= E^\nu[f_\Phi] + h(\nu\,|\,\lambda) - \inf_{\mu\in\mathcal{M}_\theta(\Omega)} \{E^\mu[f_\Phi] + h(\mu\,|\,\lambda)\} \\ &= E^\nu[f_\Phi] + h(\nu\,|\,\lambda) + P(\Phi) = h(\nu\,|\,\Phi). \end{aligned} \]<br />



Now, the same approximation step as before allows us to switch from $\rho_n$ to the Gibbs kernels. To see this, write for $\tau \in \Omega \cup \{\mathrm{free}\}$,<br />
\[ \begin{aligned} \log \rho_n(A) &= \log \int \mathbf{1}_A(\widetilde{R}_n)\, e^{-|V_n| E^{\widetilde{R}_n}[f_\Phi]}\,d\lambda - \log \int e^{-|V_n| E^{\widetilde{R}_n}[f_\Phi]}\,d\lambda \\ &\le \log \int \mathbf{1}_A(\widetilde{R}_n)\, e^{-H^\tau_n}\,d\lambda - \log \int e^{-H^\tau_n}\,d\lambda + 2\bigl\| H^\tau_n - |V_n| E^{\widetilde{R}_n}[f_\Phi] \bigr\|_\infty \\ &\le \log \pi^\tau_n\{\widetilde{R}_n \in A\} + 2\bigl\| H^\tau_n - |V_n| E^{R_n}[f_\Phi] \bigr\|_\infty + 2|V_n|\,\bigl\| E^{\widetilde{R}_n}[f_\Phi] - E^{R_n}[f_\Phi] \bigr\|_\infty. \end{aligned} \]<br />
Abbreviate $f^{(r)}_\Phi = \sum_{0\in A,\,|A|\le r} \Phi_A/|A|$. Then, using Exercise 7.5 one sees that<br />
\[ \begin{aligned} \varlimsup_{n\to\infty} \bigl\| E^{\widetilde{R}_n}[f_\Phi] - E^{R_n}[f_\Phi] \bigr\|_\infty &\le 2 \sum_{0\in A,\,|A|>r} \|\Phi_A\| + \lim_{n\to\infty} \bigl\| E^{\widetilde{R}_n}[f^{(r)}_\Phi] - E^{R_n}[f^{(r)}_\Phi] \bigr\|_\infty \\ &= 2 \sum_{0\in A,\,|A|>r} \|\Phi_A\| \underset{r\to\infty}{\longrightarrow} 0. \end{aligned} \]<br />
Then, for m fixed,<br />
\[ \begin{aligned} \varliminf_{n\to\infty}\, \inf_{\tau\in\Omega\cup\{\mathrm{free}\}} \frac{1}{|V_n|} \log \pi^\tau_n\{\widetilde{R}_n \in A\} &\ge \varliminf_{n\to\infty} \frac{1}{|V_n|} \log \rho_n(A) - 4 \lim_{n\to\infty} \frac{b_{V_m,V_n}}{|V_n|} \\ &\ge -\inf_{\nu\in A^\circ} h(\nu\,|\,\Phi) - 4 \lim_{n\to\infty} \frac{b_{V_m,V_n}}{|V_n|}. \end{aligned} \]<br />
Taking m → ∞ kills the right-most term and proves the lower large deviation bound. The upper bound is similar.<br />
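The factor 4 in front of $b_{V_m,V_n}$ can be traced back to Lemma 9.1. The following worked step is a sketch that assumes the (non-periodized) empirical field is $R_n = |V_n|^{-1}\sum_{i\in V_n}\delta_{\theta_i\sigma}$, so that $|V_n|\,E^{R_n}[f_\Phi] = \sum_{i\in V_n} f_\Phi\circ\theta_i\sigma$; this definition is not restated in the present section and should be checked against the one used in Chapter 7.

```latex
\begin{align*}
\bigl\| H^\tau_n - |V_n|\,E^{R_n}[f_\Phi] \bigr\|_\infty
  &\le \bigl\| H^\tau_n - H^{\rm free}_n \bigr\|_\infty
     + \Bigl\| H^{\rm free}_n - \sum_{i\in V_n} f_\Phi\circ\theta_i \Bigr\|_\infty \\
  &\le b_{V_m,V_n} + b_{V_m,V_n} = 2\,b_{V_m,V_n}
  \qquad \text{by (9.2) and (9.1)},
\end{align*}
so that the term $2\bigl\| H^\tau_n - |V_n|\,E^{R_n}[f_\Phi] \bigr\|_\infty$
in the estimate for $\log\rho_n(A)$ is at most $4\,b_{V_m,V_n}$.
```

The remaining term, $2|V_n|\,\| E^{\widetilde{R}_n}[f_\Phi] - E^{R_n}[f_\Phi] \|_\infty$, is $o(|V_n|)$ by the estimate based on Exercise 7.5 and therefore disappears after division by $|V_n|$.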

9.5. Dobrushin-Lanford-Ruelle (DLR) variational principle<br />

We have so far seen that shift-invariant Gibbs measures are all minimizers in the variational formula for the pressure $P(\Phi)$; see Remark 9.4. The natural question is whether or not the minimizers are all shift-invariant Gibbs measures. This would follow from Theorems 9.2 and 9.3 if one shows that shift-invariant solutions to $h(\gamma\,|\,\Phi) = 0$ are Gibbs measures.<br />

Dobrushin-Lanford-Ruelle variational principle. Fix a shift-invariant absolutely summable continuous interaction potential Φ. Let $\gamma \in \mathcal{M}_\theta(\Omega)$. The following are equivalent.<br />

(a) $\gamma \in \mathcal{G}^\Phi$;<br />

(b) $h(\gamma\,|\,\Phi) = 0$;<br />

(c) $E^\gamma[f_\Phi] + h(\gamma\,|\,\lambda) = -P(\Phi) = \inf\{E^\nu[f_\Phi] + h(\nu\,|\,\lambda) : \nu \in \mathcal{M}_\theta(\Omega)\}$.<br />

Remark 9.7. The above variational principle shows that if phase transition happens, then the large deviation rate function under Gibbs measures has



multiple zeroes. We will see in the next chapter that this happens in the multidimensional Ising model at low temperature.<br />

Proof of the DLR variational principle. (a) implies (b) due to Theorem 9.3. Also, (b) is equivalent to (c) due to (9.3). The delicate part is to show that (b) implies (a). Suppose (a) does not hold. Then, there exists some integer ℓ such that $\gamma \ne \gamma\pi_{V_\ell}$. Thus, $H(\gamma\,|\,\gamma\pi_{V_\ell}) \ge 2\eta$, for some η > 0. By Exercise 7.11, there is some integer k such that $H(\gamma_{V_k}\,|\,(\gamma\pi_{V_\ell})_{V_k}) \ge \eta$.<br />

For a probability measure $\mu \in \mathcal{M}_1(\Omega)$ and two sets $A, B \subset \mathbb{Z}^d$, let $\mu^A_B = \mu_B(\,\cdot\,|\,\mathcal{F}_A)$ be the conditional distribution of $\{\sigma_i : i \in B\}$ given $\{\sigma_i : i \in A\}$.<br />

Lemma 9.8. Let $\Delta \subset \Lambda \subset \Gamma \subset \mathbb{Z}^d$ be finite. Then<br />
\[ \Bigl| \int H\bigl(\gamma^{\Lambda\setminus\Delta}_\Lambda \,\big|\, (\gamma\pi_\Delta)^{\Lambda\setminus\Delta}_\Lambda\bigr)\,d\gamma - \int H\bigl(\gamma^{\Lambda\setminus\Delta}_\Lambda \,\big|\, (\pi^{\rm free}_\Gamma)^{\Lambda\setminus\Delta}_\Lambda\bigr)\,d\gamma \Bigr| \le 4 d_{\Delta,\Lambda}, \]<br />
where<br />
\[ d_{\Delta,\Lambda} = \sum_{\substack{A:\,A\cap\Delta\ne\emptyset \\ A\not\subset\Lambda}} \|\Phi_A\|. \]<br />

Here $d_{\Delta,\Lambda}$ bounds the energy contribution from outside Λ to the volume Δ. For fixed Δ it decays to 0 as Λ grows to $\mathbb{Z}^d$.<br />

Proof of Lemma 9.8. Define temporarily the Δ-volume Hamiltonian of interactions confined to Λ,<br />
\[ H_{\Delta,\Lambda} = \sum_{\substack{A:\,A\cap\Delta\ne\emptyset \\ A\subset\Lambda}} \Phi_A. \]<br />
Throughout this proof write, for a configuration $\sigma \in \Omega$ and a set $A \subset \mathbb{Z}^d$, $\sigma_A = (\sigma_i)_{i\in A}$. Define $\pi_{\Delta,\Lambda}(\sigma_{\Lambda\setminus\Delta}) \in \mathcal{M}_1(\Omega_\Lambda)$ by<br />
\[ E^{\pi_{\Delta,\Lambda}(\sigma_{\Lambda\setminus\Delta})}[\varphi] = \frac{\int \varphi(\omega_\Delta, \sigma_{\Lambda\setminus\Delta})\, e^{-H_{\Delta,\Lambda}(\omega_\Delta, \sigma_{\Lambda\setminus\Delta})}\,\lambda_\Delta(d\omega_\Delta)}{\int e^{-H_{\Delta,\Lambda}(\omega_\Delta, \sigma_{\Lambda\setminus\Delta})}\,\lambda_\Delta(d\omega_\Delta)} \]<br />
for φ a bounded $\mathcal{F}_\Lambda$-measurable function.<br />

In what follows, equalities hold almost surely. Note, however, that all the finite-volume measures involved in the computations are mutually absolutely continuous relative to λ (restricted to the volume in question). Hence, the statements hold almost surely relative to any of these measures. Throughout the rest of the proof φ is an arbitrary bounded $\mathcal{F}_\Lambda$-measurable function.<br />


9.5. Dobrushin-Lanford-Ruelle (DLR) variational principle

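Compute
\[
\begin{aligned}
E^{\pi^{\mathrm{free}}_\Gamma}[\varphi \mid \sigma_{\Lambda\setminus\Delta}]
&= \frac{\int_{\Omega_{\Gamma\setminus\Lambda}} \int_{\Omega_\Delta} \varphi(\omega_\Delta, \sigma_{\Lambda\setminus\Delta})\, e^{-H^{\mathrm{free}}_\Gamma(\omega_\Delta, \sigma_{\Lambda\setminus\Delta}, \tau_{\Gamma\setminus\Lambda})}\, \lambda_\Delta(d\omega_\Delta)\, \lambda_{\Gamma\setminus\Lambda}(d\tau_{\Gamma\setminus\Lambda})}{\int_{\Omega_{\Gamma\setminus\Lambda}} \int_{\Omega_\Delta} e^{-H^{\mathrm{free}}_\Gamma(\omega_\Delta, \sigma_{\Lambda\setminus\Delta}, \tau_{\Gamma\setminus\Lambda})}\, \lambda_\Delta(d\omega_\Delta)\, \lambda_{\Gamma\setminus\Lambda}(d\tau_{\Gamma\setminus\Lambda})} \\[4pt]
&= \frac{\int_{\Omega_{\Gamma\setminus\Lambda}} \int_{\Omega_\Delta} \varphi(\omega_\Delta, \sigma_{\Lambda\setminus\Delta})\, e^{-H_{\Delta,\Lambda}(\omega_\Delta, \sigma_{\Lambda\setminus\Delta}) - \Sigma' - \Sigma''}\, \lambda_\Delta(d\omega_\Delta)\, \lambda_{\Gamma\setminus\Lambda}(d\tau_{\Gamma\setminus\Lambda})}{\int_{\Omega_{\Gamma\setminus\Lambda}} \int_{\Omega_\Delta} e^{-H_{\Delta,\Lambda}(\omega_\Delta, \sigma_{\Lambda\setminus\Delta}) - \Sigma' - \Sigma''}\, \lambda_\Delta(d\omega_\Delta)\, \lambda_{\Gamma\setminus\Lambda}(d\tau_{\Gamma\setminus\Lambda})},
\end{aligned}
\]
where
\[
\Sigma' = \sum_{A:\, A\cap\Delta\ne\emptyset,\ A\not\subset\Lambda,\ A\subset\Gamma} \Phi_A
\quad\text{and}\quad
\Sigma'' = \sum_{A:\, A\cap\Delta=\emptyset,\ A\subset\Gamma} \Phi_A.
\]
Since $\sup|\Sigma'| \le d_{\Delta,\Lambda}$, one has
\[
\begin{aligned}
E^{\pi^{\mathrm{free}}_\Gamma}[\varphi \mid \sigma_{\Lambda\setminus\Delta}]
&= e^{C_1}\, \frac{\int_{\Omega_{\Gamma\setminus\Lambda}} \int_{\Omega_\Delta} \varphi(\omega_\Delta, \sigma_{\Lambda\setminus\Delta})\, e^{-H_{\Delta,\Lambda}(\omega_\Delta, \sigma_{\Lambda\setminus\Delta}) - \Sigma''}\, \lambda_\Delta(d\omega_\Delta)\, \lambda_{\Gamma\setminus\Lambda}(d\tau_{\Gamma\setminus\Lambda})}{\int_{\Omega_{\Gamma\setminus\Lambda}} \int_{\Omega_\Delta} e^{-H_{\Delta,\Lambda}(\omega_\Delta, \sigma_{\Lambda\setminus\Delta}) - \Sigma''}\, \lambda_\Delta(d\omega_\Delta)\, \lambda_{\Gamma\setminus\Lambda}(d\tau_{\Gamma\setminus\Lambda})} \\[4pt]
&= e^{C_1}\, \frac{\int_{\Omega_\Delta} \varphi(\omega_\Delta, \sigma_{\Lambda\setminus\Delta})\, e^{-H_{\Delta,\Lambda}(\omega_\Delta, \sigma_{\Lambda\setminus\Delta})}\, \lambda_\Delta(d\omega_\Delta)\ \int_{\Omega_{\Gamma\setminus\Lambda}} e^{-\Sigma''}\, \lambda_{\Gamma\setminus\Lambda}(d\tau_{\Gamma\setminus\Lambda})}{\int_{\Omega_\Delta} e^{-H_{\Delta,\Lambda}(\omega_\Delta, \sigma_{\Lambda\setminus\Delta})}\, \lambda_\Delta(d\omega_\Delta)\ \int_{\Omega_{\Gamma\setminus\Lambda}} e^{-\Sigma''}\, \lambda_{\Gamma\setminus\Lambda}(d\tau_{\Gamma\setminus\Lambda})} \\[4pt]
&= e^{C_1}\, \frac{\int_{\Omega_\Delta} \varphi(\omega_\Delta, \sigma_{\Lambda\setminus\Delta})\, e^{-H_{\Delta,\Lambda}(\omega_\Delta, \sigma_{\Lambda\setminus\Delta})}\, \lambda_\Delta(d\omega_\Delta)}{\int_{\Omega_\Delta} e^{-H_{\Delta,\Lambda}(\omega_\Delta, \sigma_{\Lambda\setminus\Delta})}\, \lambda_\Delta(d\omega_\Delta)} \\[4pt]
&= e^{C_1}\, E^{\pi_{\Delta,\Lambda}(\sigma_{\Lambda\setminus\Delta})}[\varphi],
\end{aligned}
\]
with $C_1 = C_1(\sigma_{\Lambda\setminus\Delta})$ and $\sup|C_1| \le 2d_{\Delta,\Lambda}$. The second equality uses that $\Sigma''$ does not depend on $\omega_\Delta$ while $H_{\Delta,\Lambda}$ does not depend on $\tau_{\Gamma\setminus\Lambda}$, so the double integrals factor and the integrals of $e^{-\Sigma''}$ cancel.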

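Exercise 9.9. Show, similarly to the above computation, that
\[
\pi_\Delta\varphi(\sigma_{\Lambda\setminus\Delta}, \tau_{\Lambda^c}) = e^{C_2(\sigma_{\Lambda\setminus\Delta},\, \tau_{\Lambda^c})}\, E^{\pi_{\Delta,\Lambda}(\sigma_{\Lambda\setminus\Delta})}[\varphi],
\]
with $\sup|C_2| \le 2d_{\Delta,\Lambda}$.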

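Next, we show that
\[
E^{\gamma\pi_\Delta}[\varphi \mid \sigma_{\Lambda\setminus\Delta}] = \int \pi_\Delta\varphi(\sigma_{\Lambda\setminus\Delta}, \tau_{\Lambda^c})\, \gamma(d\tau_{\Lambda^c} \mid \sigma_{\Lambda\setminus\Delta}).
\]
Indeed, if $\psi$ is bounded and $\mathcal{F}_{\Lambda\setminus\Delta}$-measurable, then
\[
\begin{aligned}
E^{\gamma\pi_\Delta}[\varphi\psi] &= \int \pi_\Delta(\varphi\psi)\, d\gamma = \int \psi\, \pi_\Delta\varphi\, d\gamma = \int E^{\gamma}[\psi\, \pi_\Delta\varphi \mid \mathcal{F}_{\Lambda\setminus\Delta}]\, d\gamma \\
&= \int \psi\, E^{\gamma}[\pi_\Delta\varphi \mid \mathcal{F}_{\Lambda\setminus\Delta}]\, d\gamma = \int \psi\, E^{\gamma}[\pi_\Delta\varphi \mid \mathcal{F}_{\Lambda\setminus\Delta}]\, d(\gamma\pi_\Delta).
\end{aligned}
\]
In the last equality we have used the fact that on $\mathcal{F}_{\Lambda\setminus\Delta}$, $\gamma$ and $\gamma\pi_\Delta$ coincide.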



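Combining all of the above, one has
\[
\begin{aligned}
E^{\gamma\pi_\Delta}[\varphi \mid \sigma_{\Lambda\setminus\Delta}]
&= \int \pi_\Delta\varphi(\sigma_{\Lambda\setminus\Delta}, \tau_{\Lambda^c})\, \gamma(d\tau_{\Lambda^c} \mid \sigma_{\Lambda\setminus\Delta}) \\
&= \int e^{C_2(\sigma_{\Lambda\setminus\Delta},\, \tau_{\Lambda^c})}\, E^{\pi_{\Delta,\Lambda}(\sigma_{\Lambda\setminus\Delta})}[\varphi]\, \gamma(d\tau_{\Lambda^c} \mid \sigma_{\Lambda\setminus\Delta}) \\
&= e^{C_2'(\sigma_{\Lambda\setminus\Delta})}\, E^{\pi_{\Delta,\Lambda}(\sigma_{\Lambda\setminus\Delta})}[\varphi] \\
&= e^{-C_1(\sigma_{\Lambda\setminus\Delta}) + C_2'(\sigma_{\Lambda\setminus\Delta})}\, E^{\pi^{\mathrm{free}}_\Gamma}[\varphi \mid \sigma_{\Lambda\setminus\Delta}],
\end{aligned}
\]
with $\sup|C_2'| \le 2d_{\Delta,\Lambda}$. This implies that for $g \in b\mathcal{F}_\Lambda$,
\[
\sup \Bigl|\, \log E^{(\gamma\pi_\Delta)^{\Lambda\setminus\Delta}_{\Lambda}}[e^g] - \log E^{(\pi^{\mathrm{free}}_\Gamma)^{\Lambda\setminus\Delta}_{\Lambda}}[e^g] \,\Bigr| \le 4d_{\Delta,\Lambda},
\]
which implies the claim of the lemma by the variational formula for relative entropy (Theorem 6.5). □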

Returning to the proof of the theorem, take $k$ large enough such that $d_{V_\ell,V_k} < \eta/8$. Fix $m \in \mathbb{N}$ and pick $n = n(m)$ such that $V_n$ is a disjoint union of $m^d$ translates of $V_k$: $V_n = \bigcup_{s=1}^{m^d} (i_s + V_k)$. Abbreviate $\Delta_j = i_j + V_\ell$ and $\Lambda_j = \bigcup_{s=1}^{j} (i_s + V_k)$, for $j = 1, \dots, m^d$. Abbreviate $\pi = \pi^{\mathrm{free}}_{V_n}$. Recall Exercise 7.11 and the conditional entropy formula from Exercise 6.14. Now compute for $j \ge 2$,
\[
\begin{aligned}
H(\gamma_{\Lambda_j} \mid \pi_{\Lambda_j}) - H(\gamma_{\Lambda_{j-1}} \mid \pi_{\Lambda_{j-1}})
&\ge H(\gamma_{\Lambda_j} \mid \pi_{\Lambda_j}) - H(\gamma_{\Lambda_j\setminus\Delta_j} \mid \pi_{\Lambda_j\setminus\Delta_j}) \\
&= \int H\bigl(\gamma^{\Lambda_j\setminus\Delta_j}_{\Lambda_j} \,\big|\, \pi^{\Lambda_j\setminus\Delta_j}_{\Lambda_j}\bigr)\, d\gamma \\
&\ge \int H\bigl(\gamma^{\Lambda_j\setminus\Delta_j}_{\Lambda_j} \,\big|\, (\gamma\pi_{\Delta_j})^{\Lambda_j\setminus\Delta_j}_{\Lambda_j}\bigr)\, d\gamma - 4d_{V_\ell,V_k} \\
&= H\bigl(\gamma_{\Lambda_j} \mid (\gamma\pi_{\Delta_j})_{\Lambda_j}\bigr) - 4d_{V_\ell,V_k} \\
&\ge H\bigl(\gamma_{i_j+V_k} \mid (\gamma\pi_{i_j+V_\ell})_{i_j+V_k}\bigr) - 4d_{V_\ell,V_k} \\
&\ge \eta - \frac{4\eta}{8} = \eta/2.
\end{aligned}
\]
On the fourth line, we used the fact that $(\gamma\pi_{\Delta_j})_{\Lambda_j\setminus\Delta_j} = \gamma_{\Lambda_j\setminus\Delta_j}$. The above computation also holds for $j = 1$, with the first line being $H(\gamma_{\Lambda_1} \mid \pi_{\Lambda_1})$. Adding these inequalities over $j$ gives
\[
H(\gamma_{V_n} \mid \pi^{\mathrm{free}}_{V_n}) \ge m^d\, \eta/2.
\]
This implies that
\[
\frac{1}{|V_n|}\, H(\gamma_{V_n} \mid \pi^{\mathrm{free}}_{V_n}) \ge \frac{\eta}{2|V_k|},
\]
for $n = n(m)$. Letting $m$ grow to infinity we see that $h(\gamma \mid \Phi) \ge \frac{\eta}{2|V_k|} > 0$, which contradicts (b). □



As a corollary of the above variational principle, one can now deduce that $\mathcal{G}^\Phi$ is not empty, even when the spin space $X$ is not compact.

Corollary 9.10. If $\Phi$ is an absolutely summable shift-invariant continuous interaction potential, then $\mathcal{G}^\Phi \cap \mathcal{M}_\theta(\Omega)$ is nonempty, convex, and compact.

Proof. We already know that $\mathcal{G}^\Phi \cap \mathcal{M}_\theta(\Omega)$ is convex. We have shown that
\[
\mathcal{G}^\Phi \cap \mathcal{M}_\theta(\Omega) = \{\gamma : h(\gamma \mid \Phi) = 0\} = \bigcap_{n\ge 1} \{\gamma : h(\gamma \mid \Phi) \le 1/n\}.
\]
By Theorem 9.6, $h(\,\cdot \mid \Phi)$ is a rate function. Its infimum must then be 0 and the sets in the above intersection are nonempty. By Exercise 9.5, they are also compact. The intersection is thus nonempty and compact, by the finite intersection property of nested compact sets; see Proposition 4.21 of [18]. □

Using the notion of subdifferentials, one can give a nice geometric interpretation of a Gibbs state. To this end, set $\mathcal{B}$ and $\mathcal{M}_\theta(\Omega)$ in duality by $\langle \Phi, \nu \rangle = -E^\nu[f_\Phi]$. Then, by part (c) of Theorem 9.2, $P(\Phi)$ is a supremum over affine continuous functions and is hence convex and lower semicontinuous on $\mathcal{B}$.

Corollary 9.11. Suppose $\gamma \in \mathcal{M}_\theta(\Omega)$. Then, $\gamma \in \mathcal{G}^\Phi$ if, and only if, $\gamma$ is a tangent to $P$ at $\Phi$; i.e. $\gamma \in \partial P(\Phi)$.

Exercise 9.12. Show that the duality above between $\mathcal{B}$ and $\mathcal{M}_\theta(\Omega)$ satisfies Assumption 5.1 so that the induced weak topologies are Hausdorff.


Chapter 10. Phase transition in the Ising model

We saw in Section 4.3 that the Curie-Weiss model of ferromagnetism captures the phase transition phenomenon known as spontaneous magnetization. A serious shortcoming of the Curie-Weiss model is that every pair of spins interacts with the same strength, so there are no spatial effects. A more realistic model of ferromagnetism is the Ising model where the spins are placed on the sites of an integer lattice and only neighboring spins interact. This is still a crude approximation because in reality spins are carried by electrons. But the Ising model is still a worthwhile test case for ideas of statistical mechanics and it has been a rich source of deep mathematics.

Fix a dimension $d$. Let the state space be given by $X = \{-1, +1\}$. This represents our assumption that only up or down spins are possible. The Ising model on $\mathbb{Z}^d$ has a nearest neighbor potential described by three parameters: the strength of the nearest neighbor coupling ($J > 0$), the strength of the external magnetic field ($h \in \mathbb{R}$), and the inverse temperature ($\beta > 0$). The potential takes the form
\[
\Phi_A(\sigma) =
\begin{cases}
-J\sigma_i\sigma_j, & A = \{i, j\} \text{ with } \|i - j\|_1 = 1, \\
-h\sigma_i, & A = \{i\}, \\
0, & \text{otherwise.}
\end{cases}
\]



Here $\|i\|_1$ is the $\ell^1$ norm and thus $\|i - j\|_1 = 1$ means that $i$ and $j$ are nearest neighbors. Consequently, the Hamiltonian is
\[
H^\omega_\Lambda(\sigma_\Lambda) = -J \sum_{\substack{i,j\in\Lambda \\ \|i-j\|_1 = 1}} \sigma_i\sigma_j \;-\; J \sum_{\substack{i\in\Lambda,\ j\in\Lambda^c \\ \|i-j\|_1 = 1}} \sigma_i\omega_j \;-\; h \sum_{i\in\Lambda} \sigma_i.
\]
Taking the reference measure to be $\lambda = (\tfrac12\delta_{-1} + \tfrac12\delta_{1})^{\otimes\mathbb{Z}^d}$, one can define the specification $\Pi_{\beta,h,J} = \{\pi_\Lambda : \Lambda \subset \mathbb{Z}^d \text{ finite}\}$ by
\[
\pi_\Lambda(\omega, A) = \frac{1}{Z^\omega_{\Lambda,\beta,h,J}} \int_{\Omega_\Lambda} \mathbf{1}_A(\sigma_\Lambda, \omega_{\Lambda^c})\, e^{-\beta H^\omega_\Lambda(\sigma_\Lambda)}\, \lambda_\Lambda(d\sigma_\Lambda),
\]
where
\[
Z^\omega_{\Lambda,\beta,h,J} = \int_{\Omega_\Lambda} e^{-\beta H^\omega_\Lambda(\sigma_\Lambda)}\, \lambda_\Lambda(d\sigma_\Lambda)
\]
is the partition function. We will use the notation $\mu^\omega_{\Lambda,\beta,h,J} = \pi_\Lambda(\omega, \cdot)$ and will call $\omega$ the boundary condition.

The free Hamiltonian and the finite-volume Gibbs measures with free boundary conditions are defined as above, but without the boundary terms $\omega_j$. This is equivalent to setting $\omega_j = 0$ and thus we will use the notation $H^0_\Lambda(\sigma_\Lambda)$ and $\mu^0_{\Lambda,\beta,h,J}$. We will also use the notation $\mu^+_{\Lambda,\beta,h,J}$ for the case of + boundary condition, i.e. $\omega_j = 1$ for all $j$, and similarly $\mu^-_{\Lambda,\beta,h,J}$ for the Gibbs measure with boundary condition $\omega \equiv -1$.

The mapping $T\sigma = -\sigma$ is called the global spin flip.

∗Exercise 10.1. Check the following symmetry in the absence of an external magnetic field: $\mu^+_{\Lambda,\beta,0,J} = \mu^-_{\Lambda,\beta,0,J} \circ T$.
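The definitions above can be made concrete on a very small volume. The following is a minimal sketch, not part of the original text: it enumerates all configurations of a $2\times 2$ box in $\mathbb{Z}^2$, computes the finite-volume Gibbs measure for a constant boundary condition, and checks the spin-flip symmetry of Exercise 10.1 numerically. The function name `gibbs_measure`, the choice of box, and the parameter values are illustrative assumptions. The uniform reference measure $\lambda_\Lambda$ cancels in the normalization, so it is omitted.

```python
import itertools
import math

def gibbs_measure(sites, boundary, beta, h, J):
    """Finite-volume Gibbs measure on `sites` (a list of points of Z^2).

    `boundary` is the constant spin outside the volume: +1, -1, or 0 (free).
    Returns a dict mapping each configuration (tuple of +-1) to its probability.
    """
    site_set = set(sites)
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    weights = {}
    for spins in itertools.product([-1, 1], repeat=len(sites)):
        sigma = dict(zip(sites, spins))
        energy = 0.0
        for (x, y), s in sigma.items():
            energy -= h * s                          # external field term
            for dx, dy in steps:
                nb = (x + dx, y + dy)
                if nb in site_set:
                    # each pair inside the box is visited twice, so halve it
                    energy -= 0.5 * J * s * sigma[nb]
                else:
                    energy -= J * s * boundary       # boundary term
        weights[spins] = math.exp(-beta * energy)
    Z = sum(weights.values())                        # partition function
    return {spins: w / Z for spins, w in weights.items()}

sites = [(0, 0), (1, 0), (0, 1), (1, 1)]
mu_plus = gibbs_measure(sites, +1, beta=0.7, h=0.0, J=1.0)
mu_minus = gibbs_measure(sites, -1, beta=0.7, h=0.0, J=1.0)

# Spin-flip symmetry at h = 0: mu_plus(sigma) should equal mu_minus(-sigma).
max_gap = max(abs(p - mu_minus[tuple(-s for s in spins)])
              for spins, p in mu_plus.items())
print(max_gap)
```

With $h = 0$ the printed gap should be zero up to floating-point rounding. Brute-force enumeration costs $2^{|\Lambda|}$ evaluations, so this is only meant for tiny volumes.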

Already a first quick glance shows this model is reasonable. Indeed, $J > 0$ corresponds to ferromagnetism. When $h = 0$, $H^0_\Lambda$ has two ground states: all spins equal to $+1$ and all spins equal to $-1$. At positive temperature and without external field, $\mu^+_{\Lambda,\beta,0,J}$ prefers + spins while $\mu^-_{\Lambda,\beta,0,J}$ prefers − spins. Based on what happens in the Curie-Weiss model, we expect that at low temperatures the infinite-volume limits of the two measures give distinct Gibbs states and thereby phase transition occurs. This would happen if boundary effects can reach the origin regardless of how large the volume $\Lambda$ is. On the other hand, at high temperature boundary effects should be swamped by overwhelming noise and a unique Gibbs state is expected.

The free Gibbs measure $\mu^0_{\Lambda,\beta,h,J}$ prefers + or − spins depending on the sign of $h$. When both the external magnetic field and boundary conditions are present, the two compete. However, since the magnetic field acts on the whole volume one would expect it to dominate boundary effects. So there should not be any phase transition when $h \ne 0$.

Here is the main theorem, validating the above ideas.



Theorem 10.2. The weak limits
\[
\mu^+_{\beta,h,J} = \lim_{\Lambda\nearrow\mathbb{Z}^d} \mu^+_{\Lambda,\beta,h,J}
\quad\text{and}\quad
\mu^-_{\beta,h,J} = \lim_{\Lambda\nearrow\mathbb{Z}^d} \mu^-_{\Lambda,\beta,h,J}
\]
exist for all $d \in \mathbb{N}$, $h \in \mathbb{R}$, $J > 0$, and $\beta > 0$.

(a) When $d = 1$ there is a unique Gibbs measure for all $h \in \mathbb{R}$, $J > 0$, and $\beta > 0$ and in particular $\mu^+_{\beta,h,J} = \mu^-_{\beta,h,J}$.

(b) When $d \ge 2$ there exists a finite, positive critical inverse temperature $\beta_c = \beta_c(J, d)$ such that this holds: if $h \ne 0$ or $\beta < \beta_c$, then $\mu^+_{\beta,h,J} = \mu^-_{\beta,h,J}$ and the Gibbs measure is unique, while if $h = 0$ and $\beta > \beta_c$, then
\[
\mu^+_{\beta,h,J} \ne \mu^-_{\beta,h,J}
\]
and we have a phase transition.

The rest of this chapter is devoted to the proof of the above theorem. This is a hard task and will take up a few sections.

Remark 10.3. When $d = 2$ it is known that $\beta_c = \frac{1}{2J}\log(1 + \sqrt{2})$. See page 100 of [22]. Also, when $d = 2$, $\mu^+_{\beta,h,J}$ and $\mu^-_{\beta,h,J}$ are the only extreme Gibbs measures. See Aizenman [1; 2] and Higuchi [24]. It then follows that all Gibbs measures are shift invariant in dimension two. However, when $d = 3$ and $h = 0$ there exist values of $\beta$ for which there are infinitely many extreme Gibbs measures; see Dobrushin [13] and van Beijeren [40]. It also follows from Dobrushin's work that for $d = 3$, $h = 0$, and $\beta$ large, there exist Gibbs measures that are not shift invariant.
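As a small numerical illustration of the formula for $\beta_c$ in Remark 10.3, the sketch below evaluates it for a given coupling. The function name `critical_beta_2d` is a hypothetical helper introduced here, and the check $\sinh(2\beta_c J) = 1$ is just an algebraically equivalent restatement of the same formula.

```python
import math

def critical_beta_2d(J):
    """Critical inverse temperature of the two-dimensional Ising model:
    beta_c = log(1 + sqrt(2)) / (2 J)."""
    return math.log(1.0 + math.sqrt(2.0)) / (2.0 * J)

beta_c = critical_beta_2d(1.0)
print(beta_c)                         # approximately 0.4407 when J = 1
print(math.sinh(2.0 * beta_c * 1.0))  # equivalent form of the formula: equals 1
```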

Remark 10.4. It is known that when $d \ge 2$ but $d \ne 3$, the Gibbs measure is unique for $\beta = \beta_c$ and $h = 0$. It is also believed, but is an open question, that the same holds for $d = 3$; see [? ].

Remark 10.5. It is worthwhile at this point to take a second look at page 93 where the familiar liquid-gas phase transition was discussed.

10.1. One-dimensional Ising model

In this section we prove part (a) of Theorem 10.2. First we observe that the compactness of $\Omega$ implies that the set of Gibbs measures $\mathcal{G}$ is nonempty for all values of the parameters $\beta > 0$, $J > 0$, and $h \in \mathbb{R}$; see Theorem 8.15. Next, fix $\gamma \in \mathcal{G}$, $m \in \mathbb{N}$, and $b_{-m}, \dots, b_m \in \{-1, +1\}$. Consider the cylinder set $A = \{\sigma \in \Omega : \sigma_i = b_i \text{ for } |i| \le m\}$. Then, dominated convergence (see page 16 of [15] or page 46 of [26]) implies that
\[
\gamma(A) = \lim_{n\to\infty} E^\gamma\bigl[ E^\gamma[\mathbf{1}_A \mid \mathcal{F}_{[-n,n]^c}] \bigr] = E^\gamma\Bigl[ \lim_{n\to\infty} E^\gamma[\mathbf{1}_A \mid \mathcal{F}_{[-n,n]^c}] \Bigr]. \tag{10.1}
\]
Next we derive the almost sure limit of $\tilde\gamma_n = E^\gamma[\mathbf{1}_A \mid \mathcal{F}_{[-n,n]^c}]$. The existence of this limit follows also from the backwards-martingale convergence theorem (Theorem 6.1 on page 265 of [15] or page 155 of [26]).



Without loss of generality, we can absorb $\beta$ into $J$ and $h$ and hence set $\beta = 1$. Fix $\tau \in \Omega$ and $n > m$. Write
\[
\begin{aligned}
\tilde\gamma_n(\tau) &= E^\gamma[\mathbf{1}_A \mid \mathcal{F}_{[-n,n]^c}](\tau) = \mu^\tau_{[-n,n],1,h,J}\{\sigma_i = b_i,\ |i| \le m\} \\[4pt]
&= \frac{\displaystyle\sum_{\substack{\sigma_{-n},\dots,\sigma_{-m-1} \\ \sigma_{m+1},\dots,\sigma_n}} \exp\bigl\{ -H^\tau_{[-n,n]}(\sigma_{-n}, \dots, \sigma_{-m-1}, b_{-m}, \dots, b_m, \sigma_{m+1}, \dots, \sigma_n) \bigr\}}{\displaystyle\sum_{\omega_{-n},\dots,\omega_n} \exp\bigl\{ -H^\tau_{[-n,n]}(\omega_{-n}, \dots, \omega_n) \bigr\}}.
\end{aligned}
\]
Define, for $x, y \in \{-1, +1\}$, $A(x, y) = e^{Jxy + h(x+y)/2}$. Then an exponential term in the numerator above can be rewritten as
\[
\begin{aligned}
&A(\tau_{-n-1}, \sigma_{-n}) \Bigl( \prod_{i=-n}^{-m-2} A(\sigma_i, \sigma_{i+1}) \Bigr) A(\sigma_{-m-1}, b_{-m}) \Bigl( \prod_{i=-m}^{m-1} A(b_i, b_{i+1}) \Bigr) \\
&\qquad \times A(b_m, \sigma_{m+1}) \Bigl( \prod_{i=m+1}^{n-1} A(\sigma_i, \sigma_{i+1}) \Bigr) A(\sigma_n, \tau_{n+1})\, e^{-h(\tau_{-n-1} + \tau_{n+1})/2}.
\end{aligned}
\]
If we think of $A$ as a 2-by-2 matrix (called the transfer matrix)
\[
A = \begin{pmatrix} A(-1,-1) & A(-1,1) \\ A(1,-1) & A(1,1) \end{pmatrix} = \begin{pmatrix} e^{J-h} & e^{-J} \\ e^{-J} & e^{J+h} \end{pmatrix}
\]
and define
\[
a_0 = \prod_{i=-m}^{m-1} A(b_i, b_{i+1}),
\]
then we have
\[
\tilde\gamma_n(\tau) = \frac{A^{n-m+1}(\tau_{-n-1}, b_{-m})\; a_0\; A^{n-m+1}(b_m, \tau_{n+1})}{A^{2n+2}(\tau_{-n-1}, \tau_{n+1})}.
\]

The matrix $A$ is real and symmetric. It has two real eigenvalues, say $\lambda$ and $\mu$, with orthonormal eigenvectors $u$ and $v$. Then
\[
A = \begin{pmatrix} u & v \end{pmatrix} \begin{pmatrix} \lambda & 0 \\ 0 & \mu \end{pmatrix} \begin{pmatrix} u & v \end{pmatrix}^T.
\]
Observe that the determinant of $A$ equals $e^{2J} - e^{-2J} > 0$. Furthermore, its trace is $e^{J-h} + e^{J+h} > 0$ and thus the two eigenvalues are positive. They cannot be equal since otherwise $A$ would be a multiple of the identity matrix. We can, therefore, assume $\lambda > \mu > 0$. Since $(A/\lambda)^k = uu^T + (\mu/\lambda)^k vv^T$, we see that the largest eigenvalue and its eigenvector determine the asymptotics of the powers of $A$. Define $U = uu^T$ and write
\[
\begin{aligned}
\tilde\gamma_n(\tau) &= a_0 \lambda^{-2m}\, \frac{(A/\lambda)^{n-m+1}(\tau_{-n-1}, b_{-m})\, (A/\lambda)^{n-m+1}(b_m, \tau_{n+1})}{(A/\lambda)^{2n+2}(\tau_{-n-1}, \tau_{n+1})} \\[4pt]
&= a_0 \lambda^{-2m}\, \frac{\bigl(U(\tau_{-n-1}, b_{-m}) + o(1)\bigr)\bigl(U(b_m, \tau_{n+1}) + o(1)\bigr)}{U(\tau_{-n-1}, \tau_{n+1}) + o(1)} \\[4pt]
&= a_0 \lambda^{-2m}\, \frac{u(\tau_{-n-1})\, u(b_{-m})\, u(b_m)\, u(\tau_{n+1}) + o(1)}{u(\tau_{-n-1})\, u(\tau_{n+1}) + o(1)}.
\end{aligned}
\]
Direct inspection shows that if $u$ has a zero entry, then it cannot be an eigenvector of $A$. Thus, $\tilde\gamma_n(\tau)$ converges to $a_0 \lambda^{-2m} u(b_{-m}) u(b_m)$, which is independent of $\tau$. Hence, (10.1) implies that $\gamma(A) = a_0 \lambda^{-2m} u(b_{-m}) u(b_m)$. This characterizes $\gamma \in \mathcal{G}$ and part (a) of Theorem 10.2 is proved.
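The convergence of $\tilde\gamma_n(\tau)$ to a limit that does not depend on the boundary can be observed numerically. The sketch below is an illustration under assumed parameter values ($J = 0.8$, $h = 0.3$, $n = 40$, $m = 1$, none of which come from the text); the helper names `transfer_matrix`, `finite_volume_prob`, and `limit_prob` are hypothetical. It evaluates the transfer-matrix formula for $\tilde\gamma_n(\tau)$ for all four choices of the boundary spins $(\tau_{-n-1}, \tau_{n+1})$ and compares with $a_0\lambda^{-2m}u(b_{-m})u(b_m)$.

```python
import numpy as np

SPINS = (-1, 1)

def transfer_matrix(J, h):
    """A(x, y) = exp(J*x*y + h*(x + y)/2), rows and columns ordered as (-1, +1)."""
    return np.array([[np.exp(J * x * y + h * (x + y) / 2.0) for y in SPINS]
                     for x in SPINS])

def idx(spin):
    """Map a spin value -1/+1 to the matrix index 0/1."""
    return (spin + 1) // 2

def pattern_weight(A, b):
    """a_0 = product of A(b_i, b_{i+1}) along the prescribed pattern b."""
    return np.prod([A[idx(x), idx(y)] for x, y in zip(b[:-1], b[1:])])

def finite_volume_prob(b, n, tau_left, tau_right, J, h):
    """Transfer-matrix formula for the conditional probability of the cylinder
    {sigma_i = b_i, |i| <= m} in the volume [-n, n] with boundary spins tau."""
    m = (len(b) - 1) // 2
    A = transfer_matrix(J, h)
    side = np.linalg.matrix_power(A, n - m + 1)
    full = np.linalg.matrix_power(A, 2 * n + 2)
    return (side[idx(tau_left), idx(b[0])] * pattern_weight(A, b)
            * side[idx(b[-1]), idx(tau_right)] / full[idx(tau_left), idx(tau_right)])

def limit_prob(b, J, h):
    """Limit a_0 * lambda^(-2m) * u(b_{-m}) * u(b_m) from the top eigenpair."""
    m = (len(b) - 1) // 2
    A = transfer_matrix(J, h)
    eigvals, eigvecs = np.linalg.eigh(A)   # eigenvalues in ascending order
    lam, u = eigvals[-1], eigvecs[:, -1]   # sign of u cancels in the product
    return pattern_weight(A, b) * lam ** (-2 * m) * u[idx(b[0])] * u[idx(b[-1])]

J, h, n = 0.8, 0.3, 40
b = (1, -1, -1)                            # (b_{-1}, b_0, b_1), so m = 1
limit = limit_prob(b, J, h)
finite = {(tl, tr): finite_volume_prob(b, n, tl, tr, J, h)
          for tl in SPINS for tr in SPINS}
print(limit, finite)
```

The four finite-volume values should agree with the limit up to an error of order $(\mu/\lambda)^{n-m+1}$, which is the rate suggested by the decomposition $(A/\lambda)^k = uu^T + (\mu/\lambda)^k vv^T$.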

In fact, we have proved a little bit more than absence of phase transition in the one-dimensional model.

Exercise 10.6. Define the vector $\rho(x) = u(x)^2$ and the matrix
\[
P(x, y) = \frac{A(x, y)\, u(y)}{\lambda\, u(x)}, \qquad x, y \in \{-1, +1\}.
\]
Check that $P$ is a transition matrix and that $\rho P = \rho$; i.e. $\rho$ is an invariant measure for $P$. Check also that $\gamma$ is precisely the law of the Markov chain with marginal $\rho$ and transition matrix $P$. In other words, this Markov chain is the unique Gibbs measure for the one-dimensional Ising model.
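Before attempting the exercise it can help to see the claims hold numerically for one choice of parameters. The following sketch is only a sanity check under assumed values $J = 0.8$, $h = 0.3$ (with $\beta$ absorbed), not a proof: it builds $\rho$ and $P$ from the top eigenpair of the transfer matrix and compares the Markov chain probability of the path $(+1, -1, -1)$ with the formula $a_0\lambda^{-2m}u(b_{-m})u(b_m)$ for $m = 1$.

```python
import numpy as np

J, h = 0.8, 0.3                        # illustrative values, beta absorbed
spins = (-1, 1)                        # index 0 is spin -1, index 1 is spin +1
A = np.array([[np.exp(J * x * y + h * (x + y) / 2.0) for y in spins]
              for x in spins])

eigvals, eigvecs = np.linalg.eigh(A)   # ascending eigenvalues, orthonormal vectors
lam = eigvals[-1]
u = np.abs(eigvecs[:, -1])             # entries of the top eigenvector share a sign

rho = u ** 2                                          # rho(x) = u(x)^2
P = A * u[np.newaxis, :] / (lam * u[:, np.newaxis])   # P(x,y) = A(x,y)u(y)/(lam u(x))

row_sums = P.sum(axis=1)               # should be (1, 1)
rho_P = rho @ P                        # should reproduce rho

# Path (b_{-1}, b_0, b_1) = (+1, -1, -1): chain law versus the limit formula.
chain_prob = rho[1] * P[1, 0] * P[0, 0]
formula_prob = A[1, 0] * A[0, 0] * lam ** (-2) * u[1] * u[0]
print(row_sums, rho_P, rho, chain_prob, formula_prob)
```

The agreement of `chain_prob` and `formula_prob` reflects the telescoping of the factors $u(b_{i+1})/u(b_i)$ along the path.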

Remark 10.7. One can think of the Ising model as a Markov field; i.e. a stochastic process $(X_i)_{i\in\mathbb{Z}^d}$ where the distribution of $X_i$, given $(X_j)_{j\ne i}$, is determined by the values of $X_j$ for $|j - i| = 1$. Specifications then play the role the transition matrix plays for Markov chains. Then, when $d = 1$, the situation is similar to the Markov chain case and the stochastic process is completely determined by the specification. However, this is no longer true for $d \ge 2$.

10.2. Phase transition at low temperature

It is the absence of phase transition in the one-dimensional case that led to the dismissal of the Ising model as a model of ferromagnetism until 1936 when Peierls asked whether it is possible at low temperature, starting with + spins everywhere, to fluctuate to a state where most of the spins are −. For this to happen, Peierls argued, droplets of − spin must “freeze” to make the − state. Since spins align inside the droplet the energy of a droplet is proportional to its circumference. There are at most $k3^{k-1}$ droplets of circumference $k$ surrounding the origin. This means that the likelihood of trapping the origin in a droplet of length more than $k$ is at most $C \sum_{m\ge k} m 3^{m-1} e^{-cm\beta}$. If $\beta$ is large enough this decays exponentially fast with $k$. Therefore, droplets of − spins surrounding the origin are small, which means the boundary does influence the origin. Peierls concluded that the answer to his question is in the negative and a phase transition must occur in the two-dimensional Ising model for $\beta$ large enough. We make this rigorous in the proof of the next theorem.

Theorem 10.8. Let $d = 2$ and set $h = 0$. Then for all large enough $\beta$ the following holds: every weak limit point of the sequence $\{\mu^+_{\Lambda,\beta,0,J} : \Lambda \subset \mathbb{Z}^d \text{ finite}\}$ is distinct from every limit point of $\{\mu^-_{\Lambda,\beta,0,J} : \Lambda \subset \mathbb{Z}^d \text{ finite}\}$. Consequently there exists more than one Gibbs measure and phase transition happens at $h = 0$ and large $\beta$.

This theorem is a first step towards proving part (b) of Theorem 10.2. The rigorous proof is due to Griffiths [23] and Dobrushin [11].

Proof. Let us temporarily drop $\beta > 0$, $h = 0$, and $J > 0$ from the notation and write $\mu^\pm_\Lambda = \mu^\pm_{\Lambda,\beta,0,J}$. We expect the + boundary condition to make it more likely for the spin at 0 to be +. In other words, we would like to prove the following.

Claim 10.9. For some $\delta > 0$, $\mu^+_\Lambda\{\sigma_0 = -1\} \le \tfrac12 - \delta$ for all finite $\Lambda \subset \mathbb{Z}^2$.

Let us assume this claim for the moment and finish the proof of the theorem. By the spin-flip symmetry (Exercise 10.1)
\[
\mu^-_\Lambda\{\sigma_0 = -1\} = 1 - \mu^-_\Lambda\{\sigma_0 = 1\} = 1 - \mu^+_\Lambda\{\sigma_0 = -1\} \ge \tfrac12 + \delta
\]
for all finite $\Lambda \subset \mathbb{Z}^2$. Let $\mu^+$ be an arbitrary limit point of $\{\mu^+_\Lambda\}$. Pick a sequence $\Lambda_n \nearrow \mathbb{Z}^2$ such that $\mu^+_{\Lambda_n} \to \mu^+$ weakly. Then since $\{\sigma_0 = -1\}$ is both open and closed
\[
\mu^+\{\sigma_0 = -1\} = \lim_{n\to\infty} \mu^+_{\Lambda_n}\{\sigma_0 = -1\} \le \tfrac12 - \delta.
\]
By the same reasoning an arbitrary limit point $\mu^-$ of $\{\mu^-_\Lambda\}$ satisfies $\mu^-\{\sigma_0 = -1\} \ge \tfrac12 + \delta$. This implies that $\mu^- \ne \mu^+$ and the theorem is proved. It remains to prove Claim 10.9.

Proof of Claim 10.9. Given a configuration $\sigma_\Lambda \in \Omega_\Lambda$ with + boundary condition, separate each nearest-neighbor $\{+, -\}$-spin pair with a horizontal or vertical unit line segment drawn halfway between the sites, as in Figure 10.1. (Precisely speaking, these line segments are edges between nearest-neighbor points of the lattice $(1/2, 1/2) + \mathbb{Z}^2$. But we do not need a formal discussion.) Due to the + boundary condition these line segments must form closed circuits (or loops, or contours). If $\sigma_0 = -1$, there is a unique smallest such circuit that surrounds the origin and forms the boundary of the minus cluster of the origin.


Figure 10.1. Peierls' contours. The ellipse marks the minus spin at the origin and the thick line shows the contour surrounding it.

For an arbitrary closed circuit $\gamma$ surrounding the origin, let $\Omega_{\Lambda,\gamma}$ denote the event that $\sigma_0 = -1$ and $\gamma$ is the boundary of the minus cluster of the origin. Let $\Gamma_k$ be the set of closed circuits of length $k$ surrounding the origin. Then
\[
\mu^+_\Lambda\{\sigma_0 = -1\} \le \sum_{k=1}^{\infty} \sum_{\gamma\in\Gamma_k} \mu^+_\Lambda(\Omega_{\Lambda,\gamma}).
\]

Lemma 10.10. Let the setting be as above. We have $|\Gamma_k| \le k3^{k-1}$ for all $k \ge 1$. If $\gamma \in \Gamma_k$ then $\mu^+_\Lambda(\Omega_{\Lambda,\gamma}) \le e^{-2k\beta J}$.

Proof. The first claim is clear. Indeed, since 0 is trapped inside $\gamma$ and $\gamma$ is a closed loop of length $k$, the contour must contain one of the vertical line segments $[(i + 1/2, -1/2), (i + 1/2, 1/2)]$ for some $0 \le i < k$. Start by choosing one of these $k$ edges. Proceeding from (say) the top endpoint of this edge, build the contour by choosing the remaining edges in order. At each of the remaining $k - 1$ steps there are at most 3 available edges.

Suppose now that under configuration $\sigma_\Lambda$, $\gamma \in \Gamma_k$ is the contour surrounding the − cluster at 0. Let us flip all the spins inside $\gamma$ to get the new configuration (see Figure 10.2)
\[
\bar\sigma_j =
\begin{cases}
-\sigma_j & \text{if $j$ is inside $\gamma$,} \\
\sigma_j & \text{if $j$ is outside $\gamma$.}
\end{cases}
\tag{10.2}
\]
This mapping aligns every nearest-neighbor pair of spins separated by $\gamma$, but the alignment or disagreement of a spin pair entirely inside or outside $\gamma$ is not affected. Consequently
\[
J\bar\sigma_i\bar\sigma_j =
\begin{cases}
J\sigma_i\sigma_j & \text{if $\gamma$ does not separate $i, j$,} \\
J\sigma_i\sigma_j + 2J & \text{if $\gamma$ separates $i, j$.}
\end{cases}
\]


Figure 10.2. The configuration resulting from applying (10.2) to Figure 10.1.

The same formula applies also to pairs $\{i, j\}$ where one site lies outside $\Lambda$ and so its spin is part of the + boundary condition. For the total change of the Hamiltonian we get
\[
\begin{aligned}
H^+_\Lambda(\bar\sigma_\Lambda) &= -J \sum_{\substack{i,j\in\Lambda \\ \|i-j\|_1 = 1}} \bar\sigma_i\bar\sigma_j \;-\; J \sum_{\substack{i\in\Lambda,\ j\in\Lambda^c \\ \|i-j\|_1 = 1}} \bar\sigma_i \\
&= -J \sum_{\substack{i,j\in\Lambda \\ \|i-j\|_1 = 1}} \sigma_i\sigma_j \;-\; J \sum_{\substack{i\in\Lambda,\ j\in\Lambda^c \\ \|i-j\|_1 = 1}} \sigma_i \;-\; 2kJ \\
&= H^+_\Lambda(\sigma_\Lambda) - 2kJ.
\end{aligned}
\]
Thus $2kJ$ is seen to be the energy cost of the circuit. We have
\[
\mu^+_\Lambda(\Omega_{\Lambda,\gamma}) = \frac{\sum_{\sigma_\Lambda\in\Omega_{\Lambda,\gamma}} e^{-\beta H^+_\Lambda(\sigma_\Lambda)}}{\sum_{\sigma_\Lambda\in\Omega_\Lambda} e^{-\beta H^+_\Lambda(\sigma_\Lambda)}} = e^{-2k\beta J}\, \frac{\sum_{\sigma_\Lambda\in\Omega_{\Lambda,\gamma}} e^{-\beta H^+_\Lambda(\bar\sigma_\Lambda)}}{\sum_{\sigma_\Lambda\in\Omega_\Lambda} e^{-\beta H^+_\Lambda(\sigma_\Lambda)}}.
\]
For a given $\gamma$ the mapping (10.2) is one-to-one. Hence the sum in the numerator of the right-most fraction contains a subset of the terms in the denominator. Thus, the ratio is less than one and the lemma is proved. □
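The energy identity $H^+_\Lambda(\bar\sigma_\Lambda) = H^+_\Lambda(\sigma_\Lambda) - 2kJ$ can be checked directly on a small example. The sketch below is illustrative only: the $5\times 5$ box, the two-site minus cluster at the origin, and the helper name `hamiltonian_plus` are all assumptions made here. The cluster has no holes, so the sites inside its contour are exactly the cluster sites, and the contour length $k$ equals the number of nearest-neighbor pairs joining the cluster to its complement.

```python
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def hamiltonian_plus(sigma, J):
    """H^+_Lambda(sigma) with h = 0. `sigma` maps the sites of the box to +1/-1
    and every site outside the box carries the boundary spin +1."""
    energy = 0.0
    for (x, y), s in sigma.items():
        for dx, dy in STEPS:
            nb = (x + dx, y + dy)
            if nb not in sigma:
                energy -= J * s                  # pair with a + boundary site
            elif (dx, dy) in ((1, 0), (0, 1)):
                energy -= J * s * sigma[nb]      # each inside pair counted once
    return energy

box = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]
cluster = {(0, 0), (1, 0)}                       # minus cluster of the origin

sigma = {site: 1 for site in box}
for site in cluster:
    sigma[site] = -1
sigma[(-2, 2)] = -1                              # a minus spin outside the contour

# Contour length: unit edges separating the cluster from its complement.
k = sum(1 for (x, y) in cluster for dx, dy in STEPS
        if (x + dx, y + dy) not in cluster)

# The map (10.2): flip the spins inside the contour, leave the others alone.
flipped = dict(sigma)
for site in cluster:
    flipped[site] = -flipped[site]

J = 1.0
energy_before = hamiltonian_plus(sigma, J)
energy_after = hamiltonian_plus(flipped, J)
print(k, energy_before, energy_after)
```

Here the domino-shaped cluster has a contour of length $k = 6$, so the flip should lower the energy by $12J$, while the isolated minus spin in the corner is untouched.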

To complete the proof of Claim 10.9, and thus of the theorem, write
\[
\mu^+_\Lambda\{\sigma_0 = -1\} \le \sum_{k=1}^{\infty} k3^{k-1} e^{-2k\beta J} \le \sum_{k=1}^{\infty} e^{-2(\beta J - \log 3)k} \le \tfrac12 - \delta,
\]
for $\beta$ large enough. □
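To get a feeling for how large $\beta$ must be for this bound to be useful, one can evaluate the first sum in closed form. Writing $x = 3e^{-2\beta J}$, the geometric-series identity $\sum_{k\ge1} k x^k = x/(1-x)^2$ (valid for $0 \le x < 1$) gives $\sum_{k\ge1} k3^{k-1}e^{-2k\beta J} = x/(3(1-x)^2)$. The sketch below uses this identity; the function name `peierls_bound` and the sample values of $\beta$ are illustrative choices, not part of the text.

```python
import math

def peierls_bound(beta, J):
    """Closed form of sum_{k>=1} k * 3**(k-1) * exp(-2*k*beta*J).

    With x = 3*exp(-2*beta*J) the series equals x / (3*(1-x)**2) when x < 1
    and diverges otherwise.
    """
    x = 3.0 * math.exp(-2.0 * beta * J)
    if x >= 1.0:
        return math.inf
    return x / (3.0 * (1.0 - x) ** 2)

def peierls_partial_sum(beta, J, terms=200):
    """Direct partial sum, written in terms of x to avoid large intermediate powers."""
    x = 3.0 * math.exp(-2.0 * beta * J)
    return sum(k * x ** k / 3.0 for k in range(1, terms + 1))

for beta in (0.5, 0.9, 1.0, 1.5):
    print(beta, peierls_bound(beta, 1.0))
```

The bound drops below $1/2$ only once $\beta J$ is fairly large; it is a sufficient condition for Claim 10.9 and is not expected to locate the true critical value.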

Note that this argument fails in one dimension because the cost to flip the spins in any interval is $e^{-4\beta J}$, while there are $n^2$ possible intervals surrounding 0 in $(-n, n)$.



10.3. Uniqueness of phase at high temperature

When $\beta$ is small the Gibbs measure should be “close” to $\lambda$ and one should not have phase transition.

Theorem 10.11. Fix $d \in \mathbb{N}$, $J > 0$, and $h = 0$. Then, if $\beta > 0$ is small enough, $\mathcal{G}$ is a singleton.

This is the second step in the proof of part (b) of Theorem 10.2. Technically, this is shown via a fixed point theorem. We will need some preliminaries before we present the proof. We will omit $\beta$, $J > 0$, and $h = 0$ from the indices in what follows.

Fix µ ∈ G . Let us start by defining the so-called correlati<strong>on</strong> functi<strong>on</strong> for<br />

µ as<br />

for Λ ⊂ Z d finite.<br />

ρ(Λ) = µ{σi = 1 ∀i ∈ Λ}<br />

∗Exercise 10.12. Show that ρ determines µ.

Hint: For finite, disjoint $A, B \subset \mathbb{Z}^d$ expand the product in
\[
\mathbf{1}\{\sigma = 1 \text{ on } A \text{ and } \sigma = -1 \text{ on } B\}
= \prod_{i\in B} \Bigl( \mathbf{1}\{\sigma = 1 \text{ on } A\} - \mathbf{1}\{\sigma = 1 \text{ on } A \cup \{i\}\} \Bigr).
\]

∗Exercise 10.13. Fix $\Lambda \subset \mathbb{Z}^d$ finite and $i \in \Lambda$. Let $N_i = \{j \in \mathbb{Z}^d : |i - j|_1 = 1\}$. Let $B \subset N_i \setminus \Lambda$. Use inclusion-exclusion to prove that
\[
\mu\bigl\{\sigma_k = 1 \ \forall k \in (\Lambda \setminus \{i\}) \cup B,\ \ \sigma_j = -1 \ \forall j \in N_i \setminus (B \cup \Lambda)\bigr\}
= \sum_{A:\, B \subset A \subset N_i \setminus \Lambda} \rho\bigl((\Lambda \setminus \{i\}) \cup A\bigr)\,(-1)^{|B|}(-1)^{|A|}.
\]
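The inclusion-exclusion step behind Exercises 10.12 and 10.13 can be checked by brute force on a small example. The sketch below is an illustration only: it uses four sites and a randomly generated probability measure rather than a Gibbs measure, and all function names are hypothetical. It recovers the probability of a mixed +/− pattern from the correlation function ρ alone.

```python
import itertools
import random


def subsets(s):
    """Yield all subsets of s as frozensets."""
    s = list(s)
    for r in range(len(s) + 1):
        for c in itertools.combinations(s, r):
            yield frozenset(c)


def random_measure(sites, rng):
    """An arbitrary probability measure on {-1, 1}^sites (not a Gibbs measure)."""
    configs = list(itertools.product((-1, 1), repeat=len(sites)))
    weights = [rng.random() for _ in configs]
    total = sum(weights)
    return {cfg: w / total for cfg, w in zip(configs, weights)}


def prob(mu, sites, plus, minus):
    """mu{sigma = 1 on plus and sigma = -1 on minus}, computed directly."""
    idx = {s: n for n, s in enumerate(sites)}
    return sum(p for cfg, p in mu.items()
               if all(cfg[idx[s]] == 1 for s in plus)
               and all(cfg[idx[s]] == -1 for s in minus))


def rho(mu, sites, lam):
    """Correlation function rho(Lambda) = mu{sigma_i = 1 for all i in Lambda}."""
    return prob(mu, sites, lam, ())


def via_rho(mu, sites, plus, minus):
    """Same probability, using only rho and inclusion-exclusion over subsets of minus."""
    plus = frozenset(plus)
    return sum((-1) ** len(s) * rho(mu, sites, plus | s) for s in subsets(minus))


sites = (0, 1, 2, 3)
mu = random_measure(sites, random.Random(0))
direct = prob(mu, sites, {0, 1}, {2, 3})
recovered = via_rho(mu, sites, {0, 1}, {2, 3})
print(direct, recovered)
```

Writing $A = B \cup S$ with $S$ ranging over subsets of the "minus" sites turns the sign $(-1)^{|S|}$ into $(-1)^{|A|}(-1)^{|B|}$, which is the form used in Exercise 10.13.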

Next, for $i \in \mathbb{Z}^d$ and a collection of its nearest neighbors $B \subset N_i = \{j \in \mathbb{Z}^d : |j - i|_1 = 1\}$, let $\mu_i^B$ be the Gibbs measure on the volume $\{i\}$ with boundary condition $\tau_j = 1$ if $j \in B$ and $\tau_j = -1$ otherwise. Then for Λ and V, two finite subsets of $\mathbb{Z}^d$, and for $i \in \Lambda$, define the Kirkwood-Salzburg kernel, on the space of finite subsets of $\mathbb{Z}^d$, as
\[
K_i(\Lambda, V) = (-1)^{|A|} \sum_{B \subset A} (-1)^{|B|}\, \mu_i^{(\Lambda \setminus \{i\}) \cup B}\{\sigma_i = 1\},
\]
if $V = (\Lambda \setminus \{i\}) \cup A$ for some $A \subset N_i \setminus \Lambda$. Otherwise we set $K_i(\Lambda, V) = 0$.

The operators $K_i$ are then contractions.


120 10. Phase tr<strong>an</strong>siti<strong>on</strong> in the Ising model<br />

Lemma 10.14. If β > 0 is small enough, then for any finite $\Lambda \subset \mathbb{Z}^d$ and any $i \in \Lambda$
\[
\sum_{V \subset \mathbb{Z}^d \text{ finite}} |K_i(\Lambda, V)| \le \frac12 .
\]

Proof. Choose β > 0 small enough so that
\[
\sup_{k,\ell \in \{0, \pm 2, \dots, \pm 2d\}} \left| \frac{e^{-\beta J k}}{e^{-\beta J k} + e^{\beta J k}} - \frac{e^{-\beta J \ell}}{e^{-\beta J \ell} + e^{\beta J \ell}} \right| < \frac{1}{2^{4d}} .
\]
Now write
\[
\sum_{V \subset \mathbb{Z}^d \text{ finite}} |K_i(\Lambda, V)| = \sum_{A \subset N_i \setminus \Lambda} \Bigl| \sum_{B \subset A} (-1)^{|B|}\, \mu_i^{(\Lambda \setminus \{i\}) \cup B}\{\sigma_i = 1\} \Bigr| .
\]
In the sum over B inside the absolute value, group terms with |B| even with terms with |B| odd. There is a one-to-one correspondence between such terms and, by the choice of β, all the terms $\mu_i^{(\Lambda \setminus \{i\}) \cup B}\{\sigma_i = 1\}$ are within $2^{-4d}$ of each other. Hence, the sum over B inside the absolute value is bounded by $2^{|A|-1} \cdot 2^{-4d}$. Since $|A| \le 2d$, this is bounded by $2^{-(2d+1)}$. But since $|N_i| \le 2d$, there are at most $2^{2d}$ possible sets A, and hence the whole thing is bounded by 1/2.
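The smallness condition on β in this proof can be made explicit. The following sketch is an illustration, not part of the text's argument, and its function names are hypothetical. It evaluates the supremum over $k, \ell \in \{0, \pm 2, \dots, \pm 2d\}$ numerically and compares it with the closed form $\tanh(2\beta J d)$, which follows because $k \mapsto e^{-\beta J k}/(e^{-\beta J k} + e^{\beta J k})$ is decreasing in k.

```python
import math


def single_site_term(beta, J, k):
    """e^{-beta J k} / (e^{-beta J k} + e^{beta J k}), rewritten to avoid overflow."""
    return 1.0 / (1.0 + math.exp(2.0 * beta * J * k))


def spread(beta, J, d):
    """sup over k, l in {0, +-2, ..., +-2d} of the absolute difference of the terms."""
    values = [single_site_term(beta, J, k) for k in range(-2 * d, 2 * d + 1, 2)]
    return max(values) - min(values)


def beta_is_small_enough(beta, J, d):
    """The condition chosen at the start of the proof: spread < 2^{-4d}."""
    return spread(beta, J, d) < 2.0 ** (-4 * d)


def beta_threshold(J, d):
    """Largest beta allowed by tanh(2 beta J d) < 2^{-4d} (closed form of the sup)."""
    return math.atanh(2.0 ** (-4 * d)) / (2.0 * J * d)


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, beta_threshold(J=1.0, d=d))
```

The threshold shrinks quickly with the dimension, which reflects how crude the counting in the proof is rather than any sharp property of the model.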

Next, one observes that the $K_i$’s fix the function ρ.

Kirkwood-Salzburg equation. Let ρ be the correlation function corresponding to a Gibbs measure µ. For any $\Lambda \subset \mathbb{Z}^d$ finite and any $i \in \Lambda$,
\[
\rho(\Lambda) = \sum_{V \subset \mathbb{Z}^d \text{ finite}} \rho(V)\, K_i(\Lambda, V).
\]

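Proof. Compute
\begin{align*}
\rho(\Lambda)
&= \sum_{B \subset N_i \setminus \Lambda} \mu\bigl\{\sigma_k = 1 \ \forall k \in B \cup \Lambda,\ \ \sigma_j = -1 \ \forall j \in N_i \setminus (\Lambda \cup B)\bigr\} \\
&= \sum_{B \subset N_i \setminus \Lambda} \mu\bigl\{\sigma \equiv 1 \text{ on } (\Lambda \setminus \{i\}) \cup B,\ \ \sigma \equiv -1 \text{ on } N_i \setminus (B \cup \Lambda)\bigr\} \times \mu_i^{(\Lambda \setminus \{i\}) \cup B}\{\sigma_i = 1\} \\
&= \sum_{B \subset N_i \setminus \Lambda} \mu_i^{(\Lambda \setminus \{i\}) \cup B}\{\sigma_i = 1\} \sum_{A:\, B \subset A \subset N_i \setminus \Lambda} \rho\bigl((\Lambda \setminus \{i\}) \cup A\bigr)\,(-1)^{|B|}(-1)^{|A|} \\
&= \sum_{A \subset N_i \setminus \Lambda} \rho\bigl((\Lambda \setminus \{i\}) \cup A\bigr)\,(-1)^{|A|} \sum_{B \subset A} (-1)^{|B|}\, \mu_i^{(\Lambda \setminus \{i\}) \cup B}\{\sigma_i = 1\} \\
&= \sum_{V \subset \mathbb{Z}^d \text{ finite}} \rho(V)\, K_i(\Lambda, V).
\end{align*}
In the third equality we used Exercise 10.13.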



As mentioned earlier, a fixed point theorem now finishes the proof of Theorem 10.11.

Proof of Theorem 10.11. Let $\mu, \tilde\mu$ be in $\mathcal{G}$ and denote by ρ and $\tilde\rho$ the corresponding correlation functions. Fix a finite $\Lambda \subset \mathbb{Z}^d$ and $i \in \Lambda$. Then, by the Kirkwood-Salzburg equation and Lemma 10.14,
\[
|\rho(\Lambda) - \tilde\rho(\Lambda)| = \Bigl| \sum_{V \subset \mathbb{Z}^d \text{ finite}} K_i(\Lambda, V)\,[\rho(V) - \tilde\rho(V)] \Bigr|
\le \frac12 \sup_{V \subset \mathbb{Z}^d \text{ finite}} |\rho(V) - \tilde\rho(V)| .
\]
Taking sup over Λ shows that $s = \sup_V |\rho(V) - \tilde\rho(V)|$, which is at most 1, satisfies $s \le s/2$. Hence $s = 0$, that is, $\rho \equiv \tilde\rho$, which by Exercise 10.12 implies that $\mu = \tilde\mu$.

Exercise 10.15. Check that there is nothing special about h = 0 and that one can actually prove that there is a unique Gibbs measure for any fixed h and small β > 0.

10.4. Case of no external field<br />

To complete Theorems 10.8 and 10.11 into a proof of part (b) for the case h = 0 we make the following two informal observations.

First, increasing the temperature increases the effect of noise. Hence, if at a given temperature there was no phase transition, the same should hold for higher values of the temperature. In other words, if there is no phase transition at a value $\beta = \beta_0$, then there should not be a phase transition at any lower value $\beta \le \beta_0$.

Second, increasing dimension increases the size of the boundary relative to the volume, making it easier to have distinct Gibbs states. Thus, one might expect that if there is phase transition in dimension d, then there will also be phase transition in higher dimensions.

Before making this precise we develop some preliminary definitions and results.

Let us put a partial order on elements of Ω. We say that $\omega \le \sigma$ if $\omega_i \le \sigma_i$ for all $i \in \mathbb{Z}^d$. A function $f : \Omega \to \mathbb{R}$ is increasing if $f(\omega) \le f(\sigma)$ whenever $\omega \le \sigma$. We can then define a partial order on $\mathcal{M}_1(\Omega)$ by saying that $\mu \le \nu$ (ν stochastically dominates µ) if $E^\mu[f] \le E^\nu[f]$ for all increasing functions $f \in C_b(\Omega)$.

Exercise 10.16. Check that this relation does define a partial order on elements of $\mathcal{M}_1(\Omega)$; i.e. prove that if $\mu \le \nu$ and $\nu \le \mu$, then $\mu = \nu$.

Hint: Observe that $\mathbf{1}\{\sigma : \sigma_i = 1 \ \forall i \in A\}$ is increasing.

We will need the following interesting fact. We defer its proof to Appendix A.3.



Strassen’s lemma. Recall that $\Omega = \{-1, 1\}^{\mathbb{Z}^d}$. Let $\mu, \nu \in \mathcal{M}_1(\Omega)$ be such that $\mu \le \nu$. Assume that $\mu\{\sigma_i = 1\} = \nu\{\sigma_i = 1\}$ for all $i \in \mathbb{Z}^d$. Then $\mu = \nu$.

The next theorem expresses the intuition that flipping some − spins to + spins at the boundary is favorable for + spins inside the volume.

Theorem 10.17. Fix $d \in \mathbb{N}$, β > 0, J > 0, and $h \in \mathbb{R}$. Also fix a finite $\Lambda \subset \mathbb{Z}^d$. If $\omega \le \sigma$, then $\mu^\omega_{\Lambda,\beta,h,J} \le \mu^\sigma_{\Lambda,\beta,h,J}$.

Proof. The proof uses Holley’s inequality, the proof of which is given in Appendix C.1. Define pointwise minima and maxima by $(\eta \wedge \xi)_i = \eta_i \wedge \xi_i$ and $(\eta \vee \xi)_i = \eta_i \vee \xi_i$.

Holley’s inequality. Let $\Omega_\Lambda = \{-1, 1\}^\Lambda$ for some finite $\Lambda \subset \mathbb{Z}^d$. Let µ and ν be strictly positive probability measures on $\Omega_\Lambda$. If
\[
\mu(\eta \wedge \xi)\,\nu(\eta \vee \xi) \ge \mu(\eta)\,\nu(\xi)
\]
for all η and ξ in $\Omega_\Lambda$, then $\mu \le \nu$.

Let us drop the indexes β, h, and J. According to the above theorem, it is enough to prove that
\[
\mu^\omega_\Lambda(\eta \wedge \xi)\,\mu^\sigma_\Lambda(\eta \vee \xi) \ge \mu^\omega_\Lambda(\eta)\,\mu^\sigma_\Lambda(\xi).
\]
This translates into
\[
H^\omega_\Lambda(\eta \wedge \xi) + H^\sigma_\Lambda(\eta \vee \xi) \le H^\omega_\Lambda(\eta) + H^\sigma_\Lambda(\xi).
\]
It is enough to show that the above inequality holds term by term, in the sums defining the Hamiltonians. That is
\begin{align*}
(\eta_i \wedge \xi_i)(\eta_j \wedge \xi_j) + (\eta_i \vee \xi_i)(\eta_j \vee \xi_j) &\ge \eta_i \eta_j + \xi_i \xi_j, && \forall i, j \in \Lambda, \\
(\eta_i \wedge \xi_i)\,\omega_j + (\eta_i \vee \xi_i)\,\sigma_j &\ge \eta_i \omega_j + \xi_i \sigma_j, && \forall i \in \Lambda,\ \forall j \notin \Lambda, \\
h(\eta_i \wedge \xi_i + \eta_i \vee \xi_i) &\le h(\eta_i + \xi_i), && \forall i \in \Lambda.
\end{align*}
The third line is obviously true. In fact there is equality there. Next, observe that if $\eta_i = -1$ or $\xi_i = 1$, then $\eta_i \wedge \xi_i = \eta_i$ and $\eta_i \vee \xi_i = \xi_i$. Thus, in this case there is equality in the second line. On the other hand, when $\eta_i = 1$ and $\xi_i = -1$ the second line reads $\sigma_j - \omega_j \ge \omega_j - \sigma_j$, which is true since $\omega \le \sigma$.

As to the first line, it is again an equality unless $\eta_i = \xi_j = 1$ and $\xi_i = \eta_j = -1$, or $\eta_j = \xi_i = 1$ and $\xi_j = \eta_i = -1$, in which case it reads $2 \ge -2$.
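Since each spin takes only the values ±1, the three term-by-term inequalities in this proof can be confirmed by exhaustive enumeration. The sketch below is an illustrative check with hypothetical function names. It also shows that the boundary inequality can fail once the assumption $\omega_j \le \sigma_j$ is dropped.

```python
from itertools import product

SPINS = (-1, 1)


def interaction_term_ok(eta_i, xi_i, eta_j, xi_j):
    """First line: pair interaction between two sites i, j inside the volume."""
    lhs = min(eta_i, xi_i) * min(eta_j, xi_j) + max(eta_i, xi_i) * max(eta_j, xi_j)
    return lhs >= eta_i * eta_j + xi_i * xi_j


def boundary_term_ok(eta_i, xi_i, omega_j, sigma_j):
    """Second line: interaction of a site i inside with a boundary site j outside."""
    lhs = min(eta_i, xi_i) * omega_j + max(eta_i, xi_i) * sigma_j
    return lhs >= eta_i * omega_j + xi_i * sigma_j


def field_term_is_equality(eta_i, xi_i):
    """Third line: min + max always equals the sum, so the field term is an equality."""
    return min(eta_i, xi_i) + max(eta_i, xi_i) == eta_i + xi_i


all_interaction = all(interaction_term_ok(*v) for v in product(SPINS, repeat=4))
all_boundary = all(
    boundary_term_ok(e, x, w, s)
    for e, x, w, s in product(SPINS, repeat=4)
    if w <= s  # the ordering omega <= sigma assumed in Theorem 10.17
)
all_field = all(field_term_is_equality(*v) for v in product(SPINS, repeat=2))

# Without omega_j <= sigma_j the boundary inequality can fail:
counterexample_fails = not boundary_term_ok(1, -1, 1, -1)

print(all_interaction, all_boundary, all_field, counterexample_fails)
```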

As a consequence of the above one sees that − and + boundary conditions are special.

Theorem 10.18. Fix β > 0, J > 0, $h \in \mathbb{R}$, and $d \in \mathbb{N}$.

(a) $\mu^-_{\Lambda,\beta,h,J} \le \mu^\omega_{\Lambda,\beta,h,J} \le \mu^+_{\Lambda,\beta,h,J}$ for all $\Lambda \subset \mathbb{Z}^d$.

(b) $\mu^+_{\Lambda,\beta,h,J} \le \mu^+_{\Delta,\beta,h,J}$ and $\mu^-_{\Lambda,\beta,h,J} \ge \mu^-_{\Delta,\beta,h,J}$ for all $\Delta \subset \Lambda \subset \mathbb{Z}^d$ finite.

(c) As Λ increases to $\mathbb{Z}^d$ measures $\mu^+_{\Lambda,\beta,h,J}$ (respectively, $\mu^-_{\Lambda,\beta,h,J}$) converge weakly to a limit $\mu^+_{\beta,h,J}$ (respectively, $\mu^-_{\beta,h,J}$).

(d) $\mu^+_{\beta,h,J}$ and $\mu^-_{\beta,h,J}$ are shift-invariant Gibbs measures.

(e) $\mu^-_{\beta,h,J} \le \mu \le \mu^+_{\beta,h,J}$ for all Gibbs measures µ.

Proof. Part (a) is an immediate consequence of the previous theorem. Part (b) follows from part (a) and the definition of a specification. Indeed, let $f \in C_b(\Omega)$ be increasing. Then, part (c) of Definition 8.7 implies
\[
E^{\mu^+_{\Lambda,\beta,h,J}}[f] = E^{\mu^+_{\Lambda,\beta,h,J}}\bigl[ E^{\mu^\omega_{\Delta,\beta,h,J}}[f] \bigr] \le E^{\mu^+_{\Delta,\beta,h,J}}[f].
\]

Part (c) follows from part (b). To see this observe first that for any finite volume $V \subset \mathbb{Z}^d$ the sequence $\mu^+_{\Lambda,\beta,h,J}\{\sigma_i = 1, \forall i \in V\}$ is decreasing as Λ increases. Hence, this sequence converges to some value ρ(V). Inclusion-exclusion then shows that for any cylinder set $A \in \mathcal{F}_V$, $\mu^+_{\Lambda,\beta,h,J}(A)$ converges to some $\nu_V(A)$, determined by $\{\rho(\Delta) : \Delta \subset V\}$. Measures $\nu_V \in \mathcal{M}_1(\Omega_V)$ are consistent and Kolmogorov’s extension theorem (page 474 of [15] or page 60 of [26]) produces a unique probability measure $\mu^+_{\beta,h,J}$ that has these marginals. This is the weak limit of the measures $\mu^+_{\Lambda,\beta,h,J}$ on the full space Ω because weak convergence follows from convergence of marginals (Exercise 7.1). The same argument works for $\mu^-_{\beta,h,J}$.

These limits are Gibbs measures due to Theorem 8.15 (taking $\nu_n = \delta_{\omega \equiv 1}$, then $\nu_n = \delta_{\omega \equiv -1}$). The shift invariance follows from applying Exercise 8.3 to obtain the inequalities
\[
\mu^+_{V_{n-|i|},\beta,h,J} \ge \mu^+_{V_n,\beta,h,J} \circ \theta_i = \mu^+_{V_n + i,\beta,h,J} \ge \mu^+_{V_{n+|i|},\beta,h,J}
\]
and then taking a limit. Similar reasoning applies to $\mu^-_{\beta,h,J}$. Here, $V_n = \{j \in \mathbb{Z}^d : |j| < n\}$.

To prove part (e) let µ be a Gibbs measure. Let $f \in C_b(\Omega)$ be a bounded increasing function. Let $V_n$ be an increasing sequence of cubes exhausting $\mathbb{Z}^d$. Then, dominated convergence (see page 16 of [15] or page 46 of [26]) implies that
\begin{align*}
E^\mu[f] &= \lim_{n\to\infty} E^\mu\bigl[ E^\mu[f \mid \mathcal{F}_{V_n^c}] \bigr]
= \lim_{n\to\infty} E^\mu\bigl[ E^{\mu^\omega_{V_n,\beta,h,J}}[f] \bigr] \\
&\le \lim_{n\to\infty} E^\mu\bigl[ E^{\mu^+_{V_n,\beta,h,J}}[f] \bigr] = E^{\mu^+_{\beta,h,J}}[f].
\end{align*}
A similar argument gives the other inequality.

Proof of part (b) of Theorem 10.2 when h = 0. Observe that due to (e) in the above theorem, phase transition at (β, 0, J) is equivalent to $\mu^-_{\beta,0,J} \ne \mu^+_{\beta,0,J}$ which, by shift invariance and Strassen’s lemma, is equivalent to $\mu^-_{\beta,0,J}\{\sigma_0 = 1\} \ne \mu^+_{\beta,0,J}\{\sigma_0 = 1\}$. By (e), Exercise 10.1, and (c) this in turn is equivalent to having $\mu^+_{\beta,0,J}\{\sigma_0 = 1\} > 1/2$.

We invoke another useful inequality. The proof can be found in Appendix C.2.

Griffiths’ inequality. Fix a finite volume $\Lambda \subset \mathbb{Z}^d$. Let $E = \{\{i, j\} : |i - j| = 1,\ i \in \Lambda\}$ be the set of nearest-neighbor edges of Λ, including edges from Λ to its complement. Define the function $F : [0, \infty)^E \to [-1, 1]$ by
\[
F(J) = \frac{\displaystyle\sum_{\sigma_\Lambda} \sigma_0 \exp\Bigl\{ \sum_{\{i,j\} \in E} J_{i,j}\, \sigma_i \sigma_j \Bigr\}}{\displaystyle\sum_{\sigma_\Lambda} \exp\Bigl\{ \sum_{\{i,j\} \in E} J_{i,j}\, \sigma_i \sigma_j \Bigr\}},
\]
where $J = (J_{i,j})_{\{i,j\} \in E}$, the sums run over $\sigma_\Lambda \in \{-1, 1\}^\Lambda$, and for $j \notin \Lambda$ we set $\sigma_j = 1$ (i.e. the + boundary condition). Then,
\[
\frac{\partial F}{\partial J_{i,j}}(J) \ge 0 \qquad \forall \{i, j\} \in E \text{ and } \forall J \in [0, \infty)^E .
\]
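The monotonicity asserted by Griffiths’ inequality can be observed numerically on a tiny example. The sketch below evaluates F exactly as defined above, but only for the one-dimensional volume Λ = {−1, 0, 1}; this choice of volume, the finite-difference step, and the names `LAMBDA`, `EDGES`, `F` and `bump` are illustrative assumptions. It is a sanity check on one small volume, not a proof.

```python
import random
from itertools import product
from math import exp

LAMBDA = (-1, 0, 1)                           # finite volume in Z (d = 1)
EDGES = ((-2, -1), (-1, 0), (0, 1), (1, 2))   # nearest-neighbor edges touching LAMBDA


def F(J):
    """F(J) from Griffiths' inequality; J maps each edge to a coupling >= 0."""
    num = den = 0.0
    for spins in product((-1, 1), repeat=len(LAMBDA)):
        sigma = dict(zip(LAMBDA, spins))

        def s(site):
            # + boundary condition: sigma_j = 1 outside LAMBDA
            return sigma.get(site, 1)

        weight = exp(sum(J[e] * s(e[0]) * s(e[1]) for e in EDGES))
        num += sigma[0] * weight
        den += weight
    return num / den


def bump(J, edge, eps=1e-3):
    """Copy of J with the coupling on one edge increased by eps."""
    J2 = dict(J)
    J2[edge] += eps
    return J2


if __name__ == "__main__":
    rng = random.Random(0)
    J = {e: rng.uniform(0.0, 2.0) for e in EDGES}
    for e in EDGES:
        print(e, F(bump(J, e)) - F(J))   # each difference should be nonnegative
```

Taking all couplings equal to βJ recovers $E^{\mu^+_{\Lambda,\beta,0,J}}[\sigma_0]$, which is how the inequality is used in the exercise that follows.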

∗Exercise 10.19. Fix J > 0 and h = 0. Use Griffiths’ inequality to show that $E^{\mu^+_{\Lambda,\beta,0,J}}[\sigma_0]$ increases with β. Also, show that if $V_n^{(d)} = \{i \in \mathbb{Z}^d : |i| \le n\}$, then
\[
\int \sigma_0 \, d\mu^+_{V_n^{(d)},\beta,0,J} \le \int \sigma_0 \, d\mu^+_{V_n^{(d+1)},\beta,0,J} .
\]
Conclude that $\mu^+_{\beta,0,J}\{\sigma_0 = 1\}$ increases with β and with d.

We have shown, in Theorem 10.8, that for J > 0, d = 2, and β > 0 large, one has a phase transition and thus $\mu^+_{\beta,0,J}\{\sigma_0 = 1\} > 1/2$. By the above exercise, one has a phase transition in any d ≥ 2, provided β is large enough.

Now, fix J > 0 and d ≥ 2. Let $\beta_c = \sup\{\beta : |\mathcal{G}_{\Pi^{\beta,0,J}}| = 1\}$. By the previous argument, $\beta_c < \infty$. By Theorem 10.11, $\beta_c > 0$.

Clearly, there is phase transition at any $\beta > \beta_c$. If, on the other hand, there were a phase transition at $\beta < \beta_c$, then $\mu^+_{\beta,0,J}\{\sigma_0 = 1\} > 1/2$ and Griffiths’ inequality implies then that $\mu^+_{\beta',0,J}\{\sigma_0 = 1\} > 1/2$ for all $\beta' \ge \beta$. This would imply that $\beta_c \le \beta$ which contradicts the choice of β. Part (b) of Theorem 10.2 is thus proved, in the case h = 0.

10.5. Case of nonzero external field

In this section we finish the proof of part (b) of Theorem 10.2.

Fix J > 0 and β > 0. We will sometimes drop the dependence on these parameters from our notation in this section. Recall the infinite-volume pressure P(Φ) = P(h) from Theorem 9.2. In this section we first show that if P is differentiable at h, then there is no phase transition at (β, h, J). Then, we will show that P is differentiable for all h ≠ 0. Let us start with an exercise and a lemma.

We remind the reader again that subscripts $V_n$ are abbreviated as subscripts n; e.g. $H^+_n = H^+_{V_n}$.

∗Exercise 10.20. Recall the finite-volume pressure functions with + and − boundary conditions $P^\pm_n(h) = |V_n|^{-1} \log E^\lambda[e^{-\beta H^\pm_n}]$. Prove that
\[
\frac{\partial P^\pm_n}{\partial h} = \beta E^{\mu^\pm_{n,h}}\Bigl[ \frac{1}{|V_n|} \sum_{i \in V_n} \sigma_i \Bigr].
\]

Lemma 10.21. One has
\[
\lim_{n\to\infty} \frac{\partial P^\pm_n}{\partial h} = \beta E^{\mu^\pm_h}[\sigma_0].
\]
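The identity of Exercise 10.20 can be checked by finite differences on a very small volume. The sketch below works in d = 1 with $V_n = \{j : |j| < n\}$ and assumes the Hamiltonian has the form $H^\pm_n(\sigma) = -J \sum \sigma_i \sigma_j - h \sum_{i \in V_n} \sigma_i$, with the first sum over nearest-neighbor edges meeting $V_n$ and the spins outside $V_n$ set to the boundary value, and with λ uniform on $\{-1,1\}^{V_n}$. The Hamiltonian is defined outside this section, so this form, and every name in the code, should be read as an assumption.

```python
from itertools import product
from math import exp, log


def volume(n):
    """V_n = {j in Z : |j| < n}."""
    return list(range(-n + 1, n))


def energy(sigma, sites, J, h, boundary):
    """Assumed form: H = -J * (sum over n.n. edges meeting V_n) - h * (sum over V_n)."""
    spin = dict(zip(sites, sigma))

    def s(i):
        return spin.get(i, boundary)   # boundary spin outside the volume

    edges = [(i, i + 1) for i in range(sites[0] - 1, sites[-1] + 1)]
    return -J * sum(s(i) * s(j) for i, j in edges) - h * sum(sigma)


def pressure(n, beta, J, h, boundary=+1):
    """P_n(h) = |V_n|^{-1} log E^lambda[exp(-beta H_n)], lambda uniform on {-1,1}^{V_n}."""
    sites = volume(n)
    configs = list(product((-1, 1), repeat=len(sites)))
    z = sum(exp(-beta * energy(c, sites, J, h, boundary)) for c in configs) / len(configs)
    return log(z) / len(sites)


def mean_magnetization(n, beta, J, h, boundary=+1):
    """E^{mu_{n,h}}[ |V_n|^{-1} sum_{i in V_n} sigma_i ] under the finite-volume Gibbs measure."""
    sites = volume(n)
    num = den = 0.0
    for c in product((-1, 1), repeat=len(sites)):
        w = exp(-beta * energy(c, sites, J, h, boundary))
        num += w * sum(c) / len(sites)
        den += w
    return num / den


def central_difference(n, beta, J, h, boundary=+1, eps=1e-5):
    """Numerical derivative of P_n with respect to h."""
    return (pressure(n, beta, J, h + eps, boundary)
            - pressure(n, beta, J, h - eps, boundary)) / (2 * eps)


if __name__ == "__main__":
    n, beta, J, h = 3, 0.7, 1.0, 0.3
    print(central_difference(n, beta, J, h), beta * mean_magnetization(n, beta, J, h))
```

The two printed numbers should agree up to the finite-difference error, for either boundary condition.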

Proof. We will treat the + boundary condition, the other one being similar. Fix an integer m and use the fact that $|V_n \setminus V_{n-m}| / |V_n| \to 0$ as $n \to \infty$ to write
\[
\limsup_{n\to\infty} \frac{\partial P^+_n}{\partial h}
= \limsup_{n\to\infty} \beta E^{\mu^+_{n,h}}\Bigl[ \frac{1}{|V_n|} \sum_{i \in V_n} \sigma_i \Bigr]
= \limsup_{n\to\infty} \beta E^{\mu^+_{n,h}}\Bigl[ \frac{1}{|V_n|} \sum_{i \in V_{n-m}} \sigma_i \Bigr],
\]
with the same series of equalities for the liminf. Observe next that if $i \in V_{n-m}$, then $i + V_m \subset V_n$. Then, by (b) and (d) of Theorem 10.18,
\[
E^{\mu^+_h}[\sigma_0] = E^{\mu^+_h}[\sigma_i] \le E^{\mu^+_{n,h}}[\sigma_i] \le E^{\mu^+_{i+V_m,h}}[\sigma_i] = E^{\mu^+_{m,h}}[\sigma_0].
\]
This implies
\[
\beta E^{\mu^+_h}[\sigma_0] \le \liminf_{n\to\infty} \frac{\partial P^+_n}{\partial h} \le \limsup_{n\to\infty} \frac{\partial P^+_n}{\partial h} \le \beta E^{\mu^+_{m,h}}[\sigma_0].
\]
The claim follows from the fact that $E^{\mu^\pm_{m,h}}[\sigma_0] \to E^{\mu^\pm_h}[\sigma_0]$ as $m \to \infty$.

Now we will need the following result from real analysis, the proof of which we leave to the reader.

∗Exercise 10.22. Let $f_n$ be a sequence of differentiable convex functions on an interval (a, b). Assume $f_n(y) \to f(y)$ for all $y \in (a, b)$ and that $f'(x)$ exists for some $x \in (a, b)$. Prove that $f_n'(x) \to f'(x)$.

Hint: Use convexity to show that for y < x < z, we have
\[
\frac{f_n(x) - f_n(y)}{x - y} \le f_n'(x) \le \frac{f_n(z) - f_n(x)}{z - x} .
\]
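A concrete instance may help fix ideas for Exercise 10.22. The sketch below uses the illustrative family $f_n(x) = \sqrt{x^2 + 1/n}$, which is convex, differentiable, and converges pointwise to $f(x) = |x|$. This example is chosen here and is not taken from the text. The limit is differentiable at x = 1 and the derivatives $f_n'(1)$ converge to $f'(1) = 1$, while the difference quotients from the hint sandwich $f_n'(1)$ for every n.

```python
from math import sqrt


def f_n(n, x):
    """Convex, differentiable approximation of |x| (illustrative choice)."""
    return sqrt(x * x + 1.0 / n)


def df_n(n, x):
    """Exact derivative of f_n."""
    return x / sqrt(x * x + 1.0 / n)


def sandwich(n, y, x, z):
    """Lower quotient, derivative at x, upper quotient, for y < x < z (as in the hint)."""
    lower = (f_n(n, x) - f_n(n, y)) / (x - y)
    upper = (f_n(n, z) - f_n(n, x)) / (z - x)
    return lower, df_n(n, x), upper


if __name__ == "__main__":
    for n in (1, 10, 100, 10_000):
        print(n, sandwich(n, 0.5, 1.0, 1.5))
```

Letting n → ∞ first and then y ↑ x and z ↓ x in the sandwich is exactly the argument the hint suggests.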

Theorem 10.23. Consider the Ising model in dimension d ≥ 1 and fix β > 0 and J > 0. If the infinite-volume pressure P(β, h, J) is differentiable with respect to h at $h = h_0$, then $|\mathcal{G}_{\Pi^{\beta,h_0,J}}| = 1$.

The proof requires the convexity of $P^+_n$ in h. This is the familiar consequence of Hölder’s inequality. We leave it as an exercise.

∗Exercise 10.24. Prove that for each n, $P^+_n(h)$ is convex in h.

Proof of Theorem 10.23. Fix J > 0 and β > 0. By Exercises 10.20 and 10.24 and Theorem 9.2, $P^+_n$ is a sequence of convex differentiable functions that converge, pointwise in h, to P(β, ·, J). By Exercise 10.22, if $\frac{\partial}{\partial h} P(h_0)$ exists, then $\frac{\partial}{\partial h} P^+_n(h_0) \to \frac{\partial}{\partial h} P(h_0)$. But then Lemma 10.21 implies that $P'(h_0) = \beta E^{\mu^+_{h_0}}[\sigma_0]$. The same argument implies that $P'(h_0) = \beta E^{\mu^-_{h_0}}[\sigma_0]$. Thus, $\mu^+_{h_0}\{\sigma_0 = 1\} = \mu^-_{h_0}\{\sigma_0 = 1\}$ and Strassen’s lemma along with shift invariance imply that $\mu^+_{h_0} = \mu^-_{h_0}$. By part (e) of Theorem 10.18 this implies the uniqueness of the Gibbs measure.

Since a convex function on $\mathbb{R}$ is differentiable at all but countably many points, we know phase transition can occur at at most countably many values of h for each fixed β > 0 and J > 0. To conclude the proof of Theorem 10.2 we prove that P(h) is differentiable at all h ≠ 0. This is the claim of the next theorem.

Theorem 10.25. Consider the Ising model in dimension d ≥ 1. Then, for all β > 0 and J > 0, P(β, h, J) is differentiable in h at h ≠ 0.

Proof. Fix β > 0 and J > 0. We start with the case h > 0. Define
\[
M_n(h) = E^{\mu^+_{n,h}}\Bigl[ \frac{1}{|V_n|} \sum_{i \in V_n} \sigma_i \Bigr].
\]
Since $|M_n(h)| \le 1$, there exists, for each fixed h, a subsequence $n_k$ such that $M_{n_k}(h) \to M(h)$. By the diagonal trick (see page 76), one can find one subsequence that works simultaneously for all rational h ≥ 0. Denote this subsequence by $n_k$. Next, we will need the following inequality. We defer the proof to Appendix C.3.

Griffiths-Hurst-Sherman inequality. Consider the Ising model in d ≥ 1. Fix β > 0, J > 0, and the volume $V_n$. Then, $\dfrac{\partial^2 M_n}{\partial h^2} \le 0$ for h > 0.

The above inequality says that $M_n$ is concave when h > 0. We will now use a geometric argument to show that $M_{n_k}$ actually converges for all h > 0.

Let h > 0 be irrational and let $h_i$ and $h_i'$ be two rational numbers such that $(i-1)h/i < h_i < h < h + i(h - h_i) < h_i'$. Let $t_i \in (0, 1)$ be such that $h = t_i h_i + (1 - t_i) h_i'$. Using the fact that $h_i' > h + i(h - h_i)$ one finds that $t_i > i/(i+1)$. Hence, $t_i \to 1$ as $i \to \infty$.

Similarly, let $h_i''$ be a rational number in $(0,\ h_i - (i-1)(h - h_i))$. Such a number exists because $h_i > (i-1)h/i$. Choose $t_i' \in (0, 1)$ to have $h_i = t_i' h + (1 - t_i') h_i''$. Then, $t_i' > (i-1)/i$ and $t_i' \to 1$.
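The bounds on the weights $t_i$ and $t_i'$ are elementary but easy to get wrong, so a numerical sketch may be useful. The code below picks $h = \sqrt{2}$ and constructs admissible points on a grid of mesh $1/(10i)$. These specific choices, and all names, are assumptions made for illustration, and they require h > 0.1 so that $h_i > (i-1)h/i$ holds. It then computes the two convex-combination weights.

```python
import math


def weights(h, i):
    """Return (t_i, t_i') for one admissible choice of h_i, h_i', h_i''.

    Assumes h > 0.1 is irrational; the grid of mesh 1/(10 i) is an illustrative choice.
    """
    N = 10 * i
    h_i = math.floor(h * N) / N                  # grid point below h, with h - h_i < 1/N
    x = h + i * (h - h_i)
    h_up = (math.floor(x * N) + 2) / N           # grid point strictly above h + i (h - h_i)
    top = h_i - (i - 1) * (h - h_i)              # right end of the interval for h_i''
    h_low = top / 2.0                            # any point of (0, top) works here

    t = (h_up - h) / (h_up - h_i)                # solves h = t h_i + (1 - t) h_i'
    t_prime = (h_i - h_low) / (h - h_low)        # solves h_i = t' h + (1 - t') h_i''
    return t, t_prime


if __name__ == "__main__":
    h = math.sqrt(2.0)
    for i in (1, 2, 5, 10, 50):
        t, t_prime = weights(h, i)
        print(i, t, i / (i + 1), t_prime, (i - 1) / i)
```

Both weights approach 1, which is what lets the concavity inequalities that follow pin down the limit of $M_{n_k}(h)$ between limits along rational points.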


Now use the concavity of $M_n$ to write
\[
M_{n_k}(h) \ge t_i M_{n_k}(h_i) + (1 - t_i) M_{n_k}(h_i')
\]
which implies
\[
\liminf_{k\to\infty} M_{n_k}(h) \ge t_i M(h_i) + (1 - t_i) M(h_i')
\]
and thus, since $|M| \le 1$ and $t_i \to 1$,
\[
\liminf_{k\to\infty} M_{n_k}(h) \ge \limsup_{i\to\infty} M(h_i).
\]
Similarly,
\[
M_{n_k}(h_i) \ge t_i' M_{n_k}(h) + (1 - t_i') M_{n_k}(h_i'')
\]
implies
\[
\liminf_{i\to\infty} M(h_i) \ge \limsup_{k\to\infty} M_{n_k}(h).
\]
Consequently, $M_{n_k}(h)$ does converge to some value M(h). Since the functions $M_{n_k}$ are all concave for h > 0, their pointwise limit M must be concave as well. But then, it is continuous for h > 0.

Exercise 10.20 implies that
\[
P^+_n(h) - P^+_n(0) = \beta \int_0^h M_n(s)\, ds.
\]
Taking n to infinity along the subsequence $n_k$ and using dominated convergence one concludes that
\[
P(h) - P(0) = \beta \int_0^h M(s)\, ds.
\]
Since M is continuous on $\mathbb{R} \setminus \{0\}$, P must be differentiable at h > 0.

When h < 0 one can use symmetry to write
\[
E^{\mu^-_{n,h}}\Bigl[ \frac{1}{|V_n|} \sum_{i \in V_n} \sigma_i \Bigr] = -E^{\mu^+_{n,-h}}\Bigl[ \frac{1}{|V_n|} \sum_{i \in V_n} \sigma_i \Bigr] = -M_n(-h).
\]
Then, Exercise 10.20 implies that
\[
P^-_n(0) - P^-_n(h) = -\beta \int_h^0 M_n(-s)\, ds.
\]
Once again, taking n to infinity shows that P is differentiable at h < 0.

The above theorem combined with Theorem 10.23 shows that there is no phase transition when h ≠ 0. This completes the proof of Theorem 10.2.


Chapter 11. Percolation approach to phase transition

This chapter will introduce the Fortuin-Kasteleyn random cluster model and use it to give alternative proofs of some of the Ising model results.

Part III. Further large deviations topics

Chapter 12. Further asymptotics for i.i.d. random variables

12.1. Refinement of Cramér’s theorem

In this section $X, X_1, X_2, \dots$ are i.i.d. real valued random variables with nondegenerate law µ. Furthermore, $S_n = X_1 + \dots + X_n$ for n ≥ 1, $M(\theta) = E[e^{\theta X}]$, $\operatorname{dom}(M) = \{\theta : M(\theta) < \infty\}$, and $I(x) = \sup_{\theta \in \mathbb{R}}\{\theta x - \log M(\theta)\}$ for $x \in \mathbb{R}$. We also write $\theta^+ = \sup\{\theta : M(\theta) < \infty\}$ and $\theta^- = \inf\{\theta : M(\theta) < \infty\}$.

Let $\theta^- < 0 < \theta^+$. Then $m = \int x\, \mu(dx)$ exists as a finite number, and for all a > m
\[
\limsup_{n\to\infty} \frac1n \log P\{S_n/n \ge a\} \le -I(a),
\]
and
\[
\liminf_{n\to\infty} \frac1n \log P\{S_n/n > a\} \ge -I(a).
\]
If a < m, then
\[
\limsup_{n\to\infty} \frac1n \log P\{S_n/n \le a\} \le -I(a),
\]
and
\[
\liminf_{n\to\infty} \frac1n \log P\{S_n/n < a\} \ge -I(a).
\]
This is a consequence of Cramér’s theorem; see page 23. However, here we want to examine these probabilities at the nonlogarithmic scale. The refinement of Cramér’s theorem is the following, where we now have nonlogarithmic expressions for the large deviation probabilities $P\{S_n/n > a\}$ for a > m, and $P\{S_n/n < a\}$ for a < m.

Theorem 12.1. If $\theta^- < 0 < \theta^+$ and $m < a < c^+ = \lim_{\theta \nearrow \theta^+} \dfrac{M'(\theta)}{M(\theta)}$, then
\[
0 < \inf_{n \ge 1} \sqrt{n}\, e^{nI(a)} P\{S_n/n > a\} \le \sup_{n \ge 1} \sqrt{n}\, e^{nI(a)} P\{S_n/n > a\} < \infty. \tag{12.1}
\]
Similarly, if $\theta^- < 0 < \theta^+$ and $c^- = \lim_{\theta \searrow \theta^-} \dfrac{M'(\theta)}{M(\theta)} < a < m$, then
\[
0 < \inf_{n \ge 1} \sqrt{n}\, e^{nI(a)} P\{S_n/n < a\} \le \sup_{n \ge 1} \sqrt{n}\, e^{nI(a)} P\{S_n/n < a\} < \infty.
\]

Remark 12.2. An upper bound on $\sqrt{n}\, e^{nI(a)} P\{S_n/n > a\}$ is given in the proof. See (12.6) below.

Proof. We will verify (12.1). The other statement then follows from replacing X with −X. Hence let $m < a < c^+$. Then
\[
P\{S_n/n > a\} = e^{-n(\theta a - \log M(\theta))} J_n(\theta), \tag{12.2}
\]
where
\[
J_n(\theta) = \int \cdots \int e^{-\theta \sum_{j=1}^n (x_j - a)}\, \mathbf{1}\Bigl\{ \sum_{j=1}^n (x_j - a) > 0 \Bigr\} \times \frac{e^{\theta(x_1 + \cdots + x_n)}}{M(\theta)^n}\, \mu(dx_1) \cdots \mu(dx_n). \tag{12.3}
\]
Since µ is nondegenerate, direct differentiation shows $M'(\theta)/M(\theta)$ is strictly increasing and continuous on $(0, \theta^+)$. Thus the limit defining $c^+$ exists, and since $M'(0)/M(0) = m$ we indeed have $m < c^+$, and $m < a < c^+$ implies there exists a unique $\theta_a \in (0, \theta^+)$ such that $M'(\theta_a)/M(\theta_a) = a$. Then, Exercises 3.6 and 3.7 give
\[
I(a) = \sup_{\theta \ge 0}\{\theta a - \log M(\theta)\} = \theta_a a - \log M(\theta_a),
\]
and
\[
P\{S_n/n > a\} = e^{-nI(a)} J_n(\theta_a). \tag{12.4}
\]
Now
\[
J_n(\theta_a) = E\bigl[ e^{-\theta_a \tilde S_n}\, \mathbf{1}\{\tilde S_n > 0\} \bigr], \tag{12.5}
\]
where
\[
\tilde S_n = \sum_{j=1}^n (Z_j - a),
\]
and $Z, Z_1, Z_2, \dots$ are i.i.d. with law $\nu_{\theta_a}$ having Radon-Nikodym derivative
\[
\frac{d\nu_{\theta_a}}{d\mu}(x) = \frac{e^{\theta_a x}}{M(\theta_a)} .
\]
Letting $F_n(u) = P\{\tilde S_n \le u\sqrt{n}\}$, we see
\begin{align*}
J_n(\theta_a) &= \int_{(0,\infty)} e^{-\sqrt{n}\,\theta_a u}\, dF_n(u) \\
&= \int_{(0,\infty)} \int_u^\infty \sqrt{n}\,\theta_a e^{-\sqrt{n}\,\theta_a x}\, dx\, dF_n(u) \\
&= \int_0^\infty (F_n(x) - F_n(0))\, \sqrt{n}\,\theta_a e^{-\sqrt{n}\,\theta_a x}\, dx.
\end{align*}
Define
\[
\rho = \int |x - a|^3\, \frac{e^{\theta_a x}}{M(\theta_a)}\, \mu(dx) \quad\text{and}\quad \sigma^2 = \int (x - a)^2\, \frac{e^{\theta_a x}}{M(\theta_a)}\, \mu(dx).
\]

Now the Berry-Esseen theorem (see page 126 of [15]) implies
\[
\sup_x |\Phi(x) - F_n(x)| \le 3\, \frac{E[|Z - a|^3]}{\sigma^3 \sqrt{n}} = 3\rho/(\sigma^3 \sqrt{n}),
\]
where Φ(x) is the c.d.f. of a mean zero Gaussian random variable G with variance $\sigma^2$. Hence
\[
J_n(\theta_a) \le \int_0^\infty (\Phi(x) - \Phi(0))\, \sqrt{n}\,\theta_a e^{-\sqrt{n}\,\theta_a x}\, dx + \frac{6\rho}{\sigma^3 \sqrt{n}} \int_0^\infty \sqrt{n}\,\theta_a e^{-\sqrt{n}\,\theta_a x}\, dx,
\]
and reversing the previous argument with Φ replacing $F_n$ we see
\begin{align*}
J_n(\theta_a) &\le \int_{(0,\infty)} e^{-\sqrt{n}\,\theta_a x}\, d\Phi(x) + \frac{6\rho}{\sigma^3 \sqrt{n}} \\
&= \int_0^\infty e^{-\sqrt{n}\,\theta_a x - x^2/(2\sigma^2)}\, \frac{dx}{\sqrt{2\pi\sigma^2}} + \frac{6\rho}{\sigma^3 \sqrt{n}},
\end{align*}
which implies
\[
\sqrt{n}\, J_n(\theta_a) \le (2\pi\sigma^2\theta_a^2)^{-1/2} + \frac{6\rho}{\sigma^3} . \tag{12.6}
\]

Thus the upper bound in (12.1) holds and it remains to check the lower bound. Now
\begin{align*}
J_n(\theta_a) &\ge e^{-2A\theta_a} P\{ n^{-1/2} \tilde S_n \in (n^{-1/2} A,\ 2 n^{-1/2} A) \} \\
&\ge e^{-2A\theta_a} \bigl( P\{ G \in (n^{-1/2} A,\ 2 n^{-1/2} A) \} - 6\rho/(\sqrt{n}\,\sigma^3) \bigr) \\
&\sim e^{-2A\theta_a} \bigl( A/\sqrt{2\pi\sigma^2 n} - 6\rho/(\sqrt{n}\,\sigma^3) \bigr)
\end{align*}
as $n \to \infty$. Thus taking A sufficiently large we see
\[
\liminf_{n\to\infty} \sqrt{n}\, J_n(\theta_a) > 0.
\]


(12.1) then follows from observing that for each n ≥ 1 the quantity in question is positive. For otherwise, we would have $P(X > a)^n \le P(S_n > na) = 0$ and $X \le a$ P-a.s. But then $M'(\theta) \le a M(\theta)$ for all $\theta \in (\theta^-, \theta^+)$, and $c^+ \le a$, which contradicts $a < c^+$. The theorem is proved.

Remark 12.3. A similar argument yields the same result for $P\{S_n/n \ge a\}$ and $P\{S_n/n \le a\}$, except one needs to consider the possibility that $\tilde S_n$ has an atom at zero, but that requires only minor adjustments.

Remark 12.4. If the law of $Z$ has a density, or its characteristic function $\varphi_Z(\theta)$ satisfies<br />

$$\limsup_{|\theta|\to\infty} |\varphi_Z(\theta)| < 1,$$<br />

then more precise error estimates are possible than those obtained from the Berry-Esseen theorem. These are the so-called Edgeworth expansions, and when applicable they would imply asymptotic results for these large deviation probabilities rather than the upper and lower bounds produced here.<br />

Remark 12.5. Formulas (12.1) and (12.5) provide a representation formula for the corresponding large deviation probabilities, and their analogues are known for random variables with values in $\mathbb{R}^d$, and even in a separable Banach space, when the intervals $(a, \infty)$ and $(-\infty, a)$ are replaced by open convex subsets $D$. One of the first difficulties in establishing such formulas is to find a suitable replacement for the endpoint of the interval when the set $D$ is of higher dimension. The replacement point is known as the dominating point for the set $D$, and the existence and uniqueness of dominating points are now reasonably well understood. They have found applications to the Gibbs conditioning principle, large and moderate deviation probabilities, and also to the Nummelin conditional weak law of large numbers in the vector space setting. In each of these applications the starting point is the analogue of the formulas (12.1) and (12.5) for the relevant vector space setting.<br />

12.2. Moderate deviations<br />

Consider $X_1, X_2, \ldots$ i.i.d. real valued random variables. For convenience, let $X$ be an independent copy of these variables. Recall that for $n \ge 1$, $S_n = X_1 + \cdots + X_n$.<br />

As we have seen on page 27, order 1 deviations of $S_n/n$ are handled by Cramér’s theorem while order $n^{-1/2}$ deviations are described by the central limit theorem. In this section we are concerned with deviations of order $n^{-1/2+\alpha}$, for $\alpha \in (0, 1/2)$.<br />

Define $t_+ = \sup\{t : E[e^{tX}] < \infty\}$ and $t_- = \inf\{t : E[e^{tX}] < \infty\}$.<br />



Theorem 12.6. Let $t_- < 0 < t_+$, $E[X] = 0$, and $0 < \sigma^2 = E[X^2] < \infty$. Also assume $\{b_n : n \ge 1\}$ is a sequence of positive numbers such that<br />

$$\lim_{n\to\infty} b_n/n^{1/2} = \infty \quad\text{and}\quad \lim_{n\to\infty} b_n/n = 0.$$<br />

Then for all $a > 0$<br />

$$\lim_{n\to\infty} \frac{n}{b_n^2} \log P(S_n/b_n \ge a) = -\frac{a^2}{2\sigma^2}. \tag{12.7}$$<br />

Remark 12.7. In the proof of the lower bound part of (12.7) we only use the fact that $0 < \sigma^2 < \infty$. However, in the upper bound for (12.7) we use the fact that the moment generating function is finite near zero.<br />
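As a numerical illustration of (12.7), the following sketch evaluates the left-hand side exactly for fair ±1 steps (so σ² = 1) and compares it with −a²/2. The step distribution, the choice b_n = n^{3/4}, and all function names are assumptions made for this example only; none of them come from the text.<br />

```python
import math

def log_tail_rademacher(n: int, x: float) -> float:
    """Exact log P(S_n >= x) for S_n a sum of n independent fair +/-1 signs."""
    # S_n = 2K - n with K ~ Binomial(n, 1/2), so S_n >= x iff K >= (n + x) / 2.
    k0 = max(0, math.ceil((n + x) / 2))
    if k0 > n:
        return float("-inf")
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        - n * math.log(2.0)
        for k in range(k0, n + 1)
    ]
    m = max(log_terms)
    # log-sum-exp keeps the tiny tail probabilities from underflowing
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

def moderate_deviation_ratio(n: int, a: float, gamma: float = 0.75) -> float:
    """(n / b_n^2) * log P(S_n / b_n >= a), with the illustrative choice b_n = n**gamma."""
    b_n = n ** gamma
    return n / b_n ** 2 * log_tail_rademacher(n, a * b_n)

for n in (10**3, 10**4, 10**5):
    print(n, moderate_deviation_ratio(n, a=1.0))
```

The printed values approach −1/2 only slowly, because the polynomial prefactor of the tail contributes a correction of order (n/b_n²) log(b_n/√n).<br />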

Proof. The upper bound is basically Chebyshev’s inequality (page 15 of [15]). That is, for $a > 0$ and all $s \ge 0$<br />

$$P\{S_n/b_n \ge a\} \le P\{b_nS_n/n \ge ab_n^2/n\} \le E[e^{sb_nS_n/n}]\, e^{-sab_n^2/n}. \tag{12.8}$$<br />

Next we prove that<br />

$$\lim_{n\to\infty} \frac{n}{b_n^2} \log E[e^{sb_nS_n/n}] = s^2\sigma^2/2. \tag{12.9}$$<br />

To prove (12.9) first observe that by Taylor’s formula for all $t$ there exists $\theta = \theta(t) \in (0, 1)$ such that $e^t = 1 + t + \frac{t^2}{2} e^{\theta t}$. Hence we have<br />

$$E[e^{sb_nX/n}] = E\Bigl[ 1 + sb_nX/n + \tfrac12 (sb_nX/n)^2 e^{\theta sb_nX/n} \Bigr],$$<br />

where $|\theta| \le 1$. Since $E[X] = 0$ we thus have<br />

$$E[e^{sb_nX/n}] = 1 + E\Bigl[ \tfrac12 (sb_nX/n)^2 e^{\theta sb_nX/n} \Bigr],$$<br />

and since $E[(tX)^2 e^{t|X|}] < \infty$ for all $t < \min\{|t_-|, t_+\}$ the dominated convergence theorem (page 16 of [15] or page 46 of [26]) easily implies<br />

$$\lim_{n\to\infty} E[(sX)^2 e^{\theta sb_nX/n}] = s^2 E[X^2] = s^2\sigma^2.$$<br />

Hence<br />

$$\begin{aligned}
\lim_{n\to\infty} \frac{n}{b_n^2} \log E[e^{sb_nS_n/n}]
&= \lim_{n\to\infty} \frac{n^2}{b_n^2} \log E[e^{sb_nX/n}] \\
&= \lim_{n\to\infty} \frac{n^2}{b_n^2} \log\Bigl( 1 + \tfrac12 E[(sX)^2 e^{\theta sb_nX/n}]\, b_n^2/n^2 \Bigr) \\
&= \lim_{n\to\infty} \frac{n^2}{b_n^2} \cdot \tfrac12 E[(sX)^2 e^{\theta sb_nX/n}]\, b_n^2/n^2 \\
&= \tfrac12 s^2\sigma^2.
\end{aligned}$$<br />
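The limit (12.9) and the optimization over s that follows it can be checked numerically in a simple case. The sketch below assumes fair ±1 steps, for which E[e^{uX}] = cosh(u) and σ² = 1, and again takes b_n = n^{3/4}; both the example distribution and the function names are illustrative assumptions rather than part of the proof.<br />

```python
import math

def scaled_log_mgf(n: float, s: float, gamma: float = 0.75) -> float:
    """(n / b_n^2) * log E[exp(s * b_n * S_n / n)] for fair +/-1 steps, b_n = n**gamma."""
    b_n = n ** gamma
    u = s * b_n / n                             # argument of the one-step m.g.f.
    log_mgf_one_step = math.log(math.cosh(u))   # E[e^{uX}] = cosh(u)
    # independence gives log E[e^{u S_n}] = n * log E[e^{uX}]
    return (n / b_n ** 2) * n * log_mgf_one_step

def chernoff_exponent(s: float, a: float, sigma2: float) -> float:
    """The exponent s^2 sigma^2 / 2 - s a obtained by combining (12.8) with (12.9)."""
    return 0.5 * s * s * sigma2 - s * a

print(scaled_log_mgf(1e8, 2.0))             # close to s^2 / 2 = 2
print(chernoff_exponent(0.75, 1.5, 2.0))    # value at s = a / sigma^2
```

The quadratic s ↦ s²σ²/2 − sa is minimized at s = a/σ², where it equals −a²/(2σ²); any other s ≥ 0 gives a weaker bound.<br />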



Thus (12.9) holds and (12.8) implies<br />

$$\limsup_{n\to\infty} \frac{n}{b_n^2} \log P\{S_n/b_n \ge a\} \le \frac{s^2\sigma^2}{2} - sa.$$<br />

Setting $s = a/\sigma^2$ yields<br />

$$\limsup_{n\to\infty} \frac{n}{b_n^2} \log P\{S_n/b_n \ge a\} \le -\frac{a^2}{2\sigma^2}. \tag{12.10}$$<br />

Thus we have the upper bound result for (12.7), and it remains to prove the analogous lower bound.<br />

Recall that $[x]$ denotes the greatest integer smaller than or equal to $x$. Set $p_n = [t^2n^2/b_n^2]$, $q_n = [n/p_n]$, and $r_n = b_n/(tq_n)$ for $t > 0$. Next, fix $a > 0$ and $\varepsilon > 0$ and note that<br />

$$P\{S_{p_n}/r_n \ge t(a+\varepsilon)\}^{q_n} \le P\{S_{p_nq_n}/r_n \ge tq_n(a+\varepsilon)\} = P\{S_{p_nq_n}/b_n \ge a+\varepsilon\}. \tag{12.11}$$<br />

Since $p_nq_n \le n$ and $S_n = S_{p_nq_n} + (S_n - S_{p_nq_n})$ we have<br />

$$P\{S_n/b_n \ge a\} \ge P\{S_{p_nq_n}/b_n \ge a+\varepsilon,\ |S_n - S_{p_nq_n}| < \varepsilon b_n\}. \tag{12.12}$$<br />

But $1 - p_nq_n/n \le p_n/n \le t^2n/b_n^2$. Thus,<br />

$$\begin{aligned}
\frac{n}{b_n^2} \log P\{|S_n - S_{p_nq_n}| \ge \varepsilon\sqrt{n}\}
&\le \frac{n}{b_n^2} \log\bigl( E[(S_n - S_{p_nq_n})^2]/(n\varepsilon^2) \bigr) \\
&= \frac{n}{b_n^2} \log\bigl( (n - p_nq_n)\sigma^2/(n\varepsilon^2) \bigr) \to 0
\end{aligned} \tag{12.13}$$<br />

as $n \to \infty$. By the independence of $S_n - S_{p_nq_n}$ and $S_{p_nq_n}$, (12.12) and (12.13) combine to yield<br />

$$\liminf_{n\to\infty} \frac{n}{b_n^2} \log P\{S_n/b_n \ge a\} \ge \liminf_{n\to\infty} \frac{n}{b_n^2} \log P\{S_{p_nq_n}/b_n \ge a+\varepsilon\}.$$<br />

Thus (12.11) implies<br />

$$\liminf_{n\to\infty} \frac{n}{b_n^2} \log P\{S_n/b_n \ge a\} \ge \liminf_{n\to\infty} \frac{n}{b_n^2}\, q_n \log P\{S_{p_n}/r_n \ge t(a+\varepsilon)\}.$$<br />

But<br />

$$\frac{n}{b_n^2}\, q_n = \frac{n}{b_n^2}\,[n/p_n] \sim \frac{n^2}{b_n^2\,p_n} \sim t^{-2}$$<br />

and<br />

$$r_n = \frac{b_n}{tq_n} = \frac{b_n}{t[n/p_n]} \sim \frac{b_np_n}{nt} \sim \frac{b_n}{nt} \cdot \frac{t^2n^2}{b_n^2} \sim p_n^{1/2}.$$<br />

Hence the central limit theorem implies<br />

$$\liminf_{n\to\infty} \frac{n}{b_n^2} \log P\{S_n/b_n \ge a\} \ge t^{-2} \log P\{G_\sigma \ge t(a+\varepsilon)\}, \tag{12.14}$$<br />



where $G_\sigma$ is normal with mean zero and variance $\sigma^2$. Letting $\varepsilon \downarrow 0$, $[a+\varepsilon, \infty)$ increases to $(a, \infty)$ and hence (12.14) implies<br />

$$\liminf_{n\to\infty} \frac{n}{b_n^2} \log P\{S_n/b_n \ge a\} \ge t^{-2} \log P\{G_\sigma > ta\}. \tag{12.15}$$<br />

Now for $s > 0$<br />

$$\sqrt{2\pi\sigma^2}\, P\{G_\sigma > s\} \ge \int_s^{s+1/s} e^{-\frac{u^2}{2\sigma^2}}\, du \ge s^{-1} e^{-\frac{(s+1/s)^2}{2\sigma^2}},$$<br />

and hence<br />

$$\lim_{t\to\infty} t^{-2} \log P\{G_\sigma > ta\} = -\frac{a^2}{2\sigma^2},$$<br />

which when combined with (12.15) implies the lower bound<br />

$$\liminf_{n\to\infty} \frac{n}{b_n^2} \log P\{S_n/b_n \ge a\} \ge -\frac{a^2}{2\sigma^2}.$$<br />

Thus combining this lower bound with (12.10) implies (12.7), and the theorem is proved. □<br />
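The Gaussian tail estimate used at the end of the proof is easy to test numerically. The sketch below uses the standard library function math.erfc to evaluate P{G_σ > s}; the helper names and the particular test values are assumptions chosen for illustration.<br />

```python
import math

def gaussian_tail(s: float, sigma: float = 1.0) -> float:
    """P{G_sigma > s} for a centered normal with variance sigma**2."""
    return 0.5 * math.erfc(s / (sigma * math.sqrt(2.0)))

def tail_lower_bound(s: float, sigma: float = 1.0) -> float:
    """The bound s^{-1} exp(-(s + 1/s)^2 / (2 sigma^2)) / sqrt(2 pi sigma^2), for s > 0."""
    return math.exp(-((s + 1.0 / s) ** 2) / (2.0 * sigma ** 2)) / (
        s * math.sqrt(2.0 * math.pi * sigma ** 2)
    )

def scaled_log_tail(t: float, a: float, sigma: float = 1.0) -> float:
    """t^{-2} log P{G_sigma > t a}, which tends to -a^2 / (2 sigma^2) as t grows."""
    return math.log(gaussian_tail(t * a, sigma)) / t ** 2

for t in (5.0, 10.0, 20.0):
    print(t, scaled_log_tail(t, a=1.0))
```

For much larger t the tail probability underflows in double precision, so an asymptotic expansion of the tail would be needed instead of erfc.<br />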

Remark 12.8. Replacing $X_i$ by $-X_i$ yields the same result for $P\{S_n/b_n \le a\}$ when $a < 0$.<br />


Chapter 13. Large deviations for Markov chains<br />

13.1. Restricting entropies on product spaces<br />

In this section we prove a general theorem about entropies on product spaces with conditions on the marginals. This result will be helpful both for Markov chains in this chapter and for nonstationary independent random variables in the next chapter.<br />

Let $\mathcal{X}$ and $\mathcal{Y}$ be Polish spaces, $\kappa \in \mathcal{M}_1(\mathcal{Y})$, and $y \mapsto \rho^y$ a measurable map (in other words, a stochastic kernel) from $\mathcal{Y}$ into $\mathcal{M}_1(\mathcal{X})$. Define two probability measures on $\mathcal{X} \times \mathcal{Y}$ by $r^y = \rho^y \otimes \delta_y$ and $Q(dx, dy) = \rho^y(dx)\,\kappa(dy)$. For $f \in C_b(\mathcal{X} \times \mathcal{Y})$ define a functional $\Lambda$ by<br />

$$\Lambda(f) = \int_{\mathcal{Y}} \log E^{r^y}[e^f]\, \kappa(dy) = \int_{\mathcal{Y}} \log\Bigl( \int_{\mathcal{X}} e^{f(x,y)}\, \rho^y(dx) \Bigr) \kappa(dy). \tag{13.1}$$<br />

For $\nu \in \mathcal{M}_1(\mathcal{X} \times \mathcal{Y})$ and $\alpha \in \mathcal{M}_1(\mathcal{X})$ define<br />

$$I(\nu) = \sup_{f \in C_b(\mathcal{X} \times \mathcal{Y})} \{E^\nu[f] - \Lambda(f)\} \tag{13.2}$$<br />

and<br />

$$J(\alpha) = \sup_{g \in C_b(\mathcal{X})} \{E^\alpha[g] - \Lambda(g)\}. \tag{13.3}$$<br />

The defining formula (13.1) of $\Lambda$ works just as well for functions that depend only on the variable $x$, and so $\Lambda$ can be regarded also as a function on $C_b(\mathcal{X})$. This is the sense in which $\Lambda$ appears in definition (13.3). Let $\nu_{\mathcal{X}}$ and $\nu_{\mathcal{Y}}$ denote the $\mathcal{X}$- and $\mathcal{Y}$-marginals of a measure $\nu \in \mathcal{M}_1(\mathcal{X} \times \mathcal{Y})$.<br />


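Theorem 13.1. We have these identities for $\nu \in \mathcal{M}_1(\mathcal{X} \times \mathcal{Y})$ and $\alpha \in \mathcal{M}_1(\mathcal{X})$:<br />

$$I(\nu) = \begin{cases} H(\nu \,|\, Q) & \text{if } \nu_{\mathcal{Y}} = \kappa, \\ \infty & \text{if } \nu_{\mathcal{Y}} \ne \kappa, \end{cases} \tag{13.4}$$<br />

and<br />

$$J(\alpha) = \inf\{I(\mu) : \mu \in \mathcal{M}_1(\mathcal{X} \times \mathcal{Y}) \text{ and } \mu_{\mathcal{X}} = \alpha\}. \tag{13.5}$$<br />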

Proof. We prove first (13.4). By Jensen’s inequality,<br />

$$I(\nu) \ge \sup_{f \in C_b(\mathcal{X} \times \mathcal{Y})} \{E^\nu[f] - \log E^Q[e^f]\} = H(\nu \,|\, Q). \tag{13.6}$$<br />

Write $\nu^y$ for the conditional probability of $\nu$, given $y \in \mathcal{Y}$. If $\nu_{\mathcal{Y}} = \kappa$,<br />

$$I(\nu) = \sup_{f \in C_b(\mathcal{X} \times \mathcal{Y})} \int_{\mathcal{Y}} \bigl( E^{\nu^y}[f] - \log E^{r^y}[e^f] \bigr)\, \kappa(dy) \le \int_{\mathcal{Y}} H(\nu^y \,|\, r^y)\, \kappa(dy) = H(\nu \,|\, Q).$$<br />

The last equality used the conditional entropy formula (Exercise 6.14). Taking $f \in C_b(\mathcal{Y})$ in (13.2) shows that $I(\nu) = \infty$ if $\nu_{\mathcal{Y}} \ne \kappa$. We have proved (13.4).<br />
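On finite sets the two steps in the proof of (13.4) can be verified directly. The sketch below uses a made-up example with two points in 𝒳 and three in 𝒴; the numbers in kappa, rho and nu_cond, and every function name, are hypothetical and serve only to illustrate the conditional entropy formula and the variational bound E^ν[f] − Λ(f) ≤ H(ν | Q).<br />

```python
import math

# Hypothetical finite example: X = {0, 1}, Y = {0, 1, 2}.
kappa = [0.2, 0.5, 0.3]                          # kappa on Y
rho = [[0.5, 0.5], [0.9, 0.1], [0.3, 0.7]]       # rho[y][x] = rho^y({x})
nu_cond = [[0.6, 0.4], [0.7, 0.3], [0.1, 0.9]]   # nu_cond[y][x] = nu^y({x}); nu_Y = kappa

def relative_entropy(mu, lam):
    """H(mu | lam) = sum of mu * log(mu / lam), for strictly positive lam on a finite set."""
    return sum(m * math.log(m / l) for m, l in zip(mu, lam) if m > 0)

def entropy_joint():
    """H(nu | Q) with nu(x, y) = nu^y(x) kappa(y) and Q(x, y) = rho^y(x) kappa(y)."""
    nu = [nu_cond[y][x] * kappa[y] for y in range(3) for x in range(2)]
    q = [rho[y][x] * kappa[y] for y in range(3) for x in range(2)]
    return relative_entropy(nu, q)

def entropy_disintegrated():
    """Integral of H(nu^y | rho^y) against kappa (the conditional entropy formula)."""
    return sum(kappa[y] * relative_entropy(nu_cond[y], rho[y]) for y in range(3))

def objective(f):
    """E^nu[f] - Lambda(f) for f given as f[y][x], with Lambda as in (13.1)."""
    mean_f = sum(kappa[y] * nu_cond[y][x] * f[y][x] for y in range(3) for x in range(2))
    lam = sum(
        kappa[y] * math.log(sum(rho[y][x] * math.exp(f[y][x]) for x in range(2)))
        for y in range(3)
    )
    return mean_f - lam

# In this finite setting f = log(d nu^y / d rho^y) attains the supremum in (13.2).
f_opt = [[math.log(nu_cond[y][x] / rho[y][x]) for x in range(2)] for y in range(3)]
print(entropy_joint(), entropy_disintegrated(), objective(f_opt))
```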

We prove (13.5) first for the case of compact $\mathcal{X}$ and $\mathcal{Y}$. By dominated convergence, $\Lambda$ is a strongly continuous function on the Banach space $C_b(\mathcal{X} \times \mathcal{Y})$, in other words, continuous when $C_b(\mathcal{X} \times \mathcal{Y})$ is equipped with the supremum norm. By Hölder’s inequality it is a convex function. It follows that $\Lambda$ is a weakly lower semicontinuous function on $C_b(\mathcal{X} \times \mathcal{Y})$. Here is the argument for this step. By continuity and convexity, the set $U = \{f : \Lambda(f) \le s\}$ is a strongly closed, convex set. By a separation theorem a function $g \in U^c$ can be strictly separated from $U$ with a linear functional (see the Hahn-Banach separation theorem on page 42 or Theorem 3.12 in [33]). Thus each $g \in U^c$ lies in the weak interior of $U^c$, which makes $U^c$ weakly open, and thereby $U$ is weakly closed. That sets of type $U$ are weakly closed is exactly the definition of weak lower semicontinuity.<br />

As a convex, weakly lower semicontinuous function $\Lambda$ is equal to its convex biconjugate. Let $g \in C_b(\mathcal{X})$. Below we can think of $g$ also as a function on $\mathcal{X} \times \mathcal{Y}$, by composing it with the projection $(x, y) \mapsto x$.<br />

$$\begin{aligned}
\Lambda(g) = \Lambda^{**}(g) = I^*(g)
&= \sup\{E^{\nu_{\mathcal{X}}}[g] - I(\nu) : \nu \in \mathcal{M}_1(\mathcal{X} \times \mathcal{Y})\} \\
&= \sup\bigl\{ E^\alpha[g] - \inf\{I(\nu) : \nu_{\mathcal{X}} = \alpha\} : \alpha \in \mathcal{M}_1(\mathcal{X}) \bigr\}.
\end{aligned}$$<br />



The third equality above hides a couple of steps. The dual of $C_b(\mathcal{X} \times \mathcal{Y})$ is the space $\mathcal{M}(\mathcal{X} \times \mathcal{Y})$ of finite signed Borel measures on $\mathcal{X} \times \mathcal{Y}$. (This is one of the Riesz representation theorems, see Section 7.3 in [18].) Definition (13.2) defines $I$ as a convex and lower semicontinuous function on the space $\mathcal{M}(\mathcal{X} \times \mathcal{Y})$. By the proof of Theorem 6.5, $I(\nu) = \infty$ unless $\nu$ is a probability measure. Thus the supremum in the definition of $I^*$ can be restricted to probability measures $\nu$ as done above.<br />

The function $\alpha \mapsto \inf\{I(\nu) : \nu_{\mathcal{X}} = \alpha\}$ is again convex and lower semicontinuous in the $C_b(\mathcal{X})$-generated weak$^*$ topology of $\mathcal{M}(\mathcal{X})$. Since $\mathcal{X}$ and $\mathcal{Y}$ are assumed compact, $I$ is automatically tight, and so lower semicontinuity follows from part (b) of the contraction principle (page 29). Thus taking convex conjugates once more gives<br />

$$J(\alpha) = \Lambda^*(\alpha) = \inf\{I(\nu) : \nu_{\mathcal{X}} = \alpha\}$$<br />

and completes the proof of (13.5) for compact $\mathcal{X}$ and $\mathcal{Y}$.<br />

To prove (13.5) for Polish $\mathcal{X}$ and $\mathcal{Y}$, we begin by observing that we can assume that $\mathcal{X}$ and $\mathcal{Y}$ are dense Borel subsets of compact metric spaces $\bar{\mathcal{X}}$ and $\bar{\mathcal{Y}}$. Here is the argument. By separability $\mathcal{X}$ has a totally bounded metric $\bar d$, and then the completion $(\bar{\mathcal{X}}, \bar d)$ of $(\mathcal{X}, \bar d)$ is compact (details can be found in Theorem 2.8.2 in [14] or Lemmas 6.1-6.3 in [30]). As a Polish space $\mathcal{X}$ also has a complete metric, in other words, $(\mathcal{X}, \bar d)$ is a topologically complete metric space. By Theorem 2.5.4 in [14] a metric space is topologically complete if and only if it is a countable intersection of open sets (i.e. a $G_\delta$ set) in its completion. In particular $\mathcal{X}$ is a Borel subset of $\bar{\mathcal{X}}$.<br />

Definition (13.1) of $\Lambda$ works just as well for functions $f$ on $\bar{\mathcal{X}} \times \bar{\mathcal{Y}}$, because we can think of $\rho^y$ and $\kappa$ as probability measures on $\bar{\mathcal{X}}$ and $\bar{\mathcal{Y}}$ that happen to satisfy $\kappa(\mathcal{Y}) = 1$ and $\rho^y(\mathcal{X}) = 1$ $\kappa$-almost surely. Let $\bar I$ and $\bar J$ denote the functions defined by (13.2) and (13.3) on $\mathcal{M}_1(\bar{\mathcal{X}} \times \bar{\mathcal{Y}})$ and $\mathcal{M}_1(\bar{\mathcal{X}})$, respectively, with the suprema now over $C_b(\bar{\mathcal{X}} \times \bar{\mathcal{Y}})$ and $C_b(\bar{\mathcal{X}})$. A probability measure $\alpha$ on $\mathcal{X}$ can be thought of as a measure on $\bar{\mathcal{X}}$, and hence the proof for compact spaces gives<br />

$$\bar J(\alpha) = \inf\{\bar I(\bar\nu) : \bar\nu \in \mathcal{M}_1(\bar{\mathcal{X}} \times \bar{\mathcal{Y}}),\ \bar\nu_{\bar{\mathcal{X}}} = \alpha\}. \tag{13.7}$$<br />

By Lemma B.1 the supremum in the definition (13.3) of $J(\alpha)$ can just as well be taken over $g \in U_{b,\bar d}(\mathcal{X})$, where $U_{b,\bar d}(\mathcal{X})$ is the space of bounded uniformly continuous functions on $(\mathcal{X}, \bar d)$. Functions in $U_{b,\bar d}(\mathcal{X})$ and $C_b(\bar{\mathcal{X}})$ correspond to each other bijectively via restriction and unique extension, hence $\bar J(\alpha) = J(\alpha)$. Since $\mathcal{X} \times \mathcal{Y}$ is dense in $\bar{\mathcal{X}} \times \bar{\mathcal{Y}}$ this same argument gives $\bar I(\nu) = I(\nu)$ for probability measures $\nu$ on $\mathcal{X} \times \mathcal{Y}$. Since $Q(\mathcal{X} \times \mathcal{Y}) = 1$ and $H(\bar\nu \,|\, Q)$ is finite only if $\bar\nu \ll Q$, (13.4) shows that $\bar I(\bar\nu) = \infty$ unless $\bar\nu(\mathcal{X} \times \mathcal{Y}) = 1$ too. These facts combine to imply that for $\alpha \in \mathcal{M}_1(\mathcal{X})$, (13.7) is the same as (13.5). □<br />

Now, we apply the above to a Markov kernel.<br />

Let $C_b^+(S)$ be the space of functions $f \in C_b(S)$ that are strictly positive and bounded away from 0. Let $p$ be a stochastic kernel from $S$ into $S$, in other words, a Markov chain transition probability kernel. Define<br />

$$J(\alpha) = \sup_{f \in C_b^+(S)} \int \log\frac{f}{pf}\, d\alpha. \tag{13.8}$$<br />

If $q(x)$ is a measurable $\mathcal{M}_1(S)$-valued function of $x \in S$ and $\alpha \in \mathcal{M}_1(S)$, then define the push forward probability measure on $S$ as<br />

$$\alpha q(A) = \int q(x, A)\, \alpha(dx).$$<br />

Theorem 13.2. Assume the state space $S$ is Polish and $p(x)$ is a measurable $\mathcal{M}_1(S)$-valued function of $x \in S$. Then<br />

$$J(\alpha) = \inf_{q\,:\,\alpha q = \alpha} H(\alpha \times q \,|\, \alpha \times p). \tag{13.9}$$<br />

Proof. Apply Theorem 13.1 with these choices: $\mathcal{X} = \mathcal{Y} = S$, $\rho^y = p(y)$ for $y \in S$, $\kappa = \alpha$. Then (13.4) and (13.5) together say<br />

$$J(\alpha) = \inf_\nu H(\nu \,|\, Q)$$<br />

with infimum over $\nu \in \mathcal{M}_1(S \times S)$ with both marginals equal to $\alpha$, and $Q = \alpha \times p$. (Think of the space $S \times S$ as $\mathcal{Y} \times \mathcal{X}$ rather than $\mathcal{X} \times \mathcal{Y}$.) If $\nu_{\mathcal{Y}} = \nu_{\mathcal{X}} = \alpha$, the conditional probability of $x$ under $\nu$, given $y$, defines a kernel $q$ that fixes $\alpha$. Conversely, any such kernel defines a $\nu$. □<br />
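One consequence of (13.9) is that J(α) = 0 when α is invariant for p, since q = p is then admissible and has zero relative entropy. The sketch below checks this against the variational formula (13.8) for a hypothetical two-state chain; the transition matrix, the grid search over f, and the function names are all assumptions made for this illustration.<br />

```python
import math

# Hypothetical two-state chain on S = {0, 1}; p[x][y] is the probability of a step x -> y.
p = [[0.9, 0.1], [0.4, 0.6]]
alpha_inv = [0.8, 0.2]        # invariant for p: alpha p = alpha

def apply_kernel(f):
    """(pf)(x) = sum over y of p(x, y) f(y)."""
    return [sum(p[x][y] * f[y] for y in range(2)) for x in range(2)]

def dv_functional(f, alpha):
    """Integral of log(f / pf) against alpha, the quantity maximized in (13.8)."""
    pf = apply_kernel(f)
    return sum(alpha[x] * math.log(f[x] / pf[x]) for x in range(2))

def sup_over_grid(alpha, steps=400):
    """Crude maximization over f = (1, e^u); scaling f by a constant leaves log(f / pf) unchanged."""
    best = float("-inf")
    for i in range(steps + 1):
        u = -10.0 + 20.0 * i / steps
        best = max(best, dv_functional([1.0, math.exp(u)], alpha))
    return best

print(sup_over_grid(alpha_inv))      # approximately 0 for the invariant measure
print(sup_over_grid([0.5, 0.5]))     # strictly positive for a non-invariant measure
```

Restricting to f = (1, e^u) loses nothing on a two-point space because the functional depends on f only through the ratio f(1)/f(0).<br />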

13.2. Large deviations<br />

MORE TO COME...


Chapter 14. Convexity criterion for large deviations<br />

Let $\mathcal{X}$ and $L$ be two real vector spaces in duality, endowed with their weak topologies as described in the beginning of Chapter 5. Assumption 5.1 is in force so that spaces $\mathcal{X}$ and $L$ are Hausdorff spaces. In this section we prove an abstract large deviation theorem for a sequence of Borel probability distributions $\{\mu_n\}_{n\in\mathbb{N}}$ on a convex, compact subset $\mathcal{X}_0$ of $\mathcal{X}$. The concrete example to keep in mind is the one where $\mathcal{X} = \mathcal{M}(S)$ is the space of real-valued Borel measures on a compact metric space $S$, $L = C(S) = C_b(S)$ and $\mathcal{X}_0 = \mathcal{M}_1(S)$ is the space of Borel probability measures on $S$. When we apply this result we will be able to remove the compactness assumption with a compactification, with the aid of exponential tightness.<br />

Let $0 < r_n \nearrow \infty$ be a normalizing sequence. For $\varphi \in L$ define the (upper) pressure by<br />

$$\bar p(\varphi) = \limsup_{n\to\infty} \frac{1}{r_n} \log \int_{\mathcal{X}_0} e^{r_n\langle x, \varphi\rangle}\, \mu_n(dx). \tag{14.1}$$<br />

This function is finite because each $\varphi$ is bounded on the compact space $\mathcal{X}_0$. By Jensen’s inequality $\bar p$ is convex. Let $J : \mathcal{X} \to [0, \infty]$ denote the convex conjugate of $\bar p$:<br />

$$J(x) = \sup_{\varphi \in L} \{\langle x, \varphi\rangle - \bar p(\varphi)\}, \qquad x \in \mathcal{X}. \tag{14.2}$$<br />

If the limit in (14.1) exists we drop the bar and write<br />

$$p(\varphi) = \lim_{n\to\infty} \frac{1}{r_n} \log \int_{\mathcal{X}_0} e^{r_n\langle x, \varphi\rangle}\, \mu_n(dx). \tag{14.3}$$<br />
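To make (14.2) and (14.3) concrete, take the simplest case 𝒳 = L = ℝ with ⟨x, ϕ⟩ = xϕ, 𝒳₀ = [0, 1], r_n = n, and μ_n the law of the average of n fair coin flips. Then the integral in (14.3) equals ((1 + e^ϕ)/2)^n for every n, so p(ϕ) = log((1 + e^ϕ)/2). The sketch below computes the conjugate J numerically and compares it with the familiar Cramér rate function; this example and the function names are assumptions for illustration and are not taken from the text.<br />

```python
import math

def pressure(phi: float) -> float:
    """p(phi) = log((1 + e^phi) / 2) for averages of fair coin flips with r_n = n."""
    # numerically stable form of log(1 + e^phi) - log 2
    return max(phi, 0.0) + math.log1p(math.exp(-abs(phi))) - math.log(2.0)

def conjugate(x: float, lo: float = -40.0, hi: float = 40.0, iters: int = 200) -> float:
    """J(x) = sup over phi of {x * phi - p(phi)} by ternary search (the objective is concave)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if x * m1 - pressure(m1) < x * m2 - pressure(m2):
            lo = m1
        else:
            hi = m2
    phi = 0.5 * (lo + hi)
    return x * phi - pressure(phi)

def cramer_rate(x: float) -> float:
    """Closed form x log(2x) + (1 - x) log(2(1 - x)), valid for 0 < x < 1."""
    return x * math.log(2.0 * x) + (1.0 - x) * math.log(2.0 * (1.0 - x))

for x in (0.1, 0.3, 0.5, 0.9):
    print(x, conjugate(x), cramer_rate(x))
```

The search interval [−40, 40] is an arbitrary choice that contains the maximizer log(x/(1 − x)) for the values of x used here.<br />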


For $x \in \mathcal{X}$ define upper and lower local rate functions by<br />

$$\bar\kappa(x) = -\inf_{x \in G \subset \mathcal{X},\ G \text{ open}}\ \liminf_{n\to\infty} \frac{1}{r_n} \log \mu_n(G) \tag{14.4}$$<br />

and<br />

$$\underline\kappa(x) = -\inf_{x \in G \subset \mathcal{X},\ G \text{ open}}\ \limsup_{n\to\infty} \frac{1}{r_n} \log \mu_n(G). \tag{14.5}$$<br />

Since the measures $\mu_n$ are supported on $\mathcal{X}_0$, for $x \in \mathcal{X} \setminus \mathcal{X}_0$ we have $\underline\kappa(x) = \bar\kappa(x) = \infty$. The functions $\underline\kappa$ and $\bar\kappa$ are $[0, \infty]$-valued and lower semicontinuous. By Exercise 3.5 and the compactness of the space $\mathcal{X}_0$, if $\underline\kappa = \bar\kappa = \kappa$ then the LDP holds with rate function $\kappa$. The main theorem of this section asserts that this is the case provided the pressure (14.3) exists and $\bar\kappa$ is convex.<br />

Baxter-Jain theorem. [5] Let $\{\mu_n\}$ be a sequence of Borel probability measures on the compact convex subset $\mathcal{X}_0$ of $\mathcal{X}$. Assume that the limit $p(\varphi)$ in (14.3) exists for all $\varphi \in L$ and that $\bar\kappa$ is convex. Then the LDP holds for $\{\mu_n\}$ with rate function $J = \underline\kappa = \bar\kappa$ and normalization $\{r_n\}$.<br />

We proceed in the proof without the two assumptions, existence of (14.3) and convexity of $\bar\kappa$, as far as possible. When these two assumptions are used, they are stated explicitly in the hypotheses of the lemma.<br />

Lemma 14.1. $J \equiv \infty$ on $\mathcal{X} \setminus \mathcal{X}_0$. For all $x \in \mathcal{X}_0$<br />

$$J(x) = \underline\kappa^{**}(x) \le \underline\kappa(x) \le \bar\kappa(x). \tag{14.6}$$<br />

The set $\{\underline\kappa = 0\}$ is not empty. Consequently $J$ is a lower semicontinuous, proper convex function.<br />

Proof. Hahn-Banach separation implies that $J \equiv \infty$ on $\mathcal{X} \setminus \mathcal{X}_0$. This is left as Exercise 14.7 at the end of the section.<br />

In (14.6) the last inequality is evident from definitions (14.4)–(14.5). The middle inequality is part (b) of Proposition 5.12.<br />

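We claim that<br />

$$\bar p(\varphi) \ge \langle x, \varphi\rangle - \underline\kappa(x) \qquad \forall x \in \mathcal{X},\ \forall \varphi \in L. \tag{14.7}$$<br />

Given $x$ and $\varphi$, and $c < \langle x, \varphi\rangle$, let $G = \{y \in \mathcal{X} : \langle y, \varphi\rangle > c\}$. Then<br />

$$\int_{\mathcal{X}_0} e^{r_n\langle y, \varphi\rangle}\, \mu_n(dy) \ge \int_G e^{r_n\langle y, \varphi\rangle}\, \mu_n(dy) \ge e^{r_nc} \mu_n(G)$$<br />

from which<br />

$$\bar p(\varphi) \ge c + \limsup_{n\to\infty} \frac{1}{r_n} \log \mu_n(G) \ge c - \underline\kappa(x).$$<br />

Letting $c \nearrow \langle x, \varphi\rangle$ verifies (14.7).<br />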



Taking supremum over $x$ on the right in (14.7) gives $\bar p \ge \underline\kappa^*$, and another round of convex conjugation gives<br />

$$J(x) \le \underline\kappa^{**}(x). \tag{14.8}$$<br />

Next we claim that<br />

$$\bar p(\varphi) \le \underline\kappa^*(\varphi) = \sup_{x \in \mathcal{X}_0} \{\langle x, \varphi\rangle - \underline\kappa(x)\}. \tag{14.9}$$<br />

The equality above is simply the definition. The inequality completes the proof of (14.6) because it implies $J(x) \ge \underline\kappa^{**}(x)$. The argument for (14.9) will be needed shortly again so we separate it into the next lemma.<br />

The claim that $\{\underline\kappa = 0\} \ne \emptyset$ is left as Exercise 14.8. Thus $J$ is not identically infinite. $J$ is convex and lower semicontinuous by its definition (14.2). □<br />

Lemma 14.2. Let $g : \mathcal{X}_0 \to [-\infty, \infty)$ be upper semicontinuous and extend the use of the notation $\bar p(g)$ to such functions by defining<br />

$$\bar p(g) = \limsup_{n\to\infty} \frac{1}{r_n} \log \int_{\mathcal{X}_0} e^{r_ng(x)}\, \mu_n(dx). \tag{14.10}$$<br />

Then<br />

$$\bar p(g) \le \sup_{x \in \mathcal{X}_0} \{g(x) - \underline\kappa(x)\}. \tag{14.11}$$<br />

Proof. Denote the right-hand side of (14.11) by $A(g)$. We can assume $A(g) < \infty$. Let $a > A(g)$ and $\varepsilon > 0$ such that $a - \varepsilon > A(g)$. (We have not ruled out $A(g) = -\infty$ but $a$ is real.) We claim that each $x \in \mathcal{X}_0$ has an open neighborhood $G_x$ such that for large enough $n$<br />

$$\int_{G_x} e^{r_ng}\, d\mu_n \le e^{r_na}. \tag{14.12}$$<br />

This suffices for the proof because by compactness we can cover $\mathcal{X}_0$ with a finite collection $G_{x_1}, \ldots, G_{x_m}$ of these neighborhoods. Then for large enough $n$,<br />

$$\int_{\mathcal{X}_0} e^{r_ng}\, d\mu_n \le \sum_{i=1}^m \int_{G_{x_i}} e^{r_ng}\, d\mu_n \le m e^{r_na}.$$<br />

This implies $\bar p(g) \le a$, and letting $a \searrow A(g)$ verifies (14.11).<br />

Now to show (14.12). If $g(x) = -\infty$ take $G_x = \{g < a\}$ which is open by upper semicontinuity. If $-\infty < g(x) < \infty$ [note that $g(x) = \infty$ is not allowed] then pick $G_x$ so that $g(y) \le g(x) + \varepsilon/4$ for $y \in G_x$. Since<br />

$$-\underline\kappa(x) \le A(g) - g(x) < a - \varepsilon - g(x),$$<br />

looking at the definition of $\underline\kappa(x)$ we can shrink $G_x$ to ensure that<br />

$$\limsup_{n\to\infty} \frac{1}{r_n} \log \mu_n(G_x) \le a - \varepsilon/2 - g(x).$$<br />

Consequently, for large enough $n$<br />

$$\mu_n(G_x) \le \exp\{r_n(a - \varepsilon/4 - g(x))\}$$<br />

and (14.12) follows. □<br />

Combined with the observation that $\underline\kappa = \bar\kappa$ implies the LDP, Lemma 14.1 tells us that $J = \bar\kappa$ implies the LDP with rate function $J$. Our goal will now be to show that if the pressure exists and $\bar\kappa$ is convex, $J = \bar\kappa$. The next definition is the key.<br />

Definition 14.3. Let $f : \mathcal{X}_0 \to (-\infty, \infty]$ be a lower semicontinuous convex function. A point $z \in \mathcal{X}_0$ is a regular point of $f$ if, for any $\delta > 0$ and any open set $G \ni z$, there exist $\psi \in L$ and an open set $U$ such that $z \in U \subset G$ and these inequalities hold:<br />

$$\inf_{x \in \mathcal{X}_0 \setminus U} \{f(x) - \langle x, \psi\rangle\} > f(z) - \langle z, \psi\rangle \tag{14.13}$$<br />

and<br />

$$|\langle x, \psi\rangle - \langle z, \psi\rangle| < \delta \quad \text{for } x \in U. \tag{14.14}$$<br />

Exercise 14.4. Let $f$ be a finite convex function on $(a, b)$. Show that $z \in (a, b)$ is regular if and only if $z$ is not an interior point of an interval $(c, d)$ on which $f$ is of the form $f(x) = \alpha x + \beta$.<br />

The strict inequality in (14.13) is important. We illustrate the usefulness of this notion by showing that $J(z) = \bar\kappa(z)$ at regular points $z$ of $J$. Now we need to assume that the limit (14.3) defining the pressure actually exists.<br />

Lemma 14.5. Assume that the limit in (14.3) exists for all $\varphi \in L$. Then $J(z) = \bar\kappa(z)$ at each regular point $z$ of $J$.<br />

Proof. Fix a regular point $z$, pick $\delta > 0$ and a neighborhood $G$ of $z$, and let $\psi \in L$ and neighborhood $U$ be given by Definition 14.3. Define the upper semicontinuous $g : \mathcal{X} \to [-\infty, \infty)$ by<br />

$$g(x) = \langle x, \psi\rangle \cdot \mathbf{1}_{U^c}(x) - \infty \cdot \mathbf{1}_U(x).$$<br />

By Exercise 14.9 at the end of this section,<br />

$$p(\psi) \le \Bigl( \liminf_{n\to\infty} \frac{1}{r_n} \log \int_U e^{r_n\langle x, \psi\rangle}\, \mu_n(dx) \Bigr) \vee \bar p(g). \tag{14.15}$$<br />

U



From the definition (14.2),<br />

$$p(\psi) \ge \langle z, \psi\rangle - J(z). \tag{14.16}$$<br />

On the other hand, combining Lemma 14.2, the definition of $g$, inequality (14.6), and property (14.13) of regularity gives<br />

$$\bar p(g) \le \sup_{x \in U^c} \{\langle x, \psi\rangle - \underline\kappa(x)\} \le \sup_{x \in U^c} \{\langle x, \psi\rangle - J(x)\} < \langle z, \psi\rangle - J(z).$$<br />

Consequently the second member of the maximum in (14.15) is irrelevant. Starting with (14.16) and using the second property (14.14) of regularity,<br />

$$\begin{aligned}
\langle z, \psi\rangle - J(z) \le p(\psi) &= \liminf_{n\to\infty} \frac{1}{r_n} \log \int_U e^{r_n\langle x, \psi\rangle}\, \mu_n(dx) \\
&\le \langle z, \psi\rangle + \delta + \liminf_{n\to\infty} \frac{1}{r_n} \log \mu_n(U),
\end{aligned}$$<br />

from which, since $U \subset G$,<br />

$$\liminf_{n\to\infty} \frac{1}{r_n} \log \mu_n(G) \ge -J(z) - \delta.$$<br />

Since $G$ is an arbitrary neighborhood of $z$ and $\delta > 0$ is arbitrary, $-\bar\kappa(z) \ge -J(z)$ follows. □<br />

We can now state the main technical result of this section which shows that regular points are plentiful. Recall that a point $z$ is an extreme point of a convex set $K$ if $z \in K$ and it has this property: if $z = sx + (1-s)y$ for some $x, y \in K$ and $0 < s < 1$, then $x = y = z$. In other words, $z$ cannot lie on a line segment inside $K$ unless it is an endpoint. The set of extreme points of $K$ is denoted by $\operatorname{ex}(K)$.<br />

Theorem 14.6. Let $f : \mathcal{X}_0 \to (-\infty, \infty]$ be a lower semicontinuous convex function. Let $\varphi \in L$, $c \in \mathbb{R}$, and define the affine function $g(x) = \langle x, \varphi\rangle + c$. Assume that $g \le f$ on $\mathcal{X}_0$, and that the convex set $A = \{x \in \mathcal{X}_0 : f(x) = g(x)\}$ is nonempty. Then every extreme point of $A$ is a regular point of $f$.<br />

$A$ is a closed subset of the compact space $\mathcal{X}_0$, and hence it is a compact convex set. By the Krein-Milman theorem [33, Theorem 3.23] a compact convex set is the closed convex hull of its extreme points. Thus $A$ has extreme points, and consequently $f$ has regular points. Before proving Theorem 14.6 let us see how it completes the LDP.<br />

Proof of Baxter-Jain’s theorem. By its definition $\bar\kappa$ is lower semicontinuous, and the assumption is that it is also convex. Hence $\bar\kappa = \bar\kappa^{**}$ and by Exercise 5.7 for $z \in \mathcal{X}$,<br />

$$\bar\kappa(z) = \sup\{\langle z,\varphi\rangle + c : \varphi\in\mathcal{L},\ c\in\mathbb{R},\ \langle x,\varphi\rangle + c \le \bar\kappa(x)\ \ \forall x\in\mathcal{X}\}.$$<br />

Suppose $\langle x,\varphi\rangle + c \le \bar\kappa(x)$ for all $x \in \mathcal{X}$. We claim that then also<br />

$$\langle x,\varphi\rangle + c \le J(x) \quad\text{for all } x\in\mathcal{X}. \tag{14.17}$$<br />

This implies that $\bar\kappa \le J$, and thereby $\bar\kappa = J$ since the opposite inequality was in Lemma 14.1. Then we have $J = \underline\kappa = \bar\kappa$ and as already pointed out, this is enough for the LDP with rate $J$.<br />

Inequality (14.17) needs to be verified only on $\mathcal{X}_0$ because $J \equiv \infty$ on $\mathcal{X}\setminus\mathcal{X}_0$. Let $c_1$ be the (finite) infimum of the lower semicontinuous function $J(x) - \langle x,\varphi\rangle$ on the compact set $\mathcal{X}_0$. Since the sets $\{x : J(x) - \langle x,\varphi\rangle \le c_1 + m^{-1}\}$ are a nested family of nonempty compact sets, their intersection $A = \{x : J(x) - \langle x,\varphi\rangle = c_1\}$ is not empty. Let $g(x) = \langle x,\varphi\rangle + c_1$. We are exactly in the situation of Theorem 14.6 and so $\operatorname{ex}(A)$ consists of regular points of $J$. By Lemma 14.5, $J(z) = \bar\kappa(z)$ for all $z \in \operatorname{ex}(A)$.<br />

To summarize, we have now for $z \in \operatorname{ex}(A)$ that<br />

$$\langle z,\varphi\rangle + c \le \bar\kappa(z) = J(z) = \langle z,\varphi\rangle + c_1.$$<br />

Consequently $c \le c_1$, and by the definition $c_1 = \inf_x (J(x) - \langle x,\varphi\rangle)$, (14.17) follows. $\square$<br />

As the last item of this section we prove Theorem 14.6.<br />

Proof of Theorem 14.6. Let $\varphi \in \mathcal{L}$, $g = \varphi + c$ and $A = \{f = g\}$ as given in the statement of the theorem. Let $z \in \operatorname{ex}(A)$, and let $\delta > 0$ and an open neighborhood $G$ of $z$ be given. We need to produce $\psi \in \mathcal{L}$ and a neighborhood $U$ of $z$ such that (14.13)–(14.14) are satisfied.<br />

Pick an open neighborhood $U \subset G$ of $z$ such that<br />

$$|g(x) - g(z)| < \delta/2 \quad\text{for } x\in U.$$<br />

Let $K$ be the closed convex hull of the compact set $A \setminus U$. As a subset of the compact space $\mathcal{X}_0$, $K$ is compact. By Milman’s theorem [33, Theorem 3.25], $\operatorname{ex}(K) \subset A \setminus U$. Since $A$ is compact and convex, $K \subset A$. It follows that $z \notin K$. (If $z \in K$ then $z$ must be extreme in $K$ since it is extreme in the larger set $A$. But $z \in U$ and $\operatorname{ex}(K) \subset A \setminus U$.) By the Hahn-Banach separation theorem there exists $\lambda \in \mathcal{L}$ such that<br />

$$\sup_{x\in K}\langle x,\lambda\rangle < \langle z,\lambda\rangle.$$<br />

By continuity, we can find an open set $V \supset K$ and $\eta > 0$ such that<br />

$$\sup_{x\in V}\langle x,\lambda\rangle < \langle z,\lambda\rangle - \eta. \tag{14.18}$$<br />

Since $U \cup V$ is an open set and contains $A$, by compactness there exists $\varepsilon > 0$ such that $f - g \ge \varepsilon$ on $\mathcal{X}_0 \setminus (U \cup V)$. Again by compactness we can fix $a > 0$ small enough so that<br />

$$a\,|\langle x - z,\lambda\rangle| \le \tfrac12(\varepsilon\wedge\delta) \quad\text{for all } x\in\mathcal{X}_0.$$<br />

Now set<br />

$$\psi = \varphi + a\lambda.$$<br />

We check that (14.13)–(14.14) are satisfied. For (14.14), recalling that $g = \varphi + c$, for $x \in U$<br />

$$|\langle x,\psi\rangle - \langle z,\psi\rangle| \le |g(x) - g(z)| + a\,|\langle x - z,\lambda\rangle| < \delta.$$<br />

Condition (14.13) is checked in two parts. For $x \in \mathcal{X}_0 \setminus (U \cup V)$, recalling that $f(z) = g(z)$,<br />

$$f(x) - g(x) \ge \varepsilon \ge a\langle x - z,\lambda\rangle + \varepsilon/2 + f(z) - g(z).$$<br />

This rearranges to<br />

$$f(x) - \langle x,\psi\rangle \ge f(z) - \langle z,\psi\rangle + \varepsilon/2.$$<br />

For $x \in V$, by (14.18) and $f \ge g$,<br />

$$f(x) - g(x) - a\langle x,\lambda\rangle \ge f(z) - g(z) - a\langle z,\lambda\rangle + a\eta$$<br />

and thus<br />

$$f(x) - \langle x,\psi\rangle \ge f(z) - \langle z,\psi\rangle + a\eta.$$<br />

This completes the proof of Theorem 14.6. $\square$<br />

∗ Exercise 14.7. Use Hahn-Banach separation to show that $J \equiv \infty$ on $\mathcal{X}\setminus\mathcal{X}_0$.<br />

∗ Exercise 14.8. Show that $\underline\kappa$ is a rate function that satisfies the upper large deviation bound and the set $\{\underline\kappa = 0\}$ is not empty.<br />

∗ Exercise 14.9. For $a_n, b_n \ge 0$,<br />

$$\varlimsup_{n\to\infty}\frac{1}{r_n}\log(a_n + b_n) \le \Bigl(\varlimsup_{n\to\infty}\frac{1}{r_n}\log a_n\Bigr) \vee \Bigl(\varlimsup_{n\to\infty}\frac{1}{r_n}\log b_n\Bigr).$$<br />
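The inequality of Exercise 14.9 can be illustrated numerically. The following is only a sketch under assumed data, not part of the text: it takes $r_n = n$ and the hypothetical sequences $a_n = e^{-n}$ and $b_n = e^{-1.5n}$, for which the right-hand side equals $(-1) \vee (-1.5) = -1$. The function name `scaled_log_sum` is introduced here for illustration.

```python
import math

def scaled_log_sum(a_rate, b_rate, n):
    """Return (1/n) * log(exp(-a_rate*n) + exp(-b_rate*n)), computed stably."""
    x, y = -a_rate * n, -b_rate * n
    m = max(x, y)  # factor out the dominant exponential to avoid underflow
    return (m + math.log(math.exp(x - m) + math.exp(y - m))) / n

# Assumed example: a_n = exp(-n), b_n = exp(-1.5 n), normalization r_n = n.
values = [scaled_log_sum(1.0, 1.5, n) for n in (10, 100, 1000)]
print(values)  # the values approach max(-1.0, -1.5) = -1.0 from above
```

The smaller exponential contributes only the correction $n^{-1}\log(1 + e^{-0.5n})$, which vanishes in the limit, so only the larger of the two rates survives.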

Exercise 14.10. Reprove Sanov’s theorem with the Baxter-Jain Theorem. Can you do the same for the process-level LDP for i.i.d. random fields?


Chapter 15<br />

Nonstationary independent variables<br />

15.1. Generalization of relative entropy and Sanov’s theorem<br />

In this chapter we generalize Sanov’s theorem to a sequence of independent but not identically distributed random variables. As in Section 6.2 the state space of the random variables $\{X_k\}$ is a Polish space $S$. We assume the random variables are defined as coordinate variables on the sequence space $\Omega = S^{\mathbb{N}}$. In this section we need to discuss even more frequently than before probability measures on the space $\mathcal{M}_1(S)$ of probability measures on $S$, so let us abbreviate $\mathcal{M}_1 = \mathcal{M}_1(S)$ and $\mathcal{M}_1(\mathcal{M}_1)$ will denote $\mathcal{M}_1(\mathcal{M}_1(S))$. As throughout, $\mathcal{M}_1$ is given the weak topology generated by $C_b(S)$. Generic elements of $\mathcal{M}_1$, that is, probability measures on $S$, will be denoted by $\alpha$, $\beta$ and $\gamma$, while $\kappa$ will be a probability measure on $\mathcal{M}_1$, and $\mu$ and $\nu$ are probability measures on the product space $S \times \mathcal{M}_1$.<br />

Let $\lambda = \{\lambda_k : k \ge 1\}$ be a sequence of probability measures on $S$, that is, an element of $\mathcal{M}_1^{\mathbb{N}}$. Let $P^\lambda = \bigotimes_{k\ge1}\lambda_k$ denote the product probability measure on $\Omega$ with marginals $(\lambda_k)$, uniquely defined by the requirement that<br />

$$P^\lambda\{X_1\in B_1, X_2\in B_2, \dots, X_n\in B_n\} = \prod_{k=1}^n \lambda_k(B_k)$$<br />

for Borel subsets $B_k \subset S$. Our interest lies in the large deviations of the empirical measure<br />

$$L_n = \frac1n\sum_{k=1}^n \delta_{X_k}$$<br />

under $P^\lambda$.<br />

We begin by developing the relevant generalization of relative entropy. Assume given a probability measure $\Psi$ on $\mathcal{M}_1$. Define a joint distribution $\mu$ on $S \times \mathcal{M}_1$ by<br />

$$\mu(dx, d\alpha) = \alpha(dx)\,\Psi(d\alpha). \tag{15.1}$$<br />

The $S$-marginal of $\mu$ can be regarded as the mean of $\Psi$:<br />

$$\mu_S(B) = \int_{\mathcal{M}_1}\alpha(B)\,\Psi(d\alpha) \quad\text{for Borel subsets } B\subset S. \tag{15.2}$$<br />

Recall Definition 6.2 of relative entropy $H$.<br />

Definition 15.1. Given $\Psi \in \mathcal{M}_1(\mathcal{M}_1)$ and $\mu$ defined by (15.1), define the following entropy for $\gamma \in \mathcal{M}_1$:<br />

$$K(\gamma\,|\,\Psi) = \inf_\nu H(\nu\,|\,\mu) \tag{15.3}$$<br />

where the infimum is over probability measures $\nu$ on $S \times \mathcal{M}_1$ with marginals $\gamma$ and $\Psi$.<br />

Here is the duality and other basic properties of $K$.<br />

Theorem 15.2. (a) For $\gamma \in \mathcal{M}_1$<br />

$$K(\gamma\,|\,\Psi) = \sup_{f\in C_b(S)}\Bigl\{E^\gamma[f] - \int_{\mathcal{M}_1}\log E^\alpha[e^f]\,\Psi(d\alpha)\Bigr\}. \tag{15.4}$$<br />

(b) $K(\gamma\,|\,\Psi) \ge H(\gamma\,|\,\mu_S)$ for all $\gamma \in \mathcal{M}_1$. $K(\gamma\,|\,\Psi) = H(\gamma\,|\,\mu_S)$ for all $\gamma \in \mathcal{M}_1$ if and only if $\Psi$ is a pointmass.<br />

(c) $K$ is a convex, tight rate function on $\mathcal{M}_1$ and $K(\gamma\,|\,\Psi) = 0$ if and only if $\gamma = \mu_S$.<br />
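Parts (b) and (c) can be checked numerically in the simplest nontrivial case. The sketch below is an illustration under assumptions that are not in the text: $S = \{0, 1\}$, $\Psi = \tfrac12\delta_\alpha + \tfrac12\delta_\beta$ with hypothetical $\alpha$ and $\beta$, and the supremum in (15.4) approximated by a ternary search. The helper names `relative_entropy` and `k_entropy` are introduced here. Since the expression in (15.4) does not change when a constant is added to $f$, it suffices to take $f = (t, 0)$, and the objective is concave in $t$.

```python
import math

def relative_entropy(gamma, mu):
    """H(gamma | mu) on a finite set; infinite unless gamma << mu."""
    total = 0.0
    for g, m in zip(gamma, mu):
        if g > 0.0:
            if m == 0.0:
                return math.inf
            total += g * math.log(g / m)
    return total

def k_entropy(gamma, psi, lo=-40.0, hi=40.0, iters=200):
    """Approximate K(gamma | Psi) on S = {0, 1} from the dual formula (15.4).

    psi is a list of (weight, alpha) pairs with alpha = (alpha({0}), alpha({1})).
    """
    def objective(t):
        # E^gamma[f] - integral of log E^alpha[e^f] against Psi, for f = (t, 0)
        value = gamma[0] * t
        for weight, alpha in psi:
            value -= weight * math.log(alpha[0] * math.exp(t) + alpha[1])
        return value

    # The objective is concave in t, so ternary search locates its maximum.
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))

# Assumed example data (not from the text).
alpha, beta = (0.9, 0.1), (0.2, 0.8)
psi = [(0.5, alpha), (0.5, beta)]
mu_S = tuple(0.5 * a + 0.5 * b for a, b in zip(alpha, beta))  # mean of Psi
gamma = (0.3, 0.7)

K = k_entropy(gamma, psi)
H = relative_entropy(gamma, mu_S)
K_at_mean = k_entropy(mu_S, psi)              # part (c): should vanish
K_pointmass = k_entropy(gamma, [(1.0, mu_S)])  # part (b): should equal H
print(K, H, K_at_mean, K_pointmass)
```

In this example $K(\gamma\,|\,\Psi)$ comes out strictly larger than $H(\gamma\,|\,\mu_S)$, as part (b) predicts for a $\Psi$ that is not a pointmass, while replacing $\Psi$ by the pointmass at $\mu_S$ recovers the relative entropy.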

Proof. (a) Apply Theorem 13.1 with the choices $\mathcal{X} = S$, $\mathcal{Y} = \mathcal{M}_1$, and $\rho^\alpha = \alpha$ for $\alpha \in \mathcal{M}_1$.<br />

(b) For the inequality apply Jensen’s inequality inside the supremum in (15.4), as in (13.6). If $\Psi$ is a pointmass then $K(\gamma\,|\,\Psi) = H(\gamma\,|\,\mu_S)$ follows from (15.4).<br />

Suppose now that $K(\gamma\,|\,\Psi) = H(\gamma\,|\,\mu_S)$ for all $\gamma \in \mathcal{M}_1$. Let $f$ be a bounded Borel function on $S$ that satisfies $\int e^f\,d\mu_S = 1$. By Jensen’s inequality<br />

$$\int_{\mathcal{M}_1}\log E^\alpha[e^f]\,\Psi(d\alpha) \le \log\int_{\mathcal{M}_1}E^\alpha[e^f]\,\Psi(d\alpha) = 0. \tag{15.5}$$<br />

Define the probability measure $\gamma$ by $d\gamma = e^f\,d\mu_S$. Then<br />

$$E^\gamma[f] = H(\gamma\,|\,\mu_S) = K(\gamma\,|\,\Psi) \ge E^\gamma[f] - \int_{\mathcal{M}_1}\log E^\alpha[e^f]\,\Psi(d\alpha)$$<br />

from which<br />

$$\int_{\mathcal{M}_1}\log E^\alpha[e^f]\,\Psi(d\alpha) \ge 0.$$<br />

Thus we have equality in the Jensen inequality in (15.5). Then by the strict concavity of $\log$, the random variable $\alpha \mapsto E^\alpha[e^f]$ must be degenerate, in other words $E^\alpha[e^f] = 1 = E^{\mu_S}[e^f]$ for $\Psi$-a.e. $\alpha$. We can find a countable family $\mathcal{F}$ of bounded functions such that $\{e^f : f \in \mathcal{F}\}$ separates measures, and so this is enough for concluding that $\Psi$-a.e. $\alpha$ equals $\mu_S$.<br />

(c) Convexity is evident from either (15.3) or (15.4). $K(\mu_S\,|\,\Psi) = 0$ follows from observing that for $\gamma = \mu_S$, the right-hand side of (15.4) is $\le 0$ for all $f$, by Jensen’s inequality. The rest follows from $K(\gamma\,|\,\Psi) \ge H(\gamma\,|\,\mu_S)$. $\square$<br />

Definition 15.3. A sequence $\lambda = \{\lambda_k : k \ge 1\} \in \mathcal{M}_1^{\mathbb{N}}$ is called regular if the limit<br />

$$\Psi(\lambda) = \lim_{n\to\infty}\frac1n\sum_{k=1}^n\delta_{\lambda_k} \tag{15.6}$$<br />

exists in the weak topology of $\mathcal{M}_1(\mathcal{M}_1)$.<br />

The main theorem of this section asserts that regularity is necessary and sufficient for the LDP.<br />

Theorem 15.4. The distributions $P^\lambda\{L_n \in \cdot\,\}$ satisfy an LDP with normalization $\{n\}$ if and only if $\lambda$ is regular. In this case the rate function is $K(\,\cdot\,|\,\Psi(\lambda))$.<br />

This LDP appeared in [4] and was generalized and further elucidated in [35] and [36]. In particular, process level versions exist. Applications to statistical mechanics, in the spirit of our Sections 4.3 and 6.3 and Chapter 9, appear in [37] and [38].<br />
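To make Definition 15.3 concrete, here is a small sketch with assumed data that is not in the text. Suppose every $\lambda_k$ is one of two distinct measures $\alpha$ and $\beta$, encoded by the labels 0 and 1. Then $\frac1n\sum_{k\le n}\delta_{\lambda_k}$ is determined by the fraction of indices $k \le n$ with label 0, and the limit (15.6) exists exactly when that fraction converges. The function name `fraction_alpha` and both label sequences are hypothetical examples.

```python
def fraction_alpha(labels):
    """Running fraction of indices k <= n with label 0 (mass placed on alpha)."""
    count, fractions = 0, []
    for n, label in enumerate(labels, start=1):
        count += 1 if label == 0 else 0
        fractions.append(count / n)
    return fractions

N = 2**12 - 1

# Regular example: the labels alternate, so the fraction tends to 1/2.
periodic = fraction_alpha([k % 2 for k in range(1, N + 1)])

# Non-regular example: the label is constant on each dyadic block
# [2^j, 2^(j+1)) and flips with j, so the fraction keeps oscillating.
dyadic = fraction_alpha([(k.bit_length() - 1) % 2 for k in range(1, N + 1)])

print(periodic[-1])                          # close to 1/2
print(dyadic[2**11 - 2], dyadic[2**12 - 2])  # close to 2/3 and 1/3
```

In the dyadic example the fraction at $n = 2^J - 1$ is exactly $1/3$ for even $J$ and tends to $2/3$ along odd $J$, so the Cesàro averages have two distinct limit points and the sequence is not regular. By Theorem 15.4 no LDP with normalization $\{n\}$ can hold for such a $\lambda$.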

15.2. Proof of the large deviation principle<br />

We begin with a small generalization of the LDP proved in Chapter 14. As in that chapter, let $\mathcal{X}$ and $\mathcal{L}$ be two real vector spaces in duality, endowed with their weak topologies, assumed Hausdorff, and let $\mathcal{X}_0$ be a convex, compact subset of $\mathcal{X}$. Let $\mathcal{I}$ be an arbitrary index set, and for each $i \in \mathcal{I}$ let $\{\mu_n^{(i)} : n \in \mathbb{N}\}$ be a sequence of Borel probability measures on $\mathcal{X}_0$. Let $0 < r_n \nearrow \infty$ be a normalizing sequence.<br />

For $x \in \mathcal{X}$ define the upper and lower local rate functions by<br />

$$\bar\kappa^{(i)}(x) = -\inf_{x\in G\subset\mathcal{X},\ G \text{ open}}\ \varliminf_{n\to\infty}\frac{1}{r_n}\log\mu_n^{(i)}(G) \tag{15.7}$$<br />

and<br />

$$\underline\kappa^{(i)}(x) = -\inf_{x\in G\subset\mathcal{X},\ G \text{ open}}\ \varlimsup_{n\to\infty}\frac{1}{r_n}\log\mu_n^{(i)}(G). \tag{15.8}$$<br />

Assume that the sequences $\{\mu_n^{(i)} : n \in \mathbb{N}\}$ all determine the same pressure: for all $\varphi \in \mathcal{L}$ and $i \in \mathcal{I}$,<br />

$$p(\varphi) = \lim_{n\to\infty}\frac{1}{r_n}\log\int_{\mathcal{X}_0}e^{r_n\langle x,\varphi\rangle}\,\mu_n^{(i)}(dx). \tag{15.9}$$<br />

Let $J : \mathcal{X} \to [0, \infty]$ denote the convex conjugate of $p$:<br />

$$J(x) = \sup_{\varphi\in\mathcal{L}}\{\langle x,\varphi\rangle - p(\varphi)\}, \quad x\in\mathcal{X}. \tag{15.10}$$<br />

Define<br />

$$\kappa_0(x) = \sup_{i\in\mathcal{I}}\bar\kappa^{(i)}(x). \tag{15.11}$$<br />

Theorem 15.5. Assume that the limit $p(\varphi)$ in (15.9) exists for all $\varphi \in \mathcal{L}$ and $i \in \mathcal{I}$, and that $\kappa_0$ is convex. Then for each $i \in \mathcal{I}$ the sequence $\{\mu_n^{(i)}\}_{n\in\mathbb{N}}$ satisfies the LDP with rate function $J = \underline\kappa^{(i)} = \bar\kappa^{(i)} = \kappa_0$ and normalization $\{r_n\}$.<br />

∗ Exercise 15.6. Go through Chapter 14 to check what more needs to be said to prove Theorem 15.5. Not much, you should discover.<br />

To start the proof of Theorem 15.4, assume first that limit (15.6) holds and also temporarily that $S$ is compact. We apply Theorem 15.5 to prove the LDP. Take $\mathcal{X}$ to be the vector space of real-valued Borel measures on $S$, $\mathcal{X}_0 = \mathcal{M}_1$ and $\mathcal{L} = C_b(S)$. The normalization sequence is naturally $r_n = n$. The index set is $\mathcal{I} = \mathbb{Z}_+ = \{0, 1, 2, \dots\}$, and the measures $\mu_n^{(i)}$ are distributions of shifted empirical measures:<br />

$$\mu_n^{(i)}(B) = P^\lambda\Bigl\{\frac1n\sum_{k=in+1}^{in+n}\delta_{X_k}\in B\Bigr\}, \quad B\subset\mathcal{M}_1 \text{ measurable}. \tag{15.12}$$<br />

We begin by verifying assumption (15.9). A linear functional $\varphi \in \mathcal{L}$ is now identified with a function $f \in C_b(S)$ and the duality is given by integration: $\langle\alpha,\varphi\rangle = \int f\,d\alpha$ for $\alpha \in \mathcal{X}$. Consequently<br />

$$\begin{aligned} \frac1n\log\int_{\mathcal{X}_0}e^{n\langle\alpha,\varphi\rangle}\,\mu_n^{(i)}(d\alpha) &= \frac1n\log\int_\Omega\exp\Bigl\{\sum_{k=in+1}^{in+n}f(X_k)\Bigr\}\,dP^\lambda \\ &= \frac1n\sum_{k=in+1}^{in+n}\log E^{\lambda_k}[e^f] \\ &= \frac{i+1}{(i+1)n}\sum_{k=1}^{in+n}\log E^{\lambda_k}[e^f] - \frac{i}{in}\sum_{k=1}^{in}\log E^{\lambda_k}[e^f] \\ &\longrightarrow \int_{\mathcal{M}_1}\log E^\alpha[e^f]\,\Psi(\lambda, d\alpha) \equiv p(f) \end{aligned} \tag{15.13}$$<br />

as $n \to \infty$, by assumption (15.6), since $\alpha \mapsto \log E^\alpha[e^f]$ is a bounded continuous function on $\mathcal{M}_1$. (For $i = 0$ the subtracted term is absent.)<br />
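The convergence in (15.13) can be observed directly on a toy example. The sketch below rests on assumptions that are not in the text: $S = \{0, 1\}$, a hypothetical sequence with $\lambda_k = \alpha$ when $k$ is a multiple of 3 and $\lambda_k = \beta$ otherwise, so that $\Psi(\lambda) = \tfrac13\delta_\alpha + \tfrac23\delta_\beta$, and an arbitrary test function $f$. The names `log_mgf`, `shifted_pressure` and `lam` are introduced here for illustration.

```python
import math

def log_mgf(alpha, f):
    """Return log E^alpha[e^f] for a probability vector alpha on a finite set."""
    return math.log(sum(a * math.exp(v) for a, v in zip(alpha, f)))

def shifted_pressure(lam, f, i, n):
    """Return (1/n) * sum_{k=in+1}^{in+n} log E^{lambda_k}[e^f], as in (15.13)."""
    return sum(log_mgf(lam(k), f) for k in range(i * n + 1, i * n + n + 1)) / n

# Assumed example data (not from the text).
alpha, beta = (0.9, 0.1), (0.2, 0.8)
f = (1.0, -0.5)

def lam(k):
    """lambda_k = alpha when k is a multiple of 3, otherwise beta."""
    return alpha if k % 3 == 0 else beta

# Integral of log E^alpha[e^f] against Psi(lambda) = (1/3) delta_alpha + (2/3) delta_beta.
limit_pressure = log_mgf(alpha, f) / 3 + 2 * log_mgf(beta, f) / 3

pressures = [shifted_pressure(lam, f, i, 3000) for i in (0, 1, 5)]
print(limit_pressure, pressures)
```

Every shift $i$ yields the same value, which is the point of the computation: all the sequences $\{\mu_n^{(i)}\}$ determine the same pressure $p(f)$, as Theorem 15.5 requires.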

Convexity of $\kappa_0$ will follow from verifying that, for all $i \in \mathbb{Z}_+$ and probability measures $\alpha$, $\beta$ and $\gamma \in \mathcal{M}_1$ such that $\gamma = (\alpha + \beta)/2$,<br />

$$\bar\kappa^{(i)}(\gamma) \le \tfrac12\bar\kappa^{(2i)}(\alpha) + \tfrac12\bar\kappa^{(2i+1)}(\beta). \tag{15.14}$$<br />

To complete the argument for convexity of $\kappa_0$ from (15.14), note that the right-hand side is bounded above by $\bigl(\kappa_0(\alpha) + \kappa_0(\beta)\bigr)/2$, then take supremum over $i$ on the left, and apply Exercise 5.9.<br />

To prove (15.14), let $c > -\bar\kappa^{(i)}(\gamma)$, and pick an open neighborhood $G$ of $\gamma$ such that<br />

$$c > \varliminf_{n\to\infty}\frac1n\log P^\lambda\Bigl\{\frac1n\sum_{k=in+1}^{in+n}\delta_{X_k}\in G\Bigr\}. \tag{15.15}$$<br />

We can assume that $G$ is of the form<br />

$$G = \{\rho\in\mathcal{M}_1 : |E^\rho[f_\ell] - E^\gamma[f_\ell]| \le 4\varepsilon,\ \ell = 1,\dots,L\}$$<br />

for some functions $f_1, \dots, f_L \in C_b(S)$ and $\varepsilon > 0$. Define neighborhoods of $\alpha$ and $\beta$ by<br />

$$G_1 = \{\rho\in\mathcal{M}_1 : |E^\rho[f_\ell] - E^\alpha[f_\ell]| \le \varepsilon,\ \ell = 1,\dots,L\}$$<br />

and<br />

$$G_2 = \{\rho\in\mathcal{M}_1 : |E^\rho[f_\ell] - E^\beta[f_\ell]| \le \varepsilon,\ \ell = 1,\dots,L\}.$$<br />

Let $m = \lfloor n/2\rfloor$. Do a quick calculation to check that, if $n$ is large enough,<br />

$$\Bigl|\frac1n\sum_{k=in+1}^{in+n}f_\ell(X_k) - \frac12\cdot\frac1m\sum_{k=2im+1}^{(2i+1)m}f_\ell(X_k) - \frac12\cdot\frac1m\sum_{k=(2i+1)m+1}^{(2i+2)m}f_\ell(X_k)\Bigr| \le \varepsilon \tag{15.16}$$<br />


uniformly for all values $\{X_k\}$. (Since $f_\ell$ is bounded, the issue is only which terms appear in the sums.) This and independence of the $\{X_k\}$ under $P^\lambda$ imply that<br />

$$\begin{aligned} P^\lambda\Bigl\{\frac1n\sum_{k=in+1}^{in+n}\delta_{X_k}\in G\Bigr\} &\ge P^\lambda\Bigl\{\frac1m\sum_{k=2im+1}^{(2i+1)m}\delta_{X_k}\in G_1,\ \frac1m\sum_{k=(2i+1)m+1}^{(2i+2)m}\delta_{X_k}\in G_2\Bigr\} \\ &= P^\lambda\Bigl\{\frac1m\sum_{k=2im+1}^{(2i+1)m}\delta_{X_k}\in G_1\Bigr\}\,P^\lambda\Bigl\{\frac1m\sum_{k=(2i+1)m+1}^{(2i+2)m}\delta_{X_k}\in G_2\Bigr\}. \end{aligned}$$<br />
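The "quick calculation" behind (15.16) can be sanity-checked numerically. The sketch below is an illustration only: the array `x` stands in for the bounded values $f_\ell(X_k)$ with $|f_\ell| \le 1$, and the bound $(2i+2)/n$ used for comparison is our own crude estimate (at most $2i+1$ indices differ between the two index ranges when $n$ is odd, plus the mismatch between $1/n$ and $1/(2m)$); it is not a statement from the text. The function name `block_gap` is hypothetical.

```python
import random

def block_gap(x, i, n):
    """Left-hand side of (15.16) for one function; x[k-1] plays the role of f(X_k)."""
    m = n // 2

    def avg(lo, hi):
        # Average of x_k over lo < k <= hi (1-based k, so the slice is x[lo:hi]).
        return sum(x[lo:hi]) / (hi - lo)

    full = avg(i * n, i * n + n)
    first = avg(2 * i * m, (2 * i + 1) * m)
    second = avg((2 * i + 1) * m, (2 * i + 2) * m)
    return abs(full - 0.5 * first - 0.5 * second)

random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(20000)]  # values bounded by 1
i = 3
gaps = {n: block_gap(x, i, n) for n in (101, 1001, 4001)}
print(gaps)
```

For even $n$ the two index ranges coincide and the gap is zero up to rounding. For odd $n$ the gap is of order $(i+1)/n$, which drops below any fixed $\varepsilon$ once $n$ is large, uniformly in the values of the $X_k$.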

Apply $n^{-1}\log$ and let $n \to \infty$ to continue from (15.15):<br />

$$\begin{aligned} c &> \varliminf_{n\to\infty}\frac1n\log P^\lambda\Bigl\{\frac1n\sum_{k=in+1}^{in+n}\delta_{X_k}\in G\Bigr\} \\ &\ge \frac12\cdot\varliminf_{m\to\infty}\frac1m\log P^\lambda\Bigl\{\frac1m\sum_{k=2im+1}^{(2i+1)m}\delta_{X_k}\in G_1\Bigr\} + \frac12\cdot\varliminf_{m\to\infty}\frac1m\log P^\lambda\Bigl\{\frac1m\sum_{k=(2i+1)m+1}^{(2i+2)m}\delta_{X_k}\in G_2\Bigr\} \\ &\ge -\tfrac12\bar\kappa^{(2i)}(\alpha) - \tfrac12\bar\kappa^{(2i+1)}(\beta). \end{aligned}$$<br />

Letting $c \searrow -\bar\kappa^{(i)}(\gamma)$ completes the proof of (15.14).<br />

We have verified the hypotheses of Theorem 15.5. Consequently we have the LDP for the sequence $\{\mu_n^{(0)}\}$ with the rate function $J = p^*$ defined by (15.10). Calculation (15.13) identifies the pressure $p(f)$, and then a glance at (15.4) confirms that $J = K(\,\cdot\,|\,\Psi(\lambda))$ as claimed in Theorem 15.4. We have completed the proof of the if-part of Theorem 15.4 under the assumption that $S$ is compact.<br />

It remains to lift the compactness assumption. Let now $S$ be Polish. We check exponential tightness.<br />

Lemma 15.7. Assume (15.6). Then for each $b > 0$ there exists a compact subset $K_b \subset \mathcal{M}_1$ such that for all $n \in \mathbb{N}$<br />

$$P^\lambda\{L_n\in K_b^c\} \le e^{-bn}. \tag{15.17}$$<br />

∗ Exercise 15.8. Show that $\Gamma \mapsto \rho_\Gamma(\cdot) = \int_{\mathcal{M}_1}\alpha(\cdot)\,\Gamma(d\alpha)$ defines a continuous mapping from $\mathcal{M}_1(\mathcal{M}_1)$ to $\mathcal{M}_1$. This follows fairly directly from the definition of weak topology on probability measures.<br />

Proof of Lemma 15.7. Pick a positive sequence $\varepsilon_j \searrow 0$ and define $c_j = \varepsilon_j^{-1}(j + 1 + \log 2)$. By the above exercise and assumption (15.6) the probability measures<br />

$$\bar\lambda_n = \frac1n\sum_{k=1}^n\lambda_k$$<br />

converge weakly. A convergent sequence is tight, so we may pick compact sets $A_j \subset S$ such that $\bar\lambda_n(A_j^c) \le e^{-c_j}$ for all $n$. By an exponential Chebyshev inequality and the arithmetic-geometric inequality (a special case of Jensen’s inequality) and the definition of $c_j$,<br />

$$\begin{aligned} P^\lambda\{L_n(A_j^c)\ge\varepsilon_j\} &\le P^\lambda\Bigl\{\sum_{k=1}^n\mathbf{1}_{A_j^c}(X_k)\ge n\varepsilon_j\Bigr\} \le e^{-n\varepsilon_jc_j}\prod_{k=1}^nE^{\lambda_k}\bigl[e^{c_j\mathbf{1}_{A_j^c}}\bigr] \\ &\le e^{-n\varepsilon_jc_j}\Bigl(\frac1n\sum_{k=1}^nE^{\lambda_k}\bigl[e^{c_j\mathbf{1}_{A_j^c}}\bigr]\Bigr)^n = e^{-n\varepsilon_jc_j}\Bigl(E^{\bar\lambda_n}\bigl[e^{c_j\mathbf{1}_{A_j^c}}\bigr]\Bigr)^n \\ &\le e^{-n\varepsilon_jc_j}\,2^n = e^{-n(j+1)}. \end{aligned}$$<br />

Let $K_b = \{\alpha \in \mathcal{M}_1 : \alpha(A_j^c) \le \varepsilon_j \text{ for } j \ge b\}$. $K_b$ is closed by the portmanteau theorem (Exercise A.3) and compact because the defining condition forces tightness. And finally,<br />

$$P^\lambda\{L_n\in K_b^c\} \le \sum_{j\ge b}P^\lambda\{L_n(A_j^c) > \varepsilon_j\} \le e^{-bn}. \qquad\square$$<br />

For the final step of the proof of the LDP we compactify, obtain the LDP on the compactification, and then argue that the LDP can be restricted to the original space.<br />

Let $\bar S$ be the compactification of $S$ obtained by completing $S$ under a totally bounded metric $\bar d$, as was done in the proof of Theorem 13.1. $S$ is a dense Borel subset of $\bar S$. $\mathcal{M}_1$ can be considered as a subset of $\mathcal{M}_1(\bar S)$ because every measure on $S$ is also a measure on $\bar S$. As a subspace of $\mathcal{M}_1(\bar S)$ the space $\mathcal{M}_1$ has its original topology. This is because the weak topology of $\mathcal{M}_1(\bar S)$ is generated by $C_b(\bar S)$, which is in bijective correspondence with $U_{b,\bar d}(S)$ via restriction and unique extension of functions. And in fact we can write $\bar{\mathcal{M}}_1$ for $\mathcal{M}_1(\bar S)$ because it is the closure of $\mathcal{M}_1$ (Exercise 15.14).<br />

From these considerations it follows that the limit in assumption (15.6) is also valid in the space $\mathcal{M}_1(\bar{\mathcal{M}}_1)$. The first part of the proof then gives us an LDP on the space $\bar{\mathcal{M}}_1$ with rate function $K(\,\cdot\,|\,\Psi(\lambda))$. Let $G$ be an open subset of $\mathcal{M}_1$ and $A$ a closed subset of $\mathcal{M}_1$. By the definition of relative topology, there exist an open set $G_1 \subset \bar{\mathcal{M}}_1$ and a closed set $A_1 \subset \bar{\mathcal{M}}_1$ such that $G = G_1 \cap \mathcal{M}_1$ and $A = A_1 \cap \mathcal{M}_1$. Since $L_n \in \mathcal{M}_1$ $P^\lambda$-a.s., the LDP on the larger space $\bar{\mathcal{M}}_1$ gives us the bounds<br />

$$\varliminf_{n\to\infty}\frac1n\log P^\lambda\{L_n\in G\} \ge -\inf_{\alpha\in G_1}K(\alpha\,|\,\Psi(\lambda)) \tag{15.18}$$<br />

and<br />

$$\varlimsup_{n\to\infty}\frac1n\log P^\lambda\{L_n\in A\} \le -\inf_{\alpha\in A_1}K(\alpha\,|\,\Psi(\lambda)). \tag{15.19}$$<br />

The lower bound on the space $\mathcal{M}_1$ follows now because the right-hand side of (15.18) cannot increase if $G_1$ is replaced by $G$. To replace $A_1$ with $A$ on the right-hand side of (15.19) we need to show that<br />

$$K(\alpha\,|\,\Psi(\lambda)) = \infty \quad\text{for } \alpha\in\bar{\mathcal{M}}_1\setminus\mathcal{M}_1. \tag{15.20}$$<br />

This is a consequence of exponential tightness: (15.17) and the lower bound give<br />

$$\inf_{\alpha\in K_b^c}K(\alpha\,|\,\Psi(\lambda)) \ge -\varliminf_{n\to\infty}\frac1n\log P^\lambda\{L_n\in K_b^c\} \ge b.$$<br />

Since $K_b \subset \mathcal{M}_1$ we can let $b \nearrow \infty$ to get (15.20).<br />

To summarize, (15.18)–(15.19) turn into the LDP on the space $\mathcal{M}_1$ claimed in Theorem 15.4, and the proof of the if-part is complete.<br />

Next we prove the only if-part of Theorem 15.4, namely that existence of an LDP implies the existence of the limit in (15.6). First a technical lemma.<br />

Lemma 15.9. Let $\mathcal{Z}$ be a compact metric space, and $\Phi$ and $\Psi$ Borel probability measures on $\mathcal{M}_1(\mathcal{Z})$. Assume that<br />

$$\int_{\mathcal{M}_1(\mathcal{Z})}\log E^\alpha[e^g]\,\Phi(d\alpha) = \int_{\mathcal{M}_1(\mathcal{Z})}\log E^\alpha[e^g]\,\Psi(d\alpha) \tag{15.21}$$<br />

for all $g \in C_b(\mathcal{Z})$. Then $\Phi = \Psi$.<br />

Remark 15.10. By a compactification argument this lemma can be proved for Polish spaces but we need it only for compact spaces.<br />

Proof. Let $h \in C_b(\mathcal{Z})$. Let $\delta > 0$ be small enough so that $\delta\|h\|_\infty < 1$. Then for $u \in (-\delta, \delta)$ let $g(x) = \log(1 + uh(x))$, and expand:<br />

$$\int_{\mathcal{M}_1(\mathcal{Z})}\log E^\alpha[e^g]\,\Phi(d\alpha) = \int_{\mathcal{M}_1(\mathcal{Z})}\log(1 + uE^\alpha[h])\,\Phi(d\alpha) = -\sum_{k=1}^\infty\frac{(-u)^k}{k}\int_{\mathcal{M}_1(\mathcal{Z})}(E^\alpha[h])^k\,\Phi(d\alpha).$$<br />

The last expression is an analytic function of $u \in (-\delta, \delta)$, and consequently the coefficients of its power series expansion are uniquely determined. We can conclude that<br />

$$\int_{\mathcal{M}_1(\mathcal{Z})}(E^\alpha[h])^k\,\Phi(d\alpha) = \int_{\mathcal{M}_1(\mathcal{Z})}(E^\alpha[h])^k\,\Psi(d\alpha) \quad \forall h\in C_b(\mathcal{Z}).$$<br />

From this and the power series expansion for the exponential we deduce the equality of these characteristic functions:<br />

$$\int_{\mathcal{M}_1(\mathcal{Z})}e^{iE^\alpha[h]}\,\Phi(d\alpha) = \int_{\mathcal{M}_1(\mathcal{Z})}e^{iE^\alpha[h]}\,\Psi(d\alpha) \quad \forall h\in C_b(\mathcal{Z})$$<br />

where $i = \sqrt{-1}$ is the imaginary unit. Take linear combinations $h = \sum_{k=1}^m t_kg_k$ for $g_1, \dots, g_m \in C_b(\mathcal{Z})$ and vary the vectors $(t_1, \dots, t_m) \in \mathbb{R}^m$ to conclude that any vector of the form $(E^\alpha[g_1], \dots, E^\alpha[g_m])$ has the same distribution under $\Phi(d\alpha)$ and $\Psi(d\alpha)$.<br />

Integrating term by term, it follows that<br />

$$\int_{\mathcal{M}_1(\mathcal{Z})}p(E^\alpha[g_1],\dots,E^\alpha[g_m])\,\Phi(d\alpha) = \int_{\mathcal{M}_1(\mathcal{Z})}p(E^\alpha[g_1],\dots,E^\alpha[g_m])\,\Psi(d\alpha)$$<br />

for any polynomial $p$ in $m$ variables and $g_1, \dots, g_m \in C_b(\mathcal{Z})$. Finally we use compactness. By the Stone-Weierstrass theorem [18, Section 4.7] this class of functions is dense among continuous functions on $\mathcal{M}_1(\mathcal{Z})$. Consequently we have<br />

$$\int_{\mathcal{M}_1(\mathcal{Z})}F(\alpha)\,\Phi(d\alpha) = \int_{\mathcal{M}_1(\mathcal{Z})}F(\alpha)\,\Psi(d\alpha)$$<br />

for all continuous $F$ on $\mathcal{M}_1(\mathcal{Z})$ which implies the result. $\square$<br />

Now assume that distributions $P^\lambda\{L_n \in \cdot\,\}$ satisfy an LDP with normalization $\{n\}$ and some rate function $I$ on $\mathcal{M}_1$. With $\lambda$ fixed, our goal is to prove that the probability measures<br />

$$\Psi_n = \frac1n\sum_{k=1}^n\delta_{\lambda_k}$$<br />

converge weakly on the space $\mathcal{M}_1(\mathcal{M}_1)$.<br />

To get limit points for $\{\Psi_n\}$ for free we transfer the discussion to the compact space $\mathcal{M}_1(\bar S)$. By the contraction principle the LDP holds for the distributions $P^\lambda\{L_n \in \cdot\,\}$ on the space $\mathcal{M}_1(\bar S)$ with rate function<br />

$$J(\alpha) = \begin{cases} I(\alpha), & \alpha\in\mathcal{M}_1 \\ \varliminf_{\mathcal{M}_1\ni\beta\to\alpha}I(\beta), & \alpha\in\mathcal{M}_1(\bar S)\setminus\mathcal{M}_1. \end{cases} \tag{15.22}$$<br />

(This situation was addressed in Exercise 4.3. See also Exercise 15.14 below.)<br />

Since $\mathcal{M}_1(\bar S)$ is compact, so is $\mathcal{M}_1(\mathcal{M}_1(\bar S))$. By standard metric space arguments convergence of the sequence $\{\Psi_n\}$ follows if we can show that there is a unique limit point. So let $\Psi = \lim_{j\to\infty}\Psi_{n_j}$ be the limit of some subsequence $\{\Psi_{n_j}\}$. Let $g \in C_b(\bar S)$. Integrating the bounded, continuous function $F(\alpha) = \log E^\alpha[e^g]$ against the measure $\Psi_{n_j}$ and taking the weak limit gives<br />

$$\int_{\mathcal{M}_1(\bar S)}\log E^\alpha[e^g]\,\Psi(d\alpha) = \lim_{j\to\infty}\frac{1}{n_j}\sum_{k=1}^{n_j}\log E^{\lambda_k}[e^g] = \lim_{j\to\infty}\frac{1}{n_j}\log E^\lambda\Bigl[\exp\Bigl\{\sum_{k=1}^{n_j}g(X_k)\Bigr\}\Bigr]. \tag{15.23}$$<br />

On the other hand, Varadhan’s theorem (page 32) together with the assumed LDP gives<br />

$$\lim_{n\to\infty}\frac1n\log E^\lambda\Bigl[\exp\Bigl\{\sum_{k=1}^ng(X_k)\Bigr\}\Bigr] = \sup_{\gamma\in\mathcal{M}_1(\bar S)}\{E^\gamma[g] - J(\gamma)\} = J^*(g).$$<br />

Thus the quantities in (15.23) are uniquely defined and equal $J^*(g)$, for all limit points $\Psi$, and then by Lemma 15.9 there can be only one limit point.<br />

We have proved that there is a limit Ψn → Ψ in the space M1(M1( ¯ S)).<br />

It remains <strong>to</strong> argue that this limit is actually <strong>an</strong> element of the space<br />

M1(M1), or in other words that Ψ{α : α(S) = 1} = 1. Let ¯α denote<br />

the me<strong>an</strong> of Ψ:<br />

<br />

¯α(B) = α(B) Ψ(dα) for B ⊂ S measurable.<br />

M1( ¯ S)<br />

It is enough <strong>to</strong> show that ¯α(S) = 1 because this implies<br />

0 = ¯α( ¯ <br />

S \ S) = α( ¯ S \ S) Ψ(dα).<br />

M1( ¯ S)<br />

To show this last point, start with<br />
\[
J(\gamma) \ge J^{**}(\gamma) = \sup_{f\in C_b(\bar S)} \Bigl\{ E^{\gamma}(f) - \int_{\mathcal M_1(\bar S)} \log E^{\alpha}[e^{f}]\,\Psi(d\alpha) \Bigr\} \ge H(\gamma \mid \bar\alpha)
\]<br />
where the last inequality came from part (b) of Theorem 15.2. Now the rate function $I$ cannot be identically infinite on $\mathcal M_1$ because the upper bound of the LDP implies $\inf I = 0$. Then $\bar\alpha(S) > 0$ because otherwise every $\gamma$ supported on $S$ would fail $\gamma \ll \bar\alpha$ and the inequality above would force $I(\gamma) = \infty$. Combining Exercise 6.15 with the inequality above gives for $\gamma \in \mathcal M_1$<br />
\[
I(\gamma) = J(\gamma) \ge H(\gamma \mid \bar\alpha) \ge -\log \bar\alpha(S) \ge 0.
\]<br />
Choose a sequence $\gamma_j \in \mathcal M_1$ such that $I(\gamma_j) \searrow 0$. This forces $\bar\alpha(S) = 1$, and completes the proof of Theorem 15.4.
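
The entropy bound $H(\gamma \mid \bar\alpha) \ge -\log\bar\alpha(S)$ used in the last display is attributed to Exercise 6.15. One possible way to see it, sketched here under the assumptions that $\gamma(S) = 1$ and $\gamma \ll \bar\alpha$ with density $\varphi = d\gamma/d\bar\alpha$ (the symbol $\varphi$ is introduced only for this illustration), is Jensen's inequality for the concave function $\log$:<br />

```latex
\begin{align*}
-H(\gamma \mid \bar\alpha)
  &= \int_{S \cap \{\varphi > 0\}} \log\frac{1}{\varphi}\, d\gamma
   \le \log \int_{S \cap \{\varphi > 0\}} \frac{1}{\varphi}\, d\gamma \\
  &= \log \bar\alpha\bigl(S \cap \{\varphi > 0\}\bigr)
   \le \log \bar\alpha(S).
\end{align*}
```

If $\gamma$ is not absolutely continuous with respect to $\bar\alpha$, then $H(\gamma \mid \bar\alpha) = \infty$ and the bound is trivial. Along a sequence with $I(\gamma_j) \searrow 0$ the bound gives $-\log\bar\alpha(S) \le 0$, which together with $\bar\alpha(S) \le 1$ yields $\bar\alpha(S) = 1$.<br />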



Exercise 15.11. [36] Find the necessary and sufficient condition on $\lambda$ under which $P^{\lambda}\{L_n \in \cdot\,\}$ satisfies the LDP with rate given by relative entropy $H(\,\cdot \mid \rho)$ with respect to some probability measure $\rho$.<br />

Exercise 15.12. [36] With notation as in (15.1)–(15.2), let $p$ be the stochastic kernel from $S$ into $\mathcal M_1(\mathcal M_1)$ defined by<br />
\[
\int_S \mathbf 1_A(x)\,p(x, B)\,\mu_S(dx) = \mu(A \times B)
\]<br />
for Borel sets $A \subset S$ and $B \subset \mathcal M_1$. Show that for $\alpha \in \mathcal M_1$,<br />
\[
K(\alpha \mid \Psi) = H(\alpha \mid \mu_S) + K(\Psi \mid \alpha \circ p^{-1}).
\]<br />
Note that $\alpha \circ p^{-1}$ is a probability measure on $\mathcal M_1(\mathcal M_1)$ so the second $K$-entropy makes sense. In particular, $K(\alpha \mid \Psi) = H(\alpha \mid \mu_S)$ if and only if $\alpha p = \Psi$.<br />

∗ Exercise 15.13. Supply the missing details for the argument that $\Psi$-a.e. $\alpha$ equals $\mu_S$ in the proof of part (b) of Theorem 15.2.<br />

∗ Exercise 15.14. Show that probability measures on $S$ are dense among probability measures on $\bar S$. (Hint: convex combinations of point masses are dense among probability measures.) This provides justification for formula (15.22).


Appendixes


Appendix A<br />

Topics from probability<br />

Ultimately, this will be an appendix on general probability: a 3-page summary of graduate probability, probability space, measure, σ-algebra, main limit theorems, conditional probabilities, monotone class, etc.<br />

A.1. Weak convergence of probability measures<br />

Let $(\mathcal X, d)$ be a metric space. Let $\mathcal M(\mathcal X)$ be the space of bounded signed Borel measures and $\mathcal M_1(\mathcal X)$ the subspace of probability measures. Let $C_b(\mathcal X)$ be the space of bounded continuous functions on $\mathcal X$.<br />

Definition A.1. A sequence of probability measures $\mu_n \in \mathcal M_1(\mathcal X)$ converges weakly to $\mu \in \mathcal M_1(\mathcal X)$ if $\int f\,d\mu_n \to \int f\,d\mu$ for all bounded continuous functions $f$.<br />

Since knowing $\int f\,d\mu$ for all bounded continuous functions uniquely determines $\mu$, $\mu_n$ cannot converge weakly to two different measures.<br />

For $f \in C_b(\mathcal X)$ and $\mu \in \mathcal M(\mathcal X)$, let $\langle \mu, f\rangle = \int f\,d\mu$. Then $\mathcal M(\mathcal X)$ and $C_b(\mathcal X)$ are in duality and convergence of probability measures in the topology $\sigma(\mathcal M(\mathcal X), C_b(\mathcal X))$ corresponds to weak convergence, as defined above. By Exercise 6.1 this weak topology is Hausdorff.<br />

This notion of weak convergence has other useful characterizations.<br />

Definition A.2. A family of bounded measurable functions $\{g_i\}$ determines weak convergence on $\mathcal M_1(\mathcal X)$ if weak convergence of $\mu_n$ to $\mu$ is equivalent to convergence of $\int g_i\,d\mu_n$ to $\int g_i\,d\mu$ for all $i$.<br />

∗ Exercise A.3. (Portmanteau theorem) Let $\mu_n$ be a sequence of probability measures. Prove that the following are equivalent.<br />

(a) $\mu_n$ converge weakly to $\mu$.<br />

(b) $\int f\,d\mu_n \to \int f\,d\mu$ for all $f \in U_{b,\tilde d}(\mathcal X)$, where $U_{b,\tilde d}(\mathcal X)$ is the space of bounded uniformly continuous functions on $(\mathcal X, \tilde d)$ and $\tilde d$ is any metric that defines on $\mathcal X$ the same topology as $d$.<br />

(c) $\limsup_{n\to\infty} \mu_n(F) \le \mu(F)$ for closed sets $F$.<br />

(d) $\liminf_{n\to\infty} \mu_n(G) \ge \mu(G)$ for open sets $G$.<br />

(e) $\lim_{n\to\infty} \mu_n(A) = \mu(A)$ for continuity sets of $\mu$; i.e. measurable sets $A$ such that $\mu(\bar A \setminus A^{\circ}) = 0$.<br />

Hint: (a) ⇒ (b) is trivial. Use $x \mapsto 1/(1 + d(x, F))^k$ to prove that (b) ⇒ (c). Then prove that (c) ⇔ (d) and that (c) and (d) ⇒ (e). For fixed $\varepsilon > 0$ choose $\{a_i\}_{i=1}^{N}$ so that $\mu\{f = a_i\} = 0$, $-\sup|f| - 1 = a_0 < a_1 < \cdots < a_{N-1} < a_N = \sup|f| + 1$, and $a_i - a_{i-1} < \varepsilon$. Now prove that (e) ⇒ (a).<br />
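
A small numerical sketch may help show why (c) and (d) are stated as inequalities. It uses the point masses $\mu_n = \delta_{1/n}$, which converge weakly to $\mu = \delta_0$ on $\mathbb R$; this example and the helper names `dirac_mass` and `f_k` are illustrative choices, not taken from the text.<br />

```python
def dirac_mass(point, indicator):
    """Mass that the point mass at `point` assigns to the set with this indicator."""
    return 1.0 if indicator(point) else 0.0

closed_F = lambda x: x == 0.0        # F = {0}, a closed set
open_G = lambda x: 0.0 < x < 2.0     # G = (0, 2), an open set

def f_k(x, k):
    """The hint's function 1/(1 + d(x, F))^k for F = {0}."""
    return 1.0 / (1.0 + abs(x)) ** k

ns = range(1, 201)
mu_n_F = [dirac_mass(1.0 / n, closed_F) for n in ns]   # mu_n(F) for n = 1..200
mu_n_G = [dirac_mass(1.0 / n, open_G) for n in ns]     # mu_n(G) for n = 1..200
mu_F = dirac_mass(0.0, closed_F)                       # mu(F)
mu_G = dirac_mass(0.0, open_G)                         # mu(G)

# Integrals of the bounded continuous f_k against delta_{1/n} approach
# the integral against delta_0, as weak convergence requires.
gap = abs(f_k(1.0 / 200, 3) - f_k(0.0, 3))
```

Here $\mu_n(F) = 0$ for every $n$ while $\mu(F) = 1$, and $\mu_n(G) = 1$ while $\mu(G) = 0$, so both inequalities are strict; neither $F$ nor $G$ is a continuity set of $\mu$.<br />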

Unless $\mathcal X$ is finite, one cannot metrize $\mathcal M(\mathcal X)$.<br />

Exercise A.4. Let $U$ and $V$ be two vector spaces in duality and let them induce weak topologies on each other.<br />

(a) Prove that if $U$ is metrizable, $V$ must have a countable algebraic basis.<br />
Hint: If $U$ is metrizable, there is a countable collection $B_k$ of neighborhoods around 0, of the form in part (a) of Proposition 5.5, such that every open neighborhood of 0 contains some $B_k$. Let $\{v_k\}$ be the entire, at most countable, collection of vectors in $V$ appearing in the expressions of the $B_k$'s. Show that each $v \in V$ is a linear combination of finitely many $v_k$'s. You may need to use Lemma 3.9 of [33].<br />

(b) Prove that $\mathcal M(\mathbb R)$ and $\mathcal M(\mathbb N)$ are not metrizable.<br />

Now, if $\mathcal X$ is separable, then it is homeomorphic to a subset of a compact metric space and thus there exists a metric $\tilde d$ that defines the same topology on $\mathcal X$ and under which $\mathcal X$ is totally bounded. Its completion under this metric is then compact and $U_{b,\tilde d}(\mathcal X)$ is separable under the sup norm; see Lemmas 6.1-6.3 in [30] and Theorem 2.8.2 in [14]. Thus, there exists a dense (in the sup norm) countable set $\{g_n\} \subset U_{b,\tilde d}(\mathcal X)$. By (b) in the portmanteau theorem, weak convergence in $\mathcal M_1(\mathcal X)$ is then metrized by the metric<br />
\[
\delta(\mu, \nu) = \sum_{n\ge 1} \frac{1}{2^n \|g_n\|_\infty} \Bigl| \int g_n\,d\mu - \int g_n\,d\nu \Bigr|. \tag{A.1}
\]<br />

∗ Exercise A.5. Let $\mathcal X$ be a metric space. Let $\{g_k\}$ be any countable collection of $C_b(\mathcal X)$-functions that determines weak convergence on $\mathcal M_1(\mathcal X)$. Prove that on $\mathcal M_1(\mathcal X)$ the weak topology generated by $C_b(\mathcal X)$ coincides with the topology given by the metric $\delta$ defined in (A.1).<br />

Hint: Recall part (a) of Proposition 5.5 and use the equivalence between (a) and (b) in the portmanteau theorem.<br />

The above exercise shows that $\delta$ metrizes the weak topology on $\mathcal M_1(\mathcal X)$ generated by $C_b(\mathcal X)$. One, however, has other metrics that are defined without reference to some unknown set $\{g_n\}$. Define the Prohorov metric $\rho$ on $\mathcal M_1(\mathcal X)$ by<br />
\[
\rho(\mu, \nu) = \inf\{\varepsilon > 0 : \mu(F) \le \nu(F^{\varepsilon}) + \varepsilon \text{ for all closed } F \subset \mathcal X\}. \tag{A.2}
\]<br />
Here, $F^{\varepsilon} = \{x : d(x, y) < \varepsilon \text{ for some } y \in F\}$.<br />

Exercise A.6. Show that $\rho$ is a metric.<br />

Exercise A.7. Assuming that $\mathcal X$ is separable, show that $\rho$ metrizes the topology on $\mathcal M_1(\mathcal X)$ induced by the weak topology $\sigma(\mathcal M(\mathcal X), C_b(\mathcal X))$.<br />
Hint: Since the weak topology is metrizable, it is enough to prove that $\rho(\mu_n, \mu) \to 0$ is equivalent to $\mu_n \to \mu$.<br />

Exercise A.8. Let $\mathcal X = \mathbb Z$. Show that the Prohorov metric is given by<br />
\[
\rho(\mu, \nu) = \frac{1}{2} \sum_{k\in\mathbb Z} |\mu(k) - \nu(k)|.
\]<br />
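
The formula in Exercise A.8 can be sanity-checked numerically before attempting a proof. The sketch below evaluates definition (A.2) by brute force for two finitely supported measures on $\mathbb Z$ and compares the result with half the $\ell^1$ distance. The function names, the bisection tolerance, and the two sample measures are assumptions made for this illustration; a check on one example is of course not a proof.<br />

```python
from itertools import combinations

def condition_holds(mu, nu, eps):
    """Check mu(F) <= nu(F^eps) + eps for every nonempty F inside supp(mu).

    Restricting to such F suffices: replacing F by its intersection with
    supp(mu) keeps mu(F) unchanged and can only shrink F^eps.
    """
    support = [x for x, mass in mu.items() if mass > 0]
    for size in range(1, len(support) + 1):
        for F in combinations(support, size):
            mu_F = sum(mu[x] for x in F)
            nu_F_eps = sum(mass for y, mass in nu.items()
                           if any(abs(x - y) < eps for x in F))
            if mu_F > nu_F_eps + eps + 1e-12:   # small slack for rounding
                return False
    return True

def prohorov(mu, nu, tol=1e-9):
    """Approximate rho(mu, nu) from (A.2) by bisection on eps in [0, 1]."""
    lo, hi = 0.0, 1.0   # the condition always holds at eps = 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if condition_holds(mu, nu, mid):
            hi = mid
        else:
            lo = mid
    return hi

def half_l1(mu, nu):
    """Right-hand side of Exercise A.8."""
    keys = set(mu) | set(nu)
    return 0.5 * sum(abs(mu.get(k, 0.0) - nu.get(k, 0.0)) for k in keys)

mu = {0: 0.5, 1: 0.5}
nu = {0: 0.2, 1: 0.3, 2: 0.5}
rho = prohorov(mu, nu)
```

The bisection relies on the fact that the condition in (A.2) is monotone in $\varepsilon$: enlarging $\varepsilon$ enlarges both $F^{\varepsilon}$ and the additive slack.<br />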

Once one has a topology, one can ask when a set is compact. Just to get a feeling for what will come next, consider the case $\mathcal X = \mathbb R$.<br />

Exercise A.9. Let $\mu_n$ be a sequence of probability measures on $\mathbb R$. Prove that the following are equivalent.<br />

(a) $\mu_n$ has a weakly convergent subsequence converging to a probability measure;<br />

(b) for all $\varepsilon > 0$ there exist $a < b$ such that $\sup_n \mu_n\{[a, b]\} > 1 - \varepsilon$.<br />

This necessary condition is in fact true more generally.<br />

Definition A.10. Let $\mathcal X$ be a topological space. A family of probability measures $A$ is tight if for every $\varepsilon > 0$ there exists a compact subset $K \subset \mathcal X$ such that $\sup_{\nu\in A} \nu(K^c) < \varepsilon$.<br />

The following is a characterization of compactness of a family of probability measures. See Theorems 6.1 and 6.2 of [6].<br />

Prohorov's theorem. Let $\mathcal X$ be a metric space. If a family of probability measures is tight, then it is relatively sequentially compact (i.e. every sequence has a weakly convergent subsequence). If $\mathcal X$ is separable and complete, then the converse is also true.



Note that unless the tight family is closed the limit point provided by the theorem above does not necessarily belong to the same family.<br />

A special case of Prohorov's theorem is when $A$ is a singleton.<br />

∗ Exercise A.11. (Ulam's theorem) Let $\mathcal X$ be separable and complete. Prove, without using Prohorov's theorem, that for any probability measure $\lambda \in \mathcal M_1(\mathcal X)$ and for any $\varepsilon > 0$ there exists a compact set $K \subset \mathcal X$ such that $\lambda(K^c) < \varepsilon$.<br />
Hint: By separability, there is a countable number of $1/n$ spheres $(A_{i,n})_{i\ge 1}$ covering $\mathcal X$. Choose $i_n$ such that $\lambda\{\bigcup_{i=1}^{i_n} A_{i,n}\} > 1 - \varepsilon/2^n$. The closure of $\bigcap_{n\ge 1} \bigcup_{i=1}^{i_n} A_{i,n}$ does the job.<br />

The next theorem concerns $\mathcal M_1(\mathcal X)$ being a Polish space (i.e. separable and complete); see Theorem 6.5 in [30].<br />

Theorem A.12. Suppose $(\mathcal X, d)$ is a complete separable metric space. Then so is $(\mathcal M_1(\mathcal X), \rho)$, the space of Borel probability measures on $\mathcal X$ with the Prohorov metric defined by (A.2).<br />

A.2. Ergodic theorem<br />

Let $(\Omega, \mathcal F, P)$ be a probability space, $d \ge 1$, and $(\theta_i)_{i\in\mathbb Z^d}$ a group of measurable $P$-preserving transformations of $\Omega$ such that $\theta_0$ is the identity and $\theta_i \circ \theta_j = \theta_{i+j}$ for all $i, j \in \mathbb Z^d$. Let $\mathcal I$ be the σ-algebra of all shift-invariant events. ($A$ is shift invariant if $\theta_i A = A$ for all $i \in \mathbb Z^d$.)<br />

Exercise A.13. Prove that $f : \Omega \to \mathbb R$ is $\mathcal I$-measurable if and only if $f \circ \theta_i = f$ for all $i \in \mathbb Z^d$.<br />

Let $\mathcal M_\theta(\Omega)$ be the space of all shift-invariant probability measures on $(\Omega, \mathcal F)$. Note that the example $\theta_k(\omega) = \omega + k$ on $\Omega = \mathbb Z$ shows that this space could be empty.<br />

Definition A.14. $P \in \mathcal M_\theta(\Omega)$ is said to be ergodic if $P(A) \in \{0, 1\}$ for all $\mathcal I$-measurable sets $A$.<br />

Let $(V_n)_{n\ge 1}$ be any increasing sequence of cubes in $\mathbb Z^d$ such that $|V_n| \to \infty$ as $n \to \infty$. For $\omega \in \Omega$ let<br />
\[
R_n(\omega) = |V_n|^{-1} \sum_{i\in V_n} \delta_{\theta_i\omega}.
\]<br />
Multidimensional ergodic theorem. Setting as above. Let $P \in \mathcal M_\theta(\Omega)$. Then, for all $p \ge 1$ and $f \in L^p(P)$, $E^{R_n(\omega)}[f] \to E[f \mid \mathcal I]$ in $L^p(P)$ and $P$-a.s. When $P$ is ergodic, the limit $E[f \mid \mathcal I] = E[f]$.



To illuminate the theorem somewhat, note that for $f \in L^2(P, \mathcal F)$ the conditional expectation $E[f \mid \mathcal I]$ is the orthogonal projection of $f$ onto the space $L^2(P, \mathcal I)$. Thus the $L^2$ version of the theorem says that the limit of $|V_n|^{-1} \sum_{i\in V_n} f \circ \theta_i$ is the orthogonal projection of $f$ onto the space of fixed points of the transformations $(\theta_i)$. The $L^p$ version is obtained from this by a density argument. The almost sure convergence is the hardest part. See Appendix 14.A of [22] for the proof of the theorem.<br />
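
A quick simulation can make the role of the invariant σ-algebra concrete. The sketch below takes $d = 1$, coordinates in $\{0, 1\}$, the shift on sequences, $V_n = \{0, \dots, n-1\}$ and $f(\omega) = \omega_0$, so that $E^{R_n(\omega)}[f]$ is the average of the first $n$ coordinates. The parameter values, the seed, and the function name `ergodic_average` are illustrative assumptions.<br />

```python
import random

def ergodic_average(p, n, rng):
    """(1/n) * sum_{i<n} f(theta_i omega) with f(omega) = omega_0, for i.i.d. Bernoulli(p) coordinates."""
    return sum(1 if rng.random() < p else 0 for _ in range(n)) / n

rng = random.Random(0)
n = 20000

# Ergodic case: P is the i.i.d. Bernoulli(0.3) product measure,
# so the averages should approach E[f] = 0.3.
avg_ergodic = ergodic_average(0.3, n, rng)

# Non-ergodic case: P is an equal mixture of the Bernoulli(0.2) and
# Bernoulli(0.8) product measures. A sample from P first picks a component;
# the averages then approach that component's mean (a version of E[f | I]),
# not the overall mean E[f] = 0.5.
limits = []
for _ in range(10):
    p = 0.2 if rng.random() < 0.5 else 0.8
    limits.append((p, ergodic_average(p, n, rng)))
```

In the mixture case the limit is a genuinely random variable taking the values 0.2 and 0.8, which is what the conditional expectation $E[f \mid \mathcal I]$ records.<br />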

A corollary of the ergodic theorem is that distinct ergodic measures are supported on disjoint sets.<br />

∗ Exercise A.15. Prove that if $P_1$ and $P_2$ are two different ergodic measures, then they are mutually singular; i.e. there exists a measurable set $A$ such that $P_1(A) = 0$ and $P_2(A) = 1$.<br />

Now, $\mathcal M_\theta(\Omega)$ is clearly a convex set. By definition, its extreme points are measures $P \in \mathcal M_\theta(\Omega)$ such that $P = tP_1 + (1 - t)P_2$ with $t \in (0, 1)$ and $P_1, P_2 \in \mathcal M_\theta(\Omega)$ implies $P_1 = P_2 = P$. It turns out that the extreme points are precisely the ergodic measures. This is again a corollary of the ergodic theorem.<br />

∗ Exercise A.16. A probability measure $P \in \mathcal M_\theta(\Omega)$ is ergodic if, and only if, it is an extreme point of $\mathcal M_\theta(\Omega)$.<br />
Hint: Use the above exercise.<br />

Having a convex set it is natural to ask if there are enough extreme points to recover the whole convex set by taking weighted averages. This is indeed the case with the space $\mathcal M_\theta(\Omega)$ if, for example, $(\Omega, \mathcal F)$ is a Polish space with its Borel sets.<br />

Ergodic decomposition theorem. Setting as above. Assume additionally that $\Omega$ is Polish and $\mathcal F$ is its Borel σ-algebra. Then, for any shift-invariant measure $P \in \mathcal M_\theta(\Omega)$, there is a unique probability measure $\mu_P$ supported on the set $\mathcal M_e(\Omega)$ of ergodic measures such that<br />
\[
P = \int_{\mathcal M_e(\Omega)} Q\,\mu_P(dQ).
\]<br />
The proof of the above theorem uses the existence of a conditional probability $P_\omega$ of $P$ given $\mathcal I$ (see Example 8.5 for the definition of conditional probability). The idea is to show that if $P \in \mathcal M_\theta(\Omega)$, then for $P$-a.e. choice of $\omega$, $P_\omega$ is ergodic, and one can define $\mu_P$ as the distribution induced by the map $\omega \mapsto P_\omega$. See Exercises A.25 and A.26 at the end of this section for the details.<br />

For exercises A.17-A.24 we specialize to the following setting. $(\mathcal X, \mathcal B)$ is a measurable space and $\Omega = \mathcal X^{\mathbb Z^d}$ is endowed with the product σ-algebra $\mathcal F$. $\mathcal F_V$ is the σ-algebra generated by $\{X_i : i \in V\}$. $V_n$ is an increasing sequence of cubes exhausting $\mathbb Z^d$. The tail σ-algebra is $\mathcal T = \bigcap_n \mathcal F_{V_n^c}$.<br />

∗ Exercise A.17. (Kolmogorov's 0-1 law) Let $P$ be a product probability measure on $(\Omega, \mathcal F)$; i.e. $P = \bigotimes_{i\in\mathbb Z^d} \lambda_i$, with $\lambda_i \in \mathcal M_1(\mathcal X)$. Prove that $P(A) \in \{0, 1\}$ for any $\mathcal T$-measurable set $A$.<br />
Hint: Observe that $A$ is independent of $\mathcal F_{V_n}$, for all $n \ge 1$. Now prove that the class of sets that are independent of $A$ is a monotone class. Conclude that $A$ is independent of itself.<br />

∗ Exercise A.18. Let $P$ be a shift-invariant probability measure on $\Omega$. Show that if $A$ is $\mathcal I$-measurable, then there exists a set $B$ that is $\mathcal T$-measurable and such that $P(A \Delta B) = 0$.<br />
Hint: For each $k \in \mathbb N$ find $B_k \in \mathcal F_{V_{m(k)}}$ such that $P(A \Delta B_k) < 2^{-k}$; see Exercise 18 on page 32 of Folland's textbook [18] for why such a $B_k$ exists. Check that $B = \bigcup_{m\ge n} \bigcap_{k\ge m} \theta_{-k-m(k)} B_k$ works. In particular, note that $B$ is independent of $n$.<br />

Exercise A.19. Let $\lambda$ be a probability measure on $\mathcal X$. Prove that $\lambda^{\otimes\mathbb Z^d}$ is ergodic for the shifts $(\theta_j)_{j\in\mathbb Z^d}$.<br />

In the next few exercises the space $\mathcal X$ will be a Polish space.<br />

Exercise A.20. Suppose $(\mathcal X, d)$ is a separable metric space. Let $\Omega = \mathcal X^{\mathbb Z^d}$ and let $f : \Omega \to \mathbb R$ be a measurable function. Prove that if $f$ is continuous and $\mathcal I$-measurable, then it is constant.<br />
Hint: Let $\{x_k\}$ be a dense set in $\mathcal X$. Find $\tilde\omega \in \Omega$ such that for any $n$ and any $\omega \in \Omega$ such that $\omega_i \in \{x_k\}$ for all $i \in V_n$, there exists a $j$ such that $\tilde\omega_{j+V_n} = \omega_{V_n}$. Prove that $\{\theta_i\tilde\omega : i \in \mathbb Z^d\}$ is dense.<br />

Exercise A.21. Prove that even though $\mathcal F$ is countably generated, $\mathcal I$ itself is not.<br />
Hint: Pick $x \ne y \in \mathcal X$ and let $P = \lambda^{\otimes\mathbb Z^d}$, with $\lambda = (\delta_x + \delta_y)/2$. If $\mathcal I$ is generated by $(A_k)_{k\in\mathbb N}$, then let $B_k = A_k$ if $P(A_k) = 1$ and $B_k = A_k^c$ if $P(A_k) = 0$. Check that $C = \bigcap B_k = \{\theta_i\omega : i \in \mathbb Z^d\}$ for some $\omega \in \Omega$. Show that $P(C)$ should then be 0 and 1 at the same time.<br />

Since distinct ergodic measures have disjoint supports, one can actually define a universal version of the conditional probability $P_\omega$ (given $\mathcal I$) simultaneously for all measures $P \in \mathcal M_\theta(\Omega)$.<br />

Exercise A.22. Let $\mathcal X$ be a Polish space and define the set $\Omega_0 = \{\omega : \kappa(\omega) = \lim_{n\to\infty} R_n(\omega) \text{ exists}\}$. For $\omega \notin \Omega_0$ define $\kappa(\omega) = Q$ for some fixed $Q \in \mathcal M_\theta(\Omega)$. Prove that $\Omega_0$ is a Borel set and that $\kappa : \Omega \to \mathcal M_\theta(\Omega)$ is measurable. Moreover, $\kappa \circ \theta_i = \kappa$ for all $i \in \mathbb Z^d$ and for any bounded measurable function $f$ and bounded $\mathcal I$-measurable function $g$,<br />

(a) $E^{\kappa(\omega)}[f]$ is $\mathcal I$-measurable, and<br />

(b) $E^{P}[gf] = \int g(\omega)\,E^{\kappa(\omega)}[f]\,P(d\omega)$, for any $P \in \mathcal M_\theta(\Omega)$.<br />

In other words, $\kappa(\omega)$ is a version of $P_\omega$, for any shift-invariant $P$.<br />

Exercise A.23. Prove that the universal kernel $\kappa$ from Exercise A.22 is $\mathcal T$-measurable.<br />

Exercise A.24. Let $P = \lambda^{\otimes\mathbb Z^d}$ for $\lambda \in \mathcal M_1(\mathcal X)$. Let $\tilde h(\omega) = h(\kappa(\omega) \mid P)$, where $h$ is the specific relative entropy introduced in Section 7.2. Prove that $h(\nu \mid P) = \int \tilde h\,d\nu$, for all $\nu \in \mathcal M_\theta(\Omega)$.<br />
Hint: Use the affinity and lower semicontinuity of $h$ together with the ergodic decomposition $\nu = \int \kappa\,d\nu$. If this is too difficult, Lemma 5.4.24 in [9] gives a more general result.<br />

The next exercise develops an abstract approach to the ergodic decomposition from article [16]. The virtue of this approach is that it can be applied to other examples too as illustrated by exercises. The key hypothesis is that a collection of measures has a common kernel for conditional probabilities relative to some σ-algebra. Exercise A.22 showed that such a kernel can be defined for shift-invariant measures. Exercise A.27 below shows the same for exchangeable measures, and Exercises 8.23 and 8.24 in Section 8.5 apply Exercise A.25 to Gibbs measures.<br />

∗ Exercise A.25. Let $(\Omega, \mathcal F)$ be a measurable space and $\mathcal M_1$ the space of probability measures on $(\Omega, \mathcal F)$. Equip $\mathcal M_1$ with the smallest σ-algebra $\mathcal H$ under which all functions $\mu \mapsto \int g\,d\mu$ are measurable for $g \in b\mathcal F$. Fix a nonempty measurable subset $\mathcal P$ of $\mathcal M_1$ and a sub-σ-algebra $\mathcal A$ of $\mathcal F$. Define the subset of $\mathcal A$-ergodic measures by<br />
\[
\mathcal P_e = \{\mu \in \mathcal P : \mu(A) = 0 \text{ or } 1 \ \ \forall A \in \mathcal A\}. \tag{A.3}
\]<br />
Make two assumptions.<br />

(i) There exists a countable family $\mathcal W \subset b\mathcal F$ of bounded measurable functions that distinguishes elements of $\mathcal P$: if $\mu, \nu \in \mathcal P$ and $\int g\,d\mu = \int g\,d\nu$ for all $g \in \mathcal W$ then $\mu = \nu$.<br />

(ii) There exists a stochastic kernel $\kappa$ from $(\Omega, \mathcal A)$ into $(\Omega, \mathcal F)$ with these properties:<br />
• $\kappa(\omega, \cdot\,) \in \mathcal P$ for all $\omega \in \Omega$.<br />
• For every $\mu \in \mathcal P$ and $B \in \mathcal F$, $\kappa(\omega, B)$ is a version of $\mu(B \mid \mathcal A)(\omega)$.<br />

(a) Let $\mu \in \mathcal P$. Show that statements (i)–(iv) are equivalent.<br />
(i) $\mu \in \mathcal P_e$.<br />
(ii) $E^{\kappa(\omega)}[g] = E^{\mu}[g]$ for $\mu$-a.e. $\omega$, for all $g \in \mathcal W$.<br />
(iii) $\Phi_g(\mu) = 0$ for all $g \in \mathcal W$, where<br />
\[
\Phi_g(\mu) = \int \bigl(E^{\kappa(\omega)}[g] - E^{\mu}[g]\bigr)^2 \mu(d\omega) = \int E^{\kappa(\omega)}[g]^2\,\mu(d\omega) - E^{\mu}[g]^2.
\]<br />
(iv) $\mu\{\omega : \kappa(\omega) = \mu\} = 1$.<br />
Hint: (i) ⟹ (ii) ⟹ (iii) ⟹ (iv) ⟹ (i).<br />

(b) Establish these statements.<br />
(i) $\mathcal P_e$ is a measurable subset of $\mathcal M_1$, and a subset of the image $\{\kappa(\omega) : \omega \in \Omega\}$.<br />
(ii) For any $\mu \in \mathcal P$, $\mu\{\omega : \kappa(\omega) \in \mathcal P_e\} = 1$.<br />
(iii) $\mathcal P_e$ is not empty.<br />
Hint: $\mathcal P_e$ is the intersection of the sets $\{\Phi_g = 0\}$. By conditioning on $\mathcal A$ derive $\int \Phi_g(\kappa(\omega))\,\mu(d\omega) = 0$.<br />

(c) Show that for every $\mu \in \mathcal P$ there exists a unique probability measure $Q_\mu$ on $\mathcal P_e$ such that<br />
\[
\mu(B) = \int_{\mathcal P_e} \nu(B)\,Q_\mu(d\nu) \quad \forall B \in \mathcal F.
\]<br />
Hint: Let $Q_\mu(D) = \mu\{\omega : \kappa(\omega) \in D\}$, after adjusting $\kappa(\omega)$ so that all its values are in $\mathcal P_e$. For uniqueness, use point (iv) of part (a) to show that if $Q_\mu$ gives the decomposition then it must be the distribution of $\kappa(\omega)$ under $\mu$.<br />
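
The second equality in the definition of $\Phi_g$ in part (a)(iii) is the usual variance expansion. A short sketch, assuming only the kernel property in hypothesis (ii), which gives $\int E^{\kappa(\omega)}[g]\,\mu(d\omega) = E^{\mu}[g]$, and writing $m = E^{\mu}[g]$ as shorthand introduced for this illustration:<br />

```latex
\begin{align*}
\Phi_g(\mu)
  &= \int \bigl(E^{\kappa(\omega)}[g]\bigr)^2 \mu(d\omega)
     - 2m \int E^{\kappa(\omega)}[g]\,\mu(d\omega) + m^2 \\
  &= \int \bigl(E^{\kappa(\omega)}[g]\bigr)^2 \mu(d\omega) - m^2 .
\end{align*}
```

Read this way, $\Phi_g(\mu)$ is the variance of $\omega \mapsto E^{\kappa(\omega)}[g]$ under $\mu$, so it vanishes exactly when statement (ii) of part (a) holds for that $g$.<br />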

∗ Exercise A.26. (Ergodic decomposition) Apply Exercise A.25 to deduce the ergodic decomposition. Define the kernel $\kappa(\omega)$ as in Exercise A.22.<br />

∗ Exercise A.27. (de Finetti's theorem) Let $S$ be a Polish space and $\Omega = S^{\mathbb N}$ the space of $S$-valued sequences with its product σ-algebra $\mathcal F$. Let $\mathcal S_n$ denote the group of permutations (bijective mappings) on $\{1, 2, \dots, n\}$. Permutations $\pi \in \mathcal S_n$ act on $\Omega$ in the obvious way: $(\pi\omega)_i = \omega_{\pi(i)}$ where we take $\pi(i) = i$ for $i > n$. Define σ-algebras<br />
\[
\mathcal E_n = \{A \in \mathcal F : \pi^{-1}A = A \ \ \forall \pi \in \mathcal S_n\}
\]<br />
and the exchangeable σ-algebra<br />
\[
\mathcal E = \bigcap_{n\ge 1} \mathcal E_n.
\]<br />
A probability measure $\mu \in \mathcal M_1(\Omega)$ is exchangeable if it is invariant under all finite permutations.<br />

The goal here is to apply Exercise A.25 to prove the following.



de Finetti's theorem. Fix an exchangeable measure $\mu$. Then, conditional on $\mathcal E$, the coordinates $X_i(\omega) = \omega_i$ are i.i.d. under $\mu$, and $\mu$ can be decomposed into a mixture of i.i.d. measures.<br />

Define the empirical measures<br />
\[
A_n(\omega) = \frac{1}{n!} \sum_{\pi\in\mathcal S_n} \delta_{\pi\omega}.
\]<br />
(a) Show that $E^{A_n(\omega)}[f] = E^{\mu}[f \mid \mathcal E_n](\omega)$ and consequently, by the backwards-martingale convergence theorem (Theorem 6.1 on page 265 of [15] or page 155 of [26]), $E^{A_n}[f] \to E^{\mu}[f \mid \mathcal E]$ $\mu$-a.s. Show that out of these limits we can construct a kernel that satisfies (A.3).<br />

(b) Let $f$ be a bounded function on $S^k$ and $g$ a bounded function on $S$. Show that<br />
\[
E^{A_n}\bigl[f(X_1, \dots, X_k)\,g(X_{k+1})\bigr] - E^{A_n}\bigl[f(X_1, \dots, X_k)\bigr]\,E^{A_n}\bigl[g(X_1)\bigr] = O(n^{-1}).
\]<br />
Show that any limit point of $A_n$ is an i.i.d. product measure.<br />

(c) Show that $\mathcal E$ is trivial under an exchangeable measure $\mu$ if and only if $\mu$ is an i.i.d. product measure. Note that one direction here is the Hewitt-Savage 0-1 law. Complete the proof of de Finetti's theorem.<br />
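
A concrete exchangeable measure can help in reading the statement of de Finetti's theorem. The sketch below uses the Pólya urn (an example chosen for this illustration, not taken from the text) started with one ball of each color: its law on $\{0, 1\}$-valued sequences is invariant under finite permutations and coincides with the mixture of i.i.d. Bernoulli($\theta$) measures with $\theta$ uniform on $[0, 1]$. Exact rational arithmetic avoids rounding issues; the function names are hypothetical.<br />

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def polya_prob(seq):
    """Probability of a 0/1 sequence under a Polya urn started with one ball of each color."""
    ones, zeros = 1, 1
    prob = Fraction(1)
    for x in seq:
        total = ones + zeros
        if x == 1:
            prob *= Fraction(ones, total)
            ones += 1        # reinforce color 1
        else:
            prob *= Fraction(zeros, total)
            zeros += 1       # reinforce color 0
    return prob

def mixture_prob(seq):
    """Integral over theta in [0,1] of theta^k (1-theta)^(n-k), which equals k!(n-k)!/(n+1)!."""
    n, k = len(seq), sum(seq)
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))

seq = (1, 1, 0, 1, 0)
# Exchangeability: every rearrangement of seq has the same probability.
probs = {polya_prob(perm) for perm in permutations(seq)}
```

The set `probs` collapsing to a single value reflects invariance under $\mathcal S_5$, and the agreement with `mixture_prob` exhibits the mixing measure on i.i.d. laws in this particular case.<br />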

A.3. Stochastic ordering<br />

Strassen's lemma. Let $\Omega = \{-1, 1\}^{\mathbb Z^d}$. Let $\mu, \nu \in \mathcal M_1(\Omega)$ be such that $\mu \le \nu$. Assume that $\mu\{\sigma_i = 1\} = \nu\{\sigma_i = 1\}$ for all $i \in \mathbb Z^d$. Then $\mu = \nu$.<br />

Proof. We will prove the following claim for all $N \ge 1$. Recall that $\Omega_\Lambda = \{-1, 1\}^{\Lambda}$.<br />

Claim A.28. For any $\Lambda \subset \mathbb Z^d$ with $|\Lambda| = N$ and $\mu, \nu \in \mathcal M_1(\Omega_\Lambda)$, if $\mu \le \nu$ and $\mu(\sigma_i = 1) = \nu(\sigma_i = 1)$ for all $i \in \Lambda$, then $\mu = \nu$.<br />

This will prove that the marginals of the $\mu$ and $\nu$ in the statement of the lemma coincide and thus $\mu = \nu$.<br />

To prove the claim we will use induction over $N$. The case $N = 1$ is obvious. The case $N = 2$ can be done by direct computation. Indeed, if $\Lambda = \{i, j\}$, then observe that $\mathbf 1\{\sigma_i = 1 \text{ or } \sigma_j = 1\}$ is increasing. Thus,<br />
\[
\begin{aligned}
&\mu\{\sigma_i = -1, \sigma_j = 1\} + \mu\{\sigma_i = 1, \sigma_j = 1\} + \mu\{\sigma_i = 1, \sigma_j = -1\} \\
&\qquad \le \nu\{\sigma_i = -1, \sigma_j = 1\} + \nu\{\sigma_i = 1, \sigma_j = 1\} + \nu\{\sigma_i = 1, \sigma_j = -1\}.
\end{aligned}
\]<br />
But the sums of the first two terms on each side are equal. Thus,<br />
\[
\mu\{\sigma_i = 1, \sigma_j = -1\} \le \nu\{\sigma_i = 1, \sigma_j = -1\}.
\]



Since $\mathbf 1\{\sigma_i = 1, \sigma_j = 1\}$ is increasing we also have<br />
\[
\mu\{\sigma_i = 1, \sigma_j = 1\} \le \nu\{\sigma_i = 1, \sigma_j = 1\}.
\]<br />
The sum of the left-hand sides of the above two inequalities equals that of the right-hand sides. Thus, these must actually be equalities. Interchanging the roles of $i$ and $j$ also proves that $\mu\{\sigma_i = -1, \sigma_j = 1\} = \nu\{\sigma_i = -1, \sigma_j = 1\}$. Then $\mu\{\sigma_i = -1, \sigma_j = -1\} = \nu\{\sigma_i = -1, \sigma_j = -1\}$ follows, proving the claim when $N = 2$.<br />

Now assume the claim is true for $N$ spins, with $N \ge 2$. Consider measures on $\Omega_\Lambda$ with $|\Lambda| = N + 1$. By Exercise 10.12, if $\mu$ and $\nu$ agree on all events of the type $K(A) = \{\sigma_i = 1 \ \forall i \in A\}$, $A \subset \Lambda$, then $\mu = \nu$. So suppose they do not agree on all such events. Since their $N$-spin marginals do agree, the only case of possible disagreement is $A = \Lambda$.<br />

So suppose there is $\varepsilon > 0$ such that $\nu\{K(\Lambda)\} = \mu\{K(\Lambda)\} + \varepsilon$. (Since $\mu \le \nu$ the difference must go this way.)<br />

Again since the marginals on $N$ spins agree, it must be that for any $i \in \Lambda$, $\mu\{K(\Lambda \setminus \{i\}), \sigma_i = -1\} = \nu\{K(\Lambda \setminus \{i\}), \sigma_i = -1\} + \varepsilon$.<br />

However, if now we take $i, j \in \Lambda$, with $i \ne j$, and set<br />
\[
B = K(\Lambda \setminus \{i, j\}) \cap (\{\sigma_i = 1\} \cup \{\sigma_j = 1\}),
\]<br />
then $B$ is the disjoint union of $K(\Lambda)$, $K(\Lambda \setminus \{i\}) \cap \{\sigma_i = -1\}$ and $K(\Lambda \setminus \{j\}) \cap \{\sigma_j = -1\}$, so from above $\mu(B) = \nu(B) + 2\varepsilon - \varepsilon = \nu(B) + \varepsilon$. This is a contradiction with $\mu \le \nu$ because $\mathbf 1_B$ is an increasing function.<br />


Appendix B<br />

Topics from analysis<br />

Ultimately, this will be an appendix on general analysis stuff.<br />

B.1. Measure-theoretic lemma<br />

Lemma B.1. Let $\mathcal X$ be a metric space and let $\mathcal H$ be a class of bounded functions that contains the space $U_b(\mathcal X)$ of bounded uniformly continuous functions and is closed under uniformly bounded pointwise convergence (i.e. $f_n \in \mathcal H$ for all $n$, $\max_n \sup_x |f_n(x)| < \infty$, and $f_n(x) \to f(x)$ for all $x \in \mathcal X$ implies $f \in \mathcal H$). Then $b\mathcal B \subset \mathcal H$.<br />

For this we need another technical lemma.<br />

Lemma B.2. Let $\mathcal X$ be a metric space and let $\mathcal H$ be a class of bounded functions that contains the space $U_b(\mathcal X)$ of bounded uniformly continuous functions and is closed under uniformly bounded pointwise convergence. Fix an arbitrary function $g$. Suppose $g + f \in \mathcal H$ for all $f \in U_b(\mathcal X)$. Then $g + \alpha\mathbf 1_A + f \in \mathcal H$ for all real $\alpha$, all $f \in U_b(\mathcal X)$, and all Borel sets $A$.<br />

Proof. Let
\[
\mathcal{C} = \{A \subset \mathcal{X} : g + \alpha\mathbf{1}_A + f \in \mathcal{H} \ \ \forall \alpha \in \mathbb{R},\ \forall f \in U_b(\mathcal{X})\}.
\]
$\mathcal{C}$ contains the algebra
\[
\mathcal{A} = \{A \subset \mathcal{X} : \exists f_n \in U_b(\mathcal{X}) \text{ uniformly bounded} : \mathbf{1}_A = \lim_{n\to\infty} f_n\}
\]
because if $A \in \mathcal{A}$ then
\[
g + \alpha\mathbf{1}_A + f = \lim_{n\to\infty} (g + \alpha f_n + f) \in \mathcal{H}
\]
by the hypothesis and closedness under uniformly bounded pointwise limits. $\mathcal{C}$ is also a monotone class because if $A_k \in \mathcal{C}$ and $\mathbf{1}_{A_k} \to \mathbf{1}_A$ then
\[
g + \alpha\mathbf{1}_A + f = \lim_{k\to\infty} (g + \alpha\mathbf{1}_{A_k} + f) \in \mathcal{H}.
\]
By the monotone class theorem (see page 30 of [26]) $\mathcal{C}$ contains the $\sigma$-algebra generated by $\mathcal{A}$. This is all of $\mathcal{B}$ because $\mathcal{A}$ contains all open and closed sets. (Recall that if $A$ is closed, $\mathbf{1}_A$ is approximated by $f_n(x) = (1 + n\,d(x, A))^{-1}$, where $d$ is the metric on $\mathcal{X}$.) $\square$
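As a quick numerical illustration of the approximation recalled at the end of the proof, the following sketch evaluates $f_n(x) = (1 + n\,d(x, A))^{-1}$ for the closed set $A = [0, 1] \subset \mathbb{R}$. The choice of $A$ and the helper names `dist_to_interval` and `f_n` are assumptions made only for this example.

```python
def dist_to_interval(x, a=0.0, b=1.0):
    """Distance from the real number x to the closed interval [a, b]."""
    if x < a:
        return a - x
    if x > b:
        return x - b
    return 0.0


def f_n(x, n, a=0.0, b=1.0):
    """Bounded, uniformly continuous approximation of the indicator of [a, b]."""
    return 1.0 / (1.0 + n * dist_to_interval(x, a, b))


# Inside A the value is exactly 1 for every n; outside A it decays like 1/n.
for n in (1, 10, 100, 1000):
    print(n, f_n(0.5, n), f_n(1.5, n))
```

Each $f_n$ takes values in $(0, 1]$, so the sequence is uniformly bounded, which is what membership in the algebra $\mathcal{A}$ requires.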

Proof of Lemma B.1. The hypothesis of Lemma B.2 is true for $g = 0$. Consequently $\alpha\mathbf{1}_A + f \in \mathcal{H}$ for all $f \in U_b(\mathcal{X})$, $\alpha \in \mathbb{R}$, and $A$ measurable. Suppose we have shown that $\mathcal{H}$ contains all functions of the form $h + f$ where $h$ is a simple function with at most $n$ terms and $f \in U_b(\mathcal{X})$. (The case $n = 1$ has just been verified.) Then Lemma B.2, applied with $g = h$, shows that $\mathcal{H}$ also contains all $h + f$ where $h$ is a simple function with at most $n + 1$ terms and $f \in U_b(\mathcal{X})$. By induction, and taking $f = 0$, we get that all simple functions are in $\mathcal{H}$. Since bounded measurable functions are uniform limits of simple ones, $\mathcal{H}$ contains all bounded measurable functions. $\square$

B.2. Minimax theorem<br />

In this section we prove a theorem of König [27], following the proof given by Kassay [25].

Let $\mathcal{X}$ and $\mathcal{Y}$ be nonempty sets and $f : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ a given function. We say $f$ is uniformly Jensen-concave-convex-like if
\[
\forall y_0, y_1 \in \mathcal{Y},\ \exists y \in \mathcal{Y} :\ \forall x \in \mathcal{X},\quad f(x, y) \le \tfrac12\bigl(f(x, y_0) + f(x, y_1)\bigr) \tag{B.1}
\]
and
\[
\forall x_0, x_1 \in \mathcal{X},\ \exists x \in \mathcal{X} :\ \forall y \in \mathcal{Y},\quad f(x, y) \ge \tfrac12\bigl(f(x_0, y) + f(x_1, y)\bigr). \tag{B.2}
\]

Of course, if $\mathcal{X}$ and $\mathcal{Y}$ are convex sets and $f(x, y)$ is concave in $x$ for each fixed $y$, and convex in $y$ for each fixed $x$, then $f$ is also uniformly Jensen-concave-convex-like: take the midpoints $y = (y_0 + y_1)/2$ and $x = (x_0 + x_1)/2$.

Let $D$ be the set of dyadic rational numbers in $[0, 1]$; i.e. numbers of the form $k/2^n$ for some integers $n \ge 1$ and $k \in \{1, \dots, 2^n\}$.

Exercise B.3. Show that (B.1) implies that for every $y_0, y_1 \in \mathcal{Y}$ and $t \in D$, there exists $y_t \in \mathcal{Y}$ such that we have for all $x \in \mathcal{X}$,
\[
f(x, y_t) \le (1 - t) f(x, y_0) + t f(x, y_1).
\]
Similarly, (B.2) implies that for every $x_0, x_1 \in \mathcal{X}$ and $t \in D$ there exists $x_t \in \mathcal{X}$ such that we have for all $y \in \mathcal{Y}$,
\[
f(x_t, y) \ge (1 - t) f(x_0, y) + t f(x_1, y).
\]
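One way to see why dyadic weights are within reach of (B.1) is to build $y_t$ by repeated midpoint selection. The sketch below is only an illustration under stated assumptions: `mid` is a hypothetical selector returning the point $y$ promised by (B.1), and the concrete example takes $\mathcal{Y} = \mathbb{R}$, $f(x, y) = (y - x)^2$ and the arithmetic mean as selector.

```python
def dyadic_point(mid, y0, y1, k, n):
    """Return a point y_t for t = k / 2**n using only the midpoint selector `mid`.

    `mid(a, b)` plays the role of y in (B.1): it is assumed to satisfy
    f(x, mid(a, b)) <= (f(x, a) + f(x, b)) / 2 for all x.
    """
    if n == 0:
        return y0 if k == 0 else y1                      # t is 0 or 1
    if k % 2 == 0:
        return dyadic_point(mid, y0, y1, k // 2, n - 1)  # reduce the fraction
    # For odd k, t is the average of its neighbours (k - 1)/2**n and (k + 1)/2**n.
    left = dyadic_point(mid, y0, y1, k - 1, n)
    right = dyadic_point(mid, y0, y1, k + 1, n)
    return mid(left, right)


def f(x, y):
    """Example function, convex in y (an assumption for this illustration)."""
    return (y - x) ** 2


def average(a, b):
    return (a + b) / 2


y0, y1, k, n = 0.0, 8.0, 3, 3
t = k / 2 ** n
yt = dyadic_point(average, y0, y1, k, n)
for x in (-2.0, 0.0, 1.5, 10.0):
    # The inequality of Exercise B.3, up to rounding.
    assert f(x, yt) <= (1 - t) * f(x, y0) + t * f(x, y1) + 1e-12
```

The inequality propagates through the recursion because averaging the bounds for the two neighbours gives exactly the bound with weight $t$.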


∗Exercise B.4. Fix an integer $k \ge 2$. Show that there exists a dense set
\[
M_k \subset \Bigl\{ (t_1, \dots, t_k) \in [0, 1]^k : \sum_{i=1}^k t_i = 1 \Bigr\}
\]
such that for $t = (t_1, \dots, t_k) \in M_k$ the following hold:

(a) for every $x_1, \dots, x_k \in \mathcal{X}$, there exists $x_t \in \mathcal{X}$ such that for every $y \in \mathcal{Y}$
\[
f(x_t, y) \ge \sum_{i=1}^k t_i f(x_i, y).
\]

(b) for every $y_1, \dots, y_k \in \mathcal{Y}$, there exists $y_t \in \mathcal{Y}$ such that for every $x \in \mathcal{X}$
\[
f(x, y_t) \le \sum_{i=1}^k t_i f(x, y_i).
\]

König’s minimax theorem. Suppose $\mathcal{X}$ is a compact Hausdorff space and $f : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ is uniformly Jensen-concave-convex-like. Assume $f(\cdot, y) : \mathcal{X} \to \mathbb{R}$ is upper semicontinuous for every fixed $y \in \mathcal{Y}$. Then
\[
\sup_{x \in \mathcal{X}} \inf_{y \in \mathcal{Y}} f(x, y) = \inf_{y \in \mathcal{Y}} \sup_{x \in \mathcal{X}} f(x, y).
\]

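Proof. One direction is easy. Indeed, it is always true that
\[
\sup_{x \in \mathcal{X}} \inf_{y \in \mathcal{Y}} f(x, y) \le \inf_{y \in \mathcal{Y}} \sup_{x \in \mathcal{X}} f(x, y),
\]
since $\inf_{y'} f(x, y') \le f(x, y) \le \sup_{x'} f(x', y)$ for all $x$ and $y$.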
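For finite $\mathcal{X}$ and $\mathcal{Y}$ the easy inequality can be checked directly, and a small example shows that equality may fail without the Jensen-type hypotheses. In the sketch below $f$ is stored as a matrix with rows indexed by $x$ and columns by $y$; the function names and the two example matrices are illustrative assumptions.

```python
def maximin(f):
    """sup over rows x of inf over columns y of f[x][y]."""
    return max(min(row) for row in f)


def minimax(f):
    """inf over columns y of sup over rows x of f[x][y]."""
    return min(max(row[y] for row in f) for y in range(len(f[0])))


# Matching pennies restricted to pure strategies: condition (B.1) fails,
# because no column is <= the average of the two columns in every row.
matching_pennies = [[1, -1], [-1, 1]]

# A matrix with a saddle point: both sides agree.
saddle = [[2, 3], [1, 4]]

for f in (matching_pennies, saddle):
    assert maximin(f) <= minimax(f)
    print(maximin(f), minimax(f))
```

The strict gap for `matching_pennies` is consistent with the theorem, since that example does not satisfy (B.1).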

The other direction needs the following lemma. Let $c^+ = \inf_{y} \sup_{x \in \mathcal{X}} f(x, y)$ and for $c < c^+$ and $y \in \mathcal{Y}$ define the (nonempty) set
\[
H_{c,y} = \{x \in \mathcal{X} : f(x, y) \ge c\}.
\]

Lemma B.5. Same assumptions as König’s minimax theorem. Then, for each $c < c^+$ and $y_1, \dots, y_n \in \mathcal{Y}$, we have $\bigcap_{i=1}^n H_{c,y_i} \ne \emptyset$.

Proof. Fix $c < c^+$ and assume, to the contrary, that $\bigcap_{i=1}^n H_{c,y_i} = \emptyset$. Define the function $h : \mathcal{X} \to \mathbb{R}^n$ by
\[
h(x) = (f(x, y_1) - c, \dots, f(x, y_n) - c)
\]
and set $K = [0, \infty)^n$. Let $\overline{\mathrm{co}}(h(\mathcal{X}))$ be the closure of the convex hull of $h(\mathcal{X})$; i.e. the intersection of all convex closed sets containing $h(\mathcal{X})$. Let $K^\circ$ be the interior of $K$.

If $\overline{\mathrm{co}}(h(\mathcal{X})) \cap K^\circ \ne \emptyset$, then, since $K^\circ$ is open, there must exist $s_1, \dots, s_k \in [0, 1]$ with $\sum_{i=1}^k s_i = 1$ and $x_1, \dots, x_k \in \mathcal{X}$ such that $\sum_{i=1}^k s_i h(x_i) \in K^\circ$. Recall Exercise B.4. Since $M_k$ is dense, we can choose $(t_1, \dots, t_k) \in M_k$ such that $\sum_{i=1}^k t_i h(x_i) \in K$. Then, by part (a) of Exercise B.4, $h(x_t) - \sum_{i=1}^k t_i h(x_i)$ is also in $K$, which implies that $h(\mathcal{X}) \cap K \ne \emptyset$ and contradicts the assumption $\bigcap_{i=1}^n H_{c,y_i} = \emptyset$. Thus, $\overline{\mathrm{co}}(h(\mathcal{X})) \cap K^\circ = \emptyset$.


Figure B.1. The hyperplane separating $h_1(\mathcal{X})$ and $K$ and passing through the origin. (Labels in the figure: $h_1(\mathcal{X})$, $p_1$, $p_2$, $t \cdot u = 0$, $\delta \cdot u = -d$.)

By the Hahn-Banach separation theorem (page 42), there exists a hyperplane which separates $\overline{\mathrm{co}}(h(\mathcal{X}))$ and $K^\circ$; i.e. there exist $\gamma \in \mathbb{R}$ and $\alpha \in \mathbb{R}^n$ such that for all $x \in \mathcal{X}$ and $u \in K^\circ$,
\[
\sum_{i=1}^n \alpha_i u_i < \gamma \le \sum_{i=1}^n \alpha_i (f(x, y_i) - c).
\]
Taking $u_i \to \infty$ shows that $\alpha_i \le 0$. Taking $u \to 0$ shows that $\gamma \ge 0$. Clearly, there exists an $i \le n$ such that $\alpha_i \ne 0$. Setting $\delta = \alpha / \sum_{i=1}^n \alpha_i \in [0, 1]^n$ one has $\sum_{i=1}^n \delta_i = 1$ and $\sum_{i=1}^n \delta_i f(x, y_i) \le c$.

Now take $c < c_1 < c^+$ and set $d = c_1 - c$. Then, for all $x \in \mathcal{X}$,
\[
\sum_{i=1}^n \delta_i [f(x, y_i) - c_1] \le -d.
\]
Hence, if $h_1(x) = (f(x, y_1) - c_1, \dots, f(x, y_n) - c_1)$, then $h_1(\mathcal{X})$ is separated from $K$ by the hyperplane $\sum_{i=1}^n \delta_i u_i = -d$. Since $\mathcal{X}$ is compact and the functions $f(\cdot, y_i)$ are upper semicontinuous, they are bounded above; see Exercise 2.9. Thus, there exists $p \in K^\circ$ such that $h_1(\mathcal{X}) \subset \prod_{i=1}^n (-\infty, p_i]$. One can then find a $t \in M_n$ (recall Exercise B.4) such that the hyperplane $\sum_{i=1}^n t_i u_i = 0$ still separates $h_1(\mathcal{X})$ and $K$; see Figure B.1. This means that
\[
\sum_{i=1}^n t_i f(x, y_i) \le c_1
\]
for all $x \in \mathcal{X}$. Consider now $y_t \in \mathcal{Y}$ such that $f(x, y_t) \le \sum_{i=1}^n t_i f(x, y_i)$ for all $x \in \mathcal{X}$. Then, $c^+ \le \sup_{x \in \mathcal{X}} f(x, y_t) \le c_1$. This contradicts $c_1 < c^+$ and proves that $\bigcap_{i=1}^n H_{c,y_i} \ne \emptyset$. $\square$


Now, if $\bigcap_y H_{c,y} = \emptyset$, then $\{\mathcal{X} \setminus H_{c,y} : y \in \mathcal{Y}\}$ is a family of open sets covering $\mathcal{X}$. Since $\mathcal{X}$ is compact, it admits a finite subcovering. But this would imply the existence of finitely many sets $H_{c,y}$ with empty intersection, contradicting Lemma B.5. Thus, $\bigcap_y H_{c,y} \ne \emptyset$ and $\sup_x \inf_y f(x, y) \ge c$. The theorem follows from taking $c \nearrow c^+$. $\square$


Appendix C

Inequalities

C.1. Holley’s inequality

Define pointwise minima and maxima by $(\eta \wedge \xi)_i = \eta_i \wedge \xi_i$ and $(\eta \vee \xi)_i = \eta_i \vee \xi_i$.

Holley’s inequality. Let $\Omega_\Lambda = \{-1, 1\}^\Lambda$ for some finite $\Lambda \subset \mathbb{Z}^d$. Let $\mu$ and $\nu$ be strictly positive probability measures on $\Omega_\Lambda$. If
\[
\mu(\eta \wedge \xi)\,\nu(\eta \vee \xi) \ge \mu(\eta)\,\nu(\xi) \tag{C.1}
\]
for all $\eta$ and $\xi$ in $\Omega_\Lambda$, then $\mu \le \nu$.
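The statement can be tested by brute force on a very small system. The sketch below is an illustration under stated assumptions: $\Lambda$ is a path of three sites, $\mu$ and $\nu$ are Ising-type measures with free boundary, common coupling `J` and external fields $h_\mu \le h_\nu$ (all hypothetical choices), and $\mu \le \nu$ is tested through up-sets, which for a finite partially ordered set is equivalent to comparing expectations of increasing functions.

```python
import itertools
import math

SITES = 3                      # Lambda = {0, 1, 2}, a path in Z (assumption)
EDGES = [(0, 1), (1, 2)]
CONFIGS = list(itertools.product((-1, 1), repeat=SITES))


def ising(h, J=0.7):
    """Strictly positive probability measure on {-1, 1}^Lambda."""
    w = {s: math.exp(J * sum(s[i] * s[j] for i, j in EDGES) + h * sum(s))
         for s in CONFIGS}
    Z = sum(w.values())
    return {s: w[s] / Z for s in CONFIGS}


def meet(a, b):
    return tuple(min(x, y) for x, y in zip(a, b))


def join(a, b):
    return tuple(max(x, y) for x, y in zip(a, b))


def leq(a, b):
    return all(x <= y for x, y in zip(a, b))


def holley_condition(mu, nu, tol=1e-12):
    """Check (C.1) for all pairs of configurations (up to rounding)."""
    return all(mu[meet(e, x)] * nu[join(e, x)] >= mu[e] * nu[x] - tol
               for e in CONFIGS for x in CONFIGS)


def is_up_set(U):
    return all(t in U for s in U for t in CONFIGS if leq(s, t))


def dominated(mu, nu, tol=1e-12):
    """True if mu(U) <= nu(U) for every up-set U."""
    for mask in range(1 << len(CONFIGS)):
        U = {s for b, s in enumerate(CONFIGS) if (mask >> b) & 1}
        if is_up_set(U) and sum(mu[s] for s in U) > sum(nu[s] for s in U) + tol:
            return False
    return True


mu, nu = ising(0.0), ising(0.5)
print(holley_condition(mu, nu), dominated(mu, nu))
```

Swapping the roles of the two measures breaks the domination, as the up-set consisting of the all-plus configuration already shows.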

Proof. The idea is to couple two Markov chains on $\Omega_\Lambda = \{-1, 1\}^\Lambda$ with stationary distributions $\mu$ and $\nu$ so that the coupled process is a Markov chain on the space $\{(\eta, \xi) \in \Omega_\Lambda \times \Omega_\Lambda : \eta \le \xi\}$.

Let us use the notation $\eta^j$ for the configuration where the $j$-th spin in $\eta$ is flipped. That is, $\eta^j_i = \eta_i$ if $i \ne j$ and $\eta^j_j = -\eta_j$. At each step the Markov chain will choose a site in $\Lambda$ and flip the $\eta$-spin and/or the $\xi$-spin at that site. To define the transition probabilities set first for each $j \in \Lambda$
\[
\begin{aligned}
r((\eta, \xi), (\eta^j, \xi)) &= 1 && \text{if } \eta_j = -1 \text{ and } \xi_j = 1,\\
r((\eta, \xi), (\eta, \xi^j)) &= \frac{\nu(\xi^j)}{\nu(\xi)} && \text{if } \eta_j = -1 \text{ and } \xi_j = 1,\\
r((\eta, \xi), (\eta^j, \xi^j)) &= 1 && \text{if } \eta_j = \xi_j = -1,\\
r((\eta, \xi), (\eta^j, \xi^j)) &= \frac{\nu(\xi^j)}{\nu(\xi)} && \text{if } \eta_j = \xi_j = 1,\\
r((\eta, \xi), (\eta^j, \xi)) &= \frac{\mu(\eta^j)}{\mu(\eta)} - \frac{\nu(\xi^j)}{\nu(\xi)} && \text{if } \eta_j = \xi_j = 1, \text{ and}\\
r((\eta, \xi), (\eta', \xi')) &= 0 && \text{otherwise.}
\end{aligned}
\]


Next, let
\[
C = \max_{(\eta, \xi)} \sum_{(\eta', \xi')} r((\eta, \xi), (\eta', \xi')),
\]
and define the transition probabilities
\[
p((\eta, \xi), (\eta', \xi')) = r((\eta, \xi), (\eta', \xi'))/C
\]
when $(\eta, \xi) \ne (\eta', \xi')$ and
\[
p((\eta, \xi), (\eta, \xi)) = 1 - C^{-1} \sum_{(\eta', \xi')} r((\eta, \xi), (\eta', \xi')).
\]
The definition above requires $\mu(\eta^j)\nu(\xi) \ge \mu(\eta)\nu(\xi^j)$ whenever $\eta \le \xi$ and $\eta_j = \xi_j = 1$, which is ensured by condition (C.1) applied to the pair $\eta$ and $\xi^j$ (for which $\eta \wedge \xi^j = \eta^j$ and $\eta \vee \xi^j = \xi$). These transition probabilities preserve $\eta \le \xi$.
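The three properties used in the proof (nonnegative rates, preservation of the order $\eta \le \xi$, and the correct flip rates for the $\eta$-coordinate) can be verified mechanically on a small example. The sketch below assumes a three-site path and Ising-type measures with external fields $h_\mu \le h_\nu$; these choices and all function names are illustrative, not part of the text.

```python
import itertools
import math

SITES = 3                      # a path of three sites (assumption)
EDGES = [(0, 1), (1, 2)]
CONFIGS = list(itertools.product((-1, 1), repeat=SITES))


def ising(h, J=0.7):
    w = {s: math.exp(J * sum(s[i] * s[j] for i, j in EDGES) + h * sum(s))
         for s in CONFIGS}
    Z = sum(w.values())
    return {s: w[s] / Z for s in CONFIGS}


def flip(s, j):
    return s[:j] + (-s[j],) + s[j + 1:]


def leq(a, b):
    return all(x <= y for x, y in zip(a, b))


def rates(eta, xi, mu, nu):
    """Rates r((eta, xi), .) from the proof, for an ordered pair eta <= xi."""
    out = {}
    for j in range(SITES):
        if eta[j] == -1 and xi[j] == 1:
            out[(flip(eta, j), xi)] = 1.0
            out[(eta, flip(xi, j))] = nu[flip(xi, j)] / nu[xi]
        elif eta[j] == -1:                     # eta_j = xi_j = -1
            out[(flip(eta, j), flip(xi, j))] = 1.0
        else:                                  # eta_j = xi_j = +1
            down = nu[flip(xi, j)] / nu[xi]
            out[(flip(eta, j), flip(xi, j))] = down
            out[(flip(eta, j), xi)] = mu[flip(eta, j)] / mu[eta] - down
    return out


def check_coupling(mu, nu, tol=1e-12):
    for eta, xi in itertools.product(CONFIGS, repeat=2):
        if not leq(eta, xi):
            continue
        r = rates(eta, xi, mu, nu)
        if any(v < -tol for v in r.values()):
            return False                       # a negative "rate"
        if not all(leq(e2, x2) for e2, x2 in r):
            return False                       # the order would be broken
        for j in range(SITES):
            total = sum(v for (e2, _), v in r.items() if e2 == flip(eta, j))
            expected = 1.0 if eta[j] == -1 else mu[flip(eta, j)] / mu[eta]
            if abs(total - expected) > 1e-9:
                return False                   # wrong marginal flip rate
    return True


print(check_coupling(ising(0.0), ising(0.5)))
```

With the fields interchanged, the last rate in the list becomes negative for $\eta = \xi$ equal to the all-plus configuration, so the construction is not available.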

Exercise C.1. Prove that each of the two coordinates of the above Markov chain is itself a Markov chain. Show that the transition probabilities of the $\eta$-chain are
\[
\begin{aligned}
p(\eta, \eta^j) &= 1/C && \text{if } \eta_j = -1,\\
p(\eta, \eta^j) &= \frac{\mu(\eta^j)}{C\mu(\eta)} && \text{if } \eta_j = 1, \text{ and}\\
p(\eta, \eta) &= 1 - \sum_j p(\eta, \eta^j) && \text{for all } \eta.
\end{aligned}
\]
Also show that the transition probabilities of the $\xi$-chain are
\[
\begin{aligned}
p(\xi, \xi^j) &= 1/C && \text{if } \xi_j = -1,\\
p(\xi, \xi^j) &= \frac{\nu(\xi^j)}{C\nu(\xi)} && \text{if } \xi_j = 1, \text{ and}\\
p(\xi, \xi) &= 1 - \sum_j p(\xi, \xi^j) && \text{for all } \xi.
\end{aligned}
\]

Both chains are clearly irreducible, since one can obtain any configuration from any other one by a finite number of spin flips.

Exercise C.2. Show that $\mu$ is the invariant measure for the $\eta$-chain (denoted by $\eta(n)$) and $\nu$ is the invariant measure for the $\xi$-chain (denoted by $\xi(n)$).

Thus, if $f$ is an increasing function on $\Omega_\Lambda$, then
\[
\begin{aligned}
E[f(\eta(n)) \mid \eta(0) = \eta] &= E[f(\eta(n)) \mid \eta(0) = \eta,\ \xi(0) = \xi]\\
&\le E[f(\xi(n)) \mid \eta(0) = \eta,\ \xi(0) = \xi]\\
&= E[f(\xi(n)) \mid \xi(0) = \xi],
\end{aligned}
\]
whenever $\eta \le \xi$. Letting $n \to \infty$, the inequality $E^\mu[f] \le E^\nu[f]$ follows by the Markov chain convergence theorem (Theorem 5.5 on page 314 of [15] or Exercise 6.17). $\square$



C.2. Griffiths’ inequality<br />

Griffiths’ inequality. Fix a finite volume $\Lambda \subset \mathbb{Z}^d$. Let $E = \{\{i, j\} : |i - j| = 1,\ i \in \Lambda\}$ be the set of nearest-neighbor edges of $\Lambda$, including edges from $\Lambda$ to its complement. Define the function $F : [0, \infty)^E \to [-1, 1]$ by
\[
F(J) = \frac{\sum_{\sigma_\Lambda} \sigma_0 \exp\bigl\{\sum_{\{i,j\} \in E} J_{i,j}\sigma_i\sigma_j\bigr\}}{\sum_{\sigma_\Lambda} \exp\bigl\{\sum_{\{i,j\} \in E} J_{i,j}\sigma_i\sigma_j\bigr\}},
\]
where $J = (J_{i,j})_{\{i,j\} \in E}$, the sums run over $\sigma_\Lambda \in \{-1, 1\}^\Lambda$, and for $j \notin \Lambda$ we set $\sigma_j = 1$ (i.e. the $+$ boundary condition). Then,
\[
\frac{\partial F}{\partial J_{i,j}}(J) \ge 0 \quad \forall \{i, j\} \in E \text{ and } \forall J \in [0, \infty)^E.
\]
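The monotonicity of $F$ can be observed numerically in a small case. The sketch below assumes $d = 1$ and $\Lambda = \{-1, 0, 1\}$, so that $E$ consists of the four edges touching $\Lambda$; the volume, the sample couplings and the finite-difference step are illustrative choices, and a finite-difference check is of course weaker than the derivative statement.

```python
import itertools
import math

LAMBDA = (-1, 0, 1)                              # finite volume in Z (assumption)
EDGES = [(-2, -1), (-1, 0), (0, 1), (1, 2)]      # nearest-neighbour edges touching LAMBDA


def F(J):
    """Expectation of sigma_0 under + boundary conditions; J maps edges to couplings >= 0."""
    num = den = 0.0
    for values in itertools.product((-1, 1), repeat=len(LAMBDA)):
        sigma = dict(zip(LAMBDA, values))

        def spin(i):
            return sigma.get(i, 1)               # sigma_j = +1 outside LAMBDA

        weight = math.exp(sum(J[e] * spin(e[0]) * spin(e[1]) for e in EDGES))
        num += sigma[0] * weight
        den += weight
    return num / den


def is_nondecreasing_at(J, eps=1e-3, tol=1e-12):
    """Finite-difference check that raising any single coupling does not lower F."""
    base = F(J)
    for e in EDGES:
        bumped = dict(J)
        bumped[e] += eps
        if F(bumped) < base - tol:
            return False
    return True


J = dict(zip(EDGES, (0.3, 0.5, 0.2, 0.9)))
print(F(J), is_nondecreasing_at(J))
```

At $J = 0$ the spins inside $\Lambda$ are independent and symmetric, so $F(0) = 0$, and the inequality then gives $F \ge 0$ on all of $[0, \infty)^E$.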

We start by proving the following lemma.

Lemma C.3. Assumptions as above. For any $J \in [0, \infty)^E$ and $\{k, \ell\} \in E$,
\[
\sum_{\sigma_\Lambda} \sigma_0\sigma_k\sigma_\ell \exp\Bigl\{\sum_{\{i,j\} \in E} J_{i,j}\sigma_i\sigma_j\Bigr\} \ge 0.
\]

Proof. The proof goes by a direct computation, expanding the exponential. To avoid a clash with the fixed edge $\{k, \ell\}$, the running index of the products is denoted by $m$.
\[
\begin{aligned}
\sum_{\sigma_\Lambda} \sigma_0\sigma_k\sigma_\ell \exp\Bigl\{\sum_{\{i,j\} \in E} J_{i,j}\sigma_i\sigma_j\Bigr\}
&= \sum_{\sigma_\Lambda} \sigma_0\sigma_k\sigma_\ell \sum_{n \ge 0} \frac{1}{n!} \Bigl(\sum_{\{i,j\} \in E} J_{i,j}\sigma_i\sigma_j\Bigr)^n\\
&= \sum_{n \ge 0} \frac{1}{n!} \sum_{\{i_m, j_m\} \in E,\ m = 1, \dots, n} \Bigl(\prod_{m=1}^n J_{i_m, j_m}\Bigr) \Bigl[\sum_{\sigma_\Lambda} \sigma_0\sigma_k\sigma_\ell \prod_{m=1}^n \sigma_{i_m}\sigma_{j_m}\Bigr].
\end{aligned}
\]
The expression in square brackets vanishes if some spin inside $\Lambda$ appears in the product an odd number of times. Otherwise the product contains only squares and $(+1)$-valued boundary spins and hence is positive. $\square$

Proof of Griffiths’ inequality. A direct computation shows that
\[
\frac{\partial F}{\partial J_{k,\ell}} = Z^{-2} \sum_{\sigma_\Lambda, \tau_\Lambda} \sigma_0(\sigma_k\sigma_\ell - \tau_k\tau_\ell) \exp\Bigl\{\sum_{\{i,j\} \in E} J_{i,j}(\sigma_i\sigma_j + \tau_i\tau_j)\Bigr\},
\]
where $Z = \sum_{\sigma_\Lambda} \exp\bigl\{\sum_{\{i,j\} \in E} J_{i,j}\sigma_i\sigma_j\bigr\}$ and, just as for $\sigma_\Lambda$, $\tau_\Lambda$ ranges over $\{-1, 1\}^\Lambda$ and $\tau_i = 1$ when $i \notin \Lambda$. Define the configuration $\eta$ such that $\eta_i = 1$ if $\sigma_i = \tau_i$ and $-1$ if $\sigma_i \ne \tau_i$. Noting that $\tau_i = \eta_i\sigma_i$ for all $i$ (in particular, $\eta_i = 1$ when $i \notin \Lambda$), a change of variables gives
\[
\frac{\partial F}{\partial J_{k,\ell}} = Z^{-2} \sum_{\eta_\Lambda} (1 - \eta_k\eta_\ell) \sum_{\sigma_\Lambda} \sigma_0\sigma_k\sigma_\ell \exp\Bigl\{\sum_{\{i,j\} \in E} J_{i,j}(1 + \eta_i\eta_j)\sigma_i\sigma_j\Bigr\}.
\]
The inequality now follows from Lemma C.3, using the couplings $J_{i,j}(1 + \eta_i\eta_j) \ge 0$ instead of $J_{i,j}$ and the fact that $1 - \eta_k\eta_\ell \ge 0$. $\square$

C.3. Griffiths-Hurst-Sherman inequality

Griffiths-Hurst-Sherman inequality. Consider the Ising model in $d \ge 1$. Fix $\beta > 0$, $J > 0$, and the volume $V_n$. Define
\[
M_n(h) = E^{\mu^+_{V_n,\beta,h,J}}\Bigl[\frac{1}{|V_n|} \sum_{i \in V_n} \sigma_i\Bigr].
\]
Then $\dfrac{\partial^2 M_n}{\partial h^2} \le 0$ for $h > 0$.

First, we need to prove two lemmas. Let us abbreviate $\mu^+_{n,h} = \mu^+_{V_n,\beta,h,J}$ and let $(\mu^+_{n,h})^{\otimes 4}$ be the product of 4 copies of $\mu^+_{n,h}$. We will denote the elements of $\Omega^4_{V_n}$ by $(\omega_i, \sigma_i, \omega'_i, \sigma'_i)_{i \in V_n}$.

Lemma C.4. Same setting as for the Griffiths-Hurst-Sherman inequality. For $i \in V_n$, let
\[
\begin{bmatrix} \alpha_i\\ \beta_i\\ \gamma_i\\ \delta_i \end{bmatrix}
= \frac12
\begin{bmatrix}
1 & 1 & 1 & 1\\
1 & 1 & -1 & -1\\
1 & -1 & 1 & -1\\
-1 & 1 & 1 & -1
\end{bmatrix}
\begin{bmatrix} \omega_i\\ \sigma_i\\ \omega'_i\\ \sigma'_i \end{bmatrix}.
\]
Let $A$, $B$, $C$, and $D$ be arbitrary subsets of $V_n$. If $h \ge 0$, then
\[
E^{(\mu^+_{n,h})^{\otimes 4}}\Bigl[\prod_{i \in A} \alpha_i \prod_{i \in B} \beta_i \prod_{i \in C} \gamma_i \prod_{i \in D} \delta_i\Bigr] \ge 0. \tag{C.2}
\]

Proof of Lemma C.4. Since the above matrix (including the factor $\tfrac12$) is orthogonal, it preserves Euclidean scalar products and thus
\[
\omega_i\omega_j + \sigma_i\sigma_j + \omega'_i\omega'_j + \sigma'_i\sigma'_j = \alpha_i\alpha_j + \beta_i\beta_j + \gamma_i\gamma_j + \delta_i\delta_j
\]
for any $i, j \in V_n$. Therefore, using also $\omega_i + \sigma_i + \omega'_i + \sigma'_i = 2\alpha_i$,
\[
\begin{aligned}
&H^+_n(\omega) + H^+_n(\sigma) + H^+_n(\omega') + H^+_n(\sigma')\\
&\qquad = -J \sum_{i,j \in V_n :\, |i-j| = 1} (\alpha_i\alpha_j + \beta_i\beta_j + \gamma_i\gamma_j + \delta_i\delta_j)
- 2J \sum_{\substack{i \in V_n,\ j \notin V_n\\ |i-j| = 1}} \alpha_i - 2h \sum_{i \in V_n} \alpha_i.
\end{aligned}
\]
Now we proceed similarly to the proof of Lemma C.3. Expand the term $e^{-\beta(H^+_n(\omega) + H^+_n(\sigma) + H^+_n(\omega') + H^+_n(\sigma'))}$ into its Taylor series in the numerator of the expectation in (C.2). This leads us to sums of products of multiple integrals of the form
\[
\int \alpha_i^k \beta_i^\ell \gamma_i^m \delta_i^n \,\lambda(d\omega_i)\lambda(d\sigma_i)\lambda(d\omega'_i)\lambda(d\sigma'_i), \qquad i \in V_n,
\]
multiplied by nonnegative coefficients.


∗Exercise C.5. Check that $\alpha_i\beta_i = \gamma_i\delta_i = (\omega_i\sigma_i - \omega'_i\sigma'_i)/2$ and then conclude that $\alpha_i\beta_i\gamma_i\delta_i = (\omega_i\sigma_i - \omega'_i\sigma'_i)^2/4$.

∗Exercise C.6. Prove that the above integral is 0 if $k$, $\ell$, $m$, and $n$ are not all of the same parity (i.e. all even or all odd).
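Before attempting Exercise C.6 it can be reassuring to check the parity claim by enumeration. The sketch below assumes that $\lambda$ charges both spin values $\pm 1$ equally, so that the integral is, up to a positive constant, a sum over the 16 configurations of $(\omega_i, \sigma_i, \omega'_i, \sigma'_i)$; the function names are illustrative.

```python
import itertools


def transformed(omega, sigma, omega_p, sigma_p):
    """(alpha, beta, gamma, delta) as in Lemma C.4.

    A sum of four +-1 values is always even, so integer division by 2 is exact.
    """
    return ((omega + sigma + omega_p + sigma_p) // 2,
            (omega + sigma - omega_p - sigma_p) // 2,
            (omega - sigma + omega_p - sigma_p) // 2,
            (-omega + sigma + omega_p - sigma_p) // 2)


def moment(k, l, m, n):
    """Sum of alpha^k beta^l gamma^m delta^n over the 16 spin configurations."""
    total = 0
    for spins in itertools.product((-1, 1), repeat=4):
        a, b, c, d = transformed(*spins)
        total += a ** k * b ** l * c ** m * d ** n
    return total


for k, l, m, n in itertools.product(range(4), repeat=4):
    same_parity = len({k % 2, l % 2, m % 2, n % 2}) == 1
    if same_parity:
        assert moment(k, l, m, n) >= 0      # nonnegative when parities agree
    else:
        assert moment(k, l, m, n) == 0      # the claim of Exercise C.6
```

The check covers exponents up to 3 only; it is meant as a sanity test of the statement, not as a substitute for the proof.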

By Exercise C.6 we can assume that the powers are all of the same parity. If the powers are all even the integral is obviously nonnegative. If the powers are all odd, use Exercise C.5 to show that the integral is nonnegative. $\square$

Lemma C.7. Same setting as for the Griffiths-Hurst-Sherman inequality. Let $i, j, k \in V_n$ be arbitrary. Abbreviate $\bar\sigma_\ell = E^{\mu^+_{n,h}}[\sigma_\ell]$. If $h \ge 0$, then
\[
E^{\mu^+_{n,h}}[(\sigma_i - \bar\sigma_i)(\sigma_j - \bar\sigma_j)(\sigma_k - \bar\sigma_k)] \le 0. \tag{C.3}
\]

Proof. Let
\[
\begin{bmatrix} t_i\\ q_i\\ t'_i\\ q'_i \end{bmatrix}
= \frac{1}{\sqrt2}
\begin{bmatrix}
1 & 1 & 0 & 0\\
1 & -1 & 0 & 0\\
0 & 0 & 1 & 1\\
0 & 0 & 1 & -1
\end{bmatrix}
\begin{bmatrix} \omega_i\\ \sigma_i\\ \omega'_i\\ \sigma'_i \end{bmatrix}.
\]

Exercise C.8. Check that the left-hand side of (C.3) is equal to
\[
-\sqrt2\, E^{(\mu^+_{n,h})^{\otimes 4}}[t_k q'_i q'_j - t_k q_i q_j] = -E^{(\mu^+_{n,h})^{\otimes 4}}[(\alpha_k + \beta_k)(\gamma_i\delta_j + \gamma_j\delta_i)],
\]
where the functions $\alpha_i$, $\beta_i$, $\gamma_i$, and $\delta_i$ are as in Lemma C.4.

Hint: The first equality is a direct computation. For the second equality, write $t_k = (\alpha_k + \beta_k)/\sqrt2$, $q'_i = (\gamma_i + \delta_i)/\sqrt2$, and $q_i = (\gamma_i - \delta_i)/\sqrt2$.

The claim then follows from Lemma C.4. $\square$

Proof of the Griffiths-Hurst-Sherman inequality. Set $S_n = \sum_{i \in V_n} \sigma_i$ and $\bar S_n = \sum_{i \in V_n} \bar\sigma_i$. Summing (C.3) over $i, j, k \in V_n$ shows that if $h \ge 0$, then
\[
E^{\mu^+_{n,h}}[S_n^3] - 3E^{\mu^+_{n,h}}[S_n]\,E^{\mu^+_{n,h}}[S_n^2] + 2E^{\mu^+_{n,h}}[S_n]^3 = E^{\mu^+_{n,h}}[(S_n - \bar S_n)^3] \le 0.
\]
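The equality in the last display is the usual expansion of a third central moment. As a short worked computation, write $E$ for $E^{\mu^+_{n,h}}$ and $m = E[S_n] = \bar S_n$ (the symbol $m$ is introduced only for this remark):

```latex
\begin{aligned}
E[(S_n - m)^3] &= E[S_n^3] - 3m\,E[S_n^2] + 3m^2\,E[S_n] - m^3 \\
               &= E[S_n^3] - 3E[S_n]\,E[S_n^2] + 2E[S_n]^3 ,
\end{aligned}
\qquad
E[(S_n - \bar S_n)^3] = \sum_{i,j,k \in V_n}
E[(\sigma_i - \bar\sigma_i)(\sigma_j - \bar\sigma_j)(\sigma_k - \bar\sigma_k)] .
```

Each term of the triple sum is nonpositive by (C.3), which gives the sign of the third central moment.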

The inequality follows now from the following computation.

∗Exercise C.9. Show that
\[
\beta^{-2}|V_n| \frac{\partial^2 M_n}{\partial h^2} = E^{\mu^+_{n,h}}[S_n^3] - 3E^{\mu^+_{n,h}}[S_n]\,E^{\mu^+_{n,h}}[S_n^2] + 2E^{\mu^+_{n,h}}[S_n]^3.
\]


Bibliography

1. Michael Aizenman, Instability of phase coexistence and translation invariance in two dimensions, Mathematical problems in theoretical physics (Proc. Internat. Conf. Math. Phys., Lausanne, 1979), Lecture Notes in Phys., vol. 116, Springer, Berlin, 1980, pp. 143–147. MR582616

2. Michael Aizenman, Translation invariance and instability of phase coexistence in the two-dimensional Ising system, Comm. Math. Phys. 73 (1980), no. 1, 83–94. MR573615

3. Robert G. Bartle, The elements of real analysis, first corrected printing ed., John Wiley & Sons, New York-London-Sydney, 1967. MR0393369

4. J. R. Baxter, N. C. Jain, and T. O. Seppäläinen, Large deviations for nonstationary arrays and sequences, Illinois J. Math. 37 (1993), no. 2, 302–328. MR1208824

5. John Baxter and Naresh Jain, Convexity and compactness in large deviation theory, Unpublished manuscript (1991).

6. Patrick Billingsley, Convergence of probability measures, John Wiley & Sons Inc., New York, 1968. MR0233396

7. J. Bricmont, J. R. Fontaine, and L. J. Landau, On the uniqueness of the equilibrium state for plane rotators, Comm. Math. Phys. 56 (1977), no. 3, 281–296. MR0489629

8. Amir Dembo and Ofer Zeitouni, Large deviations techniques and applications, second ed., Applications of Mathematics, vol. 38, Springer-Verlag, New York, 1998. MR1619036

9. Jean-Dominique Deuschel and Daniel W. Stroock, Large deviations, Pure and Applied Mathematics, vol. 137, Academic Press Inc., Boston, MA, 1989. MR997938

10. I. H. Dinwoodie, A note on the upper bound for i.i.d. large deviations, Ann. Probab. 19 (1991), no. 4, 1732–1736. MR1127723

11. R. L. Dobrushin, Existence of a phase transition in the two-dimensional and three-dimensional Ising models, Soviet Physics Dokl. 10 (1965), 111–113. MR0182405

12. R. L. Dobrushin, Gibbsian random fields for lattice systems with pairwise interactions, Funkcional. Anal. i Priložen. 2 (1968), no. 4, 31–43. MR0250630

13. R. L. Dobrushin, The Gibbs state that describes the coexistence of phases for a three-dimensional Ising model, Teor. Verojatnost. i Primenen. 17 (1972), 619–639. MR0421546

14. Richard M. Dudley, Real analysis and probability, The Wadsworth & Brooks/Cole Mathematics Series, Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA, 1989. MR982264

15. Richard Durrett, Probability: theory and examples, second ed., Duxbury Press, Belmont, CA, 1996. MR1609153

16. E. B. Dynkin, Initial and final behavior of the trajectories of Markov processes, Uspehi Mat. Nauk 26 (1971), no. 4(160), 153–172. MR0298758

17. William Feller, An introduction to probability theory and its applications. Vol. I, Third edition, John Wiley & Sons Inc., New York, 1968. MR0228020

18. Gerald B. Folland, Real analysis: Modern techniques and their applications, second ed., Pure and Applied Mathematics (New York), John Wiley & Sons Inc., New York, 1999. MR1681462

19. Klaus Fritzsche and Hans Grauert, From holomorphic functions to complex manifolds, Graduate Texts in Mathematics, vol. 213, Springer-Verlag, New York, 2002. MR1893803

20. Jürg Fröhlich and Charles-Edouard Pfister, Spin waves, vortices, and the structure of equilibrium states in the classical XY model, Comm. Math. Phys. 89 (1983), no. 3, 303–327. MR709469

21. Jürg Fröhlich and Thomas Spencer, Kosterlitz-Thouless transition in the two-dimensional plane rotator and Coulomb gas, Phys. Rev. Lett. 46 (1981), no. 15, 1006–1009. MR607429

22. Hans-Otto Georgii, Gibbs measures and phase transitions, de Gruyter Studies in Mathematics, vol. 9, Walter de Gruyter & Co., Berlin, 1988. MR956646

23. Robert B. Griffiths, Peierls proof of spontaneous magnetization in a two-dimensional Ising ferromagnet, Phys. Rev. (2) 136 (1964), A437–A439. MR0189681

24. Y. Higuchi, On the absence of non-translation invariant Gibbs states for the two-dimensional Ising model, Random fields, Vol. I, II (Esztergom, 1979), Colloq. Math. Soc. János Bolyai, vol. 27, North-Holland, Amsterdam, 1981, pp. 517–534. MR712693

25. G. Kassay, A simple proof for König’s minimax theorem, Acta Math. Hungar. 63 (1994), no. 4, 371–374. MR1261480

26. Davar Khoshnevisan, Probability, Graduate Studies in Mathematics, vol. 80, American Mathematical Society, Providence, RI, 2007. MR2296582

27. Heinz König, Über das von Neumannsche Minimax-Theorem, Arch. Math. (Basel) 19 (1968), 482–487. MR0240600

28. O. E. Lanford, III and D. Ruelle, Observables at infinity and states with short range correlations in statistical mechanics, Comm. Math. Phys. 13 (1969), 194–215. MR0256687

29. A. Messager, S. Miracle-Sole, and C. Pfister, Correlation inequalities and uniqueness of the equilibrium state for the plane rotator ferromagnetic model, Comm. Math. Phys. 58 (1978), no. 1, 19–29. MR0475552

30. K. R. Parthasarathy, Probability measures on metric spaces, AMS Chelsea Publishing, Providence, RI, 2005, Reprint of the 1967 original. MR2169627

31. Robert R. Phelps, Lectures on Choquet’s theorem, second ed., Lecture Notes in Mathematics, vol. 1757, Springer-Verlag, Berlin, 2001. MR1835574

32. R. Tyrrell Rockafellar, Convex analysis, Princeton Mathematical Series, No. 28, Princeton University Press, Princeton, N.J., 1970. MR0274683

33. Walter Rudin, Functional analysis, second ed., International Series in Pure and Applied Mathematics, McGraw-Hill Inc., New York, 1991. MR1157815

34. Erwin Schrödinger, Statistical thermodynamics, A course of seminar lectures delivered in January-March 1944, at the School of Theoretical Physics, Dublin Institute for Advanced Studies. Second edition, reprinted, Cambridge University Press, New York, 1962. MR0149891

35. Timo Seppäläinen, Large deviations for lattice systems. I. Parametrized independent fields, Probab. Theory Related Fields 96 (1993), no. 2, 241–260. MR1227034

36. Timo Seppäläinen, Large deviations for lattice systems. II. Nonstationary independent fields, Probab. Theory Related Fields 97 (1993), no. 1-2, 103–112. MR1240718

37. Timo Seppäläinen, Entropy, limit theorems, and variational principles for disordered lattice systems, Comm. Math. Phys. 171 (1995), no. 2, 233–277. MR1344727

38. Timo Seppäläinen, Maximum entropy principles for disordered spins, Probab. Theory Related Fields 101 (1995), no. 4, 547–576. MR1327225

39. Timo Seppäläinen and J. E. Yukich, Large deviation principles for Euclidean functionals and other nearly additive processes, Probab. Theory Related Fields 120 (2001), no. 3, 309–345. MR1843178

40. H. van Beijeren, Interface sharpness in the Ising system, Comm. Math. Phys. 40 (1975), no. 1, 1–6. MR1552609


Notation index

$S_n$   empirical mean $(X_1 + \dots + X_n)/n$
$[x]$   integral part of $x$; i.e. the largest integer smaller than or equal to $x$
$a_n \sim b_n$   $a_n$ is equivalent to $b_n$; i.e. $a_n/b_n \to 1$
BER($p$)   Bernoulli distribution with parameter $p$
$\mathbb{N}$   set of positive integers
$\mathbb{Z}$   set of integers
$\mathbb{Q}$   set of rational numbers
$\mathbb{R}$   set of real numbers
a.s.   almost surely
a.e.   almost every
i.i.d.   independent, identically distributed
$E[f]$   expectation of $f$, relative to $P$
$E^\mu[f]$   expectation of $f$, relative to some measure $\mu$
$\mathcal{X}$, $\mathcal{Y}$   topological spaces (Hausdorff, metric, vector, Polish, etc.)
$\mathcal{B}$   Borel $\sigma$-algebra of the topological space $\mathcal{X}$
$\Omega$   a general probability space and often $\Omega = \mathcal{X}^{\mathbb{Z}^d}$
$\mathcal{F}$   $\sigma$-algebra on $\Omega$ and the product Borel $\sigma$-algebra if $\Omega = \mathcal{X}^{\mathbb{Z}^d}$
$\mathcal{S}$   Polish space
$\mathcal{M}_1(\mathcal{X})$   probability measures on $\mathcal{X}$
$\mathcal{M}(\mathcal{X})$   finite signed measures on $\mathcal{X}$
$A^\circ$   interior of $A$
$\overline{A}$   closure of $A$
$A^c$   complement of $A$
$A \setminus B$   set difference
$A \Delta B$   symmetric set difference $(A \setminus B) \cup (B \setminus A)$
$|A|$   cardinality of $A$


194 Notati<strong>on</strong> index<br />

lim limsup<br />

lim liminf<br />

flsc<br />

lower semic<strong>on</strong>tinuous regularizati<strong>on</strong> of f<br />

LDP(µn, rn, I) {µn} satisfy a large deviati<strong>on</strong> principle <strong>with</strong><br />

normalizati<strong>on</strong> rn <strong>an</strong>d rate functi<strong>on</strong> I<br />

ν ≪ λ ν is absolutely c<strong>on</strong>tinuous relative <strong>to</strong> λ<br />

∇f<br />

Ln<br />

gradient of f<br />

empirical measures: 1 n n k=1 δXk<br />

Cb(X )<br />

C<br />

bounded c<strong>on</strong>tinuous functi<strong>on</strong>s <strong>on</strong> X<br />

+<br />

b (X ) functi<strong>on</strong>s in Cb(X ) that are strictly positive <strong>an</strong>d<br />

bounded away from 0<br />

bB bounded B-measurable functions<br />

〈·, ·〉 bilinear duality between two vector spaces<br />

$f^*$ convex conjugate of f<br />

$f^{**}$ convex biconjugate of f<br />

∂f subdifferential of f<br />

p(·) pressure function<br />

$U_{b,d}(X)$ bounded uniformly continuous functions on (X, d)<br />

$U_b(X)$ bounded uniformly continuous functions on X<br />

H(ν | λ) entropy of ν relative to λ<br />

β inverse temperature<br />

$\lambda^{\otimes\mathbb{N}}$ the law of an i.i.d. sequence with marginals λ<br />

$\lambda^{\otimes\mathbb{Z}^d}$ the law of an i.i.d. field with marginals λ<br />

$\theta_i$ group of shifts on a probability space Ω<br />

$M_\theta(\Omega)$ shift-invariant probability measures on Ω<br />

$M_e(\Omega)$ ergodic probability measures on Ω<br />

T tail σ-algebra<br />

I shift-invariant σ-algebra<br />

$\omega_\Lambda$ spins in Λ: $(\omega_i)_{i\in\Lambda}$<br />

$\Omega_\Lambda$ space $X^\Lambda$ of spins in Λ<br />

$F_\Lambda$ σ-algebra generated by spins in Λ<br />

$R_n$ level-3 empirical fields: $R_n(\omega) = \frac{1}{|V_n|}\sum_{i\in V_n}\delta_{\theta_i\omega}$<br />

$C_{b,\mathrm{loc}}(\Omega)$ bounded continuous local functions<br />

$U_{b,\mathrm{loc}}(\Omega)$ bounded uniformly continuous local functions<br />

$\|f\|_\infty$ supremum norm: sup |f|<br />

$\omega^{(n)}$ periodized configuration<br />

$\tilde{R}_n$ periodic empirical fields: $\frac{1}{|V_n|}\sum_{i\in V_n}\delta_{\theta_i\omega^{(n)}}$<br />

Λ a (sometimes finite) subset of $\mathbb{Z}^d$<br />

$V_n$ the box $\{i\in\mathbb{Z}^d : -n < i_1,\dots,i_d < n\}$<br />

$\nu_\Lambda$ the restriction of $\nu\in M_1(X^{\mathbb{Z}^d})$ to $X^\Lambda$<br />

$\nu_n$ the restriction of $\nu\in M_1(X^{\mathbb{Z}^d})$ to $X^{V_n}$<br />

$H_\Lambda(\nu\mid\lambda)$ entropy of $\nu_\Lambda$ relative to $\lambda_\Lambda$<br />

$H_n(\nu\mid\lambda)$ entropy of $\nu_n$ relative to $\lambda_n$<br />

h(ν | λ) specific entropy of ν relative to λ



B space of absolutely summable shift-invariant interaction potentials<br />

$H_\Lambda^{\mathrm{free}}$ Hamiltonian in volume Λ with free boundary conditions<br />

$H_\Lambda^{\tau_{\Lambda^c}}$ Hamiltonian in volume Λ with boundary condition $\tau_{\Lambda^c}$<br />

$\pi_\Lambda(\tau,\cdot)$ specification with boundary condition $\tau_{\Lambda^c}$<br />

$\pi_\Lambda^\tau(\cdot)$ specification with boundary condition $\tau_{\Lambda^c}$<br />

Π specification<br />

$G^\Pi$ set of Gibbs measures consistent with specification Π<br />

$H_\Lambda^0$ free Ising Hamiltonian<br />

$\mu_{\Lambda,\beta,h,J}^{\omega}$ Ising specification with boundary ω, parameters (β, h, J)<br />

$\mu_{\Lambda,\beta,h,J}^{0}$ Ising specification with free boundary condition<br />

$\mu_{\Lambda,\beta,h,J}^{+}$ Ising specification with boundary ω ≡ 1<br />

$\mu_{\Lambda,\beta,h,J}^{-}$ Ising specification with boundary ω ≡ −1<br />

x ∧ y min(x, y)<br />

x ∨ y max(x, y)<br />

ω ≤ σ partial order on configurations: $\omega_i \le \sigma_i$ ∀ $i\in\mathbb{Z}^d$<br />

µ ≤ ν ν stochastically dominates µ<br />

P(Φ) infinite volume pressure corresponding to potential Φ<br />

h(ν | Φ) specific entropy relative to potential Φ<br />

$\overline{\kappa}$ upper rate function<br />

$\underline{\kappa}$ lower rate function<br />

$\bar{p}$ upper pressure<br />

J convex conjugate of $\bar{p}$<br />

ex(K) extreme points of K<br />

αq push forward of measure α by kernel q


Theorems, principles, and models index<br />

Formula<br />

  Stirling’s,<br />

Inequality<br />

  arithmetic-geometric,<br />

  Chebyshev’s,<br />

  Doob’s,<br />

  Fenchel-Young,<br />

  GHS,<br />

  Griffiths’,<br />

  Griffiths-Hurst-Sherman,<br />

  Hölder’s,<br />

  Holley’s,<br />

  Jensen’s,<br />

Lemma<br />

  Borel-Cantelli,<br />

  Fatou’s,<br />

  Fekete’s,<br />

  Strassen’s,<br />

  Varadhan’s,<br />

Model<br />

  Curie-Weiss,<br />

  Fortuin-Kasteleyn,<br />

  Ising,<br />

  plane rotator,<br />

  plane rotor,<br />

  XY,<br />

Principle<br />

  contraction,<br />

  DLR variational,<br />

  Dobrushin-Lanford-Ruelle variational,<br />

  Gibbs conditioning,<br />

  large deviation,<br />

  maximum entropy,<br />

  Maxwell’s,<br />

  push-forward,<br />

Theorem<br />

  analytic implicit function,<br />

  backwards-martingale convergence,<br />

  Baxter-Jain,<br />

  Bryc’s,<br />

  Choquet’s,<br />

  Cramér’s,<br />

  de Finetti’s,<br />

  dominated convergence,<br />

  ergodic,<br />

  ergodic decomposition,<br />

  Fenchel-Moreau,<br />

  Hahn-Banach separation,<br />

  Kolmogorov’s extension,<br />

  Markov chain convergence,<br />

  minimax,<br />

  monotone class,<br />

  monotone convergence,<br />

  portmanteau,<br />

  Prohorov’s,<br />

  Riesz representation,<br />

  Sanov’s,<br />

  Tychonov’s,<br />

  Ulam’s,<br />

  Varadhan’s,


Author index<br />

Aizenman, 113<br />

Bartle, 6<br />

Baxter and Jain, 146<br />

Beijeren, see van Beijeren<br />

Billingsley, 169<br />

Boltzmann, 69<br />

Bricmont, Fontaine, and Landau, 96<br />

Dembo and Zeitouni, 23, 24, 35, 51, 52, 65<br />

Deuschel and Stroock, 173<br />

Dinwoodie, 24<br />

Dobrushin, 93, 94, 113, 116<br />

Dudley, 143, 168<br />

Durrett, 4, 11, 12, 21–23, 26, 49, 51, 58, 59, 64, 67, 76, 91, 97, 113, 123, 135, 137, 175, 184<br />

Dynkin, 173<br />

Feller, 4<br />

Folland, 109, 143, 172<br />

Fontaine, see Bricmont, Fontaine, and Landau<br />

Fröhlich and Pfister, 96<br />

Fröhlich and Spencer, 96<br />

Fritzsche and Grauert, 93<br />

Georgii, 98, 113, 171<br />

Gibbs, 69<br />

Grauert, see Fritzsche and Grauert<br />

Griffiths, 116<br />

Higuchi, 113<br />

Jain, see Baxter and Jain<br />

König, 178<br />

Kassay, 178<br />

Khoshnevisan, 4, 12, 22, 23, 26, 49, 51, 58, 59, 64, 67, 76, 113, 123, 137, 175, 178<br />

Landau, see Bricmont, Fontaine, and Landau<br />

Lanford and Ruelle, 93<br />

Maxwell, 69<br />

Messager, Miracle-Sole, and Pfister, 96<br />

Miracle-Sole, see Messager, Miracle-Sole, and Pfister<br />

Parthasarathy, 143, 168, 170<br />

Pfister, see Fröhlich and Pfister, see Messager, Miracle-Sole, and Pfister<br />

Phelps, 97<br />

Rockafellar, 44<br />

Rudin, 42, 43, 81, 142, 149, 150, 168<br />

Ruelle, see Lanford and Ruelle<br />

Schrödinger, 6<br />

Seppäläinen and Yukich, 36<br />

Spencer, see Fröhlich and Spencer<br />

Stroock, see Deuschel and Stroock<br />

van Beijeren, 113<br />

Yukich, see Seppäläinen and Yukich<br />

Zeitouni, see Dembo and Zeitouni


General index<br />

additive, 7<br />

affine, 44, 75, 103<br />

  minorant, 41, 43, 47<br />

analytic function, 93, 161<br />

analytic implicit function theorem, 93<br />

arithmetic-geometric inequality, 159<br />

atom, 37<br />

backwards-martingale convergence theorem, 97, 113, 175<br />

Baxter-Jain theorem, 146<br />

Berry-Esseen theorem, 135, 136<br />

biconjugate, see convex<br />

Borel-Cantelli lemma, 12, 27<br />

boundary condition, 89, 99, 101, 112, 112, 116, 119, 125<br />

  free, 89, 102<br />

Bryc’s theorem, 31, 35, 35, 56<br />

canonical ensemble, 69<br />

Chebyshev’s inequality, 11, 21, 52, 137<br />

chemical potential, 94<br />

Choquet’s theorem, 97<br />

circuit, 116–118<br />

compact<br />

  set, 16, 17, 19, 20, 25, 27, 30, 36, 43, 49, 60, 61, 63, 66, 68, 75, 76, 81, 82, 103, 109, 113, 145, 146, 149, 150, 155, 158, 159, 161, 169, 170<br />

  topological space, see topological space<br />

compactification, 145, 159, 160<br />

concave<br />

  function, 25, 26, 35, 75, 126, 127, 155<br />

  uniformly Jensen-concave-convex-like, 178<br />

conditional probability, 62, 69, 78, 90, 91, 142, 144, 171–173<br />

conjugate, see convex<br />

contour, see circuit, 117<br />

contraction principle, 29, 29, 30, 31, 50, 57, 65, 161<br />

  Bernoulli, 30, 39<br />

convex<br />

  biconjugate, 43, 46, 142<br />

  conjugate, 43, 49, 51, 58, 60, 65, 80, 143, 145, 156<br />

  duality, 35, 41, 42, 49, 58, 60, 145, 155, 167, 168<br />

  function, 22, 25, 26, 36, 41, 43, 43, 44, 46–51, 54, 56, 64, 75, 77, 125, 126, 142, 143, 145–149, 154, 155, 156, 157<br />

  hull, 23, 41, 46, 46, 149, 150<br />

  minorant, 46, 47<br />

  rate function, see rate function<br />

  set, 25, 41, 43, 44, 49, 52–54, 68, 72, 76, 109, 142, 145, 146, 149, 150, 155, 171<br />

  strictly, 58, 59, 66, 68<br />

  uniformly Jensen-concave-convex-like, 178<br />

correlation function, 119–121<br />

coupling, 183<br />

Cramér’s theorem, 20, 21, 23, 24, 27, 37, 39, 56, 65, 66, 71, 78, 136<br />

  $\mathbb{R}$, 23, 133<br />

  $\mathbb{R}^d$, 24, 51, 51, 54, 56<br />

  Polish space, 65<br />

  refinement, 28, 133, 133, 134<br />

Curie point, 38<br />

Curie-Weiss model, 9, 37, 37, 38, 88, 93, 94, 102, 111, 112<br />

de Finetti’s theorem, 56, 174, 175<br />

diagonal trick, 76, 126<br />

DLR<br />

  equations, 93<br />

  variational principle, 9, 105<br />

Dobrushin-Lanford-Ruelle<br />

  equations, 93<br />

  variational principle, 105<br />

dominated convergence theorem, 26, 59, 113, 123, 127, 137<br />

Doob’s inequality, 12<br />

droplet, 115, 116<br />

duality, see convex duality<br />

Edgeworth expansion, 136<br />

empirical



  average, 99<br />

  field, 73, 78, 101, 102, 104<br />

    periodic, 73<br />

    periodized, 104<br />

  fields<br />

    periodized, 78<br />

  mean, see sample mean<br />

  measure, 30, 31, 57, 62, 65, 71, 153, 156, 175<br />

    Bernoulli, 57<br />

energy, 6–8, 37, 38, 65–67, 69, 87, 89, 99, 102, 115, 118<br />

  free, 36, 67, 67, 88, 93, 95, 101, 102<br />

enthalpy, 67<br />

entropy, 5, 8, 66–69, 73, 88, 102, 141<br />

  conditional, 62, 78, 108, 142<br />

  information-theoretic, 5, 57<br />

  maximum, 66, 67, see maximum entropy principle, 68<br />

  minimum, 66, 68<br />

  relative, 4, 5, 57, 58, 59, 60, 62–65, 67, 68, 73, 74, 77, 78, 88, 108, 153, 154, 163<br />

    Bernoulli, 31, 58<br />

  specific, 73, 74, 75, 77, 101, 102, 173<br />

  thermodynamic, 5, 6, 7, 8, 57, 67, 68<br />

epigraph, 15, 16, 41, 44, 45–47<br />

equilibrium, 87, 90, 91, 94, 99<br />

equivalence of ensembles, 69, 69, 70, 88<br />

ergodic<br />

  measure, 72, 77, 96, 97, 170, 170, 171–173<br />

ergodic decomposition theorem, 171, 173, 174<br />

ergodic theorem, 72, 73, 76, 79, 83, 97, 170, 170, 171<br />

exchangeable, 56, 173, 174, 174, 175<br />

exponentially tight, 17, 19, 19, 20, 25, 35, 36, 54, 62, 63, 79, 81, 82, 145, 158, 160<br />

external magnetic field, see magnetic field<br />

extreme point, 72, 72, 76, 97, 98, 113, 149, 150, 171, 171<br />

Fatou’s lemma, 51<br />

Fekete’s lemma, 53<br />

  multiindex, 74<br />

Feller-continuous, 92, 92<br />

Fenchel-Moreau theorem, 44<br />

Fenchel-Young inequality, 44, 45, 48<br />

ferromagnet, 37, 39, 101<br />

finite range, see range<br />

Fortuin-Kasteleyn model, 129<br />

free energy, see energy, 94<br />

GHS inequality, see Griffiths-Hurst-Sherman inequality<br />

Gibbs<br />

  conditioning principle, 67, 136<br />

  measure, 35, 37, 66, 67, 69, 83, 87, 88, 89, 90, 92–98, 100, 102–105, 109, 112, 113, 115, 116, 119–121, 123, 126, 173<br />

  specification, see specification<br />

Griffiths’ inequality, 124, 184, 185<br />

  mybf, 185<br />

Griffiths-Hurst-Sherman inequality, 126, 186, 187<br />

Hölder’s inequality, 26, 49, 80, 126, 142<br />

Hahn-Banach separation theorem, 42, 45, 47, 48, 142, 146, 150, 151, 180<br />

Hamiltonian, 37, 38, 65, 87–89, 99, 100, 106, 112, 118, 122<br />

  free, 112<br />

Hausdorff, see topological space<br />

Helmholtz free energy, 88<br />

Hewitt-Savage 0-1 law, 175<br />

Holley’s inequality, 122, 183, 183<br />

inclusion-exclusion, 119<br />

Inequality<br />

  Jensen, 67<br />

information, 6, 8<br />

interaction, see potential, 88<br />

interaction potential, see potential<br />

intermittency, 34<br />

inverse temperature, 66, see temperature, 93, 94, 99, 111<br />

  critical, 113<br />

Ising model, 37, 38, 94, 95, 106, 110, 111, 115, 125, 126, 129, 186<br />

  one-dimensional, 113, 115<br />

  two-dimensional, 116<br />

Jensen’s inequality, 22, 58, 64, 83, 142, 145, 154, 155, 159<br />

Kirkwood-Salsburg<br />

  equation, 120<br />

  kernel, 119<br />

Kolmogorov’s 0-1 law, 77, 172<br />

Kolmogorov’s extension theorem, 76, 123<br />

Krein-Milman theorem, 149<br />

Lagrange multiplier, 6, 7<br />

large deviation, 3–5, 8, 11, 12, 16, 17, 20, 21, 27, 41, 57, 71, 73, 83, 84, 99, 100, 103, 105, 134, 136, 145, 151, 153<br />

  Bernoulli, 4<br />

  informal, 11<br />

  level 1, 71<br />

  level 2, 71



  level 3, 71<br />

  lower bound, 17, 19, 20, 26, 29, 32<br />

  Markov chain, 140, 144<br />

  position level, 71<br />

  process level, 71, 82<br />

  upper bound, 17, 19, 20, 22, 24, 25, 27, 29, 32, 33, 40, 49, 82, 162<br />

large deviation principle (LDP), 12, 14, 16, 16, 19–24, 26, 27, 29–32, 34–37, 39, 40, 50–52, 54, 57, 62, 65, 73, 76, 78, 82, 89, 101, 102, 104, 105, 146, 148–151, 155, 156, 158–163, 194<br />

  Bernoulli, 13<br />

  exchangeable process, 56<br />

  Normal, 20<br />

  uniform, 104<br />

  weak, 19, 19, 20, 24, 25, 36, 54<br />

lattice gas, 101<br />

LDP, see large deviation principle<br />

level 1, see large deviation<br />

level 2, see large deviation<br />

level 3, see large deviation, see process level<br />

Lipschitz function, 35<br />

liquid-gas phase transition, 113, see phase transition<br />

local function, 72<br />

locally convex space, see topological space<br />

lower semicontinuous, 15, 15, 16, 22, 27, 30, 32, 34, 36, 41, 43, 44, 46–51, 56, 60, 61, 66, 68, 75, 77, 103, 109, 142, 143, 146–150, 173<br />

  minorant, 15, 41, 46, 47<br />

  not, 30<br />

  regularization, 15, 16, 29, 41, 44, 47<br />

macroscopic, 87<br />

macrostate, 96–98<br />

magnetic field, 112<br />

  external, 37, 39, 88, 89, 93, 111, 112<br />

magnetization, 39<br />

  spontaneous, 37, 39, 111<br />

Markov chain, 62, 90, 115, 141, 183, 184<br />

  kernel, 144<br />

Markov chain convergence theorem, 62, 184<br />

Markov field, 115<br />

martingale, 12<br />

maximum entropy principle, 67, 67, 68, 69, 84<br />

Maxwell’s principle, 70<br />

mean-field approximation, 35, 37, 38<br />

measure<br />

  a priori, 88, 89<br />

  reference, 88<br />

metric, see topological space<br />

metrize, 17, 168, 169<br />

metrized, 168<br />

microcanonical ensemble, 69<br />

microscopic, 87, 99<br />

Milman’s theorem, 150<br />

minimal spanning tree, 36<br />

minimax theorem<br />

  $\mathbb{R}^d$, 25<br />

minimizer, 34, 61, 66–69, 84, 105<br />

moderate deviation, 27, 28, 136<br />

moment generating function, 21, 24, 24, 51<br />

monotone class, 172, 178<br />

monotone class theorem, 178<br />

monotone convergence theorem, 59<br />

multidimensional ergodic theorem, see ergodic theorem<br />

normalization, 13, 16, 19, 20, 22, 31–33, 35, 145, 146, 155, 156, 161<br />

  constant, 37, 88, 90<br />

observable, 96<br />

  macroscopic, 96, 96<br />

  microscopic, 96, 96, 99<br />

pair potential, 89<br />

partition function, 37, 66, 88, 90, 101, 112<br />

Peierls argument, 115, 116<br />

percolation, 128<br />

periodized configuration, 73<br />

permutation, 174<br />

phase diagram, 39, 94, 95<br />

phase transition, 88, 92–94, 94, 95, 96, 105, 110–113, 115, 116, 119, 121, 123–125, 128<br />

  liquid-gas, 94, 95, 113<br />

plane rotator model, 96<br />

plane rotor model, see plane rotator<br />

Polish, see topological space<br />

portmanteau theorem, 63, 72, 81, 159, 167, 168, 169<br />

position level, see large deviation<br />

potential, 88, 89, 93, 111<br />

  interaction, 88, 88, 89, 89, 93, 99<br />

  nearest-neighbor, 89<br />

  one-body, 89<br />

  pair, 89<br />

  self, 89<br />

  two-body, 89<br />

pressure, 50, 50, 78–80, 93–95, 101, 105, 124, 125, 146, 148, 156, 158<br />

  upper, 145<br />

process level, 71, see large deviation<br />

Prohorov<br />

  metric, 169, 170<br />

Prohorov’s theorem, 61, 63, 81, 169, 170<br />

proper, 43, 44–46, 48, 51, 146



proper function, 44<br />

push forward, 144<br />

push-forward principle, see contraction principle<br />

Radon-Nikodym derivative, 64, 83, 135<br />

random cluster, see Fortuin-Kasteleyn<br />

random field, 71, 72<br />

range, 89<br />

  finite, 89<br />

rate function, 4, 5, 8, 13–16, 16, 17, 20, 22, 23, 29–35, 37, 39, 40, 49–51, 54, 56, 60, 65, 66, 73, 104, 105, 109, 146, 148, 150, 151, 155, 156, 158, 159, 161–163<br />

  Bernoulli, 31<br />

  convex, 35, 40, 52, 62, 65<br />

  empirical measure<br />

    Bernoulli, 57<br />

  good, 17<br />

  lower, 20, 146, 155<br />

  lower semicontinuous, 16, 19, 27, 30, 146<br />

  not convex, 35, 40, 50, 50<br />

  not lower semicontinuous, 30<br />

  not tight, 30<br />

  tight, 16, 17, 20, 24, 27, 29, 30, 35, 52, 54, 62, 65, 73, 78, 104, 143, 154<br />

  unique, 17<br />

  upper, 20, 146, 155<br />

  zero, 22, 27, 35, 50, 76<br />

regular<br />

  point, 148, 148, 149, 150<br />

  probability measure, 61<br />

  sequence, 155<br />

  topological space, see topological space<br />

regularization, see lower semicontinuous<br />

relative entropy, see entropy<br />

Riesz representation, 143<br />

sample mean, 14, 20–24, 31, 39, 51, 56, 57, 65, 71, 94<br />

Sanov’s theorem, 62, 62, 65, 67, 71, 73, 76, 78, 81, 151, 153<br />

self-potential, see potential<br />

separable, see topological space<br />

shift group, 72<br />

shift-invariant, 75<br />

  event, 72, 77, 170, 170<br />

  function, 96, 101<br />

  Gibbs measure, 102, 105, 113, 123<br />

  measure, 72, 76, 102, 170–173<br />

  observable, see function<br />

  potential, 88, 89, 93, 97, 99, 105, 109<br />

  set, see event<br />

specific entropy, see entropy<br />

specification, 90, 91, 91, 92, 93, 96, 97, 112, 115, 123<br />

  Gibbs, 92–94, 103<br />

spin, 37, 38, 72, 87–89, 93, 96, 109, 111, 112, 115–118, 122, 176, 183–185<br />

  configuration, 37<br />

  flip, 112, 116–118, 122<br />

spin glass, 36<br />

spontaneous magnetization, 37, see magnetization, 39<br />

statistical mechanics, 5, 8, 35, 37<br />

Stirling’s formula, 4, 6, 14, 31<br />

stochastic domination, 121<br />

stochastic kernel, 90, 90, 91, 97, 98, 141, 144, 163, 173–175<br />

Strassen’s lemma, 121, 175<br />

strictly convex, see convex<br />

subadditivity, 52, 55<br />

  Fekete’s lemma, 53, 74<br />

subdifferential, 48, 50<br />

sublevel set, 60, 61, 66, 68, 75, 103<br />

tail σ-algebra, 172<br />

tangent hyperplane, 48<br />

temperature, 6, 39, 88<br />

  critical, 38<br />

  high, 38<br />

  inverse, 37, 66, 88<br />

  low, 38<br />

tight<br />

  family of measures, 19, 30, 61, 63, 81, 159, 169, 169<br />

  rate function, see rate function<br />

topological space, 169<br />

  Banach, 42, 89<br />

  compact, 92, 109, 142, 143, 145–147, 149, 150, 156, 158, 160, 161, 168, 179–181<br />

  Hausdorff, 12, 17, 19, 29, 30, 42, 145, 155, 167, 179<br />

  locally convex, 42, 42, 45<br />

  metric, 15–17, 26, 35, 58, 60, 75, 143, 145, 160, 161, 167–170, 172, 177<br />

  Polish, 60, 60, 61, 62, 65, 71, 72, 77, 87, 91, 141, 143, 144, 153, 158, 160, 170–172, 174<br />

  regular, 17<br />

  separable, 168–170, 172<br />

  totally bounded, 168<br />

  vector space, 42, 168<br />

topology<br />

  Euclidean, 42<br />

  weak, 41, 42, 58, 60–63, 68, 72, 75, 82, 89, 109, 142, 145, 153, 155, 159, 167–169<br />

  weak∗, 42, 143



totally bounded, 159, 168, see topological space<br />

  metric, 143<br />

transfer matrix, 114<br />

transition matrix, 115<br />

traveling salesman, 36<br />

Tychonov’s theorem, 81<br />

Ulam’s theorem, 61, 63, 170<br />

uniform integrability, 61, 61<br />

upper semicontinuous, 32, 34, 147, 148, 179, 180<br />

Varadhan’s lemma, see Varadhan’s theorem<br />

Varadhan’s theorem, 31, 31, 34, 35, 37, 50, 51, 93, 101, 104, 162<br />

weak convergence, 13, 30, 35, 37, 38, 40, 68–70, 72, 73, 75–77, 81, 84, 92, 113, 116, 123, 159, 161, 162, 167, 167, 168, 169<br />

weak large deviation, 19<br />

weak topology, see topology<br />

XY model, see plane rotator
