A COURSE IN PROBABILITY THEORY
THIRD EDITION

A COURSE IN PROBABILITY THEORY
THIRD EDITION

Kai Lai Chung
Stanford University

ACADEMIC PRESS
A Harcourt Science and Technology Company
San Diego  San Francisco  New York  Boston  London  Sydney  Tokyo

This book is printed on acid-free paper.

COPYRIGHT © 2001, 1974, 1968 BY ACADEMIC PRESS
ALL RIGHTS RESERVED. NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.

Requests for permission to make copies of any part of the work should be mailed to the following address: Permissions Department, Harcourt, Inc., 6277 Sea Harbor Drive, Orlando, Florida 32887-6777.

ACADEMIC PRESS
A Harcourt Science and Technology Company
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA
http://www.academicpress.com

ACADEMIC PRESS
Harcourt Place, 32 Jamestown Road, London NW1 7BY, UK
http://www.academicpress.com

Library of Congress Cataloging in Publication Data: 00-106712
International Standard Book Number: 0-12-174151-6

PRINTED IN THE UNITED STATES OF AMERICA
00 01 02 03 IP 9 8 7 6 5 4 3 2 1

Contents

Preface to the third edition
Preface to the second edition
Preface to the first edition

1 | Distribution function
    1.1 Monotone functions
    1.2 Distribution functions
    1.3 Absolutely continuous and singular distributions

2 | Measure theory
    2.1 Classes of sets
    2.2 Probability measures and their distribution functions

3 | Random variable. Expectation. Independence
    3.1 General definitions
    3.2 Properties of mathematical expectation
    3.3 Independence

4 | Convergence concepts
    4.1 Various modes of convergence
    4.2 Almost sure convergence; Borel-Cantelli lemma
    4.3 Vague convergence
    4.4 Continuation
    4.5 Uniform integrability; convergence of moments

5 | Law of large numbers. Random series
    5.1 Simple limit theorems
    5.2 Weak law of large numbers
    5.3 Convergence of series
    5.4 Strong law of large numbers
    5.5 Applications
    Bibliographical Note

6 | Characteristic function
    6.1 General properties; convolutions
    6.2 Uniqueness and inversion
    6.3 Convergence theorems
    6.4 Simple applications
    6.5 Representation theorems
    6.6 Multidimensional case; Laplace transforms
    Bibliographical Note

7 | Central limit theorem and its ramifications
    7.1 Liapounov's theorem
    7.2 Lindeberg-Feller theorem
    7.3 Ramifications of the central limit theorem
    7.4 Error estimation
    7.5 Law of the iterated logarithm
    7.6 Infinite divisibility
    Bibliographical Note

8 | Random walk
    8.1 Zero-or-one laws
    8.2 Basic notions
    8.3 Recurrence
    8.4 Fine structure
    8.5 Continuation
    Bibliographical Note

9 | Conditioning. Markov property. Martingale
    9.1 Basic properties of conditional expectation
    9.2 Conditional independence; Markov property
    9.3 Basic properties of smartingales
    9.4 Inequalities and convergence
    9.5 Applications
    Bibliographical Note

Supplement: Measure and Integral
    1 Construction of measure
    2 Characterization of extensions
    3 Measures in R
    4 Integral
    5 Applications

General Bibliography
Index

Preface to the third edition

In this new edition, I have added a Supplement on Measure and Integral. The subject matter is first treated in a general setting pertinent to an abstract measure space, and then specified in the classic Borel-Lebesgue case for the real line. The latter material, an essential part of real analysis, is presupposed in the original edition published in 1968 and revised in the second edition of 1974. When I taught the course under the title "Advanced Probability" at Stanford University beginning in 1962, students from the departments of statistics, operations research (formerly industrial engineering), electrical engineering, etc. often had to take a prerequisite course given by other instructors before they enlisted in my course. In later years I prepared a set of notes, lithographed and distributed in the class, to meet the need. This forms the basis of the present Supplement. It is hoped that the result may as well serve in an introductory mode, perhaps also independently for a short course in the stated topics.

The presentation is largely self-contained with only a few particular references to the main text. For instance, after (the old) §2.1 where the basic notions of set theory are explained, the reader can proceed to the first two sections of the Supplement for a full treatment of the construction and completion of a general measure; the next two sections contain a full treatment of the mathematical expectation as an integral, of which the properties are recapitulated in §3.2. In the final section, application of the new integral to the older Riemann integral in calculus is described and illustrated with some famous examples. Throughout the exposition, a few side remarks, pedagogic, historical, even

judgmental, of the kind I used to drop in the classroom, are approximately reproduced.

In drafting the Supplement, I consulted Patrick Fitzsimmons on several occasions for support. Giorgio Letta and Bernard Bru gave me encouragement for the uncommon approach to Borel's lemma in §3, for which the usual proof always left me disconsolate as being too devious for the novice's appreciation.

A small number of additional remarks and exercises have been added to the main text.

Warm thanks are due: to Vanessa Gerhard of Academic Press who deciphered my handwritten manuscript with great ease and care; to Isolde Field of the Mathematics Department for unfailing assistance; to Jim Luce for a mission accomplished. Last and evidently not least, my wife and my daughter Corinna performed numerous tasks indispensable to the undertaking of this publication.

Preface to the second edition

This edition contains a good number of additions scattered throughout the book as well as numerous voluntary and involuntary changes. The reader who is familiar with the first edition will have the joy (or chagrin) of spotting new entries. Several sections in Chapters 4 and 9 have been rewritten to make the material more adaptable to application in stochastic processes. Let me reiterate that this book was designed as a basic study course prior to various possible specializations. There is enough material in it to cover an academic year in class instruction, if the contents are taken seriously, including the exercises. On the other hand, the ordering of the topics may be varied considerably to suit individual tastes. For instance, Chapters 6 and 7 dealing with limiting distributions can be easily made to precede Chapter 5 which treats almost sure convergence. A specific recommendation is to take up Chapter 9, where conditioning makes a belated appearance, before much of Chapter 5 or even Chapter 4. This would be more in the modern spirit of an early weaning from the independence concept, and could be followed by an excursion into the Markovian territory.

Thanks are due to many readers who have told me about errors, obscurities, and inanities in the first edition. An incomplete record includes the names below (with apology for forgotten ones): Geoff Eagleson, Z. Govindarajulu, David Heath, Bruce Henry, Donald Iglehart, Anatole Joffe, Joseph Marker, P. Masani, Warwick Millar, Richard Olshen, S. M. Samuels, David Siegmund, T. Thedeen, A. Gonzalez Villalobos, Michel Weil, and Ward Whitt. The revised manuscript was checked in large measure by Ditlev Monrad. The

galley proofs were read by David Kreps and myself independently, and it was fun to compare scores and see who missed what. But since not all parts of the old text have undergone the same scrutiny, readers of the new edition are cordially invited to continue the fault-finding. Martha Kirtley and Joan Shepard typed portions of the new material. Gail Lemmond took charge of the final page-by-page revamping and it was through her loving care that the revision was completed on schedule.

In the third printing a number of misprints and mistakes, mostly minor, are corrected. I am indebted to the following persons for some of these corrections: Roger Alexander, Steven Carchedi, Timothy Green, Joseph Horowitz, Edward Korn, Pierre van Moerbeke, David Siegmund.

In the fourth printing, an oversight in the proof of Theorem 6.3.1 is corrected, a hint is added to Exercise 2 in Section 6.4, and a simplification made in (VII) of Section 9.5. A number of minor misprints are also corrected. I am indebted to several readers, including Asmussen, Robert, Schatte, Whitley and Yannaros, who wrote me about the text.

Preface to the first edition

A mathematics course is not a stockpile of raw material nor a random selection of vignettes. It should offer a sustained tour of the field being surveyed and a preferred approach to it. Such a course is bound to be somewhat subjective and tentative, neither stationary in time nor homogeneous in space. But it should represent a considered effort on the part of the author to combine his philosophy, conviction, and experience as to how the subject may be learned and taught. The field of probability is already so large and diversified that even at the level of this introductory book there can be many different views on orientation and development that affect the choice and arrangement of its content. The necessary decisions being hard and uncertain, one too often takes refuge by pleading a matter of "taste." But there is good taste and bad taste in mathematics just as in music, literature, or cuisine, and one who dabbles in it must stand judged thereby.

It might seem superfluous to emphasize the word "probability" in a book dealing with the subject. Yet on the one hand, one used to hear such specious utterance as "probability is just a chapter of measure theory"; on the other hand, many still use probability as a front for certain types of analysis such as combinatorial, Fourier, functional, and whatnot. Now a properly constructed course in probability should indeed make substantial use of these and other allied disciplines, and a strict line of demarcation need never be drawn. But PROBABILITY is still distinct from its tools and its applications not only in the final results achieved but also in the manner of proceeding. This is perhaps

best seen in the advanced study of stochastic processes, but will already be abundantly clear from the contents of a general introduction such as this book.

Although many notions of probability theory arise from concrete models in applied sciences, recalling such familiar objects as coins and dice, genes and particles, a basic mathematical text (as this pretends to be) can no longer indulge in diverse applications, just as nowadays a course in real variables cannot delve into the vibrations of strings or the conduction of heat. Incidentally, merely borrowing the jargon from another branch of science without treating its genuine problems does not aid in the understanding of concepts or the mastery of techniques.

A final disclaimer: this book is not the prelude to something else and does not lead down a strait and righteous path to any unique fundamental goal. Fortunately nothing in the theory deserves such single-minded devotion, as apparently happens in certain other fields of mathematics. Quite the contrary, a basic course in probability should offer a broad perspective of the open field and prepare the student for various further possibilities of study and research. To this aim he must acquire knowledge of ideas and practice in methods, and dwell with them long and deeply enough to reap the benefits.

A brief description will now be given of the nine chapters, with some suggestions for reading and instruction. Chapters 1 and 2 are preparatory. A synopsis of the requisite "measure and integration" is given in Chapter 2, together with certain supplements essential to probability theory. Chapter 1 is really a review of elementary real variables; although it is somewhat expendable, a reader with adequate background should be able to cover it swiftly and confidently, with something gained from the effort.
For class instruction it may be advisable to begin the course with Chapter 2 and fill in from Chapter 1 as the occasions arise. Chapter 3 is the true introduction to the language and framework of probability theory, but I have restricted its content to what is crucial and feasible at this stage, relegating certain important extensions, such as shifting and conditioning, to Chapters 8 and 9. This is done to avoid overloading the chapter with definitions and generalities that would be meaningless without frequent application. Chapter 4 may be regarded as an assembly of notions and techniques of real function theory adapted to the usage of probability. Thus, Chapter 5 is the first place where the reader encounters bona fide theorems in the field. The famous landmarks shown there serve also to introduce the ways and means peculiar to the subject. Chapter 6 develops some of the chief analytical weapons, namely Fourier and Laplace transforms, needed for challenges old and new. Quick testing grounds are provided, but for major battlefields one must await Chapters 7 and 8. Chapter 7 initiates what has been called the "central problem" of classical probability theory. Time has marched on and the center of the stage has shifted, but this topic remains without doubt a crowning achievement. In Chapters 8 and 9 two different aspects of

(discrete parameter) stochastic processes are presented in some depth. The random walks in Chapter 8 illustrate the way probability theory transforms other parts of mathematics. It does so by introducing the trajectories of a process, thereby turning what was static into a dynamic structure. The same revolution is now going on in potential theory by the injection of the theory of Markov processes. In Chapter 9 we return to fundamentals and strike out in major new directions. While Markov processes can be barely introduced in the limited space, martingales have become an indispensable tool for any serious study of contemporary work and are discussed here at length. The fact that these topics are placed at the end rather than the beginning of the book, where they might very well be, testifies to my belief that the student of mathematics is better advised to learn something old before plunging into the new.

A short course may be built around Chapters 2, 3, 4, selections from Chapters 5, 6, and the first one or two sections of Chapter 9. For a richer fare, substantial portions of the last three chapters should be given without skipping any one of them. In a class with solid background, Chapters 1, 2, and 4 need not be covered in detail. At the opposite end, Chapter 2 may be filled in with proofs that are readily available in standard texts. It is my hope that this book may also be useful to mature mathematicians as a gentle but not so meager introduction to genuine probability theory. (Often they stop just before things become interesting!) Such a reader may begin with Chapter 3, go at once to Chapter 5 with a few glances at Chapter 4, skim through Chapter 6, and take up the remaining chapters seriously to get a real feeling for the subject.

Several cases of exclusion and inclusion merit special comment.
I chose to construct only a sequence of independent random variables (in Section 3.3), rather than a more general one, in the belief that the latter is better absorbed in a course on stochastic processes. I chose to postpone a discussion of conditioning until quite late, in order to follow it up at once with varied and worthwhile applications. With a little reshuffling Section 9.1 may be placed right after Chapter 3 if so desired. I chose not to include a fuller treatment of infinitely divisible laws, for two reasons: the material is well covered in two or three treatises, and the best way to develop it would be in the context of the underlying additive process, as originally conceived by its creator Paul Levy. I took pains to spell out a peripheral discussion of the logarithm of characteristic function to combat the errors committed on this score by numerous existing books. Finally, and this is mentioned here only in response to a query by Doob, I chose to present the brutal Theorem 5.3.2 in the original form given by Kolmogorov because I want to expose the student to hardships in mathematics.

There are perhaps some new things in this book, but in general I have not striven to appear original or merely different, having at heart the interests of the novice rather than the connoisseur. In the same vein, I favor as a

rule of writing (euphemistically called "style") clarity over elegance. In my opinion the slightly decadent fashion of conciseness has been overwrought, particularly in the writing of textbooks. The only valid argument I have heard for an excessively terse style is that it may encourage the reader to think for himself. Such an effect can be achieved equally well, for anyone who wishes it, by simply omitting every other sentence in the unabridged version.

This book contains about 500 exercises consisting mostly of special cases and examples, second thoughts and alternative arguments, natural extensions, and some novel departures. With a few obvious exceptions they are neither profound nor trivial, and hints and comments are appended to many of them. If they tend to be somewhat inbred, at least they are relevant to the text and should help in its digestion. As a bold venture I have marked a few of them with * to indicate a "must," although no rigid standard of selection has been used. Some of these are needed in the book, but in any case the reader's study of the text will be more complete after he has tried at least those problems.

Over a span of nearly twenty years I have taught a course at approximately the level of this book a number of times. The penultimate draft of the manuscript was tried out in a class given in 1966 at Stanford University. Because of an anachronism that allowed only two quarters to the course (as if probability could also blossom faster in the California climate!), I had to omit the second halves of Chapters 8 and 9 but otherwise kept fairly closely to the text as presented here. (The second half of Chapter 9 was covered in a subsequent course called "stochastic processes.") A good fraction of the exercises were assigned as homework, and in addition a great majority of them were worked out by volunteers.
Among those in the class who cooperated in this manner and who corrected mistakes and suggested improvements are: Jack E. Clark, B. Curtis Eaves, Susan D. Horn, Alan T. Huckleberry, Thomas M. Liggett, and Roy E. Welsch, to whom I owe sincere thanks. The manuscript was also read by J. L. Doob and Benton Jamison, both of whom contributed a great deal to the final revision. They have also used part of the manuscript in their classes. Aside from these personal acknowledgments, the book owes of course to a large number of authors of original papers, treatises, and textbooks. I have restricted bibliographical references to the major sources while adding many more names among the exercises. Some oversight is perhaps inevitable; however, inconsequential or irrelevant "name-dropping" is deliberately avoided, with two or three exceptions which should prove the rule.

It is a pleasure to thank Rosemarie Stampfel and Gail Lemmond for their superb job in typing the manuscript.

1 | Distribution function

1.1 Monotone functions

We begin with a discussion of distribution functions as a traditional way of introducing probability measures. It serves as a convenient bridge from elementary analysis to probability theory, upon which the beginner may pause to review his mathematical background and test his mental agility. Some of the methods as well as results in this chapter are also useful in the theory of stochastic processes.

In this book we shall follow the fashionable usage of the words "positive", "negative", "increasing", "decreasing" in their loose interpretation. For example, "x is positive" means "x ≥ 0"; the qualifier "strictly" will be added when "x > 0" is meant. By a "function" we mean in this chapter a real finite-valued one unless otherwise specified.

Let then f be an increasing function defined on the real line (−∞, +∞). Thus for any two real numbers x_1 and x_2,

(1)  x_1 < x_2 ⟹ f(x_1) ≤ f(x_2).

We begin by reviewing some properties of such a function. The notation "t ↑ x" means "t < x, t → x"; "t ↓ x" means "t > x, t → x".
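The loose sense of "increasing" in (1) allows flat stretches and jumps. A quick numerical sanity check (the sample function f = floor is our own illustrative choice, not from the text):

```python
import math
import random

# Check (1) for f(x) = floor(x): increasing in the loose sense used here,
# with flat stretches and jumps, yet x1 < x2 always gives f(x1) <= f(x2).

def f(x):
    return math.floor(x)

random.seed(0)
ok = True
for _ in range(1000):
    x1, x2 = sorted(random.uniform(-10, 10) for _ in range(2))
    if f(x1) > f(x2):
        ok = False
print(ok)  # True
```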

(i) For each x, both unilateral limits

(2)  lim_{t↑x} f(t) = f(x−)  and  lim_{t↓x} f(t) = f(x+)

exist and are finite. Furthermore the limits at infinity

lim_{t↓−∞} f(t) = f(−∞)  and  lim_{t↑+∞} f(t) = f(+∞)

exist; the former may be −∞, the latter may be +∞.

This follows from monotonicity; indeed

f(x−) = sup_{−∞<t<x} f(t),  f(x+) = inf_{x<t<+∞} f(t).
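By monotonicity, the unilateral limits in (2) can be approximated by sampling just to the left and right of x. A minimal sketch (the step function f and the tolerance eps below are our own illustrative choices):

```python
# Approximate the one-sided limits f(x-) and f(x+) of an increasing
# function by evaluating it just to either side of x.

def f(x):
    # a hypothetical increasing function with a jump of size 1 at x = 0
    return 0.0 if x < 0 else 1.0

def left_limit(g, x, eps=1e-9):
    return g(x - eps)   # close to f(x-) = sup_{t<x} f(t)

def right_limit(g, x, eps=1e-9):
    return g(x + eps)   # close to f(x+) = inf_{t>x} f(t)

print(left_limit(f, 0.0))   # 0.0
print(right_limit(f, 0.0))  # 1.0
print(right_limit(f, 0.0) - left_limit(f, 0.0))  # 1.0, the size of the jump
```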

Example 1. Let x_0 be an arbitrary real number, and define a function f as follows:

f(x) = 0 for x < x_0;  f(x) = 1 for x ≥ x_0.

be replaced by an arbitrary countable set without any change of the argument. We now show that the condition of countability is indispensable. By "countable" we mean always "finite (possibly empty) or countably infinite".

(iv) The set of discontinuities of f is countable.

We shall prove this by a topological argument of some general applicability. In Exercise 3 after this section another proof based on an equally useful counting argument will be indicated. For each point of jump x consider the open interval I_x = (f(x−), f(x+)). If x′ is another point of jump and x < x′, say, then there is a point x̄ such that x < x̄ < x′. Hence by monotonicity we have

f(x+) ≤ f(x̄) ≤ f(x′−).

It follows that the two intervals I_x and I_{x′} are disjoint, though they may abut on each other if f(x+) = f(x′−). Thus we may associate with the set of points of jump in the domain of f a certain collection of pairwise disjoint open intervals in the range of f. Now any such collection is necessarily a countable one, since each interval contains a rational number, so that the collection of intervals is in one-to-one correspondence with a certain subset of the rational numbers and the latter is countable. Therefore the set of discontinuities is also countable, since it is in one-to-one correspondence with the set of intervals associated with it.

(v) Let f_1 and f_2 be two increasing functions and D a set that is (everywhere) dense in (−∞, +∞). Suppose that

∀x ∈ D: f_1(x) = f_2(x).

Then f_1 and f_2 have the same points of jump of the same size, and they coincide except possibly at some of these points of jump.

To see this, let x be an arbitrary point and let t_n ∈ D, t_n′ ∈ D, t_n ↑ x, t_n′ ↓ x. Such sequences exist since D is dense.
It follows from (i) that

(6)  f_1(x−) = lim_n f_1(t_n) = lim_n f_2(t_n) = f_2(x−),
     f_1(x+) = lim_n f_1(t_n′) = lim_n f_2(t_n′) = f_2(x+).

In particular

∀x: f_1(x+) − f_1(x−) = f_2(x+) − f_2(x−).

The first assertion in (v) follows from this equation and (ii). Furthermore if f_1 is continuous at x, then so is f_2 by what has just been proved, and we

1 .1 MONOTONE FUNCTIONS 1 5havef I (x) = f 1 (x- ) = f 2(x- ) = f 2(x),prov**in**g the second assertion .How can f 1 and f 2 differ at all? This can happen only when f 1 (x)and f2(x) assume different values **in** the **in**terval (f1(x-), f 1 (x+)) =(f 2(x-), f 2(x+)) . It will turn out **in** Chapter 2 (see **in** particular Exercise 21of Sec . 2 .2) that the precise value of f at a po**in**t of jump is quite unessentialfor our purposes and may be modified, subject to (3), to suit our convenience .More precisely, given the function f, we can def**in**e a new function f **in**several different ways, such asf (x) = f (x - ), f (x) = f (x+), f (x) = f (x - )+ f (x+) ,2and use one of these **in**stead of the orig**in**al one . The third modification isfound to be convenient **in** Fourier analysis, but either one of the first two ismore suitable for probability theory . We have a free choice between them andwe shall choose the second, namely, right cont**in**uity .(vi) If we putVx :f (x) = f (x+),then f is **in**creas**in**g and right cont**in**uous everywhere .Let us recall that an arbitrary function g is said to be right cont**in**uous atx iff lim t 4x g(t) exists and the limit, to be denoted by g(x+), is equal to g(x) .To prove the assertion (vi) we must show thatVx : lm f (t+) = f (x+) .This is **in**deed true for any f such that f (t+) exists for every t . For then :given any E > 0, there exists 8(E) > 0 such thatVs E (x, x + 8) : I f (s) - f (x+)I

and so on of f on its domain of definition if in the usual definitions we restrict ourselves to the points of D. Even if f is defined in a larger domain, we may still speak of these properties "on D" by considering the "restriction of f to D".

(vii) Let f be increasing on D, and define f̃ on (−∞, +∞) as follows:

∀x: f̃(x) = inf_{x<t∈D} f(t).

Then f̃ is increasing and right continuous everywhere.

Let ε > 0 be given. There exists t_0 ∈ D, t_0 > x_0, such that

f(t_0) − ε ≤ f̃(x_0) ≤ f(t_0).

Hence if t ∈ D, x_0 < t < t_0, we have

0 ≤ f̃(t) − f̃(x_0) ≤ f(t_0) − f̃(x_0) ≤ ε.

b_j the size of jump at a_j, then

F(a_j) − F(a_j−) = b_j

since F(a_j+) = F(a_j).

Consider the function

F_d(x) = Σ_{a_j ≤ x} b_j,

which represents the sum of all the jumps of F in the half-line (−∞, x]. It is clearly increasing, right continuous, with

(3)  F_d(−∞) = 0,  F_d(+∞) = Σ_j b_j ≤ 1.

Hence F_d is a bounded increasing function. It should constitute the "jumping part" of F, and if it is subtracted out from F, the remainder should be positive, contain no more jumps, and so be continuous. These plausible statements will now be proved; they are easy enough but not really trivial.

Theorem 1.2.1. Let

F_c(x) = F(x) − F_d(x);

then F_c is positive, increasing, and continuous.

PROOF. Let x < x′, then we have

(4)  F_d(x′) − F_d(x) = Σ_{x<a_j≤x′} b_j ≤ F(x′) − F(x).

Theorem 1.2.2. Let F be a d.f. Suppose that there exist a continuous function G_c and a function G_d of the form

G_d(x) = Σ_j b_j′ δ_{a_j′}(x)

[where {a_j′} is a countable set of real numbers and Σ_j |b_j′| < ∞], such that

F = G_c + G_d;

then

G_c = F_c,  G_d = F_d,

where F_c and F_d are defined as before.

PROOF. If F_d ≠ G_d, then either the sets {a_j} and {a_j′} are not identical, or we may relabel the a_j′ so that a_j′ = a_j for all j but b_j′ ≠ b_j for some j. In either case we have for at least one j, and a = a_j or a_j′:

[F_d(a) − F_d(a−)] − [G_d(a) − G_d(a−)] ≠ 0.

Since F_c − G_c = G_d − F_d, this implies that

F_c(a) − G_c(a) − [F_c(a−) − G_c(a−)] ≠ 0,

contradicting the fact that F_c − G_c is a continuous function. Hence F_d = G_d and consequently F_c = G_c.

DEFINITION. A d.f. F that can be represented in the form

F = Σ_j b_j δ_{a_j},

where {a_j} is a countable set of real numbers, b_j > 0 for every j and Σ_j b_j = 1, is called a discrete d.f. A d.f. that is continuous everywhere is called a continuous d.f.

Suppose F_c ≠ 0, F_d ≠ 0 in Theorem 1.2.1, then we may set α = F_d(∞) so that 0 < α < 1,

F_2 ≡ F; in either extreme case (5) remains valid. We may now summarize the two theorems above as follows.

Theorem 1.2.3. Every d.f. can be written as the convex combination of a discrete and a continuous one. Such a decomposition is unique.

EXERCISES

1. Let F be a d.f. Then for each x,

lim_{ε↓0} [F(x+ε) − F(x−ε)] = 0

unless x is a point of jump of F, in which case the limit is equal to the size of the jump.

*2. Let F be a d.f. with points of jump {a_j}. Prove that the sum

Σ_{x−ε<a_j<x} b_j

converges to zero as ε ↓ 0, for every x.

6. Let x be such that F(x+ε) − F(x−ε) > 0 for every ε > 0. The set of all such x is called the support of F. Show that each point of jump belongs to the support, and that each isolated point of the support is a point of jump. Give an example of a discrete d.f. whose support is the whole line.

7. Prove that the support of any d.f. is a closed set, and the support of any continuous d.f. is a perfect set.

1.3 Absolutely continuous and singular distributions

Further analysis of d.f.'s requires the theory of Lebesgue measure. Throughout the book this measure will be denoted by m; "almost everywhere" on the real line without qualification will refer to it and be abbreviated to "a.e."; an integral written in the form ∫ … dt is a Lebesgue integral; a function f is said to be "integrable" in (a, b) iff

∫_a^b f(t) dt

is defined and finite [this entails, of course, that f be Lebesgue measurable]. The class of such functions will be denoted by L^1(a, b), and L^1(−∞, ∞) is abbreviated to L^1. The complement of a subset S of an understood "space" such as (−∞, +∞) will be denoted by S^c.

DEFINITION. A function F is called absolutely continuous [in (−∞, ∞) and with respect to the Lebesgue measure] iff there exists a function f in L^1 such that we have for every x < x′:

(1)  F(x′) − F(x) = ∫_x^{x′} f(t) dt.

It follows from a well-known proposition (see, e.g., Natanson [3]*) that such a function F has a derivative equal to f a.e. In particular, if F is a d.f., then

(2)  f ≥ 0 a.e. and ∫_{−∞}^{∞} f(t) dt = 1.

Conversely, given any f in L^1 satisfying the conditions in (2), the function F defined by

(3)  ∀x: F(x) = ∫_{−∞}^x f(t) dt

is easily seen to be a d.f. that is absolutely continuous.

DEFINITION. A function F is called singular iff it is not identically zero and F′ (exists and) equals zero a.e.

* Numbers in brackets refer to the General Bibliography.
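Relation (1) can be checked numerically for a concrete density. A hedged sketch (the exponential density f and the midpoint-rule integrator are our own choices, not the text's): with f(t) = e^{−t} for t ≥ 0, the d.f. is F(x) = 1 − e^{−x}, and F(x′) − F(x) should agree with a Riemann sum for ∫_x^{x′} f(t) dt.

```python
import math

def f(t):
    # illustrative density: exponential with rate 1
    return math.exp(-t) if t >= 0 else 0.0

def F(x):
    # the corresponding absolutely continuous d.f.
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def riemann(g, a, b, n=100000):
    # midpoint Riemann sum approximating the Lebesgue integral of g on (a, b)
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

x, x2 = 0.5, 2.0
lhs = F(x2) - F(x)          # F(x') - F(x)
rhs = riemann(f, x, x2)     # integral of f over (x, x')
print(abs(lhs - rhs) < 1e-6)  # True: (1) holds numerically
```

The total mass condition in (2) can be checked the same way by integrating f over a long interval such as (0, 50).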

The next theorem summarizes some basic facts of real function theory; see, e.g., Natanson [3].

Theorem 1.3.1. Let F be bounded increasing with F(−∞) = 0, and let F′ denote its derivative wherever existing. Then the following assertions are true.

(a) If S denotes the set of all x for which F′(x) exists with 0 ≤ F′(x) < ∞, then m(S^c) = 0.

(b) This F′ belongs to L^1, and we have for every x < x′:

∫_x^{x′} F′(t) dt ≤ F(x′) − F(x).

(c) If we put

∀x: F_ac(x) = ∫_{−∞}^x F′(t) dt,  F_s(x) = F(x) − F_ac(x),

then F_ac′ = F′ a.e., so that F_s′ = 0 a.e., and consequently F_s is singular if it is not identically zero.

It is clear that F_ac is increasing and F_ac ≤ F. From (b) it follows that for x < x′:

F_s(x′) − F_s(x) = F(x′) − F(x) − ∫_x^{x′} F′(t) dt ≥ 0.

Hence F_s is also increasing and F_s ≤ F. We are now in a position to announce the following result, which is a refinement of Theorem 1.2.3.

Theorem 1.3.2. Every d.f. F can be written as the convex combination of a discrete, a singular continuous, and an absolutely continuous d.f. Such a decomposition is unique.

EXERCISES

1. A d.f. F is singular if and only if F ≡ F_s; it is absolutely continuous if and only if F ≡ F_ac.

2. Prove Theorem 1.3.2.

*3. If the support of a d.f. (see Exercise 6 of Sec. 1.2) is of measure zero, then F is singular. The converse is false.

*4. Suppose that F is a d.f. and (3) holds with a continuous f. Then F′ = f ≥ 0 everywhere.

5. Under the conditions in the preceding exercise, the support of F is the closure of the set {t | f(t) > 0}; the complement of the support is the interior of the set {t | f(t) = 0}.

6. Prove that a discrete distribution is singular. [Cf. Exercise 13 of Sec. 2.2.]

7. Prove that a singular function as defined here is (Lebesgue) measurable but need not be of bounded variation even locally. [HINT: Such a function is continuous except on a set of Lebesgue measure zero; use the completeness of the Lebesgue measure.]

The remainder of this section is devoted to the construction of a singular continuous distribution. For this purpose let us recall the construction of the Cantor (ternary) set (see, e.g., Natanson [3]). From the closed interval [0, 1], the "middle third" open interval (1/3, 2/3) is removed; from each of the two remaining disjoint closed intervals the middle third, (1/9, 2/9) and (7/9, 8/9), respectively, is removed; and so on. After n steps, we have removed

1 + 2 + ⋯ + 2^{n−1} = 2ⁿ − 1

disjoint open intervals and are left with 2ⁿ disjoint closed intervals each of length 1/3ⁿ. Let these removed ones, in order of position from left to right, be denoted by J_{n,k}, 1 ≤ k ≤ 2ⁿ − 1, and their union by Uₙ. We have

m(Uₙ) = 1/3 + 2/3² + 4/3³ + ⋯ + 2^{n−1}/3ⁿ = 1 − (2/3)ⁿ.

As n ↑ ∞, Uₙ increases to an open set U; the complement C of U with respect to [0, 1] is a perfect set, called the Cantor set.
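The formula m(Uₙ) = 1 − (2/3)ⁿ can be verified exactly with rational arithmetic: at step j one removes 2^{j−1} intervals of length 1/3ʲ. A minimal sketch:

```python
from fractions import Fraction

# Total length removed after n steps of the Cantor construction:
# sum over j of 2^(j-1) intervals, each of length 3^(-j).
def removed_length(n):
    return sum(Fraction(2) ** (j - 1) / Fraction(3) ** j for j in range(1, n + 1))

# Exact agreement with the closed form 1 - (2/3)^n:
for n in range(1, 12):
    assert removed_length(n) == 1 - Fraction(2, 3) ** n

# The removed length tends to 1, so m(C) = 1 - m(U) = 0.
assert float(removed_length(50)) > 0.999999
```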
It is of measure zero since

m(C) = 1 − m(U) = 1 − 1 = 0.

Now for each n and k, n ≥ 1, 1 ≤ k ≤ 2ⁿ − 1, we put

c_{n,k} = k/2ⁿ;

and define a function F on U as follows:

(7)  F(x) = c_{n,k} for x ∈ J_{n,k}.

This definition is consistent since two intervals, J_{n,k} and J_{n′,k′}, are either disjoint or identical, and in the latter case so are c_{n,k} = c_{n′,k′}. The last assertion

becomes obvious if we proceed step by step and observe that

J_{n+1,2k} = J_{n,k},  c_{n+1,2k} = c_{n,k}  for 1 ≤ k ≤ 2ⁿ − 1.

The value of F is constant on each J_{n,k} and is strictly greater on any other J_{n′,k′} situated to the right of J_{n,k}. Thus F is increasing and clearly we have

lim_{x↓0} F(x) = 0,  lim_{x↑1} F(x) = 1.

Let us complete the definition of F by setting

F(x) = 0 for x ≤ 0,  F(x) = 1 for x ≥ 1.

F is now defined on the domain D = (−∞, 0) ∪ U ∪ (1, ∞) and increasing there. Since each J_{n,k} is at a distance ≥ 1/3ⁿ from any distinct J_{n,k′}, and the total variation of F over each of the 2ⁿ disjoint intervals that remain after removing J_{n,k}, 1 ≤ k ≤ 2ⁿ − 1, is 1/2ⁿ, it follows that

x, x′ ∈ D, 0 ≤ x′ − x < 1/3ⁿ  ⇒  0 ≤ F(x′) − F(x) ≤ 1/2ⁿ.

Hence F is uniformly continuous on D and may be extended by continuity to an increasing continuous function on (−∞, ∞); the extension is a singular continuous d.f., the Cantor d.f.

10. For each x ∈ [0, 1], we have

2F(x/3) = F(x),  2F((x + 2)/3) − 1 = F(x).

11. Calculate

∫₀¹ x² dF(x),  ∫₀¹ e^{itx} dF(x).

[HINT: This can be done directly or by using Exercise 10; for a third method see Exercise 9 of Sec. 5.3.]

12. Extend the function F on [0, 1] trivially to (−∞, ∞). Let {rₙ} be an enumeration of the rationals and

G(x) = Σ_{n=1}^{∞} (1/2ⁿ) F(rₙ + x).

Show that G is a d.f. that is strictly increasing for all x and singular. Thus we have a singular d.f. with support (−∞, ∞).

*13. Consider F on [0, 1]. Modify its inverse F⁻¹ suitably to make it single-valued in [0, 1]. Show that F⁻¹ so modified is a discrete d.f. and find its points of jump and their sizes.

14. Given any closed set C in (−∞, +∞), there exists a d.f. whose support is exactly C. [HINT: Such a problem becomes easier when the corresponding measure is considered; see Sec. 2.2 below.]

*15. The Cantor d.f. F is a good building block of "pathological" examples. For example, let H be the inverse of the homeomorphic map of [0, 1] onto itself: x → ½[F(x) + x]; and E a subset of [0, 1] which is not Lebesgue measurable. Show that

1_{H(E)} ∘ H = 1_E

where H(E) is the image of E, 1_B is the indicator function of B, and ∘ denotes the composition of functions. Hence deduce: (1) a Lebesgue measurable function of a strictly increasing and continuous function need not be Lebesgue measurable; (2) there exists a Lebesgue measurable function that is not Borel measurable.
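The Cantor d.f. of the construction above can be computed from the ternary expansion of x: a ternary digit 1 signals that x lies in some removed interval J_{n,k}, where F takes the constant value c_{n,k}. The sketch below implements this (the function name and the truncation depth are choices of this illustration, not from the text) and checks the identities of Exercise 10 up to floating-point error.

```python
def cantor_F(x, depth=40):
    # Cantor d.f. via the ternary expansion of x in [0, 1]; `depth` truncates
    # the expansion, an acceptable approximation since later digits contribute
    # at most 2^(-depth).
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)
        x -= digit
        if digit == 1:
            return value + scale       # constant value c_{n,k} on a removed J_{n,k}
        value += scale * (digit // 2)  # ternary digit 2 becomes binary digit 1
        scale /= 2
    return value

# F is constant (= c_{1,1} = 1/2) on the first removed interval (1/3, 2/3):
assert cantor_F(0.4) == cantor_F(0.6) == 0.5
# the identities of Exercise 10, up to floating-point error:
for x in [0.1, 0.25, 0.5, 0.8]:
    assert abs(2 * cantor_F(x / 3) - cantor_F(x)) < 1e-6
    assert abs(2 * cantor_F((x + 2) / 3) - 1 - cantor_F(x)) < 1e-6
```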

2  Measure theory

2.1 Classes of sets

Let Ω be an "abstract space", namely a nonempty set of elements to be called "points" and denoted generically by ω. Some of the usual operations and relations between sets, together with the usual notation, are given below.

Union:  E ∪ F,  ∪ⱼ Eⱼ
Intersection:  E ∩ F,  ∩ⱼ Eⱼ
Complement:  Eᶜ = Ω \ E
Difference:  E \ F = E ∩ Fᶜ
Symmetric difference:  E Δ F = (E \ F) ∪ (F \ E)
Singleton:  {ω}

Containing (for subsets of Ω as well as for collections thereof):

E ⊂ F,  F ⊃ E  (not excluding E = F)
𝒜 ⊂ ℬ,  ℬ ⊃ 𝒜  (not excluding 𝒜 = ℬ)

by (iv), which holds in a field, Fₙ ⊂ F_{n+1} and

∪_{j=1}^{∞} Eⱼ = ∪_{j=1}^{∞} Fⱼ;

hence ∪_{j=1}^{∞} Eⱼ ∈ 𝒢 by (vi).

The collection 𝒥 of all subsets of Ω is a B.F. called the total B.F.; the collection of the two sets {∅, Ω} is a B.F. called the trivial B.F. If A is any index set and if for every α ∈ A, ℱ_α is a B.F. (or M.C.), then the intersection ∩_{α∈A} ℱ_α of all these B.F.'s (or M.C.'s), namely the collection of sets each of which belongs to all ℱ_α, is also a B.F. (or M.C.). Given any nonempty collection 𝒢 of sets, there is a minimal B.F. (or field, or M.C.) containing it; this is just the intersection of all B.F.'s (or fields, or M.C.'s) containing 𝒢, of which there is at least one, namely the 𝒥 mentioned above. This minimal B.F. (or field, or M.C.) is also said to be generated by 𝒢. In particular, if ℱ₀ is a field there is a minimal B.F. (or M.C.) containing ℱ₀.

Theorem 2.1.2. Let ℱ₀ be a field, 𝒢 the minimal M.C. containing ℱ₀, and ℱ the minimal B.F. containing ℱ₀; then ℱ = 𝒢.

PROOF. Since a B.F. is an M.C., we have ℱ ⊃ 𝒢. To prove ℱ ⊂ 𝒢 it is sufficient to show that 𝒢 is a B.F. Hence by Theorem 2.1.1 it is sufficient to show that 𝒢 is a field. We shall show that it is closed under intersection and complementation. Define two classes of subsets of 𝒢 as follows:

𝒞₁ = {E ∈ 𝒢 : E ∩ F ∈ 𝒢 for all F ∈ ℱ₀},
𝒞₂ = {E ∈ 𝒢 : E ∩ F ∈ 𝒢 for all F ∈ 𝒢}.

The identities

F ∩ ∪_{j=1}^{∞} Eⱼ = ∪_{j=1}^{∞} (F ∩ Eⱼ),
F ∩ ∩_{j=1}^{∞} Eⱼ = ∩_{j=1}^{∞} (F ∩ Eⱼ)

show that both 𝒞₁ and 𝒞₂ are M.C.'s. Since ℱ₀ is closed under intersection and contained in 𝒢, it is clear that ℱ₀ ⊂ 𝒞₁. Hence 𝒢 ⊂ 𝒞₁ by the minimality of 𝒢,

and so 𝒢 = 𝒞₁. This means for any F ∈ ℱ₀ and E ∈ 𝒢 we have F ∩ E ∈ 𝒢, which in turn means ℱ₀ ⊂ 𝒞₂. Hence 𝒢 = 𝒞₂ and this means 𝒢 is closed under intersection.

Next, define another class of subsets of 𝒢 as follows:

𝒞₃ = {E ∈ 𝒢 : Eᶜ ∈ 𝒢}.

The (De Morgan) identities

(∪_{j=1}^{∞} Eⱼ)ᶜ = ∩_{j=1}^{∞} Eⱼᶜ,
(∩_{j=1}^{∞} Eⱼ)ᶜ = ∪_{j=1}^{∞} Eⱼᶜ

show that 𝒞₃ is an M.C. Since ℱ₀ ⊂ 𝒞₃, it follows as before that 𝒢 = 𝒞₃, which means 𝒢 is closed under complementation. The proof is complete.

Corollary. Let ℱ₀ be a field, ℱ the minimal B.F. containing ℱ₀, and 𝒞 a class of sets containing ℱ₀ and having the closure properties (vi) and (vii); then 𝒞 contains ℱ.

The theorem above is one of a type called monotone class theorems. They are among the most useful tools of measure theory, and serve to extend certain relations which are easily verified for a special class of sets or functions to a larger class. Many versions of such theorems are known; see Exercises 10, 11, and 12 below.

EXERCISES

*1. (∪ⱼ Aⱼ) \ (∪ⱼ Bⱼ) ⊂ ∪ⱼ (Aⱼ \ Bⱼ); (∩ⱼ Aⱼ) \ (∩ⱼ Bⱼ) ⊂ ∪ⱼ (Aⱼ \ Bⱼ). When is there equality?

*2. The best way to define the symmetric difference is through indicators of sets as follows:

1_{A Δ B} = 1_A + 1_B (mod 2),

where we have arithmetical addition modulo 2 on the right side. All properties of Δ follow easily from this definition, some of which are rather tedious to verify otherwise. As examples:

(A Δ B) Δ C = A Δ (B Δ C),
(A Δ B) Δ (B Δ C) = A Δ C,

(A Δ B) Δ (C Δ D) = (A Δ C) Δ (B Δ D),
A Δ B = C ⇔ A = B Δ C,
A Δ B = C Δ D ⇔ A Δ C = B Δ D.

3. If Ω has exactly n points, then 𝒥 has 2ⁿ members. The B.F. generated by n given sets "without relations among them" has 2^{2ⁿ} members.

4. If Ω is countable, then 𝒥 is generated by the singletons, and conversely. [HINT: All countable subsets of Ω and their complements form a B.F.]

5. The intersection of any collection of B.F.'s {ℱ_α, α ∈ A} is the maximal B.F. contained in all of them; it is indifferently denoted by ∩_{α∈A} ℱ_α or ∧_{α∈A} ℱ_α.

*6. The union of a countable collection of B.F.'s {ℱⱼ} such that ℱⱼ ⊂ ℱ_{j+1} need not be a B.F., but there is a minimal B.F. containing all of them, denoted by ∨ⱼ ℱⱼ. In general ∨_{α∈A} ℱ_α denotes the minimal B.F. containing all ℱ_α, α ∈ A. [HINT: Ω = the set of positive integers; ℱⱼ = the B.F. generated by the singletons up to j.]

7. A B.F. is said to be countably generated iff it is generated by a countable collection of sets. Prove that if each ℱⱼ is countably generated, then so is ∨_{j=1}^{∞} ℱⱼ.

*8. Let ℱ be a B.F. generated by an arbitrary collection of sets {E_α, α ∈ A}. Prove that for each E ∈ ℱ, there exists a countable subcollection {E_{αⱼ}, j ≥ 1} (depending on E) such that E belongs already to the B.F. generated by this subcollection. [HINT: Consider the class of all sets with the asserted property and show that it is a B.F. containing each E_α.]

9. If ℱ is a B.F. generated by a countable collection of disjoint sets {Aₙ} such that ∪ₙ Aₙ = Ω, then each member of ℱ is just the union of a countable subcollection of these Aₙ's.

10. Let 𝒟 be a class of subsets of Ω having the closure property (iii); let 𝒜 be a class of sets containing Ω as well as 𝒟, and having the closure properties (vi) and (x). Then 𝒜 contains the B.F. generated by 𝒟. (This is Dynkin's form of a monotone class theorem which is expedient for certain applications.
The proof proceeds as in Theorem 2.1.2 by replacing ℱ₀ and 𝒢 with 𝒟 and 𝒜 respectively.)

11. Take Ω = ℛⁿ or a separable metric space in Exercise 10 and let 𝒟 be the class of all open sets. Let ℋ be a class of real-valued functions on Ω satisfying the following conditions.

(a) 1 ∈ ℋ and 1_D ∈ ℋ for each D ∈ 𝒟;
(b) ℋ is a vector space, namely: if f₁ ∈ ℋ, f₂ ∈ ℋ and c₁, c₂ are any two real constants, then c₁f₁ + c₂f₂ ∈ ℋ;

(c) ℋ is closed with respect to increasing limits of positive functions, namely: if fₙ ∈ ℋ, 0 ≤ fₙ ≤ f_{n+1} for all n, and f = limₙ ↑ fₙ is finite, then f ∈ ℋ. Then ℋ contains all Borel measurable functions on Ω.

Axiom (ii) is called "countable additivity"; the corresponding axiom restricted to a finite collection {Eⱼ} is called "finite additivity".

The following proposition

(1)  Eₙ ↓ ∅  ⇒  P(Eₙ) → 0

is called the "axiom of continuity". It is a particular case of the monotone property (x) above, which may be deduced from it or proved in the same way as indicated below.

Theorem 2.2.1. The axioms of finite additivity and of continuity together are equivalent to the axiom of countable additivity.

PROOF. Let Eₙ ↓. We have the obvious identity:

Eₙ = ∪_{k=n}^{∞} (E_k \ E_{k+1}) ∪ ∩_{k=1}^{∞} E_k.

If Eₙ ↓ ∅, the last term is the empty set. Hence if (ii) is assumed, we have

∀n ≥ 1:  P(Eₙ) = Σ_{k=n}^{∞} P(E_k \ E_{k+1});

the series being convergent, we have limₙ→∞ P(Eₙ) = 0. Hence (1) is true.

Conversely, let {E_k, k ≥ 1} be pairwise disjoint; then

∪_{k=n+1}^{∞} E_k ↓ ∅

(why?) and consequently, if (1) is true, then

limₙ→∞ P(∪_{k=n+1}^{∞} E_k) = 0.

Now if finite additivity is assumed, we have

P(∪_{k=1}^{∞} E_k) = P(∪_{k=1}^{n} E_k) + P(∪_{k=n+1}^{∞} E_k)
  = Σ_{k=1}^{n} P(E_k) + P(∪_{k=n+1}^{∞} E_k).

This shows that the infinite series Σ_{k=1}^{∞} P(E_k) converges, as it is bounded by the first member above. Letting n → ∞, we obtain

P(∪_{k=1}^{∞} E_k) = limₙ→∞ Σ_{k=1}^{n} P(E_k) + limₙ→∞ P(∪_{k=n+1}^{∞} E_k) = Σ_{k=1}^{∞} P(E_k).

Hence (ii) is true.

Remark. For a later application (Theorem 3.3.4) we note the following extension. Let P be defined on a field ℱ₀, finitely additive and satisfying axioms (i), (iii), and (1). Then (ii) holds whenever ∪_k E_k ∈ ℱ₀. For then ∪_{k=n+1}^{∞} E_k also belongs to ℱ₀, and the second part of the proof above remains valid.

The triple (Ω, ℱ, P) is called a probability space (triple); Ω alone is called the sample space, and ω is then a sample point.

Let Δ ⊂ Ω; then the trace of the B.F. ℱ on Δ is the collection of all sets of the form Δ ∩ F, where F ∈ ℱ. It is easy to see that this is a B.F. of subsets of Δ, and we shall denote it by Δ ∩ ℱ. Suppose Δ ∈ ℱ and P(Δ) > 0; then we may define the set function P_Δ on Δ ∩ ℱ as follows:

∀E ∈ Δ ∩ ℱ:  P_Δ(E) = P(E)/P(Δ).

It is easy to see that P_Δ is a p.m. on Δ ∩ ℱ. The triple (Δ, Δ ∩ ℱ, P_Δ) will be called the trace of (Ω, ℱ, P) on Δ.

Example 1. Let Ω be a countable set: Ω = {ωⱼ, j ∈ J}, where J is a countable index set, and let ℱ be the total B.F. of Ω. Choose any sequence of numbers {pⱼ, j ∈ J} satisfying

(2)  ∀j ∈ J: pⱼ ≥ 0;  Σ_{j∈J} pⱼ = 1;

and define a set function P on ℱ as follows:

(3)  ∀E ∈ ℱ:  P(E) = Σ_{ωⱼ∈E} pⱼ.

In words, we assign pⱼ as the value of the "probability" of the singleton {ωⱼ}, and for an arbitrary set of ωⱼ's we assign as its probability the sum of all the probabilities assigned to its elements. Clearly axioms (i), (ii), and (iii) are satisfied. Hence P so defined is a p.m.

Conversely, let any such P be given on ℱ. Since {ωⱼ} ∈ ℱ for every j, P({ωⱼ}) is defined; let its value be pⱼ. Then (2) is satisfied. We have thus exhibited all the

possible p.m.'s on Ω, or rather on the pair (Ω, ℱ); this will be called a discrete sample space. The entire first volume of Feller's well-known book [13] with all its rich content is based on just such spaces.

Example 2. Let 𝒰 = (0, 1], and let 𝒞 be the collection of intervals:

𝒞 = {(a, b] : 0 ≤ a < b ≤ 1}.

3. In the preceding example show that for each real number α in [0, 1] there is an E in ℰ such that P(E) = α. Is the set of all primes in ℰ? Give an example of E that is not in ℰ.

4. Prove the nonexistence of a p.m. on (Ω, ℱ), where (Ω, ℱ) is as in Example 1, such that the probability of each singleton has the same value. Hence criticize a sentence such as: "Choose an integer at random".

5. Prove that the trace of a B.F. ℱ on any subset Δ of Ω is a B.F. Prove that the trace of (Ω, ℱ, P) on any Δ in ℱ is a probability space, if P(Δ) > 0.

*6. Now let Δ ∉ ℱ be such that

Δ ⊂ F ∈ ℱ  ⇒  P(F) = 1.

Such a set is called thick in (Ω, ℱ, P). If E = Δ ∩ F, F ∈ ℱ, define P*(E) = P(F). Then P* is a well-defined (what does it mean?) p.m. on (Δ, Δ ∩ ℱ). This procedure is called the adjunction of Δ to (Ω, ℱ, P).

7. The B.F. ℬ¹ on ℛ¹ is also generated by the class of all open intervals, or all closed intervals, or all half-lines of the form (−∞, a] or (a, ∞), or these intervals with rational endpoints. But it is not generated by all the singletons of ℛ¹ nor by any finite collection of subsets of ℛ¹.

8. ℬ¹ contains every singleton, countable set, open set, closed set, G_δ set, F_σ set. (For the last two kinds of sets see, e.g., Natanson [3].)

*9. Let 𝒞 be a countable collection of pairwise disjoint subsets {Eⱼ, j ≥ 1} of ℛ¹, and let ℱ be the B.F. generated by 𝒞. Determine the most general p.m. on ℱ and show that the resulting probability space is "isomorphic" to that discussed in Example 1.

10. Instead of requiring that the Eⱼ's be pairwise disjoint, we may make the broader assumption that each of them intersects only a finite number in the collection.
Carry through the rest of the problem.

The question of probability measures on ℬ¹ is closely related to the theory of distribution functions studied in Chapter 1. There is in fact a one-to-one correspondence between the set functions on the one hand, and the point functions on the other. Both points of view are useful in probability theory. We establish first the easier half of this correspondence.

Lemma. Each p.m. μ on ℬ¹ determines a d.f. F through the correspondence

(4)  ∀x ∈ ℛ¹:  μ((−∞, x]) = F(x).
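For a purely discrete μ, the correspondence (4) is easy to compute directly. The sketch below uses hypothetical atoms and weights (chosen for illustration only) and checks that the resulting F is increasing, right continuous at an atom, and has jump μ({x}) there.

```python
# Illustrative discrete p.m.: atoms x_j with weights p_j summing to 1.
atoms = {-1.0: 0.25, 0.0: 0.5, 2.0: 0.25}

def F(x):
    # (4): F(x) = mu((-inf, x]) = total weight of atoms not exceeding x.
    return sum(p for a, p in atoms.items() if a <= x)

assert F(-2.0) == 0.0 and F(5.0) == 1.0      # F(-inf) = 0, F(+inf) = 1
assert F(0.0) == 0.75                         # the atom at 0 is included: right continuity
assert F(0.0) - F(-0.5) == 0.5                # jump at 0 equals mu({0})
assert all(F(a) <= F(b) for a, b in [(-3, -1), (-1, 0), (0, 2), (2, 9)])  # increasing
```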

As a consequence, we have for −∞ < a < b < +∞:

(5)  μ((a, b]) = F(b) − F(a),
     μ((a, b)) = F(b−) − F(a),
     μ([a, b)) = F(b−) − F(a−),
     μ([a, b]) = F(b) − F(a−).

Furthermore, let D be any dense subset of ℛ¹; then the correspondence is already determined by that in (4) restricted to x ∈ D, or by any of the four relations in (5) when a and b are both restricted to D.

PROOF. Let us write

∀x ∈ ℛ¹:  I_x = (−∞, x].

Then I_x ∈ ℬ¹ so that μ(I_x) is defined; call it F(x) and so define the function F on ℛ¹. We shall show that F is a d.f. as defined in Chapter 1. First of all, F is increasing by property (viii) of the measure. Next, if xₙ ↓ x, then I_{xₙ} ↓ I_x, hence we have by (ix)

(6)  F(xₙ) = μ(I_{xₙ}) ↓ μ(I_x) = F(x).

Hence F is right continuous. [The reader should ascertain what changes should be made if we had defined F to be left continuous.] Similarly, as x ↓ −∞, I_x ↓ ∅; as x ↑ +∞, I_x ↑ ℛ¹. Hence it follows from (ix) again that

lim_{x↓−∞} F(x) = lim_{x↓−∞} μ(I_x) = μ(∅) = 0;
lim_{x↑+∞} F(x) = lim_{x↑+∞} μ(I_x) = μ(ℛ¹) = 1.

This ends the verification that F is a d.f. The relations in (5) follow easily from the following complement to (4):

μ((−∞, x)) = F(x−).

To see this let xₙ < x and xₙ ↑ x. Since I_{xₙ} ↑ (−∞, x), we have by (ix):

F(x−) = limₙ F(xₙ) = limₙ μ(I_{xₙ}) ↑ μ((−∞, x)).

To prove the last sentence in the theorem we show first that (4) restricted to x ∈ D implies (4) unrestricted. For this purpose we note that μ((−∞, x]), as well as F(x), is right continuous as a function of x, as shown in (6). Hence the two members of the equation in (4), being both right continuous functions of x and coinciding on a dense set, must coincide everywhere. Now

suppose, for example, the second relation in (5) holds for rational a and b. For each real x let aₙ, bₙ be rational such that aₙ ↓ −∞ and bₙ > x, bₙ ↓ x. Then μ((aₙ, bₙ)) → μ((−∞, x]) and F(bₙ−) − F(aₙ) → F(x). Hence (4) follows.

Incidentally, the correspondence (4) "justifies" our previous assumption that F be right continuous; but what if we had assumed it to be left continuous?

Now we proceed to the second half of the correspondence.

Theorem 2.2.2. Each d.f. F determines a p.m. μ on ℬ¹ through any one of the relations given in (5), or alternatively through (4).

This is the classical theory of Lebesgue-Stieltjes measure; see, e.g., Halmos [4] or Royden [5]. However, we shall sketch the basic ideas as an important review. The d.f. F being given, we may define a set function for intervals of the form (a, b] by means of the first relation in (5). Such a function is seen to be countably additive on its domain of definition. (What does this mean?) Now we proceed to extend its domain of definition while preserving this additivity. If S is a countable union of such intervals which are disjoint:

S = ∪ᵢ (aᵢ, bᵢ],

we are forced to define μ(S), if at all, by

μ(S) = Σᵢ μ((aᵢ, bᵢ]) = Σᵢ {F(bᵢ) − F(aᵢ)}.

But a set S may be representable in the form above in different ways, so we must check that this definition leads to no contradiction: namely that it depends really only on the set S and not on the representation. Next, we notice that any open interval (a, b) is in the extended domain (why?) and indeed the extended definition agrees with the second relation in (5). Now it is well known that any open set U in ℛ¹ is the union of a countable collection of disjoint open intervals [there is no exact analogue of this in ℛⁿ for n > 1], say U = ∪ᵢ (cᵢ, dᵢ), and this representation is unique.
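The four relations in (5) can be exercised in code. The sketch below assumes a hypothetical d.f. with a jump of size 1/2 at 0 and a uniform part on [0, 1]; left limits F(x−) are approximated by evaluating F just below x.

```python
# Hypothetical d.f. for illustration: jump 1/2 at 0, uniform part on [0, 1].
def F(x):
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return 0.5 + 0.5 * x

def F_left(x, eps=1e-12):
    # crude numerical stand-in for the left limit F(x-)
    return F(x - eps)

def mu(a, b, left_closed, right_closed):
    # the relations (5): closed endpoints use left limits on the lower end
    # and the value itself on the upper end.
    lo = F_left(a) if left_closed else F(a)
    hi = F(b) if right_closed else F_left(b)
    return hi - lo

assert abs(mu(-1, 0, False, True) - 0.5) < 1e-9   # mu((-1, 0]) picks up the jump at 0
assert abs(mu(0, 1, False, True) - 0.5) < 1e-9    # mu((0, 1]) is the uniform part
assert abs(mu(0, 1, True, True) - 1.0) < 1e-9     # mu([0, 1]) includes the atom at 0
```

Note how μ({0}) = F(0) − F(0−) = 1/2, the jump of F at 0, exactly as the text derives for singletons.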
Hence again we are forced to define μ(U), if at all, by

μ(U) = Σᵢ μ((cᵢ, dᵢ)) = Σᵢ {F(dᵢ−) − F(cᵢ)}.

Having thus defined the measure for all open sets, we find that its values for all closed sets are thereby also determined by property (vi) of a probability measure. In particular, its value for each singleton {a} is determined to be F(a) − F(a−), which is nothing but the jump of F at a. Now we also know its value on all countable sets, and so on, all this provided that no contradiction

is ever forced on us so far as we have gone. But even with the class of open and closed sets we are still far from the B.F. ℬ¹. The next step will be the G_δ sets and the F_σ sets, and there already the picture is not so clear. Although it has been shown to be possible to proceed this way by transfinite induction, this is a rather difficult task. There is a more efficient way to reach the goal via the notions of outer and inner measures as follows. For any subset S of ℛ¹ consider the two numbers:

μ*(S) = inf_{U open, U ⊃ S} μ(U),
μ_*(S) = sup_{C closed, C ⊂ S} μ(C).

μ* is the outer measure, μ_* the inner measure (both with respect to the given F). It is clear that μ*(S) ≥ μ_*(S). Equality does not in general hold, but when it does, we call S "measurable" (with respect to F). In this case the common value will be denoted by μ(S). This new definition requires us at once to check that it agrees with the old one for all the sets for which μ has already been defined. The next task is to prove that: (a) the class of all measurable sets forms a B.F., say ℒ; (b) on this ℒ, the function μ is a p.m. Details of these proofs are to be found in the references given above. To finish: since ℒ is a B.F. and it contains all intervals of the form (a, b], it contains the minimal B.F. ℬ¹ with this property. It may be larger than ℬ¹, indeed it is (see below), but this causes no harm, for the restriction of μ to ℬ¹ is a p.m. whose existence is asserted in Theorem 2.2.2.

Let us mention that the introduction of both the outer and inner measures is useful for approximations.
It follows, for example, that for each measurable set S and ε > 0, there exist an open set U and a closed set C such that U ⊃ S ⊃ C and

(7)  μ(U) − ε ≤ μ(S) ≤ μ(C) + ε.

There is an alternative way of defining measurability through the use of the outer measure alone, based on Carathéodory's criterion.

It should also be remarked that the construction described above for (ℛ¹, ℬ¹, μ) is that of a "topological measure space", where the B.F. is generated by the open sets of a given topology on ℛ¹, here the usual Euclidean one. In the general case of an "algebraic measure space", in which there is no topological structure, the role of the open sets is taken by an arbitrary field ℬ₀, and a measure given on ℬ₀ may be extended to the minimal B.F. containing ℬ₀ in a similar way. In the case of ℛ¹, such a ℬ₀ is given by the field of sets, each of which is the union of a finite number of intervals of the form (a, b], (−∞, b], or (a, ∞), where a ∈ ℛ¹, b ∈ ℛ¹. Indeed the definition of the

outer measure given above may be replaced by the equivalent one:

(8)  μ*(E) = inf Σₙ μ(Uₙ),

where the infimum is taken over all countable unions ∪ₙ Uₙ such that each Uₙ ∈ ℬ₀ and ∪ₙ Uₙ ⊃ E. For another case where such a construction is required see Sec. 3.3 below.

There is one more question: besides the μ discussed above, is there any other p.m. ν that corresponds to the given F in the same way? It is important to realize that this question is not answered by the preceding theorem. It is also worthwhile to remark that any p.m. ν that is defined on a domain strictly containing ℬ¹ and that coincides with μ on ℬ¹ (such as the μ on ℒ as mentioned above) will certainly correspond to F in the same way, and strictly speaking such a ν is to be considered as distinct from μ. Hence we should phrase the question more precisely by considering only p.m.'s on ℬ¹. This will be answered in full generality by the next theorem.

Theorem 2.2.3. Let μ and ν be two measures defined on the same B.F. ℱ, which is generated by the field ℱ₀. If either μ or ν is σ-finite on ℱ₀, and μ(E) = ν(E) for every E ∈ ℱ₀, then the same is true for every E ∈ ℱ, and thus μ = ν.

PROOF. We give the proof only in the case where μ and ν are both finite, leaving the rest as an exercise. Let

𝒞 = {E ∈ ℱ : μ(E) = ν(E)};

then 𝒞 ⊃ ℱ₀ by hypothesis. But 𝒞 is also a monotone class, for if Eₙ ∈ 𝒞 for every n and Eₙ ↑ E or Eₙ ↓ E, then by the monotone property of μ and ν, respectively,

μ(E) = limₙ μ(Eₙ) = limₙ ν(Eₙ) = ν(E).

It follows from Theorem 2.1.2 that 𝒞 ⊃ ℱ, which proves the theorem.

Remark. In order that μ and ν coincide on ℱ₀, it is sufficient that they coincide on a collection 𝒢 such that finite disjoint unions of members of 𝒢 constitute ℱ₀.

Corollary.
Let μ and ν be σ-finite measures on ℬ¹ that agree on all intervals of one of the eight kinds: (a, b], (a, b), [a, b), [a, b], (−∞, b], (−∞, b), [a, ∞), (a, ∞), or merely on those with endpoints in a given dense set D; then they agree on ℬ¹.
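The field hypothesis in Theorem 2.2.3 cannot be dropped in favor of an arbitrary generating collection. A finite toy example (weights chosen purely for illustration): on a four-point space, two distinct p.m.'s can agree on a collection that generates the total B.F. but is not a field.

```python
# Two p.m.'s on {a, b, c, d} that agree on C = {{a,b}, {b,c}} -- a collection
# which generates the total B.F. (its atoms are the four singletons) but is
# not a field -- yet which disagree on {b}.  Hypothetical weights.
mu = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
nu = {"a": 0.0, "b": 0.5, "c": 0.0, "d": 0.5}

def P(weights, event):
    return sum(weights[w] for w in event)

for E in [{"a", "b"}, {"b", "c"}]:
    assert P(mu, E) == P(nu, E)        # agreement on the generating collection
assert P(mu, {"b"}) != P(nu, {"b"})    # yet the measures are distinct
```

Closing the collection under complements and finite unions (making it a field) would force agreement everywhere, exactly as the theorem asserts.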

PROOF. In order to apply the theorem, we must verify that any of the hypotheses implies that μ and ν agree on a field that generates ℬ¹. Let us take intervals of the first kind and consider the field ℬ₀ defined above. If μ and ν agree on such intervals, they must agree on ℬ₀ by countable additivity. This finishes the proof.

Returning to Theorems 2.2.1 and 2.2.2, we can now add the following complement, of which the first part is trivial.

Theorem 2.2.4. Given the p.m. μ on ℬ¹, there is a unique d.f. F satisfying (4). Conversely, given the d.f. F, there is a unique p.m. satisfying (4) or any of the relations in (5).

We shall simply call μ the p.m. of F, and F the d.f. of μ.

Instead of (ℛ¹, ℬ¹) we may consider its restriction to a fixed interval [a, b]. Without loss of generality we may suppose this to be 𝒰 = [0, 1], so that we are in the situation of Example 2. We can either proceed analogously or reduce it to the case just discussed, as follows. Let F be a d.f. such that F = 0 for x ≤ 0 and F = 1 for x ≥ 1. The probability measure μ of F will then have support in [0, 1], since μ((−∞, 0)) = 0 = μ((1, ∞)) as a consequence of (4). Thus the trace of (ℛ¹, ℬ¹, μ) on 𝒰 may be denoted simply by (𝒰, ℬ, μ), where ℬ is the trace of ℬ¹ on 𝒰. Conversely, any p.m. on ℬ may be regarded as such a trace. The most interesting case is when F is the "uniform distribution" on 𝒰:

F(x) = 0 for x < 0;  x for 0 ≤ x ≤ 1;  1 for x > 1.

The corresponding measure m on ℬ is the usual Borel measure on [0, 1], while its extension on ℒ as described in Theorem 2.2.2 is the usual Lebesgue measure there. It is well known that ℒ is actually larger than ℬ; indeed (ℒ, m) is the completion of (ℬ, m), to be discussed below.
DEFINITION. The probability space (Ω, ℱ, P) is said to be complete iff any subset of a set F in ℱ with P(F) = 0 also belongs to ℱ.

Any probability space (Ω, ℱ, P) can be completed according to the next theorem. Let us call a set in ℱ with probability zero a null set. A property that holds except on a null set is said to hold almost everywhere (a.e.), almost surely (a.s.), or for almost every ω.

Theorem 2.2.5. Given the probability space (Ω, ℱ, P), there exists a complete space (Ω, ℱ̄, P̄) such that ℱ ⊂ ℱ̄ and P̄ = P on ℱ.

PROOF. Let 𝒩 be the collection of sets that are subsets of null sets, and let ℱ̄ be the collection of subsets of Ω each of which differs from a set in ℱ by a subset of a null set. Precisely:

(9)  ℱ̄ = {E ⊂ Ω : E Δ F ∈ 𝒩 for some F ∈ ℱ}.

It is easy to verify, using Exercise 1 of Sec. 2.1, that ℱ̄ is a B.F. Clearly it contains ℱ. For each E ∈ ℱ̄, we put

P̄(E) = P(F),

where F is any set that satisfies the condition indicated in (9). To show that this definition does not depend on the choice of such an F, suppose that

E Δ F₁ ∈ 𝒩,  E Δ F₂ ∈ 𝒩.

Then by Exercise 2 of Sec. 2.1,

(E Δ F₁) Δ (E Δ F₂) = (F₁ Δ F₂) Δ (E Δ E) = F₁ Δ F₂.

Hence F₁ Δ F₂ ∈ 𝒩 and so P(F₁ Δ F₂) = 0. This implies P(F₁) = P(F₂), as was to be shown. We leave it as an exercise to show that P̄ is a measure on ℱ̄. If E ∈ ℱ, then E Δ E = ∅ ∈ 𝒩, hence P̄(E) = P(E).

Finally, it is easy to verify that if E ∈ ℱ̄ and P̄(E) = 0, then E ∈ 𝒩. Hence any subset of E also belongs to 𝒩, and so to ℱ̄. This proves that (Ω, ℱ̄, P̄) is complete.

What is the advantage of completion? Suppose that a certain property, such as the existence of a certain limit, is known to hold outside a certain set N with P(N) = 0. Then the exact set on which it fails to hold is a subset of N, not necessarily in ℱ, but will be in ℱ̄ with P̄ of it equal to zero. We need the measurability of the exact exceptional set to facilitate certain dispositions, such as defining or redefining a function on it; see Exercise 25 below.

EXERCISES

In the following, μ is a p.m. on ℬ¹ and F is its d.f.

*11. An atom of any measure μ on ℬ¹ is a singleton {x} such that μ({x}) > 0. The number of atoms of any σ-finite measure is countable. For each x we have μ({x}) = F(x) − F(x−).

12. μ is called atomic iff its value is zero on any set not containing any atom.
This is the case if and only if F is discrete. μ is without any atom, or atomless, if and only if F is continuous.

13. μ is called singular iff there exists a set Z with m(Z) = 0 such that μ(Zᶜ) = 0. This is the case if and only if F is singular. [HINT: One half is proved by using Theorems 1.3.1 and 2.1.2 to get ∫_B F′(x) dx ≤ μ(B) for B ∈ ℬ¹; the other requires Vitali's covering theorem.]

14. Translate Theorem 1.3.2 in terms of measures.

*15. Translate the construction of a singular continuous d.f. in Sec. 1.3 in terms of measures. [It becomes clearer and easier to describe!] Generalize the construction by replacing the Cantor set with any perfect set of Borel measure zero. What if the latter has positive measure? Describe a probability scheme to realize this.

16. Show by a trivial example that Theorem 2.2.3 becomes false if the field ℱ₀ is replaced by an arbitrary collection that generates ℱ.

17. Show that Theorem 2.2.3 may be false for measures that are σ-finite on ℱ. [HINT: Take Ω to be {1, 2, …, ∞} and ℱ₀ to be the finite sets excluding ∞ and their complements; μ(E) = number of points in E, μ({∞}) ≠ ν({∞}).]

18. Show that the ℱ̄ in (9) is also the collection of sets of the form F ∪ N [or F \ N] where F ∈ ℱ and N ∈ 𝒩.

19. Let 𝒩 be as in the proof of Theorem 2.2.5 and 𝒩₀ be the set of all null sets in (Ω, ℱ, P). Then both these collections are monotone classes, and closed with respect to the operation "\".

*20. Let (Ω, ℱ, P) be a probability space and ℱ₁ a Borel subfield of ℱ. Prove that there exists a minimal B.F. ℱ₂ satisfying ℱ₁ ⊂ ℱ₂ ⊂ ℱ and 𝒩₀ ⊂ ℱ₂, where 𝒩₀ is as in Exercise 19. A set E belongs to ℱ₂ if and only if there exists a set F in ℱ₁ such that E Δ F ∈ 𝒩₀. This ℱ₂ is called the augmentation of ℱ₁ with respect to (Ω, ℱ, P).

21. Suppose that F has all the defining properties of a d.f. except that it is not assumed to be right continuous. Show that Theorem 2.2.2 and the Lemma remain valid, provided that we replace F(x), F(b), F(a) in (4) and (5) by F(x+), F(b+), F(a+), respectively. What modification is necessary in Theorem 2.2.4?

22. For an arbitrary measure P on a B.F. ℱ, a set E in ℱ is called an atom of P iff P(E) > 0 and F ⊂ E, F ∈ ℱ imply P(F) = P(E) or P(F) = 0.
𝒫 is called atomic iff its value is zero over any set in ℱ that is disjoint from all the atoms. Prove that for a measure μ on ℬ¹ this new definition is equivalent to that given in Exercise 11 above provided we identify two sets which differ by a 𝒫-null set.

23. Prove that if the p.m. 𝒫 is atomless, then given any α in [0, 1] there exists a set E ∈ ℱ with 𝒫(E) = α. [HINT: Prove first that there exists E with "arbitrarily small" probability. A quick proof then follows from Zorn's lemma by considering a maximal collection of disjoint sets, the sum of whose probabilities does not exceed α. But an elementary proof without using any maximality principle is also possible.]

*24. A point x is said to be in the support of a measure μ on ℬⁿ iff every open neighborhood of x has strictly positive measure. The set of all such points is called the support of μ. Prove that the support is a closed set whose complement is the maximal open set on which μ vanishes. Show that

the support of a p.m. on ℬ¹ is the same as that of its d.f., defined in Exercise 6 of Sec. 1.2.

*25. Let f be measurable with respect to ℱ, and Z be contained in a null set. Define

  f̃ = f on Zᶜ,  f̃ = K on Z,

where K is a constant. Then f̃ is measurable with respect to ℱ provided that (Ω, ℱ, P) is complete. Show that the conclusion may be false otherwise.

3  Random variable. Expectation. Independence

3.1 General definitions

Let the probability space (Ω, ℱ, P) be given. R¹ = (−∞, +∞) is the (finite) real line, R* = [−∞, +∞] the extended real line, ℬ¹ = the Euclidean Borel field on R¹, ℬ* = the extended Borel field. A set in ℬ* is just a set in ℬ¹ possibly enlarged by one or both of the points ±∞.

DEFINITION OF A RANDOM VARIABLE. A real, extended-valued random variable is a function X whose domain is a set Δ in ℱ and whose range is contained in R* = [−∞, +∞] such that for each B in ℬ*, we have

(1)  {ω : X(ω) ∈ B} ∈ Δ ∩ ℱ,

where Δ ∩ ℱ is the trace of ℱ on Δ. A complex-valued random variable is a function on a set Δ in ℱ to the complex plane whose real and imaginary parts are both real, finite-valued random variables.

This definition in its generality is necessary for logical reasons in many applications, but for a discussion of basic properties we may suppose Δ = Ω and that X is real and finite-valued with probability one. This restricted meaning

of a "random variable", abbreviated as "r.v.", will be understood in the book unless otherwise specified. The general case may be reduced to this one by considering the trace of (Ω, ℱ, P) on Δ, or on the "domain of finiteness" Δ₀ = {ω : |X(ω)| < ∞}, and taking real and imaginary parts.

Consider the "inverse mapping" X⁻¹ from R¹ to Ω, defined (as usual) as follows:

  ∀A ⊂ R¹ : X⁻¹(A) = {ω : X(ω) ∈ A}.

Condition (1) then states that X⁻¹ carries members of ℬ¹ onto members of ℱ:

(2)  ∀B ∈ ℬ¹ : X⁻¹(B) ∈ ℱ;

or in the briefest notation:

  X⁻¹(ℬ¹) ⊂ ℱ.

Such a function is said to be measurable (with respect to ℱ). Thus, an r.v. is just a measurable function from Ω to R¹ (or R*).

The next proposition, a standard exercise on inverse mappings, is essential.

Theorem 3.1.1. For any function X from Ω to R¹ (or R*), not necessarily an r.v., the inverse mapping X⁻¹ has the following properties:

  X⁻¹(Aᶜ) = (X⁻¹(A))ᶜ,
  X⁻¹(⋃_α A_α) = ⋃_α X⁻¹(A_α),
  X⁻¹(⋂_α A_α) = ⋂_α X⁻¹(A_α),

where α ranges over an arbitrary index set, not necessarily countable.

Theorem 3.1.2. X is an r.v. if and only if for each real number x, or each real number x in a dense subset of R¹, we have

  {ω : X(ω) ≤ x} ∈ ℱ.

PROOF. The preceding condition may be written as

(3)  ∀x : X⁻¹((−∞, x]) ∈ ℱ.

Consider the collection 𝒜 of all subsets S of R¹ for which X⁻¹(S) ∈ ℱ. From Theorem 3.1.1 and the defining properties of the Borel field ℱ, it follows that if S ∈ 𝒜, then

  X⁻¹(Sᶜ) = (X⁻¹(S))ᶜ ∈ ℱ;

if ∀j : S_j ∈ 𝒜, then

  X⁻¹(⋃_j S_j) = ⋃_j X⁻¹(S_j) ∈ ℱ.

Thus Sᶜ ∈ 𝒜 and ⋃_j S_j ∈ 𝒜 and consequently 𝒜 is a B.F. This B.F. contains all intervals of the form (−∞, x], which generate ℬ¹ even if x is restricted to a dense set; hence 𝒜 ⊃ ℬ¹.

Specifically, F is given by

  F(x) = μ((−∞, x]) = P{X ≤ x}.

While the r.v. X determines μ and therefore F, the converse is obviously false. A family of r.v.'s having the same distribution is said to be "identically distributed".

Example 1. Let (Ω, ℱ) be a discrete sample space (see Example 1 of Sec. 2.2). Every numerically valued function is an r.v.

Example 2. (𝒰, ℬ, m). In this case an r.v. is by definition just a Borel measurable function. According to the usual definition, f on 𝒰 is Borel measurable iff f⁻¹(ℬ¹) ⊂ ℬ. In particular, the function f given by f(ω) ≡ ω is an r.v. The two r.v.'s ω and 1 − ω are not identical but are identically distributed; in fact their common distribution is the underlying measure m.

Example 3. (R¹, ℬ¹, μ). The definition of a Borel measurable function is not affected, since no measure is involved; so any such function is an r.v., whatever the given p.m. μ may be. As in Example 2, there exists an r.v. with the underlying μ as its p.m.; see Exercise 3 below.

We proceed to produce new r.v.'s from given ones.

Theorem 3.1.4. If X is an r.v., f a Borel measurable function [on (R¹, ℬ¹)], then f(X) is an r.v.

PROOF. The quickest proof is as follows. Regarding the function f(X) of ω as the "composite mapping":

  f ∘ X : ω → f(X(ω)),

we have (f ∘ X)⁻¹ = X⁻¹ ∘ f⁻¹ and consequently

  (f ∘ X)⁻¹(ℬ¹) = X⁻¹(f⁻¹(ℬ¹)) ⊂ X⁻¹(ℬ¹) ⊂ ℱ.

The reader who is not familiar with operations of this kind is advised to spell out the proof above in the old-fashioned manner, which takes only a little longer.

We must now discuss the notion of a random vector. This is just a vector each of whose components is an r.v. It is sufficient to consider the case of two dimensions, since there is no essential difference in higher dimensions apart from complication in notation.

We recall first that in the 2-dimensional Euclidean space R², or the plane, the Euclidean Borel field ℬ² is generated by rectangles of the form

  {(x, y) : a < x ≤ b, c < y ≤ d}.

by (2). Now the collection of sets A in R² for which (X, Y)⁻¹(A) ∈ ℱ forms a B.F. by the analogue of Theorem 3.1.1. It follows from what has just been shown that this B.F. contains ℬ₀²; hence it must also contain ℬ². Hence each set in ℬ² belongs to the collection, as was to be proved.

Here are some important special cases of Theorems 3.1.4 and 3.1.5. Throughout the book we shall use the notation for numbers as well as functions:

(6)  x ∨ y = max(x, y),  x ∧ y = min(x, y).

Corollary. If X is an r.v. and f is a continuous function on R¹, then f(X) is an r.v.; in particular Xʳ for positive integer r, |X|ʳ for positive real r, e^{−λX}, e^{itX} for real λ and t, are all r.v.'s (the last being complex-valued). If X and Y are r.v.'s, then

  X ∨ Y, X ∧ Y, X + Y, X − Y, X · Y, X / Y

are r.v.'s, the last provided Y does not vanish.

Generalization to a finite number of r.v.'s is immediate. Passing to an infinite sequence, let us state the following theorem, although its analogue in real function theory should be well known to the reader.

Theorem 3.1.6. If {X_j, j ≥ 1} is a sequence of r.v.'s, then

  inf_j X_j,  sup_j X_j,  lim inf_j X_j,  lim sup_j X_j

are r.v.'s, not necessarily finite-valued with probability one though everywhere defined, and

  lim_{j→∞} X_j

is an r.v. on the set Δ on which there is either convergence or divergence to ±∞.

PROOF. To see, for example, that sup_j X_j is an r.v., we need only observe the relation

  ∀x ∈ R¹ : {sup_j X_j ≤ x} = ⋂_j {X_j ≤ x}

and use Theorem 3.1.2. Since

  lim sup_j X_j = inf_n (sup_{j>n} X_j),

and lim_{j→∞} X_j exists [and is finite] on the set where lim sup_j X_j = lim inf_j X_j [and is finite], which belongs to ℱ, the rest follows.

Here already we see the necessity of the general definition of an r.v. given at the beginning of this section.

DEFINITION. An r.v. X is called discrete (or countably valued) iff there is a countable set B ⊂ R¹ such that P(X ∈ B) = 1.

It is easy to see that X is discrete if and only if its d.f. is. Perhaps it is worthwhile to point out that a discrete r.v. need not have a range that is discrete in the sense of Euclidean topology, even apart from a set of probability zero. Consider, for example, an r.v. with the d.f. in Example 2 of Sec. 1.1.

The following terminology and notation will be used throughout the book for an arbitrary set Ω, not necessarily the sample space.

DEFINITION. For each Δ ⊂ Ω, the function 1_Δ(·) defined as follows:

  ∀ω ∈ Ω : 1_Δ(ω) = 1, if ω ∈ Δ;  1_Δ(ω) = 0, if ω ∈ Ω \ Δ,

is called the indicator (function) of Δ.

Clearly 1_Δ is an r.v. if and only if Δ ∈ ℱ.

A countable partition of Ω is a countable family of disjoint sets {Λ_j}, with Λ_j ∈ ℱ for each j and such that Ω = ⋃_j Λ_j. We have then

  1 = Σ_j 1_{Λ_j}.

More generally, let b_j be arbitrary real numbers; then the function φ defined below:

  ∀ω ∈ Ω : φ(ω) = Σ_j b_j 1_{Λ_j}(ω),

is a discrete r.v. We shall call φ the r.v. belonging to the weighted partition {Λ_j; b_j}. Each discrete r.v. X belongs to a certain partition. For let {b_j} be the countable set in the definition of X and let Λ_j = {ω : X(ω) = b_j}; then X belongs to the weighted partition {Λ_j; b_j}. If j ranges over a finite index set, the partition is called finite and the r.v. belonging to it simple.

EXERCISES

1. Prove Theorem 3.1.1. For the "direct mapping" X, which of these properties of X⁻¹ holds?

2. If two r.v.'s are equal a.e., then they have the same p.m.

*3. Given any p.m. μ on (R¹, ℬ¹), define an r.v. whose p.m. is μ. Can this be done in an arbitrary probability space?

*4. Let θ be uniformly distributed on [0, 1]. For each d.f. F, define G(y) = sup{x : F(x) < y}. Then G(θ) has the d.f. F.

*5. Suppose X has the continuous d.f. F; then F(X) has the uniform distribution on [0, 1]. What if F is not continuous?

6. Is the range of an r.v. necessarily Borel or Lebesgue measurable?

7. The sum, difference, product, or quotient (denominator nonvanishing) of two discrete r.v.'s is discrete.

8. If Ω is discrete (countable), then every r.v. is discrete. Conversely, every r.v. in a probability space is discrete if and only if the p.m. is atomic. [HINT: Use Exercise 23 of Sec. 2.2.]

9. If f is Borel measurable, and X and Y are identically distributed, then so are f(X) and f(Y).

10. Express the indicators of Λ₁ ∪ Λ₂, Λ₁ ∩ Λ₂, Λ₁ \ Λ₂, Λ₁ Δ Λ₂, lim sup Λ_n, lim inf Λ_n in terms of those of Λ₁, Λ₂, or Λ_n. [For the definitions of the limits see Sec. 4.2.]

*11. Let ℱ{X} be the minimal B.F. with respect to which X is measurable. Show that Λ ∈ ℱ{X} if and only if Λ = X⁻¹(B) for some B ∈ ℬ¹. Is this B unique? Can there be a set A ∉ ℬ¹ such that Λ = X⁻¹(A)?

12. Generalize the assertion in Exercise 11 to a finite set of r.v.'s. [It is possible to generalize even to an arbitrary set of r.v.'s.]

3.2 Properties of mathematical expectation

The concept of "(mathematical) expectation" is the same as that of integration in the probability space with respect to the measure P. The reader is supposed to have some acquaintance with this, at least in the particular case (𝒰, ℬ, m) or (R¹, ℬ¹, m). [In the latter case, the measure not being finite, the theory of integration is slightly more complicated.]
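Exercise 4 above is the inverse-transform method for sampling from a given d.f. A minimal sketch, assuming a specific continuous F (the exponential d.f., an invented example not from the text), for which G(y) = sup{x : F(x) < y} reduces to the ordinary inverse:

```python
import math
import random

def F(x):
    """Exponential d.f.: F(x) = 1 - e^(-x) for x >= 0, else 0."""
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

def G(y):
    """G(y) = sup{x : F(x) < y}; for this continuous, strictly
    increasing F it is the explicit inverse -log(1 - y)."""
    return -math.log(1.0 - y)

# For such an F, G simply inverts it.
for x in (0.1, 1.0, 2.5, 7.0):
    assert abs(G(F(x)) - x) < 1e-9

# G(theta) with theta uniform on [0, 1) reproduces the d.f. F.
random.seed(0)
sample = [G(random.random()) for _ in range(100_000)]
empirical = sum(1 for s in sample if s <= 1.0) / len(sample)
assert abs(empirical - F(1.0)) < 0.01   # F(1) = 1 - 1/e
```

For a discontinuous F the same supremum formula still works (Exercise 4 requires no continuity); only the explicit expression for G changes.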
The general theory is not much different and will be briefly reviewed. The r.v.'s below will be tacitly assumed to be finite everywhere to avoid trivial complications.

For each positive discrete r.v. X belonging to the weighted partition {Λ_j; b_j}, we define its expectation to be

(1)  E(X) = Σ_j b_j P(Λ_j).

This is either a positive finite number or +∞. It is trivial that if X belongs to different partitions, the corresponding values given by (1) agree. Now let
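Definition (1) can be checked mechanically; a small sketch (the particular partition is invented for illustration) using exact fractions, which also exhibits the invariance under refinement asserted in the text:

```python
from fractions import Fraction as Fr

# A positive discrete r.v. belonging to the weighted partition {A_j; b_j}:
# each pair below is (P(A_j), b_j).
partition = [(Fr(1, 2), 1), (Fr(1, 3), 2), (Fr(1, 6), 6)]
assert sum(p for p, _ in partition) == 1        # {A_j} exhausts Omega

# Definition (1): E(X) = sum_j b_j P(A_j).
expectation = sum(p * b for p, b in partition)
assert expectation == Fr(13, 6)

# Refining the partition (splitting A_1 into two halves on which X
# still equals 1) leaves the value of (1) unchanged.
refined = [(Fr(1, 4), 1), (Fr(1, 4), 1)] + partition[1:]
assert sum(p * b for p, b in refined) == expectation
```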

X be an arbitrary positive r.v. For any two positive integers m and n, the set

  Λ_{mn} = {ω : n/2^m ≤ X(ω) < (n + 1)/2^m}
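The sets Λ_{mn} implement the standard dyadic approximation of X from below; a sketch of the resulting approximants, which take the value n/2^m on Λ_{mn} (a numerical stand-in, not the text's own notation):

```python
import math

def X_m(x, m):
    """Value of the m-th dyadic approximant at a sample point where
    X = x: equals n/2**m on the set {n/2**m <= X < (n+1)/2**m}."""
    return math.floor((2 ** m) * x) / 2 ** m

x = math.pi                      # a stand-in value of X(omega)
approx = [X_m(x, m) for m in range(12)]

assert all(a <= b for a, b in zip(approx, approx[1:]))  # increases with m
assert all(a <= x for a in approx)                      # never exceeds x
assert x - approx[-1] < 2 ** -11                        # error below 1/2**m
```

The monotone increase in m is what later lets the monotone convergence theorem carry the definition of E(X) from these discrete approximants to X itself.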

and call it "the integral of X (with respect to P) over the set Λ". We shall say that X is integrable with respect to P over Λ iff the integral above exists and is finite.

In the case of (R¹, ℬ¹, μ), if we write X = f, ω = x, the integral

  ∫_Ω X(ω) P(dω) = ∫_{R¹} f(x) μ(dx)

is just the ordinary Lebesgue–Stieltjes integral of f with respect to μ. If F is the d.f. of μ and Λ = (a, b], this is also written as

  ∫_{(a,b]} f(x) dF(x).

This classical notation is really an anachronism, originated in the days when a point function was more popular than a set function. The notation above must then be amended to

  ∫_{a+0}^{b+0},  ∫_{a−0}^{b+0},  ∫_{a+0}^{b−0},  ∫_{a−0}^{b−0}

to distinguish clearly between the four kinds of intervals (a, b], [a, b], (a, b), [a, b).

In the case of (𝒰, ℬ, m), the integral reduces to the ordinary Lebesgue integral

  ∫_a^b f(x) m(dx) = ∫_a^b f(x) dx.

Here m is atomless, so the notation is adequate and there is no need to distinguish between the different kinds of intervals.

The general integral has the familiar properties of the Lebesgue integral on [0, 1]. We list a few below for ready reference, some being easy consequences of others. As a general notation, the left member of (4) will be abbreviated to ∫_Λ X dP. In the following, X, Y are r.v.'s; a, b are constants; Λ is a set in ℱ.

(i) Absolute integrability. ∫_Λ X dP is finite if and only if

  ∫_Λ |X| dP < ∞.

(ii) Linearity.

  ∫_Λ (aX + bY) dP = a ∫_Λ X dP + b ∫_Λ Y dP,

provided that the right side is meaningful, namely not +∞ − ∞ or −∞ + ∞.

(iii) Additivity over sets. If the Λ_n's are disjoint, then

  ∫_{⋃_n Λ_n} X dP = Σ_n ∫_{Λ_n} X dP.

(iv) Positivity. If X ≥ 0 a.e. on Λ, then

  ∫_Λ X dP ≥ 0.

(v) Monotonicity. If X₁ ≤ X ≤ X₂ a.e. on Λ, then

  ∫_Λ X₁ dP ≤ ∫_Λ X dP ≤ ∫_Λ X₂ dP.

(vi) Mean value theorem. If a ≤ X ≤ b a.e. on Λ, then

  a P(Λ) ≤ ∫_Λ X dP ≤ b P(Λ).

(vii) Modulus inequality.

  |∫_Λ X dP| ≤ ∫_Λ |X| dP.

(viii) Dominated convergence theorem. If lim_{n→∞} X_n = X a.e. or merely in measure on Λ and ∀n : |X_n| ≤ Y a.e. on Λ, with ∫_Λ Y dP < ∞, then

(5)  lim_{n→∞} ∫_Λ X_n dP = ∫_Λ X dP = ∫_Λ lim_{n→∞} X_n dP.

(ix) Bounded convergence theorem. If lim_{n→∞} X_n = X a.e. or merely in measure on Λ and there exists a constant M such that ∀n : |X_n| ≤ M a.e. on Λ, then (5) is true.

(x) Monotone convergence theorem. If X_n ≥ 0 and X_n ↑ X a.e. on Λ, then (5) is again true provided that +∞ is allowed as a value for either member. The condition "X_n ≥ 0" may be weakened to: "E(X_n) > −∞ for some n".

(xi) Integration term by term. If

  Σ_n ∫_Λ |X_n| dP < ∞,

then Σ_n |X_n| < ∞ a.e. on Λ so that Σ_n X_n converges a.e. on Λ and

  ∫_Λ Σ_n X_n dP = Σ_n ∫_Λ X_n dP.

(xii) Fatou's lemma. If X_n ≥ 0 a.e. on Λ, then

  ∫_Λ (lim inf_{n→∞} X_n) dP ≤ lim inf_{n→∞} ∫_Λ X_n dP.

Let us prove the following useful theorem as an instructive example.

Theorem 3.2.1. We have

(6)  Σ_{n=1}^∞ P(|X| ≥ n) ≤ E(|X|) ≤ 1 + Σ_{n=1}^∞ P(|X| ≥ n),

so that E(|X|) < ∞ if and only if the series above converges.

PROOF. By the additivity property (iii), if Λ_n = {n ≤ |X| < n + 1},

  E(|X|) = Σ_{n=0}^∞ ∫_{Λ_n} |X| dP.

Hence by the mean value theorem (vi) applied to each set Λ_n:

(7)  Σ_{n=0}^∞ n P(Λ_n) ≤ E(|X|) ≤ Σ_{n=0}^∞ (n + 1) P(Λ_n) = 1 + Σ_{n=0}^∞ n P(Λ_n).

It remains to show

(8)  Σ_{n=0}^∞ n P(Λ_n) = Σ_{n=1}^∞ P(|X| ≥ n),

finite or infinite. Now the partial sums of the series on the left may be rearranged (Abel's method of partial summation!) to yield, for N ≥ 1,

(9)  Σ_{n=0}^N n {P(|X| ≥ n) − P(|X| ≥ n + 1)} = Σ_{n=1}^N P(|X| ≥ n) − N P(|X| ≥ N + 1).

Thus we have

(10)  Σ_{n=0}^N n P(Λ_n) ≤ Σ_{n=1}^N P(|X| ≥ n) ≤ Σ_{n=0}^N n P(Λ_n) + N P(|X| ≥ N + 1).

Another application of the mean value theorem gives

  N P(|X| ≥ N + 1) ≤ ∫_{{|X| ≥ N+1}} |X| dP.

Hence if E(|X|) < ∞, then the last term in (10) converges to zero as N → ∞ and (8) follows with both sides finite. On the other hand, if E(|X|) = ∞, then the second term in (10) diverges with the first as N → ∞, so that (8) is also true with both sides infinite.

Corollary. If X takes only positive integer values, then

  E(X) = Σ_{n=1}^∞ P(X ≥ n).

EXERCISES

1. If X ≥ 0 a.e. on Λ and ∫_Λ X dP = 0, then X = 0 a.e. on Λ.

*2. If E(|X|) < ∞ and lim_{n→∞} P(Λ_n) = 0, then lim_{n→∞} ∫_{Λ_n} X dP = 0. In particular

  lim_{n→∞} ∫_{{|X| > n}} X dP = 0.

3. Let X ≥ 0 and ∫_Ω X dP = A, 0 < A < ∞. Then the set function ν defined on ℱ as follows:

  ν(Λ) = (1/A) ∫_Λ X dP,

is a probability measure on ℱ.

4. Let c be a fixed constant, c > 0. Then E(|X|) < ∞ if and only if

  Σ_{n=1}^∞ P(|X| ≥ cn) < ∞.

In particular, if the last series converges for one value of c, it converges for all values of c.

5. For any r > 0, E(|X|ʳ) < ∞ if and only if

  Σ_{n=1}^∞ n^{r−1} P(|X| ≥ n) < ∞.
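Inequality (6) and the Corollary of Theorem 3.2.1 can be verified exactly for a concrete integer-valued X; the geometric-type p.m. below is an invented example:

```python
from fractions import Fraction as Fr

# P(X = k) = 2**(-k) for k = 1, ..., K-1, with the leftover tail mass on K,
# giving an exact p.m. on {1, ..., K}.
K = 30
pmf = {k: Fr(1, 2 ** k) for k in range(1, K)}
pmf[K] = 1 - sum(pmf.values())

EX = sum(k * p for k, p in pmf.items())
tail_sum = sum(sum(p for k, p in pmf.items() if k >= n)
               for n in range(1, K + 1))

assert tail_sum <= EX <= 1 + tail_sum   # inequality (6)
assert EX == tail_sum                   # Corollary: positive integer-valued X
```

For a positive integer-valued X the two sides of (8) are literally the same rearranged sum, which is why the Corollary holds with equality here.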

*6. Suppose that sup_n |X_n| ≤ Y on Λ with ∫_Λ Y dP < ∞. Deduce from (xii) that

  ∫_Λ (lim sup_{n→∞} X_n) dP ≥ lim sup_{n→∞} ∫_Λ X_n dP.

7. Given ε > 0, there exists a simple r.v. X_ε (see the end of Sec. 3.1) such that

  E(|X − X_ε|) < ε.

Hence there exists a sequence of simple r.v.'s X_m such that

  lim_{m→∞} E(|X − X_m|) = 0.

We can choose {X_m} so that |X_m| ≤ |X| for all m.

*8. For any two sets Λ₁ and Λ₂ in ℱ, define

  ρ(Λ₁, Λ₂) = P(Λ₁ Δ Λ₂);

then ρ is a pseudo-metric in the space of sets in ℱ; call the resulting metric space M(ℱ, P). Prove that for each integrable r.v. X the mapping of M(ℱ, P) to R¹ given by Λ → ∫_Λ X dP is continuous. Similarly, the mappings on M(ℱ, P) × M(ℱ, P) to M(ℱ, P) given by

  (Λ₁, Λ₂) → Λ₁ ∪ Λ₂, Λ₁ ∩ Λ₂, Λ₁ \ Λ₂, Λ₁ Δ Λ₂

are all continuous. If (see Sec. 4.2 below)

  lim sup_n Λ_n = lim inf_n Λ_n

modulo a null set, we denote the common equivalence class of these two sets by lim_n Λ_n. Prove that in this case {Λ_n} converges to lim_n Λ_n in the metric ρ. Deduce Exercise 2 above as a special case.

There is a basic relation between the abstract integral with respect to P over sets in ℱ on the one hand, and the Lebesgue–Stieltjes integral with respect to μ over sets in ℬ¹ on the other, induced by each r.v. We give the version in one dimension first.

Theorem 3.2.2. Let X on (Ω, ℱ, P) induce the probability space (R¹, ℬ¹, μ) according to Theorem 3.1.3 and let f be Borel measurable. Then we have

(11)  ∫_Ω f(X(ω)) P(dω) = ∫_{R¹} f(x) μ(dx),

provided that either side exists.

PROOF. Let B ∈ ℬ¹ and f = 1_B; then the left side in (11) is P(X ∈ B) and the right side is μ(B). They are equal by the definition of μ in (4) of Sec. 3.1. Now by the linearity of both integrals in (11), it will also hold if

(12)  f = Σ_j b_j 1_{B_j},

namely the r.v. on (R¹, ℬ¹) belonging to an arbitrary weighted partition {B_j; b_j}. For an arbitrary positive Borel measurable function f we can define, as in the discussion of the abstract integral given above, a sequence {f_m, m ≥ 1} of the form (12) such that f_m ↑ f everywhere. For each of them we have

(13)  ∫_Ω f_m(X) dP = ∫_{R¹} f_m dμ;

hence, letting m → ∞ and using the monotone convergence theorem, we obtain (11) whether the limits are finite or not. This proves the theorem for f ≥ 0, and the general case follows in the usual way.

We shall need the generalization of the preceding theorem in several dimensions. No change is necessary except for notation, which we will give in two dimensions. Instead of the ν in (5) of Sec. 3.1, let us write the "mass element" as μ²(dx, dy) so that

  ν(A) = ∬_A μ²(dx, dy).

Theorem 3.2.3. Let (X, Y) on (Ω, ℱ, P) induce the probability space (R², ℬ², μ²) and let f be a Borel measurable function of two variables. Then we have

(14)  ∫_Ω f(X(ω), Y(ω)) P(dω) = ∬_{R²} f(x, y) μ²(dx, dy).

Note that f(X, Y) is an r.v. by Theorem 3.1.5.

As a consequence of Theorem 3.2.2, we have: if μ_X and F_X denote, respectively, the p.m. and d.f. induced by X, then we have

  E(X) = ∫_{R¹} x μ_X(dx) = ∫_{−∞}^{∞} x dF_X(x);

and more generally

(15)  E(f(X)) = ∫_{R¹} f(x) μ_X(dx) = ∫_{−∞}^{∞} f(x) dF_X(x),

with the usual proviso regarding existence and finiteness.

Another important application is as follows: let μ² be as in Theorem 3.2.3 and take f(x, y) to be x + y there. We obtain

(16)  E(X + Y) = ∬_{R²} (x + y) μ²(dx, dy) = ∬_{R²} x μ²(dx, dy) + ∬_{R²} y μ²(dx, dy).

On the other hand, if we take f(x, y) to be x or y, respectively, we obtain

  E(X) = ∬_{R²} x μ²(dx, dy),  E(Y) = ∬_{R²} y μ²(dx, dy),

and consequently

(17)  E(X + Y) = E(X) + E(Y).

This result is a case of the linearity of E given but not proved here; the proof above reduces this property in the general case of (Ω, ℱ, P) to the corresponding one in the special case (R², ℬ², μ²). Such a reduction is frequently useful when there are technical difficulties in the abstract treatment.

We end this section with a discussion of "moments".

Let a be real, r positive; then E(|X − a|ʳ) is called the absolute moment of X of order r, about a. It may be +∞; otherwise, and if r is an integer, E((X − a)ʳ) is the corresponding moment. If μ and F are, respectively, the p.m. and d.f. of X, then we have by Theorem 3.2.2:

  E(|X − a|ʳ) = ∫_{R¹} |x − a|ʳ μ(dx) = ∫_{−∞}^{∞} |x − a|ʳ dF(x),
  E((X − a)ʳ) = ∫_{R¹} (x − a)ʳ μ(dx) = ∫_{−∞}^{∞} (x − a)ʳ dF(x).

For r = 1, a = 0, this reduces to E(X), which is also called the mean of X. The moments about the mean are called central moments. That of order 2 is particularly important and is called the variance, var(X); its positive square root the standard deviation, σ(X):

  var(X) = σ²(X) = E{(X − E(X))²} = E(X²) − (E(X))².

We note the inequality σ²(X) ≤ E(X²), which will be used a good deal in Chapter 5. For any positive number p, X is said to belong to L^p = L^p(Ω, ℱ, P) iff E(|X|^p) < ∞.
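The variance identity and the inequality σ²(X) ≤ E(X²) can be checked exactly for a small discrete distribution; a fair die serves as the (invented) example:

```python
from fractions import Fraction as Fr

pmf = {k: Fr(1, 6) for k in range(1, 7)}       # one throw of a fair die

EX  = sum(k * p for k, p in pmf.items())        # mean
EX2 = sum(k * k * p for k, p in pmf.items())    # second moment
var_central = sum((k - EX) ** 2 * p for k, p in pmf.items())

assert EX == Fr(7, 2)
assert var_central == EX2 - EX ** 2 == Fr(35, 12)
assert var_central <= EX2                       # sigma^2(X) <= E(X^2)
```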

The well-known inequalities of Hölder and Minkowski (see, e.g., Natanson [3]) may be written as follows. Let X and Y be r.v.'s, 1 < p < ∞ and 1/p + 1/q = 1; then

(18)  |E(XY)| ≤ E(|XY|) ≤ E(|X|^p)^{1/p} · E(|Y|^q)^{1/q},

each u > 0:

  P{|X| ≥ u} ≤ E(φ(X)) / φ(u).

PROOF. We have by the mean value theorem:

  E(φ(X)) = ∫_Ω φ(X) dP ≥ ∫_{{|X| ≥ u}} φ(X) dP ≥ φ(u) P{|X| ≥ u},

from which the inequality follows.

The most familiar application is when φ(u) = |u|^p for 0 < p < ∞, so that the inequality yields an upper bound for the "tail" probability in terms of an absolute moment.

EXERCISES

9. Another proof of (14): verify it first for simple r.v.'s and then use Exercise 7 of Sec. 3.2.

10. Prove that if 0 < r < r′ and E(|X|^{r′}) < ∞, then E(|X|ʳ) < ∞. Also that E(|X|ʳ) < ∞ if and only if E(|X − a|ʳ) < ∞ for every a.

*11. If E(X²) = 1 and E(|X|) ≥ a > 0, then P{|X| ≥ λa} ≥ (1 − λ)²a² for 0 ≤ λ ≤ 1.

*12. If X ≥ 0, Y ≥ 0, p > 0, then E{(X + Y)^p} ≤ 2^p {E(X^p) + E(Y^p)}. If p > 1, the factor 2^p may be replaced by 2^{p−1}. If 0 ≤ p ≤ 1, it may be replaced by 1.

*13. If X_j ≥ 0, then

  E{(Σ_{j=1}^n X_j)^p} ≤ or ≥ Σ_{j=1}^n E(X_j^p)

according as p ≤ 1 or p ≥ 1.

*14. If p > 1, we have

  |(1/n) Σ_{j=1}^n X_j|^p ≤ (1/n) Σ_{j=1}^n |X_j|^p;

we have also

  E{|(1/n) Σ_{j=1}^n X_j|^p} ≤ (1/n) Σ_{j=1}^n E(|X_j|^p).

Compare the inequalities.

15. If p > 0, E(|X|^p) < ∞, then x^p P{|X| > x} = o(1) as x → ∞. Conversely, if x^p P{|X| > x} = o(1), then E(|X|^{p−ε}) < ∞ for 0 < ε < p.

*16. For any d.f. and any a ≥ 0, we have

  ∫_{−∞}^{∞} [F(x + a) − F(x)] dx = a.

17. If F is a d.f. such that F(0−) = 0, then

  ∫_0^∞ {1 − F(x)} dx = ∫_0^∞ x dF(x) ≤ +∞.

Thus if X is a positive r.v., then we have

  E(X) = ∫_0^∞ P{X > x} dx = ∫_0^∞ P{X ≥ x} dx.

18. Prove that ∫_{−∞}^{∞} |x| dF(x) < ∞ if and only if

  ∫_{−∞}^0 F(x) dx < ∞ and ∫_0^∞ [1 − F(x)] dx < ∞.

*19. If {X_n} is a sequence of identically distributed r.v.'s with finite mean, then

  lim_{n→∞} (1/n) E{max_{1≤j≤n} |X_j|} = 0.

[HINT: ∫_0^∞ P(X > x) dx = ∫_0^∞ P(X^{1/r} > v) r v^{r−1} dv; substitute and invert the order of the repeated integrations.]
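The inequality proved just above (with φ(u) = u², the Chebyshev case) can be verified exactly on a small discrete p.m.; the distribution below is invented for illustration:

```python
from fractions import Fraction as Fr

pmf = {-2: Fr(1, 8), 0: Fr(3, 4), 3: Fr(1, 8)}   # an invented discrete p.m.
assert sum(pmf.values()) == 1

E_phi = sum(x * x * p for x, p in pmf.items())    # E(phi(X)), phi(u) = u^2

for u in (1, 2, 3):
    tail = sum(p for x, p in pmf.items() if abs(x) >= u)
    assert tail <= E_phi / (u * u)                # P(|X| >= u) <= E(X^2)/u^2
```

Note how crude the bound can be: at u = 1 the true tail is 1/4 while the bound is 13/8, yet at u = 3 the two are close (1/8 against 13/72).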

3.3 Independence

We shall now introduce a fundamental new concept peculiar to the theory of probability, that of "(stochastic) independence".

DEFINITION OF INDEPENDENCE. The r.v.'s {X_j, 1 ≤ j ≤ n} are said to be (totally) independent iff for any linear Borel sets {B_j, 1 ≤ j ≤ n} we have

(1)  P(⋂_{j=1}^n {X_j ∈ B_j}) = Π_{j=1}^n P(X_j ∈ B_j).

The r.v.'s of an infinite family are said to be independent iff those in every finite subfamily are. They are said to be pairwise independent iff every two of them are independent.

Note that (1) implies that the r.v.'s in every subset of {X_j, 1 ≤ j ≤ n} are also independent, since we may take some of the B_j's as R¹. On the other hand, (1) is implied by the apparently weaker hypothesis: for every set of real numbers {x_j, 1 ≤ j ≤ n},

(2)  P(⋂_{j=1}^n {X_j ≤ x_j}) = Π_{j=1}^n P(X_j ≤ x_j).
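The distinction between pairwise and total independence is real; the classical Bernstein-type example below (two fair coin tosses, an illustration not taken from the text) exhibits three pairwise independent events whose triple intersection violates the product rule:

```python
from fractions import Fraction as Fr
from itertools import product

omega = list(product('HT', repeat=2))     # 4 equally likely outcomes

def P(event):
    return Fr(sum(1 for w in omega if event(w)), len(omega))

E1 = lambda w: w[0] == 'H'                # first toss heads
E2 = lambda w: w[1] == 'H'                # second toss heads
E3 = lambda w: w[0] == w[1]               # the two tosses agree

# every pair satisfies the product rule ...
assert P(lambda w: E1(w) and E2(w)) == P(E1) * P(E2)
assert P(lambda w: E1(w) and E3(w)) == P(E1) * P(E3)
assert P(lambda w: E2(w) and E3(w)) == P(E2) * P(E3)
# ... but the triple does not: 1/4 on the left, 1/8 on the right.
assert P(lambda w: E1(w) and E2(w) and E3(w)) != P(E1) * P(E2) * P(E3)
```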

From now on when the probability space is fixed, a set in ℱ will also be called an event. The events {E_j, 1 ≤ j ≤ n} are said to be independent iff their indicators are independent; this is equivalent to: for any subset {j₁, …, j_ℓ} of {1, …, n}, we have

(4)  P(⋂_{k=1}^ℓ E_{j_k}) = Π_{k=1}^ℓ P(E_{j_k}).

Theorem 3.3.1. If {X_j, 1 ≤ j ≤ n} are independent r.v.'s and {f_j, 1 ≤ j ≤ n} are Borel measurable functions, then {f_j(X_j), 1 ≤ j ≤ n} are independent r.v.'s.

PROOF. We give two proofs in detail of this important result to illustrate the methods. Cf. the two proofs of (14) in Sec. 3.2, one indicated there, and one in Exercise 9 of Sec. 3.2.

First proof. Suppose first that the two r.v.'s X and Y are both discrete belonging respectively to the weighted partitions {Λ_j; c_j} and {M_k; d_k} such that Λ_j = {X = c_j}, M_k = {Y = d_k}. Thus

  E(X) = Σ_j c_j P(Λ_j),  E(Y) = Σ_k d_k P(M_k).

Now we have

  Ω = ⋃_j Λ_j ∩ ⋃_k M_k = ⋃_{j,k} (Λ_j M_k)

and

  X(ω)Y(ω) = c_j d_k if ω ∈ Λ_j M_k.

Hence the r.v. XY is discrete and belongs to the superposed partition {Λ_j M_k; c_j d_k} with both the j and the k varying independently of each other. Since X and Y are independent, we have for every j and k:

  P(Λ_j M_k) = P(X = c_j; Y = d_k) = P(X = c_j) P(Y = d_k) = P(Λ_j) P(M_k);

and consequently by definition (1) of Sec. 3.2:

  E(XY) = Σ_{j,k} c_j d_k P(Λ_j M_k) = (Σ_j c_j P(Λ_j)) (Σ_k d_k P(M_k)) = E(X) E(Y).

Thus (5) is true in this case.

Now let X and Y be arbitrary positive r.v.'s with finite expectations. Then, according to the discussion at the beginning of Sec. 3.2, there are discrete r.v.'s X_m and Y_m such that E(X_m) ↑ E(X) and E(Y_m) ↑ E(Y). Furthermore, for each m, X_m and Y_m are independent. Note that for the independence of discrete r.v.'s it is sufficient to verify the relation (1) when the B_j's are their possible values and "∈" is replaced by "=" (why?). Here we have

  P{X_m = n/2^m, Y_m = n′/2^m} = P{n/2^m ≤ X < (n+1)/2^m, n′/2^m ≤ Y < (n′+1)/2^m}

  = P{n/2^m ≤ X < (n+1)/2^m} P{n′/2^m ≤ Y < (n′+1)/2^m} = P{X_m = n/2^m} P{Y_m = n′/2^m}.

The independence of X_m and Y_m is also a consequence of Theorem 3.3.1, since X_m = [2^m X]/2^m, where [X] denotes the greatest integer in X. Finally, it is clear that X_m Y_m is increasing with m and

  0 ≤ X_m Y_m ↑ XY.
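For discrete independent X and Y, the rearrangement in the first proof amounts to factoring a double sum; an exact check with invented marginals:

```python
from fractions import Fraction as Fr
from itertools import product

px = {1: Fr(1, 2), 4: Fr(1, 2)}       # marginal p.m. of X (invented)
py = {-1: Fr(1, 3), 2: Fr(2, 3)}      # marginal p.m. of Y (invented)

EX = sum(x * p for x, p in px.items())
EY = sum(y * q for y, q in py.items())

# Independence gives P(X = x, Y = y) = P(X = x) P(Y = y), so
# E(XY) = sum_{j,k} c_j d_k P(A_j) P(M_k) factors into E(X) E(Y).
EXY = sum(x * y * p * q
          for (x, p), (y, q) in product(px.items(), py.items()))
assert EXY == EX * EY
```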

Corollary. If {X_j, 1 ≤ j ≤ n} are independent r.v.'s with finite expectations, then

(6)  E(Π_{j=1}^n X_j) = Π_{j=1}^n E(X_j).

This follows at once by induction from (5), provided we observe that the two r.v.'s

  Π_{j=1}^k X_j and Π_{j=k+1}^n X_j

are independent for each k, 1 ≤ k ≤ n − 1. A rigorous proof of this fact may be supplied by Theorem 3.3.2.

Do independent random variables exist? Here we can take the cue from the intuitive background of probability theory, which not only has given rise historically to this branch of mathematical discipline, but remains a source of inspiration, inculcating a way of thinking peculiar to the discipline. It may be said that no one could have learned the subject properly without acquiring some feeling for the intuitive content of the concept of stochastic independence, and through it, certain degrees of dependence. Briefly then: events are determined by the outcomes of random trials. If an unbiased coin is tossed and the two possible outcomes are recorded as 0 and 1, this is an r.v., and it takes these two values with roughly the probabilities ½ each. Repeated tossing will produce a sequence of outcomes. If now a die is cast, the outcome may be similarly represented by an r.v. taking the six values 1 to 6; again this may be repeated to produce a sequence. Next we may draw a card from a pack or a ball from an urn, or take a measurement of a physical quantity sampled from a given population, or make an observation of some fortuitous natural phenomenon, the outcomes in the last two cases being r.v.'s taking some rational values in terms of certain units; and so on. Now it is very easy to conceive of undertaking these various trials under conditions such that their respective outcomes do not appreciably affect each other; indeed it would take more imagination to conceive the opposite!
In this circumstance, idealized, the trials are carried out "independently of one another" and the corresponding r.v.'s are "independent" according to definition. We have thus "constructed" sets of independent r.v.'s in varied contexts and with various distributions (although they are always discrete on realistic grounds), and the whole process can be continued indefinitely.

Can such a construction be made rigorous? We begin by an easy special case.

Example 1. Let n ≥ 2 and (Ω_j, ℱ_j, P_j) be n discrete probability spaces. We define the product space

  Ωⁿ = Ω₁ × ⋯ × Ω_n (n factors)

to be the space of all ordered n-tuples ωⁿ = (ω₁, …, ω_n), where each ω_j ∈ Ω_j. The product B.F. ℱⁿ is simply the collection of all subsets of Ωⁿ, just as ℱ_j is that for Ω_j. Recall that (Example 1 of Sec. 2.2) the p.m. P_j is determined by its value for each point of Ω_j. Since Ωⁿ is also a countable set, we may define a p.m. Pⁿ on ℱⁿ by the following assignment:

(7)  Pⁿ({ωⁿ}) = Π_{j=1}^n P_j({ω_j}),

namely, to the n-tuple (ω₁, …, ω_n) the probability to be assigned is the product of the probabilities originally assigned to each component ω_j by P_j. This p.m. will be called the product measure derived from the p.m.'s {P_j, 1 ≤ j ≤ n} and denoted by ⨉_{j=1}^n P_j. It is trivial to verify that this is indeed a p.m. Furthermore, it has the following product property, extending its definition (7): if S_j ∈ ℱ_j, 1 ≤ j ≤ n, then

(8)  Pⁿ(⨉_{j=1}^n S_j) = Π_{j=1}^n P_j(S_j).

To see this, we observe that the left side is, by definition, equal to

  Σ_{ω₁∈S₁} ⋯ Σ_{ω_n∈S_n} Pⁿ({ω₁, …, ω_n}) = Σ_{ω₁∈S₁} ⋯ Σ_{ω_n∈S_n} Π_{j=1}^n P_j({ω_j}) = Π_{j=1}^n Σ_{ω_j∈S_j} P_j({ω_j}) = Π_{j=1}^n P_j(S_j),

the second equation being a matter of simple algebra.

Now let X_j be an r.v. (namely an arbitrary function) on Ω_j; B_j be an arbitrary Borel set; and S_j = X_j⁻¹(B_j), namely:

  S_j = {ω_j ∈ Ω_j : X_j(ω_j) ∈ B_j},

so that S_j ∈ ℱ_j. We have then by (8):

(9)  Pⁿ(⨉_{j=1}^n {X_j ∈ B_j}) = Pⁿ(⨉_{j=1}^n S_j) = Π_{j=1}^n P_j(S_j) = Π_{j=1}^n P_j{X_j ∈ B_j}.

To each function X_j on Ω_j let correspond the function X̃_j on Ωⁿ defined below, in which ω = (ω₁, …, ω_n) and each "coordinate" ω_j is regarded as a function of the point ω:

  ∀ω ∈ Ωⁿ : X̃_j(ω) = X_j(ω_j).
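The assignment (7) and the product property (8) can be exercised directly on two small discrete spaces; a coin and a die make a concrete (invented) instance:

```python
from fractions import Fraction as Fr
from itertools import product

P1 = {'H': Fr(1, 2), 'T': Fr(1, 2)}          # a fair coin
P2 = {k: Fr(1, 6) for k in range(1, 7)}      # a fair die

# (7): the product measure assigns each pair the product of the point masses.
P_prod = {(w1, w2): P1[w1] * P2[w2] for w1, w2 in product(P1, P2)}
assert sum(P_prod.values()) == 1             # a genuine p.m. on the product

# (8): P^2(S1 x S2) = P1(S1) P2(S2).
S1, S2 = {'H'}, {2, 4, 6}
lhs = sum(p for (w1, w2), p in P_prod.items() if w1 in S1 and w2 in S2)
rhs = sum(P1[w] for w in S1) * sum(P2[w] for w in S2)
assert lhs == rhs == Fr(1, 4)
```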

3.3 INDEPENDENCE

Then we have
\[
\bigcap_{j=1}^n \{\omega : \tilde{X}_j(\omega) \in B_j\} = \bigtimes_{j=1}^n \{\omega_j : X_j(\omega_j) \in B_j\},
\]
since
\[
\{\omega : \tilde{X}_j(\omega) \in B_j\} = \Omega_1 \times \cdots \times \Omega_{j-1} \times \{\omega_j : X_j(\omega_j) \in B_j\} \times \Omega_{j+1} \times \cdots \times \Omega_n.
\]
It follows from (9) that
\[
\mathcal{P}^n\Bigl(\bigcap_{j=1}^n \{\tilde{X}_j \in B_j\}\Bigr) = \prod_{j=1}^n \mathcal{P}^n\{\tilde{X}_j \in B_j\}.
\]
Therefore the r.v.'s $\{\tilde{X}_j, 1 \le j \le n\}$ are independent.

Example 2. Let $\mathcal{U}^n$ be the $n$-dimensional cube (immaterial whether it is closed or not):
\[
\mathcal{U}^n = \{(x_1, \ldots, x_n) : 0 \le x_j \le 1, \ 1 \le j \le n\}.
\]
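For finitely many discrete factor spaces, definition (7) and the product property (8) are directly computable. The following Python sketch is an illustration only; the two factor spaces and their probabilities are invented for the example.

```python
from itertools import product
from fractions import Fraction

def product_measure(*factors):
    """Product p.m. per (7): each factor is a dict {point: probability}."""
    joint = {}
    for combo in product(*(f.items() for f in factors)):
        points = tuple(w for w, _ in combo)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        joint[points] = prob
    return joint

# Hypothetical factors: a biased coin and a fair parity variable.
P1 = {"H": Fraction(3, 10), "T": Fraction(7, 10)}
P2 = {"even": Fraction(1, 2), "odd": Fraction(1, 2)}
P = product_measure(P1, P2)

# Product property (8) for S1 = {"H"}, S2 = {"even"}:
S1, S2 = {"H"}, {"even"}
lhs = sum(p for (w1, w2), p in P.items() if w1 in S1 and w2 in S2)
rhs = sum(P1[w] for w in S1) * sum(P2[w] for w in S2)
```

The coordinate functions on the joint space are then independent exactly because each joint probability factors as in (7).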

Example 4. Can we construct r.v.'s on the probability space $(\mathcal{U}, \mathcal{B}, m)$ itself, without going to a product space? Indeed we can, but only by imbedding a product structure in $\mathcal{U}$. The simplest case will now be described and we shall return to it in the next chapter.

For each real number $x$ in $(0, 1]$, consider its binary digital expansion
\[
x = .\epsilon_1 \epsilon_2 \cdots \epsilon_n \cdots = \sum_{n=1}^{\infty} \frac{\epsilon_n}{2^n}, \qquad \text{each } \epsilon_n = 0 \text{ or } 1.
\]
This expansion is unique except when $x$ is of the form $m/2^n$; the set of such $x$ is countable and so of probability zero, hence whatever we decide to do with them will be immaterial for our purposes. For the sake of definiteness, let us agree that only expansions with infinitely many digits "1" are used. Now each digit $\epsilon_j$ of $x$ is a function of $x$ taking the values 0 and 1 on two Borel sets. Hence they are r.v.'s. Let $\{c_j, j \ge 1\}$ be a given sequence of 0's and 1's. Then the set
\[
\{x : \epsilon_j(x) = c_j, 1 \le j \le n\} = \bigcap_{j=1}^n \{x : \epsilon_j(x) = c_j\}
\]
is the set of numbers $x$ whose first $n$ digits are the given $c_j$'s, thus
\[
x = .c_1 c_2 \cdots c_n \epsilon_{n+1} \epsilon_{n+2} \cdots
\]
with the digits from the $(n+1)$st on completely arbitrary. It is clear that this set is just an interval of length $1/2^n$, hence of probability $1/2^n$. On the other hand, for each $j$, the set $\{x : \epsilon_j(x) = c_j\}$ has probability $\frac{1}{2}$ for a similar reason. We have therefore
\[
\mathcal{P}\{\epsilon_j = c_j, 1 \le j \le n\} = \frac{1}{2^n} = \prod_{j=1}^n \frac{1}{2} = \prod_{j=1}^n \mathcal{P}\{\epsilon_j = c_j\}.
\]
This being true for every choice of the $c_j$'s, the r.v.'s $\{\epsilon_j, j \ge 1\}$ are independent. Let $\{f_j, j \ge 1\}$ be arbitrary functions with domain the two points $\{0, 1\}$; then $\{f_j(\epsilon_j), j \ge 1\}$ are also independent r.v.'s.

This example seems extremely special, but easy extensions are at hand (see Exercises 13, 14, and 15 below).

We are now ready to state and prove the fundamental existence theorem of product measures.

Theorem 3.3.4. Let a finite or infinite sequence of p.m.'s $\{\mu_j\}$ on $(\mathcal{R}^1, \mathcal{B}^1)$, or equivalently their d.f.'s, be given. There exists a probability space $(\Omega, \mathcal{F}, \mathcal{P})$ and a sequence of independent r.v.'s $\{X_j\}$ defined on it such that for each $j$, $\mu_j$ is the p.m. of $X_j$.

PROOF. Without loss of generality we may suppose that the given sequence is infinite. (Why?) For each $n$, let $(\Omega_n, \mathcal{F}_n, \mathcal{P}_n)$ be a probability space

in which there exists an r.v. $X_n$ with $\mu_n$ as its p.m. Indeed this is possible if we take $(\Omega_n, \mathcal{F}_n, \mathcal{P}_n)$ to be $(\mathcal{R}^1, \mathcal{B}^1, \mu_n)$ and $X_n$ to be the identical function of the sample point $x$ in $\mathcal{R}^1$, now to be written as $\omega_n$ (cf. Exercise 3 of Sec. 3.1).

Now define the infinite product space
\[
\Omega = \bigtimes_{n=1}^{\infty} \Omega_n
\]
on the collection of all "points" $\omega = \{\omega_1, \omega_2, \ldots, \omega_n, \ldots\}$, where for each $n$, $\omega_n$ is a point of $\Omega_n$. A subset $E$ of $\Omega$ will be called a "finite-product set" iff it is of the form
\[
(11)\qquad E = \bigtimes_{n=1}^{\infty} F_n,
\]
where each $F_n \in \mathcal{F}_n$ and all but a finite number of these $F_n$'s are equal to the corresponding $\Omega_n$'s. Thus $\omega \in E$ if and only if $\omega_n \in F_n$, $n \ge 1$, but this is actually a restriction only for a finite number of values of $n$. Let the collection of subsets of $\Omega$, each of which is the union of a finite number of disjoint finite-product sets, be $\mathcal{F}_0$. It is easy to see that the collection $\mathcal{F}_0$ is closed with respect to complementation and pairwise intersection, hence it is a field. We shall take the $\mathcal{F}$ in the theorem to be the B.F. generated by $\mathcal{F}_0$. This $\mathcal{F}$ is called the product B.F. of the sequence $\{\mathcal{F}_n, n \ge 1\}$ and denoted by $\bigtimes_{n=1}^{\infty} \mathcal{F}_n$.

We define a set function $\mathcal{P}$ on $\mathcal{F}_0$ as follows. First, for each finite-product set such as the $E$ given in (11) we set
\[
(12)\qquad \mathcal{P}(E) = \prod_{n=1}^{\infty} \mathcal{P}_n(F_n),
\]
where all but a finite number of the factors on the right side are equal to one. Next, if $E \in \mathcal{F}_0$ and
\[
E = \bigcup_{k=1}^{n} E^{(k)},
\]
where the $E^{(k)}$'s are disjoint finite-product sets, we put
\[
(13)\qquad \mathcal{P}(E) = \sum_{k=1}^{n} \mathcal{P}(E^{(k)}).
\]
If a given set $E$ in $\mathcal{F}_0$ has two representations of the form above, then it is not difficult to see, though it is tedious to explain (try drawing some pictures!), that the two definitions of $\mathcal{P}(E)$ agree. Hence the set function $\mathcal{P}$ is uniquely defined on $\mathcal{F}_0$; it is clearly positive with $\mathcal{P}(\Omega) = 1$, and it is finitely additive on $\mathcal{F}_0$ by definition. In order to verify countable additivity it is sufficient to verify the axiom of continuity, by the remark after Theorem 2.2.1. Proceeding by contraposition, suppose that there exist a $\delta > 0$ and a sequence of sets $\{C_n, n \ge 1\}$ in $\mathcal{F}_0$ such that for each $n$, we have $C_n \supset C_{n+1}$ and $\mathcal{P}(C_n) \ge \delta > 0$; we shall prove that $\bigcap_{n=1}^{\infty} C_n \ne \emptyset$. Note that each set $E$ in $\mathcal{F}_0$, as well as each finite-product set, is "determined" by a finite number of coordinates in the sense that there exists a finite integer $k$, depending only on $E$, such that if $\omega = (\omega_1, \omega_2, \ldots)$ and $\omega' = (\omega_1', \omega_2', \ldots)$ are two points of $\Omega$ with $\omega_j = \omega_j'$ for $1 \le j \le k$, then either both $\omega$ and $\omega'$ belong to $E$ or neither of them does. To simplify the notation and with no real loss of generality (why?) we may suppose that for each $n$, the set $C_n$ is determined by the first $n$ coordinates.

Given $\omega_1^0$, for any subset $E$ of $\Omega$ let us write $(E \mid \omega_1^0)$ for $\Omega_1 \times E_1$, where $E_1$ is the set of points $(\omega_2, \omega_3, \ldots)$ in $\bigtimes_{n=2}^{\infty} \Omega_n$ such that $(\omega_1^0, \omega_2, \omega_3, \ldots) \in E$. If $\omega_1^0$ does not appear as first coordinate of any point in $E$, then $(E \mid \omega_1^0) = \emptyset$. Note that if $E \in \mathcal{F}_0$, then $(E \mid \omega_1^0) \in \mathcal{F}_0$ for each $\omega_1^0$. We claim that there exists an $\omega_1^0$ such that for every $n$, we have $\mathcal{P}((C_n \mid \omega_1^0)) \ge \delta/2$. To see this we begin with the equation
\[
(14)\qquad \mathcal{P}(C_n) = \int_{\Omega_1} \mathcal{P}((C_n \mid \omega_1)) \, \mathcal{P}_1(d\omega_1).
\]
This is trivial if $C_n$ is a finite-product set, by definition (12), and follows by addition for any $C_n$ in $\mathcal{F}_0$. Now put $B_n = \{\omega_1 : \mathcal{P}((C_n \mid \omega_1)) \ge \delta/2\}$, a subset of $\Omega_1$; then it follows from (14) that
\[
\delta \le \mathcal{P}(C_n) \le \mathcal{P}_1(B_n) + \frac{\delta}{2}\,\mathcal{P}_1(B_n^c) \le \mathcal{P}_1(B_n) + \frac{\delta}{2},
\]
so that $\mathcal{P}_1(B_n) \ge \delta/2$. Since $B_n$ decreases with $n$, we have $\mathcal{P}_1(\bigcap_{n=1}^{\infty} B_n) \ge \delta/2 > 0$. Choose any $\omega_1^0$ in $\bigcap_{n=1}^{\infty} B_n$. Then $\mathcal{P}((C_n \mid \omega_1^0)) \ge \delta/2$ for every $n$. Repeating the argument for the set $(C_n \mid \omega_1^0)$, we see that there exists an $\omega_2^0$ such that for every $n$, $\mathcal{P}((C_n \mid \omega_1^0, \omega_2^0)) \ge \delta/4$, where $(C_n \mid \omega_1^0, \omega_2^0) = ((C_n \mid \omega_1^0) \mid \omega_2^0)$ is of the form $\Omega_1 \times \Omega_2 \times E_3$ and $E_3$ is the set of points $(\omega_3, \omega_4, \ldots)$ in $\bigtimes_{n=3}^{\infty} \Omega_n$ such that $(\omega_1^0, \omega_2^0, \omega_3, \omega_4, \ldots) \in C_n$; and so forth by induction. Thus for each $k \ge 1$, there exists $\omega_k^0$ such that
\[
\forall n : \mathcal{P}((C_n \mid \omega_1^0, \ldots, \omega_k^0)) \ge \frac{\delta}{2^k}.
\]
Consider the point $\omega^0 = (\omega_1^0, \omega_2^0, \ldots, \omega_k^0, \ldots)$. Since $(C_k \mid \omega_1^0, \ldots, \omega_k^0) \ne \emptyset$, there is a point in $C_k$ whose first $k$ coordinates are the same as those of $\omega^0$; since $C_k$ is determined by the first $k$ coordinates, it follows that $\omega^0 \in C_k$. This is true for all $k$, hence $\omega^0 \in \bigcap_{k=1}^{\infty} C_k$, as was to be shown.
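Definition (12) has a simple computational mirror: a finite-product set is stored by its finitely many constrained coordinates, and every omitted factor contributes the value 1. A minimal sketch; the constrained coordinates and their probabilities below are made up for illustration.

```python
from fractions import Fraction

def cylinder_prob(constrained):
    """P(E) per (12) for a finite-product set E = X F_n, given as a dict
    {n: P_n(F_n)} for the finitely many n with F_n != Omega_n; every
    omitted factor has F_n = Omega_n and contributes a factor of 1."""
    p = Fraction(1)
    for q in constrained.values():
        p *= Fraction(q)
    return p

# Hypothetical example: coordinates 1, 3, 7 each constrained to a set of
# P_n-measure 1/2; all other coordinates are unconstrained.
E = {1: Fraction(1, 2), 3: Fraction(1, 2), 7: Fraction(1, 2)}
```

The empty constraint dict represents $E = \Omega$, whose probability is the empty product, namely 1.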

We have thus proved that $\mathcal{P}$ as defined on $\mathcal{F}_0$ is a p.m. The next theorem, which is a generalization of the extension theorem discussed in Theorem 2.2.2 and whose proof can be found in Halmos [4], ensures that it can be extended to $\mathcal{F}$ as desired. This extension is called the product measure of the sequence $\{\mathcal{P}_n, n \ge 1\}$ and denoted by $\bigtimes_{n=1}^{\infty} \mathcal{P}_n$, with domain the product field $\bigtimes_{n=1}^{\infty} \mathcal{F}_n$.

Theorem 3.3.5. Let $\mathcal{F}_0$ be a field of subsets of an abstract space $\Omega$, and $\mathcal{P}$ a p.m. on $\mathcal{F}_0$. There exists a unique p.m. on the B.F. generated by $\mathcal{F}_0$ that agrees with $\mathcal{P}$ on $\mathcal{F}_0$.

The uniqueness is proved in Theorem 2.2.3.

Returning to Theorem 3.3.4, it remains to show that the r.v.'s $\{\omega_j\}$ are independent. For each $k \ge 2$, the independence of $\{\omega_j, 1 \le j \le k\}$ is a consequence of the definition in (12); this being so for every $k$, the independence of the infinite sequence follows by definition.

We now give a statement of Fubini's theorem for product measures, which has already been used above in special cases. It is sufficient to consider the case $n = 2$. Let $\Omega = \Omega_1 \times \Omega_2$, $\mathcal{F} = \mathcal{F}_1 \times \mathcal{F}_2$ and $\mathcal{P} = \mathcal{P}_1 \times \mathcal{P}_2$ be the product space, B.F., and measure respectively.

Let $(\overline{\mathcal{F}_1}, \overline{\mathcal{P}_1})$, $(\overline{\mathcal{F}_2}, \overline{\mathcal{P}_2})$ and $(\overline{\mathcal{F}_1 \times \mathcal{F}_2}, \overline{\mathcal{P}_1 \times \mathcal{P}_2})$ be the completions of $(\mathcal{F}_1, \mathcal{P}_1)$, $(\mathcal{F}_2, \mathcal{P}_2)$ and $(\mathcal{F}_1 \times \mathcal{F}_2, \mathcal{P}_1 \times \mathcal{P}_2)$, respectively, according to Theorem 2.2.5.

Fubini's theorem. Suppose that $f$ is measurable with respect to $\overline{\mathcal{F}_1 \times \mathcal{F}_2}$ and integrable with respect to $\overline{\mathcal{P}_1 \times \mathcal{P}_2}$. Then

(i) for each $\omega_1 \in \Omega_1 \setminus N_1$ where $N_1 \in \overline{\mathcal{F}_1}$ and $\overline{\mathcal{P}_1}(N_1) = 0$, $f(\omega_1, \cdot)$ is measurable with respect to $\overline{\mathcal{F}_2}$ and integrable with respect to $\overline{\mathcal{P}_2}$;

(ii) the integral
\[
\int_{\Omega_2} f(\cdot, \omega_2) \, \overline{\mathcal{P}_2}(d\omega_2)
\]
is measurable with respect to $\overline{\mathcal{F}_1}$ and integrable with respect to $\overline{\mathcal{P}_1}$;

(iii) the following equation between a double and a repeated integral holds:
\[
(15)\qquad \iint_{\Omega_1 \times \Omega_2} f(\omega_1, \omega_2) \, \overline{\mathcal{P}_1 \times \mathcal{P}_2}(d\omega) = \int_{\Omega_1} \left[ \int_{\Omega_2} f(\omega_1, \omega_2) \, \overline{\mathcal{P}_2}(d\omega_2) \right] \overline{\mathcal{P}_1}(d\omega_1).
\]
Furthermore, suppose $f$ is positive and measurable with respect to $\overline{\mathcal{F}_1 \times \mathcal{F}_2}$; then if either member of (15) exists, finite or infinite, so does the other also, and the equation holds.

Finally, if we remove all the completion signs "$\,\bar{}\,$" in the hypotheses as well as the conclusions above, the resulting statement is true with the exceptional set $N_1$ in (i) empty, provided the word "integrable" in (i) be replaced by "has an integral."

The theorem remains true if the p.m.'s are replaced by $\sigma$-finite measures; see, e.g., Royden [5].

The reader should readily recognize the particular cases of the theorem with one or both of the factor measures discrete, so that the corresponding integral reduces to a sum. Thus it includes the familiar theorem on evaluating a double series by repeated summation.

We close this section by stating a generalization of Theorem 3.3.4 to the case where the finite-dimensional joint distributions of the sequence $\{X_j\}$ are arbitrarily given, subject only to mutual consistency. This result is a particular case of Kolmogorov's extension theorem, which is valid for an arbitrary family of r.v.'s.

Let $m$ and $n$ be integers: $1 \le m < n$, and define $\pi_{mn}$ to be the "projection map" of $\mathcal{B}^m$ into $\mathcal{B}^n$ given by
\[
\forall B \in \mathcal{B}^m : \pi_{mn}(B) = \{(x_1, \ldots, x_n) : (x_1, \ldots, x_m) \in B\}.
\]

Theorem 3.3.6. For each $n \ge 1$, let $\mu^n$ be a p.m. on $(\mathcal{R}^n, \mathcal{B}^n)$ such that
\[
(16)\qquad \forall m < n : \mu^n \circ \pi_{mn} = \mu^m.
\]
Then there exists a probability space $(\Omega, \mathcal{F}, \mathcal{P})$ and a sequence of r.v.'s $\{X_j\}$ on it such that for each $n$, $\mu^n$ is the $n$-dimensional p.m. of the vector $(X_1, \ldots, X_n)$.

Indeed, the $\Omega$ and $\mathcal{F}$ may be taken exactly as in Theorem 3.3.4 to be the product space $\bigtimes_j \Omega_j$ and the product B.F. $\bigtimes_j \mathcal{F}_j$, where $(\Omega_j, \mathcal{F}_j) = (\mathcal{R}^1, \mathcal{B}^1)$ for each $j$; only $\mathcal{P}$ is now more general. In terms of d.f.'s, the consistency condition (16) may be stated as follows. For each $m \ge 1$ and $(x_1, \ldots, x_m) \in \mathcal{R}^m$, we have if $n > m$:
\[
\lim_{x_{m+1} \to \infty, \ldots, x_n \to \infty} F_n(x_1, \ldots, x_m, x_{m+1}, \ldots, x_n) = F_m(x_1, \ldots, x_m).
\]
For a proof of this first fundamental theorem in the theory of stochastic processes, see Kolmogorov [8].
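The discrete case mentioned above, where the double "integral" becomes a double series evaluated by repeated summation, can be checked on a concrete summable array; the particular array $f(i, j) = 2^{-i} 3^{-j}$ is chosen here only for illustration.

```python
N = 30  # truncation level; the tails beyond N are smaller than 2**-29

# Double sum over the product index set {1,...,N-1} x {1,...,N-1}.
double_sum = sum(2.0 ** -i * 3.0 ** -j
                 for i in range(1, N) for j in range(1, N))

# The two repeated sums, in either order, as in (15).
repeated_ij = sum(sum(2.0 ** -i * 3.0 ** -j for j in range(1, N))
                  for i in range(1, N))
repeated_ji = sum(sum(2.0 ** -i * 3.0 ** -j for i in range(1, N))
                  for j in range(1, N))

# Analytically: (sum_i 2^-i)(sum_j 3^-j) = 1 * (1/2) = 1/2.
```

All three quantities agree to machine precision, and each is within truncation error of the exact value $\tfrac{1}{2}$.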

EXERCISES

1. Show that two r.v.'s on $(\Omega, \mathcal{F})$ may be independent according to one p.m. $\mathcal{P}$ but not according to another!

*2. If $X_1$ and $X_2$ are independent r.v.'s each assuming the values $+1$ and $-1$ with probability $\frac{1}{2}$, then the three r.v.'s $\{X_1, X_2, X_1X_2\}$ are pairwise independent but not totally independent. Find an example of $n$ r.v.'s such that every $n - 1$ of them are independent but not all of them. Find an example where the events $A_1$ and $A_2$ are independent, $A_1$ and $A_3$ are independent, but $A_1$ and $A_2 \cup A_3$ are not independent.

*3. If the events $\{E_\alpha, \alpha \in A\}$ are independent, then so are the events $\{F_\alpha, \alpha \in A\}$, where each $F_\alpha$ may be $E_\alpha$ or $E_\alpha^c$; also if $\{A_\beta, \beta \in B\}$, where $B$ is an arbitrary index set, is a collection of disjoint countable subsets of $A$, then the events
\[
\bigcup_{\alpha \in A_\beta} E_\alpha, \quad \beta \in B,
\]
are independent.

4. Fields or B.F.'s $\mathcal{F}_\alpha$ ($\subset \mathcal{F}$) of any family are said to be independent iff any collection of events, one from each $\mathcal{F}_\alpha$, forms a set of independent events. Let $\mathcal{F}_\alpha^0$ be a field generating $\mathcal{F}_\alpha$. Prove that if the fields $\mathcal{F}_\alpha^0$ are independent, then so are the B.F.'s $\mathcal{F}_\alpha$. Is the same conclusion true if the fields are replaced by arbitrary generating sets? Prove, however, that the conditions (1) and (2) are equivalent. [HINT: Use Theorem 2.1.2.]

5. If $\{X_\alpha\}$ is a family of independent r.v.'s, then the B.F.'s generated by disjoint subfamilies are independent. [Theorem 3.3.2 is a corollary to this proposition.]

6. The r.v. $X$ is independent of itself if and only if it is constant with probability one. Can $X$ and $f(X)$ be independent where $f \in \mathcal{B}^1$?

7. If $\{E_j, 1 \le j < \infty\}$ are independent events, then
\[
\mathcal{P}\Bigl(\bigcap_{j=1}^{\infty} E_j\Bigr) = \prod_{j=1}^{\infty} \mathcal{P}(E_j),
\]
where the infinite product is defined to be the obvious limit; similarly
\[
\mathcal{P}\Bigl(\bigcup_{j=1}^{\infty} E_j\Bigr) = 1 - \prod_{j=1}^{\infty} (1 - \mathcal{P}(E_j)).
\]

8. Let $\{X_j, 1 \le j \le n\}$ be independent with d.f.'s $\{F_j, 1 \le j \le n\}$. Find the d.f. of $\max_j X_j$ and $\min_j X_j$.

*9. If $X$ and $Y$ are independent and $E(X)$ exists, then for any Borel set $B$, we have
\[
\int_{\{Y \in B\}} X \, d\mathcal{P} = E(X)\,\mathcal{P}(Y \in B).
\]

*10. If $X$ and $Y$ are independent and for some $p > 0$: $E(|X + Y|^p) < \infty$, then $E(|X|^p) < \infty$ and $E(|Y|^p) < \infty$.

11. If $X$ and $Y$ are independent, $E(|X|^p) < \infty$ for some $p \ge 1$, and $E(Y) = 0$, then $E(|X + Y|^p) \ge E(|X|^p)$. [This is a case of Theorem 9.3.2; but try a direct proof!]

12. The r.v.'s $\{\epsilon_j\}$ in Example 4 are related to the "Rademacher functions":
\[
r_j(x) = \operatorname{sgn}(\sin 2^j \pi x).
\]
What is the precise relation?

13. Generalize Example 4 by considering any $s$-ary expansion where $s \ge 2$ is an integer:
\[
x = \sum_{n=1}^{\infty} \frac{\epsilon_n}{s^n}, \quad \text{where } \epsilon_n = 0, 1, \ldots, s - 1.
\]

*14. Modify Example 4 so that to each $x$ in $[0, 1]$ there corresponds a sequence of independent and identically distributed r.v.'s $\{\epsilon_n, n \ge 1\}$, each taking the values 1 and 0 with probabilities $p$ and $1 - p$, respectively, where $0 < p < 1$. Such a sequence will be referred to as a "coin-tossing game (with probability $p$ for heads)"; when $p = \frac{1}{2}$ it is said to be "fair".

15. Modify Example 4 further so as to allow $\epsilon_n$ to take the values 1 and 0 with probabilities $p_n$ and $1 - p_n$, where $0 < p_n < 1$, but $p_n$ depends on $n$.

16. Generalize (14) to
\[
\mathcal{P}(C) = \int_{C(1,\ldots,k)} \mathcal{P}((C \mid \omega_1, \ldots, \omega_k)) \, \mathcal{P}_1(d\omega_1) \cdots \mathcal{P}_k(d\omega_k) = \int_{\Omega} \mathcal{P}((C \mid \omega_1, \ldots, \omega_k)) \, \mathcal{P}(d\omega),
\]
where $C(1, \ldots, k)$ is the set of $(\omega_1, \ldots, \omega_k)$ which appears as the first $k$ coordinates of at least one point in $C$, and in the second equation $\omega_1, \ldots, \omega_k$ are regarded as functions of $\omega$.

*17. For arbitrary events $\{E_j, 1 \le j \le n\}$, we have
\[
\mathcal{P}\Bigl(\bigcup_{j=1}^{n} E_j\Bigr) \ge \sum_{j=1}^{n} \mathcal{P}(E_j) - \sum_{1 \le j < k \le n} \mathcal{P}(E_j E_k).
\]

If for every $n$, $\{E_j^{(n)}, 1 \le j \le n\}$ are independent events, and
\[
\mathcal{P}\Bigl(\bigcup_{j=1}^{n} E_j^{(n)}\Bigr) \to 0 \quad \text{as } n \to \infty,
\]
then
\[
\mathcal{P}\Bigl(\bigcup_{j=1}^{n} E_j^{(n)}\Bigr) \sim \sum_{j=1}^{n} \mathcal{P}(E_j^{(n)}).
\]

18. Prove that $\mathcal{B} \times \mathcal{B} \ne \overline{\mathcal{B}} \times \overline{\mathcal{B}}$, where $\overline{\mathcal{B}}$ is the completion of $\mathcal{B}$ with respect to the Lebesgue measure; similarly $\overline{\mathcal{B}} \times \overline{\mathcal{B}} \ne \overline{\mathcal{B} \times \mathcal{B}}$.

19. If $f \in \overline{\mathcal{F}_1 \times \mathcal{F}_2}$ and
\[
\iint_{\Omega_1 \times \Omega_2} |f| \, d(\mathcal{P}_1 \times \mathcal{P}_2) < \infty,
\]
then
\[
\int_{\Omega_1} \left[ \int_{\Omega_2} f \, d\mathcal{P}_2 \right] d\mathcal{P}_1 = \int_{\Omega_2} \left[ \int_{\Omega_1} f \, d\mathcal{P}_1 \right] d\mathcal{P}_2.
\]

20. A typical application of Fubini's theorem is as follows. If $f$ is a Lebesgue measurable function of $(x, y)$ such that $f(x, y) = 0$ for each $x \in \mathcal{R}^1$ and $y \notin N_x$, where $m(N_x) = 0$ for each $x$, then we have also $f(x, y) = 0$ for each $y \notin N$ and $x \notin N_y$, where $m(N) = 0$ and $m(N_y) = 0$ for each $y \notin N$.

4
Convergence concepts

4.1 Various modes of convergence

As numerical-valued functions, the convergence of a sequence of r.v.'s $\{X_n, n \ge 1\}$, to be denoted simply by $\{X_n\}$ below, is a well-defined concept. Here and hereafter the term "convergence" will be used to mean convergence to a finite limit. Thus it makes sense to say: for every $\omega \in \Delta$, where $\Delta \in \mathcal{F}$, the sequence $\{X_n(\omega)\}$ converges. The limit is then a finite-valued r.v. (see Theorem 3.1.6), say $X(\omega)$, defined on $\Delta$. If $\Omega = \Delta$, then we have "convergence everywhere", but a more useful concept is the following one.

DEFINITION OF CONVERGENCE "ALMOST EVERYWHERE" (a.e.). The sequence of r.v.'s $\{X_n\}$ is said to converge almost everywhere [to the r.v. $X$] iff there exists a null set $N$ such that
\[
(1)\qquad \forall \omega \in \Omega \setminus N : \lim_{n \to \infty} X_n(\omega) = X(\omega) \text{ finite}.
\]
Recall that our convention stated at the beginning of Sec. 3.1 allows each r.v. a null set on which it may be $\pm\infty$. The union of all these sets being still a null set, it may be included in the set $N$ in (1) without modifying the

conclusion. This type of trivial consideration makes it possible, when dealing with a countable set of r.v.'s that are finite a.e., to regard them as finite everywhere.

The reader should learn at an early stage to reason with a single sample point $\omega_0$ and the corresponding sample sequence $\{X_n(\omega_0), n \ge 1\}$ as a numerical sequence, and translate the inferences on the latter into probability statements about sets of $\omega$. The following characterization of convergence a.e. is a good illustration of this method.

Theorem 4.1.1. The sequence $\{X_n\}$ converges a.e. to $X$ if and only if for every $\epsilon > 0$ we have
\[
(2)\qquad \lim_{m \to \infty} \mathcal{P}\{|X_n - X| \le \epsilon \text{ for all } n \ge m\} = 1;
\]
or equivalently
\[
(2')\qquad \lim_{m \to \infty} \mathcal{P}\{|X_n - X| > \epsilon \text{ for some } n \ge m\} = 0.
\]

PROOF. Suppose there is convergence a.e. and let $\Omega_0 = \Omega \setminus N$ where $N$ is as in (1). For $m \ge 1$ let us denote by $A_m(\epsilon)$ the event exhibited in (2), namely:
\[
(3)\qquad A_m(\epsilon) = \bigcap_{n=m}^{\infty} \{|X_n - X| \le \epsilon\}.
\]
Then $A_m(\epsilon)$ is increasing with $m$. For each $\omega_0$, the convergence of $\{X_n(\omega_0)\}$ to $X(\omega_0)$ implies that given any $\epsilon > 0$, there exists $m(\omega_0, \epsilon)$ such that
\[
(4)\qquad n \ge m(\omega_0, \epsilon) \Rightarrow |X_n(\omega_0) - X(\omega_0)| \le \epsilon.
\]
Hence each such $\omega_0$ belongs to some $A_m(\epsilon)$ and so $\Omega_0 \subset \bigcup_{m=1}^{\infty} A_m(\epsilon)$. It follows from the monotone convergence property of a measure that $\lim_{m \to \infty} \mathcal{P}(A_m(\epsilon)) = 1$, which is equivalent to (2).

Conversely, suppose (2) holds; then we see as above that the set $A(\epsilon) = \bigcup_{m=1}^{\infty} A_m(\epsilon)$ has probability equal to one. For any $\omega_0 \in A(\epsilon)$, (4) is true for the given $\epsilon$. Let $\epsilon$ run through a sequence of values decreasing to zero, for instance $\{1/n\}$. Then the set
\[
A = \bigcap_{n=1}^{\infty} A\Bigl(\frac{1}{n}\Bigr)
\]
still has probability one since
\[
\mathcal{P}(A) = \lim_{n} \mathcal{P}\Bigl(A\Bigl(\frac{1}{n}\Bigr)\Bigr).
\]

If $\omega_0$ belongs to $A$, then (4) is true for all $\epsilon = 1/n$, hence for all $\epsilon > 0$ (why?). This means $\{X_n(\omega_0)\}$ converges to $X(\omega_0)$ for all $\omega_0$ in a set of probability one.

A weaker concept of convergence is of basic importance in probability theory.

DEFINITION OF CONVERGENCE "IN PROBABILITY" (in pr.). The sequence $\{X_n\}$ is said to converge in probability to $X$ iff for every $\epsilon > 0$ we have
\[
(5)\qquad \lim_{n \to \infty} \mathcal{P}\{|X_n - X| > \epsilon\} = 0.
\]
Strictly speaking, the definition applies when all $X_n$ and $X$ are finite-valued. But we may extend it to r.v.'s that are finite a.e. either by agreeing to ignore a null set or by the logical convention that a formula must first be defined in order to be valid or invalid. Thus, for example, if $X_n(\omega) = +\infty$ and $X(\omega) = +\infty$ for some $\omega$, then $X_n(\omega) - X(\omega)$ is not defined and therefore such an $\omega$ cannot belong to the set $\{|X_n - X| > \epsilon\}$ figuring in (5).

Since (2') clearly implies (5), we have the immediate consequence below.

Theorem 4.1.2. Convergence a.e. [to $X$] implies convergence in pr. [to $X$].

Sometimes we have to deal with questions of convergence when no limit is in evidence. For convergence a.e. this is immediately reducible to the numerical case where the Cauchy criterion is applicable. Specifically, $\{X_n\}$ converges a.e. iff there exists a null set $N$ such that for every $\omega \in \Omega \setminus N$ and every $\epsilon > 0$, there exists $m(\omega, \epsilon)$ such that
\[
n' > n \ge m(\omega, \epsilon) \Rightarrow |X_n(\omega) - X_{n'}(\omega)| \le \epsilon.
\]
The following analogue of Theorem 4.1.1 is left to the reader.

Theorem 4.1.3. The sequence $\{X_n\}$ converges a.e. if and only if for every $\epsilon > 0$ we have
\[
(6)\qquad \lim_{m \to \infty} \mathcal{P}\{|X_n - X_{n'}| > \epsilon \text{ for some } n' > n \ge m\} = 0.
\]
For convergence in pr., the obvious analogue of (6) is
\[
(7)\qquad \lim_{n' > n \to \infty} \mathcal{P}\{|X_n - X_{n'}| > \epsilon\} = 0.
\]
It can be shown (Exercise 6 of Sec. 4.2) that this implies the existence of a finite r.v. $X$ such that $X_n \to X$ in pr.
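The gap between convergence in pr. and convergence a.e. can be made concrete with the classical "sliding interval" sequence on $((0, 1], \mathcal{B}, m)$: block $k$ splits $(0, 1]$ into $k$ intervals of length $1/k$, and $X_n$ is the indicator of the $n$-th interval. This particular sequence is supplied here as an illustration, not taken from the text. The measure of the $n$-th interval tends to 0, so (5) holds with $X = 0$; yet every point lies in exactly one interval of every block, so $X_n(\omega) = 1$ infinitely often for each $\omega$.

```python
from fractions import Fraction

def interval(n):
    """Endpoints (a, b) of the n-th interval (0-indexed): block k
    consists of the k intervals ((j-1)/k, j/k], j = 1, ..., k."""
    k, j = 1, n
    while j >= k:
        j -= k
        k += 1
    return Fraction(j, k), Fraction(j + 1, k)

# P{|X_n| > eps} is the length of the n-th interval; it tends to 0.
lengths = [b - a for a, b in (interval(n) for n in range(105))]

def hits(x, N):
    """Number of indices n < N with X_n(x) = 1, i.e. x in the n-th interval."""
    return sum(1 for n in range(N) if interval(n)[0] < x <= interval(n)[1])
```

With $N = 105$ the first 14 blocks are complete, and any fixed point, e.g. $x = 1/3$, is hit exactly once per block; so no sample sequence converges to 0, although the probabilities do.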

DEFINITION OF CONVERGENCE IN $L^p$, $0 < p < \infty$. The sequence $\{X_n\}$ is said to converge in $L^p$ to $X$ iff $X_n \in L^p$, $X \in L^p$ and
\[
(8)\qquad \lim_{n \to \infty} E(|X_n - X|^p) = 0.
\]
In all these definitions above, $X_n$ converges to $X$ if and only if $X_n - X$ converges to 0. Hence there is no loss of generality if we put $X \equiv 0$ in the discussion, provided that any hypothesis involved can be similarly reduced to this case. We say that $X$ is dominated by $Y$ iff $|X| \le Y$ a.e., and that the sequence $\{X_n\}$ is dominated by $Y$ iff this is true for each $X_n$ with the same $Y$. We say that $X$ or $\{X_n\}$ is uniformly bounded iff the $Y$ above may be taken to be a constant.

Theorem 4.1.4. If $X_n$ converges to 0 in $L^p$, then it converges to 0 in pr. The converse is true provided that $\{X_n\}$ is dominated by some $Y$ that belongs to $L^p$.

Remark. If $X_n \to X$ in $L^p$, and $\{X_n\}$ is dominated by $Y$, then $\{X_n - X\}$ is dominated by $Y + |X|$, which is in $L^p$. Hence there is no loss of generality to assume $X \equiv 0$.

PROOF. By Chebyshev inequality with $\varphi(x) = |x|^p$, we have
\[
(9)\qquad \mathcal{P}\{|X_n| > \epsilon\} \le \frac{E(|X_n|^p)}{\epsilon^p}.
\]
Letting $n \to \infty$, the right member $\to 0$ by hypothesis, hence so does the left, which is equivalent to (5) with $X = 0$. This proves the first assertion. If now $|X_n| \le Y$ a.e. with $E(Y^p) < \infty$, then we have
\[
E(|X_n|^p) = \int_{\{|X_n| \le \epsilon\}} |X_n|^p \, d\mathcal{P} + \int_{\{|X_n| > \epsilon\}} |X_n|^p \, d\mathcal{P} \le \epsilon^p + \int_{\{|X_n| > \epsilon\}} Y^p \, d\mathcal{P}.
\]
Since $\mathcal{P}\{|X_n| > \epsilon\} \to 0$, the last-written integral $\to 0$ by Exercise 2 in Sec. 3.2. Letting first $n \to \infty$ and then $\epsilon \to 0$, we obtain $E(|X_n|^p) \to 0$; hence $X_n$ converges to 0 in $L^p$.

As a corollary, for a uniformly bounded sequence $\{X_n\}$ convergence in pr. and in $L^p$ are equivalent. The general result is as follows.

Theorem 4.1.5. $X_n \to 0$ in pr. if and only if
\[
(10)\qquad E\left( \frac{|X_n|}{1 + |X_n|} \right) \to 0.
\]
Furthermore, the functional $\rho(\cdot, \cdot)$ given by
\[
\rho(X, Y) = E\left( \frac{|X - Y|}{1 + |X - Y|} \right)
\]

is a metric in the space of r.v.'s, provided that we identify r.v.'s that are equal a.e.

PROOF. If $\rho(X, Y) = 0$, then $E\bigl(|X - Y|/(1 + |X - Y|)\bigr) = 0$, hence $X = Y$ a.e. by Exercise 1 of Sec. 3.2. To show that $\rho(\cdot, \cdot)$ is a metric it is sufficient to show that
\[
E\left( \frac{|X - Y|}{1 + |X - Y|} \right) \le E\left( \frac{|X|}{1 + |X|} \right) + E\left( \frac{|Y|}{1 + |Y|} \right).
\]
For this we need only verify that for every real $x$ and $y$:
\[
(11)\qquad \frac{|x + y|}{1 + |x + y|} \le \frac{|x|}{1 + |x|} + \frac{|y|}{1 + |y|}.
\]
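Inequality (11) rests on the fact that $t \mapsto t/(1 + t)$ is increasing on $[0, \infty)$. A numerical spot-check over a grid of points, offered as a sanity check rather than a substitute for the proof:

```python
def g(t):
    # |t| / (1 + |t|), the function appearing on both sides of (11)
    return abs(t) / (1 + abs(t))

# Check (11) on a grid of points in [-3, 3].
grid = [k / 7 - 3 for k in range(43)]
violations = [(x, y) for x in grid for y in grid
              if g(x + y) > g(x) + g(y) + 1e-12]
```

No pair on the grid violates (11), consistent with the subadditivity claim.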

Now if we replace the r.v.'s above by $k^{1/p}$ times themselves, where $p > 0$, then $\mathcal{P}\{X_n > 0\} = 1/k \to 0$ so that $X_n \to 0$ in pr., but for each $n$, we have $E(X_n^p) = 1$. Consequently $\lim_{n \to \infty} E(|X_n - 0|^p) = 1$ and $X_n$ does not $\to 0$ in $L^p$.

Example 2. Convergence a.e. does not imply convergence in $L^p$. In $(\mathcal{U}, \mathcal{B}, m)$ define
\[
X_n(\omega) =
\begin{cases}
2^n, & \text{if } \omega \in \left(0, \dfrac{1}{n}\right); \\[4pt]
0, & \text{otherwise}.
\end{cases}
\]
Then $E(|X_n|^p) = 2^{np}/n \to +\infty$ for each $p > 0$, but $X_n \to 0$ everywhere.

The reader should not be misled by these somewhat "artificial" examples to think that such counterexamples are rare or abnormal. Natural examples abound in more advanced theory, such as that of stochastic processes, but they are not as simple as those discussed above. Here are some culled from later chapters, tersely stated. In the symmetrical Bernoullian random walk $\{S_n, n \ge 1\}$, let $X_n = 1_{\{S_n = 0\}}$. Then $\lim_n E(X_n^p) = 0$ for every $p > 0$, but $\mathcal{P}\{\lim_n X_n \text{ exists}\} = 0$ because of intermittent return of $S_n$ to the origin (see Sec. 8.3). This kind of example can be formulated for any recurrent process such as a Brownian motion. On the other hand, if $\{\xi_n, n \ge 1\}$ denotes the random walk above stopped at 1, then $E(\xi_n) = 0$ for all $n$ but $\mathcal{P}\{\lim_{n \to \infty} \xi_n = 1\} = 1$. The same holds for the martingale that consists in tossing a fair coin and doubling the stakes until the first head (see Sec. 9.4). Another striking example is furnished by the simple Poisson process $\{N(t), t > 0\}$ (see Sec. 5.5). If $\xi(t) = N(t)/t$, then $E(\xi(t))$ equals the intensity parameter of the process for all $t > 0$; but $\mathcal{P}\{\lim_{t \downarrow 0} \xi(t) = 0\} = 1$ because for almost every $\omega$, $N(t, \omega) = 0$ for all sufficiently small values of $t$. The continuous parameter may be replaced by a sequence $t_n \downarrow 0$.

Finally, we mention another kind of convergence which is basic in functional analysis, but confine ourselves to $L^1$. The sequence of r.v.'s $\{X_n\}$ in $L^1$ is said to converge weakly in $L^1$ to $X$ iff for each bounded r.v. $Y$ we have
\[
\lim_{n \to \infty} E(X_n Y) = E(XY), \text{ finite}.
\]
It is easy to see that $X \in L^1$ and is unique up to equivalence by taking $Y = 1_{\{X \ne X'\}}$ if $X'$ is another candidate. Clearly convergence in $L^1$ defined above implies weak convergence; hence the former is sometimes referred to as "strong". On the other hand, Example 2 above shows that convergence a.e. does not imply weak convergence; whereas Exercises 19 and 20 below show that weak convergence does not imply convergence in pr. (nor even in distribution; see Sec. 4.4).
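Example 2 can be verified by direct computation: since $X_n = 2^n$ on an interval of length $1/n$ and 0 elsewhere, $E(|X_n|^p) = 2^{np}/n$ exactly, while for any fixed $\omega > 0$ the sample value $X_n(\omega)$ vanishes as soon as $1/n < \omega$. A small sketch:

```python
def moment(n, p):
    """E|X_n|^p for X_n = 2^n on (0, 1/n), 0 elsewhere (Lebesgue measure)."""
    return (2 ** n) ** p / n

def X(n, w):
    """Sample value X_n(w) at a point w of (0, 1]."""
    return 2 ** n if 0 < w < 1 / n else 0

# Pointwise convergence to 0 at w = 0.25: zero for every n with 1/n < 0.25.
tail = [X(n, 0.25) for n in range(5, 50)]
```

So the moments blow up while each sample sequence is eventually 0: convergence everywhere without convergence in any $L^p$.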

EXERCISES

1. $X_n \to +\infty$ a.e. if and only if $\forall M > 0$: $\mathcal{P}\{X_n < M \text{ i.o.}\} = 0$.

2. If $0 \le X_n \le X \in L^1$ and $X_n \to X$ in pr., then $X_n \to X$ in $L^1$.

3. If $X_n \to X$, $Y_n \to Y$ both in pr., then $X_n \pm Y_n \to X \pm Y$, $X_n Y_n \to XY$, all in pr.

*4. Let $f$ be a bounded uniformly continuous function in $\mathcal{R}^1$. Then $X_n \to 0$ in pr. implies $E(f(X_n)) \to f(0)$. [Example:
\[
f(x) = \frac{|x|}{1 + |x|}
\]
as in Theorem 4.1.5.]

5. Convergence in $L^p$ implies that in $L^r$ for $r < p$.

6. If $X_n \to X$, $Y_n \to Y$ both in $L^p$, then $X_n \pm Y_n \to X \pm Y$ in $L^p$. If $X_n \to X$ in $L^p$ and $Y_n \to Y$ in $L^q$, where $p > 1$ and $1/p + 1/q = 1$, then $X_n Y_n \to XY$ in $L^1$.

7. If $X_n \to X$ in pr. and $X_n \to Y$ in pr., then $X = Y$ a.e.

8. If $X_n \to X$ a.e. and $\mu_n$ and $\mu$ are the p.m.'s of $X_n$ and $X$, it does not follow that $\mu_n(P) \to \mu(P)$ even for all intervals $P$.

9. Give an example in which $E(X_n) \to 0$ but there does not exist any subsequence $\{n_k\} \to \infty$ such that $X_{n_k} \to 0$ in pr.

*10. Let $f$ be a continuous function on $\mathcal{R}^1$. If $X_n \to X$ in pr., then $f(X_n) \to f(X)$ in pr. The result is false if $f$ is merely Borel measurable. [HINT: Truncate $f$ at $\pm A$ for large $A$.]

11. The extended-valued r.v. $X$ is said to be bounded in pr. iff for each $\epsilon > 0$, there exists a finite $M(\epsilon)$ such that $\mathcal{P}\{|X| \le M(\epsilon)\} \ge 1 - \epsilon$. Prove that $X$ is bounded in pr. if and only if it is finite a.e.

12. The sequence of extended-valued r.v.'s $\{X_n\}$ is said to be bounded in pr. iff $\sup_n |X_n|$ is bounded in pr.; $\{X_n\}$ is said to diverge to $+\infty$ in pr. iff for each $M > 0$ and $\epsilon > 0$ there exists a finite $n_0(M, \epsilon)$ such that if $n > n_0$, then $\mathcal{P}\{|X_n| > M\} \ge 1 - \epsilon$. Prove that if $\{X_n\}$ diverges to $+\infty$ in pr. and $\{Y_n\}$ is bounded in pr., then $\{X_n + Y_n\}$ diverges to $+\infty$ in pr.

13. If $\sup_n X_n = +\infty$ a.e., there need exist no subsequence $\{X_{n_k}\}$ that diverges to $+\infty$ in pr.

14. It is possible that for each $\omega$, $\limsup_n X_n(\omega) = +\infty$, but there does not exist a subsequence $\{n_k\}$ and a set $\Delta$ of positive probability such that $\lim_k X_{n_k}(\omega) = +\infty$ on $\Delta$. [HINT: On $(\mathcal{U}, \mathcal{B})$ define $X_n(\omega)$ according to the $n$th digit of $\omega$.]

*15. Instead of the $\rho$ in Theorem 4.1.5 one may define other metrics as follows. Let $\rho_1(X, Y)$ be the infimum of all $\epsilon > 0$ such that
\[
\mathcal{P}(|X - Y| > \epsilon) \le \epsilon.
\]
Let $\rho_2(X, Y)$ be the infimum of $\mathcal{P}\{|X - Y| > \epsilon\} + \epsilon$ over all $\epsilon > 0$. Prove that these are metrics and that convergence in pr. is equivalent to convergence according to either metric.

*16. Convergence in pr. for arbitrary r.v.'s may be reduced to that of bounded r.v.'s by the transformation
\[
X' = \arctan X.
\]
In a space of uniformly bounded r.v.'s, convergence in pr. is equivalent to that in the metric $\rho_0(X, Y) = E(|X - Y|)$; this reduces to the definition given in Exercise 8 of Sec. 3.2 when $X$ and $Y$ are indicators.

17. Unlike convergence in pr., convergence a.e. is not expressible by means of metric. [HINT: Metric convergence has the property that if $\rho(x_n, x) \not\to 0$, then there exist $\epsilon > 0$ and $\{n_k\}$ such that $\rho(x_{n_k}, x) \ge \epsilon$ for every $k$.]

18. If $X_n \downarrow X$ a.s., each $X_n$ is integrable and $\inf_n E(X_n) > -\infty$, then $X_n \to X$ in $L^1$.

19. Let $f_n(x) = 1 + \cos 2\pi n x$, $f(x) = 1$ in $[0, 1]$. Then for each $g \in L^1[0, 1]$ we have
\[
\int_0^1 f_n g \, dx \to \int_0^1 f g \, dx,
\]
but $f_n$ does not converge to $f$ in measure. [HINT: This is just the Riemann-Lebesgue lemma in Fourier series, but we have made $f_n \ge 0$ to stress a point.]

20. Let $\{X_n\}$ be a sequence of independent r.v.'s with zero mean and unit variance. Prove that for any bounded r.v. $Y$ we have $\lim_{n \to \infty} E(X_n Y) = 0$. [HINT: Consider $E\{[Y - \sum_{k=1}^{n} E(X_k Y) X_k]^2\}$ to get Bessel's inequality $E(Y^2) \ge \sum_{k=1}^{n} E(X_k Y)^2$. The stated result extends to the case where the $X_n$'s are assumed only to be uniformly integrable (see Sec. 4.5), but now we must approximate $Y$ by a function of $(X_1, \ldots, X_m)$, cf. Theorem 8.1.1.]

4.2 Almost sure convergence; Borel-Cantelli lemma

An important concept in set theory is that of the "lim sup" and "lim inf" of a sequence of sets. These notions can be defined for subsets of an arbitrary space $\Omega$.
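For eventually periodic sequences of finite sets, the lim sup (points belonging to infinitely many $E_n$) and lim inf (points belonging to all $E_n$ from some index on) can be computed by truncating the defining formulas $\bigcap_m \bigcup_{n \ge m} E_n$ and $\bigcup_m \bigcap_{n \ge m} E_n$. The sketch below is exact only for such periodic examples, which suffice for experimentation.

```python
def lim_sup(sets):
    """Truncated intersection over m of the tail unions; exact when the
    list covers several full periods of a periodic sequence."""
    out = set().union(*sets)
    for m in range(len(sets) // 2):
        out &= set().union(*sets[m:])
    return out

def lim_inf(sets):
    """Truncated union over m of the tail intersections."""
    acc = set()
    for m in range(len(sets) // 2):
        tail = sets[m:]
        inter = set(tail[0])
        for E in tail[1:]:
            inter &= E
        acc |= inter
    return acc

# Periodic example: E_n alternates between {1, 2} and {2, 3}.
seq = [{1, 2}, {2, 3}] * 5
```

Here lim sup is $\{1, 2, 3\}$ (each point occurs infinitely often) and lim inf is $\{2\}$ (only 2 is eventually always present); the duality between the two notions, via complements in the universe $\{1, 2, 3\}$, can be checked as well.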

DEFINITION. Let $E_n$ be any sequence of subsets of $\Omega$; we define
\[
\limsup_n E_n = \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} E_n, \qquad \liminf_n E_n = \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} E_n.
\]
Let us observe at once that
\[
(1)\qquad \liminf_n E_n = \Bigl(\limsup_n E_n^c\Bigr)^c
\]
so that in a sense one of the two notions suffices, but it is convenient to employ both. The main properties of these sets will be given in the following two propositions.

(i) A point belongs to $\limsup_n E_n$ if and only if it belongs to infinitely many terms of the sequence $\{E_n, n \ge 1\}$. A point belongs to $\liminf_n E_n$ if and only if it belongs to all terms of the sequence from a certain term on. It follows in particular that both sets are independent of the enumeration of the $E_n$'s.

PROOF. We shall prove the first assertion. Since a point belongs to infinitely many of the $E_n$'s if and only if it does not belong to all $E_n^c$ from a certain value of $n$ on, the second assertion will follow from (1). Now if $\omega$ belongs to infinitely many of the $E_n$'s, then it belongs to
\[
F_m = \bigcup_{n=m}^{\infty} E_n \quad \text{for every } m;
\]
hence it belongs to
\[
\bigcap_{m=1}^{\infty} F_m = \limsup_n E_n.
\]
Conversely, if $\omega$ belongs to $\bigcap_{m=1}^{\infty} F_m$, then $\omega \in F_m$ for every $m$. Were $\omega$ to belong to only a finite number of the $E_n$'s there would be an $m$ such that $\omega \notin E_n$ for $n \ge m$, so that
\[
\omega \notin \bigcup_{n=m}^{\infty} E_n = F_m.
\]
This contradiction proves that $\omega$ must belong to an infinite number of the $E_n$'s.

In more intuitive language: the event $\limsup_n E_n$ occurs if and only if the events $E_n$ occur infinitely often. Thus we may write
\[
\mathcal{P}(\limsup_n E_n) = \mathcal{P}(E_n \text{ i.o.})
\]

where the abbreviation "i.o." stands for "infinitely often". The advantage of such a notation is better shown if we consider, for example, the events "$|X_n| \ge \epsilon$" and the probability $\mathcal{P}\{|X_n| \ge \epsilon \text{ i.o.}\}$; see Theorem 4.2.3 below.

(ii) If each $E_n \in \mathcal{F}$, then we have
\[
(2)\qquad \mathcal{P}(\limsup_n E_n) = \lim_{m \to \infty} \mathcal{P}\Bigl(\bigcup_{n=m}^{\infty} E_n\Bigr);
\]
\[
(3)\qquad \mathcal{P}(\liminf_n E_n) = \lim_{m \to \infty} \mathcal{P}\Bigl(\bigcap_{n=m}^{\infty} E_n\Bigr).
\]

PROOF. Using the notation in the preceding proof, it is clear that $F_m$ decreases as $m$ increases. Hence by the monotone property of p.m.'s:
\[
\mathcal{P}\Bigl(\bigcap_{m=1}^{\infty} F_m\Bigr) = \lim_{m \to \infty} \mathcal{P}(F_m),
\]
which is (2); (3) is proved similarly or via (1).

Theorem 4.2.1. We have for arbitrary events $\{E_n\}$:
\[
(4)\qquad \sum_n \mathcal{P}(E_n) < \infty \Rightarrow \mathcal{P}(E_n \text{ i.o.}) = 0.
\]

PROOF. By Boole's inequality for p.m.'s, we have
\[
\mathcal{P}(F_m) \le \sum_{n=m}^{\infty} \mathcal{P}(E_n).
\]
Hence the hypothesis in (4) implies that $\mathcal{P}(F_m) \to 0$, and the conclusion in (4) now follows by (2).

As an illustration of the convenience of the new notions, we may restate Theorem 4.1.1 as follows. The intuitive content of condition (5) below is the point being stressed here.

Theorem 4.2.2. $X_n \to 0$ a.e. if and only if
\[
(5)\qquad \forall \epsilon > 0 : \mathcal{P}\{|X_n| \ge \epsilon \text{ i.o.}\} = 0.
\]

PROOF. Using the notation $A_m = A_m(\epsilon)$ of (3) of Sec. 4.1 with $X = 0$, we have
\[
\{|X_n| \ge \epsilon \text{ i.o.}\} = \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} \{|X_n| \ge \epsilon\} = \bigcap_{m=1}^{\infty} A_m^c.
\]

According to Theorem 4.1.1, $X_n \to 0$ a.e. if and only if for each $\epsilon > 0$, $\mathscr{P}(A_m(\epsilon)) \to 0$ as $m \to \infty$; since $A_m(\epsilon)$ decreases as $m$ increases, this is equivalent to (5) as asserted.

Theorem 4.2.3. If $X_n \to X$ in pr., then there exists a sequence $\{n_k\}$ of integers increasing to infinity such that $X_{n_k} \to X$ a.e. Briefly stated: convergence in pr. implies convergence a.e. along a subsequence.

PROOF. We may suppose $X \equiv 0$ as explained before. Then the hypothesis may be written as

$$\forall k > 0: \lim_{n\to\infty} \mathscr{P}\left\{|X_n| > \frac{1}{2^k}\right\} = 0.$$

It follows that for each $k$ we can find $n_k$ such that

$$\mathscr{P}\left\{|X_{n_k}| > \frac{1}{2^k}\right\} \le \frac{1}{2^k},$$

and consequently

$$\sum_k \mathscr{P}\left\{|X_{n_k}| > \frac{1}{2^k}\right\} \le \sum_k \frac{1}{2^k} < \infty.$$

Having so chosen $\{n_k\}$, we let $E_k$ be the event "$|X_{n_k}| > 1/2^k$". Then we have by (4):

$$\mathscr{P}\left\{|X_{n_k}| > \frac{1}{2^k} \text{ i.o.}\right\} = 0.$$

[Note: here the index involved in "i.o." is $k$; such ambiguity being harmless and expedient.] This implies $X_{n_k} \to 0$ a.e. (why?), finishing the proof.

EXERCISES

1. Prove that

$$\mathscr{P}\left(\limsup_n E_n\right) \ge \limsup_n \mathscr{P}(E_n), \qquad \mathscr{P}\left(\liminf_n E_n\right) \le \liminf_n \mathscr{P}(E_n).$$

2. Let $\{B_n\}$ be a countable collection of Borel sets in $\mathscr{U}$. If there exists a $\delta > 0$ such that $m(B_n) \ge \delta$ for every $n$, then there is at least one point in $\mathscr{U}$ that belongs to infinitely many $B_n$'s.

3. If $\{X_n\}$ converges to a finite limit a.e., then for any $\epsilon$ there exists $M(\epsilon) < \infty$ such that $\mathscr{P}\{\sup_n |X_n| \le M(\epsilon)\} \ge 1 - \epsilon$.

*4. For any sequence of r.v.'s $\{X_n\}$ there exists a sequence of constants $\{A_n\}$ such that $X_n/A_n \to 0$ a.e.

*5. If $\{X_n\}$ converges on the set $C$, then for any $\epsilon > 0$, there exists $C_0 \subset C$ with $\mathscr{P}(C \setminus C_0) < \epsilon$ such that $X_n$ converges uniformly in $C_0$. [This is Egorov's theorem. We may suppose $C = \Omega$ and the limit to be zero. Let $F_{mk} = \bigcap_{n=k}^{\infty} \{\omega: |X_n(\omega)| \le 1/m\}$; then $\forall m$, $\exists k(m)$ such that $\mathscr{P}(F_{m,k(m)}) \ge 1 - \epsilon/2^m$. Take $C_0 = \bigcap_{m=1}^{\infty} F_{m,k(m)}$.]

6. Cauchy convergence of $\{X_n\}$ in pr. (or in $L^p$) implies the existence of an $X$ (finite a.e.), such that $X_n$ converges to $X$ in pr. (or in $L^p$). [HINT: choose $n_k$ so that

$$\sum_k \mathscr{P}\left\{|X_{n_{k+1}} - X_{n_k}| > \frac{1}{2^k}\right\} < \infty;$$

cf. Theorem 4.2.3.]

*7. $\{X_n\}$ converges in pr. to $X$ if and only if every subsequence $\{X_{n_k}\}$ contains a further subsequence that converges a.e. to $X$. [HINT: use Theorem 4.1.5.]

8. Let $\{X_n, n \ge 1\}$ be any sequence of functions on $\Omega$ to $\mathscr{R}^1$ and let $C$ denote the set of $\omega$ for which the numerical sequence $\{X_n(\omega), n \ge 1\}$ converges. Show that

$$C = \bigcap_{m=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{n'=n+1}^{\infty} \left\{|X_n - X_{n'}| \le \frac{1}{m}\right\} = \bigcap_{m=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{n'=n+1}^{\infty} \Lambda(m, n, n')^c,$$

where

$$\Lambda(m, n, n') = \left\{\omega: \max_{n \le j < k \le n'} |X_j(\omega) - X_k(\omega)| > \frac{1}{m}\right\}.$$

then $\lim_{n\to\infty} X_n$ exists a.e., but it may be infinite. [HINT: consider all pairs of rational numbers $(a, b)$ and take a union over them.]

Under the assumption of independence, Theorem 4.2.1 has a striking complement.

Theorem 4.2.4. If the events $\{E_n\}$ are independent, then

(6) $\sum_n \mathscr{P}(E_n) = \infty \Rightarrow \mathscr{P}(E_n \text{ i.o.}) = 1.$

PROOF. By (3) we have

(7) $\mathscr{P}\left(\liminf_n E_n^c\right) = \lim_{m\to\infty} \mathscr{P}\left(\bigcap_{n=m}^{\infty} E_n^c\right).$

The events $\{E_n^c\}$ are independent as well as $\{E_n\}$, by Exercise 3 of Sec. 3.3; hence we have if $m' > m$:

$$\mathscr{P}\left(\bigcap_{n=m}^{m'} E_n^c\right) = \prod_{n=m}^{m'} \mathscr{P}(E_n^c) = \prod_{n=m}^{m'} \left(1 - \mathscr{P}(E_n)\right).$$

Now for any $x \ge 0$, we have $1 - x \le e^{-x}$; it follows that the last term above does not exceed

$$\prod_{n=m}^{m'} e^{-\mathscr{P}(E_n)} = \exp\left(-\sum_{n=m}^{m'} \mathscr{P}(E_n)\right).$$

Letting $m' \to \infty$, the right member above $\to 0$, since the series in the exponent $\to +\infty$ by hypothesis. It follows by the monotone property that

$$\mathscr{P}\left(\bigcap_{n=m}^{\infty} E_n^c\right) = \lim_{m'\to\infty} \mathscr{P}\left(\bigcap_{n=m}^{m'} E_n^c\right) = 0.$$

Thus the right member in (7) is equal to 0, and consequently so is the left member in (7). This is equivalent to $\mathscr{P}(E_n \text{ i.o.}) = 1$ by (1).

Theorems 4.2.1 and 4.2.4 together will be referred to as the Borel–Cantelli lemma, the former the "convergence part" and the latter the "divergence part". The first is more useful since the events there may be completely arbitrary. The second has an extension to pairwise independent r.v.'s; although the result is of some interest, it is the method of proof to be given below that is more important. It is a useful technique in probability theory.
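The two halves of the Borel–Cantelli lemma can be illustrated by a seeded Monte Carlo sketch (the probabilities $1/n^2$ and $1/n$, the horizon, the cutoff index, and the trial count are illustrative choices, not from the text): when $\sum_n \mathscr{P}(E_n) < \infty$, occurrences beyond a fixed index are rare, while in the divergent independent case they keep appearing.

```python
import random

def avg_tail_occurrences(probs, tail_from=10, trials=200, seed=12345):
    """Simulate independent events E_n with P(E_n) = probs[n]; return the
    average number of occurrences with index >= tail_from over many paths."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(1 for n, p in enumerate(probs)
                     if n >= tail_from and rng.random() < p)
    return total / trials

N = 2000
convergent = [1.0 / (n + 1) ** 2 for n in range(N)]  # sum P(E_n) < infinity
divergent = [1.0 / (n + 1) for n in range(N)]        # sum P(E_n) = infinity

tail_conv = avg_tail_occurrences(convergent)
tail_div = avg_tail_occurrences(divergent)
print(tail_conv, tail_div)  # tail_conv stays small; tail_div is several times larger
```

On the divergence side this matches Theorem 4.2.4: with independence and $\sum_n \mathscr{P}(E_n) = \infty$, the count of occurrences grows without bound along almost every path, whatever the cutoff index.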

Theorem 4.2.5. The implication (6) remains true if the events $\{E_n\}$ are pairwise independent.

PROOF. Let $I_n$ denote the indicator of $E_n$, so that our present hypothesis becomes

(8) $\forall m \ne n: \mathscr{E}(I_m I_n) = \mathscr{E}(I_m)\mathscr{E}(I_n).$

Consider the series of r.v.'s: $\sum_{n=1}^{\infty} I_n(\omega)$. It diverges to $+\infty$ if and only if an infinite number of its terms are equal to one, namely if $\omega$ belongs to an infinite number of the $E_n$'s. Hence the conclusion in (6) is equivalent to

(9) $\mathscr{P}\left\{\sum_{n=1}^{\infty} I_n = +\infty\right\} = 1.$

What has been said so far is true for arbitrary $E_n$'s. Now the hypothesis in (6) may be written as

$$\sum_{n=1}^{\infty} \mathscr{E}(I_n) = +\infty.$$

Consider the partial sum $J_k = \sum_{n=1}^{k} I_n$. Using Chebyshev's inequality, we have for every $A > 0$:

(10) $\mathscr{P}\{|J_k - \mathscr{E}(J_k)| \le A\sigma(J_k)\} \ge 1 - \frac{\sigma^2(J_k)}{A^2 \sigma^2(J_k)} = 1 - \frac{1}{A^2},$

where $\sigma^2(J)$ denotes the variance of $J$. Writing

$$p_n = \mathscr{E}(I_n) = \mathscr{P}(E_n),$$

we may calculate $\sigma^2(J_k)$ by using (8), as follows:

$$\mathscr{E}(J_k^2) = \mathscr{E}\left(\sum_{n=1}^{k} I_n + 2\sum_{1\le m<n\le k} I_m I_n\right) = \sum_{n=1}^{k} p_n + 2\sum_{1\le m<n\le k} p_m p_n.$$

Hence

$$\sigma^2(J_k) = \mathscr{E}(J_k^2) - \mathscr{E}(J_k)^2 = \sum_{n=1}^{k} (p_n - p_n^2) = \sum_{n=1}^{k} \sigma^2(I_n).$$

This calculation will turn out to be a particular case of a simple basic formula; see (6) of Sec. 5.1. Since $\sum_{n=1}^{k} p_n = \mathscr{E}(J_k) \to \infty$, it follows that

$$\sigma(J_k) \le \mathscr{E}(J_k)^{1/2} = o(\mathscr{E}(J_k))$$

in the classical "o, O" notation of analysis. Hence if $k \ge k_0(A)$, (10) implies

$$\mathscr{P}\left\{J_k \ge \tfrac{1}{2}\mathscr{E}(J_k)\right\} \ge 1 - \frac{1}{A^2}$$

(where $\tfrac{1}{2}$ may be replaced by any constant $< 1$). Since $J_k$ increases with $k$, the inequality above holds a fortiori when the first $J_k$ there is replaced by $\lim_{k\to\infty} J_k$; after that we can let $k \to \infty$ in $\mathscr{E}(J_k)$ to obtain

$$\mathscr{P}\left\{\lim_{k\to\infty} J_k = +\infty\right\} \ge 1 - \frac{1}{A^2}.$$

Since the left member does not depend on $A$, and $A$ is arbitrary, this implies that $\lim_{k\to\infty} J_k = +\infty$ a.e., namely (9).

Corollary. If the events $\{E_n\}$ are pairwise independent, then

$$\mathscr{P}\left(\limsup_n E_n\right) = 0 \text{ or } 1$$

according as $\sum_n \mathscr{P}(E_n) < \infty$ or $= \infty$.

This is an example of a "zero-or-one" law to be discussed in Chapter 8, though it is not included in any of the general results there.

EXERCISES

Below $X_n$, $Y_n$ are r.v.'s, $E_n$ events.

11. Give a trivial example of dependent $\{E_n\}$ satisfying the hypothesis but not the conclusion of (6); give a less trivial example satisfying the hypothesis but with $\mathscr{P}(\limsup_n E_n) = 0$. [HINT: let $E_n$ be the event that a real number in $[0, 1]$ has its $n$-ary expansion begin with 0.]

*12. Prove that the probability of convergence of a sequence of independent r.v.'s is equal to zero or one.

13. If $\{X_n\}$ is a sequence of independent and identically distributed r.v.'s not constant a.e., then $\mathscr{P}\{X_n \text{ converges}\} = 0$.

*14. If $\{X_n\}$ is a sequence of independent r.v.'s with d.f.'s $\{F_n\}$, then $\mathscr{P}\{\lim_n X_n = 0\} = 1$ if and only if $\forall \epsilon > 0$: $\sum_n \{1 - F_n(\epsilon) + F_n(-\epsilon)\} < \infty$.

15. If $\sum_n \mathscr{P}(|X_n| > n) < \infty$, then

$$\limsup_n \frac{|X_n|}{n} \le 1 \quad \text{a.e.}$$

*16. Strengthen Theorem 4.2.4 by proving that

$$\lim_{n\to\infty} \frac{J_n}{\mathscr{E}(J_n)} = 1 \quad \text{a.e.}$$

[HINT: take a subsequence $\{n_k\}$ such that $\mathscr{E}(J_{n_k}) \sim k^2$; prove the result first for this subsequence by estimating $\mathscr{P}\{|J_k - \mathscr{E}(J_k)| > \delta\,\mathscr{E}(J_k)\}$; the general case follows because if $n_k \le n \le n_{k+1}$, then $J_{n_k}/\mathscr{E}(J_{n_{k+1}}) \le J_n/\mathscr{E}(J_n) \le J_{n_{k+1}}/\mathscr{E}(J_{n_k})$.]

*17. … $\mathscr{P}\{\limsup_{n\to\infty} \cdots \ge 1\} > 0$. [This is due to Kochen and Stone. Truncate $X_n$ at $A$ to $Y_n$ with $A$ so large that $\mathscr{E}(Y_n) > 1 - \epsilon$ for all $n$; then apply Exercise 6 of Sec. 3.2.]

*18. Let $\{E_n\}$ be events and $\{I_n\}$ their indicators. Prove the inequality

(11) $\mathscr{P}\left(\bigcup_{k=1}^{n} E_k\right) \ge \frac{\left(\sum_{k=1}^{n} \mathscr{P}(E_k)\right)^2}{\mathscr{E}\left(\left(\sum_{k=1}^{n} I_k\right)^2\right)}.$

Deduce from this that if (i) $\sum_n \mathscr{P}(E_n) = \infty$ and (ii) there exists $c > 0$ such that we have

$$\forall m < n: \mathscr{P}(E_m E_n) \le c\,\mathscr{P}(E_m)\mathscr{P}(E_{n-m});$$

then $\mathscr{P}\{\limsup_n E_n\} > 0$.

19. If $\sum_n \mathscr{P}(E_n) = \infty$ and

$$\lim_n \frac{\sum_{j=1}^{n}\sum_{k=1}^{n} \mathscr{P}(E_j E_k)}{\left(\sum_{k=1}^{n} \mathscr{P}(E_k)\right)^2} = 1,$$

then $\mathscr{P}\{\limsup_n E_n\} = 1$. [HINT: use (11) above.]

20. Let $\{E_n\}$ be arbitrary events satisfying

$$\text{(i)} \quad \lim_n \mathscr{P}(E_n) = 0, \qquad \text{(ii)} \quad \sum_n \mathscr{P}(E_n E_{n+1}^c) < \infty;$$

then $\mathscr{P}\{\limsup_n E_n\} = 0$. [This is due to Barndorff-Nielsen.]

4.3 Vague convergence

If a sequence of r.v.'s $\{X_n\}$ tends to a limit, the corresponding sequence of p.m.'s $\{\mu_n\}$ ought to tend to a limit in some sense. Is it true that $\lim_n \mu_n(A)$ exists for all $A \in \mathscr{B}^1$ or at least for all intervals $A$? The answer is no from trivial examples.

Example 1. Let $X_n = c_n$ where the $c_n$'s are constants tending to zero. Then $X_n \to 0$ deterministically. For any interval $I$ such that $0 \notin \bar{I}$, where $\bar{I}$ is the closure of $I$, we have $\lim_n \mu_n(I) = 0 = \mu(I)$; for any interval such that $0 \in I^\circ$, where $I^\circ$ is the interior of $I$, we have $\lim_n \mu_n(I) = 1 = \mu(I)$. But if $\{c_n\}$ oscillates between strictly positive and strictly negative values, and $I = (a, 0)$ or $(0, b)$, where $a < 0 < b$, then $\mu_n(I)$ oscillates between 0 and 1, while $\mu(I) = 0$. On the other hand, if $I = (a, 0]$ or $[0, b)$, then $\mu_n(I)$ oscillates as before but $\mu(I) = 1$. Observe that $\{0\}$ is the sole atom of $\mu$ and it is the root of the trouble.

Instead of the point masses concentrated at $c_n$, we may consider, e.g., r.v.'s $\{X_n\}$ having uniform distributions over intervals $(c_n, c_n')$ where $c_n < 0 < c_n'$ and $c_n \to 0$, $c_n' \to 0$. Then again $X_n \to 0$ a.e. but $\mu_n((a, 0))$ may not converge at all, or converge to any number between 0 and 1.

Next, even if $\{\mu_n\}$ does converge in some weaker sense, is the limit necessarily a p.m.? The answer is again no.

Example 2. Let $X_n = c_n$ where $c_n \to +\infty$. Then $X_n \to +\infty$ deterministically. According to our definition of a r.v., the constant $+\infty$ indeed qualifies. But for any finite interval $(a, b)$ we have $\lim_n \mu_n((a, b)) = 0$, so any limiting measure must also be identically zero. This example can be easily ramified; e.g., let $a_n \to -\infty$, $b_n \to +\infty$, and

$$X_n = \begin{cases} a_n & \text{with probability } \alpha,\\ 0 & \text{with probability } 1 - \alpha - \beta,\\ b_n & \text{with probability } \beta.\end{cases}$$

Then $X_n \to X$ where

$$X = \begin{cases} -\infty & \text{with probability } \alpha,\\ 0 & \text{with probability } 1 - \alpha - \beta,\\ +\infty & \text{with probability } \beta.\end{cases}$$

For any finite interval $(a, b)$ containing 0 we have

$$\lim_n \mu_n((a, b)) = 1 - \alpha - \beta = \mu(\{0\}).$$
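The escape of mass in the ramified example can be computed exactly; the concrete choices $a_n = -n$, $b_n = n$, $\alpha = 0.3$, $\beta = 0.2$ below are hypothetical, made only for this sketch:

```python
alpha, beta = 0.3, 0.2  # illustrative masses that will wander off to -inf and +inf

def mass_in_interval(n, a, b):
    """mu_n((a, b)) for X_n taking values -n, 0, n with probabilities
    alpha, 1 - alpha - beta, beta respectively (exact point-mass sum)."""
    points = {-n: alpha, 0: 1.0 - alpha - beta, n: beta}
    return round(sum(p for x, p in points.items() if a < x < b), 12)

masses = [mass_in_interval(n, -5.0, 5.0) for n in (1, 2, 10, 100)]
print(masses)  # -> [1.0, 1.0, 0.5, 0.5]; the limit is 1 - alpha - beta
```

Once $n$ exceeds the endpoints of the fixed interval, only the atom at 0 remains inside, so the interval's mass settles at $1 - \alpha - \beta$ while $\alpha + \beta$ has left every finite interval.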

In this situation it is said that masses of amount $\alpha$ and $\beta$ "have wandered off to $-\infty$ and $+\infty$ respectively." The remedy here is obvious: we should consider measures on the extended line $\mathscr{R}^* = [-\infty, +\infty]$, with possible atoms at $\{+\infty\}$ and $\{-\infty\}$. We leave this to the reader but proceed to give the appropriate definitions which take into account the two kinds of troubles discussed above.

DEFINITION. A measure $\mu$ on $(\mathscr{R}^1, \mathscr{B}^1)$ with $\mu(\mathscr{R}^1) \le 1$ will be called a subprobability measure (s.p.m.).

DEFINITION OF VAGUE CONVERGENCE. A sequence $\{\mu_n, n \ge 1\}$ of s.p.m.'s is said to converge vaguely to an s.p.m. $\mu$ iff there exists a dense subset $D$ of $\mathscr{R}^1$ such that

(1) $\forall a \in D, b \in D, a < b: \mu_n((a, b]) \to \mu((a, b]).$

This will be denoted by

(2) $\mu_n \xrightarrow{v} \mu$

and $\mu$ is called the vague limit of $\{\mu_n\}$. We shall see below that it is unique.

For brevity's sake, we will write $\mu((a, b])$ as $\mu(a, b]$ below, and similarly for other kinds of intervals. An interval $(a, b)$ is called a continuity interval of $\mu$ iff neither $a$ nor $b$ is an atom of $\mu$; in other words iff $\mu(a, b) = \mu[a, b]$. As a notational convention, $\mu(a, b) = 0$ when $a \ge b$.

Theorem 4.3.1. Let $\{\mu_n\}$ and $\mu$ be s.p.m.'s. The following propositions are equivalent.

(i) For every finite interval $(a, b)$ and $\epsilon > 0$, there exists an $n_0(a, b, \epsilon)$ such that if $n \ge n_0$, then

(3) $\mu(a + \epsilon, b - \epsilon) - \epsilon \le \mu_n(a, b) \le \mu(a - \epsilon, b + \epsilon) + \epsilon.$

Here and hereafter the first term is interpreted as 0 if $a + \epsilon > b - \epsilon$.

(ii) For every continuity interval $(a, b]$ of $\mu$, we have $\mu_n(a, b] \to \mu(a, b]$.

(iii) $\mu_n \xrightarrow{v} \mu$.

PROOF. To prove that (i) $\Rightarrow$ (ii), let $(a, b)$ be a continuity interval of $\mu$. It follows from the monotone property of a measure that

$$\lim_{\epsilon \downarrow 0} \mu(a + \epsilon, b - \epsilon) = \mu(a, b) = \mu[a, b] = \lim_{\epsilon \downarrow 0} \mu(a - \epsilon, b + \epsilon).$$

Letting $n \to \infty$, then $\epsilon \downarrow 0$ in (3), we have

$$\mu(a, b) \le \liminf_n \mu_n(a, b) \le \limsup_n \mu_n[a, b] \le \mu[a, b] = \mu(a, b),$$

which proves (ii); indeed $(a, b]$ may be replaced by $(a, b)$ or $[a, b)$ or $[a, b]$ there. Next, since the set of atoms of $\mu$ is countable, the complementary set $D$ is certainly dense in $\mathscr{R}^1$. If $a \in D$, $b \in D$, then $(a, b)$ is a continuity interval of $\mu$. This proves (ii) $\Rightarrow$ (iii). Finally, suppose (iii) is true so that (1) holds. Given any $(a, b)$ and $\epsilon > 0$, there exist $a_1$, $a_2$, $b_1$, $b_2$ all in $D$ satisfying

$$a - \epsilon < a_1 < a < a_2 < a + \epsilon, \qquad b - \epsilon < b_1 < b < b_2 < b + \epsilon.$$

Since $(a + \epsilon, b - \epsilon) \subset (a_2, b_1] \subset (a, b) \subset (a_1, b_2] \subset (a - \epsilon, b + \epsilon)$, it follows from (1) that for all sufficiently large $n$:

$$\mu_n(a, b) \ge \mu_n(a_2, b_1] > \mu(a_2, b_1] - \epsilon \ge \mu(a + \epsilon, b - \epsilon) - \epsilon$$

and

$$\mu_n(a, b) \le \mu_n(a_1, b_2] < \mu(a_1, b_2] + \epsilon \le \mu(a - \epsilon, b + \epsilon) + \epsilon,$$

which is (3). This proves (iii) $\Rightarrow$ (i).

PROOF. The employment of two numbers $\delta$ and $\epsilon$ instead of one $\epsilon$ as in (3) is an apparent extension only (why?). Since (i′) $\Rightarrow$ (i), the preceding theorem yields at once (i′) $\Rightarrow$ (ii) $\Leftrightarrow$ (iii). It remains to prove (ii) $\Rightarrow$ (i′) when the $\mu_n$'s and $\mu$ are p.m.'s. Let $A$ denote the set of atoms of $\mu$. Then there exist an integer $\ell$ and $a_j \in A^c$, $1 \le j \le \ell$, satisfying …

which states: the set of all s.p.m.'s is sequentially compact with respect to vague convergence. It is often referred to as "Helly's extraction (or selection) principle".

Theorem 4.3.3. Given any sequence of s.p.m.'s, there is a subsequence that converges vaguely to an s.p.m.

PROOF. Here it is convenient to consider the subdistribution function (s.d.f.) $F_n$ defined as follows:

$$\forall x: F_n(x) = \mu_n((-\infty, x]).$$

If $\mu_n$ is a p.m., then $F_n$ is just its d.f. (see Sec. 2.2); in general $F_n$ is increasing, right continuous with $F_n(-\infty) = 0$ and $F_n(+\infty) = \mu_n(\mathscr{R}^1) \le 1$.

Let $D$ be a countable dense set of $\mathscr{R}^1$, and let $\{r_k, k \ge 1\}$ be an enumeration of it. The sequence of numbers $\{F_n(r_1), n \ge 1\}$ is bounded, hence by the Bolzano–Weierstrass theorem there is a subsequence $\{F_{1k}, k \ge 1\}$ of the given sequence such that the limit

$$\lim_{k\to\infty} F_{1k}(r_1) = \ell_1$$

exists; clearly $0 \le \ell_1 \le 1$. Next, the sequence of numbers $\{F_{1k}(r_2), k \ge 1\}$ is bounded, hence there is a subsequence $\{F_{2k}, k \ge 1\}$ of $\{F_{1k}, k \ge 1\}$ such that

$$\lim_{k\to\infty} F_{2k}(r_2) = \ell_2,$$

where $0 \le \ell_2 \le 1$. Since $\{F_{2k}\}$ is a subsequence of $\{F_{1k}\}$, it converges also at $r_1$ to $\ell_1$. Continuing, we obtain

$F_{11}, F_{12}, \ldots, F_{1k}, \ldots$ converging at $r_1$;
$F_{21}, F_{22}, \ldots, F_{2k}, \ldots$ converging at $r_1, r_2$;
$\ldots$
$F_{j1}, F_{j2}, \ldots, F_{jk}, \ldots$ converging at $r_1, r_2, \ldots, r_j$;
$\ldots$

Now consider the diagonal sequence $\{F_{kk}, k \ge 1\}$. We assert that it converges at every $r_j$, $j \ge 1$. To see this let $r_j$ be given. Apart from the first $j - 1$ terms, the sequence $\{F_{kk}, k \ge 1\}$ is a subsequence of $\{F_{jk}, k \ge 1\}$, which converges at $r_j$, and hence $\lim_{k\to\infty} F_{kk}(r_j) = \ell_j$, as desired.

We have thus proved the existence of an infinite subsequence $\{n_k\}$ and a function $G$ defined and increasing on $D$ such that

$$\forall r \in D: \lim_{k\to\infty} F_{n_k}(r) = G(r).$$

From $G$ we define a function $F$ on $\mathscr{R}^1$ as follows:

$$\forall x \in \mathscr{R}^1: F(x) = \inf_{x < r \in D} G(r).$$

By Sec. 1.1, (vii), $F$ is increasing and right continuous. Let $C$ denote the set of its points of continuity; $C$ is dense in $\mathscr{R}^1$ and we show that

(8) $\forall x \in C: \lim_{k\to\infty} F_{n_k}(x) = F(x).$

For, let $x \in C$ and $\epsilon > 0$ be given; there exist $r$, $r'$, and $r''$ in $D$ such that $r < r' < x < r''$ and $F(r'') - F(r) < \epsilon$. Then we have

$$F(r) \le G(r') \le F(x) \le G(r'') \le F(r''),$$

while for every $k$: $F_{n_k}(r') \le F_{n_k}(x) \le F_{n_k}(r'')$. Letting $k \to \infty$, we obtain

$$G(r') \le \liminf_k F_{n_k}(x) \le \limsup_k F_{n_k}(x) \le G(r'').$$

Since both $G(r')$ and $G(r'')$ lie within $\epsilon$ of $F(x)$, (8) follows.

infinity such that the numbers $\mu_{n_k}(a, b)$ converge to a limit, say $L \ne \mu(a, b)$. By Theorem 4.3.3, the sequence $\{\mu_{n_k}, k \ge 1\}$ contains a subsequence, say $\{\mu_{n'_k}, k \ge 1\}$, which converges vaguely, hence to $\mu$ by hypothesis of the theorem. Hence again by Theorem 4.3.1, (ii), we have

$$\mu_{n'_k}(a, b) \to \mu(a, b).$$

But the left side also $\to L$, which is a contradiction.

EXERCISES

1. Perhaps the most logical approach to vague convergence is as follows. The sequence $\{\mu_n, n \ge 1\}$ of s.p.m.'s is said to converge vaguely iff there exists a dense subset $D$ of $\mathscr{R}^1$ such that for every $a \in D$, $b \in D$, $a < b$, the sequence $\{\mu_n(a, b), n \ge 1\}$ converges. The definition given before implies this, of course, but prove the converse.

2. Prove that if (1) is true, then there exists a dense set $D'$, such that $\mu_n(I) \to \mu(I)$ where $I$ may be any of the four intervals $(a, b)$, $(a, b]$, $[a, b)$, $[a, b]$ with $a \in D'$, $b \in D'$.

3. Can a sequence of absolutely continuous p.m.'s converge vaguely to a discrete p.m.? Can a sequence of discrete p.m.'s converge vaguely to an absolutely continuous p.m.?

4. If a sequence of p.m.'s converges vaguely to an atomless p.m., then the convergence is uniform for all intervals, finite or infinite. (This is due to Pólya.)

5. Let $\{f_n\}$ be a sequence of functions increasing in $\mathscr{R}^1$ and uniformly bounded there: $\sup_{n,x} |f_n(x)| \le M < \infty$. Prove that there exists an increasing function $f$ on $\mathscr{R}^1$ and a subsequence $\{n_k\}$ such that $f_{n_k}(x) \to f(x)$ for every $x$. (This is a form of Theorem 4.3.3 frequently given; the insistence on "every $x$" requires an additional argument.)

6. Let $\{\mu_n\}$ be a sequence of finite measures on $\mathscr{B}^1$. It is said to converge vaguely to a measure $\mu$ iff (1) holds. The limit $\mu$ is not necessarily a finite measure. But if $\mu_n(\mathscr{R}^1)$ is bounded in $n$, then $\mu$ is finite.

7. If $\{\mathscr{P}_n\}$ is a sequence of p.m.'s on $(\Omega, \mathscr{F})$ such that $\mathscr{P}_n(E)$ converges for every $E \in \mathscr{F}$, then the limit is a p.m. $\mathscr{P}$. Furthermore, if $f$ is bounded and $\mathscr{F}$-measurable, then

$$\int_\Omega f \, d\mathscr{P}_n \to \int_\Omega f \, d\mathscr{P}.$$

(The first assertion is the Vitali–Hahn–Saks theorem and rather deep, but it can be proved by reducing it to a problem of summability; see A. Rényi [24].)

8. If $\mu_n$ and $\mu$ are p.m.'s and $\mu_n(E) \to \mu(E)$ for every open set $E$, then this is also true for every Borel set. [HINT: use (7) of Sec. 2.2.]

9. Prove a convergence theorem in metric space that will include both Theorem 4.3.3 for p.m.'s and the analogue for real numbers given before the theorem. [HINT: use Exercise 9 of Sec. 4.4.]

4.4 Continuation

We proceed to discuss another kind of criterion, which is becoming ever more popular in measure theory as well as functional analysis. This has to do with classes of continuous functions on $\mathscr{R}^1$:

$C_K$ = the class of continuous functions $f$ each vanishing outside a compact set $K(f)$;
$C_0$ = the class of continuous functions $f$ such that $\lim_{|x|\to\infty} f(x) = 0$;
$C_B$ = the class of bounded continuous functions;
$C$ = the class of continuous functions.

We have $C_K \subset C_0 \subset C_B \subset C$. It is well known that $C_0$ is the closure of $C_K$ with respect to uniform convergence.

An arbitrary function $f$ defined on an arbitrary space is said to have support in a subset $S$ of the space iff it vanishes outside $S$. Thus if $f \in C_K$, then it has support in a certain compact set, hence also in a certain compact interval. A step function on a finite or infinite interval $(a, b)$ is one with support in it such that $f(x) = c_j$ for $x \in (a_j, a_{j+1})$ for $1 \le j \le \ell$, where $\ell$ is finite, $a = a_1 < \cdots < a_{\ell+1} = b$, and the $c_j$'s are arbitrary real numbers. It will be called $D$-valued iff all the $a_j$'s and $c_j$'s belong to a given set $D$. When the interval $(a, b)$ is $\mathscr{R}^1$, $f$ is called just a step function. Note that the values of $f$ at the points $a_j$ are left unspecified to allow for flexibility; frequently they are defined by right or left continuity. The following lemma is basic.

Approximation Lemma. Suppose that $f \in C_K$ has support in the compact interval $[a, b]$. Given any dense subset $A$ of $\mathscr{R}^1$ and $\epsilon > 0$, there exists an $A$-valued step function $f_\epsilon$ on $(a, b)$ such that

(1) $\sup_{x \in \mathscr{R}^1} |f(x) - f_\epsilon(x)| \le \epsilon.$

If $f \in C_0$, the same is true if $(a, b)$ is replaced by $\mathscr{R}^1$.

This lemma becomes obvious as soon as its geometric meaning is grasped. In fact, for any $f$ in $C_K$, one may even require that either $f_\epsilon \le f$ or $f_\epsilon \ge f$.
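A hedged numerical sketch of the construction behind the Approximation Lemma (the tent function, the mesh size, and the rounding that stands in for choosing breakpoints and values from a dense set $A$ are all illustrative assumptions, not from the text):

```python
def f(x):
    """A function in C_K: the tent with support [-1, 1], Lipschitz constant 1."""
    return max(0.0, 1.0 - abs(x))

def step_approx(eps, lo=-1.0, hi=1.0):
    """Return breakpoints and piecewise-constant values of a step function
    within eps of f; rounding the numbers plays the role of picking them
    from the dense set A."""
    n = int((hi - lo) / eps) + 1           # enough pieces so that the mesh <= eps
    mesh = (hi - lo) / n
    breaks = [round(lo + k * mesh, 9) for k in range(n + 1)]
    vals = [round(f(lo + (k + 0.5) * mesh), 9) for k in range(n)]
    return breaks, vals

def eval_step(breaks, vals, x):
    """Evaluate the step function; it vanishes outside [breaks[0], breaks[-1])."""
    if x < breaks[0] or x >= breaks[-1]:
        return 0.0
    for k in range(len(vals)):
        if breaks[k] <= x < breaks[k + 1]:
            return vals[k]
    return vals[-1]

eps = 0.05
breaks, vals = step_approx(eps)
grid = [i / 1000.0 for i in range(-1500, 1501)]
sup_err = max(abs(f(x) - eval_step(breaks, vals, x)) for x in grid)
print(sup_err <= eps)  # -> True
```

For a Lipschitz tent, a mesh no larger than $\epsilon$ already forces the sup error under $\epsilon$; for a general $f \in C_K$ the mesh would instead be driven by uniform continuity, which is exactly how the lemma is proved.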

The problem is then that of the approximation of the graph of a plane curve by inscribed or circumscribed polygons, as treated in elementary calculus. But let us remark that the lemma is also a particular case of the Stone–Weierstrass theorem (see, e.g., Rudin [2]) and should be so verified by the reader. Such a sledgehammer approach has its merit, as other kinds of approximation soon to be needed can also be subsumed under the same theorem. Indeed, the discussion in this section is meant in part to introduce some modern terminology to the relevant applications in probability theory. We can now state the following alternative criterion for vague convergence.

Theorem 4.4.1. Let $\{\mu_n\}$ and $\mu$ be s.p.m.'s. Then $\mu_n \xrightarrow{v} \mu$ if and only if

(2) $\forall f \in C_K \text{ [or } C_0\text{]}: \int f(x)\,\mu_n(dx) \to \int f(x)\,\mu(dx).$

PROOF. Suppose $\mu_n \xrightarrow{v} \mu$; (2) is true by definition when $f$ is the indicator of $(a, b]$ for $a \in D$, $b \in D$, where $D$ is the set in (1) of Sec. 4.3. Hence by the linearity of integrals it is also true when $f$ is any $D$-valued step function. Now let $f \in C_0$ and $\epsilon > 0$; by the approximation lemma there exists a $D$-valued step function $f_\epsilon$ satisfying (1). We have

(3) $\left|\int f\,d\mu_n - \int f\,d\mu\right| \le \left|\int f\,d\mu_n - \int f_\epsilon\,d\mu_n\right| + \left|\int f_\epsilon\,d\mu_n - \int f_\epsilon\,d\mu\right| + \left|\int f_\epsilon\,d\mu - \int f\,d\mu\right|.$

By (1), the first and third terms on the right are each $\le \epsilon$, while the middle term tends to zero as $n \to \infty$; since $\epsilon$ is arbitrary, (2) follows.

that is linear in $(a - \delta, a)$ and $(b, b + \delta)$. It is clear that $g_1 \le g \le g_2 \le g_1 + 1$ and consequently

(4) $\int g_1\,d\mu_n \le \int g\,d\mu_n \le \int g_2\,d\mu_n,$

(5) $\int g_1\,d\mu \le \int g\,d\mu \le \int g_2\,d\mu.$

Since $g_1 \in C_K$, $g_2 \in C_K$, it follows from (2) that the extreme terms in (4) converge to the corresponding terms in (5). Since

$$\int g_2\,d\mu - \int g_1\,d\mu \le \mu(U_1) - \mu(U) < \epsilon,$$

and $\epsilon$ is arbitrary, it follows that the middle term in (4) also converges to that in (5), proving the assertion.

Corollary. If $\{\mu_n\}$ is a sequence of s.p.m.'s such that for every $f \in C_K$,

$$\lim_n \int f(x)\,\mu_n(dx)$$

exists, then $\{\mu_n\}$ converges vaguely.

For by Theorem 4.3.3 a subsequence converges vaguely, say to $\mu$. By Theorem 4.4.1, the limit above is equal to $\int f(x)\,\mu(dx)$. This must then be the same for every vaguely convergent subsequence, according to the hypothesis of the corollary. The vague limit of every such subsequence is therefore uniquely determined (why?) to be $\mu$, and the corollary follows from Theorem 4.3.4.

Theorem 4.4.2. Let $\{\mu_n\}$ and $\mu$ be p.m.'s. Then $\mu_n \xrightarrow{v} \mu$ if and only if

(6) $\forall f \in C_B: \int f(x)\,\mu_n(dx) \to \int f(x)\,\mu(dx).$

PROOF. Suppose $\mu_n \xrightarrow{v} \mu$. Given $\epsilon > 0$, there exist $a$ and $b$ in $D$ such that

(7) $\mu((a, b]^c) = 1 - \mu((a, b]) < \epsilon.$

It follows from vague convergence that there exists $n_0(\epsilon)$ such that if $n \ge n_0(\epsilon)$, then

(8) $\mu_n((a, b]^c) = 1 - \mu_n((a, b]) < \epsilon.$

Let $f \in C_B$ be given and suppose that $|f| \le M < \infty$. Consider the function $f_\epsilon$, which is equal to $f$ on $[a, b]$, to zero on $(-\infty, a - 1) \cup (b + 1, \infty)$,

and which is linear in $[a - 1, a)$ and in $(b, b + 1]$. Then $f_\epsilon \in C_K$ and $|f - f_\epsilon| \le 2M$. We have by Theorem 4.4.1

(9) $\int f_\epsilon\,d\mu_n \to \int f_\epsilon\,d\mu.$

On the other hand, we have

(10) $\int |f - f_\epsilon|\,d\mu_n \le \int_{(a,b]^c} 2M\,d\mu_n \le 2M\epsilon$

by (8). A similar estimate holds with $\mu$ replacing $\mu_n$ above, by (7). Now the argument leading from (3) to (2) finishes the proof of (6) in the same way. This proves that $\mu_n \xrightarrow{v} \mu$ implies (6); the converse has already been proved in Theorem 4.4.1.

Theorems 4.3.3 and 4.3.4 deal with s.p.m.'s. Even if the given sequence $\{\mu_n\}$ consists only of strict p.m.'s, the sequential vague limit may not be so. This is the sense of Example 2 in Sec. 4.3. It is sometimes demanded that such a limit be a p.m. The following criterion is not deep, but applicable.

Theorem 4.4.3. Let a family of p.m.'s $\{\mu_\alpha, \alpha \in A\}$ be given on an arbitrary index set $A$. In order that every sequence of them contains a subsequence which converges vaguely to a p.m., it is necessary and sufficient that the following condition be satisfied: for any $\epsilon > 0$, there exists a finite interval $I$ such that

(11) $\inf_{\alpha \in A} \mu_\alpha(I) > 1 - \epsilon.$

PROOF. Suppose (11) holds. For any sequence $\{\mu_n\}$ from the family, there exists a subsequence $\{\mu'_n\}$ such that $\mu'_n \xrightarrow{v} \mu$. We show that $\mu$ is a p.m. Let $J$ be a continuity interval of $\mu$ which contains the $I$ in (11). Then

$$\mu(\mathscr{R}^1) \ge \mu(J) = \lim_n \mu'_n(J) \ge \limsup_n \mu'_n(I) \ge 1 - \epsilon.$$

Since $\epsilon$ is arbitrary, $\mu(\mathscr{R}^1) = 1$. Conversely, suppose the condition involving (11) is not satisfied; then there exists $\epsilon > 0$, a sequence of finite intervals $I_n$ increasing to $\mathscr{R}^1$, and a sequence $\{\mu_n\}$ from the family such that

$$\forall n: \mu_n(I_n) \le 1 - \epsilon.$$

Let $\{\mu'_n\}$ and $\mu$ be as before and $J$ any continuity interval of $\mu$. Then $J \subset I_n$ for all sufficiently large $n$, so that

$$\mu(J) = \lim_n \mu'_n(J) \le \limsup_n \mu'_n(I_n) \le 1 - \epsilon.$$

Since this holds for every continuity interval $J$, we have $\mu(\mathscr{R}^1) \le 1 - \epsilon$, so that $\mu$ is not a p.m.

A family of p.m.'s satisfying the condition above involving (11) is said to be tight. The preceding theorem can be stated as follows: a family of p.m.'s is relatively compact if and only if it is tight. The word "relatively" purports that the limit need not belong to the family; the word "compact" is an abbreviation of "sequentially vaguely convergent to a strict p.m." Extension of the result to p.m.'s in more general topological spaces is straightforward but plays an important role in the convergence of stochastic processes.

The new definition of vague convergence in Theorem 4.4.1 has the advantage over the older ones in that it can be at once carried over to measures in more general topological spaces. There is no substitute for "intervals" in such a space but the classes $C_K$, $C_0$ and $C_B$ are readily available. We will illustrate the general approach by indicating one more result in this direction.

Recall the notion of a lower semicontinuous function on $\mathscr{R}^1$, defined by:

(12) $\forall x \in \mathscr{R}^1: f(x) \le \liminf_{y \to x,\, y \ne x} f(y).$

There are several equivalent definitions (see, e.g., Royden [5]) but the following characterization is most useful: $f$ is bounded and lower semicontinuous if and only if there exists a sequence of functions $f_k \in C_B$ which increases to $f$ everywhere; and we call $f$ upper semicontinuous iff $-f$ is lower semicontinuous. Usually $f$ is allowed to be extended-valued; but to avoid complications we will deal with bounded functions only and denote by $L$ and $U$ respectively the classes of bounded lower semicontinuous and bounded upper semicontinuous functions.

Theorem 4.4.4. If $\{\mu_n\}$ and $\mu$ are p.m.'s, then $\mu_n \xrightarrow{v} \mu$ if and only if one of the two conditions below is satisfied:

(13) $\forall f \in L: \liminf_n \int f(x)\,\mu_n(dx) \ge \int f(x)\,\mu(dx);$
$\quad\ \ \forall g \in U: \limsup_n \int g(x)\,\mu_n(dx) \le \int g(x)\,\mu(dx).$

PROOF. We begin by observing that the two conditions above are equivalent by putting $f = -g$. Now suppose $\mu_n \xrightarrow{v} \mu$ and let $f_k \in C_B$, $f_k \uparrow f$. Then we have

(14) $\liminf_n \int f(x)\,\mu_n(dx) \ge \lim_n \int f_k(x)\,\mu_n(dx) = \int f_k(x)\,\mu(dx)$

by Theorem 4.4.2. Letting $k \to \infty$ the last integral above converges to $\int f(x)\,\mu(dx)$ by monotone convergence. This gives the first inequality in (13). Conversely, suppose the latter is satisfied and let $\varphi \in C_B$; then $\varphi$ belongs to

both $L$ and $U$, so that

$$\int \varphi(x)\,\mu(dx) \le \liminf_n \int \varphi(x)\,\mu_n(dx) \le \limsup_n \int \varphi(x)\,\mu_n(dx) \le \int \varphi(x)\,\mu(dx),$$

which proves

$$\lim_n \int \varphi(x)\,\mu_n(dx) = \int \varphi(x)\,\mu(dx).$$

Hence $\mu_n \xrightarrow{v} \mu$ by Theorem 4.4.2.

Remark. (13) remains true if, e.g., $f$ is lower semicontinuous, with $+\infty$ as a possible value but bounded below.

Corollary. The conditions in (13) may be replaced by the following:

for every open $O$: $\liminf_n \mu_n(O) \ge \mu(O)$;
for every closed $C$: $\limsup_n \mu_n(C) \le \mu(C)$.

We leave this as an exercise.

Finally, we return to the connection between the convergence of r.v.'s and that of their distributions.

DEFINITION OF CONVERGENCE "IN DISTRIBUTION" (in dist.). A sequence of r.v.'s $\{X_n\}$ is said to converge in distribution to $F$ iff the sequence $\{F_n\}$ of corresponding d.f.'s converges vaguely to the d.f. $F$.

If $X$ is an r.v. that has the d.f. $F$, then by an abuse of language we shall also say that $\{X_n\}$ converges in dist. to $X$.

Theorem 4.4.5. Let $\{F_n\}$, $F$ be the d.f.'s of the r.v.'s $\{X_n\}$, $X$. If $X_n \to X$ in pr., then $F_n \xrightarrow{v} F$. More briefly stated: convergence in pr. implies convergence in dist.

PROOF. If $X_n \to X$ in pr., then for each $f \in C_K$, we have $f(X_n) \to f(X)$ in pr., as easily seen from the uniform continuity of $f$ (actually this is true for any continuous $f$; see Exercise 10 of Sec. 4.1). Since $f$ is bounded the convergence holds also in $L^1$ by Theorem 4.1.4. It follows that

$$\mathscr{E}\{f(X_n)\} \to \mathscr{E}\{f(X)\},$$

which is just the relation (2) in another guise, hence $\mu_n \xrightarrow{v} \mu$.

Convergence of r.v.'s in dist. is merely a convenient turn of speech; it does not have the usual properties associated with convergence. For instance,

if $X_n \to X$ in dist. and $Y_n \to Y$ in dist., it does not follow by any means that $X_n + Y_n$ will converge in dist. to $X + Y$. This is in contrast to the true convergence concepts discussed before; cf. Exercises 3 and 6 of Sec. 4.1. But if $X_n$ and $Y_n$ are independent, then the preceding assertion is indeed true as a property of the convergence of convolutions of distributions (see Chapter 6). However, in the simple situation of the next theorem no independence assumption is needed. The result is useful in dealing with limit distributions in the presence of nuisance terms.

Theorem 4.4.6. If $X_n \to X$ in dist. and $Y_n \to 0$ in dist., then

(a) $X_n + Y_n \to X$ in dist.;
(b) $X_n Y_n \to 0$ in dist.

PROOF. We begin with the remark that for any constant $c$, $Y_n \to c$ in dist. is equivalent to $Y_n \to c$ in pr. (Exercise 4 below). To prove (a), let $f \in C_K$, $|f| \le M$. Since $f$ is uniformly continuous, given $\epsilon > 0$ there exists $\delta$ such that $|x - y| \le \delta$ implies $|f(x) - f(y)| \le \epsilon$. Hence we have

$$\mathscr{E}\{|f(X_n + Y_n) - f(X_n)|\} \le \epsilon\,\mathscr{P}\{|f(X_n + Y_n) - f(X_n)| \le \epsilon\} + 2M\,\mathscr{P}\{|f(X_n + Y_n) - f(X_n)| > \epsilon\} \le \epsilon + 2M\,\mathscr{P}\{|Y_n| > \delta\}.$$

The last-written probability tends to zero as $n \to \infty$; it follows that

$$\lim_{n\to\infty} \mathscr{E}\{f(X_n + Y_n)\} = \lim_{n\to\infty} \mathscr{E}\{f(X_n)\} = \mathscr{E}\{f(X)\}$$

by Theorem 4.4.1, and (a) follows by the same theorem.

To prove (b), for given $\epsilon > 0$ we choose $A_0$ so that both $\pm A_0$ are points of continuity of the d.f. of $X$, and so large that

$$\lim_n \mathscr{P}\{|X_n| > A_0\} = \mathscr{P}\{|X| > A_0\} < \epsilon.$$

This means $\mathscr{P}\{|X_n| > A_0\} < \epsilon$ for $n \ge n_0(\epsilon)$. Furthermore we choose $A \ge A_0$ so that the same inequality holds also for $n < n_0(\epsilon)$. Now it is clear that

$$\mathscr{P}\{|X_n Y_n| > \epsilon\} \le \mathscr{P}\{|X_n| > A\} + \mathscr{P}\left\{|Y_n| > \frac{\epsilon}{A}\right\} \le \epsilon + \mathscr{P}\left\{|Y_n| > \frac{\epsilon}{A}\right\},$$

and the last term tends to zero as $n \to \infty$; this proves (b).

98 I CONVERGENCE CONCEPTSEXERCISES*1. Let g, and A be p.m .'s such that An A . Show that the conclusion**in** (2) need not hold if (a) f is bounded and Borel measurable and all Anand A are absolutely cont**in**uous, or (b) f is cont**in**uous except at one po**in**tand every A„ is absolutely cont**in**uous . (To f**in**d even sharper counterexampleswould not be too easy, **in** view of Exercise 10 of Sec . 4 .5 .)2 . Let A, ~ A when the ,a 's are s .p.m .'s . Then for each f E C andeach f**in**ite cont**in**uity **in**terval I we have fl f dA„ -). rt f dµ .*3 . Let it, and It be as **in** Exercise 1 . If the f„ 's are bounded cont**in**uousfunctions converg**in**g uniformly to f, then f f „ dl-t,, -* f f d A .*4 . Give an example to show that convergence **in** dist . does not imply that**in** pr. However, show that convergence to the unit mass 8Q does imply that **in**pr. to the constant a .5. A set {µa } of p.m .'s is tight if and only if the correspond**in**g d .f.'s{F a } converge uniformly **in** a as x --> -oc and as x -+ +oo .6. Let the r .v .'s {X a } have the p .m .'s {A a } . If for some real r > 0,{ I Xa I'} is bounded **in** a, then {A a } is tight .7. Prove the Corollary to Theorem 4 .4 .4 .8 . If the r .v.'s X and Y satisfyJ'{ IX - Y I> E} < Efor some E, then their d .f.'s F and G satisfy**in**g the **in**equalities :(15) Vx E Z I : F(x - E) - E < G(x) < F(x + E) + E .Derive another proof of Theorem 4 .4 .5 from this .*9. The Levy distance of two s .d .f.'s F and G is def**in**ed to be the **in**fimumof all E > 0 satisfy**in**g the **in**equalities **in** (15) . Prove that this is **in**deed a metric**in** the space of s .d .f.'s, and that F„ converges to F **in** this metric if and onlyif F n --'*' Fand fxdF„~ f~dF .10 . F**in**d two sequences of p .m .'s (,a,,) and {v„ } such thatbut for no f**in**ite (a, b) is it true thatVfECK :J fdA„- / fdv„>0 ;An (a, b) - v„ (a, b) - 0 .[HINT : Let A„ = 8,,, v„ = &,, and choose {r„}, {s„} suitably .]11 . 
Let $\{\mu_n\}$ be a sequence of p.m.'s such that for each $f \in C_B$, the sequence $\int f\,d\mu_n$ converges; then $\mu_n \overset{v}{\to} \mu$, where $\mu$ is a p.m. [HINT: If the

hypothesis is strengthened to include every $f$ in $C$, and convergence of real numbers is interpreted as usual as convergence to a finite limit, the result is easy by taking an $f$ going to $\infty$. In general one may proceed by contradiction using an $f$ that oscillates at infinity.]
*12. Let $F_n$ and $F$ be d.f.'s such that $F_n \overset{v}{\to} F$. Define $G_n(\theta)$ and $G(\theta)$ as in Exercise 4 of Sec. 3.1. Then $G_n(\theta) \to G(\theta)$ a.e. [HINT: Do this first when $F_n$ and $F$ are continuous and strictly increasing. The general case is obtained by smoothing $F_n$ and $F$ by convoluting with a uniform distribution in $[-\delta, +\delta]$ and letting $\delta \downarrow 0$; see the end of Sec. 6.1 below.]

4.5 Uniform integrability; convergence of moments

The function $|x|^r$, $r > 0$, is in $C$ but not in $C_B$, hence Theorem 4.4.2 does not apply to it. Indeed, we have seen in Example 2 of Sec. 4.1 that even convergence a.e. does not imply convergence of any moment of order $r > 0$. For, given $r$, a slight modification of that example will yield $X_n \to X$ a.e., $E(|X_n|^r) = 1$ but $E(|X|^r) = 0$.

Nevertheless a one-sided relation always holds, and will be referred to below.

Theorem 4.5.1. If $X_n \to X$ a.e., then for every $r > 0$:

$$E(|X|^r) \le \liminf_{n\to\infty} E(|X_n|^r);$$

if, in addition, $X_n \to X$ in $L^r$, then $E(|X_n|^r) \to E(|X|^r)$.

PROOF. The first assertion follows at once from Fatou's lemma. For the second, when $r \ge 1$ Minkowski's inequality gives $|\,\|X_n\|_r - \|X\|_r\,| \le \|X_n - X\|_r$, while for $0 < r < 1$ the elementary inequality $|\,|a|^r - |b|^r\,| \le |a - b|^r$ gives $|E(|X_n|^r) - E(|X|^r)| \le E(|X_n - X|^r)$.

Theorem 4.5.2. If $\{X_n\}$ converges in dist. to $X$, and for some $p > 0$, $\sup_n E\{|X_n|^p\} = M < \infty$, then for each $r < p$:

(2) $$\lim_{n\to\infty} E(|X_n|^r) = E(|X|^r) < \infty.$$

PROOF. Let $F_n$ and $F$ be the d.f.'s of $X_n$ and $X$; for $A > 0$ define

$$f_A(x) = \begin{cases} |x|^r, & \text{if } |x| \le A;\\ A^r, & \text{if } |x| > A. \end{cases}$$

Then $f_A \in C_B$, hence by Theorem 4.4.4 the "truncated moments" converge:

$$\int_{-\infty}^{\infty} f_A(x)\,dF_n(x) \to \int_{-\infty}^{\infty} f_A(x)\,dF(x).$$

Next we have

$$\int_{-\infty}^{\infty} \big|f_A(x) - |x|^r\big|\,dF_n(x) \le \int_{|x|>A} |x|^r\,dF_n(x) = \int_{|X_n|>A} |X_n|^r\,dP \le \frac{1}{A^{p-r}} \int |X_n|^p\,dP \le \frac{M}{A^{p-r}}.$$

The last term does not depend on $n$, and converges to zero as $A \to \infty$. It follows that as $A \to \infty$, $\int_{-\infty}^{\infty} f_A\,dF_n$ converges uniformly in $n$ to $\int_{-\infty}^{\infty} |x|^r\,dF_n$. Hence by a standard theorem on the inversion of repeated limits, we have

(4) $$\int_{-\infty}^{\infty} |x|^r\,dF = \lim_{A\to\infty} \int_{-\infty}^{\infty} f_A\,dF = \lim_{A\to\infty} \lim_{n\to\infty} \int_{-\infty}^{\infty} f_A\,dF_n = \lim_{n\to\infty} \lim_{A\to\infty} \int_{-\infty}^{\infty} f_A\,dF_n = \lim_{n\to\infty} \int_{-\infty}^{\infty} |x|^r\,dF_n.$$

We now introduce the concept of uniform integrability, which is of basic importance in this connection. It is also an essential hypothesis in certain convergence questions arising in the theory of martingales (to be treated in Chapter 9).

DEFINITION OF UNIFORM INTEGRABILITY. A family of r.v.'s $\{X_t\}$, $t \in T$, where $T$ is an arbitrary index set, is said to be uniformly integrable iff

(5) $$\lim_{A\to\infty} \int_{|X_t|>A} |X_t|\,dP = 0$$

uniformly in $t \in T$.
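The definition (5) can be made concrete with a hand-computed family. The sketch below is an illustration of mine, not from the text; the function names and the two chosen families are assumptions. On $([0,1]$ with Lebesgue measure$)$, the family $X_n = n\,1_{[0,1/n]}$ has $E(|X_n|) = 1$ for all $n$ yet fails (5), while $Y_n = \sqrt{n}\,1_{[0,1/n]}$ satisfies it; both tail integrals can be written in closed form.

```python
def tail_integral(kind, n, A):
    # Closed-form value of the integral in (5) over {|X| > A} for two
    # families on ([0,1], Lebesgue measure):
    #   kind "X": X_n = n on [0, 1/n], 0 elsewhere   -> n * (1/n) = 1
    #   kind "Y": Y_n = sqrt(n) on [0, 1/n], 0 elsewhere -> 1/sqrt(n)
    if kind == "X":
        return 1.0 if n > A else 0.0
    if kind == "Y":
        return n ** 0.5 / n if n ** 0.5 > A else 0.0
    raise ValueError(kind)

def sup_tail(kind, A, n_max=10**5):
    # Supremum over the family of the tail integral at level A; uniform
    # integrability means this supremum tends to 0 as A -> infinity.
    return max(tail_integral(kind, n, A) for n in range(1, n_max + 1))
```

Here `sup_tail("X", A)` equals 1 for every level $A$ (the unit of mass escapes to infinity, so the family is not uniformly integrable), while `sup_tail("Y", A)` is of order $1/A$ and tends to zero.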

Theorem 4.5.3. The family $\{X_t\}$ is uniformly integrable if and only if the following two conditions are satisfied:

(a) $E(|X_t|)$ is bounded in $t \in T$;
(b) For every $\epsilon > 0$, there exists $\delta(\epsilon) > 0$ such that for any $E \in \mathscr{F}$:

$$P(E) < \delta(\epsilon) \implies \int_E |X_t|\,dP < \epsilon \quad \text{for every } t \in T.$$

PROOF. Clearly (5) implies (a). Next, let $E \in \mathscr{F}$ and write $E_t$ for the set $\{\omega : |X_t(\omega)| > A\}$. We have by the mean value theorem

$$\int_E |X_t|\,dP = \int_{E \cap E_t} |X_t|\,dP + \int_{E \setminus E_t} |X_t|\,dP \le \int_{E_t} |X_t|\,dP + A\,P(E).$$

Given $\epsilon > 0$, there exists $A = A(\epsilon)$ such that the last-written integral over $E_t$ is less than $\epsilon/2$ for every $t$, by (5). Hence (b) will follow if we set $\delta = \epsilon/2A$. Thus (5) implies (b).

Conversely, suppose that (a) and (b) are true. Then by the Chebyshev inequality we have for every $t$,

$$P\{|X_t| > A\} \le \frac{E(|X_t|)}{A} \le \frac{M}{A},$$

where $M$ is the bound indicated in (a). Hence if $A > M/\delta$, then $P(E_t) < \delta$ and we have by (b):

$$\int_{E_t} |X_t|\,dP < \epsilon.$$

Thus (5) is true.

Theorem 4.5.4. Let $0 < r < \infty$, $X_n \in L^r$, and $X_n \to X$ in pr. Then the following three propositions are equivalent:

(i) $\{|X_n|^r\}$ is uniformly integrable;
(ii) $X_n \to X$ in $L^r$;
(iii) $E(|X_n|^r) \to E(|X|^r) < \infty$.

PROOF. Suppose (i) is true; since $X_n \to X$ in pr., by Theorem 4.2.3 there exists a subsequence $\{n_k\}$ such that $X_{n_k} \to X$ a.e. By Theorem 4.5.1 and (a) above, $X \in L^r$. The easy inequality

$$|X_n - X|^r \le 2^r\{|X_n|^r + |X|^r\},$$

valid for all $r > 0$, together with Theorem 4.5.3 then implies that the sequence $\{|X_n - X|^r\}$ is also uniformly integrable. For each $\epsilon > 0$, we have

(6) $$\int |X_n - X|^r\,dP = \int_{|X_n - X| > \epsilon} |X_n - X|^r\,dP + \int_{|X_n - X| \le \epsilon} |X_n - X|^r\,dP \le \int_{|X_n - X| > \epsilon} |X_n - X|^r\,dP + \epsilon^r.$$

Since $P\{|X_n - X| > \epsilon\} \to 0$ as $n \to \infty$ by hypothesis, it follows from (b) above that the first integral on the right side of (6) also tends to zero. This being true for every $\epsilon > 0$, (ii) follows.

Suppose (ii) is true; then we have (iii) by the second assertion of Theorem 4.5.1.

Finally suppose (iii) is true. To prove (i), let $A > 0$ and consider a function $f_A$ in $C_K$ satisfying the following conditions:

$$f_A(x) = |x|^r \ \text{for } |x| \le A; \quad 0 \le f_A(x) \le |x|^r \ \text{for } A < |x| < A + 1; \quad f_A(x) = 0 \ \text{for } |x| \ge A + 1;$$

cf. the proof of Theorem 4.4.1 for the construction of such a function. Hence we have

$$\varlimsup_{n\to\infty} \int_{|X_n| > A+1} |X_n|^r\,dP \le \lim_{n\to\infty} \left[ E(|X_n|^r) - E\{f_A(X_n)\} \right] = E(|X|^r) - E\{f_A(X)\} \le \int_{|X| > A} |X|^r\,dP.$$

The last integral does not depend on $n$ and converges to zero as $A \to \infty$. This means: for any $\epsilon > 0$, there exists $A_0 = A_0(\epsilon)$ and $n_0 = n_0(A_0(\epsilon))$ such that we have

$$\sup_{n > n_0} \int_{|X_n| > A+1} |X_n|^r\,dP < \epsilon$$

provided that $A \ge A_0$. Since each $|X_n|^r$ is integrable, there exists $A_1 = A_1(\epsilon)$ such that the supremum above may be taken over all $n \ge 1$ provided that $A \ge A_0 \vee A_1$. This establishes (i), and completes the proof of the theorem.

In the remainder of this section the term "moment" will be restricted to a moment of positive integral order. It is well known (see Exercise 5 of Sec. 6.6) that on $([0,1], \mathscr{B})$ any p.m., or equivalently its d.f., is uniquely determined by its moments of all orders. Precisely, if $F_1$ and $F_2$ are two d.f.'s such that

$F_i(0) = 0$, $F_i(1) = 1$ for $i = 1, 2$; and

$$\forall n \ge 1:\quad \int_0^1 x^n\,dF_1(x) = \int_0^1 x^n\,dF_2(x),$$

then $F_1 \equiv F_2$. The corresponding result is false in $(\mathbb{R}^1, \mathscr{B}^1)$ and a further condition on the moments is required to ensure uniqueness. The sufficient condition due to Carleman is as follows:

$$\sum_{r=1}^{\infty} \frac{1}{m_{2r}^{1/2r}} = +\infty,$$

where $m_r$ denotes the moment of order $r$. When a given sequence of numbers $\{m_r, r \ge 1\}$ uniquely determines a d.f. $F$ such that

(7) $$m_r = \int x^r\,dF(x),$$

we say that "the moment problem is determinate" for the sequence. Of course an arbitrary sequence of numbers need not be the sequence of moments for any d.f.; a necessary but far from sufficient condition, for example, is that the Liapounov inequality (Sec. 3.2) be satisfied. We shall not go into these questions here but shall content ourselves with the useful result below, which is often referred to as the "method of moments"; see also Theorem 6.4.5.

Theorem 4.5.5. Suppose there is a unique d.f. $F$ with the moments $\{m^{(r)}, r \ge 1\}$, all finite. Suppose that $\{F_n\}$ is a sequence of d.f.'s, each of which has all its moments finite:

$$m_n^{(r)} = \int x^r\,dF_n.$$

Finally, suppose that for every $r \ge 1$:

(8) $$\lim_{n\to\infty} m_n^{(r)} = m^{(r)}.$$

Then $F_n \overset{v}{\to} F$.

PROOF. Let $\mu_n$ be the p.m. corresponding to $F_n$. By Theorem 4.3.3 there exists a subsequence of $\{\mu_n\}$ that converges vaguely. Let $\{\mu_{n_k}\}$ be any subsequence converging vaguely to some $\mu$. We shall show that $\mu$ is indeed a p.m. with the d.f. $F$. By the Chebyshev inequality, we have for each $A > 0$:

$$\mu_{n_k}(-A, +A) \ge 1 - A^{-2} m_{n_k}^{(2)}.$$

Since $m_{n_k}^{(2)} \to m^{(2)} < \infty$, it follows that as $A \to \infty$, the left side converges uniformly in $k$ to one. Letting $A \to \infty$ along a sequence of points such that

both $\pm A$ belong to the dense set $D$ involved in the definition of vague convergence, we obtain as in (4) above:

$$\mu(\mathbb{R}^1) = \lim_{A\to\infty} \mu(-A, +A) = \lim_{A\to\infty} \lim_{k\to\infty} \mu_{n_k}(-A, +A) = \lim_{k\to\infty} \lim_{A\to\infty} \mu_{n_k}(-A, +A) = \lim_{k\to\infty} \mu_{n_k}(\mathbb{R}^1) = 1.$$

Now for each $r$, let $p$ be the next larger even integer. We have

$$\int x^p\,d\mu_{n_k} = m_{n_k}^{(p)} \to m^{(p)},$$

hence $m_{n_k}^{(p)}$ is bounded in $k$. It follows from Theorem 4.5.2 that

$$\int x^r\,d\mu_{n_k} \to \int x^r\,d\mu.$$

But the left side also converges to $m^{(r)}$ by (8). Hence by the uniqueness hypothesis $\mu$ is the p.m. determined by $F$. We have therefore proved that every vaguely convergent subsequence of $\{\mu_n\}$, or equivalently $\{F_n\}$, has the same limit $\mu$, or equivalently $F$. Hence the theorem follows from Theorem 4.3.4.

EXERCISES

1. If $\sup_n |X_n| \in L^p$ and $X_n \to X$ a.e., then $X \in L^p$ and $X_n \to X$ in $L^p$.
2. If $\{X_n\}$ is dominated by some $Y$ in $L^p$ and converges in dist. to $X$, then $E(|X_n|^p) \to E(|X|^p)$.
3. If $X_n \to X$ in dist., and $f \in C$, then $f(X_n) \to f(X)$ in dist.
*4. Exercise 3 may be reduced to Exercise 10 of Sec. 4.1 as follows. Let $F_n$, $1 \le n \le \infty$, be d.f.'s such that $F_n \overset{v}{\to} F_\infty$. Let $\theta$ be uniformly distributed on $[0, 1]$ and put $X_n = F_n^{-1}(\theta)$, $1 \le n \le \infty$, where $F_n^{-1}(y) = \sup\{x : F_n(x) \le y\}$ (cf. Exercise 4 of Sec. 3.1). Then $X_n$ has d.f. $F_n$ and $X_n \to X_\infty$ in pr.
5. Find the moments of the normal d.f. $\Phi$ and the positive normal d.f. $\Phi^+$ below:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-y^2/2}\,dy; \qquad \Phi^+(x) = \begin{cases} \sqrt{\dfrac{2}{\pi}} \displaystyle\int_0^x e^{-y^2/2}\,dy, & \text{if } x \ge 0;\\ 0, & \text{if } x < 0. \end{cases}$$

Show that in either case the moments satisfy Carleman's condition.
6. If $\{X_t\}$ and $\{Y_t\}$ are uniformly integrable, then so is $\{X_t + Y_t\}$.
7. If $\{X_n\}$ is dominated by some $Y$ in $L^1$ or if it is identically distributed with finite mean, then it is uniformly integrable.

*8. If $\sup_n E(|X_n|^p) < \infty$ for some $p > 1$, then $\{X_n\}$ is uniformly integrable.
9. If $\{X_n\}$ is uniformly integrable, then the sequence

$$\left\{ \frac{1}{n} \sum_{j=1}^n X_j \right\}$$

is uniformly integrable.
*10. Suppose the distributions of $\{X_n, 1 \le n \le \infty\}$ are absolutely continuous with densities $\{g_n\}$ such that $g_n \to g_\infty$ in Lebesgue measure. Then $g_n \to g_\infty$ in $L^1(-\infty, \infty)$, and consequently for every bounded Borel measurable function $f$ we have $E\{f(X_n)\} \to E\{f(X_\infty)\}$. [HINT: $\int (g_\infty - g_n)^+\,dx = \int (g_\infty - g_n)^-\,dx$ and $(g_\infty - g_n)^+ \le g_\infty$; use dominated convergence.]
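To see the method of moments (Theorem 4.5.5) in action numerically, here is a small sketch of mine, not from the text; the function names and the chosen family of d.f.'s are illustrative assumptions. The d.f. $F_n$ placing mass $1/n$ at each of the points $k/n$, $k = 1, \ldots, n$, has $r$-th moment converging to $1/(r+1)$, the $r$-th moment of the uniform distribution on $[0, 1]$; since the moment problem on $[0, 1]$ is determinate, Theorem 4.5.5 yields $F_n \overset{v}{\to}$ the uniform d.f.

```python
def moment(n, r):
    # r-th moment of F_n, the d.f. with mass 1/n at k/n for k = 1..n
    return sum((k / n) ** r for k in range(1, n + 1)) / n

def uniform_moment(r):
    # r-th moment of the uniform distribution on [0, 1]
    return 1.0 / (r + 1)

# gap between the moments of F_n and the limit moments, for r = 1..5
gaps = [abs(moment(10_000, r) - uniform_moment(r)) for r in range(1, 6)]
```

The gaps are of order $r/(2n)$, so all the moment conditions (8) are visibly satisfied long before the d.f.'s themselves are compared.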

5 Law of large numbers. Random series

5.1 Simple limit theorems

The various concepts of Chapter 4 will be applied to the so-called "law of large numbers", a famous name in the theory of probability. This has to do with partial sums

$$S_n = \sum_{j=1}^n X_j$$

of a sequence of r.v.'s. In the most classical formulation, the "weak" or the "strong" law of large numbers is said to hold for the sequence according as

(1) $$\frac{S_n - E(S_n)}{n} \to 0$$

in pr. or a.e. This, of course, presupposes the finiteness of $E(S_n)$. A natural generalization is as follows:

$$\frac{S_n - a_n}{b_n} \to 0,$$

where $\{a_n\}$ is a sequence of real numbers and $\{b_n\}$ a sequence of positive numbers tending to infinity. We shall present several stages of the development,

even though they overlap each other to some extent, for it is just as important to learn the basic techniques as the results themselves.

The simplest cases follow from Theorems 4.1.4 and 4.2.3, according to which if $\{Z_n\}$ is any sequence of r.v.'s, then $E(Z_n^2) \to 0$ implies that $Z_n \to 0$ in pr. and $Z_{n_k} \to 0$ a.e. for a subsequence $\{n_k\}$. Applied to $Z_n = S_n/n$, the first assertion becomes

(2) $$E(S_n^2) = o(n^2) \implies \frac{S_n}{n} \to 0 \ \text{in pr.}$$

Now we can calculate $E(S_n^2)$ more explicitly as follows:

(3) $$E(S_n^2) = \sum_{j=1}^n E(X_j^2) + 2 \sum_{1 \le j < k \le n} E(X_j X_k).$$

The sequence $\{X_n\}$ is said to be uncorrelated iff

(4) $$\forall j \ne k:\quad E(X_j X_k) = E(X_j)E(X_k);$$

it is said to be orthogonal iff

(5) $$\forall j \ne k:\quad E(X_j X_k) = 0.$$

Both definitions presuppose the finiteness of all the second moments;

without it the definitions are hardly useful. Finally, it is obvious that pairwise independence implies uncorrelatedness, provided second moments are finite.

If $\{X_n\}$ is a sequence of uncorrelated r.v.'s, then the sequence $\{X_n - E(X_n)\}$ is orthogonal, and for the latter (3) reduces to the fundamental relation below:

(6) $$\sigma^2(S_n) = \sum_{j=1}^n \sigma^2(X_j),$$

which may be called the "additivity of the variance". Conversely, the validity of (6) for $n = 2$ implies that $X_1$ and $X_2$ are uncorrelated. There are only $n$ terms on the right side of (6), hence if these are bounded by a fixed constant we have $\sigma^2(S_n) = O(n) = o(n^2)$. Thus (2) becomes applicable, and we have proved the following result.

Theorem 5.1.1. If the $X_j$'s are uncorrelated and their second moments have a common bound, then (1) is true in $L^2$ and hence also in pr.

This simple theorem is actually due to Chebyshev, who invented his famous inequalities for its proof. The next result, due to Rajchman (1932), strengthens the conclusion by proving convergence a.e. This result is interesting by virtue of its simplicity, and serves well to introduce an important method, that of taking subsequences.

Theorem 5.1.2. Under the same hypotheses as in Theorem 5.1.1, (1) holds also a.e.

PROOF. Without loss of generality we may suppose that $E(X_j) = 0$ for each $j$, so that the $X_j$'s are orthogonal. We have by (6):

$$E(S_n^2) \le Mn,$$

where $M$ is a bound for the second moments. It follows by Chebyshev's inequality that for each $\epsilon > 0$ we have

$$P\{|S_n| > n\epsilon\} \le \frac{Mn}{n^2\epsilon^2} = \frac{M}{n\epsilon^2}.$$

If we sum this over $n$, the resulting series on the right diverges. However, if we confine ourselves to the subsequence $\{n^2\}$, then

$$\sum_n P\{|S_{n^2}| > n^2\epsilon\} \le \sum_n \frac{M}{n^2\epsilon^2} < \infty.$$

Hence by Theorem 4.2.1 (Borel-Cantelli) we have

(7) $$P\{|S_{n^2}| > n^2\epsilon \ \text{i.o.}\} = 0;$$

and consequently by Theorem 4.2.2

(8) $$\frac{S_{n^2}}{n^2} \to 0 \quad \text{a.e.}$$

We have thus proved the desired result for a subsequence; and the "method of subsequences" aims in general at extending to the whole sequence a result proved (relatively easily) for a subsequence. In the present case we must show that $S_k$ does not differ enough from the nearest $S_{n^2}$ to make any real difference.

Put for each $n \ge 1$:

$$D_n = \max_{n^2 \le k < (n+1)^2} |S_k - S_{n^2}|.$$

Then we have, since $D_n^2 \le \sum_k (S_k - S_{n^2})^2$ and the $X_j$'s are orthogonal,

$$E(D_n^2) \le \sum_{k=n^2}^{(n+1)^2 - 1} E\{(S_k - S_{n^2})^2\} \le (2n+1)(2n+1)M = (2n+1)^2 M,$$

because each of the $2n + 1$ summands is at most $(2n+1)M$ by (6). Hence by Chebyshev's inequality,

$$P\{D_n > n^2\epsilon\} \le \frac{(2n+1)^2 M}{n^4 \epsilon^2},$$

and the series over $n$ converges, so that $D_n/n^2 \to 0$ a.e. just as before. Now for $n^2 \le k < (n+1)^2$ we have

$$\frac{|S_k|}{k} \le \frac{|S_{n^2}| + D_n}{n^2},$$

and the right side tends to zero a.e. by (8) and what has just been proved. This completes the proof of Theorem 5.1.2.
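The conclusion of Theorem 5.1.2 is easy to watch by simulation. The sketch below is my own illustration, not part of the text; the choice of independent $\pm 1$ summands (orthogonal, with second moment $1$) is an assumption made so that the hypotheses hold trivially, and `ratios` is a name of my choosing.

```python
import random

def ratios(N, seed=11):
    # Partial sums S_n of independent +-1 r.v.'s (mean 0, variance 1),
    # returned as the normalized sequence S_n / n, which by
    # Theorem 5.1.2 tends to 0 a.e.
    rng = random.Random(seed)
    s, out = 0, []
    for n in range(1, N + 1):
        s += rng.choice((-1, 1))
        out.append(s / n)
    return out

r = ratios(100_000)
```

Along the subsequence $n^2$ the smallness is visible for quite modest $n$ already, which is exactly the step the proof establishes first.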

Borel's theorem on "normal numbers" furnishes a striking application. Let each $\omega$ in $[0, 1]$ have the decimal expansion $\omega = .\xi_1 \xi_2 \cdots \xi_n \cdots$, and for a fixed digit $k$, $0 \le k \le 9$, let $v_k^{(n)}(\omega)$ denote the number of digits among the first $n$ digits of $\omega$ that are equal to $k$. Then $v_k^{(n)}(\omega)/n$ is the relative frequency of the digit $k$ in the first $n$ places, and the limit, if existing:

$$\lim_{n\to\infty} \frac{v_k^{(n)}(\omega)}{n} = \varphi_k(\omega),$$

may be called the frequency of $k$ in $\omega$. The number $\omega$ is called simply normal (to the scale 10) iff this limit exists for each $k$ and is equal to $1/10$. Intuitively all ten possibilities should be equally likely for each digit of a number picked "at random". On the other hand, one can write down "at random" any number of numbers that are "abnormal" according to the definition given, such as $.1111\cdots$, while it is a relatively difficult matter to name even one normal number in the sense of Exercise 5 below. It turns out that the number

$$.12345678910111213\cdots,$$

which is obtained by writing down in succession all the natural numbers in the decimal system, is a normal number to the scale 10 even in the stringent definition of Exercise 5 below, but the proof is not so easy. As for determining whether certain well-known numbers such as $e - 2$ or $\pi - 3$ are normal, the problem seems beyond the reach of our present capability for mathematics. In spite of these difficulties, Borel's theorem below asserts that in a perfectly precise sense almost every number is normal. Furthermore, this striking proposition is merely a very particular case of Theorem 5.1.2 above.

Theorem 5.1.3. Except for a Borel set of measure zero, every number in $[0, 1]$ is simply normal.

PROOF. Consider the probability space $([0,1], \mathscr{B}, m)$ in Example 2 of Sec. 2.2. Let $Z$ be the subset of numbers of the form $m/10^n$ for integers $n \ge 1$, $m \ge 1$; then $m(Z) = 0$. If $\omega \in [0,1] \setminus Z$, then it has a unique decimal expansion; if $\omega \in Z$, it has two such expansions, but we agree to use the "terminating" one for the sake of definiteness. Thus we have

$$\omega = .\xi_1 \xi_2 \cdots \xi_n \cdots,$$

where for each $n \ge 1$, $\xi_n(\cdot)$ is a Borel measurable function of $\omega$. Just as in Example 4 of Sec. 3.3, the sequence $\{\xi_n, n \ge 1\}$ is a sequence of independent r.v.'s with

$$P\{\xi_n = k\} = \frac{1}{10}, \qquad k = 0, 1, \ldots, 9.$$

Indeed according to Theorem 5.1.2 we need only verify that the $\xi_n$'s are uncorrelated, which is a very simple matter. For a fixed $k$ we define the

r.v. $X_n$ to be the indicator of the set $\{\omega : \xi_n(\omega) = k\}$; then $E(X_n) = 1/10$, the $X_n$'s are uncorrelated (being pairwise independent with bounded second moments), and $\sum_{j=1}^n X_j = v_k^{(n)}$. Hence Theorem 5.1.2 yields

$$\frac{v_k^{(n)}}{n} \to \frac{1}{10} \quad \text{a.e.},$$

and the theorem follows by taking the union of the ten exceptional null sets corresponding to $k = 0, 1, \ldots, 9$.

large numbers. Without using Theorem 5.1.2 we may operate with $E(S_n^4/n^4)$ as we did with $E(S_n^2/n^2)$. Note that the full strength of independence is not needed.]
5. We may strengthen the definition of a normal number by considering blocks of digits. Let $r \ge 1$, and consider the successive overlapping blocks of $r$ consecutive digits in a decimal; there are $n - r + 1$ such blocks in the first $n$ places. Let $v^{(n)}(\omega)$ denote the number of such blocks that are identical with a given one; for example, if $r = 5$, the given block may be "21212". Prove that for a.e. $\omega$, we have for every $r$:

$$\lim_{n\to\infty} \frac{v^{(n)}(\omega)}{n} = \frac{1}{10^r}.$$

[HINT: Reduce the problem to disjoint blocks, which are independent.]
*6. The above definition may be further strengthened if we consider different scales of expansion. A real number in $[0, 1]$ is said to be completely normal iff the relative frequency of each block of length $r$ in the scale $s$ tends to the limit $1/s^r$ for every $s$ and $r$. Prove that almost every number in $[0, 1]$ is completely normal.
7. Let $\alpha$ be completely normal. Show that by looking at the expansion of $\alpha$ in some scale we can rediscover the complete works of Shakespeare from end to end without a single misprint or interruption. [This is Borel's paradox.]
*8. Let $X$ be an arbitrary r.v. with an absolutely continuous distribution. Prove that with probability one the fractional part of $X$ is a normal number. [HINT: Let $N$ be the set of normal numbers and consider $P\{X - [X] \in N\}$.]
9. Prove that the set of real numbers in $[0, 1]$ whose decimal expansions do not contain the digit 2 is of measure zero. Deduce from this the existence of two sets $A$ and $B$ both of measure zero such that every real number is representable as a sum $a + b$ with $a \in A$, $b \in B$.
*10. Is the sum of two normal numbers, modulo 1, normal?
Is the product? [HINT: Consider the differences between a fixed abnormal number and all normal numbers: this is a set of probability one.]

5.2 Weak law of large numbers

The law of large numbers in the form (1) of Sec. 5.1 involves only the first moment, but so far we have operated with the second. In order to drop any assumption on the second moment, we need a new device, that of "equivalent sequences", due to Khintchine (1894-1959).

DEFINITION. Two sequences of r.v.'s $\{X_n\}$ and $\{Y_n\}$ are said to be equivalent iff

(1) $$\sum_n P\{X_n \ne Y_n\} < \infty.$$

In practice, an equivalent sequence is obtained by "truncating" in various ways, as we shall see presently.

Theorem 5.2.1. If $\{X_n\}$ and $\{Y_n\}$ are equivalent, then

$$\sum_n (X_n - Y_n) \quad \text{converges a.e.}$$

Furthermore if $a_n \uparrow \infty$, then

(2) $$\frac{1}{a_n} \sum_{j=1}^n (X_j - Y_j) \to 0 \quad \text{a.e.}$$

PROOF. By the Borel-Cantelli lemma, (1) implies that

$$P\{X_n \ne Y_n \ \text{i.o.}\} = 0.$$

This means that there exists a null set $N$ with the following property: if $\omega \in \Omega \setminus N$, then there exists $n_0(\omega)$ such that

$$n \ge n_0(\omega) \implies X_n(\omega) = Y_n(\omega).$$

Thus for such an $\omega$, the two numerical sequences $\{X_n(\omega)\}$ and $\{Y_n(\omega)\}$ differ only in a finite number of terms (how many depending on $\omega$). In other words, the series

$$\sum_n (X_n(\omega) - Y_n(\omega))$$

consists of zeros from a certain point on. Both assertions of the theorem are trivial consequences of this fact.

Corollary. With probability one, the expression

$$\sum_n X_n \quad \text{or} \quad \frac{1}{a_n} \sum_{j=1}^n X_j$$

converges, diverges to $+\infty$ or $-\infty$, or fluctuates in the same way as

$$\sum_n Y_n \quad \text{or} \quad \frac{1}{a_n} \sum_{j=1}^n Y_j,$$

respectively. In particular, if

$$\frac{1}{a_n} \sum_{j=1}^n X_j$$

converges to $X$ in pr., then so does

$$\frac{1}{a_n} \sum_{j=1}^n Y_j.$$

To prove the last assertion of the corollary, observe that by Theorem 4.1.2 the relation (2) holds also in pr. Hence if $(1/a_n) \sum_{j=1}^n X_j \to X$ in pr., then we have

(3) $$\frac{1}{a_n} \sum_{j=1}^n Y_j = \frac{1}{a_n} \sum_{j=1}^n (Y_j - X_j) + \frac{1}{a_n} \sum_{j=1}^n X_j \to 0 + X = X \quad \text{in pr.}$$

(see Exercise 3 of Sec. 4.1).

The next law of large numbers is due to Khintchine. Under the stronger hypothesis of total independence, it will be proved again by an entirely different method in Chapter 6.

Theorem 5.2.2. Let $\{X_n\}$ be pairwise independent and identically distributed r.v.'s with finite mean $m$. Then we have

$$\frac{S_n}{n} \to m \quad \text{in pr.}$$

PROOF. Let the common d.f. be $F$ so that

$$m = E(X_n) = \int_{-\infty}^{\infty} x\,dF(x), \qquad E(|X_n|) = \int_{-\infty}^{\infty} |x|\,dF(x) < \infty.$$

By Theorem 3.2.1 the finiteness of $E(|X_1|)$ is equivalent to

(4) $$\sum_n P\{|X_1| > n\} < \infty.$$

We introduce a sequence of r.v.'s $\{Y_n\}$ by "truncating at $n$":

$$Y_n(\omega) = \begin{cases} X_n(\omega), & \text{if } |X_n(\omega)| \le n;\\ 0, & \text{if } |X_n(\omega)| > n. \end{cases}$$

This is equivalent to $\{X_n\}$ by (4), since $P(|X_n| > n) = P(X_n \ne Y_n)$. Let

$$T_n = \sum_{j=1}^n Y_j.$$

By the corollary above, (3) will follow if (and only if) we can prove $T_n/n \to m$ in pr. Now the $Y_n$'s are also pairwise independent by Theorem 3.3.1 (applied to each pair), hence they are uncorrelated, since each, being bounded, has a finite second moment. Let us calculate $\sigma^2(T_n)$; we have by (6) of Sec. 5.1,

$$\sigma^2(T_n) = \sum_{j=1}^n \sigma^2(Y_j) \le \sum_{j=1}^n E(Y_j^2) = \sum_{j=1}^n \int_{|x| \le j} x^2\,dF(x).$$

For each $j$ we have

$$\int_{|x| \le j} x^2\,dF(x) \le \sqrt{j} \int_{|x| \le \sqrt{j}} |x|\,dF(x) + j \int_{\sqrt{j} < |x| \le j} |x|\,dF(x) = o(j),$$

since $E(|X_1|) < \infty$ and $\int_{|x| > \sqrt{j}} |x|\,dF(x) \to 0$ as $j \to \infty$. Hence $\sigma^2(T_n) = \sum_{j=1}^n o(j) = o(n^2)$;

consequently, by (2) of Sec. 5.1,

$$\frac{T_n - E(T_n)}{n} \to 0 \quad \text{in pr.}$$

Now it is clear that as $n \to \infty$, $E(Y_n) \to E(X_1) = m$; hence also

$$\frac{1}{n} \sum_{j=1}^n E(Y_j) \to m.$$

It follows that

$$\frac{T_n}{n} \to m \quad \text{in pr.},$$

as was to be proved.

For totally independent r.v.'s, necessary and sufficient conditions for the weak law of large numbers in the most general formulation, due to Kolmogorov and Feller, are known. The sufficiency of the following criterion is easily proved, but we omit the proof of its necessity (cf. Gnedenko and Kolmogorov [12]).

Theorem 5.2.3. Let $\{X_n\}$ be a sequence of independent r.v.'s with d.f.'s $\{F_n\}$; and $S_n = \sum_{j=1}^n X_j$. Let $\{b_n\}$ be a given sequence of real numbers increasing to $+\infty$. Suppose that we have

(i) $\displaystyle\sum_{j=1}^n \int_{|x| > b_n} dF_j(x) = o(1)$;

(ii) $\displaystyle\frac{1}{b_n^2} \sum_{j=1}^n \int_{|x| \le b_n} x^2\,dF_j(x) = o(1)$;

and put

(5) $$a_n = \sum_{j=1}^n \int_{|x| \le b_n} x\,dF_j(x).$$

Then

(6) $$\frac{S_n - a_n}{b_n} \to 0 \quad \text{in pr.}$$

Conversely, suppose further that there exists $\lambda > 0$ such that

(7) $$\forall n:\quad P\{X_n \le 0\} \ge \lambda \quad \text{and} \quad P\{X_n \ge 0\} \ge \lambda.$$

Then if (6) holds for the given $\{b_n\}$ and any sequence of real numbers $\{a_n\}$, the conditions (i) and (ii) must hold.

Remark. Condition (7) may be written as

$$P\{X_n \le 0\} \ge \lambda, \qquad P\{X_n \ge 0\} \ge \lambda;$$

when $\lambda = \frac{1}{2}$ this means that 0 is a median for each $X_n$; see Exercise 9 below. In general it ensures that none of the distributions is too far off center, and it is certainly satisfied if all $F_n$ are the same; see also Exercise 11 below. It is possible to replace the $a_n$ in (5) by

$$\sum_{j=1}^n \int_{|x| \le b_j} x\,dF_j(x)$$

and maintain (6); see Exercise 8 below.

PROOF. Define for each $n \ge 1$ and $1 \le j \le n$:

$$Y_{n,j} = \begin{cases} X_j, & \text{if } |X_j| \le b_n;\\ 0, & \text{otherwise}; \end{cases}$$

and write

$$T_n = \sum_{j=1}^n Y_{n,j}.$$

Then condition (i) may be written as

$$\sum_{j=1}^n P\{Y_{n,j} \ne X_j\} = o(1),$$

and it follows from Boole's inequality that

(8) $$P\{T_n \ne S_n\} \le \sum_{j=1}^n P\{Y_{n,j} \ne X_j\} = o(1).$$

Next, condition (ii) may be written as

$$\frac{1}{b_n^2} \sum_{j=1}^n E(Y_{n,j}^2) = o(1),$$

from which it follows, since $\{Y_{n,j}, 1 \le j \le n\}$ are independent r.v.'s:

$$\sigma^2\left(\frac{T_n}{b_n}\right) = \frac{1}{b_n^2} \sum_{j=1}^n \sigma^2(Y_{n,j}) \le \frac{1}{b_n^2} \sum_{j=1}^n E(Y_{n,j}^2) = o(1).$$

Hence as in (2) of Sec. 5.1,

(9) $$\frac{T_n - E(T_n)}{b_n} \to 0 \quad \text{in pr.}$$

It is clear (why?) that (8) and (9) together imply

$$\frac{S_n - E(T_n)}{b_n} \to 0 \quad \text{in pr.}$$

Since

$$E(T_n) = \sum_{j=1}^n E(Y_{n,j}) = \sum_{j=1}^n \int_{|x| \le b_n} x\,dF_j(x) = a_n,$$

(6) is proved.

As an application of Theorem 5.2.3 we give an example where the weak but not the strong law of large numbers holds.

Example. Let $\{X_n\}$ be independent r.v.'s with a common d.f. $F$ such that

$$P\{X_1 = n\} = P\{X_1 = -n\} = \frac{c}{n^2 \log n}, \qquad n = 3, 4, \ldots,$$

where $c$ is the constant determined by the requirement that the probabilities add up to one. We have then, for large values of $n$,

$$\int_{|x| > n} dF(x) = 2c \sum_{k > n} \frac{1}{k^2 \log k} \sim \frac{2c}{n \log n},$$

so that, with $b_n = n$,

$$\sum_{j=1}^n \int_{|x| > n} dF_j(x) = n \int_{|x| > n} dF(x) \sim \frac{2c}{\log n} = o(1),$$

$$\frac{1}{n^2} \sum_{j=1}^n \int_{|x| \le n} x^2\,dF_j(x) = \frac{n}{n^2} \cdot 2c \sum_{k=3}^n \frac{1}{\log k} \sim \frac{2c}{\log n} = o(1);$$

conditions (i) and (ii) of Theorem 5.2.3 are thus satisfied, while $a_n = 0$ in (5) by symmetry, so that $S_n/n \to 0$ in pr. On the other hand, since $X_1$ and $X_n$ have the same d.f.,

$$\sum_n P\{|X_n| > n\} = \sum_n P\{|X_1| > n\} = \infty.$$

Hence by Theorem 4.2.4 (Borel-Cantelli),

$$P\{|X_n| > n \ \text{i.o.}\} = 1.$$

But $|S_n - S_{n-1}| = |X_n| > n$ implies $|S_n| > n/2$ or $|S_{n-1}| > n/2$; it follows that

$$P\left\{ |S_n| > \frac{n}{2} \ \text{i.o.} \right\} = 1,$$

and so it is certainly false that $S_n/n \to 0$ a.e. However, we can prove more. For any $A > 0$, the same argument as before yields

$$P\{|X_n| > An \ \text{i.o.}\} = 1,$$

and consequently

$$P\left\{ |S_n| > \frac{An}{2} \ \text{i.o.} \right\} = 1.$$

This means that for each $A$ there is a null set $Z(A)$ such that if $\omega \in \Omega \setminus Z(A)$, then

(10) $$\varlimsup_{n\to\infty} \frac{|S_n(\omega)|}{n} \ge \frac{A}{2}.$$

Let $Z = \bigcup_{m=1}^{\infty} Z(m)$; then $Z$ is still a null set, and if $\omega \in \Omega \setminus Z$, (10) is true for every $A$, and therefore the upper limit is $+\infty$. Since $X_1$ is "symmetric" in the obvious sense, it follows that

$$\varliminf_{n\to\infty} \frac{S_n}{n} = -\infty, \qquad \varlimsup_{n\to\infty} \frac{S_n}{n} = +\infty \quad \text{a.e.}$$

EXERCISES

1. For any sequence of r.v.'s $\{X_n\}$, and any $p \ge 1$:

$$X_n \to 0 \ \text{a.e.} \implies \frac{S_n}{n} \to 0 \ \text{a.e.};$$
$$X_n \to 0 \ \text{in } L^p \implies \frac{S_n}{n} \to 0 \ \text{in } L^p.$$

The second result is false for $p < 1$.
2. Even for a sequence of independent r.v.'s $\{X_n\}$,

$$X_n \to 0 \ \text{in pr.} \nRightarrow \frac{S_n}{n} \to 0 \ \text{in pr.}$$

[HINT: Let $X_n$ take the values $2^n$ and $0$ with probabilities $n^{-1}$ and $1 - n^{-1}$.]
3. For any sequence $\{X_n\}$:

$$\frac{S_n}{n} \to 0 \ \text{in pr.} \implies \frac{X_n}{n} \to 0 \ \text{in pr.}$$

More generally, this is true if $n$ is replaced by $b_n$, where $b_{n+1}/b_n \to 1$.
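The hint to Exercise 2 can be quantified exactly. In the sketch below (mine, not the author's; the function name is an illustrative choice), the events $\{X_k = 2^k\}$ are independent with probability $1/k$ each. The chance that at least one index $k \in (n/2, n]$ "fires" telescopes to $1 - \lfloor n/2 \rfloor / n \to 1/2$, and on that event $S_n/n \ge 2^{n/2}/n$ is enormous; so $S_n/n \not\to 0$ in pr. even though $P\{X_n \ne 0\} = 1/n \to 0$.

```python
def prob_big_late_term(n):
    # P{ X_k = 2^k for some k with n/2 < k <= n }, where the events
    # {X_k = 2^k} are independent with probability 1/k each.
    # The product of (1 - 1/k) telescopes: prod_{m < k <= n} (k-1)/k = m/n.
    p_none = 1.0
    for k in range(n // 2 + 1, n + 1):
        p_none *= 1.0 - 1.0 / k
    return 1.0 - p_none

p = prob_big_late_term(1000)
```

Since the telescoping product is exact, the computed probability equals $1 - \lfloor n/2 \rfloor / n$ up to rounding, confirming that the "bad" event keeps probability about $1/2$ for every $n$.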

*4. For any $\delta > 0$, we have

$$\lim_{n\to\infty} \sum_{k:\,|k - np| > n\delta} \binom{n}{k} p^k (1-p)^{n-k} = 0$$

uniformly in $p$: $0 < p < 1$.
*5. Let $P\{X_1 = 2^n\} = 1/2^n$, $n \ge 1$; and let $\{X_n, n \ge 1\}$ be independent and identically distributed. Show that the weak law of large numbers does not hold for $b_n = n$; namely, with this choice of $b_n$ no sequence $\{a_n\}$ exists for which (6) is true. [This is the St. Petersburg paradox, in which you win $2^n$ if it takes $n$ tosses of a coin to obtain a head. What would you consider as a fair entry fee? and what is your mathematical expectation?]
*6. Show on the contrary that a weak law of large numbers does hold for $b_n = n \log n$ and find the corresponding $a_n$. [HINT: Apply Theorem 5.2.3.]
7. Conditions (i) and (ii) in Theorem 5.2.3 imply that for any $\delta > 0$,

$$\sum_{j=1}^n \int_{|x| > \delta b_n} dF_j(x) = o(1),$$

and that $a_n = o(nb_n)$.
8. They also imply that

$$\frac{1}{b_n} \sum_{j=1}^n \int_{b_j < |x| \le b_n} |x|\,dF_j(x) = o(1),$$

so that the $a_n$ in (5) may indeed be replaced as stated in the Remark.
9. A number $a$ is called a median of the r.v. $X$ iff

$$P\{X \le a\} \ge \tfrac{1}{2} \quad \text{and} \quad P\{X \ge a\} \ge \tfrac{1}{2}.$$

Show that such a number always exists but need not be unique.
*10. Let $\{X_n, 1 \le n \le \infty\}$ be arbitrary r.v.'s and for each $n$ let $m_n$ be a median of $X_n$. Prove that if $X_n \to X_\infty$ in pr. and $m_\infty$ is unique, then $m_n \to m_\infty$. Furthermore, if there exists any sequence of real numbers $\{c_n\}$ such that $X_n - c_n \to 0$ in pr., then $X_n - m_n \to 0$ in pr.
11. Derive the following form of the weak law of large numbers from Theorem 5.2.3. Let $\{b_n\}$ be as in Theorem 5.2.3 and put $X_n = 2b_n$ for $n \ge 1$. Then there exists $\{a_n\}$ for which (6) holds but condition (i) does not.

12. Theorem 5.2.2 may be slightly generalized as follows. Let $\{X_n\}$ be pairwise independent with a common d.f. $F$ such that

(i) $\displaystyle\int_{|x| \le n} x\,dF(x) = o(1)$, (ii) $\displaystyle n \int_{|x| > n} dF(x) = o(1)$;

then $S_n/n \to 0$ in pr.
13. Let $\{X_n\}$ be a sequence of identically distributed strictly positive r.v.'s. For any $\varphi$ such that $\varphi(n)/n \to 0$, show that $P\{S_n > \varphi(n) \ \text{i.o.}\} = 1$, and so $S_n \to \infty$ a.e. [HINT: Let $N_n$ denote the number of $k \le n$ such that $X_k \le \varphi(n)/n$. Use Chebyshev's inequality to estimate $P\{N_n > n/2\}$ and so conclude $P\{S_n > \varphi(n)/2\} \ge 1 - 2F(\varphi(n)/n)$. This problem was proposed as a teaser and the rather unexpected solution was given by Kesten.]
14. Let $\{b_n\}$ be as in Theorem 5.2.3 and put $X_n = 2b_n$ for $n \ge 1$. Then there exists $\{a_n\}$ for which (6) holds, but condition (i) does not hold. Thus condition (7) cannot be omitted.

5.3 Convergence of series

If the terms of an infinite series are independent r.v.'s, then it will be shown in Sec. 8.1 that the probability of its convergence is either zero or one. Here we shall establish a concrete criterion for the latter alternative. Not only is the result a complete answer to the question of convergence of series of independent r.v.'s, but it yields also a satisfactory form of the strong law of large numbers. This theorem is due to Kolmogorov (1929). We begin with his two remarkable inequalities. The first is also very useful elsewhere; the second may be circumvented (see Exercises 3 to 5 below), but it is given here in Kolmogorov's original form as an example of true virtuosity.

Theorem 5.3.1. Let $\{X_n\}$ be independent r.v.'s such that

$$\forall n:\quad E(X_n) = 0, \quad E(X_n^2) = \sigma^2(X_n) < \infty.$$

Then we have for every $\epsilon > 0$:

(1) $$P\left\{ \max_{1 \le j \le n} |S_j| > \epsilon \right\} \le \frac{\sigma^2(S_n)}{\epsilon^2}.$$
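Kolmogorov's inequality (1) can be checked by direct simulation. The following sketch is my own illustration, not from the text; the $\pm 1$ steps are an assumed special case, chosen because then $\sigma^2(S_n) = n$ exactly.

```python
import random

def max_exceed_freq(n, eps, trials, seed=3):
    # Monte Carlo estimate of P{ max_{1<=j<=n} |S_j| > eps } for
    # independent +-1 summands; inequality (1) bounds it by n / eps**2.
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        s, peak = 0, 0
        for _ in range(n):
            s += rng.choice((-1, 1))
            peak = max(peak, abs(s))
        if peak > eps:
            count += 1
    return count / trials

freq = max_exceed_freq(n=100, eps=25.0, trials=2000)
bound = 100 / 25.0 ** 2   # sigma^2(S_n) / eps^2 = 0.16
```

The bound is far from sharp here (the observed frequency is much smaller than $0.16$), but it controls the maximum over the whole path, which is what the proofs of this section need.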

PROOF. Fix $\epsilon > 0$. For any $\omega$ in the set

$$\Lambda = \left\{ \omega : \max_{1 \le j \le n} |S_j(\omega)| > \epsilon \right\},$$

let us define

$$\nu(\omega) = \min\{j : 1 \le j \le n,\ |S_j(\omega)| > \epsilon\}.$$

Clearly $\nu$ is an r.v. with domain $\Lambda$. Put

$$\Lambda_k = \{\omega : \nu(\omega) = k\} = \left\{ \omega : \max_{1 \le j < k} |S_j(\omega)| \le \epsilon,\ |S_k(\omega)| > \epsilon \right\},$$

so that the $\Lambda_k$, $1 \le k \le n$, are disjoint and their union is $\Lambda$. Since $1_{\Lambda_k} S_k$ depends only on $X_1, \ldots, X_k$, while $S_n - S_k$ depends only on $X_{k+1}, \ldots, X_n$ and has mean zero, we have $\int_{\Lambda_k} S_k (S_n - S_k)\,dP = 0$ by independence. Hence

$$\sigma^2(S_n) = E(S_n^2) \ge \sum_{k=1}^n \int_{\Lambda_k} S_n^2\,dP = \sum_{k=1}^n \int_{\Lambda_k} \{S_k^2 + 2S_k(S_n - S_k) + (S_n - S_k)^2\}\,dP \ge \sum_{k=1}^n \int_{\Lambda_k} S_k^2\,dP \ge \epsilon^2 \sum_{k=1}^n P(\Lambda_k) = \epsilon^2 P(\Lambda),$$

where the last inequality is by the mean value theorem, since $|S_k| > \epsilon$ on $\Lambda_k$ by definition. The theorem now follows upon dividing the inequality above by $\epsilon^2$.

Theorem 5.3.2. Let $\{X_n\}$ be independent r.v.'s with finite means and suppose that there exists an $A$ such that

(3) $$\forall n:\quad |X_n - E(X_n)| \le A < \infty.$$

Then for every $\epsilon > 0$ we have

(4) $$P\left\{ \max_{1 \le j \le n} |S_j| \le \epsilon \right\} \le \frac{(2A + 4\epsilon)^2}{\sigma^2(S_n)}.$$

PROOF. Put $M_0 = \Omega$ and for $1 \le k \le n$:

$$M_k = \left\{ \omega : \max_{1 \le j \le k} |S_j(\omega)| \le \epsilon \right\}, \qquad Q_k = M_{k-1} \setminus M_k.$$

If $P(M_n) = 0$, the inequality (4) is trivial; hence we may suppose $P(M_k) \ge P(M_n) > 0$ for every $k$ and define

(5) $$a_k = \frac{1}{P(M_k)} \int_{M_k} S_k\,dP, \quad \text{so that} \quad \int_{M_k} (S_k - a_k)\,dP = 0.$$

Since $|S_k| \le \epsilon$ on $M_k$, we have $|a_k| \le \epsilon$ and consequently

(7) $$\int_{M_k} (S_k - a_k)^2\,dP \le 4\epsilon^2 P(M_k).$$

Writing $S_{k+1} - a_{k+1} = (S_k - a_k) + (a_k - a_{k+1}) + X_{k+1}$ and using $M_k = M_{k+1} \cup Q_{k+1}$, we decompose

(6) $$I_1 = \int_{M_k} (S_{k+1} - a_{k+1})^2\,dP = \int_{M_{k+1}} (S_{k+1} - a_{k+1})^2\,dP + \int_{Q_{k+1}} (S_{k+1} - a_{k+1})^2\,dP.$$

We now estimate the pieces.

First,

$$|a_k - a_{k+1}| = \left| \frac{1}{P(M_k)} \int_{M_k} S_k\,dP - \frac{1}{P(M_{k+1})} \int_{M_{k+1}} S_k\,dP - \frac{1}{P(M_{k+1})} \int_{M_{k+1}} X_{k+1}\,dP \right| \le 2\epsilon + A.$$

It follows that, since $|S_k| \le \epsilon$ on $Q_{k+1}$,

$$\int_{Q_{k+1}} (S_{k+1} - a_{k+1})^2\,dP \le \int_{Q_{k+1}} (|S_k| + \epsilon + 2\epsilon + A + A)^2\,dP \le (4\epsilon + 2A)^2 P(Q_{k+1}).$$

On the other hand, we have

$$I_1 = \int_{M_k} \{(S_k - a_k)^2 + (a_k - a_{k+1})^2 + X_{k+1}^2 + 2(S_k - a_k)(a_k - a_{k+1}) + 2(S_k - a_k)X_{k+1} + 2(a_k - a_{k+1})X_{k+1}\}\,dP.$$

The integrals of the last three terms all vanish by (5) and independence, hence

$$I_1 \ge \int_{M_k} (S_k - a_k)^2\,dP + \int_{M_k} X_{k+1}^2\,dP \ge \int_{M_k} (S_k - a_k)^2\,dP + P(M_k)\,\sigma^2(X_{k+1}).$$

Substituting into (6), and using $M_k \supset M_n$, we obtain for $0 \le k \le n - 1$:

$$\int_{M_{k+1}} (S_{k+1} - a_{k+1})^2\,dP - \int_{M_k} (S_k - a_k)^2\,dP \ge P(M_n)\,\sigma^2(X_{k+1}) - (4\epsilon + 2A)^2 P(Q_{k+1}).$$

Summing over $k$ and using (7) again for $k = n$:

$$4\epsilon^2 P(M_n) \ge \int_{M_n} (S_n - a_n)^2\,dP \ge P(M_n) \sum_{j=1}^n \sigma^2(X_j) - (4\epsilon + 2A)^2 P(\Omega \setminus M_n);$$

hence

$$P(M_n)\,\sigma^2(S_n) \le 4\epsilon^2 P(M_n) + (2A + 4\epsilon)^2 P(\Omega \setminus M_n) \le (2A + 4\epsilon)^2,$$

which is (4).

We can now prove the "three series theorem" of Kolmogorov (1929).

Theorem 5.3.3. Let $\{X_n\}$ be independent r.v.'s and define for a fixed constant $A > 0$:

$$Y_n(\omega) = \begin{cases} X_n(\omega), & \text{if } |X_n(\omega)| \le A;\\ 0, & \text{if } |X_n(\omega)| > A. \end{cases}$$

Then the series $\sum_n X_n$ converges a.e. if and only if the following three series all converge:

(i) $\sum_n P\{|X_n| > A\} = \sum_n P\{X_n \ne Y_n\}$;
(ii) $\sum_n E(Y_n)$;
(iii) $\sum_n \sigma^2(Y_n)$.

PROOF. Suppose that the three series converge. Applying Theorem 5.3.1 to the sequence $\{Y_n - E(Y_n)\}$, we have for every $m \ge 1$ and $n < n'$:

$$P\left\{ \max_{n < j \le n'} \left| \sum_{k=n+1}^j (Y_k - E(Y_k)) \right| \le \frac{1}{m} \right\} \ge 1 - m^2 \sum_{k=n+1}^{n'} \sigma^2(Y_k).$$

If we denote the probability on the left by $\beta(m, n, n')$, it follows from the convergence of (iii) that for each $m$:

$$\lim_{n\to\infty} \lim_{n'\to\infty} \beta(m, n, n') = 1.$$

This means that the tail of $\sum_n \{Y_n - E(Y_n)\}$ converges to zero a.e., so that the series converges a.e. Since (ii) converges, so does $\sum_n Y_n$. Since (i) converges, $\{X_n\}$ and $\{Y_n\}$ are equivalent sequences; hence $\sum_n X_n$ also converges a.e. by Theorem 5.2.1. We have therefore proved the "if" part of the theorem.

Conversely, suppose that $\sum_n X_n$ converges a.e. Then for each $A > 0$:

$$P\{|X_n| > A \ \text{i.o.}\} = 0.$$

It follows from the Borel-Cantelli lemma that the series (i) must converge. Hence as before $\sum_n Y_n$ also converges a.e. But since $|Y_n - E(Y_n)| \le 2A$, we have by Theorem 5.3.2, for $n < n'$:

$$P\left\{ \max_{n < j \le n'} \left| \sum_{k=n+1}^j Y_k \right| \le 1 \right\} \le \frac{(4A + 4)^2}{\sum_{k=n+1}^{n'} \sigma^2(Y_k)}.$$

Were (iii) to diverge, the right side would tend to zero as $n' \to \infty$ for every fixed $n$, so that with probability one the tail partial sums could not remain

bounded by 1, so the series could not converge. This contradiction proves that (iii) must converge. Finally, consider the series $\sum_n \{Y_n - E(Y_n)\}$ and apply the proven part of the theorem to it. We have

$$P\{|Y_n - E(Y_n)| > 2A\} = 0, \qquad E(Y_n - E(Y_n)) = 0,$$

so that the two series corresponding to (i) and (ii), with $2A$ for $A$, vanish identically, while (iii) has just been shown to converge. It follows from the sufficiency of the criterion that $\sum_n \{Y_n - E(Y_n)\}$ converges a.e., and so by equivalence the same is true of $\sum_n \{X_n - E(Y_n)\}$. Since $\sum_n X_n$ also converges a.e. by hypothesis, we conclude by subtraction that the series (ii) converges. This completes the proof of the theorem.

The convergence of a series of r.v.'s is, of course, by definition that of the sequence of its partial sums, in any sense of convergence discussed in Chapter 4. For series of independent terms we have, however, the following theorem due to Paul Lévy.

Theorem 5.3.4. If $\{X_n\}$ is a sequence of independent r.v.'s, then the convergence of the series $\sum_n X_n$ in pr. is equivalent to its convergence a.e.

PROOF. By Theorem 4.1.2, it is sufficient to prove that convergence of $\sum_n X_n$ in pr. implies its convergence a.e. Suppose the former; then, given $\epsilon$, $0 < \epsilon < 1$, there exists $m_0$ such that if $n > m \ge m_0$, we have

(8) $$P\{|S_{m,n}| > \epsilon\} \le \epsilon,$$

where

$$S_{m,n} = \sum_{j=m+1}^n X_j.$$

It is obvious that for $m < k \le n$ we have

(9) $$\bigcup_{k=m+1}^n \left\{ \max_{m < j < k} |S_{m,j}| \le 2\epsilon;\ |S_{m,k}| > 2\epsilon;\ |S_{k,n}| \le \epsilon \right\} \subset \{|S_{m,n}| > \epsilon\},$$

where the sets in the union are disjoint. Going to probabilities and using independence, we obtain

$$\sum_{k=m+1}^n P\left\{ \max_{m < j < k} |S_{m,j}| \le 2\epsilon;\ |S_{m,k}| > 2\epsilon \right\} P\{|S_{k,n}| \le \epsilon\} \le P\{|S_{m,n}| > \epsilon\}.$$

Hence

$$P\left\{ \max_{m < j \le n} |S_{m,j}| > 2\epsilon \right\} \min_{m < k \le n} P\{|S_{k,n}| \le \epsilon\} \le P\{|S_{m,n}| > \epsilon\}.$$

5.3 CONVERGENCE OF SERIES | 127

This inequality is due to Ottaviani. By (8), the second factor on the left exceeds 1 − ε; hence if m ≥ m_0,

    (10) P{ max_{m<j≤n} |S_{m,j}| > 2ε } ≤ (1 − ε)^{−1} P{ |S_{m,n}| > ε } < ε/(1 − ε).

Letting n → ∞, then m → ∞, and finally ε → 0 through a sequence of values, we see that the triple limit of the first probability in (10) is equal to zero. This proves the convergence a.e. of Σ_n X_n by Exercise 8 of Sec. 4.2.

It is worthwhile to observe the underlying similarity between the inequalities (10) and (1). Both give a "probability upper bound" for the maximum of a number of summands in terms of the last summand as in (10), or its variance as in (1). The same principle will be used again later.

Let us remark also that even the convergence of S_n in dist. is equivalent to that in pr. or a.e. as just proved; see Theorem 9.5.5.

We give some examples of convergence of series.

Example. Σ_n ±1/n.

This is meant to be the "harmonic series" with a random choice of signs in each term, the choices being totally independent and equally likely to be + or − in each case. More precisely, it is the series

    Σ_n X_n/n,

where {X_n, n ≥ 1} is a sequence of independent, identically distributed r.v.'s taking the values ±1 with probability ½ each.

We may take A = 1 in Theorem 5.3.3 so that the two series (i) and (ii) vanish identically. Since σ²(X_n/n) = 1/n², the series (iii) converges. Hence Σ_n ±1/n converges a.e. by the criterion above. The same conclusion applies to Σ_n ±1/n^θ if ½ < θ ≤ 1. Clearly there is no absolute convergence. For 0 < θ ≤ ½, the probability of convergence is zero by the same criterion and by Theorem 8.1.2 below.

EXERCISES

1. Theorem 5.3.1 has the following "one-sided" analogue. Under the same hypotheses, we have

    P{ max_{1≤j≤n} S_j > ε } ≤ σ²(S_n) / (σ²(S_n) + ε²).

[This is due to A. W. Marshall.]
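As an editorial aside (not part of the original text), the random-sign harmonic series of the Example is easy to watch numerically. The sketch below, with names and the term count chosen by us, simulates Σ_n ε_n/n with i.i.d. signs; with A = 1 the three-series criterion guarantees a.e. convergence, and the late partial sums barely move.

```python
import random

random.seed(7)

# Partial sums of the random-sign harmonic series sum_n eps_n / n with
# i.i.d. signs eps_n = +-1.  With A = 1, series (i) and (ii) of
# Theorem 5.3.3 vanish and (iii) = sum 1/n^2 converges, so the series
# converges a.e.; the late partial sums should therefore settle down.
n_terms = 200_000
s = 0.0
sums = []
for n in range(1, n_terms + 1):
    eps = 1 if random.random() < 0.5 else -1
    s += eps / n
    sums.append(s)

tail_spread = max(sums[-1000:]) - min(sums[-1000:])
print("last partial sum:", sums[-1])
print("spread over the last 1000 partial sums:", tail_spread)
```

Whatever the seed, the spread over the last 1000 partial sums is at most Σ_{n>199000} 1/n < 0.006 over that window; the content of the theorem is of course stronger, namely that the whole sequence of partial sums converges, not merely that the increments shrink.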

128 | LAW OF LARGE NUMBERS. RANDOM SERIES

*2. Let {X_n} be independent and identically distributed with mean 0 and variance 1. Then we have for every x:

    P{ max_{1≤j≤n} S_j > x } ≤ 2 P{ S_n > x − √(2n) }.

5.4 STRONG LAW OF LARGE NUMBERS | 129

7. For arbitrary {X_n}, if

    Σ_n E(|X_n|) < ∞,

then Σ_n X_n converges absolutely a.e.

8. Let {X_n}, where n = 0, ±1, ±2, ..., be independent and identically distributed according to the normal distribution Φ (see Exercise 5 of Sec. 4.5). Then the series of complex-valued r.v.'s

    Σ_{n=1}^{∞} (e^{inx}/n) X_n,

where i = √−1 and x is real, converges a.e. and uniformly in x. (This is Wiener's representation of the Brownian motion process.)

*9. Let {X_n} be independent and identically distributed, taking the values 0 and 2 with probability ½ each; then

    Σ_{n=1}^{∞} X_n/3^n

converges a.e. Prove that the limit has the Cantor d.f. discussed in Sec. 1.3. Do Exercise 11 in that section again; it is easier now.

*10. If Σ_n ±X_n converges a.e. for all choices of ±1, where the X_n's are arbitrary r.v.'s, then Σ_n X_n² converges a.e. [HINT: Consider Σ_n r_n(t)X_n(ω) where the r_n's are coin-tossing r.v.'s and apply Fubini's theorem to the space of (t, ω).]

5.4 Strong law of large numbers

To return to the strong law of large numbers, the link is furnished by the following lemma on "summability".

Kronecker's lemma. Let {x_k} be a sequence of real numbers, {a_k} a sequence of numbers > 0 and ↑ ∞. Then

    Σ_n x_n/a_n converges ⟹ (1/a_n) Σ_{j=1}^{n} x_j → 0.

PROOF. For 1 ≤ n < ∞ let

    b_n = Σ_{j=1}^{n} x_j/a_j.

130 | LAW OF LARGE NUMBERS. RANDOM SERIES

If we also write a_0 = 0, b_0 = 0, we have

    x_n = a_n (b_n − b_{n−1})

and

    (1/a_n) Σ_{j=1}^{n} x_j = b_n − (1/a_n) Σ_{j=0}^{n−1} (a_{j+1} − a_j) b_j

(Abel's method of partial summation). Since a_{j+1} − a_j ≥ 0,

    (1/a_n) Σ_{j=0}^{n−1} (a_{j+1} − a_j) = 1,

and b_n → b_∞, the weighted averages in the subtracted term also converge to b_∞; hence

    (1/a_n) Σ_{j=1}^{n} x_j → b_∞ − b_∞ = 0.

The lemma is proved.

Now let φ be a positive, even, and continuous function on ℝ¹ such that as |x| increases,

    (1) φ(x)/|x| ↑, φ(x)/x² ↓.

Theorem 5.4.1. Let {X_n} be a sequence of independent r.v.'s with E(X_n) = 0 for every n; and 0 < a_n ↑ ∞. If φ satisfies the conditions above and

    (2) Σ_n E(φ(X_n))/φ(a_n) < ∞,

then

    (3) Σ_n X_n/a_n converges a.e.

PROOF. Denote the d.f. of X_n by F_n. Define for each n:

    (4) Y_n(ω) = X_n(ω), if |X_n(ω)| ≤ a_n; = 0, if |X_n(ω)| > a_n.

Then

    σ²(Y_n)/a_n² ≤ E(Y_n²)/a_n² = ∫_{|x|≤a_n} (x²/a_n²) dF_n(x).

5.4 STRONG LAW OF LARGE NUMBERS | 131

By the second hypothesis in (1), we have

    x²/a_n² ≤ φ(x)/φ(a_n) for |x| ≤ a_n.

It follows that

    Σ_n σ²(Y_n)/a_n² ≤ Σ_n ∫_{|x|≤a_n} (φ(x)/φ(a_n)) dF_n(x) ≤ Σ_n E(φ(X_n))/φ(a_n) < ∞.

Next, by the first hypothesis in (1), |x|/a_n ≤ φ(x)/φ(a_n) for |x| > a_n; since E(X_n) = 0 we have E(Y_n) = −∫_{|x|>a_n} x dF_n(x), so that

    Σ_n |E(Y_n)|/a_n ≤ Σ_n ∫_{|x|>a_n} (|x|/a_n) dF_n(x) ≤ Σ_n ∫_{|x|>a_n} (φ(x)/φ(a_n)) dF_n(x) < ∞.

Moreover, since φ(x)/φ(a_n) ≥ |x|/a_n > 1 for |x| > a_n,

    Σ_n P{X_n ≠ Y_n} = Σ_n ∫_{|x|>a_n} dF_n(x) ≤ Σ_n ∫_{|x|>a_n} (φ(x)/φ(a_n)) dF_n(x) < ∞.

Thus the three series of Theorem 5.3.3, applied to {Y_n/a_n} with A = 1, all converge, so that Σ_n Y_n/a_n converges a.e.; since {X_n/a_n} and {Y_n/a_n} are equivalent sequences, (3) follows.
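Kronecker's lemma, the summability link used above, can be checked numerically. The following sketch is an editorial addition (the choice of x_n and a_n is ours): with a_n = n and x_n = (−1)^n √n, the series Σ x_n/a_n = Σ (−1)^n/√n converges as an alternating series, so the averaged sums (1/a_n)(x_1 + ... + x_n) must tend to 0 even though the x_n themselves are unbounded.

```python
import math

# Kronecker's lemma with a_n = n and x_n = (-1)^n * sqrt(n): the series
# sum_n x_n/a_n = sum_n (-1)^n / sqrt(n) converges (alternating, terms
# decreasing to 0), hence (1/a_n)(x_1 + ... + x_n) -> 0.
N = 100_000
partial = 0.0     # x_1 + ... + x_n
series = 0.0      # sum of x_n / a_n
ratios = []       # (1/a_n)(x_1 + ... + x_n) at checkpoints
for n in range(1, N + 1):
    x = (-1) ** n * math.sqrt(n)
    partial += x
    series += x / n
    if n % 10_000 == 0:
        ratios.append(partial / n)

print("sum x_n/a_n ->", series)
print("(1/a_n) sum_{j<=n} x_j at n = 10^4, 2*10^4, ..., 10^5:")
print([round(r, 5) for r in ratios])
```

The averaged sums decay like 1/(2√n) here, which is consistent with the lemma; the lemma itself asserts only convergence to 0, with no rate in general.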

132 | LAW OF LARGE NUMBERS. RANDOM SERIES

Applying Kronecker's lemma to (3) for each ω in a set of probability one, we obtain the next result.

Corollary. Under the hypotheses of the theorem, we have

    (6) (1/a_n) Σ_{j=1}^{n} X_j → 0 a.e.

Particular cases. (i) Let φ(x) = |x|^p, 1 ≤ p ≤ 2; a_n = n. Then we have

    (7) Σ_n E(|X_n|^p)/n^p < ∞ ⟹ (1/n) Σ_{j=1}^{n} X_j → 0 a.e.

For p = 2, this is due to Kolmogorov; for 1 < p < 2, it is due to Marcinkiewicz and Zygmund.

(ii) Suppose for some δ, 0 < δ ≤ 1, and M < ∞ we have

    ∀n: E(|X_n|^{1+δ}) ≤ M.

Then the hypothesis in (7) is clearly satisfied with p = 1 + δ. This case is due to Markov. Cantelli's theorem under total independence (Exercise 4 of Sec. 5.1) is a special case.

(iii) By proper choice of {a_n}, we can considerably sharpen the conclusion (6). Suppose

    ∀n: σ²(X_n) = σ_n² < ∞, σ²(S_n) = s_n² = Σ_{j=1}^{n} σ_j² → ∞.

Choose φ(x) = x² and a_n = s_n (log s_n)^{(1/2)+ε}, ε > 0, in the corollary to Theorem 5.4.1. Then

    Σ_n σ_n² / (s_n² (log s_n)^{1+2ε}) < ∞

by Dini's theorem, and consequently

    S_n / (s_n (log s_n)^{(1/2)+ε}) → 0 a.e.

In case all σ_n = 1 so that s_n² = n, the above ratio has a denominator that is close to n^{1/2}. Later we shall see that n^{1/2} is a critical order of magnitude for S_n.
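Particular case (i) lends itself to a quick simulation. The sketch below (an editorial illustration; the distribution of X_n is our choice) takes p = 2 and X_n uniform on [−n^{0.4}, n^{0.4}], so that E(X_n²)/n² is of order n^{−1.2} and is summable, even though the X_n are unbounded; the corollary then asserts that the sample averages tend to 0 a.e.

```python
import random

random.seed(1)

# Particular case (i) with p = 2 and a_n = n: independent mean-zero X_n
# with sum E(X_n^2)/n^2 < infinity satisfy (1/n)(X_1 + ... + X_n) -> 0 a.e.
# Here X_n is uniform on [-n^0.4, n^0.4]; E(X_n^2)/n^2 ~ n^(-1.2)/3.
N = 50_000
s = 0.0
for n in range(1, N + 1):
    b = n ** 0.4
    s += random.uniform(-b, b)
print("S_N / N =", s / N)
```

The standard deviation of S_N/N here is of order N^{−0.1}, so the printed average is small but decays slowly; that slowness is exactly why case (iii) above, with its sharper normalizer s_n (log s_n)^{(1/2)+ε}, is of interest.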

5.4 STRONG LAW OF LARGE NUMBERS | 133

We now come to the strong version of Theorem 5.2.2 in the totally independent case. This result is also due to Kolmogorov.

Theorem 5.4.2. Let {X_n} be a sequence of independent and identically distributed r.v.'s. Then we have

    (8) E(|X_1|) < ∞ ⟹ S_n/n → E(X_1) a.e.;
    (9) E(|X_1|) = ∞ ⟹ limsup_{n→∞} |S_n|/n = +∞ a.e.

PROOF. To prove (8) define {Y_n} as in (4) with a_n = n. Since

    Σ_n P{X_n ≠ Y_n} = Σ_n P{|X_n| > n} = Σ_n P{|X_1| > n} < ∞

by Theorem 3.2.1, {X_n} and {Y_n} are equivalent sequences. Let us apply (7) to {Y_n − E(Y_n)} with p = 2. We have

    (10) Σ_n σ²(Y_n)/n² ≤ Σ_n E(Y_n²)/n² = Σ_n (1/n²) ∫_{|x|≤n} x² dF(x),

and the last-written series converges: rearranging it according to the intervals {j − 1 < |x| ≤ j} and using Σ_{n≥j} 1/n² ≤ 2/j, we get

    Σ_n (1/n²) Σ_{j=1}^{n} ∫_{j−1<|x|≤j} x² dF(x) ≤ Σ_j (2/j) ∫_{j−1<|x|≤j} x² dF(x) ≤ 2 Σ_j ∫_{j−1<|x|≤j} |x| dF(x) ≤ 2 E(|X_1|) < ∞.

Hence by (7) applied to {Y_n − E(Y_n)},

    (1/n) Σ_{j=1}^{n} (Y_j − E(Y_j)) → 0 a.e.

134 | LAW OF LARGE NUMBERS. RANDOM SERIES

Clearly E(Y_n) → E(X_1) as n → ∞; hence also

    (1/n) Σ_{j=1}^{n} E(Y_j) → E(X_1),

and consequently

    (1/n) Σ_{j=1}^{n} Y_j → E(X_1) a.e.

By Theorem 5.2.1, the left side above may be replaced by (1/n) Σ_{j=1}^{n} X_j, proving (8).

To prove (9), we note that E(|X_1|) = ∞ implies E(|X_1|/A) = ∞ for each A > 0 and hence, by Theorem 3.2.1,

    Σ_n P{|X_1| > An} = +∞.

Since the r.v.'s are identically distributed, it follows that

    Σ_n P{|X_n| > An} = +∞.

Now the argument in the example at the end of Sec. 5.2 may be repeated without any change to establish the conclusion of (9).

Let us remark that the first part of the preceding theorem is a special case of G. D. Birkhoff's ergodic theorem, but it was discovered a little earlier and the proof is substantially simpler.

N. Etemadi proved an unexpected generalization of Theorem 5.4.2: (8) is true when the (total) independence of {X_n} is weakened to pairwise independence (An elementary proof of the strong law of large numbers, Z. Wahrscheinlichkeitstheorie 55 (1981), 119–122).

Here is an interesting extension of the law of large numbers when the mean is infinite, due to Feller (1946).

Theorem 5.4.3. Let {X_n} be as in Theorem 5.4.2 with E(|X_1|) = ∞. Let {a_n} be a sequence of positive numbers satisfying the condition a_n/n ↑. Then we have

    (11) limsup_{n→∞} |S_n|/a_n = 0 a.e., or = +∞ a.e.,

according as

    (12) Σ_n P{|X_n| > a_n} = Σ_n ∫_{|x|>a_n} dF(x) < ∞, or = ∞.

PROOF. Writing

    ∫_{|x|>a_n} dF(x) = Σ_{k=n}^{∞} ∫_{a_k<|x|≤a_{k+1}} dF(x),

5.4 STRONG LAW OF LARGE NUMBERS | 135

substituting into (12) and rearranging the double series, we see that the series in (12) converges if and only if

    (13) Σ_k k ∫_{a_{k−1}<|x|≤a_k} dF(x) < ∞.

Assume now that the series in (12), and hence (13), converges. Define Y_n as in (4), with a_n in place of n; then Σ_n P{X_n ≠ Y_n} < ∞ by (12), so that {X_n} and {Y_n} are equivalent sequences. Since a_n/n ↑, we have a_k/a_n ≤ k/n for k ≤ n, and consequently

    Σ_{n≥k} a_k²/a_n² ≤ k² Σ_{n≥k} 1/n² ≤ 2k.

Rearranging according to the intervals {a_{k−1} < |x| ≤ a_k} as before, we obtain

    Σ_n E(Y_n²)/a_n² = Σ_n (1/a_n²) Σ_{k=1}^{n} ∫_{a_{k−1}<|x|≤a_k} x² dF(x)
                     ≤ Σ_k ( Σ_{n≥k} a_k²/a_n² ) ∫_{a_{k−1}<|x|≤a_k} dF(x)
                     ≤ 2 Σ_k k ∫_{a_{k−1}<|x|≤a_k} dF(x) < ∞

by (13). It follows, as in the proof of Theorem 5.4.1, that

    (1/a_n) Σ_{j=1}^{n} (Y_j − E(Y_j)) → 0 a.e.

136 | LAW OF LARGE NUMBERS. RANDOM SERIES

We now estimate the quantity

    (16) (1/a_n) Σ_{k=1}^{n} E(Y_k) = (1/a_n) Σ_{k=1}^{n} ∫_{|x|≤a_k} x dF(x).

Splitting the kth integral according to the intervals {a_{j−1} < |x| ≤ a_j}, 1 ≤ j ≤ k, and rearranging once more, we obtain

    |(1/a_n) Σ_{k=1}^{n} E(Y_k)| ≤ Σ_{j=1}^{n} (n/a_n) a_j ∫_{a_{j−1}<|x|≤a_j} dF(x) ≤ Σ_{j=1}^{n} j ∫_{a_{j−1}<|x|≤a_j} dF(x),

since (n/a_n) a_j ≤ j, a_n/n being increasing. Moreover, the convergence of (12) together with E(|X_1|) = ∞ forces a_n/n → ∞, so that n/a_n → 0. Given ε > 0, choose j_0 so that by (13) the terms with j > j_0 contribute less than ε; each of the finitely many terms with j ≤ j_0 is at most (n/a_n) a_{j_0} and hence tends to 0 as n → ∞. Therefore the quantity in (16) tends to 0, and combining this with the conclusion of the preceding paragraph and the equivalence of {X_n} and {Y_n}, we obtain

    S_n/a_n → 0 a.e.,

which proves the first alternative in (11). The second alternative is Exercise 11 below.
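Feller's dichotomy can be illustrated by simulation. The sketch below (an editorial addition; the tail index and normalizers are our choices) samples a positive law with P{X > x} = x^{−1/2}, x ≥ 1, so that E(X) = ∞. For a_n = n^{2.5} the series in (12) is Σ n^{−1.25} < ∞ and S_n/n^{2.5} should be small, while S_n/n blows up, in keeping with (9) of Theorem 5.4.2.

```python
import random

random.seed(3)

# Pareto tails P{X > x} = x^(-1/2) for x >= 1, so E(X) = infinity.
# Inverse transform: X = U^(-2) with U uniform on (0, 1].
# With a_n = n^2.5, sum_n P{X_n > a_n} = sum_n n^(-1.25) < infinity,
# so Feller's criterion gives S_n/n^2.5 -> 0 a.e.; with a_n = n the
# series diverges and the averages S_n/n blow up.
N = 100_000
s = 0.0
for n in range(1, N + 1):
    u = 1.0 - random.random()     # uniform on (0, 1], avoids division by 0
    s += u ** -2
print("S_n / n     =", s / N)          # huge: infinite mean
print("S_n / n^2.5 =", s / N ** 2.5)   # comparatively small
```

A single run only hints at the a.e. statements, of course; the heavy tail makes the normalized sums fluctuate wildly across seeds, which is precisely the regime Theorem 5.4.3 is designed for.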

5.4 STRONG LAW OF LARGE NUMBERS | 137

3. Let {X_n} be independent and identically distributed r.v.'s such that E(|X_1|^p) < ∞ for some p, 0 < p < 2; in case p > 1, we assume also that E(X_1) = 0. Then S_n n^{−(1/p)−ε} → 0 a.e. For p = 1 the result is weaker than Theorem 5.4.2.

4. Both Theorem 5.4.1 and its complement in Exercise 2 above are "best possible" in the following sense. Let {a_n} and φ be as before and suppose that b_n > 0,

    Σ_n b_n/φ(a_n) = ∞.

Then there exists a sequence of independent r.v.'s {X_n} such that E(X_n) = 0, E(φ(X_n)) = b_n, and

    P{ Σ_n X_n/a_n converges } = 0.

[HINT: Make Σ_n P{|X_n| > a_n} = ∞ by letting each X_n take two or three values only according as b_n/φ(a_n) ≤ 1 or > 1.]

5. Let X_n take the values ±n^θ with probability ½ each. If θ < ½, then S_n/n → 0 a.e. What if θ ≥ ½? [HINT: To answer the question, use Theorem 5.2.3 or Exercise 12 of Sec. 5.2; an alternative method is to consider the characteristic function of S_n/n (see Chapter 6).]

6. Let E(X_1) = 0 and {c_n} be a bounded sequence of real numbers. Then

    (1/n) Σ_{j=1}^{n} c_j X_j → 0 a.e.

[HINT: Truncate X_n at n and proceed as in Theorem 5.4.2.]

7. We have S_n/n → 0 a.e. if and only if the following two conditions are satisfied:

    (i) S_n/n → 0 in pr.;
    (ii) S_{2^n}/2^n → 0 a.e.;

an alternative set of conditions is (i) and

    (ii') ∀ε > 0: Σ_n P{ |S_{2^{n+1}} − S_{2^n}| > 2^n ε } < ∞.

*8. If E(|X_1|) < ∞, then the sequence {S_n/n} is uniformly integrable and S_n/n → E(X_1) in L¹ as well as a.e.

138 | LAW OF LARGE NUMBERS. RANDOM SERIES

9. Construct an example where E(X_1⁺) = +∞, E(X_1⁻) = +∞, and yet S_n/n → +∞ a.e. [HINT: Let 0 < α < β < 1 and take a d.f. F such that 1 − F(x) ∼ x^{−α} as x → +∞, and ∫_{−∞}^{0} |x|^β dF(x) < ∞.] A more delicate result of this kind has been given recently by K. B. Erickson.

10. Suppose there exist an α, 0 < α < 2, α ≠ 1, and two constants A_1 and A_2 such that

    ∀n, ∀x > 0: A_1/x^α ≤ P{|X_n| > x} ≤ A_2/x^α.

If α > 1, suppose also that E(X_n) = 0 for each n. Then for any sequence {a_n} increasing to infinity, we have

    P{ |S_n| > a_n i.o. } = 0 or 1

according as Σ_n n/a_n^α converges or diverges. [This result, due to P. Lévy and Marcinkiewicz, was stated with a superfluous condition on {a_n}. Proceed as in Theorem 5.3.3 but truncate X_n at a_n; direct estimates are easy.]

11. Prove the second alternative in Theorem 5.4.3.

12. If E(X_1) ≠ 0, then max_{1≤j≤n} |X_j| / |S_n| → 0 a.e. [HINT: |S_n|/n converges a.e. to a strictly positive limit, while max_{1≤j≤n} |X_j|/n → 0 a.e. because Σ_n P{|X_1| > εn} < ∞.]

5.5 Applications

The law of large numbers has numerous applications in all parts of probability theory and in other related fields such as combinatorial analysis and statistics. We shall illustrate this by two examples involving certain important new concepts.

The first deals with so-called "empiric distributions" in sampling theory. Let {X_n, n ≥ 1} be a sequence of independent, identically distributed r.v.'s with the common d.f. F. This is sometimes referred to as the "underlying" or "theoretical distribution" and is regarded as "unknown" in statistical lingo. For each ω, the values X_j(ω) are called "samples" or "observed values", and the idea is to get some information on F by looking at the samples. For each

5.5 APPLICATIONS | 139

n, and each ω ∈ Ω, let the n real numbers {X_j(ω), 1 ≤ j ≤ n} be arranged in increasing order as

    (1) Y_{n1}(ω) ≤ Y_{n2}(ω) ≤ ... ≤ Y_{nn}(ω).

Now define a discrete d.f. F_n(·, ω) as follows:

    F_n(x, ω) = 0,   if x < Y_{n1}(ω);
    F_n(x, ω) = k/n, if Y_{nk}(ω) ≤ x < Y_{n,k+1}(ω), 1 ≤ k ≤ n − 1;
    F_n(x, ω) = 1,   if x ≥ Y_{nn}(ω).

In other words, for each x, nF_n(x, ω) is the number of values of j, 1 ≤ j ≤ n, for which X_j(ω) ≤ x; or again F_n(x, ω) is the observed frequency of sample values not exceeding x. The function F_n(·, ω) is called the empiric distribution function based on n samples from F.

For each x, F_n(x, ·) is an r.v., as we can easily check. Let us introduce also the indicator r.v.'s {ξ_j(x), j ≥ 1} as follows:

    ξ_j(x, ω) = 1 if X_j(ω) ≤ x; = 0 if X_j(ω) > x.

We have then

    F_n(x, ω) = (1/n) Σ_{j=1}^{n} ξ_j(x, ω).

For each x, the sequence {ξ_j(x)} is totally independent since {X_j} is, by Theorem 3.3.1. Furthermore they have the common "Bernoullian distribution", taking the values 1 and 0 with probabilities p and q = 1 − p, where

    p = F(x), q = 1 − F(x);

thus E(ξ_j(x)) = F(x). The strong law of large numbers in the form Theorem 5.1.2 or 5.4.2 applies, and we conclude that

    (2) F_n(x, ω) → F(x) a.e.

Matters end here if we are interested only in a particular value of x, or a finite number of values, but since both members in (2) contain the parameter x, which ranges over the whole real line, how much better it would be to make a global statement about the functions F_n(·, ω). We shall do this in the theorem below. Observe first the precise meaning of (2): for each x, there exists a null set N(x) such that (2) holds for ω ∈ Ω\N(x). It follows that (2) also holds simultaneously for all x in any given countable set Q, such as the

140 | LAW OF LARGE NUMBERS. RANDOM SERIES

set of rational numbers, for ω ∈ Ω\N, where

    N = ∪_{x∈Q} N(x)

is again a null set. Hence by the definition of vague convergence in Sec. 4.4, we can already assert that

    F_n(·, ω) →v F(·) for a.e. ω.

This will be further strengthened in two ways: convergence for all x and uniformity. The result is due to Glivenko and Cantelli.

Theorem 5.5.1. We have as n → ∞:

    sup_{−∞<x<∞} |F_n(x, ω) − F(x)| → 0 a.e.
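The content of the theorem is easy to watch numerically. The following sketch is an editorial illustration (not the book's argument): it samples from the uniform d.f. F(x) = x on (0, 1) and computes sup_x |F_n(x) − F(x)| exactly from the order statistics in (1), since on each interval between consecutive order statistics the extreme deviations occur at the endpoints.

```python
import random

random.seed(5)

def sup_deviation(sample):
    """sup_x |F_n(x) - F(x)| for the uniform d.f. F(x) = x on (0, 1),
    computed exactly from the order statistics of the sample."""
    ys = sorted(sample)
    n = len(ys)
    d = 0.0
    for i, y in enumerate(ys, start=1):
        # F_n jumps to i/n at the i-th order statistic y; just before it,
        # F_n equals (i-1)/n.  Both one-sided deviations are checked.
        d = max(d, i / n - y, y - (i - 1) / n)
    return d

devs = {n: sup_deviation([random.random() for _ in range(n)])
        for n in (100, 10_000)}
print(devs)
```

The supremum distance shrinks roughly like n^{−1/2} in typical runs; the theorem asserts that it tends to 0 a.e., and Exercise 2 of this section points at the finer (distribution-free) behavior of this statistic.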

5.5 APPLICATIONS | 141

PROOF. Suppose the contrary; then there exist ε > 0, a sequence {n_k} of integers tending to infinity, and a sequence {x_k} in ℝ¹ such that for all k:

    (5) |F_{n_k}(x_k) − F(x_k)| ≥ ε > 0.

This is clearly impossible if x_k → +∞ or x_k → −∞. Excluding these cases, we may suppose by taking a subsequence that x_k → ξ ∈ ℝ¹. Now consider four possible cases and the respective inequalities below, valid for all sufficiently large k, where r_1 ∈ Q, r_2 ∈ Q, r_1 < ξ < r_2; here ω is fixed in the set of probability one chosen above and is suppressed from the notation.

Case 1. x_k ↑ ξ, x_k < ξ:

    ε ≤ F_{n_k}(x_k) − F(x_k) ≤ F_{n_k}(ξ−) − F(r_1)
      ≤ [F_{n_k}(ξ−) − F_{n_k}(ξ)] + [F_{n_k}(r_2) − F(r_2)] + [F(r_2) − F(r_1)].

Case 2. x_k ↑ ξ, x_k < ξ:

    ε ≤ F(x_k) − F_{n_k}(x_k) ≤ F(ξ−) − F_{n_k}(r_1)
      = [F(ξ−) − F(r_1)] + [F(r_1) − F_{n_k}(r_1)].

Case 3. x_k ↓ ξ, x_k ≥ ξ:

    ε ≤ F(x_k) − F_{n_k}(x_k) ≤ F(r_2) − F_{n_k}(ξ)
      ≤ [F(r_2) − F(r_1)] + [F(r_1) − F_{n_k}(r_1)] + [F_{n_k}(ξ−) − F_{n_k}(ξ)].

Case 4. x_k ↓ ξ, x_k ≥ ξ:

    ε ≤ F_{n_k}(x_k) − F(x_k) ≤ F_{n_k}(r_2) − F(ξ)
      = [F_{n_k}(r_2) − F_{n_k}(r_1)] + [F_{n_k}(r_1) − F(r_1)] + [F(r_1) − F(ξ)].

In each case let first k → ∞, then r_1 ↑ ξ, r_2 ↓ ξ; then the last member of each chain of inequalities does not exceed a quantity which tends to 0, and a contradiction is obtained.

Remark. The reader will do well to observe the way the proof above is arranged. Having chosen a set of ω with probability one, for each fixed ω in this set we reason with the corresponding sample functions F_n(·, ω) and F(·) without further intervention of probability. Such a procedure is standard in the theory of stochastic processes.

Our next application is to renewal theory. Let {X_n, n ≥ 1} again be a sequence of independent and identically distributed r.v.'s. We shall further assume that they are positive, although this hypothesis can be dropped, and that they are not identically zero a.e. It follows that the common mean is

142 | LAW OF LARGE NUMBERS. RANDOM SERIES

strictly positive but may be +∞. Now the successive r.v.'s are interpreted as "lifespans" of certain objects undergoing a process of renewal, or the "return periods" of certain recurrent phenomena. Typical examples are the ages of a succession of living beings and the durations of a sequence of services. This raises theoretical as well as practical questions such as: given an epoch in time, how many renewals have there been before it? how long ago was the last renewal? how soon will the next be?

Let us consider the first question. Given the epoch t ≥ 0, let N(t, ω) be the number of renewals up to and including the time t. It is clear that we have

    (6) {ω: N(t, ω) = n} = {ω: S_n(ω) ≤ t < S_{n+1}(ω)},

valid for n ≥ 0, provided S_0 ≡ 0. Summing over n ≤ m − 1, we obtain

    (7) {ω: N(t, ω) < m} = {ω: S_m(ω) > t}.

This shows in particular that for each t ≥ 0, N(t) = N(t, ·) is a discrete r.v. whose range is the set of all natural numbers. The family of r.v.'s {N(t)} indexed by t ∈ [0, ∞) may be called a renewal process. If the common distribution F of the X_n's is the exponential F(x) = 1 − e^{−λx}, x ≥ 0, where λ > 0, then {N(t), t ≥ 0} is just the simple Poisson process with parameter λ.

Let us prove first that

    (8) lim_{t→∞} N(t) = +∞ a.e.,

namely that the total number of renewals becomes infinite with time. This is almost obvious, but the proof follows. Since N(t, ω) increases with t, the limit in (8) certainly exists for every ω. Were it finite on a set of strictly positive probability, there would exist an integer M such that

    P{ sup_{0≤t<∞} N(t, ω) < M } > 0.

This implies by (7) that for every t:

    P{ S_M(ω) > t } ≥ P{ sup_{0≤t<∞} N(t, ω) < M } > 0.

Letting t → +∞, we should obtain P{S_M = +∞} > 0, which is impossible since S_M, a sum of M finite r.v.'s, is finite a.e. Thus (8) is proved.

Next, writing m = E(X_1) ≤ +∞ for the common mean, the strong law of large numbers (Theorem 5.4.2) furnishes a

5.5 APPLICATIONS | 143

null set Z_1 such that

    ∀ω ∈ Ω\Z_1: lim_{n→∞} (X_1(ω) + ... + X_n(ω))/n = m.

We have just proved that there exists a null set Z_2 such that

    ∀ω ∈ Ω\Z_2: lim_{t→+∞} N(t, ω) = +∞.

Now for each fixed ω_0, if the numerical sequence {a_n(ω_0), n ≥ 1} converges to a finite (or infinite) limit m and at the same time the numerical function {N(t, ω_0), 0 ≤ t < ∞} tends to +∞ as t → +∞, then the very definition of a limit implies that the numerical function {a_{N(t,ω_0)}(ω_0), 0 ≤ t < ∞} converges to the limit m as t → +∞. Applying this trivial but fundamental observation to

    a_n(ω) = (1/n) Σ_{j=1}^{n} X_j(ω)

for each ω in Ω\(Z_1 ∪ Z_2), we conclude that

    (10) lim_{t→∞} S_{N(t,ω)}(ω)/N(t, ω) = m a.e.

By the definition of N(t, ω), the numerator on the left side should be close to t; this will be confirmed and strengthened in the following theorem.

Theorem 5.5.2. We have

    (11) lim_{t→∞} N(t)/t = 1/m a.e.

and

    (12) lim_{t→∞} E{N(t)}/t = 1/m,

both being true even if m = +∞, provided we take 1/m to be 0 in that case.

PROOF. It follows from (6) that for every ω:

    S_{N(t,ω)}(ω) ≤ t < S_{N(t,ω)+1}(ω),

and consequently, as soon as N(t, ω) ≥ 1,

    S_{N(t,ω)}(ω)/N(t, ω) ≤ t/N(t, ω) < ( S_{N(t,ω)+1}(ω)/(N(t, ω) + 1) ) · ( (N(t, ω) + 1)/N(t, ω) ).

Letting t → +∞ and using (8) and (10), we see that both extreme members above converge to m a.e.; hence t/N(t, ω) → m a.e., which is (11).
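The renewal rate in (11) is easy to observe by simulation. The sketch below is an editorial illustration (the exponential lifespan law and its mean are our choices): with lifespans of mean m = 2, the count N(t)/t should be close to 1/m = 0.5 for large t, and with exponential lifespans {N(t)} is in fact the simple Poisson process mentioned above.

```python
import random

random.seed(11)

# Renewal process with i.i.d. exponential lifespans of mean m = 2; by (11)
# the renewal rate N(t)/t should approach 1/m = 0.5 for large t.
m = 2.0
t_end = 50_000.0
t = 0.0
count = 0
while True:
    t += random.expovariate(1.0 / m)   # lifespan of the next object
    if t > t_end:
        break
    count += 1
rate = count / t_end
print("N(t)/t =", rate, "  1/m =", 1.0 / m)
```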

144 | LAW OF LARGE NUMBERS. RANDOM SERIES

The deduction of (12) from (11) is more tricky than might have been thought. Since X_n is not zero a.e., there exists δ > 0 such that

    ∀n: P{X_n ≥ δ} = p > 0.

Define

    X'_n(ω) = δ, if X_n(ω) ≥ δ; = 0, if X_n(ω) < δ;

and let S'_n and N'(t) be the corresponding quantities for the sequence {X'_n, n ≥ 1}. It is obvious that S'_n ≤ S_n and N'(t) ≥ N(t) for each t. Since the r.v.'s {X'_n/δ} are independent with a Bernoullian distribution, elementary computations (see Exercise 7 below) show that

    E{N'(t)²}/t² = O(1/δ²) as t → ∞.

Hence we have, δ being fixed,

    E{(N(t)/t)²} ≤ E{(N'(t)/t)²} = O(1).

Since (11) implies the convergence of N(t)/t in distribution to the point mass at 1/m, an application of Theorem 4.5.2 with X_n = N(n)/n and p = 2 yields (12) with t replaced by n in (12), from which (12) itself follows at once.

Corollary. For each t, E{N(t)} < ∞.

An interesting relation suggested by (10) is that E{S_{N(t)}} should be close to m E{N(t)} when t is large. The precise result is as follows:

    E{X_1 + ... + X_{N(t)+1}} = E{X_1} E{N(t) + 1}.

This is a striking generalization of the additivity of expectations when the number of terms as well as the summands is "random." This follows from the following more general result, known as "Wald's equation".

Theorem 5.5.3. Let {X_n, n ≥ 1} be a sequence of independent and identically distributed r.v.'s with finite mean. For k ≥ 1 let ℱ_k be the Borel field generated by {X_j, 1 ≤ j ≤ k}. Suppose that N is an r.v. taking positive integer values such that

    (13) ∀k ≥ 1: {N ≤ k} ∈ ℱ_k.

Then E(S_N) = E(X_1) E(N).
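Wald's equation lends itself to a quick Monte Carlo check. The sketch below (an editorial illustration, not the book's proof; the exponential law and threshold are our choices) uses N = first n with S_n > t, which satisfies (13) since the event {N ≤ k} depends only on X_1, ..., X_k.

```python
import random

random.seed(13)

# Wald's equation checked by simulation.  N = first n with
# X_1 + ... + X_n > t is "optional": {N <= k} depends only on
# X_1, ..., X_k, hence E(S_N) = E(X_1) E(N).
t = 5.0
reps = 20_000
total_S = 0.0
total_N = 0
for _ in range(reps):
    s, n = 0.0, 0
    while s <= t:
        s += random.expovariate(1.0)   # i.i.d. summands with E(X_1) = 1
        n += 1
    total_S += s
    total_N += n
lhs = total_S / reps          # estimates E(S_N)
rhs = total_N / reps          # estimates E(X_1) * E(N), since E(X_1) = 1
print("E(S_N) ~", round(lhs, 3), "  E(X_1)E(N) ~", round(rhs, 3))
```

Note that S_N overshoots t on average; the equation says the two expectations agree nevertheless, which is exactly the point of the "random number of terms" generalization.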

5.5 APPLICATIONS | 145

PROOF. Since S_0 = 0 as usual, we have

    (14) E(S_N) = ∫ S_N dP = Σ_{k=1}^{∞} ∫_{{N=k}} S_k dP = Σ_{k=1}^{∞} Σ_{j=1}^{k} ∫_{{N=k}} X_j dP
               = Σ_{j=1}^{∞} Σ_{k=j}^{∞} ∫_{{N=k}} X_j dP = Σ_{j=1}^{∞} ∫_{{N≥j}} X_j dP.

Now the set {N ≤ j − 1} and the r.v. X_j are independent, hence so are {N ≥ j} and X_j, and the last-written integral is equal to E(X_j) P{N ≥ j}. Substituting this into the above, we obtain

    E(S_N) = Σ_{j=1}^{∞} E(X_j) P{N ≥ j} = E(X_1) Σ_{j=1}^{∞} P{N ≥ j} = E(X_1) E(N),

the last equation by the corollary to Theorem 3.2.1.

It remains to justify the interchange of summations in (14), which is essential here. This is done by replacing X_j with |X_j| and obtaining as before that the repeated sum is equal to E(|X_1|) E(N) < ∞.

We leave it to the reader to verify condition (13) for the r.v. N(t) + 1 above. Such an r.v. will be called "optional" in Chapter 8 and will play an important role there.

Our last example is a noted triumph of the ideas of probability theory applied to classical analysis. It is S. Bernstein's proof of Weierstrass' theorem on the approximation of continuous functions by polynomials.

Theorem 5.5.4. Let f be a continuous function on [0, 1], and define the Bernstein polynomials {p_n} as follows:

    p_n(x) = Σ_{k=0}^{n} f(k/n) (n choose k) x^k (1 − x)^{n−k}.

Then p_n converges uniformly to f in [0, 1].

PROOF. For each x, consider a sequence of independent Bernoullian r.v.'s {X_n, n ≥ 1} with success probability x, namely:

    P{X_n = 1} = x, P{X_n = 0} = 1 − x;

146 | LAW OF LARGE NUMBERS. RANDOM SERIES

and let S_n = Σ_{j=1}^{n} X_j as usual. We know from elementary probability theory that

    P{S_n = k} = (n choose k) x^k (1 − x)^{n−k}, 0 ≤ k ≤ n,

so that

    p_n(x) = E{ f(S_n/n) }.

Given ε > 0, by the uniform continuity of f on [0, 1] there exists δ = δ(ε) > 0 such that |f(u) − f(v)| ≤ ε/2 whenever |u − v| ≤ δ. Writing ‖f‖ = sup_{0≤x≤1} |f(x)|, we obtain

    (16) |p_n(x) − f(x)| ≤ E{ |f(S_n/n) − f(x)| } ≤ ε/2 + 2‖f‖ P{ |S_n/n − x| > δ }.

By Chebyshev's inequality,

    P{ |S_n/n − x| > δ } ≤ σ²(S_n/n)/δ² = x(1 − x)/(nδ²) ≤ 1/(4δ²n),

since x(1 − x) ≤ ¼ in [0, 1]. This is nothing but Chebyshev's proof of Bernoulli's theorem. Hence if n ≥ ‖f‖/(δ²ε), we get |p_n(x) − f(x)| ≤ ε in (16). This proves the uniformity.
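The construction is entirely concrete and can be computed directly. The following sketch (an editorial addition; the test function is our choice) evaluates the Bernstein polynomials of f(x) = |x − ½|, a continuous function that is not smooth at the kink, and tabulates the maximum deviation over a grid.

```python
from math import comb

def bernstein(f, n, x):
    """n-th Bernstein polynomial of f at x:
    p_n(x) = sum_{k=0}^{n} f(k/n) C(n,k) x^k (1-x)^(n-k)."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)        # continuous, not differentiable at 1/2
grid = [i / 50 for i in range(51)]
errs = []
for n in (10, 100, 400):
    err = max(abs(bernstein(f, n, x) - f(x)) for x in grid)
    errs.append(err)
    print(f"n = {n:4d}: max |p_n - f| = {err:.4f}")
```

The worst error sits at the kink, where it equals E|S_n/n − ½| and decays like n^{−1/2}; for smooth f the rate improves, but uniform convergence, which is all the theorem claims, is already visible here.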

5.5 APPLICATIONS | 147

One should remark not only on the lucidity of the above derivation but also the meaningful construction of the approximating polynomials. Similar methods can be used to establish a number of well-known analytical results with relative ease; see Exercise 11 below for another example.

EXERCISES

{X_n} is a sequence of independent and identically distributed r.v.'s;

    S_n = Σ_{j=1}^{n} X_j.

1. Show that equality can hold somewhere in (1) with strictly positive probability if and only if the discrete part of F does not vanish.

*2. Let F_n and F be as in Theorem 5.5.1; then the distribution of

    sup_{−∞<x<∞} |F_n(x, ω) − F(x)|

is the same for all continuous F.

*6. Let v(t) = min{n ≥ 1: S_n(ω) > t}, so that v(t) = N(t) + 1. If the common distribution is not the point mass at 0, then for every t > 0 and r > 0 we have P{v(t) > n} ≤ Aλ^n for some λ < 1 and all large n; consequently E{v(t)^r} < ∞. This implies the corollary of Theorem 5.5.2 without recourse to the law of large numbers. [This is Charles Stein's theorem.]

*7. Consider the special case of renewal where the r.v.'s are Bernoullian, taking the values 1 and 0 with probabilities p and 1 − p, where 0 < p < 1.

148 | LAW OF LARGE NUMBERS. RANDOM SERIES

Find explicitly the d.f. of v(0) as defined in Exercise 6, and hence of v(t) for every t > 0. Find E{v(t)} and E{v(t)²}. Relate v(t, ω) to the N(t, ω) in Theorem 5.5.2 and so calculate E{N(t)} and E{N(t)²}.

8. Theorem 5.5.3 remains true if E(X_1) is defined, possibly +∞ or −∞.

*9. In Exercise 7, find the d.f. of X_{v(t)} for a given t. E{X_{v(t)}} is the mean lifespan of the object living at the epoch t; should it not be the same as E{X_1}, the mean lifespan of the given species? [This is one of the best examples of the use or misuse of intuition in probability theory.]

10. Let τ be a positive integer-valued r.v. that is independent of the X_n's. Suppose that both τ and X_1 have finite second moments; then

    σ²(S_τ) = E(τ) σ²(X_1) + σ²(τ) (E(X_1))².

*11. Let f be continuous and belong to L^r(0, ∞) for some r > 1, and

    g(λ) = ∫_0^∞ e^{−λt} f(t) dt.

Then

    f(x) = lim_{n→∞} ((−1)^{n−1}/(n − 1)!) (n/x)^n g^{(n−1)}(n/x),

where g^{(n−1)} is the (n − 1)st derivative of g, uniformly in every finite interval. [HINT: Let λ > 0, P{X_1(λ) ≤ t} = 1 − e^{−λt}. Then

    E{ f(S_n(λ)) } = [(−1)^{n−1}/(n − 1)!] λ^n g^{(n−1)}(λ),

and S_n(n/x) → x in pr. This is a somewhat easier version of Widder's inversion formula for Laplace transforms.]

12. Let P{X_1 = k} = p_k, 1 ≤ k ≤ ℓ, Σ_{k=1}^{ℓ} p_k = 1. Let N_k(n, ω) be the number of values of j, 1 ≤ j ≤ n, for which X_j = k, and

    π(n, ω) = Π_{k=1}^{ℓ} p_k^{N_k(n,ω)}.

Prove that

    lim_{n→∞} (1/n) log π(n, ω) exists a.e.

and find the limit. [This is from information theory.]

Bibliographical Note

Borel's theorem on normal numbers, as well as the Borel–Cantelli lemma of Secs. 4.2–4.3, is contained in

5.5 APPLICATIONS | 149

Émile Borel, Sur les probabilités dénombrables et leurs applications arithmétiques, Rend. Circ. Mat. Palermo 26 (1909), 247–271.

This pioneering paper, despite some serious gaps (see Fréchet [9] for comments), is well worth reading for its historical interest. In Borel's Jubilé Selecta (Gauthier-Villars, 1940), it is followed by a commentary by Paul Lévy with a bibliography on later developments.

Every serious student of probability theory should read:

A. N. Kolmogoroff, Über die Summen durch den Zufall bestimmter unabhängiger Größen, Math. Annalen 99 (1928), 309–319; Bemerkungen, 102 (1929), 484–488.

This contains Theorems 5.3.1 to 5.3.3 as well as the original version of Theorem 5.2.3.

For all convergence questions regarding sums of independent r.v.'s, the deepest study is given in Chapter 6 of Lévy's book [11]. After three decades, this book remains a source of inspiration.

Theorem 5.5.2 is taken from

J. L. Doob, Renewal theory from the point of view of the theory of probability, Trans. Am. Math. Soc. 63 (1948), 422–438.

Feller's book [13], both volumes, contains an introduction to renewal theory as well as some of its latest developments.

6 | Characteristic function

6.1 General properties; convolutions

An important tool in the study of r.v.'s and their p.m.'s or d.f.'s is the characteristic function (ch.f.). For any r.v. X with the p.m. μ and d.f. F, this is defined to be the function f on ℝ¹ as follows, ∀t ∈ ℝ¹:

    (1) f(t) = E(e^{itX}) = ∫_Ω e^{itX(ω)} P(dω) = ∫ e^{itx} μ(dx) = ∫ e^{itx} dF(x).

The equality of the third and fourth terms above is a consequence of Theorem 3.2.2, while the rest is by definition and notation. We remind the reader that the last term in (1) is defined to be the one preceding it, where the one-to-one correspondence between μ and F is discussed in Sec. 2.2. We shall use both of them below. Let us also point out, since our general discussion of integrals has been confined to the real domain, that f is a complex-valued function of the real variable t, whose real and imaginary parts are given respectively by

    Rf(t) = ∫ cos xt μ(dx), If(t) = ∫ sin xt μ(dx).
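As an editorial illustration of the defining formula (1), the integral ∫ e^{itx} dF(x) can be computed by elementary quadrature for a concrete law and compared with a closed form. Assumptions here are ours: we take the exponential distribution with density λe^{−λx} on [0, ∞), whose ch.f. is (1 − it/λ)^{−1} (this entry reappears in the table at the end of this section), truncate the integral at an upper limit where the tail is negligible, and use Simpson's rule.

```python
import cmath

lam = 2.0   # exponential law with density lam * exp(-lam * x), x >= 0

def chf_numeric(t, upper=20.0, steps=4000):
    """Simpson's rule for f(t) = integral_0^inf e^{itx} lam e^{-lam x} dx;
    the tail beyond `upper` has mass e^{-40}, far below the target accuracy."""
    h = upper / steps
    g = lambda x: cmath.exp(1j * t * x) * lam * cmath.exp(-lam * x)
    acc = g(0.0) + g(upper)
    for k in range(1, steps):
        acc += (4 if k % 2 else 2) * g(k * h)
    return acc * h / 3.0

errs = []
for t in (0.0, 1.3, -2.5):
    exact = 1.0 / (1.0 - 1j * t / lam)
    errs.append(abs(chf_numeric(t) - exact))
print("max quadrature error:", max(errs))
```

One can also read off the elementary properties proved below: the computed values satisfy |f(t)| ≤ 1 = f(0) and f(−t) = conjugate of f(t), up to quadrature error.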

6.1 GENERAL PROPERTIES; CONVOLUTIONS | 151

Here and hereafter, integrals without an indicated domain of integration are taken over ℝ¹.

Clearly, the ch.f. is a creature associated with μ or F, not with X, but the first equation in (1) will prove very convenient for us; see e.g. (iii) and (v) below. In analysis, the ch.f. is known as the Fourier–Stieltjes transform of μ or F. It can also be defined over a wider class of μ or F and, furthermore, be considered as a function of a complex variable t, under certain conditions that ensure the existence of the integrals in (1). This extension is important in some applications, but we will not need it here except in Theorem 6.6.5. As specified above, it is always well defined (for all real t) and has the following simple properties.

(i) ∀t ∈ ℝ¹:

    |f(t)| ≤ 1 = f(0), f(−t) = f̄(t),

where z̄ denotes the conjugate complex of z.

(ii) f is uniformly continuous in ℝ¹.

To see this, we write for real t and h:

    f(t + h) − f(t) = ∫ (e^{i(t+h)x} − e^{itx}) μ(dx),
    |f(t + h) − f(t)| ≤ ∫ |e^{itx}| |e^{ihx} − 1| μ(dx) = ∫ |e^{ihx} − 1| μ(dx).

The last integrand is bounded by 2 and tends to 0 as h → 0, for each x. Hence, the integral converges to 0 by bounded convergence. Since it does not involve t, the convergence is surely uniform with respect to t.

(iii) If we write f_X for the ch.f. of X, then for any real numbers a and b, we have

    f_{aX+b}(t) = f_X(at) e^{itb}, f_{−X}(t) = f̄_X(t).

This is easily seen from the first equation in (1), for

    E(e^{it(aX+b)}) = E(e^{i(ta)X} e^{itb}) = E(e^{i(ta)X}) e^{itb}.

(iv) If {f_n, n ≥ 1} are ch.f.'s, λ_n ≥ 0, Σ_{n=1}^{∞} λ_n = 1, then

    Σ_{n=1}^{∞} λ_n f_n

is a ch.f. Briefly: a convex combination of ch.f.'s is a ch.f.

152 | CHARACTERISTIC FUNCTION

For if {μ_n, n ≥ 1} are the corresponding p.m.'s, then Σ_{n=1}^{∞} λ_n μ_n is a p.m. whose ch.f. is Σ_{n=1}^{∞} λ_n f_n.

(v) If {f_j, 1 ≤ j ≤ n} are ch.f.'s, then

    Π_{j=1}^{n} f_j

is a ch.f.

By Theorem 3.3.4, there exist independent r.v.'s {X_j, 1 ≤ j ≤ n} with probability distributions {μ_j, 1 ≤ j ≤ n}, where μ_j is as in (iv). Letting

    S_n = Σ_{j=1}^{n} X_j,

we have by the corollary to Theorem 3.3.3:

    (2) E(e^{itS_n}) = E( Π_{j=1}^{n} e^{itX_j} ) = Π_{j=1}^{n} E(e^{itX_j}) = Π_{j=1}^{n} f_j(t);

or in the notation of (iii):

    (3) f_{S_n} = Π_{j=1}^{n} f_{X_j}.

(For an extension to an infinite number of f_j's see Exercise 4 below.)

The ch.f. of S_n being so neatly expressed in terms of the ch.f.'s of the summands, we may wonder about the d.f. of S_n. We need the following definitions.

DEFINITION. The convolution of two d.f.'s F_1 and F_2 is defined to be the d.f. F such that

    ∀x ∈ ℝ¹: F(x) = ∫_{−∞}^{∞} F_1(x − y) dF_2(y),

and written as

    F = F_1 * F_2.

It is easy to verify that F is indeed a d.f. The other basic properties of convolution are consequences of the following theorem.

Theorem 6.1.1. Let X_1 and X_2 be independent r.v.'s with d.f.'s F_1 and F_2, respectively. Then X_1 + X_2 has the d.f. F_1 * F_2.

6.1 GENERAL PROPERTIES; CONVOLUTIONS | 153

PROOF. We wish to show that

    (4) ∀x: P{X_1 + X_2 ≤ x} = (F_1 * F_2)(x).

For this purpose we define a function f of (x_1, x_2) as follows, for fixed x:

    f(x_1, x_2) = 1, if x_1 + x_2 ≤ x; = 0, otherwise.

f is a Borel measurable function of two variables. By Theorem 3.3.3 and using the notation of the second proof of Theorem 3.3.3, we have

    ∫ f(X_1, X_2) dP = ∫∫ f(x_1, x_2) (μ_1 × μ_2)(dx_1, dx_2)
                     = ∫ μ_2(dx_2) ∫ f(x_1, x_2) μ_1(dx_1)
                     = ∫ μ_2(dx_2) ∫_{(−∞, x−x_2]} μ_1(dx_1)
                     = ∫_{−∞}^{∞} F_1(x − x_2) dF_2(x_2).

This reduces to (4). The second equation above, evaluating the double integral by an iterated one, is an application of Fubini's theorem (see Sec. 3.3).

Corollary. The binary operation of convolution * is commutative and associative.

For the corresponding binary operation of addition of independent r.v.'s has these two properties.

DEFINITION. The convolution of two probability density functions p_1 and p_2 is defined to be the probability density function p such that

    (5) ∀x ∈ ℝ¹: p(x) = ∫_{−∞}^{∞} p_1(x − y) p_2(y) dy,

and written as

    p = p_1 * p_2.

We leave it to the reader to verify that p is indeed a density, but we will spell out the following connection.

Theorem 6.1.2. The convolution of two absolutely continuous d.f.'s with densities p_1 and p_2 is absolutely continuous with density p_1 * p_2.

154 | CHARACTERISTIC FUNCTION

PROOF. We have by Fubini's theorem:

    ∫_{−∞}^{x} p(u) du = ∫_{−∞}^{x} du ∫_{−∞}^{∞} p_1(u − v) p_2(v) dv
                       = ∫_{−∞}^{∞} [ ∫_{−∞}^{x} p_1(u − v) du ] p_2(v) dv
                       = ∫_{−∞}^{∞} F_1(x − v) p_2(v) dv
                       = ∫_{−∞}^{∞} F_1(x − v) dF_2(v) = (F_1 * F_2)(x).

This shows that p is a density of F_1 * F_2.

What is the p.m., to be denoted by μ_1 * μ_2, that corresponds to F_1 * F_2? For arbitrary subsets A and B of ℝ¹, we denote their vector sum and difference by A + B and A − B, respectively:

    (6) A ± B = {x ± y: x ∈ A, y ∈ B};

and write x ± B for {x} ± B, −B for 0 − B. There should be no danger of confusing A − B with A\B.

Theorem 6.1.3. For each B ∈ ℬ, we have

    (7) (μ_1 * μ_2)(B) = ∫_{−∞}^{∞} μ_1(B − y) μ_2(dy).

For each Borel measurable function g that is integrable with respect to μ_1 * μ_2, we have

    (8) ∫ g(u) (μ_1 * μ_2)(du) = ∫∫ g(x + y) μ_1(dx) μ_2(dy).

PROOF. It is easy to verify that the set function (μ_1 * μ_2)(·) defined by (7) is a p.m. To show that its d.f. is F_1 * F_2, we need only verify that its value for B = (−∞, x] is given by the defining formula for F_1 * F_2 above. This is obvious, since the right side of (7) then becomes

    ∫ F_1(x − y) μ_2(dy) = ∫ F_1(x − y) dF_2(y).

Now let g be the indicator of the set B; then for each y, the function g_y defined by g_y(x) = g(x + y) is the indicator of the set B − y. Hence

    ∫ g(x + y) μ_1(dx) = μ_1(B − y)

and, substituting into the right side of (8), we see that it reduces to (7) in this case. The general case is proved in the usual way by first considering simple functions $g$ and then passing to the limit for integrable functions.

As an instructive example, let us calculate the ch.f. of the convolution $\mu_1 * \mu_2$. We have by (8)

$$\int e^{itu}\,(\mu_1 * \mu_2)(du) = \iint e^{itx} e^{ity}\,\mu_1(dx)\,\mu_2(dy) = \int e^{itx}\,\mu_1(dx) \int e^{ity}\,\mu_2(dy).$$

This is as it should be by (v), since the first term above is the ch.f. of $X + Y$, where $X$ and $Y$ are independent with $\mu_1$ and $\mu_2$ as p.m.'s. Let us restate the results, after an obvious induction, as follows.

Theorem 6.1.4. Addition of (a finite number of) independent r.v.'s corresponds to convolution of their d.f.'s and multiplication of their ch.f.'s.

Corollary. If $f$ is a ch.f., then so is $|f|^2$.

To prove the corollary, let $X$ have the ch.f. $f$. Then there exists on some $\Omega$ (why?) an r.v. $Y$ independent of $X$ and having the same d.f., and so also the same ch.f. $f$. The ch.f. of $X - Y$ is

$$\mathscr{E}(e^{it(X-Y)}) = \mathscr{E}(e^{itX})\,\mathscr{E}(e^{-itY}) = f(t)\,f(-t) = f(t)\,\overline{f(t)} = |f(t)|^2.$$

The technique of considering $X - Y$ and $|f|^2$ instead of $X$ and $f$ will be used below and referred to as "symmetrization" (see the end of Sec. 6.2). This is often expedient, since a real and particularly a positive-valued ch.f. such as $|f|^2$ is easier to handle than a general one.

Let us list a few well-known ch.f.'s together with their d.f.'s or p.d.'s (probability densities), the last being given in the interval outside of which they vanish.

(1) Point mass at $a$: d.f. $\delta_a$; ch.f. $e^{iat}$.

(2) Symmetric Bernoullian distribution with mass $\frac12$ each at $+1$ and $-1$: d.f. $\frac12(\delta_1 + \delta_{-1})$; ch.f. $\cos t$.

(3) Bernoullian distribution with "success probability" $p$, and $q = 1 - p$: d.f. $q\delta_0 + p\delta_1$; ch.f. $q + pe^{it} = 1 + p(e^{it} - 1)$.

(4) Binomial distribution for $n$ trials with success probability $p$: d.f. $\displaystyle \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k}\,\delta_k$; ch.f. $(q + pe^{it})^n$.

(5) Geometric distribution with success probability $p$: d.f. $\displaystyle \sum_{n=0}^{\infty} q^n p\,\delta_n$; ch.f. $p(1 - qe^{it})^{-1}$.

(6) Poisson distribution with (mean) parameter $\lambda$: d.f. $\displaystyle \sum_{n=0}^{\infty} e^{-\lambda} \frac{\lambda^n}{n!}\,\delta_n$; ch.f. $e^{\lambda(e^{it} - 1)}$.

(7) Exponential distribution with mean $\lambda^{-1}$: p.d. $\lambda e^{-\lambda x}$ in $[0, \infty)$; ch.f. $(1 - \lambda^{-1} it)^{-1}$.

(8) Uniform distribution in $[-a, +a]$: p.d. $\dfrac{1}{2a}$ in $[-a, a]$; ch.f. $\dfrac{\sin at}{at}$ ($= 1$ for $t = 0$).

(9) Triangular distribution in $[-a, a]$: p.d. $\dfrac{a - |x|}{a^2}$ in $[-a, a]$; ch.f. $\dfrac{2(1 - \cos at)}{a^2 t^2}$.

(10) Reciprocal of (9): p.d. $\dfrac{1 - \cos ax}{\pi a x^2}$ in $(-\infty, \infty)$; ch.f. $\left(1 - \dfrac{|t|}{a}\right) \vee 0$.

(11) Normal distribution $N(m, \sigma^2)$ with mean $m$ and variance $\sigma^2$: p.d.

$$\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - m)^2}{2\sigma^2}\right) \text{ in } (-\infty, \infty); \quad \text{ch.f. } \exp\left(imt - \frac{\sigma^2 t^2}{2}\right).$$

Unit normal distribution $N(0, 1) = \Phi$ with mean 0 and variance 1: p.d. $\dfrac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$ in $(-\infty, \infty)$; ch.f. $e^{-t^2/2}$.

(12) Cauchy distribution with parameter $a > 0$: p.d. $\dfrac{a}{\pi(a^2 + x^2)}$ in $(-\infty, \infty)$; ch.f. $e^{-a|t|}$.
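Two entries of this list can be checked numerically with the standard library alone; the snippet below is an illustration of mine, not part of the text. It verifies entry (3) together with the Corollary to Theorem 6.1.4 (that $f(t)f(-t) = |f(t)|^2$ is real), and entry (8) by midpoint-rule integration of the defining integral:

```python
import cmath, math

# entry (3): Bernoulli ch.f. f(t) = q + p e^{it}; by the Corollary above,
# f(t) f(-t) = |f(t)|^2 is real (the ch.f. of the symmetrized X - Y)
p, t = 0.3, 1.7
f = lambda s: (1 - p) + p * cmath.exp(1j * s)
sym = f(t) * f(-t)
assert abs(sym.imag) < 1e-12
assert abs(sym - abs(f(t)) ** 2) < 1e-12

# entry (8): uniform on [-a, a]; integrating (1/2a) e^{itx} over [-a, a]
# by the midpoint rule should reproduce sin(at)/(at)
a, n = 2.0, 20000
h = 2 * a / n
num = sum(cmath.exp(1j * t * (-a + (k + 0.5) * h)) for k in range(n)) * h / (2 * a)
assert abs(num - math.sin(a * t) / (a * t)) < 1e-6
```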

Convolution is a smoothing operation widely used in mathematical analysis, for instance in the proof of Theorem 6.5.2 below. Convolution with the normal kernel is particularly effective, as illustrated below.

Let $n_\delta$ be the density of the normal distribution $N(0, \delta^2)$, namely

$$n_\delta(x) = \frac{1}{\sqrt{2\pi}\,\delta} \exp\left(-\frac{x^2}{2\delta^2}\right), \quad -\infty < x < \infty.$$

For any bounded measurable function $f$ on $\mathbb{R}^1$, put

(9) $\displaystyle f_\delta(x) = (f * n_\delta)(x) = \int_{-\infty}^{\infty} f(x - y)\,n_\delta(y)\,dy = \int_{-\infty}^{\infty} n_\delta(x - y)\,f(y)\,dy.$

It will be seen below that the integrals above converge. Let $C_B^{\infty}$ denote the class of functions on $\mathbb{R}^1$ which have bounded derivatives of all orders; $C_U$ the class of bounded and uniformly continuous functions on $\mathbb{R}^1$.

Theorem 6.1.5. For each $\delta > 0$, we have $f_\delta \in C_B^{\infty}$. Furthermore if $f \in C_U$, then $f_\delta \to f$ uniformly in $\mathbb{R}^1$.

PROOF. It is easily verified that $n_\delta \in C_B^{\infty}$. Moreover its $k$th derivative $n_\delta^{(k)}$ is dominated by $c_{k,\delta}\,n_{2\delta}$, where $c_{k,\delta}$ is a constant depending only on $k$ and $\delta$, so that

$$\left|\int n_\delta^{(k)}(x - y)\,f(y)\,dy\right| \le c_{k,\delta}\,\|f\| \int n_{2\delta}(x - y)\,dy = c_{k,\delta}\,\|f\|.$$

Thus the first assertion follows by differentiation under the integral of the last term in (9), which is justified by elementary rules of calculus. The second assertion is proved by standard estimation as follows, for any $\eta > 0$:

$$|f(x) - f_\delta(x)| \le \int |f(x) - f(x - y)|\,n_\delta(y)\,dy \le \sup_{|y| \le \eta} |f(x) - f(x - y)| + 2\|f\| \int_{|y| > \eta} n_\delta(y)\,dy.$$

Here is the probability idea involved. If $f$ is integrable over $\mathbb{R}^1$, we may think of $f$ as the density of an r.v. $X$, and $n_\delta$ as that of an independent normal r.v. $Y_\delta$. Then $f_\delta$ is the density of $X + Y_\delta$ by Theorem 6.1.2. As $\delta \downarrow 0$, $Y_\delta$ converges to 0 in probability and so $X + Y_\delta$ converges to $X$ likewise, hence also in distribution by Theorem 4.4.5. This makes it plausible that the densities will also converge under certain analytical conditions.

As a corollary, we have shown that the class $C_B^{\infty}$ is dense in the class $C_U$ with respect to the uniform topology on $\mathbb{R}^1$.
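The uniform convergence $f_\delta \to f$ asserted in Theorem 6.1.5 can be watched numerically. A rough sketch (my own, not the book's): for $f = \cos$ one has $f_\delta(x) = e^{-\delta^2/2}\cos x$ exactly, so the midpoint-rule value of the integral in (9), truncated at $|y| \le 8\delta$, should already be uniformly close to $f$ for small $\delta$:

```python
import math

def n_delta(y, d):
    # density of N(0, d^2)
    return math.exp(-y * y / (2 * d * d)) / (math.sqrt(2 * math.pi) * d)

def f_delta(f, x, d, n=2000):
    # midpoint-rule approximation of (f * n_delta)(x), truncated at |y| <= 8 d
    h = 16 * d / n
    s = 0.0
    for k in range(n):
        y = -8 * d + (k + 0.5) * h
        s += f(x - y) * n_delta(y, d) * h
    return s

d = 0.05
err = max(abs(f_delta(math.cos, x, d) - math.cos(x)) for x in (0.0, 0.7, 1.9, 3.1))
assert err < 0.01   # f_delta is already uniformly close to f for small delta
```

Shrinking `d` further makes `err` smaller still, mirroring the $\delta \downarrow 0$ limit in the theorem.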
This is a basic result in the theory

of "generalized functions" or "Schwartz distributions". By way of application, we state the following strengthening of Theorem 4.4.1.

Theorem 6.1.6. If $\{\mu_n\}$ and $\mu$ are s.p.m.'s such that

$$\forall f \in C_B^{\infty}:\quad \int f(x)\,\mu_n(dx) \to \int f(x)\,\mu(dx),$$

then $\mu_n \xrightarrow{v} \mu$.

This is an immediate consequence of Theorem 4.4.1 and Theorem 6.1.5, if we observe that $C_0 \subset C_U$. The reduction of the class of "test functions" from $C_0$ to $C_B^{\infty}$ is often expedient, as in Lindeberg's method for proving central limit theorems; see Sec. 7.1 below.

EXERCISES

1. If $f$ is a ch.f., and $G$ a d.f. with $G(0-) = 0$, then the following functions are all ch.f.'s:

$$\int_0^{\infty} f(ut)\,e^{-u}\,du,\quad \int_0^{\infty} e^{-|t|u}\,dG(u),\quad \int_0^1 f(ut)\,du,\quad \int_0^{\infty} e^{-t^2 u}\,dG(u),\quad \int_0^{\infty} f(ut)\,dG(u).$$

*2. Let $f(u, t)$ be a function on $(-\infty, \infty) \times (-\infty, \infty)$ such that for each $u$, $f(u, \cdot)$ is a ch.f. and for each $t$, $f(\cdot, t)$ is a continuous function; then

$$\int_{-\infty}^{\infty} f(u, t)\,dG(u)$$

is a ch.f. for any d.f. $G$. In particular, if $f$ is a ch.f. such that $\lim_{t\to\infty} f(t)$ exists and $G$ is a d.f. with $G(0-) = 0$, then

$$\int_0^{\infty} f\!\left(\frac{t}{u}\right) dG(u) \text{ is a ch.f.}$$

3. Find the d.f.'s with the following ch.f.'s ($\alpha > 0$, $\beta > 0$):

$$\frac{\alpha^2}{\alpha^2 + t^2},\qquad \frac{1}{(1 - \alpha i t)^{\beta}},\qquad \frac{1}{(1 + \alpha - \alpha e^{it})^{\beta}}.$$

[HINT: The second and third cases correspond respectively to the gamma and Pólya distributions.]

4. Let $S_n$ be as in (v) and suppose that $S_n \to S_\infty$ in pr. Prove that

$$\prod_{j=1}^{\infty} f_j(t)$$

converges in the sense of infinite product for each $t$ and is the ch.f. of $S_\infty$.

5. If $F_1$ and $F_2$ are d.f.'s such that $F_1 = \sum_j b_j \delta_{a_j}$ and $F_2$ has density $p$, show that $F_1 * F_2$ has a density and find it.

*6. Prove that the convolution of two discrete d.f.'s is discrete; that of a continuous d.f. with any d.f. is continuous; that of an absolutely continuous d.f. with any d.f. is absolutely continuous.

7. The convolution of two discrete distributions with exactly $m$ and $n$ atoms, respectively, has at least $m + n - 1$ and at most $mn$ atoms.

8. Show that the family of normal (Cauchy, Poisson) distributions is closed with respect to convolution in the sense that the convolution of any two in the family with arbitrary parameters is another in the family with some parameter(s).

9. Find the $n$th iterated convolution of an exponential distribution.

*10. Let $\{X_j, j \ge 1\}$ be a sequence of independent r.v.'s having the common exponential distribution with mean $1/\lambda$, $\lambda > 0$. For given $x > 0$ let $\nu$ be the maximum of $n$ such that $S_n \le x$, where $S_0 = 0$, $S_n = \sum_{j=1}^{n} X_j$ as usual. Prove that the r.v. $\nu$ has the Poisson distribution with mean $\lambda x$. See Sec. 5.5 for an interpretation by renewal theory.

11. Let $X$ have the normal distribution $\Phi$. Find the d.f., p.d., and ch.f. of $X^2$.

12. Let $\{X_j, 1 \le j \le n\}$ be independent r.v.'s each having the d.f. $\Phi$. Find the ch.f. of

$$\sum_{j=1}^{n} X_j^2$$

and show that the corresponding p.d. is $2^{-n/2}\,\Gamma(n/2)^{-1}\,x^{(n/2)-1} e^{-x/2}$ in $(0, \infty)$. This is called in statistics the "$\chi^2$ distribution with $n$ degrees of freedom".

13. For any ch.f. $f$ we have for every $t$:

$$\Re[1 - f(t)] \ge \tfrac14\,\Re[1 - f(2t)].$$

14. Find an example of two r.v.'s $X$ and $Y$ with the same p.m. $\mu$ that are not independent but such that $X + Y$ has the p.m. $\mu * \mu$.
[HINT: Take $X = Y$ and use ch.f.]

*15. For a d.f. $F$ and $h > 0$, define

$$Q_F(h) = \sup_x\,[F(x + h) - F(x-)];$$

$Q_F$ is called the Lévy concentration function of $F$. Prove that the sup above is attained, and if $G$ is also a d.f., we have

$$\forall h > 0:\quad Q_{F*G}(h) \le Q_F(h) \wedge Q_G(h).$$

16. If $0 < h\lambda \le 2\pi$, then there is an absolute constant $A$ such that

$$Q_F(h) \le \frac{A}{\lambda} \int_{-\lambda}^{\lambda} |f(t)|\,dt,$$

where $f$ is the ch.f. of $F$. [HINT: Use Exercise 2 of Sec. 6.2 below.]

17. Let $F$ be a symmetric d.f., with ch.f. $f \ge 0$; then

$$\varphi_F(h) = \int_{-\infty}^{\infty} \frac{h^2}{h^2 + x^2}\,dF(x) = h \int_0^{\infty} e^{-ht} f(t)\,dt$$

is a sort of average concentration function. Prove that if $G$ is also a d.f. with ch.f. $g \ge 0$, then we have $\forall h > 0$:

$$\varphi_{F*G}(h) \le \varphi_F(h) \wedge \varphi_G(h);$$
$$1 - \varphi_{F*G}(h) \le [1 - \varphi_F(h)] + [1 - \varphi_G(h)].$$

*18. Let the support of the p.m. $\mu$ on $\mathbb{R}^1$ be denoted by $\operatorname{supp} \mu$. Prove that

$$\operatorname{supp}(\mu * \nu) = \text{closure of } (\operatorname{supp} \mu + \operatorname{supp} \nu);$$
$$\operatorname{supp}(\mu_1 * \mu_2 * \cdots) = \text{closure of } (\operatorname{supp} \mu_1 + \operatorname{supp} \mu_2 + \cdots),$$

where "+" denotes vector sum.

6.2 Uniqueness and inversion

To study the deeper properties of Fourier–Stieltjes transforms, we shall need certain "Dirichlet integrals". We begin with three basic formulas, where $\operatorname{sgn} \alpha$ denotes $1$, $0$, or $-1$, according as $\alpha > 0$, $= 0$, or $< 0$.

(1) $\forall y > 0$: $\displaystyle \left|\int_0^y \frac{\sin \alpha x}{x}\,dx\right| \le \int_0^{\pi} \frac{\sin x}{x}\,dx;$

(2) $\displaystyle \int_0^{\infty} \frac{\sin \alpha x}{x}\,dx = \frac{\pi}{2}\operatorname{sgn} \alpha;$

(3) $\displaystyle \int_0^{\infty} \frac{1 - \cos \alpha x}{x^2}\,dx = \frac{\pi}{2}\,|\alpha|.$

The substitution $\alpha x = u$ shows at once that it is sufficient to prove all three formulas for $\alpha = 1$. The inequality (1) is proved by partitioning the interval $[0, \infty)$ with positive multiples of $\pi$ so as to convert the integral into a series of alternating signs and decreasing moduli. The integral in (2) is a standard exercise in contour integration, as is also that in (3). However, we shall indicate the following neat heuristic calculations, leaving the justifications, which are not difficult, as exercises.

$$\int_0^{\infty} \frac{\sin x}{x}\,dx = \int_0^{\infty} \sin x \left[\int_0^{\infty} e^{-xu}\,du\right] dx = \int_0^{\infty} \left[\int_0^{\infty} e^{-xu} \sin x\,dx\right] du = \int_0^{\infty} \frac{du}{1 + u^2} = \frac{\pi}{2};$$

$$\int_0^{\infty} \frac{1 - \cos x}{x^2}\,dx = \int_0^{\infty} \frac{dx}{x^2} \int_0^x \sin u\,du = \int_0^{\infty} \sin u \left[\int_u^{\infty} \frac{dx}{x^2}\right] du = \int_0^{\infty} \frac{\sin u}{u}\,du = \frac{\pi}{2}.$$

We are ready to answer the question: given a ch.f. $f$, how can we find the corresponding d.f. $F$ or p.m. $\mu$? The formula for doing this, called the inversion formula, is of theoretical importance, since it will establish a one-to-one correspondence between the class of d.f.'s or p.m.'s and the class of ch.f.'s (see, however, Exercise 12 below). It is somewhat complicated in its most general form, but special cases or variants of it can actually be employed to derive certain properties of a d.f. or p.m. from its ch.f.; see, e.g., (14) and (15) of Sec. 6.4.

Theorem 6.2.1. If $x_1 < x_2$, then we have

(4) $\displaystyle \mu((x_1, x_2)) + \frac12\,\mu(\{x_1\}) + \frac12\,\mu(\{x_2\}) = \lim_{T\to\infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-itx_1} - e^{-itx_2}}{it}\,f(t)\,dt$

(the integrand being defined by continuity at $t = 0$).

PROOF. Observe first that the integrand above is bounded by $|x_1 - x_2|$ everywhere and is $O(|t|^{-1})$ as $|t| \to \infty$; yet we cannot assert that the "infinite integral" $\int_{-\infty}^{\infty}$ exists (in the Lebesgue sense). Indeed, it does not in general (see Exercise 9 below).
The fact that the indicated limit, the so-called Cauchy limit, does exist is part of the assertion.
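The limit in (4) can be approximated numerically for a simple p.m. The sketch below (mine, not the book's) uses the Bernoulli distribution of entry (3) in Sec. 6.1, with $x_1 = 0$ and $x_2 = 1$, so that the left member of (4) equals $\frac12\mu(\{0\}) + \frac12\mu(\{1\}) = \frac12$:

```python
import cmath, math

p = 0.3                                  # Bernoulli(p): ch.f. q + p e^{it}
def f(t):
    return (1 - p) + p * cmath.exp(1j * t)

T, n = 500.0, 200000                     # a large finite T stands in for the limit
h = 2 * T / n
s = 0j
for k in range(n):
    t = -T + (k + 0.5) * h               # midpoints, so t = 0 is never sampled
    s += (1 - cmath.exp(-1j * t)) / (1j * t) * f(t)   # x1 = 0, x2 = 1
approx = (s * h / (2 * math.pi)).real

# = mu((0,1)) + mu({0})/2 + mu({1})/2 = 0 + q/2 + p/2 = 1/2
assert abs(approx - 0.5) < 0.01
```

Increasing `T` tightens the agreement, consistent with the Cauchy limit existing even though the integrand is not absolutely integrable.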

We shall prove (4) by actually substituting the definition of $f$ into (4) and carrying out the integrations. We have

(5) $\displaystyle \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-itx_1} - e^{-itx_2}}{it} \left[\int_{-\infty}^{\infty} e^{itx}\,\mu(dx)\right] dt = \int_{-\infty}^{\infty} \left[\int_{-T}^{T} \frac{e^{it(x - x_1)} - e^{it(x - x_2)}}{2\pi i t}\,dt\right] \mu(dx).$

Leaving aside for the moment the justification of the interchange of the iterated integral, let us denote the quantity in square brackets above by $I(T, x, x_1, x_2)$. Trivial simplification yields

$$I(T, x, x_1, x_2) = \frac{1}{\pi} \int_0^T \frac{\sin t(x - x_1)}{t}\,dt - \frac{1}{\pi} \int_0^T \frac{\sin t(x - x_2)}{t}\,dt.$$

It follows from (2) that

$$\lim_{T\to\infty} I(T, x, x_1, x_2) = \begin{cases} -\frac12 - (-\frac12) = 0, & \text{for } x < x_1;\\ 0 - (-\frac12) = \frac12, & \text{for } x = x_1;\\ \frac12 - (-\frac12) = 1, & \text{for } x_1 < x < x_2;\\ \frac12 - 0 = \frac12, & \text{for } x = x_2;\\ \frac12 - \frac12 = 0, & \text{for } x > x_2.\end{cases}$$

Furthermore, $I$ is bounded in $T$ by (1). Hence we may let $T \to \infty$ under the integral sign in the right member of (5) by bounded convergence, since

$$|I(T, x, x_1, x_2)| \le \frac{2}{\pi} \int_0^{\pi} \frac{\sin x}{x}\,dx$$

by (1). The result is

$$\int_{(-\infty, x_1)} 0\,\mu(dx) + \int_{\{x_1\}} \frac12\,\mu(dx) + \int_{(x_1, x_2)} 1\,\mu(dx) + \int_{\{x_2\}} \frac12\,\mu(dx) + \int_{(x_2, \infty)} 0\,\mu(dx) = \frac12\,\mu(\{x_1\}) + \mu((x_1, x_2)) + \frac12\,\mu(\{x_2\}).$$

This proves the theorem. For the justification mentioned above, we invoke Fubini's theorem and observe that

$$\left|\frac{e^{it(x - x_1)} - e^{it(x - x_2)}}{it}\right| = \left|\int_{x_1}^{x_2} e^{-itu}\,du\right| \le |x_1 - x_2|,$$

where the integral is taken along the real axis, and

$$\int_{-T}^{T} \left[\int |x_1 - x_2|\,\mu(dx)\right] dt \le 2T\,|x_1 - x_2| < \infty,$$

so that the integrand on the right of (5) is dominated by a finitely integrable function with respect to the finite product measure $dt \cdot \mu(dx)$ on $[-T, +T] \times \mathbb{R}^1$. This suffices.

Remark. If $x_1$ and $x_2$ are points of continuity of $F$, the left side of (4) is $F(x_2) - F(x_1)$.

The following result is often referred to as the "uniqueness theorem" for the "determining" $\mu$ or $F$ (see also Exercise 12 below).

Theorem 6.2.2. If two p.m.'s or d.f.'s have the same ch.f., then they are the same.

PROOF. If neither $x_1$ nor $x_2$ is an atom of $\mu$, the inversion formula (4) shows that the value of $\mu$ on the interval $(x_1, x_2)$ is determined by its ch.f. It follows that two p.m.'s having the same ch.f. agree on each interval whose endpoints are not atoms for either measure. Since each p.m. has only a countable set of atoms, points of $\mathbb{R}^1$ that are not atoms for either measure form a dense set. Thus the two p.m.'s agree on a dense set of intervals, and therefore they are identical by the corollary to Theorem 2.2.3.

We give next an important particular case of Theorem 6.2.1.

Theorem 6.2.3. If $f \in L^1(-\infty, +\infty)$, then $F$ is continuously differentiable, and we have

(6) $\displaystyle F'(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} f(t)\,dt.$

PROOF. Applying (4) for $x_2 = x$ and $x_1 = x - h$ with $h > 0$ and using $F$ instead of $\mu$, we have

$$\frac{F(x) + F(x-)}{2} - \frac{F(x - h) + F(x - h-)}{2} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{ith} - 1}{it}\,e^{-itx} f(t)\,dt.$$

Here the infinite integral exists by the hypothesis on $f$, since the integrand above is dominated by $|h\,f(t)|$. Hence we may let $h \to 0$ under the integral sign by dominated convergence and conclude that the left side is 0. Thus $F$ is left continuous and so continuous in $\mathbb{R}^1$. Now we can write

$$\frac{F(x) - F(x - h)}{h} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{ith} - 1}{ith}\,e^{-itx} f(t)\,dt.$$

The same argument as before shows that the limit exists as $h \to 0$.
Hence $F$ has a left-hand derivative at $x$ equal to the right member of (6), the latter being clearly continuous [cf. Proposition (ii) of Sec. 6.1]. Similarly, $F$ has a

right-hand derivative given by the same formula. Actually it is known for a continuous function that, if one of the four "derivates" exists and is continuous at a point, then the function is continuously differentiable there (see, e.g., Titchmarsh, The Theory of Functions, 2nd ed., Oxford Univ. Press, New York, 1939, p. 355).

The derivative $F'$ being continuous, we have (why?)

$$\forall x:\quad F(x) = \int_{-\infty}^{x} F'(u)\,du.$$

Thus $F'$ is a probability density function. We may now state Theorem 6.2.3 in a more symmetric form familiar in the theory of Fourier integrals.

Corollary. If $f \in L^1$, then $p \in L^1$, where

$$p(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} f(t)\,dt \quad \text{and} \quad f(t) = \int_{-\infty}^{\infty} e^{itx} p(x)\,dx.$$

The next two theorems yield information on the atoms of $\mu$ by means of $f$ and are given here as illustrations of the method of "harmonic analysis".

Theorem 6.2.4. For each $x_0$, we have

(7) $\displaystyle \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} e^{-itx_0} f(t)\,dt = \mu(\{x_0\}).$

PROOF. Proceeding as in the proof of Theorem 6.2.1, we obtain for the integral average on the left side of (7):

(8) $\displaystyle \int_{\mathbb{R}^1 - \{x_0\}} \frac{\sin T(x - x_0)}{T(x - x_0)}\,\mu(dx) + \int_{\{x_0\}} 1\,\mu(dx).$

The integrand of the first integral above is bounded by 1 and tends to 0 as $T \to \infty$ everywhere in the domain of integration; hence the integral converges to 0 by bounded convergence. The second term is simply the right member of (7).

Theorem 6.2.5. We have

(9) $\displaystyle \lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} |f(t)|^2\,dt = \sum_{x \in \mathbb{R}^1} \mu(\{x\})^2.$
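Formula (9) lends itself to a numerical illustration (a sketch of mine, not from the text). For the Bernoulli(p) p.m. of entry (3) in Sec. 6.1, $|f(t)|^2 = q^2 + p^2 + 2pq\cos t$, and the averages over $[-T, T]$ should approach the sum of squared atom masses $p^2 + q^2$:

```python
import math

p, q = 0.3, 0.7
def mod_f_sq(t):
    # |q + p e^{it}|^2 for the Bernoulli ch.f.
    return q * q + p * p + 2 * p * q * math.cos(t)

T, n = 5000.0, 100000
h = 2 * T / n
avg = sum(mod_f_sq(-T + (k + 0.5) * h) for k in range(n)) * h / (2 * T)

# Theorem 6.2.5: the limit is the sum of squared atom masses p^2 + q^2 = 0.58
assert abs(avg - (p * p + q * q)) < 1e-3
```

The oscillating cross term $2pq\cos t$ averages out over long intervals, which is exactly the mechanism in the proof below.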

PROOF. Since the set of atoms is countable, all but a countable number of terms in the sum above vanish, making the sum meaningful with a value bounded by 1. Formula (9) can be established directly in the manner of (5) and (7), but the following proof is more illuminating. As noted in the proof of the corollary to Theorem 6.1.4, $|f|^2$ is the ch.f. of the r.v. $X - Y$ there, whose distribution is $\mu * \mu'$, where $\mu'(B) = \mu(-B)$ for each $B \in \mathcal{B}$. Applying Theorem 6.2.4 with $x_0 = 0$, we see that the left member of (9) is equal to $(\mu * \mu')(\{0\})$. By (7) of Sec. 6.1, the latter may be evaluated as

$$\int_{\mathbb{R}^1} \mu'(\{-y\})\,\mu(dy) = \sum_{y \in \mathbb{R}^1} \mu(\{y\})^2,$$

since the integrand above is zero unless $-y$ is an atom of $\mu'$, which is the case if and only if $y$ is an atom of $\mu$. This gives the right member of (9). The reader may prefer to carry out the argument above using the r.v.'s $X$ and $-Y$ in the proof of the Corollary to Theorem 6.1.4.

Corollary. $\mu$ is atomless ($F$ is continuous) if and only if the limit in the left member of (9) is zero.

This criterion is occasionally practicable.

DEFINITION. The r.v. $X$ is called symmetric iff $X$ and $-X$ have the same distribution.

For such an r.v., the distribution $\mu$ has the following property:

$$\forall B \in \mathcal{B}:\quad \mu(B) = \mu(-B).$$

Such a p.m. may be called symmetric; an equivalent condition on its d.f. $F$ is as follows:

$$\forall x \in \mathbb{R}^1:\quad F(x) = 1 - F(-x-),$$

(the awkwardness of using d.f. being obvious here).

Theorem 6.2.6. $X$ or $\mu$ is symmetric if and only if its ch.f. is real-valued (for all $t$).
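Theorem 6.2.6 can be illustrated with two entries from the list in Sec. 6.1 (an ad hoc check of mine, not the book's): the asymmetric Bernoulli(p) law has a ch.f. with a nonzero imaginary part, while the symmetric Bernoullian law of entry (2) has the real ch.f. $\cos t$:

```python
import cmath, math

t = 1.37

# Bernoulli(p) with p != 1/2 is not symmetric; its ch.f. is not real-valued
p = 0.3
f_bern = (1 - p) + p * cmath.exp(1j * t)
assert abs(f_bern.imag) > 0.1

# symmetric Bernoullian law, mass 1/2 at each of +1 and -1: ch.f. is cos t, real
f_sym = 0.5 * cmath.exp(1j * t) + 0.5 * cmath.exp(-1j * t)
assert abs(f_sym.imag) < 1e-12
assert abs(f_sym.real - math.cos(t)) < 1e-12
```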

PROOF. If $X$ and $-X$ have the same distribution, they must "determine" the same ch.f. Hence, by (iii) of Sec. 6.1, we have

$$f(t) = \overline{f(t)}$$

and so $f$ is real-valued. Conversely, if $f$ is real-valued, the same argument shows that $X$ and $-X$ must have the same ch.f. Hence, they have the same distribution by the uniqueness theorem (Theorem 6.2.2).

EXERCISES

$f$ is the ch.f. of $F$ below.

1. Show that

$$\int_0^{\infty} \left(\frac{\sin x}{x}\right)^2 dx = \frac{\pi}{2}.$$

*2. Show that for each $T > 0$:

$$\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{1 - \cos Tx}{x^2}\,\cos tx\,dx = (T - |t|) \vee 0.$$

Deduce from this that for each $T > 0$, the function of $t$ given by

$$\left(1 - \frac{|t|}{T}\right) \vee 0$$

is a ch.f. Next, show that as a particular case of Theorem 6.2.3,

$$\frac{1 - \cos Tx}{x^2} = \frac12 \int_{-T}^{T} (T - |t|)\,e^{itx}\,dt.$$

Finally, derive the following particularly useful relation (a case of Parseval's relation in Fourier analysis), for arbitrary $a$ and $T > 0$:

$$\int_{-\infty}^{\infty} \frac{1 - \cos T(x - a)}{[T(x - a)]^2}\,dF(x) = \frac{1}{2T^2} \int_{-T}^{T} (T - |t|)\,e^{-ita} f(t)\,dt.$$

*3. Prove that for each $\alpha > 0$:

$$\int_0^{\alpha} [F(x + u) - F(x - u)]\,du = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{1 - \cos \alpha t}{t^2}\,e^{-itx} f(t)\,dt.$$

As a sort of reciprocal, we have

$$\frac12 \int_0^{\alpha} du \int_{-u}^{u} f(t)\,dt = \int_{-\infty}^{\infty} \frac{1 - \cos \alpha x}{x^2}\,dF(x).$$

4. If $f(t)/t \in L^1(-\infty, \infty)$, then for each $\alpha > 0$ such that $\pm\alpha$ are points of continuity of $F$, we have

$$F(\alpha) - F(-\alpha) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin \alpha t}{t}\,f(t)\,dt.$$

5. What is the special case of the inversion formula when $f \equiv 1$? Deduce also the following special cases, where $\alpha > 0$:

$$\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin \alpha t \sin t}{t^2}\,dt = \alpha \wedge 1;$$

$$\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin \alpha t\,(\sin t)^2}{t^3}\,dt = \alpha - \frac{\alpha^2}{4} \text{ for } \alpha \le 2;\ = 1 \text{ for } \alpha \ge 2.$$

6. For each $n \ge 0$, we have

$$\frac{1}{\pi} \int_{-\infty}^{\infty} \left(\frac{\sin t}{t}\right)^{n+2} dt = \int_{-1}^{1} \varphi_{n+1}(u)\,du,$$

where $\varphi_1 = \frac12\,1_{[-1,1]}$ and $\varphi_n = \varphi_{n-1} * \varphi_1$ for $n \ge 2$.

*7. If $F$ is absolutely continuous, then $\lim_{|t|\to\infty} f(t) = 0$. Hence, if the absolutely continuous part of $F$ does not vanish, then $\varlimsup_{|t|\to\infty} |f(t)| < 1$. If $F$ is purely discontinuous, then $\varlimsup_{|t|\to\infty} |f(t)| = 1$. [The first assertion is the Riemann–Lebesgue lemma; prove it first when $F$ has a density that is a simple function, then approximate. The third assertion is an easy part of the observation that such an $f$ is "almost periodic".]

8. Prove that for $0 < r < 2$ we have

$$\int_{-\infty}^{\infty} |x|^r\,dF(x) = C(r) \int_{-\infty}^{\infty} \frac{1 - \Re f(t)}{|t|^{r+1}}\,dt,$$

where

$$C(r) = \left[\int_{-\infty}^{\infty} \frac{1 - \cos u}{|u|^{r+1}}\,du\right]^{-1} = \frac{\Gamma(r + 1)}{\pi} \sin \frac{r\pi}{2},$$

thus $C(1) = 1/\pi$. [HINT: $|x|^r = C(r) \displaystyle\int_{-\infty}^{\infty} \frac{1 - \cos xt}{|t|^{r+1}}\,dt$.]

*9. Give a trivial example where the right member of (4) cannot be replaced by the Lebesgue integral

$$\frac{1}{2\pi} \int_{-\infty}^{\infty} \Re\left[\frac{e^{-itx_1} - e^{-itx_2}}{it}\,f(t)\right] dt.$$

But it can always be replaced by the improper Riemann integral:

$$\lim_{\substack{T_1 \to +\infty\\ T_2 \to +\infty}} \frac{1}{2\pi} \int_{-T_1}^{T_2} \Re\left[\frac{e^{-itx_1} - e^{-itx_2}}{it}\,f(t)\right] dt.$$

10. Prove the following form of the inversion formula (due to Gil-Pelaez):

$$\frac12\{F(x+) + F(x-)\} = \frac12 + \lim_{T\to\infty} \int_{1/T}^{T} \frac{e^{itx} f(-t) - e^{-itx} f(t)}{2\pi i t}\,dt.$$

[HINT: Use the method of proof of Theorem 6.2.1 rather than the result.]

11. Theorem 6.2.3 has an analogue in $L^2$. If the ch.f. $f$ of $F$ belongs to $L^2$, then $F$ is absolutely continuous. [HINT: By Plancherel's theorem, there exists $\varphi \in L^2$ such that

$$\int_0^x \varphi(u)\,du = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{-itx} - 1}{-it}\,f(t)\,dt.$$

Now use the inversion formula to show that

$$F(x) - F(0) = \int_0^x \varphi(u)\,du.]$$

*12. Prove Theorem 6.2.2 by the Stone–Weierstrass theorem. [HINT: Cf. Theorem 6.6.2 below, but beware of the differences. Approximate uniformly $g_1$ and $g_2$ in the proof of Theorem 4.4.3 by a periodic function with "arbitrarily large" period.]

13. The uniqueness theorem holds as well for signed measures [or functions of bounded variation]. Precisely, if each $\mu_i$, $i = 1, 2$, is the difference of two finite measures such that

$$\forall t:\quad \int e^{itx}\,\mu_1(dx) = \int e^{itx}\,\mu_2(dx),$$

then $\mu_1 \equiv \mu_2$.

14. There is a deeper supplement to the inversion formula (4) or Exercise 10 above, due to B. Rosén. Under the condition

$$\int_{-\infty}^{\infty} (1 + \log^+ |x|)\,dF(x) < \infty,$$

the improper Riemann integral in Exercise 10 may be replaced by a Lebesgue integral. [HINT: It is a matter of proving the existence of the latter. Since

$$\int_{-\infty}^{\infty} dF(y) \int_0^N \frac{|\sin(x - y)t|}{t}\,dt \le \int_{-\infty}^{\infty} dF(y)\,\{1 + \log(1 + N|x - y|)\} < \infty,$$

we have

$$\int_0^N \frac{dt}{t} \int_{-\infty}^{\infty} \sin(x - y)t\,dF(y) = \int_{-\infty}^{\infty} dF(y) \int_0^N \frac{\sin(x - y)t}{t}\,dt.$$

For fixed $x$, we have

$$\int_{y \ne x} dF(y) \left|\int_N^{\infty} \frac{\sin(x - y)t}{t}\,dt\right| \le C_2 \int_{-\infty}^{\infty} dF(y) < \infty.]$$

6.3 Convergence theorems

For purposes of probability theory the most fundamental property of the ch.f. is given in the two propositions below, which will be referred to jointly as the convergence theorem, due to P. Lévy and H. Cramér. Many applications will be given in this and the following chapter. We begin with the easier half.

Theorem 6.3.1. Let $\{\mu_n, 1 \le n \le \infty\}$ be p.m.'s on $\mathbb{R}^1$ with ch.f.'s $\{f_n, 1 \le n \le \infty\}$. Then

(1) $\mu_n \xrightarrow{v} \mu_\infty \implies f_n \to f_\infty$ uniformly in every finite interval.

Theorem 6.3.2. Let $\{\mu_n, 1 \le n < \infty\}$ be p.m.'s on $\mathbb{R}^1$ with ch.f.'s $\{f_n, 1 \le n < \infty\}$, and suppose that:

(a) $f_n$ converges everywhere in $\mathbb{R}^1$ and defines the limit function $f_\infty$:

$$\forall t:\quad f_n(t) \to f_\infty(t);$$

(b) $f_\infty$ is continuous at $t = 0$.

Then we have

($\alpha$) $\mu_n \xrightarrow{v} \mu_\infty$, where $\mu_\infty$ is a p.m.;

($\beta$) $f_\infty$ is the ch.f. of $\mu_\infty$.

PROOF. Let us first relax the conditions (a) and (b) to require only convergence of $f_n$ in a neighborhood $(-\delta_0, \delta_0)$ of $t = 0$ and the continuity of the limit function $f$ (defined only in this neighborhood) at $t = 0$. We shall prove that any vaguely convergent subsequence of $\{\mu_n\}$ converges to a p.m. $\mu$. For this we use the following lemma, which illustrates a useful technique for obtaining estimates on a p.m. from its ch.f.

Lemma. For each $A > 0$, we have

(2) $\displaystyle \mu([-2A, 2A]) \ge A \int_{-A^{-1}}^{A^{-1}} f(t)\,dt - 1.$

PROOF OF THE LEMMA. By (8) of Sec. 6.2, we have

(3) $\displaystyle \frac{1}{2T} \int_{-T}^{T} f(t)\,dt = \int_{-\infty}^{\infty} \frac{\sin Tx}{Tx}\,\mu(dx).$

Since the integrand on the right side is bounded by 1 for all $x$ (it is defined to be 1 at $x = 0$), and by $|Tx|^{-1} \le (2TA)^{-1}$ for $|x| > 2A$, the integral is bounded by

$$\mu([-2A, 2A]) + \frac{1}{2TA}\,\{1 - \mu([-2A, 2A])\} = \left(1 - \frac{1}{2TA}\right)\mu([-2A, 2A]) + \frac{1}{2TA}.$$

Putting $T = A^{-1}$ in (3), we obtain

$$\frac{A}{2} \int_{-A^{-1}}^{A^{-1}} f(t)\,dt \le \frac12\,\mu([-2A, 2A]) + \frac12,$$

which reduces to (2). The lemma is proved.

Now for each $\delta$, $0 < \delta < \delta_0$, we have

(4) $\displaystyle \left|\frac{1}{2\delta} \int_{-\delta}^{\delta} f_n(t)\,dt\right| \ge \left|\frac{1}{2\delta} \int_{-\delta}^{\delta} f(t)\,dt\right| - \frac{1}{2\delta} \int_{-\delta}^{\delta} |f_n(t) - f(t)|\,dt.$

The first term on the right side tends to 1 as $\delta \downarrow 0$, since $f(0) = 1$ and $f$ is continuous at 0; for fixed $\delta$ the second term tends to 0 as $n \to \infty$, by bounded convergence since $|f_n - f| \le 2$. It follows that for any given $\epsilon > 0$,

there exist $\delta = \delta(\epsilon) < \delta_0$ and $n_0 = n_0(\epsilon)$ such that if $n \ge n_0$, then the left member of (4) has a value not less than $1 - \epsilon$. Hence by (2)

(5) $\mu_n([-2\delta^{-1}, 2\delta^{-1}]) \ge 2(1 - \epsilon) - 1 = 1 - 2\epsilon.$

Let $\{\mu_{n_k}\}$ be a vaguely convergent subsequence of $\{\mu_n\}$, which always exists by Theorem 4.3.3; and let the vague limit be $\mu$, which is always an s.p.m. For each $\delta$ satisfying the conditions above, and such that neither $-2\delta^{-1}$ nor $2\delta^{-1}$ is an atom of $\mu$, we have by the property of vague convergence and (5):

$$\mu(\mathbb{R}^1) \ge \mu([-2\delta^{-1}, 2\delta^{-1}]) = \lim_{k\to\infty} \mu_{n_k}([-2\delta^{-1}, 2\delta^{-1}]) \ge 1 - 2\epsilon.$$

Since $\epsilon$ is arbitrary, we conclude that $\mu$ is a p.m., as was to be shown.

Let $f$ be the ch.f. of $\mu$. Then, by the preceding theorem, $f_{n_k} \to f$ everywhere; hence under the original hypothesis (a) we have $f = f_\infty$. Thus every vague limit $\mu$ considered above has the same ch.f. and therefore by the uniqueness theorem is the same p.m. Rename it $\mu_\infty$, so that $\mu_\infty$ is the p.m. having the ch.f. $f_\infty$. Then by Theorem 4.3.4 we have $\mu_n \xrightarrow{v} \mu_\infty$. Both assertions ($\alpha$) and ($\beta$) are proved.

As a particular case of the above theorem: if $\{\mu_n, 1 \le n \le \infty\}$ and $\{f_n, 1 \le n \le \infty\}$ are corresponding p.m.'s and ch.f.'s, then the converse of (1) is also true, namely:

(6) $\mu_n \xrightarrow{v} \mu_\infty \iff f_n \to f_\infty.$

This is an elegant statement, but it lacks the full strength of Theorem 6.3.2, which lies in concluding that $f_\infty$ is a ch.f. from more easily verifiable conditions, rather than assuming it. We shall see presently how important this is.

Let us examine some cases of inapplicability of Theorems 6.3.1 and 6.3.2.

Example 1. Let $\mu_n$ have mass $\frac12$ at 0 and mass $\frac12$ at $n$. Then $\mu_n \xrightarrow{v} \mu_\infty$, where $\mu_\infty$ has mass $\frac12$ at 0 and is not a p.m. We have

$$f_n(t) = \frac12 + \frac12\,e^{int},$$

which does not converge as $n \to \infty$, except when $t$ is equal to a multiple of $2\pi$.

Example 2. Let $\mu_n$ be the uniform distribution $[-n, n]$. Then $\mu_n \xrightarrow{v} \mu_\infty$, where $\mu_\infty$ is identically zero.
We have

$$f_n(t) = \begin{cases} \dfrac{\sin nt}{nt}, & \text{if } t \ne 0;\\[1ex] 1, & \text{if } t = 0;\end{cases}$$

and

$$f_n(t) \to f(t) = \begin{cases} 0, & \text{if } t \ne 0;\\ 1, & \text{if } t = 0.\end{cases}$$

Thus, condition (a) is satisfied but (b) is not.

Later we shall see that (a) cannot be relaxed to read: $f_n(t)$ converges in $|t| \le T$ for some fixed $T$ (Exercise 9 of Sec. 6.5).

The convergence theorem above settles the question of vague convergence of p.m.'s to a p.m. What about just vague convergence without restriction on the limit? Recalling Theorem 4.4.3, this suggests first that we replace the integrand $e^{itx}$ in the ch.f. $f$ by a function in $C_0$. Secondly, going over the last part of the proof of Theorem 6.3.2, we see that the choice should be made so as to determine uniquely an s.p.m. (see Sec. 4.3). Now the Fourier–Stieltjes transform of an s.p.m. is well defined and the inversion formula remains valid, so that there is unique correspondence just as in the case of a p.m. Thus a natural choice of $g$ is given by an "indefinite integral" of a ch.f., as follows:

(7) $\displaystyle g(u) = \int_0^u \left[\int e^{itx}\,\mu(dx)\right] dt = \int \frac{e^{iux} - 1}{ix}\,\mu(dx).$

Let us call $g$ the integrated characteristic function of the s.p.m. $\mu$. We are thus led to the following companion of (6), the details of the proof being left as an exercise.

Theorem 6.3.3. A sequence of s.p.m.'s $\{\mu_n, 1 \le n \le \infty\}$ converges (to $\mu_\infty$) if and only if the corresponding sequence of integrated ch.f.'s $\{g_n\}$ converges (to the integrated ch.f. of $\mu_\infty$).

Another question concerning (6) arises naturally. We know that vague convergence for p.m.'s is metric (Exercise 9 of Sec.
4.4); let the metric be denoted by $(\cdot,\cdot)_1$. Uniform convergence on compacts (viz., in finite intervals) for uniformly bounded subsets of $C_B(\mathbb{R}^1)$ is also metric, with the metric denoted by $(\cdot,\cdot)_2$, defined as follows:

$$(f, g)_2 = \sup_{t \in \mathbb{R}^1} \frac{|f(t) - g(t)|}{1 + t^2}.$$

It is easy to verify that this is a metric on $C_B$ and that convergence in this metric is equivalent to uniform convergence on compacts; clearly the denominator $1 + t^2$ may be replaced by any function continuous on $\mathbb{R}^1$, bounded below by a strictly positive constant, and tending to $+\infty$ as $|t| \to \infty$. Since there is a one-to-one correspondence between ch.f.'s and p.m.'s, we may transfer the

metric $(\cdot,\cdot)_2$ to the latter by setting

$$(\mu, \nu)_2 = (f_\mu, f_\nu)_2$$

in obvious notation. Now the relation (6) may be restated as follows.

Theorem 6.3.4. The topologies induced by the two metrics $(\cdot,\cdot)_1$ and $(\cdot,\cdot)_2$ on the space of p.m.'s on $\mathbb{R}^1$ are equivalent.

This means that for each $\mu$ and given $\epsilon > 0$, there exists $\delta(\mu, \epsilon)$ such that:

$$(\mu, \nu)_1 \le \delta(\mu, \epsilon) \implies (\mu, \nu)_2 \le \epsilon,$$
$$(\mu, \nu)_2 \le \delta(\mu, \epsilon) \implies (\mu, \nu)_1 \le \epsilon.$$

Theorem 6.3.4 needs no new proof, since it is merely a paraphrasing of (6) in new words. However, it is important to notice the dependence of $\delta$ on $\mu$ (as well as $\epsilon$) above. The sharper statement without this dependence, which would mean the equivalence of the uniform structures induced by the two metrics, is false with a vengeance; see Exercises 10 and 11 below (Exercises 3 and 4 are also relevant).

EXERCISES

1. Prove the uniform convergence of $f_n$ in Theorem 6.3.1 by an integration by parts of $\int e^{itx}\,dF_n(x)$.

*2. Instead of using the Lemma in the second part of the proof of Theorem 6.3.2, prove that $\mu$ is a p.m. by integrating the inversion formula, as in Exercise 3 of Sec. 6.2. (Integration is a smoothing operation and a standard technique in taming improper integrals: cf. the proof of the second part of Theorem 6.5.2 below.)

3. Let $F$ be a given absolutely continuous d.f. and let $F_n$ be a sequence of step functions with equally spaced steps that converge to $F$ uniformly in $\mathbb{R}^1$. Show that for the corresponding ch.f.'s we have

$$\forall n:\quad \sup_{t \in \mathbb{R}^1} |f(t) - f_n(t)| = 1.$$

4. Let $F_n$, $G_n$ be d.f.'s with ch.f.'s $f_n$ and $g_n$. If $f_n - g_n \to 0$ a.e., then for each $f \in C_K$ we have $\int f\,dF_n - \int f\,dG_n \to 0$ (see Exercise 10 of Sec. 4.4). This does not imply the Lévy distance $(F_n, G_n)_1 \to 0$; find a counterexample. [HINT: Use Exercise 3 of Sec. 6.2 and proceed as in Theorem 4.3.4.]

5. Let $F$ be a discrete d.f.
with points of jump $\{a_j, j \ge 1\}$ and sizes of jump $\{b_j, j \ge 1\}$. Consider the approximating s.d.f.'s $F_n$ with the same jumps but restricted to $j \le n$. Show that $F_n \xrightarrow{v} F$.

*6. If the sequence of ch.f.'s $\{f_n\}$ converges uniformly in a neighborhood of the origin, then $\{f_n\}$ is equicontinuous, and there exists a subsequence that converges to a ch.f. [HINT: Use the Ascoli–Arzelà theorem.]

7. If $F_n \xrightarrow{v} F$ and $G_n \xrightarrow{v} G$, then $F_n * G_n \xrightarrow{v} F * G$. [A proof of this simple result without the use of ch.f.'s would be tedious.]

*8. Interpret the remarkable trigonometric identity

$$\frac{\sin t}{t} = \prod_{n=1}^{\infty} \cos \frac{t}{2^n}$$

in terms of ch.f.'s, and hence by addition of independent r.v.'s. (This is an example of Exercise 4 of Sec. 6.1.)

9. Rewrite the preceding formula as

$$\frac{\sin t}{t} = \prod_{k=1}^{\infty} \cos \frac{t}{2^{2k-1}} \cdot \prod_{k=1}^{\infty} \cos \frac{t}{2^{2k}}.$$

Prove that either factor on the right is the ch.f. of a singular distribution. Thus the convolution of two such may be absolutely continuous. [HINT: Use the same r.v.'s as for the Cantor distribution in Exercise 9 of Sec. 5.3.]

10. Using the strong law of large numbers, prove that the convolution of two Cantor d.f.'s is still singular. [HINT: Inspect the frequency of the digits in the sum of the corresponding random series; see Exercise 9 of Sec. 5.3.]

*11. Let $F_n$, $G_n$ be the d.f.'s of $\mu_n$, $\nu_n$, and $f_n$, $g_n$ their ch.f.'s. Even if $\sup_{x\in\mathbb{R}^1} |F_n(x) - G_n(x)| \to 0$, it does not follow that $(f_n, g_n)_2 \to 0$; indeed it may happen that $(f_n, g_n)_2 = 1$ for every $n$. [HINT: Take two step functions "out of phase".]

12. In the notation of Exercise 11, even if $\sup_{t\in\mathbb{R}^1} |f_n(t) - g_n(t)| \to 0$, it does not follow that $(F_n, G_n)_1 \to 0$; indeed it may $\to 1$. [HINT: Let $f$ be any ch.f. vanishing outside $(-1, 1)$, $f_j(t) = e^{-in_jt} f(m_jt)$, $g_j(t) = e^{in_jt} f(m_jt)$, and $F_j$, $G_j$ be the corresponding d.f.'s. Note that if $m_j n_j^{-1} \to 0$, then $F_j(x) \to 1$, $G_j(x) \to 0$ for every $x$, and that $f_j - g_j$ vanishes outside $(-m_j^{-1}, m_j^{-1})$ and is $O(\sin n_j t)$ near $t = 0$.
If m_j = 2^{j^2} and n_j = j m_j, then sum_j (f_j - g_j) is uniformly bounded in t: for 1/m_{k+1} <= |t| <= 1/m_k consider j > k, j = k, j < k separately. Let

    f_n* = (1/n) sum_{j=1}^n f_j,    g_n* = (1/n) sum_{j=1}^n g_j;

then sup_t |f_n* - g_n*| = O(1/n) while (F_n*, G_n*) does not converge to 0. This example is due to Katznelson, rivaling an older one due to Dyson, which is as follows. For

6.4 SIMPLE APPLICATIONS | 175

b > a > 0, let

    F(x) = log^+ ( b^2 / (x^2 + a^2) ) / (2 log(b/a))

for x <= 0, and = 1 for x > 0; G(x) = 1 - F(-x). Then

    f(t) - g(t) = -pi i (t/|t|) (e^{-a|t|} - e^{-b|t|}) / log(b/a).

If a is large, then (F, G) is near 1. If b/a is large, then (f, g) is near 0.]

6.4 Simple applications

A common type of application of Theorem 6.3.2 depends on power-series expansions of the ch.f., for which we need its derivatives.

Theorem 6.4.1. If the d.f. has a finite absolute moment of positive integral order k, then its ch.f. has a bounded continuous derivative of order k given by

(1)    f^(k)(t) = integral (ix)^k e^{itx} dF(x).

Conversely, if f has a finite derivative of even order k at t = 0, then F has a finite moment of order k.

PROOF. For k = 1, the first assertion follows from the formula:

    (f(t+h) - f(t)) / h = integral (e^{i(t+h)x} - e^{itx}) / h dF(x).

An elementary inequality already used in the proof of Theorem 6.2.1 shows that the integrand above is dominated by |x|. Hence if integral |x| dF(x) < inf, we may let h -> 0 under the integral sign and obtain (1). Uniform continuity of the integral as a function of t is proved as in (ii) of Sec. 6.1. The case of a general k follows easily by induction.

To prove the second assertion, let k = 2 and suppose that f''(0) exists and is finite. We have

    f''(0) = lim_{h->0} (f(h) - 2 f(0) + f(-h)) / h^2

176 | CHARACTERISTIC FUNCTION

           = lim_{h->0} integral (e^{ihx} - 2 + e^{-ihx}) / h^2 dF(x)
(2)
           = -lim_{h->0} integral 2 (1 - cos hx) / h^2 dF(x).

As h -> 0, we have by Fatou's lemma,

    integral x^2 dF(x) = integral lim_{h->0} 2 (1 - cos hx)/h^2 dF(x) <= lim_{h->0} integral 2 (1 - cos hx)/h^2 dF(x) = -f''(0).

Thus F has a finite second moment, and the validity of (1) for k = 2 now follows from the first assertion of the theorem.

The general case can again be reduced to this by induction, as follows. Suppose the second assertion of the theorem is true for 2k - 2, and that f^(2k)(0) is finite. Then f^(2k-2)(t) exists and is continuous in the neighborhood of t = 0, and by the induction hypothesis we have in particular

    (-1)^{k-1} integral x^{2k-2} dF(x) = f^(2k-2)(0).

Put G(x) = integral_{-inf}^{x} y^{2k-2} dF(y) for every x; then G(.)/G(inf) is a d.f. with the ch.f.

    psi(t) = (1/G(inf)) integral e^{itx} x^{2k-2} dF(x) = ((-1)^{k-1} / G(inf)) f^(2k-2)(t).

Hence psi'' exists, and by the case k = 2 proved above, we have

    -psi''(0) = integral x^2 dG(x) / G(inf) = integral x^{2k} dF(x) / G(inf).

Upon cancelling G(inf), we obtain

    (-1)^k f^(2k)(0) = integral x^{2k} dF(x),

which proves the finiteness of the 2k-th moment. The argument above fails if G(inf) = 0, but then we have (why?) F = delta_0, f = 1, and the theorem is trivial.

Although the next theorem is an immediate corollary to the preceding one, it is so important as to deserve prominent mention.

Theorem 6.4.2. If F has a finite absolute moment of order k, k an integer >= 1, then f has the following expansion in the neighborhood of t = 0:

(3)    f(t) = sum_{j=0}^{k} (m^(j) / j!) (it)^j + o(|t|^k),

6.4 SIMPLE APPLICATIONS | 177

(3')   f(t) = sum_{j=0}^{k-1} (m^(j) / j!) (it)^j + (theta_k mu^(k) / k!) |t|^k,

where m^(j) is the moment of order j, mu^(k) is the absolute moment of order k, and |theta_k| <= 1.

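The remainder estimate in (3) can be checked numerically. A minimal sketch, using a centered Exp(1) distribution as a stand-in d.f.; the distribution, the moment values, and the tolerances are illustrative assumptions, not taken from the text:

```python
import cmath
import math

# Ch.f. of X - 1 where X is Exp(1), so the mean is 0 (an illustrative
# test distribution): f(t) = e^{-it} / (1 - it).
def f(t):
    return cmath.exp(-1j * t) / (1 - 1j * t)

# Moments m^(j) of X - 1: m^(0) = 1, m^(1) = 0, m^(2) = 1, m^(3) = 2.
m = [1.0, 0.0, 1.0, 2.0]

def taylor(t, k=3):
    # Partial sum in (3): sum_{j<=k} m^(j) (it)^j / j!
    return sum(m[j] * (1j * t) ** j / math.factorial(j) for j in range(k + 1))

# (3) says the error is o(|t|^3): the ratio err / t^3 shrinks as t -> 0.
for t in (0.1, 0.01):
    print(t, abs(f(t) - taylor(t)) / t ** 3)
```

The printed ratio decreases with t, consistent with the o(|t|^k) claim for k = 3.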
178 | CHARACTERISTIC FUNCTION

Now let {X_n, n >= 1} be a sequence of independent r.v.'s with the common d.f. F, and S_n = sum_{j=1}^{n} X_j, as in Chapter 5.

Theorem 6.4.3. If F has a finite mean m, then

    S_n / n -> m  in pr.

PROOF. Since convergence to the constant m is equivalent to convergence in dist. to delta_m (Exercise 4 of Sec. 4.4), it is sufficient by Theorem 6.3.2 to prove that the ch.f. of S_n/n converges to e^{imt} (which is continuous). Now, by (2) of Sec. 6.1 we have

    E(e^{it(S_n/n)}) = E(e^{i(t/n)S_n}) = [f(t/n)]^n.

By Theorem 6.4.2, the last term above may be written as

    [1 + im(t/n) + o(t/n)]^n

for fixed t and n -> inf. It follows from (5) that this converges to e^{imt} as desired.

Theorem 6.4.4. If F has mean m and finite variance sigma^2 > 0, then

    (S_n - mn) / (sigma sqrt(n)) -> Phi  in dist.,

where Phi is the normal distribution with mean 0 and variance 1.

PROOF. We may suppose m = 0 by considering the r.v.'s X_j - m, whose second moment is sigma^2. As in the preceding proof, we have

    E exp(it S_n / (sigma sqrt(n))) = [f(t / (sigma sqrt(n)))]^n
        = [1 - (sigma^2/2)(t^2 / (sigma^2 n)) + o(t^2/n)]^n
        = [1 - t^2/(2n) + o(t^2/n)]^n -> e^{-t^2/2}.

The limit being the ch.f. of Phi, the proof is ended.

The convergence theorem for ch.f.'s may be used to complete the method of moments in Theorem 4.5.5, yielding a result not very far from what ensues from that theorem coupled with Carleman's condition mentioned there.

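The convergence [f(t/(sigma sqrt(n)))]^n -> e^{-t^2/2} in the proof of Theorem 6.4.4 can be watched directly. A minimal sketch, with a centered Exp(1) r.v. as an illustrative choice having m = 0, sigma = 1 (the values of t and n are arbitrary):

```python
import cmath
import math

# Ch.f. of a centered Exp(1) r.v. (mean 0, variance 1); an illustrative
# choice, not one prescribed by the text.
def f(t):
    return cmath.exp(-1j * t) / (1 - 1j * t)

def chf_normed_sum(t, n):
    # Ch.f. of S_n / sqrt(n) = [f(t / sqrt(n))]^n, as in Theorem 6.4.4
    # (here m = 0 and sigma = 1).
    return f(t / math.sqrt(n)) ** n

t = 1.3
target = cmath.exp(-t * t / 2)   # ch.f. of the standard normal
for n in (10, 100, 10000):
    print(n, abs(chf_normed_sum(t, n) - target))
```

The distance to the normal ch.f. decays (at rate roughly 1/sqrt(n), driven by the third-moment term of the expansion).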
6.4 SIMPLE APPLICATIONS | 179

Theorem 6.4.5. In the notation of Theorem 4.5.5, if (8) there holds together with the following condition:

(6)    for every t in R^1:  lim_{k->inf} m^(k) t^k / k! = 0,

then F_n converges vaguely to F.

PROOF. Let f_n be the ch.f. of F_n. For fixed t and an odd k we have, by the Taylor expansion for e^{itx} with a remainder term:

    f_n(t) = integral e^{itx} dF_n(x)
           = integral [ sum_{j=0}^{k} (itx)^j / j! + theta |tx|^{k+1} / (k+1)! ] dF_n(x)
           = sum_{j=0}^{k} m_n^(j) (it)^j / j! + theta m_n^(k+1) |t|^{k+1} / (k+1)!,

where theta denotes a "generic" complex number of modulus <= 1, not necessarily the same in different appearances. Given epsilon > 0, by condition (6) there exists an odd k = k(epsilon) such that for the fixed t we have

(8)    (2 m^(k+1) + 1) |t|^{k+1} / (k+1)! < epsilon/2.

Since we have fixed k, there exists n_0 = n_0(epsilon) such that if n >= n_0, then

    m_n^(k+1) <= m^(k+1) + 1,

and moreover,

    max_{1<=j<=k} |m_n^(j) - m^(j)| < (epsilon/2) e^{-|t|}.

Applying the same expansion to f itself and subtracting, we conclude that |f_n(t) - f(t)| < epsilon for all n >= n_0.

180 | CHARACTERISTIC FUNCTION

Hence f_n(t) -> f(t) for each t, and since f is a ch.f., the hypotheses of Theorem 6.3.2 are satisfied and so F_n converges vaguely to F.

As another kind of application of the convergence theorem in which a limiting process is implicit rather than explicit, let us prove the following characterization of the normal distribution.

Theorem 6.4.6. Let X and Y be independent, identically distributed r.v.'s with mean 0 and variance 1. If X + Y and X - Y are independent, then the common distribution of X and Y is Phi.

PROOF. Let f be the ch.f.; then by (1), f'(0) = 0, f''(0) = -1. The ch.f. of X + Y is f(t)^2 and that of X - Y is f(t) f(-t). Since these two r.v.'s are independent, the ch.f. f(2t) of their sum 2X must satisfy the following relation:

(9)    f(2t) = f(t)^3 f(-t).

It follows from (9) that the function f never vanishes. For if it did at t_0, then it would also at t_0/2, and so by induction at t_0/2^n for every n >= 1. This is impossible, since lim_{n->inf} f(t_0/2^n) = f(0) = 1. Setting for every t:

    rho(t) = f(t) / f(-t),

we obtain

(10)    rho(2t) = rho(t)^2.

Hence we have by iteration, for each t:

    rho(t) = [rho(t/2^n)]^{2^n} = [1 + o(t^2/2^{2n})]^{2^n} -> 1

by Theorem 6.4.2 and the previous lemma. Thus rho(t) = 1, f(t) = f(-t), and (9) becomes

(11)    f(2t) = f(t)^4.

Repeating the argument above with (11), we have

    f(t) = [f(t/2^n)]^{4^n} = [1 - (1/2)(t^2/2^{2n}) + o(t^2/2^{2n})]^{4^n} -> e^{-t^2/2}.

This proves the theorem without the use of "logarithms" (see Sec. 7.6).

EXERCISES

*1. If f is the ch.f. of X, and

    lim_{t->0} (f(t) - 1) / t^2 = -sigma^2/2 > -inf,

6.4 SIMPLE APPLICATIONS | 181

then E(X) = 0 and E(X^2) = sigma^2. In particular, if f(t) = 1 + o(t^2) as t -> 0, then f = 1.

*2. Let {X_n} be independent, identically distributed with mean 0 and variance sigma^2, 0 < sigma^2 < inf. Prove that

    lim_{n->inf} E(|S_n| / sqrt(n)) = 2 lim_{n->inf} E(S_n^+ / sqrt(n)) = sqrt(2/pi) sigma.

[If we assume only P{X_1 != 0} > 0, E(|X_1|) < inf and E(X_1) = 0, then we have E(|S_n|) >= C sqrt(n) for some constant C and all n; this is known as Hornich's inequality.] [HINT: In case sigma^2 = inf, if lim_n E(|S_n|/sqrt(n)) < inf, then there exists {n_k} such that S_{n_k}/sqrt(n_k) converges in distribution; use an extension of Exercise 1 to show |f(t/sqrt(n_k))|^{2 n_k} -> 0. This is due to P. Matthews.]

3. Let P{X = k} = p_k, 1 <= k <= l < inf, sum_{k=1}^{l} p_k = 1. The sum S_n of n independent r.v.'s having the same distribution as X is said to have a multinomial distribution. Define it explicitly. Prove that [S_n - E(S_n)]/sigma(S_n) converges to Phi in distribution as n -> inf, provided that sigma(X) > 0.

*4. Let X_n have the binomial distribution with parameter (n, p_n), and suppose that n p_n -> lambda > 0. Prove that X_n converges in dist. to the Poisson d.f. with parameter lambda. (In the old days this was called the law of small numbers.)

5. Let X_lambda have the Poisson distribution with parameter lambda. Prove that (X_lambda - lambda)/sqrt(lambda) converges in dist. to Phi as lambda -> inf.

*6. Prove that in Theorem 6.4.4, S_n/(sigma sqrt(n)) does not converge in probability. [HINT: Consider S_n/(sigma sqrt(n)) and S_{2n}/(sigma sqrt(2n)).]

7. Let f be the ch.f. of the d.f. F. Suppose that as t -> 0,

    f(t) - 1 = O(|t|^alpha)

where 0 < alpha <= 2; then as A -> inf,

    integral_{|x|>A} dF(x) = O(A^{-alpha}).

[HINT: Integrate integral_{|x|>A} (1 - cos tx) dF(x) <= C|t|^alpha over t in (0, 1/A).]

8. If 0 < alpha < 1 and integral |x|^alpha dF(x) < inf, then f(t) - 1 = o(|t|^alpha) as t -> 0. For 1 <= alpha < 2 the same result is true under the additional assumption that integral x dF(x) = 0. [HINT: The case 1 <= alpha < 2 is harder.
Consider the real and imaginary parts of f(t) - 1 separately and write the latter as

    integral_{|x| <= epsilon/|t|} sin tx dF(x) + integral_{|x| > epsilon/|t|} sin tx dF(x).

182 | CHARACTERISTIC FUNCTION

The second is bounded by (|t|/epsilon)^alpha integral_{|x| > epsilon/|t|} |x|^alpha dF(x) = o(|t|^alpha) for fixed epsilon. In the first integral use sin tx = tx + O(|tx|^3):

    integral_{|x| <= epsilon/|t|} sin tx dF(x) = t integral_{|x| <= epsilon/|t|} x dF(x) + O( |t|^3 integral_{|x| <= epsilon/|t|} |x|^3 dF(x) );

the first term on the right is o(|t|^alpha) because integral x dF(x) = 0, and the second is O(epsilon^{3-alpha} |t|^alpha) since integral_{|x| <= epsilon/|t|} |x|^3 dF(x) <= (epsilon/|t|)^{3-alpha} integral |x|^alpha dF(x). As epsilon is arbitrary, the assertion follows.]

*9. Suppose that e^{-beta |t|^alpha}, beta > 0, 0 < alpha <= 2, is a ch.f. (Theorem 6.5.4 below). Let {X_j, j >= 1} be independent and identically distributed r.v.'s with a common ch.f. of the form

    1 - beta |t|^alpha + o(|t|^alpha)

as t -> 0. Determine the constants b and theta so that the ch.f. of S_n/(b n^theta) converges to e^{-|t|^alpha}.

10. Suppose F satisfies the condition that for every eta > 0, as A -> inf,

    integral_{|x|>A} dF(x) = O(e^{-eta A}).

Then all moments of F are finite, and condition (6) in Theorem 6.4.5 is satisfied.

11. Let X and Y be independent with the common d.f. F of mean 0 and variance 1. Suppose that (X + Y)/sqrt(2) also has the d.f. F. Then F = Phi. [HINT: Imitate Theorem 6.4.6.]

*12. Let {X_j, j >= 1} be independent, identically distributed r.v.'s with mean 0 and variance 1. Prove that both

    sqrt(n) sum_{j=1}^{n} X_j / sum_{j=1}^{n} X_j^2    and    sum_{j=1}^{n} X_j / ( sum_{j=1}^{n} X_j^2 )^{1/2}

converge in dist. to Phi. [HINT: Use the law of large numbers.]

13. The converse part of Theorem 6.4.1 is false for an odd k. Example: F is a discrete symmetric d.f. with mass C/(n^2 log n) at +-n for integers n >= 3, where C is the appropriate constant, and k = 1. [HINT: It is well known that the series

    sum_n sin nt / (n^2 log n)

converges uniformly in t.]

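Exercise 4 above (the "law of small numbers") is transparent at the level of ch.f.'s: the binomial ch.f. (1 - p_n + p_n e^{it})^n tends to the Poisson ch.f. exp(lambda(e^{it} - 1)) when n p_n -> lambda. A minimal numerical sketch; the values of lambda and t are arbitrary illustrative choices:

```python
import cmath

# Law of small numbers: with p_n = lam / n, the binomial ch.f.
# converges to the Poisson ch.f.  lam = 2 is an arbitrary choice.
lam = 2.0

def binom_chf(t, n):
    p = lam / n   # so that n * p_n -> lam
    return (1 - p + p * cmath.exp(1j * t)) ** n

def poisson_chf(t):
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))

t = 0.7
for n in (10, 100, 10000):
    print(n, abs(binom_chf(t, n) - poisson_chf(t)))
```

The distance decays at rate roughly 1/n, and convergence in dist. then follows from Theorem 6.3.2.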
6.4 SIMPLE APPLICATIONS | 183

We end this section with a discussion of a special class of distributions and their ch.f.'s that is of both theoretical and practical importance.

A distribution or p.m. on R^1 is said to be of the lattice type iff its support is in an arithmetical progression, that is, it is completely atomic (i.e., discrete) with all the atoms located at points of the form {a + jd}, where a is real, d > 0, and j ranges over a certain nonempty set of integers. The corresponding lattice d.f. is of form:

    F(x) = sum_{j=-inf}^{inf} p_j delta_{a+jd}(x),

where p_j >= 0 and sum_j p_j = 1. Its ch.f. is

(12)    f(t) = e^{ait} sum_{j=-inf}^{inf} p_j e^{ijdt},

which is an absolutely convergent Fourier series. Note that the degenerate d.f. delta_a with ch.f. e^{ait} is a particular case. We have the following characterization.

Theorem 6.4.7. A ch.f. is that of a lattice distribution if and only if there exists a t_0 != 0 such that |f(t_0)| = 1.

PROOF. The "only if" part is trivial, since it is obvious from (12) that |f| is periodic of period 2 pi/d. To prove the "if" part, let f(t_0) = e^{i theta_0}, where theta_0 is real; then we have

    1 = e^{-i theta_0} f(t_0) = integral e^{i(t_0 x - theta_0)} mu(dx),

and consequently, taking real parts and transposing:

(13)    0 = integral [1 - cos(t_0 x - theta_0)] mu(dx).

The integrand is positive everywhere and vanishes if and only if for some integer j,

    t_0 x - theta_0 = 2 pi j,    that is,    x = (theta_0 + 2 pi j) / t_0.

It follows that the support of mu must be contained in the set of x of this form in order that equation (13) may hold, for the integral of a strictly positive function over a set of strictly positive measure is strictly positive. The theorem is therefore proved, with a = theta_0/t_0 and d = 2 pi/t_0 in the definition of a lattice distribution.

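Theorem 6.4.7 is easy to test on examples. A minimal sketch contrasting a lattice distribution with a = 0, d = 2 (the weights 0.3, 0.7 are arbitrary illustrative values) against the normal distribution, whose ch.f. satisfies |f(t)| < 1 for every t != 0:

```python
import cmath
import math

# A lattice distribution with atoms at {0, 2}, i.e. a = 0, d = 2.
def f_lattice(t):
    return 0.3 + 0.7 * cmath.exp(1j * 2 * t)

# The standard normal ch.f., which is not of the lattice type.
def f_normal(t):
    return math.exp(-t * t / 2)

t0 = 2 * math.pi / 2     # t_0 = 2*pi/d
print(abs(f_lattice(t0)))    # equals 1, as the theorem asserts
print(abs(f_lattice(1.0)))   # strictly less than 1 at a generic t
print(abs(f_normal(t0)))     # strictly less than 1 for every t != 0
```
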
184 | CHARACTERISTIC FUNCTION

It should be observed that neither "a" nor "d" is uniquely determined above; for if we take, e.g., a' = a + d' and d' a divisor of d, then the support of mu is also contained in the arithmetical progression {a' + jd'}. However, unless mu is degenerate, there is a unique maximum d, which is called the "span" of the lattice distribution. The following corollary is easy.

Corollary. Unless |f| = 1, there is a smallest t_0 > 0 such that |f(t_0)| = 1. The span is then 2 pi/t_0.

Of particular interest is the "integer lattice" when a = 0, d = 1; that is, when the support of mu is a set of integers at least two of which differ by 1. We have seen many examples of these. The ch.f. f of such an r.v. X has period 2 pi, and the following simple inversion formula holds, for each integer j:

(14)    P{X = j} = p_j = (1/2 pi) integral_{-pi}^{pi} f(t) e^{-ijt} dt,

where the range of integration may be replaced by any interval of length 2 pi. This, of course, is nothing but the well-known formula for the "Fourier coefficients" of f. If {X_k} is a sequence of independent, identically distributed r.v.'s with the ch.f. f, then S_n = sum_{k=1}^{n} X_k has the ch.f. (f)^n, and the inversion formula above yields:

(15)    P{S_n = j} = (1/2 pi) integral_{-pi}^{pi} [f(t)]^n e^{-ijt} dt.

This may be used to advantage to obtain estimates for S_n (see Exercises 24 to 26 below).

EXERCISES

f or f_n is a ch.f. below.

14. If |f(t)| = 1, |f(t')| = 1 and t/t' is an irrational number, then f is degenerate. If for a sequence {t_k} of nonvanishing constants tending to 0 we have |f(t_k)| = 1, then f is degenerate.

*15. If |f_n(t)| -> 1 for every t as n -> inf, and F_n is the d.f. corresponding to f_n, then there exist constants a_n such that F_n(x + a_n) converges vaguely to delta_0. [HINT: Symmetrize and take a_n to be a median of F_n.]

*16. Suppose b_n > 0 and |f(b_n t)| converges everywhere to a ch.f.
that is not identically 1; then b_n converges to a finite and strictly positive limit. [HINT: Show that it is impossible that a subsequence of b_n converges to 0 or to +inf, or that two subsequences converge to different finite limits.]

*17. Suppose c_n is real and that e^{i c_n t} converges to a limit for every t in a set of strictly positive Lebesgue measure. Then c_n converges to a finite

6.4 SIMPLE APPLICATIONS | 185

limit. [HINT: Proceed as in Exercise 16, and integrate over t. Beware of any argument using "logarithms", as given in some textbooks, but see Exercise 12 of Sec. 7.6 later.]

*18. Let f and g be two nondegenerate ch.f.'s. Suppose that there exist real constants a_n and b_n > 0 such that for every t:

    f_n(t) -> f(t)    and    e^{i t a_n / b_n} f_n(t / b_n) -> g(t).

Then a_n -> a, b_n -> b, where a is finite, 0 < b < inf, and g(t) = e^{ita/b} f(t/b). [HINT: Use Exercises 16 and 17.]

*19. Reformulate Exercise 18 in terms of d.f.'s and deduce the following consequence. Let F_n be a sequence of d.f.'s, a_n, a_n' real constants, b_n > 0, b_n' > 0. If

    F_n(b_n x + a_n) -> F(x)    and    F_n(b_n' x + a_n') -> F(x)  vaguely,

where F is a nondegenerate d.f., then

    b_n'/b_n -> 1    and    (a_n' - a_n)/b_n -> 0.

[Two d.f.'s F and G such that G(x) = F(bx + a) for every x, where b > 0 and a is real, are said to be of the same "type". The two preceding exercises deal with the convergence of types.]

20. Show by using (14) that |cos t| is not a ch.f. Thus the modulus of a ch.f. need not be a ch.f., although the squared modulus always is.

21. The span of an integer lattice distribution is the greatest common divisor of the set of all differences between points of jump.

22. Let f(s, t) be the ch.f. of a 2-dimensional p.m. nu. If |f(s_0, t_0)| = 1 for some (s_0, t_0) != (0, 0), what can one say about the support of nu?

*23. If {X_n} is a sequence of independent and identically distributed r.v.'s, then there does not exist a sequence of constants {c_n} such that sum_n (X_n - c_n) converges a.e., unless the common d.f. is degenerate.

In Exercises 24 to 26, let S_n = sum_{j=1}^{n} X_j, where the X's are independent r.v.'s with a common d.f. F of the integer lattice type with span 1, and taking both >0 and

186 | CHARACTERISTIC FUNCTION

25. If F != delta_0, then there exists a constant A such that for every j:

    P{S_n = j} <= A n^{-1/2}.

[HINT: Use a special case of Exercise 27 below.]

26. If F is symmetric and integral |x| dF(x) < inf, then

    n^{1/2} P{S_n = j} -> inf.

[HINT: 1 - f(t) = o(|t|) as t -> 0.]

27. If f is any nondegenerate ch.f., then there exist constants A > 0 and delta > 0 such that

    |f(t)| <= 1 - A t^2    for |t| <= delta.

[HINT: Reduce to the case where the d.f. has zero mean and finite variance by translating and truncating.]

28. Let Q_n be the concentration function of S_n = sum_{j=1}^{n} X_j, where the X_j's are independent r.v.'s having a common nondegenerate d.f. F. Then for every h > 0,

    Q_n(h) <= A n^{-1/2}.

[HINT: Use Exercise 27 above and Exercise 16 of Sec. 6.1. This result is due to Levy and Doeblin, but the proof is due to Rosen.]

In Exercises 29 to 35, mu or mu_k is a p.m. on U = (0, 1].

29. Define for each n:

    f_mu(n) = integral_U e^{2 pi i n x} mu(dx).

Prove by Weierstrass's approximation theorem (by trigonometrical polynomials) that if f_{mu_1}(n) = f_{mu_2}(n) for every n >= 1, then mu_1 = mu_2. The conclusion becomes false if U is replaced by [0, 1].

30. Establish the inversion formula expressing mu in terms of the f_mu(n)'s. Deduce again the uniqueness result in Exercise 29. [HINT: Find the Fourier series of the indicator function of an interval contained in U.]

31. Prove that |f_mu(n)| = 1 if and only if mu has its support in the set {theta_0 + j n^{-1}, 0 <= j <= n - 1} for some theta_0 in (0, n^{-1}].

*32. mu is equidistributed on the set {j n^{-1}, 0 <= j <= n - 1} if and only if f_mu(j) = 0 or 1 according as n does not divide j or n divides j.

*33. mu_k converges vaguely to mu if and only if f_{mu_k}(.) -> f_mu(.) everywhere.

34. Suppose that the space U is replaced by its closure [0, 1] and the two points 0 and 1 are identified; in other words, suppose U is regarded as the

6.5 REPRESENTATION THEOREMS | 187

circumference of a circle; then we can improve the result in Exercise 33 as follows. If there exists a function g on the integers such that f_{mu_k}(.) -> g(.) everywhere, then there exists a p.m. mu on U such that g = f_mu and mu_k converges vaguely to mu on U.

35. Random variables defined on U are also referred to as defined "modulo 1". The theory of addition of independent r.v.'s on U is somewhat simpler than on R^1, as exemplified by the following theorem. Let {X_j, j >= 1} be independent and identically distributed r.v.'s on U and let S_k = sum_{j=1}^{k} X_j. Then there are only two possibilities for the asymptotic distributions of S_k. Either there exist a constant c and an integer n >= 1 such that S_k - kc converges in dist. to the equidistribution on {j n^{-1}, 0 <= j <= n - 1}; or S_k converges in dist. to the uniform distribution on U. [HINT: Consider the possible limits of (f_mu(n))^k as k -> inf, for each n.]

6.5 Representation theorems

A ch.f. is defined to be the Fourier-Stieltjes transform of a p.m. Can it be characterized by some other properties? This question arises because frequently such a function presents itself in a natural fashion while the underlying measure is hidden and has to be recognized. Several answers are known, but the following characterization, due to Bochner and Herglotz, is most useful. It plays a basic role in harmonic analysis and in the theory of second-order stationary processes.

A complex-valued function f defined on R^1 is called positive definite iff for any finite set of real numbers t_j and complex numbers z_j (with conjugate complex z_j-bar), 1 <= j <= n, we have

(1)    sum_{j=1}^{n} sum_{k=1}^{n} f(t_j - t_k) z_j z_k-bar >= 0.

Let us deduce at once some elementary properties of such a function.

Theorem 6.5.1. If f is positive definite, then for each t in R^1:

    f(-t) = f(t)-bar,    |f(t)| <= f(0).

If f is continuous at t = 0, then it is uniformly continuous in R^1. In this case, we have for every continuous complex-valued function zeta on R^1 and every T > 0:

(2)    integral_0^T integral_0^T f(s - t) zeta(s) zeta(t)-bar ds dt >= 0.

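The defining inequality (1) can be probed numerically: for a genuine ch.f. the quadratic form is nonnegative (and real) for any choice of the t_j and z_j. A minimal sketch with the standard normal ch.f. and randomly drawn data; the sample size and seed are arbitrary:

```python
import math
import random

# Ch.f. of the standard normal distribution.
def f(t):
    return math.exp(-t * t / 2)

random.seed(7)
n = 6
ts = [random.uniform(-3.0, 3.0) for _ in range(n)]
zs = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]

# The quadratic form in (1): sum_{j,k} f(t_j - t_k) z_j conj(z_k).
q = sum(f(ts[j] - ts[k]) * zs[j] * zs[k].conjugate()
        for j in range(n) for k in range(n))
print(q)   # real part nonnegative, imaginary part ~ 0
```

Equivalently, every matrix (f(t_j - t_k)) is Hermitian positive semidefinite; a negative value here would certify that f is not a ch.f.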
188 | CHARACTERISTIC FUNCTION

PROOF. Taking n = 1, t_1 = 0, z_1 = 1 in (1), we see that

    f(0) >= 0.

Taking n = 2, t_1 = 0, t_2 = t, z_1 = z_2 = 1, we have

    2 f(0) + f(t) + f(-t) >= 0;

changing z_2 to i, we have

    f(0) + f(t) i - f(-t) i + f(0) >= 0.

Hence f(t) + f(-t) is real and f(t) - f(-t) is pure imaginary, which imply that f(-t) = f(t)-bar. Changing z_1 to f(t) and z_2 to -|f(t)|, we obtain

    2 f(0) |f(t)|^2 - 2 |f(t)|^3 >= 0.

Hence f(0) >= |f(t)|, whether |f(t)| = 0 or > 0. Now, if f(0) = 0, then f(.) = 0; otherwise we may suppose that f(0) = 1. If we then take n = 3, t_1 = 0, t_2 = t, t_3 = t + h, a well-known result in positive definite quadratic forms implies that the determinant below must have a positive value:

    | f(0)     f(-t)    f(-t-h) |
    | f(t)     f(0)     f(-h)   |  = 1 - |f(t)|^2 - |f(t+h)|^2 - |f(h)|^2 + 2 Re{f(t) f(h) f(t+h)-bar} >= 0.
    | f(t+h)   f(h)     f(0)    |

It follows that

    |f(t) - f(t+h)|^2 = |f(t)|^2 + |f(t+h)|^2 - 2 Re{f(t) f(t+h)-bar}
                     <= 1 - |f(h)|^2 + 2 Re{f(t) f(t+h)-bar [f(h) - 1]}
                     <= 1 - |f(h)|^2 + 2 |1 - f(h)| <= 4 |1 - f(h)|.

Thus the "modulus of continuity" at each t is bounded by twice the square root of that at 0; and so continuity at 0 implies uniform continuity everywhere. Finally, the integrand of the double integral in (2) being continuous, the integral is the limit of Riemann sums, hence it is positive because these sums are by (1).

Theorem 6.5.2. f is a ch.f. if and only if it is positive definite and continuous at 0 with f(0) = 1.

Remark. It follows trivially that f is the Fourier-Stieltjes transform

    integral e^{itx} nu(dx)

6.5 REPRESENTATION THEOREMS | 189

of a finite measure nu if and only if it is positive definite and continuous at 0; then nu(R^1) = f(0).

PROOF. If f is the ch.f. of the p.m. mu, then we need only verify that it is positive definite. This is immediate, since the left member of (1) is then

    integral sum_j e^{i t_j x} z_j sum_k e^{-i t_k x} z_k-bar mu(dx) = integral | sum_{j=1}^{n} e^{i t_j x} z_j |^2 mu(dx) >= 0.

Conversely, if f is positive definite and continuous at 0, then by Theorem 6.5.1, (2) holds for zeta(t) = e^{-itx}. Thus

(3)    (1/(2 pi T)) integral_0^T integral_0^T f(s - t) e^{-i(s-t)x} ds dt >= 0.

Denote the left member by p_T(x); by a change of variable and partial integration, we have

(4)    p_T(x) = (1/2 pi) integral_{-T}^{T} (1 - |t|/T) f(t) e^{-itx} dt.

Now observe that for alpha > 0,

    (1/alpha) integral_0^alpha d beta integral_{-beta}^{beta} e^{-itx} dx = (1/alpha) integral_0^alpha (2 sin beta t / t) d beta = 2 (1 - cos alpha t) / (alpha t^2)

(where at t = 0 the limit value is meant, similarly later); it follows that

    (1/alpha) integral_0^alpha d beta integral_{-beta}^{beta} p_T(x) dx
        = (1/pi) integral_{-T}^{T} f_T(t) (1 - cos alpha t) / (alpha t^2) dt
        = (1/pi) integral_{-inf}^{inf} f_T(t/alpha) (1 - cos t) / t^2 dt,

where

(5)    f_T(t) = (1 - |t|/T) f(t), if |t| <= T;    0, if |t| > T.

Note that this is the product of f by a ch.f. (see Exercise 2 of Sec. 6.2) and corresponds to the smoothing of the would-be density which does not necessarily exist.

190 | CHARACTERISTIC FUNCTION

Since |f_T(t)| <= |f(t)| <= 1 by Theorem 6.5.1, and (1 - cos t)/t^2 belongs to L^1(-inf, inf), we have by dominated convergence:

(6)    lim_{alpha->inf} (1/alpha) integral_0^alpha d beta integral_{-beta}^{beta} p_T(x) dx
           = (1/pi) integral_{-inf}^{inf} lim_{alpha->inf} f_T(t/alpha) (1 - cos t)/t^2 dt
           = (1/pi) integral_{-inf}^{inf} (1 - cos t)/t^2 dt = 1.

Here the second equation follows from the continuity of f_T at 0 with f_T(0) = 1, and the third equation from formula (3) of Sec. 6.2. Since p_T >= 0, the integral integral_{-beta}^{beta} p_T(x) dx is increasing in beta, hence the existence of the limit of its "integral average" in (6) implies that of the plain limit as beta -> inf, namely:

(7)    integral_{-inf}^{inf} p_T(x) dx = lim_{beta->inf} integral_{-beta}^{beta} p_T(x) dx = 1.

Therefore p_T is a probability density function. Returning to (3) and observing that for real z:

    integral_{-beta}^{beta} e^{izx} e^{-itx} dx = 2 sin beta(z - t) / (z - t),

we obtain, similarly to (4):

    (1/alpha) integral_0^alpha d beta integral_{-beta}^{beta} e^{izx} p_T(x) dx
        = (1/pi) integral_{-inf}^{inf} f_T(t) (1 - cos alpha(z - t)) / (alpha (z - t)^2) dt
        = (1/pi) integral_{-inf}^{inf} f_T(z - t/alpha) (1 - cos t)/t^2 dt.

Note that the last two integrals are in reality over finite intervals. Letting alpha -> inf, we obtain by bounded convergence as before:

(8)    integral_{-inf}^{inf} e^{izx} p_T(x) dx = f_T(z),

the integral on the left existing by (7). Since equation (8) is valid for each z, we have proved that f_T is the ch.f. of the density function p_T. Finally, since f_T(z) -> f(z) as T -> inf for every z, and f is by hypothesis continuous at z = 0, Theorem 6.3.2 yields the desired conclusion that f is a ch.f.

As a typical application of the preceding theorem, consider a family of (real-valued) r.v.'s {X_t, t in R^+}, where R^+ = [0, inf), satisfying the following conditions, for every s and t in R^+:

6.5 REPRESENTATION THEOREMS | 191

(i) E(X_t^2) = 1;
(ii) there exists a function r(.) on R^1 such that E(X_s X_t) = r(s - t);
(iii) lim_{t->0} E((X_0 - X_t)^2) = 0.

A family satisfying (i) and (ii) is called a second-order stationary process or stationary process in the wide sense; and condition (iii) means that the process is continuous in L^2(Omega, F, P).

For every finite set of t_j and z_j as in the definitions of positive definiteness, we have

    sum_{j=1}^{n} sum_{k=1}^{n} r(t_j - t_k) z_j z_k-bar = sum_j sum_k E(X_{t_j} X_{t_k}) z_j z_k-bar = E( | sum_j X_{t_j} z_j |^2 ) >= 0.

Thus r is a positive definite function. Next, we have

    r(0) - r(t) = E(X_0 (X_0 - X_t)),

hence by the Cauchy-Schwarz inequality,

    |r(t) - r(0)|^2 <= E(X_0^2) E((X_0 - X_t)^2).

It follows that r is continuous at 0, with r(0) = 1. Hence, by Theorem 6.5.2, r is the ch.f. of a uniquely determined p.m. R:

    r(t) = integral e^{itx} R(dx).

This R is called the spectral distribution of the process and is essential in its further analysis.

Theorem 6.5.2 is not practical in recognizing special ch.f.'s or in constructing them. In contrast, there is a sufficient condition due to Polya that is very easy to apply.

Theorem 6.5.3. Let f on R^1 satisfy the following conditions, for each t:

(9)    f(0) = 1,    f(t) >= 0,    f(t) = f(-t),

f is decreasing and continuous convex in R^+ = [0, inf). Then f is a ch.f.

PROOF. Without loss of generality we may suppose that

    f(inf) = lim_{t->inf} f(t) = 0;

otherwise we consider [f(t) - f(inf)] / [f(0) - f(inf)] unless f(inf) = 1, in which case f = 1. It is well known that a convex function f has right-hand

192 | CHARACTERISTIC FUNCTION

and left-hand derivatives everywhere that are equal except on a countable set, that f is the integral of either one of them, say the right-hand one, which will be denoted simply by f', and that f' is increasing. Under the conditions of the theorem it is also clear that f is decreasing and f' is negative in R^+.

Now consider the f_T as defined in (5) above, and observe that

    -f_T'(t) = -(1 - t/T) f'(t) + (1/T) f(t), if 0 < t <= T;    0, if t > T.

6.5 REPRESENTATION THEOREMS | 193

works for 0 < alpha < 2. Consider the density function

    p(x) = alpha / (2 |x|^{alpha+1}), if |x| > 1;    0, if |x| <= 1,

and compute its ch.f. f as follows.

194 | CHARACTERISTIC FUNCTION

the function f_T to be f in [-T, T], linear in [-T', -T] and in [T, T'], and zero in (-inf, -T') and (T', inf). Clearly f_T also satisfies the conditions of Theorem 6.5.3 and so is a ch.f. Furthermore, f = f_T in [-T, T]. We have thus established the following theorem and shown indeed how abundant the desired examples are.

Theorem 6.5.5. There exist two distinct ch.f.'s that coincide in an interval containing the origin.

That the interval of coincidence can be "arbitrarily large" is, of course, trivial, for if f_1 = f_2 in [-delta, delta], then g_1 = g_2 in [-n delta, n delta], where

    g_1(t) = f_1(t/n),    g_2(t) = f_2(t/n).

Corollary. There exist three ch.f.'s, f_1, f_2, f_3, such that f_1 f_3 = f_2 f_3 but f_1 != f_2.

To see this, take f_1 and f_2 as in the theorem, and take f_3 to be any ch.f. vanishing outside their interval of coincidence, such as the one described above for a sufficiently small value of T. This result shows that in the algebra of ch.f.'s, the cancellation law does not hold.

We end this section by another special but useful way of constructing ch.f.'s.

Theorem 6.5.6. If f is a ch.f., then so is e^{lambda(f-1)} for each lambda > 0.

PROOF. For each lambda > 0, as soon as the integer n > lambda, the function

    1 - lambda/n + (lambda/n) f = 1 + (lambda/n)(f - 1)

is a ch.f., hence so is its n-th power; see propositions (iv) and (v) of Sec. 6.1. As n -> inf,

    [1 + lambda(f - 1)/n]^n -> e^{lambda(f-1)},

and the limit is clearly continuous. Hence it is a ch.f. by Theorem 6.3.2.

Later we shall see that this class of ch.f.'s is also part of the infinitely divisible family. For f(t) = e^{it}, the corresponding

    e^{lambda(e^{it} - 1)} = sum_{n=0}^{inf} e^{-lambda} (lambda^n / n!) e^{itn}

is the ch.f. of the Poisson distribution, which should be familiar to the reader.

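Theorem 6.5.6 has a probabilistic reading: e^{lambda(f-1)} is the ch.f. of a compound Poisson sum X_1 + ... + X_N, where N is Poisson(lambda) and the X_j are i.i.d. with ch.f. f, as the series above suggests. A minimal numerical sketch comparing the two sides for a Bernoulli f; the values of lambda, p, t are arbitrary illustrative choices:

```python
import cmath
import math

# Bernoulli(p) ch.f. playing the role of "f" in Theorem 6.5.6.
lam, p = 1.5, 0.4

def f(t):
    return 1 - p + p * cmath.exp(1j * t)

def g(t):
    # the ch.f. produced by the theorem
    return cmath.exp(lam * (f(t) - 1))

def g_series(t, terms=60):
    # E(e^{itS}) for S = X_1 + ... + X_N with N ~ Poisson(lam):
    # sum_n e^{-lam} lam^n / n! * f(t)^n
    return sum(math.exp(-lam) * lam ** n / math.factorial(n) * f(t) ** n
               for n in range(terms))

print(abs(g(0.9) - g_series(0.9)))   # essentially zero
```

The agreement is exact up to the truncation of the Poisson series, since sum_n e^{-lam} lam^n f^n / n! = e^{lam(f-1)}.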
6.5 REPRESENTATION THEOREMS | 195

EXERCISES

1. If f is continuous in R^1 and satisfies (3) for each x in R^1 and each T > 0, then f is positive definite.

2. Show that the following functions are ch.f.'s:

    f(t) = 1 / (1 + |t|);

    f(t) = 1 - |t|^alpha, if |t| <= 1;  0, if |t| > 1  (0 < alpha <= 1);

    f(t) = 1 - |t|, if 0 <= |t| <= 1/2;  1/(4|t|), if |t| > 1/2.

3. If {X_n} are independent r.v.'s with the same stable distribution of exponent alpha, then sum_{k=1}^{n} X_k / n^{1/alpha} has the same distribution. [This is the origin of the name "stable".]

4. If F is a symmetric stable distribution of exponent alpha, 0 < alpha < 2, then integral |x|^r dF(x) < inf for r < alpha and = inf for r >= alpha. [HINT: Use Exercises 7 and 8 of Sec. 6.4.]

*5. Another proof of Theorem 6.5.3 is as follows. Show that

    integral_0^inf t df'(t) = 1

and define the d.f. G on R^+ by

    G(u) = integral_{[0,u]} t df'(t).

Next show that

    integral_{[0,inf)} (1 - |t|/u)^+ dG(u) = f(t).

Hence if we set

    f(u, t) = (1 - |t|/u)^+

(see Exercise 2 of Sec. 6.2), then

    f(t) = integral_{[0,inf)} f(u, t) dG(u).

Now apply Exercise 2 of Sec. 6.1.

6. Show that there is a ch.f. that has period 2m, m an integer >= 1, and that is equal to 1 - |t| in [-1, +1]. [HINT: Compute the Fourier series of such a function to show that the coefficients are positive.]

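Polya's criterion (Theorem 6.5.3) can be checked against inversion: for the convex decreasing function f(t) = e^{-|t|}, the inverse transform must come out as a nonnegative density, and here it is in fact the Cauchy density 1/(pi(1 + x^2)). A minimal sketch using a crude Riemann sum; T and the step count are arbitrary numerical choices:

```python
import math

# A Polya-type function: convex and decreasing on [0, inf) with f(0) = 1.
def f(t):
    return math.exp(-abs(t))

def density(x, T=200.0, steps=200000):
    # p(x) = (1/2pi) * integral e^{-itx} f(t) dt; since f is even, the
    # integrand reduces to cos(tx) f(t), so the sum below is real.
    h = 2 * T / steps
    return sum(math.cos((-T + k * h) * x) * f(-T + k * h)
               for k in range(steps)) * h / (2 * math.pi)

for x in (0.0, 1.0, 3.0):
    print(x, density(x))   # compare with 1 / (pi (1 + x^2))
```

Nonnegativity of the recovered density is exactly what Bochner's and Polya's theorems guarantee; a sign failure would disqualify f as a ch.f.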
196 | CHARACTERISTIC FUNCTION

7. Construct a ch.f. that vanishes in [-b, -a] and [a, b], where 0 < a < b

6.6 MULTIDIMENSIONAL CASE; LAPLACE TRANSFORMS | 197

The condition (2) is to be contrasted with the following identity in one variable:

    for every t: f_{X+Y}(t) = f_X(t) f_Y(t),

where f_{X+Y} is the ch.f. of X + Y. This holds if X and Y are independent, but the converse is false (see Exercise 14 of Sec. 6.1).

PROOF OF THEOREM 6.6.1. If X and Y are independent, then so are e^{isX} and e^{itY} for every s and t, hence

    E(e^{i(sX+tY)}) = E(e^{isX} e^{itY}) = E(e^{isX}) E(e^{itY}),

which is (2). Conversely, consider the 2-dimensional product measure mu_1 x mu_2, where mu_1 and mu_2 are the 1-dimensional p.m.'s of X and Y, respectively, and the product is defined in Sec. 3.3. Its ch.f. is given by definition as

    double integral_{R^2} e^{i(sx+ty)} (mu_1 x mu_2)(dx, dy) = double integral_{R^2} e^{isx} e^{ity} mu_1(dx) mu_2(dy)
        = integral e^{isx} mu_1(dx) integral e^{ity} mu_2(dy) = f_X(s) f_Y(t)

(Fubini's theorem!). If (2) is true, then this is the same as f_{(X,Y)}(s, t), so that mu_1 x mu_2 has the same ch.f. as mu, the p.m. of (X, Y). Hence, by the uniqueness theorem mentioned above, we have mu_1 x mu_2 = mu. This is equivalent to the independence of X and Y.

The multidimensional analogues of the convergence theorem, and some of its applications such as the weak law of large numbers and central limit theorem, are all valid without new difficulties. Even the characterization of Bochner has an easy extension. However, these topics are better pursued with certain objectives in view, such as mathematical statistics, Gaussian processes, and spectral analysis, that are beyond the scope of this book, and so we shall not enter into them.

We shall, however, give a short introduction to an allied notion, that of the Laplace transform, which is sometimes more expedient or appropriate than the ch.f., and is a basic tool in the theory of Markov processes.

Let X be a positive (>= 0) r.v. having the d.f. F so that F has support in [0, inf), namely F(0-) = 0.
The Laplace transform of X or F is the function F-hat on [0, inf) given by

(3)    F-hat(lambda) = E(e^{-lambda X}) = integral_{[0,inf)} e^{-lambda x} dF(x).

It is obvious (why?) that

    F-hat(0) = lim_{lambda -> 0} F-hat(lambda) = 1,    F-hat(inf) = lim_{lambda -> inf} F-hat(lambda) = F(0).

More generally, we can define the Laplace transform of an s.d.f. or a function G of bounded variation satisfying a certain "growth condition" at infinity. In particular, if F is an s.d.f., the Laplace transform of its indefinite integral

    G(x) = ∫_0^x F(u) du

is finite for λ > 0 and given by

    Ĝ(λ) = ∫_{[0,∞)} e^{−λx} F(x) dx = ∫_{[0,∞)} e^{−λx} dx ∫_{[0,x]} μ(dy)
         = ∫_{[0,∞)} μ(dy) ∫_{[y,∞)} e^{−λx} dx = (1/λ) ∫_{[0,∞)} e^{−λy} μ(dy),

where μ is the s.p.m. of F. The calculation above, based on Fubini's theorem, replaces a familiar "integration by parts". However, the reader should beware of the latter operation. For instance, according to the usual definition, as given in Rudin [1] (and several other standard textbooks!), the value of the Riemann–Stieltjes integral

    ∫_0^∞ e^{−λx} dδ_0(x)

is 0 rather than 1, but

    ∫_0^∞ e^{−λx} dδ_0(x) = lim_{A↑∞} δ_0(x) e^{−λx} |_0^A + ∫_0^∞ δ_0(x) λ e^{−λx} dx

is correct only if the left member is taken in the Lebesgue–Stieltjes sense, as is always done in this book.

There are obvious analogues of propositions (i) to (v) of Sec. 6.1. However, the inversion formula requiring complex integration will be omitted and the uniqueness theorem, due to Lerch, will be proved by a different method (cf. Exercise 12 of Sec. 6.2).

Theorem 6.6.2. Let F̂_j be the Laplace transform of the d.f. F_j with support in R_+, j = 1, 2. If F̂_1 = F̂_2, then F_1 = F_2.

PROOF. We shall apply the Stone–Weierstrass theorem to the algebra generated by the family of functions {e^{−λx}, λ ≥ 0}, defined on the closed positive real line R̄_+ = [0, ∞], namely the one-point compactification of R_+ = [0, ∞). A continuous function of x on R̄_+ is one that is continuous in R_+ and has a finite limit as x → ∞. This family separates points on R̄_+ and vanishes at no point of R̄_+ (at the point +∞, the member e^{−0·x} = 1 of the family does not vanish!). Hence the set of polynomials in the members of the

family, namely the algebra generated by it, is dense in the uniform topology, in the space C_B(R̄_+) of bounded continuous functions on R̄_+. That is to say, given any g ∈ C_B(R̄_+) and ε > 0, there exists a polynomial of the form

    g_ε(x) = Σ_{j=1}^n c_j e^{−λ_j x},

where the c_j are real and λ_j ≥ 0, such that

    sup_{x∈R̄_+} |g(x) − g_ε(x)| ≤ ε.

Consequently, we have

    ∫ |g(x) − g_ε(x)| dF_j(x) ≤ ε,    j = 1, 2.

By hypothesis, we have for each λ ≥ 0:

    ∫ e^{−λx} dF_1(x) = ∫ e^{−λx} dF_2(x),

and consequently,

    ∫ g_ε(x) dF_1(x) = ∫ g_ε(x) dF_2(x).

It now follows, as in the proof of Theorem 4.4.1, first that

    ∫ g(x) dF_1(x) = ∫ g(x) dF_2(x)

for each g ∈ C_B(R̄_+); second, that this also holds for each g that is the indicator of an interval in R̄_+ (even one of the form (a, ∞]); third, that the two p.m.'s induced by F_1 and F_2 are identical, and finally that F_1 = F_2 as asserted.

Remark. Although it is not necessary here, we may also extend the domain of definition of a d.f. F to R̄_+; thus F(∞) = 1, but with the new meaning that F(∞) is actually the value of F at the point ∞, rather than a notation for lim_{x→∞} F(x) as previously defined. F is thus continuous at ∞. In terms of the p.m. μ, this means we extend its domain to R̄_+ but set μ({∞}) = 0. On other occasions it may be useful to assign a strictly positive value to μ({∞}).

Passing to the convergence theorem, we shall prove it in a form closest to Theorem 6.3.2 and refer to Exercise 4 below for improvements.

Theorem 6.6.3. Let {F_n, 1 ≤ n ≤ ∞} be a sequence of s.d.f.'s with supports in R_+ and {F̂_n} the corresponding Laplace transforms. Then F_n converges vaguely to F_∞, where F_∞ is a d.f., if and only if:

(a) lim_{n→∞} F̂_n(λ) exists for every λ > 0;
(b) the limit function has the limit 1 as λ ↓ 0.

Remark. The limit in (a) exists at λ = 0 and equals 1 if the F_n's are d.f.'s, but even so (b) is not guaranteed, as the example F_n = δ_n shows.

PROOF. The "only if" part follows at once from Theorem 4.4.1 and the remark that the Laplace transform of an s.d.f. is a continuous function in R_+. Conversely, suppose lim_n F̂_n(λ) = G(λ), λ > 0; extend G to R_+ by setting G(0) = 1, so that G is continuous in R_+ by hypothesis (b). As in the proof of Theorem 6.3.2, consider any vaguely convergent subsequence F_{n_k} with the vague limit F_∞, necessarily an s.d.f. (see Sec. 4.3). Since for each λ > 0, e^{−λx} ∈ C_0, Theorem 4.4.1 applies to yield F̂_{n_k}(λ) → F̂_∞(λ) for λ > 0, where F̂_∞ is the Laplace transform of F_∞. Thus F̂_∞(λ) = G(λ) for λ > 0, and consequently for λ ≥ 0 by continuity of F̂_∞ and G at λ = 0. Hence every vague limit has the same Laplace transform, and therefore is the same F_∞ by Theorem 6.6.2. It follows by Theorem 4.3.4 that F_n converges vaguely to F_∞. Finally, we have F_∞(∞) = F̂_∞(0) = G(0) = 1, proving that F_∞ is a d.f.

There is a useful characterization, due to S. Bernstein, of Laplace transforms of measures on R_+. A function f is called completely monotonic in an interval (finite or infinite, of any kind) iff it has derivatives of all orders there satisfying the condition:

(4)    (−1)^n f^{(n)}(λ) ≥ 0

for each n ≥ 0 and each λ in the domain of definition.

Theorem 6.6.4. A function f on (0, ∞) is the Laplace transform of a d.f. F:

(5)    f(λ) = ∫_{R_+} e^{−λx} dF(x),

if and only if it is completely monotonic in (0, ∞) with f(0+) = 1.

Remark. We then extend f to R_+ by setting f(0) = 1, and (5) will hold for λ ≥ 0.

PROOF. The "only if" part is immediate, since

    f^{(n)}(λ) = ∫ (−x)^n e^{−λx} dF(x).

Turning to the "if" part, let us first prove that f is quasi-analytic in (0, ∞), namely it has a convergent Taylor series there. Let 0 < λ_0 < λ < μ; then, by

Taylor's theorem, with the remainder term in the integral form, we have

(6)    f(λ) = Σ_{j=0}^{k−1} (f^{(j)}(μ)/j!) (λ − μ)^j
           + ((λ − μ)^k/(k−1)!) ∫_0^1 (1 − t)^{k−1} f^{(k)}(μ + (λ − μ)t) dt.

Because of (4), the last term in (6) is positive and does not exceed

    ((λ − μ)^k/(k−1)!) ∫_0^1 (1 − t)^{k−1} f^{(k)}(μ + (λ_0 − μ)t) dt.

For if k is even, then f^{(k)} ↓ and (λ − μ)^k > 0, while if k is odd then f^{(k)} ↑ and (λ − μ)^k < 0. Now, by (6) with λ replaced by λ_0, the last expression is equal to

    ((λ − μ)/(λ_0 − μ))^k [ f(λ_0) − Σ_{j=0}^{k−1} (f^{(j)}(μ)/j!) (λ_0 − μ)^j ]
        ≤ ((λ − μ)/(λ_0 − μ))^k f(λ_0),

where the inequality is trivial, since each term in the sum on the left is positive by (4). Therefore, as k → ∞, the remainder term in (6) tends to zero and the Taylor series for f(λ) converges.

Now for each n ≥ 1, define the discrete s.d.f. F_n by the formula:

(7)    F_n(x) = Σ_{j=0}^{[nx]} (n^j/j!) (−1)^j f^{(j)}(n).

This is indeed an s.d.f., since for each ε > 0 and k ≥ 1 we have from (6):

    1 = f(0+) ≥ f(ε) ≥ Σ_{j=0}^{k−1} (f^{(j)}(n)/j!) (ε − n)^j.

Letting ε ↓ 0 and then k ↑ ∞, we see that F_n(∞) ≤ 1. The Laplace transform of F_n is plainly, for λ > 0:

    ∫ e^{−λx} dF_n(x) = Σ_{j=0}^∞ e^{−λ(j/n)} (n^j/j!) (−1)^j f^{(j)}(n)
        = Σ_{j=0}^∞ (f^{(j)}(n)/j!) (n(1 − e^{−λ/n}) − n)^j = f(n(1 − e^{−λ/n})),

the last equation from the Taylor series. Letting n → ∞, we obtain for the limit of the last term f(λ), since f is continuous at each λ. It follows from

Theorem 6.6.3 that {F_n} converges vaguely, say to F, and that the Laplace transform of F is f. Hence F(∞) = f(0) = 1, and F is a d.f. The theorem is proved.

EXERCISES

1. If X and Y are independent r.v.'s with normal d.f.'s of the same variance, then X + Y and X − Y are independent.
*2. Two uncorrelated r.v.'s with a joint normal d.f. of arbitrary parameters are independent. Extend this to any finite number of r.v.'s.
*3. Let F and G be s.d.f.'s. If λ_0 > 0 and F̂(λ) = Ĝ(λ) for all λ ≥ λ_0, then F = G. More generally, if F̂(nλ_0) = Ĝ(nλ_0) for integer n ≥ 1, then F = G. [HINT: In order to apply the Stone–Weierstrass theorem as cited, adjoin the constant 1 to the family {e^{−λx}, λ ≥ λ_0}; show that a function in C_0 can actually be uniformly approximated without a constant.]
*4. Let {F_n} be s.d.f.'s. If λ_0 > 0 and lim_{n→∞} F̂_n(λ) exists for all λ ≥ λ_0, then {F_n} converges vaguely to an s.d.f.
5. Use Exercise 3 to prove that for any d.f. whose support is in a finite interval, the moment problem is determinate.
6. Let F be an s.d.f. with support in R_+. Define G_0 = F,

    G_n(x) = ∫_0^x G_{n−1}(u) du

for n ≥ 1. Find Ĝ_n(λ) in terms of F̂(λ).
7. Let f̂(λ) = ∫_0^∞ e^{−λx} f(x) dx where f ∈ L¹(0, ∞). Suppose that f has a finite right-hand derivative f′(0) at the origin; then

    f(0) = lim_{λ→∞} λ f̂(λ),
    f′(0) = lim_{λ→∞} λ[λ f̂(λ) − f(0)].

*8. In the notation of Exercise 7, prove that for every λ, μ ∈ R_+:

    (μ − λ) ∫_0^∞ ∫_0^∞ e^{−(λs+μt)} f(s + t) ds dt = f̂(λ) − f̂(μ).

9. Given a function σ on R_+ that is finite, continuous, and decreasing to zero at infinity, find a σ-finite measure μ on R_+ such that

    ∀t > 0:  ∫_{[0,t]} σ(t − s) μ(ds) = 1.

[HINT: Assume σ(0) = 1 and consider the d.f. 1 − σ.]

*10. If f > 0 on (0, ∞) and has a derivative f′ that is completely monotonic there, then 1/f is also completely monotonic.
11. If f is completely monotonic in (0, ∞) with f(0+) = +∞, then f is the Laplace transform of an infinite measure μ on R_+:

    f(λ) = ∫ e^{−λx} μ(dx).

[HINT: Show that F_n(x) ≤ e^{2δ} f(δ) for each δ > 0 and all large n, where F_n is defined in (7). Alternatively, apply Theorem 6.6.4 to f(λ + n^{−1})/f(n^{−1}) for λ ≥ 0 and use Exercise 3.]
12. Let {g_n, 1 ≤ n ≤ ∞} on R_+ satisfy the conditions: (i) for each n, g_n(·) is positive and decreasing; (ii) g_∞(x) is continuous; (iii) for each λ > 0,

    lim_{n→∞} ∫_0^∞ e^{−λx} g_n(x) dx = ∫_0^∞ e^{−λx} g_∞(x) dx.

Then

    lim_{n→∞} g_n(x) = g_∞(x)  for every x ∈ R_+.

[HINT: For ε > 0 consider the sequence ∫_a^∞ e^{−εx} g_n(x) dx and show that

    lim_{n→∞} ∫_a^b e^{−εx} g_n(x) dx = ∫_a^b e^{−εx} g_∞(x) dx,    lim sup_n g_n(b) ≤ g_∞(b−),

and so on.]

Formally, the Fourier transform f and the Laplace transform F̂ of a p.m. with support in R_+ can be obtained from each other by the substitution t = iλ or λ = −it in (1) of Sec. 6.1 or (3) of Sec. 6.6. In practice, a certain expression may be derived for one of these transforms that is valid only for the pertinent range, and the question arises as to its validity for the other range. Interesting cases of this will be encountered in Secs. 8.4 and 8.5. The following theorem is generally adequate for the situation.

Theorem 6.6.5. The function h of the complex variable z given by

    h(z) = ∫_{[0,∞)} e^{zx} dF(x)

is analytic in ℜz < 0 and continuous in ℜz ≤ 0. Suppose that g is another function of z that is analytic in ℜz < 0 and continuous in ℜz ≤ 0 such that

    ∀t ∈ R:  h(it) = g(it).

Then h(z) ≡ g(z) in ℜz ≤ 0; in particular

    ∀λ ∈ R_+:  h(−λ) = g(−λ).

PROOF. For each integer m ≥ 1, the function h_m defined by

    h_m(z) = ∫_{[0,m]} e^{zx} dF(x) = ∫_{[0,m]} Σ_{n=0}^∞ ((zx)^n/n!) dF(x)

is clearly an entire function of z. We have

    |h_m(z) − h(z)| = | ∫_{(m,∞)} e^{zx} dF(x) | ≤ ∫_{(m,∞)} dF(x)

in ℜz ≤ 0; hence the sequence h_m converges uniformly there to h, and h is continuous in ℜz ≤ 0. It follows from a basic proposition in the theory of analytic functions that h is analytic in the interior ℜz < 0. Next, the difference h − g is analytic in ℜz < 0, continuous in ℜz ≤ 0, and equal to zero on the line ℜz = 0 by hypothesis. Hence, by Schwarz's reflection principle, it can be analytically continued across the line and over the whole complex plane. The resulting entire function, being zero on a line, must be identically zero in the plane. In particular h − g ≡ 0 in ℜz ≤ 0.
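Theorem 6.6.5 can be watched numerically in a concrete case (an illustrative example of mine, not from the text): for X ~ Exp(1), h(z) = E(e^{zX}) = 1/(1 − z) throughout ℜz ≤ 0, so the boundary values at z = it give the ch.f. and those at z = −λ the Laplace transform of one and the same analytic function. A Monte Carlo sketch:

```python
import cmath, random

def h_mc(z, n=100_000, seed=1):
    """Monte Carlo estimate of h(z) = E[e^{zX}] for X ~ Exp(1), Re z <= 0."""
    rng = random.Random(seed)
    return sum(cmath.exp(z * rng.expovariate(1.0)) for _ in range(n)) / n

g = lambda z: 1 / (1 - z)   # analytic in Re z < 0, continuous in Re z <= 0

for z in (0.5j, -1.0, -0.5 + 2.0j):   # ch.f. point, Laplace point, interior point
    print(z, h_mc(z), g(z))
```

The three sample points are arbitrary; any z with ℜz ≤ 0 should show the same agreement up to Monte Carlo error.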

7 | Central limit theorem and its ramifications

7.1 Liapounov's theorem

The name "central limit theorem" refers to a result that asserts the convergence in dist. of a "normed" sum of r.v.'s, (S_n − a_n)/b_n, to the unit normal d.f. Φ. We have already proved a neat, though special, version in Theorem 6.4.4. Here we begin by generalizing the set-up. If we write

(1)    S_n/b_n = Σ_{j=1}^n (X_j/b_n),

we see that we are really dealing with a double array, as follows. For each n ≥ 1 let there be k_n r.v.'s {X_nj, 1 ≤ j ≤ k_n}, where k_n → ∞ as n → ∞:

(2)    X_11, X_12, ..., X_1k_1;
       X_21, X_22, ..., X_2k_2;
       .....................
       X_n1, X_n2, ..., X_nk_n;
       .....................

The r.v.'s with n as first subscript will be referred to as being in the nth row. Let F_nj be the d.f., f_nj the ch.f. of X_nj; and put

    S_n = S_{n,k_n} = Σ_{j=1}^{k_n} X_nj.

The particular case k_n = n for each n yields a triangular array, and if, furthermore, X_nj = X_j for every n, then it reduces to the initial sections of a single sequence {X_j, j ≥ 1}.

We shall assume from now on that the r.v.'s in each row in (2) are independent, but those in different rows may be arbitrarily dependent, as in the case just mentioned — indeed they may be defined on different probability spaces without any relation to one another. Furthermore, let us introduce the following notation for moments, whenever they are defined, finite or infinite:

(3)    E(X_nj) = α_nj,    E(S_n) = Σ_{j=1}^{k_n} α_nj = α_n;
       σ²(X_nj) = σ²_nj,   σ²(S_n) = Σ_{j=1}^{k_n} σ²_nj = s_n²;
       E(|X_nj|³) = γ_nj,  Γ_n = Σ_{j=1}^{k_n} γ_nj.

In the special case of (1), if we take b_n = s_n, X_nj = X_j/b_n, then

(4)    Σ_{j=1}^{k_n} σ²(X_nj) = (1/b_n²) Σ_{j=1}^n σ²(X_j) = 1.

By considering X_nj − α_nj instead of X_nj, we may suppose

(5)    ∀n, ∀j:  α_nj = 0

whenever the means exist. The reduction (sometimes called "norming") leading to (4) and (5) is always available if each X_nj has a finite second moment, and we shall assume this in the following.

In dealing with the double array (2), it is essential to impose a hypothesis that individual terms in the sum

    S_n = Σ_{j=1}^{k_n} X_nj

are "negligible" in comparison with the sum itself. Historically, this arose from the assumption that "small errors" accumulate to cause probabilistically predictable random mass phenomena. We shall see later that such a hypothesis is indeed necessary in a reasonable criterion for the central limit theorem such as Theorem 7.2.1.

In order to clarify the intuitive notion of the negligibility, let us consider the following hierarchy of conditions, each to be satisfied for every ε > 0:

(a) ∀j:  lim_{n→∞} P{|X_nj| > ε} = 0;
(b) lim_{n→∞} max_{1≤j≤k_n} P{|X_nj| > ε} = 0;
(c) lim_{n→∞} P{max_{1≤j≤k_n} |X_nj| > ε} = 0;
(d) lim_{n→∞} Σ_{j=1}^{k_n} P{|X_nj| > ε} = 0.

It is clear that (d) ⇒ (c) ⇒ (b) ⇒ (a); see Exercise 1 below. It turns out that (b) is the appropriate condition, which will be given a name.

DEFINITION. The double array (2) is said to be holospoudic* iff (b) holds.

Theorem 7.1.1. A necessary and sufficient condition for (2) to be holospoudic is:

(6)    ∀t ∈ R:  lim_{n→∞} max_{1≤j≤k_n} |f_nj(t) − 1| = 0.

PROOF. We have, for every ε > 0:

    |f_nj(t) − 1| ≤ ∫ |e^{itx} − 1| dF_nj(x) ≤ ∫_{|x|>ε} 2 dF_nj(x) + |t| ∫_{|x|≤ε} |x| dF_nj(x)

and consequently

    max_j |f_nj(t) − 1| ≤ 2 max_j P{|X_nj| > ε} + ε|t|.

* I am indebted to Professor M. Wigodsky for suggesting this word, the only new term coined in this book.

Letting n → ∞, then ε → 0, we obtain (6). Conversely, we have by the inequality (2) of Sec. 6.3,

    ∫_{|x|>ε} dF_nj(x) ≤ (ε/2) ∫_{|t|≤2/ε} (1 − f_nj(t)) dt,

and the right member, maximized over j, tends to 0 by (6) and bounded convergence; hence (b) holds.

The following useful lemma will be needed below.

Lemma. Let {θ_nj, 1 ≤ j ≤ k_n} be a double array of complex numbers satisfying:

(i) max_{1≤j≤k_n} |θ_nj| → 0;
(ii) Σ_{j=1}^{k_n} |θ_nj| ≤ M < ∞, where M does not depend on n;
(iii) Σ_{j=1}^{k_n} θ_nj → θ, where θ is a (finite) complex number.

Then we have

(7)    Π_{j=1}^{k_n} (1 + θ_nj) → e^θ.

PROOF. By (i), there exists n_0 such that if n ≥ n_0, then |θ_nj| ≤ 1/2 for all j, so that 1 + θ_nj ≠ 0. We shall consider only such large values of n below, and we shall denote by log(1 + θ_nj) the determination of logarithm with an angle in (−π, π]. Thus

(8)    log(1 + θ_nj) = θ_nj + Λ|θ_nj|²,

where Λ is a complex number depending on various variables but bounded by some absolute constant not depending on anything, and is not necessarily

the same in each appearance. In the present case we have in fact

    |log(1 + θ_nj) − θ_nj| ≤ Σ_{m=2}^∞ |θ_nj|^m/m ≤ |θ_nj|² Σ_{m=0}^∞ (1/2)^m / 2 ≤ |θ_nj|²,

so that the absolute constant mentioned above may be taken to be 1. (The reader will supply such a computation next time.) Hence, since max_j |θ_nj| ≤ 1/2,

    Σ_{j=1}^{k_n} log(1 + θ_nj) = Σ_{j=1}^{k_n} θ_nj + Λ Σ_{j=1}^{k_n} |θ_nj|²

(This Λ is not the same as before, but bounded by the same 1!). It follows from (ii) and (i) that

(9)    Σ_{j=1}^{k_n} |θ_nj|² ≤ max_{1≤j≤k_n} |θ_nj| · Σ_{j=1}^{k_n} |θ_nj| ≤ M max_{1≤j≤k_n} |θ_nj| → 0,

and consequently by (iii) that Σ_j log(1 + θ_nj) → θ, which is equivalent to (7).

Theorem 7.1.2 (Liapounov's theorem). Assume the reduction hypotheses (4) and (5), and let γ_nj < ∞ for each n and j. If Γ_n → 0 as n → ∞, then S_n converges in dist. to Φ.

PROOF. For fixed t, put

    θ_nj = θ_nj(t) = f_nj(t) − 1.

By Taylor's expansion, using (5) and the third-moment hypothesis,

    θ_nj = −(t²/2) σ²_nj + Λ (|t|³/6) γ_nj,  |Λ| ≤ 1.

We verify the conditions (i) to (iii) of the Lemma.

Condition (i) is satisfied since

    max_{1≤j≤k_n} |θ_nj| ≤ (t²/2) max_{1≤j≤k_n} σ²(X_nj) + (|t|³/6) max_{1≤j≤k_n} γ_nj → 0,

because, by Liapounov's inequality, σ³_nj ≤ γ_nj ≤ Γ_n, so that max_j σ²_nj ≤ Γ_n^{2/3} → 0. Condition (ii) is satisfied since, by (4),

    Σ_{j=1}^{k_n} |θ_nj| ≤ (t²/2) Σ_{j=1}^{k_n} σ²_nj + (|t|³/6) Γ_n = t²/2 + (|t|³/6) Γ_n,

which is bounded in n. Condition (iii) is satisfied since

    Σ_{j=1}^{k_n} θ_nj = −t²/2 + Λ (|t|³/6) Γ_n → −t²/2.

It follows from the Lemma that the ch.f. of S_n satisfies

    Π_{j=1}^{k_n} f_nj(t) = Π_{j=1}^{k_n} (1 + θ_nj) → e^{−t²/2},

and S_n converges in dist. to Φ by the convergence theorem for ch.f.'s.

For a single sequence of independent r.v.'s {X_j} with E(X_j) = 0, σ²(X_j) = σ²_j, and E(|X_j|³) = γ_j < ∞, write s_n² = Σ_{j=1}^n σ²_j and Γ_n = Σ_{j=1}^n γ_j; the theorem takes the following form.
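The Lemma's conclusion (7) is easy to sanity-check numerically. A sketch under the simplest choice satisfying (i)-(iii), namely θ_nj = θ/k_n for a fixed complex θ (an illustrative special case of mine, not from the text):

```python
import cmath

def product_limit(theta, k):
    """Prod_{j=1}^{k} (1 + theta_kj) with theta_kj = theta / k,
    so that max_j |theta_kj| -> 0 and sum_j theta_kj = theta."""
    prod = complex(1.0)
    for _ in range(k):
        prod *= 1 + theta / k
    return prod

theta = complex(-0.5, 0.3)   # a finite complex limit, as in condition (iii)
for k in (10, 100, 10_000):
    print(k, product_limit(theta, k), cmath.exp(theta))
```

As k grows the product approaches e^θ, in line with (7); the proof's estimate (9) says the error comes from second-order terms, here of size |θ|²/k.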

Corollary. If

(14)    Γ_n/s_n³ → 0,

then S_n/s_n converges in dist. to Φ.

This is obtained by setting X_nj = X_j/s_n. It should be noticed that the double scheme gets rid of cumbersome fractions in proofs as well as in statements.

We proceed to another proof of Liapounov's theorem for a single sequence by the method of Lindeberg, who actually used it to prove his version of the central limit theorem; see Theorem 7.2.1 below. In recent times this method has been developed into a general tool by Trotter and Feller.

The idea of Lindeberg is to approximate the sum X_1 + ... + X_n in (13) successively by replacing one X at a time with a comparable normal (Gaussian) r.v. Y, as follows. Let {Y_j, j ≥ 1} be r.v.'s having the normal distribution N(0, σ²_j); thus Y_j has the same mean and variance as the corresponding X_j above; let all the X's and Y's be totally independent. Now put

    Z_j = Y_1 + ... + Y_{j−1} + X_{j+1} + ... + X_n,  1 ≤ j ≤ n,

so that Z_j + X_j and Z_j + Y_j differ only in the jth summand, and telescoping:

(15)    E{f(S_n/s_n)} − E{f((Y_1 + ... + Y_n)/s_n)}
            = Σ_{j=1}^n [ E{f((Z_j + X_j)/s_n)} − E{f((Z_j + Y_j)/s_n)} ].

For f ∈ C³_B and independent ξ, η with E{|η|³} < ∞, Taylor's expansion gives

(16)    |E{f(ξ + η)} − E{f(ξ)} − E{f′(ξ)}E{η} − (1/2)E{f″(ξ)}E{η²}| ≤ (M/6) E{|η|³},

where M = sup_x |f‴(x)|.

Note that the r.v.'s f(ξ), f′(ξ), and f″(ξ) are bounded hence integrable. If ζ is another r.v. independent of ξ and having the same mean and variance as η, and E{|ζ|³} < ∞, we obtain by replacing η with ζ in (16) and then taking the difference:

(17)    |E{f(ξ + η)} − E{f(ξ + ζ)}| ≤ (M/6) E{|η|³ + |ζ|³}.

This key formula is applied to each term on the right side of (15), with ξ = Z_j/s_n, η = X_j/s_n, ζ = Y_j/s_n. The bounds on the right-hand side of (17) then add up to

(18)    (M/6) Σ_{j=1}^n (γ_j + c σ_j³)/s_n³,

where c = √(8/π), since the absolute third moment of N(0, σ²) is equal to cσ³. By Liapounov's inequality (Sec. 3.2) σ_j³ ≤ γ_j, so that the quantity in (18) is O(Γ_n/s_n³). Let us introduce a unit normal r.v. N for convenience of notation, so that (Y_1 + ... + Y_n)/s_n may be replaced by N so far as its distribution is concerned. We have thus obtained the following estimate:

(19)    ∀f ∈ C³_B:  |E{f(S_n/s_n)} − E{f(N)}| ≤ O(Γ_n/s_n³).

Consequently, under the condition (14), this converges to zero as n → ∞. It follows by the general criterion for vague convergence in Theorem 6.1.6 that S_n/s_n converges in distribution to the unit normal. This is Liapounov's form of the central limit theorem proved above by the method of ch.f.'s. Lindeberg's idea yields a by-product, due to Pinsky, which will be proved under the same assumptions as in Liapounov's theorem above.

Theorem 7.1.3. Let {x_n} be a sequence of real numbers increasing to +∞ but subject to the growth condition: for some ε > 0,

(20)    log(Γ_n/s_n³) + (x_n²/2)(1 + ε) → −∞

as n → ∞. Then for this ε, there exists N such that for all n ≥ N we have

(21)    exp(−(x_n²/2)(1 + ε)) ≤ P{S_n ≥ x_n s_n} ≤ exp(−(x_n²/2)(1 − ε)).

PROOF. This is derived by choosing particular test functions in (19). Let f ∈ C³ be such that

    f(x) = 0 for x ≤ 0;  f(x) = 1 for x ≥ 1;  0 ≤ f(x) ≤ 1 for all x;

and put for all x:

    f_n(x) = f(x − x_n),    g_n(x) = f(x − x_n + 1).

Thus we have, denoting by I_B the indicator function of B ⊂ R¹:

    I_{[x_n+1,∞)}(x) ≤ f_n(x) ≤ I_{[x_n,∞)}(x) ≤ g_n(x) ≤ I_{[x_n−1,∞)}(x).

Applying (19) to f_n and to g_n, we obtain

(24)    P{S_n ≥ x_n s_n} ≤ E{g_n(S_n/s_n)} ≤ P{N ≥ x_n − 1} + O(Γ_n/s_n³);
        P{S_n ≥ x_n s_n} ≥ E{f_n(S_n/s_n)} ≥ P{N ≥ x_n + 1} − O(Γ_n/s_n³).

Now an elementary estimate yields for x → +∞:

    P{N > x} = (1/√(2π)) ∫_x^∞ exp(−y²/2) dy ~ (1/(√(2π) x)) exp(−x²/2)

(see Exercise 4 of Sec. 7.4), and a quick computation shows further that

    P{N > x ∓ 1} = exp(−(x²/2)(1 + o(1))).

Thus (24) may be written as

    P{S_n ≥ x_n s_n} = exp(−(x_n²/2)(1 + o(1))) + O(Γ_n/s_n³).

Suppose n is so large that the o(1) above is strictly less than ε in absolute value; in order to conclude (21) it is sufficient to have

    Γ_n/s_n³ = o(exp(−(x_n²/2)(1 + ε))),  n → ∞.

This is the sense of the condition (20), and the theorem is proved.
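The elementary tail estimate just used, and the cruder two-sided bracketing exp(−(x²/2)(1 ± ε)) that drives (21), can both be checked directly: P{N > x} equals (1/2) erfc(x/√2), which the standard library provides. (A numerical sketch; the sample points and the choice ε = 1/2 are arbitrary.)

```python
import math

def normal_tail(x):
    """P{N > x} for a unit normal, via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

for x in (2.0, 4.0, 6.0):
    exact = normal_tail(x)
    asym = math.exp(-x * x / 2) / (math.sqrt(2 * math.pi) * x)
    print(x, exact, asym, asym / exact)   # ratio -> 1 as x grows
```

On the logarithmic scale of (21), exp(−(x²/2)(1+ε)) < P{N > x} < exp(−(x²/2)(1−ε)) already holds at quite moderate x, even though the two bounds differ by the large factor e^{εx²}.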

Recalling that s_n is the standard deviation of S_n, we call the probability in (21) that of a "large deviation". Observe that the two estimates of this probability given there have a ratio equal to e^{εx_n²}, which is large for each ε as x_n → +∞. Nonetheless, it is small relative to the principal factor e^{−x_n²/2} on the logarithmic scale, which is just to say that εx_n² is small in absolute value compared with −x_n²/2. Thus the estimates in (21) are useful on such a scale, and may be applied in the proof of the law of the iterated logarithm in Sec. 7.5.

EXERCISES

*1. Prove that for arbitrary r.v.'s {X_nj} in the array (2), the implications (d) ⇒ (c) ⇒ (b) ⇒ (a) are all strict. On the other hand, if the X_nj's are independent in each row, then (d) and (c) are equivalent.
2. For any sequence of r.v.'s {Y_n}, if Y_n/b_n converges in dist. for an increasing sequence of constants {b_n}, then Y_n/b′_n converges in pr. to 0 if b_n = o(b′_n). In particular, make precise the following statement: "The central limit theorem implies the weak law of large numbers."
3. For the double array (2), it is possible that S_n/b_n converges in dist. for a sequence of strictly positive constants b_n tending to a finite limit. Is it still possible if b_n oscillates between finite limits?
4. Let {X_j} be independent r.v.'s such that max_{1≤j≤n} ...
5. Prove that for every ε > 0, there exists a δ(ε) such that Γ_n/s_n³ < δ(ε) implies L(F_n, Φ) < ε, where F_n is the d.f. of S_n/s_n and L is the Lévy distance. Strengthen the conclusion to read:

    sup_{x∈R} |F_n(x) − Φ(x)| < ε.

*6. Prove the assertion made in Exercise 5 of Sec. 5.3 using the methods of this section. [HINT: use Exercise 4 of Sec. 4.3.]

7.2 Lindeberg-Feller theorem

We can now state the Lindeberg-Feller theorem for the double array (2) of Sec. 7.1 (with independence in each row).

Theorem 7.2.1. Assume σ²_nj < ∞ for each n and j and the reduction hypotheses (4) and (5) of Sec. 7.1. In order that as n → ∞ the two conclusions below both hold:

(i) S_n converges in dist. to Φ;
(ii) the double array (2) of Sec. 7.1 is holospoudic;

it is necessary and sufficient that for each η > 0, we have

(1)    Σ_{j=1}^{k_n} ∫_{|x|>η} x² dF_nj(x) → 0.

The condition (1) is called Lindeberg's condition; it is manifestly equivalent to

(1′)    Σ_{j=1}^{k_n} ∫_{|x|≤η} x² dF_nj(x) → 1.

PROOF. Sufficiency. By the argument that leads to Chebyshev's inequality, we have

(2)    P{|X_nj| > η} ≤ (1/η²) ∫_{|x|>η} x² dF_nj(x).

Hence (ii) follows from (1); indeed even the stronger form of negligibility (d) in Sec. 7.1 follows. Now for a fixed η, 0 < η < 1, we truncate X_nj as follows:

(3)    X′_nj = X_nj if |X_nj| ≤ η;  X′_nj = 0 otherwise;

and put S′_n = Σ_{j=1}^{k_n} X′_nj. Since E(X_nj) = 0, we have

    |E(X′_nj)| = | ∫_{|x|>η} x dF_nj(x) | ≤ (1/η) ∫_{|x|>η} x² dF_nj(x),

and so by (1),

    |E(S′_n)| ≤ (1/η) Σ_{j=1}^{k_n} ∫_{|x|>η} x² dF_nj(x) → 0.

Next we have by the Cauchy-Schwarz inequality

    (E(X′_nj))² = ( ∫_{|x|>η} x dF_nj(x) )² ≤ ∫_{|x|>η} x² dF_nj(x) · ∫_{|x|>η} dF_nj(x),

and consequently

    σ²(X′_nj) = ∫_{|x|≤η} x² dF_nj(x) − (E(X′_nj))² ≥ ∫_{|x|≤η} x² dF_nj(x) − ∫_{|x|>η} x² dF_nj(x).

Summing over j and using (1) and (1′), we see that

    σ²(S′_n) → 1.

Since the X′_nj are bounded by η, Liapounov's theorem in the form of its Corollary would apply if η could be made to decrease to 0 with n. This is arranged by means of the following elementary lemma on double sequences.

Lemma. Let a(m, n) ≥ 0 and suppose that lim_{n→∞} a(m, n) = 0 for each fixed m ≥ 1. Then there exists a sequence {m_n} increasing to ∞ such that a(m_n, n) → 0 as n → ∞.

We apply the lemma to (1) as follows. For each m ≥ 1, we have

    lim_{n→∞} m² Σ_{j=1}^{k_n} ∫_{|x|>1/m} x² dF_nj(x) = 0.

It follows that there exists a sequence {η_n} decreasing to 0 such that

    (1/η_n²) Σ_{j=1}^{k_n} ∫_{|x|>η_n} x² dF_nj(x) → 0.

Now we can go back and modify the definition in (3) by replacing η with η_n. As indicated above, the cited corollary becomes applicable and yields the convergence of [S′_n − E(S′_n)]/σ(S′_n) in dist. to Φ, hence also that of S′_n/σ(S′_n) as remarked.

Finally we must go from S′_n to S_n. The idea is similar to Theorem 5.2.1 but simpler. Observe that, for the modified X′_nj in (3) with η replaced by η_n, we have

    P{S_n ≠ S′_n} ≤ Σ_{j=1}^{k_n} P{X_nj ≠ X′_nj} = Σ_{j=1}^{k_n} P{|X_nj| > η_n}
        ≤ (1/η_n²) Σ_{j=1}^{k_n} ∫_{|x|>η_n} x² dF_nj(x) → 0.

such that if n ≥ n_0(T), then

    max_{|t|≤T} max_{1≤j≤k_n} |f_nj(t) − 1| ≤ 1/2.

    ≤ lim_{n→∞} Σ_{j=1}^{k_n} ∫_{|x|>η} 2 dF_nj(x) ≤ lim_{n→∞} (2/η²) Σ_{j=1}^{k_n} σ²_nj = 2/η²,

the last inequality above by Chebyshev's inequality, since 0 ≤ 1 − cos θ ≤ θ²/2 for all θ. This completes the proof of the necessity.

As an immediate consequence, the sufficiency of Liapounov's condition

(10)    Σ_{j=1}^{k_n} ∫ |x|^{2+δ} dF_nj(x) → 0  for some δ > 0

also follows from Theorem 7.2.1: for |x| > η we have x² ≤ η^{−δ} |x|^{2+δ}, hence

    Σ_{j=1}^{k_n} ∫_{|x|>η} x² dF_nj(x) ≤ η^{−δ} Σ_{j=1}^{k_n} ∫ |x|^{2+δ} dF_nj(x),

showing that (10) implies (1).

The essence of Theorem 7.2.1 is the assumption of the finiteness of the second moments together with the "classical" norming factor s_n, which is the standard deviation of the sum S_n; see Exercise 10 below for an interesting possibility. In the most general case where "nothing is assumed," we have the following criterion, due to Feller and in a somewhat different form to Lévy.
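The sufficiency half of Theorem 7.2.1 is easy to watch in simulation. A sketch assuming X_j uniform on [−j, j] (an illustrative choice of mine, also the setting of Exercise 8 below): here s_n² = Σ_{j≤n} j²/3 grows like n³/9, so every |X_j|/s_n ≤ n/s_n → 0 and the Lindeberg integrals vanish identically for large n; the normed sums S_n/s_n should then track Φ.

```python
import math, random

def simulate_normed_sum(n, reps=4000, seed=3):
    """Sample S_n / s_n for independent X_j ~ Uniform[-j, j]
    (Lindeberg's condition holds: |X_j| <= n while s_n ~ n^{3/2}/3)."""
    rng = random.Random(seed)
    s_n = math.sqrt(sum(j * j / 3.0 for j in range(1, n + 1)))
    out = []
    for _ in range(reps):
        s = sum(rng.uniform(-j, j) for j in range(1, n + 1))
        out.append(s / s_n)
    return out

vals = simulate_normed_sum(50)
frac_le_1 = sum(v <= 1.0 for v in vals) / len(vals)
print("P{S_n/s_n <= 1} ~", frac_le_1,
      " Phi(1) =", 0.5 * (1 + math.erf(1 / math.sqrt(2))))
```

With n = 50 and a few thousand replications the empirical d.f. at the point 1 is already within Monte Carlo error of Φ(1) ≈ 0.8413.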

Theorem 7.2.2. For the double array (2) of Sec. 7.1 (with independence in each row), in order that there exists a sequence of constants {a_n} such that

(i) Σ_{j=1}^{k_n} X_nj − a_n converges in dist. to Φ, and
(ii) the array is holospoudic,

it is necessary and sufficient that the following two conditions hold for every η > 0:

(a)    Σ_{j=1}^{k_n} ∫_{|x|>η} dF_nj(x) → 0;

(b)    Σ_{j=1}^{k_n} { ∫_{|x|≤η} x² dF_nj(x) − ( ∫_{|x|≤η} x dF_nj(x) )² } → 1.

Lemma 2. For each n, the r.v.'s {X_nj, 1 ≤ j ≤ n} are independent with the following distributions:

    P{X_nj = m} = 1/j  for 0 ≤ m ≤ j − 1.

(Here Ω is the set of the n! permutations of (1, ..., n) with the uniform probability measure, and X_nj is the number of inversions caused by the jth element.) We compute readily

    α_nj = E(X_nj) = (j − 1)/2,    σ²(X_nj) = (j² − 1)/12,

so that

    s_n² = Σ_{j=1}^n (j² − 1)/12 ~ n³/36.

For every η > 0 and sufficiently large n, we have

    max_{1≤j≤n} |X_nj − α_nj| ≤ j − 1 < n ≤ η s_n,

since s_n grows like n^{3/2}/6; hence the integrals in Lindeberg's condition (1) vanish identically for large n.

Hence Lindeberg's condition is satisfied for the double array {(X_nj − α_nj)/s_n, 1 ≤ j ≤ n}, and the central limit theorem applies to the total number of inversions Σ_{j=1}^n X_nj, with norming constants n(n − 1)/4 and s_n.
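The conclusion for inversions can be seen in simulation. A sketch (the helper names are mine): by Lemma 2 the total number of inversions is Σ_j X_nj with X_nj uniform on {0, ..., j−1}, so its mean is Σ_j (j − 1)/2 = n(n − 1)/4, which the empirical average over random permutations should track.

```python
import random

def inversions(perm):
    """Count pairs (i, k), i < k, with perm[i] > perm[k]."""
    return sum(perm[i] > perm[k]
               for i in range(len(perm)) for k in range(i + 1, len(perm)))

def sample_mean_inversions(n, reps=300, seed=11):
    """Empirical mean inversion count over random permutations of size n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        p = list(range(n))
        rng.shuffle(p)
        total += inversions(p)
    return total / reps

n = 30
print(sample_mean_inversions(n), "vs exact mean", n * (n - 1) / 4)
```

For n = 30 the exact mean is 217.5 and the spread per permutation is of order s_n ≈ n^{3/2}/6 ≈ 27, so a few hundred replications pin the mean down to within a couple of units.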

*7. Find an example where Lindeberg's condition is satisfied but Liapounov's is not for any δ > 0.

In Exercises 8 to 10 below {X_j, j ≥ 1} is a sequence of independent r.v.'s.

8. For each j let X_j have the uniform distribution in [−j, j]. Show that Lindeberg's condition is satisfied and state the resulting central limit theorem.
9. Let X_j be defined as follows for some α > 1:

    X_j = ±j^α, with probability 1/(6j^{2(α−1)}) each;
    X_j = 0,    with probability 1 − 1/(3j^{2(α−1)}).

Prove that Lindeberg's condition is satisfied if and only if α < 3/2.
*10. It is important to realize that the failure of Lindeberg's condition means only the failure of either (i) or (ii) in Theorem 7.2.1 with the specified constants s_n. A central limit theorem may well hold with a different sequence of constants. Let

    X_j = ±j, with probability 1/(12j²) each;
    X_j = ±1, with probability 1/12 each;
    X_j = 0,  with probability 1 − 1/6 − 1/(6j²).

Prove that Lindeberg's condition is not satisfied. Nonetheless, if we take b_n = (n/6)^{1/2}, then S_n/b_n converges in dist. to Φ. The point is that abnormally large values may not count! [HINT: Truncate out the abnormal value.]
11. Prove that ∫ x² dF(x) < ∞ implies the condition (11), but not vice versa.
*12. The following combinatorial problem is similar to that of the number of inversions. Let Ω and P be as in the example in the text. It is standard knowledge that each permutation

    (1 2 ... n; a_1 a_2 ... a_n)

can be uniquely decomposed into the product of cycles, as follows. Consider the permutation as a mapping π from the set (1, ..., n) onto itself such that π(j) = a_j. Beginning with 1 and applying the mapping successively, 1 → π(1) → π²(1) → ..., until the first k such that π^k(1) = 1. Thus

    (1, π(1), π²(1), ..., π^{k−1}(1))

is the first cycle of the decomposition. Next, begin with the least integer, say b, not in the first cycle and apply π to it successively; and so on. We say 1 → π(1) is the first step, π^{k−1}(1) → 1 the kth step, b → π(b) the (k+1)st step of the decomposition, and so on. Now define X_nj(ω) to be equal to 1 if in the decomposition of ω, a cycle is completed at the jth step; otherwise to be 0. Prove that for each n, {X_nj, 1 ≤ j ≤ n} is a set of independent r.v.'s with the following distributions:

    P{X_nj = 1} = 1/(n − j + 1),    P{X_nj = 0} = 1 − 1/(n − j + 1).

Deduce the central limit theorem for the number of cycles of a permutation.

7.3 Ramifications of the central limit theorem

As an illustration of a general method of extending the central limit theorem to certain classes of dependent r.v.'s, we prove the following result. Further elaboration along the same lines is possible, but the basic idea, attributed to S. Bernstein, consists always in separating into blocks and neglecting small ones.

Let {X_n, n ≥ 1} be a sequence of r.v.'s; let F_n be the Borel field generated by {X_k, 1 ≤ k ≤ n}, and F′_n that by {X_k, n < k < ∞}. The sequence is called m-dependent iff there exists an integer m ≥ 0 such that for every n the fields F_n and F′_{n+m} are independent. When m = 0, this reduces to independence.

Theorem 7.3.1. Suppose that {X_n} is a sequence of m-dependent, uniformly bounded r.v.'s such that

    σ(S_n)/n^{1/3} → +∞

as n → ∞. Then [S_n − E(S_n)]/σ(S_n) converges in dist. to Φ.

PROOF. Let the uniform bound be M. Without loss of generality we may suppose that E(X_n) = 0 for each n. For an integer k ≥ 1 let n_j = [jn/k], 0 ≤ j ≤ k, and put for large values of n:

    Y_j = X_{n_j+1} + X_{n_j+2} + ... + X_{n_{j+1}−m};
    Z_j = X_{n_{j+1}−m+1} + X_{n_{j+1}−m+2} + ... + X_{n_{j+1}}.

We have then

    S_n = Σ_{j=0}^{k−1} Y_j + Σ_{j=0}^{k−1} Z_j = S′_n + S″_n, say.

It follows from the hypothesis of m-dependence and Theorem 3.3.2 that the Y_j's are independent; so are the Z_j's, provided n_{j+1} − m + 1 − n_j > m, which is the case if n/k is large enough. Although S′_n and S″_n are not independent of each other, we shall show that the latter is comparatively negligible, so that S_n behaves like S′_n and S_n/σ(S_n) converges in dist. to Φ if S′_n/σ(S′_n) does. Observe that each term X_r in S″_n is independent of every term X_s in S′_n except at most m terms, and that E(X_r X_s) = 0 when they are independent, while |E(X_r X_s)| ≤ M² otherwise. Since there are km terms in S″_n, it follows that

    |E(S′_n S″_n)| ≤ km · m · M² = km²M².

the last relation from (1) and the one preceding it from a hypothesis of the theorem. Thus for each η > 0, we have for all sufficiently large n

    ∫_{|x|>η s′_n} x² dF_nj(x) = 0,  0 ≤ j ≤ k_n − 1,

where F_nj is the d.f. of Y_nj. Hence Lindeberg's condition is satisfied for the double array {Y_nj/s′_n}, and we conclude that S′_n/s′_n converges in dist. to Φ. This establishes the theorem as remarked above.

The next extension of the central limit theorem is to the case of a random number of terms (cf. the second part of Sec. 5.5). That is, we shall deal with the r.v. S_{ν_n} whose value at ω is given by S_{ν_n(ω)}(ω), where

    S_n(ω) = Σ_{j=1}^n X_j(ω)

as before, and {ν_n(ω), n ≥ 1} is a sequence of r.v.'s. The simplest case, but not a very useful one, is when all the r.v.'s in the "double family" {X_n, ν_n, n ≥ 1} are independent. The result below is more interesting and is distinguished by the simple nature of its hypothesis. The proof relies essentially on Kolmogorov's inequality given in Theorem 5.3.1.

Theorem 7.3.2. Let {X_j, j ≥ 1} be a sequence of independent, identically distributed r.v.'s with mean 0 and variance 1. Let {ν_n, n ≥ 1} be a sequence of r.v.'s taking only strictly positive integer values such that

(3)    ν_n/n → c  in pr.,

where c is a constant: 0 < c < ∞. Then S_{ν_n}/√ν_n converges in dist. to Φ.

PROOF. We know from Theorem 6.4.4 that S_n/√n converges in dist. to Φ, so that our conclusion means that we can substitute ν_n for n there. The remarkable thing is that no kind of independence is assumed, but only the limit property in (3).

First of all, we observe that in the result of Theorem 6.4.4 we may substitute [cn] (= integer part of cn) for n to conclude the convergence in dist. of S_{[cn]}/√[cn] to Φ (why?). Next we write

    S_{ν_n}/√ν_n = ( S_{[cn]}/√[cn] + (S_{ν_n} − S_{[cn]})/√[cn] ) · √([cn]/ν_n).

The second factor on the right converges to 1 in pr., by (3). Hence a simple argument used before (Theorem 4.4.6) shows that the theorem will be proved if we show that

(4)    (S_{ν_n} − S_{[cn]})/√[cn] → 0  in pr.

7.3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM | 227

Let $\varepsilon$ be given, $0 < \varepsilon < 1$; put

$$a_n = [(1 - \varepsilon^3)[cn]], \qquad b_n = [(1 + \varepsilon^3)[cn]] - 1.$$

By (3), there exists $n_0(\varepsilon)$ such that if $n \ge n_0(\varepsilon)$, then the set

$$\Lambda = \{\omega : a_n \le \nu_n(\omega) \le b_n\}$$

has probability $\ge 1 - \varepsilon$. If $\omega$ is in this set, then $S_{\nu_n(\omega)}(\omega)$ is one of the sums $S_j$ with $a_n \le j \le b_n$. For $[cn] < j \le b_n$, we have

$$S_j - S_{[cn]} = X_{[cn]+1} + X_{[cn]+2} + \cdots + X_j;$$

hence by Kolmogorov's inequality

$$P\Big\{ \max_{[cn] < j \le b_n} |S_j - S_{[cn]}| > \varepsilon \sqrt{[cn]} \Big\} \le \frac{\sigma^2(S_{b_n} - S_{[cn]})}{\varepsilon^2 [cn]} \le \frac{\varepsilon^3 [cn]}{\varepsilon^2 [cn]} = \varepsilon.$$

A similar estimate holds for the range $a_n \le j \le [cn]$, so that

$$P\Big\{ \max_{a_n \le j \le b_n} |S_j - S_{[cn]}| > \varepsilon \sqrt{[cn]} \Big\} \le 2\varepsilon.$$

Since $P(\Lambda) \ge 1 - \varepsilon$, it follows that for $n \ge n_0(\varepsilon)$ we have $P\{|S_{\nu_n} - S_{[cn]}| > \varepsilon\sqrt{[cn]}\} \le 3\varepsilon$; since $\varepsilon$ is arbitrary, this establishes (4) and the theorem is proved.
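The hypothesis of Theorem 7.3.2 can be illustrated numerically. The following sketch (an addition, not part of the original text; it assumes Python with NumPy) deliberately makes $\nu_n$ depend on the $X_j$'s, since no independence between $\nu_n$ and the summands is assumed, takes $c = 1.5$, and compares the empirical d.f. of $S_{\nu_n}/\sqrt{\nu_n}$ with $\Phi$ at a few points:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def Phi(x):
    """Unit normal d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# iid X_j with mean 0, variance 1; nu_n is made to depend on X_1,
# so only nu_n/n -> c in pr. holds, not independence.
n, c, trials = 400, 1.5, 4000
m = 1000                                   # enough terms to cover nu_n
X = rng.standard_normal((trials, m))
nu = int(c * n) + (X[:, 0] > 0) * int(math.sqrt(n))   # nu_n/n -> c in pr.
S = np.cumsum(X, axis=1)
vals = S[np.arange(trials), nu - 1] / np.sqrt(nu)

errs = [abs((vals <= x).mean() - Phi(x)) for x in (-1.0, 0.0, 1.0)]
print(max(errs))
```

The perturbation of $\nu_n$ is of order $\sqrt n$, so $\nu_n/n \to c$ in pr., yet the normalized sum still behaves like a unit normal.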

228 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Let $\{X_j, j \ge 1\}$ be a sequence of independent and identically distributed r.v.'s with mean 0 and variance 1; and

$$S_n = \sum_{j=1}^{n} X_j.$$

It will become evident that these assumptions may be considerably weakened, and the basic hypothesis is that the central limit theorem should be applicable. Consider now the infinite sequence of successive sums $\{S_n, n \ge 1\}$. This is the kind of stochastic process to which an entire chapter will be devoted later (Chapter 8). The classical limit theorems we have so far discussed deal with the individual terms of the sequence $\{S_n\}$ itself, but there are various other sequences derived from it that are no less interesting and useful. To give a few examples:

$$\max_{1 \le m \le n} S_m, \qquad \min_{1 \le m \le n} S_m, \qquad \max_{1 \le m \le n} |S_m|, \ \ldots$$

7.3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM | 229

and for each $j$, define $t(j)$ by

$$n_{t(j)-1} < j \le n_{t(j)}.$$

Now we write, for $0 < \varepsilon < x$:

$$P\Big\{ \max_{1\le m\le n} S_m > x\sqrt n \Big\} = P\Big\{ \bigcup_j \big[E_j;\ |S_{n_{t(j)}} - S_j| \le \varepsilon\sqrt n\,\big]\Big\} + P\Big\{ \bigcup_j \big[E_j;\ |S_{n_{t(j)}} - S_j| > \varepsilon\sqrt n\,\big]\Big\},$$

where $E_j$ denotes the event that $j$ is the first index $m$ with $S_m > x\sqrt n$. The $E_j$ are disjoint, $E_j$ is independent of $S_{n_{t(j)}} - S_j$, and $\sigma^2(S_{n_{t(j)}} - S_j) \le n/k$, so that by Chebyshev's inequality the second term above does not exceed $1/(k\varepsilon^2)$. …

230 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

converges in distribution as $n \to \infty$. Now its ch.f. $f(t_1, \ldots, t_k)$ is given by

$$E\big\{\exp\big(i\sqrt{k/n}\,(t_1 S_{n_1} + \cdots + t_k S_{n_k})\big)\big\} = E\big\{\exp\big(i\sqrt{k/n}\,[(t_1 + \cdots + t_k)S_{n_1} + (t_2 + \cdots + t_k)(S_{n_2} - S_{n_1}) + \cdots + t_k(S_{n_k} - S_{n_{k-1}})]\big)\big\},$$

which converges to

(7) $\exp\big[-\tfrac12(t_1 + \cdots + t_k)^2\big]\exp\big[-\tfrac12(t_2 + \cdots + t_k)^2\big]\cdots\exp\big(-\tfrac12 t_k^2\big),$

since the ch.f.'s of

$$\sqrt{k/n}\,S_{n_1},\ \sqrt{k/n}\,(S_{n_2} - S_{n_1}),\ \ldots,\ \sqrt{k/n}\,(S_{n_k} - S_{n_{k-1}})$$

all converge to $e^{-t^2/2}$ by the central limit theorem (Theorem 6.4.4), $n_{j+1} - n_j$ being asymptotically equal to $n/k$ for each $j$. It is well known that the ch.f. given in (7) is that of the $k$-dimensional normal distribution, but for our purpose it is sufficient to know that the convergence theorem for ch.f.'s holds in any dimension, and so $R_{nk}$ converges vaguely to $R_{\infty k}$, where $R_{\infty k}$ is some fixed $k$-dimensional distribution.

Now suppose for a special sequence $\{\widetilde X_j\}$ satisfying the same conditions as $\{X_j\}$, the corresponding $\widetilde P_n$ can be shown to converge ("pointwise" in fact, but "vaguely" if need be):

(8) $\forall x:\ \lim_{n\to\infty} \widetilde P_n(x) = G(x).$

Then, applying (6) with $P_n$ replaced by $\widetilde P_n$ and letting $n \to \infty$, we obtain, since $R_{\infty k}$ is a fixed distribution:

$$G(x) - \frac{1}{\varepsilon^2 k} \le R_{\infty k}(x) \le G(x + \varepsilon) + \frac{1}{\varepsilon^2 k}.$$

Substituting back into the original (6) and taking upper and lower limits, we have

$$G(x - \varepsilon) - \frac{2}{\varepsilon^2 k} \le R_{\infty k}(x - \varepsilon) - \frac{1}{\varepsilon^2 k} \le \varliminf_n P_n(x) \le \varlimsup_n P_n(x) \le R_{\infty k}(x) \le G(x + \varepsilon) + \frac{1}{\varepsilon^2 k}.$$

Letting $k \to \infty$, we conclude that $P_n$ converges vaguely to $G$, since $\varepsilon$ is arbitrary.

It remains to prove (8) for a special choice of $\{\widetilde X_j\}$ and determine $G$. This can be done most expeditiously by taking the common d.f. of the $\widetilde X_j$'s to be

7.3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM | 231

the symmetric Bernoullian distribution $\frac12(\delta_1 + \delta_{-1})$. In this case we can indeed compute the more specific probability

(9) $P\Big\{ \max_{1\le m\le n} S_m < x;\ S_n = y \Big\},$

where $x$ and $y$ are integers, $x > 0$, $y < x$. If we observe that in our particular case $\max_{1\le m\le n} S_m \ge x$ if and only if $S_j = x$ for some $j$, $1 \le j \le n$, the probability in (9) is seen to be equal to

$$P\{S_n = y\} - P\Big\{ \max_{1\le m\le n} S_m \ge x;\ S_n = y \Big\},$$

and the reflection argument described on the next page shows that the second term equals $P\{S_n = 2x - y\}$, so that

(10) $P\Big\{ \max_{1\le m\le n} S_m < x;\ S_n = y \Big\} = P\{S_n = y\} - P\{S_n = 2x - y\}.$

232 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

and we have proved that the value of the probability in (9) is given by (10). The trick used above, which consists in changing the sign of every $X_j$ after the first time when $S_j$ reaches the value $x$, or, geometrically speaking, reflecting the path $\{(j, S_j), j \ge 1\}$ about the line $S_j = x$ upon first reaching it, is called the "reflection principle". It is often attributed to Désiré André in its combinatorial formulation of the so-called "ballot problem" (see Exercise 5 below).

The value of (10) is, of course, well known in the Bernoullian case, and summing over $y$ we obtain, if $n$ is even:

$$P\Big\{ \max_{1\le m\le n} S_m \ge x \Big\} = 2\,P\{S_n > x\} + P\{S_n = x\}.$$
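The reflection identity can be checked by brute force for small $n$. The following sketch (an addition, not part of the original text; it assumes Python) enumerates all $2^n$ paths of the symmetric Bernoullian walk and verifies $P\{\max_{1\le m\le n} S_m \ge x;\ S_n = y\} = P\{S_n = 2x - y\}$ for integers $y < x$:

```python
from itertools import product
from fractions import Fraction

def prob_max_ge_and_end(n, x, y):
    """P{max_{1<=m<=n} S_m >= x; S_n = y} for the symmetric +/-1 walk."""
    hits = 0
    for steps in product((-1, 1), repeat=n):
        s, mx = 0, 0
        for d in steps:
            s += d
            mx = max(mx, s)
        if mx >= x and s == y:
            hits += 1
    return Fraction(hits, 2 ** n)

def prob_end(n, y):
    """P{S_n = y}."""
    hits = sum(1 for steps in product((-1, 1), repeat=n)
               if sum(steps) == y)
    return Fraction(hits, 2 ** n)

# Reflection principle: for integers x >= 1 and y < x,
#   P{max_m S_m >= x; S_n = y} = P{S_n = 2x - y}.
n = 10
checks = [(x, y) for x in (1, 2, 3) for y in range(-4, x)]
ok = all(prob_max_ge_and_end(n, x, y) == prob_end(n, 2 * x - y)
         for (x, y) in checks)
print(ok)   # -> True
```

Exact rational arithmetic makes the check an identity rather than an approximation; both sides vanish together when $n$ and $y$ have opposite parity.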

7.3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM | 233

EXERCISES

1. Let $\{X_j, j \ge 1\}$ be a sequence of independent r.v.'s, and $f$ a Borel measurable function of $m$ variables. Then if $\xi_k = f(X_{k+1}, \ldots, X_{k+m})$, the sequence $\{\xi_k, k \ge 1\}$ is $(m-1)$-dependent.

*2. Let $\{X_j, j \ge 1\}$ be a sequence of independent r.v.'s having the Bernoullian d.f. $p\delta_1 + (1-p)\delta_0$, $0 < p < 1$. An $r$-run of successes in the sample sequence $\{X_j(\omega), j \ge 1\}$ is defined to be a sequence of $r$ consecutive "ones" preceded and followed by "zeros". Let $N_n$ be the number of $r$-runs in the first $n$ terms of the sample sequence. Prove a central limit theorem for $N_n$.

3. Let $\{X_j, \nu_j, j \ge 1\}$ be independent r.v.'s such that the $\nu_j$'s are integer-valued, $\nu_j \to \infty$ a.e., and the central limit theorem applies to $(S_n - a_n)/b_n$, where $S_n = \sum_{j=1}^n X_j$, $a_n$, $b_n$ are real constants, $b_n \to \infty$. Then it also applies to $(S_{\nu_n} - a_{\nu_n})/b_{\nu_n}$.

*4. Give an example of a sequence of independent and identically distributed r.v.'s $\{X_n\}$ with mean 0 and variance 1 and a sequence of positive integer-valued r.v.'s $\nu_n$ tending to $\infty$ a.e. such that $S_{\nu_n}/\sqrt{\nu_n}$ does not converge in distribution. [HINT: The easiest way is to use Theorem 8.3.3 below.]

*5. There are $a$ ballots marked A and $b$ ballots marked B. Suppose that these $a + b$ ballots are counted in random order. What is the probability that the number of ballots for A always leads in the counting?

6. If $\{X_n\}$ are independent, identically distributed symmetric r.v.'s, then for every $x > 0$,

$$P\{|S_n| > x\} \ge \frac12\, P\Big\{ \max_{1\le k\le n} |X_k| > x \Big\}.$$

7. … there exists a constant $c > 0$ such that $P\{|X| > n^{1/2}\} > c/n$. [This is due to Feller; use Exercise 3 of Sec. 6.5.]

8. Under the same hypothesis as in Theorem 7.3.3, prove that $\max_{1\le m\le n}$ …

234 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

… This can be done by starting the sample path at $z$ and reflecting at both barriers 0 and $x$ (Kelvin's method of images). Next, show that

$$\lim_{n\to\infty} P\{-z < S_m \le x - z,\ 1 \le m \le n\} = \sum_{k=-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \left[ \int_{2kx-z}^{(2k+1)x-z} e^{-y^2/2}\,dy - \int_{(2k-1)x-z}^{2kx-z} e^{-y^2/2}\,dy \right].$$

Finally, use the Fourier series for the function $h$ of period $2x$:

$$h(y) = \begin{cases} -1, & \text{if } -x - z < y < -z;\\ +1, & \text{if } -z < y < x - z; \end{cases}$$

to convert the above limit to

$$\frac{4}{\pi} \sum_{k=0}^{\infty} \frac{1}{2k+1} \sin\frac{(2k+1)\pi z}{x} \exp\left[-\frac{(2k+1)^2\pi^2}{2x^2}\right].$$

This gives the asymptotic joint distribution of

$$\min_{1\le m\le n} S_m \quad\text{and}\quad \max_{1\le m\le n} S_m,$$

7.4 ERROR ESTIMATION | 235

as $n \to \infty$, then … Finally,

$$\frac{E\{N_n^r\}}{n^{r/2}} \to \frac{2^{r/2}\,\Gamma\big(\frac{r+1}{2}\big)}{\Gamma(\frac12)} = \int_0^\infty x^r\,dG(x).$$

This result remains valid if the common d.f. $F$ of the $X_j$'s is of the integer lattice type with mean 0 and variance 1. If $F$ is not of the lattice type, no $S_n$ need ever be zero; but the "next nearest thing", to wit the number of changes of sign of $S_n$, is asymptotically distributed as $G$, at least under the additional assumption of a finite third absolute moment.]

7.4 Error estimation

Questions of convergence lead inevitably to the question of the "speed" of convergence, in other words, to an investigation of the difference between the approximating expression and its limit. Specifically, if a sequence of d.f.'s $F_n$ converges to the unit normal d.f. $\Phi$, as in the central limit theorem, what can one say about the "remainder term" $F_n(x) - \Phi(x)$? An adequate estimate of this term is necessary in many mathematical applications, as well as for numerical computation. Under Liapounov's condition there is a neat "order bound" due to Berry and Esseen, who improved upon Liapounov's older result, as follows.

**Theorem 7.4.1.** Under the hypotheses of Theorem 7.1.2, there is a universal constant $A_0$ such that

(1) $\sup_x |F_n(x) - \Phi(x)| \le A_0 \Gamma_n,$

where $F_n$ is the d.f. of $S_n$.
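The order $n^{-1/2}$ implicit in (1) can be illustrated exactly in the symmetric Bernoullian case, where $\sigma = 1$ and the third absolute moment is 1, so the bound becomes $A_0/\sqrt n$. The sketch below (an addition, not in the original text; it assumes Python) computes $\sup_x |F_n(x) - \Phi(x)|$ from the exact binomial distribution of $S_n$ and shows that $\sqrt n$ times the distance is roughly constant:

```python
import math

def Phi(x):
    """Unit normal d.f."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def kolmogorov_distance(n):
    """sup_x |F_n(x) - Phi(x)|, F_n the d.f. of S_n/sqrt(n), X_j = +/-1.

    The sup over x is attained at the jump points of the step function F_n,
    so it suffices to check the value just below and at each jump.
    """
    dist, cdf = 0.0, 0.0
    for k in range(n, -1, -1):            # support points in increasing order
        v = (n - 2 * k) / math.sqrt(n)
        p = math.comb(n, k) / 2.0 ** n
        dist = max(dist, abs(cdf - Phi(v)))    # just below the jump
        cdf += p
        dist = max(dist, abs(cdf - Phi(v)))    # at the jump
    return dist

scaled = {n: kolmogorov_distance(n) * math.sqrt(n) for n in (25, 100, 400)}
print(scaled)    # roughly constant: the distance decays like n^{-1/2}
```

Exercise 3 below shows that for integer-valued summands this $n^{-1/2}$ rate cannot be improved.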

236 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

In the case of a single sequence of independent and identically distributed r.v.'s $\{X_j, j \ge 1\}$ with mean 0, variance $\sigma^2$, and third absolute moment $\gamma < \infty$, the right side of (1) reduces to

$$A_0\,\frac{n\gamma}{(n\sigma^2)^{3/2}} = \frac{A_0\,\gamma}{\sigma^3}\,\frac{1}{n^{1/2}}.$$

H. Cramér and P. L. Hsu have shown that under somewhat stronger conditions, one may even obtain an asymptotic expansion of the form:

$$F_n(x) = \Phi(x) + \frac{H_1(x)}{n^{1/2}} + \frac{H_2(x)}{n} + \frac{H_3(x)}{n^{3/2}} + \cdots,$$

where the $H$'s are explicit functions involving the Hermite polynomials. We shall not go into this, as the basic method of obtaining such an expansion is similar to the proof of the preceding theorem, although considerable technical complications arise. For this and other variants of the problem see Cramér [10], Gnedenko and Kolmogorov [12], and Hsu's paper cited at the end of this chapter.

We shall give the proof of Theorem 7.4.1 in a series of lemmas, in which the machinery of operating with ch.f.'s is further exploited and found to be efficient.

**Lemma 1.** Let $F$ be a d.f., $G$ a real-valued function satisfying the conditions below:

(i) $\lim_{x\to-\infty} G(x) = 0$, $\lim_{x\to+\infty} G(x) = 1$;
(ii) $G$ has a derivative that is bounded everywhere: $\sup_x |G'(x)| \le M$.

Set

(2) $\Delta = \dfrac{1}{2M} \sup_x |F(x) - G(x)|.$

Then there exists a real number $a$ such that we have for every $T > 0$:

(3) $2M\Delta T \left\{ 3 \int_0^{T\Delta} \frac{1 - \cos x}{x^2}\,dx - \pi \right\} \le \left| \int_{-\infty}^{\infty} \{F(x + a) - G(x + a)\}\,\frac{1 - \cos Tx}{x^2}\,dx \right|.$

PROOF. Clearly the $\Delta$ in (2) is finite, since $G$ is everywhere bounded by (i) and (ii). We may suppose that the left member of (3) is strictly positive, for otherwise there is nothing to prove; hence $\Delta > 0$. Since $F - G$ vanishes at $\pm\infty$ by (i), there exists a sequence of numbers $\{x_n\}$ converging to a finite limit

7.4 ERROR ESTIMATION | 237

$b$ such that $F(x_n) - G(x_n)$ converges to $2M\Delta$ or $-2M\Delta$. Hence either $F(b) - G(b) = 2M\Delta$ or $F(b-) - G(b) = -2M\Delta$. The two cases being similar, we shall treat the second one. Put $a = b - \Delta$; then if $|x| < \Delta$, we have by (ii) and the mean value theorem of differential calculus:

$$G(x + a) \ge G(b) + (x - \Delta)M$$

and consequently

$$F(x + a) - G(x + a) \le F(b-) - [G(b) + (x - \Delta)M] = -M(x + \Delta).$$

It follows that

$$\int_{-\Delta}^{\Delta} \{F(x+a) - G(x+a)\}\,\frac{1 - \cos Tx}{x^2}\,dx \le -M \int_{-\Delta}^{\Delta} (x + \Delta)\,\frac{1 - \cos Tx}{x^2}\,dx = -2M\Delta \int_0^{\Delta} \frac{1 - \cos Tx}{x^2}\,dx;$$

$$\left| \left(\int_{-\infty}^{-\Delta} + \int_{\Delta}^{\infty}\right) \{F(x+a) - G(x+a)\}\,\frac{1 - \cos Tx}{x^2}\,dx \right| \le 2M\Delta \cdot 2 \int_{\Delta}^{\infty} \frac{1 - \cos Tx}{x^2}\,dx = 4M\Delta \int_{\Delta}^{\infty} \frac{1 - \cos Tx}{x^2}\,dx.$$

Adding these inequalities, we obtain

$$\int_{-\infty}^{\infty} \{F(x+a) - G(x+a)\}\,\frac{1 - \cos Tx}{x^2}\,dx \le 2M\Delta \left[ -3 \int_0^{\Delta} \frac{1 - \cos Tx}{x^2}\,dx + 2 \int_0^{\infty} \frac{1 - \cos Tx}{x^2}\,dx \right].$$

This reduces to (3), since

$$\int_0^{\infty} \frac{1 - \cos Tx}{x^2}\,dx = \frac{\pi T}{2}$$

by (3) of Sec. 6.2, provided that $T$ is so large that the left member of (3) is positive; otherwise (3) is trivial.

**Lemma 2.** In addition to the assumptions of Lemma 1, we assume that

(iii) $G$ is of bounded variation in $(-\infty, \infty)$;
(iv) $\int_{-\infty}^{\infty} |F(x) - G(x)|\,dx < \infty$.

Let

$$f(t) = \int_{-\infty}^{\infty} e^{itx}\,dF(x), \qquad g(t) = \int_{-\infty}^{\infty} e^{itx}\,dG(x).$$

238 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Then we have

(4) $\Delta \le \dfrac{1}{\pi M} \displaystyle\int_0^T \left| \frac{f(t) - g(t)}{t} \right| dt + \frac{12}{\pi T}.$

PROOF. That the integral on the right side of (4) is finite will soon be apparent, and as a Lebesgue integral the value of the integrand at $t = 0$ may be overlooked. We have, by a partial integration, which is permissible on account of condition (iii):

(5) $f(t) - g(t) = -it \displaystyle\int_{-\infty}^{\infty} \{F(x) - G(x)\} e^{itx}\,dx;$

and consequently

$$\frac{f(t) - g(t)}{-it}\,e^{-ita} = \int_{-\infty}^{\infty} \{F(x + a) - G(x + a)\} e^{itx}\,dx.$$

In particular, the left member above is bounded for all $t \ne 0$ by condition (iv). Multiplying by $T - |t|$ and integrating, we obtain

(6) $\displaystyle\int_{-T}^{T} \frac{f(t) - g(t)}{-it}\,e^{-ita}\,(T - |t|)\,dt = \int_{-T}^{T} \int_{-\infty}^{\infty} \{F(x + a) - G(x + a)\} e^{itx}\,(T - |t|)\,dx\,dt.$

We may invert the repeated integral by Fubini's theorem and condition (iv), and obtain (cf. Exercise 2 of Sec. 6.2):

$$\left| \int_{-\infty}^{\infty} \{F(x + a) - G(x + a)\}\,\frac{1 - \cos Tx}{x^2}\,dx \right| \le T \int_0^T \left| \frac{f(t) - g(t)}{t} \right| dt.$$

In conjunction with (3), this yields

(7) $2M\Delta \left\{ 3 \displaystyle\int_0^{T\Delta} \frac{1 - \cos x}{x^2}\,dx - \pi \right\} \le \int_0^T \left| \frac{f(t) - g(t)}{t} \right| dt.$

The quantity in braces in (7) is not less than

$$3 \int_0^{\infty} \frac{1 - \cos x}{x^2}\,dx - 3 \int_{T\Delta}^{\infty} \frac{2}{x^2}\,dx - \pi = \frac{\pi}{2} - \frac{6}{T\Delta}.$$

Using this in (7), we obtain (4).

Lemma 2 bounds the maximum difference between two d.f.'s satisfying certain regularity conditions by means of a certain average difference between their ch.f.'s. (It is stated for a function of bounded variation, since this is needed

7.4 ERROR ESTIMATION | 239

for the asymptotic expansion mentioned above.) This should be compared with the general discussion around Theorem 6.3.4. We shall now apply the lemma to the specific case of $F_n$ and $\Phi$ in Theorem 7.4.1. Let the ch.f. of $F_n$ be $f_n$, so that

$$f_n(t) = \prod_{j=1}^{k_n} f_{nj}(t).$$

**Lemma 3.** For $|t| < 1/(2\Gamma_n^{1/3})$, we have

(8) $|f_n(t) - e^{-t^2/2}| \le \Gamma_n\,|t|^3\,e^{-t^2/2}.$

PROOF. We shall denote by $\theta$ below a "generic" complex number with $|\theta| \le 1$, otherwise unspecified and not necessarily the same at each appearance. By Taylor's expansion:

$$f_{nj}(t) = 1 - \frac{\sigma_{nj}^2 t^2}{2} + \frac{\theta\,\gamma_{nj}\,t^3}{6}.$$

For the range of $t$ given in the lemma, we have by Liapounov's inequality:

(9) $|\sigma_{nj} t| \le \gamma_{nj}^{1/3}|t| \le \Gamma_n^{1/3}|t| < \frac12.$

240 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

or explicitly:

$$\left| \log f_n(t) + \frac{t^2}{2} \right| \le \frac{\Gamma_n |t|^3}{2}.$$

Since $|e^u - 1| \le |u|\,e^{|u|}$ for all $u$, it follows that

$$|f_n(t)\,e^{t^2/2} - 1| \le \frac{\Gamma_n |t|^3}{2} \exp\!\left(\frac{\Gamma_n |t|^3}{2}\right).$$

Since $\Gamma_n|t|^3/2 < 1/16$ and $e^{1/16} < 2$, this implies (8).

**Lemma 4.** For $|t| \le 1/(4\Gamma_n)$, we have

(10) $|f_n(t)| \le e^{-t^2/3}.$

PROOF. We symmetrize (see end of Sec. 6.2) to facilitate the estimation. We have

$$|f_{nj}(t)|^2 = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \cos t(x - y)\,dF_{nj}(x)\,dF_{nj}(y),$$

since $|f_{nj}|^2$ is real. Using the elementary inequalities

$$\cos u \le 1 - \frac{u^2}{2} + \frac{|u|^3}{6}, \qquad |x - y|^3 \le 4(|x|^3 + |y|^3);$$

we see that the double integral above does not exceed

$$\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \left\{ 1 - \frac{t^2}{2}(x^2 - 2xy + y^2) + \frac{2|t|^3}{3}(|x|^3 + |y|^3) \right\} dF_{nj}(x)\,dF_{nj}(y) = 1 - \sigma_{nj}^2 t^2 + \frac{4}{3}\gamma_{nj}|t|^3 \le \exp\!\left( -\sigma_{nj}^2 t^2 + \frac{4}{3}\gamma_{nj}|t|^3 \right).$$

Multiplying over $j$, we obtain

$$|f_n(t)|^2 \le \exp\!\left( -t^2 + \frac{4}{3}\Gamma_n|t|^3 \right),$$

and for $|t| \le 1/(4\Gamma_n)$ we have $\frac43\Gamma_n|t|^3 \le \frac{t^2}{3}$, so that (10) follows.

7.4 ERROR ESTIMATION | 241

PROOF. If $|t| < 1/(2\Gamma_n^{1/3})$, this is implied by (8). If $1/(2\Gamma_n^{1/3}) \le |t|$ …

242 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

an asymptotic evaluation of

$$\frac{1 - F_n(x_n)}{1 - \Phi(x_n)}$$

as $x_n \to \infty$ more rapidly than indicated above. This requires a different type of approximation by means of "bilateral Laplace transforms", which will not be discussed here.

EXERCISES

1. If $F$ and $G$ are d.f.'s with finite first moments, then

$$\int_{-\infty}^{\infty} |F(x) - G(x)|\,dx < \infty.$$

[HINT: Use Exercise 18 of Sec. 3.2.]

2. If $f$ and $g$ are ch.f.'s such that $f(t) = g(t)$ for $|t| \le T$, then

$$\int_{-\infty}^{\infty} |F(x) - G(x)|\,dx \le \frac{…}{T}.$$

This is due to Esseen (Acta Math., 77 (1944)).

*3. There exists a universal constant $A_1 > 0$ such that for any sequence of independent, identically distributed integer-valued r.v.'s $\{X_j\}$ with mean 0 and variance 1, we have

$$\sup_x |F_n(x) - \Phi(x)| \ge \frac{A_1}{n^{1/2}},$$

where $F_n$ is the d.f. of $\big(\sum_{j=1}^n X_j\big)/\sqrt n$. [HINT: Use Exercise 24 of Sec. 6.4.]

4. Prove that for every $x > 0$:

$$\frac{x}{1 + x^2}\,e^{-x^2/2} \le \int_x^{\infty} e^{-y^2/2}\,dy \le \frac{1}{x}\,e^{-x^2/2}.$$

7.5 Law of the iterated logarithm

The law of the iterated logarithm is a crowning achievement in classical probability theory. It had its origin in attempts to perfect Borel's theorem on normal numbers (Theorem 5.1.3). In its simplest but basic form, this asserts: if $N_n(\omega)$ denotes the number of occurrences of the digit 1 in the first $n$ places of the binary (dyadic) expansion of the real number $\omega$ in $[0, 1]$, then $N_n(\omega) \sim n/2$ for almost every $\omega$ in Borel measure. What can one say about the deviation $N_n(\omega) - n/2$? The order bounds $O(n^{(1/2)+\varepsilon})$, $\varepsilon > 0$; $O((n \log n)^{1/2})$ (cf.

7.5 LAW OF THE ITERATED LOGARITHM | 243

Theorem 5.4.1); and $O((n \log \log n)^{1/2})$ were obtained successively by Hausdorff (1913), Hardy and Littlewood (1914), and Khintchine (1922); but in 1924 Khintchine gave the definitive answer:

$$\varlimsup_{n\to\infty} \frac{N_n(\omega) - \dfrac{n}{2}}{\sqrt{\tfrac12\, n \log\log n}} = 1$$

for almost every $\omega$. This sharp result with such a fine order of infinity as "log log" earned its celebrated name. No less celebrated is the following extension given by Kolmogorov (1929). Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.'s, $S_n = \sum_{j=1}^n X_j$; suppose that $E(X_n) = 0$ for each $n$ and

(1) $\sup_\omega |X_n(\omega)| = o\!\left( \dfrac{s_n}{\sqrt{\log\log s_n}} \right),$

where $s_n^2 = \sigma^2(S_n)$; then we have for almost every $\omega$:

(2) $\varlimsup_{n\to\infty} \dfrac{S_n(\omega)}{\sqrt{2 s_n^2 \log\log s_n}} = 1.$

The condition (1) was shown by Marcinkiewicz and Zygmund to be of the best possible kind, but an interesting complement was added by Hartman and Wintner that (2) also holds if the $X_n$'s are identically distributed with a finite second moment. Finally, further sharpening of (2) was given by Kolmogorov and by Erdős in the Bernoullian case, and in the general case under exact conditions by Feller; the "last word" being as follows: for any increasing sequence $\varphi_n$, we have

$$P\{S_n(\omega) > s_n \varphi_n \text{ i.o.}\} = \begin{cases} 0\\ 1 \end{cases}$$

according as the series $\sum_n \dfrac{\varphi_n}{n}\,e^{-\varphi_n^2/2}$ converges or diverges.

244 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

The proof of such a "strong limit theorem" (bounds with probability one) as the law of the iterated logarithm depends essentially on the corresponding "weak limit theorem" (convergence of distributions) with a sufficiently good estimate of the remainder term.

The vital link just mentioned will be given below as Lemma 1. In the rest of this section "$A$" will denote a generic strictly positive constant, not necessarily the same at each appearance, and $\bar A$ will denote a constant such that $|\bar A| \le A$. We shall also use the notation in the preceding statement of Kolmogorov's theorem, and

$$\gamma_n = E(|X_n|^3), \qquad \Gamma_n = \frac{\sum_{j=1}^{n} \gamma_j}{s_n^3},$$

as in Sec. 7.4, but for a single sequence of r.v.'s. Let us set also

$$\varphi(\lambda, x) = \sqrt{2\lambda x \log\log x}, \qquad \lambda > 0,\ x > 0.$$

**Lemma 1.** Suppose that for some $\varepsilon$, $0 < \varepsilon < 1$, we have

(3) $\Gamma_n \le \dfrac{A}{(\log s_n)^{1+\varepsilon}};$

then for each $\delta$, $0 < \delta < \varepsilon$:

(4) $P\{S_n > \varphi(1 + \delta, s_n^2)\} \le \dfrac{A}{(\log s_n)^{1+\delta}};$

(5) $P\{S_n > \varphi(1 - \delta, s_n^2)\} \ge \dfrac{A}{(\log s_n)^{1-(\delta/2)}}.$

PROOF. By Theorem 7.4.1, we have for each $x$:

(6) $P\{S_n > x s_n\} = \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_x^{\infty} e^{-y^2/2}\,dy + \bar A\,\Gamma_n.$

We have as $x \to \infty$:

(7) $\dfrac{1}{\sqrt{2\pi}} \displaystyle\int_x^{\infty} e^{-y^2/2}\,dy \sim \dfrac{1}{\sqrt{2\pi}\,x}\,e^{-x^2/2}$

(see Exercise 4 of Sec. 7.4). Substituting $x = \sqrt{2(1 \pm \delta)\log\log s_n}$, the first term on the right side of (6) is, by (7), asymptotically equal to

$$\frac{1}{\sqrt{4\pi(1 \pm \delta)\log\log s_n}\,(\log s_n)^{1\pm\delta}}.$$
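The tail estimate just used rests on the elementary bounds of Exercise 4 of Sec. 7.4, $\frac{x}{1+x^2}e^{-x^2/2} \le \int_x^\infty e^{-y^2/2}\,dy \le \frac1x e^{-x^2/2}$. A quick numerical check (an addition, not in the original text; it assumes Python), using $\int_x^\infty e^{-y^2/2}\,dy = \sqrt{\pi/2}\,\operatorname{erfc}(x/\sqrt2)$:

```python
import math

def upper_tail(x):
    """Integral of exp(-y^2/2) over [x, infinity)."""
    return math.sqrt(math.pi / 2.0) * math.erfc(x / math.sqrt(2.0))

def bounds(x):
    """Lower and upper bounds of Exercise 4 of Sec. 7.4."""
    e = math.exp(-x * x / 2.0)
    return (x / (1.0 + x * x)) * e, (1.0 / x) * e

ok = True
for x in (0.5, 1.0, 2.0, 4.0, 8.0):
    lo, hi = bounds(x)
    t = upper_tail(x)
    ok = ok and lo <= t <= hi
print(ok)   # -> True
```

For large $x$ the two bounds pinch together, which is exactly the asymptotic relation (7).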

7.5 LAW OF THE ITERATED LOGARITHM | 245

This dominates the second (remainder) term on the right side of (6), by (3), since $0 < \delta < \varepsilon$. Hence (4) and (5) follow as rather weak consequences.

To establish (2), let us write for each fixed $\delta$, $0 < \delta < \varepsilon$:

$$E_n^+ = \{\omega : S_n(\omega) > \varphi(1 + \delta, s_n^2)\},$$
$$E_n^- = \{\omega : S_n(\omega) > \varphi(1 - \delta, s_n^2)\},$$

and proceed by steps.

1°. We prove first that

(8) $P\{E_n^+ \text{ i.o.}\} = 0$

in the notation introduced in Sec. 4.2, by using the convergence part of the Borel–Cantelli lemma there. But it is evident from (4) that the series $\sum_n P(E_n^+)$ is far from convergent, since $s_n$ is expected to be of the order of $\sqrt n$. The main trick is to apply the lemma to a crucial subsequence $\{n_k\}$ (see Theorem 5.1.2 for a crude form of the trick) chosen to have two properties: first, $\sum_k P(E_{n_k}^+)$ converges, and second, "$E_n^+$ i.o." already implies "$E_{n_k}^+$ i.o." nearly, namely if the given $\delta$ is slightly decreased. This modified implication is a consequence of a simple but essential probabilistic argument spelled out in Lemma 2 below.

Given $c > 1$, let $n_k$ be the largest value of $n$ satisfying $s_n \le c^k$, so that

(9) $s_{n_k} \le c^k < s_{n_k+1}.$

For each $k$, consider the range of indices

(10) $n_k \le j < n_{k+1},$

and put

(11) $F_j = \{\omega : |S_{n_{k+1}}(\omega) - S_j(\omega)| < s_{n_{k+1}}\}.$

By Chebyshev's inequality, we have

$$P(F_j) \ge 1 - \frac{s_{n_{k+1}}^2 - s_j^2}{s_{n_{k+1}}^2} \ge \frac{s_{n_k}^2}{s_{n_{k+1}}^2} \to \frac{1}{c^2};$$

hence $P(F_j) \ge A > 0$ for all sufficiently large $k$.

**Lemma 2.** Let $\{E_j\}$ and $\{F_j\}$, $1 \le j \le n < \infty$, be two sequences of events. Suppose that for each $j$, the event $F_j$ is independent of $E_1^c \cdots E_{j-1}^c E_j$, and

246 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

that there exists a constant $A > 0$ such that $P(F_j) \ge A$ for every $j$. Then we have

(12) $P\Big\{ \bigcup_{j=1}^{n} (E_j \cap F_j) \Big\} \ge A\, P\Big\{ \bigcup_{j=1}^{n} E_j \Big\}.$

PROOF. The left member in (12) is equal to

$$P\Big\{ \bigcup_{j=1}^{n} \big[(E_1F_1)^c \cdots (E_{j-1}F_{j-1})^c (E_jF_j)\big] \Big\} \ge \sum_{j=1}^{n} P(E_1^c \cdots E_{j-1}^c E_j F_j) = \sum_{j=1}^{n} P(E_1^c \cdots E_{j-1}^c E_j)\,P(F_j) \ge A \sum_{j=1}^{n} P(E_1^c \cdots E_{j-1}^c E_j),$$

which is equal to the right member in (12).

Applying Lemma 2 to the $E_j^+$ and the $F_j$ in (11), we obtain

(13) $P\Big\{ \bigcup_{j=n_k}^{n_{k+1}-1} E_j^+ F_j \Big\} \ge A\, P\Big\{ \bigcup_{j=n_k}^{n_{k+1}-1} E_j^+ \Big\}.$

It is clear that the event $E_j^+ \cap F_j$ implies

$$S_{n_{k+1}} \ge S_j - |S_{n_{k+1}} - S_j| > \varphi(1 + \delta, s_j^2) - s_{n_{k+1}},$$

which is, by (9) and (10), asymptotically greater than

$$\frac{1}{c}\,\varphi\Big(1 + \frac{3}{4}\delta,\ s_{n_{k+1}}^2\Big).$$

Choose $c$ so close to 1 that $(1 + \frac34\delta)/c^2 > 1 + (\delta/2)$ and put

$$G_k = \Big\{\omega : S_{n_{k+1}} > \varphi\Big(1 + \frac{\delta}{2},\ s_{n_{k+1}}^2\Big)\Big\};$$

note that $G_k$ is just $E_{n_{k+1}}^+$ with $\delta$ replaced by $\delta/2$. The above implication may be written as

$$E_j^+ \cap F_j \subset G_k$$

for sufficiently large $k$ and all $j$ in the range given in (10); hence we have

(14) $\bigcup_{j=n_k}^{n_{k+1}-1} E_j^+ F_j \subset G_k.$

7.5 LAW OF THE ITERATED LOGARITHM | 247

It follows from (4) that

$$\sum_k P(G_k) \le A \sum_k \frac{1}{(\log s_{n_k})^{1+(\delta/2)}} \le A \sum_k \frac{1}{(k \log c)^{1+(\delta/2)}} < \infty.$$

In conjunction with (13) and (14) we conclude that

$$\sum_k P\Big\{ \bigcup_{j=n_k}^{n_{k+1}-1} E_j^+ \Big\} < \infty;$$

and consequently by the Borel–Cantelli lemma that

$$P\Big\{ \bigcup_{j=n_k}^{n_{k+1}-1} E_j^+ \ \text{i.o.} \Big\} = 0.$$

This is equivalent (why?) to the desired result (8).

2°. Next we prove that with the same subsequence $\{n_k\}$ but an arbitrary $c$, if we put $t_k^2 = s_{n_{k+1}}^2 - s_{n_k}^2$, and

$$D_k = \Big\{\omega : S_{n_{k+1}}(\omega) - S_{n_k}(\omega) > \varphi\Big(1 - \frac{\delta}{2},\ t_k^2\Big)\Big\},$$

then we have

(15) $P(D_k \ \text{i.o.}) = 1.$

Since the differences $S_{n_{k+1}} - S_{n_k}$, $k \ge 1$, are independent r.v.'s, the divergence part of the Borel–Cantelli lemma is applicable to them. To estimate $P(D_k)$, we may apply Lemma 1 to the sequence $\{X_{n_k+j}, j \ge 1\}$. We have

$$t_k^2 = s_{n_{k+1}}^2 - s_{n_k}^2 \sim \Big(1 - \frac{1}{c^2}\Big) s_{n_{k+1}}^2 \sim \Big(1 - \frac{1}{c^2}\Big) c^{2(k+1)},$$

and consequently by (3):

$$\frac{1}{t_k^3} \sum_{j=n_k+1}^{n_{k+1}} \gamma_j \le A\,\Gamma_{n_{k+1}} \frac{s_{n_{k+1}}^3}{t_k^3} \le A\,\Gamma_{n_{k+1}}.$$

Hence by (5),

$$P(D_k) \ge \frac{A}{(\log t_k)^{1-(\delta/4)}} \ge \frac{A}{k^{1-(\delta/4)}},$$

and so $\sum_k P(D_k) = \infty$ and (15) follows.

3°. We shall use (8) and (15) together to prove

(16) $P(E_n^- \ \text{i.o.}) = 1.$

248 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

This requires just a rough estimate entailing a suitable choice of $c$. By (8) applied to $\{-X_n\}$ and (15), for almost every $\omega$ the following two assertions are true:

(i) $S_{n_{k+1}}(\omega) - S_{n_k}(\omega) > \varphi(1 - (\delta/2),\ t_k^2)$ for infinitely many $k$;
(ii) $S_{n_k}(\omega) > -\varphi(2, s_{n_k}^2)$ for all sufficiently large $k$.

For such an $\omega$, we have then

(17) $S_{n_{k+1}}(\omega) > \varphi\Big(1 - \dfrac{\delta}{2},\ t_k^2\Big) - \varphi(2, s_{n_k}^2)$ for infinitely many $k$.

Using (9) and $\log\log t_k^2 \sim \log\log s_{n_{k+1}}^2$, we see that the expression in the right side of (17) is asymptotically greater than

$$\left[ \sqrt{\Big(1 - \frac{\delta}{2}\Big)\Big(1 - \frac{1}{c^2}\Big)} - \frac{\sqrt 2}{c} \right] \varphi(1, s_{n_{k+1}}^2) \ge \varphi(1 - \delta,\ s_{n_{k+1}}^2),$$

provided that $c$ is chosen sufficiently large. Doing this, we have therefore proved that

(18) $P(E_{n_{k+1}}^- \ \text{i.o.}) = 1,$

which certainly implies (16).

4°. The truth of (8) and (16), for each fixed $\delta$, $0 < \delta < \varepsilon$, means exactly the truth of (2), for the sequence $\{S_n, n \ge 1\}$. Recall that (3) is more than sufficient to ensure the validity of the central limit theorem, namely that $S_n/s_n$ converges in dist. to $\Phi$. Thus the law of the iterated logarithm complements the central limit theorem by circumscribing the extraordinary fluctuations of the sequence $\{S_n, n \ge 1\}$. An immediate consequence is that for almost every $\omega$, the sample sequence $S_n(\omega)$ changes sign infinitely often. For much more precise results in this direction see Chapter 8.

In view of the discussion preceding Theorem 7.3.3, one may wonder about the almost everywhere bounds for

$$\max_{1\le m\le n} S_m, \qquad \max_{1\le m\le n} |S_m|, \quad\text{and so on.}$$
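The fluctuation bound (2) can be visualized by simulation. The sketch below (an addition, not in the original text; it assumes Python with NumPy) runs 50 Bernoullian walks and records, for each, the maximum of $|S_n|/\sqrt{2n\log\log n}$ over a long range of $n$; by the law of the iterated logarithm this statistic should hover near 1 rather than drift to 0 or to $\infty$:

```python
import numpy as np

rng = np.random.default_rng(7)

walks, N = 50, 50_000
X = rng.choice([-1.0, 1.0], size=(walks, N))   # fair +/-1 steps
S = np.cumsum(X, axis=1)

n = np.arange(100, N + 1, dtype=float)          # start where log log n > 0
norm = np.sqrt(2.0 * n * np.log(np.log(n)))
ratio = np.abs(S[:, 99:]) / norm                # |S_n| / sqrt(2 n log log n)

per_walk_max = ratio.max(axis=1)
med = float(np.median(per_walk_max))
print(med)
```

This is only an illustration: the $\limsup$ in (2) is attained along rarer and rarer $n$, so any finite simulation understates it slightly.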

7.5 LAW OF THE ITERATED LOGARITHM | 249

It is interesting to observe that as far as the $\limsup_n$ is concerned, these two functionals behave exactly like $S_n$ itself (Exercise 2 below). However, the question of $\liminf_n$ is quite different. In the case of $\max_{1\le m\le n} |S_m|$, another law of the (inverted) iterated logarithm holds as follows. For almost every $\omega$, we have

$$\varliminf_{n\to\infty}\ \sqrt{\frac{\log\log s_n^2}{s_n^2}}\ \max_{1\le m\le n} |S_m(\omega)| = \frac{\pi}{\sqrt 8}.$$

EXERCISES

3. … $S_n = \sum_{j=1}^{n} X_j$. Then

$$P\Big\{ \varliminf_{n\to\infty} \frac{|S_n(\omega)|}{…} = 0 \Big\} = 1.$$

[HINT: Consider $S_{n_{k+1}} - S_{n_k}$ with $n_k \sim k^k$. A quick proof follows from Theorem 8.3.3 below.]

4. Prove that in Exercise 9 of Sec. 7.3 we have $P\{S_n = 0 \ \text{i.o.}\} = 1$.

*5. The law of the iterated logarithm may be used to supply certain counterexamples. For instance, if the $X_n$'s are independent and $X_n = \pm n^{1/2}/\log\log n$ with probability $\frac12$ each, then $S_n/n \to 0$ a.e., but Kolmogorov's sufficient condition (see case (i) after Theorem 5.4.1) $\sum_n E(X_n^2)/n^2 < \infty$ fails.

6. Prove that $P\{|S_n| > \varphi(1 - \delta, s_n^2) \ \text{i.o.}\} = 1$, without use of (8), as follows. Let

$$e_k = \{\omega : |S_{n_k}(\omega)| < \varphi(1 - \delta,\ s_{n_k}^2)\};$$
$$f_k = \Big\{\omega : S_{n_{k+1}}(\omega) - S_{n_k}(\omega) > \varphi\Big(1 - \frac{\delta}{2},\ s_{n_{k+1}}^2\Big)\Big\}.$$

250 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Show that for sufficiently large $k$ the event $e_k \cap f_k$ implies the complement of $e_{k+1}$; hence deduce

$$P\Big\{ \bigcap_{j=j_0}^{k+1} e_j \Big\} \le P(e_{j_0}) \prod_{j=j_0}^{k} [1 - P(f_j)]$$

and show that the product $\to 0$ as $k \to \infty$.

7.6 Infinite divisibility

The weak law of large numbers and the central limit theorem are concerned, respectively, with the convergence in dist. of sums of independent r.v.'s to a degenerate and a normal d.f. It may seem strange that we should be so much occupied with these two apparently unrelated distributions. Let us point out, however, that in terms of ch.f.'s these two may be denoted, respectively, by $e^{ait}$ and $e^{ait - b^2t^2}$: exponentials of polynomials of the first and second degree in $(it)$. This explains the considerable similarity between the two cases, as evidenced particularly in Theorems 6.4.3 and 6.4.4.

Now the question arises: what other limiting d.f.'s are there when small independent r.v.'s are added? Specifically, consider the double array (2) in Sec. 7.1, in which independence in each row and holospoudicity are assumed. Suppose that for some sequence of constants $a_n$,

$$S_n - a_n = \sum_{j=1}^{k_n} X_{nj} - a_n$$

converges in dist. to $F$. What is the class of such $F$'s, and when does such a convergence take place? For a single sequence of independent r.v.'s $\{X_j, j \ge 1\}$, similar questions may be posed for the "normed sums" $(S_n - a_n)/b_n$.

These questions have been answered completely by the work of Lévy, Khintchine, Kolmogorov, and others; for a comprehensive treatment we refer to the book by Gnedenko and Kolmogorov [12]. Here we must content ourselves with a modest introduction to this important and beautiful subject.

We begin by recalling other cases of the above-mentioned limiting distributions, conveniently displayed by their ch.f.'s:

$$e^{\lambda(e^{it}-1)},\ \lambda > 0; \qquad e^{-c|t|^{\alpha}},\ 0 < \alpha \le 2,\ c > 0;$$

the Poisson and the symmetric stable ch.f.'s, the normal being included

7.6 INFINITE DIVISIBILITY | 251

among the latter for $\alpha = 2$. All these are exponentials and have the further property that their "$n$th roots":

$$e^{ait/n}, \qquad e^{(1/n)(ait - b^2t^2)}, \qquad e^{(\lambda/n)(e^{it}-1)}, \qquad e^{-(c/n)|t|^{\alpha}}$$

are also ch.f.'s. It is remarkable that this simple property already characterizes the class of distributions we are looking for (although we shall prove only part of this fact here).

DEFINITION OF INFINITE DIVISIBILITY. A ch.f. $f$ is called *infinitely divisible* iff for each integer $n \ge 1$, there exists a ch.f. $f_n$ such that

(1) $f = (f_n)^n.$

In terms of d.f.'s, this becomes in obvious notation:

$$F = F_n^{n*} = \underbrace{F_n * F_n * \cdots * F_n}_{n \text{ factors}}.$$

In terms of r.v.'s this means, for each $n \ge 1$, in a suitable probability space (why "suitable"?): there exist r.v.'s $X$ and $X_{nj}$, $1 \le j \le n$, the latter being independent among themselves, such that $X$ has ch.f. $f$, $X_{nj}$ has ch.f. $f_n$, and

(2) $X = \sum_{j=1}^{n} X_{nj}.$

$X$ is thus "divisible" into $n$ independent and identically distributed parts, for each $n$. That such an $X$, and only such a one, can be the "limit" of sums of small independent terms as described above seems at least plausible.

A vital, though not characteristic, property of an infinitely divisible ch.f. will be given first.

**Theorem 7.6.1.** An infinitely divisible ch.f. never vanishes (for real $t$).

PROOF. We shall see presently that a complex-valued ch.f. is troublesome when its "$n$th root" is to be extracted. So let us avoid this by going to the real, using the Corollary to Theorem 6.1.4. Let $f$ and $f_n$ be as in (1) and write

$$g = |f|^2, \qquad g_n = |f_n|^2.$$

For each $t \in \mathscr{R}^1$, $g(t)$ being real and positive, though conceivably vanishing, its real positive $n$th root is uniquely defined; let us denote it by $[g(t)]^{1/n}$. Since by (1) we have

$$g(t) = [g_n(t)]^n,$$

and $g_n(t) \ge 0$, it follows that

(3) $\forall t:\ g_n(t) = [g(t)]^{1/n}.$
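Identity (1) is transparent for the Poisson family: the $n$th root of the Poisson($\lambda$) ch.f. is the Poisson($\lambda/n$) ch.f. The sketch below (an addition, not in the original text; it assumes Python with NumPy) checks this numerically, and also evaluates the ch.f. $\sin t/t$ of the uniform distribution on $[-1, 1]$ at its zero $t = \pi$, illustrating why, by Theorem 7.6.1, that distribution cannot be infinitely divisible:

```python
import numpy as np

def poisson_cf(lam, t):
    """Characteristic function exp(lam*(e^{it} - 1)) of Poisson(lam)."""
    return np.exp(lam * (np.exp(1j * t) - 1.0))

lam, n = 3.0, 7
t = np.linspace(-10.0, 10.0, 2001)

f = poisson_cf(lam, t)
f_n = poisson_cf(lam / n, t)      # the n-th "root": again a Poisson ch.f.

err = float(np.abs(f - f_n ** n).max())
print(err)                        # zero up to rounding

# A vanishing ch.f. cannot be infinitely divisible (Theorem 7.6.1):
# numpy's sinc(x) = sin(pi x)/(pi x), so sinc(1.0) = sin(t)/t at t = pi.
u = float(np.sinc(1.0))
print(u)
```

The same computation works for the other families listed above, each "root" staying inside the same family.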

252 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

But $0 \le g(t) \le 1$, hence $\lim_{n\to\infty} [g(t)]^{1/n}$ is 0 or 1 according as $g(t) = 0$ or $g(t) \ne 0$. Thus $\lim_{n\to\infty} g_n(t)$ exists for every $t$, and the limit function, say $h(t)$, can take at most the two possible values 0 and 1. Furthermore, since $g$ is continuous at $t = 0$ with $g(0) = 1$, there exists a $t_0 > 0$ such that $g(t) \ne 0$ for $|t| \le t_0$. It follows that $h(t) = 1$ for $|t| \le t_0$. Therefore the sequence of ch.f.'s $g_n$ converges to the function $h$, which has just been shown to be continuous at the origin. By the convergence theorem of Sec. 6.3, $h$ must be a ch.f. and so continuous everywhere. Hence $h$ is identically equal to 1, and so by the remark after (3) we have

$$\forall t:\ |f(t)|^2 = g(t) \ne 0.$$

The theorem is proved.

Theorem 7.6.1 immediately shows that the uniform distribution on $[-1, 1]$ is not infinitely divisible, since its ch.f. is $\sin t/t$, which vanishes for some $t$, although in a literal sense it has infinitely many divisors! (See Exercise 8 of Sec. 6.3.) On the other hand, the ch.f.

$$\frac{2 + \cos t}{3}$$

never vanishes, but for it (1) fails even when $n = 2$; when $n \ge 3$, the failure of (1) for this ch.f. is an immediate consequence of Exercise 7 of Sec. 6.1, if we notice that the corresponding p.m. consists of exactly 3 atoms.

Now that we have proved Theorem 7.6.1, it seems natural to go back to an arbitrary infinitely divisible ch.f. and establish the generalization of (3):

$$f_n(t) = [f(t)]^{1/n}$$

for some "determination" of the multiple-valued $n$th root on the right side. This can be done by a simple process of "continuous continuation" of a complex-valued function of a real variable. Although merely an elementary exercise in "complex variables", it has been treated in the existing literature in a cavalier fashion and then misused or abused. For this reason the main propositions will be spelled out in meticulous detail here.

**Theorem 7.6.2.** Let a complex-valued function $f$ of the real variable $t$ be given. Suppose that $f(0) = 1$ and that for some $T > 0$, $f$ is continuous in $[-T, T]$ and does not vanish in the interval. Then there exists a unique (single-valued) function $\lambda$ of $t$ in $[-T, T]$ with $\lambda(0) = 0$ that is continuous there and satisfies

(4) $f(t) = e^{\lambda(t)}, \qquad -T \le t \le T.$

7.6 INFINITE DIVISIBILITY | 253

The corresponding statement when $[-T, T]$ is replaced by $(-\infty, \infty)$ is also true.

PROOF. Consider the range of $f(t)$, $t \in [-T, T]$; this is a closed set of points in the complex plane. Since it does not contain the origin, we have

$$\inf_{-T \le t \le T} |f(t) - 0| = \rho_T > 0.$$
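Numerically, the continuous continuation of Theorem 7.6.2 amounts to taking the real logarithm of $|f|$ and unwrapping the phase of $f$ along the $t$-axis, starting from $\lambda(0) = 0$. The sketch below (an addition, not in the original text; it assumes Python with NumPy) recovers the continuous logarithm $\lambda(t) = \lambda_0(e^{it} - 1)$ of a Poisson ch.f. with $\lambda_0 = 6$, where the principal branch alone jumps by multiples of $2\pi i$:

```python
import numpy as np

lam = 6.0
t = np.linspace(0.0, 6.0, 4001)
f = np.exp(lam * (np.exp(1j * t) - 1.0))    # Poisson(lam) ch.f., never 0

# Principal-branch logarithm: imaginary part confined to (-pi, pi], jumps.
naive = np.log(f)

# "Continuous continuation" with lambda(0) = 0: unwrap the phase.
dist_log = np.log(np.abs(f)) + 1j * np.unwrap(np.angle(f))

exact = lam * (np.exp(1j * t) - 1.0)        # the continuous logarithm
print(np.abs(dist_log - exact).max())       # ~0
print(np.abs(naive - exact).max())          # of order 2*pi at branch jumps
```

The unwrapping needs a grid fine enough that the phase moves by less than $\pi$ between sample points; this is the numerical counterpart of the subdivision $\{t_k\}$ used in the proof.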

254 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

with a similar expression for $t_{-k-1} \le t \le t_{-k}$. Thus (4) is satisfied in $[t_{-1}, t_1]$, and if it is satisfied for $t = t_k$, then it is satisfied in $[t_k, t_{k+1}]$, since

$$e^{\lambda(t)} = e^{\lambda(t_k) + L(f(t)/f(t_k))} = f(t_k)\,\frac{f(t)}{f(t_k)} = f(t).$$

Thus (4) is satisfied in $[-T, T]$ by induction, and the theorem is proved for such an interval. To prove it for $(-\infty, \infty)$ let us observe that, having defined $\lambda$ in $[-n, n]$, we can extend it to $[-n-1, n+1]$ with the previous method, by dividing $[n, n+1]$, for example, into small equal parts whose length must be chosen dependent on $n$ at each stage (why?). The continuity of $\lambda$ is clear from the construction.

To prove the uniqueness of $\lambda$, suppose that $\lambda'$ has the same properties as $\lambda$. Since both satisfy equation (4), it follows that for each $t$, there exists an integer $m(t)$ such that

$$\lambda(t) - \lambda'(t) = 2\pi i\, m(t).$$

The left side being continuous in $t$, $m(\cdot)$ must be a constant (why?), which must be equal to $m(0) = 0$. Thus $\lambda(t) = \lambda'(t)$.

Remark. It may not be amiss to point out that $\lambda(t)$ is a single-valued function of $t$ but not of $f(t)$; see Exercise 7 below.

**Theorem 7.6.3.** For a fixed $T$, let each $f^{(k)}$, $k \ge 1$, as well as $f$ satisfy the conditions for $f$ in Theorem 7.6.2, and denote the corresponding $\lambda$ by $\lambda^{(k)}$. Suppose that $f^{(k)}$ converges uniformly to $f$ in $[-T, T]$; then $\lambda^{(k)}$ converges uniformly to $\lambda$ in $[-T, T]$.

PROOF. Let $L$ be as in (7); then there exists a $\delta$, $0 < \delta < \frac12$, such that

$$|L(z)| < 1, \quad \text{if } |z - 1| < \delta.$$

By the hypothesis of uniformity, there exists $k_1(T)$ such that if $k \ge k_1(T)$, then we have

(8) $\sup_{|t| \le T} \left| \dfrac{f^{(k)}(t)}{f(t)} - 1 \right| < \delta,$

and consequently

(9) $\sup_{|t| \le T} \left| L\!\left( \dfrac{f^{(k)}(t)}{f(t)} \right) \right| < 1.$

For each such $k$ and every $t$ in $[-T, T]$ there is then an integer $m^{(k)}(t)$ such that

(10) $\lambda^{(k)}(t) - \lambda(t) = L\!\left( \dfrac{f^{(k)}(t)}{f(t)} \right) + 2\pi i\, m^{(k)}(t).$

Since L is continuous in |z − 1| < δ, it follows that km(·) is continuous in |t| ≤ T. Since it is integer-valued and equals 0 at t = 0, it is identically zero. Thus (10) reduces to

(11) kλ(t) − λ(t) = L(kf(t)/f(t)), |t| ≤ T.

The function L being continuous at z = 1, the uniform convergence of kf/f to 1 in [−T, T] implies that of kλ − λ to 0, as asserted by the theorem.

Thanks to Theorem 7.6.1, Theorem 7.6.2 is applicable to each infinitely divisible ch.f. f in (−∞, ∞). Henceforth we shall call the corresponding λ the distinguished logarithm, and e^{λ(t)/n} the distinguished nth root of f. We can now extract the correct nth root in (1) above.

Theorem 7.6.4. For each n, the f_n in (1) is just the distinguished nth root of f.

PROOF. It follows from Theorem 7.6.1 and (1) that the ch.f. f_n never vanishes in (−∞, ∞), hence its distinguished logarithm λ_n is defined. Taking multiple-valued logarithms in (1), we obtain as in (10):

∀t: λ(t) − nλ_n(t) = 2πi m_n(t),

where m_n(·) takes only integer values. We conclude as before that m_n(·) ≡ 0, and consequently

(12) f_n(t) = e^{λ_n(t)} = e^{λ(t)/n},

as asserted.

Corollary. If f is a positive infinitely divisible ch.f., then for every t the f_n(t) in (1) is just the real positive nth root of f(t).

PROOF. Elementary analysis shows that the real-valued logarithm of a real number x in (0, ∞) is a continuous function of x. It follows that this is a continuous solution of equation (4) in (−∞, ∞). The uniqueness assertion in Theorem 7.6.2 then identifies it with the distinguished logarithm of f, and the corollary follows, since the real positive nth root is the exponential of 1/n times the real logarithm.

As an immediate consequence of (12), we have

∀t: lim_{n→∞} f_n(t) = 1.

Thus by Theorem 7.1.1 the double array {X_nj, 1 ≤ j ≤ n, 1 ≤ n} giving rise to (2) is holospoudic. We have therefore proved that each infinitely divisible

distribution can be obtained as the limiting distribution of S_n = Σ_{j=1}^n X_nj in such an array.

It is trivial that the product of two infinitely divisible ch.f.'s is again such a one, for we have in obvious notation:

₁f · ₂f = (₁f_n)ⁿ · (₂f_n)ⁿ = (₁f_n · ₂f_n)ⁿ.

The next proposition lies deeper.

Theorem 7.6.5. Let {kf, k ≥ 1} be a sequence of infinitely divisible ch.f.'s converging everywhere to the ch.f. f. Then f is infinitely divisible.

PROOF. The difficulty is to prove first that f never vanishes. Consider, as in the proof of Theorem 7.6.1: g = |f|², kg = |kf|². For each n ≥ 1, let x^{1/n} denote the real positive nth root of a real positive x. Then we have, by the hypothesis of convergence and the continuity of x^{1/n} as a function of x,

(13) ∀t: [kg(t)]^{1/n} → [g(t)]^{1/n}.

By the Corollary to Theorem 7.6.4, the left member in (13) is a ch.f. The right member is continuous everywhere. It follows from the convergence theorem for ch.f.'s that [g(·)]^{1/n} is a ch.f. Since g is its nth power, and this is true for each n ≥ 1, we have proved that g is infinitely divisible and so never vanishes. Hence f never vanishes and has a distinguished logarithm λ defined everywhere. Let that of kf be kλ. Since the convergence of kf to f is necessarily uniform in each finite interval (see Sec. 6.3), it follows from Theorem 7.6.3 that kλ → λ everywhere, and consequently

(14) exp(kλ(t)/n) → exp(λ(t)/n)

for every t. The left member in (14) is a ch.f. by Theorem 7.6.4, and the right member is continuous by the definition of λ. Hence it follows as before that e^{λ(·)/n} is a ch.f. and f is infinitely divisible.

The following alternative proof is interesting. There exists a δ > 0 such that f does not vanish for |t| ≤ δ, hence λ is defined in this interval. For each n, (14) holds uniformly in this interval by Theorem 7.6.3. By Exercise 6 of Sec. 6.3, this is sufficient to ensure the existence of a subsequence from {exp(kλ(t)/n), k ≥ 1} converging everywhere to some ch.f. φ_n. The nth power of this subsequence then converges to (φ_n)ⁿ, but being a subsequence of {kf} it also converges to f. Hence f = (φ_n)ⁿ, and we conclude again that f is infinitely divisible.

Using the preceding theorem, we can construct a wide class of ch.f.'s that are infinitely divisible. For each a > 0 and real u, the function

(15) φ(t; a, u) = exp{a(e^{iut} − 1)}

is an infinitely divisible ch.f., since it is obtained from the Poisson ch.f. with parameter a by substituting ut for t. We shall call such a ch.f. a generalized Poisson ch.f. A finite product of these:

(16) Π_{j=1}^k φ(t; a_j, u_j) = exp{Σ_{j=1}^k a_j (e^{iu_j t} − 1)}

is then also infinitely divisible. Now if G is any bounded increasing function, the integral ∫_{−∞}^∞ (e^{itu} − 1) dG(u) may be approximated by sums of the kind appearing as exponent in the right member of (16), for all t in R¹, and indeed uniformly so in every finite interval (why?). It follows that for each such G, the function

(17) f(t) = exp{∫_{−∞}^∞ (e^{itu} − 1) dG(u)}

is an infinitely divisible ch.f. Now it turns out that although this falls somewhat short of being the most general form of an infinitely divisible ch.f., we have nevertheless the following qualitative result, which is a complete generalization of (16).

Theorem 7.6.6. For each infinitely divisible ch.f. f, there exists a double array of pairs of real constants (a_nj, u_nj), 1 ≤ j ≤ k_n, 1 ≤ n, where a_nj > 0, such that

(18) f(t) = lim_{n→∞} Π_{j=1}^{k_n} φ(t; a_nj, u_nj).

The converse is also true. Thus the class of infinitely divisible d.f.'s coincides with the closure, with respect to vague convergence, of convolutions of a finite number of generalized Poisson d.f.'s.

PROOF. Let f and f_n be as in (1) and let λ be the distinguished logarithm of f, F_n the d.f. corresponding to f_n. We have for each t, as n → ∞:

n[f_n(t) − 1] = n[e^{λ(t)/n} − 1] → λ(t),

and consequently

(19) e^{n[f_n(t) − 1]} → e^{λ(t)} = f(t).

Actually the first member in (19) is a ch.f. by Theorem 6.5.6, so that the convergence is uniform in each finite interval, but this fact alone is neither necessary nor sufficient for what follows. We have

n[f_n(t) − 1] = ∫_{−∞}^∞ (e^{itu} − 1) n dF_n(u).

For each n, nF_n is a bounded increasing function, hence there exists {a_nj, u_nj; 1 ≤ j ≤ k_n}

Let us end this section with an interesting example. Put s = σ + it, σ > 1 and t real; consider the Riemann zeta function:

ζ(s) = Π_p (1 − p^{−s})^{−1},

where p ranges over all prime numbers. Fix σ > 1 and define

f(t) = ζ(σ + it)/ζ(σ).

We assert that f is an infinitely divisible ch.f. For each p and every real t, the complex number 1 − p^{−σ−it} lies within the circle {z: |z − 1| < 1}. Let log z denote that determination of the logarithm with an angle in (−π, π]. By looking at the angles, we see that

log [(1 − p^{−σ})/(1 − p^{−σ−it})] = log(1 − p^{−σ}) − log(1 − p^{−σ−it}) = Σ_{m=1}^∞ (1/m) p^{−mσ} (e^{−imt log p} − 1).

Since

f(t) = lim_{n→∞} Π_{p ≤ n} Π_{m=1}^n φ(t; m^{−1} p^{−mσ}, −m log p),

it is an infinitely divisible ch.f. by Theorem 7.6.5.

*3. Let f be a ch.f. such that there exists a sequence of positive integers n_k going to infinity and a sequence of ch.f.'s φ_k satisfying f = (φ_k)^{n_k}; then f is infinitely divisible.
4. Give another proof that the right member of (17) is an infinitely divisible ch.f. by using Theorem 6.5.6.

5. Show that f(t) = (1 − b)/(1 − be^{it}), 0 < b < 1, is an infinitely divisible ch.f. [HINT: Use the canonical form.]
6. Show that the d.f. with density β^α Γ(α)^{−1} x^{α−1} e^{−βx}, α > 0, β > 0, in (0, ∞), and 0 otherwise, is infinitely divisible.
7. Carry out the proof of Theorem 7.6.2 specifically for the "trivial" but instructive case f(t) = e^{iat}, where a is a fixed real number.
*8. Give an example to show that in Theorem 7.6.3, if the uniformity of convergence of kf to f is omitted, then kλ need not converge to λ. [HINT: kf(t) = exp{2πi(−1)^k kt(1 + kt)^{−1}}.]
9. Let f(t) = 1 − t, f_k(t) = 1 − t + (−1)^k i t k^{−1}, 0 ≤ t ≤ 2, k ≥ 1. Then f_k never vanishes and converges uniformly to f in [0, 2]. Let √(f_k) denote the distinguished square root of f_k in [0, 2]. Show that √(f_k) does not converge in any neighborhood of t = 1. Why is Theorem 7.6.3 not applicable? [This example is supplied by E. Reich.]
*10. Some writers have given the proof of Theorem 7.6.6 by apparently considering each fixed t and using an analogue of (21) without the "sup_{|t| ≤ T}" there. Criticize this "quick proof". [HINT: Show that the two relations

∀m and ∀t: lim_{n→∞} u_{mn}(t) = u_m(t),
∀t: lim_{m→∞} u_m(t) = u(t),

do not imply the existence of a sequence {m_n} such that

∀t: lim_{n→∞} u_{m_n n}(t) = u(t).

Indeed, they do not even imply the existence of two subsequences {m_ν} and {n_ν} such that

∀t: lim_{ν→∞} u_{m_ν n_ν}(t) = u(t).

Thus the extension of Lemma 1 in Sec. 7.2 is false.]

The three "counterexamples" in Exercises 7 to 9 go to show that the cavalierism alluded to above is not to be shrugged off easily.

11. Strengthening Theorem 6.5.5, show that two infinitely divisible ch.f.'s may coincide in a neighborhood of 0 without being identical.
*12. Reconsider Exercise 17 of Sec. 6.4 and try to apply Theorem 7.6.3. [HINT: The latter is not immediately applicable, owing to the lack of uniform convergence. However, show first that if e^{ic_n t} converges for t ∈ A, where m(A) > 0, then it converges for all t. This follows from a result due to Steinhaus, asserting that the difference set A − A contains a neighborhood of 0 (see, e.g., Halmos [4, p. 68]), and from the equation e^{ic_n s} e^{ic_n t} = e^{ic_n (s+t)}. Let {b_n}, {b'_n} be any two subsequences of {c_n}; then e^{i(b_n − b'_n)t} → 1 for all t. Since 1 is a

ch.f., the convergence is uniform in every finite interval by the convergence theorem for ch.f.'s. Alternatively, if

φ(t) = lim_{n→∞} e^{ic_n t},

then φ satisfies Cauchy's functional equation and must be of the form e^{ict}, which is a ch.f. These approaches are fancier than the simple one indicated in the hint for the said exercise, but they are interesting. There is no known quick proof by "taking logarithms", as some authors have done.]

Bibliographical Note

The most comprehensive treatment of the material in Secs. 7.1, 7.2, and 7.6 is by Gnedenko and Kolmogorov [12]. In this as well as nearly all other existing books on the subject, the handling of logarithms must be strengthened by the discussion in Sec. 7.6.

For an extensive development of Lindeberg's method (the operator approach) to infinitely divisible laws, see Feller [13, vol. 2].

Theorem 7.3.2 together with its proof as given here is implicit in

W. Doeblin, Sur deux problèmes de M. Kolmogoroff concernant les chaînes dénombrables, Bull. Soc. Math. France 66 (1938), 210–220.

It was rediscovered by F. J. Anscombe. For the extension to the case where the constant c in (3) of Sec. 7.3 is replaced by an arbitrary r.v., see H. Wittenberg, Limiting distributions of random sums of independent random variables, Z. Wahrscheinlichkeitstheorie 1 (1964), 7–18.

Theorem 7.3.3 is contained in

P. Erdős and M. Kac, On certain limit theorems of the theory of probability, Bull. Am. Math. Soc. 52 (1946), 292–302.

The proof given for Theorem 7.4.1 is based on

P. L. Hsu, The approximate distributions of the mean and variance of a sample of independent variables, Ann. Math. Statistics 16 (1945), 1–29.

This paper contains historical references as well as further extensions.

For the law of the iterated logarithm, see the classic

A. N. Kolmogorov, Über das Gesetz des iterierten Logarithmus, Math. Annalen 101 (1929), 126–136.

For a survey of "classical limit theorems" up to 1945, see

W. Feller, The fundamental limit theorems in probability, Bull. Am. Math. Soc. 51 (1945), 800–832.

Kai Lai Chung, On the maximum partial sums of sequences of independent random variables, Trans. Am. Math. Soc. 64 (1948), 205–233.

Infinitely divisible laws are treated in Chapter 7 of Lévy [11] as the analytical counterpart of his full theory of additive processes, or processes with independent increments (later supplemented by J. L. Doob and K. Itô).

New developments in limit theorems arose from connections with the Brownian motion process in the works by Skorohod and Strassen. For an exposition see references [16] and [22] of the General Bibliography.
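Before leaving this chapter, the generalized Poisson ch.f. of Sec. 7.6, φ(t; a, u) = exp{a(e^{iut} − 1)}, lends itself to a quick numerical check. The sketch below is illustrative only (the function names are not from the text): it verifies that φ agrees with E[e^{itX}] for X = u·N, N Poissonian with parameter a, computed by direct summation of the Poisson series, and that the distinguished nth root φ(t; a/n, u) recovers φ on raising to the nth power, which is exactly the divisibility used above.

```python
import cmath
import math

def gen_poisson_cf(t, a, u):
    # phi(t; a, u) = exp{a (e^{iut} - 1)}: ch.f. of u*N with N ~ Poisson(a)
    return cmath.exp(a * (cmath.exp(1j * u * t) - 1.0))

def cf_by_summation(t, a, u, terms=200):
    # E[e^{it u N}] = sum_k e^{-a} a^k / k! * e^{it u k}, truncated
    total = 0j
    pk = math.exp(-a)          # Poisson pmf at k = 0
    for k in range(terms):
        total += pk * cmath.exp(1j * t * u * k)
        pk *= a / (k + 1)      # recursion for the pmf
    return total

t, a, u = 0.7, 2.0, -1.3
# phi really is the ch.f. of the generalized Poisson distribution
assert abs(gen_poisson_cf(t, a, u) - cf_by_summation(t, a, u)) < 1e-9
# distinguished nth root: phi(t; a/n, u)^n = phi(t; a, u), here n = 3
assert abs(gen_poisson_cf(t, a / 3, u) ** 3 - gen_poisson_cf(t, a, u)) < 1e-9
```

The second assertion is the whole point of infinite divisibility for this family: the nth root stays inside the family, with the parameter a simply divided by n.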

8 Random walk

8.1 Zero-or-one laws

In this chapter we adopt the notation N for the set of strictly positive integers, and N⁰ for the set of positive integers (zero included); used as an index set, each is endowed with the natural ordering and interpreted as a discrete time parameter. Similarly, for each n ∈ N, N_n denotes the ordered set of integers from 1 to n (both inclusive); N⁰_n that of integers from 0 to n (both inclusive); and N'_n that of integers beginning with n + 1.

On the probability triple (Ω, F, P), a sequence {X_n, n ∈ N}, where each X_n is an r.v. (defined on Ω and finite a.e.), will be called a (discrete parameter) stochastic process. Various Borel fields connected with such a process will now be introduced. For any sub-B.F. G of F, we shall write

(1) X ∈ G

and use the expression "X belongs to G" or "G contains X" to mean that X^{−1}(B¹) ⊂ G (see Sec. 3.1 for notation); in the customary language X is said to be "measurable with respect to G". For each n ∈ N, we define two B.F.'s as follows:

264 | RANDOM WALK

F_n = the augmented B.F. generated by the family of r.v.'s {X_k, k ∈ N_n}; that is, F_n is the smallest B.F. containing all X_k in the family and all null sets;

F'_n = the augmented B.F. generated by the family of r.v.'s {X_k, k ∈ N'_n}.

Recall that the union ∪_{n=1}^∞ F_n is a field but not necessarily a B.F. The smallest B.F. containing it, or equivalently containing every F_n, n ∈ N, is denoted by

F_∞ = ⋁_{n=1}^∞ F_n;

it is the B.F. generated by the stochastic process {X_n, n ∈ N}. On the other hand, the intersection ∩_{n=1}^∞ F'_n is a B.F., denoted also by ⋀_{n=1}^∞ F'_n. It will be called the remote field of the stochastic process and a member of it a remote event.

Since F_∞ ⊂ F, P is defined on F_∞. For the study of the process {X_n, n ∈ N} alone, it is sufficient to consider the reduced triple (Ω, F_∞, P). The following approximation theorem is fundamental.

Theorem 8.1.1. Given ε > 0 and Λ ∈ F_∞, there exists Λ_ε ∈ ∪_{n=1}^∞ F_n such that

(2) P(Λ △ Λ_ε) ≤ ε.

PROOF. Let C be the collection of sets Λ for which the assertion of the theorem is true. Suppose Λ_k ∈ C for each k ∈ N and Λ_k ↑ Λ or Λ_k ↓ Λ. Then Λ also belongs to C, as we can easily see by first taking k large and then applying the asserted property to Λ_k. Thus C is a monotone class. Since it is trivial that C contains the field ∪_{n=1}^∞ F_n that generates F_∞, C must contain F_∞ by the Corollary to Theorem 2.1.2, proving the theorem.

Without using Theorem 2.1.2, one can verify that C is closed with respect to complementation (trivial), finite union (by Exercise 1 of Sec. 2.1), and countable union (as increasing limit of finite unions). Hence C is a B.F. that must contain F_∞.

It will be convenient to use a particular type of sample space Ω. In the notation of Sec. 3.4, let

Ω = ⨉_{n=1}^∞ Ω_n,

where each Ω_n is a "copy" of the real line R¹. Thus Ω is just the space of all infinite sequences of real numbers. A point ω will be written as {ω_n, n ∈ N}, and ω_n, as a function of ω, will be called the nth coordinate (function) of ω.

8.1 ZERO-OR-ONE LAWS | 265

Each Ω_n is endowed with the Euclidean B.F. B¹, and the product Borel field F (= F_∞ in the notation above) on Ω is defined to be the B.F. generated by the finite-product sets of the form

(3) ∩_{j=1}^k {ω: ω_{n_j} ∈ B_{n_j}},

where (n_1, …, n_k) is an arbitrary finite subset of N and where each B_{n_j} ∈ B¹. In contrast to the case discussed in Sec. 3.3, however, no restriction is made on the p.m. P on F. We shall not enter here into the matter of a general construction of P. The Kolmogorov extension theorem (Theorem 3.3.6) asserts that on the concrete space-field (Ω, F) just specified, there exists a p.m. P whose projection on each finite-dimensional subspace may be arbitrarily preassigned, subject only to consistency (where one such subspace contains the other). Theorem 3.3.4 is a particular case of this theorem.

In this chapter we shall use the concrete probability space above, of the so-called "function space type", to simplify the exposition, but all the results below remain valid without this specification. This follows from a measure-preserving homomorphism between an abstract probability space and one of the function-space type; see Doob [17, chap. 2].

The chief advantage of this concrete representation of Ω is that it enables us to define an important mapping on the space.

DEFINITION OF THE SHIFT. The shift τ is a mapping of Ω such that

τ: ω = {ω_n, n ∈ N} → τω = {ω_{n+1}, n ∈ N};

in other words, the image of a point has as its nth coordinate the (n + 1)st coordinate of the original point.

Clearly τ is an ∞-to-1 mapping and it is from Ω onto Ω. Its iterates are defined as usual by composition: τ⁰ = identity, τ^k = τ ∘ τ^{k−1} for k ≥ 1. It induces a direct set mapping τ and an inverse set mapping τ^{−1} according to the usual definitions. Thus

τ^{−1}Λ = {ω: τω ∈ Λ},

and τ^{−n} is the nth iterate of τ^{−1}. If Λ is the set in (3), then

(4) τ^{−1}Λ = ∩_{j=1}^k {ω: ω_{n_j + 1} ∈ B_{n_j}}.

It follows from this that τ^{−1} maps F into F; more precisely,

∀Λ ∈ F: τ^{−n}Λ ∈ F'_n, n ∈ N,

where F'_n is the Borel field generated by {ω_k, k > n}. This is obvious by (4) if Λ is of the form above, and since the class of Λ for which the assertion holds is a B.F., the result is true in general.

DEFINITION. A set Λ in F is called invariant (under the shift) iff Λ = τ^{−1}Λ. An r.v. Y on Ω is invariant iff Y(ω) = Y(τω) for every ω ∈ Ω.

Observe the following general relation, valid for each point mapping τ and the associated inverse set mapping τ^{−1}, each function Y on Ω and each subset A of R¹:

(5) τ^{−1}{ω: Y(ω) ∈ A} = {ω: Y(τω) ∈ A}.

This follows from τ^{−1} ∘ Y^{−1} = (Y ∘ τ)^{−1}.

We shall need another kind of mapping of Ω. A permutation on N_n is a 1-to-1 mapping of N_n to itself, denoted as usual by

(1, 2, …, n; σ_1, σ_2, …, σ_n).

The collection of such mappings forms a group with respect to composition. A finite permutation on N is by definition a permutation on a certain "initial segment" N_n of N. Given such a permutation as shown above, we define σω to be the point in Ω whose coordinates are obtained from those of ω by the corresponding permutation, namely

(σω)_j = ω_{σ_j}, if j ∈ N_n; (σω)_j = ω_j, if j ∈ N'_n.

As usual, σ induces a direct set mapping σ and an inverse set mapping σ^{−1}, the latter being also the direct set mapping induced by the "group inverse" σ^{−1} of σ. In analogy with the preceding definition we have the following.

DEFINITION. A set Λ in F is called permutable iff Λ = σΛ for every finite permutation σ on N. A function Y on Ω is permutable iff Y(ω) = Y(σω) for every finite permutation σ and every ω ∈ Ω.

It is fairly obvious that an invariant set is remote and a remote set is permutable; also that each of the collections: all invariant events, all remote events, all permutable events, forms a sub-B.F. of F. If each of these B.F.'s is augmented (see Exercise 20 of Sec. 2.2), the resulting augmented B.F.'s will be called "almost invariant", "almost remote", and "almost permutable", respectively. Finally, the collection of all sets in F of probability either 0 or

8 .1 ZERO-OR-ONE LAWS 1 2 6 71 clearly forms a B .F ., which may be called the "all-or-noth**in**g" field . ThisB .F . will also be referred to as "almost trivial" .Now that we have def**in**ed all these concepts for the general stochasticprocess, it must be admitted that they are not very useful without furtherspecifications of the process . We proceed at once to the particular case below .DEFINITION . A sequence of **in**dependent r .v .'s will be called an **in**dependentprocess ; it is called a stationary **in**dependent process iff the r .v .'s have acommon distribution .Aspects of this type of process have been our ma**in** object of study,usually under additional assumptions on the distributions . Hav**in**g christenedit, we shall henceforth focus our attention on "the evolution of the processas a whole" -whatever this phrase may mean . For this type of process, thespecific probability triple described above has been constructed **in** Sec . 3 .3 .Indeed, ZT= ~~, and the sequence of **in**dependent r .v .'s is just that of thesuccessive coord**in**ate functions [a), n c- N), which, however, will also be**in**terchangeably denoted by {X,,, n E N} . If (p is any Borel measurable function,then {cp(X„ ), n E N) is another such process .The follow**in**g result is called Kolmogorov's "zero-or-one law" .Theorem 8 .1 .2 . For an **in**dependent process, each remote event has probabilityzero or one .PROOF . Let A E nn, 1 J~,t and suppose that DT(A) > 0 ; we are go**in**g toprove that ?P(A) = 1 . S**in**ce 3„ and uri are **in**dependent fields (see Exercise 5of Sec . 3 .3), A is **in**dependent of every set **in** ~„ for each n c- N ; namely, ifM E Un 1 %,, then(6) P(A n m) = ~(A)?)(M) .If we set~A(M) _~(A n m)'~(A)for M E JT, then ETA ( .) is clearly a p .m . (the conditional probability relative toA ; see Chapter 9) . By (6) it co**in**cides with ~~ on Un °_ 1 3 „ and consequentlyalso on -~T by Theorem 2 .2 .3 . 
Hence we may take M to be A **in** (6) to concludethat /-,(A) = ?/'(A) 2 or ,?P(A) = 1 .The usefulness of the notions of shift and permutation **in** a stationary**in**dependent process is based on the next result, which says that both -1 -1 andor are "measure-preserv**in**g" .
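Kolmogorov's law concerns genuine limit events, which no finite computation can reach; still, a Monte Carlo sketch can show the dichotomy emerging. Assuming i.i.d. Bernoulli(1/2) steps (an example chosen here for illustration, not taken from the text), the remote events {S_n/n > c eventually} have probability 1 for c < 1/2 and 0 for c > 1/2, by the zero-or-one law combined with the strong law of large numbers, and the finite-n frequencies below already sit at the two extremes.

```python
import random

def tail_event_freq(n, trials, c, seed=0):
    # Empirical frequency of {S_n / n > c} for i.i.d. Bernoulli(1/2) steps.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.randint(0, 1) for _ in range(n))
        hits += (s / n > c)
    return hits / trials

# Remote events {S_n/n > 0.4 eventually} and {S_n/n > 0.6 eventually}
# have probabilities 1 and 0; at n = 2000 the frequencies hug these values.
lo = tail_event_freq(2000, 300, 0.4)
hi = tail_event_freq(2000, 300, 0.6)
assert lo > 0.99 and hi < 0.01
```

At n = 2000 the level 0.4 lies about nine standard deviations below the mean of S_n/n, so the empirical frequency is pinned at 1; by symmetry 0.6 gives 0.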

Theorem 8.1.3. For a stationary independent process, if Λ ∈ F and σ is any finite permutation, we have

(7) P(τ^{−1}Λ) = P(Λ);
(8) P(σΛ) = P(Λ).

PROOF. Define a set function P' on F as follows:

P'(Λ) = P(τ^{−1}Λ).

Since τ^{−1} maps disjoint sets into disjoint sets, it is clear that P' is a p.m. For a finite-product set Λ, such as the one in (3), it follows from (4) that

P'(Λ) = Π_{j=1}^k μ(B_{n_j}) = P(Λ).

Hence P' and P coincide also on each set that is the union of a finite number of such disjoint sets, and so on the B.F. F generated by them, according to Theorem 2.2.3. This proves (7); (8) is proved similarly.

The following companion to Theorem 8.1.2, due to Hewitt and Savage, is very useful.

Theorem 8.1.4. For a stationary independent process, each permutable event has probability zero or one.

PROOF. Let Λ be a permutable event. Given ε > 0, we may choose ε_k > 0 so that

Σ_{k=1}^∞ ε_k ≤ ε.

By Theorem 8.1.1, there exists Λ_k ∈ F_{n_k} such that P(Λ △ Λ_k) ≤ ε_k

Applying this to E_k = Λ △ M_k, and observing the simple identity

lim sup_k (Λ △ M_k) = (Λ \ lim inf_k M_k) ∪ (lim sup_k M_k \ Λ),

we deduce that

P(Λ △ lim sup_k M_k) ≤ P(lim sup_k (Λ △ M_k)) ≤ ε.

Since ∪_{k=m}^∞ M_k ∈ F'_{n_m}, the set lim sup_k M_k belongs to ∩_{m=1}^∞ F'_{n_m}, which is seen to coincide with the remote field. Thus lim sup_k M_k has probability zero or one by Theorem 8.1.2, and the same must be true of Λ, since ε is arbitrary in the inequality above.

Here is a more transparent proof of the theorem based on the metric on the measure space (Ω, F, P) given in Exercise 8 of Sec. 3.2. Since Λ_k and M_k are independent, we have

P(Λ_k ∩ M_k) = P(Λ_k)P(M_k).

Now Λ_k → Λ and M_k → Λ in the metric just mentioned, hence also Λ_k ∩ M_k → Λ ∩ Λ in the same sense. Since convergence of events in this metric implies convergence of their probabilities, it follows that P(Λ ∩ Λ) = P(Λ)P(Λ), and the theorem is proved.

Corollary. For a stationary independent process, the B.F.'s of almost permutable or almost remote or almost invariant sets all coincide with the all-or-nothing field.

EXERCISES

Ω and F are the infinite product space and field specified above.

1. Find an example of a remote field that is not the trivial one; to make it interesting, insist that the r.v.'s are not identical.
2. An r.v. belongs to the all-or-nothing field if and only if it is constant a.e.
3. If Λ is invariant then Λ = τΛ; the converse is false.
4. An r.v. is invariant [permutable] if and only if it belongs to the invariant [permutable] field.
5. The set of convergence of an arbitrary sequence of r.v.'s {Y_n, n ∈ N} or of the sequence of their partial sums Σ_{j=1}^n Y_j are both permutable. Their limits are permutable r.v.'s with domain the set of convergence.

*6. If a_n > 0, lim_{n→∞} a_n exists (> 0, finite or infinite), and lim_{n→∞} (a_{n+1}/a_n) = 1, then the set of convergence of {a_n^{−1} Σ_{j=1}^n Y_j} is invariant. If a_n → +∞, the upper and lower limits of this sequence are invariant r.v.'s.
*7. The set {Y_{2n} ∈ A i.o.}, where A ∈ B¹, is remote but not necessarily invariant; the set {Σ_{j=1}^n Y_j ∈ A i.o.} is permutable but not necessarily remote. Find some other essentially different examples of these two kinds.
8. Find trivial examples of independent processes where the three numbers P(τ^{−1}Λ), P(Λ), P(τΛ) take the values 1, 0, 1; or 0, 1/2, 1.
9. Prove that an invariant event is remote and a remote event is permutable.
*10. Consider the bi-infinite product space of all bi-infinite sequences of real numbers {ω_n, n ∈ Z}, where Z is the set of all integers in its natural (algebraic) ordering. Define the shift as in the text with Z replacing N, and show that it is 1-to-1 on this space. Prove the analogue of (7).
11. Show that the conclusion of Theorem 8.1.4 holds true for a sequence of independent r.v.'s, not necessarily stationary, but satisfying the following condition: for every j there exists a k > j such that X_k has the same distribution as X_j. [This remark is due to Susan Horn.]
12. Let {X_n, n ≥ 1} be independent r.v.'s with P{X_n = 4^{−n}} = P{X_n = −4^{−n}} = 1/2. Then the remote field of {S_n, n ≥ 1}, where S_n = Σ_{j=1}^n X_j, is not trivial.

8.2 Basic notions

From now on we consider only a stationary independent process {X_n, n ∈ N} on the concrete probability triple specified in the preceding section. The common distribution of X_n will be denoted by μ (p.m.) or F (d.f.); when only this is involved, we shall write X for a representative X_n, thus E(X) for E(X_n).

Our interest in such a process derives mainly from the fact that it underlies another process of richer content. This is obtained by forming the successive partial sums as follows:

(1) S_n = Σ_{j=1}^n X_j, n ∈ N.

An initial r.v. S_0 ≡ 0 is adjoined whenever this serves notational convenience, as in X_n = S_n − S_{n−1} for n ∈ N. The sequence {S_n, n ∈ N} is then a very familiar object in this book, but now we wish to find a proper name for it. An officially correct one would be "stochastic process with stationary independent differences"; the name "homogeneous additive process" can also be used. We

8.2 BASIC NOTIONS | 271

have, however, decided to call it a "random walk (process)", although the use of this term is frequently restricted to the case when μ is of the integer lattice type or even more narrowly a Bernoullian distribution.

DEFINITION OF RANDOM WALK. A random walk is the process {S_n, n ∈ N} defined in (1) where {X_n, n ∈ N} is a stationary independent process. By convention we set also S_0 = 0.

A similar definition applies in a Euclidean space of any dimension, but we shall be concerned only with R¹ except in some exercises later.

Let us observe that even for an independent process {X_n, n ∈ N}, its remote field is in general different from the remote field of {S_n, n ∈ N}, where S_n = Σ_{j=1}^n X_j. They are almost the same, being both almost trivial, for a stationary independent process by virtue of Theorem 8.1.4, since the remote field of the random walk is clearly contained in the permutable field of the corresponding stationary independent process.

We add that, while the notion of remoteness applies to any process, "(shift-)invariant" and "permutable" will be used here only for the underlying "coordinate process" {ω_n, n ∈ N} or {X_n, n ∈ N}.

The following relation will be much used below, for m < n:

S_{n−m}(τ^m ω) = Σ_{j=1}^{n−m} X_j(τ^m ω) = Σ_{j=1}^{n−m} X_{j+m}(ω) = S_n(ω) − S_m(ω).

It follows from Theorem 8.1.3 that S_{n−m} and S_n − S_m have the same distribution. This is obvious directly, since it is just μ^{(n−m)*}.

As an application of the results of Sec. 8.1 to a random walk, we state the following consequence of Theorem 8.1.4.

Theorem 8.2.1. Let B_n ∈ B¹ for each n ∈ N. Then

P{S_n ∈ B_n i.o.}

is equal to zero or one.

PROOF. If σ is a permutation on N_m, then S_n(σω) = S_n(ω) for n ≥ m, hence the set

Λ_m = ∪_{n=m}^∞ {S_n ∈ B_n}

is unchanged under σ^{−1} or σ. Since Λ_m decreases as m increases, it follows that

∩_{m=1}^∞ Λ_m = {S_n ∈ B_n i.o.}

is permutable, and the theorem is proved.
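The relation S_{n−m}(τ^m ω) = S_n(ω) − S_m(ω) used above is purely algebraic and can be checked coordinate by coordinate. A minimal sketch (finite tuples standing in for sample points, helper names hypothetical):

```python
def partial_sum(omega, n):
    # S_n(omega) = X_1 + ... + X_n, with X_j(omega) = omega[j-1]
    return sum(omega[:n])

def shift_m(omega, m):
    # tau^m drops the first m coordinates of the sample point
    return omega[m:]

omega = (0.5, -1.25, 2.0, 3.5, -0.75, 1.0)
m, n = 2, 5
# S_{n-m}(tau^m omega) = S_n(omega) - S_m(omega)
lhs = partial_sum(shift_m(omega, m), n - m)
rhs = partial_sum(omega, n) - partial_sum(omega, m)
assert abs(lhs - rhs) < 1e-12
```

Combined with Theorem 8.1.3 this is what makes increments of the walk over disjoint time stretches behave like fresh copies of the walk itself.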

Even for a fixed B = B_n the result is significant, since it is by no means evident that the set {S_n > 0 i.o.}, for instance, is even vaguely invariant or remote with respect to {X_n, n ∈ N} (cf. Exercise 7 of Sec. 8.1). Yet the preceding theorem implies that it is in fact almost invariant. This is the strength of the notion of permutability as against invariance or remoteness.

For any serious study of the random walk process, it is imperative to introduce the concept of an "optional r.v." This notion has already been used more than once in the book (where?) but has not yet been named. Since the basic inferences are very simple and are supposed to be intuitively obvious, it has been the custom in the literature until recently not to make the formal introduction at all. However, the reader will profit by meeting these fundamental ideas for the theory of stochastic processes at the earliest possible time. They will be needed in the next chapter, too.

DEFINITION OF OPTIONAL r.v. An r.v. α is called optional relative to the arbitrary stochastic process {Z_n, n ∈ N} iff it takes strictly positive integer values or +∞ and satisfies the following condition:

(2) ∀n ∈ N ∪ {∞}: {ω: α(ω) = n} ∈ F_n,

where F_n is the B.F. generated by {Z_k, k ∈ N_n}.

Similarly if the process is indexed by N⁰ (as in Chapter 9), then the range of α will be N⁰. Thus if the index n is regarded as the time parameter, then α effects a choice of time (an "option") for each sample point ω. One may think of this choice as a time to "stop", whence the popular alias "stopping time", but this is usually rather a momentary pause after which the process proceeds again: time marches on!

Associated with each optional r.v. α there are two or three important objects. First, the pre-α field F_α is the collection of all sets in F_∞ of the form

(3) ∪_{1 ≤ n ≤ ∞} [{α = n} ∩ Λ_n], where Λ_n ∈ F_n for each n.

Instead of requiring $\alpha$ to be defined on all $\Omega$ but possibly taking the value $\infty$, we may suppose it to be defined on a set $\Delta$ in $\mathcal{F}$. Note that a strictly positive integer $n$ is an optional r.v. The concepts of pre-$\alpha$ and post-$\alpha$ fields reduce in this case to the previous $\mathcal{F}_n$ and $\mathcal{F}'_n$.

A vital example of optional r.v. is that of the first entrance time into a given Borel set $A$:

(5) $\alpha_A(\omega) = \min\{n \in N: Z_n(\omega) \in A\}$ on $\bigcup_{n=1}^{\infty} \{\omega: Z_n(\omega) \in A\}$; $\alpha_A(\omega) = +\infty$ elsewhere.

To see that this is optional, we need only observe that for each $n \in N$:

$$\{\omega: \alpha_A(\omega) = n\} = \{\omega: Z_j(\omega) \in A^c, 1 \le j \le n-1; Z_n(\omega) \in A\},$$

which clearly belongs to $\mathcal{F}_n$; similarly for $n = \infty$.

Concepts connected with optionality have everyday counterparts, implicit in phrases such as "within thirty days of the accident (should it occur)". Historically, they arose from "gambling systems", in which the gambler chooses opportune times to enter his bets according to previous observations, experiments, or whatnot. In this interpretation, $\alpha + 1$ is the time chosen to gamble and is determined by events strictly prior to it. Note that, along with $\alpha$, $\alpha + 1$ is also an optional r.v., but the converse is false.

So far the notions are valid for an arbitrary process on an arbitrary triple. We now return to a stationary independent process on the specified triple and extend the notion of "shift" to an "$\alpha$-shift" as follows: $\tau^\alpha$ is a mapping on $\{\alpha < \infty\}$ such that

(6) $\tau^\alpha \omega = \tau^n \omega$ on $\{\omega: \alpha(\omega) = n\}$.

Thus the post-$\alpha$ process is just the process $\{X_n(\tau^\alpha \omega), n \in N\}$. Recalling that $X_n$ is a mapping on $\Omega$, we may also write

(7) $X_{\alpha+n}(\omega) = X_n(\tau^\alpha \omega) = (X_n \circ \tau^\alpha)(\omega)$

and regard $X_n \circ \tau^\alpha$, $n \in N$, as the r.v.'s of the new process.
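The defining property (2) is easy to see in miniature for the first entrance time (5): whether $\alpha_A = n$ is decided by inspecting only the first $n$ coordinates of the path. The sketch below is an illustration only (the Bernoulli walk and the set $A = (0, \infty)$ are arbitrary choices, not part of the text):

```python
import random

def first_entrance_time(path, in_A):
    """alpha_A of (5): the first n >= 1 with path[n-1] in A, or float('inf')
    if the path never enters A.  Deciding {alpha_A = n} inspects only the
    first n coordinates, which is the optionality condition (2)."""
    for n, z in enumerate(path, start=1):
        if in_A(z):
            return n
    return float('inf')

random.seed(1)
# An illustrative Bernoulli +-1 random walk, with A = (0, infinity).
s, S = 0, []
for _ in range(1000):
    s += random.choice([-1, 1])
    S.append(s)
alpha = first_entrance_time(S, lambda z: z > 0)
```

Since only the coordinates up to $\alpha$ matter, truncating the path beyond $\alpha$ cannot change its value — the computational counterpart of $\{\alpha = n\} \in \mathcal{F}_n$.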
The inverse set mapping $(\tau^\alpha)^{-1}$, to be written more simply as $\tau^{-\alpha}$, is defined as usual:

$$\tau^{-\alpha}\Lambda = \{\omega: \tau^\alpha \omega \in \Lambda\}.$$

Let us now prove the fundamental theorem about "stopping" a stationary independent process.

Theorem 8.2.2. For a stationary independent process and an almost everywhere finite optional r.v. $\alpha$ relative to it, the pre-$\alpha$ and post-$\alpha$ fields are independent. Furthermore the post-$\alpha$ process is a stationary independent process with the same common distribution as the original one.

PROOF. Both assertions are summarized in the formula below. For any $\Lambda \in \mathcal{F}_\alpha$, $k \in N$, $B_j \in \mathcal{B}^1$, $1 \le j \le k$, we have

(8) $P\{\Lambda; X_{\alpha+j} \in B_j, 1 \le j \le k\} = P\{\Lambda\} \prod_{j=1}^{k} \mu(B_j)$.

To prove (8), we observe that it follows from the definition of $\alpha$ and $\mathcal{F}_\alpha$ that

(9) $\Lambda \cap \{\alpha = n\} = \Lambda_n \cap \{\alpha = n\} \in \mathcal{F}_n$,

where $\Lambda_n \in \mathcal{F}_n$ for each $n \in N$. Consequently we have

$$P\{\Lambda; \alpha = n; X_{n+j} \in B_j, 1 \le j \le k\} = P\{\Lambda_n; \alpha = n; X_{n+j} \in B_j, 1 \le j \le k\}$$
$$= P\{\Lambda; \alpha = n\}\, P\{X_{n+j} \in B_j, 1 \le j \le k\} = P\{\Lambda; \alpha = n\} \prod_{j=1}^{k} \mu(B_j),$$

where the second equation is a consequence of (9) and the independence of $\mathcal{F}_n$ and $\mathcal{F}'_n$. Summing over $n \in N$, we obtain (8).

An immediate corollary is the extension of (7) of Sec. 8.1 to an $\alpha$-shift.

Corollary. For each $\Lambda \in \mathcal{F}$ we have

(10) $P(\tau^{-\alpha}\Lambda) = P(\Lambda)$.

Just as we iterate the shift $\tau$, we can iterate $\tau^\alpha$. Put $\alpha^1 = \alpha$, and define $\alpha^{k+1}$ inductively by

$$\alpha^{k+1}(\omega) = \alpha^k(\tau^\alpha \omega), \quad k \in N.$$

Each $\alpha^k$ is finite a.e. if $\alpha$ is. Next, define $\beta_0 = 0$, and

$$\beta_k(\omega) = \sum_{j=1}^{k} \alpha^j(\omega), \quad k \in N.$$

We are now in a position to state the following result, which will be needed later.

Theorem 8.2.3. Let $\alpha$ be an a.e. finite optional r.v. relative to a stationary independent process. Then the random vectors $\{V_k, k \in N\}$, where

$$V_k(\omega) = (\alpha^k(\omega), X_{\beta_{k-1}+1}(\omega), \ldots, X_{\beta_k}(\omega)),$$

are independent and identically distributed.

PROOF. The independence follows from Theorem 8.2.2 by our showing that $V_1, \ldots, V_{k-1}$ belong to the pre-$\beta_{k-1}$ field, while $V_k$ belongs to the post-$\beta_{k-1}$ field. The details are left to the reader; cf. Exercise 6 below.

To prove that $V_k$ and $V_{k+1}$ have the same distribution, we may suppose that $k = 1$. Then for each $n \in N$, and each $n$-dimensional Borel set $A$, we have

$$\{\omega: \alpha^2(\omega) = n; (X_{\alpha^1+1}(\omega), \ldots, X_{\alpha^1+\alpha^2}(\omega)) \in A\}$$
$$= \{\omega: \alpha^1(\tau^\alpha \omega) = n; (X_1(\tau^\alpha \omega), \ldots, X_{\alpha^1(\tau^\alpha\omega)}(\tau^\alpha \omega)) \in A\},$$

since

$$X_{\alpha^1}(\tau^\alpha \omega) = X_{\alpha^1(\tau^\alpha\omega)}(\tau^\alpha \omega) = X_{\alpha^2(\omega)}(\tau^\alpha \omega) = X_{\alpha^1(\omega)+\alpha^2(\omega)}(\omega) = X_{\alpha^1+\alpha^2}(\omega)$$

by the quirk of notation that denotes by $X_\alpha(\cdot)$ the function whose value at $\omega$ is given by $X_{\alpha(\omega)}(\omega)$, and by (7) with $n = \alpha^2(\omega)$. By (5) of Sec. 8.1, the preceding set is the $\tau^{-\alpha}$-image (inverse image under $\tau^\alpha$) of the set

$$\{\omega: \alpha^1(\omega) = n; (X_1(\omega), \ldots, X_{\alpha^1}(\omega)) \in A\},$$

and so by (10) has the same probability as the latter. This proves our assertion.

Corollary. The r.v.'s $\{Y_k, k \in N\}$, where

$$Y_k(\omega) = \sum_{n=\beta_{k-1}+1}^{\beta_k} \varphi(X_n(\omega))$$

and $\varphi$ is a Borel measurable function, are independent and identically distributed.

For $\varphi \equiv 1$, $Y_k$ reduces to $\alpha^k$. For $\varphi(x) \equiv x$, $Y_k = S_{\beta_k} - S_{\beta_{k-1}}$. The reader is advised to get a clear picture of the quantities $\alpha^k$, $\beta_k$, and $Y_k$ before proceeding further, perhaps by considering a special case such as (5).

We shall now apply these considerations to obtain results on the "global behavior" of the random walk. These will be broad qualitative statements distinguished by their generality without any additional assumptions.

The optional r.v. to be considered is the first entrance time into the strictly positive half of the real line, namely $A = (0, \infty)$ in (5) above. Similar results hold for $[0, \infty)$; and then by taking the negative of each $X_n$, we may deduce the corresponding result for $(-\infty, 0]$ or $(-\infty, 0)$. Results obtained in this way will be labeled as "dual" below.
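Theorem 8.2.3 lends itself to an empirical check: for the first entrance time into $(0, \infty)$, the successive "ladder blocks" $(\alpha^k, S_{\beta_k} - S_{\beta_{k-1}})$ should be identically distributed. A Monte Carlo sketch (illustrative only; the Gaussian step law with drift $0.3$ is an arbitrary choice that makes every ladder epoch a.e. finite):

```python
import random

random.seed(7)

def two_ladder_blocks(nsteps=400, drift=0.3):
    """Simulate one Gaussian walk with positive drift and return the first two
    ladder blocks [(alpha^1, h_1), (alpha^2, h_2)]: times between successive
    record highs and the corresponding height increments.  Returns None if two
    records are not observed within nsteps (rare with positive drift)."""
    s, record = 0.0, 0.0
    last_t, last_s = 0, 0.0
    blocks = []
    for n in range(1, nsteps + 1):
        s += random.gauss(drift, 1.0)
        if s > record:
            blocks.append((n - last_t, s - last_s))
            last_t, last_s, record = n, s, s
            if len(blocks) == 2:
                return blocks
    return None

samples = [b for b in (two_ladder_blocks() for _ in range(3000)) if b]

def mean(xs):
    return sum(xs) / len(xs)

m_a1 = mean([b[0][0] for b in samples])  # mean of alpha^1
m_a2 = mean([b[1][0] for b in samples])  # mean of alpha^2: should agree
m_h1 = mean([b[0][1] for b in samples])  # mean of S_{beta_1}
m_h2 = mean([b[1][1] for b in samples])  # mean of S_{beta_2} - S_{beta_1}
```

The agreement of the two empirical means (for both the epochs $\alpha^k$ and the heights $Y_k$ with $\varphi(x) \equiv x$) is what Theorem 8.2.3 and its Corollary predict.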
Thus, omitting $A$ from the notation:

(11) $\alpha(\omega) = \min\{n \in N: S_n(\omega) > 0\}$ on $\bigcup_{n=1}^{\infty} \{\omega: S_n(\omega) > 0\}$; $\alpha(\omega) = +\infty$ elsewhere;

and

$$\forall n \in N: \{\alpha = n\} = \{S_j \le 0 \text{ for } 1 \le j \le n-1; S_n > 0\}.$$

Theorem 8.2.5. For the general random walk, there are four mutually exclusive possibilities, each taking place a.e.:

(i) $\forall n \in N$: $S_n = 0$;
(ii) $S_n \to -\infty$;
(iii) $S_n \to +\infty$;
(iv) $-\infty = \liminf_{n\to\infty} S_n < \limsup_{n\to\infty} S_n = +\infty$.

PROOF. If $X = 0$ a.e., then (i) happens. Excluding this, let $\varphi_1 = \limsup_n S_n$. Then $\varphi_1$ is a permutable r.v., hence a constant $c$, possibly $\pm\infty$, a.e. by Theorem 8.1.4. Since

$$\limsup_n S_n = X_1 + \limsup_n (S_n - X_1),$$

we have $\varphi_1 = X_1 + \varphi_2$, where $\varphi_2(\omega) = \varphi_1(\tau\omega) = c$ a.e. Since $X_1 \not\equiv 0$, it follows that $c = +\infty$ or $-\infty$. This means that

either $\limsup_n S_n = +\infty$ or $\limsup_n S_n = -\infty$.

By symmetry we have also

either $\liminf_n S_n = -\infty$ or $\liminf_n S_n = +\infty$.

These double alternatives yield the new possibilities (ii), (iii), or (iv), other combinations being impossible.

This last possibility will be elaborated upon in the next section.

EXERCISES

In Exercises 1-6, the stochastic process is arbitrary.

*1. $\alpha$ is optional if and only if $\forall n \in N$: $\{\alpha \le n\} \in \mathcal{F}_n$.

*2. For each optional $\alpha$ we have $\alpha \in \mathcal{F}_\alpha$ and $X_\alpha \in \mathcal{F}_\alpha$. If $\alpha$ and $\beta$ are both optional and $\alpha \le \beta$, then $\mathcal{F}_\alpha \subset \mathcal{F}_\beta$.

3. If $\alpha_1$ and $\alpha_2$ are both optional, then so are $\alpha_1 \wedge \alpha_2$, $\alpha_1 \vee \alpha_2$, $\alpha_1 + \alpha_2$. If $\alpha$ is optional and $\Delta \in \mathcal{F}_\alpha$, then $\alpha_\Delta$ defined below is also optional:

$$\alpha_\Delta = \alpha \text{ on } \Delta; \quad \alpha_\Delta = +\infty \text{ on } \Omega \setminus \Delta.$$

*4. If $\alpha$ is optional and $\beta$ is optional relative to the post-$\alpha$ process, then $\alpha + \beta$ is optional (relative to the original process).

5. $\forall k \in N$: $\alpha^1 + \cdots + \alpha^k$ is optional. [For the $\alpha$ in (11), this has been called the $k$th ladder variable.]
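The nondegenerate alternatives (ii)-(iv) of Theorem 8.2.5 can be observed in simulation; the following sketch (illustrative only — Gaussian steps with arbitrarily chosen drifts, standing in for walks with negative, positive, and zero mean) exhibits the two drifting cases and the oscillating case:

```python
import random

random.seed(2)

def walk_extremes(step_mean, nsteps=20000):
    """Run one Gaussian random walk; return (running min, running max, final value)."""
    s = lo = hi = 0.0
    for _ in range(nsteps):
        s += random.gauss(step_mean, 1.0)
        lo, hi = min(lo, s), max(hi, s)
    return lo, hi, s

neg = walk_extremes(-0.2)  # case (ii):  S_n -> -infinity
pos = walk_extremes(+0.2)  # case (iii): S_n -> +infinity
osc = walk_extremes(0.0)   # case (iv):  oscillation between both infinities
```

Over a long horizon the drifted walks end far on one side, while the centered walk's running minimum and maximum both grow large in magnitude, as (iv) predicts (a finite horizon can only suggest, not prove, the a.e. statements).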

*6. Prove the following relations:

$$\alpha^k = \alpha \circ \tau^{\beta_{k-1}}; \quad \tau^{\beta_{k-1}} \circ \tau^\alpha = \tau^{\beta_k}; \quad X_{\beta_{k-1}+j} \circ \tau^\alpha = X_{\beta_k+j}.$$

7. If $\alpha$ and $\beta$ are any two optional r.v.'s, then

$$X_{\beta+j}(\tau^\alpha \omega) = X_{\alpha(\omega)+\beta(\tau^\alpha\omega)+j}(\omega);$$
$$(\tau^\beta \circ \tau^\alpha)(\omega) = \tau^{\beta(\tau^\alpha\omega)+\alpha(\omega)}(\omega) \ne \tau^{\beta+\alpha}(\omega) \text{ in general.}$$

*8. Find an example of two optional r.v.'s $\alpha$ and $\beta$ such that $\alpha \le \beta$ but $\mathcal{F}'_\alpha \not\supset \mathcal{F}'_\beta$. However, if $\gamma$ is optional relative to the post-$\alpha$ process and $\beta = \alpha + \gamma$, then indeed $\mathcal{F}'_\alpha \supset \mathcal{F}'_\beta$. As a particular case, $\mathcal{F}'_{\beta_k}$ is decreasing (while $\mathcal{F}_{\beta_k}$ is increasing) as $k$ increases.

9. Find an example of two optional r.v.'s $\alpha$ and $\beta$ such that $\alpha \le \beta$ but $\beta - \alpha$ is not optional.

10. Generalize Theorem 8.2.2 to the case where the domain of definition and finiteness of $\alpha$ is $\Delta$ with $0 < P(\Delta) < 1$. [This leads to a useful extension of the notion of independence. For a given $\Delta$ in $\mathcal{F}$ with $P(\Delta) > 0$, two events $\Lambda$ and $M$, where $M \subset \Delta$, are said to be independent relative to $\Delta$ iff $P\{\Delta \cap \Lambda \cap M\} = P\{\Delta \cap \Lambda\}\, P^\Delta\{M\}$.]

*11. Let $\{X_n, n \in N\}$ be a stationary independent process and $\{\alpha_k, k \in N\}$ a sequence of strictly increasing finite optional r.v.'s. Then $\{X_{\alpha_k+1}, k \in N\}$ is a stationary independent process with the same common distribution as the original process. [This is the gambling-system theorem first given by Doob in 1936.]

12. Prove the Corollary to Theorem 8.2.2.

13. State and prove the analogue of Theorem 8.2.4 with $\alpha$ replaced by $\alpha_{[0,\infty)}$. [The inclusion of 0 in the set of entrance causes a small difference.]

14. In an independent process where all $X_n$ have a common bound, $E\{\alpha\} < \infty$ implies $E\{S_\alpha\} < \infty$ for each optional $\alpha$ [cf. Theorem 5.5.3].

8.3 Recurrence

A basic question about the random walk is the range of the whole process: $\bigcup_{n=1}^{\infty} S_n(\omega)$ for a.e. $\omega$; or, "where does it ever go?" Theorem 8.2.5 tells us that, ignoring the trivial case where it stays put at 0, it either goes off to $-\infty$ or $+\infty$, or fluctuates between them.
But how does it fluctuate? Exercise 9 below will show that the random walk can take large leaps from one end to the other without stopping in any middle range more than a finite number of times. On the other hand, it may revisit every neighborhood of every point an infinite number of times. The latter circumstance calls for a definition.

DEFINITION. The number $x \in R^1$ is called a recurrent value of the random walk $\{S_n, n \in N\}$, iff for every $\epsilon > 0$ we have

(1) $P\{|S_n - x| < \epsilon \text{ i.o.}\} = 1$.

The set of all recurrent values will be denoted by $\mathfrak{R}$.

Taking a sequence of $\epsilon$ decreasing to zero, we see that (1) implies the apparently stronger statement that the random walk is in each neighborhood of $x$ i.o. a.e.

Let us also call the number $x$ a possible value of the random walk iff for every $\epsilon > 0$, there exists $n \in N$ such that $P\{|S_n - x| < \epsilon\} > 0$. Clearly a recurrent value is a possible value of the random walk (see Exercise 2 below).

Theorem 8.3.1. The set $\mathfrak{R}$ is either empty or a closed additive group of real numbers. In the latter case it reduces to the singleton $\{0\}$ if and only if $X = 0$ a.e.; otherwise $\mathfrak{R}$ is either the whole $R^1$ or the infinite cyclic group generated by a nonzero number $c$, namely $\{\pm nc: n \in N^0\}$.

PROOF. Suppose $\mathfrak{R} \ne \emptyset$ throughout the proof. To prove that $\mathfrak{R}$ is a group, let us show that if $x$ is a possible value and $y \in \mathfrak{R}$, then $y - x \in \mathfrak{R}$. Suppose not; then there is a strictly positive probability that from a certain value of $n$ on, $S_n$ will not be in a certain neighborhood of $y - x$. Let us put for $z \in R^1$:

(2) $p_{\epsilon,m}(z) = P\{|S_n - z| \ge \epsilon \text{ for all } n \ge m\}$;

so that $p_{2\epsilon,m}(y - x) > 0$ for some $\epsilon > 0$ and $m \in N$. Since $x$ is a possible value, for the same $\epsilon$ we have a $k$ such that $P\{|S_k - x| < \epsilon\} > 0$. Now the two independent events $|S_k - x| < \epsilon$ and $|S_n - S_k - (y - x)| \ge 2\epsilon$ together imply that $|S_n - y| \ge \epsilon$; hence

(3) $p_{\epsilon,k+m}(y) = P\{|S_n - y| \ge \epsilon \text{ for all } n \ge k + m\}$
$\ge P\{|S_k - x| < \epsilon\}\, P\{|S_n - S_k - (y - x)| \ge 2\epsilon \text{ for all } n \ge k + m\}$.

The last-written probability is equal to $p_{2\epsilon,m}(y - x)$, since $S_n - S_k$ has the same distribution as $S_{n-k}$. It follows that the first term in (3) is strictly positive, contradicting the assumption that $y \in \mathfrak{R}$. We have thus proved that $\mathfrak{R}$ is an additive subgroup of $R^1$.
It is trivial that $\mathfrak{R}$ as a subset of $R^1$ is closed in the Euclidean topology. A well-known proposition (proof?) asserts that the only closed additive subgroups of $R^1$ are those mentioned in the second sentence of the theorem. Since $\mathfrak{R}$ is not empty, $0 \in \mathfrak{R}$. Unless $X \equiv 0$ a.e., it has at least one possible value $x \ne 0$, and the argument above shows that $-x = 0 - x \in \mathfrak{R}$ and consequently also $x = 0 - (-x) \in \mathfrak{R}$. Hence $\mathfrak{R}$ is not a singleton. The theorem is completely proved.

It is clear from the preceding theorem that the key to recurrence is the value 0, for which we have the criterion below.

Theorem 8.3.2. If for some $\epsilon > 0$ we have

(4) $\sum_n P\{|S_n| < \epsilon\} < \infty$,

then

(5) $P\{|S_n| < \epsilon \text{ i.o.}\} = 0$

(for the same $\epsilon$) so that $0 \notin \mathfrak{R}$. If for every $\epsilon > 0$ we have

(6) $\sum_n P\{|S_n| < \epsilon\} = \infty$,

then

(7) $P\{|S_n| < \epsilon \text{ i.o.}\} = 1$

for every $\epsilon > 0$ and so $0 \in \mathfrak{R}$.

Remark. Actually if (4) or (6) holds for any $\epsilon > 0$, then it holds for every $\epsilon > 0$; this fact follows from Lemma 1 below but is not needed here.

PROOF. The first assertion follows at once from the convergence part of the Borel-Cantelli lemma (Theorem 4.2.1). To prove the second part consider

$$F = \liminf_n \{|S_n| \ge \epsilon\};$$

namely $F$ is the event that $|S_n| < \epsilon$ for only a finite number of values of $n$. For each $\omega$ in $F$, there is an $m(\omega)$ such that $|S_n(\omega)| \ge \epsilon$ for all $n \ge m(\omega)$; it follows that if we consider "the last time that $|S_n| < \epsilon$", we have

$$P(F) = \sum_{m=0}^{\infty} P\{|S_m| < \epsilon; |S_n| \ge \epsilon \text{ for all } n \ge m+1\}.$$

Since the two independent events $|S_m| < \epsilon$ and $|S_n - S_m| \ge 2\epsilon$ together imply that $|S_n| \ge \epsilon$, we have

$$1 \ge P(F) \ge \sum_{m=1}^{\infty} P\{|S_m| < \epsilon\}\, P\{|S_n - S_m| \ge 2\epsilon \text{ for all } n \ge m+1\} = \sum_{m=1}^{\infty} P\{|S_m| < \epsilon\}\, p_{2\epsilon,1}(0)$$

by the previous notation (2), since $S_n - S_m$ has the same distribution as $S_{n-m}$. Consequently (6) cannot be true unless $p_{2\epsilon,1}(0) = 0$. We proceed to extend this to show that $p_{2\epsilon,k}(0) = 0$ for every $k \in N$. To this aim we fix $k$ and consider the events

$$A_m = \{|S_m| < \epsilon; |S_n| \ge \epsilon \text{ for all } n \ge m + k\};$$

then $A_m$ and $A_{m'}$ are disjoint whenever $m' \ge m + k$ and consequently (why?)

$$\sum_{m=1}^{\infty} P(A_m) \le k.$$

The argument above for the case $k = 1$ can now be repeated to yield

$$k \ge \sum_{m=1}^{\infty} P\{|S_m| < \epsilon\}\, p_{2\epsilon,k}(0),$$

and so $p_{2\epsilon,k}(0) = 0$ for every $\epsilon > 0$. Thus

$$P(F) = \lim_{k\to\infty} p_{\epsilon,k}(0) = 0,$$

which is equivalent to (7).

A simple sufficient condition for $0 \in \mathfrak{R}$, or equivalently for $\mathfrak{R} \ne \emptyset$, will now be given.

Theorem 8.3.3. If the weak law of large numbers holds for the random walk $\{S_n, n \in N\}$ in the form that $S_n/n \to 0$ in pr., then $\mathfrak{R} \ne \emptyset$.

PROOF. We need two lemmas, the first of which is also useful elsewhere.

Lemma 1. For any $\epsilon > 0$ and $m \in N$ we have

(8) $\sum_{n=0}^{\infty} P\{|S_n| < m\epsilon\} \le 2m \sum_{n=0}^{\infty} P\{|S_n| < \epsilon\}$.

PROOF OF LEMMA 1. It is sufficient to prove that if the right member of (8) is finite, then so is the left member and (8) is true. Put

$$I = (-\epsilon, \epsilon), \quad J = [j\epsilon, (j+1)\epsilon),$$

for a fixed $j \in N$; and denote by $\varphi_I$, $\varphi_J$ the respective indicator functions. Denote also by $\alpha$ the first entrance time into $J$, as defined in (5) of Sec. 8.2 with $Z_n$ replaced by $S_n$ and $A$ by $J$. We have

$$\sum_{n=0}^{\infty} \int_\Omega \varphi_J(S_n)\,dP = \sum_{k=1}^{\infty} \int_{\{\alpha=k\}} \left[\sum_{n=0}^{\infty} \varphi_J(S_n)\right] dP.$$

The typical integral on the right side is by definition of $\alpha$ equal to

$$\int_{\{\alpha=k\}} \left[1 + \sum_{n=k+1}^{\infty} \varphi_J(S_n)\right] dP \le \int_{\{\alpha=k\}} \left[1 + \sum_{n=k+1}^{\infty} \varphi_I(S_n - S_k)\right] dP,$$

since $\{\alpha = k\} \subset \{S_k \in J\}$, and $S_n \in J$ together with $S_k \in J$ implies $S_n - S_k \in I$. By the independence of $\{\alpha = k\}$ and $S_n - S_k$ for $n > k$, the last integral is equal to

$$P\{\alpha = k\} \left[1 + \sum_{n=k+1}^{\infty} P\{S_{n-k} \in I\}\right] = P\{\alpha = k\} \sum_{n=0}^{\infty} P\{|S_n| < \epsilon\}.$$

Summing over $k$, we obtain

$$\sum_{n=0}^{\infty} P\{S_n \in J\} \le \sum_{n=0}^{\infty} P\{|S_n| < \epsilon\};$$

since $(-m\epsilon, m\epsilon)$ is the union of $2m$ intervals of the form $J$, adding the corresponding bounds yields (8).

Lemma 2. Let $\{u_n(m): n \in N^0, m \in N\}$ be a double array of numbers in $[0, 1]$ satisfying the following conditions:

(i) for each $n$, $u_n(m)$ is increasing in $m$;
(ii) there is a constant $c$ such that for every $m \ge 1$: $\sum_{n=0}^{\infty} u_n(m) \le cm \sum_{n=0}^{\infty} u_n(1)$;
(iii) for every $\delta > 0$: $\lim_{n\to\infty} u_n(\delta n) = 1$.

Then

(10) $\sum_{n=0}^{\infty} u_n(1) = \infty$.

Remark. If (ii) is true for all integer $m \ge 1$, then it is true for all real $m \ge 1$, with $c$ doubled.

PROOF OF LEMMA 2. Suppose not; then for every $A > 0$:

$$\infty > \sum_{n=0}^{\infty} u_n(1) \ge \frac{1}{cm} \sum_{n=0}^{\infty} u_n(m) \ge \frac{1}{cm} \sum_{n=0}^{[Am]} u_n(m)$$

$$\ge \frac{1}{cm} \sum_{n=0}^{[Am]} u_n\!\left(\frac{n}{A}\right).$$

Letting $m \to \infty$ and applying (iii) with $\delta = A^{-1}$, we obtain

$$\sum_{n=0}^{\infty} u_n(1) \ge \frac{A}{c};$$

since $A$ is arbitrary, this is a contradiction, which proves the lemma.

To return to the proof of Theorem 8.3.3, we apply Lemma 2 with

$$u_n(m) = P\{|S_n| < m\}.$$

Then condition (i) is obviously satisfied and condition (ii) with $c = 2$ follows from Lemma 1. The hypothesis that $S_n/n \to 0$ in pr. may be written as

$$u_n(\delta n) = P\left\{\left|\frac{S_n}{n}\right| < \delta\right\} \to 1$$

for every $\delta > 0$ as $n \to \infty$, hence condition (iii) is also satisfied. Thus Lemma 2 yields

$$\sum_{n=0}^{\infty} P\{|S_n| < 1\} = +\infty.$$

Applying this to the "magnified" random walk with each $X_n$ replaced by $X_n/\epsilon$, which does not disturb the hypothesis of the theorem, we obtain (6) for every $\epsilon > 0$, and so the theorem is proved.

In practice, the following criterion is more expedient (Chung and Fuchs, 1951).

Theorem 8.3.4. Suppose that at least one of $E(X^+)$ and $E(X^-)$ is finite. Then $\mathfrak{R} \ne \emptyset$ if and only if $E(X) = 0$; otherwise case (ii) or (iii) of Theorem 8.2.5 happens according as $E(X) < 0$ or $> 0$.

PROOF. If $-\infty \le E(X) < 0$ or $0 < E(X) \le +\infty$, then by the strong law of large numbers (as amended by Exercise 1 of Sec. 5.4), we have

$$\frac{S_n}{n} \to E(X) \text{ a.e.},$$

so that either (ii) or (iii) happens as asserted. If $E(X) = 0$, then the same law or its weaker form Theorem 5.2.2 applies; hence Theorem 8.3.3 yields the conclusion.
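By Theorem 8.3.4 the symmetric Bernoulli walk is recurrent, and the divergence (6) behind Theorem 8.3.2 can be seen numerically: $P\{S_{2m} = 0\} = \binom{2m}{m}4^{-m} \sim 1/\sqrt{\pi m}$, so the partial sums of the series grow like $2\sqrt{M/\pi}$. A sketch using the exact binomial probabilities (illustrative, not from the text):

```python
def zero_probabilities(M):
    """P(S_{2m} = 0) = C(2m, m) / 4^m for the symmetric Bernoulli walk,
    for m = 1..M, via the stable recursion p_m = p_{m-1} * (2m-1)/(2m)."""
    p, out = 1.0, []
    for m in range(1, M + 1):
        p *= (2 * m - 1) / (2 * m)
        out.append(p)
    return out

probs = zero_probabilities(10000)
s_2500 = sum(probs[:2500])
s_10000 = sum(probs)
# Partial sums grow like 2*sqrt(M/pi), so quadrupling M roughly doubles them:
growth = s_10000 / s_2500
```

The unbounded growth of these partial sums is exactly condition (6) (for $\epsilon \le 1$ only the even times with $S_n = 0$ contribute), so $0 \in \mathfrak{R}$ by Theorem 8.3.2.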

DEFINITION OF RECURRENT RANDOM WALK. A random walk will be called recurrent iff $\mathfrak{R} \ne \emptyset$; it is degenerate iff $\mathfrak{R} = \{0\}$; and it is of the lattice type iff $\mathfrak{R}$ is generated by a nonzero element.

The most exact kind of recurrence happens when the $X_n$'s have a common distribution which is concentrated on the integers, such that every integer is a possible value (for some $S_n$), and which has mean zero. In this case for each integer $c$ we have $P\{S_n = c \text{ i.o.}\} = 1$. For the symmetrical Bernoullian random walk this was first proved by Polya in 1921.

We shall give another proof of the recurrence part of Theorem 8.3.4, namely that $E(X) = 0$ is a sufficient condition for the random walk to be recurrent as just defined. This method is applicable in cases not covered by the two preceding theorems (see Exercises 6-9 below), and the analytic machinery of ch.f.'s which it employs opens the way to the considerations in the next section.

The starting point is the integrated form of the inversion formula in Exercise 3 of Sec. 6.2. Taking $x = 0$, $u = \epsilon$, and $F$ to be the d.f. of $S_n$, we have

(11) $P(|S_n| < \epsilon) \ge \frac{1}{\epsilon} \int_0^\epsilon P(|S_n| < u)\,du = \frac{1}{\pi\epsilon} \int_{-\infty}^{\infty} \frac{1 - \cos \epsilon t}{t^2}\, f(t)^n\,dt$.

Thus the series in (6) may be bounded below by summing the last expression in (11). The latter does not sum well as it stands, and it is natural to resort to a summability method. The Abelian method suits it well and leads to, for $0 < r < 1$:

(12) $\sum_{n=0}^{\infty} r^n P(|S_n| < \epsilon) \ge \frac{1}{\pi\epsilon} \int_{-\infty}^{\infty} \frac{1 - \cos \epsilon t}{t^2}\, \mathrm{Re}\left[\frac{1}{1 - rf(t)}\right] dt$.

Since

(13) $\mathrm{Re}\left[\frac{1}{1 - rf(t)}\right] = \frac{1 - r\,\mathrm{Re}\,f(t)}{|1 - rf(t)|^2} > 0$

and $(1 - \cos \epsilon t)/t^2 \ge C\epsilon^2$ for $|\epsilon t| \le 1$ and some constant $C$, it follows that for $\eta \le 1/\epsilon$ the right member of (12) is not less than

(14) $\frac{C\epsilon}{\pi} \int_{-\eta}^{\eta} \mathrm{Re}\left[\frac{1}{1 - rf(t)}\right] dt$.

Now the existence of $E(|X|)$ implies by Theorem 6.4.2 that $1 - f(t) = o(t)$ as $t \to 0$. Hence for any given $\delta > 0$ we may choose the $\eta$ above so that $|1 - f(t)| \le \delta|t|$ for $|t| \le \eta$; then

$$|1 - rf(t)|^2 \le (1 - r + r[1 - \mathrm{Re}\,f(t)])^2 + (r\,\mathrm{Im}\,f(t))^2$$

$$\le 2(1 - r)^2 + 2(r\delta t)^2 + (r\delta t)^2 = 2(1 - r)^2 + 3r^2\delta^2 t^2.$$

The integral in (14) is then not less than

$$\int_{-\eta}^{\eta} \frac{(1 - r)\,dt}{2(1 - r)^2 + 3r^2\delta^2 t^2} \ge \frac{1}{3} \int_{-r\eta/(1-r)}^{r\eta/(1-r)} \frac{ds}{1 + \delta^2 s^2}.$$

As $r \uparrow 1$, the right member above tends to $\pi/3\delta$; since $\delta$ is arbitrary, we have proved that the right member of (12) tends to $+\infty$ as $r \uparrow 1$. Since the series in (6) dominates the left member of (12) for every $r$, it follows that (6) is true and so $0 \in \mathfrak{R}$ by Theorem 8.3.2.

EXERCISES

$f$ is the ch.f. of $\mu$.

1. Generalize Theorems 8.3.1 and 8.3.2 to $R^d$. (For $d \ge 3$ the generalization is illusory; see Exercise 12 below.)

*2. If a random walk in $R^d$ is recurrent, then every possible value is a recurrent value.

3. Prove the Remark after Theorem 8.3.2.

*4. Assume that $P\{X_1 = 0\} < 1$. Prove that $x$ is a recurrent value of the random walk if and only if

$$\sum_{n=1}^{\infty} P\{|S_n - x| < \epsilon\} = \infty \text{ for every } \epsilon > 0.$$

5. For a recurrent random walk that is neither degenerate nor of the lattice type, the countable set of points $\{S_n(\omega), n \in N\}$ is everywhere dense in $R^1$ for a.e. $\omega$. Hence prove the following result in Diophantine approximation: if $\gamma$ is irrational, then given any real $x$ and $\epsilon > 0$ there exist integers $m$ and $n$ such that $|m\gamma + n - x| < \epsilon$.

*6. If there exists a $\delta > 0$ such that (the integral below being real-valued)

$$\lim_{r \uparrow 1} \int_{-\delta}^{\delta} \frac{dt}{1 - rf(t)} = \infty,$$

then the random walk is recurrent.

*7. If there exists a $\delta > 0$ such that

$$\sup_{0<r<1} \int_{-\delta}^{\delta} \frac{dt}{1 - rf(t)} < \infty,$$

then the random walk is not recurrent. [HINT: Use Exercise 3 of Sec. 6.2 to show that there exists a constant $C(\epsilon)$ such that

$$P(|S_n| < \epsilon) \le C(\epsilon) \int_{-\infty}^{\infty} \frac{1 - \cos(\epsilon^{-1}x)}{(\epsilon^{-1}x)^2}\,\mu_n(dx)$$

is a strictly positive quadratic form $Q$ in $(t_1, t_2, t_3)$. If $\sum_{i=1}^{3} |t_i|$ ... there exists $d > 0$ such that for every $b > 1$ and $m \ge m(b)$:

(iii') $u_n(m) \ge \frac{dm^2}{n}$ for $m^2 \le n \le (bm)^2$.

Then (10) is true.

*15. Generalize Theorem 8.3.3 to $R^2$ as follows. If the central limit theorem applies in the form that $S_n/\sqrt{n}$ converges in dist. to the unit normal, then the random walk is recurrent. [HINT: Use Exercises 13 and 14 and Exercise 4 of § 4.3. This is sharper than Exercise 11. No proof of Exercise 12 using a similar method is known.]

16. Suppose $E(X) = 0$, $0 < E(X^2) < \infty$, and $\mu$ is of the integer lattice type; then

$$P\{S_{n^2} = 0 \text{ i.o.}\} = 1.$$

17. The basic argument in the proof of Theorem 8.3.2 was the "last time in $(-\epsilon, \epsilon)$". A harder but instructive argument using the "first time" may be given as follows.

$$f_n^{(m)} = P\{|S_j| \ge \epsilon \text{ for } m \le j \le n-1; |S_n| < \epsilon\}$$

It follows by a form of Lemma 1 in this section that

$$\lim_{m\to\infty} \sum_{n=m}^{\infty} f_n^{(m)} \ge \frac{1}{4};$$

now use Theorem 8.1.4.

18. For an arbitrary random walk, if $P\{\forall n \in N: S_n > 0\} > 0$, then

$$\sum_n P\{S_n \le 0, S_{n+1} > 0\} < \infty.$$

Hence if in addition $P\{\forall n \in N: S_n \le 0\} > 0$, then

$$\sum_n |P\{S_n > 0\} - P\{S_{n+1} > 0\}| < \infty;$$

and consequently

$$\sum_n \frac{(-1)^n}{n} P\{S_n > 0\} \text{ converges.}$$

[HINT: For the first series, consider the last time that $S_n \le 0$; for the third series, apply Du Bois-Reymond's test. Cf. Theorem 8.4.4 below; this exercise will be completed in Exercise 15 of Sec. 8.5.]

8.4 Fine structure

In this section we embark on a probing in depth of the r.v. $\alpha$ defined in (11) of Sec. 8.2 and some related r.v.'s.

The r.v. $\alpha$ being optional, the key to its introduction is to break up the time sequence into a pre-$\alpha$ and a post-$\alpha$ era, as already anticipated in the terminology employed with a general optional r.v. We do this with a sort of characteristic functional of the process which has already made its appearance in the last section:

(1) $\sum_{n=0}^{\infty} r^n E(e^{itS_n}) = \sum_{n=0}^{\infty} r^n f(t)^n = \frac{1}{1 - rf(t)}$,

where $0 < r < 1$, $t$ is real, and $f$ is the ch.f. of $X$. Applying the principle just enunciated, we break this up into two parts:

$$E\left\{\sum_{n=0}^{\alpha-1} r^n e^{itS_n}\right\} + E\left\{\sum_{n=\alpha}^{\infty} r^n e^{itS_n}\right\},$$

with the understanding that on the set $\{\alpha = \infty\}$, the first sum above is $\sum_{n=0}^{\infty}$ while the second is empty and hence equal to zero. Now the second part may be written as

(2) $E\left\{\sum_{n=0}^{\infty} r^{\alpha+n} e^{itS_{\alpha+n}}\right\} = E\left\{r^\alpha e^{itS_\alpha} \sum_{n=0}^{\infty} r^n e^{it(S_{\alpha+n} - S_\alpha)}\right\}$.

It follows from (7) of Sec. 8.2 that

$$S_{\alpha+n} - S_\alpha = S_n \circ \tau^\alpha$$

has the same distribution as $S_n$, and by Theorem 8.2.2 that for each $n$ it is independent of $S_\alpha$. [Note that the same fact has been used more than once before, but for a constant $\alpha$.] Hence the right member of (2) is equal to

$$E\{r^\alpha e^{itS_\alpha}\} \sum_{n=0}^{\infty} r^n E\{e^{itS_n}\} = E\{r^\alpha e^{itS_\alpha}\} \cdot \frac{1}{1 - rf(t)},$$

where $r^\alpha e^{itS_\alpha}$ is taken to be 0 for $\alpha = \infty$. Substituting this into (1), we obtain

(3) $\frac{1}{1 - rf(t)}\left[1 - E\{r^\alpha e^{itS_\alpha}\}\right] = E\left\{\sum_{n=0}^{\alpha-1} r^n e^{itS_n}\right\}$.

We have

(4) $1 - E\{r^\alpha e^{itS_\alpha}\} = 1 - \sum_{n=1}^{\infty} \int_{\{\alpha=n\}} r^n e^{itS_n}\,dP$

and

(5) $E\left\{\sum_{n=0}^{\alpha-1} r^n e^{itS_n}\right\} = \sum_{k=1}^{\infty} \int_{\{\alpha=k\}} \sum_{n=0}^{k-1} r^n e^{itS_n}\,dP = 1 + \sum_{n=1}^{\infty} r^n \int_{\{\alpha>n\}} e^{itS_n}\,dP$

by an interchange of summation. Let us record the two power series appearing in (4) and (5) as

$$P(r, t) = 1 - E\{r^\alpha e^{itS_\alpha}\} = \sum_{n=0}^{\infty} r^n p_n(t); \quad Q(r, t) = E\left\{\sum_{n=0}^{\alpha-1} r^n e^{itS_n}\right\} = \sum_{n=0}^{\infty} r^n q_n(t),$$

where

$$p_0(t) = 1, \quad p_n(t) = -\int_{\{\alpha=n\}} e^{itS_n}\,dP = -\int e^{itx}\,U_n(dx);$$
$$q_0(t) = 1, \quad q_n(t) = \int_{\{\alpha>n\}} e^{itS_n}\,dP = \int e^{itx}\,V_n(dx).$$

Now $U_n(\cdot) = P\{\alpha = n; S_n \in \cdot\}$ is a finite measure with support in $(0, \infty)$, while $V_n(\cdot) = P\{\alpha > n; S_n \in \cdot\}$ is a finite measure with support in $(-\infty, 0]$. Thus each $p_n$ is the Fourier transform of a finite measure in $(0, \infty)$, while each $q_n$ is the Fourier transform of a finite measure in $(-\infty, 0]$. Returning to (3), we may now write it as

(6) $\frac{1}{1 - rf(t)}\, P(r, t) = Q(r, t)$.

The next step is to observe the familiar Taylor series:

$$\frac{1}{1 - x} = \exp\left[\sum_{n=1}^{\infty} \frac{x^n}{n}\right], \quad |x| < 1.$$

Thus we have

(7) $\frac{1}{1 - rf(t)} = \exp\left[\sum_{n=1}^{\infty} \frac{r^n}{n}\left(\int_{\{S_n>0\}} e^{itS_n}\,dP + \int_{\{S_n\le 0\}} e^{itS_n}\,dP\right)\right] = \frac{f_-(r, t)}{f_+(r, t)}$,

where

$$f_+(r, t) = \exp\left[-\sum_{n=1}^{\infty} \frac{r^n}{n} \int_{\{S_n>0\}} e^{itS_n}\,dP\right], \quad f_-(r, t) = \exp\left[\sum_{n=1}^{\infty} \frac{r^n}{n} \int_{\{S_n\le 0\}} e^{itS_n}\,dP\right].$$

It may be shown by expanding the exponentials and suitable

rearrangements of the resulting double series that

$$f_+(r, t) = 1 + \sum_{n=1}^{\infty} r^n \varphi_n(t), \quad f_-(r, t) = 1 + \sum_{n=1}^{\infty} r^n \psi_n(t),$$

where each $\varphi_n$ is the Fourier transform of a measure in $(0, \infty)$, while each $\psi_n$ is the Fourier transform of a measure in $(-\infty, 0]$. Substituting (7) into (6) and multiplying through by $f_+(r, t)$, we obtain

(8) $P(r, t)\, f_-(r, t) = Q(r, t)\, f_+(r, t)$.

The next theorem below supplies the basic analytic technique for this development, known as the Wiener-Hopf technique.

Theorem 8.4.1. Let

$$P(r, t) = \sum_{n=0}^{\infty} r^n p_n(t), \quad Q(r, t) = \sum_{n=0}^{\infty} r^n q_n(t),$$
$$P^*(r, t) = \sum_{n=0}^{\infty} r^n p_n^*(t), \quad Q^*(r, t) = \sum_{n=0}^{\infty} r^n q_n^*(t),$$

where $p_0(t) \equiv q_0(t) \equiv p_0^*(t) \equiv q_0^*(t) \equiv 1$; and for $n \ge 1$, $p_n$ and $p_n^*$ as functions of $t$ are Fourier transforms of measures with support in $(0, \infty)$; $q_n$ and $q_n^*$ as functions of $t$ are Fourier transforms of measures in $(-\infty, 0]$. Suppose that for some $r_0 > 0$ the four power series converge for $r$ in $(0, r_0)$ and all real $t$, and the identity

(9) $P(r, t)\, Q^*(r, t) \equiv P^*(r, t)\, Q(r, t)$

holds there. Then

$$P \equiv P^*, \quad Q \equiv Q^*.$$

The theorem is also true if $(0, \infty)$ and $(-\infty, 0]$ are replaced by $[0, \infty)$ and $(-\infty, 0)$, respectively.

PROOF. It follows from (9) and the identity theorem for power series that for every $n \ge 0$:

(10) $\sum_{k=0}^{n} p_k(t)\, q_{n-k}^*(t) = \sum_{k=0}^{n} p_k^*(t)\, q_{n-k}(t)$.

Then for $n = 1$ equation (10) reduces to the following:

$$p_1(t) - p_1^*(t) = q_1(t) - q_1^*(t).$$

By hypothesis, the left member above is the Fourier transform of a finite signed measure $\nu_1$ with support in $(0, \infty)$, while the right member is the Fourier transform of a finite signed measure $\nu_2$ with support in $(-\infty, 0]$. It follows from the uniqueness theorem for such transforms (Exercise 13 of Sec. 6.2) that we must have $\nu_1 \equiv \nu_2$, and so both must be identically zero since they have disjoint supports. Thus $p_1 \equiv p_1^*$ and $q_1 \equiv q_1^*$. To proceed by induction on $n$, suppose that we have proved that $p_j \equiv p_j^*$ and $q_j \equiv q_j^*$ for $0 \le j \le n - 1$. Then it follows from (10) that

$$p_n(t) + q_n^*(t) \equiv p_n^*(t) + q_n(t).$$

Exactly the same argument as before yields that $p_n \equiv p_n^*$ and $q_n \equiv q_n^*$. Hence the induction is complete and the first assertion of the theorem is proved; the second is proved in the same way.

Applying the preceding theorem to (8), we obtain the next theorem in the case $A = (0, \infty)$; the rest is proved in exactly the same way.

Theorem 8.4.2. If $\alpha = \alpha_A$ is the first entrance time into $A$, where $A$ is one of the four sets: $(0, \infty)$, $[0, \infty)$, $(-\infty, 0)$, $(-\infty, 0]$, then we have

(11) $1 - E\{r^\alpha e^{itS_\alpha}\} = \exp\left[-\sum_{n=1}^{\infty} \frac{r^n}{n} \int_{\{S_n \in A\}} e^{itS_n}\,dP\right]$;

(12) $E\left\{\sum_{n=0}^{\alpha-1} r^n e^{itS_n}\right\} = \exp\left[+\sum_{n=1}^{\infty} \frac{r^n}{n} \int_{\{S_n \in A^c\}} e^{itS_n}\,dP\right]$.

From this result we shall deduce certain analytic expressions involving the r.v. $\alpha$. Before we do that, let us list a number of classical Abelian and Tauberian theorems below for ready reference.

(A) If $c_n \ge 0$ and $\sum_{n=0}^{\infty} c_n r^n$ converges for $0 \le r < 1$, then

(*) $\lim_{r \uparrow 1} \sum_{n=0}^{\infty} c_n r^n = \sum_{n=0}^{\infty} c_n$,

finite or infinite.

(B) If $c_n$ are complex numbers and $\sum_{n=0}^{\infty} c_n$ converges, then (*) is true.

(C) If $c_n$ are complex numbers such that $c_n = o(n^{-1})$ [or just $O(n^{-1})$] as $n \to \infty$, and the limit in the left member of (*) exists and is finite, then (*) is true.

(D) If $c_n^{(i)} \ge 0$, $\sum_{n=0}^{\infty} c_n^{(i)} r^n$ converges for $0 \le r < 1$ and diverges for $r = 1$, $i = 1, 2$; and

$$\sum_{k=0}^{n} c_k^{(1)} \sim K \sum_{k=0}^{n} c_k^{(2)}$$

[or more particularly $c_n^{(1)} \sim K c_n^{(2)}$] as $n \to \infty$, where $0 \le K \le +\infty$, then

$$\sum_{n=0}^{\infty} c_n^{(1)} r^n \sim K \sum_{n=0}^{\infty} c_n^{(2)} r^n$$

as $r \uparrow 1$.

(E) If $c_n \ge 0$ and

$$\sum_{n=0}^{\infty} c_n r^n \sim \frac{1}{1 - r}$$

as $r \uparrow 1$, then

$$\sum_{k=0}^{n-1} c_k \sim n.$$

Observe that (C) is a partial converse of (B), and (E) is a partial converse of (D). There is also an analogue of (D), which is actually a consequence of (B): if $c_n$ are complex numbers converging to a finite limit $c$, then as $r \uparrow 1$,

$$\sum_{n=0}^{\infty} c_n r^n \sim \frac{c}{1 - r}.$$

Proposition (A) is trivial. Proposition (B) is Abel's theorem, and proposition (C) is Tauber's theorem in the "little o" version and Littlewood's theorem in the "big O" version; only the former will be needed below. Proposition (D) is an Abelian theorem, and proposition (E) a Tauberian theorem, the latter being sometimes referred to as that of Hardy-Littlewood-Karamata. All four can be found in the admirable book by Titchmarsh, The theory of functions (2nd ed., Oxford University Press, Inc., New York, 1939, pp. 9-10, 224 ff.).

Theorem 8.4.3. The generating function of $\alpha$ in Theorem 8.4.2 is given by

(13) $E\{r^\alpha\} = 1 - \exp\left[-\sum_{n=1}^{\infty} \frac{r^n}{n} P[S_n \in A]\right] = 1 - (1 - r) \exp\left[\sum_{n=1}^{\infty} \frac{r^n}{n} P[S_n \in A^c]\right]$.

We have

(14) $P\{\alpha < \infty\} = 1$ if and only if $\sum_{n=1}^{\infty} \frac{1}{n} P[S_n \in A] = \infty$;

in which case

(15) $E\{\alpha\} = \exp\left[\sum_{n=1}^{\infty} \frac{1}{n} P[S_n \in A^c]\right]$.

PROOF. Setting $t = 0$ in (11), we obtain the first equation in (13), from which the second follows at once through

$$\frac{1}{1 - r} = \exp\left[\sum_{n=1}^{\infty} \frac{r^n}{n}\right] = \exp\left[\sum_{n=1}^{\infty} \frac{r^n}{n} P[S_n \in A] + \sum_{n=1}^{\infty} \frac{r^n}{n} P[S_n \in A^c]\right].$$

Since

$$\lim_{r\uparrow 1} E\{r^\alpha\} = \lim_{r\uparrow 1} \sum_{n=1}^{\infty} r^n P\{\alpha = n\} = \sum_{n=1}^{\infty} P\{\alpha = n\} = P\{\alpha < \infty\}$$

by proposition (A), the middle term in (13) tends to a finite limit, hence also the power series in $r$ there (why?). By proposition (A), the said limit may be obtained by setting $r = 1$ in the series. This establishes (14). Finally, setting $t = 0$ in (12), we obtain

(16) $E\left\{\sum_{n=0}^{\alpha-1} r^n\right\} = \exp\left[\sum_{n=1}^{\infty} \frac{r^n}{n} P[S_n \in A^c]\right]$.

Rewriting the left member in (16) as in (5) and letting $r \uparrow 1$, we obtain

(17) $\lim_{r\uparrow 1} \sum_{n=0}^{\infty} r^n P[\alpha > n] = \sum_{n=0}^{\infty} P[\alpha > n] = E\{\alpha\} \le +\infty$

by proposition (A). The right member of (16) tends to the right member of (15) by the same token, proving (15).

When is $E\{\alpha\}$ in (15) finite? This is answered by the next theorem.*

Theorem 8.4.4. Suppose that $X \not\equiv 0$ and at least one of $E(X^+)$ and $E(X^-)$ is finite; then

(18) $E(X) > 0 \Rightarrow E\{\alpha_{(0,\infty)}\} < \infty$;

(19) $E(X) \le 0 \Rightarrow E\{\alpha_{[0,\infty)}\} = \infty$.

* It can be shown that $S_n \to +\infty$ a.e. if and only if $E\{\alpha_{(0,\infty)}\} < \infty$; see A. J. Lemoine, Annals of Probability 2 (1974).
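Formula (13) can be checked numerically in a case where $E\{r^\alpha\}$ is classical: for the symmetric Bernoulli walk with $A = (0, \infty)$, $\alpha$ is the first passage time to $+1$, whose generating function $(1 - \sqrt{1 - r^2})/r$ is a standard fact not derived in this section. A sketch comparing it with the truncated series in (13) (illustrative only):

```python
import math

def spitzer_side(r, N=4000):
    """Right member of (13), truncated at N terms, for the symmetric Bernoulli
    walk and A = (0, infinity):  1 - exp(-sum_{n<=N} (r^n/n) P(S_n > 0)),
    where P(S_n > 0) = 1/2 for odd n and (1 - C(n, n/2)/2^n)/2 for even n."""
    total, p0 = 0.0, 1.0  # p0 tracks P(S_n = 0) along the even n
    for n in range(1, N + 1):
        if n % 2 == 0:
            p0 *= (n - 1) / n      # P(S_n = 0) = C(n, n/2) / 2^n
            p_pos = (1.0 - p0) / 2.0
        else:
            p_pos = 0.5
        total += r ** n / n * p_pos
    return 1.0 - math.exp(-total)

r = 0.9
closed_form = (1.0 - math.sqrt(1.0 - r * r)) / r  # classical E{r^alpha}
series_form = spitzer_side(r)
```

The two values agree to many decimal places, since the truncation error is of order $r^N$; note also $\lim_{r\uparrow 1}$ of either side is 1, in accordance with (14) and the recurrence of this walk.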

PROOF. If $E(X) > 0$, then $P\{S_n \to +\infty\} = 1$ by the strong law of large numbers. Hence $P\{\lim_n S_n = -\infty\} = 0$, and this implies by the dual of Theorem 8.2.4 that $P\{\alpha_{(-\infty,0)} < \infty\} < 1$. Let us sharpen this slightly to $P\{\alpha_{(-\infty,0]} < \infty\} < 1$. To see this, write $\alpha' = \alpha_{(-\infty,0]}$ and consider $S_{\alpha'}$ as in the proof of Theorem 8.2.4. Clearly $S_{\alpha'} \le 0$, and so if $P\{\alpha' < \infty\} = 1$, one would have $P\{S_n \le 0 \text{ i.o.}\} = 1$, which is impossible. Now apply (14) to $\alpha_{(-\infty,0]}$ and (15) to $\alpha_{(0,\infty)}$ to infer

$$E\{\alpha_{(0,\infty)}\} = \exp\left[\sum_{n=1}^{\infty} \frac{1}{n} P[S_n \le 0]\right] < \infty,$$

proving (18). Next if $E(X) = 0$, then $P\{\alpha_{(-\infty,0)} < \infty\} = 1$ by Theorem 8.3.4. Hence, applying (14) to $\alpha_{(-\infty,0)}$ and (15) to $\alpha_{[0,\infty)}$, we infer

$$E\{\alpha_{[0,\infty)}\} = \exp\left[\sum_{n=1}^{\infty} \frac{1}{n} P[S_n < 0]\right] = \infty.$$

Finally, if $E(X) < 0$, then $P\{\alpha_{[0,\infty)} = \infty\} > 0$ by an argument dual to that given above for $\alpha_{(-\infty,0]}$, and so $E\{\alpha_{[0,\infty)}\} = \infty$ trivially.

Incidentally we have shown that the two r.v.'s $\alpha_{(0,\infty)}$ and $\alpha_{[0,\infty)}$ have both finite or both infinite expectations. Comparing this remark with (15), we derive an analytic by-product as follows.

Corollary. We have

(20) $\sum_{n=1}^{\infty} \frac{1}{n} P[S_n = 0] < \infty$.

This can also be shown, purely analytically, by means of Exercise 25 of Sec. 6.4.

The astonishing part of Theorem 8.4.4 is the case when the random walk is recurrent, which is the case if $E(X) = 0$ by Theorem 8.3.4. Then the set $[0, \infty)$, which is more than half of the whole range, is revisited an infinite number of times. Nevertheless (19) says that the expected time for even one visit is infinite! This phenomenon becomes more paradoxical if one reflects that the same is true for the other half $(-\infty, 0]$, and yet in a single step one of the two halves will certainly be visited. Thus we have:

$$\alpha_{(-\infty,0]} \wedge \alpha_{[0,\infty)} = 1, \quad E\{\alpha_{(-\infty,0]}\} = E\{\alpha_{[0,\infty)}\} = \infty.$$

Another curious by-product concerning the strong law of large numbers is obtained by combining Theorem 8.4.3 with Theorem 8.2.4.
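The Corollary can be illustrated for the symmetric Bernoulli walk: only even $n$ contribute to (20), and the series can be shown (from the generating-function identity of this section; the closed form $\log 2$ is an assertion made here for illustration, not taken from the text) to converge to $\log 2$. A numerical sketch:

```python
import math

def corollary_20_partial(M):
    """Partial sum over n <= 2M of (1/n) P(S_n = 0) for the symmetric
    Bernoulli walk; only n = 2m contributes, with P(S_{2m}=0) = C(2m,m)/4^m,
    computed by the stable recursion p_m = p_{m-1} * (2m-1)/(2m)."""
    p, total = 1.0, 0.0
    for m in range(1, M + 1):
        p *= (2 * m - 1) / (2 * m)
        total += p / (2 * m)
    return total

# Terms decay like m**(-3/2), so the series (20) converges despite recurrence.
val = corollary_20_partial(200000)
```

The convergence here contrasts with the divergence of $\sum_n (1/n)P[S_n \ge 0]$ forced by (19): the walk returns to 0 infinitely often, yet "slowly" enough for (20) to hold.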

Theorem 8.4.5. $S_n/n \to m$ a.e. for a finite constant $m$ if and only if for every $\epsilon > 0$ we have

(21) $\sum_{n=1}^{\infty} \frac{1}{n} P\left\{\left|\frac{S_n}{n} - m\right| > \epsilon\right\} < \infty$.

PROOF. Without loss of generality we may suppose $m = 0$. We know from Theorem 5.4.2 that $S_n/n \to 0$ a.e. if and only if $E(|X|) < \infty$ and $E(X) = 0$. If this is so, consider the stationary independent process $\{X'_n, n \in N\}$, where $X'_n = X_n - \epsilon$, $\epsilon > 0$; and let $S'_n$ and $\alpha' = \alpha'_{(0,\infty)}$ be the corresponding r.v.'s for this modified process. Since $E(X') = -\epsilon$, it follows from the strong law of large numbers that $S'_n \to -\infty$ a.e., and consequently by Theorem 8.2.4 we have $P\{\alpha' < \infty\} < 1$. Hence we have by (14) applied to $\alpha'$:

(22) $\sum_{n=1}^{\infty} \frac{1}{n} P[S_n - n\epsilon > 0] < \infty$.

By considering $X_n + \epsilon$ instead of $X_n - \epsilon$, we obtain a similar result with "$S_n - n\epsilon > 0$" in (22) replaced by "$S_n + n\epsilon < 0$". Combining the two, we obtain (21) when $m = 0$.

Conversely, if (21) is true with $m = 0$, then the argument above yields $P\{\alpha' < \infty\} < 1$, and so by Theorem 8.2.4, $P\{\lim_n S'_n = +\infty\} = 0$. A fortiori we have

$$\forall \epsilon > 0: P\{S'_n > n\epsilon \text{ i.o.}\} = P\{S_n > 2n\epsilon \text{ i.o.}\} = 0.$$

Similarly we obtain $\forall \epsilon > 0$: $P\{S_n < -2n\epsilon \text{ i.o.}\} = 0$, and the last two relations together mean exactly $S_n/n \to 0$ a.e. (cf. Theorem 4.2.2).

Having investigated the stopping time $\alpha$, we proceed to investigate the stopping place $S_\alpha$, where $\alpha < \infty$. The crucial case will be handled first.

Theorem 8.4.6. If $E(X) = 0$ and $0 < E(X^2) = \sigma^2 < \infty$, then

(23) $E\{S_\alpha\} = \frac{\sigma}{\sqrt{2}} \exp\left[\sum_{n=1}^{\infty} \frac{1}{n}\left(\frac{1}{2} - P(S_n \in A)\right)\right] < \infty$.

PROOF. Observe that $E(X) = 0$ implies each of the four r.v.'s $\alpha$ is finite a.e. by Theorem 8.3.4. We now switch from Fourier transform to Laplace transform in (11), and suppose for the sake of definiteness that $A = (0, \infty)$. It is readily verified that Theorem 6.6.5 is applicable, which yields

(24) $1 - E\{r^\alpha e^{-\lambda S_\alpha}\} = \exp\left[-\sum_{n=1}^{\infty} \frac{r^n}{n} \int_{\{S_n>0\}} e^{-\lambda S_n}\,dP\right]$

8.4 FINE STRUCTURE | 297

for 0 ≤ r < 1, 0 ≤ λ < ∞. Letting r ↑ 1 in (24), we obtain an expression for the Laplace transform of S_α, but we must go further by differentiating (24) with respect to λ to obtain

(25) E{r^α e^{−λS_α} S_α} = [ Σ_{n=1}^∞ (r^n/n) ∫_{{S_n>0}} S_n e^{−λS_n} dP ] · exp[ −Σ_{n=1}^∞ (r^n/n) ∫_{{S_n>0}} e^{−λS_n} dP ],

the justification for termwise differentiation being easy, since E{|S_n|} ≤ n E(|X|). If we now set λ = 0 in (25), the result is

(26) E{r^α S_α} = [ Σ_{n=1}^∞ (r^n/n) E{S_n^+} ] · exp[ −Σ_{n=1}^∞ (r^n/n) P[S_n > 0] ].

By Exercise 2 of Sec. 6.4, we have as n → ∞,

(1/n) E{S_n^+} ~ σ/√(2πn),

so that the coefficients in the first power series in (26) are asymptotically equal to σ/√2 times those of

(1 − r)^{−1/2} = Σ_{n=0}^∞ (1/2^{2n}) C(2n, n) r^n,

since

(1/2^{2n}) C(2n, n) ~ 1/√(πn).

It follows from proposition (D) above that

(27) E{S_α} = (σ/√2) lim_{r↑1} exp[ Σ_{n=1}^∞ (r^n/n) (1/2 − P[S_n > 0]) ].

It remains to prove that the limit above is finite, for then the limit of the power series is also finite (why? it is precisely here that the Laplace transform saves the day for us), and since the coefficients are o(1/n) by the central

298 | RANDOM WALK

limit theorem, and certainly O(1/n) in any event, proposition (C) above will identify it as the right member of (23) with A = (0, ∞).

Now by analogy with (27), replacing (0, ∞) by (−∞, 0] and writing α_(−∞,0] as β, we have

(28) E{S_β} = (σ/√2) lim_{r↑1} exp[ Σ_{n=1}^∞ (r^n/n) (1/2 − P(S_n ≤ 0)) ].

Clearly the product of the two exponentials in (27) and (28) is just exp 0 = 1; hence if the limit in (27) were +∞, that in (28) would have to be 0. But since E(X) = 0 and E(X²) > 0, we have P(X < 0) > 0, which implies at once P(S_β < 0) > 0 and consequently E{S_β} < 0. This contradiction proves that the limits in (27) and (28) must both be finite, and the theorem is proved.

Theorem 8.4.7. Suppose that X does not vanish identically and at least one of E(X⁺) and E(X⁻) is finite; and let α = α_(0,∞), β = α_(−∞,0].

(i) If E(X) > 0 but may be +∞, then E(S_α) = E(α) E(X).

(ii) If E(X) = 0, then E(S_α) and E(S_β) are both finite if and only if E(X²) < ∞.

PROOF. The assertion (i) is a consequence of (18) and Wald's equation (Theorem 5.5.3 and Exercise 8 of Sec. 5.5). The "if" part of assertion (ii) has been proved in the preceding theorem; indeed we have even "evaluated" E(S_α). To prove the "only if" part, we apply (11) to both α and β in the preceding proof and multiply the results together to obtain the remarkable equation:

(29) [1 − E{r^α e^{itS_α}}] [1 − E{r^β e^{itS_β}}] = exp[ −Σ_{n=1}^∞ (r^n/n) f(t)^n ] = 1 − r f(t).

Setting r = 1, we obtain for t ≠ 0:

(1 − f(t))/t² = [(1 − E{e^{itS_α}})/(−it)] · [(1 − E{e^{itS_β}})/(+it)].

Letting t ↓ 0, the right member above tends to E{S_α} E{−S_β} by Theorem 6.4.2. Hence the left member has a real and finite limit. This implies E(X²) < ∞ by Exercise 1 of Sec. 6.4.

8.5 Continuation

Our next study concerns the r.v.'s M_n and M defined in (12) and (13) of Sec. 8.2. It is convenient to introduce a new r.v. L_n, which is the first time

8.5 CONTINUATION | 299

(necessarily belonging to N_n^0) that M_n is attained:

(1) ∀n ∈ N^0: L_n(ω) = min{k ∈ N_n^0: S_k(ω) = M_n(ω)};

note that L_0 = 0. We shall also use the abbreviation

(2) α = α_(0,∞),  β = α_(−∞,0]

as in Theorems 8.4.6 and 8.4.7.

For each n consider the special permutation below:

(3) p_n = ( 1, 2, ..., n ; n, n−1, ..., 1 ),

which amounts to a "reversal" of the ordered set of indices in N_n. Recalling that β ∘ p_n is the r.v. whose value at ω is β(p_n ω), and S_k ∘ p_n = S_n − S_{n−k} for 1 ≤ k ≤ n, we have

{β ∘ p_n > n} = ∩_{k=1}^n {S_k ∘ p_n > 0} = ∩_{k=1}^n {S_n > S_{n−k}} = ∩_{k=0}^{n−1} {S_n > S_k} = {L_n = n}.

It follows from (8) of Sec. 8.1 that

(4) ∫_{{β>n}} e^{itS_n} dP = ∫_{{β∘p_n>n}} e^{it(S_n∘p_n)} dP = ∫_{{L_n=n}} e^{itS_n} dP.

Applying (5) and (12) of Sec. 8.4 to β and substituting (4), we obtain

(5) Σ_{n=0}^∞ r^n ∫_{{L_n=n}} e^{itS_n} dP = exp[ +Σ_{n=1}^∞ (r^n/n) ∫_{{S_n>0}} e^{itS_n} dP ];

applying (5) and (12) of Sec. 8.4 to α and substituting the obvious relation {α > n} = {L_n = 0}, we obtain

(6) Σ_{n=0}^∞ r^n ∫_{{L_n=0}} e^{itS_n} dP = exp[ +Σ_{n=1}^∞ (r^n/n) ∫_{{S_n≤0}} e^{itS_n} dP ].
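The set identity {L_n = n} = ∩_{k=0}^{n−1} {S_n > S_k} used in deriving (4) holds path by path, so it can be checked by exhaustive enumeration. The following sketch (not part of the text; the ±1 increments and the value n = 7 are arbitrary choices) verifies it for every path:

```python
from itertools import product

# check {L_n = n} = intersection over k < n of {S_n > S_k}, path by path;
# increments ±1 and n = 7 are illustrative choices only
n = 7
checked = 0
for xs in product([1, -1], repeat=n):
    S = [0]
    for x in xs:
        S.append(S[-1] + x)
    M = max(S)                                      # M_n = max(0, S_1, ..., S_n)
    L = min(k for k in range(n + 1) if S[k] == M)   # L_n: first time M_n is attained
    assert (L == n) == all(S[n] > S[k] for k in range(n))
    checked += 1
```

Any other choice of increment values would do equally well, since the identity is purely combinatorial.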

300 | RANDOM WALK

Theorem 8.5.1. We have

(7) Σ_{n=0}^∞ r^n E{e^{itM_n}} = exp[ Σ_{n=1}^∞ (r^n/n) E(e^{itS_n^+}) ];

M is finite a.e. if and only if

(8) Σ_{n=1}^∞ (1/n) P{S_n > 0} < ∞,

in which case we have

(9) E{e^{itM}} = exp[ Σ_{n=1}^∞ (1/n) (E(e^{itS_n^+}) − 1) ].

PROOF. Observe the basic equation that follows at once from the meaning of L_n:

(10) {L_n = k} = {L_k = k} ∩ {L_{n−k} ∘ τ^k = 0},

where τ^k is the kth iterate of the shift. Since the two events on the right side of (10) are independent, we obtain for each n ∈ N^0, and real t and u:

(11) E{e^{itM_n} e^{iu(S_n−M_n)}} = Σ_{k=0}^n ∫_{{L_n=k}} e^{itS_k} e^{iu(S_n−S_k)} dP
  = Σ_{k=0}^n ∫_{{L_k=k}∩{L_{n−k}∘τ^k=0}} e^{itS_k} e^{iu(S_{n−k}∘τ^k)} dP
  = Σ_{k=0}^n ∫_{{L_k=k}} e^{itS_k} dP · ∫_{{L_{n−k}=0}} e^{iuS_{n−k}} dP.

It follows that

(12) Σ_{n=0}^∞ r^n E{e^{itM_n} e^{iu(S_n−M_n)}} = [ Σ_{n=0}^∞ r^n ∫_{{L_n=n}} e^{itS_n} dP ] · [ Σ_{n=0}^∞ r^n ∫_{{L_n=0}} e^{iuS_n} dP ].

Setting u = 0 in (12) and using (5) as it stands and (6) with t = 0, we obtain

Σ_{n=0}^∞ r^n E{e^{itM_n}} = exp[ Σ_{n=1}^∞ (r^n/n) ( ∫_{{S_n>0}} e^{itS_n} dP + ∫_{{S_n≤0}} 1 dP ) ],

which reduces to (7).

Next, by Theorem 8.2.4, M < ∞ a.e. if and only if P{α < ∞} < 1, or equivalently by (14) of Sec. 8.4 if and only if (8) holds. In this case the convergence theorem for ch.f.'s asserts that

E{e^{itM}} = lim_{n→∞} E{e^{itM_n}},

8.5 CONTINUATION | 301

and consequently

E{e^{itM}} = lim_{r↑1} (1 − r) Σ_{n=0}^∞ r^n E{e^{itM_n}}
  = lim_{r↑1} exp[ −Σ_{n=1}^∞ (r^n/n) ] · exp[ Σ_{n=1}^∞ (r^n/n) E(e^{itS_n^+}) ]
  = lim_{r↑1} exp[ Σ_{n=1}^∞ (r^n/n) (E(e^{itS_n^+}) − 1) ],

where the first equation is by proposition (B) in Sec. 8.4. Since

Σ_{n=1}^∞ |E(e^{itS_n^+}) − 1| ≤ 2 Σ_{n=1}^∞ P[S_n > 0] < ∞

by (8), the last-written limit above is equal to the right member of (9), by proposition (B). Theorem 8.5.1 is completely proved.

By switching to Laplace transforms in (7) as done in the proof of Theorem 8.4.6 and using proposition (E) in Sec. 8.4, it is possible to give an "analytic" derivation of the second assertion of Theorem 8.5.1 without recourse to the "probabilistic" result in Theorem 8.2.4. This kind of mathematical gambit should appeal to the curious as well as the obstinate; see Exercise 9 below. Another interesting exercise is to establish the following result:

(13) E(M_n) = Σ_{k=1}^n (1/k) E(S_k^+)

by differentiating (7) with respect to t. But a neat little formula such as (13) deserves a simpler proof, so here it is.

Proof of (13). Writing

M_n = ∨_{j=0}^n S_j,

and dropping "dP" in the integrals below:

E(M_n) = ∫_{{M_n>0}} ∨_{k=1}^n S_k
  = ∫_{{S_n>0}} ∨_{k=1}^n S_k + ∫_{{M_{n−1}>0; S_n≤0}} ∨_{k=1}^{n−1} S_k
  = ∫_{{S_n>0}} S_1 + ∫_{{S_n>0}} [ ∨_{k=1}^n S_k − S_1 ] + ∫_{{M_{n−1}>0; S_n≤0}} M_{n−1}.

302 | RANDOM WALK

Call the last three integrals f_1, f_2, and f_3. We have on grounds of symmetry (the cyclical permutations of (X_1, ..., X_n) preserve both the measure and the set {S_n > 0}):

f_1 = (1/n) ∫_{{S_n>0}} S_n.

Apply the cyclical permutation

(1, 2, ..., n) → (2, 3, ..., n, 1)

to f_2 to obtain

f_2 = ∫_{{S_n>0}} M_{n−1},

since ∨_{k=1}^n S_k − S_1 = max(0, S_2 − S_1, ..., S_n − S_1) goes over into M_{n−1} under this permutation, while S_n, and hence {S_n > 0}, is preserved. Obviously, we have

f_3 = ∫_{{M_{n−1}>0; S_n≤0}} M_{n−1} = ∫_{{S_n≤0}} M_{n−1}.

Gathering these, we obtain

E(M_n) = (1/n) ∫_{{S_n>0}} S_n + ∫_{{S_n>0}} M_{n−1} + ∫_{{S_n≤0}} M_{n−1} = (1/n) E(S_n^+) + E(M_{n−1}),

and (13) follows by induction on n.
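Formula (13) is exact for every n and every common distribution, so it can be confirmed by brute-force enumeration. The following sketch is not from the text; the two-point increment distribution, its probabilities, and n = 6 are arbitrary illustrative choices:

```python
from itertools import product

def check_spitzer_moment(values, probs, n):
    """Compare E(M_n) with sum_{k=1}^n E(S_k^+)/k, cf. (13), by enumerating paths."""
    lhs = 0.0                     # E(M_n), where M_n = max(0, S_1, ..., S_n)
    es_plus = [0.0] * (n + 1)     # es_plus[k] accumulates E(S_k^+)
    for path in product(range(len(values)), repeat=n):
        p = 1.0
        for i in path:
            p *= probs[i]
        s, m, partial = 0.0, 0.0, []
        for i in path:
            s += values[i]
            partial.append(s)
            m = max(m, s)
        lhs += p * m
        for k in range(1, n + 1):
            es_plus[k] += p * max(partial[k - 1], 0.0)
    rhs = sum(es_plus[k] / k for k in range(1, n + 1))
    return lhs, rhs

lhs, rhs = check_spitzer_moment([1.0, -2.0], [0.6, 0.4], 6)
```

The two sides agree to floating-point accuracy, as (13) predicts.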

8.5 CONTINUATION | 303

For n ∈ N^0 let ν_n denote the number of k ∈ N_n such that S_k > 0, and ν_n′ = n − ν_n the number of k ∈ N_n such that S_k ≤ 0; dually, let L_n′(ω) = max{k ∈ N_n^0: S_k(ω) = min_{0≤j≤n} S_j(ω)} be the last time in N_n^0 that the minimum of the first n + 1 partial sums is attained.

Lemma. For each n ∈ N^0, the two random vectors

(L_n, S_n) and (n − L_n′, S_n)

have the same distribution.

PROOF. This will follow from (14) if we show that

(15) ∀k ∈ N_n^0: p_n^{−1}{L_n = k} = {L_n′ = n − k}.

Now for p_n ω the first n + 1 partial sums S_j(p_n ω), j ∈ N_n^0, are

0, ω_n, ω_n + ω_{n−1}, ..., ω_n + ⋯ + ω_{n−j+1}, ..., ω_n + ⋯ + ω_1,

which is the same as

S_n − S_n, S_n − S_{n−1}, S_n − S_{n−2}, ..., S_n − S_{n−j}, ..., S_n − S_0,

from which (15) follows by inspection.

Theorem 8.5.2. For each n ∈ N^0, the random vectors

(16) (L_n, S_n) and (ν_n, S_n)

have the same distribution; and the random vectors

(16′) (L_n′, S_n) and (ν_n′, S_n)

have the same distribution.

PROOF. For n = 0 there is nothing to prove; for n = 1, the assertion about (16) is trivially true since {L_1 = 0} = {S_1 ≤ 0} = {ν_1 = 0}; similarly for (16′). We shall prove the general case by simultaneous induction, supposing that both assertions have been proved when n is replaced by n − 1. For each k ∈ N_{n−1}^0 and y ∈ R¹, let us put

G(y) = P{L_{n−1} = k; S_{n−1} ≤ y},  H(y) = P{ν_{n−1} = k; S_{n−1} ≤ y}.

Then the induction hypothesis implies that G = H. Since X_n is independent of F_{n−1} and so of the vector (L_{n−1}, S_{n−1}), we have for each x ∈ R¹:

(17) P{L_{n−1} = k; S_n ≤ x} = ∫_{−∞}^∞ F(x − y) dG(y)
  = ∫_{−∞}^∞ F(x − y) dH(y)
  = P{ν_{n−1} = k; S_n ≤ x},

304 | RANDOM WALK

where F is the common d.f. of each X_n. Now observe that on the set {S_n ≤ 0} we have L_n = L_{n−1} by definition; hence if x ≤ 0:

(18) {ω: L_n(ω) = k; S_n(ω) ≤ x} = {ω: L_{n−1}(ω) = k; S_n(ω) ≤ x}.

On the other hand, on the set {S_n ≤ 0} we have ν_n = ν_{n−1}, so that if x ≤ 0,

(19) {ω: ν_n(ω) = k; S_n(ω) ≤ x} = {ω: ν_{n−1}(ω) = k; S_n(ω) ≤ x}.

Combining (17) with (18) and (19), we obtain

(20) ∀k ∈ N_n^0, x ≤ 0: P{L_n = k; S_n ≤ x} = P{ν_n = k; S_n ≤ x}.

Next, if k ∈ N_n^0 and x > 0, then by similar arguments (on the set {S_n > 0} we have L_n′ = L_{n−1}′ and ν_n′ = ν_{n−1}′):

{ω: L_n′(ω) = n − k; S_n(ω) > x} = {ω: L_{n−1}′(ω) = n − k; S_n(ω) > x},
{ω: ν_n′(ω) = n − k; S_n(ω) > x} = {ω: ν_{n−1}′(ω) = n − k; S_n(ω) > x}.

Using (16′) when n is replaced by n − 1, we obtain the analogue of (20):

(21) ∀k ∈ N_n^0, x > 0: P{L_n′ = n − k; S_n > x} = P{ν_n′ = n − k; S_n > x}.

The left members in (21) and (22) below are equal by the lemma above, while the right members are trivially equal since ν_n + ν_n′ = n:

(22) ∀k ∈ N_n^0, x > 0: P{L_n = k; S_n > x} = P{ν_n = k; S_n > x}.

Combining (20) and (22) for x = 0, we have

(23) ∀k ∈ N_n^0: P{L_n = k} = P{ν_n = k};

subtracting (22) from (23), we obtain the equation in (20) for x > 0; hence it is true for every x, proving the assertion about (16). Similarly for (16′), and the induction is complete.

As an immediate consequence of Theorem 8.5.2, the obvious relation (10) is translated into the by-no-means obvious relation (24) below:

Theorem 8.5.3. We have for k ∈ N_n^0:

(24) P{ν_n = k} = P{ν_k = k} P{ν_{n−k} = 0}.

If the common distribution of each X_n is symmetric with no atom at zero, then

(25) ∀k ∈ N_n^0: P{ν_n = k} = (−1)^n C(−1/2, k) C(−1/2, n − k).
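The equidistribution of (L_n, S_n) and (ν_n, S_n) in (16) can be checked by exhaustive enumeration in a special case. The sketch below (not from the text; the symmetric ±1 walk and n = 8 are arbitrary illustrative choices, with each path carrying probability 2^{−n}) compares the two joint counting measures:

```python
from itertools import product
from collections import Counter

# joint distribution of (L_n, S_n) versus (nu_n, S_n), cf. (16),
# for the symmetric ±1 walk; every path has probability 2^-n
n = 8
left, right = Counter(), Counter()
for xs in product([1, -1], repeat=n):
    S = [0]
    for x in xs:
        S.append(S[-1] + x)
    M = max(S)
    L = min(k for k in range(n + 1) if S[k] == M)     # first time the maximum is attained
    nu = sum(1 for k in range(1, n + 1) if S[k] > 0)  # number of strictly positive sums
    left[(L, S[n])] += 1
    right[(nu, S[n])] += 1
```

The two counters coincide, in agreement with Theorem 8.5.2.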

8.5 CONTINUATION | 305

PROOF. Let us denote the number on the right side of (25), which is equal to

(1/2^{2n}) C(2k, k) C(2n − 2k, n − k),

by a_n(k). Then for each n ∈ N, {a_n(k), k ∈ N_n^0} is a well-known probability distribution. For n = 1 we have

P{ν_1 = 0} = P{ν_1 = 1} = 1/2 = a_1(0) = a_1(1),

so that (25) holds trivially. Suppose now that it holds when n is replaced by n − 1; then for k ∈ N_{n−1} we have by (24):

P{ν_n = k} = [(−1)^k C(−1/2, k)] · [(−1)^{n−k} C(−1/2, n − k)] = a_n(k).

It follows that

P{ν_n = 0} + P{ν_n = n} = 1 − Σ_{k=1}^{n−1} P{ν_n = k} = 1 − Σ_{k=1}^{n−1} a_n(k) = a_n(0) + a_n(n).

Under the hypotheses of the theorem, it is clear by considering the dual random walk that the two terms in the first member above are equal; since the two terms in the last member are obviously equal, they are all equal and the theorem is proved.

Stirling's formula and elementary calculus now lead to the famous "arcsine law", first discovered by Paul Lévy (1939) for Brownian motion.

Theorem 8.5.4. If the common distribution of a stationary independent process is symmetric, then we have

∀x ∈ [0, 1]: lim_{n→∞} P{ν_n/n ≤ x} = (2/π) arcsin √x = (1/π) ∫_0^x du/√(u(1 − u)).

This limit theorem also holds for an independent, not necessarily stationary process, in which each X_n has mean 0 and variance 1 and such that the classical central limit theorem is applicable. This can be proved by the same method (invariance principle) as Theorem 7.3.3.
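The distribution {a_n(k)} in the proof above and its arcsine limit lend themselves to a quick numerical check. The sketch below (not from the text; n = 300, the evaluation point x = 1/4, and the tolerance are arbitrary choices) computes a_n(k) in exact rational arithmetic and compares the discrete distribution function with the arcsine law:

```python
from fractions import Fraction
from math import comb, asin, pi

def a(n, k):
    # a_n(k) = C(2k,k) C(2n-2k,n-k) / 2^(2n), the number on the right side of (25)
    return Fraction(comb(2 * k, k) * comb(2 * (n - k), n - k), 4 ** n)

n = 300
dist = [a(n, k) for k in range(n + 1)]

total = sum(dist)                       # should be exactly 1
ends_equal = (dist[0] == dist[n])       # a_n(0) = a_n(n), as used in the proof

# discrete distribution function at x = 1/4 against the limit
# (2/pi) arcsin sqrt(1/4) = 1/3 from Theorem 8.5.4
cdf = float(sum(dist[: n // 4 + 1]))
err = abs(cdf - (2 / pi) * asin(0.25 ** 0.5))
```

For n of this size the discrete value is already within about 0.01 of the arcsine limit.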

306 | RANDOM WALK

EXERCISES

1. Derive (3) of Sec. 8.4 by considering

E{r^α e^{itS_α}} = Σ_{n=1}^∞ r^n [ ∫_{{α>n−1}} e^{itS_n} dP − ∫_{{α>n}} e^{itS_n} dP ].

*2. Under the conditions of Theorem 8.4.6, show that

E{S_α} = −lim_{n→∞} ∫_{{α>n}} S_n dP.

3. Find an expression for the Laplace transform of S_α. Is the corresponding formula for the Fourier transform valid?

*4. Prove (13) of Sec. 8.5 by differentiating (7) there; justify the steps.

5. Prove that

P[M = 0] = exp[ −Σ_{n=1}^∞ (1/n) P[S_n > 0] ]

and deduce that P[M = 0] > 0 if and only if (8) holds.

6. If M < ∞ a.e., then it has an infinitely divisible distribution.

7. Prove that

Σ_{n=0}^∞ r^n P{L_n = 0; S_n = 0} = exp[ Σ_{n=1}^∞ (r^n/n) P[S_n = 0] ].

[HINT: One way to deduce this is to switch to Laplace transforms in (6) of Sec. 8.5 and let λ → ∞.]

8. Prove that the left member of the equation in Exercise 7 is equal to

[ 1 − Σ_{n=1}^∞ P{α′ = n; S_n = 0} r^n ]^{−1},

where α′ = α_[0,∞); hence prove its convergence.

*9. Prove that

Σ_{n=1}^∞ (1/n) P[S_n > 0] < ∞

8.5 CONTINUATION | 307

implies P(M < ∞) = 1 as follows. From the Laplace transform version of (7) of Sec. 8.5, show that for λ > 0,

lim_{r↑1} (1 − r) Σ_{n=0}^∞ r^n E{e^{−λM_n}}

exists and is finite, say = χ(λ). Now use proposition (E) in Sec. 8.4 and apply the convergence theorem for Laplace transforms (Theorem 6.6.3).

*10. Define a sequence of r.v.'s {Y_n, n ∈ N^0} as follows:

Y_0 = 0,  Y_{n+1} = (Y_n + X_{n+1})^+ for n ∈ N^0,

where {X_n, n ∈ N} is a stationary independent sequence. Prove that for each n, Y_n and M_n have the same distribution. [This approach is useful in queuing theory.]

11. If −∞ < E(X) < 0, then … [HINT: If V_n = max_{1≤j≤n} …]

12. … P[S_n > 0] − … [HINT: Consider lim_{r↑1} Σ_{m=0}^∞ P{ν_m = m} r^m and use (24) of Sec. 8.5.]

13. If E(X) = 0, E(X²) = σ², 0 < σ² < ∞, then as n → ∞ we have

P[ν_n = 0] ~ e^c/√(πn),  P[ν_n = n] ~ e^{−c}/√(πn),

where

c = Σ_{n=1}^∞ (1/n) { 1/2 − P[S_n > 0] }.

[HINT: Consider

lim_{r↑1} (1 − r)^{1/2} Σ_{n=0}^∞ P[ν_n = 0] r^n

308 | RANDOM WALK

as in the proof of Theorem 8.4.6, and use the following lemma: if {p_n} is a decreasing sequence of positive numbers such that

Σ_{k=1}^n p_k ~ 2n^{1/2},

then p_n ~ n^{−1/2}.]

*14. Prove Theorem 8.5.4.

15. For an arbitrary random walk, we have

Σ_{n=1}^∞ ((−1)^n / n) P{S_n > 0} < ∞.

[HINT: Half of the result is given in Exercise 18 of Sec. 8.3. For the remaining case, apply proposition (C) in the 0-form to equation (5) of Sec. 8.5 with L_n replaced by ν_n and t = 0. This result is due to D. L. Hanson and M. Katz.]

Bibliographical Note

For Sec. 8.3, see

K. L. Chung and W. H. J. Fuchs, On the distribution of values of sums of random variables, Mem. Am. Math. Soc. 6 (1951), 1-12.

K. L. Chung and D. Ornstein, On the recurrence of sums of random variables, Bull. Am. Math. Soc. 68 (1962), 30-32.

Theorems 8.4.1 and 8.4.2 are due to G. Baxter; see

G. Baxter, An analytical approach to finite fluctuation problems in probability, J. d'Analyse Math. 9 (1961), 31-70.

Historically, recurrence problems for Bernoullian random walks were first discussed by Pólya in 1921.

The rest of Sec. 8.4, as well as Theorem 8.5.1, is due largely to F. L. Spitzer; see

F. L. Spitzer, A combinatorial lemma and its application to probability theory, Trans. Am. Math. Soc. 82 (1956), 323-339.

F. L. Spitzer, A Tauberian theorem and its probability interpretation, Trans. Am. Math. Soc. 94 (1960), 150-169.

See also his book below, which, however, treats only the lattice case:

F. L. Spitzer, Principles of Random Walk, D. Van Nostrand Co., Inc., Princeton, N.J., 1964.

8.5 CONTINUATION | 309

The latter part of Sec. 8.5 is based on

W. Feller, On combinatorial methods in fluctuation theory, Surveys in Probability and Statistics (the Harald Cramér volume), 75-91, Almquist & Wiksell, Stockholm, 1959.

Theorem 8.5.3 is due to E. S. Andersen. Feller [13] also contains much material on random walks.

9 | Conditioning. Markov property. Martingale

9.1 Basic properties of conditional expectation

If Λ is any set in F with P(Λ) > 0, we define P_Λ(·) on F as follows:

(1) P_Λ(E) = P(Λ ∩ E)/P(Λ).

Clearly P_Λ is a p.m. on F; it is called the "conditional probability relative to Λ". The integral with respect to this p.m. is called the "conditional expectation relative to Λ":

(2) E_Λ(Y) = ∫_Ω Y(ω) P_Λ(dω) = (1/P(Λ)) ∫_Λ Y(ω) P(dω).

If P(Λ) = 0, we decree that P_Λ(E) = 0 for every E ∈ F. This convention is expedient as in (3) and (4) below.

Let now {Λ_n, n ≥ 1} be a countable measurable partition of Ω, namely:

Ω = ∪_{n=1}^∞ Λ_n,  Λ_n ∈ F,  Λ_m ∩ Λ_n = ∅ if m ≠ n.

9.1 BASIC PROPERTIES OF CONDITIONAL EXPECTATION | 311

Then we have

(3) P(E) = Σ_{n=1}^∞ P(Λ_n ∩ E) = Σ_{n=1}^∞ P(Λ_n) P_{Λ_n}(E);

(4) E(Y) = Σ_{n=1}^∞ ∫_{Λ_n} Y(ω) P(dω) = Σ_{n=1}^∞ P(Λ_n) E_{Λ_n}(Y),

provided that E(Y) is defined. We have already used such decompositions before, for example in the proof of Kolmogorov's inequality (Theorem 5.3.1):

∫_Λ S_n² dP = Σ_{k=1}^n P(Λ_k) E_{Λ_k}(S_n²).

Another example is in the proof of Wald's equation (Theorem 5.5.3), where

E(S_N) = Σ_{k=1}^∞ P(N = k) E_{{N=k}}(S_N).

Thus the notion of conditioning gives rise to a decomposition when a given event or r.v. is considered on various parts of the sample space, on each of which some particular information may be obtained.

Let us, however, reflect for a few moments on the even more elementary example below. From a pack of 52 playing cards one card is drawn and seen to be a spade. What is the probability that a second card drawn from the remaining deck will also be a spade? Since there are 51 cards left, among which are 12 spades, it is clear that the required probability is 12/51. But is this the conditional probability P_Λ(E) defined above, where Λ = "first card is a spade" and E = "second card is a spade"? According to the definition,

P_Λ(E) = P(Λ ∩ E)/P(Λ) = [(13 · 12)/(52 · 51)] / (13/52) = 12/51,

where the denominator and numerator have been separately evaluated by elementary combinatorial formulas. Hence the answer to the question above is indeed "yes"; but this verification would be futile if we did not have another way to evaluate P_Λ(E) as first indicated. Indeed, conditional probability is often used to evaluate a joint probability by turning the formula (1) around as follows:

P(Λ ∩ E) = P(Λ) P_Λ(E) = (13/52) · (12/51).

In general, it is used to reduce the calculation of a probability or expectation to a modified one which can be handled more easily.
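The card computation above can be carried out in exact rational arithmetic; the following minimal sketch (not part of the text) checks both directions of formula (1):

```python
from fractions import Fraction

# the card example: Λ = "first card is a spade", E = "second card is a spade"
p_joint = Fraction(13, 52) * Fraction(12, 51)   # P(Λ ∩ E) = (13·12)/(52·51)
p_A = Fraction(13, 52)                          # P(Λ)
p_cond = p_joint / p_A                          # P_Λ(E) by definition (1)
```

Here p_cond comes out to 12/51, and multiplying back by P(Λ) recovers the joint probability, as in the "turned around" form of (1).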

312 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

Let G be the Borel field generated by a countable partition {Λ_n}, for example that by a discrete r.v. X, where Λ_n = {X = a_n}. Given an integrable r.v. Y, we define the function E_G(Y) on Ω by:

(5) E_G(Y) = Σ_n E_{Λ_n}(Y) 1_{Λ_n}(·).

Thus E_G(Y) is a discrete r.v. that assumes the value E_{Λ_n}(Y) on the set Λ_n, for each n. Now we can rewrite (4) as follows:

E(Y) = Σ_n ∫_{Λ_n} E_G(Y) dP = ∫_Ω E_G(Y) dP.

Furthermore, for any Λ ∈ G, Λ is the union of a subcollection of the Λ_n's (see Exercise 9 of Sec. 2.1), and the same manipulation yields

(6) ∀Λ ∈ G: ∫_Λ Y dP = ∫_Λ E_G(Y) dP.

In particular, this shows that E_G(Y) is integrable. Formula (6) equates two integrals over the same set with an essential difference: while the integrand Y on the left belongs to F, the integrand E_G(Y) on the right belongs to the subfield G. [The fact that E_G(Y) is discrete is incidental to the nature of G.] It holds for every Λ in the subfield G, but not necessarily for a set in F\G.

Now suppose that there are two functions φ_1 and φ_2, both belonging to G, such that

∀Λ ∈ G: ∫_Λ Y dP = ∫_Λ φ_i dP,  i = 1, 2.

Let Λ = {ω: φ_1(ω) > φ_2(ω)}; then Λ ∈ G and so

∫_Λ (φ_1 − φ_2) dP = 0.

Hence P(Λ) = 0; interchanging φ_1 and φ_2 above we conclude that φ_1 = φ_2 a.e. We have therefore proved that the E_G(Y) in (6) is unique up to an equivalence. Let us agree to use E_G(Y) or E(Y | G) to denote the corresponding equivalence class, and let us call any particular member of the class a "version" of the conditional expectation.

The results above are valid for an arbitrary Borel subfield G and will be stated in the theorem below.

Theorem 9.1.1. If E(|Y|) < ∞ and G is a Borel subfield of F, then there exists a unique equivalence class of integrable r.v.'s E(Y | G) belonging to G such that (6) holds.
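For a finite sample space the partition version (5) and the defining relation (6) can be verified directly. The sketch below is illustrative only (the six-point space, the weights, the values of Y, and the three-atom partition are all arbitrary choices):

```python
from fractions import Fraction

# a six-point sample space with equal weights (illustrative choice)
omega = range(6)
P = {w: Fraction(1, 6) for w in omega}
Y = {0: 3, 1: -1, 2: 4, 3: 0, 4: 2, 5: 5}

# partition generating the subfield G
partition = [{0, 1}, {2, 3, 4}, {5}]

def integral(f, A):
    return sum(f[w] * P[w] for w in A)

# E_G(Y) as in (5): on each atom, the average of Y over that atom
EgY = {}
for atom in partition:
    avg = integral(Y, atom) / sum(P[w] for w in atom)
    for w in atom:
        EgY[w] = avg

# defining relation (6): the integrals of Y and E_G(Y) agree on every atom,
# hence on every union of atoms, i.e. on every set in G
for atom in partition:
    assert integral(Y, atom) == integral(EgY, atom)
assert integral(Y, set(omega)) == integral(EgY, set(omega))
```

Since every set in G is a disjoint union of atoms, checking (6) on the atoms suffices.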

9.1 BASIC PROPERTIES OF CONDITIONAL EXPECTATION | 313

PROOF. Consider the set function ν on G:

∀Λ ∈ G: ν(Λ) = ∫_Λ Y dP.

It is finite-valued and countably additive, hence a "signed measure" on G.

314 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

where "Y″ ⊥ G" means that E(Y″Z) = 0 for every bounded Z ∈ G. In the language of Banach space, Y′ is the "projection" of Y on G and Y″ its "orthogonal complement".

For the Borel field F{X} generated by the r.v. X, we write also E(Y | X) for E(Y | F{X}); similarly for E(Y | X_1, ..., X_n). The next theorem clarifies certain useful connections.

Theorem 9.1.2. One version of the conditional expectation E(Y | X) is given by φ(X), where φ is a Borel measurable function on R¹. Furthermore, if we define the signed measure λ on B¹ by

∀B ∈ B¹: λ(B) = ∫_{X^{−1}(B)} Y dP,

and the p.m. of X by μ, then φ is one version of the Radon-Nikodym derivative dλ/dμ.

PROOF. The first assertion of the theorem is a particular case of the following lemma.

Lemma. If Z ∈ F{X}, then Z = φ(X) for some extended-valued Borel measurable function φ.

PROOF OF THE LEMMA. It is sufficient to prove this for a bounded positive Z (why?). Then there exists a sequence of simple functions Z_m which increases to Z everywhere, and each Z_m is of the form

Σ_{j=1}^ℓ c_j 1_{Λ_j},

where Λ_j ∈ F{X}. Hence Λ_j = X^{−1}(B_j) for some B_j ∈ B¹ (see Exercise 11 of Sec. 3.1). Thus if we take

φ_m = Σ_{j=1}^ℓ c_j 1_{B_j},

we have Z_m = φ_m(X). Since φ_m(X) → Z, it follows that φ_m converges on the range of X. But this range need not be Borel or even Lebesgue measurable (Exercise 6 of Sec. 3.1). To overcome this nuisance, we put

∀x ∈ R¹: φ(x) = lim sup_{m→∞} φ_m(x).

Then Z = lim_m φ_m(X) = φ(X), and φ is Borel measurable, proving the lemma.

To prove the second assertion: given any B ∈ B¹, let Λ = X^{−1}(B); then by Theorem 3.2.2 we have

∫_Λ E(Y | X) dP = ∫_Ω 1_B(X) φ(X) dP = ∫_{R¹} 1_B(x) φ(x) dμ = ∫_B φ(x) dμ.

9.1 BASIC PROPERTIES OF CONDITIONAL EXPECTATION | 315

Hence by (6),

λ(B) = ∫_Λ Y dP = ∫_B φ(x) dμ.

This being true for every B in B¹, it follows that φ is a version of the derivative dλ/dμ. Theorem 9.1.2 is proved.

As a consequence of the theorem, the function E(Y | X) of ω is constant a.e. on each set on which X(ω) is constant. By an abuse of notation, the φ(x) above is sometimes written as E(Y | X = x). We may then write, for example, for each real c:

∫_{{X≤c}} Y dP = ∫_{(−∞,c]} E(Y | X = x) dP{X ≤ x}.

Generalization to a finite number of X's is straightforward. Thus one version of E(Y | X_1, ..., X_n) is φ(X_1, ..., X_n), where φ is an n-dimensional Borel measurable function, and by E(Y | X_1 = x_1, ..., X_n = x_n) is meant φ(x_1, ..., x_n).

It is worthwhile to point out the extreme cases of E(Y | G):

E(Y | J) = E(Y),  E(Y | F) = Y;  a.e.,

where J is the trivial field {∅, Ω}. If G is the field generated by one set Λ: {∅, Λ, Λᶜ, Ω}, then E(Y | G) is equal to E_Λ(Y) on Λ and E_{Λᶜ}(Y) on Λᶜ. All these equations, as hereafter, are between equivalence classes of r.v.'s.

We shall suppose the pair (F, P) to be complete and each Borel subfield G of F to be augmented (see Exercise 20 of Sec. 2.2). But even if G is not augmented and G̃ is its augmentation, it follows from the definition that E(Y | G) = E(Y | G̃), since an r.v. belonging to G̃ is equal to one belonging to G almost everywhere (why?). Finally, if G_0 is a field generating G, or just a collection of sets whose finite disjoint unions form such a field, then the validity of (6) for each Λ in G_0 is sufficient for (6) as it stands. This follows easily from Theorem 2.2.3.

The next result is basic.

Theorem 9.1.3. Let Y and YZ be integrable r.v.'s and Z ∈ G; then we have

(8) E(YZ | G) = Z E(Y | G) a.e.

[Here "a.e." is necessary, since we have not stipulated to regard Z as an equivalence class of r.v.'s, although conditional expectations are so regarded by definition. Nevertheless we shall sometimes omit such obvious "a.e."'s from now on.]

316 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

PROOF. As usual we may suppose Y ≥ 0, Z ≥ 0 (see property (ii) below). The proof consists in observing that the right member of (8) belongs to G and satisfies the defining relation for the left member, namely:

(9) ∀Λ ∈ G: ∫_Λ Z E(Y | G) dP = ∫_Λ ZY dP.

For (9) is true if Z = 1_M, where M ∈ G; hence it is true if Z is a simple r.v. belonging to G, and consequently also for each Z in G by monotone convergence, whether the limits are finite or positive infinite. Note that the integrability of Z E(Y | G) is part of the assertion of the theorem.

Recall that when G is generated by a partition {Λ_n}, we have exhibited a specific version (5) of E(Y | G). Now consider the corresponding P(M | G) as a function of the pair (M, ω):

P(M | G)(ω) = Σ_n P(M | Λ_n) 1_{Λ_n}(ω).

For each fixed M, as a function of ω this is a specific version of P(M | G). For each fixed ω_0, as a function of M this is a p.m. on F, given by P{· | Λ_m} for ω_0 ∈ Λ_m. Let us denote for a moment the family of p.m.'s arising in this manner by C(ω_0, ·). We have then for each integrable r.v. Y and each ω_0 in Ω:

(10) E(Y | G)(ω_0) = Σ_n E(Y | Λ_n) 1_{Λ_n}(ω_0) = ∫_Ω Y(ω) C(ω_0, dω).

Thus the specific version of E(Y | G) may be evaluated, at each ω_0 ∈ Ω, by integrating Y with respect to the p.m. C(ω_0, ·). In this case the conditional expectation E(· | G) as a functional on integrable r.v.'s is an integral in the literal sense. But in general such a representation is impossible (see Doob [16, Sec. 1.9]) and we must fall back on the defining relations to deduce its properties, usually from the unconditional analogues. Below are some of the simplest examples, in which X and X_n are integrable r.v.'s.

(i) If X ∈ G, then E(X | G) = X a.e.; this is true in particular if X is a constant a.e.

(ii) E(X_1 + X_2 | G) = E(X_1 | G) + E(X_2 | G).

(iii) If X_1 ≤ X_2, then E(X_1 | G) ≤ E(X_2 | G).

(iv) |E(X | G)| ≤ E(|X| | G).

(v) If X_n ↑ X, then E(X_n | G) ↑ E(X | G).

9.1 BASIC PROPERTIES OF CONDITIONAL EXPECTATION | 317

(vii) If |X_n| ≤ Y where E(Y) < ∞ and X_n → X, then E(X_n | G) → E(X | G).

To illustrate, (iii) is proved by observing that for each Λ ∈ G:

∫_Λ E(X_1 | G) dP = ∫_Λ X_1 dP ≤ ∫_Λ X_2 dP = ∫_Λ E(X_2 | G) dP;

it follows that for Λ = {E(X_1 | G) > E(X_2 | G)} we have P(Λ) = 0. The inequality (iv) may be proved by (ii), (iii), and the equation X = X⁺ − X⁻. To prove (v), let the limit of E(X_n | G) be Z, which exists a.e. by (iii). Then for each Λ ∈ G, we have by the monotone convergence theorem:

∫_Λ Z dP = lim_n ∫_Λ E(X_n | G) dP = lim_n ∫_Λ X_n dP = ∫_Λ X dP.

Thus Z satisfies the defining relation for E(X | G), and it belongs to G with the E(X_n | G)'s; hence Z = E(X | G).

To appreciate the caution that is necessary in handling conditional expectations, let us consider the Cauchy-Schwarz inequality:

E(|XY| | G)² ≤ E(X² | G) E(Y² | G).

If we try to extend one of the usual proofs based on the positiveness of the quadratic form in λ: E((X + λY)² | G), the question arises that for each λ the quantity is defined only up to a null set N_λ, and the union of these over all λ cannot be ignored without comment. The reader is advised to think this difficulty through to its logical end, and then get out of it by restricting the λ's to the rationals. Here is another way out: start from the following trivial inequality:

|X| |Y| / (αβ) ≤ X²/(2α²) + Y²/(2β²),

where α = E(X² | G)^{1/2}, β = E(Y² | G)^{1/2}, and αβ > 0; apply the operation E{· | G} using (ii) and (iii) above to obtain

E{ |XY| | G } / (αβ) ≤ (1/2) E{ X²/α² | G } + (1/2) E{ Y²/β² | G }.

Now use Theorem 9.1.3 to infer that this can be reduced to

E{ |XY| | G } / (αβ) ≤ α²/(2α²) + β²/(2β²) = 1,

the desired inequality.

The following theorem is a generalization of Jensen's inequality in Sec. 3.2.
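The conditional Cauchy-Schwarz inequality can be observed pointwise on a finite space, where the version of E(· | G) is the atomwise average of (5). The sketch below is illustrative only (the space, weights, values of X and Y, and the partition are arbitrary choices):

```python
from fractions import Fraction

# a six-point space with a three-atom partition (all values arbitrary)
omega = range(6)
P = {w: Fraction(1, 6) for w in omega}
X = {0: 1, 1: -2, 2: 3, 3: 0, 4: 2, 5: -1}
Y = {0: 2, 1: 1, 2: -1, 3: 4, 4: 0, 5: 3}
partition = [{0, 1, 2}, {3, 4}, {5}]

def cond_exp(f, parts):
    """Version (5): on each atom of the partition, the average of f."""
    out = {}
    for atom in parts:
        avg = sum(f[w] * P[w] for w in atom) / sum(P[w] for w in atom)
        for w in atom:
            out[w] = avg
    return out

lhs = cond_exp({w: abs(X[w] * Y[w]) for w in omega}, partition)
ex2 = cond_exp({w: X[w] ** 2 for w in omega}, partition)
ey2 = cond_exp({w: Y[w] ** 2 for w in omega}, partition)
ok = all(lhs[w] ** 2 <= ex2[w] * ey2[w] for w in omega)
```

Here the inequality E(|XY| | G)² ≤ E(X² | G) E(Y² | G) holds at every point ω, with no null-set subtleties, because the partition is finite.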

318 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

Theorem 9.1.4. If φ is a convex function on R¹ and X and φ(X) are integrable r.v.'s, then for each G:

(11) φ(E(X | G)) ≤ E(φ(X) | G).

PROOF. If X is a simple r.v. taking the values {y_j} on the sets {Λ_j}, 1 ≤ j ≤ ℓ, which form a partition of Ω, then

E(X | G) = Σ_{j=1}^ℓ y_j P(Λ_j | G),

and since Σ_{j=1}^ℓ P(Λ_j | G) = 1 a.e., the convexity of φ yields

φ(E(X | G)) ≤ Σ_{j=1}^ℓ φ(y_j) P(Λ_j | G) = E(φ(X) | G).

In the general case we have, for every x and y:

φ(x) − φ(y) ≥ φ′(y)(x − y),

9.1 BASIC PROPERTIES OF CONDITIONAL EXPECTATION | 319

where φ′ is the right-hand derivative of φ. Hence

φ(X) − φ(E(X | G)) ≥ φ′(E(X | G)) {X − E(X | G)}.

The right member may not be integrable; but let Λ = {ω: |E(X | G)| ≤ A} for A > 0. Replace X by X 1_Λ in the above, take expectations of both sides, and let A ↑ ∞. Observe that

E{φ(X 1_Λ) | G} = E{φ(X) 1_Λ + φ(0) 1_{Λᶜ} | G} = E{φ(X) | G} 1_Λ + φ(0) 1_{Λᶜ}.

We now come to the most important property of conditional expectation relating to changing fields. Note that when Λ = Ω, the defining relation (6) may be written as

E(E_G(Y)) = E(Y).

This has an immediate generalization.

Theorem 9.1.5. If Y is integrable and G_1 ⊂ G_2, then

(13) E_{G_2}(Y) = E_{G_1}(Y) if and only if E_{G_2}(Y) ∈ G_1;

and

(14) E_{G_1}(E_{G_2}(Y)) = E_{G_1}(Y) = E_{G_2}(E_{G_1}(Y)).

PROOF. Since Y satisfies trivially the defining relation for E(Y | G_1), it will be equal to the latter if and only if Y ∈ G_1. Now if we replace our basic F by G_2 and Y by E_{G_2}(Y), the assertion (13) ensues. Next, since

E_{G_1}(Y) ∈ G_1 ⊂ G_2,

the second equation in (14) follows from the same observation. It remains to prove the first equation in (14). Let Λ ∈ G_1; then Λ ∈ G_2; applying the defining relation twice, we obtain

∫_Λ E_{G_1}(E_{G_2}(Y)) dP = ∫_Λ E_{G_2}(Y) dP = ∫_Λ Y dP.

Hence E_{G_1}(E_{G_2}(Y)) satisfies the defining relation for E_{G_1}(Y); since it belongs to G_1, it is equal to the latter.

As a particular case, we note, for example,

(15) E{E(Y | X_1, X_2) | X_1} = E(Y | X_1) = E{E(Y | X_1) | X_1, X_2}.

To understand the meaning of this formula, we may think of X_1 and X_2 as discrete, each producing a countable partition. The superimposition of both
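Both equations in (14) can be checked concretely when G_1 and G_2 are generated by nested finite partitions, using the atomwise-average version (5). The following sketch is illustrative only (the eight-point space, the r.v. Y, and the two partitions are arbitrary choices):

```python
from fractions import Fraction

omega = range(8)
P = {w: Fraction(1, 8) for w in omega}
Y = {w: w * w for w in omega}                  # an arbitrary integrable r.v.

G2 = [{0, 1}, {2, 3}, {4, 5}, {6, 7}]          # finer partition
G1 = [{0, 1, 2, 3}, {4, 5, 6, 7}]              # coarser partition: G1 ⊂ G2

def cond_exp(f, parts):
    out = {}
    for atom in parts:
        avg = sum(f[w] * P[w] for w in atom) / sum(P[w] for w in atom)
        for w in atom:
            out[w] = avg
    return out

first = cond_exp(cond_exp(Y, G2), G1)          # E_{G1}(E_{G2}(Y))
middle = cond_exp(Y, G1)                       # E_{G1}(Y)
second = cond_exp(cond_exp(Y, G1), G2)         # E_{G2}(E_{G1}(Y))
```

Averaging over the fine atoms and then over the coarse ones gives the same result as averaging over the coarse atoms directly, and since E_{G1}(Y) is already constant on each fine atom, conditioning it on G_2 leaves it unchanged.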

320 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

partitions yields sets of the form {Λ_j ∩ M_k}. The "inner" expectation on the left of (15) is the result of replacing Y by its "average" over each Λ_j ∩ M_k. Now if we replace this average r.v. again by its average over each Λ_j, then the result is the same as if we had simply replaced Y by its average over each Λ_j. The second equation has a similar interpretation.

Another kind of simple situation is afforded by the probability triple (U^n, B^n, m^n) discussed in Example 2 of Sec. 3.3. Let x_1, ..., x_n be the coordinate r.v.'s and Y = f(x_1, ..., x_n), where f is (Borel) measurable and integrable. It is easy to see that for 1 ≤ k ≤ n − 1,

E(Y | x_1, ..., x_k) = ∫_0^1 ⋯ ∫_0^1 f(x_1, ..., x_n) dx_{k+1} ⋯ dx_n,

while for k = n, the left side is just Y (a.e.). Thus, taking conditional expectation with respect to certain coordinate r.v.'s here amounts to integrating out the other r.v.'s. The first equation in (15) in this case merely asserts the possibility of iterated integration, while the second reduces to a banality, which we leave to the reader to write out.

EXERCISES

1. Prove Lemma 2 in Sec. 7.2 by using conditional probabilities.

2. Let {Λ_n} be a countable measurable partition of Ω, and E ∈ F with P(E) > 0; then we have for each m:

P_E(Λ_m) = P(Λ_m) P_{Λ_m}(E) / Σ_n P(Λ_n) P_{Λ_n}(E).

[This is Bayes' rule.]

*3. If X is an integrable r.v., Y a bounded r.v., and G a Borel subfield, then we have

E{E(X | G) Y} = E{X E(Y | G)}.

4. Prove Fatou's lemma and Lebesgue's dominated convergence theorem for conditional expectations.

*5. Give an example where E(E(Y | X_1) | X_2) ≠ E(E(Y | X_2) | X_1). [HINT: It is sufficient to give an example where E(X | Y) ≠ E{E(X | Y) | X}; consider an Ω with three points.]

*6. Prove that σ²(E(Y | G)) ≤ σ²(Y), where σ² is the variance.

7. If the random vector (X, Y) has the probability density function p(·,·) and X is integrable, then one version of E(X | X + Y = z) is given by

∫ x p(x, z − x) dx / ∫ p(x, z − x) dx.

9.1 BASIC PROPERTIES OF CONDITIONAL EXPECTATION | 321

*8. In the case above, there exists an integrable function φ(·,·) with the property that for each B ∈ B¹,

∫_B φ(x, y) dy

is a version of P{Y ∈ B | X = x}. [This is called a "conditional density function in the wide sense" and

∫_{−∞}^η φ(x, y) dy

is the corresponding conditional distribution in the wide sense:

φ(x, η) = P{Y ≤ η | X = x}.

The term "wide sense" refers to the fact that these are functions on R¹ rather than on Ω; see Doob [16, Sec. I.9].]

9. Let the p(·,·) above be the 2-dimensional normal density:

(1 / (2π σ_1 σ_2 √(1 − ρ²))) exp[ −(1/(2(1 − ρ²))) ( x²/σ_1² − 2ρxy/(σ_1σ_2) + y²/σ_2² ) ],

where σ_1 > 0, σ_2 > 0, 0 < ρ < 1. Find the φ mentioned in Exercise 8 and

∫_{−∞}^∞ y φ(x, y) dy.

The latter should be a version of E(Y | X = x); verify it.

10. Let G be a B.F., X and Y two r.v.'s such that

E(Y² | G) = X²,  E(Y | G) = X.

Then Y = X a.e.

11. As in Exercise 10 but suppose now for any f ∈ C_K:

E{Y² | f(X)} = E{X² | f(X)};  E{X | f(X)} = E{Y | f(X)}.

Then Y = X a.e. [HINT: By a monotone class theorem the equations hold for f = 1_B, B ∈ B¹; now apply Exercise 10 with G = F{X}.]

12. Recall that X_n in L¹ converges weakly in L¹ to X iff E(X_n Y) → E(XY) for every bounded r.v. Y. Prove that this implies E(X_n | G) converges weakly in L¹ to E(X | G) for any Borel subfield G of F.

*13. Let S be an r.v. such that P{S > t} = e^{−t}, t > 0. Compute E{S | S ∧ t} and E{S | S ∨ t} for each t > 0.

9.2 Conditional independence; Markov property

In this section we shall first apply the concept of conditioning to independent r.v.'s and to random walks, the two basic types of stochastic processes that have been extensively studied in this book; then we shall generalize to a Markov process. Another important generalization, the martingale theory, will be taken up in the next three sections.

All the B.F.'s (Borel fields) below will be subfields of F. The B.F.'s {F_α, α ∈ A}, where A is an arbitrary index set, are said to be conditionally independent relative to the B.F. G, iff for any finite collection of sets Λ_1, ..., Λ_n such that Λ_j ∈ F_{α_j} and the α_j's are distinct indices from A, we have

P(⋂_{j=1}^n Λ_j | G) = ∏_{j=1}^n P(Λ_j | G).

When G is the trivial B.F., this reduces to unconditional independence.

Theorem 9.2.1. For each α ∈ A let F^(α) denote the smallest B.F. containing all F_β, β ∈ A − {α}. Then the F_α's are conditionally independent relative to G if and only if for each α and Λ_α ∈ F_α we have

P(Λ_α | F^(α) ∨ G) = P(Λ_α | G),

where F^(α) ∨ G denotes the smallest B.F. containing F^(α) and G.

PROOF. It is sufficient to prove this for two B.F.'s F_1 and F_2, since the general result follows by induction (how?). Suppose then that for each Λ ∈ F_1 we have

(1) P(Λ | F_2 ∨ G) = P(Λ | G).

Let M ∈ F_2; then

P(ΛM | G) = E{P(ΛM | F_2 ∨ G) | G} = E{P(Λ | F_2 ∨ G) 1_M | G}
= E{P(Λ | G) 1_M | G} = P(Λ | G) P(M | G),

where the first equation follows from Theorem 9.1.5, the second and fourth from Theorem 9.1.3, and the third from (1). Thus F_1 and F_2 are conditionally independent relative to G. Conversely, suppose the latter assertion is true; then

E{P(Λ | G) 1_M | G} = P(Λ | G) P(M | G) = P(ΛM | G) = E{P(Λ | F_2 ∨ G) 1_M | G},

where the first equation follows from Theorem 9.1.3, the second by hypothesis, and the third as shown above. Hence for every Δ ∈ G we have

∫_{MΔ} P(Λ | G) dP = ∫_{MΔ} P(Λ | F_2 ∨ G) dP = P(ΛMΔ).

It follows from Theorem 2.1.2 (take the class to be finite disjoint unions of sets like MΔ) or more quickly from Exercise 10 of Sec. 2.1 that this remains true if MΔ is replaced by any set in F_2 ∨ G. The resulting equation implies (1), and the theorem is proved.

When G is trivial and each F_α is generated by a single r.v., we have the following corollary.

Corollary. Let {X_α, α ∈ A} be an arbitrary set of r.v.'s. For each α let F^(α) denote the Borel field generated by all the r.v.'s in the set except X_α. Then the X_α's are independent if and only if: for each α and each B ∈ B^1, we have

P{X_α ∈ B | F^(α)} = P{X_α ∈ B} a.e.

An equivalent form of the corollary is as follows: for each integrable r.v. Y belonging to the Borel field generated by X_α, we have

(2) E{Y | F^(α)} = E{Y}.

This is left as an exercise.

Roughly speaking, independence among r.v.'s is equivalent to the lack of effect by conditioning relative to one another. However, such intuitive statements must be treated with caution, as shown by the following example which will be needed later.

If F_1, F_2, and F_3 are three Borel fields such that F_1 ∨ F_2 is independent of F_3, then for each integrable X ∈ F_1 we have

(3) E{X | F_2 ∨ F_3} = E{X | F_2}.

Instead of a direct verification, which is left to the reader, it is interesting to deduce this from Theorem 9.2.1 by proving the following proposition.

If F_1 ∨ F_2 is independent of F_3, then F_1 and F_3 are conditionally independent relative to F_2.

To see this, let Λ_1 ∈ F_1, Λ_3 ∈ F_3. Since

P(Λ_1 Λ_2 Λ_3) = P(Λ_1 Λ_2) P(Λ_3) = ∫_{Λ_2} P(Λ_1 | F_2) P(Λ_3) dP

for every Λ_2 ∈ F_2, we have

P(Λ_1 Λ_3 | F_2) = P(Λ_1 | F_2) P(Λ_3) = P(Λ_1 | F_2) P(Λ_3 | F_2),

which proves the proposition.
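Equation (3) can be checked exactly on a small discrete space. The sketch below (the joint law p12 is a made-up example, not from the text) takes a dependent pair (b_1, b_2) that is independent of a fair bit b_3, and verifies that conditioning X = b_1 on (b_2, b_3) gives the same r.v. as conditioning on b_2 alone.

```python
from fractions import Fraction as F

# Joint law of (b1, b2): a dependent pair, chosen independent of a fair
# bit b3; any dependent pair works for checking (3).
p12 = {(0, 0): F(1, 2), (0, 1): F(1, 8), (1, 0): F(1, 8), (1, 1): F(1, 4)}
omega = {(i, j, k): p * F(1, 2) for (i, j), p in p12.items() for k in (0, 1)}

def cond_exp(x, given):
    """E{x | given} as a table over the atoms of sigma(given): average x
    over each atom {given = const}, weighted by P."""
    acc = {}
    for w, p in omega.items():
        k = given(w)
        num, den = acc.get(k, (F(0), F(0)))
        acc[k] = (num + p * x(w), den + p)
    return {k: num / den for k, (num, den) in acc.items()}

X = lambda w: w[0]                         # X = b1, measurable w.r.t. F1
e23 = cond_exp(X, lambda w: (w[1], w[2]))  # E{X | b2, b3}
e2 = cond_exp(X, lambda w: w[1])           # E{X | b2}

# (3): since sigma(b1) v sigma(b2) is independent of sigma(b3), the
# conditional expectation ignores b3 entirely.
assert all(e23[(j, k)] == e2[j] for j in (0, 1) for k in (0, 1))
print(e2)
```

Note that conditioning on b_2 genuinely matters here (the two values of E{X | b_2} differ); only the extra conditioning on the independent field is without effect.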

Next we ask: if X_1 and X_2 are independent, what is the effect of conditioning X_1 + X_2 by X_1?

Theorem 9.2.2. Let X_1 and X_2 be independent r.v.'s with p.m.'s μ_1 and μ_2; then for each B ∈ B^1:

(4) P{X_1 + X_2 ∈ B | X_1} = μ_2(B − X_1) a.e.

More generally, if {X_n, n ≥ 1} is a sequence of independent r.v.'s with p.m.'s {μ_n, n ≥ 1}, and S_n = Σ_{j=1}^n X_j, then for each B ∈ B^1:

(5) P{S_n ∈ B | S_1, ..., S_{n−1}} = μ_n(B − S_{n−1}) = P{S_n ∈ B | S_{n−1}} a.e.

PROOF. To prove (4), since its right member belongs to F{X_1}, it is sufficient to verify that it satisfies the defining relation for its left member. Let Λ ∈ F{X_1}; then Λ = X_1^{−1}(A) for some A ∈ B^1. It follows from Theorem 3.2.2 that

∫_Λ μ_2(B − X_1) dP = ∫_A μ_2(B − x_1) μ_1(dx_1).

Writing μ = μ_1 × μ_2 and applying Fubini's theorem to the right side above, then using Theorem 3.2.3, we obtain

∫_A μ_1(dx_1) ∫_{x_1+x_2∈B} μ_2(dx_2) = ∫∫_{x_1∈A; x_1+x_2∈B} μ(dx_1, dx_2)
= ∫_{X_1∈A; X_1+X_2∈B} dP = P{X_1 ∈ A; X_1 + X_2 ∈ B}.

This establishes (4).

To prove (5), we begin by observing that the second equation has just been proved. Next we observe that since {X_1, ..., X_n} and {S_1, ..., S_n} obviously generate the same Borel field, the left member of (5) is just

P{S_n ∈ B | X_1, ..., X_{n−1}}.

Now it is trivial that as a function of (X_1, ..., X_{n−1}), S_n "depends on them only through their sum S_{n−1}". It thus appears obvious that the first term in (5) should depend only on S_{n−1}, namely belong to F{S_{n−1}} (rather than the larger F{S_1, ..., S_{n−1}}). Hence the equality of the first and third terms in (5) should be a consequence of the assertion (13) in Theorem 9.1.5. This argument, however, is not rigorous, and requires the following formal substantiation.

Let μ^(n) = μ_1 × ··· × μ_n = μ^(n−1) × μ_n and

Λ = ⋂_{j=1}^{n−1} S_j^{−1}(B_j),

where B_n = B. Sets of the form of Λ generate the Borel field F{S_1, ..., S_{n−1}}. It is therefore sufficient to verify that for each such Λ, we have

∫_Λ μ_n(B_n − S_{n−1}) dP = P{Λ; S_n ∈ B_n}.

If we proceed as before and write s_j = Σ_{i=1}^j x_i, the left side above is equal to

∫_{s_j∈B_j, 1≤j≤n−1} μ_n(B_n − s_{n−1}) μ^(n−1)(dx_1, ..., dx_{n−1})

∫_{Λ_{n−1}} P{0 < S_n < A | S_1, ..., S_{n−1}} dP = ∫_{Λ_{n−1}} [F_n(A − S_{n−1}) − F_n(0 − S_{n−1})] dP,

where F_n is the d.f. of μ_n. [A quirk of notation forbids us to write the integrand on the right as P{−S_{n−1} < X_n < A − S_{n−1}}!] Now for each ω_0 in Λ_{n−1}, S_{n−1}(ω_0) > 0, hence

F_n(A − S_{n−1}(ω_0)) ≤ P{X_n < A} ≤ 1 − δ

by hypothesis. It follows that the integral above is

≤ ∫_{Λ_{n−1}} (1 − δ) dP = (1 − δ) P{Λ_{n−1}},

and the first assertion of the theorem follows by iteration. The second is proved similarly by observing that P{X_n + ··· + X_{n+m−1} > mA} ≥ δ^m and choosing m so that mA exceeds the length of I. The details are left to the reader.

Let N⁰ = {0} ∪ N denote the set of nonnegative integers. For a given sequence of r.v.'s {X_n, n ∈ N⁰} let us denote by F_I the Borel field generated by {X_n, n ∈ I}, where I is a subset of N⁰, such as [0, n], (n, ∞), or {n}. Thus F_{n}, F_[0,n], and F_(n,∞) have been denoted earlier by F{X_n}, F_n, and F'_n, respectively.

DEFINITION OF MARKOV PROCESS. The sequence of r.v.'s {X_n, n ∈ N⁰} is said to be a Markov process or to possess the "Markov property" iff for every n ∈ N⁰ and every B ∈ B^1, we have

(6) P{X_{n+1} ∈ B | X_0, ..., X_n} = P{X_{n+1} ∈ B | X_n}.

This property may be verbally announced as: the conditional distribution (in the wide sense!) of each r.v. relative to all the preceding ones is the same as that relative to the last preceding one. Thus if {X_n} is an independent process as defined in Chapter 8, then both the process itself and the process of the successive partial sums {S_n} are Markov processes by Theorems 9.2.1 and 9.2.2. The latter category includes random walk as a particular case; note that in this case our notation X_n rather than S_n differs from that employed in Chapter 8.

Equation (6) is equivalent to the apparently stronger proposition: for every integrable Y ∈ F{X_{n+1}}, we have

(6′) E{Y | X_0, ..., X_n} = E{Y | X_n}.

It is clear that (6′) implies (6). To see the converse, let Y_m be a sequence of simple r.v.'s belonging to F{X_{n+1}} and increasing to Y. By (6) and property (ii)

of conditional expectation in Sec. 9.1, (6′) is true when Y is replaced by Y_m; hence by property (v) there, it is also true for Y.

The following remark will be repeatedly used. If Y and Z are integrable, Z ∈ F_I, and

∫_Λ Y dP = ∫_Λ Z dP

for each Λ of the form

⋂_{j∈I_0} X_j^{−1}(B_j),

where I_0 is an arbitrary finite subset of I and each B_j is an arbitrary Borel set, then Z = E(Y | F_I). This follows from the uniqueness of conditional expectation and a previous remark given before Theorem 9.1.3, since finite disjoint unions of sets of this form generate a field that generates F_I.

If the index n is regarded as a discrete time parameter, as is usual in the theory of stochastic processes, then F_[0,n] is the field of "the past and the present", while F_(n,∞) is that of "the future"; whether the present is adjoined to the past or the future is often a matter of convenience. The Markov property just defined may be further characterized as follows.

Theorem 9.2.4. The Markov property is equivalent to either one of the two propositions below:

(7) ∀n ∈ N, M ∈ F_(n,∞): P{M | F_[0,n]} = P{M | X_n}.

(8) ∀n ∈ N, M_1 ∈ F_[0,n], M_2 ∈ F_(n,∞): P{M_1 M_2 | X_n} = P{M_1 | X_n} P{M_2 | X_n}.

These conclusions remain true if F_(n,∞) is replaced by F_[n,∞).

PROOF. To prove (7) implies (8), let Y_i = 1_{M_i}, i = 1, 2. We then have

(9) P{M_1 | X_n} P{M_2 | X_n} = E{Y_1 | X_n} E{Y_2 | X_n}
= E{Y_1 E(Y_2 | X_n) | X_n} = E{Y_1 E(Y_2 | F_[0,n]) | X_n}
= E{E(Y_1 Y_2 | F_[0,n]) | X_n} = E{Y_1 Y_2 | X_n} = P{M_1 M_2 | X_n},

where the second and fourth equations follow from Theorem 9.1.3, the third from assumption (7), and the fifth from Theorem 9.1.5.

Conversely, to prove that (8) implies (7), let Λ ∈ F_{n}, M_1 ∈ F_[0,n], M_2 ∈ F_(n,∞). By the second equation in (9) applied to the fourth equation below,

we have

∫_{ΛM_1} P(M_2 | X_n) dP = ∫_{ΛM_1} E(Y_2 | X_n) dP = ∫_Λ Y_1 E(Y_2 | X_n) dP
= ∫_Λ E{Y_1 E(Y_2 | X_n) | X_n} dP = ∫_Λ E(Y_1 | X_n) E(Y_2 | X_n) dP
= ∫_Λ P(M_1 | X_n) P(M_2 | X_n) dP = ∫_Λ P(M_1 M_2 | X_n) dP = P(ΛM_1M_2).

Since disjoint unions of sets of the form ΛM_1 as specified above generate the Borel field F_[0,n], the uniqueness of P(M_2 | F_[0,n]) shows that it is equal to P(M_2 | X_n), proving (7).

Finally, we prove the equivalence of the Markov property and the proposition (7). Clearly the former is implied by the latter; to prove the converse we shall operate with conditional expectations instead of probabilities and use induction. Suppose that it has been shown that for every n ∈ N, and every bounded f belonging to F_[n+1,n+k], we have

(10) E(f | F_[0,n]) = E(f | F_{n}).

This is true for k = 1 by (6′). Let g be bounded, g ∈ F_[n+1,n+k+1]; we are going to show that (10) remains true when f is replaced by g. For this purpose it is sufficient to consider a g of the form g_1 g_2, where g_1 ∈ F_[n+1,n+k], g_2 ∈ F_{n+k+1}, both bounded. The successive steps, in slow motion fashion, are as follows:

E{g_1 g_2 | F_[0,n]} = E{E(g_1 g_2 | F_[0,n+k]) | F_[0,n]} = E{g_1 E(g_2 | F_[0,n+k]) | F_[0,n]}
= E{g_1 E(g_2 | F_{n+k}) | F_[0,n]} = E{g_1 E(g_2 | F_{n+k}) | F_{n}}
= E{g_1 E(g_2 | F_[n,n+k]) | F_{n}} = E{E(g_1 g_2 | F_[n,n+k]) | F_{n}}
= E{g_1 g_2 | F_{n}} = E{g | F_{n}}.

It is left to the reader to scrutinize each equation carefully for explanation, except that the fifth one is an immediate consequence of the Markov property E{g_2 | F_{n+k}} = E{g_2 | F_[0,n+k]} and (13) of Sec. 9.1. This establishes (7) for M ∈ ⋃_{k=1}^∞ F_[n+1,n+k], which is a field generating F_(n,∞). Hence (7) is true (why?).

The last assertion of the theorem is left as an exercise.
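For partial sums of independent steps, the relation (5) and hence the Markov property can be verified by exhaustive enumeration of a short path space; the conditional probability given the whole past collapses to a function of the last sum. The biased ±1 step law below is a made-up illustration, not from the text.

```python
from itertools import product
from fractions import Fraction as F

# Independent steps X1, X2, X3: +1 with prob. 2/3, -1 with prob. 1/3.
step = {1: F(2, 3), -1: F(1, 3)}
paths = {xs: step[xs[0]] * step[xs[1]] * step[xs[2]]
         for xs in product(step, repeat=3)}

def cond_prob(event, given):
    """P{event | given} on each atom of sigma(given), by exact enumeration."""
    acc = {}
    for xs, p in paths.items():
        k = given(xs)
        num, den = acc.get(k, (F(0), F(0)))
        acc[k] = (num + p * event(xs), den + p)
    return {k: num / den for k, (num, den) in acc.items()}

S = lambda xs, n: sum(xs[:n])           # partial sums S_n
hit = lambda xs: S(xs, 3) == 1          # the event {S_3 = 1}

full = cond_prob(hit, lambda xs: (S(xs, 1), S(xs, 2)))  # given S_1, S_2
last = cond_prob(hit, lambda xs: S(xs, 2))              # given S_2 only

# (5): the conditional probability depends on the past only through S_2.
assert all(full[(s1, s2)] == last[s2] for (s1, s2) in full)
print(last)
```

Here P{S_3 = 1 | S_2 = 0} = 2/3 and P{S_3 = 1 | S_2 = 2} = 1/3, exactly μ_3(B − S_2) with B = {1}.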

The property embodied in (8) may be announced as follows: "The past and the future are conditionally independent given the present". In this form there is a symmetry that is not apparent in the other equivalent forms.

The Markov property has an extension known as the "strong Markov property". In the case discussed here, where the time index is N⁰, it is an automatic consequence of the ordinary Markov property, but it is of great conceptual importance in applications. We recall the notions of an optional r.v. α, the r.v. X_α, and the fields F_α and F'_α, which are defined for a general sequence of r.v.'s in Sec. 8.2. Note that here α may take the value 0, which is included in N⁰. We shall give the extension in the form (7).

Theorem 9.2.5. Let {X_n, n ∈ N⁰} be a Markov process and α a finite optional r.v. relative to it. Then for each M ∈ F'_α we have

(11) P{M | F_α} = P{M | α, X_α}.

PROOF. Since α ∈ F_α and X_α ∈ F_α (Exercise 2 of Sec. 8.2), the right member above belongs to F_α. To prove (11) it is then sufficient to verify that the right member satisfies the defining relation for the left, when M is of the form

⋂_{j=1}^ℓ {X_{α+j} ∈ B_j}, B_j ∈ B^1, 1 ≤ j ≤ ℓ < ∞.

Put for each n,

M_n = ⋂_{j=1}^ℓ {X_{n+j} ∈ B_j} ∈ F_(n,∞).

Now the crucial step is to show that

(12) Σ_{n=0}^∞ P{M_n | X_n} 1_{α=n} = P{M | α, X_α}.

By the lemma in the proof of Theorem 9.1.2, there exists a Borel measurable function φ_n such that P{M_n | X_n} = φ_n(X_n), from which it follows that the left member of (12) belongs to the Borel field generated by the two r.v.'s α and X_α. Hence we shall prove (12) by verifying that its left member satisfies the defining relation for its right member, as follows.
For each m ∈ N and B ∈ B^1, we have

∫_{α=m; X_α∈B} Σ_{n=0}^∞ P{M_n | X_n} 1_{α=n} dP = ∫_{α=m; X_m∈B} P{M_m | X_m} dP
= P{α = m; X_m ∈ B; M_m} = P{α = m; X_α ∈ B; M},

where the second equation follows from an application of (7), and the third from the optionality of α, namely

{α = m} ∈ F_[0,m].

This establishes (12).

Now let Λ ∈ F_α; then [cf. (3) of Sec. 8.2] we have

Λ = ⋃_{n=0}^∞ {(α = n) ∩ Λ_n},

where Λ_n ∈ F_[0,n]. It follows that

P{ΛM} = Σ_{n=0}^∞ P{α = n; Λ_n; M_n} = Σ_{n=0}^∞ ∫_{(α=n)∩Λ_n} P{M_n | X_n} dP
= ∫_Λ Σ_{n=0}^∞ P{M_n | X_n} 1_{α=n} dP = ∫_Λ P{M | α, X_α} dP,

where the third equation is by an application of (7) while the fourth is by (12). This being true for each Λ, we obtain (11). The theorem is proved.

When α is a constant n, it may be omitted from the right member of (11), and so (11) includes (7) as a particular case. It may be omitted also in the homogeneous case discussed below (because then the φ_n above may be chosen to be independent of n).

There is a very general method of constructing a Markov process with given "transition probability functions", as follows. Let P_0(·) be an arbitrary p.m. on (R^1, B^1). For each n ≥ 1 let P_n(·,·) be a function of the pair (x, B), where x ∈ R^1 and B ∈ B^1, having the measurability properties below:

(a) for each x, P_n(x, ·) is a p.m. on B^1;
(b) for each B, P_n(·, B) ∈ B^1.

It is a consequence of Kolmogorov's extension theorem (see Sec. 3.3) that there exists a sequence of r.v.'s {X_n, n ∈ N⁰} on some probability space with the following "finite-dimensional joint distributions": for each n ≥ 0 and B_j ∈ B^1, 0 ≤ j ≤ n:

(13) P{X_j ∈ B_j, 0 ≤ j ≤ n} = ∫_{B_0} P_0(dx_0) ∫_{B_1} P_1(x_0, dx_1) ··· ∫_{B_n} P_n(x_{n−1}, dx_n).

One verifies that each right member of (13) defines a p.m., by taking the B_j

above to be R^1, and that the resulting collection of p.m.'s is mutually consistent in the obvious sense [see (16) of Sec. 3.3]. But the actual construction of the process {X_n, n ∈ N⁰} with these given "marginal distributions" is omitted here, the procedure being similar to but somewhat more sophisticated than that given in Theorem 3.3.4, which is a particular case. Assuming the existence, we will now show that it has the Markov property by verifying (6) briefly.

By Theorem 9.1.5, it will be sufficient to show that one version of the left member of (6) is given by P_{n+1}(X_n, B), which belongs to F_{n} by condition (b) above. Let then

Λ = ⋂_{j=0}^n [X_j ∈ B_j]

and μ^(n+1) be the (n+1)-dimensional p.m. of the random vector (X_0, ..., X_n). It follows from Theorem 3.2.3 and (13) used twice that

∫_Λ P_{n+1}(X_n, B) dP = ∫···∫_{B_0×···×B_n} P_{n+1}(x_n, B) μ^(n+1)(dx_0, ..., dx_n)
= ∫···∫_{B_0×···×B_n×B} P_0(dx_0) ∏_{j=1}^{n+1} P_j(x_{j−1}, dx_j)
= P{Λ; X_{n+1} ∈ B}.

This is what was to be shown.

We call P_0(·) the initial distribution of the Markov process and P_n(·,·) its "nth-stage transition probability function". The case where the latter is the same for all n ≥ 1 is particularly important, and the corresponding Markov process is said to be "(temporally) homogeneous" or "with stationary transition probabilities". In this case we write, with x = x_0:

(14) P^(n)(x, B) = ∫···∫_{R^1×···×R^1×B} ∏_{j=0}^{n−1} P(x_j, dx_{j+1}),

and call it the "n-step transition probability function"; when n = 1, the qualifier "1-step" is usually dropped. We also put P^(0)(x, B) = 1_B(x). It is easy to see that

(15) P^(n+1)(x, B) = ∫ P^(n)(y, B) P^(1)(x, dy),

so that all P^(n) are just the iterates of P^(1).

It follows from Theorem 9.2.2 that for the Markov process {S_n, n ∈ N} there, we have

P_n(x, B) = μ_n(B − x).
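When the state space is countable (the "transition matrix" situation of Exercise 11 below), the iteration (15) is just matrix multiplication, and P^(n) is the nth power of the 1-step matrix. A minimal sketch, with a hypothetical three-state matrix chosen only for illustration:

```python
# Hypothetical 3-state chain: row i of P gives the 1-step probabilities
# P(i, {j}); each row sums to 1.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

def compose(A, B):
    """One application of (15): (A B)(i, j) = sum over y of A(i, y) B(y, j)."""
    n = len(A)
    return [[sum(A[i][y] * B[y][j] for y in range(n)) for j in range(n)]
            for i in range(n)]

def n_step(P, n):
    """P^(n) by iterating (15); P^(0) is the identity, i.e. 1_B(x)."""
    Q = [[float(i == j) for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        Q = compose(Q, P)
    return Q

P2 = n_step(P, 2)
P5 = n_step(P, 5)
# Chapman-Kolmogorov (cf. Exercise 6 below): P^(m+n) = P^(m) P^(n).
assert all(abs(a - b) < 1e-12
           for ra, rb in zip(compose(n_step(P, 2), n_step(P, 3)), P5)
           for a, b in zip(ra, rb))
print(P2[0])  # 2-step probabilities out of state 0
```

Each row of every iterate again sums to 1, reflecting condition (a) for each P^(n)(x, ·).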

In particular, a random walk is a homogeneous Markov process with the 1-step transition probability function

P^(1)(x, B) = μ(B − x).

In the homogeneous case Theorem 9.2.4 may be sharpened to read like Theorem 8.2.2, which becomes then a particular case. The proof is left as an exercise.

Theorem 9.2.6. For a homogeneous Markov process and a finite r.v. α which is optional relative to the process, the pre-α and post-α fields are conditionally independent relative to X_α, namely:

∀Λ ∈ F_α, M ∈ F'_α: P{ΛM | X_α} = P{Λ | X_α} P{M | X_α}.

Furthermore, the post-α process {X_{α+n}, n ∈ N} is a homogeneous Markov process with the same transition probability function as the original one.

Given a Markov process {X_n, n ∈ N⁰}, the distribution of X_0 is a p.m., and for each B ∈ B^1 there is according to Theorem 9.1.2 a Borel measurable function φ_n(·, B) such that

P{X_{n+1} ∈ B | X_n = x} = φ_n(x, B).

It seems plausible that the function φ_n(·,·) would correspond to the n-stage transition probability function just discussed. The trouble is that while condition (b) above may be satisfied for each B by a particular choice of φ_n(·, B), it is by no means clear why the resulting collection for varying B would satisfy condition (a). Although it is possible to ensure this by means of conditional distributions in the wide sense alluded to in Exercise 8 of Sec. 9.1, we shall not discuss it here (see Doob [16, chap. 2]).

The theory of Markov processes is the most highly developed branch of stochastic processes. Special cases such as Markov chains, diffusion, and processes with independent increments have been treated in many monographs, a few of which are listed in the Bibliography at the end of the book.

EXERCISES

*1. Prove that the Markov property is also equivalent to the following proposition: if t_1 < ··· < t_n < t_{n+1} are indices in N⁰ and B_j, 1 ≤ j ≤ n + 1, are Borel sets, then

P{X_{t_{n+1}} ∈ B_{n+1} | X_{t_1}, ..., X_{t_n}} = P{X_{t_{n+1}} ∈ B_{n+1} | X_{t_n}}.

In this form we can define a Markov process (X_t) with a continuous parameter t ranging in [0, ∞).

2. Does the Markov property imply the following (notation as in Exercise 1):

P{X_{n+1} ∈ B_{n+1} | X_1 ∈ B_1, ..., X_n ∈ B_n} = P{X_{n+1} ∈ B_{n+1} | X_n ∈ B_n}?

3. Unlike independence, the Markov property is not necessarily preserved by a "functional": {f(X_n), n ∈ N⁰}. Give an example of this, but show that it is preserved by each one-to-one Borel measurable mapping f.

*4. Prove the strong Markov property in the form of (8).

5. Prove Theorem 9.2.6.

*6. For the P^(n) defined in (14), prove the "Chapman-Kolmogorov equations":

∀m ∈ N, n ∈ N: P^(m+n)(x, B) = ∫ P^(m)(x, dy) P^(n)(y, B).

7. Generalize the Chapman-Kolmogorov equation in the nonhomogeneous case.

*8. For the homogeneous Markov process constructed in the text, show that for each f ≥ 0 we have

E{f(X_{m+n}) | X_m} = ∫ f(y) P^(n)(X_m, dy).

*9. Let B be a Borel set, f_1(x, B) = P(x, B), and define f_n for n ≥ 2 inductively by

f_n(x, B) = ∫_{B^c} P(x, dy) f_{n−1}(y, B);

put f(x, B) = Σ_{n=1}^∞ f_n(x, B). Prove that f(X_n, B) is a version of the conditional probability P{⋃_{j=n+1}^∞ [X_j ∈ B] | X_n} for the homogeneous Markov process with transition probability function P(·,·).

*10. Using the f defined in Exercise 9, put

g(x, B) = f(x, B) − Σ_{n=1}^∞ ∫_B P^(n)(x, dy) [1 − f(y, B)].

Prove that g(X_n, B) is a version of the conditional probability

P{lim sup_j [X_j ∈ B] | X_n}.

*11. Suppose that for a homogeneous Markov process the initial distribution has support in N⁰ as a subset of R^1, and that for each i ∈ N⁰, the

transition probability function P(i, ·) also has support in N⁰. Thus

{P(i, j); (i, j) ∈ N⁰ × N⁰}

is an infinite matrix called the "transition matrix". Show that P^(n) as a matrix is just the nth power of P^(1). Express the probability P{X_{t_k} = i_k, 1 ≤ k ≤ n} in terms of the elements of these matrices. [This is the case of homogeneous Markov chains.]

12. A process {X_n, n ∈ N⁰} is said to possess the "rth-order Markov property", where r ≥ 1, iff (6) is replaced by

P{X_{n+1} ∈ B | X_0, ..., X_n} = P{X_{n+1} ∈ B | X_n, ..., X_{n−r+1}}

for n ≥ r − 1. Show that if r < s, then the rth-order Markov property implies the sth. The ordinary Markov property is the case r = 1.

13. Let Y_n be the random vector (X_n, X_{n+1}, ..., X_{n+r−1}). Then the vector process {Y_n, n ∈ N⁰} has the ordinary Markov property (trivially generalized to vectors) if and only if {X_n, n ∈ N⁰} has the rth-order Markov property.

14. Let {X_n, n ∈ N⁰} be an independent process. Let

S_n^(1) = Σ_{j=0}^n X_j, S_n^(r) = Σ_{j=0}^n S_j^(r−1)

for r > 1. Then {S_n^(r), n ∈ N⁰} has the rth-order Markov property. For r = 2, give an example to show that it need not be a Markov process.

15. If {S_n, n ∈ N} is a random walk such that P{S_1 ≠ 0} > 0, then for any finite interval [a, b] there exists an ε < 1 such that

P{S_j ∈ [a, b], 1 ≤ j ≤ n} ≤ ε^n.

[This is just Exercise 6 of Sec. 5.5 again.]

16. The same conclusion is true if the random walk above is replaced by a homogeneous Markov process for which, e.g., there exist δ > 0 and η > 0 such that P(x, R^1 − (x − δ, x + δ)) ≥ η for every x.

9.3 Basic properties of smartingales

The sequence of sums of independent r.v.'s has motivated the generalization to a Markov process in the preceding section; in another direction it will now motivate a martingale. Changing our previous notation to conform with later

usage, let {x_n, n ∈ N} denote independent r.v.'s with mean zero and write X_n = Σ_{j=1}^n x_j for the partial sum. Then we have

E(X_{n+1} | x_1, ..., x_n) = E(X_n + x_{n+1} | x_1, ..., x_n)
= X_n + E(x_{n+1} | x_1, ..., x_n) = X_n + E(x_{n+1}) = X_n.

Note that the conditioning with respect to x_1, ..., x_n may be replaced by conditioning with respect to X_1, ..., X_n (why?). Historically, the equation above led to the consideration of dependent r.v.'s {x_n} satisfying the condition

(1) E(x_{n+1} | x_1, ..., x_n) = 0.

It is astonishing that this simple property should delineate such a useful class of stochastic processes which will now be introduced. In what follows, where the index set for n is not specified, it is understood to be either N or some initial segment N_m of N.

DEFINITION OF MARTINGALE. The sequence of r.v.'s and B.F.'s {X_n, F_n} is called a martingale iff we have for each n:

(a) F_n ⊂ F_{n+1} and X_n ∈ F_n;
(b) E(|X_n|) < ∞;
(c) X_n = E(X_{n+1} | F_n), a.e.

It is called a supermartingale iff the "=" in (c) above is replaced by "≥", and a submartingale iff it is replaced by "≤".

An equivalent form of (3) is as follows: for each Λ ∈ F_n and n ≤ m, we have

(4) ∫_Λ X_n dP = ∫_Λ X_m dP.

It is often safer to use the explicit formula (4) rather than (3), because conditional expectations can be slippery things to handle. We shall refer to (3) or (4) as the defining relation of a martingale; similarly for the "super" and "sub" varieties.

Let us observe that in the form (3) or (4), the definition of a smartingale is meaningful if the index set N is replaced by any linearly ordered set, with

PROOF. For a martingale we have equality in (5) for any convex φ; hence we may take φ(x) = |x|, |x|^p or |x| log⁺ |x| in the proof above.

Thus for a martingale {X_n}, all three transmutations: {X_n⁺}, {X_n⁻} and {|X_n|} are submartingales. For a submartingale {X_n}, nothing is said about the last two.

Corollary 3. If {X_n, F_n} is a supermartingale, then so is {X_n ∧ A, F_n}, where A is any constant.

PROOF. We leave it to the reader to deduce this from the theorem, but here is a quick direct proof:

X_n ∧ A ≥ E(X_{n+1} | F_n) ∧ E(A | F_n) ≥ E(X_{n+1} ∧ A | F_n).

It is possible to represent any smartingale as a martingale plus or minus something special. Let us call a sequence of r.v.'s {Z_n, n ∈ N} an increasing process iff it satisfies the conditions:

(i) Z_1 = 0; Z_n ≤ Z_{n+1} for n ≥ 1;
(ii) E(Z_n) < ∞ for each n.

It follows that Z_∞ = lim↑ Z_n exists but may take the value +∞; Z_∞ is integrable if and only if {Z_n} is L¹-bounded as defined above, which means here lim↑ E(Z_n) < ∞. This is also equivalent to the uniform integrability of {Z_n} because of (i). We can now state the result as follows.

Theorem 9.3.2. Any submartingale {X_n, F_n} can be written as

(6) X_n = Y_n + Z_n,

where {Y_n, F_n} is a martingale, and {Z_n} is an increasing process.

PROOF. From {X_n} we define its difference sequence as follows:

(7) x_1 = X_1, x_n = X_n − X_{n−1}, n ≥ 2,

so that X_n = Σ_{j=1}^n x_j, n ≥ 1 (cf. the notation in the first paragraph of this section). The defining relation for a submartingale then becomes

E{x_n | F_{n−1}} ≥ 0,

with equality for a martingale. Furthermore, we put

y_1 = x_1, y_n = x_n − E{x_n | F_{n−1}}, Y_n = Σ_{j=1}^n y_j;
z_1 = 0, z_n = E{x_n | F_{n−1}}, Z_n = Σ_{j=1}^n z_j.

Then clearly x_n = y_n + z_n, and (6) follows by addition. To show that {Y_n, F_n} is a martingale, we may verify that E{y_n | F_{n−1}} = 0 as indicated a moment ago, and this is trivial by Theorem 9.1.5. Since each z_n ≥ 0, it is equally obvious that {Z_n} is an increasing process. The theorem is proved.

Observe that Z_n ∈ F_{n−1} for each n, by definition. This has important consequences; see Exercise 9 below. The decomposition (6) will be called Doob's decomposition. For a supermartingale we need only change the "+" there into "−", since {−Y_n, F_n} is a martingale. The following complement is useful.

Corollary. If {X_n} is L¹-bounded [or uniformly integrable], then both {Y_n} and {Z_n} are L¹-bounded [or uniformly integrable].

PROOF. We have from (6):

E(Z_n) ≤

where again {α ≤ n} may be replaced by {α = n}. Writing then

(10) Λ_n = Λ ∩ {α = n},

we have Λ_n ∈ F_n and

Λ = ⋃_n Λ_n = ⋃_n [{α = n} ∩ Λ_n],

where the index n ranges over N_∞. This is (3) of Sec. 8.2. The reader should now do Exercises 1-4 in Sec. 8.2 to get acquainted with the simplest properties of optionality. Here are some of them which will be needed soon: F_α is a B.F. and α ∈ F_α; if α is optional then so is α ∧ n for each n ∈ N; if α ≤ β where β is also optional then F_α ⊂ F_β; in particular F_{α∧n} ⊂ F_α ∩ F_n and in fact this inclusion is an equation.

Next we assume X_∞ has been defined and X_∞ ∈ F_∞. We then define X_α as follows:

(11) X_α(ω) = X_{α(ω)}(ω);

in other words,

X_α(ω) = X_n(ω) on {α = n}, n ∈ N_∞.

This definition makes sense for any α taking values in N_∞, but for an optional α we can assert moreover that

(12) X_α ∈ F_α.

This is an exercise the reader should not miss; observe that it is a natural but nontrivial extension of the assumption X_n ∈ F_n for every n. Indeed, all the general propositions concerning optional sampling aim at the same thing, namely to make optional times behave like constant times, or again to enable us to substitute optional r.v.'s for constants. For this purpose conditions must sometimes be imposed either on α or on the smartingale {X_n}. Let us however begin with a perfect case which turns out to be also very important.

We introduce a class of martingales as follows. For any integrable r.v. Y we put

(13) X_n = E(Y | F_n), n ∈ N_∞.

By Theorem 9.1.5, if n ≤ m:

(14) X_n = E{E(Y | F_m) | F_n} = E{X_m | F_n},

which shows {X_n, F_n} is a martingale, not only on N but also on N_∞. The following properties extend both (13) and (14) to optional times.

Theorem 9.3.3. For any optional α, we have

(15) X_α = E(Y | F_α).

If α ≤ β where β is also optional, then {X_α, F_α; X_β, F_β} forms a two-term martingale.

PROOF. Let us first show that X_α is integrable. It follows from (13) and Jensen's inequality that

|X_n| ≤ E(|Y| | F_n).

Since {α = n} ∈ F_n, we may apply this to get

∫_{α=n} |X_α| dP = ∫_{α=n} |X_n| dP ≤ ∫_{α=n} |Y| dP.

because Λ_j ∈ F_j ⊂ F_k, whereas {β > k} = {β ≤ k}^c ∈ F_k. It follows from the defining relation of a supermartingale that

∫_{Λ_j∩{β>k}} X_k dP ≥ ∫_{Λ_j∩{β>k}} X_{k+1} dP,

and consequently

∫_{Λ_j∩{β≥k}} X_k dP ≥ ∫_{Λ_j∩{β=k}} X_k dP + ∫_{Λ_j∩{β>k}} X_{k+1} dP.

Rewriting this as

∫_{Λ_j∩{β≥k}} X_k dP − ∫_{Λ_j∩{β≥k+1}} X_{k+1} dP ≥ ∫_{Λ_j∩{β=k}} X_β dP;

summing over k from j to m, where m is an upper bound for β; and then replacing X_j by X_α on Λ_j, we obtain

(17) ∫_{Λ_j} X_α dP − ∫_{Λ_j∩{β≥m+1}} X_{m+1} dP ≥ ∫_{Λ_j∩{j≤β≤m}} X_β dP.

Remark. For a martingale {X_n, F_n; n ∈ N_∞} this theorem is contained in Theorem 9.3.3, since we may take the Y in (13) to be X_∞ here.

PROOF. (a) Suppose first that the supermartingale is positive with X_∞ = 0 a.e. The inequality (17) is true for every m ∈ N, but now the second integral there is positive, so that we have

∫_{Λ_j} X_α dP ≥ ∫_{Λ_j∩{β≤m}} X_β dP.

Letting m ↑ ∞, the right side increases to ∫_{Λ_j} X_β dP, since X_β ≥ 0 and X_β = X_∞ = 0 a.e. on {β = ∞}; summing over j then yields

∫_Λ X_α dP ≥ ∫_Λ X_β dP

for each Λ ∈ F_α, which is the defining relation of a supermartingale for {X_α, F_α; X_β, F_β}.

(b) In the general case we put

X'_n = E{X_∞ | F_n}, X''_n = X_n − X'_n.

Then {X'_n, F_n; n ∈ N_∞} is a martingale of the kind introduced in (13), and X_n ≥ X'_n by the defining property of supermartingale applied to X_n and X_∞. Hence the difference {X''_n, n ∈ N_∞} is a positive supermartingale with X''_∞ = 0 a.e. By Theorem 9.3.3, {X'_α, F_α; X'_β, F_β} forms a martingale; by case (a), {X''_α, F_α; X''_β, F_β} forms a supermartingale. Hence the conclusion of the theorem follows simply by addition.

The two preceding theorems are the basic cases of Doob's optional sampling theorem. They do not cover all cases of optional sampling (see e.g.

Exercise 11 of Sec. 8.2 and Exercise 11 below), but are adequate for many applications, some of which will be given later.

Martingale theory has its intuitive background in gambling. If X_n is interpreted as the gambler's capital at time n, then the defining property postulates that his expected capital after one more game, played with the knowledge of the entire past and present, is exactly equal to his current capital. In other words, his expected gain is zero, and in this sense the game is said to be "fair". Similarly a smartingale is a game consistently biased in one direction. Now the gambler may opt to play the game only at certain preferred times, chosen with the benefit of past experience and present observation, but without clairvoyance into the future. [The inclusion of the present status in his knowledge seems to violate raw intuition, but examine the example below and Exercise 13.] He hopes of course to gain advantage by devising such a "system", but Doob's theorem forestalls him, at least mathematically. We have already mentioned such an interpretation in Sec. 8.2 (see in particular Exercise 11 of Sec. 8.2; note that α + 1 rather than α is the optional time there). The present generalization consists in replacing a stationary independent process by a smartingale. The classical problem of "gambler's ruin" illustrates very well the ideas involved, as follows.

Let {S_n, n ∈ N⁰} be a random walk in the notation of Chapter 8, and let S_1 have the Bernoullian distribution ½δ_1 + ½δ_{−1}. It follows from Theorem 8.3.4, or the more elementary Exercise 15 of Sec. 9.2, that the walk will almost certainly leave the interval [−a, b], where a and b are strictly positive integers; and since it can move only one unit a time, it must reach either −a or b.
This means that if we set
$$(20) \quad \alpha = \min\{n \ge 1 \colon S_n = -a\}, \quad \beta = \min\{n \ge 1 \colon S_n = b\},$$
then $\gamma = \alpha \wedge \beta$ is a finite optional r.v. It follows from the Corollary to Theorem 9.3.4 that $\{S_{\gamma \wedge n}\}$ is a martingale. Now
$$(21) \quad \lim_{n \to \infty} S_{\gamma \wedge n} = S_\gamma \quad \text{a.e.},$$
and clearly $S_\gamma$ takes only the values $-a$ and $b$. The question is: with what probabilities? In the gambling interpretation: if two gamblers play a fair coin-tossing game and possess, respectively, $a$ and $b$ units of the constant stake as initial capitals, what is the probability of ruin for each?

The answer is immediate ("without any computation"!) if we show first that the two r.v.'s $\{S_1, S_\gamma\}$ form a martingale, for then
$$(22) \quad E(S_\gamma) = E(S_1) = 0,$$
which is to say that
$$-a \, P\{S_\gamma = -a\} + b \, P\{S_\gamma = b\} = 0,$$

so that the probability of ruin is inversely proportional to the initial capital of the gambler, a most sensible solution.

To show that the pair $\{S_1, S_\gamma\}$ forms a martingale we use Theorem 9.3.5, since $\{S_{\gamma \wedge n}, n \in N_\infty\}$ is a bounded martingale. The more elementary Theorem 9.3.4 is inapplicable, since $\gamma$ is not bounded. However, there is a simpler way out in this case: (21) and the boundedness just mentioned imply that
$$E(S_\gamma) = \lim_{n \to \infty} E(S_{\gamma \wedge n}),$$
and since $E(S_{\gamma \wedge 1}) = E(S_1)$, (22) follows directly.

The ruin problem belonged to the ancient history of probability theory, and can be solved by elementary methods based on difference equations (see, e.g., Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York, 1937). The approach sketched above, however, has all the main ingredients of an elaborate modern theory. The little equation (22) is the prototype of a "harmonic equation", and the problem itself is a "boundary-value problem". The steps used in the solution (to wit: the introduction of a martingale, its optional stopping, its convergence to a limit, and the extension of the martingale property to include the limit with the consequent convergence of expectations) are all part of a standard procedure now ensconced in the general theory of Markov processes and the allied potential theory.

EXERCISES

1. The defining relation for a martingale may be generalized as follows. For each optional r.v. $\alpha \le n$, we have $E\{X_n \mid \mathcal{F}_\alpha\} = X_\alpha$. Similarly for a smartingale.
*2. If $X$ is an integrable r.v., then the collection of (equivalence classes of) r.v.'s $E(X \mid \mathcal{G})$, with $\mathcal{G}$ ranging over all Borel subfields of $\mathcal{F}$, is uniformly integrable.
3. Suppose $\{X_n^{(k)}\}$, $k = 1, 2$, are two [super]martingales, $\alpha$ is a finite optional r.v., and $X_\alpha^{(1)} = [\ge]\, X_\alpha^{(2)}$. Define $X_n = X_n^{(1)} 1_{\{n \le \alpha\}} + X_n^{(2)} 1_{\{n > \alpha\}}$; then $\{X_n\}$ is a [super]martingale.

[HINT: Let $x_1$ and $x'_1$ be independent Bernoullian r.v.'s; and $x_2 = x'_2 = +1$ or $-1$ according as $x_1 + x'_1 = 0$ or $\ne 0$; notation as in (7).]
7. Find an example of a positive martingale which is not uniformly integrable. [HINT: You win $2^n$ if it's heads $n$ times in a row, and you lose everything as soon as it's tails.]
8. Find an example of a martingale $\{X_n\}$ such that $X_n \to -\infty$ a.e. This implies that even in a "fair" game one player may be bound to lose an arbitrarily large amount if he plays long enough (and no limit is set to the liability of the other player). [HINT: Try sums of independent but not identically distributed r.v.'s with mean 0.]
*9. Prove that if $\{Y_n, \mathcal{F}_n\}$ is a martingale such that $Y_n \in \mathcal{F}_{n-1}$, then for every $n$, $Y_n = Y_1$ a.e. Deduce from this result that Doob's decomposition (6) is unique (up to equivalent r.v.'s) under the condition that $Z_n \in \mathcal{F}_{n-1}$ for every $n \ge 2$. If this condition is not imposed, find two different decompositions.
10. If $\{X_n\}$ is a uniformly integrable submartingale, then for any optional r.v. $\alpha$ we have
(i) $\{X_{\alpha \wedge n}\}$ is a uniformly integrable submartingale;
(ii) $E(X_1) \le E(X_\alpha) \le \sup_n E(X_n)$.
[HINT: $|X_{\alpha \wedge n}| \le |X_\alpha| + |X_n|$.]
*11. Let $\{X_n, \mathcal{F}_n; n \in N\}$ be a [super]martingale satisfying the following condition: there exists a constant $M$ such that for every $n \ge 1$:
$$E\{|X_n - X_{n-1}| \mid \mathcal{F}_{n-1}\} \le M \quad \text{a.e.},$$
where $X_0 = 0$ and $\mathcal{F}_0$ is trivial. Then for any two optional r.v.'s $\alpha$ and $\beta$ such that $\alpha \le \beta$ and $E(\beta) < \infty$, $\{X_\alpha, \mathcal{F}_\alpha; X_\beta, \mathcal{F}_\beta\}$ is a [super]martingale. This is another case of optional sampling given by Doob, which includes Wald's equation (Theorem 5.5.3) as a special case. [HINT: Dominate the integrand in the second integral in (17) by $Y_\beta$, where $X_0 = 0$ and $Y_m = \sum_{n=1}^m |X_n - X_{n-1}|$. We have
$$E(Y_\beta) = \sum_{n=1}^\infty \int_{\{\beta \ge n\}} |X_n - X_{n-1}| \, dP \le M E(\beta).]$$
12. Apply Exercise 11 to the gambler's ruin problem discussed in the text and conclude that for the $\alpha$ in (20) we must have $E(\alpha) = +\infty$. Verify this by elementary computation.
*13. In the gambler's ruin problem take $b = 1$ in (20). Compute $E(S_{\beta \wedge n})$ for a fixed $n$ and show that $\{S_0, S_{\beta \wedge n}\}$ forms a martingale. Observe that $\{S_0, S_\beta\}$ does not form a martingale and explain in gambling terms the effect of stopping

$\beta$ at $n$. This example shows why in optional sampling the option may be taken even with the knowledge of the present moment under certain conditions. In the case here the present (namely $\beta \wedge n$) may leave one no choice!
14. In the gambler's ruin problem, suppose that $S_1$ has the distribution
$$p\delta_1 + (1 - p)\delta_{-1}, \quad p \ne \tfrac12,$$
and let $d = 2p - 1$. Show that $E(S_\gamma) = d\, E(\gamma)$. Compute the probabilities of ruin by using difference equations to deduce $E(\gamma)$, and vice versa.
15. Prove that for any $L^1$-bounded smartingale $\{X_n, \mathcal{F}_n, n \in N_\infty\}$, and any optional $\alpha$, we have $E(|X_\alpha|) < \infty$. [HINT: Prove the result first for a martingale, then use Doob's decomposition.]
*16. Let $\{X_n, \mathcal{F}_n\}$ be a martingale: $x_1 = X_1$, $x_n = X_n - X_{n-1}$ for $n \ge 2$; let $v_n \in \mathcal{F}_{n-1}$ for $n \ge 1$ where $\mathcal{F}_0 = \mathcal{F}_1$; now put
$$T_n = \sum_{j=1}^n v_j x_j.$$
Show that $\{T_n, \mathcal{F}_n\}$ is a martingale provided that $T_n$ is integrable for every $n$. The martingale may be replaced by a smartingale if $v_n \ge 0$ for every $n$. As a particular case take $v_n = 1_{\{\alpha \ge n\}}$ for an optional $\alpha$, so that $T_n = X_{\alpha \wedge n}$.
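The ruin probabilities discussed in the text and in Exercises 12–14 are easy to check numerically. The sketch below is illustrative only (the function name and the Jacobi iteration are this sketch's choices, not the book's): it solves the discrete harmonic equation $h(x) = \frac12 h(x-1) + \frac12 h(x+1)$ on $(-a, b)$ with $h(-a) = 0$, $h(b) = 1$, and compares the hitting probability of $b$ from $0$ with the value $a/(a+b)$ that the martingale identity (22) yields.

```python
# Exit probabilities of the symmetric Bernoulli walk from [-a, b].
# h(x) = P{walk started at x reaches b before -a}; h is harmonic at the
# interior points, with boundary values h(-a) = 0, h(b) = 1.

def hit_prob(a, b, sweeps=20000):
    lo, hi = -a, b
    h = {x: 0.0 for x in range(lo, hi + 1)}
    h[hi] = 1.0
    for _ in range(sweeps):          # Jacobi iteration on the
        new = dict(h)                # harmonic equation
        for x in range(lo + 1, hi):
            new[x] = 0.5 * (h[x - 1] + h[x + 1])
        h = new
    return h

a, b = 3, 5
h = hit_prob(a, b)
# (22): -a*P{S = -a} + b*P{S = b} = 0 gives P{S = b} = a/(a+b) = 0.375.
print(h[0], a / (a + b))
```

The exact solution is $h(x) = (x+a)/(a+b)$, so the iterate at $0$ should agree with $a/(a+b)$ to within the iteration tolerance.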

9.4 INEQUALITIES AND CONVERGENCE

since it takes only a finite number of values, Theorem 9.3.4 shows that the pair $\{X_\alpha, X_n\}$ forms a submartingale. If we write
$$M = \{\max_{1 \le j \le n} X_j \ge \lambda\},$$
then $X_\alpha \ge \lambda$ on $M$, hence the first inequality follows from
$$\lambda P(M) \le \int_M X_\alpha \, dP \le \int_M X_n \, dP;$$
the second is just a cruder consequence.

Similarly let $\beta$ be the first $j$ such that $X_j \le -\lambda$ if there is such a $j$ in $N_n$, otherwise let $\beta = n$. Put also
$$M_k = \{\min_{1 \le j \le k} X_j \le -\lambda\}.$$

We now come to a new kind of inequality, which will be the tool for proving the main convergence theorem below. Given any sequence of r.v.'s $\{X_j\}$, for each sample point $\omega$, the convergence properties of the numerical sequence $\{X_j(\omega)\}$ hinge on the oscillation of the finite segments $\{X_j(\omega), j \in N_n\}$ as $n \to \infty$. In particular the sequence will have a limit, finite or infinite, if and only if the number of its oscillations between any two [rational] numbers $a$ and $b$ is finite (depending on $a$, $b$ and $\omega$). This is a standard type of argument used in measure and integration theory (cf. Exercise 10 of Sec. 4.2). The interesting thing is that for a smartingale, a sharp estimate of the expected number of oscillations is obtainable.

Let $a < b$. The number $\nu$ of "upcrossings" of the interval $[a, b]$ by a numerical sequence $\{x_1, \ldots, x_n\}$ is defined as follows. Set
$$\alpha_1 = \min\{j \colon 1 \le j \le n,\ x_j \le a\}, \quad \alpha_2 = \min\{j \colon \alpha_1 < j \le n,\ x_j \ge b\};$$
if either $\alpha_1$ or $\alpha_2$ is not defined because no such $j$ exists, we define $\nu = 0$. In general, for $k \ge 2$ we set
$$\alpha_{2k-1} = \min\{j \colon \alpha_{2k-2} < j \le n,\ x_j \le a\}, \quad \alpha_{2k} = \min\{j \colon \alpha_{2k-1} < j \le n,\ x_j \ge b\};$$
if any one of these is undefined, then all the subsequent ones will be undefined. Let $\alpha_\ell$ be the last defined one, with $\ell = 0$ if $\alpha_1$ is undefined; then $\nu$ is defined to be $[\ell/2]$. Thus $\nu$ is the actual number of successive times that the sequence crosses from $\le a$ to $\ge b$. Although the exact number is not essential, since a couple of crossings more or less would make no difference, we must adhere to a rigid way of counting in order to be accurate below.

Theorem 9.4.2. Let $\{X_j, \mathcal{F}_j, j \in N_n\}$ be a submartingale and $-\infty < a < b < +\infty$. Then we have
$$(5) \quad E\{\nu_{[a,b]}^{(n)}\} \le \frac{E\{(X_n - a)^+\} - E\{(X_1 - a)^+\}}{b - a} \le \frac{E\{X_n^+\} + |a|}{b - a}.$$

the formulas below. In the same vein we set $\alpha_0 = 1$. Observe that $\alpha_n = n$ in any case, so that
$$X_n - X_1 = X_{\alpha_n} - X_{\alpha_0} = \sum_{j=0}^{n-1} (X_{\alpha_{j+1}} - X_{\alpha_j}) = \sum_{j \text{ even}} + \sum_{j \text{ odd}}.$$
If $j$ is odd and $j + 1 \le \ell(\omega)$, then
$$X_{\alpha_{j+1}}(\omega) \ge b > 0 = X_{\alpha_j}(\omega);$$
if $j$ is odd and $j = \ell(\omega)$, then
$$X_{\alpha_{j+1}}(\omega) = X_n(\omega) \ge 0 = X_{\alpha_j}(\omega);$$
if $j$ is odd and $\ell(\omega) < j$, then
$$X_{\alpha_{j+1}}(\omega) = X_n(\omega) = X_{\alpha_j}(\omega).$$
Hence in all cases we have
$$(6) \quad \sum_{j \text{ odd}} \big(X_{\alpha_{j+1}}(\omega) - X_{\alpha_j}(\omega)\big) \ge \sum_{\substack{j \text{ odd} \\ j+1 \le \ell(\omega)}} \big(X_{\alpha_{j+1}}(\omega) - X_{\alpha_j}(\omega)\big) \ge \nu_{[0,b]}^{(n)}(\omega)\, b.$$
Next, observe that $\{\alpha_j, 0 \le j \le n\}$ as modified above is in general of the form $1 = \alpha_0 \le \alpha_1 \le \alpha_2 \le \cdots \le \alpha_\ell \le \alpha_{\ell+1} = \cdots = \alpha_n = n$, and since constants are optional, this is an increasing sequence of optional r.v.'s. Hence by Theorem 9.3.4, $\{X_{\alpha_j}, 0 \le j \le n\}$ is a submartingale, so that each term $E(X_{\alpha_{j+1}} - X_{\alpha_j}) \ge 0$ and consequently
$$E\Big\{\sum_{j \text{ even}} (X_{\alpha_{j+1}} - X_{\alpha_j})\Big\} \ge 0.$$
Adding to this the expectations of the extreme terms in (6), we obtain
$$(7) \quad E(X_n - X_1) \ge E(\nu_{[0,b]}^{(n)})\, b,$$
which is the particular case of (5) under consideration.

In the general case we apply the case just proved to $\{(X_j - a)^+, j \in N_n\}$, which is a submartingale by Corollary 1 to Theorem 9.3.1. It is clear that the number of upcrossings of $[a, b]$ by the given submartingale is exactly that of $[0, b - a]$ by the modified one. The inequality (7) becomes the first inequality in (5) after the substitutions, and the second one follows since $(X_n - a)^+ \le X_n^+ + |a|$. Theorem 9.4.2 is proved.
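The rigid counting convention fixed above is easy to mechanize. The sketch below is illustrative (names are this sketch's, not the book's); it counts upcrossings of a finite sequence exactly as the $\alpha_k$ prescribe, alternating between first passages to $\le a$ and to $\ge b$.

```python
def upcrossings(xs, a, b):
    """Number of upcrossings of [a, b] by the finite sequence xs,
    following the alternating first-passage times a_1, a_2, ... of the
    text: odd indices look for a value <= a, even ones for >= b."""
    count = 0
    below = False              # pending entry <= a found?
    for x in xs:
        if not below:
            if x <= a:
                below = True
        else:
            if x >= b:
                count += 1     # one completed crossing from <= a to >= b
                below = False
    return count

print(upcrossings([0, 3, -1, 2, 1, 4], 0.5, 1.5))  # -> 2
```

Here $\alpha_1, \ldots, \alpha_4$ are defined (at the values $0, 3, -1, 2$) and $\alpha_5$ is not, so $\ell = 4$ and $\nu = [\ell/2] = 2$.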

The corresponding result for a supermartingale will be given below; but after such a painstaking definition of upcrossing, we may leave the dual definition of downcrossing to the reader.

Theorem 9.4.3. Let $\{X_j, \mathcal{F}_j; j \in N_n\}$ be a supermartingale and let $-\infty < a < b < +\infty$. Then the number $\tilde{\nu}_{[a,b]}^{(n)}$ of downcrossings of $[a, b]$ satisfies
$$E\{\tilde{\nu}_{[a,b]}^{(n)}\} \le \frac{E\{X_1 \wedge b\} - E\{X_n \wedge b\}}{b - a}.$$

is a null set; and so is the union over all such pairs. Since this union contains the set where $\varliminf_n X_n < \varlimsup_n X_n$, the limit exists a.e. It must be finite a.e. by Fatou's lemma applied to the sequence $\{X_n\}$.

Corollary. Every uniformly bounded smartingale converges a.e. Every positive supermartingale and every negative submartingale converge a.e.

It may be instructive to sketch a direct proof of Theorem 9.4.4 which is done "by hand", so to speak. This is the original proof given by Doob (1940) for a martingale.

Suppose that the set $\Lambda_{[a,b]}$ above has probability $> \eta > 0$. For each $\omega$ in $\Lambda_{[a,b]}$, the sequence $\{X_n(\omega), n \in N\}$ takes an infinite number of values $\le a$ and an infinite number of values $\ge b$. Let $1 = n_0 < n_1 < \cdots$ and put
$$\Lambda_{2j-1} = \{\min_{n_{2j-2} \le i \le n_{2j-1}} X_i \le a\}, \quad \Lambda_{2j} = \{\max_{n_{2j-1} \le i \le n_{2j}} X_i \ge b\}.$$

Once Theorem 9.4.4 has been proved for a martingale, we can extend it easily to a positive or uniformly integrable supermartingale by using Doob's decomposition. Suppose $\{X_n\}$ is a positive supermartingale and $X_n = Y_n - Z_n$ as in Theorem 9.3.2. Then $0 \le Z_n \le Y_n$ and consequently
$$E(Z_\infty) = \lim_{n \to \infty} E(Z_n) \le E(Y_1);$$
next we have
$$E(Y_n) = E(X_n) + E(Z_n)$$

Theorem 9.4.5. The three propositions below are equivalent for a submartingale $\{X_n, \mathcal{F}_n; n \in N\}$:
(a) it is a uniformly integrable sequence;
(b) it converges in $L^1$;
(c) it converges a.e. to an integrable $X_\infty$ such that $\{X_n, \mathcal{F}_n; n \in N_\infty\}$ is a submartingale and $E(X_n)$ converges to $E(X_\infty)$.

PROOF. (a) $\Rightarrow$ (b): under (a) the condition in Theorem 9.4.4 is satisfied, so that $X_n \to X_\infty$ a.e. This together with uniform integrability implies $X_n \to X_\infty$ in $L^1$ by Theorem 4.5.4 with $r = 1$.
(b) $\Rightarrow$ (c): under (b) let $X_n \to X_\infty$ in $L^1$; then $E(|X_n|) \to E(|X_\infty|)$ and $X_n \to X_\infty$ a.e. by Theorem 9.4.4. For each $\Lambda \in \mathcal{F}_n$ and $n \le n'$, we have
$$\int_\Lambda X_n \, dP \le \int_\Lambda X_{n'} \, dP$$
by the defining relation. The right member converges to $\int_\Lambda X_\infty \, dP$ by $L^1$-convergence, and the resulting inequality shows that $\{X_n, \mathcal{F}_n; n \in N_\infty\}$ is a submartingale. Since $L^1$-convergence also implies convergence of expectations, all three conditions in (c) are proved.
(c) $\Rightarrow$ (a): under (c), $\{X_n^+, \mathcal{F}_n; n \in N_\infty\}$ is a submartingale; hence we have for every $\lambda > 0$:
$$(10) \quad \int_{\{X_n^+ > \lambda\}} X_n^+ \, dP \le \int_{\{X_n^+ > \lambda\}} X_\infty^+ \, dP,$$
which shows that $\{X_n^+, n \in N\}$ is uniformly integrable. Since $X_n^+ \to X_\infty^+$ a.e., this implies $E(X_n^+) \to E(X_\infty^+)$. Since by hypothesis $E(X_n) \to E(X_\infty)$, it follows that $E(X_n^-) \to E(X_\infty^-)$. This and $X_n^- \to X_\infty^-$ a.e. imply that $\{X_n^-\}$ is uniformly integrable, by Theorem 4.5.4 for $r = 1$. Hence so is $\{X_n\}$.

Theorem 9.4.6. In the case of a martingale, propositions (a) and (b) above are equivalent to (c') or (d) below:
(c') it converges a.e. to an integrable $X_\infty$ such that $\{X_n, \mathcal{F}_n; n \in N_\infty\}$ is a martingale;
(d) there exists an integrable r.v. $Y$ such that $X_n = E(Y \mid \mathcal{F}_n)$ for each $n \in N$.

PROOF. (b) $\Rightarrow$ (c') as before; (c') $\Rightarrow$ (a) as before if we observe that $E(X_n) = E(X_\infty)$ for every $n$ in the present case, or more rapidly by considering $|X_n|$ instead of $X_n^+$ as below. (c') $\Rightarrow$ (d) is trivial, since we may take the $Y$ in (d) to be the $X_\infty$ in (c'). To prove (d) $\Rightarrow$ (a), let $n < n'$; then by

Theorem 9.1.5:
$$E(X_{n'} \mid \mathcal{F}_n) = E\big(E(Y \mid \mathcal{F}_{n'}) \mid \mathcal{F}_n\big) = E(Y \mid \mathcal{F}_n) = X_n,$$
hence $\{X_n, \mathcal{F}_n, n \in N; Y, \mathcal{F}\}$ is a martingale by definition. Consequently $\{|X_n|, \mathcal{F}_n, n \in N; |Y|, \mathcal{F}\}$ is a submartingale, and we have for each $\lambda > 0$:
$$\int_{\{|X_n| > \lambda\}} |X_n| \, dP \le \int_{\{|X_n| > \lambda\}} |Y| \, dP,$$

(a) $\{X_n\}$ is uniformly integrable;
(b) $X_n \to X_{-\infty}$ in $L^1$;
(c) $\{X_n, n \in -N_\infty\}$ is a submartingale;
(d) $\lim_{n \to -\infty} E(X_n) > -\infty$.
$$\lambda P\{|X_n| > \lambda\} \le E(|X_n|) = 2E(X_n^+) - E(X_n);$$
this implies that $\{X_n^+\}$ is uniformly integrable. Next if $n < m$, then
$$0 \ge \int_{\{X_n \le -\lambda\}} X_n \, dP = E(X_n) - \int_{\{X_n > -\lambda\}} X_n \, dP \ge E(X_n) - \int_{\{X_n > -\lambda\}} X_m \, dP$$
$$= E(X_n - X_m) + E(X_m) - \int_{\{X_n > -\lambda\}} X_m \, dP = E(X_n - X_m) + \int_{\{X_n \le -\lambda\}} X_m \, dP$$

by the remark after (12). It follows that $\{X_n^-\}$ is also uniformly integrable, and therefore (a) is proved.

The next result will be stated for the index set $\tilde N$ of all integers in their natural order:
$$\tilde N = \{\ldots, -n, \ldots, -2, -1, 0, 1, 2, \ldots, n, \ldots\}.$$
Let $\{\mathcal{F}_n\}$ be increasing B.F.'s on $\tilde N$, namely: $\mathcal{F}_n \subset \mathcal{F}_m$ if $n \le m$. We may "close" them at both ends by adjoining the B.F.'s below:
$$\mathcal{F}_{-\infty} = \bigwedge_n \mathcal{F}_n, \quad \mathcal{F}_{+\infty} = \bigvee_n \mathcal{F}_n.$$
Let $\{Y_n\}$ be r.v.'s indexed by $\tilde N$. If the B.F.'s and r.v.'s are only given on $N$ or $-N$, they can be trivially extended to $\tilde N$ by putting $\mathcal{F}_n = \mathcal{F}_1$, $Y_n = Y_1$ for all $n \le 0$, or $\mathcal{F}_n = \mathcal{F}_{-1}$, $Y_n = Y_{-1}$ for all $n \ge 0$. The following convergence theorem is very useful.

Theorem 9.4.8. Suppose that the $Y_n$'s are dominated by an integrable r.v. $Z$:
$$(13) \quad \sup_n |Y_n| \le Z,$$
and $\lim_n Y_n = Y_\infty$ or $Y_{-\infty}$ as $n \to \infty$ or $-\infty$. Then we have
$$(14a) \quad \lim_{n \to \infty} E\{Y_n \mid \mathcal{F}_n\} = E\{Y_\infty \mid \mathcal{F}_\infty\};$$
$$(14b) \quad \lim_{n \to -\infty} E\{Y_n \mid \mathcal{F}_n\} = E\{Y_{-\infty} \mid \mathcal{F}_{-\infty}\}.$$
In particular, for a fixed integrable r.v. $Y$, we have
$$(15a) \quad \lim_{n \to \infty} E\{Y \mid \mathcal{F}_n\} = E\{Y \mid \mathcal{F}_\infty\};$$
$$(15b) \quad \lim_{n \to -\infty} E\{Y \mid \mathcal{F}_n\} = E\{Y \mid \mathcal{F}_{-\infty}\},$$
where the convergence holds also in $L^1$ in both cases.

PROOF. We prove (15) first. Let $X_n = E\{Y \mid \mathcal{F}_n\}$. For $n \in N$, $\{X_n, \mathcal{F}_n\}$ is a martingale already introduced in (13) of Sec. 9.3; the same is true for $n \in -N$. To prove (15a), we apply Theorem 9.4.6 to deduce (c') there. It remains to identify the limit $X_\infty$ with the right member of (15a). For each $\Lambda \in \mathcal{F}_n$, we have
$$\int_\Lambda Y \, dP = \int_\Lambda X_n \, dP = \int_\Lambda X_\infty \, dP.$$

Hence the equations hold also for $\Lambda \in \mathcal{F}_\infty$ (why?), and this shows that $X_\infty$ has the defining property of $E\{Y \mid \mathcal{F}_\infty\}$, since $X_\infty \in \mathcal{F}_\infty$. Similarly, the limit $X_{-\infty}$ in (15b) exists by Theorem 9.4.7; to identify it, we have by (c) there, for each $\Lambda \in \mathcal{F}_{-\infty}$:
$$\int_\Lambda X_{-\infty} \, dP = \int_\Lambda X_n \, dP = \int_\Lambda Y \, dP.$$
This shows that $X_{-\infty}$ is equal to the right member of (15b).

We can now prove (14a). Put for $m \in N$:
$$W_m = \sup_{n \ge m} |Y_n - Y_\infty|;$$
then $|W_m| \le 2Z$ and $\lim_{m \to \infty} W_m = 0$ a.e. Applying (15a) to $W_m$ we obtain
$$\varlimsup_{n \to \infty} E\{|Y_n - Y_\infty| \mid \mathcal{F}_n\} \le \lim_{n \to \infty} E\{W_m \mid \mathcal{F}_n\} = E\{W_m \mid \mathcal{F}_\infty\}.$$
As $m \to \infty$, the last term above converges to zero by dominated convergence (see (vii) of Sec. 9.1). Hence the first term must be zero and this clearly implies (14a). The proof of (14b) is completely similar.

Although the corollary below is a very special case we give it for historical interest. It is called Paul Lévy's zero-or-one law (1935) and includes Theorem 8.1.1 as a particular case.

Corollary. If $\Lambda \in \mathcal{F}_\infty$, then
$$(16) \quad \lim_{n \to \infty} P(\Lambda \mid \mathcal{F}_n) = 1_\Lambda \quad \text{a.e.}$$

The reader is urged to ponder over the intuitive meaning of this result and judge for himself whether it is "obvious" or "incredible".

EXERCISES

*1. Prove that for any smartingale, we have for each $\lambda > 0$:
$$\lambda P\{\sup_n |X_n| > \lambda\} \le 3 \sup_n E(|X_n|).$$
For a martingale or a positive or negative smartingale the constant 3 may be replaced by 1.
2. Let $\{X_n\}$ be a positive supermartingale. Then for almost every $\omega$, $X_k(\omega) = 0$ implies $X_n(\omega) = 0$ for all $n \ge k$. [This is the analogue of a minimum principle in potential theory.]
3. Generalize the upcrossing inequality for a submartingale $\{X_n, \mathcal{F}_n\}$ as follows:
$$E\{\nu_{[a,b]}^{(n)} \mid \mathcal{F}_1\} \le \frac{E\{(X_n - a)^+ \mid \mathcal{F}_1\} - (X_1 - a)^+}{b - a}.$$

Similarly, generalize the downcrossing inequality for a positive supermartingale $\{X_n, \mathcal{F}_n\}$ as follows:
$$E\{\tilde{\nu}_{[a,b]}^{(n)} \mid \mathcal{F}_1\} \le \frac{X_1 \wedge b - E\{X_n \wedge b \mid \mathcal{F}_1\}}{b - a}.$$
*4. As a sharpening of Theorems 9.4.2 and 9.4.3 we have, for a positive supermartingale $\{X_n, n \in N\}$:
$$P\{\nu_{[a,b]}^{(n)} \ge k\} \le \Big(\frac{a}{b}\Big)^k \frac{E\{X_1 \wedge a\}}{a}.$$

*9. Let $\{X_n, \mathcal{F}_n; n \in N\}$ be a supermartingale satisfying the condition $\lim_{n \to \infty} E(X_n) > -\infty$. Then we have the representation $X_n = X'_n + X''_n$, where $\{X'_n, \mathcal{F}_n\}$ is a martingale and $\{X''_n, \mathcal{F}_n\}$ is a positive supermartingale such that $\lim_{n \to \infty} X''_n = 0$ in $L^1$ as well as a.e. This is the analogue of F. Riesz's decomposition of a superharmonic function, $X'_n$ being the harmonic part and $X''_n$ the potential part. [HINT: Use Doob's decomposition $X_n = Y_n - Z_n$ and put $X'_n = Y_n - E(Z_\infty \mid \mathcal{F}_n)$.]
10. Let $\{X_n, \mathcal{F}_n\}$ be a potential, namely a positive supermartingale such that $\lim_{n \to \infty} E(X_n) = 0$; and let $X_n = Y_n - Z_n$ be the Doob decomposition [cf. (6) of Sec. 9.3]. Show that
$$X_n = E(Z_\infty \mid \mathcal{F}_n) - Z_n.$$
*11. If $\{X_n\}$ is a martingale or positive submartingale such that $\sup_n E(X_n^2) < \infty$, then $\{X_n\}$ converges in $L^2$ as well as a.e.
12. Let $\{\xi_n, n \in N\}$ be a sequence of independent and identically distributed r.v.'s with zero mean and unit variance; and $S_n = \sum_{j=1}^n \xi_j$. Then for any optional r.v. $\alpha$ relative to $\{\xi_n\}$ such that $E(\sqrt{\alpha}) < \infty$, we have $E(|S_\alpha|) \le \sqrt{2}\, E(\sqrt{\alpha})$ and $E(S_\alpha) = 0$. This is an extension of Wald's equation due to Louis Gordon. [HINT: Truncate $\alpha$ and put $\eta_k = (S_k^2/\sqrt{k}) - (S_{k-1}^2/\sqrt{k-1})$; then
$$\sum_{k=1}^\infty \int_{\{\alpha \ge k\}} \eta_k \, dP \le 2E(\sqrt{\alpha});$$
now use Schwarz's inequality followed by Fatou's lemma.]

The next two problems are meant to give an idea of the passage from discrete parameter martingale theory to the continuous parameter theory.
13. Let $\{X_t, \mathcal{F}_t; t \in [0, 1]\}$ be a continuous parameter supermartingale. For each $t \in [0, 1]$ and sequence $\{t_n\}$ decreasing to $t$, $\{X_{t_n}\}$ converges a.e. and in $L^1$. For each $t \in [0, 1]$ and sequence $\{t_n\}$ increasing to $t$, $\{X_{t_n}\}$ converges a.e. but not necessarily in $L^1$. [HINT: In the second case consider $X_{t_n} - E\{X_t \mid \mathcal{F}_{t_n}\}$.]
*14. In Exercise 13 let $Q$ be the set of rational numbers in $[0, 1]$. For each $t \in (0, 1)$ both limits below exist a.e.:
$$\lim_{s \uparrow t,\ s \in Q} X_s, \quad \lim_{s \downarrow t,\ s \in Q} X_s.$$
[HINT: Let $\{Q_n, n \ge 1\}$ be finite subsets of $Q$ such that $Q_n \uparrow Q$; and apply the upcrossing inequality to $\{X_s, s \in Q_n\}$, then let $n \to \infty$.]
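The "double or nothing" martingale of Exercise 7 can be tabulated exactly: $X_n = 2^n$ with probability $2^{-n}$ and $0$ otherwise, so $E(X_n) = 1$ for every $n$ although $X_n \to 0$ a.e. The sketch below (illustrative code, not part of the text) exhibits this failure of uniform integrability: the expectation does not converge to $E(X_\infty) = 0$, so condition (c) of Theorem 9.4.5 fails.

```python
from fractions import Fraction

# Double-or-nothing martingale of Exercise 7: after n heads in a row the
# capital is 2**n (probability 2**-n); one tail ruins the player forever.
def expectation(n):
    win = Fraction(1, 2**n)      # P{n heads in a row}
    return win * 2**n            # E(X_n) = 2**n * 2**-n

# E(X_n) = 1 for all n, yet X_n -> 0 a.e.; the unit of expectation
# escapes to infinity along the ever-rarer winning streaks.
print([expectation(n) for n in range(1, 6)])
```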

9.5 Applications

Although some of the major successes of martingale theory lie in the field of continuous-parameter stochastic processes, which cannot be discussed here, it has also made various important contributions within the scope of this work. We shall illustrate these below with a few that are related to our previous topics, and indicate some others among the exercises.

(I) The notions of "at least once" and "infinitely often"

These have been a recurring theme in Chapters 4, 5, 8, and 9 and play important roles in the theory of random walk and its generalization to Markov processes. Let $\{X_n, n \in N^0\}$ be an arbitrary stochastic process; the notation for fields in Sec. 9.2 will be used. For each $n$ consider the events:
$$\Lambda_n = \bigcup_{j=n}^\infty \{X_j \in B_j\},$$
$$M = \bigcap_{n=1}^\infty \Lambda_n = \{X_j \in B_j \text{ i.o.}\},$$
where the $B_n$ are arbitrary Borel sets.

Theorem 9.5.1. We have
$$(1) \quad \lim_{n \to \infty} P\{\Lambda_{n+1} \mid \mathcal{F}_{[0,n]}\} = 1_M \quad \text{a.e.},$$
where $\mathcal{F}_{[0,n]}$ may be replaced by $\mathcal{F}_{\{n\}}$ or $X_n$ if the process is Markovian.

PROOF. By Theorem 9.4.8, (14a), the limit is
$$P\{M \mid \mathcal{F}_{[0,\infty)}\} = 1_M.$$

The next result is a "principle of recurrence" which is useful in Markov processes; it is an extension of the idea in Theorem 9.2.3 (see also Exercises 15 and 16 of Sec. 9.2).

Theorem 9.5.2. Let $\{X_n, n \in N^0\}$ be a Markov process and $A_n$, $B_n$ Borel sets. Suppose that there exists $\delta > 0$ such that for every $n$,
$$(2) \quad P\Big\{\bigcup_{j=n+1}^\infty [X_j \in B_j] \Bigm| X_n\Big\} \ge \delta \quad \text{a.e. on the set } \{X_n \in A_n\};$$
then we have
$$(3) \quad P\{[X_j \in A_j \text{ i.o.}] \setminus [X_j \in B_j \text{ i.o.}]\} = 0.$$

PROOF. Let $\Lambda = \{X_j \in A_j \text{ i.o.}\}$ and use the notation $\Lambda_n$ and $M$ above. We may ignore the null sets in (1) and (2). Then if $\omega \in \Lambda$, our hypothesis implies that
$$P\{\Lambda_{n+1} \mid X_n\}(\omega) \ge \delta \quad \text{i.o.}$$
In view of (1) this is possible only if $\omega \in M$. Thus $\Lambda \subset M$, which implies (3).

The intuitive meaning of the preceding theorem has been given by Doeblin as follows: if the chance of a pedestrian's getting run over is greater than $\delta > 0$ each time he crosses a certain street, then he will not be crossing it indefinitely (since he will be killed first)! Here $\{X_n \in A_n\}$ is the event of the $n$th crossing, $\{X_n \in B_n\}$ that of being run over at the $n$th crossing.

(II) Harmonic and superharmonic functions for a Markov process

Let $\{X_n, n \in N^0\}$ be a homogeneous Markov process as discussed in Sec. 9.2 with the transition probability function $P(\cdot, \cdot)$. An extended-valued function $f$ on $R^1$ is said to be harmonic (with respect to $P$) iff it is integrable with respect to the measure $P(x, \cdot)$ for each $x$ and satisfies the following "harmonic equation":
$$(4) \quad \forall x \in R^1 \colon \quad f(x) = \int P(x, dy)\, f(y).$$
It is superharmonic (with respect to $P$) iff the "$=$" in (4) is replaced by "$\ge$"; in this case $f$ may take the value $+\infty$.

Lemma. If $f$ is [super]harmonic, then $\{f(X_n), n \in N^0\}$, where $X_0 \equiv x_0$ for some given $x_0$ in $R^1$, is a [super]martingale.

PROOF. We have, recalling (14) of Sec. 9.2,
$$E\{f(X_n)\} = \int P^n(x_0, dy)\, f(y) < \infty,$$
as is easily seen by iterating (4) and applying an extended form of Fubini's theorem (see, e.g., Neveu [6]). Next we have, upon substituting $X_n$ for $x$ in (4):
$$f(X_n) = \int P(X_n, dy)\, f(y) = E\{f(X_{n+1}) \mid X_n\} = E\{f(X_{n+1}) \mid \mathcal{F}_{[0,n]}\},$$
where the second equation follows by Exercise 8 of Sec. 9.2 and the third by the Markov property. This proves the lemma in the harmonic case; the other case is similar.
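For a finite state space the harmonic equation (4) is just $f = Pf$ with $P$ a stochastic matrix, and the lemma can be checked by direct computation. The sketch below is illustrative only (the 5-state absorbing walk and all names are this sketch's choices): it verifies the harmonic equation for $f(x) = x/4$ and the constancy of $E\, f(X_n)$, i.e. the martingale property in the mean.

```python
# Simple random walk on {0,1,2,3,4} absorbed at 0 and 4; f(x) = x/4 is
# harmonic for this kernel, so {f(X_n)} is a martingale (the Lemma).
S = range(5)
P = [[0.0] * 5 for _ in S]
P[0][0] = P[4][4] = 1.0                    # absorbing barriers
for x in (1, 2, 3):
    P[x][x - 1] = P[x][x + 1] = 0.5

f = [x / 4 for x in S]

# Harmonic equation (4): f(x) = sum_y P(x, y) f(y) for every state x.
Pf = [sum(P[x][y] * f[y] for y in S) for x in S]

# E f(X_n) for the walk started at x0 = 2 stays equal to f(x0) = 1/2.
mu = [0.0, 0.0, 1.0, 0.0, 0.0]             # distribution of X_0
for _ in range(50):                        # push the distribution forward
    mu = [sum(mu[x] * P[x][y] for x in S) for y in S]
ef = sum(mu[x] * f[x] for x in S)
print(Pf, ef)
```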
(Why not also the "sub" case?)

The most important example of a harmonic function is the $g(\cdot, B)$ of Exercise 10 of Sec. 9.2 for a given $B$; that of a superharmonic function is the

$f(\cdot, B)$ of Exercise 9 there. These assertions follow easily from their probabilistic meanings given in the cited exercises, but purely analytic verifications are also simple and instructive. Finally, if for some $B$ we have
$$\pi(x) = \sum_{n=0}^\infty P^{(n)}(x, B) < \infty$$
for every $x$, then $\pi(\cdot)$ is superharmonic and is called the "potential" of the set $B$.

Theorem 9.5.3. Suppose that the remote field of $\{X_n, n \in N^0\}$ is trivial. Then each bounded harmonic function is a constant a.e. with respect to each $\mu_n$, where $\mu_n$ is the p.m. of $X_n$.

PROOF. By Theorem 9.4.5, $f(X_n)$ converges a.e. to $Z$ such that
$$\{f(X_n), \mathcal{F}_{[0,n]};\ Z, \mathcal{F}_{[0,\infty)}\}$$
is a martingale. Clearly $Z$ belongs to the remote field and so is a constant $c$ a.e. Since
$$f(X_n) = E\{Z \mid \mathcal{F}_{[0,n]}\},$$
each $f(X_n)$ is the same constant $c$ a.e. Mapped into $R^1$, the last assertion becomes the conclusion of the theorem.

(III) The supremum of a submartingale

The first inequality in (1) of Sec. 9.4 is of a type familiar in ergodic theory and leads to the result below, which has been called the "dominated ergodic theorem" by Wiener. In the case where $X_n$ is the sum of independent r.v.'s with mean 0, it is due to Marcinkiewicz and Zygmund. We write $\|X\|_p$ for the $L^p$-norm of $X$: $\|X\|_p = (E|X|^p)^{1/p}$.

Theorem 9.5.4. Let $1 < p < \infty$ and $1/p + 1/q = 1$. Suppose that $\{X_n, n \in N\}$ is a positive submartingale satisfying the condition
$$(5) \quad \sup_n E(X_n^p) < \infty.$$
Then $\sup_{n \in N} X_n \in L^p$ and
$$(6) \quad \big\|\sup_n X_n\big\|_p \le q \sup_n \|X_n\|_p.$$

PROOF. The condition (5) implies that $\{X_n\}$ is uniformly integrable (Exercise 8 of Sec. 4.5), hence by Theorem 9.4.5, $X_n \to X_\infty$ a.e. and $\{X_n, n \in N_\infty\}$ is a submartingale. Writing $Y$ for $\sup_n X_n$, we have by an obvious

extension of the first equation in (1) of Sec. 9.4:
$$(7) \quad \forall \lambda > 0 \colon \quad \lambda P\{Y > \lambda\} \le \int_{\{Y \ge \lambda\}} X_\infty \, dP.$$
Now it turns out that such an inequality for any two r.v.'s $Y$ and $X_\infty$ implies the inequality $\|Y\|_p \le q \|X_\infty\|_p$, from which (6) follows by Fatou's lemma. This is shown by the calculation below, where $G(\lambda) = P\{Y > \lambda\}$:
$$E(Y^p) = -\int_0^\infty \lambda^p \, dG(\lambda) = \int_0^\infty p\lambda^{p-1} G(\lambda)\, d\lambda \le \int_0^\infty p\lambda^{p-2} \Big(\int_{\{Y \ge \lambda\}} X_\infty \, dP\Big) d\lambda$$
$$= E\Big(X_\infty \int_0^Y p\lambda^{p-2}\, d\lambda\Big) = q\, E(X_\infty Y^{p-1}) \le q\, \|X_\infty\|_p\, \|Y^{p-1}\|_q = q\, \|X_\infty\|_p\, \{E(Y^p)\}^{1/q}.$$
Since we do not yet know $E(Y^p) < \infty$, it is necessary to replace $Y$ first with $Y \wedge m$, where $m$ is a constant, and verify the truth of (7) after this replacement, before dividing through in the obvious way. We then let $m \uparrow \infty$ and conclude (6).

The result above is false for $p = 1$ and is replaced by a more complicated one (Exercise 7 below).

(IV) Convergence of sums of independent r.v.'s

We return to Theorem 5.3.4 and complete the discussion there by showing that the convergence in distribution of the series $\sum_n X_n$ already implies its convergence a.e. This can also be proved by an analytic method based on estimation of ch.f.'s, but the martingale approach is more illuminating.

Theorem 9.5.5. If $\{X_n, n \in N\}$ is a sequence of independent r.v.'s such that $S_n = \sum_{j=1}^n X_j$ converges in distribution as $n \to \infty$, then $S_n$ converges a.e.

PROOF. Let $f_j$ be the ch.f. of $X_j$, so that
$$\varphi_n = \prod_{j=1}^n f_j$$

is the ch.f. of $S_n$. By the convergence theorem of Sec. 6.3, $\varphi_n$ converges everywhere to $\varphi$, the ch.f. of the limit distribution of $S_n$. We shall need only this fact for $|t| \le t_0$, where $t_0$ is so small that $\varphi(t) \ne 0$ for $|t| \le t_0$; then this is also true of $\varphi_n$ for all sufficiently large $n$. For such values of $n$ and a fixed $t$ with $|t| \le t_0$, we define the complex-valued r.v. $Z_n$ as follows:
$$(8) \quad Z_n = \frac{e^{itS_n}}{\varphi_n(t)}.$$
Then each $Z_n$ is integrable; indeed the sequence $\{Z_n\}$ is uniformly bounded. We have for each $n$, if $\mathcal{F}_n$ denotes the Borel field generated by $S_1, \ldots, S_n$:
$$E\{Z_{n+1} \mid \mathcal{F}_n\} = E\Big\{\frac{e^{itS_n}\, e^{itX_{n+1}}}{\varphi_n(t)\, f_{n+1}(t)} \Bigm| \mathcal{F}_n\Big\} = \frac{e^{itS_n}}{\varphi_n(t)\, f_{n+1}(t)}\, E\{e^{itX_{n+1}}\} = \frac{e^{itS_n}\, f_{n+1}(t)}{\varphi_n(t)\, f_{n+1}(t)} = Z_n,$$
where the second equation follows from Theorem 9.1.3 and the third from independence. Thus $\{Z_n, \mathcal{F}_n\}$ is a martingale, in the sense that its real and imaginary parts are both martingales. Since it is uniformly bounded, it follows from Theorem 9.4.4 that $Z_n$ converges a.e. This means, for each $t$ with $|t| \le t_0$, there is a set $\Omega_t$ with $P(\Omega_t) = 1$ such that if $\omega \in \Omega_t$, then the sequence of complex numbers $e^{itS_n(\omega)}/\varphi_n(t)$ converges, and so also does $e^{itS_n(\omega)}$. But how does one deduce from this the convergence of $S_n(\omega)$? The argument below may seem unnecessarily tedious, but it is of a familiar and indispensable kind in certain parts of stochastic processes.

Consider $e^{itS_n(\omega)}$ as a function of $(t, \omega)$ in the product space $T \times \Omega$, where $T = [-t_0, t_0]$, with the product measure $m \times P$, where $m$ is the Lebesgue measure on $T$. Since this function is measurable in $(t, \omega)$ for each $n$, the set $C$ of $(t, \omega)$ for which $\lim_{n \to \infty} e^{itS_n(\omega)}$ exists is measurable with respect to $m \times P$. Each section of $C$ by a fixed $t$ has full measure $P(\Omega_t) = 1$ as just shown, hence Fubini's theorem asserts that almost every section of $C$ by a fixed $\omega$ must also have full measure $m(T) = 2t_0$. This means that there exists an $\tilde\Omega$ with $P(\tilde\Omega) = 1$, and for each $\omega \in \tilde\Omega$ there is a subset $T_\omega$ of $T$ with $m(T_\omega) = m(T)$, such that if $t \in T_\omega$, then $\lim_{n \to \infty} e^{itS_n(\omega)}$ exists. Now we are in a position to apply Exercise 17 of Sec. 6.4 to conclude the convergence of $S_n(\omega)$ for $\omega \in \tilde\Omega$, thus finishing the proof of the theorem.

According to the preceding proof, due to Doob, the hypothesis of Theorem 9.5.5 may be further weakened to the requirement that the sequence of ch.f.'s of $S_n$ converges on a set of $t$ of strictly positive Lebesgue measure. In particular, if an infinite product $\prod_n f_n$ of ch.f.'s converges on such a set, then it converges everywhere.
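Theorem 9.5.5 is easy to watch empirically. The sketch below is an illustrative Monte Carlo (seeded for reproducibility, names are this sketch's): the series $\sum_j \varepsilon_j/j$ with independent random signs has $\sum_j 1/j^2 < \infty$, so its partial sums converge in distribution and hence, by the theorem, a.e.; along one sample path the late partial sums barely move.

```python
import random

random.seed(7)

# One sample path of S_n = sum_{j<=n} eps_j / j with independent signs
# eps_j = +-1.  Var(S_n) = sum 1/j^2 < oo, so S_n converges in
# distribution, hence (Theorem 9.5.5) almost everywhere.
def partial_sums(n):
    s, out = 0.0, []
    for j in range(1, n + 1):
        s += random.choice((-1.0, 1.0)) / j
        out.append(s)
    return out

path = partial_sums(20000)
tail_osc = max(path[10000:]) - min(path[10000:])  # oscillation of the tail
print(tail_osc)
```

The standard deviation of the tail increments beyond $n = 10000$ is about $0.007$, so the observed tail oscillation should be small for any seed.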

(V) The strong law of large numbers

Our next example is a new proof of the classical strong law of large numbers in the form of Theorem 5.4.2, (8). This basically different approach, which has more a measure-theoretic than an analytic flavor, is one of the striking successes of martingale theory. It was given by Doob (1949).

Theorem 9.5.6. Let $\{S_n, n \in N\}$ be a random walk (in the sense of Chapter 8) with $E\{|S_1|\} < \infty$. Then we have
$$\lim_{n \to \infty} \frac{S_n}{n} = E\{S_1\} \quad \text{a.e.}$$

PROOF. Recall that $S_n = \sum_{j=1}^n X_j$ and consider for $1 \le k \le n$:
$$(9) \quad E\{X_k \mid S_n, S_{n+1}, \ldots\} = E\{X_k \mid \mathcal{G}_n\},$$
where $\mathcal{G}_n$ is the Borel field generated by $\{S_j, j \ge n\}$. Thus $\{\mathcal{G}_n\}$ decreases as $n$ increases. By Theorem 9.4.8, (15b), the right side of (9) converges to $E\{X_k \mid \mathcal{G}\}$, where $\mathcal{G} = \bigcap_n \mathcal{G}_n$. Now $\mathcal{G}_n$ is also generated by $S_n$ and $\{X_j, j \ge n + 1\}$, hence it follows from the independence of the latter from the pair $(X_k, S_n)$ and (3) of Sec. 9.2 that we have
$$(10) \quad E\{X_k \mid S_n, X_{n+1}, X_{n+2}, \ldots\} = E\{X_k \mid S_n\} = E\{X_1 \mid S_n\},$$
the second equation by reason of symmetry (proof?). Summing over $k$ from 1 to $n$ and taking the average, we infer that
$$\frac{S_n}{n} = E\{X_1 \mid \mathcal{G}_n\},$$
so that if $Y_{-n} = S_n/n$ for $n \in N$, $\{Y_n, n \in -N\}$ is a martingale. In particular,
$$\lim_{n \to \infty} \frac{S_n}{n} = \lim_{n \to \infty} E\{X_1 \mid \mathcal{G}_n\} = E\{X_1 \mid \mathcal{G}\},$$
where the second equation follows from the argument above. On the other hand, the first limit is a remote (even invariant) r.v. in the sense of Sec. 8.1, since for every $m \ge 1$ we have
$$\lim_{n \to \infty} \frac{S_n(\omega)}{n} = \lim_{n \to \infty} \frac{\sum_{j=m}^n X_j(\omega)}{n};$$

hence it must be a constant a.e. by Theorem 8.1.2. [Alternatively we may use Theorem 8.1.4, the limit above being even more obviously a permutable r.v.] This constant must be $E\{E\{X_1 \mid \mathcal{G}\}\} = E\{X_1\}$, proving the theorem.

(VI) Exchangeable events

The method used in the preceding example can also be applied to the theory of exchangeable events. The events $\{E_n, n \in N\}$ are said to be exchangeable iff for every $k \ge 1$, the probability of the joint occurrence of any $k$ of them is the same for every choice of $k$ members from the sequence; namely we have
$$(11) \quad P\{E_{n_1} \cap \cdots \cap E_{n_k}\} = w_k, \quad k \in N,$$
for any subset $\{n_1, \ldots, n_k\}$ of $N$. Let us denote the indicator of $E_n$ by $e_n$, and put
$$N_n = \sum_{j=1}^n e_j;$$
then $N_n$ is the number of occurrences among the first $n$ events of the sequence. Denote by $\mathcal{G}_n$ the B.F. generated by $\{N_j, j \ge n\}$. Then the definition of exchangeability implies that if $n_j \le n$ for $1 \le j \le k$, then

9.5 APPLICATIONS | 367

But it is trivial that $1 + e_j z = (1+z)^{e_j}$ since $e_j$ takes only the values 0 and 1; hence
$$\prod_{j=1}^n (1 + e_j z) = \prod_{j=1}^n (1+z)^{e_j} = (1+z)^{N_n}.$$
From this we obtain by comparing the coefficients of $z^k$:
$$(13)\qquad \sum_{1\le j_1<\cdots<j_k\le n} e_{j_1}\cdots e_{j_k} = \binom{N_n}{k}, \qquad 0 \le k \le n.$$
I owe this derivation to David Klarner.
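The coefficient comparison behind (13) can be checked mechanically for any realization of the indicators; the particular 0-1 vector below is an illustrative choice:

```python
from itertools import combinations
from math import comb

# Check of (13): since each e_j is 0 or 1, prod_j (1 + e_j z) = (1 + z)**N_n,
# so the sum of e_{j_1}...e_{j_k} over all k-subsets equals C(N_n, k).
e = [1, 0, 1, 1, 0, 1]          # an illustrative realization; N_n = 4
Nn = sum(e)

def coeff_sum(k):
    # coefficient of z**k in prod_j (1 + e_j z), by direct expansion
    return sum(1 for c in combinations(range(len(e)), k)
               if all(e[j] for j in c))

ok = all(coeff_sum(k) == comb(Nn, k) for k in range(len(e) + 1))
```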

368 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

The sequence $\{Q_n^2(X), n \in N\}$ associated with $X$ is called its squared variation process and is useful in many applications. We begin with the algebraic identity:
$$(15)\qquad X_n^2 = \sum_{j=1}^n x_j^2 + 2\sum_{j=2}^n X_{j-1} x_j.$$
If $X_n \in L^2$ for every $n$, then all terms above are integrable, and we have for each $j$:
$$(16)\qquad E(X_{j-1} x_j) = E(X_{j-1} E(x_j \mid \mathscr{F}_{j-1})) = 0.$$
It follows that
$$E(X_n^2) = \sum_{j=1}^n E(x_j^2) = E(Q_n^2(X)).$$
When $X_n$ is the $n$th partial sum of a sequence of independent r.v.'s with zero mean and finite variance, the preceding formula reduces to the additivity of variances; see (6) of Sec. 5.1.

Now suppose that $\{X_n\}$ is a positive bounded supermartingale such that $0 \le X_n \le A$ for all $n$, where $A$ is a constant. Then the quantity in (16) is negative and bounded below by
$$E(X_{j-1} E(x_j \mid \mathscr{F}_{j-1})) \ge A\,E(x_j), \qquad E(x_j) \le 0.$$
In this case we obtain from (15):
$$E(X_n^2) \ge E(Q_n^2(X)) + 2A\sum_{j=2}^n E(x_j) = E(Q_n^2(X)) + 2A\,E(X_n - X_1),$$
so that
$$(17)\qquad E(Q_n^2(X)) \le 2A\,E(X_1) \le 2A^2.$$
If $X$ is a positive martingale, then $X \wedge A$ is a supermartingale of the kind just considered, so that (17) is applicable to it. Letting $X^* = \sup_{1 \le n < \infty} X_n$,

9.5 APPLICATIONS | 369

We have therefore established the inequality:
$$(18)\qquad E\Big\{\frac{Q_\infty^2(X)}{X^*}\Big\} \le \cdots,$$
where the fraction is taken to be zero if the denominator vanishes.

According to the discussion of Sec. 9.1, we have
$$X_n = E\{Y \mid \mathscr{F}_n\}.$$
Now suppose that the partitions become finer as $n$ increases, so that $\{\mathscr{F}_n, n \in N\}$ is an increasing sequence of Borel fields. Then we obtain by Theorem 9.4.8:
$$\lim_{n\to\infty} X_n = E\{Y \mid \mathscr{F}_\infty\} \quad \text{a.e.}$$
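Returning to the identity (15) and the orthogonality relation (16) of the preceding page: both can be verified exactly in a small discrete case. Below, $X_n$ is the sum of $n$ independent fair $\pm 1$ steps (an illustrative martingale); since all $2^n$ paths are enumerated, the averages are exact expectations, not estimates:

```python
from itertools import product

# Exact check of (15)-(16): for X_n = sum of n fair +/-1 steps,
# X_n^2 = sum_j x_j^2 + 2 sum_{j>=2} X_{j-1} x_j, the cross terms have
# mean zero, and E(X_n^2) = sum_j E(x_j^2) = n (additivity of variances).
n = 6
paths = list(product([-1, 1], repeat=n))
EX2 = sum(sum(p) ** 2 for p in paths) / len(paths)
cross = sum(sum(sum(p[:j]) * p[j] for j in range(1, n))
            for p in paths) / len(paths)
# EX2 == n and cross == 0 exactly
```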

370 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

In particular, if $Y \in \mathscr{F}_\infty$ we have obtained the Radon-Nikodym derivative $Y = d\nu/dP$ as the almost everywhere limit of "increment ratios" over a "net":
$$Y(\omega) = \lim_{n\to\infty} \frac{\nu(\Lambda^{(n)}_{j(\omega)})}{P(\Lambda^{(n)}_{j(\omega)})},$$
where $j(\omega)$ is the unique $j$ such that $\omega \in \Lambda^{(n)}_j$.

If $(\Omega, \mathscr{F}, P)$ is $(\mathscr{U}, \mathscr{B}, m)$ as in Example 2 of Sec. 3.1, and $\nu$ is an arbitrary measure which is absolutely continuous with respect to the Lebesgue measure $m$, we may take the $n$th partition to be $0 = \lambda^{(n)}_0 < \cdots < \lambda^{(n)}_{k_n} = 1$ such that $\ldots$

EXERCISES

1. $\ldots$; otherwise $Z_n = 0$. Then $\{Z_n, n \in N\}$ is a martingale that converges a.e. and in $L^1$. [This is from information theory.]
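The "increment ratio over a net" can be visualized concretely. In the sketch below the underlying space is $[0,1)$ with Lebesgue measure, the partitions are the dyadic ones, and $\nu((a,b]) = b^3 - a^3$, so that $d\nu/dm = 3x^2$; all of these specific choices are illustrative:

```python
# Increment ratios over the dyadic net converge to the density d(nu)/dm.
# Here nu((a,b]) = b**3 - a**3, i.e. the density is 3*x**2 (illustrative).
def X_n(w, n):
    # ratio nu(I)/m(I) over the dyadic interval I of level n containing w
    j = int(w * 2**n)
    a, b = j / 2**n, (j + 1) / 2**n
    return (b**3 - a**3) / (b - a)

w = 0.3
vals = [X_n(w, k) for k in (2, 6, 12, 20)]
# vals approaches the density 3 * 0.3**2 = 0.27
```

As the partition refines, the ratio stabilizes at the density evaluated at the point, in accordance with the limit displayed above.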

9.5 APPLICATIONS | 371

2. Suppose that for each $n$, $\{X_j, 1 \le j \le n\}$ and $\{X_j', 1 \le j \le n\}$ have respectively the $n$-dimensional probability density functions $p_n$ and $q_n$. Define
$$Y_n = \frac{q_n(X_1, \ldots, X_n)}{p_n(X_1, \ldots, X_n)}$$
if the denominator $> 0$, and $= 0$ otherwise. Then $\{Y_n, n \in N\}$ is a supermartingale that converges a.e. [This is from statistics.]

3. Let $\{Z_n, n \in N^0\}$ be positive integer-valued r.v.'s such that $Z_0 \equiv 1$ and for each $n \ge 1$, the conditional distribution of $Z_n$ given $Z_0, \ldots, Z_{n-1}$ is that of $Z_{n-1}$ independent r.v.'s with the common distribution $\{p_k, k \in N^0\}$, where $p_1 < 1$ and $\ldots$
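The setup of Exercise 3 is a branching (Galton-Watson) process. A standard consequence of the stated conditional distribution is $E\{Z_n \mid Z_0, \ldots, Z_{n-1}\} = m Z_{n-1}$ with $m = \sum_k k\,p_k$, so that $E\{Z_n\} = m^n$ and $Z_n/m^n$ is a martingale. This can be verified exactly for a small offspring law (the law $\{p_0, p_1, p_2\} = \{0.25, 0.25, 0.5\}$ below is an illustrative choice, not from the exercise):

```python
# Exact check that E(Z_2) = m**2 for a Galton-Watson process with Z_0 = 1.
p = {0: 0.25, 1: 0.25, 2: 0.5}               # illustrative offspring law
m = sum(k * pk for k, pk in p.items())       # mean offspring number, 1.25

def convolve(d1, d2):
    # distribution of the sum of two independent integer-valued r.v.'s
    out = {}
    for a, pa in d1.items():
        for b, pb in d2.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out

# law of Z_2: condition on Z_1 = k, then Z_2 is a k-fold convolution of p
law_Z2 = {}
for k, pk in p.items():
    d = {0: 1.0}
    for _ in range(k):
        d = convolve(d, p)
    for v, pv in d.items():
        law_Z2[v] = law_Z2.get(v, 0.0) + pk * pv

EZ2 = sum(v * pv for v, pv in law_Z2.items())
# EZ2 equals m**2 = 1.5625
```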

372 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

*7. The analogue of Theorem 9.5.4 for $p = 1$ is as follows: if $X_n \ge 0$ for all $n$, then
$$E\{\sup_n X_n\} \le \frac{e}{e-1}\Big[1 + \sup_n E\{X_n \log^+ X_n\}\Big],$$
where $\log^+ x = \log x$ if $x \ge 1$ and $= 0$ if $x \le 1$.

*8. As an example of a more sophisticated application of the martingale convergence theorem, consider the following result due to Paul Lévy. Let $\{X_n, n \in N^0\}$ be a sequence of uniformly bounded r.v.'s; then the two series
$$\sum_n X_n \quad\text{and}\quad \sum_n E\{X_n \mid X_1, \ldots, X_{n-1}\}$$
converge or diverge together. [HINT: Let
$$Y_n = X_n - E\{X_n \mid X_1, \ldots, X_{n-1}\}, \qquad Z_n = \sum_{j=1}^n Y_j.$$
Define $\alpha$ to be the first time that $Z_n > A$ and show that $E(Z_{\alpha\wedge n})$ is bounded in $n$. Apply Theorem 9.4.4 to $\{Z_{\alpha\wedge n}\}$ for each $A$ to show that $Z_n$ converges on the set where $\limsup_n Z_n < \infty$; similarly also on the set where $\liminf_n Z_n > -\infty$. The situation is reminiscent of Theorem 8.2.5.]

9. Let $\{Y_k, 1 \le k \le n\}$ be independent r.v.'s with mean zero and finite variances $\sigma_k^2$;
$$S_k = \sum_{j=1}^k Y_j, \quad s_k^2 = \sum_{j=1}^k \sigma_j^2 > 0, \quad Z_k = S_k^2 - s_k^2.$$
Prove that $\{Z_k, 1 \le k \le n\}$ is a martingale. Suppose now all $Y_k$ are bounded by a constant $A$, and define $\alpha$ and $M$ as in the proof of Theorem 9.4.1, with the $X_k$ there replaced by the $S_k$ here. Prove that
$$s_n^2\,P(M^c) \le E(s_{\alpha\wedge n}^2) = E(S_{\alpha\wedge n}^2) \le (\lambda + A)^2.$$
Thus we obtain
$$P\Big\{\max_{1\le k\le n} |S_k| \le \lambda\Big\} \le \frac{(\lambda + A)^2}{s_n^2}.$$

10. $\ldots$ $\{\alpha \le n\}$. Prove that if $E\{(|S_\alpha|/\alpha)\,1_{\{\alpha<\infty\}}\}$ $\ldots$

9.5 APPLICATIONS | 373

is due to McCabe and Shepp. [HINT:
$$c_n = \prod_{j=1}^n P\{|X_1| \le j\} \to c > 0;$$
$$\sum_{n=1}^\infty \frac{c_n}{n}\int_{\{\alpha=n\}} |X_n|\,dP = \sum_{n=1}^\infty \frac{c_n}{n}\int_{\{|X_1|>n\}} |X_1|\,dP;$$
$$\sum_{n=1}^\infty \frac{1}{n}\int_{\{\alpha\ge n\}} |S_{n-1}|\,dP < \infty.]$$

11. Deduce from Exercise 10 that $E(\sup_n |S_n|/n) < \infty$ if and only if $E(|X_1| \log^+ |X_1|) < \infty$. [HINT: Apply Exercise 7 to the martingale $\{\ldots, S_n/n, \ldots, S_2/2, S_1\}$ in Example (V).]

12. In Example (VI) show that (i) $\mathscr{G}$ is generated by $\eta$; (ii) the events $\{E_n, n \in N\}$ are conditionally independent given $\eta$; (iii) for any $l$ events $E_{n_j}$, $1 \le j \le l$, and any $k \le l$ we have
$$P\{E_{n_1} \cap \cdots \cap E_{n_k} \cap E_{n_{k+1}}^c \cap \cdots \cap E_{n_l}^c\} = \int_0^1 x^k (1-x)^{l-k}\,G(dx),$$
where $G$ is the distribution of $\eta$.

13. Prove the inequality (19).

*14. Prove that if $\nu$ is a measure on $\mathscr{F}$ that is singular with respect to $P$, then the $X_n$'s in (21) converge a.e. to zero. [HINT: Show that
$$\nu(\Lambda) = \int_\Lambda X_n\,dP \quad \text{for } \Lambda \in \mathscr{F}_m,\ m \le n,$$
and apply Fatou's lemma. $\{X_n\}$ is a supermartingale, not necessarily a martingale!]

15. In the case of $(\mathscr{U}, \mathscr{B}, m)$, suppose that $\nu = \delta_1$ and the $n$th partition is obtained by dividing $\mathscr{U}$ into $2^n$ equal parts: what are the $X_n$'s in (21)? Use this to "explain away" the St. Petersburg paradox (see Exercise 5 of Sec. 5.2).

Bibliographical Note

Most of the results can be found in Chapter 7 of Doob [17]. Another useful account is given by Meyer [20]. For an early but stimulating account of the connections between random walks and partial differential equations, see

A. Khintchine, Asymptotische Gesetze der Wahrscheinlichkeitsrechnung. Springer-Verlag, Berlin, 1933.

374 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

Theorems 9.5.1 and 9.5.2 are contained in

Kai Lai Chung, The general theory of Markov processes according to Doeblin, Z. Wahrscheinlichkeitstheorie 2 (1964), 230-254.

For Theorem 9.5.4 in the case of an independent process, see

J. Marcinkiewicz and A. Zygmund, Sur les fonctions indépendantes, Fund. Math. 29 (1937), 60-90,

which contains other interesting and useful results.

The following article serves as a guide to the recent literature on martingale inequalities and their uses:

D. L. Burkholder, Distribution function inequalities for martingales, Ann. Probability 1 (1973), 19-42.

Supplement: Measure and Integral

For basic mathematical vocabulary and notation the reader is referred to §1.1 and §2.1 of the main text.

1 Construction of measure

Let $\Omega$ be an abstract space and $\mathscr{J}$ its total Borel field, namely the collection of all subsets of $\Omega$; thus $A \in \mathscr{J}$ means $A \subset \Omega$.

DEFINITION 1. A function $\mu^*$ with domain $\mathscr{J}$ and range in $[0, \infty]$ is an outer measure iff the following properties hold:

(a) $\mu^*(\emptyset) = 0$;

(b) (monotonicity) if $A_1 \subset A_2$, then $\mu^*(A_1) \le \mu^*(A_2)$;

(c) (subadditivity) if $\{A_j\}$ is a countable sequence of sets in $\mathscr{J}$, then
$$\mu^*\Big(\bigcup_j A_j\Big) \le \sum_j \mu^*(A_j).$$

DEFINITION 2. Let $\mathscr{F}_0$ be a field in $\Omega$. A function $\mu$ with domain $\mathscr{F}_0$ and range in $[0, \infty]$ is a measure on $\mathscr{F}_0$ iff (a) and the following property hold:

376 | SUPPLEMENT: MEASURE AND INTEGRAL

(d) (additivity) if $\{B_j\}$ is a countable sequence of disjoint sets in $\mathscr{F}_0$ and $\bigcup_j B_j \in \mathscr{F}_0$, then
$$(1)\qquad \mu\Big(\bigcup_j B_j\Big) = \sum_j \mu(B_j).$$

Let us show that the properties (b) and (c) for outer measure hold for a measure $\mu$, provided all the sets involved belong to $\mathscr{F}_0$.

If $A_1 \in \mathscr{F}_0$, $A_2 \in \mathscr{F}_0$, and $A_1 \subset A_2$, then $A_1^c A_2 \in \mathscr{F}_0$ because $\mathscr{F}_0$ is a field; $A_2 = A_1 \cup A_1^c A_2$ and so by (d):
$$\mu(A_2) = \mu(A_1) + \mu(A_1^c A_2) \ge \mu(A_1).$$

Next, if each $A_j \in \mathscr{F}_0$, and furthermore if $\bigcup_j A_j \in \mathscr{F}_0$ (this must be assumed for a countably infinite union because it is not implied by the definition of a field!), then
$$\bigcup_j A_j = A_1 \cup A_1^c A_2 \cup A_1^c A_2^c A_3 \cup \cdots$$
and so by (d), since each member of the disjoint union above belongs to $\mathscr{F}_0$:
$$\mu\Big(\bigcup_j A_j\Big) = \mu(A_1) + \mu(A_1^c A_2) + \mu(A_1^c A_2^c A_3) + \cdots \le \mu(A_1) + \mu(A_2) + \mu(A_3) + \cdots$$
by property (b) just proved.

The symbol $N$ denotes the sequence of natural numbers (strictly positive integers); when used as index set, it will frequently be omitted as understood. For instance, the index $j$ used above ranges over $N$ or a finite segment of $N$.

Now let us suppose that the field $\mathscr{F}_0$ is a Borel field to be denoted by $\mathscr{F}$, and that $\mu$ is a measure on it. Then if $A_n \in \mathscr{F}$ for each $n \in N$, the countable union $\bigcup_n A_n$ and countable intersection $\bigcap_n A_n$ both belong to $\mathscr{F}$. In this case we have the following fundamental properties.

(e) (increasing limit) if $A_n \subset A_{n+1}$ for all $n$ and $A_n \uparrow A = \bigcup_n A_n$, then
$$\lim_n \uparrow \mu(A_n) = \mu(A).$$

(f) (decreasing limit) if $A_n \supset A_{n+1}$ for all $n$, $A_n \downarrow A = \bigcap_n A_n$, and for some $n$ we have $\mu(A_n) < \infty$, then
$$\lim_n \downarrow \mu(A_n) = \mu(A).$$

1 CONSTRUCTION OF MEASURE | 377

The additional assumption in (f) is essential. For a counterexample let $A_n = (n, \infty)$ in $R$; then $A_n \downarrow \emptyset$, the empty set, but the measure (length!) of $A_n$ is $+\infty$ for all $n$, while $\emptyset$ surely must have measure 0. See §3 for formalities of this trivial example. It can even be made discrete if we use the counting measure $\#$ of natural numbers: let $A_n = \{n, n+1, n+2, \ldots\}$ so that $\#(A_n) = +\infty$, but $\#(\bigcap_n A_n) = 0$.

Beginning with a measure $\mu$ on a field $\mathscr{F}_0$, not a Borel field, we proceed to construct a measure on the Borel field $\mathscr{F}$ generated by $\mathscr{F}_0$, namely the minimal Borel field containing $\mathscr{F}_0$ (see §2.1). This is called an extension of $\mu$ from $\mathscr{F}_0$ to $\mathscr{F}$, when the notation $\mu$ is maintained. Curiously, we do this by first constructing an outer measure $\mu^*$ on the total Borel field $\mathscr{J}$ and then showing that $\mu^*$ is in truth a measure on a certain Borel field to be denoted by $\mathscr{F}^*$ that contains $\mathscr{F}_0$. Then of course $\mathscr{F}^*$ must contain the minimal $\mathscr{F}$, and so $\mu^*$ restricted to $\mathscr{F}$ is an extension of the original $\mu$ from $\mathscr{F}_0$ to $\mathscr{F}$. But we have obtained a further extension to $\mathscr{F}^*$ that is in general "larger" than $\mathscr{F}$ and possesses a further desirable property to be discussed.

DEFINITION 3. Given a measure $\mu$ on a field $\mathscr{F}_0$ in $\Omega$, we define $\mu^*$ on $\mathscr{J}$ as follows, for any $A \in \mathscr{J}$:
$$(2)\qquad \mu^*(A) = \inf \sum_j \mu(B_j), \qquad B_j \in \mathscr{F}_0 \text{ for all } j \text{ and } \bigcup_j B_j \supset A.$$

A countable (possibly finite) collection of sets $\{B_j\}$ satisfying the conditions indicated in (2) will be referred to below as a "covering" of $A$. The infimum taken over all such coverings exists because the single set $\Omega$ constitutes a covering of $A$, so that
$$0 \le \mu^*(A) \le \mu^*(\Omega) \le +\infty.$$
It is not trivial that $\mu^*(A) = \mu(A)$ if $A \in \mathscr{F}_0$, which is part of the next theorem.

Theorem 1. We have $\mu^* = \mu$ on $\mathscr{F}_0$; $\mu^*$ on $\mathscr{J}$ is an outer measure.

PROOF. Let $A \in \mathscr{F}_0$; then the single set $A$ serves as a covering of $A$; hence $\mu^*(A) \le \mu(A)$.
For any covering $\{B_j\}$ of $A$, we have $AB_j \in \mathscr{F}_0$ and
$$\bigcup_j AB_j = A \in \mathscr{F}_0;$$
hence by property (c) of $\mu$ on $\mathscr{F}_0$ followed by property (b):
$$\mu(A) = \mu\Big(\bigcup_j AB_j\Big) \le \sum_j \mu(AB_j) \le \sum_j \mu(B_j).$$

378 | SUPPLEMENT: MEASURE AND INTEGRAL

It follows from (2) that $\mu(A) \le \mu^*(A)$. Thus $\mu^* = \mu$ on $\mathscr{F}_0$.

To prove $\mu^*$ is an outer measure, the properties (a) and (b) are trivial. To prove (c), let $\epsilon > 0$. For each $j$, by the definition of $\mu^*(A_j)$, there exists a covering $\{B_{jk}\}$ of $A_j$ such that
$$\sum_k \mu(B_{jk}) \le \mu^*(A_j) + \frac{\epsilon}{2^j}.$$
The double sequence $\{B_{jk}\}$ is a covering of $\bigcup_j A_j$ such that
$$\sum_j \sum_k \mu(B_{jk}) \le \sum_j \mu^*(A_j) + \epsilon;$$
hence
$$\mu^*\Big(\bigcup_j A_j\Big) \le \sum_j \mu^*(A_j) + \epsilon,$$
that establishes (c) for $\mu^*$, since $\epsilon$ is arbitrarily small.

With the outer measure $\mu^*$, a class of sets $\mathscr{F}^*$ is associated as follows.

DEFINITION 4. A set $A \subset \Omega$ belongs to $\mathscr{F}^*$ iff for every $Z \subset \Omega$ we have
$$(3)\qquad \mu^*(Z) = \mu^*(AZ) + \mu^*(A^c Z).$$
If in (3) we change "$=$" into "$\ge$", the resulting definition is equivalent, because the reverse inequality "$\le$" always holds by the subadditivity (c) of $\mu^*$.

Theorem 2. $\mathscr{F}^*$ is a Borel field and contains $\mathscr{F}_0$. On $\mathscr{F}^*$, $\mu^*$ is a measure.

PROOF. Let $A \in \mathscr{F}_0$. For any $Z \subset \Omega$ and any $\epsilon > 0$, there exists a covering $\{B_j\}$ of $Z$ such that
$$(4)\qquad \sum_j \mu(B_j) \le \mu^*(Z) + \epsilon.$$
Since $AB_j \in \mathscr{F}_0$, $\{AB_j\}$ is a covering of $AZ$; $\{A^c B_j\}$ is a covering of $A^c Z$; hence
$$(5)\qquad \mu^*(AZ) \le \sum_j \mu(AB_j), \qquad \mu^*(A^c Z) \le \sum_j \mu(A^c B_j).$$
Since $\mu$ is a measure on $\mathscr{F}_0$, we have for each $j$:
$$(6)\qquad \mu(AB_j) + \mu(A^c B_j) = \mu(B_j).$$

1 CONSTRUCTION OF MEASURE | 379

It follows from (4), (5), and (6) that
$$\mu^*(AZ) + \mu^*(A^c Z) \le \mu^*(Z) + \epsilon.$$
Letting $\epsilon \downarrow 0$ establishes the criterion (3) in its "$\ge$" form. Thus $A \in \mathscr{F}^*$, and we have proved that $\mathscr{F}_0 \subset \mathscr{F}^*$.

To prove that $\mathscr{F}^*$ is a Borel field, it is trivial that it is closed under complementation because the criterion (3) is unaltered when $A$ is changed into $A^c$. Next, to show that $\mathscr{F}^*$ is closed under union, let $A \in \mathscr{F}^*$ and $B \in \mathscr{F}^*$. Then for any $Z \subset \Omega$, we have by (3) with $A$ replaced by $B$ and $Z$ replaced by $ZA$ or $ZA^c$:
$$\mu^*(ZA) = \mu^*(ZAB) + \mu^*(ZAB^c);$$
$$\mu^*(ZA^c) = \mu^*(ZA^c B) + \mu^*(ZA^c B^c).$$
Hence by (3) again as written:
$$\mu^*(Z) = \mu^*(ZAB) + \mu^*(ZAB^c) + \mu^*(ZA^c B) + \mu^*(ZA^c B^c).$$
Applying (3) with $Z$ replaced by $Z(A \cup B)$, we have
$$\mu^*(Z(A \cup B)) = \mu^*(Z(A \cup B)A) + \mu^*(Z(A \cup B)A^c) = \mu^*(ZA) + \mu^*(ZA^c B) = \mu^*(ZAB) + \mu^*(ZAB^c) + \mu^*(ZA^c B).$$
Comparing the two preceding equations, we see that
$$\mu^*(Z) = \mu^*(Z(A \cup B)) + \mu^*(Z(A \cup B)^c).$$
Hence $A \cup B \in \mathscr{F}^*$, and we have proved that $\mathscr{F}^*$ is a field.

Now let $\{A_j\}$ be an infinite sequence of sets in $\mathscr{F}^*$; put
$$B_1 = A_1, \qquad B_j = A_j \setminus \bigcup_{i=1}^{j-1} A_i \ \text{ for } j \ge 2.$$
Then $\{B_j\}$ is a sequence of disjoint sets in $\mathscr{F}^*$ (because $\mathscr{F}^*$ is a field) and has the same union as $\{A_j\}$. For any $Z \subset \Omega$, we have for each $n \ge 1$:
$$\mu^*\Big(Z\Big(\bigcup_{j=1}^n B_j\Big)\Big) = \mu^*\Big(Z\Big(\bigcup_{j=1}^n B_j\Big)B_n\Big) + \mu^*\Big(Z\Big(\bigcup_{j=1}^n B_j\Big)B_n^c\Big) = \mu^*(ZB_n) + \mu^*\Big(Z\Big(\bigcup_{j=1}^{n-1} B_j\Big)\Big)$$

380 | SUPPLEMENT: MEASURE AND INTEGRAL

because $B_n \in \mathscr{F}^*$. It follows by induction on $n$ that
$$(7)\qquad \mu^*\Big(Z\Big(\bigcup_{j=1}^n B_j\Big)\Big) = \sum_{j=1}^n \mu^*(ZB_j).$$
Since $\bigcup_{j=1}^n B_j \in \mathscr{F}^*$, we have by (7) and the monotonicity of $\mu^*$:
$$\mu^*(Z) = \mu^*\Big(Z\Big(\bigcup_{j=1}^n B_j\Big)\Big) + \mu^*\Big(Z\Big(\bigcup_{j=1}^n B_j\Big)^c\Big) \ge \sum_{j=1}^n \mu^*(ZB_j) + \mu^*\Big(Z\Big(\bigcup_{j=1}^\infty B_j\Big)^c\Big).$$
Letting $n \uparrow \infty$ and using property (c) of $\mu^*$, we obtain
$$\mu^*(Z) \ge \sum_{j=1}^\infty \mu^*(ZB_j) + \mu^*\Big(Z\Big(\bigcup_{j=1}^\infty B_j\Big)^c\Big) \ge \mu^*\Big(Z\Big(\bigcup_{j=1}^\infty B_j\Big)\Big) + \mu^*\Big(Z\Big(\bigcup_{j=1}^\infty B_j\Big)^c\Big),$$
that establishes $\bigcup_{j=1}^\infty B_j \in \mathscr{F}^*$. Thus $\mathscr{F}^*$ is a Borel field.

Finally, let $\{B_j\}$ be a sequence of disjoint sets in $\mathscr{F}^*$. By the property (b) of $\mu^*$ and (7) with $Z = \Omega$, we have
$$\mu^*\Big(\bigcup_{j=1}^\infty B_j\Big) \ge \limsup_n \mu^*\Big(\bigcup_{j=1}^n B_j\Big) = \lim_n \sum_{j=1}^n \mu^*(B_j) = \sum_{j=1}^\infty \mu^*(B_j).$$
Combined with the property (c) of $\mu^*$, we obtain the countable additivity of $\mu^*$ on $\mathscr{F}^*$, namely the property (d) for a measure:
$$\mu^*\Big(\bigcup_{j=1}^\infty B_j\Big) = \sum_{j=1}^\infty \mu^*(B_j).$$
The proof of Theorem 2 is complete.

2 Characterization of extensions

We have proved that
$$\mathscr{F}_0 \subset \mathscr{F} \subset \mathscr{F}^* \subset \mathscr{J},$$
where some of the "$\subset$" may turn out to be "$=$". Since we have extended the measure $\mu$ from $\mathscr{F}_0$ to $\mathscr{F}^*$ in Theorem 2, what of $\mathscr{F}$? The answer will appear in the sequel.
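Definitions 3 and 4 can be exercised on a finite toy model, where every covering and every test set $Z$ can be enumerated outright. Below, $\Omega = \{1,2,3,4\}$ and $\mathscr{F}_0 = \{\emptyset, \{1,2\}, \{3,4\}, \Omega\}$ with $\mu(\{1,2\}) = 1$, $\mu(\{3,4\}) = 2$; these particular values are illustrative. The computation shows $\mathscr{F}^*$ collapsing back to $\mathscr{F}_0$, since no cover can separate the points inside $\{1,2\}$ or $\{3,4\}$:

```python
from itertools import chain, combinations

# Finite toy for Definition 3 (outer measure via coverings) and
# Definition 4 (Caratheodory criterion (3)).
Omega = frozenset({1, 2, 3, 4})
F0 = {frozenset(): 0, frozenset({1, 2}): 1, frozenset({3, 4}): 2, Omega: 3}

def subsets(s):
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def mu_star(A):
    # infimum over coverings; repeating a set only increases the sum,
    # so subcollections of F0 suffice
    best = None
    for cover in subsets(F0.keys()):
        if A <= frozenset(chain.from_iterable(cover)):
            tot = sum(F0[B] for B in cover)
            best = tot if best is None else min(best, tot)
    return best

def caratheodory(A):
    # criterion (3), tested against every Z
    return all(mu_star(Z) == mu_star(Z & A) + mu_star(Z - A)
               for Z in subsets(Omega))

F_star = [A for A in subsets(Omega) if caratheodory(A)]
# e.g. {1} fails (3) with Z = {1,2}: mu*({1}) = mu*({2}) = 1, mu*({1,2}) = 1
```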

2 CHARACTERIZATION OF EXTENSIONS | 381

The triple $(\Omega, \mathscr{F}, \mu)$, where $\mathscr{F}$ is a Borel field of subsets of $\Omega$ and $\mu$ is a measure on $\mathscr{F}$, will be called a measure space. It is qualified by the adjective "finite" when $\mu(\Omega) < \infty$, and by the noun "probability" when $\mu(\Omega) = 1$. A more general case is defined below.

DEFINITION 5. A measure $\mu$ on a field $\mathscr{F}_0$ (not necessarily a Borel field) is said to be $\sigma$-finite iff there exists a sequence of sets $\{\Omega_n, n \in N\}$ in $\mathscr{F}_0$ such that $\mu(\Omega_n) < \infty$ for each $n$, and $\bigcup_n \Omega_n = \Omega$. In this case the measure space $(\Omega, \mathscr{F}, \mu)$, where $\mathscr{F}$ is the minimal Borel field containing $\mathscr{F}_0$, is said to be "$\sigma$-finite on $\mathscr{F}_0$".

Theorem 3. Let $\mathscr{F}_0$ be a field and $\mathscr{F}$ the Borel field generated by $\mathscr{F}_0$. Let $\mu_1$ and $\mu_2$ be two measures on $\mathscr{F}$ that agree on $\mathscr{F}_0$. If one of them, hence both, are $\sigma$-finite on $\mathscr{F}_0$, then they agree on $\mathscr{F}$.

PROOF. Let $\{\Omega_n\}$ be as in Definition 5. Define a class $\mathscr{C}$ of subsets of $\Omega$ as follows:
$$\mathscr{C} = \{A \subset \Omega : \mu_1(\Omega_n A) = \mu_2(\Omega_n A) \text{ for all } n \in N\}.$$
Since $\Omega_n \in \mathscr{F}_0$, for any $A \in \mathscr{F}_0$ we have $\Omega_n A \in \mathscr{F}_0$ for all $n$; hence $\mathscr{C} \supset \mathscr{F}_0$. Suppose $A_k \in \mathscr{C}$, $A_k \subset A_{k+1}$ for all $k \in N$ and $A_k \uparrow A$. Then by property (e) of $\mu_1$ and $\mu_2$ as measures on $\mathscr{F}$, we have for each $n$:
$$\mu_1(\Omega_n A) = \lim_k \uparrow \mu_1(\Omega_n A_k) = \lim_k \uparrow \mu_2(\Omega_n A_k) = \mu_2(\Omega_n A).$$
Thus $A \in \mathscr{C}$. Similarly by property (f), and the hypothesis $\mu_1(\Omega_n) = \mu_2(\Omega_n) < \infty$, if $A_k \in \mathscr{C}$ and $A_k \downarrow A$, then $A \in \mathscr{C}$. Therefore $\mathscr{C}$ is closed under both increasing and decreasing limits; hence $\mathscr{C} \supset \mathscr{F}$ by Theorem 2.1.2 of the main text. This implies for any $A \in \mathscr{F}$:
$$\mu_1(A) = \lim_n \uparrow \mu_1(\Omega_n A) = \lim_n \uparrow \mu_2(\Omega_n A) = \mu_2(A)$$
by property (e) once again. Thus $\mu_1$ and $\mu_2$ agree on $\mathscr{F}$.

It follows from Theorem 3 that under the $\sigma$-finite assumption there, the outer measure $\mu^*$ in Theorem 2 restricted to the minimal Borel field $\mathscr{F}$ containing $\mathscr{F}_0$ is the unique extension of $\mu$ from $\mathscr{F}_0$ to $\mathscr{F}$. What about the more extensive extension to $\mathscr{F}^*$?
We are going to prove that it is also unique when a further property is imposed on the extension. We begin by defining two classes of special sets in $\mathscr{F}$.

DEFINITION 6. Given the field $\mathscr{F}_0$ of sets in $\Omega$, let

382 | SUPPLEMENT: MEASURE AND INTEGRAL

$$\mathscr{F}_{0\sigma\delta} = \Big\{\bigcap_m \bigcup_n B_{mn} : B_{mn} \in \mathscr{F}_0\Big\}; \qquad \mathscr{F}_{0\delta\sigma} = \Big\{\bigcup_m \bigcap_n B_{mn} : B_{mn} \in \mathscr{F}_0\Big\}.$$

Both these collections belong to $\mathscr{F}$ because the Borel field is closed under countable union and intersection, and these operations may be iterated, here twice only, for each collection. If $B \in \mathscr{F}_0$, then $B$ belongs to both $\mathscr{F}_{0\sigma\delta}$ and $\mathscr{F}_{0\delta\sigma}$ because we can take $B_{mn} = B$. Finally, $A \in \mathscr{F}_{0\delta\sigma}$ if and only if $A^c \in \mathscr{F}_{0\sigma\delta}$, because
$$\Big(\bigcap_m \bigcup_n B_{mn}\Big)^c = \bigcup_m \bigcap_n B_{mn}^c.$$

Theorem 4. Let $A \in \mathscr{F}^*$. There exists $B \in \mathscr{F}_{0\sigma\delta}$ such that
$$A \subset B; \qquad \mu^*(A) = \mu^*(B).$$
If $\mu$ is $\sigma$-finite on $\mathscr{F}_0$, then there exists $C \in \mathscr{F}_{0\delta\sigma}$ such that
$$C \subset A; \qquad \mu^*(C) = \mu^*(A).$$

PROOF. For each $m$, there exists $\{B_{mn}\}$ in $\mathscr{F}_0$ such that
$$A \subset \bigcup_n B_{mn}; \qquad \sum_n \mu(B_{mn}) \le \mu^*(A) + \frac{1}{m}.$$
Put
$$B_m = \bigcup_n B_{mn}, \qquad B = \bigcap_m B_m;$$
then $A \subset B$ and $B \in \mathscr{F}_{0\sigma\delta}$. We have
$$\mu^*(B) \le \mu^*(B_m) \le \sum_n \mu(B_{mn}) \le \mu^*(A) + \frac{1}{m};$$
letting $m \uparrow \infty$ and using $A \subset B$ with the monotonicity of $\mu^*$, we obtain $\mu^*(A) = \mu^*(B)$.

2 CHARACTERIZATION OF EXTENSIONS | 383

Now let $\mu$ be $\sigma$-finite on $\mathscr{F}_0$, with $\{\Omega_n\}$ as in Definition 5; applying the first assertion to $\Omega_n A^c$, we obtain $B_n \in \mathscr{F}_{0\sigma\delta}$ such that $\Omega_n A^c \subset B_n$ and $\mu^*(\Omega_n A^c) = \mu^*(B_n)$. Since $\Omega_n \in \mathscr{F}_0$ and $B_n \in \mathscr{F}_{0\sigma\delta}$, it is easy to verify that $\Omega_n B_n^c \in \mathscr{F}_{0\delta\sigma}$, by the distributive law for the intersection with a union. Put
$$C = \bigcup_n \Omega_n B_n^c.$$
It is trivial that $C \in \mathscr{F}_{0\delta\sigma}$ and, since $A = \bigcup_n \Omega_n A$ and $\Omega_n B_n^c \subset \Omega_n A$,
$$A \supset C.$$
Consequently, we have
$$\mu^*(A) \ge \mu^*(C) \ge \liminf_n \mu^*(\Omega_n B_n^c) = \liminf_n \mu^*(\Omega_n A) = \mu^*(A),$$
the last equation owing to property (e) of the measure $\mu^*$. Thus $\mu^*(A) = \mu^*(C)$, and the assertion is proved.

The measure $\mu^*$ on $\mathscr{F}^*$ is constructed from the measure $\mu$ on the field $\mathscr{F}_0$. The restriction of $\mu^*$ to the minimal Borel field $\mathscr{F}$ containing $\mathscr{F}_0$ will henceforth be denoted by $\mu$ instead of $\mu^*$.

In a general measure space $(\Omega, \mathscr{G}, \nu)$, let us denote by $\mathscr{N}(\mathscr{G}, \nu)$ the class of all sets $A$ in $\mathscr{G}$ with $\nu(A) = 0$. They are called the null sets when $\mathscr{G}$ and $\nu$ are understood, or $\nu$-null sets when $\mathscr{G}$ is understood. Beware that if $A \subset B$ and $B$ is a null set, it does not follow that $A$ is a null set, because $A$ may not be in $\mathscr{G}$! This remark introduces the following definition.

DEFINITION 7. The measure space $(\Omega, \mathscr{G}, \nu)$ is called complete iff any subset of a null set is a null set.

Theorem 5. The following three collections of subsets of $\Omega$ are identical:

(i) $A \subset \Omega$ and the outer measure $\mu^*(A) = 0$;

(ii) $A \in \mathscr{F}^*$ and $\mu^*(A) = 0$;

(iii) $A \subset B$ where $B \in \mathscr{F}$ and $\mu(B) = 0$.

It is the collection $\mathscr{N}(\mathscr{F}^*, \mu^*)$.

PROOF. If $\mu^*(A) = 0$, we will prove $A \in \mathscr{F}^*$ by verifying the criterion (3). For any $Z \subset \Omega$, we have by properties (a) and (b) of $\mu^*$:
$$0 \le \mu^*(ZA) \le \mu^*(A) = 0; \qquad \mu^*(ZA^c) \le \mu^*(Z);$$

384 | SUPPLEMENT: MEASURE AND INTEGRAL

and consequently by property (c):
$$\mu^*(Z) = \mu^*(ZA \cup ZA^c) \le \mu^*(ZA) + \mu^*(ZA^c) \le \mu^*(Z).$$
Thus (3) is satisfied and we have proved that (i) and (ii) are equivalent.

Next, let $A \in \mathscr{F}^*$ and $\mu^*(A) = 0$. Then we have by the first assertion in Theorem 4 that there exists $B \in \mathscr{F}$ such that $A \subset B$ and $\mu^*(A) = \mu(B)$. Thus $A$ satisfies (iii). Conversely, if $A$ satisfies (iii), then by property (b) of the outer measure: $\mu^*(A) \le \mu^*(B) = \mu(B) = 0$, and so (i) is true.

As consequence, any subset of an $(\mathscr{F}^*, \mu^*)$-null set is an $(\mathscr{F}^*, \mu^*)$-null set. This is the first assertion in the next theorem.

Theorem 6. The measure space $(\Omega, \mathscr{F}^*, \mu^*)$ is complete. Let $(\Omega, \mathscr{G}, \nu)$ be a complete measure space; $\mathscr{G} \supset \mathscr{F}_0$ and $\nu = \mu$ on $\mathscr{F}_0$. If $\mu$ is $\sigma$-finite on $\mathscr{F}_0$, then $\mathscr{G} \supset \mathscr{F}^*$ and $\nu = \mu^*$ on $\mathscr{F}^*$.

PROOF. Let $A \in \mathscr{F}^*$; then by Theorem 4 there exist $B \in \mathscr{F}_{0\sigma\delta}$ and $C \in \mathscr{F}_{0\delta\sigma}$ such that
$$(8)\qquad C \subset A \subset B; \qquad \mu(C) = \mu^*(A) = \mu(B).$$
Since $\nu = \mu$ on $\mathscr{F}_0$, we have by Theorem 3, $\nu = \mu$ on $\mathscr{F}$. Hence by (8) we have $\nu(B - C) = 0$. Since $A - C \subset B - C$ and $B - C \in \mathscr{G}$, and $(\Omega, \mathscr{G}, \nu)$ is complete, we have $A - C \in \mathscr{G}$ and so $A = C \cup (A - C) \in \mathscr{G}$. Moreover, since $C$, $A$, and $B$ belong to $\mathscr{G}$, it follows from (8) that
$$\mu(C) = \nu(C) \le \nu(A) \le \nu(B) = \mu(B)$$
and consequently by (8) again $\nu(A) = \mu^*(A)$. The theorem is proved.

To summarize the gist of Theorems 4 and 6: if the measure $\mu$ on the field $\mathscr{F}_0$ is $\sigma$-finite on $\mathscr{F}_0$, then $(\mathscr{F}, \mu)$ is its unique extension to $\mathscr{F}$, and $(\mathscr{F}^*, \mu^*)$ is its minimal complete extension. Here one is tempted to change the notation $\mu$ to $\mu_0$ on $\mathscr{F}_0$!

We will complete the picture by showing how to obtain $(\mathscr{F}^*, \mu^*)$ from $(\mathscr{F}, \mu)$, reversing the order of the previous construction. Given the measure space $(\Omega, \mathscr{F}, \mu)$, let us denote by $\mathscr{C}$ the collection of subsets of $\Omega$ as follows: $A \in \mathscr{C}$ iff there exists $B \in \mathscr{N}(\mathscr{F}, \mu)$ such that $A \subset B$. Clearly $\mathscr{C}$ has the "hereditary" property: if $A$ belongs to $\mathscr{C}$, then all subsets of $A$ belong to $\mathscr{C}$; $\mathscr{C}$ is also closed under countable union.
Next, we define the collection
$$(9)\qquad \bar{\mathscr{F}} = \{A \subset \Omega : A = B - C \text{ where } B \in \mathscr{F}, C \in \mathscr{C}\},$$
where the symbol "$-$" denotes strict difference of sets, namely $B - C = BC^c$ where $C \subset B$. Finally we define a function $\bar{\mu}$ on $\bar{\mathscr{F}}$ as follows, for the $A$

2 CHARACTERIZATION OF EXTENSIONS | 385

shown in (9):
$$(10)\qquad \bar{\mu}(A) = \mu(B).$$
We will legitimize this definition and with the same stroke prove the monotonicity of $\bar{\mu}$. Suppose then
$$(11)\qquad B_1 - C_1 \subset B_2 - C_2, \qquad B_i \in \mathscr{F},\ C_i \in \mathscr{C},\ i = 1, 2.$$
Let $C_1 \subset D \in \mathscr{N}(\mathscr{F}, \mu)$. Then $B_1 \subset B_2 \cup D$ and so $\mu(B_1) \le \mu(B_2 \cup D) = \mu(B_2)$. When the "$\subset$" in (11) is "$=$", we can interchange $B_1$ and $B_2$ to conclude that $\mu(B_1) = \mu(B_2)$, so that the definition (10) is legitimate.

Theorem 7. $\bar{\mathscr{F}}$ is a Borel field and $\bar{\mu}$ is a measure on $\bar{\mathscr{F}}$.

PROOF. Let $A_n \in \bar{\mathscr{F}}$, $n \in N$, so that $A_n = B_n - C_n$ as in (9). We have then
$$\bigcap_{n=1}^\infty A_n = \Big(\bigcap_{n=1}^\infty B_n\Big) - \Big(\bigcup_{n=1}^\infty C_n\Big).$$
Since the class $\mathscr{C}$ is closed under countable union, this shows that $\bar{\mathscr{F}}$ is closed under countable intersection. Next let $C \subset D$, $D \in \mathscr{N}(\mathscr{F}, \mu)$; then for $A = B - C$:
$$A^c = B^c \cup C = B^c \cup BC = B^c \cup (B(D - (D - C))) = (B^c \cup BD) - B(D - C).$$
Since $B(D - C) \subset D$, we have $B(D - C) \in \mathscr{C}$; hence the above shows that $\bar{\mathscr{F}}$ is also closed under complementation and therefore is a Borel field. Clearly $\bar{\mathscr{F}} \supset \mathscr{F}$ because we may take $C = \emptyset$ in (9).

To prove $\bar{\mu}$ is countably additive on $\bar{\mathscr{F}}$, let $\{A_n\}$ be disjoint sets in $\bar{\mathscr{F}}$. Then
$$A_n = B_n - C_n, \qquad B_n \in \mathscr{F}, \quad C_n \in \mathscr{C}.$$
There exists $D$ in $\mathscr{N}(\mathscr{F}, \mu)$ containing $\bigcup_n C_n$. Then $\{B_n - D\}$ are disjoint and
$$\bigcup_{n=1}^\infty (B_n - D) \subset \bigcup_{n=1}^\infty A_n \subset \bigcup_{n=1}^\infty B_n.$$
All these sets belong to $\bar{\mathscr{F}}$ and so by the monotonicity of $\bar{\mu}$:
$$\bar{\mu}\Big(\bigcup_n (B_n - D)\Big) \le \bar{\mu}\Big(\bigcup_n A_n\Big) \le \bar{\mu}\Big(\bigcup_n B_n\Big).$$

386 | SUPPLEMENT: MEASURE AND INTEGRAL

Since $\bar{\mu} = \mu$ on $\mathscr{F}$, the first and third members above are equal, respectively, to:
$$\bar{\mu}\Big(\bigcup_n (B_n - D)\Big) = \sum_n \mu(B_n - D) = \sum_n \mu(B_n) = \sum_n \bar{\mu}(A_n);$$
$$\bar{\mu}\Big(\bigcup_n B_n\Big) \le \sum_n \mu(B_n) = \sum_n \bar{\mu}(A_n).$$
Therefore we have
$$\bar{\mu}\Big(\bigcup_n A_n\Big) = \sum_n \bar{\mu}(A_n).$$
Since $\bar{\mu}(\emptyset) = \bar{\mu}(\emptyset - \emptyset) = \mu(\emptyset) = 0$, $\bar{\mu}$ is a measure on $\bar{\mathscr{F}}$.

Corollary. In truth: $\bar{\mathscr{F}} = \mathscr{F}^*$ and $\bar{\mu} = \mu^*$.

PROOF. For any $A \in \mathscr{F}^*$, by the first part of Theorem 4, there exists $B \in \mathscr{F}$ such that
$$A = B - (B - A), \qquad \mu^*(B - A) = 0.$$
Hence by Theorem 5, $B - A \in \mathscr{C}$ and so $A \in \bar{\mathscr{F}}$ by (9). Thus $\mathscr{F}^* \subset \bar{\mathscr{F}}$. Since $\mathscr{F} \subset \mathscr{F}^*$ and $\mathscr{C} \subset \mathscr{F}^*$ by Theorem 6, we have $\bar{\mathscr{F}} \subset \mathscr{F}^*$ by (9). Hence $\bar{\mathscr{F}} = \mathscr{F}^*$. It follows from the above that $\mu^*(A) = \mu(B) = \bar{\mu}(A)$. Hence $\mu^* = \bar{\mu}$ on $\mathscr{F}^*$.

The question arises naturally whether we can extend $\mu$ from $\mathscr{F}_0$ to $\mathscr{F}$ directly without the intervention of $\mathscr{F}^*$. This is indeed possible by a method of transfinite induction originally conceived by Borel; see an article by LeBlanc and G. E. Fox: "On the extension of measure by the method of Borel", Canadian Journal of Mathematics, 1956, pp. 516-523. It is technically lengthier than the method of outer measure expounded here.

Although the case of a countable space $\Omega$ can be treated in an obvious way, it is instructive to apply the general theory to see what happens.

Let $\Omega = N \cup \omega$; $\mathscr{F}_0$ is the minimal field (not Borel field) containing each singleton $n$ in $N$, but not $\omega$. Let $N_f$ denote the collection of all finite subsets of $N$; then $\mathscr{F}_0$ consists of $N_f$ and the complements of members of $N_f$ (with respect to $\Omega$), the latter all containing $\omega$. Let $0 < \mu(n) < \infty$ for all $n \in N$; a measure $\mu$ is defined on $\mathscr{F}_0$ as follows:
$$\mu(A) = \sum_{n\in A} \mu(n) \ \text{ if } A \in N_f; \qquad \mu(A^c) = \mu(\Omega) - \mu(A).$$
We must still define $\mu(\Omega)$. Observe that by the properties of a measure, we have $\mu(\Omega) \ge \sum_{n\in N} \mu(n) = s$, say.
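The example just introduced can be worked out numerically with Definition 3. The sketch below uses a finite stand-in for $N$ (namely $\{1, \ldots, 6\}$), $\mu(n) = 2^{-n}$, and $\mu(\Omega) = 2$; all three choices are illustrative. Any covering of $\{\omega\}$ must include a set of $\mathscr{F}_0$ containing $\omega$, i.e. the complement of a finite subset, and adding further sets only increases the sum, so the infimum in (2) is attained at a single complement:

```python
from fractions import Fraction
from itertools import combinations

# Numeric instance of mu*({w}) = mu(Omega) - s from Definition 3,
# with a finite stand-in {1,...,6} for N (illustrative values).
Npts = list(range(1, 7))
mu_point = {n: Fraction(1, 2**n) for n in Npts}
mu_Omega = Fraction(2)
s = sum(mu_point.values())                    # here s = 63/64 < mu(Omega)

def mu_complement(part):
    # mu of the complement in Omega of the finite set `part` of N
    return mu_Omega - sum(mu_point[n] for n in part)

star_w = min(mu_complement(part)
             for r in range(len(Npts) + 1)
             for part in combinations(Npts, r))
# star_w equals mu(Omega) - s = 65/64
```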

3 MEASURES IN R | 387

Now we use Definition 3 to determine the outer measure $\mu^*$. It is easy to see that for any $A \subset N$, we have
$$\mu^*(A) = \sum_{n\in A} \mu(n).$$
In particular $\mu^*(N) = s$. Next we have
$$\mu^*(\omega) = \inf_{A\in N_f} \mu(A^c) = \mu(\Omega) - \sup_{A\in N_f} \mu(A) = \mu(\Omega) - s,$$
provided $s < \infty$; otherwise the inf above is $\infty$. Thus we have
$$\mu^*(\omega) = \mu(\Omega) - s \ \text{ if } \mu(\Omega) < \infty; \qquad \mu^*(\omega) = \infty \ \text{ if } \mu(\Omega) = \infty.$$
It follows that for any $A \subset \Omega$:
$$\mu^*(A) = \sum_{n\in A} \mu^*(n),$$
where $\mu^*(n) = \mu(n)$ for $n \in N$. Thus $\mu^*$ is a measure on $\mathscr{J}$; namely, $\mathscr{F}^* = \mathscr{J}$. But it is obvious that $\mathscr{F} = \mathscr{J}$, since $\mathscr{F}$ contains $N$ as countable union and so contains $\omega$ as complement. Hence $\mathscr{F} = \mathscr{F}^* = \mathscr{J}$.

If $\mu(\Omega) = \infty$ and $s = \infty$, the extension $\mu^*$ of $\mu$ to $\mathscr{J}$ is not unique, because we can define $\mu(\omega)$ to be any positive number and get an extension. Thus $\mu$ is not $\sigma$-finite on $\mathscr{F}_0$, by Theorem 3. But we can verify this directly when $\mu(\Omega) = \infty$, whether $s = \infty$ or $s < \infty$. Thus in the latter case, $\mu(\omega) = \infty$ is also the unique extension of $\mu$ from $\mathscr{F}_0$ to $\mathscr{J}$. This means that the condition of $\sigma$-finiteness on $\mathscr{F}_0$ is only a sufficient and not a necessary condition for the unique extension.

As a ramification of the example above, let $\Omega = N \cup \omega_1 \cup \omega_2$, with two extra points adjoined to $N$, but keep $\mathscr{F}_0$ as before. Then $\mathscr{F}$ ($= \mathscr{F}^*$) is strictly smaller than $\mathscr{J}$ because neither $\omega_1$ nor $\omega_2$ belongs to it. From Definition 3 we obtain
$$\mu^*(\omega_1 \cup \omega_2) = \mu^*(\omega_1) = \mu^*(\omega_2).$$
Thus $\mu^*$ is not even two-by-two additive on $\mathscr{J}$ unless the three quantities above are zero. The two points $\omega_1$ and $\omega_2$ form an inseparable couple. We leave it to the curious reader to wonder about other possibilities.

3 Measures in R

Let $R = (-\infty, +\infty)$ be the set of real numbers, alias the real line, with its Euclidean topology. For $-\infty \le a < b \le +\infty$,
$$(12)\qquad (a, b] = \{x \in R : a < x \le b\}$$

388 | SUPPLEMENT: MEASURE AND INTEGRAL

is an interval of a particular shape, namely open at the left end and closed at the right end. For $b = +\infty$, $(a, +\infty] = (a, +\infty)$ because $+\infty$ is not in $R$. By choice of the particular shape, the complement of such an interval is the union of two intervals of the same shape:
$$(a, b]^c = (-\infty, a] \cup (b, \infty].$$
When $a = b$, of course $(a, a] = \emptyset$ is the empty set. A finite or countably infinite number of such intervals may merge end to end into a single one, as illustrated below:
$$(13)\qquad (0, 2] = (0, 1] \cup (1, 2]; \qquad (0, 1] = \bigcup_{n=1}^\infty \Big(\frac{1}{n+1}, \frac{1}{n}\Big].$$
Apart from this possibility, the representation of $(a, b]$ is unique.

The minimal Borel field containing all $(a, b]$ will be denoted by $\mathscr{B}$ and called the Borel field of $R$. Since a bounded open interval is the countable union of intervals like $(a, b]$, and any open set in $R$ is the countable union of (disjoint) bounded open intervals, the Borel field $\mathscr{B}$ contains all open sets; hence by complementation it contains all closed sets, in particular all compact sets. Starting from one of these collections, forming countable union and countable intersection successively, a countable number of times, one can build up $\mathscr{B}$ through a transfinite induction.

Now suppose a measure $m$ has been defined on $\mathscr{B}$, subject to the sole assumption that its value for a finite (alias bounded) interval be finite, namely: if $-\infty < a < b < +\infty$, then
$$(14)\qquad 0 \le m((a, b]) < \infty.$$
We associate a point function $F$ on $R$ with the set function $m$ on $\mathscr{B}$ as follows:
$$(15)\qquad F(0) = 0; \quad F(x) = m((0, x]) \text{ for } x > 0; \quad F(x) = -m((x, 0]) \text{ for } x < 0.$$
This function may be called the "generalized distribution" for $m$. $F$ is finite everywhere owing to (14), and we see that
$$(16)\qquad m((a, b]) = F(b) - F(a).$$
$F$ is increasing (viz. nondecreasing) in $R$ and so the limits
$$F(+\infty) = \lim_{x\to+\infty} F(x) \le +\infty, \qquad F(-\infty) = \lim_{x\to-\infty} F(x) \ge -\infty$$
both exist.
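The correspondence (15)-(16) is easy to exercise in a concrete case. Take the generalized distribution $F(x) = x + 1_{\{x \ge 0\}}$, i.e. Lebesgue length plus a unit jump at 0 (an illustrative choice); then $m((a,b]) = F(b) - F(a)$ assigns every interval its length plus the jump mass it captures:

```python
# The generalized distribution F and the set function m of (15)-(16).
# F(x) = x + 1_{x >= 0}: length plus a unit jump at 0 (illustrative).
def F(x):
    return x + (1 if x >= 0 else 0)

def m_interval(a, b):
    # m((a, b]) by formula (16)
    return F(b) - F(a)

length_plus_atom = m_interval(-1, 1)   # (-1, 1] gets 2 + 1 = 3
atom_at_0 = F(0) - F(-1e-12)           # numeric stand-in for the jump F(0) - F(0-)
```

The jump of $F$ at 0 reappears as the mass that $m$ assigns around that point.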
We shall write $\infty$ for $+\infty$ sometimes. Next, $F$ has unilateral limits everywhere, and is right-continuous:
$$F(x-) \le F(x) = F(x+).$$

3 MEASURES IN R | 389

The right-continuity follows from the monotone limit properties (e) and (f) of $m$ and the primary assumption (14). The measure of a single point $x$ is given by
$$m(x) = F(x) - F(x-).$$
We shall denote a point and the set consisting of it (singleton) by the same symbol.

The simplest example of $F$ is given by $F(x) \equiv x$. In this case $F$ is continuous everywhere and (16) becomes
$$m((a, b]) = b - a.$$
We can replace $(a, b]$ above by $(a, b)$, $[a, b)$ or $[a, b]$ because $m(x) = 0$ for each $x$. This measure is the length of the line segment from $a$ to $b$. It was in this classic case that the following extension was first conceived by Émile Borel (1871-1956).

We shall follow the methods in §§1-2, due to H. Lebesgue and C. Carathéodory. Given $F$ as specified above, we are going to construct a measure $m$ on $\mathscr{B}$ and a larger Borel field $\mathscr{B}^*$ that fulfills the prescription (16). The first step is to determine the minimal field $\mathscr{B}_0$ containing all $(a, b]$. Since a field is closed under finite union, it must contain all sets of the form
$$(17)\qquad B = \bigcup_{j=1}^n I_j, \qquad I_j = (a_j, b_j], \quad 1 \le j \le n; \quad n \in N.$$
Without loss of generality, we may suppose the intervals $I_j$ to be disjoint, by merging intersections as illustrated by
$$(1, 3] \cup (2, 4] = (1, 4].$$
Then it is clear that the complement $B^c$ is of the same form. The union of two sets like $B$ is also of the same form. Thus the collection of all sets like $B$ already forms a field and so it must be $\mathscr{B}_0$. Of course it contains (includes) the empty set $\emptyset = (a, a]$ and $R$. However, it does not contain any $(a, b)$ except $R$, nor any $[a, b)$, $[a, b]$, nor any single point!

Next we define a measure $m$ on $\mathscr{B}_0$ satisfying (16).
Since the condition (d) in Definition 2 requires it to be finitely additive, there is only one way: for the generic $B$ in (17) with disjoint $I_j$ we must put
$$(18)\qquad m(B) = \sum_{j=1}^n m(I_j) = \sum_{j=1}^n (F(b_j) - F(a_j)).$$
Having so defined $m$ on $\mathscr{B}_0$, we must now prove that it satisfies the condition (d) in toto, in order to proclaim it to be a measure on $\mathscr{B}_0$. Namely, if

390 | SUPPLEMENT: MEASURE AND INTEGRAL

$\{B_k, 1 \le k \le l \le \infty\}$ is a finite or countable sequence of disjoint sets in $\mathscr{B}_0$, we must prove
$$(19)\qquad m\Big(\bigcup_{k=1}^l B_k\Big) = \sum_{k=1}^l m(B_k)$$
whenever $l$ is finite, and moreover when $l = \infty$ and the union $\bigcup_{k=1}^\infty B_k$ happens to be in $\mathscr{B}_0$.

The case of a finite $l$ is really clear. If each $B_k$ is represented as in (17), then the disjoint union of a finite number of them is represented in a similar manner by pooling together all the disjoint $I_j$'s from the $B_k$'s. Then the equation (19) just means that a finite double array of numbers can be summed in two orders.

If that is so easy, what is the difficulty when $l = \infty$? It turns out, as Borel saw clearly, that the crux of the matter lies in the following fabulous "banality."

Borel's lemma. If $-\infty < a < b < +\infty$ and
$$(20)\qquad (a, b] = \bigcup_{j=1}^\infty (a_j, b_j],$$
where $a_j < b_j$ for each $j$, and the intervals $(a_j, b_j]$ are disjoint, then we have
$$(21)\qquad F(b) - F(a) = \sum_{j=1}^\infty (F(b_j) - F(a_j)).$$

PROOF. We will first give a transfinite argument that requires knowledge of ordinal numbers. But it is so intuitively clear that it can be appreciated without that prerequisite. Looking at (20) we see there is a unique index $j$ such that $b_j = b$; name that index $k$ and rename $a_k$ as $c_1$. By removing $(a_k, b_k] = (c_1, b]$ from both sides of (20) we obtain
$$(22)\qquad (a, c_1] = \bigcup_{\substack{j=1 \\ j \ne k}}^\infty (a_j, b_j].$$
This small but giant step shortens the original $(a, b]$ to $(a, c_1]$. Obviously we can repeat the process and shorten it to $(a, c_2]$ where $a \le c_2 < c_1 < b$, and so by mathematical induction we obtain a decreasing sequence $b > c_1 > c_2 > \cdots > c_n \ge a$. Needless to say, if for some $n$ we have $c_n = a$, then we have accomplished our purpose, but this cannot happen under our specific assumptions because we have not used up all the infinite number of intervals in the union.

3 MEASURES IN R¹ | 391

Therefore the process must go on ad infinitum. Suppose then c_n > c_{n+1} for all n ∈ N, so that c_ω = lim_n ↓ c_n exists; then c_ω ≥ a. If c_ω = a (which can easily happen, see (13)), then we are done and (21) follows, although the terms in the series have been gathered step by step in a (possibly) different order. What if c_ω > a? In this case there is a unique j such that b_j = c_ω; rename the corresponding a_j as c_{ω1}. We have now

(23)    (a, c_ω] = ⋃_{j=1}^∞ (a'_j, b'_j],

where the (a'_j, b'_j]'s are the leftovers from the original collection in (20) after an infinite number of them have been removed in the process. The interval (c_{ω1}, c_ω] is contained in the reduced new collection and we can begin a new process by first removing it from both sides of (23), then the next, to be denoted by (c_{ω2}, c_{ω1}], and so on. If for some n we have c_{ωn} = a, then (21) is proved because at each step a term in the sum is gathered. Otherwise there exists the limit lim_n ↓ c_{ωn} = c_{ωω} ≥ a. If c_{ωω} = a, then (21) follows in the limit. Otherwise c_{ωω} must be equal to some b_j (why?), and the induction goes on.

Let us spare ourselves the cumbersome notation for the successive well-ordered ordinal numbers. But will this process stop after a countable number of steps, namely, does there exist an ordinal number α of countable cardinality such that c_α = a? The answer is "yes" because there are only countably many intervals in the union (20).

The preceding proof (which may be made logically formal) reveals the possibly complex structure hidden in the "order-blind" union in (20). Borel in his Thèse (1894) adopted a similar argument to prove a more general result that became known as his Covering Theorem (see below).
A proof of the latter can be found in any text on real analysis, without the use of ordinal numbers. We will use the covering theorem to give another proof of Borel's lemma, for the sake of comparison (and learning).

This second proof establishes the equation (21) by two inequalities in opposite directions. The first inequality is easy: considering the first n terms in the disjoint union (20), we have

    F(b) − F(a) ≥ Σ_{j=1}^n (F(b_j) − F(a_j)).

As n goes to infinity we obtain (21) with the "=" replaced by "≥".

The other half is more subtle: the reader should pause and think why. The previous argument with ordinal numbers tells the story.
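The telescoping mechanism behind Borel's lemma can also be tried numerically. The sketch below (an illustration, not part of the text) decomposes (0, 1] into the disjoint intervals (1/(j+1), 1/j], j = 1, 2, ..., and checks that the partial sums of F(b_j) − F(a_j) rise to F(1) − F(0); the increasing function F used here is an arbitrary illustrative choice.

```python
# A numerical sketch of Borel's lemma (an illustration, not part of the proof).
# Decompose (0, 1] into the disjoint intervals (1/(j+1), 1/j], j = 1, 2, ...
# For an increasing right-continuous F, the sum of F(b_j) - F(a_j) over the
# pieces must recover F(1) - F(0).  The choice of F below is arbitrary.

def F(x):
    return x + x * x        # increasing and continuous on [0, 1]

def borel_sum(n_terms):
    # partial sums of the series in (21) for this decomposition
    total = 0.0
    for j in range(1, n_terms + 1):
        a_j, b_j = 1.0 / (j + 1), 1.0 / j
        total += F(b_j) - F(a_j)
    return total

print(borel_sum(10), borel_sum(100000))   # rises toward F(1) - F(0) = 2
```

For this decomposition the sum telescopes to F(1) − F(1/(n+1)), so the convergence is visible already for moderate n.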

Borel's covering theorem. Let [a, b] be a compact interval, and (a_j, b_j), j ∈ N, be bounded open intervals, which may intersect arbitrarily, such that

(24)    [a, b] ⊂ ⋃_{j=1}^∞ (a_j, b_j).

Then there exists a finite integer l such that when l is substituted for ∞ in the above, the inclusion remains valid.

In other words, a finite subset of the original infinite set of open intervals suffices to do the covering. This theorem is also called the Heine–Borel Theorem; see Hardy [1] (in the general Bibliography) for two proofs by Besicovitch.

To apply (24) to (20), we must alter the shape of the intervals (a_j, b_j] to fit the picture in (24). Let −∞ < a < b < ∞ and ε > 0. Choose a' in (a, b), and for each j choose b'_j > b_j, such that

(25)    F(a') − F(a) < ε/2;    F(b'_j) − F(b_j) < ε/2^{j+1}.

These choices are possible because F is right continuous; and now we have

    [a', b] ⊂ ⋃_{j=1}^∞ (a_j, b'_j)

as required in (24). Hence by Borel's theorem, there exists a finite l such that

(26)    [a', b] ⊂ ⋃_{j=1}^l (a_j, b'_j).

From this it follows "easily" that

(27)    F(b) − F(a') ≤ Σ_{j=1}^l (F(b'_j) − F(a_j)).

We will spell out the proof by induction on l. When l = 1 it is obvious. Suppose the assertion has been proved for l − 1, l ≥ 2. From (26) as written, there is k, 1 ≤ k ≤ l, such that a_k < a' < b'_k, and so

(28)    F(b'_k) − F(a') ≤ F(b'_k) − F(a_k).

If we intersect both sides of (26) with the complement of (a_k, b'_k), we obtain

    [b'_k, b] ⊂ ⋃_{j=1, j≠k}^l (a_j, b'_j).

Here the number of intervals on the right side is l − 1; hence by the induction hypothesis we have

    F(b) − F(b'_k) ≤ Σ_{j=1, j≠k}^l (F(b'_j) − F(a_j)).

Adding this to (28) we obtain (27), and the induction is complete. It follows from (27) and (25) that

    F(b) − F(a) ≤ Σ_{j=1}^∞ (F(b_j) − F(a_j)) + ε.

Beware that the l above depends on ε. However, if we change l to ∞ (back to infinity!) then the infinite series of course does not depend on ε. Therefore we can let ε → 0 to obtain (21) with the "=" there changed to "≤". Together with the opposite inequality obtained before, this proves (21).

It remains to prove (19) when l = ∞, namely for B = ⋃_{j=1}^∞ B_j in B_0. Write B = ⋃_{i=1}^n I_i as in (17), and let each B_j be represented by its disjoint intervals I_{jk}; thus

(29)    ⋃_{i=1}^n I_i = ⋃_{j=1}^∞ ⋃_k I_{jk},

where all the I_{jk} are disjoint. We will prove

(30)    Σ_{i=1}^n m(I_i) = Σ_{j=1}^∞ Σ_k m(I_{jk}).

For n = 1, (29) is of the form (20), since a countable set of sets can be ordered as a sequence. Hence (30) follows by Borel's lemma. In general, simple geometry shows that each I_i in (29) is the union of a subcollection of the I_{jk}'s. This is easier to see if we order the I_i's in algebraic order and, after merging where possible, separate them at nonzero distances. Therefore (30) follows by adding n equations, each of which results from Borel's lemma. This completes the proof of the countable additivity of m on B_0, namely (19) is true as stipulated there for l = ∞ as well as l < ∞.

The general method developed in §1 can now be applied to (R, B_0, m). Substituting B_0 for the field and m for the measure in Definition 3, we obtain the outer measure m*. It is remarkable that the countable additivity of m on B_0, for which two painstaking proofs were given above, is used exactly in one place, at the beginning of Theorem 1, to prove that m* = m on B_0. Next, we define the Borel field B* as in Definition 4. By Theorem 6, (R, B*, m*) is a complete measure space. By Definition 5, m is σ-finite on B_0 because (−n, n] ↑ (−∞, ∞) as n ↑ ∞ and m((−n, n]) is finite by our primary assumption (14). Hence by Theorem 3, the restriction of m* to B is the unique extension of m from B_0 to B.

In the most important case where F(x) ≡ x, the measure m on B_0 is the length: m((a, b]) = b − a. It was Borel who, around the turn of the twentieth century, first conceived of the notion of a countably additive "length" on an extensive class of sets, now named after him: the Borel field B. A member of this class is called a Borel set. The larger Borel field B* was first constructed by Lebesgue from an outer and an inner measure (see pp. 28–29 of the main text).
The latter was later bypassed by Carathéodory, whose method is adopted here. A member of B* is usually called Lebesgue-measurable. The intimate relationship between B and B* is best seen from Theorem 7.

The generalization to a generalized distribution function F is sometimes referred to as Borel–Lebesgue–Stieltjes. See §2.2 of the main text for the special case of a probability distribution.

The generalization to a Euclidean space of higher dimension presents no new difficulty and is encumbered with tedious geometrical "baggage".

It can be proved that the cardinal number of all Borel sets is that of the real numbers (viz. all points in R), commonly denoted by c (the continuum). On the other hand, if Z is a Borel set of cardinal c with m(Z) = 0, such as the Cantor ternary set (p. 13 of the main text), then by the remark preceding Theorem 6, all subsets of Z are Lebesgue-measurable and hence their totality

has cardinal 2^c, which is strictly greater than c (see e.g. [3]). It follows that there are incomparably more Lebesgue-measurable sets than Borel sets. It is however not easy to exhibit a set in B* but not in B; see Exercise No. 15 on p. 15 of the main text for a clue, but that example uses a non-Lebesgue-measurable set to begin with.

Are there non-Lebesgue-measurable sets? Using the Axiom of Choice, we can "define" such a set rather easily; see e.g. [3] or [5]. However, Paul Cohen has proved that the axiom is independent of the other logical axioms, known as the Zermelo–Fraenkel system, commonly adopted in mathematics; and Robert Solovay has proved that in a certain model without the axiom of choice, all sets of real numbers are Lebesgue-measurable. In the notation of Definition 1 in §1, in this case the class of measurable sets comprises all subsets of R, and the outer measure m* is a measure on it.

N.B. Although no explicit invocation is made of the axiom of choice in the main text of this book, a weaker version of it under the prefix "countable" must have been casually employed on the q.t. Without the latter, allegedly it is impossible to show that the union of a countable collection of countable sets is countable. This kind of logical finesse is beyond the scope of this book.

4 Integral

The measure space (Ω, ℱ, μ) is fixed. A function f with domain Ω and range in R* = [−∞, +∞] is called ℱ-measurable iff for each real number c we have {ω ∈ Ω : f(ω) ≤ c} ∈ ℱ. A basic function is a function of the form

(31)    f = Σ_j a_j 1_{A_j},

where {A_j} is a finite or countable set of disjoint sets in ℱ with union Ω, and the a_j are distinct numbers in [0, +∞].

DEFINITION 8(a). For the basic function f in (31), its integral is defined to be

(32)    E(f) = Σ_j a_j μ(A_j)

and is also denoted by

    ∫_Ω f dμ = ∫_Ω f(ω) μ(dω).

If a term in (32) is 0·∞ or ∞·0, it is taken to be 0. In particular if f ≡ 0, then E(0) = 0 even if μ(Ω) = ∞. If A ∈ ℱ and μ(A) = 0, then the basic function ∞·1_A + 0·1_{A^c} has integral equal to

    ∞·0 + 0·μ(A^c) = 0.

We list some of the properties of the integral.

(i) Let {B_j} be a countable set of disjoint sets in ℱ, with union Ω, and {b_j} arbitrary positive numbers or ∞, not necessarily distinct. Then the function

(33)    f = Σ_j b_j 1_{B_j}

is basic, and its integral is equal to Σ_j b_j μ(B_j).

PROOF. Collect all equal b_j's into a_j and the corresponding B_j's into A_j as in (31). The result follows from the theorem on double series of positive terms: it may be summed in any order to yield a unique sum, possibly +∞.

(ii) If f and g are basic and f ≤ g, then E(f) ≤ E(g). In particular if E(f) = +∞, then E(g) = +∞.

PROOF. Let f be as in (31) and g as in (33). The doubly indexed sets {A_j ∩ B_k} are disjoint and their union is Ω. We have, using (i):

    E(f) = Σ_j Σ_k a_j μ(A_j ∩ B_k);    E(g) = Σ_k Σ_j b_k μ(A_j ∩ B_k).

The order of summation in the second double series may be reversed, and the result follows by the countable additivity of μ.

(iii) If f and g are basic functions, and a and b positive numbers, then af + bg is basic and

    E(af + bg) = aE(f) + bE(g).

PROOF. It is trivial that af is basic and E(af) = aE(f). Hence it is sufficient to prove the result for a = b = 1. Using the double decomposition in (ii), we have

    E(f + g) = Σ_j Σ_k (a_j + b_k) μ(A_j ∩ B_k).

Splitting the double series in two and then summing in two orders, we obtain the result.

It is a good time to state a general result that contains the double series theorem used above and another version of it that will be used below.

Double Limit Lemma. Let {C_{jk}; j ∈ N, k ∈ N} be a doubly indexed array of real numbers with the following properties:

(a) for each fixed j, the sequence {C_{jk}; k ∈ N} is increasing in k;
(b) for each fixed k, the sequence {C_{jk}; j ∈ N} is increasing in j.

Then we have

    lim_j ↑ lim_k ↑ C_{jk} = lim_k ↑ lim_j ↑ C_{jk} ≤ +∞.

The proof is surprisingly simple. Both repeated limits exist by fundamental analysis. Suppose first that one of these is the finite number C. Then for any ε > 0, there exist j_0 and k_0 such that C_{j_0 k_0} > C − ε. This implies that the other limit is ≥ C − ε. Since ε is arbitrary and the two indices are interchangeable, the two limits must be equal. Next, if the C above is +∞, then changing C − ε into ε^{−1} finishes the same argument.

As an easy exercise, the reader should derive the cited theorem on double series from the Lemma.

Let A ∈ ℱ and f be a basic function. Then the product 1_A f is a basic function and its integral will be denoted by

(34)    E(A; f) = ∫_A f(ω) μ(dω) = ∫_A f dμ.
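Definition 8(a) and its conventions admit a tiny computational sketch (invented toy data, not from the text): the integral of a basic function is the weighted sum Σ_j a_j μ(A_j), with the convention 0·∞ = ∞·0 = 0.

```python
import math

# A toy sketch of Definition 8(a) (invented data, not from the text): the
# integral of a basic function is sum_j a_j * mu(A_j), with the convention
# 0 * inf = inf * 0 = 0.

def integral_basic(values, masses):
    # values[j] = a_j, masses[j] = mu(A_j); either may be math.inf
    total = 0.0
    for a, mu in zip(values, masses):
        if a == 0 or mu == 0:
            continue                    # the 0 * inf convention
        total += a * mu
    return total

# f = 3 on A_1 (mass 2), 0 on A_2 (mass inf), inf on A_3 (mass 0)
print(integral_basic([3.0, 0.0, math.inf], [2.0, math.inf, 0.0]))   # 6.0
```

Note how the second and third atoms contribute nothing, exactly as in the example ∞·1_A + 0·1_{A^c} above.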

(iv) Let A_n ∈ ℱ, A_n ⊂ A_{n+1} for all n, and A = ⋃_n A_n. Then we have

(35)    lim_n E(A_n; f) = E(A; f).

PROOF. Denote f by (33), so that 1_A f = Σ_j b_j 1_{A B_j}. By (i),

    E(A; f) = Σ_j b_j μ(A B_j),

with a similar equation where A is replaced by A_n. Since μ(A_n B_j) ↑ μ(A B_j) as n ↑ ∞, and the partial sums increase with the number of terms, (35) follows by the double limit lemma.

Consider now an increasing sequence {f_n} of basic functions, namely f_n ≤ f_{n+1} for all n. Then f = lim_n ↑ f_n exists and is ℱ-measurable, but of course f need not be basic; and its integral has yet to be defined. By property (ii), the numerical sequence E(f_n) is increasing and so lim_n ↑ E(f_n) exists, possibly equal to +∞. It is tempting to define E(f) to be that limit, but we need the following result to legitimize the idea.

Theorem 8. Let {f_n} and {g_n} be two increasing sequences of basic functions such that

(36)    lim_n ↑ f_n = lim_n ↑ g_n

(everywhere in Ω). Then we have

(37)    lim_n ↑ E(f_n) = lim_n ↑ E(g_n).

PROOF. Denote the common limit function in (36) by f and put

    A = {ω ∈ Ω : f(ω) > 0};

then A ∈ ℱ. Since 0 ≤ g_n ≤ f, we have 1_{A^c} g_n = 0 identically; hence by property (iii):

(38)    E(g_n) = E(A; g_n) + E(A^c; g_n) = E(A; g_n).

Fix an n and put for each k ∈ N:

    A_k = {ω ∈ Ω : f_k(ω) > ((n−1)/n) g_n(ω)}.

Since f_k ≤ f_{k+1}, we have A_k ⊂ A_{k+1} for all k. We are going to prove that

(39)    ⋃_{k=1}^∞ A_k = A.

If ω ∈ A_k, then f(ω) ≥ f_k(ω) > ((n−1)/n) g_n(ω) ≥ 0; hence ω ∈ A. On the other hand, if ω ∈ A, then

    lim_k ↑ f_k(ω) = f(ω) ≥ g_n(ω)

and f(ω) > 0; hence there exists an index k such that

    f_k(ω) > ((n−1)/n) g_n(ω),

and so ω ∈ A_k. Thus (39) is proved. By property (ii), since

    f_k ≥ 1_{A_k} f_k ≥ 1_{A_k} ((n−1)/n) g_n,

we have

    E(f_k) ≥ E(A_k; f_k) ≥ ((n−1)/n) E(A_k; g_n).

Letting k ↑ ∞, we obtain by property (iv):

    lim_k ↑ E(f_k) ≥ ((n−1)/n) lim_k E(A_k; g_n) = ((n−1)/n) E(A; g_n) = ((n−1)/n) E(g_n),

where the last equation is due to (38). Now let n ↑ ∞ to obtain

    lim_k ↑ E(f_k) ≥ lim_n ↑ E(g_n).

Since {f_n} and {g_n} are interchangeable, (37) is proved.

Corollary. Let f_n and f be basic functions such that f_n ↑ f; then E(f_n) ↑ E(f).

PROOF. Take g_n = f for all n in the theorem.

The class of positive ℱ-measurable functions will be denoted by ℱ_+. Such a function can be approximated in various ways by basic functions. It is nice to do so by an increasing sequence, and of course we should approximate all functions in ℱ_+ in the same way. We choose a particular mode as follows. Define a function on [0, ∞] by the (uncommon) symbol )·]:

    )0] = 0;    )∞] = ∞;    )x] = n − 1 for x ∈ (n − 1, n], n ∈ N.

Thus )π] = 3, )4] = 3. Next we define, for any f ∈ ℱ_+, the approximating sequence {f^(m)}, m ∈ N, by

(40)    f^(m)(ω) = )2^m f(ω)] / 2^m.

Each f^(m) is a basic function with range in the set of dyadic (binary) numbers {k/2^m}, where k is a nonnegative integer or ∞. We have f^(m) ≤ f^(m+1) for all m, by the magic property of bisection. Finally f^(m) ↑ f, owing to the left-continuity of the function x → )x].

DEFINITION 8(b). For f ∈ ℱ_+, its integral is defined to be

(41)    E(f) = lim_m ↑ E(f^(m)).

When f is basic, Definition 8(b) is consistent with 8(a), by the Corollary to Theorem 8. The extension of property (ii) of integrals to ℱ_+ is trivial, because f ≤ g implies f^(m) ≤ g^(m). On the contrary, (f + g)^(m) is not f^(m) + g^(m); but since f^(m) + g^(m) ↑ (f + g), it follows from Theorem 8 that

    lim_m ↑ E(f^(m) + g^(m)) = lim_m ↑ E((f + g)^(m)),

which yields property (iii) for ℱ_+, together with E(a f^(m)) ↑ aE(f) for a > 0.

Property (iv) for ℱ_+ will be given in an equivalent form as follows.

(iv) For f ∈ ℱ_+, the function of sets defined on ℱ by

    A → E(A; f)

is a measure.

PROOF. We need only prove that if A = ⋃_{n=1}^∞ A_n, where the A_n's are disjoint sets in ℱ, then

    E(A; f) = Σ_{n=1}^∞ E(A_n; f).

For a basic f, this follows from properties (iii) and (iv). The extension to ℱ_+ can be done by the double limit theorem and is left as an exercise.

There are three fundamental theorems relating the convergence of functions with the convergence of their integrals. We begin with Beppo Levi's theorem on Monotone Convergence (1906), which is the extension of the Corollary to Theorem 8 to ℱ_+.
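The dyadic approximation (40) can be sketched numerically (an illustration, not part of the text): at a fixed ω the values f^(m)(ω) form an increasing sequence of dyadic numbers converging to f(ω) from below.

```python
import math

# A sketch of the approximation (40) (an illustration, not from the text).
# The symbol )x] of the text: )0] = 0, )oo] = oo, )x] = n - 1 for x in (n-1, n].

def left_floor(x):
    if x == 0:
        return 0
    if math.isinf(x):
        return math.inf
    return math.ceil(x) - 1          # agrees with )x] for x > 0

def approx(f_value, m):
    # f^(m) = )2^m f] / 2^m, a dyadic value just below f
    return left_floor(2 ** m * f_value) / 2 ** m

x = 0.71                             # f(omega) at one fixed omega
vals = [approx(x, m) for m in range(1, 20)]
print(vals[0], vals[-1])             # increases from 0.5 toward 0.71
```

The monotonicity in m is exactly the "magic property of bisection" mentioned above: halving the mesh can only move the dyadic value upward.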

Theorem 9. Let {f_n} be an increasing sequence of functions in ℱ_+ with limit f: f_n ↑ f. Then we have

    lim_n ↑ E(f_n) = E(f) ≤ +∞.

PROOF. We have f ∈ ℱ_+; hence by Definition 8(b), (41) holds. For each f_n we have, using analogous notation:

(42)    lim_m ↑ E(f_n^(m)) = E(f_n).

Since f_n ↑ f, the numbers )2^m f_n(ω)] ↑ )2^m f(ω)] as n ↑ ∞, owing to the left continuity of x → )x]. Hence by the Corollary to Theorem 8,

(43)    lim_n ↑ E(f_n^(m)) = E(f^(m)).

It follows that

    lim_m ↑ lim_n ↑ E(f_n^(m)) = lim_m ↑ E(f^(m)) = E(f).

On the other hand, it follows from (42) that

    lim_n ↑ lim_m ↑ E(f_n^(m)) = lim_n ↑ E(f_n).

Therefore the theorem is proved by the double limit lemma.

From Theorem 9 we derive Lebesgue's theorem in its pristine positive guise.

Theorem 10. Let f_n ∈ ℱ_+, n ∈ N. Suppose

(a) lim_n f_n = 0;
(b) E(sup_n f_n) < ∞.

Then

(44)    lim_n E(f_n) = 0.

PROOF. Put for n ∈ N:

(45)    g_n = sup_{k≥n} f_k.

Then g_n ∈ ℱ_+, and as n ↑ ∞, g_n ↓ lim sup_n f_n = 0 by (a); and g_1 = sup_n f_n, so that E(g_1) < ∞ by (b).

Now consider the sequence {g_1 − g_n}, n ∈ N. This is increasing with limit g_1. Hence by Theorem 9, we have

    lim_n ↑ E(g_1 − g_n) = E(g_1).

By property (iii) for ℱ_+,

    E(g_1 − g_n) + E(g_n) = E(g_1).

Substituting into the preceding relation and cancelling the finite E(g_1), we obtain E(g_n) ↓ 0. Since 0 ≤ f_n ≤ g_n, so that 0 ≤ E(f_n) ≤ E(g_n) by property (ii) for ℱ_+, (44) follows.

The next result is known as Fatou's lemma, of the same vintage 1906 as Beppo Levi's. It has the virtue of "no assumptions" with the consequent one-sided conclusion, which is however often useful.

Theorem 11. Let {f_n} be an arbitrary sequence of functions in ℱ_+. Then we have

(46)    E(lim inf_n f_n) ≤ lim inf_n E(f_n).

PROOF. Put for n ∈ N:

    g_n = inf_{k≥n} f_k;

then

    lim inf_n f_n = lim_n ↑ g_n.

Hence by Theorem 9,

(47)    E(lim inf_n f_n) = lim_n ↑ E(g_n).

Since g_n ≤ f_n, we have E(g_n) ≤ E(f_n) and

    lim inf_n E(g_n) ≤ lim inf_n E(f_n).

The left member above is in truth the right member of (47); therefore (46) follows as a milder but neater conclusion.

We have derived Theorem 11 from Theorem 9. Conversely, it is easy to go the other way. For if f_n ↑ f, then (46) yields E(f) ≤ lim_n ↑ E(f_n). Since f ≥ f_n, E(f) ≥ lim_n ↑ E(f_n); hence there is equality.

We can also derive Theorem 10 directly from Theorem 11. Using the notation in (45), we have 0 ≤ g_1 − f_n ≤ g_1. Hence by condition (a) and (46),

    E(g_1) = E(lim inf_n (g_1 − f_n)) ≤ lim inf_n (E(g_1) − E(f_n)) = E(g_1) − lim sup_n E(f_n),

which yields (44).
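The inequality (46) can be strict, and a standard escaping-mass sequence shows why; the sketch below (a numerical illustration, not from the text) approximates the integrals on (0, 1) with a crude midpoint rule.

```python
# Fatou's inequality (46) can be strict (a numerical sketch, not from the
# text): on (0, 1) take f_n = n * 1_{(0, 1/n)}.  Then lim_n f_n = 0 pointwise,
# so E(liminf f_n) = 0, yet E(f_n) = 1 for every n.

def E_fn(n, grid=200000):
    # midpoint-rule stand-in for the integral of f_n over (0, 1)
    h = 1.0 / grid
    return sum((n if (i + 0.5) * h < 1.0 / n else 0.0) * h for i in range(grid))

vals = [E_fn(n) for n in (2, 10, 100)]
print(vals)    # each is 1.0, so liminf of the integrals is 1 > 0
```

The mass of f_n concentrates near 0 and vanishes in the pointwise limit, while each integral stays at 1; this is also a sequence to which Theorem 10 does not apply, since sup_n f_n is not integrable.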

The three theorems 9, 10, and 11 are intimately interwoven. We proceed to the final stage of the integral. For any f ∈ ℱ with range in [−∞, +∞], put

    f^+ = f on {f ≥ 0}, 0 on {f < 0};    f^− = −f on {f < 0}, 0 on {f ≥ 0}.

Then f^+ ∈ ℱ_+, f^− ∈ ℱ_+, and

    f = f^+ − f^−;    |f| = f^+ + f^−.

By Definition 8(b) and property (iii):

(48)    E(|f|) = E(f^+) + E(f^−).

DEFINITION 8(c). For f ∈ ℱ, its integral is defined to be

(49)    E(f) = E(f^+) − E(f^−),

provided the right side above is defined, namely not ∞ − ∞. We say f is integrable, or f ∈ L^1, iff both E(f^+) and E(f^−) are finite; in this case E(f) is a finite number. When E(f) exists but f is not integrable, then it must be equal to +∞ or −∞, by (49).

A set A in Ω is called a null set iff A ∈ ℱ and μ(A) = 0. A mathematical proposition is said to hold almost everywhere, or a.e., iff there is a null set A such that it holds outside A, namely in A^c.

A number of important observations are collected below.

Theorem 12. (i) The function f in ℱ is integrable if and only if |f| is integrable; we have

(50)    |E(f)| ≤ E(|f|).

(ii) For any f ∈ ℱ and any null set A, we have

(51)    E(A; f) = ∫_A f dμ = 0;    E(f) = E(A^c; f) = ∫_{A^c} f dμ.

(iii) If f ∈ L^1, then the set {ω ∈ Ω : |f(ω)| = ∞} is a null set.

(iv) If f ∈ L^1, g ∈ ℱ, and |g| ≤ |f| a.e., then g ∈ L^1.

(v) If f ∈ ℱ, g ∈ ℱ, and g = f a.e., then E(g) exists if and only if E(f) exists, and then E(g) = E(f).

(vi) If μ(Ω) < ∞, then any a.e. bounded ℱ-measurable function is integrable.

PROOF. (i) is trivial from (48) and (49); (ii) follows from

    1_A |f| ≤ 1_A · ∞

so that

    0 ≤ E(1_A |f|) ≤ E(1_A · ∞) = μ(A) · ∞ = 0.

This implies (51).

To prove (iii), let

    A(n) = {|f| ≥ n}.

Then A(n) ∈ ℱ and

    n μ(A(n)) = E(A(n); n) ≤ E(A(n); |f|) ≤ E(|f|).

Hence

(52)    μ(A(n)) ≤ (1/n) E(|f|).

Letting n ↑ ∞, so that A(n) ↓ {|f| = ∞}; since μ(A(1)) ≤ E(|f|) < ∞, we have by property (f) of the measure μ:

    μ({|f| = ∞}) = lim_n ↓ μ(A(n)) = 0.

To prove (iv), let |g| ≤ |f| on A^c, where μ(A) = 0. Then

    |g| ≤ ∞·1_A + |f|·1_{A^c},

and consequently

    E(|g|) ≤ μ(A)·∞ + E(|f|) = E(|f|) < ∞.

To prove (v), apply (51) with A a null set outside which g = f; to prove (vi), observe that if |f| ≤ M a.e., then E(|f|) ≤ M μ(Ω) < ∞ by (ii) and (v).

As a strengthening of Chebyshev's inequality (52), we note that if f ∈ L^1, then with A(n) as above,

(53)    lim_n E(A(n); f) = 0;

this follows from property (iv) for ℱ_+, since the A(n) decrease to the null set {|f| = ∞} while E(A(1); |f|) < ∞. More generally, if {B_k} is a sequence of sets in ℱ with lim_k μ(B_k) = 0, then lim_k E(B_k; f) = 0.

To prove this we may suppose, without loss of generality, that f ∈ ℱ_+. We have then

    E(B_k; f) = E(B_k ∩ A(n); f) + E(B_k ∩ A(n)^c; f) ≤ E(A(n); f) + n μ(B_k).

Hence

    lim sup_k E(B_k; f) ≤ E(A(n); f),

and the result follows by letting n → ∞ and using (53).

It is convenient to define, for any f in ℱ, a class of functions denoted by C(f), as follows: g ∈ C(f) iff g = f a.e. When (Ω, ℱ, μ) is a complete measure space, such a g is automatically in ℱ. To see this, let B = {g ≠ f}. Our definition of "a.e." means only that B is a subset of a null set A; in plain English this does not say whether g is equal to f or not anywhere in A − B. However, if the measure is complete, then any subset of a null set is also a null set, so that not only the set B but all its subsets are null, hence in ℱ. Hence for any real number c, the set {g ≤ c} differs from {f ≤ c} by a subset of B, and therefore belongs to ℱ; thus g is ℱ-measurable.

as follows: from f ≤ g we get

    f^+ + g^− ≤ g^+ + f^−.

Applying properties (ii) and (iii) for ℱ_+, we obtain

    E(f^+) + E(g^−) ≤ E(g^+) + E(f^−).

By the assumptions of L^1, all the four quantities above are finite numbers. Transposing back we obtain the desired conclusion.

(iii) If f ∈ L^1 and g ∈ L^1, then f + g ∈ L^1, and

    E(f + g) = E(f) + E(g).

Let us leave this as an exercise. If we assume only that both E(f) and E(g) exist and that the right member in the equation above is defined, namely not (+∞) + (−∞) or (−∞) + (+∞), does E(f + g) then exist and equal the sum? We leave this as a good exercise for the curious, and return to Theorem 10 in its practical form.

Theorem 10'. Let f_n ∈ ℱ; suppose

(a) lim_n f_n = f a.e.;
(b) there exists φ ∈ L^1 such that for all n: |f_n| ≤ φ a.e.

Then we have

(c) lim_n E(|f_n − f|) = 0.

PROOF. Observe first that

    |lim_n f_n| ≤ sup_n |f_n|;    |f_n − f| ≤ |f_n| + |f| ≤ 2φ a.e.;

hence f ∈ L^1, and outside a null set the sequence {|f_n − f|} satisfies the conditions of Theorem 10, which yields (c).
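Theorem 10' can be watched at work on a textbook-style sequence; the sketch below (a numerical illustration, not from the text) uses f_n(x) = x^n on (0, 1), which tends to 0 a.e. and is dominated by the integrable constant 1.

```python
# A sketch of Theorem 10' (not from the text): on (0, 1), f_n(x) = x**n -> 0
# a.e., dominated by phi = 1 in L^1, and E(|f_n - 0|) = 1/(n+1) -> 0.

def E_abs(n, grid=100000):
    # midpoint-rule stand-in for the integral of x**n over (0, 1)
    dx = 1.0 / grid
    return sum(((i + 0.5) * dx) ** n * dx for i in range(grid))

decay = [E_abs(n) for n in (1, 5, 50, 500)]
print(decay)    # about 1/2, 1/6, 1/51, 1/501: decreasing to 0
```

Note that pointwise convergence fails at x = 1, which is precisely a null set, so condition (a) holds a.e. as required.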

Curiously, the best known part of the theorem is the corollary below with a fixed B.

Corollary. We have

    lim_n ∫_B f_n dμ = ∫_B f dμ

uniformly in B ∈ ℱ.

This is trivial from (c), because, in alternative notation:

    |E(B; f_n) − E(B; f)| ≤ E(B; |f_n − f|) ≤ E(|f_n − f|).

5 Applications

Example 1. Let (I, B*, m) denote an interval I furnished with the Lebesgue-measurable sets and the Lebesgue measure m, and let {u_k, k ∈ N} be real-valued functions on I.

Suppose the infinite series Σ_k u_k(x) converges in I; then, in the usual notation,

    lim_n s_n(x) = lim_n Σ_{k=1}^n u_k(x) = Σ_{k=1}^∞ u_k(x) = s(x)

exists and is finite. Now suppose each u_k is Lebesgue-integrable; then so is each s_n, by property (iii) of the integral, and

    ∫_I s_n(x) dx = Σ_{k=1}^n ∫_I u_k(x) dx.

Question: does the numerical series above converge? and if so, is the sum of integrals equal to the integral of the sum:

    Σ_{k=1}^∞ ∫_I u_k(x) dx = ∫_I s(x) dx ?

This is the problem of integration term by term.

A very special but important case is when the interval I = [a, b] is compact and the functions u_k are all continuous in I. If we assume that the series Σ_{k=1}^∞ u_k(x) converges uniformly in I, then it follows from elementary analysis that the sequence of partial sums {s_n(x)} is totally bounded, that is,

    sup_n sup_x |s_n(x)| = sup_x sup_n |s_n(x)| < ∞.

Since m(I) < ∞, the bounded convergence theorem applies to yield

    lim_n ∫_I s_n(x) dx = ∫_I lim_n s_n(x) dx.

The Taylor series of an analytic function always converges uniformly and absolutely in any compact subinterval of its interval of convergence. Thus the result above is fruitful.

Another example of term-by-term integration goes back to Theorem 8.

Example 2. Let u_k ≥ 0, u_k ∈ L^1; then

(54)    ∫ (Σ_{k=1}^∞ u_k) dμ = Σ_{k=1}^∞ ∫ u_k dμ.

Let f_n = Σ_{k=1}^n u_k; then f_n ∈ L^1 and f_n ↑ f = Σ_{k=1}^∞ u_k. Hence by monotone convergence

    E(f) = lim_n E(f_n),

that is, (54).
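The identity (54) can be exercised concretely when μ is the counting measure, in which case the integral is a plain sum and (54) is the interchange of a double series of positive terms; the sketch below (an illustration with arbitrary data, not from the text) checks it for u_k(j) = r^{k+j}.

```python
# A sketch of (54) with mu = counting measure on N (not from the text):
# the "integral" of u is then just sum_j u(j).  Take u_k(j) = r**(k + j).

r = 0.5
N = 60            # truncation point; the discarded tails are ~2**-59 here

def integral(u):
    # integral with respect to counting measure = plain summation
    return sum(u(j) for j in range(1, N))

lhs = integral(lambda j: sum(r ** (k + j) for k in range(1, N)))       # E(sum u_k)
rhs = sum(integral(lambda j, k=k: r ** (k + j)) for k in range(1, N))  # sum E(u_k)
print(lhs, rhs)   # both are (r/(1-r))**2 = 1 up to truncation
```

This is the special case of Fubini–Tonelli mentioned below, with one of the two measures the counting measure on N.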

When the u_k are of general sign, the preceding result may be applied to |u_k| to obtain

    ∫ (Σ_{k=1}^∞ |u_k|) dμ = Σ_{k=1}^∞ ∫ |u_k| dμ.

If this is finite, then the same is true when |u_k| is replaced by u_k^+ and u_k^−. It then follows by subtraction that (54) is also true. This result of term-by-term integration may be regarded as a special case of the Fubini–Tonelli theorem (pp. 63–64), where one of the measures is the counting measure on N.

For another perspective, we will apply the Borel–Lebesgue theory of the integral to the older Riemann context.

Example 3. Let (I, B*, m) be as in the preceding example, but let I = [a, b] be compact. Let f be a continuous function on I. Denote by P a partition of I:

    a = x_0 < x_1 < ··· < x_n = b,

with mesh δ(P) = max_j (x_j − x_{j−1}), and form the Riemann sum

    S(P) = Σ_{j=1}^n f(ξ_j)(x_j − x_{j−1}),    ξ_j ∈ [x_{j−1}, x_j].

Since f is uniformly continuous on the compact interval I, as δ(P) → 0 the step functions determined by P converge uniformly to f, and S(P) converges to the Lebesgue integral ∫_I f dm.

The finite existence of the limit above signifies the Riemann-integrability of f, and the limit is then its Riemann-integral ∫_a^b f(x) dx. Thus we have proved that a continuous function on a compact interval is Riemann-integrable, and its Riemann-integral is equal to the Lebesgue integral. Let us recall that in the new theory, any bounded measurable function is integrable over any bounded measurable set. For example, the function

    sin(1/x),    x ∈ (0, 1],

being bounded by 1, is integrable. But from the strict Riemannian point of view it has only an "improper" integral, because (0, 1] is not closed and the function is not continuous on [0, 1], indeed it is not definable there. Yet the limit

    lim_{ε↓0} ∫_ε^1 sin(1/x) dx

exists and can be defined to be ∫_0^1 sin(1/x) dx. As a matter of fact, the Riemann sums do converge despite the unceasing oscillation of the function between −1 and +1 as x ↓ 0.

Example 4. The Riemann integral of a function on (0, ∞) is called an "infinite integral" and is definable as follows:

    ∫_0^∞ f(x) dx = lim_{n→∞} ∫_0^n f(x) dx

when the limit exists and is finite. A famous example is

(56)    f(x) = sin x / x,    x ∈ (0, ∞).

This function is bounded by 1 and is continuous. It can be extended to [0, ∞) by defining f(0) = 1 by continuity. A cute calculation (see §6.2 of the main text) yields the result (useful in Optics):

    lim_n ∫_0^n (sin x / x) dx = π/2.

By contrast, the function |f| is not Lebesgue-integrable. To show this, we use trigonometry: for x in I_n = (2nπ + π/4, 2nπ + 3π/4) we have sin x ≥ √2/2 and x < (2n + 1)π, hence

    sin x / x ≥ (√2/2) / ((2n + 1)π) = C_n    for x ∈ I_n.

Thus for x > 0:

    f^+(x) ≥ Σ_{n=1}^∞ C_n 1_{I_n}(x).

The right member above is a basic function, with its integral:

    Σ_{n=1}^∞ C_n m(I_n) = Σ_{n=1}^∞ (√2/2)(π/2) / ((2n + 1)π) = Σ_{n=1}^∞ √2 / (4(2n + 1)) = +∞.

It follows that E(f^+) = +∞. Similarly E(f^−) = +∞; therefore by Definition 8(c), E(f) does not exist! This example is a splendid illustration of the following

Non-Theorem. Let f ∈ ℱ and f_n = f·1_{(0,n)}, n ∈ N. Then f_n ∈ ℱ and f_n → f as n → ∞. Even when the f_n's are "totally bounded", it does not follow that

(57)    lim_n E(f_n) = E(f);

indeed E(f) may not exist.

On the other hand, if we assume, in addition, either (a) f ≥ 0, or (b) E(f) exists, in particular f ∈ L^1, then the limit relation will hold, by Theorems 9 and 10, respectively. The next example falls in both categories.

Example 5. The square of the function f in (56):

    f(x)² = (sin x / x)²,    x ∈ R,

is integrable in the Lebesgue sense, and is also improperly integrable in the Riemann sense. We have

    f(x)² ≤ 1_{(−1,+1)}(x) + (1/x²) 1_{(−∞,−1)∪(+1,+∞)}(x),

and the function on the right side is integrable, hence so is f². Incredibly, we have

    ∫_0^∞ (sin x / x)² dx = π/2 = (RI) ∫_0^∞ (sin x / x) dx,

where we have inserted an "RI" to warn against taking the second integral as a Lebesgue integral. See §6.2 for the calculation. So far as I know, nobody has explained the equality of these two integrals.

Example 6. The most notorious example of a simple function that is not Riemann-integrable, and that baffled a generation of mathematicians, is the function 1_Q, where Q is the set of rational numbers. Its Riemann sums can be made to equal any real number between 0 and 1, when we confine Q to the unit interval (0, 1). The function is so totally discontinuous that the Riemannian way of approximating it, horizontally so to speak, fails utterly. But of course it is ludicrous even to consider this indicator function rather than the set Q itself.
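The contrast in Examples 4 and 5 can be checked numerically; the sketch below (not from the text) uses a crude midpoint rule. The improper integral of sin(x)/x settles near π/2, while the integral of |sin(x)/x| keeps growing, roughly like (2/π) log b, which signals E(|f|) = +∞.

```python
import math

# A numerical check of Examples 4 and 5 (a sketch, not from the text):
# the improper integral of sin(x)/x approaches pi/2 ~ 1.5708, while the
# integral of |sin(x)/x| grows without bound (roughly (2/pi) * log b).

def quad(g, a, b, n=200000):
    # crude composite midpoint rule, adequate for these integrands
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

f = lambda x: math.sin(x) / x

I  = quad(f, 1e-9, 200.0)                        # near pi/2
A1 = quad(lambda x: abs(f(x)), 1e-9, 100.0)
A2 = quad(lambda x: abs(f(x)), 1e-9, 200.0)
print(I, A1, A2)
```

Doubling the upper limit from 100 to 200 adds about (2/π) log 2 ≈ 0.44 to the absolute integral, matching the divergent series Σ C_n m(I_n) computed above.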
There was a historical reason for this folly: integration was regarded as the inverse operation to differentiation, so that to integrate was meant to "find the primitive" whose derivative is to be the integrand, for example,

    ∫_0^ξ 2x dx = ξ²,  (d/dξ) ξ² = 2ξ;    ∫_1^ξ (1/x) dx = log ξ,  (d/dξ) log ξ = 1/ξ.

A primitive is called an "indefinite integral", and ∫_1^2 (1/x) dx, e.g., is called a "definite integral." Thus the unsolvable problem was to find ∫_0^ξ 1_Q(x) dx, 0 < ξ ≤ 1.

The notion of measure as length, area, and volume is much more ancient than Newton's fluxion (derivative), not to mention the primitive measure of counting with fingers (and toes). The notion of "countable additivity" of a measure, although seemingly natural and facile, somehow did not take hold until Borel saw that

    m(Q) = Σ_{q∈Q} m({q}) = Σ_{q∈Q} 0 = 0.

There can be no question that the "length" of a single point q is zero. Euclid gave it "zero dimension".

This is the beginning of MEASURE. An INTEGRAL is a weighted measure, as is obvious from Definition 8(a). The rest is approximation, vertically as in Definition 8(b), and convergence, as in all analysis.

As for the connexion with differentiation, Lebesgue made it, and a clue is given in §1.3 of the main text.

General bibliography

[The five divisions below are merely a rough classification and are not meant to be mutually exclusive.]

1. Basic analysis

[1] G. H. Hardy, A course of pure mathematics, 10th ed. Cambridge University Press, New York, 1952.
[2] W. Rudin, Principles of mathematical analysis, 2nd ed. McGraw-Hill Book Company, New York, 1964.
[3] I. P. Natanson, Theory of functions of a real variable (translated from the Russian). Frederick Ungar Publishing Co., New York, 1955.

2. Measure and integration

[4] P. R. Halmos, Measure theory. D. Van Nostrand Co., Inc., Princeton, N.J., 1956.
[5] H. L. Royden, Real analysis. The Macmillan Company, New York, 1963.
[6] J. Neveu, Mathematical foundations of the calculus of probability. Holden-Day, Inc., San Francisco, 1965.

3. Probability theory

[7] Paul Lévy, Calcul des probabilités. Gauthier-Villars, Paris, 1925.
[8] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer-Verlag, Berlin, 1933.

[9] M. Fréchet, Généralités sur les probabilités. Variables aléatoires. Gauthier-Villars, Paris, 1937.
[10] H. Cramér, Random variables and probability distributions, 3rd ed. Cambridge University Press, New York, 1970 [1st ed., 1937].
[11] Paul Lévy, Théorie de l'addition des variables aléatoires, 2nd ed. Gauthier-Villars, Paris, 1954 [1st ed., 1937].
[12] B. V. Gnedenko and A. N. Kolmogorov, Limit distributions for sums of independent random variables (translated from the Russian). Addison-Wesley Publishing Co., Inc., Reading, Mass., 1954.
[13] William Feller, An introduction to probability theory and its applications, vol. 1 (3rd ed.) and vol. 2 (2nd ed.). John Wiley & Sons, Inc., New York, 1968 and 1971 [1st ed. of vol. 1, 1950].
[14] Michel Loève, Probability theory, 3rd ed. D. Van Nostrand Co., Inc., Princeton, N.J., 1963.
[15] A. Rényi, Probability theory. North-Holland Publishing Co., Amsterdam, 1970.
[16] Leo Breiman, Probability. Addison-Wesley Publishing Co., Reading, Mass., 1968.

4. Stochastic processes

[17] J. L. Doob, Stochastic processes. John Wiley & Sons, Inc., New York, 1953.
[18] Kai Lai Chung, Markov chains with stationary transition probabilities, 2nd ed. Springer-Verlag, Berlin, 1967 [1st ed., 1960].
[19] Frank Spitzer, Principles of random walk. D. Van Nostrand Co., Princeton, N.J., 1964.
[20] Paul-André Meyer, Probabilités et potentiel. Hermann (Éditions Scientifiques), Paris, 1966.
[21] G. A. Hunt, Martingales et processus de Markov. Dunod, Paris, 1966.
[22] David Freedman, Brownian motion and diffusion. Holden-Day, Inc., San Francisco, 1971.

5. Supplementary reading

[23] Mark Kac, Statistical independence in probability, analysis and number theory, Carus Mathematical Monograph 12. John Wiley & Sons, Inc., New York, 1959.
[24] A. Rényi, Foundations of probability.
Holden-Day, Inc ., San Francisco,1970 .[25] Kai Lai Chung, Elementary probability theory with stochastic processes .Spr**in**ger-Verlag, Berl**in**, 1974 .

Index

A

Abelian theorem, 292
Absolutely continuous part of d.f., 12
Adapted, 335
Additivity of variance, 108
Adjunction, 25
a.e., see Almost everywhere
Algebraic measure space, 28
Almost everywhere, 30
Almost invariant, permutable, remote, trivial, 266
Approximation lemmas, 91, 264
Arcsin law, 305
Arctan transformation, 75
Asymptotic density, 24
Asymptotic expansion, 236
Atom, 31, 32
Augmentation, 32
Axiom of continuity, 22

B

Ballot problem, 232
Basic function, 395
Bayes' rule, 320
Belonging (of function to B.F.), 263
Beppo Levi's theorem, 401
Bernstein's theorem, 200
Berry-Esseen theorem, 235
B.F., see Borel field
Bi-infinite sequence, 270
Binary expansion, 60
Bochner-Herglotz theorem, 187
Boole's inequality, 21
Borel-Cantelli lemma, 80
Borel field, 18
  generated, 18
Borel field on R, 388, 394
Borel-Lebesgue measure, 24, 389
Borel-Lebesgue-Stieltjes measure, 394
Borel measurable function, 37
Borel's Covering Theorem, 392
Borel set, 24, 38, 394
Borel's Lemma, 390
Borel's theorem on normal numbers, 109
Boundary value problem, 344
Bounded convergence theorem, 44
Branching process, 371
Brownian motion, 129, 227

C

CK, C0, CB, C, 91
CB, CU, 157

Canonical representation of infinitely divisible laws, 258
Cantor distribution, 13-15, 129, 174
Carathéodory criterion, 28, 389, 394
Carleman's condition, 103
Cauchy criterion, 70
Cauchy distribution, 156
Cauchy-Schwarz inequality, 50, 317
Central limit theorem, 205
  classical form, 177
  for dependent r.v.'s, 224
  for random number of terms, 226
  general form, 211
  Liapounov form, 208
  Lindeberg-Feller form, 214
  Lindeberg's method, 211
  with remainder, 235
Chapman-Kolmogorov equations, 333
Characteristic function, 150
  list of, 155-156
  multidimensional, 197
Chebyshev's inequality, 50
  strengthened, 404
ch.f., see Characteristic function
Coin-tossing, 111
Complete probability space, 30
Completely monotonic, 200
Concentration function, 160
Conditional density, 321
Conditional distribution, 321
Conditional expectation, 313
  as integration, 316
  as martingale limit, 369
Conditional independence, 322
Conditional probability, 310, 313
Continuity interval, 85, 196
Convergence a.e., 68
  in dist., 96
  in Lp, 71
  in pr., 70
  weak in L1, 73
Convergence theorem:
  for ch.f., 169
  for Laplace transform, 200
Convolution, 152, 157
Coordinate function, 58, 264
Countable, 4

D

de Finetti's theorem, 367
De Moivre's formula, 220
Density of d.f., 12
d.f., see Distribution function
Differentiation:
  of ch.f., 175
  of d.f., 163
Diophantine approximation, 285
Discrete random variable, 39
Distinguished logarithm, nth root, 255
Distribution function, 7
  continuous, 9
  degenerate, 7
  discrete, 9
  n-dimensional, 53
  of p.m., 30
Dominated convergence theorem, 44
Dominated ergodic theorem, 362
Dominated r.v., 71
Doob's martingale convergence, 351
Doob's sampling theorem, 342
Double array, 205
Double limit lemma, 397
Downcrossing inequality, 350

E

Egorov's theorem, 79
Empiric distribution, 138
Equi-continuity of ch.f.'s, 169
Equivalent sequences, 112
Event, 54
Exchangeable, event, 366
Existence theorem for independent r.v.'s, 60
Expectation, 41
Extension theorem, see Kolmogorov's extension theorem

F

ℱ*, 380
ℱ*-measurability, 378
  approximation of ℱ*-measurable set, 382
Fatou's lemma, 45, 402
Feller's dichotomy, 134
Field, 17
Finite-product set, 61
First entrance time, 273
Fourier series, 183
Fourier-Stieltjes transform, 151
  of a complex variable, 203
Fubini's theorem, 63
Functionals of Sn, 228
Function space, 265

G

Gambler's ruin, 343
Gambling system, 273, 278, see also Optional sampling
Gamma distribution, 158
Generalized Poisson distribution, 257
Glivenko-Cantelli theorem, 140

H

Harmonic analysis, 164
Harmonic equation, 344
Harmonic function, 361
Helly's extraction principle, 88
Hölder's inequality, 50
Holospoudicity, 207
Homogeneous Markov chain, process, 333

I

Identically distributed, 37
Independence, 53
  characterizations, 197, 322
Independent process, 267
Indicator, 40
In dist., see In distribution
In distribution, 96
Induced measure, 36
Infinitely divisible, 250
Infinitely often, 77, 360
Infinite product space, 61
Information theory, 148, 414
Inner measure, 28
In pr., see In probability
In probability, 70
Integrable random variable, 43
Integrable function, 403
Integral, 396, 400, 403
  additivity, 406
  convergence theorems, 401, 406-408
Integrated ch.f., 172
Integration term by term, 44, 408
Invariance principle, 228
Invariant, 266
Inverse mapping, 35
Inversion formula for ch.f., 161, 196
Inversions, 220
i.o., see Infinitely often

J

Jensen's inequality, 50, 317
Jump, 3
Jumping part of d.f., 7

K

Kolmogorov extension theorem, 64, 265
Kolmogorov's inequalities, 121, 125
Kronecker's lemma, 129

L

Laplace transform, 196
Large deviations, 214, 241
Lattice d.f., 183
Law of iterated logarithm, 242, 248
Law of large numbers
  converse, 138
  for pairwise independent r.v.'s, 134
  for uncorrelated r.v.'s, 108
  identically distributed case, 133, 295, 365
  strong, 109, 129
  weak, 114, 177
Law of small numbers, 181
Lebesgue measurable, 394
Lebesgue's dominated and bounded convergence, 401, 406
Levi's monotone convergence, 401
Lévy-Cramér theorem, 169
Lévy distance, 98
Lévy's formula, see Canonical representation
Lévy's theorem on series of independent r.v.'s, 126
  strengthened, 363
Liapounov's central limit theorem, 208
Liapounov's inequality, 50
Liminf, Limsup, 75
Lindeberg's method, 211
Lindeberg-Feller theorem, 214
Lower semicontinuous function, 95

M

Marcinkiewicz-Zygmund inequality, 362
Markov chain, homogeneous, 333
Markov process, 326
  homogeneous, 330
  of rth order, 333
Markov property, 326
  strong, 327

Martingale, 335, see also Submartingale, Supermartingale
  difference sequence, 337
  Krickeberg's decomposition, 358
  of conditional expectations, 338
  stopped, 341
Maximum of Sn, 228, 299
Maximum of submartingale, 346, 362
M.C., see Monotone class
m-dependent, 224
Mean, 49
Measurable with respect to ℱ, 35
Measure
  completion, 384
  extension, 377, 380
  sigma-finite, 381
  transfinite construction, 386
  uniqueness, 381
Measure space, 381
  discrete, 386
Median, 117
Method of moments, 103, 178
Metric
  for ch.f.'s, 172
  for convergence in pr., 71
  for d.f.'s, 98
  for measure space, 47
  for p.m.'s, 172
Minimal B.F., 18
Minkowski's inequality, 50
Moment, 49
Moment problem, 103
Monotone class theorems, 19
Monotone convergence theorem, 44, 401
Monotone property of measure, 21
Multinomial distribution, 181

N

n-dimensional distribution, 53
Negligibility, 206
Non-measurable set, 395
Normal d.f., 104
  characterization of, 180, 182
  convergence to, see Central limit theorem
  kernel, 157
  positive, 104, 232
Normal number, 109, 112
Norming, 206
Null set, 30, 383
Number of positive terms, 302
Number of zeros, 234

O

Optional r.v., 259, 338
Optional sampling, 338, 346
Order statistics, 147
Orthogonal, 107
Outcome, 57
Outer measure, 28, 375, 377-378

P

Pairwise independence, 53
Partition, 40
Permutable, 266
p.m., see Probability measure
Poisson d.f., 194
Poisson process, 142
Pólya criterion, 191
Pólya distribution, 158
Positive definite, 187
Positive normal, see Normal
Post-α, 259
Potential, 359, 361
Pre-α, 259
Probability distribution, 36
  on circle, 107
Probability measure, 20
  of d.f., 30
Probability space, triple, 23
Product B.F., measure, space, 58
Projection, 64

Q

Queuing theory, 307

R

Radon-Nikodym theorem, 313, 369
Random variable, 34, see also Discrete, integrable, simple, symmetric random variable
Random vector, 37
Random walk, 271
  recurrent, 284
Recurrence principle, 360
Recurrent value, 278
Reflection principle, 232
Remainder term, 235
Remote event, field, 264
Renewal theory, 142
Riemann integral, 409
  improper, infinite, 410
  primitive, 411
Riemann-Lebesgue lemma, 167
Riemann-Stieltjes integral, 198
Riemann zeta function, 259
Riesz decomposition, 359
Right continuity, 5
r.v., see Random variable

S

Sample function, 141
Sample space, 23
  discrete, 23
s.d.f., see Subdistribution function
Second-order stochastic process, 191
Sequentially vaguely compact, 88
Set of convergence, 79
Shift, 265
  optional, 273
Simple random variable, 40
Singleton, 16
Singular d.f., 12
  measure, 31
Singular part of d.f., 12
Smartingale, 335
  closeable, 359
Span, 184
Spectral distribution, 191
St. Petersburg paradox, 120, 373
Spitzer's identity, 299
Squared variation, 367
Stable distribution, 193
Standard deviation, 49
Stationary independent process, 267
Stationary transition probabilities, 331
Step function, 92
Stochastic process, with discrete parameter, 263
Stone-Weierstrass theorem, 92, 198
Stopped (super)martingale, 340-341
Stopping time, see Optional sampling
Subdistribution function, 88
Submartingale, 335
  convergence theorems for, 350
  convex transformation of, 336
  Doob decomposition, 337
  inequalities, 346
Subprobability measure, 85
Subsequences, method of, 109
Superharmonic function, 361
Supermartingale, 335
  optional sampling, 340
Support of d.f., 10
Support of p.m., 32
Symmetric difference (Δ), 16
Symmetric random variable, 165
Symmetric stable law, 193, 250
Symmetrization, 155
System theorem, see Gambling system, Submartingale

T

Tauberian theorem, 292
Taylor's series, 177
Three series theorem, 125
Tight, 95
Topological measure space, 28
Trace, 23
Transition probability function, 331
Trial, 57
Truncation, 115
Types of d.f., 184

U

Uncorrelated, 107
Uniform distribution, 30
Uniform integrability, 99
Uniqueness theorem
  for ch.f., 160, 168
  for Laplace transform, 199, 201
  for measure, 30
Upcrossing inequality, 348

V

Vague convergence, 85, 92
  general criterion, 96
Variance, 49

W

Wald's equation, 144, 345
Weak convergence, 73
Weierstrass theorem, 145
Wiener-Hopf technique, 291

Z

Zero-or-one law
  Hewitt-Savage, 268
  Kolmogorov, 267
  Lévy, 357