A Course in Probability Theory





A Course in Probability Theory, Third Edition

Kai Lai Chung
Stanford University

Academic Press
A Harcourt Science and Technology Company
San Diego  San Francisco  New York  Boston  London  Sydney  Tokyo


This book is printed on acid-free paper.

Copyright © 2001, 1974, 1968 by Academic Press. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher.

Requests for permission to make copies of any part of the work should be mailed to the following address: Permissions Department, Harcourt, Inc., 6277 Sea Harbor Drive, Orlando, Florida 32887-6777.

Academic Press, A Harcourt Science and Technology Company
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA
http://www.academicpress.com

Academic Press
Harcourt Place, 32 Jamestown Road, London NW1 7BY, UK
http://www.academicpress.com

Library of Congress Cataloging in Publication Data: 00-106712
International Standard Book Number: 0-12-174151-6

Printed in the United States of America
00 01 02 03 IP 9 8 7 6 5 4 3 2 1


Contents

Preface to the third edition  ix
Preface to the second edition  xi
Preface to the first edition  xiii

1  Distribution function
1.1  Monotone functions  1
1.2  Distribution functions  7
1.3  Absolutely continuous and singular distributions  11

2  Measure theory
2.1  Classes of sets  16
2.2  Probability measures and their distribution functions  21

3  Random variable. Expectation. Independence
3.1  General definitions  34
3.2  Properties of mathematical expectation  41
3.3  Independence  53

4  Convergence concepts
4.1  Various modes of convergence  68
4.2  Almost sure convergence; Borel-Cantelli lemma  75


4.3  Vague convergence  84
4.4  Continuation  91
4.5  Uniform integrability; convergence of moments  99

5  Law of large numbers. Random series
5.1  Simple limit theorems  106
5.2  Weak law of large numbers  112
5.3  Convergence of series  121
5.4  Strong law of large numbers  129
5.5  Applications  138
Bibliographical Note  148

6  Characteristic function
6.1  General properties; convolutions  150
6.2  Uniqueness and inversion  160
6.3  Convergence theorems  169
6.4  Simple applications  175
6.5  Representation theorems  187
6.6  Multidimensional case; Laplace transforms  196
Bibliographical Note  204

7  Central limit theorem and its ramifications
7.1  Liapounov's theorem  205
7.2  Lindeberg-Feller theorem  214
7.3  Ramifications of the central limit theorem  224
7.4  Error estimation  235
7.5  Law of the iterated logarithm  242
7.6  Infinite divisibility  250
Bibliographical Note  261

8  Random walk
8.1  Zero-or-one laws  263
8.2  Basic notions  270
8.3  Recurrence  278
8.4  Fine structure  288
8.5  Continuation  298
Bibliographical Note  308


9  Conditioning. Markov property. Martingale
9.1  Basic properties of conditional expectation  310
9.2  Conditional independence; Markov property  322
9.3  Basic properties of smartingales  334
9.4  Inequalities and convergence  346
9.5  Applications  360
Bibliographical Note  373

Supplement: Measure and Integral
1  Construction of measure  375
2  Characterization of extensions  380
3  Measures in R  387
4  Integral  395
5  Applications  407

General Bibliography  413
Index  415


Preface to the third edition

In this new edition, I have added a Supplement on Measure and Integral. The subject matter is first treated in a general setting pertinent to an abstract measure space, and then specified in the classic Borel-Lebesgue case for the real line. The latter material, an essential part of real analysis, is presupposed in the original edition published in 1968 and revised in the second edition of 1974. When I taught the course under the title "Advanced Probability" at Stanford University beginning in 1962, students from the departments of statistics, operations research (formerly industrial engineering), electrical engineering, etc. often had to take a prerequisite course given by other instructors before they enlisted in my course. In later years I prepared a set of notes, lithographed and distributed in the class, to meet the need. This forms the basis of the present Supplement. It is hoped that the result may as well serve in an introductory mode, perhaps also independently for a short course in the stated topics.

The presentation is largely self-contained with only a few particular references to the main text. For instance, after (the old) §2.1 where the basic notions of set theory are explained, the reader can proceed to the first two sections of the Supplement for a full treatment of the construction and completion of a general measure; the next two sections contain a full treatment of the mathematical expectation as an integral, of which the properties are recapitulated in §3.2. In the final section, application of the new integral to the older Riemann integral in calculus is described and illustrated with some famous examples. Throughout the exposition, a few side remarks, pedagogic, historical, even


judgmental, of the kind I used to drop in the classroom, are approximately reproduced.

In drafting the Supplement, I consulted Patrick Fitzsimmons on several occasions for support. Giorgio Letta and Bernard Bru gave me encouragement for the uncommon approach to Borel's lemma in §3, for which the usual proof always left me disconsolate as being too devious for the novice's appreciation.

A small number of additional remarks and exercises have been added to the main text.

Warm thanks are due: to Vanessa Gerhard of Academic Press who deciphered my handwritten manuscript with great ease and care; to Isolde Field of the Mathematics Department for unfailing assistance; to Jim Luce for a mission accomplished. Last and evidently not least, my wife and my daughter Corinna performed numerous tasks indispensable to the undertaking of this publication.


Preface to the second edition

This edition contains a good number of additions scattered throughout the book as well as numerous voluntary and involuntary changes. The reader who is familiar with the first edition will have the joy (or chagrin) of spotting new entries. Several sections in Chapters 4 and 9 have been rewritten to make the material more adaptable to application in stochastic processes. Let me reiterate that this book was designed as a basic study course prior to various possible specializations. There is enough material in it to cover an academic year in class instruction, if the contents are taken seriously, including the exercises. On the other hand, the ordering of the topics may be varied considerably to suit individual tastes. For instance, Chapters 6 and 7 dealing with limiting distributions can be easily made to precede Chapter 5, which treats almost sure convergence. A specific recommendation is to take up Chapter 9, where conditioning makes a belated appearance, before much of Chapter 5 or even Chapter 4. This would be more in the modern spirit of an early weaning from the independence concept, and could be followed by an excursion into the Markovian territory.

Thanks are due to many readers who have told me about errors, obscurities, and inanities in the first edition. An incomplete record includes the names below (with apology for forgotten ones): Geoff Eagleson, Z. Govindarajulu, David Heath, Bruce Henry, Donald Iglehart, Anatole Joffe, Joseph Marker, P. Masani, Warwick Millar, Richard Olshen, S. M. Samuels, David Siegmund, T. Thedeen, A. Gonzalez Villalobos, Michel Weil, and Ward Whitt. The revised manuscript was checked in large measure by Ditlev Monrad. The


galley proofs were read by David Kreps and myself independently, and it was fun to compare scores and see who missed what. But since not all parts of the old text have undergone the same scrutiny, readers of the new edition are cordially invited to continue the fault-finding. Martha Kirtley and Joan Shepard typed portions of the new material. Gail Lemmond took charge of the final page-by-page revamping and it was through her loving care that the revision was completed on schedule.

In the third printing a number of misprints and mistakes, mostly minor, are corrected. I am indebted to the following persons for some of these corrections: Roger Alexander, Steven Carchedi, Timothy Green, Joseph Horowitz, Edward Korn, Pierre van Moerbeke, David Siegmund.

In the fourth printing, an oversight in the proof of Theorem 6.3.1 is corrected, a hint is added to Exercise 2 in Section 6.4, and a simplification made in (VII) of Section 9.5. A number of minor misprints are also corrected. I am indebted to several readers, including Asmussen, Robert, Schatte, Whitley and Yannaros, who wrote me about the text.


Preface to the first edition

A mathematics course is not a stockpile of raw material nor a random selection of vignettes. It should offer a sustained tour of the field being surveyed and a preferred approach to it. Such a course is bound to be somewhat subjective and tentative, neither stationary in time nor homogeneous in space. But it should represent a considered effort on the part of the author to combine his philosophy, conviction, and experience as to how the subject may be learned and taught. The field of probability is already so large and diversified that even at the level of this introductory book there can be many different views on orientation and development that affect the choice and arrangement of its content. The necessary decisions being hard and uncertain, one too often takes refuge by pleading a matter of "taste." But there is good taste and bad taste in mathematics just as in music, literature, or cuisine, and one who dabbles in it must stand judged thereby.

It might seem superfluous to emphasize the word "probability" in a book dealing with the subject. Yet on the one hand, one used to hear such specious utterance as "probability is just a chapter of measure theory"; on the other hand, many still use probability as a front for certain types of analysis such as combinatorial, Fourier, functional, and whatnot. Now a properly constructed course in probability should indeed make substantial use of these and other allied disciplines, and a strict line of demarcation need never be drawn. But PROBABILITY is still distinct from its tools and its applications not only in the final results achieved but also in the manner of proceeding. This is perhaps


best seen in the advanced study of stochastic processes, but will already be abundantly clear from the contents of a general introduction such as this book.

Although many notions of probability theory arise from concrete models in applied sciences, recalling such familiar objects as coins and dice, genes and particles, a basic mathematical text (as this pretends to be) can no longer indulge in diverse applications, just as nowadays a course in real variables cannot delve into the vibrations of strings or the conduction of heat. Incidentally, merely borrowing the jargon from another branch of science without treating its genuine problems does not aid in the understanding of concepts or the mastery of techniques.

A final disclaimer: this book is not the prelude to something else and does not lead down a strait and righteous path to any unique fundamental goal. Fortunately nothing in the theory deserves such single-minded devotion, as apparently happens in certain other fields of mathematics. Quite the contrary, a basic course in probability should offer a broad perspective of the open field and prepare the student for various further possibilities of study and research. To this aim he must acquire knowledge of ideas and practice in methods, and dwell with them long and deeply enough to reap the benefits.

A brief description will now be given of the nine chapters, with some suggestions for reading and instruction. Chapters 1 and 2 are preparatory. A synopsis of the requisite "measure and integration" is given in Chapter 2, together with certain supplements essential to probability theory. Chapter 1 is really a review of elementary real variables; although it is somewhat expendable, a reader with adequate background should be able to cover it swiftly and confidently, with something gained from the effort. For class instruction it may be advisable to begin the course with Chapter 2 and fill in from Chapter 1 as the occasions arise.
Chapter 3 is the true introduction to the language and framework of probability theory, but I have restricted its content to what is crucial and feasible at this stage, relegating certain important extensions, such as shifting and conditioning, to Chapters 8 and 9. This is done to avoid overloading the chapter with definitions and generalities that would be meaningless without frequent application. Chapter 4 may be regarded as an assembly of notions and techniques of real function theory adapted to the usage of probability. Thus, Chapter 5 is the first place where the reader encounters bona fide theorems in the field. The famous landmarks shown there serve also to introduce the ways and means peculiar to the subject. Chapter 6 develops some of the chief analytical weapons, namely Fourier and Laplace transforms, needed for challenges old and new. Quick testing grounds are provided, but for major battlefields one must await Chapters 7 and 8. Chapter 7 initiates what has been called the "central problem" of classical probability theory. Time has marched on and the center of the stage has shifted, but this topic remains without doubt a crowning achievement. In Chapters 8 and 9 two different aspects of


(discrete parameter) stochastic processes are presented in some depth. The random walks in Chapter 8 illustrate the way probability theory transforms other parts of mathematics. It does so by introducing the trajectories of a process, thereby turning what was static into a dynamic structure. The same revolution is now going on in potential theory by the injection of the theory of Markov processes. In Chapter 9 we return to fundamentals and strike out in major new directions. While Markov processes can be barely introduced in the limited space, martingales have become an indispensable tool for any serious study of contemporary work and are discussed here at length. The fact that these topics are placed at the end rather than the beginning of the book, where they might very well be, testifies to my belief that the student of mathematics is better advised to learn something old before plunging into the new.

A short course may be built around Chapters 2, 3, 4, selections from Chapters 5, 6, and the first one or two sections of Chapter 9. For a richer fare, substantial portions of the last three chapters should be given without skipping any one of them. In a class with solid background, Chapters 1, 2, and 4 need not be covered in detail. At the opposite end, Chapter 2 may be filled in with proofs that are readily available in standard texts. It is my hope that this book may also be useful to mature mathematicians as a gentle but not so meager introduction to genuine probability theory. (Often they stop just before things become interesting!) Such a reader may begin with Chapter 3, go at once to Chapter 5 with a few glances at Chapter 4, skim through Chapter 6, and take up the remaining chapters seriously to get a real feeling for the subject.

Several cases of exclusion and inclusion merit special comment.
I chose to construct only a sequence of independent random variables (in Section 3.3), rather than a more general one, in the belief that the latter is better absorbed in a course on stochastic processes. I chose to postpone a discussion of conditioning until quite late, in order to follow it up at once with varied and worthwhile applications. With a little reshuffling Section 9.1 may be placed right after Chapter 3 if so desired. I chose not to include a fuller treatment of infinitely divisible laws, for two reasons: the material is well covered in two or three treatises, and the best way to develop it would be in the context of the underlying additive process, as originally conceived by its creator Paul Lévy. I took pains to spell out a peripheral discussion of the logarithm of characteristic function to combat the errors committed on this score by numerous existing books. Finally, and this is mentioned here only in response to a query by Doob, I chose to present the brutal Theorem 5.3.2 in the original form given by Kolmogorov because I want to expose the student to hardships in mathematics.

There are perhaps some new things in this book, but in general I have not striven to appear original or merely different, having at heart the interests of the novice rather than the connoisseur. In the same vein, I favor as a


rule of writing (euphemistically called "style") clarity over elegance. In my opinion the slightly decadent fashion of conciseness has been overwrought, particularly in the writing of textbooks. The only valid argument I have heard for an excessively terse style is that it may encourage the reader to think for himself. Such an effect can be achieved equally well, for anyone who wishes it, by simply omitting every other sentence in the unabridged version.

This book contains about 500 exercises consisting mostly of special cases and examples, second thoughts and alternative arguments, natural extensions, and some novel departures. With a few obvious exceptions they are neither profound nor trivial, and hints and comments are appended to many of them. If they tend to be somewhat inbred, at least they are relevant to the text and should help in its digestion. As a bold venture I have marked a few of them with * to indicate a "must," although no rigid standard of selection has been used. Some of these are needed in the book, but in any case the reader's study of the text will be more complete after he has tried at least those problems.

Over a span of nearly twenty years I have taught a course at approximately the level of this book a number of times. The penultimate draft of the manuscript was tried out in a class given in 1966 at Stanford University. Because of an anachronism that allowed only two quarters to the course (as if probability could also blossom faster in the California climate!), I had to omit the second halves of Chapters 8 and 9 but otherwise kept fairly closely to the text as presented here. (The second half of Chapter 9 was covered in a subsequent course called "stochastic processes.") A good fraction of the exercises were assigned as homework, and in addition a great majority of them were worked out by volunteers.
Among those in the class who cooperated in this manner and who corrected mistakes and suggested improvements are: Jack E. Clark, B. Curtis Eaves, Susan D. Horn, Alan T. Huckleberry, Thomas M. Liggett, and Roy E. Welsch, to whom I owe sincere thanks. The manuscript was also read by J. L. Doob and Benton Jamison, both of whom contributed a great deal to the final revision. They have also used part of the manuscript in their classes. Aside from these personal acknowledgments, the book owes of course to a large number of authors of original papers, treatises, and textbooks. I have restricted bibliographical references to the major sources while adding many more names among the exercises. Some oversight is perhaps inevitable; however, inconsequential or irrelevant "name-dropping" is deliberately avoided, with two or three exceptions which should prove the rule.

It is a pleasure to thank Rosemarie Stampfel and Gail Lemmond for their superb job in typing the manuscript.


1  Distribution function

1.1  Monotone functions

We begin with a discussion of distribution functions as a traditional way of introducing probability measures. It serves as a convenient bridge from elementary analysis to probability theory, upon which the beginner may pause to review his mathematical background and test his mental agility. Some of the methods as well as results in this chapter are also useful in the theory of stochastic processes.

In this book we shall follow the fashionable usage of the words "positive", "negative", "increasing", "decreasing" in their loose interpretation. For example, "x is positive" means "x ≥ 0"; the qualifier "strictly" will be added when "x > 0" is meant. By a "function" we mean in this chapter a real finite-valued one unless otherwise specified.

Let then f be an increasing function defined on the real line (−∞, +∞). Thus for any two real numbers x1 and x2,

(1)  x1 < x2 ⇒ f(x1) ≤ f(x2).

We begin by reviewing some properties of such a function. The notation "t ↑ x" means "t < x, t → x"; "t ↓ x" means "t > x, t → x".
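The notations t ↑ x and t ↓ x can be illustrated with a small computation. This sketch is our own, not the book's: the step function f and the dyadic sequences are invented for the purpose, and they exhibit the two unilateral limits that the section goes on to study.

```python
# A sketch, not from the book: an increasing step function with a jump at 1.
def f(x):
    return 0.0 if x < 1.0 else 1.0

# Evaluate f along t_n increasing to 1 (t ↑ 1) and decreasing to 1 (t ↓ 1).
left  = [f(1.0 - 2.0**-n) for n in range(1, 30)]
right = [f(1.0 + 2.0**-n) for n in range(1, 30)]

# The two unilateral limits disagree at the jump: f(1-) = 0, f(1+) = 1 = f(1).
assert left[-1] == 0.0 and right[-1] == 1.0
```

Monotonicity guarantees that each one-sided sequence of values is itself monotone, so these limits always exist for an increasing f, as property (i) below records.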


(i) For each x, both unilateral limits

(2)  lim_{t↑x} f(t) = f(x−)  and  lim_{t↓x} f(t) = f(x+)

exist and are finite. Furthermore the limits at infinity

lim_{t↓−∞} f(t) = f(−∞)  and  lim_{t↑+∞} f(t) = f(+∞)

exist; the former may be −∞, the latter may be +∞.

This follows from monotonicity; indeed

f(x−) = sup_{−∞<t<x} f(t),   f(x+) = inf_{x<t<+∞} f(t).


Example 1.  Let x0 be an arbitrary real number, and define a function f as follows:

f(x) = 0  for x


be replaced by an arbitrary countable set without any change of the argument. We now show that the condition of countability is indispensable. By "countable" we mean always "finite (possibly empty) or countably infinite".

(iv) The set of discontinuities of f is countable.

We shall prove this by a topological argument of some general applicability. In Exercise 3 after this section another proof based on an equally useful counting argument will be indicated. For each point of jump x consider the open interval I_x = (f(x−), f(x+)). If x′ is another point of jump and x < x′, say, then there is a point x̄ such that x < x̄ < x′. Hence by monotonicity we have

f(x+) ≤ f(x̄) ≤ f(x′−).

It follows that the two intervals I_x and I_{x′} are disjoint, though they may abut on each other if f(x+) = f(x′−). Thus we may associate with the set of points of jump in the domain of f a certain collection of pairwise disjoint open intervals in the range of f. Now any such collection is necessarily a countable one, since each interval contains a rational number, so that the collection of intervals is in one-to-one correspondence with a certain subset of the rational numbers and the latter is countable. Therefore the set of discontinuities is also countable, since it is in one-to-one correspondence with the set of intervals associated with it.

(v) Let f1 and f2 be two increasing functions and D a set that is (everywhere) dense in (−∞, +∞). Suppose that

∀x ∈ D:  f1(x) = f2(x).

Then f1 and f2 have the same points of jump of the same size, and they coincide except possibly at some of these points of jump.

To see this, let x be an arbitrary point and let tn ∈ D, tn′ ∈ D, tn ↑ x, tn′ ↓ x. Such sequences exist since D is dense. It follows from (i) that

(6)  f1(x−) = lim_n f1(tn) = lim_n f2(tn) = f2(x−),
     f1(x+) = lim_n f1(tn′) = lim_n f2(tn′) = f2(x+).

In particular

∀x:  f1(x+) − f1(x−) = f2(x+) − f2(x−).

The first assertion in (v) follows from this equation and (ii).
Furthermore if f1 is continuous at x, then so is f2 by what has just been proved, and we


have

f1(x) = f1(x−) = f2(x−) = f2(x),

proving the second assertion.

How can f1 and f2 differ at all? This can happen only when f1(x) and f2(x) assume different values in the interval (f1(x−), f1(x+)) = (f2(x−), f2(x+)). It will turn out in Chapter 2 (see in particular Exercise 21 of Sec. 2.2) that the precise value of f at a point of jump is quite unessential for our purposes and may be modified, subject to (3), to suit our convenience. More precisely, given the function f, we can define a new function f̂ in several different ways, such as

f̂(x) = f(x−),   f̂(x) = f(x+),   f̂(x) = [f(x−) + f(x+)]/2,

and use one of these instead of the original one. The third modification is found to be convenient in Fourier analysis, but either one of the first two is more suitable for probability theory. We have a free choice between them and we shall choose the second, namely, right continuity.

(vi) If we put

∀x:  f̂(x) = f(x+),

then f̂ is increasing and right continuous everywhere.

Let us recall that an arbitrary function g is said to be right continuous at x iff lim_{t↓x} g(t) exists and the limit, to be denoted by g(x+), is equal to g(x). To prove the assertion (vi) we must show that

∀x:  lim_{t↓x} f(t+) = f(x+).

This is indeed true for any f such that f(t+) exists for every t. For then: given any ε > 0, there exists δ(ε) > 0 such that

∀s ∈ (x, x + δ):  |f(s) − f(x+)| < ε.


and so on of f on its domain of definition if in the usual definitions we restrict ourselves to the points of D. Even if f is defined in a larger domain, we may still speak of these properties "on D" by considering the "restriction of f to D".

(vii) Let f be increasing on D, and define f̃ on (−∞, +∞) as follows:

∀x:  f̃(x) = inf_{t∈D, t>x} f(t).

Then f̃ is increasing and right continuous everywhere.

To see the right continuity at a given x0, let ε > 0 be given. There exists t0 ∈ D, t0 > x0, such that

f(t0) − ε ≤ f̃(x0) ≤ f(t0).

Hence if t ∈ D, x0 < t < t0, we have

0 ≤ f̃(t) − f̃(x0) ≤ f(t0) − f̃(x0) ≤ ε.
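Construction (vii) can be sketched numerically. This illustration is ours, with invented names, and a finite scanning window stands in for the true infimum: f is given on the dense set D of dyadic rationals, and f̃(x) = inf of f(t) over t ∈ D, t > x recovers right continuity at the jump.

```python
import math
from fractions import Fraction

# f on the dense set D of dyadic rationals: 0 up to (and at) 0, 1 beyond.
def f(t):
    return 0 if t <= 0 else 1

def f_tilde(x, depth=20, window=1024):
    # Approximate inf{f(t) : t in D, t > x} by scanning dyadic points
    # k/2^depth just above x; the finite window is a simplification.
    step = Fraction(1, 2**depth)
    k = math.floor(Fraction(x) / step) + 1    # smallest grid point > x
    return min(f(k * step + j * step) for j in range(window))

# Right continuity at 0: even though f(0) = 0 on D, the extension takes
# the right-limit value f_tilde(0) = 1, while f_tilde(x) = 0 for x < 0.
assert f_tilde(0) == 1
assert f_tilde(Fraction(-1, 2)) == 0
```

The key point the code mirrors is that f̃ never looks at f(x) itself, only at values of f strictly to the right, which is exactly why the result is right continuous.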


b_j the size of the jump at a_j, then

F(a_j) − F(a_j−) = b_j

since F(a_j+) = F(a_j). Consider the function

F_d(x) = Σ_{a_j ≤ x} b_j

which represents the sum of all the jumps of F in the half-line (−∞, x]. It is clearly increasing, right continuous, with

(3)  F_d(−∞) = 0,   F_d(+∞) = Σ_j b_j ≤ 1.

Hence F_d is a bounded increasing function. It should constitute the "jumping part" of F, and if it is subtracted out from F, the remainder should be positive, contain no more jumps, and so be continuous. These plausible statements will now be proved; they are easy enough but not really trivial.

Theorem 1.2.1.  Let

F_c(x) = F(x) − F_d(x);

then F_c is positive, increasing, and continuous.

PROOF.  Let x < x′, then we have

(4)  F_d(x′) − F_d(x) = Σ_{x<a_j≤x′} b_j
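The jumping part F_d is straightforward to compute for a concrete discrete specification. The jump points a_j and sizes b_j below are our own illustrative choices, not an example from the text.

```python
# Sketch: F_d(x) = sum of the jumps b_j over points a_j <= x (right continuous).
a = [-1.0, 0.0, 2.0]   # hypothetical jump points a_j
b = [0.2, 0.3, 0.1]    # corresponding jump sizes b_j, with sum <= 1

def F_d(x):
    # a jump located exactly at x is included, so F_d(a_j) - F_d(a_j-) = b_j
    return sum(bj for aj, bj in zip(a, b) if aj <= x)

assert F_d(-5.0) == 0.0                   # F_d(-∞) = 0
assert abs(F_d(5.0) - sum(b)) < 1e-12     # F_d(+∞) = Σ_j b_j <= 1
```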


Theorem 1.2.2.  Let F be a d.f. Suppose that there exist a continuous function G_c, and a function G_d of the form

G_d(x) = Σ_j b_j′ δ_{a_j′}(x)

[where {a_j′} is a countable set of real numbers and Σ_j |b_j′| < ∞], such that

F = G_c + G_d;

then

G_c = F_c,   G_d = F_d,

where F_c and F_d are defined as before.

PROOF.  If F_d ≠ G_d, then either the sets {a_j} and {a_j′} are not identical, or we may relabel the a_j′ so that a_j′ = a_j for all j but b_j′ ≠ b_j for some j. In either case we have for at least one j, and ā = a_j or a_j′:

[F_d(ā) − F_d(ā−)] − [G_d(ā) − G_d(ā−)] ≠ 0.

Since F_c − G_c = G_d − F_d, this implies that

F_c(ā) − G_c(ā) − [F_c(ā−) − G_c(ā−)] ≠ 0,

contradicting the fact that F_c − G_c is a continuous function. Hence F_d = G_d and consequently F_c = G_c.

DEFINITION.  A d.f. F that can be represented in the form

F = Σ_j b_j δ_{a_j}

where {a_j} is a countable set of real numbers, b_j > 0 for every j and Σ_j b_j = 1, is called a discrete d.f. A d.f. that is continuous everywhere is called a continuous d.f.

Suppose F_c ≢ 0, F_d ≢ 0 in Theorem 1.2.1, then we may set α = F_d(∞) so that 0 < α < 1,


F_2 = F; in either extreme case (5) remains valid. We may now summarize the two theorems above as follows.

Theorem 1.2.3.  Every d.f. can be written as the convex combination of a discrete and a continuous one. Such a decomposition is unique.

EXERCISES

1.  Let F be a d.f. Then for each x,

lim_{ε↓0} [F(x + ε) − F(x − ε)] = 0

unless x is a point of jump of F, in which case the limit is equal to the size of the jump.

*2.  Let F be a d.f. with points of jump {a_j}. Prove that the sum

…

6.  Let x be such that F(x + ε) − F(x − ε) > 0 for every ε > 0. The set of all such x is called the support of F. Show that each point of jump belongs to the support, and that each isolated point of the support is a point of jump. Give an example of a discrete d.f. whose support is the whole line.


7.  Prove that the support of any d.f. is a closed set, and the support of any continuous d.f. is a perfect set.

1.3  Absolutely continuous and singular distributions

Further analysis of d.f.'s requires the theory of Lebesgue measure. Throughout the book this measure will be denoted by m; "almost everywhere" on the real line without qualification will refer to it and be abbreviated to "a.e."; an integral written in the form ∫ … dt is a Lebesgue integral; a function f is said to be "integrable" in (a, b) iff

∫_a^b f(t) dt

is defined and finite [this entails, of course, that f be Lebesgue measurable]. The class of such functions will be denoted by L1(a, b), and L1(−∞, ∞) is abbreviated to L1. The complement of a subset S of an understood "space" such as (−∞, +∞) will be denoted by S^c.

DEFINITION.  A function F is called absolutely continuous [in (−∞, ∞) and with respect to the Lebesgue measure] iff there exists a function f in L1 such that we have for every x < x′:

(1)  F(x′) − F(x) = ∫_x^{x′} f(t) dt.

It follows from a well-known proposition (see, e.g., Natanson [3]*) that such a function F has a derivative equal to f a.e. In particular, if F is a d.f., then

(2)  f ≥ 0 a.e.  and  ∫_{−∞}^{∞} f(t) dt = 1.

Conversely, given any f in L1 satisfying the conditions in (2), the function F defined by

(3)  ∀x:  F(x) = ∫_{−∞}^x f(t) dt

is easily seen to be a d.f. that is absolutely continuous.

DEFINITION.  A function F is called singular iff it is not identically zero and F′ (exists and) equals zero a.e.

* Numbers in brackets refer to the General Bibliography.
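Definition (3) can be illustrated numerically. The density below (uniform on [0, 1]) is our own choice, and a crude midpoint Riemann sum stands in for the Lebesgue integral; this is a sketch, not the book's machinery.

```python
# f in L1 with f >= 0 and total integral 1: the uniform density on [0, 1].
def f(t):
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def F(x, n=10_000):
    # F(x) = integral of f from -infinity to x, via a midpoint Riemann sum;
    # f vanishes off [0, 1], so the integration may start at 0.
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / n
    return sum(f((k + 0.5) * h) for k in range(n)) * h

# F behaves as a d.f.: increasing, F(-∞) = 0, F(+∞) = 1, with F' = f a.e.
assert F(-3.0) == 0.0 and F(3.0) == 1.0
assert abs(F(0.5) - 0.5) < 1e-6
```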


The next theorem summarizes some basic facts of real function theory; see, e.g., Natanson [3].

Theorem 1.3.1.  Let F be bounded increasing with F(−∞) = 0, and let F′ denote its derivative wherever existing. Then the following assertions are true.

(a)  If S denotes the set of all x for which F′(x) exists with 0 ≤ F′(x) < ∞, then m(S^c) = 0.

…

Hence F_s is also increasing and F_s ≤ F. We are now in a position to announce the following result, which is a refinement of Theorem 1.2.3.

Theorem 1.3.2.  Every d.f. F can be written as the convex combination of a discrete, a singular continuous, and an absolutely continuous d.f. Such a decomposition is unique.

EXERCISES

1.  A d.f. F is singular if and only if F = F_s; it is absolutely continuous if and only if F = F_ac.

2.  Prove Theorem 1.3.2.

*3.  If the support of a d.f. (see Exercise 6 of Sec. 1.2) is of measure zero, then F is singular. The converse is false.


*4.  Suppose that F is a d.f. and (3) holds with a continuous f. Then F′ = f ≥ 0 everywhere.

5.  Under the conditions in the preceding exercise, the support of F is the closure of the set {t | f(t) > 0}; the complement of the support is the interior of the set {t | f(t) = 0}.

6.  Prove that a discrete distribution is singular. [Cf. Exercise 13 of Sec. 2.2.]

7.  Prove that a singular function as defined here is (Lebesgue) measurable but need not be of bounded variation even locally. [HINT: Such a function is continuous except on a set of Lebesgue measure zero; use the completeness of the Lebesgue measure.]

The remainder of this section is devoted to the construction of a singular continuous distribution. For this purpose let us recall the construction of the Cantor (ternary) set (see, e.g., Natanson [3]). From the closed interval [0, 1], the "middle third" open interval (1/3, 2/3) is removed; from each of the two remaining disjoint closed intervals the middle third, (1/9, 2/9) and (7/9, 8/9), respectively, are removed and so on. After n steps, we have removed

1 + 2 + ⋯ + 2^{n−1} = 2^n − 1

disjoint open intervals and are left with 2^n disjoint closed intervals each of length 1/3^n. Let these removed ones, in order of position from left to right, be denoted by J_{n,k}, 1 ≤ k ≤ 2^n − 1, and their union by U_n. We have

m(U_n) = 1/3 + 2/3^2 + 4/3^3 + ⋯ + 2^{n−1}/3^n = 1 − (2/3)^n.

As n ↑ ∞, U_n increases to an open set U; the complement C of U with respect to [0, 1] is a perfect set, called the Cantor set. It is of measure zero since

m(C) = 1 − m(U) = 1 − 1 = 0.

Now for each n and k, n ≥ 1, 1 ≤ k ≤ 2^n − 1, we put

c_{n,k} = k/2^n;

and define a function F on U as follows:

(7)  F(x) = c_{n,k}  for x ∈ J_{n,k}.

This definition is consistent since two intervals, J_{n,k} and J_{n′,k′}, are either disjoint or identical, and in the latter case so are c_{n,k} = c_{n′,k′}. The last assertion


14 | DISTRIBUTION FUNCTION

becomes obvious if we proceed step by step and observe that

J_{n+1,2k} = J_{n,k},  c_{n+1,2k} = c_{n,k}  for 1 ≤ k ≤ 2ⁿ − 1.

The value of F is constant on each J_{n,k} and is strictly greater on any other J_{n′,k′} situated to the right of J_{n,k}. Thus F is increasing and clearly we have

lim_{x↓0} F(x) = 0,  lim_{x↑1} F(x) = 1.

Let us complete the definition of F by setting

F(x) = 0 for x < 0,  F(x) = 1 for x > 1.

F is now defined on the domain D = (−∞, 0) ∪ U ∪ (1, ∞) and increasing there. Since each J_{n,k} is at a distance ≥ 1/3ⁿ from any distinct J_{n,k′} and the total variation of F over each of the 2ⁿ disjoint intervals that remain after removing J_{n,k}, 1 ≤ k ≤ 2ⁿ − 1, is 1/2ⁿ, it follows that

0 ≤ x′ − x ≤ 1/3ⁿ ⟹ 0 ≤ F(x′) − F(x) ≤ 1/2ⁿ.

Hence F is uniformly continuous on D and may be extended to a continuous increasing function on (−∞, ∞). The extension is a continuous d.f. whose derivative vanishes on the open set U of full measure; it is therefore the desired singular continuous d.f.


1.3 ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS | 15

10. For each x ∈ [0, 1], we have

2F(x/3) = F(x),  2F(x/3 + 2/3) − 1 = F(x).

11. Calculate

∫₀¹ x dF(x),  ∫₀¹ x² dF(x),  ∫₀¹ e^{itx} dF(x).

[HINT: This can be done directly or by using Exercise 10; for a third method see Exercise 9 of Sec. 5.3.]

12. Extend the function F on [0,1] trivially to (−∞, ∞). Let {rₙ} be an enumeration of the rationals and

G(x) = Σ_{n=1}^∞ (1/2ⁿ) F(rₙ + x).

Show that G is a d.f. that is strictly increasing for all x and singular. Thus we have a singular d.f. with support (−∞, ∞).

*13. Consider F on [0,1]. Modify its inverse F⁻¹ suitably to make it single-valued in [0,1]. Show that F⁻¹ so modified is a discrete d.f. and find its points of jump and their sizes.

14. Given any closed set C in (−∞, +∞), there exists a d.f. whose support is exactly C. [HINT: Such a problem becomes easier when the corresponding measure is considered; see Sec. 2.2 below.]

*15. The Cantor d.f. F is a good building block of "pathological" examples. For example, let H be the inverse of the homeomorphic map of [0,1] onto itself: x → ½[F(x) + x]; and E a subset of [0,1] which is not Lebesgue measurable. Show that

1_{H(E)} ∘ H = 1_E,

where H(E) is the image of E, 1_B is the indicator function of B, and ∘ denotes the composition of functions. Hence deduce: (1) a Lebesgue measurable function of a strictly increasing and continuous function need not be Lebesgue measurable; (2) there exists a Lebesgue measurable function that is not Borel measurable.
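Because the Cantor d.f. F of this section is defined digit by digit (a ternary digit 2 of x contributes a binary digit 1 of F(x), and a digit 1 means x lies in a removed interval where F is constant), it can be evaluated numerically. The following Python sketch is an illustration, not part of the text; the name `cantor_F` is my own, and exact rationals are used to avoid floating-point drift in the ternary expansion. It checks the first identity of Exercise 10 and estimates the first integral of Exercise 11, which equals 1/2 by the symmetry of F about x = 1/2.

```python
from fractions import Fraction

def cantor_F(x, depth=60):
    # Follow the ternary expansion of x: a digit 2 contributes 2^-n to F(x);
    # a digit 1 means x lies in a removed interval J_{n,k}, on which F is
    # constant with value c_{n,k}.
    x = Fraction(x)
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)            # next ternary digit (0, 1, or 2)
        x -= digit
        if digit == 1:
            return value + scale  # the constant value c_{n,k} on J_{n,k}
        if digit == 2:
            value += scale
        scale *= 0.5
    return value

# F is constant (= 1/2) on the first removed interval (1/3, 2/3):
print(cantor_F(0.4), cantor_F(0.6))            # 0.5 0.5

# Exercise 10: 2 F(x/3) = F(x):
print(2 * cantor_F(0.4 / 3) == cantor_F(0.4))  # True

# Exercise 11: a Riemann-Stieltjes sum for the first integral over a grid
# of triadic points; the exact value is 1/2 by symmetry.
n = 729
est = sum(Fraction(i, n) * (cantor_F(Fraction(i + 1, n)) - cantor_F(Fraction(i, n)))
          for i in range(n))
print(round(float(est), 2))                    # 0.5
```

The left-endpoint sum underestimates the integral by at most the grid step 1/729, so the printed value agrees with 1/2 to two decimals.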


2 | Measure theory

2.1 Classes of sets

Let Ω be an "abstract space", namely a nonempty set of elements to be called "points" and denoted generically by ω. Some of the usual operations and relations between sets, together with the usual notation, are given below.

Union: E ∪ F, ∪ₙ Eₙ
Intersection: E ∩ F, ∩ₙ Eₙ
Complement: Eᶜ = Ω∖E
Difference: E∖F = E ∩ Fᶜ
Symmetric difference: E Δ F = (E∖F) ∪ (F∖E)
Singleton: {ω}

Containing (for subsets of Ω as well as for collections thereof):

E ⊂ F, F ⊃ E (not excluding E = F)
𝒜 ⊂ ℬ, ℬ ⊃ 𝒜 (not excluding 𝒜 = ℬ)


18 | MEASURE THEORY

Fₙ = ∪_{j=1}^n E_j ∈ ℱ₀

by (iv), which holds in a field; Fₙ ⊂ F_{n+1} and

∪_{j=1}^∞ E_j = ∪_{j=1}^∞ F_j;

hence ∪_{j=1}^∞ E_j ∈ ℱ₀ by (vi).

The collection 𝒥 of all subsets of Ω is a B.F. called the total B.F.; the collection of the two sets {∅, Ω} is a B.F. called the trivial B.F. If A is any index set and if for every α ∈ A, ℱ_α is a B.F. (or M.C.), then the intersection ∩_{α∈A} ℱ_α of all these B.F.'s (or M.C.'s), namely the collection of sets each of which belongs to all ℱ_α, is also a B.F. (or M.C.). Given any nonempty collection 𝒞 of sets, there is a minimal B.F. (or field, or M.C.) containing it; this is just the intersection of all B.F.'s (or fields, or M.C.'s) containing 𝒞, of which there is at least one, namely the 𝒥 mentioned above. This minimal B.F. (or field, or M.C.) is also said to be generated by 𝒞. In particular if ℱ₀ is a field there is a minimal B.F. (or M.C.) containing ℱ₀.

Theorem 2.1.2. Let ℱ₀ be a field, 𝒢 the minimal M.C. containing ℱ₀, ℱ the minimal B.F. containing ℱ₀; then ℱ = 𝒢.

PROOF. Since a B.F. is an M.C., we have ℱ ⊃ 𝒢. To prove ℱ ⊂ 𝒢 it is sufficient to show that 𝒢 is a B.F. Hence by Theorem 2.1.1 it is sufficient to show that 𝒢 is a field. We shall show that it is closed under intersection and complementation. Define two classes of subsets of 𝒢 as follows:

𝒞₁ = {E ∈ 𝒢 : E ∩ F ∈ 𝒢 for all F ∈ ℱ₀},
𝒞₂ = {E ∈ 𝒢 : E ∩ F ∈ 𝒢 for all F ∈ 𝒢}.

The identities

F ∩ (∪_{j=1}^∞ E_j) = ∪_{j=1}^∞ (F ∩ E_j),
F ∩ (∩_{j=1}^∞ E_j) = ∩_{j=1}^∞ (F ∩ E_j)

show that both 𝒞₁ and 𝒞₂ are M.C.'s. Since ℱ₀ is closed under intersection and contained in 𝒢, it is clear that ℱ₀ ⊂ 𝒞₁. Hence 𝒢 ⊂ 𝒞₁ by the minimality of 𝒢,


2.1 CLASSES OF SETS | 19

and so 𝒢 = 𝒞₁. This means for any F ∈ ℱ₀ and E ∈ 𝒢 we have F ∩ E ∈ 𝒢, which in turn means ℱ₀ ⊂ 𝒞₂. Hence 𝒢 = 𝒞₂, and this means 𝒢 is closed under intersection.

Next, define another class of subsets of 𝒢 as follows:

𝒞₃ = {E ∈ 𝒢 : Eᶜ ∈ 𝒢}.

The (De Morgan) identities

(∪_{j=1}^∞ E_j)ᶜ = ∩_{j=1}^∞ E_jᶜ,
(∩_{j=1}^∞ E_j)ᶜ = ∪_{j=1}^∞ E_jᶜ

show that 𝒞₃ is an M.C. Since ℱ₀ ⊂ 𝒞₃, it follows as before that 𝒢 = 𝒞₃, which means 𝒢 is closed under complementation. The proof is complete.

Corollary. Let ℱ₀ be a field, ℱ the minimal B.F. containing ℱ₀; 𝒞 a class of sets containing ℱ₀ and having the closure properties (vi) and (vii); then 𝒞 contains ℱ.

The theorem above is one of a type called monotone class theorems. They are among the most useful tools of measure theory, and serve to extend certain relations which are easily verified for a special class of sets or functions to a larger class. Many versions of such theorems are known; see Exercises 10, 11, and 12 below.

EXERCISES

*1. (∪_j A_j)∖(∪_j B_j) ⊂ ∪_j (A_j∖B_j); (∩_j A_j)∖(∩_j B_j) ⊂ ∪_j (A_j∖B_j). When is there equality?

*2. The best way to define the symmetric difference is through indicators of sets as follows:

1_{A Δ B} = 1_A + 1_B (mod 2),

where we have arithmetical addition modulo 2 on the right side. All properties of Δ follow easily from this definition, some of which are rather tedious to verify otherwise. As examples:

(A Δ B) Δ C = A Δ (B Δ C),
(A Δ B) Δ (B Δ C) = A Δ C,


20 | MEASURE THEORY

(A Δ B) Δ (C Δ D) = (A Δ C) Δ (B Δ D),
A Δ B = C ⟺ A = B Δ C,
A Δ B = C Δ D ⟺ A Δ C = B Δ D.

3. If Ω has exactly n points, then 𝒥 has 2ⁿ members. The B.F. generated by n given sets "without relations among them" has 2^{2ⁿ} members.

4. If Ω is countable, then 𝒥 is generated by the singletons, and conversely. [HINT: All countable subsets of Ω and their complements form a B.F.]

5. The intersection of any collection of B.F.'s {ℱ_α, α ∈ A} is the maximal B.F. contained in all of them; it is indifferently denoted by ∩_{α∈A} ℱ_α or ⋀_{α∈A} ℱ_α.

*6. The union of a countable collection of B.F.'s {ℱ_j} such that ℱ_j ⊂ ℱ_{j+1} need not be a B.F., but there is a minimal B.F. containing all of them, denoted by ⋁_j ℱ_j. In general ⋁_{α∈A} ℱ_α denotes the minimal B.F. containing all ℱ_α, α ∈ A. [HINT: Ω = the set of positive integers; ℱ_j = the B.F. generated by those up to j.]

7. A B.F. is said to be countably generated iff it is generated by a countable collection of sets. Prove that if each ℱ_j is countably generated, then so is ⋁_{j=1}^∞ ℱ_j.

*8. Let ℱ be a B.F. generated by an arbitrary collection of sets {E_α, α ∈ A}. Prove that for each E ∈ ℱ, there exists a countable subcollection {E_{α_j}, j ≥ 1} (depending on E) such that E belongs already to the B.F. generated by this subcollection. [HINT: Consider the class of all sets with the asserted property and show that it is a B.F. containing each E_α.]

9. If ℱ is a B.F. generated by a countable collection of disjoint sets {Aₙ}, such that ∪ₙ Aₙ = Ω, then each member of ℱ is just the union of a countable subcollection of these Aₙ's.

10. Let 𝒟 be a class of subsets of Ω having the closure property (iii); let 𝒜 be a class of sets containing Ω as well as 𝒟, and having the closure properties (vi) and (x). Then 𝒜 contains the B.F. generated by 𝒟. (This is Dynkin's form of a monotone class theorem which is expedient for certain applications.
The proof proceeds as in Theorem 2.1.2 by replacing ℱ₀ and 𝒢 with 𝒟 and 𝒜, respectively.)

11. Take Ω = ℛ¹ or a separable metric space in Exercise 10 and let 𝒟 be the class of all open sets. Let ℋ be a class of real-valued functions on Ω satisfying the following conditions.

(a) 1 ∈ ℋ and 1_D ∈ ℋ for each D ∈ 𝒟;
(b) ℋ is a vector space, namely: if f₁ ∈ ℋ, f₂ ∈ ℋ and c₁, c₂ are any two real constants, then c₁f₁ + c₂f₂ ∈ ℋ;


2.2 PROBABILITY MEASURES AND THEIR DISTRIBUTION FUNCTIONS | 21

(c) ℋ is closed with respect to increasing limits of positive functions, namely: if fₙ ∈ ℋ, 0 ≤ fₙ ≤ f_{n+1} for all n, and f = limₙ ↑ fₙ is finite, then f ∈ ℋ.

Then ℋ contains all Borel measurable functions on Ω, namely all finite-valued functions measurable with respect to the B.F. generated by 𝒟.

2.2 Probability measures and their distribution functions

DEFINITION. A probability measure 𝒫 on a B.F. ℱ is a numerically valued set function with domain ℱ, satisfying the following axioms:

(i) ∀E ∈ ℱ: 𝒫(E) ≥ 0.
(ii) If {E_j} is a countable collection of (pairwise) disjoint sets in ℱ, then 𝒫(∪_j E_j) = Σ_j 𝒫(E_j).
(iii) 𝒫(Ω) = 1.


22 | MEASURE THEORY

Axiom (ii) is called "countable additivity"; the corresponding axiom restricted to a finite collection {E_j} is called "finite additivity".

The following proposition

(1) Eₙ ↓ ∅ ⟹ 𝒫(Eₙ) → 0

is called the "axiom of continuity". It is a particular case of the monotone property (x) above, which may be deduced from it or proved in the same way as indicated below.

Theorem 2.2.1. The axioms of finite additivity and of continuity together are equivalent to the axiom of countable additivity.

PROOF. Let Eₙ ↓. We have the obvious identity:

Eₙ = ∪_{k=n}^∞ (E_k∖E_{k+1}) ∪ ∩_{k=1}^∞ E_k.

If Eₙ ↓ ∅, the last term is the empty set. Hence if (ii) is assumed, we have

∀n ≥ 1: 𝒫(Eₙ) = Σ_{k=n}^∞ 𝒫(E_k∖E_{k+1});

the series being convergent, we have limₙ→∞ 𝒫(Eₙ) = 0. Hence (1) is true.

Conversely, let {E_k, k ≥ 1} be pairwise disjoint; then

∪_{k=n+1}^∞ E_k ↓ ∅

(why?) and consequently, if (1) is true, then

limₙ→∞ 𝒫(∪_{k=n+1}^∞ E_k) = 0.

Now if finite additivity is assumed, we have

𝒫(∪_{k=1}^∞ E_k) = 𝒫(∪_{k=1}^n E_k) + 𝒫(∪_{k=n+1}^∞ E_k) = Σ_{k=1}^n 𝒫(E_k) + 𝒫(∪_{k=n+1}^∞ E_k).

This shows that the infinite series Σ_{k=1}^∞ 𝒫(E_k) converges, as it is bounded by the first member above. Letting n → ∞, we obtain


2.2 PROBABILITY MEASURES AND THEIR DISTRIBUTION FUNCTIONS | 23

𝒫(∪_{k=1}^∞ E_k) = limₙ→∞ Σ_{k=1}^n 𝒫(E_k) + limₙ→∞ 𝒫(∪_{k=n+1}^∞ E_k) = Σ_{k=1}^∞ 𝒫(E_k).

Hence (ii) is true.

Remark. For a later application (Theorem 3.3.4) we note the following extension. Let 𝒫 be defined on a field ℱ₀ which is finitely additive and satisfies axioms (i), (iii), and (1). Then (ii) holds whenever ∪_k E_k ∈ ℱ₀. For then ∪_{k=n+1}^∞ E_k also belongs to ℱ₀, and the second part of the proof above remains valid.

The triple (Ω, ℱ, 𝒫) is called a probability space (triple); Ω alone is called the sample space, and ω is then a sample point.

Let Δ ⊂ Ω; then the trace of the B.F. ℱ on Δ is the collection of all sets of the form Δ ∩ F, where F ∈ ℱ. It is easy to see that this is a B.F. of subsets of Δ, and we shall denote it by Δ ∩ ℱ. Suppose Δ ∈ ℱ and 𝒫(Δ) > 0; then we may define the set function 𝒫_Δ on Δ ∩ ℱ as follows:

∀E ∈ Δ ∩ ℱ: 𝒫_Δ(E) = 𝒫(E)/𝒫(Δ).

It is easy to see that 𝒫_Δ is a p.m. on Δ ∩ ℱ. The triple (Δ, Δ ∩ ℱ, 𝒫_Δ) will be called the trace of (Ω, ℱ, 𝒫) on Δ.

Example 1. Let Ω be a countable set: Ω = {ω_j, j ∈ J}, where J is a countable index set, and let ℱ be the total B.F. of Ω. Choose any sequence of numbers {p_j, j ∈ J} satisfying

(2) ∀j ∈ J: p_j ≥ 0;  Σ_{j∈J} p_j = 1;

and define a set function 𝒫 on ℱ as follows:

(3) ∀E ∈ ℱ: 𝒫(E) = Σ_{ω_j∈E} p_j.

In words, we assign p_j as the value of the "probability" of the singleton {ω_j}, and for an arbitrary set of ω_j's we assign as its probability the sum of all the probabilities assigned to its elements. Clearly axioms (i), (ii), and (iii) are satisfied. Hence 𝒫 so defined is a p.m.

Conversely, let any such 𝒫 be given on ℱ. Since {ω_j} ∈ ℱ for every j, 𝒫({ω_j}) is defined; let its value be p_j. Then (2) is satisfied. We have thus exhibited all the


24 | MEASURE THEORY

possible p.m.'s on Ω, or rather on the pair (Ω, ℱ); this will be called a discrete sample space. The entire first volume of Feller's well-known book [13] with all its rich content is based on just such spaces.

Example 2. Let 𝒰 = (0, 1], 𝒞 the collection of intervals:

𝒞 = {(a, b] : 0 ≤ a < b ≤ 1};

ℬ the minimal B.F. containing 𝒞, and m the Borel measure on ℬ with m((a, b]) = b − a. Then (𝒰, ℬ, m) is a probability space.


2.2 PROBABILITY MEASURES AND THEIR DISTRIBUTION FUNCTIONS | 25

3. In the preceding example show that for each real number a in [0, 1] there is an E in ℬ such that m(E) = a. Is the set of all primes in ℬ? Give an example of E that is not in ℬ.

4. Prove the nonexistence of a p.m. on (Ω, ℱ), where (Ω, ℱ) is as in Example 1, such that the probability of each singleton has the same value. Hence criticize a sentence such as: "Choose an integer at random".

5. Prove that the trace of a B.F. ℱ on any subset Δ of Ω is a B.F. Prove that the trace of (Ω, ℱ, 𝒫) on any Δ in ℱ is a probability space, if 𝒫(Δ) > 0.

*6. Now let Δ ∉ ℱ be such that

Δ ⊂ F ∈ ℱ ⟹ 𝒫(F) = 1.

Such a set is called thick in (Ω, ℱ, 𝒫). If E = Δ ∩ F, F ∈ ℱ, define 𝒫*(E) = 𝒫(F). Then 𝒫* is a well-defined (what does it mean?) p.m. on (Δ, Δ ∩ ℱ). This procedure is called the adjunction of Δ to (Ω, ℱ, 𝒫).

7. The B.F. ℬ¹ on ℛ¹ is also generated by the class of all open intervals or all closed intervals, or all half-lines of the form (−∞, a] or (a, ∞), or these intervals with rational endpoints. But it is not generated by all the singletons of ℛ¹ nor by any finite collection of subsets of ℛ¹.

8. ℬ¹ contains every singleton, countable set, open set, closed set, G_δ set, F_σ set. (For the last two kinds of sets see, e.g., Natanson [3].)

*9. Let 𝒞 be a countable collection of pairwise disjoint subsets {E_j, j ≥ 1} of ℛ¹, and let ℱ be the B.F. generated by 𝒞. Determine the most general p.m. on ℱ and show that the resulting probability space is "isomorphic" to that discussed in Example 1.

10. Instead of requiring that the E_j's be pairwise disjoint, we may make the broader assumption that each of them intersects only a finite number in the collection. Carry through the rest of the problem.

The question of probability measures on ℬ¹ is closely related to the theory of distribution functions studied in Chapter 1. There is in fact a one-to-one correspondence between the set functions on the one hand, and the point functions on the other. Both points of view are useful in probability theory. We establish first the easier half of this correspondence.

Lemma. Each p.m. μ on ℬ¹ determines a d.f. F through the correspondence

(4) ∀x ∈ ℛ¹: μ((−∞, x]) = F(x).
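For a purely atomic p.m. (a measure of the kind in Example 1, transplanted to ℛ¹), the correspondence (4) and the interval formulas that follow from it can be checked by direct computation. The following Python sketch is illustrative and not part of the text; the atom locations, masses, and helper names (`F`, `F_left`, `mu`) are my own choices.

```python
# A purely atomic p.m. on R^1: atoms at given points with given masses.
atoms = [(-1.0, 0.2), (0.0, 0.3), (2.5, 0.5)]

def F(x):
    # d.f. via (4): F(x) = mu((-inf, x]) = total mass at points <= x
    return sum(p for a, p in atoms if a <= x)

def F_left(x):
    # left limit F(x-) = mu((-inf, x))
    return sum(p for a, p in atoms if a < x)

def mu(pred):
    # mu of the set {t : pred(t)}, computed by summing atom masses directly
    return sum(p for a, p in atoms if pred(a))

a, b = -1.0, 2.5
# The four kinds of intervals, each expressed through F and its left limits:
assert abs(mu(lambda t: a < t <= b)  - (F(b) - F(a))) < 1e-12            # (a, b]
assert abs(mu(lambda t: a < t < b)   - (F_left(b) - F(a))) < 1e-12       # (a, b)
assert abs(mu(lambda t: a <= t < b)  - (F_left(b) - F_left(a))) < 1e-12  # [a, b)
assert abs(mu(lambda t: a <= t <= b) - (F(b) - F_left(a))) < 1e-12       # [a, b]
print(F(b), F_left(b))   # 1.0 0.5
```

The gap F(b) − F(b−) = 0.5 is exactly the mass of the atom at b = 2.5, the "jump" of F there.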


26 | MEASURE THEORY

As a consequence, we have for −∞ < a < b < +∞:

(5)
μ((a, b]) = F(b) − F(a),
μ((a, b)) = F(b−) − F(a),
μ([a, b)) = F(b−) − F(a−),
μ([a, b]) = F(b) − F(a−).

Furthermore, let D be any dense subset of ℛ¹; then the correspondence is already determined by that in (4) restricted to x ∈ D, or by any of the four relations in (5) when a and b are both restricted to D.

PROOF. Let us write

∀x ∈ ℛ¹: I_x = (−∞, x].

Then I_x ∈ ℬ¹ so that μ(I_x) is defined; call it F(x) and so define the function F on ℛ¹. We shall show that F is a d.f. as defined in Chapter 1. First of all, F is increasing by property (viii) of the measure. Next, if xₙ ↓ x, then I_{xₙ} ↓ I_x; hence we have by (ix)

(6) F(xₙ) = μ(I_{xₙ}) ↓ μ(I_x) = F(x).

Hence F is right continuous. [The reader should ascertain what changes should be made if we had defined F to be left continuous.] Similarly, as x ↓ −∞, I_x ↓ ∅; as x ↑ +∞, I_x ↑ ℛ¹. Hence it follows from (ix) again that

lim_{x↓−∞} F(x) = lim_{x↓−∞} μ(I_x) = μ(∅) = 0;
lim_{x↑+∞} F(x) = lim_{x↑+∞} μ(I_x) = μ(ℛ¹) = 1.

This ends the verification that F is a d.f. The relations in (5) follow easily from the following complement to (4):

μ((−∞, x)) = F(x−).

To see this let xₙ < x and xₙ ↑ x. Since I_{xₙ} ↑ (−∞, x), we have by (ix):

F(x−) = limₙ F(xₙ) = limₙ μ(I_{xₙ}) = μ((−∞, x)).

To prove the last sentence in the theorem we show first that (4) restricted to x ∈ D implies (4) unrestricted. For this purpose we note that μ((−∞, x]), as well as F(x), is right continuous as a function of x, as shown in (6). Hence the two members of the equation in (4), being both right continuous functions of x and coinciding on a dense set, must coincide everywhere. Now


2.2 PROBABILITY MEASURES AND THEIR DISTRIBUTION FUNCTIONS | 27

suppose, for example, the second relation in (5) holds for rational a and b. For each real x let aₙ, bₙ be rational such that aₙ ↓ −∞ and bₙ > x, bₙ ↓ x. Then μ((aₙ, bₙ)) → μ((−∞, x]) and F(bₙ−) − F(aₙ) → F(x). Hence (4) follows.

Incidentally, the correspondence (4) "justifies" our previous assumption that F be right continuous, but what if we had assumed it to be left continuous?

Now we proceed to the second half of the correspondence.

Theorem 2.2.2. Each d.f. F determines a p.m. μ on ℬ¹ through any one of the relations given in (5), or alternatively through (4).

This is the classical theory of Lebesgue–Stieltjes measure; see, e.g., Halmos [4] or Royden [5]. However, we shall sketch the basic ideas as an important review. The d.f. F being given, we may define a set function for intervals of the form (a, b] by means of the first relation in (5). Such a function is seen to be countably additive on its domain of definition. (What does this mean?) Now we proceed to extend its domain of definition while preserving this additivity. If S is a countable union of such intervals which are disjoint:

S = ∪_i (a_i, b_i],

we are forced to define μ(S), if at all, by

μ(S) = Σ_i μ((a_i, b_i]) = Σ_i {F(b_i) − F(a_i)}.

But a set S may be representable in the form above in different ways, so we must check that this definition leads to no contradiction: namely that it depends really only on the set S and not on the representation. Next, we notice that any open interval (a, b) is in the extended domain (why?) and indeed the extended definition agrees with the second relation in (5). Now it is well known that any open set U in ℛ¹ is the union of a countable collection of disjoint open intervals [there is no exact analogue of this in ℛⁿ for n > 1], say U = ∪_i (c_i, d_i), and this representation is unique. Hence again we are forced to define μ(U), if at all, by

μ(U) = Σ_i μ((c_i, d_i)) = Σ_i {F(d_i−) − F(c_i)}.

Having thus defined the measure for all open sets, we find that its values for all closed sets are thereby also determined by property (vi) of a probability measure. In particular, its value for each singleton {a} is determined to be F(a) − F(a−), which is nothing but the jump of F at a. Now we also know its value on all countable sets, and so on; all this provided that no contradiction


28 | MEASURE THEORY

is ever forced on us so far as we have gone. But even with the class of open and closed sets we are still far from the B.F. ℬ¹. The next step will be the G_δ sets and the F_σ sets, and there already the picture is not so clear. Although it has been shown to be possible to proceed this way by transfinite induction, this is a rather difficult task. There is a more efficient way to reach the goal via the notions of outer and inner measures as follows. For any subset S of ℛ¹ consider the two numbers:

μ*(S) = inf_{U open, U ⊃ S} μ(U),
μ_*(S) = sup_{C closed, C ⊂ S} μ(C).

μ* is the outer measure, μ_* the inner measure (both with respect to the given F). It is clear that μ*(S) ≥ μ_*(S). Equality does not in general hold, but when it does, we call S "measurable" (with respect to F). In this case the common value will be denoted by μ(S). This new definition requires us at once to check that it agrees with the old one for all the sets for which μ has already been defined. The next task is to prove that: (a) the class of all measurable sets forms a B.F., say ℒ; (b) on this ℒ, the function μ is a p.m. Details of these proofs are to be found in the references given above. To finish: since ℒ is a B.F., and it contains all intervals of the form (a, b], it contains the minimal B.F. ℬ¹ with this property. It may be larger than ℬ¹, indeed it is (see below), but this causes no harm, for the restriction of μ to ℬ¹ is a p.m. whose existence is asserted in Theorem 2.2.2.

Let us mention that the introduction of both the outer and inner measures is useful for approximations. It follows, for example, that for each measurable set S and ε > 0, there exists an open set U and a closed set C such that U ⊃ S ⊃ C and

(7) μ(U) − ε ≤ μ(S) ≤ μ(C) + ε.

There is an alternative way of defining measurability through the use of the outer measure alone, based on Carathéodory's criterion.

It should also be remarked that the construction described above for (ℛ¹, ℬ¹, μ) is that of a "topological measure space", where the B.F. is generated by the open sets of a given topology on ℛ¹, here the usual Euclidean one. In the general case of an "algebraic measure space", in which there is no topological structure, the role of the open sets is taken by an arbitrary field ℬ₀, and a measure given on ℬ₀ may be extended to the minimal B.F. ℱ containing ℬ₀ in a similar way. In the case of ℛ¹, such a ℬ₀ is given by the field of sets, each of which is the union of a finite number of intervals of the form (a, b], (−∞, b], or (a, ∞), where a ∈ ℛ¹, b ∈ ℛ¹. Indeed the definition of the


2.2 PROBABILITY MEASURES AND THEIR DISTRIBUTION FUNCTIONS | 29

outer measure given above may be replaced by the equivalent one:

(8) μ*(E) = inf Σₙ μ(Uₙ),

where the infimum is taken over all countable unions ∪ₙ Uₙ such that each Uₙ ∈ ℬ₀ and ∪ₙ Uₙ ⊃ E. For another case where such a construction is required see Sec. 3.3 below.

There is one more question: besides the μ discussed above, is there any other p.m. ν that corresponds to the given F in the same way? It is important to realize that this question is not answered by the preceding theorem. It is also worthwhile to remark that any p.m. ν that is defined on a domain strictly containing ℬ¹ and that coincides with μ on ℬ¹ (such as the μ on ℒ as mentioned above) will certainly correspond to F in the same way, and strictly speaking such a ν is to be considered as distinct from μ. Hence we should phrase the question more precisely by considering only p.m.'s on ℬ¹. This will be answered in full generality by the next theorem.

Theorem 2.2.3. Let μ and ν be two measures defined on the same B.F. ℱ, which is generated by the field ℱ₀. If either μ or ν is σ-finite on ℱ₀, and μ(E) = ν(E) for every E ∈ ℱ₀, then the same is true for every E ∈ ℱ, and thus μ = ν.

PROOF. We give the proof only in the case where μ and ν are both finite, leaving the rest as an exercise. Let

𝒞 = {E ∈ ℱ : μ(E) = ν(E)};

then 𝒞 ⊃ ℱ₀ by hypothesis. But 𝒞 is also a monotone class, for if Eₙ ∈ 𝒞 for every n and Eₙ ↑ E or Eₙ ↓ E, then by the monotone property of μ and ν, respectively,

μ(E) = limₙ μ(Eₙ) = limₙ ν(Eₙ) = ν(E).

It follows from Theorem 2.1.2 that 𝒞 ⊃ ℱ, which proves the theorem.

Remark. In order that μ and ν coincide on ℱ₀, it is sufficient that they coincide on a collection 𝒢 such that finite disjoint unions of members of 𝒢 constitute ℱ₀.

Corollary. Let μ and ν be σ-finite measures on ℬ¹ that agree on all intervals of one of the eight kinds: (a, b], (a, b), [a, b), [a, b], (−∞, b], (−∞, b), [a, ∞), (a, ∞), or merely on those with the endpoints in a given dense set D; then they agree on ℬ¹.


30 | MEASURE THEORY

PROOF. In order to apply the theorem, we must verify that any of the hypotheses implies that μ and ν agree on a field that generates ℬ¹. Let us take intervals of the first kind and consider the field ℬ₀ defined above. If μ and ν agree on such intervals, they must agree on ℬ₀ by countable additivity. This finishes the proof.

Returning to Theorems 2.2.1 and 2.2.2, we can now add the following complement, of which the first part is trivial.

Theorem 2.2.4. Given the p.m. μ on ℬ¹, there is a unique d.f. F satisfying (4). Conversely, given the d.f. F, there is a unique p.m. μ satisfying (4) or any of the relations in (5).

We shall simply call μ the p.m. of F, and F the d.f. of μ.

Instead of (ℛ¹, ℬ¹) we may consider its restriction to a fixed interval [a, b]. Without loss of generality we may suppose this to be 𝒰 = [0, 1], so that we are in the situation of Example 2. We can either proceed analogously or reduce it to the case just discussed, as follows. Let F be a d.f. such that F = 0 for x < 0 and F = 1 for x > 1. The probability measure μ of F will then have support in [0, 1], since μ((−∞, 0)) = 0 = μ((1, ∞)) as a consequence of (4). Thus the trace of (ℛ¹, ℬ¹, μ) on 𝒰 may be denoted simply by (𝒰, ℬ, μ), where ℬ is the trace of ℬ¹ on 𝒰. Conversely, any p.m. on ℬ may be regarded as such a trace. The most interesting case is when F is the "uniform distribution" on 𝒰:

F(x) = 0 for x < 0;  x for 0 ≤ x ≤ 1;  1 for x > 1.

The corresponding measure m on ℬ is the usual Borel measure on [0, 1], while its extension on ℒ as described in Theorem 2.2.2 is the usual Lebesgue measure there. It is well known that ℒ is actually larger than ℬ; indeed (ℒ, m) is the completion of (ℬ, m), to be discussed below.

DEFINITION. The probability space (Ω, ℱ, 𝒫) is said to be complete iff any subset of a set F in ℱ with 𝒫(F) = 0 also belongs to ℱ.

Any probability space (Ω, ℱ, 𝒫) can be completed according to the next theorem. Let us call a set in ℱ with probability zero a null set. A property that holds except on a null set is said to hold almost everywhere (a.e.), almost surely (a.s.), or for almost every ω.

Theorem 2.2.5. Given the probability space (Ω, ℱ, 𝒫), there exists a complete space (Ω, ℱ̄, 𝒫̄) such that ℱ ⊂ ℱ̄ and 𝒫̄ = 𝒫 on ℱ.


2.2 PROBABILITY MEASURES AND THEIR DISTRIBUTION FUNCTIONS | 31

PROOF. Let 𝒩 be the collection of sets that are subsets of null sets, and let ℱ̄ be the collection of subsets of Ω each of which differs from a set in ℱ by a subset of a null set. Precisely:

(9) ℱ̄ = {E ⊂ Ω : E Δ F ∈ 𝒩 for some F ∈ ℱ}.

It is easy to verify, using Exercise 1 of Sec. 2.1, that ℱ̄ is a B.F. Clearly it contains ℱ. For each E ∈ ℱ̄, we put

𝒫̄(E) = 𝒫(F),

where F is any set that satisfies the condition indicated in (9). To show that this definition does not depend on the choice of such an F, suppose that

E Δ F₁ ∈ 𝒩,  E Δ F₂ ∈ 𝒩.

Then by Exercise 2 of Sec. 2.1,

(E Δ F₁) Δ (E Δ F₂) = (F₁ Δ F₂) Δ (E Δ E) = F₁ Δ F₂.

Hence F₁ Δ F₂ ∈ 𝒩 and so 𝒫(F₁ Δ F₂) = 0. This implies 𝒫(F₁) = 𝒫(F₂), as was to be shown. We leave it as an exercise to show that 𝒫̄ is a measure on ℱ̄. If E ∈ ℱ, then E Δ E = ∅ ∈ 𝒩; hence 𝒫̄(E) = 𝒫(E).

Finally, it is easy to verify that if E ∈ ℱ̄ and 𝒫̄(E) = 0, then E ∈ 𝒩. Hence any subset of E also belongs to 𝒩 and so to ℱ̄. This proves that (Ω, ℱ̄, 𝒫̄) is complete.

What is the advantage of completion? Suppose that a certain property, such as the existence of a certain limit, is known to hold outside a certain set N with 𝒫(N) = 0. Then the exact set on which it fails to hold is a subset of N, not necessarily in ℱ, but it will be in ℱ̄ with 𝒫̄-measure zero. We need the measurability of the exact exceptional set to facilitate certain dispositions, such as defining or redefining a function on it; see Exercise 25 below.

EXERCISES

In the following, μ is a p.m. on ℬ¹ and F is its d.f.

*11. An atom of any measure μ on ℬ¹ is a singleton {x} such that μ({x}) > 0. The number of atoms of any σ-finite measure is countable. For each x we have μ({x}) = F(x) − F(x−).

12. μ is called atomic iff its value is zero on any set not containing any atom. This is the case if and only if F is discrete. μ is without any atom, or atomless, if and only if F is continuous.

13. μ is called singular iff there exists a set Z with m(Z) = 0 such that μ(Zᶜ) = 0. This is the case if and only if F is singular. [HINT: One half is proved by using Theorems 1.3.1 and 2.1.2 to get ∫_B F′(x) dx ≤ μ(B) for B ∈ ℬ¹; the other requires Vitali's covering theorem.]


32 | MEASURE THEORY

14. Translate Theorem 1.3.2 in terms of measures.

*15. Translate the construction of a singular continuous d.f. in Sec. 1.3 in terms of measures. [It becomes clearer and easier to describe!] Generalize the construction by replacing the Cantor set with any perfect set of Borel measure zero. What if the latter has positive measure? Describe a probability scheme to realize this.

16. Show by a trivial example that Theorem 2.2.3 becomes false if the field ℱ₀ is replaced by an arbitrary collection that generates ℱ.

17. Show that Theorem 2.2.3 may be false for measures that are σ-finite on ℱ. [HINT: Take Ω to be {1, 2, …, ∞} and ℱ₀ to be the finite sets excluding ∞ and their complements, μ(E) = number of points in E, μ({∞}) ≠ ν({∞}).]

18. Show that the ℱ̄ in (9) is also the collection of sets of the form F ∪ N [or F∖N] where F ∈ ℱ and N ∈ 𝒩.

19. Let 𝒩 be as in the proof of Theorem 2.2.5 and 𝒩₀ be the set of all null sets in (Ω, ℱ, 𝒫). Then both these collections are monotone classes, and closed with respect to the operation "∖".

*20. Let (Ω, ℱ, 𝒫) be a probability space and 𝒢 a Borel subfield of ℱ. Prove that there exists a minimal B.F. 𝒢̃ satisfying 𝒢 ⊂ 𝒢̃ ⊂ ℱ and 𝒩₀ ⊂ 𝒢̃, where 𝒩₀ is as in Exercise 19. A set E belongs to 𝒢̃ if and only if there exists a set F in 𝒢 such that E Δ F ∈ 𝒩₀. This 𝒢̃ is called the augmentation of 𝒢 with respect to (Ω, ℱ, 𝒫).

21. Suppose that F̃ has all the defining properties of a d.f. except that it is not assumed to be right continuous. Show that Theorem 2.2.2 and the Lemma remain valid with F replaced by F̃, provided that we replace F(x), F(b), F(a) in (4) and (5) by F̃(x+), F̃(b+), F̃(a+), respectively. What modification is necessary in Theorem 2.2.4?

22. For an arbitrary measure 𝒫 on a B.F. ℱ, a set E in ℱ is called an atom of 𝒫 iff 𝒫(E) > 0 and F ⊂ E, F ∈ ℱ imply 𝒫(F) = 𝒫(E) or 𝒫(F) = 0. 𝒫 is called atomic iff its value is zero over any set in ℱ that is disjoint from all the atoms. Prove that for a measure μ on ℬ¹ this new definition is equivalent to that given in Exercise 11 above provided we identify two sets which differ by a μ-null set.

23. Prove that if the p.m. 𝒫 is atomless, then given any a in [0, 1] there exists a set E ∈ ℱ with 𝒫(E) = a. [HINT: Prove first that there exists E with "arbitrarily small" probability. A quick proof then follows from Zorn's lemma by considering a maximal collection of disjoint sets, the sum of whose probabilities does not exceed a. But an elementary proof without using any maximality principle is also possible.]

*24. A point x is said to be in the support of a measure μ on ℬⁿ iff every open neighborhood of x has strictly positive measure. The set of all such points is called the support of μ. Prove that the support is a closed set whose complement is the maximal open set on which μ vanishes. Show that


2.2 PROBABILITY MEASURES AND THEIR DISTRIBUTION FUNCTIONS | 33

the support of a p.m. on ℬ¹ is the same as that of its d.f., defined in Exercise 6 of Sec. 1.2.

*25. Let f be measurable with respect to ℱ, and Z be contained in a null set. Define

f̃ = f on Zᶜ,  f̃ = K on Z,

where K is a constant. Then f̃ is measurable with respect to ℱ̄ provided that (Ω, ℱ̄, 𝒫̄) is complete. Show that the conclusion may be false otherwise.
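On a finite sample space the completion construction in the proof of Theorem 2.2.5 can be carried out mechanically by enumeration. The following Python sketch is an illustration, not from the text; the sample space, the B.F. generated by a two-block partition, and the helper names (`powerset`, `F_bar`, `P_bar`) are my own. One block of the partition is null, so the completion adjoins all of its subsets.

```python
from itertools import combinations

omega = frozenset({0, 1, 2, 3})

def powerset(s):
    # All subsets of s, as frozensets.
    s = list(s)
    for r in range(len(s) + 1):
        for c in combinations(s, r):
            yield frozenset(c)

# The B.F. generated by the partition {0,1} | {2,3}, with the second block null:
F = {frozenset(), frozenset({0, 1}), frozenset({2, 3}), omega}
P = {frozenset(): 0.0, frozenset({0, 1}): 1.0, frozenset({2, 3}): 0.0, omega: 1.0}

# N = all subsets of null sets of (omega, F, P)
null_sets = [E for E in F if P[E] == 0.0]
N = {S for S in powerset(omega) if any(S <= Z for Z in null_sets)}

# F_bar as in (9): E is in F_bar iff E differs from some G in F
# by a subset of a null set (^ is symmetric difference on frozensets).
F_bar = {E for E in powerset(omega) if any((E ^ G) in N for G in F)}
print(len(F), len(F_bar))   # 4 8

def P_bar(E):
    # P_bar(E) = P(G) for any G in F with E ^ G in N; the proof of
    # Theorem 2.2.5 shows the choice of G does not matter.
    for G in F:
        if (E ^ G) in N:
            return P[G]

# {0,1,2} differs from {0,1} by the null point 2, so it gets probability 1:
print(P_bar(frozenset({0, 1, 2})))   # 1.0
```

The completed field has 8 members rather than 4: each set of ℱ may be enlarged or trimmed by any subset of the null block {2, 3}.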


3 | Random variable. Expectation. Independence

3.1 General definitions

Let the probability space (Ω, ℱ, 𝒫) be given. ℛ¹ = (−∞, +∞) is the (finite) real line, ℛ* = [−∞, +∞] the extended real line, ℬ¹ = the Euclidean Borel field on ℛ¹, ℬ* = the extended Borel field. A set in ℬ* is just a set in ℬ¹ possibly enlarged by one or both of the points ±∞.

DEFINITION OF A RANDOM VARIABLE. A real, extended-valued random variable is a function X whose domain is a set Δ in ℱ and whose range is contained in ℛ* = [−∞, +∞] such that for each B in ℬ*, we have

(1) {ω : X(ω) ∈ B} ∈ Δ ∩ ℱ,

where Δ ∩ ℱ is the trace of ℱ on Δ. A complex-valued random variable is a function on a set Δ in ℱ to the complex plane whose real and imaginary parts are both real, finite-valued random variables.

This definition in its generality is necessary for logical reasons in many applications, but for a discussion of basic properties we may suppose Δ = Ω and that X is real and finite-valued with probability one. This restricted meaning


3.1 GENERAL DEFINITIONS | 35

of a "random variable", abbreviated as "r.v.", will be understood in the book unless otherwise specified. The general case may be reduced to this one by considering the trace of (Ω, ℱ, 𝒫) on Δ, or on the "domain of finiteness" Δ₀ = {ω : |X(ω)| < ∞}, and taking real and imaginary parts.

Consider the "inverse mapping" X⁻¹ from ℛ¹ to Ω, defined (as usual) as follows:

∀A ⊂ ℛ¹: X⁻¹(A) = {ω : X(ω) ∈ A}.

Condition (1) then states that X⁻¹ carries members of ℬ¹ onto members of ℱ:

(2) ∀B ∈ ℬ¹: X⁻¹(B) ∈ ℱ;

or in the briefest notation:

X⁻¹(ℬ¹) ⊂ ℱ.

Such a function is said to be measurable (with respect to ℱ). Thus, an r.v. is just a measurable function from Ω to ℛ¹ (or ℛ*).

The next proposition, a standard exercise on inverse mapping, is essential.

Theorem 3.1.1. For any function X from Ω to ℛ¹ (or ℛ*), not necessarily an r.v., the inverse mapping X⁻¹ has the following properties:

X⁻¹(Aᶜ) = (X⁻¹(A))ᶜ,
X⁻¹(∪_α A_α) = ∪_α X⁻¹(A_α),
X⁻¹(∩_α A_α) = ∩_α X⁻¹(A_α),

where α ranges over an arbitrary index set, not necessarily countable.

Theorem 3.1.2. X is an r.v. if and only if for each real number x, or each real number x in a dense subset of ℛ¹, we have

{ω : X(ω) ≤ x} ∈ ℱ.

PROOF. The preceding condition may be written as

(3) ∀x: X⁻¹((−∞, x]) ∈ ℱ.

Consider the collection 𝒜 of all subsets S of ℛ¹ for which X⁻¹(S) ∈ ℱ. From Theorem 3.1.1 and the defining properties of the Borel field ℱ, it follows that if S ∈ 𝒜, then

X⁻¹(Sᶜ) = (X⁻¹(S))ᶜ ∈ ℱ;


if $\forall j : S_j \in \mathscr{A}$, then

$X^{-1}\Big(\bigcup_j S_j\Big) = \bigcup_j X^{-1}(S_j) \in \mathscr{F}$.

Thus $S^c \in \mathscr{A}$ and $\bigcup_j S_j \in \mathscr{A}$, and consequently $\mathscr{A}$ is a B.F. This B.F. contains all intervals of the form $(-\infty, x]$, which generate $\mathscr{B}^1$ even if $x$ is restricted to a dense set; hence $\mathscr{B}^1 \subset \mathscr{A}$, which means precisely that $X$ is an r.v.

An r.v. $X$ induces on $\mathscr{B}^1$ the set function

(4) $\mu(B) = \mathscr{P}(X^{-1}(B)) = \mathscr{P}\{X \in B\}$,

which is a probability measure on $(\mathscr{R}^1, \mathscr{B}^1)$, called the probability distribution measure (p.m.) of $X$; the d.f. $F$ associated with $\mu$ is called the d.f. of $X$ (Theorem 3.1.3).


Specifically, $F$ is given by

$F(x) = \mu((-\infty, x]) = \mathscr{P}\{X \le x\}$.

While the r.v. $X$ determines $\mu$ and therefore $F$, the converse is obviously false. A family of r.v.'s having the same distribution is said to be "identically distributed".

Example 1. Let $(\Omega, \mathscr{F})$ be a discrete sample space (see Example 1 of Sec. 2.2). Every numerically valued function is an r.v.

Example 2. $(\mathscr{U}, \mathscr{B}, m)$.

In this case an r.v. is by definition just a Borel measurable function. According to the usual definition, $f$ on $\mathscr{U}$ is Borel measurable iff $f^{-1}(\mathscr{B}) \subset \mathscr{B}$. In particular, the function $f$ given by $f(\omega) \equiv \omega$ is an r.v. The two r.v.'s $\omega$ and $1 - \omega$ are not identical but are identically distributed; in fact their common distribution is the underlying measure $m$.

Example 3. $(\mathscr{R}^1, \mathscr{B}^1, \mu)$.

The definition of a Borel measurable function is not affected, since no measure is involved; so any such function is an r.v., whatever the given p.m. $\mu$ may be. As in Example 2, there exists an r.v. with the underlying $\mu$ as its p.m.; see Exercise 3 below.

We proceed to produce new r.v.'s from given ones.

Theorem 3.1.4. If $X$ is an r.v., $f$ a Borel measurable function [on $(\mathscr{R}^1, \mathscr{B}^1)$], then $f(X)$ is an r.v.

PROOF. The quickest proof is as follows. Regarding the function $f(X)$ of $\omega$ as the "composite mapping":

$f \circ X : \omega \to f(X(\omega))$,

we have $(f \circ X)^{-1} = X^{-1} \circ f^{-1}$ and consequently

$(f \circ X)^{-1}(\mathscr{B}^1) = X^{-1}(f^{-1}(\mathscr{B}^1)) \subset X^{-1}(\mathscr{B}^1) \subset \mathscr{F}$.

The reader who is not familiar with operations of this kind is advised to spell out the proof above in the old-fashioned manner, which takes only a little longer.

We must now discuss the notion of a random vector. This is just a vector each of whose components is an r.v. It is sufficient to consider the case of two dimensions, since there is no essential difference in higher dimensions apart from complication in notation.
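The distinction drawn in Example 2 between "identical" and "identically distributed" can be seen empirically: samples of $\omega$ and of $1 - \omega$ disagree at (almost) every sample point, yet their empirical d.f.'s agree. A minimal sketch in Python; the sample size and seed are arbitrary choices, not from the text:

```python
import random

random.seed(0)
omega = [random.random() for _ in range(100_000)]  # samples of the r.v. omega
other = [1 - w for w in omega]                     # samples of 1 - omega

def edf(xs, x):
    """Empirical d.f.: fraction of samples <= x."""
    return sum(1 for v in xs if v <= x) / len(xs)

# Identically distributed: the empirical d.f.'s agree closely at test points.
for x in (0.1, 0.25, 0.5, 0.9):
    assert abs(edf(omega, x) - edf(other, x)) < 0.01
# Not identical: the two r.v.'s differ widely at many sample points.
assert any(abs(w - o) > 0.5 for w, o in zip(omega, other))
```

Both empirical d.f.'s approximate $F(x) = x$, the d.f. of the underlying measure $m$.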


We recall first that in the 2-dimensional Euclidean space $\mathscr{R}^2$, or the plane, the Euclidean Borel field $\mathscr{B}^2$ is generated by rectangles of the form

$\{(x, y) : a < x \le b,\ c < y \le d\}$.

Theorem 3.1.5. If $X$ and $Y$ are r.v.'s and $f$ is a Borel measurable function of two variables, then $f(X, Y)$ is an r.v.

PROOF. As in the proof of Theorem 3.1.4, it is sufficient to show that $(X, Y)^{-1}$ carries each set in $\mathscr{B}^2$ into $\mathscr{F}$. For a rectangle of the form above we have

$(X, Y)^{-1}(\{(x, y) : a < x \le b,\ c < y \le d\}) = \{a < X \le b\} \cap \{c < Y \le d\} \in \mathscr{F}$


by (2). Now the collection of sets $A$ in $\mathscr{B}^2$ for which $(X, Y)^{-1}(A) \in \mathscr{F}$ forms a B.F. by the analogue of Theorem 3.1.1. It follows from what has just been shown that this B.F. contains $\mathscr{B}_0^2$, hence it must also contain $\mathscr{B}^2$. Hence each set in $\mathscr{B}^2$ belongs to the collection, as was to be proved.

Here are some important special cases of Theorems 3.1.4 and 3.1.5. Throughout the book we shall use the notation for numbers as well as functions:

(6) $x \vee y = \max(x, y)$, $x \wedge y = \min(x, y)$.

Corollary. If $X$ is an r.v. and $f$ is a continuous function on $\mathscr{R}^1$, then $f(X)$ is an r.v.; in particular $X^r$ for positive integer $r$, $|X|^r$ for positive real $r$, $e^{-\lambda X}$, $e^{itX}$ for real $\lambda$ and $t$, are all r.v.'s (the last being complex-valued). If $X$ and $Y$ are r.v.'s, then

$X \vee Y$, $X \wedge Y$, $X + Y$, $X - Y$, $X \cdot Y$, $X / Y$

are r.v.'s, the last provided $Y$ does not vanish.

Generalization to a finite number of r.v.'s is immediate. Passing to an infinite sequence, let us state the following theorem, although its analogue in real function theory should be well known to the reader.

Theorem 3.1.6. If $\{X_j, j \ge 1\}$ is a sequence of r.v.'s, then

$\inf_j X_j$, $\sup_j X_j$, $\liminf_j X_j$, $\limsup_j X_j$

are r.v.'s, not necessarily finite-valued with probability one though everywhere defined, and

$\lim_{j \to \infty} X_j$

is an r.v. on the set $\Delta$ on which there is either convergence or divergence to $\pm\infty$.

PROOF. To see, for example, that $\sup_j X_j$ is an r.v., we need only observe the relation

$\forall x \in \mathscr{R}^1 : \Big\{\sup_j X_j \le x\Big\} = \bigcap_j \{X_j \le x\}$

and use Theorem 3.1.2. Since

$\limsup_j X_j = \inf_n \Big(\sup_{j > n} X_j\Big)$,


and $\lim_{j\to\infty} X_j$ exists [and is finite] on the set where $\limsup_j X_j = \liminf_j X_j$ [and is finite], which belongs to $\mathscr{F}$, the rest follows.

Here already we see the necessity of the general definition of an r.v. given at the beginning of this section.

DEFINITION. An r.v. $X$ is called discrete (or countably valued) iff there is a countable set $B \subset \mathscr{R}^1$ such that $\mathscr{P}(X \in B) = 1$.

It is easy to see that $X$ is discrete if and only if its d.f. is. Perhaps it is worthwhile to point out that a discrete r.v. need not have a range that is discrete in the sense of Euclidean topology, even apart from a set of probability zero. Consider, for example, an r.v. with the d.f. in Example 2 of Sec. 1.1.

The following terminology and notation will be used throughout the book for an arbitrary set $\Delta$, not necessarily the sample space.

DEFINITION. For each $\Delta \subset \Omega$, the function $1_\Delta(\cdot)$ defined as follows:

$\forall \omega \in \Omega : 1_\Delta(\omega) = \begin{cases} 1, & \text{if } \omega \in \Delta, \\ 0, & \text{if } \omega \in \Omega \setminus \Delta, \end{cases}$

is called the indicator (function) of $\Delta$.

Clearly $1_\Delta$ is an r.v. if and only if $\Delta \in \mathscr{F}$.

A countable partition of $\Omega$ is a countable family of disjoint sets $\{\Lambda_j\}$, with $\Lambda_j \in \mathscr{F}$ for each $j$ and such that $\Omega = \bigcup_j \Lambda_j$. We have then

$1 = \sum_j 1_{\Lambda_j}$.

More generally, let $b_j$ be arbitrary real numbers; then the function $\varphi$ defined below:

$\forall \omega \in \Omega : \varphi(\omega) = \sum_j b_j 1_{\Lambda_j}(\omega)$,

is a discrete r.v. We shall call $\varphi$ the r.v. belonging to the weighted partition $\{\Lambda_j; b_j\}$. Each discrete r.v. $X$ belongs to a certain partition. For let $\{b_j\}$ be the countable set in the definition of $X$ and let $\Lambda_j = \{\omega : X(\omega) = b_j\}$; then $X$ belongs to the weighted partition $\{\Lambda_j; b_j\}$. If $j$ ranges over a finite index set, the partition is called finite and the r.v. belonging to it simple.

EXERCISES

1. Prove Theorem 3.1.1. For the "direct mapping" $X$, which of these properties of $X^{-1}$ holds?


2. If two r.v.'s are equal a.e., then they have the same p.m.
*3. Given any p.m. $\mu$ on $(\mathscr{R}^1, \mathscr{B}^1)$, define an r.v. whose p.m. is $\mu$. Can this be done in an arbitrary probability space?
*4. Let $\theta$ be uniformly distributed on $[0, 1]$. For each d.f. $F$, define $G(y) = \sup\{x : F(x) \le y\}$. Then $G(\theta)$ has the d.f. $F$.
*5. Suppose $X$ has the continuous d.f. $F$; then $F(X)$ has the uniform distribution on $[0, 1]$. What if $F$ is not continuous?
6. Is the range of an r.v. necessarily Borel or Lebesgue measurable?
7. The sum, difference, product, or quotient (denominator nonvanishing) of two discrete r.v.'s is discrete.
8. If $\Omega$ is discrete (countable), then every r.v. is discrete. Conversely, every r.v. in a probability space is discrete if and only if the p.m. is atomic. [HINT: Use Exercise 23 of Sec. 2.2.]
9. If $f$ is Borel measurable, and $X$ and $Y$ are identically distributed, then so are $f(X)$ and $f(Y)$.
10. Express the indicators of $\Lambda_1 \cup \Lambda_2$, $\Lambda_1 \cap \Lambda_2$, $\Lambda_1 \setminus \Lambda_2$, $\Lambda_1 \triangle \Lambda_2$, $\limsup_n \Lambda_n$, $\liminf_n \Lambda_n$ in terms of those of $\Lambda_1$, $\Lambda_2$, or $\Lambda_n$. [For the definitions of the limits see Sec. 4.2.]
*11. Let $\mathscr{F}\{X\}$ be the minimal B.F. with respect to which $X$ is measurable. Show that $\Lambda \in \mathscr{F}\{X\}$ if and only if $\Lambda = X^{-1}(B)$ for some $B \in \mathscr{B}^1$. Is this $B$ unique? Can there be a set $A \notin \mathscr{B}^1$ such that $\Lambda = X^{-1}(A)$?
12. Generalize the assertion in Exercise 11 to a finite set of r.v.'s. [It is possible to generalize even to an arbitrary set of r.v.'s.]

3.2 Properties of mathematical expectation

The concept of "(mathematical) expectation" is the same as that of integration in the probability space with respect to the measure $\mathscr{P}$. The reader is supposed to have some acquaintance with this, at least in the particular case $(\mathscr{U}, \mathscr{B}, m)$ or $(\mathscr{R}^1, \mathscr{B}^1, m)$. [In the latter case, the measure not being finite, the theory of integration is slightly more complicated.] The general theory is not much different and will be briefly reviewed.
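The quantile transformation of Exercise 4 above is the standard recipe for simulating a prescribed d.f., and it can be tried numerically. A sketch using only the standard library; the exponential d.f. $F(x) = 1 - e^{-x}$, for which $G$ is simply the inverse function, is an arbitrary choice for illustration:

```python
import math
import random

# Exercise 4: if theta is uniform on [0,1] and G(y) = sup{x : F(x) <= y},
# then G(theta) has d.f. F.  For the continuous, strictly increasing
# exponential d.f. F(x) = 1 - exp(-x), G is the ordinary inverse of F.
def G(y):
    return -math.log(1 - y)

random.seed(1)
samples = [G(random.random()) for _ in range(100_000)]

# The empirical d.f. of G(theta) should be close to F at a few test points.
for x in (0.5, 1.0, 2.0):
    F = 1 - math.exp(-x)
    emp = sum(1 for s in samples if s <= x) / len(samples)
    assert abs(emp - F) < 0.01
```

For a d.f. with jumps or flat stretches the sup in the definition of $G$ matters; the inverse-function shortcut above is valid only in the continuous, strictly increasing case.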
The r.v.'s below will be tacitly assumed to be finite everywhere to avoid trivial complications.

For each positive discrete r.v. $X$ belonging to the weighted partition $\{\Lambda_j; b_j\}$, we define its expectation to be

(1) $\mathscr{E}(X) = \sum_j b_j\,\mathscr{P}(\Lambda_j)$.

This is either a positive finite number or $+\infty$. It is trivial that if $X$ belongs to different partitions, the corresponding values given by (1) agree. Now let


$X$ be an arbitrary positive r.v. For any two positive integers $m$ and $n$, the set

$\Lambda_{mn} = \Big\{\omega : \frac{n}{2^m} \le X(\omega) < \frac{n+1}{2^m}\Big\}$

belongs to $\mathscr{F}$, and the function

$X_m = \sum_{n \ge 0} \frac{n}{2^m}\,1_{\Lambda_{mn}}$

is a discrete r.v. belonging to the weighted partition $\{\Lambda_{mn}; n/2^m\}$. We have $X_m \le X_{m+1} \le X$ and $X_m \uparrow X$ everywhere, so that $\mathscr{E}(X_m)$ increases with $m$, and we define

$\mathscr{E}(X) = \lim_{m \to \infty} \mathscr{E}(X_m)$,

which is either a positive finite number or $+\infty$. For an arbitrary r.v. $X$, put $X^+ = X \vee 0$ and $X^- = (-X) \vee 0$, so that $X = X^+ - X^-$, and define

$\mathscr{E}(X) = \mathscr{E}(X^+) - \mathscr{E}(X^-)$

provided the right side is not of the form $+\infty - \infty$; $X$ is said to have a finite expectation iff both terms on the right are finite. More generally, for each $\Lambda \in \mathscr{F}$ we write

(4) $\int_\Lambda X(\omega)\,\mathscr{P}(d\omega) = \mathscr{E}(X 1_\Lambda)$


and call it "the integral of $X$ (with respect to $\mathscr{P}$) over the set $\Lambda$". We shall say that $X$ is integrable with respect to $\mathscr{P}$ over $\Lambda$ iff the integral above exists and is finite.

In the case of $(\mathscr{R}^1, \mathscr{B}^1, \mu)$, if we write $X = f$, $\omega = x$, the integral

$\int_\Lambda X(\omega)\,\mathscr{P}(d\omega) = \int_\Lambda f(x)\,\mu(dx)$

is just the ordinary Lebesgue–Stieltjes integral of $f$ with respect to $\mu$. If $F$ is the d.f. of $\mu$ and $\Lambda = (a, b]$, this is also written as

$\int_{(a,b]} f(x)\,dF(x)$.

This classical notation is really an anachronism, originated in the days when a point function was more popular than a set function. The notation above must then be amended to

$\int_{a+0}^{b+0}, \quad \int_{a-0}^{b+0}, \quad \int_{a+0}^{b-0}, \quad \int_{a-0}^{b-0}$

to distinguish clearly between the four kinds of intervals $(a, b]$, $[a, b]$, $(a, b)$, $[a, b)$.

In the case of $(\mathscr{U}, \mathscr{B}, m)$, the integral reduces to the ordinary Lebesgue integral

$\int_a^b f(x)\,m(dx) = \int_a^b f(x)\,dx$.

Here $m$ is atomless, so the notation is adequate and there is no need to distinguish between the different kinds of intervals.

The general integral has the familiar properties of the Lebesgue integral on $[0, 1]$. We list a few below for ready reference, some being easy consequences of others. As a general notation, the left member of (4) will be abbreviated to $\int_\Lambda X\,d\mathscr{P}$. In the following, $X$, $Y$ are r.v.'s; $a$, $b$ are constants; $\Lambda$ is a set in $\mathscr{F}$.

(i) Absolute integrability. $\int_\Lambda X\,d\mathscr{P}$ is finite if and only if

$\int_\Lambda |X|\,d\mathscr{P} < \infty$.

(ii) Linearity.

$\int_\Lambda (aX + bY)\,d\mathscr{P} = a\int_\Lambda X\,d\mathscr{P} + b\int_\Lambda Y\,d\mathscr{P}$

provided that the right side is meaningful, namely not $+\infty - \infty$ or $-\infty + \infty$.


(iii) Additivity over sets. If the $\Lambda_n$'s are disjoint, then

$\int_{\bigcup_n \Lambda_n} X\,d\mathscr{P} = \sum_n \int_{\Lambda_n} X\,d\mathscr{P}$.

(iv) Positivity. If $X \ge 0$ a.e. on $\Lambda$, then

$\int_\Lambda X\,d\mathscr{P} \ge 0$.

(v) Monotonicity. If $X_1 \le X \le X_2$ a.e. on $\Lambda$, then

$\int_\Lambda X_1\,d\mathscr{P} \le \int_\Lambda X\,d\mathscr{P} \le \int_\Lambda X_2\,d\mathscr{P}$.

(vi) Mean value theorem. If $a \le X \le b$ a.e. on $\Lambda$, then

$a\,\mathscr{P}(\Lambda) \le \int_\Lambda X\,d\mathscr{P} \le b\,\mathscr{P}(\Lambda)$.

(vii) Modulus inequality.

$\Big|\int_\Lambda X\,d\mathscr{P}\Big| \le \int_\Lambda |X|\,d\mathscr{P}$.

(viii) Dominated convergence theorem. If $\lim_{n\to\infty} X_n = X$ a.e. or merely in measure on $\Lambda$ and $\forall n : |X_n| \le Y$ a.e. on $\Lambda$, with $\int_\Lambda Y\,d\mathscr{P} < \infty$, then

(5) $\lim_{n\to\infty} \int_\Lambda X_n\,d\mathscr{P} = \int_\Lambda X\,d\mathscr{P} = \int_\Lambda \lim_{n\to\infty} X_n\,d\mathscr{P}$.

(ix) Bounded convergence theorem. If $\lim_{n\to\infty} X_n = X$ a.e. or merely in measure on $\Lambda$ and there exists a constant $M$ such that $\forall n : |X_n| \le M$ a.e. on $\Lambda$, then (5) is true.

(x) Monotone convergence theorem. If $X_n \ge 0$ and $X_n \uparrow X$ a.e. on $\Lambda$, then (5) is again true provided that $+\infty$ is allowed as a value for either member. The condition "$X_n \ge 0$" may be weakened to: "$\mathscr{E}(X_n) > -\infty$ for some $n$".

(xi) Integration term by term. If

$\sum_n \int_\Lambda |X_n|\,d\mathscr{P} < \infty$,

then $\sum_n |X_n| < \infty$ a.e. on $\Lambda$, so that $\sum_n X_n$ converges a.e. on $\Lambda$ and

$\int_\Lambda \sum_n X_n\,d\mathscr{P} = \sum_n \int_\Lambda X_n\,d\mathscr{P}$.


(xii) Fatou's lemma. If $X_n \ge 0$ a.e. on $\Lambda$, then

$\int_\Lambda \Big(\liminf_{n\to\infty} X_n\Big)\,d\mathscr{P} \le \liminf_{n\to\infty} \int_\Lambda X_n\,d\mathscr{P}$.

Let us prove the following useful theorem as an instructive example.

Theorem 3.2.1. We have

(6) $\sum_{n=1}^{\infty} \mathscr{P}(|X| \ge n) \le \mathscr{E}(|X|) \le 1 + \sum_{n=1}^{\infty} \mathscr{P}(|X| \ge n)$,

so that $\mathscr{E}(|X|) < \infty$ if and only if the series above converges.

PROOF. By the additivity property (iii), if $\Lambda_n = \{n \le |X| < n + 1\}$,

$\mathscr{E}(|X|) = \sum_{n=0}^{\infty} \int_{\Lambda_n} |X|\,d\mathscr{P}$.

Hence by the mean value theorem (vi) applied to each set $\Lambda_n$:

(7) $\sum_{n=0}^{\infty} n\,\mathscr{P}(\Lambda_n) \le \mathscr{E}(|X|) \le \sum_{n=0}^{\infty} (n+1)\,\mathscr{P}(\Lambda_n) = 1 + \sum_{n=0}^{\infty} n\,\mathscr{P}(\Lambda_n)$.

It remains to show

(8) $\sum_{n=0}^{\infty} n\,\mathscr{P}(\Lambda_n) = \sum_{n=1}^{\infty} \mathscr{P}(|X| \ge n)$,

finite or infinite. Now the partial sums of the series on the left may be rearranged (Abel's method of partial summation!) to yield, for $N \ge 1$,

(9) $\sum_{n=0}^{N} n\{\mathscr{P}(|X| \ge n) - \mathscr{P}(|X| \ge n+1)\} = \sum_{n=1}^{N} \{n - (n-1)\}\,\mathscr{P}(|X| \ge n) - N\,\mathscr{P}(|X| \ge N+1)$
$= \sum_{n=1}^{N} \mathscr{P}(|X| \ge n) - N\,\mathscr{P}(|X| \ge N+1)$.

Thus we have

(10) $\sum_{n=1}^{N} n\,\mathscr{P}(\Lambda_n) \le \sum_{n=1}^{N} \mathscr{P}(|X| \ge n) \le \sum_{n=1}^{N} n\,\mathscr{P}(\Lambda_n) + N\,\mathscr{P}(|X| \ge N+1)$.


Another application of the mean value theorem gives

$N\,\mathscr{P}(|X| \ge N+1) \le \int_{\{|X| \ge N+1\}} |X|\,d\mathscr{P}$.

Hence if $\mathscr{E}(|X|) < \infty$, then the last term in (10) converges to zero as $N \to \infty$ and (8) follows with both sides finite. On the other hand, if $\mathscr{E}(|X|) = \infty$, then the second term in (10) diverges with the first as $N \to \infty$, so that (8) is also true with both sides infinite.

Corollary. If $X$ takes only positive integer values, then

$\mathscr{E}(X) = \sum_n \mathscr{P}(X \ge n)$.

EXERCISES

1. If $X \ge 0$ a.e. on $\Lambda$ and $\int_\Lambda X\,d\mathscr{P} = 0$, then $X = 0$ a.e. on $\Lambda$.
*2. If $\mathscr{E}(|X|) < \infty$ and $\lim_{n\to\infty} \mathscr{P}(\Lambda_n) = 0$, then $\lim_{n\to\infty} \int_{\Lambda_n} X\,d\mathscr{P} = 0$. In particular

$\lim_{n\to\infty} \int_{\{|X| > n\}} X\,d\mathscr{P} = 0$.

3. Let $X \ge 0$ and $\int_\Omega X\,d\mathscr{P} = A$, $0 < A < \infty$. Then the set function $\nu$ defined on $\mathscr{F}$ as follows:

$\nu(\Lambda) = \frac{1}{A} \int_\Lambda X\,d\mathscr{P}$,

is a probability measure on $\mathscr{F}$.

4. Let $c$ be a fixed constant, $c > 0$. Then $\mathscr{E}(|X|) < \infty$ if and only if

$\sum_{n=1}^{\infty} \mathscr{P}(|X| \ge cn) < \infty$.

In particular, if the last series converges for one value of $c$, it converges for all values of $c$.

5. For any $r > 0$, $\mathscr{E}(|X|^r) < \infty$ if and only if

$\sum_{n=1}^{\infty} n^{r-1}\,\mathscr{P}(|X| \ge n) < \infty$.


*6. Suppose that $\sup_n |X_n| \le Y$ on $\Lambda$ with $\int_\Lambda Y\,d\mathscr{P} < \infty$. Deduce from Fatou's lemma:

$\int_\Lambda \Big(\limsup_{n\to\infty} X_n\Big)\,d\mathscr{P} \ge \limsup_{n\to\infty} \int_\Lambda X_n\,d\mathscr{P}$.

Show this is false if the condition involving $Y$ is omitted.
*7. Given the integrable r.v. $X$ and any $\epsilon > 0$, there exists a simple r.v. $X_\epsilon$ (see the end of Sec. 3.1) such that

$\mathscr{E}(|X - X_\epsilon|) < \epsilon$.

Hence there exists a sequence of simple r.v.'s $X_m$ such that

$\lim_{m\to\infty} \mathscr{E}(|X - X_m|) = 0$.

We can choose $\{X_m\}$ so that $|X_m| \le |X|$ for all $m$.
*8. For any two sets $\Lambda_1$ and $\Lambda_2$ in $\mathscr{F}$, define

$\rho(\Lambda_1, \Lambda_2) = \mathscr{P}(\Lambda_1 \triangle \Lambda_2)$;

then $\rho$ is a pseudo-metric in the space of sets in $\mathscr{F}$; call the resulting metric space $M(\mathscr{F}, \mathscr{P})$. Prove that for each integrable r.v. $X$ the mapping of $M(\mathscr{F}, \mathscr{P})$ to $\mathscr{R}^1$ given by $\Lambda \to \int_\Lambda X\,d\mathscr{P}$ is continuous. Similarly, the mappings on $M(\mathscr{F}, \mathscr{P}) \times M(\mathscr{F}, \mathscr{P})$ to $M(\mathscr{F}, \mathscr{P})$ given by

$(\Lambda_1, \Lambda_2) \to \Lambda_1 \cup \Lambda_2,\ \Lambda_1 \cap \Lambda_2,\ \Lambda_1 \setminus \Lambda_2,\ \Lambda_1 \triangle \Lambda_2$

are all continuous. If (see Sec. 4.2 below)

$\limsup_n \Lambda_n = \liminf_n \Lambda_n$

modulo a null set, we denote the common equivalence class of these two sets by $\lim_n \Lambda_n$. Prove that in this case $\{\Lambda_n\}$ converges to $\lim_n \Lambda_n$ in the metric $\rho$. Deduce Exercise 2 above as a special case.

There is a basic relation between the abstract integral with respect to $\mathscr{P}$ over sets in $\mathscr{F}$ on the one hand, and the Lebesgue–Stieltjes integral with respect to $\mu$ over sets in $\mathscr{B}^1$ on the other, induced by each r.v. We give the version in one dimension first.

Theorem 3.2.2. Let $X$ on $(\Omega, \mathscr{F}, \mathscr{P})$ induce the probability space $(\mathscr{R}^1, \mathscr{B}^1, \mu)$ according to Theorem 3.1.3 and let $f$ be Borel measurable. Then we have

(11) $\int_\Omega f(X(\omega))\,\mathscr{P}(d\omega) = \int_{\mathscr{R}^1} f(x)\,\mu(dx)$

provided that either side exists.


PROOF. Let $B \in \mathscr{B}^1$ and $f = 1_B$; then the left side in (11) is $\mathscr{P}(X \in B)$ and the right side is $\mu(B)$. They are equal by the definition of $\mu$ in (4) of Sec. 3.1. Now by the linearity of both integrals in (11), it will also hold if

(12) $f = \sum_j b_j 1_{B_j}$,

namely the r.v. on $(\mathscr{R}^1, \mathscr{B}^1)$ belonging to an arbitrary weighted partition $\{B_j; b_j\}$. For an arbitrary positive Borel measurable function $f$ we can define, as in the discussion of the abstract integral given above, a sequence $\{f_m, m \ge 1\}$ of the form (12) such that $f_m \uparrow f$ everywhere. For each of them we have

(13) $\int_\Omega f_m(X)\,d\mathscr{P} = \int_{\mathscr{R}^1} f_m\,d\mu$;

hence, letting $m \to \infty$ and using the monotone convergence theorem, we obtain (11) whether the limits are finite or not. This proves the theorem for $f \ge 0$, and the general case follows in the usual way.

We shall need the generalization of the preceding theorem in several dimensions. No change is necessary except for notation, which we will give in two dimensions. Instead of the $\nu$ in (5) of Sec. 3.1, let us write the "mass element" as $\mu^2(dx, dy)$ so that

$\nu(A) = \iint_A \mu^2(dx, dy)$.

Theorem 3.2.3. Let $(X, Y)$ on $(\Omega, \mathscr{F}, \mathscr{P})$ induce the probability space $(\mathscr{R}^2, \mathscr{B}^2, \mu^2)$ and let $f$ be a Borel measurable function of two variables. Then we have

(14) $\int_\Omega f(X(\omega), Y(\omega))\,\mathscr{P}(d\omega) = \iint_{\mathscr{R}^2} f(x, y)\,\mu^2(dx, dy)$.

Note that $f(X, Y)$ is an r.v. by Theorem 3.1.5.

As a consequence of Theorem 3.2.2, we have: if $\mu_X$ and $F_X$ denote, respectively, the p.m. and d.f. induced by $X$, then we have

$\mathscr{E}(X) = \int_{\mathscr{R}^1} x\,\mu_X(dx) = \int_{-\infty}^{\infty} x\,dF_X(x)$;

and more generally

(15) $\mathscr{E}(f(X)) = \int_{\mathscr{R}^1} f(x)\,\mu_X(dx) = \int_{-\infty}^{\infty} f(x)\,dF_X(x)$

with the usual proviso regarding existence and finiteness.
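Formula (15) can be illustrated numerically: averaging $f(X(\omega))$ over simulated sample points (the abstract integral) should agree with the integral of $f$ against the induced p.m. $\mu_X$ on the real line. A sketch, with $X$ the value of a fair die and $f(x) = x^2$; the example and sample size are arbitrary choices, not from the text:

```python
import random

random.seed(2)

# X = value of a fair die, f(x) = x^2.  The induced p.m. mu_X puts mass
# 1/6 at each point 1..6, so the right side of (15) can be computed exactly:
rhs = sum(x * x / 6 for x in range(1, 7))  # = 91/6

# The left side of (15) is the abstract integral over the sample space;
# approximate it by averaging f(X(omega)) over simulated sample points.
n = 200_000
lhs = sum(random.randint(1, 6) ** 2 for _ in range(n)) / n

assert abs(rhs - 91 / 6) < 1e-12
assert abs(lhs - rhs) < 0.2  # Monte Carlo error only
```

The point of Theorem 3.2.2 is exactly this reduction: an integral over an abstract $\Omega$ becomes an ordinary Lebesgue–Stieltjes integral on the line.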


Another important application is as follows: let $\mu^2$ be as in Theorem 3.2.3 and take $f(x, y)$ to be $x + y$ there. We obtain

(16) $\mathscr{E}(X + Y) = \iint_{\mathscr{R}^2} (x + y)\,\mu^2(dx, dy) = \iint_{\mathscr{R}^2} x\,\mu^2(dx, dy) + \iint_{\mathscr{R}^2} y\,\mu^2(dx, dy)$.

On the other hand, if we take $f(x, y)$ to be $x$ or $y$, respectively, we obtain

$\mathscr{E}(X) = \iint_{\mathscr{R}^2} x\,\mu^2(dx, dy), \quad \mathscr{E}(Y) = \iint_{\mathscr{R}^2} y\,\mu^2(dx, dy)$,

and consequently

(17) $\mathscr{E}(X + Y) = \mathscr{E}(X) + \mathscr{E}(Y)$.

This result is a case of the linearity of $\mathscr{E}$ given but not proved here; the proof above reduces this property in the general case of $(\Omega, \mathscr{F}, \mathscr{P})$ to the corresponding one in the special case $(\mathscr{R}^2, \mathscr{B}^2, \mu^2)$. Such a reduction is frequently useful when there are technical difficulties in the abstract treatment.

We end this section with a discussion of "moments".

Let $a$ be real, $r$ positive; then $\mathscr{E}(|X - a|^r)$ is called the absolute moment of $X$ of order $r$, about $a$. It may be $+\infty$; otherwise, and if $r$ is an integer, $\mathscr{E}((X - a)^r)$ is the corresponding moment. If $\mu$ and $F$ are, respectively, the p.m. and d.f. of $X$, then we have by Theorem 3.2.2:

$\mathscr{E}(|X - a|^r) = \int_{\mathscr{R}^1} |x - a|^r\,\mu(dx) = \int_{-\infty}^{\infty} |x - a|^r\,dF(x)$,
$\mathscr{E}((X - a)^r) = \int_{\mathscr{R}^1} (x - a)^r\,\mu(dx) = \int_{-\infty}^{\infty} (x - a)^r\,dF(x)$.

For $r = 1$, $a = 0$, this reduces to $\mathscr{E}(X)$, which is also called the mean of $X$. The moments about the mean are called central moments. That of order 2 is particularly important and is called the variance, $\mathrm{var}(X)$; its positive square root the standard deviation, $\sigma(X)$:

$\mathrm{var}(X) = \sigma^2(X) = \mathscr{E}\{(X - \mathscr{E}(X))^2\} = \mathscr{E}(X^2) - \{\mathscr{E}(X)\}^2$.

We note the inequality $\sigma^2(X) \le \mathscr{E}(X^2)$, which will be used a good deal in Chapter 5. For any positive number $p$, $X$ is said to belong to $L^p = L^p(\Omega, \mathscr{F}, \mathscr{P})$ iff $\mathscr{E}(|X|^p) < \infty$.
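The identity $\mathrm{var}(X) = \mathscr{E}(X^2) - \{\mathscr{E}(X)\}^2$ and the inequality $\sigma^2(X) \le \mathscr{E}(X^2)$ can be verified exactly for any simple r.v. by summing over its weighted partition. A sketch with a two-point distribution, chosen only for illustration:

```python
from fractions import Fraction

# A simple r.v.: X = 3 with probability 1/4, X = -1 with probability 3/4.
dist = {3: Fraction(1, 4), -1: Fraction(3, 4)}

def E(g):
    """Expectation of g(X) for a discrete r.v. given by its weighted partition."""
    return sum(g(x) * p for x, p in dist.items())

mean = E(lambda x: x)               # E(X) = 3/4 - 3/4 = 0
second = E(lambda x: x * x)         # E(X^2) = 9/4 + 3/4 = 3
var = E(lambda x: (x - mean) ** 2)  # central moment of order 2

assert var == second - mean ** 2    # var(X) = E(X^2) - (E X)^2
assert var <= second                # sigma^2(X) <= E(X^2)
```

Exact rational arithmetic (`Fraction`) keeps the check free of floating-point error; equality in the second assertion occurs precisely when the mean is zero, as here.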


The well-known inequalities of Hölder and Minkowski (see, e.g., Natanson [3]) may be written as follows. Let $X$ and $Y$ be r.v.'s, $1 < p < \infty$ and $1/p + 1/q = 1$; then

(18) $|\mathscr{E}(XY)| \le \mathscr{E}(|XY|) \le \mathscr{E}(|X|^p)^{1/p}\,\mathscr{E}(|Y|^q)^{1/q}$,
(19) $\mathscr{E}(|X + Y|^p)^{1/p} \le \mathscr{E}(|X|^p)^{1/p} + \mathscr{E}(|Y|^p)^{1/p}$.

We conclude this section with the following well-known Chebyshev-type inequality. Let $\varphi$ be a strictly positive and increasing function on $(0, \infty)$ with $\varphi(u) = \varphi(-u)$ for every $u$, and let $X$ be an r.v. such that $\mathscr{E}\{\varphi(X)\} < \infty$; then for


each $u > 0$:

$\mathscr{P}\{|X| \ge u\} \le \frac{\mathscr{E}\{\varphi(X)\}}{\varphi(u)}$.

PROOF. We have by the mean value theorem:

$\mathscr{E}\{\varphi(X)\} = \int_\Omega \varphi(X)\,d\mathscr{P} \ge \int_{\{|X| \ge u\}} \varphi(X)\,d\mathscr{P} \ge \varphi(u)\,\mathscr{P}\{|X| \ge u\}$,

from which the inequality follows.

The most familiar application is when $\varphi(u) = |u|^p$ for $0 < p < \infty$, so that the inequality yields an upper bound for the "tail" probability in terms of an absolute moment.

EXERCISES

9. Another proof of (14): verify it first for simple r.v.'s and then use Exercise 7 of Sec. 3.2.
10. Prove that if $0 \le r < r'$ and $\mathscr{E}(|X|^{r'}) < \infty$, then $\mathscr{E}(|X|^r) < \infty$. Also that $\mathscr{E}(|X|^r) < \infty$ if and only if $\mathscr{E}(|X - a|^r) < \infty$ for every $a$.
*11. If $\mathscr{E}(X^2) = 1$ and $\mathscr{E}(|X|) \ge a > 0$, then $\mathscr{P}\{|X| \ge \lambda a\} \ge (1 - \lambda)^2 a^2$ for $0 \le \lambda \le 1$.
*12. If $X \ge 0$ and $Y \ge 0$, $p \ge 0$, then $\mathscr{E}\{(X + Y)^p\} \le 2^p\{\mathscr{E}(X^p) + \mathscr{E}(Y^p)\}$. If $p > 1$, the factor $2^p$ may be replaced by $2^{p-1}$. If $0 \le p \le 1$, it may be replaced by 1.
*13. If $X_j \ge 0$, then

$\mathscr{E}\Big\{\Big(\sum_{j=1}^n X_j\Big)^p\Big\} \le \sum_{j=1}^n \mathscr{E}(X_j^p)$ or $\ge \sum_{j=1}^n \mathscr{E}(X_j^p)$

according as $p \le 1$ or $p \ge 1$.
*14. If $p > 1$, we have

$\Big|\frac{1}{n}\sum_{j=1}^n X_j\Big|^p \le \frac{1}{n}\sum_{j=1}^n |X_j|^p$;


we have also

$\mathscr{E}\Big\{\Big|\frac{1}{n}\sum_{j=1}^n X_j\Big|^p\Big\} \le \frac{1}{n}\sum_{j=1}^n \mathscr{E}(|X_j|^p)$.

Compare the inequalities.
15. If $p > 0$ and $\mathscr{E}(|X|^p) < \infty$, then $x^p\,\mathscr{P}\{|X| > x\} = o(1)$ as $x \to \infty$. Conversely, if $x^p\,\mathscr{P}\{|X| > x\} = o(1)$, then $\mathscr{E}(|X|^{p-\epsilon}) < \infty$ for $0 < \epsilon < p$.
*16. For any d.f. and any $a \ge 0$, we have

$\int_{-\infty}^{\infty} [F(x + a) - F(x)]\,dx = a$.

17. If $F$ is a d.f. such that $F(0-) = 0$, then

$\int_0^{\infty} \{1 - F(x)\}\,dx = \int_0^{\infty} x\,dF(x) \le +\infty$.

Thus if $X$ is a positive r.v., then we have

$\mathscr{E}(X) = \int_0^{\infty} \mathscr{P}\{X > x\}\,dx = \int_0^{\infty} \mathscr{P}\{X \ge x\}\,dx$.

18. Prove that $\int_{-\infty}^{\infty} |x|\,dF(x) < \infty$ if and only if

$\int_{-\infty}^{0} F(x)\,dx < \infty$ and $\int_0^{\infty} [1 - F(x)]\,dx < \infty$.

*19. If $\{X_n\}$ is a sequence of identically distributed r.v.'s with finite mean, then

$\lim_{n\to\infty} \frac{1}{n}\,\mathscr{E}\Big\{\max_{1 \le j \le n} |X_j|\Big\} = 0$.

[HINT: Use

$\int_0^{\infty} \mathscr{P}(X > x)\,dx = \int_0^{\infty} \mathscr{P}(X^{1/r} > v)\,r v^{r-1}\,dv$;

substitute and invert the order of the repeated integrations.]
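Theorem 3.2.1 and the Chebyshev-type inequality of this section are easy to check exactly for a bounded discrete r.v., since all the quantities involved are finite sums. A sketch; the toy geometric-style distribution is an arbitrary choice:

```python
from fractions import Fraction

# |X| takes the value k with probability p[k], k = 0..5 (a finite toy distribution).
p = {k: Fraction(1, 2) ** (k + 1) for k in range(5)}
p[5] = 1 - sum(p.values())  # remaining mass, so the probabilities sum to 1

EX = sum(k * pk for k, pk in p.items())        # E(|X|), computed exactly
tail = lambda n: sum(pk for k, pk in p.items() if k >= n)
tail_sum = sum(tail(n) for n in range(1, 6))   # sum of P(|X| >= n), n >= 1

# Theorem 3.2.1: sum P(|X| >= n) <= E(|X|) <= 1 + sum P(|X| >= n).
assert tail_sum <= EX <= 1 + tail_sum
# Here |X| is integer-valued, so the Corollary gives equality on the left:
assert tail_sum == EX

# Chebyshev with phi(u) = u^2: P(|X| >= u) <= E(X^2)/u^2 for each u > 0.
EX2 = sum(k * k * pk for k, pk in p.items())
for u in range(1, 6):
    assert tail(u) <= EX2 / (u * u)
```

With `Fraction` every comparison is exact, so the assertions verify the inequalities themselves rather than floating-point approximations of them.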


3.3 Independence

We shall now introduce a fundamental new concept peculiar to the theory of probability, that of "(stochastic) independence".

DEFINITION OF INDEPENDENCE. The r.v.'s $\{X_j, 1 \le j \le n\}$ are said to be (totally) independent iff for any linear Borel sets $\{B_j, 1 \le j \le n\}$ we have

(1) $\mathscr{P}\Big(\bigcap_{j=1}^n (X_j \in B_j)\Big) = \prod_{j=1}^n \mathscr{P}(X_j \in B_j)$.

The r.v.'s of an infinite family are said to be independent iff those in every finite subfamily are. They are said to be pairwise independent iff every two of them are independent.

Note that (1) implies that the r.v.'s in every subset of $\{X_j, 1 \le j \le n\}$ are also independent, since we may take some of the $B_j$'s as $\mathscr{R}^1$. On the other hand, (1) is implied by the apparently weaker hypothesis: for every set of real numbers $\{x_j, 1 \le j \le n\}$:

(2) $\mathscr{P}\Big(\bigcap_{j=1}^n (X_j \le x_j)\Big) = \prod_{j=1}^n \mathscr{P}(X_j \le x_j)$.


From now on when the probability space is fixed, a set in $\mathscr{F}$ will also be called an event. The events $\{E_j, 1 \le j \le n\}$ are said to be independent iff their indicators are independent; this is equivalent to: for any subset $\{j_1, \ldots, j_\ell\}$ of $\{1, \ldots, n\}$, we have

(4) $\mathscr{P}\Big(\bigcap_{k=1}^{\ell} E_{j_k}\Big) = \prod_{k=1}^{\ell} \mathscr{P}(E_{j_k})$.

Theorem 3.3.1. If $\{X_j, 1 \le j \le n\}$ are independent r.v.'s and $\{f_j, 1 \le j \le n\}$ are Borel measurable functions, then $\{f_j(X_j), 1 \le j \le n\}$ are independent r.v.'s.


Theorem 3.3.3. If $X$ and $Y$ are independent r.v.'s with finite expectations, then

(5) $\mathscr{E}(XY) = \mathscr{E}(X)\,\mathscr{E}(Y)$.

PROOF. We give two proofs in detail of this important result to illustrate the methods. Cf. the two proofs of (14) in Sec. 3.2, one indicated there, and one in Exercise 9 of Sec. 3.2.

First proof. Suppose first that the two r.v.'s $X$ and $Y$ are both discrete, belonging respectively to the weighted partitions $\{\Lambda_j; c_j\}$ and $\{M_k; d_k\}$ such that $\Lambda_j = \{X = c_j\}$, $M_k = \{Y = d_k\}$. Thus

$\mathscr{E}(X) = \sum_j c_j\,\mathscr{P}(\Lambda_j), \quad \mathscr{E}(Y) = \sum_k d_k\,\mathscr{P}(M_k)$.

Now we have

$\Omega = \bigcup_j \Lambda_j = \bigcup_k M_k = \bigcup_{j,k} (\Lambda_j \cap M_k)$

and

$X(\omega)Y(\omega) = c_j d_k \quad \text{if } \omega \in \Lambda_j \cap M_k$.

Hence the r.v. $XY$ is discrete and belongs to the superposed partition $\{\Lambda_j \cap M_k; c_j d_k\}$ with both $j$ and $k$ varying independently of each other. Since $X$ and $Y$ are independent, we have for every $j$ and $k$:

$\mathscr{P}(\Lambda_j \cap M_k) = \mathscr{P}(X = c_j; Y = d_k) = \mathscr{P}(X = c_j)\,\mathscr{P}(Y = d_k) = \mathscr{P}(\Lambda_j)\,\mathscr{P}(M_k)$;

and consequently by definition (1) of Sec. 3.2:

$\mathscr{E}(XY) = \sum_{j,k} c_j d_k\,\mathscr{P}(\Lambda_j \cap M_k) = \sum_j c_j\,\mathscr{P}(\Lambda_j)\,\sum_k d_k\,\mathscr{P}(M_k) = \mathscr{E}(X)\,\mathscr{E}(Y)$.

Thus (5) is true in this case.

Now let $X$ and $Y$ be arbitrary positive r.v.'s with finite expectations. Then, according to the discussion at the beginning of Sec. 3.2, there are discrete r.v.'s $X_m$ and $Y_m$ such that $\mathscr{E}(X_m) \uparrow \mathscr{E}(X)$ and $\mathscr{E}(Y_m) \uparrow \mathscr{E}(Y)$. Furthermore, for each $m$, $X_m$ and $Y_m$ are independent. Note that for the independence of discrete r.v.'s it is sufficient to verify the relation (1) when the $B_j$'s are their possible values and "$\in$" is replaced by "$=$" (why?). Here we have

$\mathscr{P}\Big\{X_m = \frac{n}{2^m};\ Y_m = \frac{n'}{2^m}\Big\} = \mathscr{P}\Big\{\frac{n}{2^m} \le X < \frac{n+1}{2^m};\ \frac{n'}{2^m} \le Y < \frac{n'+1}{2^m}\Big\}$


$= \mathscr{P}\Big\{\frac{n}{2^m} \le X < \frac{n+1}{2^m}\Big\}\,\mathscr{P}\Big\{\frac{n'}{2^m} \le Y < \frac{n'+1}{2^m}\Big\} = \mathscr{P}\Big\{X_m = \frac{n}{2^m}\Big\}\,\mathscr{P}\Big\{Y_m = \frac{n'}{2^m}\Big\}$.

The independence of $X_m$ and $Y_m$ is also a consequence of Theorem 3.3.1, since $X_m = [2^m X]/2^m$, where $[X]$ denotes the greatest integer in $X$. Finally, it is clear that $X_m Y_m$ is increasing with $m$ and

$0 \le X_m Y_m \uparrow XY$,

so that by the monotone convergence theorem $\mathscr{E}(X_m Y_m) \uparrow \mathscr{E}(XY)$; since (5) holds for each pair $(X_m, Y_m)$, letting $m \to \infty$ yields (5) for $X$ and $Y$. The general case follows by writing $X = X^+ - X^-$, $Y = Y^+ - Y^-$, applying (5) to the four pairs, which are independent by Theorem 3.3.1, and using the linearity of $\mathscr{E}$.


Corollary. If $\{X_j, 1 \le j \le n\}$ are independent r.v.'s with finite expectations, then

(6) $\mathscr{E}\Big(\prod_{j=1}^n X_j\Big) = \prod_{j=1}^n \mathscr{E}(X_j)$.

This follows at once by induction from (5), provided we observe that the two r.v.'s

$\prod_{j=1}^k X_j$ and $\prod_{j=k+1}^n X_j$

are independent for each $k$, $1 \le k \le n - 1$. A rigorous proof of this fact may be supplied by Theorem 3.3.2.

Do independent random variables exist? Here we can take the cue from the intuitive background of probability theory, which not only has given rise historically to this branch of mathematical discipline, but remains a source of inspiration, inculcating a way of thinking peculiar to the discipline. It may be said that no one could have learned the subject properly without acquiring some feeling for the intuitive content of the concept of stochastic independence, and through it, certain degrees of dependence. Briefly then: events are determined by the outcomes of random trials. If an unbiased coin is tossed and the two possible outcomes are recorded as 0 and 1, this is an r.v., and it takes these two values with roughly the probabilities $\frac{1}{2}$ each. Repeated tossing will produce a sequence of outcomes. If now a die is cast, the outcome may be similarly represented by an r.v. taking the six values 1 to 6; again this may be repeated to produce a sequence. Next we may draw a card from a pack or a ball from an urn, or take a measurement of a physical quantity sampled from a given population, or make an observation of some fortuitous natural phenomenon, the outcomes in the last two cases being r.v.'s taking some rational values in terms of certain units; and so on. Now it is very easy to conceive of undertaking these various trials under conditions such that their respective outcomes do not appreciably affect each other; indeed it would take more imagination to conceive the opposite!
In this circumstance, idealized, the trials are carried out "independently of one another" and the corresponding r.v.'s are "independent" according to definition. We have thus "constructed" sets of independent r.v.'s in varied contexts and with various distributions (although they are always discrete on realistic grounds), and the whole process can be continued indefinitely.

Can such a construction be made rigorous? We begin by an easy special case.
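Before turning to the rigorous construction, the multiplication rule (5), and its failure under dependence, can be seen empirically with simulated coin tosses. A minimal sketch; the sample size and seed are arbitrary choices:

```python
import random

random.seed(3)
n = 100_000

# X, Y: two independent fair coins with values 0 and 1.
X = [random.randint(0, 1) for _ in range(n)]
Y = [random.randint(0, 1) for _ in range(n)]

mean = lambda xs: sum(xs) / len(xs)

# Independent case: E(XY) is close to E(X)E(Y) (= 1/4 here).
assert abs(mean([x * y for x, y in zip(X, Y)]) - mean(X) * mean(Y)) < 0.01

# Dependent case: take Z = X, so E(XZ) = E(X^2) = E(X) = 1/2,
# while E(X)E(Z) = 1/4 -- the product rule fails without independence.
Z = X
assert abs(mean([x * z for x, z in zip(X, Z)]) - 0.5) < 0.01
assert abs(mean(X) * mean(Z) - 0.25) < 0.01
```

The simulation is of course no substitute for the product-measure construction that follows; it only illustrates what the definition is meant to capture.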


Example 1. Let $n \ge 2$ and $(\Omega_j, \mathscr{F}_j, \mathscr{P}_j)$ be $n$ discrete probability spaces. We define the product space

$\Omega^n = \Omega_1 \times \cdots \times \Omega_n$ ($n$ factors)

to be the space of all ordered $n$-tuples $\omega^n = (\omega_1, \ldots, \omega_n)$, where each $\omega_j \in \Omega_j$. The product B.F. $\mathscr{F}^n$ is simply the collection of all subsets of $\Omega^n$, just as $\mathscr{F}_j$ is that for $\Omega_j$. Recall that (Example 1 of Sec. 2.2) the p.m. $\mathscr{P}_j$ is determined by its value for each point of $\Omega_j$. Since $\Omega^n$ is also a countable set, we may define a p.m. $\mathscr{P}^n$ on $\mathscr{F}^n$ by the following assignment:

(7) $\mathscr{P}^n(\{\omega^n\}) = \prod_{j=1}^n \mathscr{P}_j(\{\omega_j\})$,

namely, to the $n$-tuple $(\omega_1, \ldots, \omega_n)$ the probability to be assigned is the product of the probabilities originally assigned to each component $\omega_j$ by $\mathscr{P}_j$. This p.m. will be called the product measure derived from the p.m.'s $\{\mathscr{P}_j, 1 \le j \le n\}$ and denoted by $\times_{j=1}^n \mathscr{P}_j$. It is trivial to verify that this is indeed a p.m. Furthermore, it has the following product property, extending its definition (7): if $S_j \in \mathscr{F}_j$, $1 \le j \le n$, then

(8) $\mathscr{P}^n\Big(\times_{j=1}^n S_j\Big) = \prod_{j=1}^n \mathscr{P}_j(S_j)$.

To see this, we observe that the left side is, by definition, equal to

$\sum_{\omega_1 \in S_1} \cdots \sum_{\omega_n \in S_n} \mathscr{P}^n(\{\omega_1, \ldots, \omega_n\}) = \sum_{\omega_1 \in S_1} \cdots \sum_{\omega_n \in S_n} \prod_{j=1}^n \mathscr{P}_j(\{\omega_j\}) = \prod_{j=1}^n \mathscr{P}_j(S_j)$,

the second equation being a matter of simple algebra.

Now let $X_j$ be an r.v. (namely an arbitrary function) on $\Omega_j$; $B_j$ be an arbitrary Borel set; and $S_j = X_j^{-1}(B_j)$, namely:

$S_j = \{\omega_j \in \Omega_j : X_j(\omega_j) \in B_j\}$,

so that $S_j \in \mathscr{F}_j$. We have then by (8):

(9) $\mathscr{P}^n\Big(\times_{j=1}^n \{X_j \in B_j\}\Big) = \mathscr{P}^n\Big(\times_{j=1}^n S_j\Big) = \prod_{j=1}^n \mathscr{P}_j(S_j) = \prod_{j=1}^n \mathscr{P}_j\{X_j \in B_j\}$.

To each function $X_j$ on $\Omega_j$ let correspond the function $\tilde{X}_j$ on $\Omega^n$ defined below, in which $\omega = (\omega_1, \ldots, \omega_n)$ and each "coordinate" $\omega_j$ is regarded as a function of the point $\omega$:

$\forall \omega \in \Omega^n : \tilde{X}_j(\omega) = X_j(\omega_j)$.


Then we have

$\{\omega : \tilde{X}_j(\omega) \in B_j\} = \Omega_1 \times \cdots \times \Omega_{j-1} \times \{\omega_j : X_j(\omega_j) \in B_j\} \times \Omega_{j+1} \times \cdots \times \Omega_n$,

so that

$\bigcap_{j=1}^n \{\omega : \tilde{X}_j(\omega) \in B_j\} = \times_{j=1}^n \{\omega_j : X_j(\omega_j) \in B_j\}$.

It follows from (9) that

$\mathscr{P}^n\Big(\bigcap_{j=1}^n \{\tilde{X}_j \in B_j\}\Big) = \prod_{j=1}^n \mathscr{P}^n\{\tilde{X}_j \in B_j\}$.

Therefore the r.v.'s $\{\tilde{X}_j, 1 \le j \le n\}$ are independent.

Example 2. Let $\mathscr{U}^n$ be the $n$-dimensional cube (immaterial whether it is closed or not):

$\mathscr{U}^n = \{(x_1, \ldots, x_n) : 0 < x_j \le 1,\ 1 \le j \le n\}$.


Example 4. Can we construct r.v.'s on the probability space $(\mathscr{U}, \mathscr{B}, m)$ itself, without going to a product space? Indeed we can, but only by imbedding a product structure in $\mathscr{U}$. The simplest case will now be described and we shall return to it in the next chapter.

For each real number in $(0, 1]$, consider its binary digital expansion

$x = .\epsilon_1 \epsilon_2 \cdots \epsilon_n \cdots = \sum_{n=1}^{\infty} \frac{\epsilon_n}{2^n}$, each $\epsilon_n = 0$ or $1$.

This expansion is unique except when $x$ is of the form $m/2^n$; the set of such $x$ is countable and so of probability zero, hence whatever we decide to do with them will be immaterial for our purposes. For the sake of definiteness, let us agree that only expansions with infinitely many digits "1" are used. Now each digit $\epsilon_j$ of $x$ is a function of $x$ taking the values 0 and 1 on two Borel sets. Hence they are r.v.'s. Let $\{c_j, j \ge 1\}$ be a given sequence of 0's and 1's. Then the set

$\{x : \epsilon_j(x) = c_j,\ 1 \le j \le n\} = \bigcap_{j=1}^n \{x : \epsilon_j(x) = c_j\}$

is the set of numbers $x$ whose first $n$ digits are the given $c_j$'s, thus

$x = .c_1 c_2 \cdots c_n \epsilon_{n+1} \epsilon_{n+2} \cdots$

with the digits from the $(n+1)$st on completely arbitrary. It is clear that this set is just an interval of length $1/2^n$, hence of probability $1/2^n$. On the other hand for each $j$, the set $\{x : \epsilon_j(x) = c_j\}$ has probability $\frac{1}{2}$ for a similar reason. We have therefore

$\mathscr{P}\{\epsilon_j = c_j,\ 1 \le j \le n\} = \frac{1}{2^n} = \prod_{j=1}^n \frac{1}{2} = \prod_{j=1}^n \mathscr{P}\{\epsilon_j = c_j\}$.

This being true for every choice of the $c_j$'s, the r.v.'s $\{\epsilon_j, j \ge 1\}$ are independent. Let $\{f_j, j \ge 1\}$ be arbitrary functions with domain the two points $\{0, 1\}$; then $\{f_j(\epsilon_j), j \ge 1\}$ are also independent r.v.'s.

This example seems extremely special, but easy extensions are at hand (see Exercises 13, 14, and 15 below).

We are now ready to state and prove the fundamental existence theorem of product measures.

Theorem 3.3.4. Let a finite or infinite sequence of p.m.'s $\{\mu_j\}$ on $(\mathscr{R}^1, \mathscr{B}^1)$, or equivalently their d.f.'s, be given.
There exists a probability space $(\Omega, \mathscr{F}, \mathscr{P})$ and a sequence of independent r.v.'s $\{X_j\}$ defined on it such that for each $j$, $\mu_j$ is the p.m. of $X_j$.

PROOF. Without loss of generality we may suppose that the given sequence is infinite. (Why?) For each $n$, let $(\Omega_n, \mathscr{F}_n, \mathscr{P}_n)$ be a probability space


in which there exists an r.v. $X_n$ with $\mu_n$ as its p.m. Indeed this is possible if we take $(\Omega_n, \mathscr{F}_n, \mathscr{P}_n)$ to be $(\mathscr{R}^1, \mathscr{B}^1, \mu_n)$ and $X_n$ to be the identical function of the sample point $x$ in $\mathscr{R}^1$, now to be written as $\omega_n$ (cf. Exercise 3 of Sec. 3.1).

Now define the infinite product space

$\Omega = \times_{n=1}^{\infty} \Omega_n$

on the collection of all "points" $\omega = \{\omega_1, \omega_2, \ldots, \omega_n, \ldots\}$, where for each $n$, $\omega_n$ is a point of $\Omega_n$. A subset $E$ of $\Omega$ will be called a "finite-product set" iff it is of the form

(11) $E = \times_{n=1}^{\infty} F_n$,

where each $F_n \in \mathscr{F}_n$ and all but a finite number of these $F_n$'s are equal to the corresponding $\Omega_n$'s. Thus $\omega \in E$ if and only if $\omega_n \in F_n$, $n \ge 1$, but this is actually a restriction only for a finite number of values of $n$. Let the collection of subsets of $\Omega$, each of which is the union of a finite number of disjoint finite-product sets, be $\mathscr{F}_0$. It is easy to see that the collection $\mathscr{F}_0$ is closed with respect to complementation and pairwise intersection, hence it is a field. We shall take the $\mathscr{F}$ in the theorem to be the B.F. generated by $\mathscr{F}_0$. This $\mathscr{F}$ is called the product B.F. of the sequence $\{\mathscr{F}_n, n \ge 1\}$ and denoted by $\times_{n=1}^{\infty} \mathscr{F}_n$.

We define a set function $\mathscr{P}$ on $\mathscr{F}_0$ as follows. First, for each finite-product set such as the $E$ given in (11) we set

(12) $\mathscr{P}(E) = \prod_{n=1}^{\infty} \mathscr{P}_n(F_n)$,

where all but a finite number of the factors on the right side are equal to one. Next, if $E \in \mathscr{F}_0$ and

$E = \bigcup_{k=1}^n E^{(k)}$,

where the $E^{(k)}$'s are disjoint finite-product sets, we put

(13) $\mathscr{P}(E) = \sum_{k=1}^n \mathscr{P}(E^{(k)})$.

If a given set $E$ in $\mathscr{F}_0$ has two representations of the form above, then it is not difficult to see though it is tedious to explain (try drawing some pictures!) that the two definitions of $\mathscr{P}(E)$ agree. Hence the set function $\mathscr{P}$ is uniquely


defined on $\mathscr{F}_0$; it is clearly positive with $\mathscr{P}(\Omega) = 1$, and it is finitely additive on $\mathscr{F}_0$ by definition. In order to verify countable additivity it is sufficient to verify the axiom of continuity, by the remark after Theorem 2.2.1. Proceeding by contraposition, suppose that there exist a $\delta > 0$ and a sequence of sets $\{C_n, n \ge 1\}$ in $\mathscr{F}_0$ such that for each $n$, we have $C_n \supset C_{n+1}$ and $\mathscr{P}(C_n) \ge \delta > 0$; we shall prove that $\bigcap_{n=1}^{\infty} C_n \ne \emptyset$. Note that each set $E$ in $\mathscr{F}_0$, as well as each finite-product set, is "determined" by a finite number of coordinates in the sense that there exists a finite integer $k$, depending only on $E$, such that if $\omega = (\omega_1, \omega_2, \ldots)$ and $\omega' = (\omega_1', \omega_2', \ldots)$ are two points of $\Omega$ with $\omega_j = \omega_j'$ for $1 \le j \le k$, then either both $\omega$ and $\omega'$ belong to $E$ or neither of them does. To simplify the notation and with no real loss of generality (why?) we may suppose that for each $n$, the set $C_n$ is determined by the first $n$ coordinates. Given $\omega_1^0$, for any subset $E$ of $\Omega$ let us write $(E \mid \omega_1^0)$ for $\Omega_1 \times E_1$, where $E_1$ is the set of points $(\omega_2, \omega_3, \ldots)$ in $\times_{n=2}^{\infty} \Omega_n$ such that $(\omega_1^0, \omega_2, \omega_3, \ldots) \in E$. If $\omega_1^0$ does not appear as first coordinate of any point in $E$, then $(E \mid \omega_1^0) = \emptyset$. Note that if $E \in \mathscr{F}_0$, then $(E \mid \omega_1^0) \in \mathscr{F}_0$ for each $\omega_1^0$. We claim that there exists an $\omega_1^0$ such that for every $n$, we have $\mathscr{P}((C_n \mid \omega_1^0)) \ge \delta/2$. To see this we begin with the equation

(14) $\mathscr{P}(C_n) = \int_{\Omega_1} \mathscr{P}((C_n \mid \omega_1))\,\mathscr{P}_1(d\omega_1)$.

This is trivial if $C_n$ is a finite-product set by definition (12), and follows by addition for any $C_n$ in $\mathscr{F}_0$. Now put $B_n = \{\omega_1 : \mathscr{P}((C_n \mid \omega_1)) \ge \delta/2\}$, a subset of $\Omega_1$; then it follows that

$\delta \le \mathscr{P}(C_n) \le \mathscr{P}_1(B_n) + \frac{\delta}{2}\,\mathscr{P}_1(B_n^c) \le \mathscr{P}_1(B_n) + \frac{\delta}{2}$,

so that $\mathscr{P}_1(B_n) \ge \delta/2$. Since $C_n$ decreases with $n$, so does $B_n$; and since $\mathscr{P}_1$ is a p.m., $\mathscr{P}_1\big(\bigcap_n B_n\big) \ge \delta/2 > 0$, so that $\bigcap_n B_n \ne \emptyset$. Choose any $\omega_1^0$ in $\bigcap_{n=1}^{\infty} B_n$. Then $\mathscr{P}((C_n \mid \omega_1^0)) \ge \delta/2$. Repeating the argument for the set $(C_n \mid \omega_1^0)$, we see that there exists an $\omega_2^0$ such that for every $n$, $\mathscr{P}((C_n \mid \omega_1^0, \omega_2^0)) \ge \delta/4$, where $(C_n \mid \omega_1^0, \omega_2^0) = ((C_n \mid \omega_1^0) \mid \omega_2^0)$ is of the form $\Omega_1 \times \Omega_2 \times E_3$ and $E_3$ is the set of points $(\omega_3, \omega_4, \ldots)$ in $\times_{n=3}^{\infty} \Omega_n$ such that $(\omega_1^0, \omega_2^0, \omega_3, \omega_4, \ldots)$
E C,, ; and so forth by induction . Thusfor each k > 1, there exists wo such thatV11 :- ~/) (( Cn I w?, . . . , w°))S> Zk.Consider the point w o = ()? wo w o .) . Since (C k I w ° wo ) :Athere is a point in Ck whose first k coordinates are the same as those of w o ;since C k is determined by the first k coordinates, it follows that w o E C k . Thisis true for all k, hence w o E nk 1 Ck,as was to be shown .
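Definition (12) is easy to check against a simulation. The sketch below (ours, not the text's; the helper name `product_set_prob` and the choice of three restricted fair-coin coordinates are illustrative assumptions) compares the product-measure probability of a finite-product set with a Monte Carlo estimate using independent coordinates:

```python
import random

def product_set_prob(factor_probs):
    """P(E) for a finite-product set E = F_1 x ... x F_k x Omega x Omega x ...,
    per definition (12): the product of the finitely many non-trivial factor
    probabilities (all remaining factors equal one)."""
    p = 1.0
    for q in factor_probs:
        p *= q
    return p

# Monte Carlo check with independent fair-coin coordinates: E restricts the
# first three coordinates to {1}, so P(E) = (1/2)^3 = 0.125.
random.seed(0)
trials = 200_000
hits = sum(
    1 for _ in range(trials)
    if all(random.randint(0, 1) == 1 for _ in range(3))
)
assert abs(hits / trials - product_set_prob([0.5, 0.5, 0.5])) < 0.01
```

Only finitely many coordinates matter, which mirrors the "determined by a finite number of coordinates" observation used in the continuity argument above.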


We have thus proved that P as defined on F_0 is a p.m. The next theorem, which is a generalization of the extension theorem discussed in Theorem 2.2.2 and whose proof can be found in Halmos [4], ensures that it can be extended to F as desired. This extension is called the product measure of the sequence {P_n, n ≥ 1} and denoted by ⨉_{n=1}^∞ P_n, with domain the product field ⨉_{n=1}^∞ F_n.

Theorem 3.3.5. Let F_0 be a field of subsets of an abstract space Ω, and P a p.m. on F_0. There exists a unique p.m. on the B.F. generated by F_0 that agrees with P on F_0.

The uniqueness is proved in Theorem 2.2.3.

Returning to Theorem 3.3.4, it remains to show that the r.v.'s {ω_j} are independent. For each k ≥ 2, the independence of {ω_j, 1 ≤ j ≤ k} is a consequence of the definition in (12); this being so for every k, the independence of the infinite sequence follows by definition.

We now give a statement of Fubini's theorem for product measures, which has already been used above in special cases. It is sufficient to consider the case n = 2. Let Ω = Ω_1 × Ω_2, F = F_1 × F_2 and P = P_1 × P_2 be the product space, B.F., and measure respectively. Let (F̄_1, P̄_1), (F̄_2, P̄_2) and ((F_1 × F_2)¯, (P_1 × P_2)¯) be the completions of (F_1, P_1), (F_2, P_2) and (F_1 × F_2, P_1 × P_2), respectively, according to Theorem 2.2.5.

Fubini's theorem. Suppose that f is measurable with respect to (F_1 × F_2)¯ and integrable with respect to (P_1 × P_2)¯. Then

(i) for each ω_1 ∈ Ω_1\N_1, where N_1 ∈ F̄_1 and P_1(N_1) = 0, f(ω_1, ·) is measurable with respect to F̄_2 and integrable with respect to P̄_2;

(ii) the integral

    ∫_{Ω_2} f(·, ω_2) P̄_2(dω_2)

is measurable with respect to F̄_1 and integrable with respect to P̄_1;

(iii) the following equation between a double and a repeated integral holds:

    (15)  ∫∫_{Ω_1×Ω_2} f(ω_1, ω_2) (P̄_1 × P̄_2)(dω) = ∫_{Ω_1} [ ∫_{Ω_2} f(ω_1, ω_2) P̄_2(dω_2) ] P̄_1(dω_1).

Furthermore, suppose f is positive and measurable with respect to (F_1 × F_2)¯; then if either member of (15) exists finite or infinite, so does the other also and the equation holds.

Finally, if we remove all the completion signs "¯" in the hypotheses as well as the conclusions above, the resulting statement is true with the exceptional set N_1 in (i) empty, provided the word "integrable" in (i) be replaced by "has an integral."

The theorem remains true if the p.m.'s are replaced by σ-finite measures; see, e.g., Royden [5].

The reader should readily recognize the particular cases of the theorem with one or both of the factor measures discrete, so that the corresponding integral reduces to a sum. Thus it includes the familiar theorem on evaluating a double series by repeated summation.

We close this section by stating a generalization of Theorem 3.3.4 to the case where the finite-dimensional joint distributions of the sequence {X_j} are arbitrarily given, subject only to mutual consistency. This result is a particular case of Kolmogorov's extension theorem, which is valid for an arbitrary family of r.v.'s.

Let m and n be integers: 1 ≤ m < n, and define π_{mn} to be the "projection map" of ℬ^m onto ℬ^n given by

    ∀B ∈ ℬ^m : π_{mn}(B) = {(x_1, ..., x_n) : (x_1, ..., x_m) ∈ B}.

Theorem 3.3.6. For each n ≥ 1, let µ^n be a p.m. on (ℛ^n, ℬ^n) such that

    (16)  ∀m < n : µ^n ∘ π_{mn} = µ^m.

Then there exists a probability space (Ω, F, P) and a sequence of r.v.'s {X_j} on it such that for each n, µ^n is the n-dimensional p.m. of the vector (X_1, ..., X_n).

Indeed, the Ω and F may be taken exactly as in Theorem 3.3.4 to be the product space

    ⨉_j Ω_j and ⨉_j F_j,

where (Ω_j, F_j) = (ℛ¹, ℬ¹) for each j; only P is now more general. In terms of d.f.'s, the consistency condition (16) may be stated as follows. For each m ≥ 1 and (x_1, ..., x_m) ∈ ℛ^m, we have if n > m:

    lim_{x_{m+1}→∞, ..., x_n→∞} F_n(x_1, ..., x_m, x_{m+1}, ..., x_n) = F_m(x_1, ..., x_m).

For a proof of this first fundamental theorem in the theory of stochastic processes, see Kolmogorov [8].
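The discrete case mentioned above, evaluating a double series by repeated summation, is easy to check numerically. The following sketch (ours, not the text's) verifies that the two repeated sums of the positive array a_{ij} = 2^{−i} 3^{−j} agree and approach the product of the two geometric series:

```python
# Positive double array a[i][j] = 2^{-i} * 3^{-j}, truncated at 20 terms each.
N = 20
a = [[1 / (2**i * 3**j) for j in range(N)] for i in range(N)]

row_then_col = sum(sum(row) for row in a)                             # sum_i sum_j
col_then_row = sum(sum(a[i][j] for i in range(N)) for j in range(N))  # sum_j sum_i

# Both repeated sums agree (Fubini for counting measures), and both approach
# (sum_i 2^-i)(sum_j 3^-j) = 2 * (3/2) = 3.
assert abs(row_then_col - col_then_row) < 1e-12
assert abs(row_then_col - 3.0) < 1e-4
```

For a positive array the order of summation never matters, exactly as the positive case of (15) asserts.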


EXERCISES

1. Show that two r.v.'s on (Ω, F) may be independent according to one p.m. P but not according to another!

*2. If X_1 and X_2 are independent r.v.'s each assuming the values +1 and −1 with probability 1/2, then the three r.v.'s {X_1, X_2, X_1X_2} are pairwise independent but not totally independent. Find an example of n r.v.'s such that every n − 1 of them are independent but not all of them. Find an example where the events A_1 and A_2 are independent, A_1 and A_3 are independent, but A_1 and A_2 ∪ A_3 are not independent.

*3. If the events {E_α, α ∈ A} are independent, then so are the events {F_α, α ∈ A}, where each F_α may be E_α or E_α^c; also if {A_β, β ∈ B}, where B is an arbitrary index set, is a collection of disjoint countable subsets of A, then the events

    ⋃_{α∈A_β} E_α,  β ∈ B,

are independent.

4. Fields or B.F.'s F_α (⊂ F) of any family are said to be independent iff any collection of events, one from each F_α, forms a set of independent events. Let F_α⁰ be a field generating F_α. Prove that if the fields F_α⁰ are independent, then so are the B.F.'s F_α. Is the same conclusion true if the fields are replaced by arbitrary generating sets? Prove, however, that the conditions (1) and (2) are equivalent. [HINT: Use Theorem 2.1.2.]

5. If {X_α} is a family of independent r.v.'s, then the B.F.'s generated by disjoint subfamilies are independent. [Theorem 3.3.2 is a corollary to this proposition.]

6. The r.v. X is independent of itself if and only if it is constant with probability one. Can X and f(X) be independent where f ∈ ℬ¹?

7. If {E_j, 1 ≤ j < ∞} are independent events, then

    P(⋂_{j=1}^∞ E_j) = ∏_{j=1}^∞ P(E_j),

where the infinite product is defined to be the obvious limit; similarly

    P(⋃_{j=1}^∞ E_j) = 1 − ∏_{j=1}^∞ (1 − P(E_j)).

8. Let {X_j, 1 ≤ j ≤ n} be independent with d.f.'s {F_j, 1 ≤ j ≤ n}. Find the d.f. of max_j X_j and min_j X_j.


*9. If X and Y are independent and E(X) exists, then for any Borel set B, we have

    ∫_{{Y∈B}} X dP = E(X) P(Y ∈ B).

*10. If X and Y are independent and for some p > 0: E(|X + Y|^p) < ∞, then E(|X|^p) < ∞ and E(|Y|^p) < ∞.

11. If X and Y are independent, E(|X|^p) < ∞ for some p ≥ 1, and E(Y) = 0, then E(|X + Y|^p) ≥ E(|X|^p). [This is a case of Theorem 9.3.2; but try a direct proof!]

12. The r.v.'s {ε_j} in Example 4 are related to the "Rademacher functions":

    r_j(x) = sgn(sin 2^j πx).

What is the precise relation?

13. Generalize Example 4 by considering any s-ary expansion where s ≥ 2 is an integer:

    x = Σ_{n=1}^∞ ε_n / s^n,  where ε_n = 0, 1, ..., s − 1.

*14. Modify Example 4 so that to each x in [0, 1] there corresponds a sequence of independent and identically distributed r.v.'s {ε_n, n ≥ 1}, each taking the values 1 and 0 with probabilities p and 1 − p, respectively, where 0 < p < 1. Such a sequence will be referred to as a "coin-tossing game (with probability p for heads)"; when p = 1/2 it is said to be "fair".

15. Modify Example 4 further so as to allow ε_n to take the values 1 and 0 with probabilities p_n and 1 − p_n, where 0 < p_n < 1, but p_n depends on n.

16. Generalize (14) to

    P(C) = ∫_{C(1,...,k)} P((C | ω_1, ..., ω_k)) P_1(dω_1) ··· P_k(dω_k)
         = ∫ P((C | ω_1, ..., ω_k)) P(dω),

where C(1, ..., k) is the set of (ω_1, ..., ω_k) which appears as the first k coordinates of at least one point in C and in the second equation ω_1, ..., ω_k are regarded as functions of ω.


*17. For arbitrary events {E_j, 1 ≤ j ≤ n}, we have

    P(⋃_{j=1}^n E_j) ≥ Σ_{j=1}^n P(E_j) − Σ_{1≤j<k≤n} P(E_jE_k).

If for each n, {E_j^(n), 1 ≤ j ≤ n} are independent events, and

    P(⋃_{j=1}^n E_j^(n)) → 0 as n → ∞,

then

    P(⋃_{j=1}^n E_j^(n)) ~ Σ_{j=1}^n P(E_j^(n)).

18. Prove that ℬ × ℬ ≠ ℬ̄ × ℬ̄, where ℬ̄ is the completion of ℬ with respect to the Lebesgue measure; similarly ℬ̄ × ℬ̄ ≠ (ℬ × ℬ)¯.

19. If f ∈ (F_1 × F_2)¯ and

    ∫∫_{Ω_1×Ω_2} |f| d(P̄_1 × P̄_2) < ∞,

then

    ∫_{Ω_1} [ ∫_{Ω_2} f dP̄_2 ] dP̄_1 = ∫_{Ω_2} [ ∫_{Ω_1} f dP̄_1 ] dP̄_2.

20. A typical application of Fubini's theorem is as follows. If f is a Lebesgue measurable function of (x, y) such that f(x, y) = 0 for each x ∈ ℛ¹ and y ∉ N_x, where m(N_x) = 0 for each x, then we have also f(x, y) = 0 for each y ∉ N and x ∉ N_y, where m(N) = 0 and m(N_y) = 0 for each y ∉ N.


4
Convergence concepts

4.1 Various modes of convergence

As numerical-valued functions, the convergence of a sequence of r.v.'s {X_n, n ≥ 1}, to be denoted simply by {X_n} below, is a well-defined concept. Here and hereafter the term "convergence" will be used to mean convergence to a finite limit. Thus it makes sense to say: for every ω ∈ Δ, where Δ ∈ F, the sequence {X_n(ω)} converges. The limit is then a finite-valued r.v. (see Theorem 3.1.6), say X(ω), defined on Δ. If Ω = Δ, then we have "convergence everywhere", but a more useful concept is the following one.

DEFINITION OF CONVERGENCE "ALMOST EVERYWHERE" (a.e.). The sequence of r.v.'s {X_n} is said to converge almost everywhere [to the r.v. X] iff there exists a null set N such that

    (1)  ∀ω ∈ Ω\N : lim_{n→∞} X_n(ω) = X(ω) finite.

Recall that our convention stated at the beginning of Sec. 3.1 allows each r.v. a null set on which it may be ±∞. The union of all these sets being still a null set, it may be included in the set N in (1) without modifying the conclusion. This type of trivial consideration makes it possible, when dealing with a countable set of r.v.'s that are finite a.e., to regard them as finite everywhere.

The reader should learn at an early stage to reason with a single sample point ω_0 and the corresponding sample sequence {X_n(ω_0), n ≥ 1} as a numerical sequence, and translate the inferences on the latter into probability statements about sets of ω. The following characterization of convergence a.e. is a good illustration of this method.

Theorem 4.1.1. The sequence {X_n} converges a.e. to X if and only if for every ε > 0 we have

    (2)  lim_{m→∞} P{|X_n − X| ≤ ε for all n ≥ m} = 1;

or equivalently

    (2')  lim_{m→∞} P{|X_n − X| > ε for some n ≥ m} = 0.

PROOF. Suppose there is convergence a.e. and let Ω_0 = Ω\N where N is as in (1). For m ≥ 1 let us denote by A_m(ε) the event exhibited in (2), namely:

    (3)  A_m(ε) = ⋂_{n=m}^∞ {|X_n − X| ≤ ε}.

Then A_m(ε) is increasing with m. For each ω_0, the convergence of {X_n(ω_0)} to X(ω_0) implies that given any ε > 0, there exists m(ω_0, ε) such that

    (4)  n ≥ m(ω_0, ε) ⇒ |X_n(ω_0) − X(ω_0)| ≤ ε.

Hence each such ω_0 belongs to some A_m(ε) and so Ω_0 ⊂ ⋃_{m=1}^∞ A_m(ε). It follows from the monotone convergence property of a measure that lim_{m→∞} P(A_m(ε)) = 1, which is equivalent to (2).

Conversely, suppose (2) holds; then we see above that the set A(ε) = ⋃_{m=1}^∞ A_m(ε) has probability equal to one. For any ω_0 ∈ A(ε), (4) is true for the given ε. Let ε run through a sequence of values decreasing to zero, for instance {1/n}. Then the set

    A = ⋂_{n=1}^∞ A(1/n)

still has probability one since

    P(A) = lim_n P(A(1/n)).

If ω_0 belongs to A, then (4) is true for all ε = 1/n, hence for all ε > 0 (why?). This means {X_n(ω_0)} converges to X(ω_0) for all ω_0 in a set of probability one.

A weaker concept of convergence is of basic importance in probability theory.

DEFINITION OF CONVERGENCE "IN PROBABILITY" (in pr.). The sequence {X_n} is said to converge in probability to X iff for every ε > 0 we have

    (5)  lim_{n→∞} P{|X_n − X| > ε} = 0.

Strictly speaking, the definition applies when all X_n and X are finite-valued. But we may extend it to r.v.'s that are finite a.e. either by agreeing to ignore a null set or by the logical convention that a formula must first be defined in order to be valid or invalid. Thus, for example, if X_n(ω) = +∞ and X(ω) = +∞ for some ω, then X_n(ω) − X(ω) is not defined and therefore such an ω cannot belong to the set {|X_n − X| > ε} figuring in (5).

Since (2') clearly implies (5), we have the immediate consequence below.

Theorem 4.1.2. Convergence a.e. [to X] implies convergence in pr. [to X].

Sometimes we have to deal with questions of convergence when no limit is in evidence. For convergence a.e. this is immediately reducible to the numerical case where the Cauchy criterion is applicable. Specifically, {X_n} converges a.e. iff there exists a null set N such that for every ω ∈ Ω\N and every ε > 0, there exists m(ω, ε) such that

    n' > n ≥ m(ω, ε) ⇒ |X_n(ω) − X_{n'}(ω)| ≤ ε.

The following analogue of Theorem 4.1.1 is left to the reader.

Theorem 4.1.3. The sequence {X_n} converges a.e. if and only if for every ε we have

    (6)  lim_{m→∞} P{|X_n − X_{n'}| > ε for some n' > n ≥ m} = 0.

For convergence in pr., the obvious analogue of (6) is

    (7)  lim_{n→∞, n'→∞} P{|X_n − X_{n'}| > ε} = 0.

It can be shown (Exercise 6 of Sec. 4.2) that this implies the existence of a finite r.v. X such that X_n → X in pr.
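The gap between (2') and (5) can be made concrete with the classical "sliding hump" (or "typewriter") sequence on ([0, 1), ℬ, m), a standard companion example not constructed in the text: X_n is the indicator of the n-th dyadic interval as these sweep [0, 1) over and over with shrinking lengths. Then P{X_n ≠ 0} → 0, giving (5), yet every sample point is hit in every sweep, so (2') fails. A sketch in exact arithmetic:

```python
from fractions import Fraction

def typewriter_interval(n):
    """n-th dyadic interval [k/2^m, (k+1)/2^m), n = 0, 1, 2, ...:
    m = floor(log2(n+1)), k = n + 1 - 2^m, so the intervals sweep [0, 1)
    repeatedly with lengths 1, 1/2, 1/2, 1/4, ..., shrinking to 0."""
    m = (n + 1).bit_length() - 1
    k = n + 1 - 2**m
    return Fraction(k, 2**m), Fraction(k + 1, 2**m)

# X_n = indicator of the n-th interval: P{X_n != 0} = 2^{-m} -> 0, so (5) holds
# and X_n -> 0 in pr. ...
left, right = typewriter_interval(1022)
assert right - left == Fraction(1, 512)
# ... yet the fixed sample point w = 1/3 falls in one interval of every sweep,
# so X_n(w) = 1 infinitely often: (2') fails and there is no convergence a.e.
w = Fraction(1, 3)
hits = [n for n in range(1023)
        if typewriter_interval(n)[0] <= w < typewriter_interval(n)[1]]
assert len(hits) == 10   # one hit per sweep m = 0, 1, ..., 9
```

The same sample-point reasoning recommended after Theorem 4.1.1 applies: follow one ω and watch its numerical sequence.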


DEFINITION OF CONVERGENCE IN L^p, 0 < p < ∞. The sequence {X_n} is said to converge in L^p to X iff X_n ∈ L^p, X ∈ L^p and

    (8)  lim_{n→∞} E(|X_n − X|^p) = 0.

In all these definitions above, X_n converges to X if and only if X_n − X converges to 0. Hence there is no loss of generality if we put X ≡ 0 in the discussion, provided that any hypothesis involved can be similarly reduced to this case. We say that X is dominated by Y iff |X| ≤ Y a.e., and that the sequence {X_n} is dominated by Y iff this is true for each X_n with the same Y. We say that X or {X_n} is uniformly bounded iff the Y above may be taken to be a constant.

Theorem 4.1.4. If X_n converges to 0 in L^p, then it converges to 0 in pr. The converse is true provided that {X_n} is dominated by some Y that belongs to L^p.

Remark. If X_n → X in L^p, and {X_n} is dominated by Y, then {X_n − X} is dominated by Y + |X|, which is in L^p. Hence there is no loss of generality to assume X ≡ 0.

PROOF. By Chebyshev inequality with φ(x) = |x|^p, we have

    (9)  P{|X_n| > ε} ≤ E(|X_n|^p) / ε^p.

Letting n → ∞, the right member → 0 by hypothesis, hence so does the left, which is equivalent to (5) with X = 0. This proves the first assertion. If now |X_n| ≤ Y a.e. with E(Y^p) < ∞, then we have

    E(|X_n|^p) = ∫_{{|X_n|≤ε}} |X_n|^p dP + ∫_{{|X_n|>ε}} |X_n|^p dP ≤ ε^p + ∫_{{|X_n|>ε}} Y^p dP.

Since P{|X_n| > ε} → 0, the last-written integral → 0 by Exercise 2 in Sec. 3.2. Letting first n → ∞ and then ε → 0, we obtain E(|X_n|^p) → 0; hence X_n converges to 0 in L^p.

As a corollary, for a uniformly bounded sequence {X_n} convergence in pr. and in L^p are equivalent. The general result is as follows.

Theorem 4.1.5. X_n → 0 in pr. if and only if

    (10)  E( |X_n| / (1 + |X_n|) ) → 0.

Furthermore, the functional ρ(·,·) given by

    ρ(X, Y) = E( |X − Y| / (1 + |X − Y|) )

is a metric in the space of r.v.'s, provided that we identify r.v.'s that are equal a.e.

PROOF. If ρ(X, Y) = 0, then E(|X − Y|/(1 + |X − Y|)) = 0, hence X = Y a.e. by Exercise 1 of Sec. 3.2. To show that ρ(·,·) is a metric it is sufficient to show that

    E( |X − Y| / (1 + |X − Y|) ) ≤ E( |X| / (1 + |X|) ) + E( |Y| / (1 + |Y|) ).

For this we need only verify that for every real x and y:

    (11)  |x + y| / (1 + |x + y|) ≤ |x| / (1 + |x|) + |y| / (1 + |y|).

Since u ↦ u/(1 + u) is increasing for u ≥ 0 and |x + y| ≤ |x| + |y|, the left member of (11) does not exceed (|x| + |y|)/(1 + |x| + |y|), which in turn is at most the right member. For the equivalence in (10), observe that by the same monotonicity, for every ε > 0:

    (ε/(1 + ε)) P{|X_n| > ε} ≤ E( |X_n| / (1 + |X_n|) ) ≤ ε/(1 + ε) + P{|X_n| > ε},

from which both implications follow at once.

Example 1. Convergence in pr. does not imply convergence in L^p.

On (𝒰, ℬ, m) let ω_{k,j} denote the indicator of the interval [(j − 1)/k, j/k), for each k ≥ 1 and 1 ≤ j ≤ k, and arrange these in a single sequence {X_n}; then P{X_n > 0} = 1/k → 0 as n → ∞, so that X_n → 0 in pr. Now if we replace ω_{k,j} by k^{1/p}ω_{k,j}, where p > 0, then again P{X_n > 0} = 1/k → 0 so that X_n → 0 in pr., but for each n, we have E(X_n^p) = 1. Consequently lim_n E(|X_n − 0|^p) = 1 and X_n does not → 0 in L^p.

Example 2. Convergence a.e. does not imply convergence in L^p.

In (𝒰, ℬ, m) define

    X_n(ω) = 2^n, if ω ∈ (0, 1/n); 0, otherwise.

Then E(|X_n|^p) = 2^{np}/n → +∞ for each p > 0, but X_n → 0 everywhere.

The reader should not be misled by these somewhat "artificial" examples to think that such counterexamples are rare or abnormal. Natural examples abound in more advanced theory, such as that of stochastic processes, but they are not as simple as those discussed above. Here are some culled from later chapters, tersely stated. In the symmetrical Bernoullian random walk {S_n, n ≥ 1}, let ξ_n = 1_{{S_n = 0}}. Then lim_n E(ξ_n^p) = 0 for every p > 0, but P{lim_n ξ_n exists} = 0 because of intermittent return of S_n to the origin (see Sec. 8.3). This kind of example can be formulated for any recurrent process such as a Brownian motion. On the other hand, if {η_n, n ≥ 1} denotes the random walk above stopped at 1, then E(η_n) = 0 for all n but P{lim_n η_n = 1} = 1. The same holds for the martingale that consists in tossing a fair coin and doubling the stakes until the first head (see Sec. 9.4). Another striking example is furnished by the simple Poisson process {N(t), t > 0} (see Sec. 5.5). If ξ(t) = N(t)/t, then E(ξ(t)) = α (the rate of the process) for all t > 0; but P{lim_{t↓0} ξ(t) = 0} = 1, because for almost every ω, N(t, ω) = 0 for all sufficiently small values of t. The continuous parameter may be replaced by a sequence t_n ↓ 0.

Finally, we mention another kind of convergence which is basic in functional analysis, but confine ourselves to L¹. The sequence of r.v.'s {X_n} in L¹ is said to converge weakly in L¹ to X iff for each bounded r.v. Y we have

    lim_{n→∞} E(X_nY) = E(XY), finite.

It is easy to see that X ∈ L¹ and is unique up to equivalence by taking Y = 1_{{X≠X'}} if X' is another candidate. Clearly convergence in L¹ defined above implies weak convergence; hence the former is sometimes referred to as "strong". On the other hand, Example 2 above shows that convergence a.e. does not imply weak convergence; whereas Exercises 19 and 20 below show that weak convergence does not imply convergence in pr. (nor even in distribution; see Sec. 4.4).
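Both counterexamples rest on one-line computations, which the sketch below (ours; the helper names are illustrative, not the text's) checks for p = 1:

```python
# Example 1 pattern with p = 1: X_n = k on an interval of length 1/k, 0 elsewhere.
def tail_prob(k):     # P{X_n > 0} for an index n in the k-th block
    return 1 / k

def first_moment(k):  # E|X_n| = k * (1/k)
    return k * (1 / k)

assert tail_prob(10**6) == 1e-06                           # -> 0: convergence in pr.
assert all(first_moment(k) == 1.0 for k in (2, 10, 1000))  # stuck at 1: no L^1 limit

# Example 2: X_n = 2^n on (0, 1/n), 0 elsewhere.
def X(n, w):
    return 2**n if 0 < w < 1 / n else 0

assert X(5, 0.01) == 32      # w = 0.01 is still inside (0, 1/5)
assert X(200, 0.01) == 0     # eventually 1/n < w: X_n(w) -> 0 pointwise
assert 2**30 / 30 > 1e7      # but E|X_n| = 2^n / n -> +infinity
```

In Example 1 a thinning set carries a growing height that keeps the moment alive; in Example 2 the set shrinks but the height outruns it, killing every L^p limit while the pointwise limit is 0.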


EXERCISES

1. X_n → +∞ a.e. if and only if ∀M > 0: P{X_n < M i.o.} = 0.

2. If 0 ≤ X_n, X_n ≤ X ∈ L¹ and X_n → X in pr., then X_n → X in L¹.

3. If X_n → X, Y_n → Y both in pr., then X_n ± Y_n → X ± Y, X_nY_n → XY, all in pr.

*4. Let f be a bounded uniformly continuous function in ℛ¹. Then X_n → 0 in pr. implies E(f(X_n)) → f(0). [Example:

    f(x) = |x| / (1 + |x|)

as in Theorem 4.1.5.]

5. Convergence in L^p implies that in L^r for r < p.

6. If X_n → X, Y_n → Y both in L^p, then X_n ± Y_n → X ± Y in L^p. If X_n → X in L^p and Y_n → Y in L^q, where p > 1 and 1/p + 1/q = 1, then X_nY_n → XY in L¹.

7. If X_n → X in pr. and X_n → Y in pr., then X = Y a.e.

8. If X_n → X a.e. and µ_n and µ are the p.m.'s of X_n and X, it does not follow that µ_n(P) → µ(P) even for all intervals P.

9. Give an example in which E(X_n) → 0 but there does not exist any subsequence {n_k} → ∞ such that X_{n_k} → 0 in pr.

*10. Let f be a continuous function on ℛ¹. If X_n → X in pr., then f(X_n) → f(X) in pr. The result is false if f is merely Borel measurable. [HINT: Truncate f at ±A for large A.]

11. The extended-valued r.v. X is said to be bounded in pr. iff for each ε > 0, there exists a finite M(ε) such that P{|X| ≤ M(ε)} ≥ 1 − ε. Prove that X is bounded in pr. if and only if it is finite a.e.

12. The sequence of extended-valued r.v.'s {X_n} is said to be bounded in pr. iff sup_n |X_n| is bounded in pr.; {X_n} is said to diverge to +∞ in pr. iff for each M > 0 and ε > 0 there exists a finite n_0(M, ε) such that if n ≥ n_0, then P{|X_n| > M} ≥ 1 − ε. Prove that if {X_n} diverges to +∞ in pr. and {Y_n} is bounded in pr., then {X_n + Y_n} diverges to +∞ in pr.

13. If sup_n X_n = +∞ a.e., there need exist no subsequence {X_{n_k}} that diverges to +∞ in pr.

14. It is possible that for each ω, lim_n X_n(ω) = +∞, but there does not exist a subsequence {n_k} and a set Δ of positive probability such that lim_k X_{n_k}(ω) = +∞ on Δ. [HINT: On (𝒰, ℬ) define X_n(ω) according to the nth digit of ω.]

*15. Instead of the ρ in Theorem 4.1.5 one may define other metrics as follows. Let ρ_1(X, Y) be the infimum of all ε > 0 such that

    P(|X − Y| > ε) ≤ ε.

Let ρ_2(X, Y) be the infimum of P{|X − Y| > ε} + ε over all ε > 0. Prove that these are metrics and that convergence in pr. is equivalent to convergence according to either metric.

*16. Convergence in pr. for arbitrary r.v.'s may be reduced to that of bounded r.v.'s by the transformation

    X' = arctan X.

In a space of uniformly bounded r.v.'s, convergence in pr. is equivalent to that in the metric ρ_0(X, Y) = E(|X − Y|); this reduces to the definition given in Exercise 8 of Sec. 3.2 when X and Y are indicators.

17. Unlike convergence in pr., convergence a.e. is not expressible by means of metric. [HINT: Metric convergence has the property that if ρ(x_n, x) ↛ 0, then there exist ε > 0 and {n_k} such that ρ(x_{n_k}, x) ≥ ε for every k.]

18. If X_n ↓ X a.s., each X_n is integrable and inf_n E(X_n) > −∞, then X_n → X in L¹.

19. Let f_n(x) = 1 + cos 2πnx, f(x) = 1 in [0, 1]. Then for each g ∈ L¹[0, 1] we have

    ∫_0^1 f_n g dx → ∫_0^1 fg dx,

but f_n does not converge to f in measure. [HINT: This is just the Riemann–Lebesgue lemma in Fourier series, but we have made f_n ≥ 0 to stress a point.]

20. Let {X_n} be a sequence of independent r.v.'s with zero mean and unit variance. Prove that for any bounded r.v. Y we have lim_n E(X_nY) = 0. [HINT: Consider E{[Y − Σ_{k=1}^n E(X_kY)X_k]²} to get Bessel's inequality E(Y²) ≥ Σ_{k=1}^n E(X_kY)². The stated result extends to the case where the X_n's are assumed only to be uniformly integrable (see Sec. 4.5) but now we must approximate Y by a function of (X_1, ..., X_n), cf. Theorem 8.1.1.]

4.2 Almost sure convergence; Borel–Cantelli lemma

An important concept in set theory is that of the "lim sup" and "lim inf" of a sequence of sets. These notions can be defined for subsets of an arbitrary space Ω.


DEFINITION. Let E_n be any sequence of subsets of Ω; we define

    lim sup_n E_n = ⋂_{m=1}^∞ ⋃_{n=m}^∞ E_n,   lim inf_n E_n = ⋃_{m=1}^∞ ⋂_{n=m}^∞ E_n.

Let us observe at once that

    (1)  lim inf_n E_n = (lim sup_n E_n^c)^c

so that in a sense one of the two notions suffices, but it is convenient to employ both. The main properties of these sets will be given in the following two propositions.

(i) A point belongs to lim sup_n E_n if and only if it belongs to infinitely many terms of the sequence {E_n, n ≥ 1}. A point belongs to lim inf_n E_n if and only if it belongs to all terms of the sequence from a certain term on. It follows in particular that both sets are independent of the enumeration of the E_n's.

PROOF. We shall prove the first assertion. Since a point belongs to all E_n from a certain value of n on if and only if it does not belong to infinitely many of the E_n^c's, the second assertion will follow from (1). Now if ω belongs to infinitely many of the E_n's, then it belongs to

    F_m = ⋃_{n=m}^∞ E_n for every m;

hence it belongs to

    ⋂_{m=1}^∞ F_m = lim sup_n E_n.

Conversely, if ω belongs to ⋂_{m=1}^∞ F_m, then ω ∈ F_m for every m. Were ω to belong to only a finite number of the E_n's there would be an m such that ω ∉ E_n for n ≥ m, so that

    ω ∉ ⋃_{n=m}^∞ E_n = F_m.

This contradiction proves that ω must belong to an infinite number of the E_n's.

In more intuitive language: the event lim sup_n E_n occurs if and only if the events E_n occur infinitely often. Thus we may write

    P(lim sup_n E_n) = P(E_n i.o.)

where the abbreviation "i.o." stands for "infinitely often". The advantage of such a notation is better shown if we consider, for example, the events "|X_n| > ε" and the probability P{|X_n| > ε i.o.}; see Theorem 4.2.3 below.

(ii) If each E_n ∈ F, then we have

    (2)  P(lim sup_n E_n) = lim_{m→∞} P(⋃_{n=m}^∞ E_n);

    (3)  P(lim inf_n E_n) = lim_{m→∞} P(⋂_{n=m}^∞ E_n).

PROOF. Using the notation in the preceding proof, it is clear that F_m decreases as m increases. Hence by the monotone property of p.m.'s:

    P(⋂_{m=1}^∞ F_m) = lim_{m→∞} P(F_m),

which is (2); (3) is proved similarly or via (1).

Theorem 4.2.1. We have for arbitrary events {E_n}:

    (4)  Σ_n P(E_n) < ∞ ⇒ P(E_n i.o.) = 0.

PROOF. By Boole's inequality for p.m.'s, we have

    P(F_m) ≤ Σ_{n=m}^∞ P(E_n).

Hence the hypothesis in (4) implies that P(F_m) → 0, and the conclusion in (4) now follows by (2).

As an illustration of the convenience of the new notions, we may restate Theorem 4.1.1 as follows. The intuitive content of condition (5) below is the point being stressed here.

Theorem 4.2.2. X_n → 0 a.e. if and only if

    (5)  ∀ε > 0 : P{|X_n| > ε i.o.} = 0.

PROOF. Using the notation A_m(ε) in (3) of Sec. 4.1 (with X ≡ 0), we have

    {|X_n| > ε i.o.} = ⋂_{m=1}^∞ ⋃_{n=m}^∞ {|X_n| > ε} = ⋂_{m=1}^∞ A_m(ε)^c.


According to Theorem 4.1.1, X_n → 0 a.e. if and only if for each ε > 0, P(A_m(ε)^c) → 0 as m → ∞; since A_m(ε)^c decreases as m increases, this is equivalent to (5) as asserted.

Theorem 4.2.3. If X_n → X in pr., then there exists a sequence {n_k} of integers increasing to infinity such that X_{n_k} → X a.e. Briefly stated: convergence in pr. implies convergence a.e. along a subsequence.

PROOF. We may suppose X ≡ 0 as explained before. Then the hypothesis may be written as

    ∀k > 0 : lim_{n→∞} P{ |X_n| > 1/2^k } = 0.

It follows that for each k we can find n_k such that

    P{ |X_{n_k}| > 1/2^k } ≤ 1/2^k,

and consequently

    Σ_k P{ |X_{n_k}| > 1/2^k } ≤ Σ_k 1/2^k < ∞.

Having so chosen {n_k}, we let E_k be the event "|X_{n_k}| > 1/2^k". Then we have by (4):

    P{ |X_{n_k}| > 1/2^k i.o. } = 0.

[Note: Here the index involved in "i.o." is k; such ambiguity being harmless and expedient.] This implies X_{n_k} → 0 a.e. (why?), finishing the proof.

EXERCISES

1. Prove that

    P(lim sup_n E_n) ≥ lim sup_n P(E_n),   P(lim inf_n E_n) ≤ lim inf_n P(E_n).

2. Let {B_n} be a countable collection of Borel sets in 𝒰. If there exists a δ > 0 such that m(B_n) ≥ δ for every n, then there is at least one point in 𝒰 that belongs to infinitely many B_n's.

3. If {X_n} converges to a finite limit a.e., then for any ε there exists M(ε) < ∞ such that P{sup_n |X_n| ≤ M(ε)} ≥ 1 − ε.


*4. For any sequence of r.v.'s {X_n} there exists a sequence of constants {A_n} such that X_n/A_n → 0 a.e.

*5. If {X_n} converges on the set C, then for any ε > 0, there exists C_0 ⊂ C with P(C\C_0) < ε such that X_n converges uniformly in C_0. [This is Egorov's theorem. We may suppose C = Ω and the limit to be zero. Let F_{mk} = ⋂_{n=k}^∞ {ω : |X_n(ω)| ≤ 1/m}; then ∀m, ∃k(m) such that

    P(F_{m,k(m)}) ≥ 1 − ε/2^m.

Take C_0 = ⋂_{m=1}^∞ F_{m,k(m)}.]

6. Cauchy convergence of {X_n} in pr. (or in L^p) implies the existence of an X (finite a.e.), such that X_n converges to X in pr. (or in L^p). [HINT: Choose n_k so that

    Σ_k P{ |X_{n_{k+1}} − X_{n_k}| > 1/2^k } < ∞;

cf. Theorem 4.2.3.]

*7. {X_n} converges in pr. to X if and only if every subsequence {X_{n_k}} contains a further subsequence that converges a.e. to X. [HINT: Use Theorem 4.1.5.]

8. Let {X_n, n ≥ 1} be any sequence of functions on Ω to ℛ¹ and let C denote the set of ω for which the numerical sequence {X_n(ω), n ≥ 1} converges. Show that

    C = ⋂_{m=1}^∞ ⋃_{n=1}^∞ ⋂_{n'=n+1}^∞ { |X_n − X_{n'}| ≤ 1/m } = ⋂_{m=1}^∞ ⋃_{n=1}^∞ ⋂_{n'=n+1}^∞ Λ(m, n, n'),

where

    Λ(m, n, n') = { ω : max_{n<j≤k≤n'} |X_j(ω) − X_k(ω)| ≤ 1/m }.


…if for every pair of real numbers a < b we have P{X_n < a i.o.; X_n > b i.o.} = 0, then lim_{n→∞} X_n exists a.e. but it may be infinite. [HINT: Consider all pairs of rational numbers (a, b) and take a union over them.]

Under the assumption of independence, Theorem 4.2.1 has a striking complement.

Theorem 4.2.4. If the events {E_n} are independent, then

    (6)  Σ_n P(E_n) = ∞ ⇒ P(E_n i.o.) = 1.

PROOF. By (3) we have

    (7)  P(lim inf_n E_n^c) = lim_{m→∞} P(⋂_{n=m}^∞ E_n^c).

The events {E_n^c} are independent as well as {E_n}, by Exercise 3 of Sec. 3.3; hence we have if m' > m:

    P(⋂_{n=m}^{m'} E_n^c) = ∏_{n=m}^{m'} P(E_n^c) = ∏_{n=m}^{m'} (1 − P(E_n)).

Now for any x ≥ 0, we have 1 − x ≤ e^{−x}; it follows that the last term above does not exceed

    ∏_{n=m}^{m'} e^{−P(E_n)} = exp( − Σ_{n=m}^{m'} P(E_n) ).

Letting m' → ∞, the right member above → 0, since the series in the exponent → +∞ by hypothesis. It follows by monotone property that

    P(⋂_{n=m}^∞ E_n^c) = lim_{m'→∞} P(⋂_{n=m}^{m'} E_n^c) = 0.

Thus the right member in (7) is equal to 0, and consequently so is the left member in (7). This is equivalent to P(E_n i.o.) = 1 by (1).

Theorems 4.2.1 and 4.2.4 together will be referred to as the Borel–Cantelli lemma, the former the "convergence part" and the latter "the divergence part". The first is more useful since the events there may be completely arbitrary. The second has an extension to pairwise independent r.v.'s; although the result is of some interest, it is the method of proof to be given below that is more important. It is a useful technique in probability theory.
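The dichotomy between the convergence and divergence parts shows up clearly in simulation. In the sketch below (ours; the seed, sample sizes, and cutoffs are arbitrary choices), independent events with P(E_n) = 1/n² produce only a handful of occurrences in an entire sample sequence, while independent events with P(E_n) = 1/n keep occurring, the count up to N growing like log N:

```python
import random

random.seed(7)
N = 100_000

# Convergence part (Theorem 4.2.1): P(E_n) = 1/n^2, sum = pi^2/6 < oo.
# In a sample of the whole sequence only a handful of events occur.
occurrences = [n for n in range(1, N + 1) if random.random() < 1 / n**2]
assert len(occurrences) < 15          # finitely many, typically 1 to 3

# Divergence part (Theorem 4.2.4): P(E_n) = 1/n, sum = oo; the events are
# independent, so they occur i.o.; the count up to N has mean ~ log N ~ 12.1.
counts = [sum(1 for n in range(1, N + 1) if random.random() < 1 / n)
          for _ in range(20)]
avg = sum(counts) / len(counts)
assert 8 < avg < 16
assert min(counts) >= 1               # E_1 always occurs, and they keep coming
```

Of course a finite simulation cannot certify an "i.o." statement; it only exhibits the growth of J_k = Σ_{n≤k} I_n that the proof of Theorem 4.2.5 below exploits.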


Theorem 4.2.5. The implication (6) remains true if the events {E_n} are pairwise independent.

PROOF. Let I_n denote the indicator of E_n, so that our present hypothesis becomes

    (8)  ∀m ≠ n : E(I_mI_n) = E(I_m)E(I_n).

Consider the series of r.v.'s: Σ_{n=1}^∞ I_n(ω). It diverges to +∞ if and only if an infinite number of its terms are equal to one, namely if ω belongs to an infinite number of the E_n's. Hence the conclusion in (6) is equivalent to

    (9)  P{ Σ_{n=1}^∞ I_n = +∞ } = 1.

What has been said so far is true for arbitrary E_n's. Now the hypothesis in (6) may be written as

    Σ_{n=1}^∞ E(I_n) = +∞.

Consider the partial sum J_k = Σ_{n=1}^k I_n. Using Chebyshev's inequality, we have for every A > 0:

    (10)  P{ |J_k − E(J_k)| ≤ A σ(J_k) } ≥ 1 − σ²(J_k)/(A²σ²(J_k)) = 1 − 1/A²,

where σ²(J) denotes the variance of J. Writing

    p_n = E(I_n) = P(E_n),

we may calculate σ²(J_k) by using (8), as follows:

    E(J_k²) = E( Σ_{n=1}^k I_n² + 2 Σ_{1≤m<n≤k} I_mI_n ) = Σ_{n=1}^k p_n + 2 Σ_{1≤m<n≤k} p_mp_n,

since I_n² = I_n. Hence

    σ²(J_k) = E(J_k²) − E(J_k)² = Σ_{n=1}^k (p_n − p_n²) = Σ_{n=1}^k σ²(I_n).

This calculation will turn out to be a particular case of a simple basic formula; see (6) of Sec. 5.1. Since Σ_{n=1}^k p_n = E(J_k) → ∞, it follows that

    σ(J_k) ≤ E(J_k)^{1/2} = o(E(J_k))

in the classical "o, O" notation of analysis. Hence if k ≥ k_0(A), (10) implies

    P{ J_k ≥ ½ E(J_k) } ≥ 1 − 1/A²

(where ½ may be replaced by any constant < 1). Since J_k increases with k the inequality above holds a fortiori when the first J_k there is replaced by lim_{k→∞} J_k; after that we can let k → ∞ in E(J_k) to obtain

    P{ lim_{k→∞} J_k = +∞ } ≥ 1 − 1/A².

Since the left member does not depend on A, and A is arbitrary, this implies that lim_{k→∞} J_k = +∞ a.e., namely (9).

Corollary. If the events {E_n} are pairwise independent, then

    P(lim sup_n E_n) = 0 or 1

according as Σ_n P(E_n) < ∞ or = ∞.

This is an example of a "zero-or-one" law to be discussed in Chapter 8, though it is not included in any of the general results there.

EXERCISES

Below X_n, Y_n are r.v.'s, E_n events.

11. Give a trivial example of dependent {E_n} satisfying the hypothesis but not the conclusion of (6); give a less trivial example satisfying the hypothesis but with P(lim sup_n E_n) = 0. [HINT: Let E_n be the event that a real number in [0, 1] has its n-ary expansion begin with 0.]

*12. Prove that the probability of convergence of a sequence of independent r.v.'s is equal to zero or one.

13. If {X_n} is a sequence of independent and identically distributed r.v.'s not constant a.e., then P{X_n converges} = 0.

*14. If {X_n} is a sequence of independent r.v.'s with d.f.'s {F_n}, then P{lim_n X_n = 0} = 1 if and only if ∀ε > 0: Σ_n {1 − F_n(ε) + F_n(−ε)} < ∞.

15. If Σ_n P(|X_n| > n) < ∞, then

    lim sup_n |X_n|/n ≤ 1 a.e.

*16. Strengthen Theorem 4.2.4 by proving that

    lim_n J_n/E(J_n) = 1 a.e.

[HINT: Take a subsequence {k_n} such that E(J_{k_n}) ~ n²; prove the result first for this subsequence by estimating P{|J_k − E(J_k)| > δE(J_k)}; the general case follows because if k_n ≤ k < k_{n+1}, then

    J_{k_n}/E(J_{k_{n+1}}) ≤ J_k/E(J_k) ≤ J_{k_{n+1}}/E(J_{k_n}).]

*17. … lim_{n→∞} P{ … ≥ 1 } > 0. [This is due to Kochen and Stone. Truncate X_n at A to Y_n with A so large that E(Y_n) > 1 − ε for all n; then apply Exercise 6 of Sec. 3.2.]

*18. Let {E_n} be events and {I_n} their indicators. Prove the inequality

    (11)  P(⋃_{k=1}^n E_k) ≥ ( Σ_{k=1}^n P(E_k) )² / E( ( Σ_{k=1}^n I_k )² ).

Deduce from this that if (i) Σ_n P(E_n) = ∞ and (ii) there exists c > 0 such that we have

    ∀m < n : P(E_mE_n) ≤ c P(E_m)P(E_{n−m});

then P{lim sup_n E_n} > 0.

19. If Σ_n P(E_n) = ∞ and

    lim_n ( Σ_{j=1}^n Σ_{k=1}^n P(E_jE_k) ) / ( Σ_{k=1}^n P(E_k) )² = 1,

then P{lim sup_n E_n} = 1. [HINT: Use (11) above.]


84 1 CONVERGENCE CONCEPTS20. Let {En } be arbitrary events satisfying(i) lim /'(En) = 0, (ii) 'T(EnEi +1 ) < 00 ;nnthen ~fllim sup„ E„ } = 0 . [This is due to Barndorff-Nielsen .]4.3 Vague convergenceIf a sequence of r .v .'s {X n } tends to a limit, the corresponding sequence ofp.m .'s {,u } ought to tend to a limit in some sense . Is it true that lim n µ,, (A)exists for all A E A 1 or at least for all intervals A? The answer is no fromtrivial examples .Example 1 . Let X„ = c„ where the c„ 's are constants tending to zero . Then X„ - -* 0deterministically . For any interval I such that 0 0 I, where I is the closure of I, wehave limo µ o (I) = 0 = µ(I) ; for any interval such that 0 E I°, where I° is the interiorof I, we have limo µ„ (I) = 1 = µ(I) . But if [Co) oscillates between strictly positiveand strictly negative values, and I = (a, 0) or (0, b), where a < 0 < b, then µ o (I )oscillates between 0 and 1, while µ(I) = 0 . On the other hand, if I = (a, 0] or [0, b),then µ o (I) oscillates as before but µ (I) = 1 . Observe that {0} is the sole atom of µand it is the root of the trouble .Instead of the point masses concentrated at c o , we may consider, e .g ., r .v .'s {X,}having uniform distributions over intervals (c,, c ;,) where c, < 0 < c', and c, -+ 0,c„ -* 0 . Then again X„ -+ 0 a.e . but ,a, ((a, 0)) may not converge at all, or convergeto any number between 0 and 1 .Next, even if {µ„ } does converge in some weaker sense, is the limit necessarilya p.m .? The answer is again no .Example 2 . Let X„ = c„ where c„ --+ +oo . Then X o -+ +oo deterministically .According to our definition of a r.v ., the constant +oo indeed qualifies . But forany finite interval (a, b) we have limo µ„ ((a, b)) = 0, so any limiting measure mustalso be identically zero . This example can be easily ramified ; e .g . let a„ --* -oc,b„ -+ +oo anda„ with probability a,X„ = 0 with probability 1 - a - ~,b„ with probability fi .Then X„ -). 
X where

X = −∞ with probability α, 0 with probability 1 − α − β, +∞ with probability β.

For any finite interval (a, b) containing 0 we have

lim_n µ_n((a, b)) = lim_n µ_n({0}) = 1 − α − β.
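The oscillation in Example 1 is easy to check by direct computation. The following Python sketch is illustrative only (the function name and the particular sequence c_n = (−1)^n/(n+1) are our choices, not the text's): it evaluates µ_n(I) for the point mass at c_n, first on I = (−1, 0), where the values oscillate, then on an interval whose closure avoids the atom 0, where the limit exists.

```python
def point_mass_measure(c, a, b):
    """mu((a, b)) for the unit point mass at c (open interval)."""
    return 1.0 if a < c < b else 0.0

# c_n = (-1)^n / (n+1) tends to 0, yet mu_n((-1, 0)) oscillates between 0 and 1:
values = [point_mass_measure((-1) ** n / (n + 1), -1.0, 0.0) for n in range(1, 7)]
print(values)  # [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

# For an interval whose closure avoids the atom 0, the limit exists (and is 0):
away = [point_mass_measure((-1) ** n / (n + 1), 0.5, 1.0) for n in range(1, 7)]
print(away)    # all 0.0
```

As the example predicts, convergence of µ_n(I) fails precisely when an endpoint of I is an atom of the limit.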


In this situation it is said that masses of amount α and β "have wandered off to −∞ and +∞ respectively." The remedy here is obvious: we should consider measures on the extended line R* = [−∞, +∞], with possible atoms at {+∞} and {−∞}. We leave this to the reader but proceed to give the appropriate definitions which take into account the two kinds of troubles discussed above.

DEFINITION. A measure µ on (R¹, B¹) with µ(R¹) ≤ 1 will be called a subprobability measure (s.p.m.).

DEFINITION OF VAGUE CONVERGENCE. A sequence {µ_n, n ≥ 1} of s.p.m.'s is said to converge vaguely to an s.p.m. µ iff there exists a dense subset D of R¹ such that

(1) ∀a ∈ D, b ∈ D, a < b: µ_n((a, b]) → µ((a, b]).

This will be denoted by

(2) µ_n →^v µ,

and µ is called the vague limit of {µ_n}. We shall see below that it is unique.

For brevity's sake, we will write µ((a, b]) as µ(a, b] below, and similarly for other kinds of intervals. An interval (a, b) is called a continuity interval of µ iff neither a nor b is an atom of µ; in other words iff µ(a, b) = µ[a, b]. As a notational convention, µ(a, b) = 0 when a ≥ b.

Theorem 4.3.1. Let {µ_n} and µ be s.p.m.'s. The following propositions are equivalent.

(i) For every finite interval (a, b) and ε > 0, there exists an n₀(a, b, ε) such that if n ≥ n₀, then

(3) µ(a + ε, b − ε) − ε ≤ µ_n(a, b) ≤ µ(a − ε, b + ε) + ε.

Here and hereafter the first term is interpreted as 0 if a + ε > b − ε.

(ii) For every continuity interval (a, b] of µ, we have µ_n(a, b] → µ(a, b].

(iii) µ_n →^v µ.

PROOF. To prove that (i) ⇒ (ii), let (a, b) be a continuity interval of µ. It follows from the monotone property of a measure that

lim_{ε↓0} µ(a + ε, b − ε) = µ(a, b) = µ[a, b] = lim_{ε↓0} µ(a − ε, b + ε).

Letting n → ∞, then ε ↓ 0 in (3), we have

µ(a, b) ≤ lim inf_n µ_n(a, b) ≤ lim sup_n µ_n[a, b] ≤ µ[a, b] = µ(a, b),


which proves (ii); indeed (a, b] may be replaced by (a, b) or [a, b) or [a, b] there. Next, since the set of atoms of µ is countable, the complementary set D is certainly dense in R¹. If a ∈ D, b ∈ D, then (a, b) is a continuity interval of µ. This proves (ii) ⇒ (iii). Finally, suppose (iii) is true so that (1) holds. Given any (a, b) and ε > 0, there exist a₁, a₂, b₁, b₂, all in D, satisfying

a − ε < a₁ < a < a₂ < a + ε, b − ε < b₁ < b < b₂ < b + ε. …


PROOF. The employment of two numbers δ and ε instead of one ε as in (3) is an apparent extension only (why?). Since (i′) ⇒ (i), the preceding theorem yields at once (i′) ⇒ (ii) ⇔ (iii). It remains to prove (ii) ⇒ (i′) when the µ_n's and µ are p.m.'s. Let A denote the set of atoms of µ. Then there exist an integer ℓ and a_j ∈ A^c, 1 ≤ j ≤ ℓ, satisfying: …


which states: The set of all s.p.m.'s is sequentially compact with respect to vague convergence. It is often referred to as "Helly's extraction (or selection) principle".

Theorem 4.3.3. Given any sequence of s.p.m.'s, there is a subsequence that converges vaguely to an s.p.m.

PROOF. Here it is convenient to consider the subdistribution function (s.d.f.) F_n defined as follows:

∀x: F_n(x) = µ_n((−∞, x]).

If µ_n is a p.m., then F_n is just its d.f. (see Sec. 2.2); in general F_n is increasing, right continuous with F_n(−∞) = 0 and F_n(+∞) = µ_n(R¹) ≤ 1.

Let D be a countable dense set of R¹, and let {r_k, k ≥ 1} be an enumeration of it. The sequence of numbers {F_n(r₁), n ≥ 1} is bounded, hence by the Bolzano–Weierstrass theorem there is a subsequence {F_{1k}, k ≥ 1} of the given sequence such that the limit

lim_{k→∞} F_{1k}(r₁) = ℓ₁

exists; clearly 0 ≤ ℓ₁ ≤ 1. Next, the sequence of numbers {F_{1k}(r₂), k ≥ 1} is bounded, hence there is a subsequence {F_{2k}, k ≥ 1} of {F_{1k}, k ≥ 1} such that

lim_{k→∞} F_{2k}(r₂) = ℓ₂,

where 0 ≤ ℓ₂ ≤ 1. Since {F_{2k}} is a subsequence of {F_{1k}}, it converges also at r₁ to ℓ₁. Continuing, we obtain

F₁₁, F₁₂, …, F_{1k}, … converging at r₁;
F₂₁, F₂₂, …, F_{2k}, … converging at r₁, r₂;
……
F_{j1}, F_{j2}, …, F_{jk}, … converging at r₁, r₂, …, r_j;
……

Now consider the diagonal sequence {F_{kk}, k ≥ 1}. We assert that it converges at every r_j, j ≥ 1. To see this let r_j be given. Apart from the first j − 1 terms, the sequence {F_{kk}, k ≥ 1} is a subsequence of {F_{jk}, k ≥ 1}, which converges at r_j and hence lim_{k→∞} F_{kk}(r_j) = ℓ_j, as desired.

We have thus proved the existence of an infinite subsequence {n_k} and a function G defined and increasing on D such that

∀r ∈ D: lim_{k→∞} F_{n_k}(r) = G(r).

From G we define a function F on R¹ as follows:

∀x ∈ R¹: F(x) = inf_{x < r ∈ D} G(r).
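The selection phenomenon behind this argument can be seen in miniature. In the Python sketch below (illustrative, with names of our choosing), F_n is the s.d.f. of the point mass at (−1)^n: the full sequence fails to converge pointwise, but the subsequence of even indices converges at every rational, which is exactly what the diagonal extraction guarantees in general.

```python
def F(n, x):
    """s.d.f. F_n(x) = mu_n((-inf, x]) for mu_n = point mass at (-1)^n."""
    return 1.0 if x >= (-1) ** n else 0.0

rationals = [-2.0, -1.0, -0.5, 0.0, 1.0, 1.5]      # a few points of a dense set
full = [[F(n, r) for n in range(1, 9)] for r in rationals]
even = [[F(n, r) for n in range(2, 9, 2)] for r in rationals]
print(full[2])   # at r = -0.5 the full sequence oscillates: 1, 0, 1, 0, ...
print(even[2])   # along the even subsequence it is constant
```

Here one subsequence already works at every point; in general a different subsequence is needed for each r_k, and the diagonal sequence reconciles them all.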


By Sec. 1.1, (vii), F is increasing and right continuous. Let C denote the set of its points of continuity; C is dense in R¹, and we show that

(8) ∀x ∈ C: lim_{k→∞} F_{n_k}(x) = F(x).

For, let x ∈ C and ε > 0 be given; there exist r, r′, and r″ in D such that r < r′ ≤ x < r″ and F(r″) − F(r) < ε. Then we have

F(r) ≤ …


infinity such that the numbers µ_{n_k}(a, b) converge to a limit, say L ≠ µ(a, b). By Theorem 4.3.3, the sequence {µ_{n_k}, k ≥ 1} contains a subsequence, say {µ_{n′_k}, k ≥ 1}, which converges vaguely, hence to µ by hypothesis of the theorem. Hence again by Theorem 4.3.1, (ii), we have

µ_{n′_k}(a, b) → µ(a, b).

But the left side also → L, which is a contradiction.

EXERCISES

1. Perhaps the most logical approach to vague convergence is as follows. The sequence {µ_n, n ≥ 1} of s.p.m.'s is said to converge vaguely iff there exists a dense subset D of R¹ such that for every a ∈ D, b ∈ D, a < b, the sequence {µ_n(a, b], n ≥ 1} converges. The definition given before implies this, of course, but prove the converse.

2. Prove that if (1) is true, then there exists a dense set D′ such that µ_n(I) → µ(I), where I may be any of the four intervals (a, b), (a, b], [a, b), [a, b] with a ∈ D′, b ∈ D′.

3. Can a sequence of absolutely continuous p.m.'s converge vaguely to a discrete p.m.? Can a sequence of discrete p.m.'s converge vaguely to an absolutely continuous p.m.?

4. If a sequence of p.m.'s converges vaguely to an atomless p.m., then the convergence is uniform for all intervals, finite or infinite. (This is due to Pólya.)

5. Let {f_n} be a sequence of functions increasing in R¹ and uniformly bounded there: sup_{n,x} |f_n(x)| ≤ M < ∞. Prove that there exists an increasing function f on R¹ and a subsequence {n_k} such that f_{n_k}(x) → f(x) for every x. (This is a form of Theorem 4.3.3 frequently given; the insistence on "every x" requires an additional argument.)

6. Let {µ_n} be a sequence of finite measures on B¹. It is said to converge vaguely to a measure µ iff (1) holds. The limit µ is not necessarily a finite measure. But if µ_n(R¹) is bounded in n, then µ is finite.

7. If P_n is a sequence of p.m.'s on (Ω, F) such that P_n(E) converges for every E ∈ F, then the limit is a p.m. P.
Furthermore, if f is bounded and F-measurable, then

∫_Ω f dP_n → ∫_Ω f dP.

(The first assertion is the Vitali–Hahn–Saks theorem and rather deep, but it can be proved by reducing it to a problem of summability; see A. Rényi [24].)


8. If µ_n and µ are p.m.'s and µ_n(E) → µ(E) for every open set E, then this is also true for every Borel set. [HINT: Use (7) of Sec. 2.2.]

9. Prove a convergence theorem in metric space that will include both Theorem 4.3.3 for p.m.'s and the analogue for real numbers given before the theorem. [HINT: Use Exercise 9 of Sec. 4.4.]

4.4 Continuation

We proceed to discuss another kind of criterion, which is becoming ever more popular in measure theory as well as functional analysis. This has to do with classes of continuous functions on R¹:

C_K = the class of continuous functions f each vanishing outside a compact set K(f);

C₀ = the class of continuous functions f such that lim_{|x|→∞} f(x) = 0;

C_B = the class of bounded continuous functions;

C = the class of continuous functions.

We have C_K ⊂ C₀ ⊂ C_B ⊂ C. It is well known that C₀ is the closure of C_K with respect to uniform convergence.

An arbitrary function f defined on an arbitrary space is said to have support in a subset S of the space iff it vanishes outside S. Thus if f ∈ C_K, then it has support in a certain compact set, hence also in a certain compact interval. A step function on a finite or infinite interval (a, b) is one with support in it such that f(x) = c_j for x ∈ (a_j, a_{j+1}) for 1 ≤ j ≤ ℓ, where ℓ is finite, a = a₁ < ⋯ < a_{ℓ+1} = b, and the c_j's are arbitrary real numbers. It will be called D-valued iff all the a_j's and c_j's belong to a given set D. When the interval (a, b) is R¹, f is called just a step function. Note that the values of f at the points a_j are left unspecified to allow for flexibility; frequently they are defined by right or left continuity. The following lemma is basic.

Approximation Lemma. Suppose that f ∈ C_K has support in the compact interval [a, b].
Given any dense subset A of R¹ and ε > 0, there exists an A-valued step function f_ε on (a, b) such that

(1) sup_{x ∈ R¹} |f(x) − f_ε(x)| ≤ ε.

If f ∈ C₀, the same is true if (a, b) is replaced by R¹.

This lemma becomes obvious as soon as its geometric meaning is grasped. In fact, for any f in C_K, one may even require that either f_ε ≤ f or f_ε ≥ f.
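The lemma can also be tested numerically. In the Python sketch below (all names are ours; the mesh size and the choice of value set are arbitrary), the levels are snapped to multiples of ε/2, so the step function takes values in a countable dense set, in the spirit of the A-valued clause; uniform continuity of f on its compact support is what makes a fine enough mesh work.

```python
def step_approximation(f, a, b, eps, mesh=200):
    """Knots and levels of a step function within eps of f on [a, b]."""
    knots = [a + (b - a) * j / mesh for j in range(mesh + 1)]
    # Levels snapped to multiples of eps/2, a countable dense value set.
    levels = [round(f(x) / (eps / 2)) * (eps / 2) for x in knots[:-1]]
    return knots, levels

def step_eval(knots, levels, x):
    """Evaluate the step function; it vanishes outside [a, b)."""
    for j in range(len(levels)):
        if knots[j] <= x < knots[j + 1]:
            return levels[j]
    return 0.0

f = lambda x: max(0.0, 1.0 - abs(x))        # tent function, support [-1, 1]
knots, levels = step_approximation(f, -1.0, 1.0, 0.05)
err = max(abs(f(x) - step_eval(knots, levels, x))
          for x in [j / 1000.0 - 1.0 for j in range(2000)])
print(err <= 0.05)  # True
```

For the tent function (Lipschitz constant 1) the error splits into a mesh term and a snapping term, each below ε/2, matching the uniform bound (1).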


The problem is then that of the approximation of the graph of a plane curve by inscribed or circumscribed polygons, as treated in elementary calculus. But let us remark that the lemma is also a particular case of the Stone–Weierstrass theorem (see, e.g., Rudin [2]) and should be so verified by the reader. Such a sledgehammer approach has its merit, as other kinds of approximation soon to be needed can also be subsumed under the same theorem. Indeed, the discussion in this section is meant in part to introduce some modern terminology to the relevant applications in probability theory. We can now state the following alternative criterion for vague convergence.

Theorem 4.4.1. Let {µ_n} and µ be s.p.m.'s. Then µ_n →^v µ if and only if

(2) ∀f ∈ C_K [or C₀]: ∫ f(x) µ_n(dx) → ∫ f(x) µ(dx).

PROOF. Suppose µ_n →^v µ; (2) is true by definition when f is the indicator of (a, b] for a ∈ D, b ∈ D, where D is the set in (1) of Sec. 4.3. Hence by the linearity of integrals it is also true when f is any D-valued step function. Now let f ∈ C₀ and ε > 0; by the approximation lemma there exists a D-valued step function f_ε satisfying (1). We have

(3) |∫ f dµ_n − ∫ f dµ| ≤ |∫ f dµ_n − ∫ f_ε dµ_n| + |∫ f_ε dµ_n − ∫ f_ε dµ| + |∫ f_ε dµ − ∫ f dµ|. …


that is linear in (a − δ, a) and (b, b + δ). It is clear that g₁ ≤ g ≤ g₂ ≤ g₁ + 1 and consequently

(4) ∫ g₁ dµ_n ≤ ∫ g dµ_n ≤ ∫ g₂ dµ_n,

(5) ∫ g₁ dµ ≤ ∫ g dµ ≤ ∫ g₂ dµ.

Since g₁ ∈ C_K, g₂ ∈ C_K, it follows from (2) that the extreme terms in (4) converge to the corresponding terms in (5). Since

∫ g₂ dµ − ∫ g₁ dµ ≤ ∫_U 1 dµ = µ(U) < ε,

and ε is arbitrary, it follows that the middle term in (4) also converges to that in (5), proving the assertion.

Corollary. If {µ_n} is a sequence of s.p.m.'s such that for every f ∈ C_K,

lim_n ∫ f(x) µ_n(dx)

exists, then {µ_n} converges vaguely.

For by Theorem 4.3.3 a subsequence converges vaguely, say to µ. By Theorem 4.4.1, the limit above is equal to ∫ f(x) µ(dx). This must then be the same for every vaguely convergent subsequence, according to the hypothesis of the corollary. The vague limit of every such subsequence is therefore uniquely determined (why?) to be µ, and the corollary follows from Theorem 4.3.4.

Theorem 4.4.2. Let {µ_n} and µ be p.m.'s. Then µ_n →^v µ if and only if

(6) ∀f ∈ C_B: ∫ f(x) µ_n(dx) → ∫ f(x) µ(dx).

PROOF. Suppose µ_n →^v µ. Given ε > 0, there exist a and b in D such that

(7) µ((a, b]^c) = 1 − µ((a, b]) < ε.

It follows from vague convergence that there exists n₀(ε) such that if n ≥ n₀(ε), then

(8) µ_n((a, b]^c) = 1 − µ_n((a, b]) < ε.

Let f ∈ C_B be given and suppose that |f| ≤ M < ∞. Consider the function f_ε, which is equal to f on [a, b], to zero on (−∞, a − 1) ∪ (b + 1, ∞),


and which is linear in [a − 1, a) and in (b, b + 1]. Then f_ε ∈ C_K and |f − f_ε| ≤ 2M. We have by Theorem 4.4.1

(9) ∫ f_ε dµ_n → ∫ f_ε dµ.

On the other hand, we have

(10) ∫ |f − f_ε| dµ_n ≤ ∫_{(a,b]^c} 2M dµ_n ≤ 2Mε

by (8). A similar estimate holds with µ replacing µ_n above, by (7). Now the argument leading from (3) to (2) finishes the proof of (6) in the same way. This proves that µ_n →^v µ implies (6); the converse has already been proved in Theorem 4.4.1.

Theorems 4.3.3 and 4.3.4 deal with s.p.m.'s. Even if the given sequence {µ_n} consists only of strict p.m.'s, the sequential vague limit may not be so. This is the sense of Example 2 in Sec. 4.3. It is sometimes demanded that such a limit be a p.m. The following criterion is not deep, but applicable.

Theorem 4.4.3. Let a family of p.m.'s {µ_α, α ∈ A} be given on an arbitrary index set A. In order that every sequence of them contains a subsequence which converges vaguely to a p.m., it is necessary and sufficient that the following condition be satisfied: for any ε > 0, there exists a finite interval I such that

(11) inf_{α ∈ A} µ_α(I) > 1 − ε.

PROOF. Suppose (11) holds. For any sequence {µ_n} from the family, there exists a subsequence {µ_{n_k}} such that µ_{n_k} →^v µ. We show that µ is a p.m. Let J be a continuity interval of µ which contains the I in (11). Then

µ(R¹) ≥ µ(J) = lim_k µ_{n_k}(J) ≥ lim inf_k µ_{n_k}(I) ≥ 1 − ε.

Since ε is arbitrary, µ(R¹) = 1. Conversely, suppose the condition involving (11) is not satisfied; then there exist ε > 0, a sequence of finite intervals I_n increasing to R¹, and a sequence {µ_n} from the family such that

∀n: µ_n(I_n) ≤ 1 − ε.

Let {µ_{n_k}} and µ be as before and J any continuity interval of µ. Then J ⊂ I_n for all sufficiently large n, so that

µ(J) = lim_k µ_{n_k}(J) ≤ lim inf_k µ_{n_k}(I_{n_k}) ≤ 1 − ε. …


A family of p.m.'s satisfying the condition above involving (11) is said to be tight. The preceding theorem can be stated as follows: a family of p.m.'s is relatively compact if and only if it is tight. The word "relatively" purports that the limit need not belong to the family; the word "compact" is an abbreviation of "sequentially vaguely convergent to a strict p.m." Extension of the result to p.m.'s in more general topological spaces is straightforward but plays an important role in the convergence of stochastic processes.

The new definition of vague convergence in Theorem 4.4.1 has the advantage over the older ones in that it can be at once carried over to measures in more general topological spaces. There is no substitute for "intervals" in such a space but the classes C_K, C₀ and C_B are readily available. We will illustrate the general approach by indicating one more result in this direction.

Recall the notion of a lower semicontinuous function on R¹ defined by:

(12) ∀x ∈ R¹: f(x) ≤ lim inf_{y→x, y≠x} f(y).

There are several equivalent definitions (see, e.g., Royden [5]) but the following characterization is most useful: f is bounded and lower semicontinuous if and only if there exists a sequence of functions f_k ∈ C_B which increases to f everywhere; and we call f upper semicontinuous iff −f is lower semicontinuous. Usually f is allowed to be extended-valued; but to avoid complications we will deal with bounded functions only and denote by L and U respectively the classes of bounded lower semicontinuous and bounded upper semicontinuous functions.

Theorem 4.4.4. If {µ_n} and µ are p.m.'s, then µ_n →^v µ if and only if one of the two conditions below is satisfied:

(13) ∀f ∈ L: lim inf_n ∫ f(x) µ_n(dx) ≥ ∫ f(x) µ(dx);

∀g ∈ U: lim sup_n ∫ g(x) µ_n(dx) ≤ ∫ g(x) µ(dx).

PROOF. We begin by observing that the two conditions above are equivalent by putting f = −g.
Now suppose µ_n →^v µ and let f_k ∈ C_B, f_k ↑ f. Then we have

(14) lim inf_n ∫ f(x) µ_n(dx) ≥ lim_n ∫ f_k(x) µ_n(dx) = ∫ f_k(x) µ(dx)

by Theorem 4.4.2. Letting k → ∞, the last integral above converges to ∫ f(x) µ(dx) by monotone convergence. This gives the first inequality in (13). Conversely, suppose the latter is satisfied and let φ ∈ C_B; then φ belongs to


both L and U, so that

∫ φ(x) µ(dx) ≤ lim inf_n ∫ φ(x) µ_n(dx) ≤ lim sup_n ∫ φ(x) µ_n(dx) ≤ ∫ φ(x) µ(dx),

which proves

lim_n ∫ φ(x) µ_n(dx) = ∫ φ(x) µ(dx).

Hence µ_n →^v µ by Theorem 4.4.2.

Remark. (13) remains true if, e.g., f is lower semicontinuous, with +∞ as a possible value but bounded below.

Corollary. The conditions in (13) may be replaced by the following:

for every open O: lim inf_n µ_n(O) ≥ µ(O);

for every closed C: lim sup_n µ_n(C) ≤ µ(C).

We leave this as an exercise.

Finally, we return to the connection between the convergence of r.v.'s and that of their distributions.

DEFINITION OF CONVERGENCE "IN DISTRIBUTION" (in dist.). A sequence of r.v.'s {X_n} is said to converge in distribution to F iff the sequence {F_n} of corresponding d.f.'s converges vaguely to the d.f. F.

If X is an r.v. that has the d.f. F, then by an abuse of language we shall also say that {X_n} converges in dist. to X.

Theorem 4.4.5. Let {F_n}, F be the d.f.'s of the r.v.'s {X_n}, X. If X_n → X in pr., then F_n →^v F. More briefly stated, convergence in pr. implies convergence in dist.

PROOF. If X_n → X in pr., then for each f ∈ C_K, we have f(X_n) → f(X) in pr., as easily seen from the uniform continuity of f (actually this is true for any continuous f, see Exercise 10 of Sec. 4.1). Since f is bounded the convergence holds also in L¹ by Theorem 4.1.4. It follows that

E{f(X_n)} → E{f(X)},

which is just the relation (2) in another guise, hence µ_n →^v µ.

Convergence of r.v.'s in dist. is merely a convenient turn of speech; it does not have the usual properties associated with convergence. For instance,


if X_n → X in dist. and Y_n → Y in dist., it does not follow by any means that X_n + Y_n will converge in dist. to X + Y. This is in contrast to the true convergence concepts discussed before; cf. Exercises 3 and 6 of Sec. 4.1. But if X_n and Y_n are independent, then the preceding assertion is indeed true as a property of the convergence of convolutions of distributions (see Chapter 6). However, in the simple situation of the next theorem no independence assumption is needed. The result is useful in dealing with limit distributions in the presence of nuisance terms.

Theorem 4.4.6. If X_n → X in dist. and Y_n → 0 in dist., then

(a) X_n + Y_n → X in dist.;

(b) X_n Y_n → 0 in dist.

PROOF. We begin with the remark that for any constant c, Y_n → c in dist. is equivalent to Y_n → c in pr. (Exercise 4 below). To prove (a), let f ∈ C_K, |f| ≤ M. Since f is uniformly continuous, given ε > 0 there exists δ such that |x − y| ≤ δ implies |f(x) − f(y)| ≤ ε. Hence we have

E{|f(X_n + Y_n) − f(X_n)|} ≤ ε P{|f(X_n + Y_n) − f(X_n)| ≤ ε} + 2M P{|f(X_n + Y_n) − f(X_n)| > ε} ≤ ε + 2M P{|Y_n| > δ}.

The last-written probability tends to zero as n → ∞; it follows that

lim_{n→∞} E{f(X_n + Y_n)} = lim_{n→∞} E{f(X_n)} = E{f(X)}

by Theorem 4.4.1, and (a) follows by the same theorem.

To prove (b), for given ε > 0 we choose A₀ so that both ±A₀ are points of continuity of the d.f. of X, and so large that

lim_n P{|X_n| > A₀} = P{|X| > A₀} < ε.

This means P{|X_n| > A₀} < ε for n > n₀(ε). Furthermore we choose A ≥ A₀ so that the same inequality holds also for n ≤ n₀(ε). Now it is clear that

P{|X_n Y_n| > ε} ≤ P{|X_n| > A} + P{|Y_n| > ε/A}.
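Part (a) of Theorem 4.4.6 is easy to observe by simulation. In the Python sketch below (illustrative only; the sample size, the seed, and the use of a small constant for the collapsed nuisance term are our choices), X_n is represented by standard normal draws and Y_n by a constant near 0; the empirical d.f. of X_n + Y_n stays close to that of X_n at every point checked.

```python
import random

random.seed(7)
sample = [random.gauss(0.0, 1.0) for _ in range(10 ** 4)]  # stand-ins for X_n
shift = 1e-3                                               # Y_n collapsed near 0

def edf(xs, t):
    """Empirical distribution function at t."""
    return sum(1 for v in xs if v <= t) / len(xs)

diffs = [abs(edf(sample, t) - edf([x + shift for x in sample], t))
         for t in (-1.0, 0.0, 1.0)]
print([round(d, 4) for d in diffs])   # all tiny
```

The difference at t is just the empirical mass of the thin interval (t − shift, t], which is why no independence between the two sequences is ever needed here.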


EXERCISES

*1. Let µ_n and µ be p.m.'s such that µ_n →^v µ. Show that the conclusion in (2) need not hold if (a) f is bounded and Borel measurable and all µ_n and µ are absolutely continuous, or (b) f is continuous except at one point and every µ_n is absolutely continuous. (To find even sharper counterexamples would not be too easy, in view of Exercise 10 of Sec. 4.5.)

2. Let µ_n →^v µ when the µ_n's are s.p.m.'s. Then for each f ∈ C and each finite continuity interval I we have ∫_I f dµ_n → ∫_I f dµ.

*3. Let µ_n and µ be as in Exercise 1. If the f_n's are bounded continuous functions converging uniformly to f, then ∫ f_n dµ_n → ∫ f dµ.

*4. Give an example to show that convergence in dist. does not imply that in pr. However, show that convergence to the unit mass δ_a does imply that in pr. to the constant a.

5. A set {µ_α} of p.m.'s is tight if and only if the corresponding d.f.'s {F_α} converge uniformly in α as x → −∞ and as x → +∞.

6. Let the r.v.'s {X_α} have the p.m.'s {µ_α}. If for some real r > 0, E{|X_α|^r} is bounded in α, then {µ_α} is tight.

7. Prove the Corollary to Theorem 4.4.4.

8. If the r.v.'s X and Y satisfy

P{|X − Y| > ε} ≤ ε

for some ε, then their d.f.'s F and G satisfy the inequalities:

(15) ∀x ∈ R¹: F(x − ε) − ε ≤ G(x) ≤ F(x + ε) + ε.

Derive another proof of Theorem 4.4.5 from this.

*9. The Lévy distance of two s.d.f.'s F and G is defined to be the infimum of all ε > 0 satisfying the inequalities in (15). Prove that this is indeed a metric in the space of s.d.f.'s, and that F_n converges to F in this metric if and only if F_n →^v F and ∫ dF_n → ∫ dF.

10. Find two sequences of p.m.'s {µ_n} and {ν_n} such that

∀f ∈ C_K: ∫ f dµ_n − ∫ f dν_n → 0;

but for no finite (a, b) is it true that

µ_n(a, b) − ν_n(a, b) → 0.

[HINT: Let µ_n = δ_{r_n}, ν_n = δ_{s_n}, and choose {r_n}, {s_n} suitably.]

11. Let {µ_n} be a sequence of p.m.'s such that for each f ∈ C_B, the sequence ∫ f dµ_n converges; then µ_n →^v µ, where µ is a p.m.
[HINT: If the


hypothesis is strengthened to include every f in C, and convergence of real numbers is interpreted as usual as convergence to a finite limit, the result is easy by taking an f going to ∞. In general one may proceed by contradiction using an f that oscillates at infinity.]

*12. Let F_n and F be d.f.'s such that F_n →^v F. Define G_n(θ) and G(θ) as in Exercise 4 of Sec. 3.1. Then G_n(θ) → G(θ) a.e. [HINT: Do this first when F_n and F are continuous and strictly increasing. The general case is obtained by smoothing F_n and F by convoluting with a uniform distribution in [−δ, +δ] and letting δ ↓ 0; see the end of Sec. 6.1 below.]

4.5 Uniform integrability; convergence of moments

The function |x|^r, r > 0, is in C but not in C_B, hence Theorem 4.4.2 does not apply to it. Indeed, we have seen in Example 2 of Sec. 4.1 that even convergence a.e. does not imply convergence of any moment of order r > 0. For, given r, a slight modification of that example will yield X_n → X a.e., E(|X_n|^r) = 1 but E(|X|^r) = 0. …


Theorem 4.5.2. If {X_n} converges in dist. to X, and for some p > 0, sup_n E{|X_n|^p} = M < ∞, then for each r < p:

(2) lim_{n→∞} E(|X_n|^r) = E(|X|^r) < ∞.

PROOF. Let F_n and F be the d.f.'s of X_n and X, and for A > 0 define

f_A(x) = x^r, if |x| ≤ A; A^r, if x > A; (−A)^r, if x < −A.

Then f_A ∈ C_B, hence by Theorem 4.4.2 the "truncated moments" converge:

∫_{−∞}^{∞} f_A(x) dF_n(x) → ∫_{−∞}^{∞} f_A(x) dF(x).

Next we have

∫_{−∞}^{∞} |f_A(x) − x^r| dF_n(x) ≤ ∫_{|x|>A} |x|^r dF_n(x) = ∫_{|X_n|>A} |X_n|^r dP ≤ (1/A^{p−r}) ∫_{|X_n|>A} |X_n|^p dP ≤ M/A^{p−r}.

The last term does not depend on n, and converges to zero as A → ∞. It follows that as A → ∞, ∫_{−∞}^{∞} f_A dF_n converges uniformly in n to ∫_{−∞}^{∞} x^r dF_n. Hence by a standard theorem on the inversion of repeated limits, we have

(4) ∫_{−∞}^{∞} x^r dF = lim_{A→∞} ∫_{−∞}^{∞} f_A dF = lim_{A→∞} lim_{n→∞} ∫_{−∞}^{∞} f_A dF_n = lim_{n→∞} lim_{A→∞} ∫_{−∞}^{∞} f_A dF_n = lim_{n→∞} ∫_{−∞}^{∞} x^r dF_n.

We now introduce the concept of uniform integrability, which is of basic importance in this connection. It is also an essential hypothesis in certain convergence questions arising in the theory of martingales (to be treated in Chapter 9).

DEFINITION OF UNIFORM INTEGRABILITY. A family of r.v.'s {X_t}, t ∈ T, where T is an arbitrary index set, is said to be uniformly integrable iff

(5) lim_{A→∞} ∫_{|X_t|>A} |X_t| dP = 0

uniformly in t ∈ T.
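Definition (5) can be tried on simple discrete families. The Python sketch below is illustrative (all names and the two particular families are our choices): it contrasts a family with first moments all equal to 1 whose tail integral never shrinks, hence not uniformly integrable, with a family whose tails vanish uniformly.

```python
def tail_integral(law, A):
    """Integral of |X| over {|X| > A}, for a discrete law {value: probability}."""
    return sum(abs(v) * p for v, p in law.items() if abs(v) > A)

# X_n = n with probability 1/n, else 0: E|X_n| = 1 for every n, yet the tail
# integral equals 1 whenever A < n, so it does not vanish uniformly in n.
family_bad = [{n: 1.0 / n, 0: 1.0 - 1.0 / n} for n in range(2, 200)]
bad_sup = max(tail_integral(law, 10) for law in family_bad)

# X_n = sqrt(n) with probability 1/n, else 0: the tail integral is at most
# 1/sqrt(A^2) uniformly in n, so this family is uniformly integrable.
family_ok = [{n ** 0.5: 1.0 / n, 0: 1.0 - 1.0 / n} for n in range(2, 200)]
ok_sup = max(tail_integral(law, 10) for law in family_ok)

print(round(bad_sup, 6), round(ok_sup, 6))  # essentially 1.0 versus about 0.1
```

The first family also shows that boundedness of E|X_t| alone, condition (a) of the theorem that follows, is not sufficient without condition (b).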


Theorem 4.5.3. The family {X_t} is uniformly integrable if and only if the following two conditions are satisfied:

(a) E(|X_t|) is bounded in t ∈ T;

(b) For every ε > 0, there exists δ(ε) > 0 such that for any E ∈ F:

P(E) < δ(ε) ⇒ ∫_E |X_t| dP < ε for every t ∈ T.

PROOF. Clearly (5) implies (a). Next, let E ∈ F and write E_t for the set {ω: |X_t(ω)| > A}. We have by the mean value theorem

∫_E |X_t| dP = (∫_{E∩E_t} + ∫_{E\E_t}) |X_t| dP ≤ ∫_{E_t} |X_t| dP + A P(E).

Given ε > 0, there exists A = A(ε) such that the last-written integral is less than ε/2 for every t, by (5). Hence (b) will follow if we set δ = ε/2A. Thus (5) implies (b).

Conversely, suppose that (a) and (b) are true. Then by the Chebyshev inequality we have for every t,

P{|X_t| > A} ≤ E(|X_t|)/A ≤ M/A,

where M is the bound indicated in (a). Hence if A > M/δ, then P(E_t) < δ and we have by (b):

∫_{E_t} |X_t| dP < ε.

Thus (5) is true.

Theorem 4.5.4. Let 0 < r < ∞, X_n ∈ L^r, and X_n → X in pr. Then the following three propositions are equivalent:

(i) {|X_n|^r} is uniformly integrable;

(ii) X_n → X in L^r;

(iii) E(|X_n|^r) → E(|X|^r) < ∞.

PROOF. Suppose (i) is true; since X_n → X in pr., by Theorem 4.2.3 there exists a subsequence {n_k} such that X_{n_k} → X a.e. By Theorem 4.5.1 and (a) above, X ∈ L^r. The easy inequality

|X_n − X|^r ≤ 2^r {|X_n|^r + |X|^r},

valid for all r > 0, together with Theorem 4.5.3 then implies that the sequence {|X_n − X|^r} is also uniformly integrable. For each ε > 0, we have


(6) ∫ |X_n − X|^r dP = ∫_{|X_n−X|≤ε} |X_n − X|^r dP + ∫_{|X_n−X|>ε} |X_n − X|^r dP ≤ ε^r + ∫_{|X_n−X|>ε} |X_n − X|^r dP.

Since P{|X_n − X| > ε} → 0 as n → ∞ by hypothesis, it follows from (b) above that the last written integral in (6) also tends to zero. This being true for every ε > 0, (ii) follows.

Suppose (ii) is true; then we have (iii) by the second assertion of Theorem 4.5.1.

Finally suppose (iii) is true. To prove (i), let A > 0 and consider a function f_A in C_K satisfying the following conditions:

f_A(x) = |x|^r for |x| ≤ A; 0 ≤ f_A(x) ≤ |x|^r for A < |x| ≤ A + 1; f_A(x) = 0 for |x| > A + 1;

cf. the proof of Theorem 4.4.1 for the construction of such a function. Hence we have

lim sup_{n→∞} ∫_{|X_n|>A+1} |X_n|^r dP ≤ lim_{n→∞} E{|X_n|^r} − lim_{n→∞} E{f_A(X_n)} = E{|X|^r} − E{f_A(X)} ≤ ∫_{|X|>A} |X|^r dP.

The last integral does not depend on n and converges to zero as A → ∞. This means: for any ε > 0, there exists A₀ = A₀(ε) and n₀ = n₀(A₀(ε)) such that we have

sup_{n>n₀} ∫_{|X_n|>A+1} |X_n|^r dP < ε

provided that A ≥ A₀. Since each |X_n|^r is integrable, there exists A₁ = A₁(ε) such that the supremum above may be taken over all n ≥ 1 provided that A ≥ A₀ ∨ A₁. This establishes (i), and completes the proof of the theorem.

In the remainder of this section the term "moment" will be restricted to a moment of positive integral order. It is well known (see Exercise 5 of Sec. 6.6) that on the unit interval any p.m., or equivalently its d.f., is uniquely determined by its moments of all orders. Precisely, if F₁ and F₂ are two d.f.'s such that


F_i(0) = 0, F_i(1) = 1 for i = 1, 2; and

∀n ≥ 1: ∫₀¹ x^n dF₁(x) = ∫₀¹ x^n dF₂(x),

then F₁ ≡ F₂. The corresponding result is false in (R¹, B¹) and a further condition on the moments is required to ensure uniqueness. The sufficient condition due to Carleman is as follows:

Σ_{r=1}^∞ 1/m_{2r}^{1/2r} = ∞,

where m_r denotes the moment of order r. When a given sequence of numbers {m_r, r ≥ 1} uniquely determines a d.f. F such that

(7) m_r = ∫ x^r dF(x),

we say that "the moment problem is determinate" for the sequence. Of course an arbitrary sequence of numbers need not be the sequence of moments for any d.f.; a necessary but far from sufficient condition, for example, is that the Liapounov inequality (Sec. 3.2) be satisfied. We shall not go into these questions here but shall content ourselves with the useful result below, which is often referred to as the "method of moments"; see also Theorem 6.4.5.

Theorem 4.5.5. Suppose there is a unique d.f. F with the moments {m^{(r)}, r ≥ 1}, all finite. Suppose that {F_n} is a sequence of d.f.'s, each of which has all its moments finite:

m_n^{(r)} = ∫_{−∞}^{∞} x^r dF_n.

Finally, suppose that for every r ≥ 1:

(8) lim_{n→∞} m_n^{(r)} = m^{(r)}.

Then F_n →^v F.

PROOF. Let µ_n be the p.m. corresponding to F_n. By Theorem 4.3.3 there exists a subsequence of {µ_n} that converges vaguely. Let {µ_{n_k}} be any subsequence converging vaguely to some µ. We shall show that µ is indeed a p.m. with the d.f. F. By the Chebyshev inequality, we have for each A > 0:

µ_{n_k}((−A, +A)) ≥ 1 − A^{−2} m_{n_k}^{(2)}.

Since m_{n_k}^{(2)} → m^{(2)} < ∞, it follows that as A → ∞, the left side converges uniformly in k to one. Letting A → ∞ along a sequence of points such that


both ±A belong to the dense set D involved in the definition of vague convergence, we obtain as in (4) above:

µ(R¹) = lim_{A→∞} µ((−A, +A)) = lim_{A→∞} lim_{k→∞} µ_{n_k}((−A, +A)) = lim_{k→∞} lim_{A→∞} µ_{n_k}((−A, +A)) = lim_{k→∞} µ_{n_k}(R¹) = 1.

Now for each r, let p be the next larger even integer. We have

∫ x^p dµ_{n_k} = m_{n_k}^{(p)} → m^{(p)},

hence m_{n_k}^{(p)} is bounded in k. It follows from Theorem 4.5.2 that

∫ x^r dµ_{n_k} → ∫ x^r dµ.

But the left side also converges to m^{(r)} by (8). Hence by the uniqueness hypothesis µ is the p.m. determined by F. We have therefore proved that every vaguely convergent subsequence of {µ_n}, or equivalently {F_n}, has the same limit µ, or equivalently F. Hence the theorem follows from Theorem 4.3.4.

EXERCISES

1. If sup_n |X_n| ∈ L^p and X_n → X a.e., then X ∈ L^p and X_n → X in L^p.

2. If {X_n} is dominated by some Y in L^p, and converges in dist. to X, then E(|X_n|^p) → E(|X|^p).

3. If X_n → X in dist., and f ∈ C, then f(X_n) → f(X) in dist.

*4. Exercise 3 may be reduced to Exercise 10 of Sec. 4.1 as follows. Let F_n, 1 ≤ n ≤ ∞, be d.f.'s such that F_n →^v F_∞. Let θ be uniformly distributed on [0, 1] and put X_n = F_n^{−1}(θ), 1 ≤ n ≤ ∞, where F_n^{−1}(y) = sup{x: F_n(x) ≤ y} (cf. Exercise 4 of Sec. 3.1). Then X_n has d.f. F_n and X_n → X_∞ in pr.

5. Find the moments of the normal d.f. Φ and the positive normal d.f. Φ⁺ below:

Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−y²/2} dy; Φ⁺(x) = √(2/π) ∫₀^x e^{−y²/2} dy, if x ≥ 0; 0, if x < 0.

Show that in either case the moments satisfy Carleman's condition.

6. If {X_t} and {Y_t} are uniformly integrable, then so is {X_t + Y_t}.

7. If {X_n} is dominated by some Y in L¹ or if it is identically distributed with finite mean, then it is uniformly integrable.


*8. If sup_n E(|X_n|^p) < ∞ for some p > 1, then {X_n} is uniformly integrable.

9. If {X_n} is uniformly integrable, then the sequence {S_n/n}, where S_n = Σ_{j=1}^n X_j, is uniformly integrable.

*10. Suppose the distributions of {X_n, 1 ≤ n ≤ ∞} are absolutely continuous with densities {g_n} such that g_n → g_∞ in Lebesgue measure. Then g_n → g_∞ in L¹(−∞, ∞), and consequently for every bounded Borel measurable function f we have E{f(X_n)} → E{f(X_∞)}. [HINT: ∫(g_∞ − g_n)⁺ dx = ∫(g_∞ − g_n)⁻ dx and (g_∞ − g_n)⁺ ≤ g_∞; use dominated convergence.]
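The method of moments of Theorem 4.5.5 can be watched in a toy case. In the Python sketch below (illustrative; names and the particular family are ours), F_n is the d.f. of the discrete uniform law on {1/n, 2/n, …, 1}; its r-th moment converges to 1/(r + 1), the r-th moment of the uniform d.f. on [0, 1]. The limit d.f. has bounded support, so its moment problem is determinate, and Theorem 4.5.5 then gives F_n →^v F.

```python
def moment_discrete_uniform(n, r):
    """r-th moment of the uniform law on {1/n, 2/n, ..., 1}."""
    return sum((k / n) ** r for k in range(1, n + 1)) / n

# Moment convergence (8) in this case: m_n^(r) -> 1/(r+1) for every r.
for r in (1, 2, 3):
    print(r, round(moment_discrete_uniform(10 ** 4, r), 6), round(1 / (r + 1), 6))
```

Of course in this example the vague convergence is also evident directly; the point of the theorem is that moment convergence alone suffices once the limit moment sequence is determinate.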


5 Law of large numbers. Random series

5.1 Simple limit theorems

The various concepts of Chapter 4 will be applied to the so-called "law of large numbers", a famous name in the theory of probability. This has to do with partial sums

S_n = Σ_{j=1}^n X_j

of a sequence of r.v.'s. In the most classical formulation, the "weak" or the "strong" law of large numbers is said to hold for the sequence according as

(1) (S_n − E(S_n))/n → 0

in pr. or a.e. This, of course, presupposes the finiteness of E(S_n). A natural generalization is as follows:

(S_n − a_n)/b_n → 0,

where {a_n} is a sequence of real numbers and {b_n} a sequence of positive numbers tending to infinity. We shall present several stages of the development,


even though they overlap each other to some extent, for it is just as important to learn the basic techniques as the results themselves.

The simplest cases follow from Theorems 4.1.4 and 4.2.3, according to which if Z_n is any sequence of r.v.'s, then E(Z_n²) → 0 implies that Z_n → 0 in pr. and Z_{n_k} → 0 a.e. for a subsequence {n_k}. Applied to Z_n = S_n/n, the first assertion becomes

(2) E(S_n²) = o(n²) ⇒ S_n/n → 0 in pr.

Now we can calculate E(S_n²) more explicitly as follows:

(3) E(S_n²) = Σ_{j=1}^n E(X_j²) + 2 Σ_{1≤j<k≤n} E(X_j X_k).


without it the definitions are hardly useful. Finally, it is obvious that pairwise independence implies uncorrelatedness, provided second moments are finite.

If $\{X_n\}$ is a sequence of uncorrelated r.v.'s, then the sequence $\{X_n - E(X_n)\}$ is orthogonal, and for the latter (3) reduces to the fundamental relation below:

$$(6)\qquad \sigma^2(S_n) = \sum_{j=1}^{n} \sigma^2(X_j),$$

which may be called the "additivity of the variance". Conversely, the validity of (6) for $n = 2$ implies that $X_1$ and $X_2$ are uncorrelated. There are only $n$ terms on the right side of (6), hence if these are bounded by a fixed constant we have $\sigma^2(S_n) = O(n) = o(n^2)$. Thus (2) becomes applicable, and we have proved the following result.

Theorem 5.1.1. If the $X_j$'s are uncorrelated and their second moments have a common bound, then (1) is true in $L^2$ and hence also in pr.

This simple theorem is actually due to Chebyshev, who invented his famous inequalities for its proof. The next result, due to Rajchman (1932), strengthens the conclusion by proving convergence a.e. This result is interesting by virtue of its simplicity, and serves well to introduce an important method, that of taking subsequences.

Theorem 5.1.2. Under the same hypotheses as in Theorem 5.1.1, (1) holds also a.e.

PROOF. Without loss of generality we may suppose that $E(X_j) = 0$ for each $j$, so that the $X_j$'s are orthogonal. We have by (6):

$$E(S_n^2) \le Mn,$$

where $M$ is a bound for the second moments. It follows by Chebyshev's inequality that for each $\varepsilon > 0$ we have

$$P\{|S_n| > n\varepsilon\} \le \frac{Mn}{n^2\varepsilon^2} = \frac{M}{n\varepsilon^2}.$$

If we sum this over $n$, the resulting series on the right diverges. However, if we confine ourselves to the subsequence $\{n^2\}$, then

$$\sum_n P\{|S_{n^2}| > n^2\varepsilon\} \le \sum_n \frac{M}{n^2\varepsilon^2} < \infty.$$

Hence by Theorem 4.2.1 (Borel-Cantelli) we have

$$(7)\qquad P\{|S_{n^2}| > n^2\varepsilon \text{ i.o.}\} = 0;$$


and consequently by Theorem 4.2.2

$$(8)\qquad \frac{S_{n^2}}{n^2} \to 0 \text{ a.e.}$$

We have thus proved the desired result for a subsequence; and the "method of subsequences" aims in general at extending to the whole sequence a result proved (relatively easily) for a subsequence. In the present case we must show that $S_k$ does not differ enough from the nearest $S_{n^2}$ to make any real difference.

Put for each $n \ge 1$:

$$D_n = \max_{n^2 \le k < (n+1)^2} |S_k - S_{n^2}|.$$
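The quantitative heart of Theorems 5.1.1 and 5.1.2 is the variance bound $E(S_n^2) \le Mn$, i.e. $E((S_n/n)^2) \le M/n$. This decay is easy to watch in a quick Monte Carlo sketch (plain Python, standard library only; the uniform summands and the sample sizes are arbitrary illustrative choices, not from the text):

```python
import random

def mean_square_of_average(n, trials=2000, rng=None):
    """Estimate E[(S_n / n)^2] for i.i.d. Uniform(-1, 1) summands,
    which are uncorrelated with mean 0 and variance M = 1/3."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(trials):
        s = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
        total += (s / n) ** 2
    return total / trials

# Theory: E[(S_n/n)^2] = 1/(3n) = o(1), so by (2) we get S_n/n -> 0 in pr.
for n in (10, 100, 1000):
    print(n, round(mean_square_of_average(n), 5))
```

The printed estimates shrink roughly tenfold per row, matching the $M/n$ bound of the proof.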


$v_k^{(n)}(\omega)$ denote the number of digits among the first $n$ digits of $\omega$ that are equal to $k$. Then $v_k^{(n)}(\omega)/n$ is the relative frequency of the digit $k$ in the first $n$ places, and the limit, if existing:

$$\lim_{n\to\infty} \frac{v_k^{(n)}(\omega)}{n} = \varphi_k(\omega),$$

may be called the frequency of $k$ in $\omega$. The number $\omega$ is called simply normal (to the scale 10) iff this limit exists for each $k$ and is equal to $1/10$. Intuitively all ten possibilities should be equally likely for each digit of a number picked "at random". On the other hand, one can write down "at random" any number of numbers that are "abnormal" according to the definition given, such as $.1111\ldots$, while it is a relatively difficult matter to name even one normal number in the sense of Exercise 5 below. It turns out that the number

$$.12345678910111213\ldots,$$

which is obtained by writing down in succession all the natural numbers in the decimal system, is a normal number to the scale 10 even in the stringent definition of Exercise 5 below, but the proof is not so easy. As for determining whether certain well-known numbers such as $e - 2$ or $\pi - 3$ are normal, the problem seems beyond the reach of our present capability for mathematics. In spite of these difficulties, Borel's theorem below asserts that in a perfectly precise sense almost every number is normal. Furthermore, this striking proposition is merely a very particular case of Theorem 5.1.2 above.

Theorem 5.1.3. Except for a Borel set of measure zero, every number in $[0, 1]$ is simply normal.

PROOF. Consider the probability space $(\mathscr{U}, \mathscr{B}, m)$ in Example 2 of Sec. 2.2. Let $Z$ be the subset of numbers of the form $m/10^n$ for integers $n \ge 1$, $m \ge 1$; then $m(Z) = 0$. If $\omega \in \mathscr{U}\setminus Z$, then it has a unique decimal expansion; if $\omega \in Z$, it has two such expansions, but we agree to use the "terminating" one for the sake of definiteness. Thus we have

$$\omega = .\xi_1\xi_2\cdots\xi_n\cdots,$$

where for each $n \ge 1$, $\xi_n(\cdot)$ is a Borel measurable function of $\omega$. Just as in Example 4 of Sec. 3.3, the sequence $\{\xi_n, n \ge 1\}$ is a sequence of independent r.v.'s with

$$P\{\xi_n = k\} = \frac{1}{10}, \qquad k = 0, 1, \ldots, 9.$$

Indeed according to Theorem 5.1.2 we need only verify that the $\xi_n$'s are uncorrelated, which is a very simple matter. For a fixed $k$ we define the


r.v. $X_n$ to be the indicator of the set $\{\omega: \xi_n(\omega) = k\}$; then $v_k^{(n)} = \sum_{j=1}^{n} X_j$, and the desired conclusion follows from Theorem 5.1.2 applied to $\{X_n\}$.
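The number $.123456789101112\ldots$ cited above can be probed empirically. A small sketch (plain Python; the cutoff of $10^5$ digits is an arbitrary choice): it tabulates the ten digit frequencies, which drift toward $1/10$ only slowly, since early prefixes over-represent the leading digits of the concatenated blocks.

```python
def champernowne_digits(n):
    """First n decimal digits of 0.123456789101112..., obtained by
    concatenating the natural numbers in the decimal system."""
    digits = []
    k = 1
    while len(digits) < n:
        digits.extend(int(d) for d in str(k))
        k += 1
    return digits[:n]

digits = champernowne_digits(100_000)
freq = [digits.count(d) / len(digits) for d in range(10)]
# Each frequency tends to 1/10 as the cutoff grows, but convergence is slow.
print([round(f, 3) for f in freq])
```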


5. We may strengthen the definition of a normal number by considering blocks of digits. Let $r \ge 1$, and consider the successive overlapping blocks of $r$ consecutive digits in a decimal; there are $n - r + 1$ such blocks in the first $n$ places. Let $v^{(n)}(\omega)$ denote the number of such blocks that are identical with a given one; for example, if $r = 5$, the given block may be "21212". Prove that for a.e. $\omega$, we have for every $r$:

$$\lim_{n\to\infty} \frac{v^{(n)}(\omega)}{n} = \frac{1}{10^r}.$$

[HINT: Reduce the problem to disjoint blocks, which are independent.]

*6. The above definition may be further strengthened if we consider different scales of expansion. A real number in $[0, 1]$ is said to be completely normal iff the relative frequency of each block of length $r$ in the scale $s$ tends to the limit $1/s^r$ for every $s$ and $r$. Prove that almost every number in $[0, 1]$ is completely normal.

7. Let $\alpha$ be completely normal. Show that by looking at the expansion of $\alpha$ in some scale we can rediscover the complete works of Shakespeare from end to end without a single misprint or interruption. [This is Borel's paradox.]

*8. Let $X$ be an arbitrary r.v. with an absolutely continuous distribution. Prove that with probability one the fractional part of $X$ is a normal number. [HINT: Let $N$ be the set of normal numbers and consider $P\{X - [X] \in N\}$.]

9. Prove that the set of real numbers in $[0, 1]$ whose decimal expansions do not contain the digit 2 is of measure zero. Deduce from this the existence of two sets $A$ and $B$ both of measure zero such that every real number is representable as a sum $a + b$ with $a \in A$, $b \in B$.

*10. Is the sum of two normal numbers, modulo 1, normal? Is the product? [HINT: Consider the differences between a fixed abnormal number and all normal numbers: this is a set of probability one.]

5.2 Weak law of large numbers

The law of large numbers in the form (1) of Sec. 5.1 involves only the first moment, but so far we have operated with the second. In order to drop any assumption on the second moment, we need a new device, that of "equivalent sequences", due to Khintchine (1894-1959).


DEFINITION. Two sequences of r.v.'s $\{X_n\}$ and $\{Y_n\}$ are said to be equivalent iff

$$(1)\qquad \sum_n P\{X_n \ne Y_n\} < \infty.$$

In practice, an equivalent sequence is obtained by "truncating" in various ways, as we shall see presently.

Theorem 5.2.1. If $\{X_n\}$ and $\{Y_n\}$ are equivalent, then

$$\sum_n (X_n - Y_n) \text{ converges a.e.}$$

Furthermore if $a_n \uparrow \infty$, then

$$(2)\qquad \frac{1}{a_n}\sum_{j=1}^{n}(X_j - Y_j) \to 0 \text{ a.e.}$$

PROOF. By the Borel-Cantelli lemma, (1) implies that

$$P\{X_n \ne Y_n \text{ i.o.}\} = 0.$$

This means that there exists a null set $N$ with the following property: if $\omega \in \Omega\setminus N$, then there exists $n_0(\omega)$ such that

$$n \ge n_0(\omega) \implies X_n(\omega) = Y_n(\omega).$$

Thus for such an $\omega$, the two numerical sequences $\{X_n(\omega)\}$ and $\{Y_n(\omega)\}$ differ only in a finite number of terms (how many depending on $\omega$). In other words, the series

$$\sum_n (X_n(\omega) - Y_n(\omega))$$

consists of zeros from a certain point on. Both assertions of the theorem are trivial consequences of this fact.

Corollary. With probability one, the expression

$$\sum_n X_n \quad \text{or} \quad \frac{1}{a_n}\sum_{j=1}^{n} X_j$$

converges, diverges to $+\infty$ or $-\infty$, or fluctuates in the same way as

$$\sum_n Y_n \quad \text{or} \quad \frac{1}{a_n}\sum_{j=1}^{n} Y_j,$$


respectively. In particular, if $(1/a_n)\sum_{j=1}^{n} X_j$ converges to $X$ in pr., then so does $(1/a_n)\sum_{j=1}^{n} Y_j$.

To prove the last assertion of the corollary, observe that by Theorem 4.1.2 the relation (2) holds also in pr. Hence if

$$\frac{1}{a_n}\sum_{j=1}^{n} X_j \to X \text{ in pr.},$$

then we have

$$(3)\qquad \frac{1}{a_n}\sum_{j=1}^{n} Y_j = \frac{1}{a_n}\sum_{j=1}^{n} (Y_j - X_j) + \frac{1}{a_n}\sum_{j=1}^{n} X_j \to 0 + X = X \text{ in pr.}$$

(see Exercise 3 of Sec. 4.1).

The next law of large numbers is due to Khintchine. Under the stronger hypothesis of total independence, it will be proved again by an entirely different method in Chapter 6.

Theorem 5.2.2. Let $\{X_n\}$ be pairwise independent and identically distributed r.v.'s with finite mean $m$. Then we have

$$\frac{S_n}{n} \to m \text{ in pr.}$$

PROOF. Let the common d.f. be $F$ so that

$$m = E(X_n) = \int_{-\infty}^{\infty} x\,dF(x), \qquad E(|X_n|) = \int_{-\infty}^{\infty} |x|\,dF(x) < \infty.$$

By Theorem 3.2.1 the finiteness of $E(|X_1|)$ is equivalent to

$$(4)\qquad \sum_n P\{|X_1| > n\} < \infty.$$


We introduce a sequence of r.v.'s $\{Y_n\}$ by "truncating at $n$":

$$Y_n(\omega) = \begin{cases} X_n(\omega), & \text{if } |X_n(\omega)| \le n;\\ 0, & \text{if } |X_n(\omega)| > n.\end{cases}$$

This is equivalent to $\{X_n\}$ by (4), since $P\{|X_n| > n\} = P\{X_n \ne Y_n\}$. Let

$$T_n = \sum_{j=1}^{n} Y_j.$$

By the corollary above, (3) will follow if (and only if) we can prove $T_n/n \to m$ in pr. Now the $Y_n$'s are also pairwise independent by Theorem 3.3.1 (applied to each pair), hence they are uncorrelated, since each, being bounded, has a finite second moment. Let us calculate $\sigma^2(T_n)$; we have by (6) of Sec. 5.1,

$$\sigma^2(T_n) = \sum_{j=1}^{n} \sigma^2(Y_j) \le \sum_{j=1}^{n} E(Y_j^2) = \sum_{j=1}^{n} \int_{|x|\le j} x^2\,dF(x).$$
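As an aside, the truncation device can be watched along a sample path. A hedged sketch (plain Python; Pareto summands with $\alpha = 1.5$ are an arbitrary stand-in for an integrable but heavy-tailed $F$, not from the text): since $\sum_n P\{|X_1| > n\} < \infty$ when the mean is finite, the truncated sequence differs from the original in only finitely many places a.e., which is exactly what makes $\{X_n\}$ and $\{Y_n\}$ equivalent.

```python
import random

def truncation_mismatches(alpha=1.5, n_max=100_000, seed=0):
    """Count indices n <= n_max with X_n != Y_n, i.e. |X_n| > n, for
    i.i.d. Pareto(alpha) samples (P{X > t} = t**(-alpha) for t >= 1).
    With alpha > 1 the mean is finite, sum_n P{X > n} converges, and by
    Borel-Cantelli only finitely many mismatches ever occur a.e."""
    rng = random.Random(seed)
    return sum(1 for n in range(1, n_max + 1) if rng.paretovariate(alpha) > n)

# The expected number of mismatches is sum_n min(1, n**-1.5), about 2.6.
print(truncation_mismatches())
```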


consequently, by (2) of Sec. 5.1,

$$\frac{T_n - E(T_n)}{n} \to 0 \text{ in pr.}$$

Now it is clear that as $n \to \infty$, $E(Y_n) \to E(X_1) = m$; hence also

$$\frac{1}{n}\sum_{j=1}^{n} E(Y_j) \to m.$$

It follows that

$$\frac{T_n}{n} \to m \text{ in pr.},$$

as was to be proved.

For totally independent r.v.'s, necessary and sufficient conditions for the weak law of large numbers in the most general formulation, due to Kolmogorov and Feller, are known. The sufficiency of the following criterion is easily proved, but we omit the proof of its necessity (cf. Gnedenko and Kolmogorov [12]).

Theorem 5.2.3. Let $\{X_n\}$ be a sequence of independent r.v.'s with d.f.'s $\{F_n\}$; and $S_n = \sum_{j=1}^{n} X_j$. Let $\{b_n\}$ be a given sequence of real numbers increasing to $+\infty$. Suppose that we have

(i) $\displaystyle\sum_{j=1}^{n} \int_{|x|>b_n} dF_j(x) = o(1)$;

(ii) $\displaystyle\frac{1}{b_n^2}\sum_{j=1}^{n} \int_{|x|\le b_n} x^2\,dF_j(x) = o(1)$.

Then if we put

$$(5)\qquad a_n = \sum_{j=1}^{n} \int_{|x|\le b_n} x\,dF_j(x),$$

we have

$$(6)\qquad \frac{S_n - a_n}{b_n} \to 0 \text{ in pr.}$$


Then if (6) holds for the given $\{b_n\}$ and any sequence of real numbers $\{a_n\}$, the conditions (i) and (ii) must hold.

Remark. Condition (7) may be written as

$$P\{X_n \le 0\} \ge \lambda, \qquad P\{X_n \ge 0\} \ge \lambda;$$

when $\lambda = \frac{1}{2}$ this means that 0 is a median for each $X_n$; see Exercise 9 below. In general it ensures that none of the distributions is too far off center, and it is certainly satisfied if all $F_n$ are the same; see also Exercise 11 below. It is possible to replace the $a_n$ in (5) by

$$\sum_{j=1}^{n} \int_{|x|\le b_j} x\,dF_j(x)$$

and maintain (6); see Exercise 8 below.

PROOF. Define for $1 \le j \le n$:

$$Y_{n,j}(\omega) = \begin{cases} X_j(\omega), & \text{if } |X_j(\omega)| \le b_n;\\ 0, & \text{otherwise};\end{cases}$$

and write

$$T_n = \sum_{j=1}^{n} Y_{n,j}.$$

Then condition (i) may be written as

$$\sum_{j=1}^{n} P\{Y_{n,j} \ne X_j\} = o(1),$$

and it follows from Boole's inequality that

$$(8)\qquad P\{T_n \ne S_n\} \le \sum_{j=1}^{n} P\{Y_{n,j} \ne X_j\} = o(1).$$

Next, condition (ii) may be written as

$$\sum_{j=1}^{n} E\left(\left(\frac{Y_{n,j}}{b_n}\right)^2\right) = o(1);$$

from which it follows, since $\{Y_{n,j}, 1 \le j \le n\}$ are independent r.v.'s:

$$\sigma^2\left(\frac{T_n}{b_n}\right) = \sum_{j=1}^{n} \sigma^2\left(\frac{Y_{n,j}}{b_n}\right) \le \sum_{j=1}^{n} E\left(\left(\frac{Y_{n,j}}{b_n}\right)^2\right) = o(1).$$


Hence as in (2) of Sec. 5.1,

$$(9)\qquad \frac{T_n - E(T_n)}{b_n} \to 0 \text{ in pr.}$$

It is clear (why?) that (8) and (9) together imply

$$\frac{S_n - E(T_n)}{b_n} \to 0 \text{ in pr.}$$

Since

$$E(T_n) = \sum_{j=1}^{n} E(Y_{n,j}) = \sum_{j=1}^{n} \int_{|x|\le b_n} x\,dF_j(x) = a_n,$$

(6) is proved.

As an application of Theorem 5.2.3 we give an example where the weak but not the strong law of large numbers holds.

Example. Let $\{X_n\}$ be independent r.v.'s with a common d.f. $F$ such that

$$P\{X_1 = n\} = P\{X_1 = -n\} = \frac{c}{n^2\log n}, \qquad n = 3, 4, \ldots,$$

where $c$ is the constant making the total probability equal to one. We have then, for large values of $n$,

$$\int_{|x|>n} dF(x) = \sum_{k>n} \frac{2c}{k^2\log k} \sim \frac{2c}{n\log n}, \qquad \frac{1}{n^2}\int_{|x|\le n} x^2\,dF(x) = \frac{1}{n^2}\sum_{3\le k\le n} \frac{2c}{\log k} \sim \frac{2c}{n\log n},$$

so that conditions (i) and (ii) of Theorem 5.2.3 are satisfied with $b_n = n$, while $a_n = 0$ by symmetry; hence the weak law holds. On the other hand, since $X_1$ and $X_n$ have the same d.f.,

$$\sum_n P\{|X_n| > n\} = \sum_n P\{|X_1| > n\} = \infty.$$

Hence by Theorem 4.2.4 (Borel-Cantelli),

$$P\{|X_n| > n \text{ i.o.}\} = 1.$$


But $|S_n - S_{n-1}| = |X_n| > n$ implies $|S_n| > n/2$ or $|S_{n-1}| > n/2$; it follows that

$$P\left\{|S_n| > \frac{n}{2} \text{ i.o.}\right\} = 1,$$

and so it is certainly false that $S_n/n \to 0$ a.e. However, we can prove more. For any $A > 0$, the same argument as before yields

$$P\{|X_n| > An \text{ i.o.}\} = 1,$$

and consequently

$$P\left\{|S_n| > \frac{An}{2} \text{ i.o.}\right\} = 1.$$

This means that for each $A$ there is a null set $Z(A)$ such that if $\omega \in \Omega\setminus Z(A)$, then

$$(10)\qquad \varlimsup_{n\to\infty} \frac{|S_n(\omega)|}{n} \ge \frac{A}{2}.$$

Let $Z = \bigcup_{m=1}^{\infty} Z(m)$; then $Z$ is still a null set, and if $\omega \in \Omega\setminus Z$, (10) is true for every $A$, and therefore the upper limit is $+\infty$. Since $X_1$ is "symmetric" in the obvious sense, it follows that

$$\varliminf_{n\to\infty} \frac{S_n}{n} = -\infty, \qquad \varlimsup_{n\to\infty} \frac{S_n}{n} = +\infty \quad \text{a.e.}$$

EXERCISES

In the following, $S_n = \sum_{j=1}^{n} X_j$.

1. For any sequence of r.v.'s $\{X_n\}$, and any $p \ge 1$:

$$X_n \to 0 \text{ a.e.} \implies \frac{S_n}{n} \to 0 \text{ a.e.};$$
$$X_n \to 0 \text{ in } L^p \implies \frac{S_n}{n} \to 0 \text{ in } L^p.$$

The second result is false for $p < 1$.

2. Even for a sequence of independent r.v.'s $\{X_n\}$,

$$X_n \to 0 \text{ in pr.} \not\Rightarrow \frac{S_n}{n} \to 0 \text{ in pr.}$$

[HINT: Let $X_n$ take the values $2^n$ and 0 with probabilities $n^{-1}$ and $1 - n^{-1}$.]

3. For any sequence $\{X_n\}$:

$$\frac{S_n}{n} \to 0 \text{ in pr.} \implies \frac{X_n}{n} \to 0 \text{ in pr.}$$

More generally, this is true if $n$ is replaced by $b_n$, where $b_{n+1}/b_n \to 1$.


*4. For any $\delta > 0$, we have

$$\lim_{n\to\infty} \sum_{|k-np|>n\delta} \binom{n}{k} p^k (1-p)^{n-k} = 0$$

uniformly in $p$: $0 < p < 1$.

*5. Let $P\{X_1 = 2^n\} = 1/2^n$, $n \ge 1$; and let $\{X_n, n \ge 1\}$ be independent and identically distributed. Show that the weak law of large numbers does not hold for $b_n = n$; namely, with this choice of $b_n$ no sequence $\{a_n\}$ exists for which (6) is true. [This is the St. Petersburg paradox, in which you win $2^n$ if it takes $n$ tosses of a coin to obtain a head. What would you consider as a fair entry fee? and what is your mathematical expectation?]

*6. Show on the contrary that a weak law of large numbers does hold for $b_n = n\log n$ and find the corresponding $a_n$. [HINT: Apply Theorem 5.2.3.]

7. Conditions (i) and (ii) in Theorem 5.2.3 imply that for any $\delta > 0$,

$$\sum_{j=1}^{n} \int_{|x|>\delta b_n} dF_j(x) = o(1),$$

and that $a_n = o(nb_n)$.

8. They also imply that the $a_n$ in (5) may be replaced by $\sum_{j=1}^{n} \int_{|x|\le b_j} x\,dF_j(x)$ (cf. the Remark after Theorem 5.2.3).

9. A median of the r.v. $X$ is a number $a$ such that $P\{X \le a\} \ge \frac{1}{2}$ and $P\{X \ge a\} \ge \frac{1}{2}$. Show that such a number always exists but need not be unique.

*10. Let $\{X_n, 1 \le n \le \infty\}$ be arbitrary r.v.'s and for each $n$ let $m_n$ be a median of $X_n$. Prove that if $X_n \to X_\infty$ in pr. and $m_\infty$ is unique, then $m_n \to m_\infty$. Furthermore, if there exists any sequence of real numbers $\{c_n\}$ such that $X_n - c_n \to 0$ in pr., then $X_n - m_n \to 0$ in pr.

11. Derive the following form of the weak law of large numbers from Theorem 5.2.3. Let $\{b_n\}$ be as in Theorem 5.2.3 and put $X_n = 2b_n$ for $n \ge 1$. Then there exists $\{a_n\}$ for which (6) holds but condition (i) does not.


12. Theorem 5.2.2 may be slightly generalized as follows. Let $\{X_n\}$ be pairwise independent with a common d.f. $F$ such that

(i) $\displaystyle\int_{|x|\le n} x\,dF(x) = o(1)$, (ii) $\displaystyle n\int_{|x|>n} dF(x) = o(1)$;

then $S_n/n \to 0$ in pr.

13. … $P\{S_n > \varphi(n) \text{ i.o.}\} = 1$, and so $S_n \to \infty$ a.e. [HINT: Let $N_n$ denote the number of $k \le n$ such that $X_k \le \varphi(n)/n$. Use Chebyshev's inequality to estimate $P\{N_n > n/2\}$ and so conclude $P\{S_n > \varphi(n)/2\} \ge 1 - 2F(\varphi(n)/n)$. This problem was proposed as a teaser and the rather unexpected solution was given by Kesten.]

14. Let $\{b_n\}$ be as in Theorem 5.2.3 and put $X_n = 2b_n$ for $n \ge 1$. Then there exists $\{a_n\}$ for which (6) holds, but condition (i) does not hold. Thus condition (7) cannot be omitted.

5.3 Convergence of series

If the terms of an infinite series are independent r.v.'s, then it will be shown in Sec. 8.1 that the probability of its convergence is either zero or one. Here we shall establish a concrete criterion for the latter alternative. Not only is the result a complete answer to the question of convergence of series of independent r.v.'s, but it yields also a satisfactory form of the strong law of large numbers. This theorem is due to Kolmogorov (1929). We begin with his two remarkable inequalities. The first is also very useful elsewhere; the second may be circumvented (see Exercises 3 to 5 below), but it is given here in Kolmogorov's original form as an example of true virtuosity.

Theorem 5.3.1. Let $\{X_n\}$ be independent r.v.'s such that

$$\forall n: \quad E(X_n) = 0, \quad E(X_n^2) = \sigma^2(X_n) < \infty.$$

Then we have for every $\varepsilon > 0$:

$$(1)\qquad P\left\{\max_{1\le j\le n} |S_j| > \varepsilon\right\} \le \frac{\sigma^2(S_n)}{\varepsilon^2}.$$


PROOF. Fix $\varepsilon > 0$. For any $\omega$ in the set

$$\Lambda = \left\{\omega: \max_{1\le j\le n} |S_j(\omega)| > \varepsilon\right\},$$

let us define

$$\nu(\omega) = \min\{j: 1 \le j \le n,\ |S_j(\omega)| > \varepsilon\}.$$

Clearly $\nu$ is an r.v. with domain $\Lambda$. Put for $1 \le k \le n$:

$$\Lambda_k = \{\omega: \nu(\omega) = k\} = \left\{\omega: \max_{1\le j<k} |S_j(\omega)| \le \varepsilon,\ |S_k(\omega)| > \varepsilon\right\},$$


where the last inequality is by the mean value theorem, since $|S_k| > \varepsilon$ on $\Lambda_k$ by definition. The theorem now follows upon dividing the inequality above by $\varepsilon^2$.

Theorem 5.3.2. Let $\{X_n\}$ be independent r.v.'s with finite means and suppose that there exists an $A$ such that

$$(3)\qquad \forall n: \quad |X_n - E(X_n)| \le A < \infty.$$

Then for every $\varepsilon > 0$ we have

$$(4)\qquad P\left\{\max_{1\le j\le n} |S_j| \le \varepsilon\right\} \le \frac{(2A + 4\varepsilon)^2}{\sigma^2(S_n)}.$$


(4E124 1LAW OF LARGE NUMBERS . RANDOM SERIES1 1la k - ak+1 1 = Sk dJ' - Sk d?,/19'(Mk) Mk '(Mk+l) fM117~(Mk+l )Mk+1Xk+1 dJ~ < 2e +A .It follows that, since I Sk I < E on Qk+l,1 2 < fo(I SkI +,e + 2E +A +A)2 dJ' < (4+ 2A)2~(Qk+1 ) •k+1On the other hand, we have1l =fk{(Sk - ak) 2 + (ak - ak+1 ) 2 + `I'k+1 + 2(S'k - ak)(ak - ak+l)M+ 2(Sk - ak)Xk+1 + 2(ak - ak+1)Xk+1 } d i .The integrals of the last three terms all vanish by (5) and independence, hence11 ?Mk(Sk - ak) 2 d°J' .+ Xk+l dGJ'fM k= f (Sk - ak)2 d°J' + - (Mk)a 2 (Xk+l ) .MkSubstituting into (6), and using M k M, we obtain for 0 < k < n - 1 :Mk+,(Sk+l - ak+1 ) 2 d~ - (Sk - ak ) 2Mkdam'> ~'(Mn)62(Xk+l) - (4E + 2A) 2J'(Q k+l ) •Summing over k and using (7) again for k = n :hencewhich is (4) .4E2 (Mn) > f (ISn I + E) 2 d~ > ' 2 JM„ , (Sn -a n ) dM„n> ~'(M")-j=1(2A + 46) 2 > 91(ml)nj=1+ 2A) 2 GJ'(Q\M n ),
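Inequality (1) lends itself to a direct Monte Carlo check. A sketch under arbitrary illustrative assumptions (plain Python; the $\pm 1$ steps and the values $n = 50$, $\varepsilon = 15$ are choices of convenience, not from the text):

```python
import random

def kolmogorov_check(n=50, eps=15.0, trials=5000, seed=0):
    """Estimate P{max_{1<=j<=n} |S_j| > eps} for i.i.d. +-1 summands and
    return it together with the bound sigma^2(S_n)/eps^2 = n/eps^2 of (1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = 0
        for _ in range(n):
            s += rng.choice((-1, 1))
            if abs(s) > eps:
                hits += 1
                break
    return hits / trials, n / eps ** 2

p, bound = kolmogorov_check()
print(p, bound)  # the empirical probability stays below the bound
```

The bound is far from tight here (the true probability is much smaller), which is typical: (1) trades sharpness for complete generality.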


We can now prove the "three series theorem" of Kolmogorov (1929).

Theorem 5.3.3. Let $\{X_n\}$ be independent r.v.'s and define for a fixed constant $A > 0$:

$$Y_n(\omega) = \begin{cases} X_n(\omega), & \text{if } |X_n(\omega)| \le A;\\ 0, & \text{if } |X_n(\omega)| > A.\end{cases}$$

Then the series $\sum_n X_n$ converges a.e. if and only if the following three series all converge:

(i) $\sum_n P\{|X_n| > A\} = \sum_n P\{X_n \ne Y_n\}$;
(ii) $\sum_n E(Y_n)$;
(iii) $\sum_n \sigma^2(Y_n)$.

PROOF. Suppose that the three series converge. Applying Theorem 5.3.1 to the sequence $\{Y_n - E(Y_n)\}$, we have for every $m \ge 1$:

$$P\left\{\max_{n<j\le n'} \left|\sum_{k=n+1}^{j} (Y_k - E(Y_k))\right| \le \frac{1}{m}\right\} \ge 1 - m^2\sum_{k=n+1}^{n'} \sigma^2(Y_k).$$

If we denote the probability on the left by $\beta(m, n, n')$, it follows from the convergence of (iii) that for each $m$:

$$\lim_{n\to\infty} \lim_{n'\to\infty} \beta(m, n, n') = 1.$$

This means that the tail of $\sum_n \{Y_n - E(Y_n)\}$ converges to zero a.e., so that the series converges a.e. Since (ii) converges, so does $\sum_n Y_n$. Since (i) converges, $\{X_n\}$ and $\{Y_n\}$ are equivalent sequences; hence $\sum_n X_n$ also converges a.e. by Theorem 5.2.1. We have therefore proved the "if" part of the theorem.

Conversely, suppose that $\sum_n X_n$ converges a.e. Then for each $A > 0$:

$$P\{|X_n| > A \text{ i.o.}\} = 0.$$

It follows from the Borel-Cantelli lemma that the series (i) must converge. Hence as before $\sum_n Y_n$ also converges a.e. But since $|Y_n - E(Y_n)| \le 2A$, we have by Theorem 5.3.2, for $n < n'$:

$$P\left\{\max_{n<j\le n'} \left|\sum_{k=n+1}^{j} (Y_k - E(Y_k))\right| \le \varepsilon\right\} \le \frac{(2A + 4\varepsilon)^2}{\sum_{k=n+1}^{n'} \sigma^2(Y_k)}.$$


bounded by 1, so the series could not converge. This contradiction proves that (iii) must converge. Finally, consider the series $\sum_n \{Y_n - E(Y_n)\}$ and apply the proven part of the theorem to it. We have $P\{|Y_n - E(Y_n)| > 2A\} = 0$ and $E(Y_n - E(Y_n)) = 0$, so that the two series corresponding to (i) and (ii) with $2A$ for $A$ vanish identically, while (iii) has just been shown to converge. It follows from the sufficiency of the criterion that $\sum_n \{Y_n - E(Y_n)\}$ converges a.e., and so by equivalence the same is true of $\sum_n \{X_n - E(Y_n)\}$. Since $\sum_n X_n$ also converges a.e. by hypothesis, we conclude by subtraction that the series (ii) converges. This completes the proof of the theorem.

The convergence of a series of r.v.'s is, of course, by definition that of the sequence of its partial sums, in any sense of convergence discussed in Chapter 4. For series of independent terms we have, however, the following theorem due to Paul Lévy.

Theorem 5.3.4. If $\{X_n\}$ is a sequence of independent r.v.'s, then the convergence of the series $\sum_n X_n$ in pr. is equivalent to its convergence a.e.

PROOF. By Theorem 4.1.2, it is sufficient to prove that convergence of $\sum_n X_n$ in pr. implies its convergence a.e. Suppose the former; then, given $\varepsilon$: $0 < \varepsilon < 1$, there exists $m_0$ such that if $n > m \ge m_0$, we have

$$(8)\qquad P\{|S_{m,n}| > \varepsilon\} \le \varepsilon,$$

where

$$S_{m,n} = \sum_{j=m+1}^{n} X_j.$$

It is obvious that for $m < k \le n$ we have

$$(9)\qquad \bigcup_{k=m+1}^{n} \left\{\max_{m<j<k} |S_{m,j}| \le 2\varepsilon;\ |S_{m,k}| > 2\varepsilon;\ |S_{k,n}| \le \varepsilon\right\} \subset \{|S_{m,n}| > \varepsilon\},$$

where the sets in the union are disjoint. Going to probabilities and using independence, we obtain

$$\sum_{k=m+1}^{n} P\left\{\max_{m<j<k} |S_{m,j}| \le 2\varepsilon;\ |S_{m,k}| > 2\varepsilon\right\} P\{|S_{k,n}| \le \varepsilon\} \le P\{|S_{m,n}| > \varepsilon\}.$$

Hence

$$P\left\{\max_{m<j\le n} |S_{m,j}| > 2\varepsilon\right\} \min_{m<k\le n} P\{|S_{k,n}| \le \varepsilon\} \le P\{|S_{m,n}| > \varepsilon\}.$$


This inequality is due to Ottaviani. By (8), the second factor on the left exceeds $1 - \varepsilon$; hence if $m \ge m_0$,

$$(10)\qquad P\left\{\max_{m<j\le n} |S_{m,j}| > 2\varepsilon\right\} \le \frac{1}{1-\varepsilon}\,P\{|S_{m,n}| > \varepsilon\} \le \frac{\varepsilon}{1-\varepsilon}.$$

Letting $n \to \infty$, then $m \to \infty$, and finally $\varepsilon \to 0$ through a sequence of values, we see that the triple limit of the first probability in (10) is equal to zero. This proves the convergence a.e. of $\sum_n X_n$ by Exercise 8 of Sec. 4.2.

It is worthwhile to observe the underlying similarity between the inequalities (10) and (1). Both give a "probability upper bound" for the maximum of a number of summands in terms of the last summand as in (10), or its variance as in (1). The same principle will be used again later.

Let us remark also that even the convergence of $S_n$ in dist. is equivalent to that in pr. or a.e. as just proved; see Theorem 9.5.5.

We give some examples of convergence of series.

Example. $\sum_n \pm 1/n$.

This is meant to be the "harmonic series" with a random choice of signs in each term, the choices being totally independent and equally likely to be $+$ or $-$ in each case. More precisely, it is the series

$$\sum_n \frac{X_n}{n},$$

where $\{X_n, n \ge 1\}$ is a sequence of independent, identically distributed r.v.'s taking the values $\pm 1$ with probability $\frac{1}{2}$ each.

We may take $A = 1$ in Theorem 5.3.3 so that the two series (i) and (ii) vanish identically. Since $\sigma^2(X_n/n) = \sigma^2(Y_n) = 1/n^2$, the series (iii) converges. Hence $\sum_n \pm 1/n$ converges a.e. by the criterion above. The same conclusion applies to $\sum_n \pm 1/n^\theta$ if $\frac{1}{2} < \theta \le 1$. Clearly there is no absolute convergence. For $0 < \theta \le \frac{1}{2}$, the probability of convergence is zero by the same criterion and by Theorem 8.1.2 below.

EXERCISES

1. Theorem 5.3.1 has the following "one-sided" analogue. Under the same hypotheses, we have

$$P\left\{\max_{1\le j\le n} S_j > \varepsilon\right\} \le \frac{\sigma^2(S_n)}{\sigma^2(S_n) + \varepsilon^2}.$$

[This is due to A. W. Marshall.]


*2. Let $\{X_n\}$ be independent and identically distributed with mean 0 and variance 1. Then we have for every $x$:

$$P\left\{\max_{1\le j\le n} S_j > x\right\} \le 2P\{S_n > x - \sqrt{2n}\}.$$


7. For arbitrary $\{X_n\}$, if

$$\sum_n E(|X_n|) < \infty,$$

then $\sum_n X_n$ converges absolutely a.e.

8. Let $\{X_n\}$, where $n = 0, \pm 1, \pm 2, \ldots$, be independent and identically distributed according to the normal distribution $\Phi$ (see Exercise 5 of Sec. 4.5). Then the series of complex-valued r.v.'s

$$\sum_{n=1}^{\infty} \frac{e^{inx}}{n} X_n,$$

where $i = \sqrt{-1}$ and $x$ is real, converges a.e. and uniformly in $x$. (This is Wiener's representation of the Brownian motion process.)

*9. Let $\{X_n\}$ be independent and identically distributed, taking the values 0 and 2 with probability $\frac{1}{2}$ each; then

$$\sum_{n=1}^{\infty} \frac{X_n}{3^n}$$

converges a.e. Prove that the limit has the Cantor d.f. discussed in Sec. 1.3. Do Exercise 11 in that section again; it is easier now.

*10. If $\sum_n \pm X_n$ converges a.e. for all choices of $\pm 1$, where the $X_n$'s are arbitrary r.v.'s, then $\sum_n X_n^2$ converges a.e. [HINT: Consider $\sum_n r_n(t)X_n(\omega)$ where the $r_n$'s are coin-tossing r.v.'s and apply Fubini's theorem to the space of $(t, \omega)$.]

5.4 Strong law of large numbers

To return to the strong law of large numbers, the link is furnished by the following lemma on "summability".

Kronecker's lemma. Let $\{x_k\}$ be a sequence of real numbers, $\{a_k\}$ a sequence of numbers $> 0$ and $\uparrow \infty$. Then

$$\sum_n \frac{x_n}{a_n} \text{ converges} \implies \frac{1}{a_n}\sum_{j=1}^{n} x_j \to 0.$$

PROOF. For $1 \le n \le \infty$ let

$$b_n = \sum_{j=1}^{n} \frac{x_j}{a_j},$$

so that $b_n \to b_\infty$, a finite limit, by hypothesis.


If we also write $a_0 = 0$, $b_0 = 0$, we have

$$x_n = a_n(b_n - b_{n-1})$$

and

$$\frac{1}{a_n}\sum_{j=1}^{n} x_j = b_n - \frac{1}{a_n}\sum_{j=0}^{n-1} (a_{j+1} - a_j)b_j$$

(Abel's method of partial summation). Since $a_{j+1} - a_j \ge 0$,

$$\frac{1}{a_n}\sum_{j=0}^{n-1} (a_{j+1} - a_j) = 1,$$

and $b_n \to b_\infty$, we have

$$\frac{1}{a_n}\sum_{j=1}^{n} x_j \to b_\infty - b_\infty = 0.$$

The lemma is proved.

Now let $\varphi$ be a positive, even, and continuous function on $\mathbb{R}^1$ such that as $|x|$ increases,

$$(1)\qquad \frac{\varphi(x)}{|x|} \uparrow, \qquad \frac{\varphi(x)}{x^2} \downarrow.$$

Theorem 5.4.1. Let $\{X_n\}$ be a sequence of independent r.v.'s with $E(X_n) = 0$ for every $n$; and $0 < a_n \uparrow \infty$. If $\varphi$ satisfies the conditions above and

$$(2)\qquad \sum_n \frac{E(\varphi(X_n))}{\varphi(a_n)} < \infty,$$

then

$$(3)\qquad \sum_n \frac{X_n}{a_n} \text{ converges a.e.}$$

PROOF. Denote the d.f. of $X_n$ by $F_n$. Define for each $n$:

$$(4)\qquad Y_n(\omega) = \begin{cases} X_n(\omega), & \text{if } |X_n(\omega)| \le a_n;\\ 0, & \text{if } |X_n(\omega)| > a_n.\end{cases}$$

Then

$$\sum_n \sigma^2\left(\frac{Y_n}{a_n}\right) \le \sum_n \frac{1}{a_n^2}\int_{|x|\le a_n} x^2\,dF_n(x).$$


By the second hypothesis in (1), we have

$$\frac{x^2}{a_n^2} \le \frac{\varphi(x)}{\varphi(a_n)} \qquad \text{for } |x| \le a_n.$$

It follows that

$$\sum_n \sigma^2\left(\frac{Y_n}{a_n}\right) \le \sum_n \int_{|x|\le a_n} \frac{\varphi(x)}{\varphi(a_n)}\,dF_n(x) \le \sum_n \frac{E(\varphi(X_n))}{\varphi(a_n)} < \infty.$$


Applying Kronecker's lemma to (3) for each $\omega$ in a set of probability one, we obtain the next result.

Corollary. Under the hypotheses of the theorem, we have

$$(6)\qquad \frac{1}{a_n}\sum_{j=1}^{n} X_j \to 0 \text{ a.e.}$$

Particular cases. (i) Let $\varphi(x) = |x|^p$, $1 \le p \le 2$; $a_n = n$. Then we have

$$(7)\qquad \sum_n \frac{E(|X_n|^p)}{n^p} < \infty \implies \frac{1}{n}\sum_{j=1}^{n} X_j \to 0 \text{ a.e.}$$

For $p = 2$, this is due to Kolmogorov; for $1 \le p < 2$, it is due to Marcinkiewicz and Zygmund.

(ii) Suppose for some $\delta$, $0 < \delta \le 1$, and $M < \infty$ we have

$$\forall n: \quad E(|X_n|^{1+\delta}) \le M.$$

Then the hypothesis in (7) is clearly satisfied with $p = 1 + \delta$. This case is due to Markov. Cantelli's theorem under total independence (Exercise 4 of Sec. 5.1) is a special case.

(iii) By proper choice of $\{a_n\}$, we can considerably sharpen the conclusion (6). Suppose

$$\forall n: \quad \sigma^2(X_n) = \sigma_n^2 < \infty, \qquad \sigma^2(S_n) = s_n^2 = \sum_{j=1}^{n}\sigma_j^2 \to \infty.$$

Choose $\varphi(x) = x^2$ and $a_n = s_n(\log s_n)^{(1/2)+\varepsilon}$, $\varepsilon > 0$, in the corollary to Theorem 5.4.1. Then

$$\sum_n \frac{\sigma^2(X_n)}{a_n^2} = \sum_n \frac{\sigma_n^2}{s_n^2(\log s_n)^{1+2\varepsilon}} < \infty$$

by Dini's theorem, and consequently

$$\frac{S_n}{s_n(\log s_n)^{(1/2)+\varepsilon}} \to 0 \text{ a.e.}$$

In case all $\sigma_n = 1$ so that $s_n^2 = n$, the above ratio has a denominator that is close to $n^{1/2}$. Later we shall see that $n^{1/2}$ is a critical order of magnitude for $S_n$.
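Kronecker's lemma, the bridge from (3) to (6), is purely deterministic and easy to check numerically. A sketch (plain Python; the particular sequences are illustrative assumptions): take $a_n = n$ and $x_n = (-1)^{n+1} n^{0.4}$, so that $\sum_n x_n/a_n = \sum_n (-1)^{n+1}/n^{0.6}$ converges by the alternating-series test, and the lemma predicts $(1/n)\sum_{j\le n} x_j \to 0$.

```python
def kronecker_demo(checkpoints=(10, 1000, 100_000)):
    """With a_n = n and x_n = (-1)**(n+1) * n**0.4, the series sum x_n/n
    converges, so Kronecker's lemma gives (1/n) * sum_{j<=n} x_j -> 0.
    Returns (partial sum of the convergent series, averages at checkpoints)."""
    series, csum, averages = 0.0, 0.0, []
    for n in range(1, max(checkpoints) + 1):
        x = (-1) ** (n + 1) * n ** 0.4
        series += x / n
        csum += x
        if n in checkpoints:
            averages.append(csum / n)
    return series, averages

s, avgs = kronecker_demo()
print(round(s, 4), [round(a, 5) for a in avgs])  # averages shrink toward 0
```

Note that $\sum_j x_j$ itself oscillates with growing amplitude (of order $n^{0.4}$); it is only the normalization by $a_n = n$ that forces the averages to zero, exactly as in the lemma.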


We now come to the strong version of Theorem 5.2.2 in the totally independent case. This result is also due to Kolmogorov.

Theorem 5.4.2. Let $\{X_n\}$ be a sequence of independent and identically distributed r.v.'s. Then we have

$$(8)\qquad E(|X_1|) < \infty \implies \frac{S_n}{n} \to E(X_1) \text{ a.e.};$$

$$(9)\qquad E(|X_1|) = \infty \implies \varlimsup_{n\to\infty} \frac{|S_n|}{n} = +\infty \text{ a.e.}$$

PROOF. To prove (8) define $\{Y_n\}$ as in (4) with $a_n = n$. Since

$$\sum_n P\{X_n \ne Y_n\} = \sum_n P\{|X_n| > n\} = \sum_n P\{|X_1| > n\} < \infty$$

by Theorem 3.2.1, $\{X_n\}$ and $\{Y_n\}$ are equivalent sequences. Let us apply (7) to $\{Y_n - E(Y_n)\}$ with $\varphi(x) = x^2$. We have

$$(10)\qquad \sum_n \frac{\sigma^2(Y_n)}{n^2} \le \sum_n \frac{E(Y_n^2)}{n^2} = \sum_n \frac{1}{n^2}\int_{|x|\le n} x^2\,dF(x).$$


Clearly $E(Y_n) \to E(X_1)$ as $n \to \infty$; hence also

$$\frac{1}{n}\sum_{j=1}^{n} E(Y_j) \to E(X_1),$$

and consequently

$$\frac{1}{n}\sum_{j=1}^{n} Y_j \to E(X_1) \text{ a.e.}$$

By Theorem 5.2.1, the left side above may be replaced by $(1/n)\sum_{j=1}^{n} X_j$, proving (8).

To prove (9), we note that $E(|X_1|) = \infty$ implies $E(|X_1|/A) = \infty$ for each $A > 0$ and hence, by Theorem 3.2.1,

$$\sum_n P\{|X_1| > An\} = +\infty.$$

Since the r.v.'s are identically distributed, it follows that

$$\sum_n P\{|X_n| > An\} = +\infty.$$

Now the argument in the example at the end of Sec. 5.2 may be repeated without any change to establish the conclusion of (9).

Let us remark that the first part of the preceding theorem is a special case of G. D. Birkhoff's ergodic theorem, but it was discovered a little earlier and the proof is substantially simpler.

N. Etemadi proved an unexpected generalization of Theorem 5.4.2: (8) is true when the (total) independence of $\{X_n\}$ is weakened to pairwise independence (An elementary proof of the strong law of large numbers, Z. Wahrscheinlichkeitstheorie 55 (1981), 119-122).

Here is an interesting extension of the law of large numbers when the mean is infinite, due to Feller (1946).

Theorem 5.4.3. Let $\{X_n\}$ be as in Theorem 5.4.2 with $E(|X_1|) = \infty$. Let $\{a_n\}$ be a sequence of positive numbers satisfying the condition $a_n/n \uparrow$. Then we have

$$(11)\qquad \varlimsup_{n\to\infty} \frac{|S_n|}{a_n} = 0 \text{ a.e.}, \quad \text{or} \quad = \infty \text{ a.e.},$$

according as

$$(12)\qquad \sum_n P\{|X_n| > a_n\} = \sum_n \int_{|x|>a_n} dF(x) < \infty, \quad \text{or} \quad = \infty.$$

PROOF. Writing

$$P\{|X_1| > a_n\} = \sum_{k=n}^{\infty} \int_{a_k < |x| \le a_{k+1}} dF(x),$$


substituting into (12) and rearranging the double series, we see that the series in (12) converges if and only if

$$(13)\qquad \sum_k k\int_{a_{k-1} < |x| \le a_k} dF(x) < \infty.$$


We now estimate the quantity

$$(16)\qquad \frac{1}{a_n}\sum_{k=1}^{n} \mu_k = \frac{1}{a_n}\sum_{k=1}^{n} \int_{|x|\le a_k} x\,dF(x).$$
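The divergence alternative in (9), and the dichotomy of Theorem 5.4.3, can be watched in simulation. A sketch under illustrative assumptions (plain Python; positive Pareto summands with $\alpha = 1/2$, so that $E(X_1) = \infty$, are an arbitrary choice): the sample means $S_n/n$, far from settling down, grow without bound along a typical path.

```python
import random

def sample_means(alpha=0.5, checkpoints=(10**2, 10**4, 10**6), seed=0):
    """S_n/n at the given checkpoints for i.i.d. paretovariate(alpha)
    summands.  With alpha < 1 the mean is infinite, and in line with (9)
    the sample means blow up a.e. instead of converging."""
    rng = random.Random(seed)
    s, out = 0.0, []
    for n in range(1, max(checkpoints) + 1):
        s += rng.paretovariate(alpha)
        if n in checkpoints:
            out.append(s / n)
    return out

# For alpha = 1/2 the sums S_n are of order n**2, so S_n/n is of order n.
print(sample_means())
```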


3. Let $\{X_n\}$ be independent and identically distributed r.v.'s such that $E(|X_1|^p) < \infty$ for some $p$: $0 < p < 2$; in case $p \ge 1$, we assume also that $E(X_1) = 0$. Then $S_n n^{-(1/p)-\varepsilon} \to 0$ a.e. for every $\varepsilon > 0$. For $p = 1$ the result is weaker than Theorem 5.4.2.

4. Both Theorem 5.4.1 and its complement in Exercise 2 above are "best possible" in the following sense. Let $\{a_n\}$ and $\varphi$ be as before and suppose that

$$b_n > 0, \qquad \sum_n \frac{b_n}{\varphi(a_n)} = \infty.$$

Then there exists a sequence of independent r.v.'s $\{X_n\}$ such that $E(X_n) = 0$, $E(\varphi(X_n)) = b_n$, and

$$P\left\{\sum_n \frac{X_n}{a_n} \text{ converges}\right\} = 0.$$

[HINT: Make $\sum_n P\{|X_n| > a_n\} = \infty$ by letting each $X_n$ take two or three values only according as $b_n/\varphi(a_n) \le 1$ or $> 1$.]

5. Let $X_n$ take the values $\pm n^\theta$ with probability $\frac{1}{2}$ each. If $\theta < \frac{1}{2}$, then $S_n/n \to 0$ a.e. What if $\theta \ge \frac{1}{2}$? [HINT: To answer the question, use Theorem 5.2.3 or Exercise 12 of Sec. 5.2; an alternative method is to consider the characteristic function of $S_n/n$ (see Chapter 6).]

6. Let $\{X_n\}$ be independent and identically distributed with $E(X_1) = 0$, and let $\{c_n\}$ be a bounded sequence of real numbers. Then

$$\frac{1}{n}\sum_{j=1}^{n} c_j X_j \to 0 \text{ a.e.}$$

[HINT: Truncate $X_n$ at $n$ and proceed as in Theorem 5.4.2.]

7. We have $S_n/n \to 0$ a.e. if and only if the following two conditions are satisfied:

(i) $S_n/n \to 0$ in pr.;
(ii) $S_{2^n}/2^n \to 0$ a.e.;

an alternative set of conditions is (i) and

(ii') $\forall\varepsilon > 0$: $\sum_n P\{|S_{2^{n+1}} - S_{2^n}| > 2^n\varepsilon\} < \infty$.

*8. If $E(|X_1|) < \infty$, then the sequence $\{S_n/n\}$ is uniformly integrable and $S_n/n \to E(X_1)$ in $L^1$ as well as a.e.


9. Construct an example where $E(X_1) = +\infty$ and $S_n/n \to +\infty$ a.e. [HINT: Let $0 < \alpha < \beta < 1$ and take a d.f. $F$ such that $1 - F(x) \sim x^{-\alpha}$ as $x \to \infty$, and $\int_{-\infty}^{0} |x|^\beta\,dF(x) < \infty$. Show that …] A necessary and sufficient condition for $S_n/n \to +\infty$ has been given recently by K. B. Erickson.

10. Suppose there exist an $\alpha$, $0 < \alpha < 2$, $\alpha \ne 1$, and two constants $A_1$ and $A_2$ such that

$$\forall n, \forall x > 0: \qquad \frac{A_1}{x^\alpha} \le P\{|X_n| > x\} \le \frac{A_2}{x^\alpha}.$$

If $\alpha > 1$, suppose also that $E(X_n) = 0$ for each $n$. Then for any sequence $\{a_n\}$ increasing to infinity, we have

$$P\{|S_n| > a_n \text{ i.o.}\} = 0 \quad \text{or} \quad 1,$$

according as $\sum_n n/a_n^\alpha < \infty$ or $= \infty$. [This result, due to P. Lévy and Marcinkiewicz, was stated with a superfluous condition on $\{a_n\}$. Proceed as in Theorem 5.3.3 but truncate $X_n$ at $a_n$; direct estimates are easy.]

11. Prove the second alternative in Theorem 5.4.3.

12. If $E(X_1) \ne 0$, then $\max_{1\le j\le n} |X_j|/|S_n| \to 0$ a.e. [HINT: $\sum_n P\{|X_n| > \varepsilon n\} < \infty$.]

5.5 Applications

The law of large numbers has numerous applications in all parts of probability theory and in other related fields such as combinatorial analysis and statistics. We shall illustrate this by two examples involving certain important new concepts.

The first deals with so-called "empiric distributions" in sampling theory. Let $\{X_n, n \ge 1\}$ be a sequence of independent, identically distributed r.v.'s with the common d.f. $F$. This is sometimes referred to as the "underlying" or "theoretical distribution" and is regarded as "unknown" in statistical lingo. For each $\omega$, the values $X_n(\omega)$ are called "samples" or "observed values", and the idea is to get some information on $F$ by looking at the samples. For each


$n$, and each $\omega \in \Omega$, let the $n$ real numbers $\{X_j(\omega), 1 \le j \le n\}$ be arranged in increasing order as

$$(1)\qquad Y_{n1}(\omega) \le Y_{n2}(\omega) \le \cdots \le Y_{nn}(\omega).$$

Now define a discrete d.f. $F_n(\cdot, \omega)$ as follows:

$$F_n(x, \omega) = 0, \quad \text{if } x < Y_{n1}(\omega);$$
$$F_n(x, \omega) = \frac{k}{n}, \quad \text{if } Y_{nk}(\omega) \le x < Y_{n,k+1}(\omega),\ 1 \le k \le n-1;$$
$$F_n(x, \omega) = 1, \quad \text{if } x \ge Y_{nn}(\omega).$$

In other words, for each $x$, $nF_n(x, \omega)$ is the number of values of $j$, $1 \le j \le n$, for which $X_j(\omega) \le x$; or again $F_n(x, \omega)$ is the observed frequency of sample values not exceeding $x$. The function $F_n(\cdot, \omega)$ is called the empiric distribution function based on $n$ samples from $F$.

For each $x$, $F_n(x, \cdot)$ is an r.v., as we can easily check. Let us introduce also the indicator r.v.'s $\{\xi_j(x), j \ge 1\}$ as follows:

$$\xi_j(x, \omega) = \begin{cases} 1, & \text{if } X_j(\omega) \le x;\\ 0, & \text{if } X_j(\omega) > x.\end{cases}$$

We have then

$$F_n(x, \omega) = \frac{1}{n}\sum_{j=1}^{n} \xi_j(x, \omega).$$

For each $x$, the sequence $\{\xi_j(x)\}$ is totally independent since $\{X_j\}$ is, by Theorem 3.3.1. Furthermore they have the common "Bernoullian distribution", taking the values 1 and 0 with probabilities $p$ and $q = 1 - p$, where

$$p = F(x), \qquad q = 1 - F(x);$$

thus $E(\xi_j(x)) = F(x)$. The strong law of large numbers in the form of Theorem 5.1.2 or 5.4.2 applies, and we conclude that

$$(2)\qquad F_n(x, \omega) \to F(x) \text{ a.e.}$$

Matters end here if we are interested only in a particular value of $x$, or a finite number of values, but since both members in (2) contain the parameter $x$, which ranges over the whole real line, how much better it would be to make a global statement about the functions $F_n(\cdot, \omega)$. We shall do this in the theorem below. Observe first the precise meaning of (2): for each $x$, there exists a null set $N(x)$ such that (2) holds for $\omega \in \Omega\setminus N(x)$. It follows that (2) also holds simultaneously for all $x$ in any given countable set $Q$, such as the


set of rational numbers, for ω ∈ Ω\N, where

N = ⋃_{x∈Q} N(x)

is again a null set. Hence by the definition of vague convergence in Sec. 4.4, we can already assert that

F_n(·, ω) →_v F(·) for a.e. ω.

This will be further strengthened in two ways: convergence for all x and uniformity. The result is due to Glivenko and Cantelli.

Theorem 5.5.1. We have as n → ∞:

sup_{−∞<x<∞} |F_n(x, ω) − F(x)| → 0 a.e.
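Theorem 5.5.1 lends itself to a quick numerical check. A minimal sketch (standard-library Python; the uniform choice of F and the sample sizes are illustrative, not from the text): for F(x) = x on [0, 1], the sup in the theorem is attained at a jump of F_n, so it can be computed exactly from the order statistics.

```python
import random

def empirical_sup_distance(n, seed=0):
    """sup_x |F_n(x, w) - F(x)| for n samples from F(x) = x on [0, 1].

    F_n jumps by 1/n at each order statistic Y_(k), so the sup is the
    largest gap between k/n or (k - 1)/n and Y_(k)."""
    rng = random.Random(seed)
    ys = sorted(rng.random() for _ in range(n))
    d = 0.0
    for k, y in enumerate(ys, start=1):
        d = max(d, abs(k / n - y), abs((k - 1) / n - y))
    return d

if __name__ == "__main__":
    for n in (100, 10_000):
        print(n, round(empirical_sup_distance(n), 4))
```

With a fixed seed the distances are deterministic; they decay at the familiar 1/√n rate, consistent with the theorem (and with Exercise 2 below, by which the distribution of the sup does not depend on the continuous F).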


PROOF. Suppose the contrary; then there exist ε > 0, a sequence {n_k} of integers tending to infinity, and a sequence {x_k} in ℝ¹ such that for all k:

(5) |F_{n_k}(x_k) − F(x_k)| ≥ ε > 0.

This is clearly impossible if x_k → +∞ or x_k → −∞. Excluding these cases, we may suppose by taking a subsequence that x_k → ξ ∈ ℝ¹. Now consider four possible cases and the respective inequalities below, valid for all sufficiently large k, where r_1 ∈ Q, r_2 ∈ Q, r_1 < ξ < r_2.

Case 1. x_k ↑ ξ, x_k < ξ:

ε ≤ F_{n_k}(x_k) − F(x_k) ≤ F_{n_k}(ξ−) − F(r_1)
≤ [F_{n_k}(ξ−) − F_{n_k}(ξ)] + [F_{n_k}(r_2) − F(r_2)] + [F(r_2) − F(r_1)].

Case 2. x_k ↑ ξ, x_k < ξ:

ε ≤ F(x_k) − F_{n_k}(x_k) ≤ F(ξ−) − F_{n_k}(r_1)
= [F(ξ−) − F(r_1)] + [F(r_1) − F_{n_k}(r_1)].

Case 3. x_k ↓ ξ, x_k > ξ:

ε ≤ F(x_k) − F_{n_k}(x_k) ≤ F(r_2) − F_{n_k}(ξ)
≤ [F(r_2) − F(r_1)] + [F(r_1) − F_{n_k}(r_1)] + [F_{n_k}(ξ−) − F_{n_k}(ξ)].

Case 4. x_k ↓ ξ, x_k > ξ:

ε ≤ F_{n_k}(x_k) − F(x_k) ≤ F_{n_k}(r_2) − F(ξ)
= [F_{n_k}(r_2) − F_{n_k}(r_1)] + [F_{n_k}(r_1) − F(r_1)] + [F(r_1) − F(ξ)].

In each case let first k → ∞, then r_1 ↑ ξ, r_2 ↓ ξ; then the last member of each chain of inequalities does not exceed a quantity which tends to 0, and a contradiction is obtained.

Remark. The reader will do well to observe the way the proof above is arranged. Having chosen a set of ω with probability one, for each fixed ω in this set we reason with the corresponding sample functions F_n(·, ω) and F(·) without further intervention of probability. Such a procedure is standard in the theory of stochastic processes.

Our next application is to renewal theory. Let {X_n, n ≥ 1} again be a sequence of independent and identically distributed r.v.'s. We shall further assume that they are positive, although this hypothesis can be dropped, and that they are not identically zero a.e. It follows that the common mean is


strictly positive but may be +∞. Now the successive r.v.'s are interpreted as "lifespans" of certain objects undergoing a process of renewal, or the "return periods" of certain recurrent phenomena. Typical examples are the ages of a succession of living beings and the durations of a sequence of services. This raises theoretical as well as practical questions such as: given an epoch in time, how many renewals have there been before it? how long ago was the last renewal? how soon will the next be?

Let us consider the first question. Given the epoch t > 0, let N(t, ω) be the number of renewals up to and including the time t. It is clear that we have

(6) {ω: N(t, ω) = n} = {ω: S_n(ω) ≤ t < S_{n+1}(ω)},

valid for n ≥ 0, provided S_0 ≡ 0. Summing over n ≤ m − 1, we obtain

(7) {ω: N(t, ω) < m} = {ω: S_m(ω) > t}.

This shows in particular that for each t > 0, N(t) = N(t, ·) is a discrete r.v. whose range is the set of all natural numbers. The family of r.v.'s {N(t)} indexed by t ∈ [0, ∞) may be called a renewal process. If the common distribution F of the X_n's is the exponential F(x) = 1 − e^{−λx}, x > 0, where λ > 0, then {N(t), t ≥ 0} is just the simple Poisson process with parameter λ.

Let us prove first that

(8) lim_{t→∞} N(t) = +∞ a.e.,

namely that the total number of renewals becomes infinite with time. This is almost obvious, but the proof follows. Since N(t, ω) increases with t, the limit in (8) certainly exists for every ω. Were it finite on a set of strictly positive probability, there would exist an integer M such that

(9) P{ sup_{0≤t<∞} N(t, ω) < M } > 0.

By (7), the set in (9) is contained in {ω: S_M(ω) > t} for every t, so that P{S_M = +∞} > 0, which is absurd since S_M is a finite r.v. Next, let m denote the common mean, 0 < m ≤ +∞. By the strong law of large numbers (Theorem 5.4.2), there exists a


null set Z_1 such that

∀ω ∈ Ω\Z_1: lim_{n→∞} (X_1(ω) + ⋯ + X_n(ω))/n = m.

We have just proved that there exists a null set Z_2 such that

∀ω ∈ Ω\Z_2: lim_{t→∞} N(t, ω) = +∞.

Now for each fixed ω_0, if the numerical sequence {a_n(ω_0), n ≥ 1} converges to a finite (or infinite) limit m and at the same time the numerical function {N(t, ω_0), 0 ≤ t < ∞} tends to +∞ as t → +∞, then the very definition of a limit implies that the numerical function {a_{N(t,ω_0)}(ω_0), 0 ≤ t < ∞} converges to the limit m as t → +∞. Applying this trivial but fundamental observation to

a_n(ω) = (1/n) Σ_{j=1}^n X_j(ω)

for each ω in Ω\(Z_1 ∪ Z_2), we conclude that

(10) lim_{t→∞} S_{N(t,ω)}(ω)/N(t, ω) = m a.e.

By the definition of N(t, ω), the numerator on the left side should be close to t; this will be confirmed and strengthened in the following theorem.

Theorem 5.5.2. We have

(11) lim_{t→∞} N(t)/t = 1/m a.e.

and

(12) lim_{t→∞} E{N(t)}/t = 1/m,

both being true even if m = +∞, provided we take 1/m to be 0 in that case.

PROOF. It follows from (6) that for every ω:

S_{N(t,ω)}(ω) ≤ t < S_{N(t,ω)+1}(ω).

Dividing through by N(t, ω) and using (8) and (10), we see that t/N(t, ω) is squeezed between two quantities each of which converges to m a.e.; this proves (11).
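The limit (11) of Theorem 5.5.2 is easy to see in simulation. A minimal sketch (standard-library Python; the exponential lifespans, the mean m = 2, and the horizon t are illustrative choices, not from the text): N(t)/t settles near 1/m.

```python
import random

def renewal_count(t, mean, seed=1):
    """N(t): the number of partial sums S_n <= t for i.i.d. positive
    lifespans; exponential lifespans make {N(t)} a Poisson process."""
    rng = random.Random(seed)
    s, n = 0.0, 0
    while True:
        s += rng.expovariate(1.0 / mean)  # one more lifespan X_n
        if s > t:
            return n                      # here S_n <= t < S_{n+1}
        n += 1

if __name__ == "__main__":
    t, m = 50_000.0, 2.0
    print(renewal_count(t, m) / t)  # settles near 1/m = 0.5
```

Replacing the exponential draw by any other positive law with the same mean leaves the limit unchanged, which is the point of the theorem.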


The deduction of (12) from (11) is more tricky than might have been thought. Since X_n is not zero a.e., there exists δ > 0 such that

∀n: P{X_n ≥ δ} = p > 0.

Define

X′_n(ω) = δ if X_n(ω) ≥ δ; X′_n(ω) = 0 if X_n(ω) < δ;

and let S′_n and N′(t) be the corresponding quantities for the sequence {X′_n, n ≥ 1}. It is obvious that S′_n ≤ S_n and N′(t) ≥ N(t) for each t. Since the r.v.'s {X′_n/δ} are independent with a Bernoullian distribution, elementary computations (see Exercise 7 below) show that

E{N′(t)²}/t² = O(1/δ²) as t → ∞.

Hence we have, δ being fixed,

E{(N(t)/t)²} ≤ E{(N′(t)/t)²} = O(1).

Since (11) implies the convergence of N(t)/t in distribution to the point mass δ_{1/m}, an application of Theorem 4.5.2 with X_n = N(n)/n and p = 2 yields (12) with t replaced by n, from which (12) itself follows at once.

Corollary. For each t, E{N(t)} < ∞.

An interesting relation suggested by (10) is that E{S_{N(t)}} should be close to m·E{N(t)} when t is large. The precise result is as follows:

E{X_1 + ⋯ + X_{N(t)+1}} = E{X_1} E{N(t) + 1}.

This is a striking generalization of the additivity of expectations when the number of terms as well as the summands is "random." It follows from the more general result below, known as "Wald's equation".

Theorem 5.5.3. Let {X_n, n ≥ 1} be a sequence of independent and identically distributed r.v.'s with finite mean. For k ≥ 1 let ℱ_k, 1 ≤ k < ∞, be the Borel field generated by {X_j, 1 ≤ j ≤ k}. Suppose that N is an r.v. taking positive integer values such that E(N) < ∞ and

(13) ∀k ≥ 1: {N ≤ k} ∈ ℱ_k.

Then E(S_N) = E(X_1) E(N).


PROOF. Since S_0 = 0 as usual, we have

(14) E(S_N) = ∫ S_N dP = Σ_{k=1}^∞ ∫_{{N=k}} S_k dP = Σ_{k=1}^∞ Σ_{j=1}^k ∫_{{N=k}} X_j dP
= Σ_{j=1}^∞ Σ_{k=j}^∞ ∫_{{N=k}} X_j dP = Σ_{j=1}^∞ ∫_{{N≥j}} X_j dP.

Now the set {N ≤ j − 1} and the r.v. X_j are independent; hence the last-written integral is equal to E(X_j) P{N ≥ j}. Substituting this into the above, we obtain

E(S_N) = Σ_{j=1}^∞ E(X_j) P{N ≥ j} = E(X_1) Σ_{j=1}^∞ P{N ≥ j} = E(X_1) E(N),

the last equation by the corollary to Theorem 3.2.1.

It remains to justify the interchange of summations in (14), which is essential here. This is done by replacing X_j with |X_j| and obtaining as before that the repeated sum is equal to E(|X_1|) E(N) < ∞.

We leave it to the reader to verify condition (13) for the r.v. N(t) + 1 above. Such an r.v. will be called "optional" in Chapter 8 and will play an important role there.

Our last example is a noted triumph of the ideas of probability theory applied to classical analysis. It is S. Bernstein's proof of Weierstrass' theorem on the approximation of continuous functions by polynomials.

Theorem 5.5.4. Let f be a continuous function on [0, 1], and define the Bernstein polynomials {p_n} as follows:

p_n(x) = Σ_{k=0}^n f(k/n) C(n, k) x^k (1 − x)^{n−k}.

Then p_n converges uniformly to f in [0, 1].

PROOF. For each fixed x ∈ [0, 1], consider a sequence of independent Bernoullian r.v.'s {X_j, j ≥ 1} taking the values 1 and 0 with probabilities x and 1 − x,


and let S_n = Σ_{j=1}^n X_j as usual. We know from elementary probability theory that

(15) P{S_n = k} = C(n, k) x^k (1 − x)^{n−k}, 0 ≤ k ≤ n,

so that

p_n(x) = E{f(S_n/n)}.

Given ε > 0, by the uniform continuity of f there exists δ > 0 such that |f(u) − f(v)| ≤ ε whenever |u − v| ≤ δ. Splitting the expectation according as |S_n/n − x| ≤ δ or not, we obtain

(16) |p_n(x) − f(x)| ≤ E{|f(S_n/n) − f(x)|} ≤ ε + 2‖f‖ P{|S_n/n − x| > δ},

where ‖f‖ = sup_{0≤x≤1} |f(x)|. By Chebyshev's inequality,

P{|S_n/n − x| > δ} ≤ σ²(S_n)/(δ²n²) = x(1 − x)/(δ²n) ≤ 1/(4δ²n).

This is nothing but Chebyshev's proof of Bernoulli's theorem. Hence if n ≥ ‖f‖/(2δ²ε), we get |p_n(x) − f(x)| ≤ 2ε in (16), for every x at once. This proves the uniformity.
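Bernstein's construction is completely explicit and can be computed directly. A minimal sketch (standard-library Python; the test function f(x) = |x − 1/2| and the evaluation grid are illustrative choices, not from the text): the uniform error of p_n over [0, 1] shrinks as n grows.

```python
from math import comb

def bernstein(f, n, x):
    """p_n(x) = E f(S_n/n) = sum_k f(k/n) C(n,k) x^k (1-x)^(n-k)."""
    return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

if __name__ == "__main__":
    f = lambda u: abs(u - 0.5)            # continuous, with a kink at 1/2
    grid = [i / 100 for i in range(101)]
    for n in (10, 100):
        err = max(abs(bernstein(f, n, x) - f(x)) for x in grid)
        print(n, round(err, 4))           # uniform error decreases with n
```

The largest error sits near the kink at x = 1/2, exactly where the variance bound x(1 − x)/(δ²n) in the proof is weakest.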


One should remark not only on the lucidity of the above derivation but also on the meaningful construction of the approximating polynomials. Similar methods can be used to establish a number of well-known analytical results with relative ease; see Exercise 11 below for another example.

EXERCISES

{X_n} is a sequence of independent and identically distributed r.v.'s; S_n = Σ_{j=1}^n X_j.

1. Show that equality can hold somewhere in (1) with strictly positive probability if and only if the discrete part of F does not vanish.

*2. Let F_n and F be as in Theorem 5.5.1; then the distribution of

sup_{−∞<x<∞} |F_n(x, ω) − F(x)|

is the same for all continuous F.

6. Let ν(t, ω) = min{n ≥ 1: S_n(ω) > t}. Then for every t > 0 and r > 0 we have P{ν(t) > n} ≤ Aκ^n for some constant A and some κ < 1, for all large n; consequently E{ν(t)^r} < ∞. This implies the corollary of Theorem 5.5.2 without recourse to the law of large numbers. [This is Charles Stein's theorem.]

*7. Consider the special case of renewal where the r.v.'s are Bernoullian taking the values 1 and 0 with probabilities p and 1 − p, where 0 < p < 1.


Find explicitly the d.f. of ν(0) as defined in Exercise 6, and hence of ν(t) for every t > 0. Find E{ν(t)} and E{ν(t)²}. Relate ν(t, ω) to the N(t, ω) in Theorem 5.5.2 and so calculate E{N(t)} and E{N(t)²}.

8. Theorem 5.5.3 remains true if E(X_1) is defined, possibly +∞ or −∞.

*9. In Exercise 7, find the d.f. of X_{ν(t)} for a given t. E{X_{ν(t)}} is the mean lifespan of the object living at the epoch t; should it not be the same as E{X_1}, the mean lifespan of the given species? [This is one of the best examples of the use or misuse of intuition in probability theory.]

10. Let r be a positive integer-valued r.v. that is independent of the X_n's. Suppose that both r and X_1 have finite second moments; then

σ²(S_r) = E(r) σ²(X_1) + σ²(r) (E(X_1))².

*11. Let f be continuous and belong to L^r(0, ∞) for some r > 1, and

g(λ) = ∫_0^∞ e^{−λt} f(t) dt.

Then

f(x) = lim_{n→∞} [ (−1)^{n−1} / (n − 1)! ] (n/x)^n g^{(n−1)}(n/x),

where g^{(n−1)} is the (n − 1)st derivative of g, uniformly in every finite interval. [HINT: Let λ > 0, P{X_1(λ) ≤ t} = 1 − e^{−λt}. Then

E{f(S_n(λ))} = [ (−1)^{n−1} / (n − 1)! ] λ^n g^{(n−1)}(λ),

and S_n(n/x) → x in pr. This is a somewhat easier version of Widder's inversion formula for Laplace transforms.]

12. Let P{X_1 = k} = p_k, 1 ≤ k ≤ ℓ, Σ_{k=1}^ℓ p_k = 1. Let N_k(n, ω) be the number of values of j, 1 ≤ j ≤ n, for which X_j = k, and

π(n, ω) = Π_{k=1}^ℓ p_k^{N_k(n,ω)}.

Prove that

lim_{n→∞} (1/n) log π(n, ω) exists a.e.

and find the limit. [This is from information theory.]

Bibliographical Note

Borel's theorem on normal numbers, as well as the Borel–Cantelli lemma in Secs. 4.2–4.3, is contained in


Émile Borel, Sur les probabilités dénombrables et leurs applications arithmétiques, Rend. Circ. Mat. Palermo 26 (1909), 247–271.

This pioneering paper, despite some serious gaps (see Fréchet [9] for comments), is well worth reading for its historical interest. In Borel's Jubilé Selecta (Gauthier-Villars, 1940), it is followed by a commentary by Paul Lévy with a bibliography on later developments.

Every serious student of probability theory should read:

A. N. Kolmogoroff, Über die Summen durch den Zufall bestimmter unabhängiger Grössen, Math. Annalen 99 (1928), 309–319; Bemerkungen, 102 (1929), 484–488.

This contains Theorems 5.3.1 to 5.3.3 as well as the original version of Theorem 5.2.3.

For all convergence questions regarding sums of independent r.v.'s, the deepest study is given in Chapter 6 of Lévy's book [11]. After three decades, this book remains a source of inspiration.

Theorem 5.5.2 is taken from

J. L. Doob, Renewal theory from the point of view of probability, Trans. Am. Math. Soc. 63 (1948), 422–438.

Feller's book [13], both volumes, contains an introduction to renewal theory as well as some of its latest developments.


6 | Characteristic function

6.1 General properties; convolutions

An important tool in the study of r.v.'s and their p.m.'s or d.f.'s is the characteristic function (ch.f.). For any r.v. X with the p.m. μ and d.f. F, this is defined to be the function f on ℝ¹ as follows, ∀t ∈ ℝ¹:

(1) f(t) = E(e^{itX}) = ∫_Ω e^{itX(ω)} P(dω) = ∫ e^{itx} μ(dx) = ∫ e^{itx} dF(x).

The equality of the third and fourth terms above is a consequence of Theorem 3.2.2, while the rest is by definition and notation. We remind the reader that the last term in (1) is defined to be the one preceding it, where the one-to-one correspondence between μ and F is discussed in Sec. 2.2. We shall use both of them below. Let us also point out, since our general discussion of integrals has been confined to the real domain, that f is a complex-valued function of the real variable t, whose real and imaginary parts are given respectively by

Rf(t) = ∫ cos xt μ(dx), If(t) = ∫ sin xt μ(dx).


Here and hereafter, integrals without an indicated domain of integration are taken over ℝ¹.

Clearly, the ch.f. is a creature associated with μ or F, not with X, but the first equation in (1) will prove very convenient for us; see, e.g., (iii) and (v) below. In analysis, the ch.f. is known as the Fourier–Stieltjes transform of μ or F. It can also be defined over a wider class of μ or F and, furthermore, be considered as a function of a complex variable t, under certain conditions that ensure the existence of the integrals in (1). This extension is important in some applications, but we will not need it here except in Theorem 6.6.5. As specified above, it is always well defined (for all real t) and has the following simple properties.

(i) ∀t ∈ ℝ¹:

|f(t)| ≤ 1 = f(0), f(−t) = f̄(t),

where z̄ denotes the conjugate complex of z.

(ii) f is uniformly continuous in ℝ¹.

To see this, we write for real t and h:

f(t + h) − f(t) = ∫ (e^{i(t+h)x} − e^{itx}) μ(dx),
|f(t + h) − f(t)| ≤ ∫ |e^{itx}| |e^{ihx} − 1| μ(dx) = ∫ |e^{ihx} − 1| μ(dx).

The last integrand is bounded by 2 and tends to 0 as h → 0, for each x. Hence, the integral converges to 0 by bounded convergence. Since it does not involve t, the convergence is surely uniform with respect to t.

(iii) If we write f_X for the ch.f. of X, then for any real numbers a and b, we have

f_{aX+b}(t) = f_X(at) e^{itb}, f_{−X}(t) = f̄_X(t).

This is easily seen from the first equation in (1), for

E(e^{it(aX+b)}) = E(e^{i(ta)X} e^{itb}) = E(e^{i(ta)X}) e^{itb}.

(iv) If {f_n, n ≥ 1} are ch.f.'s, λ_n ≥ 0, Σ_{n=1}^∞ λ_n = 1, then

Σ_{n=1}^∞ λ_n f_n

is a ch.f. Briefly: a convex combination of ch.f.'s is a ch.f.
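Properties (i) and (iv) can be illustrated by direct computation with discrete laws. A minimal sketch (standard-library Python; the particular laws and the 0.3/0.7 mixture weights are illustrative choices, not from the text): the ch.f. of a mixture of two discrete laws coincides with the convex combination of their ch.f.'s.

```python
import cmath
import math

def chf(law):
    """Ch.f. t -> sum_x p(x) e^{itx} of a discrete law given as {x: p}."""
    return lambda t: sum(p * cmath.exp(1j * t * x) for x, p in law.items())

f1 = chf({-1: 0.5, 1: 0.5})               # symmetric Bernoullian: cos t
f2 = chf({0: 0.4, 1: 0.6})                # Bernoullian with p = 0.6
fmix = chf({-1: 0.15, 0: 0.28, 1: 0.57})  # the 0.3/0.7 mixture of the laws

for t in (0.0, 0.7, -2.3):
    assert abs(f1(t) - math.cos(t)) < 1e-12                    # cf. (2) below
    assert abs(fmix(t) - (0.3 * f1(t) + 0.7 * f2(t))) < 1e-12  # property (iv)
print("convex combination of ch.f.'s verified")
```

The first assertion also previews the list of standard ch.f.'s later in this section, where the symmetric Bernoullian law has ch.f. cos t.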


For if {μ_n, n ≥ 1} are the corresponding p.m.'s, then Σ_{n=1}^∞ λ_n μ_n is a p.m. whose ch.f. is Σ_{n=1}^∞ λ_n f_n.

(v) If {f_j, 1 ≤ j ≤ n} are ch.f.'s, then

Π_{j=1}^n f_j

is a ch.f.

By Theorem 3.3.4, there exist independent r.v.'s {X_j, 1 ≤ j ≤ n} with probability distributions {μ_j, 1 ≤ j ≤ n}, where μ_j is as in (iv). Letting

S_n = Σ_{j=1}^n X_j,

we have by the corollary to Theorem 3.3.3:

(2) E(e^{itS_n}) = E( Π_{j=1}^n e^{itX_j} ) = Π_{j=1}^n E(e^{itX_j}) = Π_{j=1}^n f_j(t);

or in the notation of (iii):

f_{S_n} = Π_{j=1}^n f_{X_j}.

(For an extension to an infinite number of f_j's see Exercise 4 below.)

The ch.f. of S_n being so neatly expressed in terms of the ch.f.'s of the summands, we may wonder about the d.f. of S_n. We need the following definitions.

DEFINITION. The convolution of two d.f.'s F_1 and F_2 is defined to be the d.f. F such that

(3) ∀x ∈ ℝ¹: F(x) = ∫_{−∞}^{∞} F_1(x − y) dF_2(y),

and written as

F = F_1 * F_2.

It is easy to verify that F is indeed a d.f. The other basic properties of convolution are consequences of the following theorem.

Theorem 6.1.1. Let X_1 and X_2 be independent r.v.'s with d.f.'s F_1 and F_2, respectively. Then X_1 + X_2 has the d.f. F_1 * F_2.


PROOF. We wish to show that

(4) ∀x: P{X_1 + X_2 ≤ x} = (F_1 * F_2)(x).

For this purpose we define a function f of (x_1, x_2) as follows, for fixed x:

f(x_1, x_2) = 1 if x_1 + x_2 ≤ x; f(x_1, x_2) = 0 otherwise.

f is a Borel measurable function of two variables. By Theorem 3.3.3 and using the notation of the second proof of Theorem 3.3.3, we have

∫ f(X_1, X_2) dP = ∫∫ f(x_1, x_2) μ²(dx_1, dx_2)
= ∫ μ_2(dx_2) ∫ f(x_1, x_2) μ_1(dx_1)
= ∫ μ_2(dx_2) ∫_{(−∞, x−x_2]} μ_1(dx_1)
= ∫_{−∞}^{∞} F_1(x − x_2) dF_2(x_2).

This reduces to (4). The second equation above, evaluating the double integral by an iterated one, is an application of Fubini's theorem (see Sec. 3.3).

Corollary. The binary operation of convolution * is commutative and associative.

For the corresponding binary operation of addition of independent r.v.'s has these two properties.

DEFINITION. The convolution of two probability density functions p_1 and p_2 is defined to be the probability density function p such that

(5) ∀x ∈ ℝ¹: p(x) = ∫_{−∞}^{∞} p_1(x − y) p_2(y) dy,

and written as

p = p_1 * p_2.

We leave it to the reader to verify that p is indeed a density, but we will spell out the following connection.

Theorem 6.1.2. The convolution of two absolutely continuous d.f.'s with densities p_1 and p_2 is absolutely continuous with density p_1 * p_2.
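Theorem 6.1.2 is easy to check numerically. A minimal sketch (standard-library Python; the uniform densities and the quadrature parameters are illustrative choices, not from the text): convolving two uniform densities on [0, 1] recovers the triangular ("tent") density on [0, 2].

```python
def convolve_density(p1, p2, x, lo=-5.0, hi=5.0, steps=20_000):
    """(p1 * p2)(x) = integral of p1(x - y) p2(y) dy, midpoint rule."""
    h = (hi - lo) / steps
    return h * sum(p1(x - (lo + (i + 0.5) * h)) * p2(lo + (i + 0.5) * h)
                   for i in range(steps))

uniform = lambda u: 1.0 if 0.0 <= u <= 1.0 else 0.0

# The sum of two independent uniform r.v.'s has the triangular density
# on [0, 2]: x for 0 <= x <= 1, and 2 - x for 1 <= x <= 2.
for x, expected in ((0.5, 0.5), (1.0, 1.0), (1.5, 0.5)):
    assert abs(convolve_density(uniform, uniform, x) - expected) < 1e-2
print("triangular density recovered")
```

The same routine applied three or more times illustrates the smoothing effect of repeated convolution noted later in this section.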


PROOF. We have by Fubini's theorem:

∫_{−∞}^{x} p(u) du = ∫_{−∞}^{x} du ∫_{−∞}^{∞} p_1(u − v) p_2(v) dv
= ∫_{−∞}^{∞} [ ∫_{−∞}^{x} p_1(u − v) du ] p_2(v) dv
= ∫_{−∞}^{∞} F_1(x − v) p_2(v) dv = ∫_{−∞}^{∞} F_1(x − v) dF_2(v) = (F_1 * F_2)(x).

This shows that p is a density of F_1 * F_2.

What is the p.m., to be denoted by μ_1 * μ_2, that corresponds to F_1 * F_2? For arbitrary subsets A and B of ℝ¹, we denote their vector sum and difference by A + B and A − B, respectively:

(6) A ± B = {x ± y: x ∈ A, y ∈ B};

and write x ± B for {x} ± B, −B for 0 − B. There should be no danger of confusing A − B with A\B.

Theorem 6.1.3. For each B ∈ ℬ, we have

(7) (μ_1 * μ_2)(B) = ∫_{−∞}^{∞} μ_1(B − y) μ_2(dy).

For each Borel measurable function g that is integrable with respect to μ_1 * μ_2, we have

(8) ∫ g(u) (μ_1 * μ_2)(du) = ∫∫ g(x + y) μ_1(dx) μ_2(dy).

PROOF. It is easy to verify that the set function (μ_1 * μ_2)(·) defined by (7) is a p.m. To show that its d.f. is F_1 * F_2, we need only verify that its value for B = (−∞, x] is given by the F(x) defined in (3). This is obvious, since the right side of (7) then becomes

∫ F_1(x − y) μ_2(dy) = ∫ F_1(x − y) dF_2(y).

Now let g be the indicator of the set B; then for each y, the function g_y defined by g_y(x) = g(x + y) is the indicator of the set B − y. Hence

∫ g(x + y) μ_1(dx) = μ_1(B − y)


and, substituting into the right side of (8), we see that it reduces to (7) in this case. The general case is proved in the usual way by first considering simple functions g and then passing to the limit for integrable functions.

As an instructive example, let us calculate the ch.f. of the convolution μ_1 * μ_2. We have by (8):

∫ e^{itu} (μ_1 * μ_2)(du) = ∫∫ e^{itx} e^{ity} μ_1(dx) μ_2(dy) = ∫ e^{itx} μ_1(dx) ∫ e^{ity} μ_2(dy).

This is as it should be by (v), since the first term above is the ch.f. of X + Y, where X and Y are independent with μ_1 and μ_2 as p.m.'s. Let us restate the results, after an obvious induction, as follows.

Theorem 6.1.4. Addition of (a finite number of) independent r.v.'s corresponds to convolution of their d.f.'s and multiplication of their ch.f.'s.

Corollary. If f is a ch.f., then so is |f|².

To prove the corollary, let X have the ch.f. f. Then there exists on some Ω (why?) an r.v. Y independent of X and having the same d.f., and so also the same ch.f. f. The ch.f. of X − Y is

E(e^{it(X−Y)}) = E(e^{itX}) E(e^{−itY}) = f(t) f(−t) = f(t) f̄(t) = |f(t)|².

The technique of considering X − Y and |f|² instead of X and f will be used below and referred to as "symmetrization" (see the end of Sec. 6.2). This is often expedient, since a real and particularly a positive-valued ch.f. such as |f|² is easier to handle than a general one.

Let us list a few well-known ch.f.'s together with their d.f.'s or p.d.'s (probability densities), the last being given in the interval outside of which they vanish.

(1) Point mass at a:

d.f. δ_a; ch.f. e^{iat}.

(2) Symmetric Bernoullian distribution with mass 1/2 each at +1 and −1:

d.f. (1/2)(δ_1 + δ_{−1}); ch.f. cos t.

(3) Bernoullian distribution with "success probability" p, and q = 1 − p:

d.f. qδ_0 + pδ_1; ch.f. q + pe^{it} = 1 + p(e^{it} − 1).


(4) Binomial distribution for n trials with success probability p:

d.f. Σ_{k=0}^n C(n, k) p^k q^{n−k} δ_k; ch.f. (q + pe^{it})^n.

(5) Geometric distribution with success probability p:

d.f. Σ_{n=0}^∞ q^n p δ_n; ch.f. p(1 − qe^{it})^{−1}.

(6) Poisson distribution with (mean) parameter λ:

d.f. Σ_{n=0}^∞ e^{−λ} (λ^n/n!) δ_n; ch.f. e^{λ(e^{it}−1)}.

(7) Exponential distribution with mean λ^{−1}:

p.d. λe^{−λx} in [0, ∞); ch.f. (1 − λ^{−1}it)^{−1}.

(8) Uniform distribution in [−a, +a]:

p.d. 1/(2a) in [−a, a]; ch.f. (sin at)/(at) (= 1 for t = 0).

(9) Triangular distribution in [−a, a]:

p.d. (a − |x|)/a² in [−a, a]; ch.f. 2(1 − cos at)/(a²t²).

(10) Reciprocal of (9):

p.d. (1 − cos ax)/(πax²) in (−∞, ∞); ch.f. (1 − |t|/a) ∨ 0.

(11) Normal distribution N(m, σ²) with mean m and variance σ²:

p.d. (1/√(2πσ²)) exp(−(x − m)²/(2σ²)) in (−∞, ∞); ch.f. exp(imt − σ²t²/2).

Unit normal distribution N(0, 1) = Φ with mean 0 and variance 1:

p.d. (1/√(2π)) e^{−x²/2} in (−∞, ∞); ch.f. e^{−t²/2}.

(12) Cauchy distribution with parameter a > 0:

p.d. a/(π(a² + x²)) in (−∞, ∞); ch.f. e^{−a|t|}.
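Entries of the catalog above can be verified numerically. A minimal sketch (standard-library Python; the value a = 2 and the quadrature parameters are illustrative choices, not from the text): approximate f(t) = ∫ e^{itx} p(x) dx by a midpoint rule and compare with the closed forms in (8) and (9).

```python
import cmath
import math

def chf_from_density(pdf, t, lo, hi, steps=4_000):
    """f(t) = integral of e^{itx} p(x) dx, approximated by a midpoint rule."""
    h = (hi - lo) / steps
    return h * sum(cmath.exp(1j * t * (lo + (i + 0.5) * h))
                   * pdf(lo + (i + 0.5) * h) for i in range(steps))

a = 2.0
uniform = lambda x: 1 / (2 * a)               # entry (8), on [-a, a]
triangular = lambda x: (a - abs(x)) / a ** 2  # entry (9), on [-a, a]

for t in (0.5, 1.7):
    assert abs(chf_from_density(uniform, t, -a, a)
               - math.sin(a * t) / (a * t)) < 1e-4
    assert abs(chf_from_density(triangular, t, -a, a)
               - 2 * (1 - math.cos(a * t)) / (a * t) ** 2) < 1e-4
print("catalog entries (8) and (9) verified")
```

Both computed values come out real up to rounding, in accordance with Theorem 6.2.6 below, since both densities are symmetric.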


Convolution is a smoothing operation widely used in mathematical analysis, for instance in the proof of Theorem 6.5.2 below. Convolution with the normal kernel is particularly effective, as illustrated below.

Let n_δ be the density of the normal distribution N(0, δ²), namely

n_δ(x) = (1/(√(2π) δ)) exp(−x²/(2δ²)), −∞ < x < ∞.

For any bounded measurable function f on ℝ¹, put

(9) f_δ(x) = (f * n_δ)(x) = ∫_{−∞}^{∞} f(x − y) n_δ(y) dy = ∫_{−∞}^{∞} n_δ(x − y) f(y) dy.

It will be seen below that the integrals above converge. Let C_B^∞ denote the class of functions on ℝ¹ which have bounded derivatives of all orders; C_U the class of bounded and uniformly continuous functions on ℝ¹.

Theorem 6.1.5. For each δ > 0, we have f_δ ∈ C_B^∞. Furthermore if f ∈ C_U, then f_δ → f uniformly in ℝ¹.

PROOF. It is easily verified that n_δ ∈ C_B^∞. Moreover its kth derivative n_δ^{(k)} is dominated by c_{k,δ} n_{2δ}, where c_{k,δ} is a constant depending only on k and δ, so that

| ∫ n_δ^{(k)}(x − y) f(y) dy | ≤ c_{k,δ} ‖f‖ ∫ n_{2δ}(x − y) dy = c_{k,δ} ‖f‖.

Thus the first assertion follows by differentiation under the integral of the last term in (9), which is justified by elementary rules of calculus. The second assertion is proved by standard estimation as follows, for any η > 0:

|f(x) − f_δ(x)| ≤ ∫_{−∞}^{∞} |f(x) − f(x − y)| n_δ(y) dy
≤ sup_{|y|≤η} |f(x) − f(x − y)| + 2‖f‖ ∫_{|y|>η} n_δ(y) dy.

Here is the probability idea involved. If f is integrable over ℝ¹, we may think of f as the density of an r.v. X, and n_δ as that of an independent normal r.v. Y_δ. Then f_δ is the density of X + Y_δ by Theorem 6.1.2. As δ ↓ 0, Y_δ converges to 0 in probability and so X + Y_δ converges to X likewise, hence also in distribution by Theorem 4.4.5. This makes it plausible that the densities will also converge under certain analytical conditions.

As a corollary, we have shown that the class C_B^∞ is dense in the class C_U with respect to the uniform topology on ℝ¹. This is a basic result in the theory


of "generalized functions" or "Schwartz distributions". By way of application, we state the following strengthening of Theorem 4.4.1.

Theorem 6.1.6. If {μ_n} and μ are s.p.m.'s such that

∀f ∈ C_B^∞: ∫ f(x) μ_n(dx) → ∫ f(x) μ(dx),

then μ_n →_v μ.

This is an immediate consequence of Theorem 4.4.1 and Theorem 6.1.5, if we observe that C_0 ⊂ C_U. The reduction of the class of "test functions" from C_0 to C_B^∞ is often expedient, as in Lindeberg's method for proving central limit theorems; see Sec. 7.1 below.

EXERCISES

1. If f is a ch.f., and G a d.f. with G(0−) = 0, then the following functions are all ch.f.'s:

∫_0^1 f(ut) du, ∫_0^∞ f(ut) e^{−u} du, ∫_0^∞ e^{−|t|u} dG(u), ∫_0^∞ e^{−t²u} dG(u), ∫_0^∞ f(ut) dG(u).

*2. Let f(u, t) be a function on (−∞, ∞) × (−∞, ∞) such that for each u, f(u, ·) is a ch.f. and for each t, f(·, t) is a continuous function; then

∫_{−∞}^{∞} f(u, t) dG(u)

is a ch.f. for any d.f. G. In particular, if f is a ch.f. such that lim_{t→∞} f(t) exists and G is a d.f. with G(0−) = 0, then

∫_0^∞ f(t/u) dG(u) is a ch.f.

3. Find the d.f.'s with the following ch.f.'s (α > 0, β > 0):

α²/(α² + t²), 1/(1 − αit)^β, 1/(1 + α − αe^{it})^β.

[HINT: The second and third correspond respectively to the gamma and Pólya distributions.]


4. Let S_n be as in (v) and suppose that S_n → S_∞ in pr. Prove that

Π_{j=1}^∞ f_j(t)

converges in the sense of infinite product for each t and is the ch.f. of S_∞.

5. If F_1 and F_2 are d.f.'s such that F_1 = Σ_j b_j δ_{a_j} and F_2 has density p, show that F_1 * F_2 has a density and find it.

*6. Prove that the convolution of two discrete d.f.'s is discrete; that of a continuous d.f. with any d.f. is continuous; that of an absolutely continuous d.f. with any d.f. is absolutely continuous.

7. The convolution of two discrete distributions with exactly m and n atoms, respectively, has at least m + n − 1 and at most mn atoms.

8. Show that the family of normal (Cauchy, Poisson) distributions is closed with respect to convolution in the sense that the convolution of any two in the family with arbitrary parameters is another in the family with some parameter(s).

9. Find the nth iterated convolution of an exponential distribution.

*10. Let {X_j, j ≥ 1} be a sequence of independent r.v.'s having the common exponential distribution with mean 1/λ, λ > 0. For given x > 0 let ν be the maximum of n such that S_n ≤ x, where S_0 = 0, S_n = Σ_{j=1}^n X_j as usual. Prove that the r.v. ν has the Poisson distribution with mean λx. See Sec. 5.5 for an interpretation by renewal theory.

11. Let X have the normal distribution Φ. Find the d.f., p.d., and ch.f. of X².

12. Let {X_j, 1 ≤ j ≤ n} be independent r.v.'s each having the d.f. Φ. Find the ch.f. of

Σ_{j=1}^n X_j²

and show that the corresponding p.d. is 2^{−n/2} Γ(n/2)^{−1} x^{(n/2)−1} e^{−x/2} in (0, ∞). This is called in statistics the "χ² distribution with n degrees of freedom".

13. For any ch.f. f we have for every t:

R[1 − f(t)] ≥ (1/4) R[1 − f(2t)].

14. Find an example of two r.v.'s X and Y with the same p.m. μ that are not independent but such that X + Y has the p.m. μ * μ. [HINT: Take X = Y and use ch.f.]


*15. For a d.f. F and h > 0, define

Q_F(h) = sup_x [F(x + h) − F(x−)];

Q_F is called the Lévy concentration function of F. Prove that the sup above is attained, and if G is also a d.f., we have

∀h > 0: Q_{F*G}(h) ≤ Q_F(h) ∧ Q_G(h).

16. If 0 < hλ ≤ 2π, then there is an absolute constant A such that

Q_F(h) ≤ (A/λ) ∫_{−λ}^{λ} |f(t)| dt,

where f is the ch.f. of F. [HINT: Use Exercise 2 of Sec. 6.2 below.]

17. Let F be a symmetric d.f., with ch.f. f ≥ 0; then

φ_F(h) = ∫_{−∞}^{∞} (h²/(h² + x²)) dF(x) = h ∫_0^∞ e^{−ht} f(t) dt

is a sort of average concentration function. Prove that if G is also a d.f. with ch.f. g ≥ 0, then we have ∀h > 0:

φ_{F*G}(h) ≤ φ_F(h) ∧ φ_G(h);
1 − φ_{F*G}(h) ≤ [1 − φ_F(h)] + [1 − φ_G(h)].

*18. Let the support of the p.m. μ on ℝ¹ be denoted by supp μ. Prove that

supp(μ * ν) = closure of (supp μ + supp ν);
supp(μ_1 * μ_2 * ⋯) = closure of (supp μ_1 + supp μ_2 + ⋯),

where "+" denotes vector sum.

6.2 Uniqueness and inversion

To study the deeper properties of Fourier–Stieltjes transforms, we shall need certain "Dirichlet integrals". We begin with three basic formulas, where sgn α denotes 1, 0, or −1, according as α > 0, = 0, or < 0:

(1) ∀y > 0: | ∫_0^y (sin αx)/x dx | ≤ ∫_0^π (sin x)/x dx;


(2) ∫_0^∞ (sin αx)/x dx = (π/2) sgn α;

(3) ∫_0^∞ (1 − cos αx)/x² dx = (π/2) |α|.

The substitution αx = u shows at once that it is sufficient to prove all three formulas for α = 1. The inequality (1) is proved by partitioning the interval [0, ∞) with positive multiples of π so as to convert the integral into a series of alternating signs and decreasing moduli. The integral in (2) is a standard exercise in contour integration, as is also that in (3). However, we shall indicate the following neat heuristic calculations, leaving the justifications, which are not difficult, as exercises:

∫_0^∞ (sin x)/x dx = ∫_0^∞ sin x [ ∫_0^∞ e^{−xu} du ] dx = ∫_0^∞ [ ∫_0^∞ e^{−xu} sin x dx ] du = ∫_0^∞ du/(1 + u²) = π/2;

∫_0^∞ (1 − cos x)/x² dx = ∫_0^∞ (1/x²) [ ∫_0^x sin u du ] dx = ∫_0^∞ sin u [ ∫_u^∞ dx/x² ] du = ∫_0^∞ (sin u)/u du = π/2.

We are ready to answer the question: given a ch.f. f, how can we find the corresponding d.f. F or p.m. μ? The formula for doing this, called the inversion formula, is of theoretical importance, since it will establish a one-to-one correspondence between the class of d.f.'s or p.m.'s and the class of ch.f.'s (see, however, Exercise 12 below). It is somewhat complicated in its most general form, but special cases or variants of it can actually be employed to derive certain properties of a d.f. or p.m. from its ch.f.; see, e.g., (14) and (15) of Sec. 6.4.

Theorem 6.2.1. If x_1 < x_2, then we have

(4) μ((x_1, x_2)) + (1/2) μ({x_1}) + (1/2) μ({x_2}) = lim_{T→∞} (1/2π) ∫_{−T}^{T} [ (e^{−itx_1} − e^{−itx_2}) / it ] f(t) dt

(the integrand being defined by continuity at t = 0).

PROOF. Observe first that the integrand above is bounded by |x_1 − x_2| everywhere and is O(|t|^{−1}) as |t| → ∞; yet we cannot assert that the "infinite integral" ∫_{−∞}^{∞} exists (in the Lebesgue sense). Indeed, it does not in general (see Exercise 9 below). The fact that the indicated limit, the so-called Cauchy limit, does exist is part of the assertion.


We shall prove (4) by actually substituting the definition of f into (4) and carrying out the integrations. We have

(5) (1/2π) ∫_{−T}^{T} [ (e^{−itx_1} − e^{−itx_2}) / it ] [ ∫_{−∞}^{∞} e^{itx} μ(dx) ] dt
= ∫_{−∞}^{∞} [ (1/2π) ∫_{−T}^{T} (e^{it(x−x_1)} − e^{it(x−x_2)}) / it dt ] μ(dx).

Leaving aside for the moment the justification of the interchange of the iterated integral, let us denote the quantity in square brackets above by I(T, x, x_1, x_2). Trivial simplification yields

I(T, x, x_1, x_2) = (1/π) ∫_0^T (sin t(x − x_1))/t dt − (1/π) ∫_0^T (sin t(x − x_2))/t dt.

It follows from (2) that

lim_{T→∞} I(T, x, x_1, x_2) =
−1/2 − (−1/2) = 0 for x < x_1;
0 − (−1/2) = 1/2 for x = x_1;
1/2 − (−1/2) = 1 for x_1 < x < x_2;
1/2 − 0 = 1/2 for x = x_2;
1/2 − 1/2 = 0 for x > x_2.

Furthermore, I is bounded in T, since by (1)

|I(T, x, x_1, x_2)| ≤ (2/π) ∫_0^π (sin x)/x dx.

Hence we may let T → ∞ under the integral sign in the right member of (5) by bounded convergence. The result is

∫_{(−∞,x_1)} 0 μ(dx) + ∫_{{x_1}} (1/2) μ(dx) + ∫_{(x_1,x_2)} 1 μ(dx) + ∫_{{x_2}} (1/2) μ(dx) + ∫_{(x_2,∞)} 0 μ(dx)
= (1/2) μ({x_1}) + μ((x_1, x_2)) + (1/2) μ({x_2}).

This proves the theorem. For the justification mentioned above, we invoke Fubini's theorem and observe that

| (e^{it(x−x_1)} − e^{it(x−x_2)}) / it | = | ∫_{x_1}^{x_2} e^{−itu} du | ≤ |x_1 − x_2|,

where the integral is taken along the real axis, and

∫_{−T}^{T} [ ∫ |x_1 − x_2| μ(dx) ] dt = 2T |x_1 − x_2| < ∞,


so that the integrand on the right of (5) is dominated by a finitely integrable function with respect to the finite product measure dt · μ(dx) on [−T, +T] × ℝ¹. This suffices.

Remark. If x_1 and x_2 are points of continuity of F, the left side of (4) is F(x_2) − F(x_1).

The following result is often referred to as the "uniqueness theorem" for the "determining" μ or F (see also Exercise 12 below).

Theorem 6.2.2. If two p.m.'s or d.f.'s have the same ch.f., then they are the same.

PROOF. If neither x_1 nor x_2 is an atom of μ, the inversion formula (4) shows that the value of μ on the interval (x_1, x_2) is determined by its ch.f. It follows that two p.m.'s having the same ch.f. agree on each interval whose endpoints are not atoms for either measure. Since each p.m. has only a countable set of atoms, points of ℝ¹ that are not atoms for either measure form a dense set. Thus the two p.m.'s agree on a dense set of intervals, and therefore they are identical by the corollary to Theorem 2.2.3.

We give next an important particular case of Theorem 6.2.1.

Theorem 6.2.3. If f ∈ L¹(−∞, +∞), then F is continuously differentiable, and we have

(6) F′(x) = (1/2π) ∫_{−∞}^{∞} e^{−itx} f(t) dt.

PROOF. Applying (4) for x_2 = x and x_1 = x − h with h > 0, and using F instead of μ, we have

(F(x) + F(x−))/2 − (F(x − h) + F(x − h−))/2 = (1/2π) ∫_{−∞}^{∞} [ (e^{ith} − 1) / it ] e^{−itx} f(t) dt.

Here the infinite integral exists by the hypothesis on f, since the integrand above is dominated by |h f(t)|. Hence we may let h → 0 under the integral sign by dominated convergence and conclude that the left side is 0. Thus, F is left continuous and so continuous in ℝ¹. Now we can write

(F(x) − F(x − h))/h = (1/2π) ∫_{−∞}^{∞} [ (e^{ith} − 1) / ith ] e^{−itx} f(t) dt.

The same argument as before shows that the limit exists as h → 0. Hence F has a left-hand derivative at x equal to the right member of (6), the latter being clearly continuous [cf. Proposition (ii) of Sec. 6.1]. Similarly, F has a


164 | CHARACTERISTIC FUNCTION

right-hand derivative given by the same formula. Actually it is known for a continuous function that, if one of the four "derivates" exists and is continuous at a point, then the function is continuously differentiable there (see, e.g., Titchmarsh, The Theory of Functions, 2nd ed., Oxford Univ. Press, New York, 1939, p. 355).

The derivative F′ being continuous, we have (why?)

∀x:  F(x) = \int_{-\infty}^{x}F'(u)\,du.

Thus F′ is a probability density function. We may now state Theorem 6.2.3 in a more symmetric form familiar in the theory of Fourier integrals.

Corollary. If f ∈ L¹, then p ∈ L¹, where

p(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-itx}f(t)\,dt,

and

f(t) = \int_{-\infty}^{\infty}e^{itx}p(x)\,dx.

The next two theorems yield information on the atoms of μ by means of f and are given here as illustrations of the method of "harmonic analysis".

Theorem 6.2.4. For each x₀, we have

(7)  \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}e^{-itx_0}f(t)\,dt = \mu(\{x_0\}).

PROOF. Proceeding as in the proof of Theorem 6.2.1, we obtain for the integral average on the left side of (7):

(8)  \int_{\mathcal{R}^1-\{x_0\}}\frac{\sin T(x-x_0)}{T(x-x_0)}\,\mu(dx) + \int_{\{x_0\}}1\,\mu(dx).

The integrand of the first integral above is bounded by 1 and tends to 0 as T → ∞ everywhere in the domain of integration; hence the integral converges to 0 by bounded convergence. The second term is simply the right member of (7).

Theorem 6.2.5. We have

(9)  \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T}|f(t)|^2\,dt = \sum_{x\in\mathcal{R}^1}\mu(\{x\})^2.


6.2 UNIQUENESS AND INVERSION | 165

PROOF. Since the set of atoms is countable, all but a countable number of terms in the sum above vanish, making the sum meaningful with a value bounded by 1. Formula (9) can be established directly in the manner of (5) and (7), but the following proof is more illuminating. As noted in the proof of the corollary to Theorem 6.1.4, |f|² is the ch.f. of the r.v. X − Y there, whose distribution is μ ∗ μ′, where μ′(B) = μ(−B) for each B ∈ ℬ¹. Applying Theorem 6.2.4 with x₀ = 0, we see that the left member of (9) is equal to (μ ∗ μ′)({0}). By (7) of Sec. 6.1, the latter may be evaluated as

\int_{\mathcal{R}^1}\mu'(\{-y\})\,\mu(dy) = \sum_{y\in\mathcal{R}^1}\mu(\{y\})^2,

since the integrand above is zero unless −y is an atom of μ′, which is the case if and only if y is an atom of μ. This gives the right member of (9). The reader may prefer to carry out the argument above using the r.v.'s X and −Y in the proof of the Corollary to Theorem 6.1.4.

Corollary. μ is atomless (F is continuous) if and only if the limit in the left member of (9) is zero.

This criterion is occasionally practicable.

DEFINITION. The r.v. X is called symmetric iff X and −X have the same distribution.

For such an r.v., the distribution μ has the following property:

∀B ∈ ℬ¹:  μ(B) = μ(−B).

Such a p.m. may be called symmetric; an equivalent condition on its d.f. F is as follows:

∀x ∈ ℛ¹:  F(x) = 1 − F(−x−)

(the awkwardness of using the d.f. being obvious here).

Theorem 6.2.6. X or μ is symmetric if and only if its ch.f. is real-valued (for all t).

PROOF. If X and −X have the same distribution, they must "determine" the same ch.f. Hence, by (iii) of Sec. 6.1, we have

f(t) = \overline{f(t)}


166 | CHARACTERISTIC FUNCTION

and so f is real-valued. Conversely, if f is real-valued, the same argument shows that X and −X must have the same ch.f. Hence, they have the same distribution by the uniqueness theorem (Theorem 6.2.2).

EXERCISES

f is the ch.f. of F below.

1. Show that

\int_0^{\infty}\left(\frac{\sin x}{x}\right)^2 dx = \frac{\pi}{2}.

*2. Show that for each T > 0:

\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{1-\cos Tx}{x^2}\,\cos tx\,dx = (T-|t|)\vee 0.

Deduce from this that for each T > 0, the function of t given by

\left(1-\frac{|t|}{T}\right)\vee 0

is a ch.f. Next, show that as a particular case of Theorem 6.2.3,

\frac{1-\cos Tx}{x^2} = \frac12\int_{-T}^{T}(T-|t|)\,e^{itx}\,dt.

Finally, derive the following particularly useful relation (a case of Parseval's relation in Fourier analysis), for arbitrary a and T > 0:

\int_{-\infty}^{\infty}\frac{1-\cos T(x-a)}{[T(x-a)]^2}\,dF(x) = \frac{1}{2T^2}\int_{-T}^{T}(T-|t|)\,e^{-ita}f(t)\,dt.

*3. Prove that for each α > 0:

\int_0^{\alpha}[F(x+u)-F(x-u)]\,du = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{1-\cos\alpha t}{t^2}\,e^{-itx}f(t)\,dt.

As a sort of reciprocal, we have

\frac{1}{2\alpha}\int_0^{\alpha}du\int_{-u}^{u}f(t)\,dt = \int_{-\infty}^{\infty}\frac{1-\cos\alpha x}{\alpha x^2}\,dF(x).

4. If f(t)/t ∈ L¹(−∞, ∞), then for each a > 0 such that ±a are points of continuity of F, we have

F(a)-F(-a) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin at}{t}\,f(t)\,dt.
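Formulas like the one in Exercise 4 are easy to probe numerically. The following Python sketch is ours, not the book's; the helper name `mass` and the choice F = Φ (the standard normal, with ch.f. f(t) = e^{−t²/2}) are illustrative assumptions. It approximates the integral by a midpoint rule on a truncated range:

```python
import math

# Exercise 4 for F = N(0,1):  F(a) - F(-a) = (1/pi) * integral of (sin at)/t * f(t) dt
f = lambda t: math.exp(-t * t / 2)        # ch.f. of the standard normal

def mass(a, T=40.0, n=400_000):
    # midpoint rule on [-T, T]; the midpoints avoid the removable singularity at t = 0,
    # and the Gaussian decay of f makes the truncation error negligible
    h = 2 * T / n
    s = 0.0
    for k in range(n):
        t = -T + (k + 0.5) * h
        s += math.sin(a * t) / t * f(t)
    return s * h / math.pi

print(mass(1.0))   # close to Phi(1) - Phi(-1) ~ 0.6827
```

The truncation at T and the step count n are tuning choices, not part of the formula; any T beyond which f is negligible gives the same answer.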


6.2 UNIQUENESS AND INVERSION | 167

5. What is the special case of the inversion formula when f ≡ 1? Deduce also the following special cases, where a > 0:

\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin at\,\sin t}{t^2}\,dt = a\wedge 1;

\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin at\,(\sin t)^2}{t^3}\,dt = a-\frac{a^2}{4}  for a ≤ 2;  = 1  for a ≥ 2.

6. For each n ≥ 1, we have

\frac{1}{\pi}\int_{-\infty}^{\infty}\left(\frac{\sin t}{t}\right)^{n+2}dt = \frac12\int_0^{2}\left[\int_{-u}^{u}\varphi_n(t)\,dt\right]du,

where φ₁ = ½ 1_{[−1,1]} and φₙ = φ_{n−1} ∗ φ₁ for n ≥ 2.

*7. If F is absolutely continuous, then lim_{|t|→∞} f(t) = 0. Hence, if the absolutely continuous part of F does not vanish, then lim sup_{|t|→∞} |f(t)| < 1. If F is purely discontinuous, then lim sup_{|t|→∞} |f(t)| = 1. [The first assertion is the Riemann–Lebesgue lemma; prove it first when F has a density that is a simple function, then approximate. The third assertion is an easy part of the observation that such an f is "almost periodic".]

8. Prove that for 0 < r < 2 we have

\int_{-\infty}^{\infty}|x|^r\,dF(x) = C(r)\int_{-\infty}^{\infty}\frac{1-\Re f(t)}{|t|^{r+1}}\,dt,

where

C(r) = \left[\int_{-\infty}^{\infty}\frac{1-\cos u}{|u|^{r+1}}\,du\right]^{-1} = \frac{\Gamma(r+1)}{\pi}\,\sin\frac{r\pi}{2};

thus C(1) = 1/π. [HINT: |x|^r = C(r)\int_{-\infty}^{\infty}\frac{1-\cos xt}{|t|^{r+1}}\,dt.]

*9. Give a trivial example where the right member of (4) cannot be replaced by the Lebesgue integral

\frac{1}{2\pi}\int_{\mathcal{R}^1}\frac{e^{-itx_1}-e^{-itx_2}}{it}\,f(t)\,dt.

But it can always be replaced by the improper Riemann integral:

\lim_{T_1\to+\infty,\ T_2\to+\infty}\frac{1}{2\pi}\int_{-T_1}^{T_2}\frac{e^{-itx_1}-e^{-itx_2}}{it}\,f(t)\,dt.
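The atom criterion of Theorem 6.2.5 can also be watched numerically. A Python sketch (ours, not the book's): for a fair coin on {0, 1} the ch.f. is f(t) = (1 + e^{it})/2, so |f(t)|² = (1 + cos t)/2, and the averages (1/2T)∫_{−T}^{T}|f(t)|² dt should tend to Σ_x μ({x})² = (1/2)² + (1/2)² = 1/2:

```python
import math

f2 = lambda t: (1 + math.cos(t)) / 2      # |f(t)|^2 for a fair coin on {0, 1}

def average(T, n=200_000):
    # (1/2T) * integral of |f|^2 over [-T, T], midpoint rule
    h = 2 * T / n
    return sum(f2(-T + (k + 0.5) * h) for k in range(n)) * h / (2 * T)

for T in (10.0, 100.0, 1000.0):
    print(T, average(T))                  # tends to 1/2 as T grows
```

For an atomless F (e.g. the normal) the same averages would tend to 0, which is the Corollary's continuity criterion.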


168 | CHARACTERISTIC FUNCTION

10. Prove the following form of the inversion formula (due to Gil-Pelaez):

\frac12\{F(x+)+F(x-)\} = \frac12 + \lim_{T\to\infty}\int_0^{T}\frac{e^{itx}f(-t)-e^{-itx}f(t)}{2\pi it}\,dt.

[HINT: Use the method of proof of Theorem 6.2.1 rather than the result.]

11. Theorem 6.2.3 has an analogue in L². If the ch.f. f of F belongs to L², then F is absolutely continuous. [HINT: By Plancherel's theorem, there exists φ ∈ L² such that

\int_0^{x}\varphi(u)\,du = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{-itx}-1}{-it}\,f(t)\,dt.

Now use the inversion formula to show that

F(x)-F(0) = \int_0^{x}\varphi(u)\,du.]

*12. Prove Theorem 6.2.2 by the Stone–Weierstrass theorem. [HINT: Cf. Theorem 6.6.2 below, but beware of the differences. Approximate uniformly g₁ and g₂ in the proof of Theorem 4.4.3 by a periodic function with "arbitrarily large" period.]

13. The uniqueness theorem holds as well for signed measures [or functions of bounded variation]. Precisely, if each μᵢ, i = 1, 2, is the difference of two finite measures such that

∀t:  \int e^{itx}\,\mu_1(dx) = \int e^{itx}\,\mu_2(dx),

then μ₁ ≡ μ₂.

14. There is a deeper supplement to the inversion formula (4) or Exercise 10 above, due to B. Rosén. Under the condition

\int_{-\infty}^{\infty}(1+\log^{+}|x|)\,dF(x) < \infty,

the improper Riemann integral in Exercise 10 may be replaced by a Lebesgue integral. [HINT: It is a matter of proving the existence of the latter. Since

\int_{-\infty}^{\infty}dF(y)\int_0^{N}\frac{|\sin(x-y)t|}{t}\,dt \le \int_{-\infty}^{\infty}dF(y)\,\{1+\log(1+N|x-y|)\} < \infty,

we have

\int_0^{N}\left[\int_{-\infty}^{\infty}\frac{\sin(x-y)t}{t}\,dF(y)\right]dt = \int_{-\infty}^{\infty}dF(y)\int_0^{N}\frac{\sin(x-y)t}{t}\,dt.


6.3 CONVERGENCE THEOREMS | 169

For fixed x, we have

\int_{y\ne x}dF(y)\int_N^{\infty}\frac{\sin(x-y)t}{t}\,dt \to 0

as N → ∞, by bounded convergence, since the inner integral is bounded uniformly in N and y and tends to 0 for each y ≠ x.]

6.3 Convergence theorems

For purposes of probability theory the most fundamental property of the ch.f. is given in the two propositions below, which will be referred to jointly as the convergence theorem, due to P. Lévy and H. Cramér. Many applications will be given in this and the following chapter. We begin with the easier half.

Theorem 6.3.1. Let {μₙ, 1 ≤ n ≤ ∞} be p.m.'s on ℛ¹ with ch.f.'s {fₙ, 1 ≤ n ≤ ∞}. If μₙ →_v μ∞, then fₙ → f∞ uniformly in every finite interval; in particular

(1)  μₙ →_v μ∞ ⟹ fₙ → f∞.


170 | CHARACTERISTIC FUNCTION

Theorem 6.3.2. Let {μₙ, 1 ≤ n < ∞} be p.m.'s on ℛ¹ with ch.f.'s {fₙ}. Suppose that:

(a) fₙ converges everywhere in ℛ¹ and defines the limit function f∞;
(b) f∞ is continuous at t = 0.

Then we have

(α) μₙ →_v μ∞, where μ∞ is a p.m.;
(β) f∞ is the ch.f. of μ∞.

PROOF. Let us first relax the conditions (a) and (b) to require only convergence of fₙ in a neighborhood (−δ₀, δ₀) of t = 0 and the continuity of the limit function f (defined only in this neighborhood) at t = 0. We shall prove that any vaguely convergent subsequence of {μₙ} converges to a p.m. μ. For this we use the following lemma, which illustrates a useful technique for obtaining estimates on a p.m. from its ch.f.

Lemma. For each A > 0, we have

(2)  \mu([-2A,2A]) \ge A\left|\int_{-A^{-1}}^{A^{-1}}f(t)\,dt\right| - 1.

PROOF OF THE LEMMA. By (8) of Sec. 6.2, we have

(3)  \frac{1}{2T}\int_{-T}^{T}f(t)\,dt = \int_{-\infty}^{\infty}\frac{\sin Tx}{Tx}\,\mu(dx).

Since the integrand on the right side is bounded by 1 for all x (it is defined to be 1 at x = 0), and by |Tx|^{-1} ≤ (2TA)^{-1} for |x| > 2A, the integral is bounded by

\mu([-2A,2A]) + \frac{1}{2TA}\{1-\mu([-2A,2A])\} = \left(1-\frac{1}{2TA}\right)\mu([-2A,2A]) + \frac{1}{2TA}.

Putting T = A⁻¹ in (3), we obtain

\frac{A}{2}\left|\int_{-A^{-1}}^{A^{-1}}f(t)\,dt\right| \le \frac12\,\mu([-2A,2A]) + \frac12,

which reduces to (2). The lemma is proved.

Now for each δ, 0 < δ < δ₀, we have

(4)  \frac{1}{2\delta}\left|\int_{-\delta}^{\delta}f_n(t)\,dt\right| \ge \frac{1}{2\delta}\left|\int_{-\delta}^{\delta}f(t)\,dt\right| - \frac{1}{2\delta}\int_{-\delta}^{\delta}|f_n(t)-f(t)|\,dt.

The first term on the right side tends to 1 as δ ↓ 0, since f(0) = 1 and f is continuous at 0; for fixed δ the second term tends to 0 as n → ∞, by bounded convergence since |fₙ − f| ≤ 2. It follows that for any given ε > 0,


6.3 CONVERGENCE THEOREMS | 171

there exist δ = δ(ε) < δ₀ and n₀ = n₀(ε) such that if n ≥ n₀, then the left member of (4) has a value not less than 1 − ε. Hence by (2),

(5)  \mu_n([-2\delta^{-1},2\delta^{-1}]) \ge 2(1-\varepsilon)-1 = 1-2\varepsilon.

Let {μ_{n_k}} be a vaguely convergent subsequence of {μₙ}, which always exists by Theorem 4.3.3; and let the vague limit be μ, which is always an s.p.m. For each δ satisfying the conditions above, and such that neither −2δ⁻¹ nor 2δ⁻¹ is an atom of μ, we have by the property of vague convergence and (5):

\mu(\mathcal{R}^1) \ge \mu([-2\delta^{-1},2\delta^{-1}]) = \lim_{k\to\infty}\mu_{n_k}([-2\delta^{-1},2\delta^{-1}]) \ge 1-2\varepsilon.

Since ε is arbitrary, we conclude that μ is a p.m., as was to be shown.

Let f be the ch.f. of μ. Then, by the preceding theorem, f_{n_k} → f everywhere; hence under the original hypothesis (a) we have f = f∞. Thus every vague limit μ considered above has the same ch.f. and therefore by the uniqueness theorem is the same p.m. Rename it μ∞, so that μ∞ is the p.m. having the ch.f. f∞. Then by Theorem 4.3.4 we have μₙ →_v μ∞. Both assertions (α) and (β) are proved.

As a particular case of the above theorem: if {μₙ, 1 ≤ n ≤ ∞} and {fₙ, 1 ≤ n ≤ ∞} are corresponding p.m.'s and ch.f.'s, then the converse of (1) is also true, namely:

(6)  μₙ →_v μ∞ ⟺ fₙ → f∞.

This is an elegant statement, but it lacks the full strength of Theorem 6.3.2, which lies in concluding that f∞ is a ch.f. from more easily verifiable conditions, rather than assuming it. We shall see presently how important this is.

Let us examine some cases of inapplicability of Theorems 6.3.1 and 6.3.2.

Example 1. Let μₙ have mass ½ at 0 and mass ½ at n. Then μₙ →_v μ∞, where μ∞ has mass ½ at 0 and is not a p.m. We have

f_n(t) = \frac12 + \frac12\,e^{int},

which does not converge as n → ∞, except when t is equal to a multiple of 2π.

Example 2. Let μₙ be the uniform distribution on [−n, n]. Then μₙ →_v μ∞, where μ∞ is identically zero. We have

f_n(t) = \frac{\sin nt}{nt}  if t ≠ 0;  = 1  if t = 0;


172 | CHARACTERISTIC FUNCTION

and

f_n(t) \to f(t) = 0  if t ≠ 0;  = 1  if t = 0.

Thus, condition (a) is satisfied but (b) is not.

Later we shall see that (a) cannot be relaxed to read: fₙ(t) converges in |t| ≤ T for some fixed T (Exercise 9 of Sec. 6.5).

The convergence theorem above settles the question of vague convergence of p.m.'s to a p.m. What about just vague convergence without restriction on the limit? Recalling Theorem 4.4.3, this suggests first that we replace the integrand e^{itx} in the ch.f. f by a function in C₀. Secondly, going over the last part of the proof of Theorem 6.3.2, we see that the choice should be made so as to determine uniquely an s.p.m. (see Sec. 4.3). Now the Fourier–Stieltjes transform of an s.p.m. is well defined and the inversion formula remains valid, so that there is unique correspondence just as in the case of a p.m. Thus a natural choice of g is given by an "indefinite integral" of a ch.f., as follows:

(7)  g(u) = \int_0^{u}\left[\int e^{itx}\,\mu(dx)\right]dt = \int\frac{e^{iux}-1}{ix}\,\mu(dx).

Let us call g the integrated characteristic function of the s.p.m. μ. We are thus led to the following companion of (6), the details of the proof being left as an exercise.

Theorem 6.3.3. A sequence of s.p.m.'s {μₙ, 1 ≤ n ≤ ∞} converges (to μ∞) if and only if the corresponding sequence of integrated ch.f.'s {gₙ} converges (to the integrated ch.f. of μ∞).

Another question concerning (6) arises naturally. We know that vague convergence for p.m.'s is metric (Exercise 9 of Sec.
4.4); let the metric be denoted by (·,·)₁. Uniform convergence on compacts (viz., in finite intervals) for uniformly bounded subsets of C_B(ℛ¹) is also metric, with the metric denoted by (·,·)₂, defined as follows:

(f,g)_2 = \sup_{t\in\mathcal{R}^1}\frac{|f(t)-g(t)|}{1+t^2}.

It is easy to verify that this is a metric on C_B and that convergence in this metric is equivalent to uniform convergence on compacts; clearly the denominator 1 + t² may be replaced by any function continuous on ℛ¹, bounded below by a strictly positive constant, and tending to +∞ as |t| → ∞. Since there is a one-to-one correspondence between ch.f.'s and p.m.'s, we may transfer the


6.3 CONVERGENCE THEOREMS | 173

metric (·,·)₂ to the latter by setting

(\mu,\nu)_2 = (f_\mu, f_\nu)_2

in obvious notation. Now the relation (6) may be restated as follows.

Theorem 6.3.4. The topologies induced by the two metrics (·,·)₁ and (·,·)₂ on the space of p.m.'s on ℛ¹ are equivalent.

This means that for each μ and given ε > 0, there exists δ(μ, ε) such that:

(μ, ν)₁ ≤ δ(μ, ε) ⟹ (μ, ν)₂ ≤ ε;
(μ, ν)₂ ≤ δ(μ, ε) ⟹ (μ, ν)₁ ≤ ε.

Theorem 6.3.4 needs no new proof, since it is merely a paraphrasing of (6) in new words. However, it is important to notice the dependence of δ on μ (as well as ε) above. The sharper statement without this dependence, which would mean the equivalence of the uniform structures induced by the two metrics, is false with a vengeance; see Exercises 10 and 11 below (Exercises 3 and 4 are also relevant).

EXERCISES

1. Prove the uniform convergence of fₙ in Theorem 6.3.1 by an integration by parts of ∫ e^{itx} dFₙ(x).

*2. Instead of using the Lemma in the second part of the proof of Theorem 6.3.2, prove that μ is a p.m. by integrating the inversion formula, as in Exercise 3 of Sec. 6.2. (Integration is a smoothing operation and a standard technique in taming improper integrals: cf. the proof of the second part of Theorem 6.5.2 below.)

3. Let F be a given absolutely continuous d.f. and let Fₙ be a sequence of step functions with equally spaced steps that converge to F uniformly in ℛ¹. Show that for the corresponding ch.f.'s we have

∀n:  \sup_{t\in\mathcal{R}^1}|f(t)-f_n(t)| = 1.

4. Let Fₙ, Gₙ be d.f.'s with ch.f.'s fₙ and gₙ. If fₙ − gₙ → 0 a.e., then for each f ∈ C_K we have ∫ f dFₙ − ∫ f dGₙ → 0 (see Exercise 10 of Sec. 4.4). This does not imply that the Lévy distance (Fₙ, Gₙ)₁ → 0; find a counterexample. [HINT: Use Exercise 3 of Sec. 6.2 and proceed as in Theorem 4.3.4.]

5. Let F be a discrete d.f. with points of jump {a_j, j ≥ 1} and sizes of jump {b_j, j ≥ 1}. Consider the approximating s.d.f.'s Fₙ with the same jumps but restricted to j ≤ n.
Show that Fₙ →_v F.
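The convergence theorem is easy to watch in a concrete case. A Python sketch (ours, not the book's): the ch.f. of the binomial distribution with parameter (n, λ/n) converges pointwise to the ch.f. of the Poisson distribution with parameter λ, which is continuous everywhere; hence, by Theorem 6.3.2, the distributions converge vaguely:

```python
import cmath

lam = 3.0
poisson = lambda t: cmath.exp(lam * (cmath.exp(1j * t) - 1))           # ch.f. of Poisson(lam)
binom = lambda t, n: (1 - lam / n + (lam / n) * cmath.exp(1j * t)) ** n  # ch.f. of Bin(n, lam/n)

grid = [0.1 * k for k in range(-50, 51)]   # a grid of t-values in [-5, 5]
for n in (10, 100, 1000):
    # sup-distance of the two ch.f.'s over the grid: shrinks as n grows
    print(n, max(abs(binom(t, n) - poisson(t)) for t in grid))
```

The sup-distance over any fixed bounded t-interval shrinks as n grows, illustrating the uniform convergence on compacts asserted in Theorem 6.3.1 for the converse direction.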


174 | CHARACTERISTIC FUNCTION

*6. If the sequence of ch.f.'s {fₙ} converges uniformly in a neighborhood of the origin, then {fₙ} is equicontinuous, and there exists a subsequence that converges to a ch.f. [HINT: Use the Ascoli–Arzelà theorem.]

7. If Fₙ →_v F and Gₙ →_v G, then Fₙ ∗ Gₙ →_v F ∗ G. [A proof of this simple result without the use of ch.f.'s would be tedious.]

*8. Interpret the remarkable trigonometric identity

\frac{\sin t}{t} = \prod_{n=1}^{\infty}\cos\frac{t}{2^n}

in terms of ch.f.'s, and hence by addition of independent r.v.'s. (This is an example of Exercise 4 of Sec. 6.1.)

9. Rewrite the preceding formula as

\frac{\sin t}{t} = \prod_{k=1}^{\infty}\cos\frac{t}{2^{2k-1}}\ \prod_{k=1}^{\infty}\cos\frac{t}{2^{2k}}.

Prove that either factor on the right is the ch.f. of a singular distribution. Thus the convolution of two such may be absolutely continuous. [HINT: Use the same r.v.'s as for the Cantor distribution in Exercise 9 of Sec. 5.3.]

10. Using the strong law of large numbers, prove that the convolution of two Cantor d.f.'s is still singular. [HINT: Inspect the frequency of the digits in the sum of the corresponding random series; see Exercise 9 of Sec. 5.3.]

*11. Let Fₙ, Gₙ be the d.f.'s of μₙ, νₙ, and fₙ, gₙ their ch.f.'s. Even if sup_{x∈ℛ¹}|Fₙ(x) − Gₙ(x)| → 0, it does not follow that (fₙ, gₙ)₂ → 0; indeed it may happen that (fₙ, gₙ)₂ = 1 for every n. [HINT: Take two step functions "out of phase".]

12. In the notation of Exercise 11, even if sup_{t∈ℛ¹}|fₙ(t) − gₙ(t)| → 0, it does not follow that (Fₙ, Gₙ)₁ → 0; indeed it may → 1. [HINT: Let f be any ch.f. vanishing outside (−1, 1), f_j(t) = e^{−in_j t} f(m_j t), g_j(t) = e^{in_j t} f(m_j t), and F_j, G_j be the corresponding d.f.'s. Note that if m_j/n_j → 0, then F_j(x) → 1, G_j(x) → 0 for every x, and that f_j − g_j vanishes outside (−m_j⁻¹, m_j⁻¹) and is O(|sin n_j t|) near t = 0. If m_j = 2^{j²} and n_j = j m_j, then Σ_j (f_j − g_j) is uniformly bounded in t: for n_{k+1}⁻¹ ≤ t ≤ n_k⁻¹ consider j > k, j = k, j < k separately.
Let

f_n^* = n^{-1}\sum_{j=1}^{n}f_j,  g_n^* = n^{-1}\sum_{j=1}^{n}g_j;

then sup_t |fₙ* − gₙ*| = O(n⁻¹), while Fₙ* − Gₙ* does not tend to 0. This example is due to Katznelson, rivaling an older one due to Dyson, which is as follows. For


6.4 SIMPLE APPLICATIONS | 175

b > a > 0, let

F(x) = \frac{\log\dfrac{x^2+b^2}{x^2+a^2}}{2\log\dfrac{b}{a}}

for x ≤ 0 and = 1 for x > 0; G(x) = 1 − F(−x). Then

f(t)-g(t) = -\pi i\,\frac{t}{|t|}\cdot\frac{e^{-a|t|}-e^{-b|t|}}{\log\dfrac{b}{a}}.

If a is large, then (F, G) is near 1. If b/a is large, then (f, g) is near 0.]

6.4 Simple applications

A common type of application of Theorem 6.3.2 depends on power-series expansions of the ch.f., for which we need its derivatives.

Theorem 6.4.1. If the d.f. has a finite absolute moment of positive integral order k, then its ch.f. has a bounded continuous derivative of order k given by

(1)  f^{(k)}(t) = \int_{-\infty}^{\infty}(ix)^k e^{itx}\,dF(x).

Conversely, if f has a finite derivative of even order k at t = 0, then F has a finite moment of order k.

PROOF. For k = 1, the first assertion follows from the formula:

\frac{f(t+h)-f(t)}{h} = \int_{-\infty}^{\infty}\frac{e^{i(t+h)x}-e^{itx}}{h}\,dF(x).

An elementary inequality already used in the proof of Theorem 6.2.1 shows that the integrand above is dominated by |x|. Hence if ∫|x| dF(x) < ∞, we may let h → 0 under the integral sign and obtain (1). Uniform continuity of the integral as a function of t is proved as in (ii) of Sec. 6.1. The case of a general k follows easily by induction.

To prove the second assertion, let k = 2 and suppose that f″(0) exists and is finite. We have

f''(0) = \lim_{h\to 0}\frac{f(h)-2f(0)+f(-h)}{h^2}


176 | CHARACTERISTIC FUNCTION

= \lim_{h\to 0}\int_{-\infty}^{\infty}\frac{e^{ihx}-2+e^{-ihx}}{h^2}\,dF(x)

(2)  = -\lim_{h\to 0}\int_{-\infty}^{\infty}\frac{2(1-\cos hx)}{h^2}\,dF(x).

As h → 0, we have by Fatou's lemma,

\int_{-\infty}^{\infty}x^2\,dF(x) = \int_{-\infty}^{\infty}\lim_{h\to 0}\frac{2(1-\cos hx)}{h^2}\,dF(x) \le \liminf_{h\to 0}\int_{-\infty}^{\infty}\frac{2(1-\cos hx)}{h^2}\,dF(x) = -f''(0).

Thus F has a finite second moment, and the validity of (1) for k = 2 now follows from the first assertion of the theorem.

The general case can again be reduced to this by induction, as follows. Suppose the second assertion of the theorem is true for 2k − 2, and that f^{(2k)}(0) is finite. Then f^{(2k−2)}(t) exists and is continuous in the neighborhood of t = 0, and by the induction hypothesis we have in particular

(-1)^{k-1}\int_{-\infty}^{\infty}x^{2k-2}\,dF(x) = f^{(2k-2)}(0).

Put G(x) = \int_{-\infty}^{x}y^{2k-2}\,dF(y) for every x; then G(·)/G(∞) is a d.f. with the ch.f.

\psi(t) = \frac{1}{G(\infty)}\int_{-\infty}^{\infty}e^{itx}x^{2k-2}\,dF(x) = \frac{(-1)^{k-1}}{G(\infty)}\,f^{(2k-2)}(t).

Hence ψ″(0) exists and is finite, and by the case k = 2 proved above,

\frac{1}{G(\infty)}\int_{-\infty}^{\infty}x^2\,dG(x) \le -\psi''(0) = \frac{(-1)^k}{G(\infty)}\,f^{(2k)}(0).

Upon cancelling G(∞), we obtain

\int_{-\infty}^{\infty}x^{2k}\,dF(x) = \int_{-\infty}^{\infty}x^2\,dG(x) \le (-1)^k f^{(2k)}(0),

which proves the finiteness of the 2k-th moment. The argument above fails if G(∞) = 0, but then we have (why?) F = δ₀, f ≡ 1, and the theorem is trivial.

Although the next theorem is an immediate corollary to the preceding one, it is so important as to deserve prominent mention.

Theorem 6.4.2. If F has a finite absolute moment of order k, k an integer ≥ 1, then f has the following expansion in the neighborhood of t = 0:

(3)  f(t) = \sum_{j=0}^{k}\frac{m^{(j)}}{j!}\,(it)^j + o(|t|^k);


6.4 SIMPLE APPLICATIONS | 177

(3′)  f(t) = \sum_{j=0}^{k-1}\frac{m^{(j)}}{j!}\,(it)^j + \frac{\theta_k\,\mu^{(k)}}{k!}\,|t|^k;

where m^{(j)} is the moment of order j, μ^{(k)} is the absolute moment of order k, and |θₖ| ≤ 1.
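The expansions (3) and (3′) lend themselves to a quick numerical sanity check. A Python sketch (ours, not the book's), with the illustrative choice F = Φ, so that m^{(1)} = 0 and m^{(2)} = 1: by (3) with k = 2, the normalized remainder (f(t) − 1 + t²/2)/t² must vanish as t → 0.

```python
import math

f = lambda t: math.exp(-t * t / 2)   # ch.f. of N(0,1): m(1) = 0, m(2) = 1
for t in (0.5, 0.1, 0.01):
    # (3) with k = 2 reads f(t) = 1 - t^2/2 + o(t^2); this ratio therefore tends to 0
    print(t, abs(f(t) - (1 - t * t / 2)) / (t * t))
```

For this particular f the ratio is in fact O(t²), consistent with (3) for any larger k as well, since all moments of Φ are finite.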


178 | CHARACTERISTIC FUNCTION

Now let {Xₙ, n ≥ 1} be a sequence of independent r.v.'s with the common d.f. F, and Sₙ = Σⱼ₌₁ⁿ Xⱼ, as in Chapter 5.

Theorem 6.4.3. If F has a finite mean m, then

\frac{S_n}{n} \to m  in pr.

PROOF. Since convergence to the constant m is equivalent to convergence in dist. to δ_m (Exercise 4 of Sec. 4.4), it is sufficient by Theorem 6.3.2 to prove that the ch.f. of Sₙ/n converges to e^{imt} (which is continuous). Now, by (2) of Sec. 6.1 we have

E\!\left(e^{it(S_n/n)}\right) = E\!\left(e^{i(t/n)S_n}\right) = \left[f\!\left(\frac{t}{n}\right)\right]^n.

By Theorem 6.4.2, the last term above may be written as

\left[1 + im\,\frac{t}{n} + o\!\left(\frac{t}{n}\right)\right]^n

for fixed t and n → ∞. It follows from (5) that this converges to e^{imt}, as desired.

Theorem 6.4.4. If F has mean m and finite variance σ² > 0, then

\frac{S_n - mn}{\sigma\sqrt{n}} \to \Phi  in dist.,

where Φ is the normal distribution with mean 0 and variance 1.

PROOF. We may suppose m = 0 by considering the r.v.'s Xⱼ − m, whose second moment is σ². As in the preceding proof, we have

E\!\left[\exp\!\left(it\,\frac{S_n}{\sigma\sqrt{n}}\right)\right] = \left[f\!\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n = \left[1 - \frac{\sigma^2 t^2}{2\sigma^2 n} + o\!\left(\frac{t^2}{n}\right)\right]^n = \left[1 - \frac{t^2}{2n} + o\!\left(\frac{t^2}{n}\right)\right]^n \to e^{-t^2/2}.

The limit being the ch.f. of Φ, the proof is ended.

The convergence theorem for ch.f.'s may be used to complete the method of moments in Theorem 4.5.5, yielding a result not very far from what ensues from that theorem coupled with Carleman's condition mentioned there.
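The whole content of the proof of Theorem 6.4.4 is the computation [f(t/(σ√n))]ⁿ → e^{−t²/2}, which can be watched directly. A Python sketch (ours, not the book's), for the symmetric coin X = ±1, so that m = 0, σ² = 1, and f(t) = cos t:

```python
import math

t = 1.7                                              # any fixed t
for n in (10, 100, 10_000):
    chf_n = math.cos(t / math.sqrt(n)) ** n          # ch.f. of S_n / sqrt(n) at t
    print(n, abs(chf_n - math.exp(-t * t / 2)))      # gap shrinks to 0 as n grows
```

Since the limit e^{−t²/2} is continuous at 0, Theorem 6.3.2 upgrades this pointwise convergence of ch.f.'s to convergence in distribution, which is exactly the central limit theorem for this coin.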


6.4 SIMPLE APPLICATIONS | 179

Theorem 6.4.5. In the notation of Theorem 4.5.5, if (8) there holds together with the following condition:

(6)  \forall t\in\mathcal{R}^1:\ \lim_{k\to\infty}\frac{m^{(k)}t^k}{k!} = 0,

then Fₙ →_v F.

PROOF. Let fₙ be the ch.f. of Fₙ. For fixed t and an odd k we have, by the Taylor expansion for e^{itx} with a remainder term:

(7)  f_n(t) = \int e^{itx}\,dF_n(x) = \int\left[\sum_{j=0}^{k}\frac{(itx)^j}{j!} + \theta\,\frac{|tx|^{k+1}}{(k+1)!}\right]dF_n(x) = \sum_{j=0}^{k}\frac{m_n^{(j)}}{j!}\,(it)^j + \theta\,m_n^{(k+1)}\frac{|t|^{k+1}}{(k+1)!},

where θ denotes a "generic" complex number of modulus ≤ 1, not necessarily the same in different appearances. By condition (6), there exists an odd k = k(ε) such that for the fixed t we have

(8)  \frac{(2m^{(k+1)}+1)\,|t|^{k+1}}{(k+1)!} < \frac{\varepsilon}{2}.

Since we have fixed k, there exists n₀ = n₀(ε) such that if n ≥ n₀, then

m_n^{(k+1)} \le m^{(k+1)} + 1,

and moreover,

\max_{1\le j\le k}\left|m_n^{(j)} - m^{(j)}\right| \le \frac{\varepsilon}{2}\,e^{-|t|}.


180 | CHARACTERISTIC FUNCTION

Hence fₙ(t) → f(t) for each t, and since f is a ch.f., the hypotheses of Theorem 6.3.2 are satisfied and so Fₙ →_v F.

As another kind of application of the convergence theorem, in which a limiting process is implicit rather than explicit, let us prove the following characterization of the normal distribution.

Theorem 6.4.6. Let X and Y be independent, identically distributed r.v.'s with mean 0 and variance 1. If X + Y and X − Y are independent, then the common distribution of X and Y is Φ.

PROOF. Let f be the ch.f.; then by (1), f′(0) = 0, f″(0) = −1. The ch.f. of X + Y is f(t)², and that of X − Y is f(t)f(−t). Since these two r.v.'s are independent, the ch.f. f(2t) of their sum 2X must satisfy the following relation:

(9)  f(2t) = f(t)^3 f(-t).

It follows from (9) that the function f never vanishes. For if it did at t₀, then it would also vanish at t₀/2, and so by induction at t₀/2ⁿ for every n ≥ 1. This is impossible, since limₙ→∞ f(t₀/2ⁿ) = f(0) = 1. Setting for every t:

\rho(t) = \frac{f(t)}{f(-t)},

we obtain

(10)  \rho(2t) = \rho(t)^2.

Hence we have by iteration, for each t:

\rho(t) = \left[\rho\!\left(\frac{t}{2^n}\right)\right]^{2^n} = \left[1 + o\!\left(\frac{t^2}{2^{2n}}\right)\right]^{2^n} \to 1,

by Theorem 6.4.2 and the previous lemma. Thus ρ(t) ≡ 1, f(t) ≡ f(−t), and (9) becomes

(11)  f(2t) = f(t)^4.

Repeating the argument above with (11), we have

f(t) = \left[f\!\left(\frac{t}{2^n}\right)\right]^{4^n} = \left[1 - \frac12\cdot\frac{t^2}{2^{2n}} + o\!\left(\frac{t^2}{2^{2n}}\right)\right]^{4^n} \to e^{-t^2/2}.

This proves the theorem without the use of "logarithms" (see Sec. 7.6).

EXERCISES

*1. If f is the ch.f. of X, and

\lim_{t\to 0}\frac{f(t)-1}{t^2} = -\frac{\sigma^2}{2} > -\infty,


6.4 SIMPLE APPLICATIONS | 181

then E(X) = 0 and E(X²) = σ². In particular, if f(t) = 1 + o(t²) as t → 0, then f ≡ 1.

*2. Let {Xₙ} be independent, identically distributed with mean 0 and variance σ², 0 < σ² < ∞. Prove that

\lim_{n\to\infty}E\!\left(\frac{|S_n|}{\sqrt{n}}\right) = 2\lim_{n\to\infty}E\!\left(\frac{S_n^{+}}{\sqrt{n}}\right) = \sqrt{\frac{2}{\pi}}\,\sigma.

[If we assume only P{X₁ ≠ 0} > 0, E(|X₁|) < ∞ and E(X₁) = 0, then we have E(|Sₙ|) ≥ C√n for some constant C and all n; this is known as Hornich's inequality.] [HINT: In case σ² = ∞, if limₙ E(|Sₙ|/√n) < ∞, then there exists {nₖ} such that Sₙₖ/√nₖ converges in distribution; use an extension of Exercise 1 to show |f(t/√n)|^{2n} → 0. This is due to P. Matthews.]

3. Let P{X = k} = pₖ, 1 ≤ k ≤ ℓ < ∞, Σₖ₌₁^ℓ pₖ = 1. The sum Sₙ of n independent r.v.'s having the same distribution as X is said to have a multinomial distribution. Define it explicitly. Prove that [Sₙ − E(Sₙ)]/σ(Sₙ) converges to Φ in distribution as n → ∞, provided that σ(X) > 0.

*4. Let Xₙ have the binomial distribution with parameter (n, pₙ), and suppose that npₙ → λ > 0. Prove that Xₙ converges in dist. to the Poisson d.f. with parameter λ. (In the old days this was called the law of small numbers.)

5. Let X_λ have the Poisson distribution with parameter λ. Prove that (X_λ − λ)/λ^{1/2} converges in dist. to Φ as λ → ∞.

*6. Prove that in Theorem 6.4.4, Sₙ/(σ√n) does not converge in probability. [HINT: Consider Sₙ/(σ√n) and S₂ₙ/(σ√2n).]

7. Let f be the ch.f. of the d.f. F. Suppose that, as t → 0,

f(t) - 1 = O(|t|^{\alpha}),

where 0 < α ≤ 2; then as A → ∞,

\int_{|x|>A}dF(x) = O(A^{-\alpha}).

[HINT: Integrate \int_{|x|>A}(1-\cos tx)\,dF(x) \le Ct^{\alpha} over t in (0, A^{-1}).]

8. If 0 < α < 1 and ∫|x|^α dF(x) < ∞, then f(t) − 1 = o(|t|^α) as t → 0. For 1 ≤ α < 2, the same result is true under the additional assumption that ∫x dF(x) = 0. [HINT: The case 1 ≤ α < 2 is harder. Consider the real and imaginary parts of f(t) − 1 separately, and write the latter as

\int_{|x|\le\varepsilon/|t|}\sin tx\,dF(x) + \int_{|x|>\varepsilon/|t|}\sin tx\,dF(x).


182 | CHARACTERISTIC FUNCTION

The second is bounded by (|t|/ε)^α ∫_{|x|>ε/|t|}|x|^α dF(x) = o(|t|^α) for fixed ε. In the first integral, use sin tx = tx + O(|tx|³); since ∫x dF(x) = 0, the term

\int_{|x|\le\varepsilon/|t|}tx\,dF(x) = -t\int_{|x|>\varepsilon/|t|}x\,dF(x)

is o(|t|^α), while the remainder is O(|t|³)∫_{|x|≤ε/|t|}|x|³ dF(x) = O(ε^{3−α}|t|^α).]

*9. Suppose that exp(−|t|^α), 0 < α ≤ 2, is a ch.f. (Theorem 6.5.4 below). Let {Xⱼ, j ≥ 1} be independent and identically distributed r.v.'s with a common ch.f. of the form

1 - \beta|t|^{\alpha} + o(|t|^{\alpha})

as t → 0. Determine the constants b and θ so that the ch.f. of Sₙ/(bn^θ) converges to e^{−|t|^α}.

10. Suppose F satisfies the condition that, for every η > 0, as A → ∞,

\int_{|x|>A}dF(x) = O(e^{-\eta A}).

Then all moments of F are finite, and condition (6) in Theorem 6.4.5 is satisfied.

11. Let X and Y be independent with the common d.f. F of mean 0 and variance 1. Suppose that (X + Y)/√2 also has the d.f. F. Then F ≡ Φ. [HINT: Imitate Theorem 6.4.6.]

*12. Let {Xⱼ, j ≥ 1} be independent, identically distributed r.v.'s with mean 0 and variance 1. Prove that both

\frac{1}{\sqrt{n}}\sum_{j=1}^{n}X_j  and  \frac{\sum_{j=1}^{n}X_j}{\left(\sum_{j=1}^{n}X_j^2\right)^{1/2}}

converge in dist. to Φ. [HINT: Use the law of large numbers.]

13. The converse part of Theorem 6.4.1 is false for an odd k. Example: F is a discrete symmetric d.f. with mass C/(n² log n) at ±n for integers n ≥ 3, where C is the appropriate constant, and k = 1. [HINT: It is well known that the series

\sum_n\frac{\sin nt}{n\log n}

converges uniformly in t.]
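The moment–derivative relation (1) of Theorem 6.4.1 is also easy to observe numerically: a symmetric second difference of the ch.f. at 0 recovers −E(X²). A Python sketch (ours, not the book's), again with the illustrative choice F = Φ:

```python
import math

f = lambda t: math.exp(-t * t / 2)    # ch.f. of N(0,1), so f''(0) = -E(X^2) = -1
h = 1e-4
# the same symmetric difference quotient used in the proof of Theorem 6.4.1
second_diff = (f(h) - 2 * f(0.0) + f(-h)) / (h * h)
print(-second_diff)                   # close to E(X^2) = 1
```

The step h trades truncation error (O(h²)) against floating-point cancellation; h around 1e-4 is a reasonable compromise for this smooth f.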


6.4 SIMPLE APPLICATIONS | 183

We end this section with a discussion of a special class of distributions and their ch.f.'s that is of both theoretical and practical importance.

A distribution or p.m. on ℛ¹ is said to be of the lattice type iff its support is in an arithmetical progression — that is, it is completely atomic (i.e., discrete) with all the atoms located at points of the form {a + jd}, where a is real, d > 0, and j ranges over a certain nonempty set of integers. The corresponding lattice d.f. is of the form:

F(x) = \sum_{j=-\infty}^{\infty}p_j\,\delta_{a+jd}(x),

where pⱼ ≥ 0 and Σⱼ pⱼ = 1. Its ch.f. is

(12)  f(t) = e^{ait}\sum_{j=-\infty}^{\infty}p_j\,e^{jdit},

which is an absolutely convergent Fourier series. Note that the degenerate d.f. δₐ with ch.f. e^{ait} is a particular case. We have the following characterization.

Theorem 6.4.7. A ch.f. is that of a lattice distribution if and only if there exists a t₀ ≠ 0 such that |f(t₀)| = 1.

PROOF. The "only if" part is trivial, since it is obvious from (12) that |f| is periodic of period 2π/d. To prove the "if" part, let f(t₀) = e^{iθ₀}, where θ₀ is real; then we have

1 = e^{-i\theta_0}f(t_0) = \int_{-\infty}^{\infty}e^{i(t_0 x-\theta_0)}\,\mu(dx),

and consequently, taking real parts and transposing:

(13)  0 = \int_{-\infty}^{\infty}[1-\cos(t_0 x-\theta_0)]\,\mu(dx).

The integrand is nonnegative everywhere and vanishes if and only if, for some integer j,

x = \frac{\theta_0 + 2j\pi}{t_0}.

It follows that the support of μ must be contained in the set of x of this form in order that equation (13) may hold, for the integral of a strictly positive function over a set of strictly positive measure is strictly positive. The theorem is therefore proved, with a = θ₀/t₀ and d = 2π/t₀ in the definition of a lattice distribution.
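Theorem 6.4.7 can be illustrated concretely. A Python sketch (ours, not the book's): for a distribution carried by the lattice {0, 2} (a = 0, d = 2), the modulus of the ch.f. attains 1 at t₀ = π = 2π/d, whereas for a non-lattice distribution such as the normal, |f(t)| < 1 for every t ≠ 0:

```python
import cmath

lattice = lambda t: 0.3 + 0.7 * cmath.exp(2j * t)   # P(X=0)=0.3, P(X=2)=0.7: lattice {0, 2}
normal = lambda t: cmath.exp(-t * t / 2)            # ch.f. of N(0,1): not of lattice type

print(abs(lattice(cmath.pi)))   # equals 1 at t0 = pi != 0; span d = 2*pi/t0 = 2
print(abs(normal(3.0)))         # strictly less than 1 for every t != 0
```

The particular weights 0.3 and 0.7 are arbitrary; only the location of the atoms on a common lattice matters for where |f| = 1.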


184 | CHARACTERISTIC FUNCTION

It should be observed that neither "a" nor "d" is uniquely determined above; for if we take, e.g., a′ = a + d′ and d′ a divisor of d, then the support of μ is also contained in the arithmetical progression {a′ + jd′}. However, unless μ is degenerate, there is a unique maximum d, which is called the "span" of the lattice distribution. The following corollary is easy.

Corollary. Unless |f| ≡ 1, there is a smallest t₀ > 0 such that |f(t₀)| = 1. The span is then 2π/t₀.

Of particular interest is the "integer lattice" when a = 0, d = 1; that is, when the support of μ is a set of integers at least two of which differ by 1. We have seen many examples of these. The ch.f. f of such an r.v. X has period 2π, and the following simple inversion formula holds, for each integer j:

(14)  P(X=j) = p_j = \frac{1}{2\pi}\int_{-\pi}^{\pi}f(t)\,e^{-ijt}\,dt,

where the range of integration may be replaced by any interval of length 2π. This, of course, is nothing but the well-known formula for the "Fourier coefficients" of f. If {Xₖ} is a sequence of independent, identically distributed r.v.'s with the ch.f. f, then Sₙ = Σₖ₌₁ⁿ Xₖ has the ch.f. (f)ⁿ, and the inversion formula above yields:

(15)  P(S_n=j) = \frac{1}{2\pi}\int_{-\pi}^{\pi}[f(t)]^n\,e^{-ijt}\,dt.

This may be used to advantage to obtain estimates for Sₙ (see Exercises 24 to 26 below).

EXERCISES

f or fₙ is a ch.f. below.

14. If |f(t)| = 1, |f(t′)| = 1 and t/t′ is an irrational number, then f is degenerate. If for a sequence {tₖ} of nonvanishing constants tending to 0 we have |f(tₖ)| = 1, then f is degenerate.

*15. If |fₙ(t)| → 1 for every t as n → ∞, and Fₙ is the d.f. corresponding to fₙ, then there exist constants aₙ such that Fₙ(x + aₙ) →_v δ₀. [HINT: Symmetrize and take aₙ to be a median of Fₙ.]

*16. Suppose bₙ > 0 and |f(bₙt)| converges everywhere to a ch.f.
thatis not identically 1, then b" converges to a finite and strictly positive limit .[HINT: Show that it is impossible that a subsequence of b" converges to 0 orto +oo, or that two subsequences converge to different finite limits .]* 17 . Suppose c" is real and that e`'^`` converges to a limit for every tin a set of strictly positive Lebesgue measure . Then c" converges to a finite


6.4 SIMPLE APPLICATIONS | 185

limit. [HINT: Proceed as in Exercise 16, and integrate over t. Beware of any argument using "logarithms", as given in some textbooks, but see Exercise 12 of Sec. 7.6 later.]

*18. Let f and g be two nondegenerate ch.f.'s. Suppose that there exist real constants aₙ and bₙ > 0 such that for every t:

f_n(t) \to f(t)  and  e^{ita_n/b_n}f_n\!\left(\frac{t}{b_n}\right) \to g(t).

Then aₙ → a, bₙ → b, where a is finite, 0 < b < ∞, and g(t) = e^{ita/b} f(t/b). [HINT: Use Exercises 16 and 17.]

*19. Reformulate Exercise 18 in terms of d.f.'s and deduce the following consequence. Let Fₙ be a sequence of d.f.'s, aₙ, aₙ′ real constants, bₙ > 0, bₙ′ > 0. If

F_n(b_n x + a_n) \to_v F(x)  and  F_n(b_n' x + a_n') \to_v F(x),

where F is a nondegenerate d.f., then

\frac{b_n}{b_n'} \to 1  and  \frac{a_n - a_n'}{b_n} \to 0.

[Two d.f.'s F and G such that G(x) = F(bx + a) for every x, where b > 0 and a is real, are said to be of the same "type". The two preceding exercises deal with the convergence of types.]

20. Show by using (14) that |cos t| is not a ch.f. Thus the modulus of a ch.f. need not be a ch.f., although the squared modulus always is.

21. The span of an integer lattice distribution is the greatest common divisor of the set of all differences between points of jump.

22. Let f(s, t) be the ch.f. of a 2-dimensional p.m. ν. If |f(s₀, t₀)| = 1 for some (s₀, t₀) ≠ (0, 0), what can one say about the support of ν?

*23. If {Xₙ} is a sequence of independent and identically distributed r.v.'s, then there does not exist a sequence of constants {cₙ} such that Σₙ(Xₙ − cₙ) converges a.e., unless the common d.f. is degenerate.

In Exercises 24 to 26, let Sₙ = Σⱼ₌₁ⁿ Xⱼ, where the X's are independent r.v.'s with a common d.f. F of the integer lattice type with span 1, and taking both >0 and


186 | CHARACTERISTIC FUNCTION

25. If F ≠ δ₀, then there exists a constant A such that for every j:

  P{S_n = j} ≤ A n^{−1/2}.

[HINT: Use a special case of Exercise 27 below.]

26. If F is symmetric and ∫ |x| dF(x) < ∞, then

  n P{S_n = j} → ∞.

[HINT: 1 − f(t) = o(|t|) as t → 0.]

27. If f is any nondegenerate ch.f., then there exist constants A > 0 and δ > 0 such that

  |f(t)| ≤ 1 − A t²  for |t| ≤ δ.

[HINT: Reduce to the case where the d.f. has zero mean and finite variance by translating and truncating.]

28. Let Q_n be the concentration function of S_n = Σ_{j=1}^n X_j, where the X_j's are independent r.v.'s having a common nondegenerate d.f. F. Then for every h > 0,

  Q_n(h) ≤ A n^{−1/2}.

[HINT: Use Exercise 27 above and Exercise 16 of Sec. 6.1. This result is due to Lévy and Doeblin, but the proof is due to Rosén.]

In Exercises 29 to 35, μ or μ_k is a p.m. on U = (0, 1].

29. Define for each n:

  f_μ(n) = ∫_U e^{2πinx} μ(dx).

Prove by Weierstrass's approximation theorem (by trigonometrical polynomials) that if f_{μ₁}(n) = f_{μ₂}(n) for every n ≥ 1, then μ₁ = μ₂. The conclusion becomes false if U is replaced by [0, 1].

30. Establish the inversion formula expressing μ in terms of the f_μ(n)'s. Deduce again the uniqueness result in Exercise 29. [HINT: Find the Fourier series of the indicator function of an interval contained in U.]

31. Prove that |f_μ(n)| = 1 if and only if μ has its support in the set {θ₀ + j n^{−1}, 0 ≤ j ≤ n − 1} for some θ₀ in (0, n^{−1}].

*32. μ is equidistributed on the set {j n^{−1}, 0 ≤ j ≤ n − 1} if and only if f_μ(j) = 0 or 1 according as j ∤ n or j | n.

*33. μ_k →v μ if and only if f_{μ_k}(·) → f_μ(·) everywhere.

34. Suppose that the space U is replaced by its closure [0, 1] and the two points 0 and 1 are identified; in other words, suppose U is regarded as the


6.5 REPRESENTATION THEOREMS | 187

circumference of a circle; then we can improve the result in Exercise 33 as follows. If there exists a function g on the integers such that f_{μ_k}(·) → g(·) everywhere, then there exists a p.m. μ on U such that g = f_μ and μ_k →v μ on U.

35. Random variables defined on U are also referred to as defined "modulo 1". The theory of addition of independent r.v.'s on U is somewhat simpler than on ℝ¹, as exemplified by the following theorem. Let {X_j, j ≥ 1} be independent and identically distributed r.v.'s on U and let S_k = Σ_{j=1}^k X_j. Then there are only two possibilities for the asymptotic distributions of S_k. Either there exist a constant c and an integer n ≥ 1 such that S_k − kc converges in dist. to the equidistribution on {j n^{−1}, 0 ≤ j ≤ n − 1}; or S_k converges in dist. to the uniform distribution on U. [HINT: Consider the possible limits of (f_μ(n))^k as k → ∞, for each n.]

6.5 Representation theorems

A ch.f. is defined to be the Fourier-Stieltjes transform of a p.m. Can it be characterized by some other properties? This question arises because frequently such a function presents itself in a natural fashion while the underlying measure is hidden and has to be recognized. Several answers are known, but the following characterization, due to Bochner and Herglotz, is most useful. It plays a basic role in harmonic analysis and in the theory of second-order stationary processes.

A complex-valued function f defined on ℝ¹ is called positive definite iff for any finite set of real numbers t_j and complex numbers z_j (with conjugate complex z̄_j), 1 ≤ j ≤ n, we have

(1)  Σ_{j=1}^n Σ_{k=1}^n f(t_j − t_k) z_j z̄_k ≥ 0.

Let us deduce at once some elementary properties of such a function.

Theorem 6.5.1. If f is positive definite, then for each t ∈ ℝ¹:

  f(−t) = f̄(t),  |f(t)| ≤ f(0).

If f is continuous at t = 0, then it is uniformly continuous in ℝ¹. In this case, we have for every continuous complex-valued function ζ on ℝ¹ and every T > 0:

(2)  ∫₀^T ∫₀^T f(s − t) ζ(s) ζ̄(t) ds dt ≥ 0.


188 | CHARACTERISTIC FUNCTION

PROOF. Taking n = 1, t₁ = 0, z₁ = 1 in (1), we see that

  f(0) ≥ 0.

Taking n = 2, t₁ = 0, t₂ = t, z₁ = z₂ = 1, we have

  2f(0) + f(t) + f(−t) ≥ 0;

changing z₂ to i, we have

  f(0) + f(t)i − f(−t)i + f(0) ≥ 0.

Hence f(t) + f(−t) is real and f(t) − f(−t) is pure imaginary, which imply that f(−t) = f̄(t). Changing z₁ to f(t) and z₂ to −|f(t)|, we obtain

  2f(0)|f(t)|² − 2|f(t)|³ ≥ 0.

Hence f(0) ≥ |f(t)|, whether |f(t)| = 0 or > 0. Now, if f(0) = 0, then f(·) ≡ 0; otherwise we may suppose that f(0) = 1. If we then take n = 3, t₁ = 0, t₂ = t, t₃ = t + h, a well-known result in positive definite quadratic forms implies that the determinant below must have a positive value:

  | f(0)     f(−t)    f(−t−h) |
  | f(t)     f(0)     f(−h)   |
  | f(t+h)   f(h)     f(0)    |
  = 1 − |f(t)|² − |f(t+h)|² − |f(h)|² + 2R{f(t) f(h) f̄(t+h)} ≥ 0.

It follows that

  |f(t) − f(t+h)|² = |f(t)|² + |f(t+h)|² − 2R{f(t) f̄(t+h)}
  ≤ 1 − |f(h)|² + 2R{f(t) f̄(t+h)[f(h) − 1]}
  ≤ 1 − |f(h)|² + 2|1 − f(h)| ≤ 4|1 − f(h)|.

Thus the "modulus of continuity" at each t is bounded by twice the square root of that at 0; and so continuity at 0 implies uniform continuity everywhere. Finally, the integrand of the double integral in (2) being continuous, the integral is the limit of Riemann sums, hence it is positive because these sums are, by (1).

Theorem 6.5.2. f is a ch.f. if and only if it is positive definite and continuous at 0 with f(0) = 1.

Remark. It follows trivially that f is the Fourier-Stieltjes transform

  ∫ e^{itx} ν(dx)


6.5 REPRESENTATION THEOREMS | 189

of a finite measure ν if and only if it is positive definite and continuous at 0; then ν(ℝ¹) = f(0).

PROOF. If f is the ch.f. of the p.m. μ, then we need only verify that it is positive definite. This is immediate, since the left member of (1) is then

  ∫ Σ_j e^{i t_j x} z_j Σ_k e^{−i t_k x} z̄_k μ(dx) = ∫ |Σ_{j=1}^n e^{i t_j x} z_j|² μ(dx) ≥ 0.

Conversely, if f is positive definite and continuous at 0, then by Theorem 6.5.1, (2) holds for ζ(t) = e^{−itx}. Thus

(3)  (1/2πT) ∫₀^T ∫₀^T f(s − t) e^{−i(s−t)x} ds dt ≥ 0.

Denote the left member by p_T(x); by a change of variable and partial integration, we have

(4)  p_T(x) = (1/2π) ∫_{−T}^T (1 − |t|/T) f(t) e^{−itx} dt.

Now observe that for α > 0,

  (1/α) ∫₀^α dβ ∫_{−β}^β e^{−itx} dx = (1/α) ∫₀^α (2 sin βt)/t dβ = 2(1 − cos αt)/(α t²)

(where at t = 0 the limit value is meant, similarly later); it follows that

  (1/α) ∫₀^α dβ ∫_{−β}^β p_T(x) dx = (1/π) ∫_{−T}^T f_T(t) (1 − cos αt)/(α t²) dt
  = (1/π) ∫_{−∞}^∞ f_T(t/α) (1 − cos t)/t² dt,

where

(5)  f_T(t) = (1 − |t|/T) f(t), if |t| ≤ T;  0, if |t| > T.

Note that this is the product of f by a ch.f. (see Exercise 2 of Sec. 6.2) and corresponds to the smoothing of the would-be density which does not necessarily exist.
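The passage from f to the smoothed function p_T in (4) can be checked numerically. The following sketch is ours, not the text's: we take f(t) = e^{−|t|} (the ch.f. of the Cauchy distribution) and evaluate (4) by a midpoint Riemann sum; as asserted above, the resulting p_T is nonnegative (up to quadrature error) and carries total mass close to 1.

```python
import math

def p_T(x, f, T, steps=4000):
    # p_T(x) = (1/2pi) * integral_{-T}^{T} (1 - |t|/T) f(t) e^{-itx} dt,
    # evaluated by a midpoint Riemann sum; f is assumed real and even,
    # so only the cosine part of e^{-itx} survives.
    h = 2 * T / steps
    total = 0.0
    for k in range(steps):
        t = -T + (k + 0.5) * h
        total += (1 - abs(t) / T) * f(t) * math.cos(t * x)
    return total * h / (2 * math.pi)

f = lambda t: math.exp(-abs(t))    # Cauchy ch.f. (our choice of example)
T = 10.0
xs = [i * 0.1 for i in range(-150, 151)]
vals = [p_T(x, f, T) for x in xs]

assert min(vals) > -1e-3                    # nonnegative up to quadrature error
mass = sum(v * 0.1 for v in vals)           # Riemann sum over [-15, 15]
assert abs(mass - 1.0) < 0.1                # total mass close to 1
```

The missing mass is the tail of the smoothed density outside [−15, 15]; enlarging the grid brings the sum closer to 1, in accordance with (7) below.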


190 | CHARACTERISTIC FUNCTION

Since |f_T(t)| ≤ |f(t)| ≤ 1 by Theorem 6.5.1, and (1 − cos t)/t² belongs to L¹(−∞, ∞), we have by dominated convergence:

(6)  lim_{α→∞} (1/α) ∫₀^α dβ ∫_{−β}^β p_T(x) dx = (1/π) ∫_{−∞}^∞ lim_{α→∞} f_T(t/α) (1 − cos t)/t² dt
  = (1/π) ∫_{−∞}^∞ (1 − cos t)/t² dt = 1.

Here the second equation follows from the continuity of f_T at 0 with f_T(0) = 1, and the third equation from formula (3) of Sec. 6.2. Since p_T ≥ 0, the integral ∫_{−β}^β p_T(x) dx is increasing in β, hence the existence of the limit of its "integral average" in (6) implies that of the plain limit as β → ∞, namely:

(7)  lim_{β→∞} ∫_{−β}^β p_T(x) dx = ∫_{−∞}^∞ p_T(x) dx = 1.

Therefore p_T is a probability density function. Returning to (3) and observing that for real z:

  ∫_{−β}^β e^{izx} e^{−itx} dx = 2 sin β(z − t)/(z − t),

we obtain, similarly to (4):

  (1/α) ∫₀^α dβ ∫_{−β}^β e^{izx} p_T(x) dx = (1/π) ∫_{−∞}^∞ f_T(t) (1 − cos α(z − t))/(α(z − t)²) dt
  = (1/π) ∫_{−∞}^∞ f_T(z − t/α) (1 − cos t)/t² dt.

Note that the last two integrals are in reality over finite intervals. Letting α → ∞, we obtain by bounded convergence as before:

(8)  ∫_{−∞}^∞ e^{izx} p_T(x) dx = f_T(z),

the integral on the left existing by (7). Since equation (8) is valid for each z, we have proved that f_T is the ch.f. of the density function p_T. Finally, since f_T(t) → f(t) as T → ∞ for every t, and f is by hypothesis continuous at t = 0, Theorem 6.3.2 yields the desired conclusion that f is a ch.f.

As a typical application of the preceding theorem, consider a family of (real-valued) r.v.'s {X_t, t ∈ ℝ⁺}, where ℝ⁺ = [0, ∞), satisfying the following conditions, for every s and t in ℝ⁺:


6.5 REPRESENTATION THEOREMS | 191

(i) E(X_t²) = 1;
(ii) there exists a function r(·) on ℝ¹ such that E(X_s X_t) = r(s − t);
(iii) lim_{t↓0} E((X₀ − X_t)²) = 0.

A family satisfying (i) and (ii) is called a second-order stationary process or stationary process in the wide sense; and condition (iii) means that the process is continuous in L²(Ω, F, P).

For every finite set of t_j and z_j as in the definitions of positive definiteness, we have

  Σ_{j=1}^n Σ_{k=1}^n r(t_j − t_k) z_j z̄_k = Σ_{j=1}^n Σ_{k=1}^n E(X_{t_j} X_{t_k}) z_j z̄_k = E(|Σ_{j=1}^n X_{t_j} z_j|²) ≥ 0.

Thus r is a positive definite function. Next, we have

  r(0) − r(t) = E(X₀(X₀ − X_t)),

hence by the Cauchy-Schwarz inequality,

  |r(t) − r(0)|² ≤ E(X₀²) E((X₀ − X_t)²).

It follows that r is continuous at 0, with r(0) = 1. Hence, by Theorem 6.5.2, r is the ch.f. of a uniquely determined p.m. R:

  r(t) = ∫ e^{itx} R(dx).

This R is called the spectral distribution of the process and is essential in its further analysis.

Theorem 6.5.2 is not practical in recognizing special ch.f.'s or in constructing them. In contrast, there is a sufficient condition due to Pólya that is very easy to apply.

Theorem 6.5.3. Let f on ℝ¹ satisfy the following conditions, for each t:

(9)  f(0) = 1,  f(t) ≥ 0,  f(t) = f(−t),

and f is decreasing and continuous convex in ℝ⁺ = [0, ∞). Then f is a ch.f.

PROOF. Without loss of generality we may suppose that

  f(∞) = lim_{t→∞} f(t) = 0;

otherwise we consider [f(t) − f(∞)]/[f(0) − f(∞)] unless f(∞) = 1, in which case f ≡ 1. It is well known that a convex function f has right-hand


192 | CHARACTERISTIC FUNCTION

and left-hand derivatives everywhere that are equal except on a countable set, that f is the integral of either one of them, say the right-hand one, which will be denoted simply by f′, and that f′ is increasing. Under the conditions of the theorem it is also clear that f is decreasing and f′ is negative in ℝ⁺. Now consider the f_T as defined in (5) above, and observe that

  −f_T′(t) = −(1 − t/T) f′(t) + (1/T) f(t), if 0 < t < T;  0, if t > T.


6 .5 REPRESENTATIVE THEOREMS I 193works for 0 < a < 2 . Consider the density functionp(x) =and compute its ch .f. f as follows, usingaif lxl > 1,2lxla+i0 if lxl


194 | CHARACTERISTIC FUNCTION

the function f_T to be f in [−T, T], linear in [−T′, −T] and in [T, T′], and zero in (−∞, −T′) and (T′, ∞). Clearly f_T also satisfies the conditions of Theorem 6.5.3 and so is a ch.f. Furthermore, f = f_T in [−T, T]. We have thus established the following theorem and shown indeed how abundant the desired examples are.

Theorem 6.5.5. There exist two distinct ch.f.'s that coincide in an interval containing the origin.

That the interval of coincidence can be "arbitrarily large" is, of course, trivial, for if f₁ = f₂ in [−δ, δ], then g₁ = g₂ in [−nδ, nδ], where

  g₁(t) = f₁(t/n),  g₂(t) = f₂(t/n).

Corollary. There exist three ch.f.'s, f₁, f₂, f₃, such that f₁f₃ ≡ f₂f₃ but f₁ ≢ f₂.

To see this, take f₁ and f₂ as in the theorem, and take f₃ to be any ch.f. vanishing outside their interval of coincidence, such as the one described above for a sufficiently small value of T. This result shows that in the algebra of ch.f.'s, the cancellation law does not hold.

We end this section by another special but useful way of constructing ch.f.'s.

Theorem 6.5.6. If f is a ch.f., then so is e^{λ(f−1)} for each λ > 0.

PROOF. For each λ > 0, as soon as the integer n > λ, the function

  1 − λ/n + (λ/n) f = 1 + (λ/n)(f − 1)

is a ch.f., hence so is its nth power; see propositions (iv) and (v) of Sec. 6.1. As n → ∞,

  (1 + λ(f − 1)/n)^n → e^{λ(f−1)},

and the limit is clearly continuous. Hence it is a ch.f. by Theorem 6.3.2.

Later we shall see that this class of ch.f.'s is also part of the infinitely divisible family. For f(t) = e^{it}, the corresponding

  e^{λ(e^{it}−1)} = Σ_{n=0}^∞ e^{−λ} (λ^n/n!) e^{itn}

is the ch.f. of the Poisson distribution, which should be familiar to the reader.
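Theorem 6.5.6 has a concrete probabilistic reading: e^{λ(f−1)} is the ch.f. of a compound Poisson sum S = X₁ + ... + X_N, where N is Poisson(λ) and independent of the i.i.d. X's with ch.f. f. A quick numerical sketch, with our own choice f(t) = cos t (i.e. X = ±1 with probability 1/2 each), checks the series against the closed form:

```python
import math

lam = 2.0
f = lambda t: math.cos(t)   # ch.f. of X = +/-1 with probability 1/2 each

def compound_poisson_chf(t, terms=60):
    # E[e^{itS}] with S = X_1 + ... + X_N, N ~ Poisson(lam):
    # conditioning on N = n gives ch.f. f(t)^n, so the ch.f. is
    # sum_n e^{-lam} lam^n / n! * f(t)^n  (series truncated at `terms`).
    return sum(math.exp(-lam) * lam**n / math.factorial(n) * f(t)**n
               for n in range(terms))

for t in [0.0, 0.5, 1.0, 2.5]:
    direct = compound_poisson_chf(t)
    closed = math.exp(lam * (f(t) - 1.0))   # e^{lam(f-1)} of Theorem 6.5.6
    assert abs(direct - closed) < 1e-12
```

With f(t) = e^{it} the same computation reproduces the Poisson ch.f. displayed above.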


6 .5 REPRESENTATIVE THEOREMS I 195EXERCISES1 . If f is continuous in :W' and satisfies (3) for each x E and eachT > 0, then f is positive definite .2. Show that the following functions are ch .f.'s :1 _ 1 - Itl", if Iti < 1 ;1 -+ Itl'f (t) 0, if Itl > 1 ; 0 < a < l,f(t) =1- Itl, if 0 < Itl < i4iif Iti > z .3 . If {X,} are independent r .v.'s with the same stable distribution ofexponent a, then Ek=l X k /n 'I' has the same distribution . [This is the originof the name "stable" .]4 . If F is a symmetric stable distribution of exponent a, 0 < a < 2, thenf 00 Ix J' dF(x) < oo for r < a and = oo for r > a . [HINT : Use Exercises 7and 8 of Sec . 6 .4 .]*5. Another proof of Theorem 6 .5 .3 is as follows . Show thatand define the d .f.Next show thatG on R+ by1000tdf'(t) = 1G(u) = ft d f'(t) .[0 u](1 - Itl dG(u) = f (t) .JoUHence if we setf(u,t)=(1_u) v0u(see Exercise 2 of Sec . 6 .2), thenf (t) = f[O,oc)f (u, t) dG(u) .Now apply Exercise 2 of Sec . 6 .1 .6 . Show that there is a ch .f. that has period 2m, m an integer > 1, andthat is equal to 1 - Itl in [-1, +1] . [HINT : Compute the Fourier series of sucha function to show that the coefficients are positive .]


196 | CHARACTERISTIC FUNCTION

7. Construct a ch.f. that vanishes in [−b, −a] and [a, b], where 0 < a


6.6 MULTIDIMENSIONAL CASE; LAPLACE TRANSFORMS | 197

The condition (2) is to be contrasted with the following identity in one variable:

  ∀t: f_{X+Y}(t) = f_X(t) f_Y(t),

where f_{X+Y} is the ch.f. of X + Y. This holds if X and Y are independent, but the converse is false (see Exercise 14 of Sec. 6.1).

PROOF OF THEOREM 6.6.1. If X and Y are independent, then so are e^{isX} and e^{itY} for every s and t, hence

  E(e^{i(sX+tY)}) = E(e^{isX} e^{itY}) = E(e^{isX}) E(e^{itY}),

which is (2). Conversely, consider the 2-dimensional product measure μ₁ × μ₂, where μ₁ and μ₂ are the 1-dimensional p.m.'s of X and Y, respectively, and the product is defined in Sec. 3.3. Its ch.f. is given by definition as

  ∫∫_{ℝ²} e^{i(sx+ty)} (μ₁ × μ₂)(dx, dy) = ∫∫_{ℝ²} e^{isx} e^{ity} μ₁(dx) μ₂(dy)
  = ∫ e^{isx} μ₁(dx) ∫ e^{ity} μ₂(dy) = f_X(s) f_Y(t)

(Fubini's theorem!). If (2) is true, then this is the same as f_{(X,Y)}(s, t), so that μ₁ × μ₂ has the same ch.f. as μ, the p.m. of (X, Y). Hence, by the uniqueness theorem mentioned above, we have μ₁ × μ₂ = μ. This is equivalent to the independence of X and Y.

The multidimensional analogues of the convergence theorem, and some of its applications such as the weak law of large numbers and central limit theorem, are all valid without new difficulties. Even the characterization of Bochner has an easy extension. However, these topics are better pursued with certain objectives in view, such as mathematical statistics, Gaussian processes, and spectral analysis, that are beyond the scope of this book, and so we shall not enter into them.

We shall, however, give a short introduction to an allied notion, that of the Laplace transform, which is sometimes more expedient or appropriate than the ch.f., and is a basic tool in the theory of Markov processes.

Let X be a positive (≥0) r.v. having the d.f. F so that F has support in [0, ∞), namely F(0−) = 0.
The Laplace transform of X or F is the function F̂ on [0, ∞) given by

(3)  F̂(λ) = E(e^{−λX}) = ∫_{[0,∞)} e^{−λx} dF(x).

It is obvious (why?) that

  F̂(0) = lim_{λ↓0} F̂(λ) = 1,  F̂(∞) = lim_{λ→∞} F̂(λ) = F(0).
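For a concrete instance of (3) (our example, not the text's): if F is the exponential d.f. F(x) = 1 − e^{−x}, then F̂(λ) = 1/(1 + λ), and the two boundary relations F̂(0) = 1, F̂(∞) = F(0) = 0 can be checked numerically.

```python
import math

def laplace_transform(lam, density, upper=60.0, steps=60000):
    # F_hat(lam) = integral_0^upper e^{-lam x} dF(x), with dF(x) = density(x) dx,
    # approximated by a midpoint Riemann sum (the tail beyond `upper` is negligible
    # for an exponential density).
    h = upper / steps
    return sum(math.exp(-lam * (k + 0.5) * h) * density((k + 0.5) * h) * h
               for k in range(steps))

density = lambda x: math.exp(-x)      # d.f. F(x) = 1 - e^{-x}, so F(0) = 0

for lam in [0.0, 0.5, 1.0, 4.0]:
    approx = laplace_transform(lam, density)
    assert abs(approx - 1.0 / (1.0 + lam)) < 1e-3    # F_hat(lam) = 1/(1+lam)

# F_hat(0) = 1 and F_hat(lam) -> F(0) = 0 as lam grows
assert abs(laplace_transform(0.0, density) - 1.0) < 1e-3
assert laplace_transform(50.0, density) < 0.05
```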


198 | CHARACTERISTIC FUNCTION

More generally, we can define the Laplace transform of an s.d.f. or a function G of bounded variation satisfying certain "growth conditions" at infinity. In particular, if F is an s.d.f., the Laplace transform of its indefinite integral

  G(x) = ∫₀^x F(u) du

is finite for λ > 0 and given by

  Ĝ(λ) = ∫_{[0,∞)} e^{−λx} F(x) dx = ∫_{[0,∞)} e^{−λx} dx ∫_{[0,x]} μ(dy)
  = ∫_{[0,∞)} μ(dy) ∫_{[y,∞)} e^{−λx} dx = (1/λ) ∫_{[0,∞)} e^{−λy} μ(dy),

where μ is the s.p.m. of F. The calculation above, based on Fubini's theorem, replaces a familiar "integration by parts". However, the reader should beware of the latter operation. For instance, according to the usual definition, as given in Rudin [1] (and several other standard textbooks!), the value of the Riemann-Stieltjes integral

  ∫₀^∞ e^{−λx} dδ₀(x)

is 0 rather than 1, but

  ∫₀^∞ e^{−λx} dδ₀(x) = lim_{A↑∞} δ₀(x) e^{−λx} |₀^A + ∫₀^∞ δ₀(x) λ e^{−λx} dx

is correct only if the left member is taken in the Lebesgue-Stieltjes sense, as is always done in this book.

There are obvious analogues of propositions (i) to (v) of Sec. 6.1. However, the inversion formula requiring complex integration will be omitted and the uniqueness theorem, due to Lerch, will be proved by a different method (cf. Exercise 12 of Sec. 6.2).

Theorem 6.6.2. Let F̂_j be the Laplace transform of the d.f. F_j with support in ℝ⁺, j = 1, 2. If F̂₁ = F̂₂, then F₁ = F₂.

PROOF. We shall apply the Stone-Weierstrass theorem to the algebra generated by the family of functions {e^{−λx}, λ ≥ 0}, defined on the closed positive real line ℝ̄⁺ = [0, ∞], namely the one-point compactification of ℝ⁺ = [0, ∞). A continuous function of x on ℝ̄⁺ is one that is continuous in ℝ⁺ and has a finite limit as x → ∞. This family separates points on ℝ̄⁺ and vanishes at no point of ℝ̄⁺ (at the point +∞, the member e^{−0·x} ≡ 1 of the family does not vanish!). Hence the set of polynomials in the members of the


6.6 MULTIDIMENSIONAL CASE; LAPLACE TRANSFORMS | 199

family, namely the algebra generated by it, is dense in the uniform topology, in the space C_B(ℝ̄⁺) of bounded continuous functions on ℝ̄⁺. That is to say, given any g ∈ C_B(ℝ̄⁺) and ε > 0, there exists a polynomial of the form

  g_ε(x) = Σ_{j=1}^n c_j e^{−λ_j x},

where the c_j are real and λ_j ≥ 0, such that

  sup_{x∈ℝ̄⁺} |g(x) − g_ε(x)| ≤ ε.

Consequently, we have

  ∫ |g(x) − g_ε(x)| dF_j(x) ≤ ε,  j = 1, 2.

By hypothesis, we have for each λ ≥ 0:

  ∫ e^{−λx} dF₁(x) = ∫ e^{−λx} dF₂(x),

and consequently,

  ∫ g_ε(x) dF₁(x) = ∫ g_ε(x) dF₂(x).

It now follows, as in the proof of Theorem 4.4.1, first that

  ∫ g(x) dF₁(x) = ∫ g(x) dF₂(x)

for each g ∈ C_B(ℝ̄⁺); second, that this also holds for each g that is the indicator of an interval in ℝ̄⁺ (even one of the form (a, ∞]); third, that the two p.m.'s induced by F₁ and F₂ are identical, and finally that F₁ = F₂ as asserted.

Remark. Although it is not necessary here, we may also extend the domain of definition of a d.f. F to ℝ̄⁺; thus F(∞) = 1, but with the new meaning that F(∞) is actually the value of F at the point ∞, rather than a notation for lim_{x→∞} F(x) as previously defined. F is thus continuous at ∞. In terms of the p.m. μ, this means we extend its domain to ℝ̄⁺ but set μ({∞}) = 0. On other occasions it may be useful to assign a strictly positive value to μ({∞}).

Passing to the convergence theorem, we shall prove it in a form closest to Theorem 6.3.2 and refer to Exercise 4 below for improvements.

Theorem 6.6.3. Let {F_n, 1 ≤ n ≤ ∞} be a sequence of s.d.f.'s with supports in ℝ⁺ and {F̂_n} the corresponding Laplace transforms. Then F_n →v F_∞, where F_∞ is a d.f., if and only if:


200 | CHARACTERISTIC FUNCTION

(a) lim_{n→∞} F̂_n(λ) exists for every λ > 0;
(b) the limit function has the limit 1 as λ ↓ 0.

Remark. The limit in (a) exists at λ = 0 and equals 1 if the F_n's are d.f.'s, but even so (b) is not guaranteed, as the example F_n = δ_n shows.

PROOF. The "only if" part follows at once from Theorem 4.4.1 and the remark that the Laplace transform of an s.d.f. is a continuous function in ℝ⁺. Conversely, suppose lim F̂_n(λ) = G(λ), λ > 0; extend G to ℝ⁺ by setting G(0) = 1 so that G is continuous in ℝ⁺ by hypothesis (b). As in the proof of Theorem 6.3.2, consider any vaguely convergent subsequence F_{n_k} with the vague limit F_∞, necessarily an s.d.f. (see Sec. 4.3). Since for each λ > 0, e^{−λx} ∈ C₀, Theorem 4.4.1 applies to yield F̂_{n_k}(λ) → F̂_∞(λ) for λ > 0, where F̂_∞ is the Laplace transform of F_∞. Thus F̂_∞(λ) = G(λ) for λ > 0, and consequently for λ ≥ 0 by continuity of F̂_∞ and G at λ = 0. Hence every vague limit has the same Laplace transform, and therefore is the same F_∞ by Theorem 6.6.2. It follows by Theorem 4.3.4 that F_n →v F_∞. Finally, we have F_∞(∞) = F̂_∞(0) = G(0) = 1, proving that F_∞ is a d.f.

There is a useful characterization, due to S. Bernstein, of Laplace transforms of measures on ℝ⁺. A function is called completely monotonic in an interval (finite or infinite, of any kind) iff it has derivatives of all orders there satisfying the condition:

(4)  (−1)^n f^{(n)}(λ) ≥ 0

for each n ≥ 0 and each λ in the domain of definition.

Theorem 6.6.4. A function f on (0, ∞) is the Laplace transform of a d.f. F:

(5)  f(λ) = ∫_{ℝ⁺} e^{−λx} dF(x),

if and only if it is completely monotonic in (0, ∞) with f(0+) = 1.

Remark. We then extend f to ℝ⁺ by setting f(0) = 1, and (5) will hold for λ ≥ 0.

PROOF. The "only if" part is immediate, since

  f^{(n)}(λ) = ∫_{ℝ⁺} (−x)^n e^{−λx} dF(x).

Turning to the "if" part, let us first prove that f is quasi-analytic in (0, ∞), namely it has a convergent Taylor series there. Let 0 < λ₀ < λ < μ; then, by


6.6 MULTIDIMENSIONAL CASE; LAPLACE TRANSFORMS | 201

Taylor's theorem, with the remainder term in the integral form, we have

(6)  f(λ) = Σ_{j=0}^{k−1} (f^{(j)}(μ)/j!) (λ − μ)^j
  + ((λ − μ)^k/(k−1)!) ∫₀¹ (1 − t)^{k−1} f^{(k)}(μ + (λ − μ)t) dt.

Because of (4), the last term in (6) is positive and does not exceed

  ((λ₀ − μ)^k/(k−1)!) ∫₀¹ (1 − t)^{k−1} f^{(k)}(μ + (λ₀ − μ)t) dt.

For if k is even, then f^{(k)} ↓ and (λ − μ)^k > 0, while if k is odd then f^{(k)} ↑ and (λ − μ)^k < 0. Now, by (6) with λ replaced by λ₀, the last expression is equal to

  f(λ₀) − Σ_{j=0}^{k−1} (f^{(j)}(μ)/j!) (λ₀ − μ)^j ≤ f(λ₀),

where the inequality is trivial, since each term in the sum on the left is positive by (4). Therefore, as k → ∞, the remainder term in (6) tends to zero and the Taylor series for f(λ) converges.

Now for each n ≥ 1, define the discrete s.d.f. F_n by the formula:

(7)  F_n(x) = Σ_{j=0}^{[nx]} ((−1)^j n^j / j!) f^{(j)}(n).

This is indeed an s.d.f., since for each ε > 0 and k ≥ 1 we have from (6):

  1 = f(0+) ≥ f(ε) ≥ Σ_{j=0}^{k−1} (f^{(j)}(n)/j!) (ε − n)^j.

Letting ε ↓ 0 and then k ↑ ∞, we see that F_n(∞) ≤ 1. The Laplace transform of F_n is plainly, for λ > 0:

  ∫_{[0,∞)} e^{−λx} dF_n(x) = Σ_{j=0}^∞ e^{−λj/n} ((−1)^j n^j / j!) f^{(j)}(n)
  = Σ_{j=0}^∞ ((−n e^{−λ/n})^j / j!) f^{(j)}(n) = f(n(1 − e^{−λ/n})),

the last equation from the Taylor series. Letting n → ∞, we obtain for the limit of the last term f(λ), since f is continuous at each λ. It follows from


202 | CHARACTERISTIC FUNCTION

Theorem 6.6.3 that {F_n} converges vaguely, say to F, and that the Laplace transform of F is f. Hence F(∞) = f(0) = 1, and F is a d.f. The theorem is proved.

EXERCISES

1. If X and Y are independent r.v.'s with normal d.f.'s of the same variance, then X + Y and X − Y are independent.

*2. Two uncorrelated r.v.'s with a joint normal d.f. of arbitrary parameters are independent. Extend this to any finite number of r.v.'s.

*3. Let F and G be s.d.f.'s. If λ₀ > 0 and F̂(λ) = Ĝ(λ) for all λ ≥ λ₀, then F = G. More generally, if F̂(nλ₀) = Ĝ(nλ₀) for integer n ≥ 1, then F = G. [HINT: In order to apply the Stone-Weierstrass theorem as cited, adjoin the constant 1 to the family {e^{−λx}, λ ≥ λ₀}; show that a function in C₀ can actually be uniformly approximated without a constant.]

*4. Let {F_n} be s.d.f.'s. If λ₀ > 0 and lim_{n→∞} F̂_n(λ) exists for all λ ≥ λ₀, then {F_n} converges vaguely to an s.d.f.

5. Use Exercise 3 to prove that for any d.f. whose support is in a finite interval, the moment problem is determinate.

6. Let F be an s.d.f. with support in ℝ⁺. Define G₀ = F,

  G_n(x) = ∫₀^x G_{n−1}(u) du

for n ≥ 1. Find Ĝ_n(λ) in terms of F̂(λ).

7. Let f̂(λ) = ∫₀^∞ e^{−λx} f(x) dx where f ∈ L¹(0, ∞). Suppose that f has a finite right-hand derivative f′(0) at the origin; then

  f(0) = lim_{λ→∞} λ f̂(λ),
  f′(0) = lim_{λ→∞} λ[λ f̂(λ) − f(0)].

*8. In the notation of Exercise 7, prove that for every λ, μ ∈ ℝ⁺:

  (μ − λ) ∫₀^∞ ∫₀^∞ e^{−(λs+μt)} f(s + t) ds dt = f̂(λ) − f̂(μ).

9. Given a function σ on ℝ⁺ that is finite, continuous, and decreasing to zero at infinity, find a σ-finite measure μ on ℝ⁺ such that

  ∀t > 0: ∫_{[0,t]} σ(t − s) μ(ds) = 1.

[HINT: Assume σ(0) = 1 and consider the d.f. 1 − σ.]


6.6 MULTIDIMENSIONAL CASE; LAPLACE TRANSFORMS | 203

*10. If f > 0 on (0, ∞) and has a derivative f′ that is completely monotonic there, then 1/f is also completely monotonic.

11. If f is completely monotonic in (0, ∞) with f(0+) = +∞, then f is the Laplace transform of an infinite measure μ on ℝ⁺:

  f(λ) = ∫_{ℝ⁺} e^{−λx} μ(dx).

[HINT: Show that F_n(x) ≤ e^{2δx} f(δ) for each δ > 0 and all large n, where F_n is defined in (7). Alternatively, apply Theorem 6.6.4 to f(λ + n^{−1})/f(n^{−1}) for λ > 0 and use Exercise 3.]

12. Let {g_n, 1 ≤ n ≤ ∞} on ℝ⁺ satisfy the conditions: (i) for each n, g_n(·) is positive and decreasing; (ii) g_∞(x) is continuous; (iii) for each λ > 0,

  lim_{n→∞} ∫₀^∞ e^{−λx} g_n(x) dx = ∫₀^∞ e^{−λx} g_∞(x) dx.

Then

  lim_{n→∞} g_n(x) = g_∞(x) for every x ∈ ℝ⁺.

[HINT: For ε > 0 consider the sequence ∫₀^∞ e^{−εx} g_n(x) dx and show that

  lim_{n→∞} ∫_a^b e^{−εx} g_n(x) dx = ∫_a^b e^{−εx} g_∞(x) dx,  lim sup_n g_n(b) ≤ g_∞(b−),

and so on.]

Formally, the Fourier transform f and the Laplace transform F̂ of a p.m. with support in ℝ⁺ can be obtained from each other by the substitution t = iλ or λ = −it in (1) of Sec. 6.1 or (3) of Sec. 6.6. In practice, a certain expression may be derived for one of these transforms that is valid only for the pertinent range, and the question arises as to its validity for the other range. Interesting cases of this will be encountered in Secs. 8.4 and 8.5. The following theorem is generally adequate for the situation.

Theorem 6.6.5. The function h of the complex variable z given by

  h(z) = ∫_{ℝ⁺} e^{zx} dF(x)

is analytic in R z < 0 and continuous in R z ≤ 0. Suppose that g is another function of z that is analytic in R z < 0 and continuous in R z ≤ 0 such that

  ∀t ∈ ℝ¹: h(it) = g(it).


204 | CHARACTERISTIC FUNCTION

Then h(z) ≡ g(z) in R z ≤ 0; in particular

  ∀λ ∈ ℝ⁺: h(−λ) = g(−λ).

PROOF. For each integer m ≥ 1, the function h_m defined by

  h_m(z) = ∫_{[0,m]} e^{zx} dF(x) = ∫_{[0,m]} Σ_{n=0}^∞ (z^n x^n / n!) dF(x)

is clearly an entire function of z. We have

  |∫_{(m,∞)} e^{zx} dF(x)| ≤ ∫_{(m,∞)} dF(x)

in R z ≤ 0; hence the sequence h_m converges uniformly there to h, and h is continuous in R z ≤ 0. It follows from a basic proposition in the theory of analytic functions that h is analytic in the interior R z < 0. Next, the difference h − g is analytic in R z < 0, continuous in R z ≤ 0, and equal to zero on the line R z = 0 by hypothesis. Hence, by Schwarz's reflection principle, it can be analytically continued across the line and over the whole complex plane. The resulting entire function, being zero on a line, must be identically zero in the plane. In particular h − g ≡ 0 in R z ≤ 0, proving the theorem.
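The construction in the proof of Theorem 6.6.4 can be made concrete. Everything in the sketch below is our own choice, not the text's: we take f(λ) = 1/(1 + λ), which is completely monotonic with f(0+) = 1 and has f^{(j)}(λ) = (−1)^j j! (1 + λ)^{−j−1}, and check that the discrete s.d.f. F_n(x) = Σ_{j ≤ nx} ((−1)^j n^j / j!) f^{(j)}(n) approaches the exponential d.f. 1 − e^{−x} as n grows.

```python
import math

def F_n(x, n):
    # F_n(x) = sum_{j=0}^{[nx]} ((-1)^j n^j / j!) f^{(j)}(n)
    # with f(lam) = 1/(1+lam); since f^{(j)}(lam) = (-1)^j j! (1+lam)^{-j-1},
    # the j-th summand is n^j / (1+n)^{j+1}, accumulated as a running product
    # to avoid huge intermediate powers.
    r = n / (1.0 + n)
    term = 1.0 / (1.0 + n)
    total = 0.0
    for _ in range(int(n * x) + 1):
        total += term
        term *= r
    return total

# F_n should approach the exponential d.f. F(x) = 1 - e^{-x} as n grows,
# which is indeed the d.f. with Laplace transform 1/(1+lam).
for x in [0.5, 1.0, 2.0]:
    exact = 1.0 - math.exp(-x)
    assert abs(F_n(x, 2000) - exact) < 1e-2
```

Here the geometric summands n^j/(1+n)^{j+1} are exactly the masses that the proof places at the points j/n, so the convergence observed is the vague convergence asserted via Theorem 6.6.3.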


7 | Central limit theorem and its ramifications

7.1 Liapounov's theorem

The name "central limit theorem" refers to a result that asserts the convergence in dist. of a "normed" sum of r.v.'s, (S_n − a_n)/b_n, to the unit normal d.f. Φ. We have already proved a neat, though special, version in Theorem 6.4.4. Here we begin by generalizing the set-up. If we write

(1)  (S_n − a_n)/b_n = Σ_{j=1}^n (X_j − a_j)/b_n,

we see that we are really dealing with a double array, as follows. For each n ≥ 1 let there be k_n r.v.'s {X_{nj}, 1 ≤ j ≤ k_n}, where k_n → ∞ as n → ∞:

(2)
  X_{11}, X_{12}, ..., X_{1k₁};
  X_{21}, X_{22}, ..., X_{2k₂};
  .........................
  X_{n1}, X_{n2}, ..., X_{nk_n};
  .........................


206 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

The r.v.'s with n as first subscript will be referred to as being in the nth row. Let F_{nj} be the d.f., f_{nj} the ch.f. of X_{nj}; and put

  S_n = S_{n,k_n} = Σ_{j=1}^{k_n} X_{nj}.

The particular case k_n = n for each n yields a triangular array, and if, furthermore, X_{nj} = X_j for every n, then it reduces to the initial sections of a single sequence {X_j, j ≥ 1}.

We shall assume from now on that the r.v.'s in each row in (2) are independent, but those in different rows may be arbitrarily dependent, as in the case just mentioned; indeed they may be defined on different probability spaces without any relation to one another. Furthermore, let us introduce the following notation for moments, whenever they are defined, finite or infinite:

(3)  E(X_{nj}) = α_{nj},  Σ_{j=1}^{k_n} α_{nj} = a_n;
     σ²(X_{nj}) = σ²_{nj},  σ²(S_n) = Σ_{j=1}^{k_n} σ²_{nj} = s_n²;
     E(|X_{nj}|³) = γ_{nj},  Σ_{j=1}^{k_n} γ_{nj} = Γ_n.

In the special case of (1), we have

  X_{nj} = X_j/b_n,  σ²(X_{nj}) = σ²(X_j)/b_n².

If we take b_n = s_n, then

(4)  Σ_{j=1}^{k_n} σ²(X_{nj}) = 1.

By considering X_{nj} − α_{nj} instead of X_{nj}, we may suppose

(5)  ∀n, ∀j: α_{nj} = 0

whenever the means exist. The reduction (sometimes called "norming") leading to (4) and (5) is always available if each X_{nj} has a finite second moment, and we shall assume this in the following.

In dealing with the double array (2), it is essential to impose a hypothesis that individual terms in the sum

  S_n = Σ_{j=1}^{k_n} X_{nj}


7.1 LIAPOUNOV'S THEOREM | 207

are "negligible" in comparison with the sum itself. Historically, this arose from the assumption that "small errors" accumulate to cause probabilistically predictable random mass phenomena. We shall see later that such a hypothesis is indeed necessary in a reasonable criterion for the central limit theorem such as Theorem 7.2.1.

In order to clarify the intuitive notion of the negligibility, let us consider the following hierarchy of conditions, each to be satisfied for every ε > 0:

(a) ∀j: lim_{n→∞} P{|X_{nj}| > ε} = 0;

(b) lim_{n→∞} max_{1≤j≤k_n} P{|X_{nj}| > ε} = 0;

(c) lim_{n→∞} P{max_{1≤j≤k_n} |X_{nj}| > ε} = 0;

(d) lim_{n→∞} Σ_{j=1}^{k_n} P{|X_{nj}| > ε} = 0.

It is clear that (d) ⇒ (c) ⇒ (b) ⇒ (a); see Exercise 1 below. It turns out that (b) is the appropriate condition, which will be given a name.

DEFINITION. The double array (2) is said to be holospoudic* iff (b) holds.

Theorem 7.1.1. A necessary and sufficient condition for (2) to be holospoudic is:

(6)  ∀t ∈ ℝ¹: lim_{n→∞} max_{1≤j≤k_n} |f_{nj}(t) − 1| = 0.

PROOF. We have

  |f_{nj}(t) − 1| ≤ ∫ |e^{itx} − 1| dF_{nj}(x) ≤ 2 ∫_{|x|>ε} dF_{nj}(x) + |t| ∫_{|x|≤ε} |x| dF_{nj}(x),

and consequently

  max_j |f_{nj}(t) − 1| ≤ 2 max_j P{|X_{nj}| > ε} + ε|t|.

* I am indebted to Professor M. Wigodsky for suggesting this word, the only new term coined in this book.


208 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Letting n → ∞, then ε → 0, we obtain (6). Conversely, we have by the inequality (2) of Sec. 6.3,

  ∫_{|x|>ε} dF_{nj}(x) ≤ (ε/2) ∫_{|t|≤2/ε} (1 − R f_{nj}(t)) dt,

and the right member tends to 0 as n → ∞, uniformly in j, by (6); hence (b) holds.

The following elementary lemma on products of complex numbers will be needed below.

Lemma. Let {θ_{nj}, 1 ≤ j ≤ k_n} be a double array of complex numbers satisfying the following conditions as n → ∞:

(i) max_{1≤j≤k_n} |θ_{nj}| → 0;

(ii) Σ_{j=1}^{k_n} |θ_{nj}| ≤ M < ∞, where M does not depend on n;

(iii) Σ_{j=1}^{k_n} θ_{nj} → θ, where θ is a (finite) complex number.

Then we have

(7)  Π_{j=1}^{k_n} (1 + θ_{nj}) → e^θ.

PROOF. By (i), there exists n₀ such that if n ≥ n₀, then |θ_{nj}| ≤ 1/2 for all j, so that 1 + θ_{nj} ≠ 0. We shall consider only such large values of n below, and we shall denote by log(1 + θ_{nj}) the determination of the logarithm with an angle in (−π, π]. Thus

(8)  log(1 + θ_{nj}) = θ_{nj} + Λ|θ_{nj}|²,

where Λ is a complex number depending on various variables but bounded by some absolute constant not depending on anything, and is not necessarily


7.1 LIAPOUNOV'S THEOREM | 209

the same in each appearance. In the present case we have in fact

  |log(1 + θ_{nj}) − θ_{nj}| ≤ Σ_{m=2}^∞ |θ_{nj}|^m / m ≤ (|θ_{nj}|²/2) Σ_{m=0}^∞ |θ_{nj}|^m ≤ |θ_{nj}|²,

so that the absolute constant mentioned above may be taken to be 1. (The reader will supply such a computation next time.) Hence

  Σ_{j=1}^{k_n} log(1 + θ_{nj}) = Σ_{j=1}^{k_n} θ_{nj} + Λ Σ_{j=1}^{k_n} |θ_{nj}|².

(This Λ is not the same as before, but bounded by the same 1!) It follows from (ii) and (i) that

(9)  Σ_{j=1}^{k_n} |θ_{nj}|² ≤ (max_{1≤j≤k_n} |θ_{nj}|) Σ_{j=1}^{k_n} |θ_{nj}| ≤ M max_{1≤j≤k_n} |θ_{nj}| → 0;

hence, by (iii),

  Σ_{j=1}^{k_n} log(1 + θ_{nj}) → θ,

which is equivalent to (7).

We can now state and prove Liapounov's form of the central limit theorem.

Theorem 7.1.2. Under the reduction hypotheses (4) and (5) of Sec. 7.1, if Γ_n → 0 as n → ∞, then S_n converges in dist. to Φ.

PROOF. For fixed t, put θ_{nj} = f_{nj}(t) − 1; we proceed to verify conditions (i) to (iii) of the lemma.


210 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Condition (i) is satisfied since

  max_{1≤j≤k_n} |θ_{nj}| ≤ (t²/2) max_{1≤j≤k_n} σ²(X_{nj}) → 0,


7.1 LIAPOUNOV'S THEOREM | 211

If

  Γ_n / s_n³ → 0,

then S_n/s_n converges in dist. to Φ.

This is obtained by setting X_{nj} = X_j/s_n. It should be noticed that the double scheme gets rid of cumbersome fractions in proofs as well as in statements.

We proceed to another proof of Liapounov's theorem for a single sequence by the method of Lindeberg, who actually used it to prove his version of the central limit theorem; see Theorem 7.2.1 below. In recent times this method has been developed into a general tool by Trotter and Feller.

The idea of Lindeberg is to approximate the sum X₁ + ... + X_n in (13) successively by replacing one X at a time with a comparable normal (Gaussian) r.v. Y, as follows. Let {Y_j, j ≥ 1} be r.v.'s having the normal distribution N(0, σ_j²); thus Y_j has the same mean and variance as the corresponding X_j above; let all the X's and Y's be totally independent. Now put

  Z_j = Y₁ + ... + Y_{j−1} + X_{j+1} + ... + X_n,  1 ≤ j ≤ n;


212 | CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Note that the r.v.'s f(ξ), f′(ξ), and f″(ξ) are bounded hence integrable. If ζ is another r.v. independent of ξ and having the same mean and variance as η, and E{|ζ|³} < ∞, we obtain by replacing η with ζ in (16) and then taking the difference:

(17)  |E{f(ξ + η)} − E{f(ξ + ζ)}| ≤ (M/6) E{|η|³ + |ζ|³}.

This key formula is applied to each term on the right side of (15), with ξ = Z_j/s_n, η = X_j/s_n, ζ = Y_j/s_n. The bounds on the right-hand side of (17) then add up to

(18)  Σ_{j=1}^n (γ_j + c σ_j³)/s_n³,

where c = √(8/π), since the absolute third moment of N(0, σ²) is equal to cσ³. By Liapounov's inequality (Sec. 3.2) σ_j³ ≤ γ_j, so that the quantity in (18) is O(Γ_n/s_n³). Let us introduce a unit normal r.v. N for convenience of notation, so that (Y₁ + ... + Y_n)/s_n may be replaced by N so far as its distribution is concerned. We have thus obtained the following estimate:

(19)  ∀f ∈ C³: |E{f(S_n/s_n)} − E{f(N)}| ≤ O(Γ_n/s_n³).

Consequently, under the condition (14), this converges to zero as n → ∞. It follows by the general criterion for vague convergence in Theorem 6.1.6 that S_n/s_n converges in distribution to the unit normal. This is Liapounov's form of the central limit theorem, proved above by the method of ch.f.'s. Lindeberg's idea yields a by-product, due to Pinsky, which will be proved under the same assumptions as in Liapounov's theorem above.

Theorem 7.1.3. Let {x_n} be a sequence of real numbers increasing to +∞ but subject to the growth condition: for some ε > 0,

(20)  log(Γ_n/s_n³) + (x_n²/2)(1 + ε) → −∞

as n → ∞. Then for this ε, there exists N such that for all n ≥ N we have

(21)  exp(−(x_n²/2)(1 + ε)) ≤ P{S_n ≥ x_n s_n} ≤ exp(−(x_n²/2)(1 − ε)).

PROOF. This is derived by choosing particular test functions in (19). Let f ∈ C³ be such that

  f(x) = 0 for x ≤ 0;  f(x) = 1 for x ≥ 1;  0 ≤ f(x) ≤ 1 for all x,


and put for all x:

f_n(x) = f(x − x_n − 2),  g_n(x) = f(x − x_n + 2).

Thus, denoting by I_B the indicator function of B ⊂ R¹, we have

f_n ≤ I_{[x_n,∞)} ≤ g_n,

so that

E{f_n(S_n/s_n)} ≤ P{S_n ≥ x_n s_n} ≤ E{g_n(S_n/s_n)}.

By (19), the two extreme members above differ from E{f_n(N)} and E{g_n(N)} respectively by O(Γ_n/s_n³), while

P{N ≥ x_n + 3} ≤ E{f_n(N)},  E{g_n(N)} ≤ P{N ≥ x_n − 2}.

Now an elementary estimate yields for x → +∞:

P{N > x} = (1/√(2π)) ∫_x^∞ e^{−y²/2} dy ≤ (1/(√(2π) x)) exp(−x²/2)

(see Exercise 4 of Sec. 7.4), and a quick computation shows further that

P{N > x + 3} = exp(−(x²/2)(1 + o(1))).

Thus the sandwich above may be written as

(24) P{S_n ≥ x_n s_n} = exp(−(x_n²/2)(1 + o(1))) + O(Γ_n/s_n³).

Suppose n is so large that the o(1) above is strictly less than ε in absolute value; in order to conclude (21) it is then sufficient to have

Γ_n/s_n³ = o(exp(−(x_n²/2)(1 + ε))),  n → ∞.

This is the sense of the condition (20), and the theorem is proved.
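The elementary tail estimate invoked in the proof can be checked numerically; the helper `normal_tail` below is ours (the exact tail via the complementary error function), not the book's.

```python
import math

# Our numerical check of the standard normal tail estimate used above:
# for large x, P{N > x} is (1/(x*sqrt(2*pi))) * exp(-x^2/2) up to a factor
# 1 + O(1/x^2), hence exp(-(x^2/2)(1 + o(1))) on the logarithmic scale.
def normal_tail(x):
    # exact tail of the unit normal, via erfc
    return 0.5 * math.erfc(x / math.sqrt(2.0))

for x in (3.0, 5.0, 8.0):
    exact = normal_tail(x)
    approx = math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))
    # first ratio -> 1 quickly; the log-scale ratio -> 1 only slowly
    print(x, round(exact / approx, 4), round(math.log(exact) / (-x * x / 2.0), 4))
```

The slow approach of the logarithmic ratio to 1 is precisely why the two bounds in (21) differ by the factor exp(ε x_n²).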


Recalling that s_n is the standard deviation of S_n, we call the probability in (21) that of a "large deviation". Observe that the two estimates of this probability given there have a ratio equal to e^{ε x_n²}, which is large for each ε as x_n → +∞. Nonetheless, it is small relative to the principal factor e^{−x_n²/2} on the logarithmic scale, which is just to say that ε x_n² is small in absolute value compared with x_n²/2 when ε is small. Thus the estimates in (21) are useful on such a scale, and may be applied in the proof of the law of the iterated logarithm in Sec. 7.5.

EXERCISES

*1. Prove that for arbitrary r.v.'s {X_nj} in the array (2), the implications (d) ⇒ (c) ⇒ (b) ⇒ (a) are all strict. On the other hand, if the X_nj's are independent in each row, then (d) ≡ (c).

2. For any sequence of r.v.'s {Y_n}, if Y_n/b_n converges in dist. for an increasing sequence of constants {b_n}, then Y_n/b'_n converges in pr. to 0 if b_n = o(b'_n). In particular, make precise the following statement: "The central limit theorem implies the weak law of large numbers."

3. For the double array (2), it is possible that S_n/b_n converges in dist. for a sequence of strictly positive constants b_n tending to a finite limit. Is it still possible if b_n oscillates between finite limits?

4. Let {X_j} be independent r.v.'s such that max_{1≤j≤n} ...

*5. Prove that for every ε > 0, there exists a δ(ε) such that Γ_n < δ(ε) implies L(F_n, Φ) < ε, where L is the Levy distance. Strengthen the conclusion to read:

sup_{x∈R¹} |F_n(x) − Φ(x)| < ε.

*6. Prove the assertion made in Exercise 5 of Sec. 5.3 using the methods of this section. [HINT: use Exercise 4 of Sec. 4.3.]

7.2 Lindeberg-Feller theorem

We can now state the Lindeberg-Feller theorem for the double array (2) of Sec. 7.1 (with independence in each row).

Theorem 7.2.1. Assume σ²_nj < ∞ for each n and j and the reduction hypotheses (4) and (5) of Sec. 7.1. In order that as n → ∞ the two conclusions below both hold:


(i) S_n converges in dist. to Φ;
(ii) the double array (2) of Sec. 7.1 is holospoudic;

it is necessary and sufficient that for each η > 0, we have

(1) Σ_{j=1}^{k_n} ∫_{|x|>η} x² dF_nj(x) → 0.

The condition (1) is called Lindeberg's condition; it is manifestly equivalent to

(1') Σ_{j=1}^{k_n} ∫_{|x|≤η} x² dF_nj(x) → 1.

PROOF. Sufficiency. By the argument that leads to Chebyshev's inequality, we have

(2) P{|X_nj| > η} ≤ (1/η²) ∫_{|x|>η} x² dF_nj(x).

Hence (ii) follows from (1); indeed even the stronger form of negligibility (d) in Sec. 7.1 follows. Now for a fixed η, 0 < η < 1, we truncate X_nj as follows:

(3) X'_nj = X_nj if |X_nj| ≤ η;  X'_nj = 0 otherwise.

Since E(X_nj) = 0, we have

|E(X'_nj)| = |∫_{|x|≤η} x dF_nj(x)| = |∫_{|x|>η} x dF_nj(x)| ≤ ∫_{|x|>η} |x| dF_nj(x),

and so, writing S'_n = Σ_{j=1}^{k_n} X'_nj, by (1):

|E(S'_n)| ≤ Σ_{j=1}^{k_n} ∫_{|x|>η} |x| dF_nj(x) ≤ (1/η) Σ_{j=1}^{k_n} ∫_{|x|>η} x² dF_nj(x) → 0.

Next we have by the Cauchy-Schwarz inequality

( ∫_{|x|>η} |x| dF_nj(x) )² ≤ ∫_{|x|>η} x² dF_nj(x) · ∫_{|x|>η} dF_nj(x),


and consequently

σ²(X'_nj) = ∫_{|x|≤η} x² dF_nj(x) − (E(X'_nj))².

Summing over j and using (1') together with the preceding estimates on the means, we see that

σ²(S'_n) = Σ_{j=1}^{k_n} σ²(X'_nj) → 1.

Since the X'_nj are bounded by η, the corollary to Theorem 7.1.2 is applicable to the truncated array provided that η can be made to decrease to 0 with n. This is accomplished by means of the following elementary lemma.

Lemma. Let u(m, n) ≥ 0 be such that lim_{n→∞} u(m, n) = 0 for each fixed m; then there exists a sequence {m_n} increasing to ∞ such that u(m_n, n) → 0 as n → ∞.


We apply the lemma to (1) as follows. For each m ≥ 1, we have

lim_{n→∞} m² Σ_{j=1}^{k_n} ∫_{|x|>1/m} x² dF_nj(x) = 0.

It follows that there exists a sequence {η_n} decreasing to 0 such that

(1/η_n²) Σ_{j=1}^{k_n} ∫_{|x|>η_n} x² dF_nj(x) → 0.

Now we can go back and modify the definition in (3) by replacing η with η_n. As indicated above, the cited corollary becomes applicable and yields the convergence of [S'_n − E(S'_n)]/σ(S'_n) in dist. to Φ, hence also that of S'_n as remarked.

Finally we must go from S'_n to S_n. The idea is similar to Theorem 5.2.1 but simpler. Observe that, for the modified X'_nj in (3) with η replaced by η_n, we have by (2):

P{S_n ≠ S'_n} ≤ Σ_{j=1}^{k_n} P{|X_nj| > η_n} ≤ (1/η_n²) Σ_{j=1}^{k_n} ∫_{|x|>η_n} x² dF_nj(x) → 0.

Hence S_n and S'_n converge in dist. together, and the sufficiency is proved.

Necessity. By (ii), the ch.f.'s f_nj of the F_nj converge to 1 uniformly in j on every bounded t-interval; hence for each T > 0 there exists n_0(T)


such that if n ≥ n_0(T), then

max_{1≤j≤k_n} max_{|t|≤T} |f_nj(t) − 1| ≤ 1/2.

In this range of t we may write

log f_n(t) = Σ_{j=1}^{k_n} log f_nj(t) = Σ_{j=1}^{k_n} (f_nj(t) − 1) + Λ_n(t),

where, by the expansion of the logarithm,

|Λ_n(t)| ≤ Σ_{j=1}^{k_n} |f_nj(t) − 1|² ≤ max_{1≤j≤k_n} |f_nj(t) − 1| · Σ_{j=1}^{k_n} |f_nj(t) − 1|.

Since E(X_nj) = 0, we have

|f_nj(t) − 1| = | ∫ (e^{itx} − 1 − itx) dF_nj(x) | ≤ (t²/2) ∫ x² dF_nj(x) = (t²/2) σ²_nj,

so that Σ_j |f_nj(t) − 1| ≤ t²/2, and consequently Λ_n(t) → 0 by the negligibility in (ii). By (i), f_n(t) → e^{−t²/2}; hence

Σ_{j=1}^{k_n} (1 − f_nj(t)) → t²/2,

and taking real parts, for every t:

Σ_{j=1}^{k_n} ∫ (1 − cos tx) dF_nj(x) → t²/2.

For a fixed η > 0, split each integral over |x| ≤ η and |x| > η. Since 0 ≤ 1 − cos tx ≤ t²x²/2, we have

Σ_{j=1}^{k_n} ∫_{|x|≤η} (1 − cos tx) dF_nj(x) ≤ (t²/2) Σ_{j=1}^{k_n} ∫_{|x|≤η} x² dF_nj(x);

on the other hand,

Σ_{j=1}^{k_n} ∫_{|x|>η} (1 − cos tx) dF_nj(x)


≤ lim_{n→∞} Σ_j ∫_{|x|>η} 2 dF_nj(x) ≤ lim_{n→∞} (2/η²) Σ_j σ²_nj = 2/η²,

the last inequality above by Chebyshev's inequality. Since 0 ≤ 1 − cos θ ≤ θ²/2 for all θ, combining the two parts we obtain

t²/2 ≤ (t²/2) lim inf_{n→∞} Σ_{j=1}^{k_n} ∫_{|x|≤η} x² dF_nj(x) + 2/η²,

and therefore, dividing by t²/2 and letting t → +∞:

1 ≤ lim inf_{n→∞} Σ_{j=1}^{k_n} ∫_{|x|≤η} x² dF_nj(x).

Since the total sum of the second moments is 1, this is (1'), which is equivalent to Lindeberg's condition (1). The necessity is proved.

Observe that Liapounov's condition, namely

(10) Σ_{j=1}^{k_n} ∫ |x|^{2+δ} dF_nj(x) → 0 for some δ > 0,

implies Lindeberg's: for every η > 0,

∫_{|x|>η} x² dF_nj(x) ≤ (1/η^δ) ∫_{|x|>η} |x|^{2+δ} dF_nj(x) ≤ (1/η^δ) ∫ |x|^{2+δ} dF_nj(x),

showing that (10) implies (1).

The essence of Theorem 7.2.1 is the assumption of the finiteness of the second moments together with the "classical" norming factor s_n, which is the standard deviation of the sum S_n; see Exercise 10 below for an interesting possibility. In the most general case where "nothing is assumed," we have the following criterion, due to Feller and in a somewhat different form to Levy.
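Lindeberg's condition (1) can be probed numerically for a concrete array. The sketch below is our own check, anticipating the setting of Exercise 8 below (X_j uniform on [−j, j], so σ_j² = j²/3): the truncated second-moment sum over |x| > η s_n eventually vanishes outright, since every summand is bounded by n while s_n grows like n^{3/2}.

```python
import math

# Our hypothetical numerical check of Lindeberg's condition (1) for the
# array X_j uniform on [-j, j] (the setting of Exercise 8 below).
def lindeberg_sum(n, eta):
    s2 = sum(j * j / 3.0 for j in range(1, n + 1))   # s_n^2 = sum of variances
    cut = eta * math.sqrt(s2)                         # truncation level eta*s_n
    total = 0.0
    for j in range(1, n + 1):
        if j > cut:
            # uniform[-j, j] has density 1/(2j); integrate x^2 over cut < |x| < j
            total += (j ** 3 - cut ** 3) / (3.0 * j)
    return total / s2

print([round(lindeberg_sum(n, 0.5), 4) for n in (10, 100, 1000)])
```

For small n the sum is not small, but as soon as η s_n exceeds n it is exactly zero, which is the mechanism by which uniformly "moderate" summands satisfy (1).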


Theorem 7.2.2. For the double array (2) of Sec. 7.1 (with independence in each row), in order that there exists a sequence of constants {a_n} such that

(i) Σ_{j=1}^{k_n} X_nj − a_n converges in dist. to Φ, and
(ii) the array is holospoudic,

it is necessary and sufficient that the following two conditions hold for every η > 0:

(a) Σ_{j=1}^{k_n} ∫_{|x|>η} dF_nj(x) → 0;

(b) Σ_{j=1}^{k_n} { ∫_{|x|≤η} x² dF_nj(x) − ( ∫_{|x|≤η} x dF_nj(x) )² } → 1.

As an example of the application of the foregoing theorems, let us consider the classical problem of the number of inversions in a random permutation.


Let Ω be the space of the n! permutations of (1, 2, ..., n), each being assigned probability 1/n!. For a permutation ω = (a_1, a_2, ..., a_n) and each j, let X_nj(ω) be the number of inversions caused by j in ω, namely the number of integers i < j that follow j in the arrangement.

Lemma 2. For each n, the r.v.'s {X_nj, 1 ≤ j ≤ n} are independent with the following distributions:

P{X_nj = m} = 1/j for 0 ≤ m ≤ j − 1.

It follows that

E(X_nj) = (j − 1)/2,  σ²(X_nj) = (j² − 1)/12,

so that for S_n = Σ_{j=1}^n X_nj, the total number of inversions,

E(S_n) = n(n − 1)/4,  s_n² = Σ_{j=1}^n (j² − 1)/12 ~ n³/36.

For every η > 0 and sufficiently large n, we have

|X_nj| ≤ j − 1 < n < η s_n,  1 ≤ j ≤ n.
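The moment computations above can be verified exactly for a small n by brute-force enumeration; the code below is our own check, not part of the text.

```python
from itertools import permutations

# Our exact check of the moments of the number of inversions: over all n!
# equally likely permutations, the mean is n(n-1)/4 and the variance is
# sum_{j=1}^{n} (j^2 - 1)/12, here evaluated at n = 6.
def inversions(p):
    return sum(1 for i in range(len(p))
                 for k in range(i + 1, len(p)) if p[i] > p[k])

n = 6
counts = [inversions(p) for p in permutations(range(n))]
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
print(mean, var)
```

For n = 6 the formulas give mean n(n−1)/4 = 7.5 and variance 85/12, which the enumeration reproduces exactly.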


Hence Lindeberg's condition is satisfied for the double array

{ (X_nj − E(X_nj))/s_n, 1 ≤ j ≤ n },

since the integrals in (1) vanish identically for large n, and we conclude that

[S_n − (n(n − 1)/4)] / s_n

converges in dist. to Φ. This is the central limit theorem for the number of inversions of a random permutation.

EXERCISES


*7. Find an example where Lindeberg's condition is satisfied but Liapounov's is not for any δ > 0.

In Exercises 8 to 10 below {X_j, j ≥ 1} is a sequence of independent r.v.'s.

8. For each j let X_j have the uniform distribution in [−j, j]. Show that Lindeberg's condition is satisfied and state the resulting central limit theorem.

9. Let X_j be defined as follows for some α > 1:

X_j = ±j^α, with probability 1/(6 j^{2(α−1)}) each;
X_j = 0, with probability 1 − 1/(3 j^{2(α−1)}).

Prove that Lindeberg's condition is satisfied if and only if α < 3/2.

*10. It is important to realize that the failure of Lindeberg's condition means only the failure of either (i) or (ii) in Theorem 7.2.1 with the specified constants s_n. A central limit theorem may well hold with a different sequence of constants. Let

X_j = ±j², with probability 1/(12 j²) each;
X_j = ±j, with probability 1/12 each;
X_j = 0, with probability 1 − 1/6 − 1/(6 j²).

Prove that Lindeberg's condition is not satisfied. Nonetheless, if we take b_n² = n³/18, then S_n/b_n converges in dist. to Φ. The point is that abnormally large values may not count! [HINT: Truncate out the abnormal value.]

11. Prove that ∫_{−∞}^{∞} x² dF(x) < ∞ implies the condition (11), but not vice versa.

*12. The following combinatorial problem is similar to that of the number of inversions. Let Ω and P be as in the example in the text. It is standard knowledge that each permutation

(1, 2, ..., n) → (a_1, a_2, ..., a_n)

can be uniquely decomposed into the product of cycles, as follows. Consider the permutation as a mapping π from the set (1, ..., n) onto itself such that π(j) = a_j. Beginning with 1 and applying the mapping successively, 1 → π(1) → π²(1) → ..., until the first k such that π^k(1) = 1. Thus

(1, π(1), π²(1), ..., π^{k−1}(1))


is the first cycle of the decomposition. Next, begin with the least integer, say b, not in the first cycle and apply π to it successively; and so on. We say 1 → π(1) is the first step, ..., π^{k−1}(1) → 1 the kth step, b → π(b) the (k+1)st step of the decomposition, and so on. Now define X_nj(ω) to be equal to 1 if in the decomposition of ω, a cycle is completed at the jth step; otherwise to be 0. Prove that for each n, {X_nj, 1 ≤ j ≤ n} is a set of independent r.v.'s with the following distributions:

P{X_nj = 1} = 1/(n − j + 1),  P{X_nj = 0} = 1 − 1/(n − j + 1).

Deduce the central limit theorem for the number of cycles of a permutation.

7.3 Ramifications of the central limit theorem

As an illustration of a general method of extending the central limit theorem to certain classes of dependent r.v.'s, we prove the following result. Further elaboration along the same lines is possible, but the basic idea, attributed to S. Bernstein, consists always in separating into blocks and neglecting small ones.

Let {X_n, n ≥ 1} be a sequence of r.v.'s; let 𝔉_n be the Borel field generated by {X_k, 1 ≤ k ≤ n}, and 𝔉'_n that by {X_k, n < k < ∞}. The sequence is called m-dependent iff there exists an integer m ≥ 0 such that for every n the fields 𝔉_n and 𝔉'_{n+m} are independent. When m = 0, this reduces to independence.

Theorem 7.3.1. Suppose that {X_n} is a sequence of m-dependent, uniformly bounded r.v.'s such that

(1) σ(S_n)/n^{1/3} → +∞

as n → ∞. Then [S_n − E(S_n)]/σ(S_n) converges in dist. to Φ.

PROOF. Let the uniform bound be M. Without loss of generality we may suppose that E(X_n) = 0 for each n. For an integer k ≥ 1 let n_j = [jn/k], 0 ≤ j ≤ k, and put for large values of n:

Y_j = X_{n_j+1} + X_{n_j+2} + ... + X_{n_{j+1}−m};
Z_j = X_{n_{j+1}−m+1} + X_{n_{j+1}−m+2} + ... + X_{n_{j+1}}.

We have then

S_n = Σ_{j=0}^{k−1} Y_j + Σ_{j=0}^{k−1} Z_j = S'_n + S''_n, say.
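Theorem 7.3.1 lends itself to a quick numerical illustration. The construction below is our own choice of an m-dependent sequence (a 2-dependent moving sum of independent signs), not one from the text.

```python
import numpy as np

# Our numerical sketch of Theorem 7.3.1: X_k = U_k + U_{k+1} + U_{k+2} with
# i.i.d. signs U_j is 2-dependent and uniformly bounded (|X_k| <= 3), and
# [S_n - E(S_n)] / sigma(S_n) should be approximately unit normal.
rng = np.random.default_rng(1)
n, reps = 2000, 4000
U = rng.choice([-1.0, 1.0], size=(reps, n + 2))
X = U[:, :n] + U[:, 1:n + 1] + U[:, 2:n + 2]   # 2-dependent moving sums
S = X.sum(axis=1)
Z = (S - S.mean()) / S.std()                    # empirical standardization
frac = np.mean(np.abs(Z) < 1.96)                # should be near 0.95
print(round(frac, 3))
```

Note that the variance of S_n here is about 9n (variance 3 per term plus the lag-1 and lag-2 covariances), well above the n^{2/3} threshold of (1).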


It follows from the hypothesis of m-dependence and Theorem 3.3.2 that the Y_j's are independent; so are the Z_j's, provided n_{j+1} − m + 1 − n_j > m, which is the case if n/k is large enough. Although S'_n and S''_n are not independent of each other, we shall show that the latter is comparatively negligible so that S_n behaves like S'_n. Observe that each term X_r in S''_n is independent of every term X_{r'} in S'_n except at most m terms, and that E(X_r X_{r'}) = 0 when they are independent, while |E(X_r X_{r'})| ≤ M² otherwise. Since there are km terms in S''_n, it follows that

|E(S'_n S''_n)| ≤ km · m · M² = k m² M².

Each Z_j is the sum of m terms bounded by M, so that σ²(Z_j) ≤ m²M², and the Z_j's being independent for large n,

σ²(S''_n) ≤ k m² M².

Writing s_n² = σ²(S_n) and s'_n² = σ²(S'_n), we thus have

s_n² = s'_n² + σ²(S''_n) + 2 E(S'_n S''_n) = s'_n² + O(k m² M²).

Now let k = k_n = [n^{2/3}]; then k_n m² M² = o(s_n²) by (1), so that s'_n/s_n → 1, while S''_n/s_n → 0 in pr. by Chebyshev's inequality. Consequently S_n/s_n converges in dist. to Φ if S'_n/s'_n does. To prove the latter we verify Lindeberg's condition for the double array {Y_nj/s'_n, 0 ≤ j ≤ k_n − 1}, where we write Y_nj for Y_j to indicate the dependence on n. Each Y_nj is the sum of at most n/k_n + 1 terms bounded by M; hence

max_{0≤j≤k_n−1} |Y_nj| ≤ (n/k_n + 1) M ≤ 2M n^{1/3},  2M n^{1/3}/s'_n → 0,


the last relation from (1) and the one preceding it from a hypothesis of the theorem. Thus for each η > 0, we have for all sufficiently large n:

∫_{|x|>η s'_n} x² dF_nj(x) = 0,  0 ≤ j ≤ k_n − 1,

where F_nj is the d.f. of Y_nj. Hence Lindeberg's condition is satisfied for the double array {Y_nj/s'_n}, and we conclude that S'_n/s'_n converges in dist. to Φ. This establishes the theorem as remarked above.

The next extension of the central limit theorem is to the case of a random number of terms (cf. the second part of Sec. 5.5). That is, we shall deal with the r.v. S_{ν_n} whose value at ω is given by S_{ν_n(ω)}(ω), where

S_n(ω) = Σ_{j=1}^n X_j(ω)

as before, and {ν_n(ω), n ≥ 1} is a sequence of r.v.'s. The simplest case, but not a very useful one, is when all the r.v.'s in the "double family" {X_n, ν_n, n ≥ 1} are independent. The result below is more interesting and is distinguished by the simple nature of its hypothesis. The proof relies essentially on Kolmogorov's inequality given in Theorem 5.3.1.

Theorem 7.3.2. Let {X_j, j ≥ 1} be a sequence of independent, identically distributed r.v.'s with mean 0 and variance 1. Let {ν_n, n ≥ 1} be a sequence of r.v.'s taking only strictly positive integer values such that

(3) ν_n/n → c in pr.,

where c is a constant: 0 < c < ∞. Then S_{ν_n}/√ν_n converges in dist. to Φ.

PROOF. We know from Theorem 6.4.4 that S_n/√n converges in dist. to Φ, so that our conclusion means that we can substitute ν_n for n there. The remarkable thing is that no kind of independence is assumed, but only the limit property in (3).

First of all, we observe that in the result of Theorem 6.4.4 we may substitute [cn] (= integer part of cn) for n to conclude the convergence in dist. of S_{[cn]}/√[cn] to Φ (why?). Next we write

S_{ν_n}/√ν_n = { S_{[cn]}/√[cn] + (S_{ν_n} − S_{[cn]})/√[cn] } · √([cn]/ν_n).

The second factor on the right converges to 1 in pr., by (3).
Hence a simple argument used before (Theorem 4.4.6) shows that the theorem will be proved if we show that

(4) (S_{ν_n} − S_{[cn]}) / √[cn] → 0 in pr.
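Theorem 7.3.2 can likewise be illustrated numerically. In the sketch below, the index ν_n is our own toy choice ([cn] plus a perturbation of smaller order, so that ν_n/n → c), not one from the text.

```python
import numpy as np

# Our numerical sketch of Theorem 7.3.2: nu_n = [cn] + O(sqrt(n)), so that
# nu_n/n -> c in pr.; S_{nu_n}/sqrt(nu_n) should be approximately N(0, 1).
rng = np.random.default_rng(2)
n, c, reps = 3000, 2.0, 4000
top = int(c * n) + 200
X = rng.choice([-1.0, 1.0], size=(reps, top))
nu = int(c * n) + rng.integers(0, int(np.sqrt(n)), size=reps)  # nu/n -> c
S_nu = np.array([X[i, :nu[i]].sum() for i in range(reps)])
Z = S_nu / np.sqrt(nu)
frac = np.mean(np.abs(Z) < 1.96)   # should be near 0.95
print(round(frac, 3))
```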


Let ε be given, 0 < ε < 1; put

a_n = [(1 − ε³)[cn]],  b_n = [(1 + ε³)[cn]] − 1.

By (3), there exists n_0(ε) such that if n ≥ n_0(ε), then the set

Λ = {ω: a_n ≤ ν_n(ω) ≤ b_n}

has probability ≥ 1 − ε. If ω is in this set, then S_{ν_n(ω)}(ω) is one of the sums S_j with a_n ≤ j ≤ b_n. For [cn] < j ≤ b_n, we have

S_j − S_{[cn]} = X_{[cn]+1} + X_{[cn]+2} + ... + X_j;

hence by Kolmogorov's inequality

P{ max_{[cn]<j≤b_n} |S_j − S_{[cn]}| > ε√[cn] } ≤ σ²(S_{b_n} − S_{[cn]})/(ε²[cn]) ≤ ε³[cn]/(ε²[cn]) = ε.

A similar inequality holds for a_n ≤ j ≤ [cn]; combining the two, we obtain

P{ max_{a_n≤j≤b_n} |S_j − S_{[cn]}| > ε√[cn] } ≤ 2ε.

Hence, for n ≥ n_0(ε),

P{ |S_{ν_n} − S_{[cn]}| > ε√[cn] } ≤ P(Λ^c) + 2ε ≤ 3ε.

Since ε is arbitrary, this establishes (4), and the theorem is proved.

We now turn to a ramification of a different kind, concerning the maxima of the partial sums.


Let {X_j, j ≥ 1} be a sequence of independent and identically distributed r.v.'s with mean 0 and variance 1; and

S_n = Σ_{j=1}^n X_j.

It will become evident that these assumptions may be considerably weakened, and the basic hypothesis is that the central limit theorem should be applicable. Consider now the infinite sequence of successive sums {S_n, n ≥ 1}. This is the kind of stochastic process to which an entire chapter will be devoted later (Chapter 8). The classical limit theorems we have so far discussed deal with the individual terms of the sequence {S_n} itself, but there are various other sequences derived from it that are no less interesting and useful. To give a few examples:

max_{1≤m≤n} S_m,  min_{1≤m≤n} S_m,  max_{1≤m≤n} |S_m|.

We shall treat the first of these in the theorem below.

Theorem 7.3.3. Under the assumptions just stated, for every x we have

lim_{n→∞} P{ max_{1≤m≤n} S_m ≤ x√n } = G(x),

where G(x) = √(2/π) ∫_0^x e^{−y²/2} dy for x ≥ 0, and G(x) = 0 for x < 0.

PROOF. Let k be a fixed integer, and put n_t = [tn/k] for 0 ≤ t ≤ k,


7 .3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM 1 229and for each j, define f (j) byNow we write, for 0 < E < x :nt(j)-1 < j < nt(j)-nI max Sm > x,~n- } _ ° '{Ej ; I Snt(j) - Sj I < 6,1n)1 Efn- } and Q2 (S",( ;) - Sj)


converges in distribution as n → ∞. Now its ch.f. f(t_1, ..., t_k) is given by

E{exp(i√(k/n)(t_1 S_{n_1} + ... + t_k S_{n_k}))} = E{exp(i√(k/n)[(t_1 + ... + t_k) S_{n_1} + (t_2 + ... + t_k)(S_{n_2} − S_{n_1}) + ... + t_k (S_{n_k} − S_{n_{k−1}})])},

which converges to

(7) exp[−½(t_1 + ... + t_k)²] exp[−½(t_2 + ... + t_k)²] ... exp(−½ t_k²),

since the ch.f.'s of

√(k/n) S_{n_1},  √(k/n)(S_{n_2} − S_{n_1}),  ...,  √(k/n)(S_{n_k} − S_{n_{k−1}})

all converge to e^{−t²/2} by the central limit theorem (Theorem 6.4.4), n_{j+1} − n_j being asymptotically equal to n/k for each j. It is well known that the ch.f. given in (7) is that of the k-dimensional normal distribution, but for our purpose it is sufficient to know that the convergence theorem for ch.f.'s holds in any dimension, and so R_nk converges vaguely to R_∞k, where R_∞k is determined by some fixed k-dimensional distribution.

Now suppose for a special sequence {X̃_j} satisfying the same conditions as {X_j}, the corresponding P̃_n can be shown to converge ("pointwise" in fact, but "vaguely" if need be):

(8) ∀x: lim_{n→∞} P̃_n(x) = G(x).

Then, applying (6) with P_n replaced by P̃_n and letting n → ∞, we obtain, since R_∞k is a fixed distribution:

G(x) ≤ R_∞k(x) ≤ G(x + ε) + 1/(ε² k).

Substituting back into the original (6) and taking upper and lower limits, we have

G(x − ε) − 1/(ε² k) ≤ R_∞k(x − ε) − 1/(ε² k) ≤ lim inf_n P_n(x) ≤ lim sup_n P_n(x) ≤ R_∞k(x) ≤ G(x + ε) + 1/(ε² k).

Letting k → ∞, we conclude that P_n converges vaguely to G, since ε is arbitrary.

It remains to prove (8) for a special choice of {X̃_j} and determine G. This can be done most expeditiously by taking the common d.f. of the X̃_j's to be


the symmetric Bernoullian distribution ½(δ_1 + δ_{−1}). In this case we can indeed compute the more specific probability

(9) P{ max_{1≤m≤n} S_m < x; S_n = y },

where x and y are integers, x > 0, y < x. If we observe that in our particular case max_{1≤m≤n} S_m ≥ x if and only if S_j = x for some j, 1 ≤ j ≤ n, the probability in (9) is seen to be equal to

P{S_n = y} − P{ max_{1≤m≤n} S_m ≥ x; S_n = y }.

To evaluate the probability subtracted above, consider any sample sequence for which S_j = x for some j ≤ n and S_n = y, and let j be the first such index. Change the sign of every X_i for i > j; the new sample sequence has the same probability, still reaches x first at time j, and ends at S_n = 2x − y. Conversely, every sample sequence with S_n = 2x − y must reach x before time n (since 2x − y > x), and the same transformation returns it to a sequence of the former kind. This one-to-one correspondence shows that

P{ max_{1≤m≤n} S_m ≥ x; S_n = y } = P{S_n = 2x − y},

and therefore

(10) P{ max_{1≤m≤n} S_m < x; S_n = y } = P{S_n = y} − P{S_n = 2x − y},


and we have proved that the value of the probability in (9) is given by (10). The trick used above, which consists in changing the sign of every X_j after the first time when S reaches the value x, or, geometrically speaking, reflecting the path {(j, S_j), j ≥ 1} about the line S_j = x upon first reaching it, is called the "reflection principle". It is often attributed to Desire Andre in its combinatorial formulation of the so-called "ballot problem" (see Exercise 5 below).

The value of (10) is, of course, well known in the Bernoullian case, and summing over y we obtain, if n is even:

P{ max_{1≤m≤n} S_m < x } = Σ_{y<x} [ P{S_n = y} − P{S_n = 2x − y} ] = P{S_n < x} − P{S_n > x} = P{−x ≤ S_n < x}.

Taking x = ξ√n here and applying the central limit theorem, we see that this converges, as n → ∞, to

G(ξ) = √(2/π) ∫_0^ξ e^{−u²/2} du,  ξ ≥ 0,

which proves (8) with this G, and so Theorem 7.3.3 is completely proved.


EXERCISES

1. Let {X_j, j ≥ 1} be a sequence of independent r.v.'s, and f a Borel measurable function of m variables. Then if ξ_k = f(X_{k+1}, ..., X_{k+m}), the sequence {ξ_k, k ≥ 1} is (m − 1)-dependent.

*2. Let {X_j, j ≥ 1} be a sequence of independent r.v.'s having the Bernoullian d.f. pδ_1 + (1 − p)δ_0, 0 < p < 1. An r-run of successes in the sample sequence {X_j(ω), j ≥ 1} is defined to be a sequence of r consecutive "ones" preceded and followed by "zeros". Let N_n be the number of r-runs in the first n terms of the sample sequence. Prove a central limit theorem for N_n.

3. Let {X_j, ν_j, j ≥ 1} be independent r.v.'s such that the ν_j's are integer-valued, ν_j → ∞ a.e., and the central limit theorem applies to (S_n − a_n)/b_n, where S_n = Σ_{j=1}^n X_j, and a_n, b_n are real constants, b_n → ∞. Then it also applies to (S_{ν_n} − a_{ν_n})/b_{ν_n}.

*4. Give an example of a sequence of independent and identically distributed r.v.'s {X_n} with mean 0 and variance 1 and a sequence of positive integer-valued r.v.'s ν_n tending to ∞ a.e. such that S_{ν_n}/s_{ν_n} does not converge in distribution. [HINT: The easiest way is to use Theorem 8.3.3 below.]

*5. There are a ballots marked A and b ballots marked B. Suppose that these a + b ballots are counted in random order. What is the probability that the number of ballots for A always leads in the counting?

6. If {X_n} are independent, identically distributed symmetric r.v.'s, then for every x ≥ 0:

P{|S_n| > x} ≥ ½ P{ max_{1≤k≤n} |X_k| > x }.

*7. If {X_n} are independent and identically distributed and S_n/√n converges in dist., then there exists a constant c > 0 such that P{|X_1| > n^{1/2}} ≤ c/n. [This is due to Feller; use Exercise 3 of Sec. 6.5.]

8. Under the same hypothesis as in Theorem 7.3.3, prove that max_{1≤m≤n} |S_m|/√n also converges in dist. to a limit, as follows. In the symmetric Bernoullian case, show first that, for integers 0 < z < x and y:


P{ −z < S_m < x − z for 1 ≤ m ≤ n; S_n = y − z } = Σ_{k=−∞}^{∞} [ P{S_n = 2kx + y − z} − P{S_n = 2kx − y − z} ].

This can be done by starting the sample path at z and reflecting at both barriers 0 and x (Kelvin's method of images). Next, show that

lim_{n→∞} P{ −z√n < S_m < (x − z)√n for 1 ≤ m ≤ n }
= (1/√(2π)) Σ_{k=−∞}^{∞} [ ∫_{2kx−z}^{(2k+1)x−z} e^{−y²/2} dy − ∫_{(2k−1)x−z}^{2kx−z} e^{−y²/2} dy ].

Finally, use the Fourier series for the function h of period 2x:

h(y) = −1, if −x − z < y < −z;  h(y) = +1, if −z < y < x − z;

to convert the above limit to

(4/π) Σ_{k=0}^{∞} (1/(2k+1)) sin((2k+1)πz/x) exp(−(2k+1)²π²/(2x²)).

This gives the asymptotic joint distribution of

min_{1≤m≤n} S_m and max_{1≤m≤n} S_m,


as n → ∞, then

c_{r+1} = ((r + 1)/2) c_r ∫_0^1 z^{−1/2} (1 − z)^{r/2} dz.

Finally,

E{N_n^r} ~ n^{r/2} · 2^{r/2} Γ((r + 1)/2) / Γ(1/2) = n^{r/2} ∫_0^∞ x^r dG(x).

This result remains valid if the common d.f. F of X_j is of the integer lattice type with mean 0 and variance 1. If F is not of the lattice type, no S_n need ever be zero, but the "next nearest thing", to wit the number of changes of sign of S_n, is asymptotically distributed as G, at least under the additional assumption of a finite third absolute moment.]

7.4 Error estimation

Questions of convergence lead inevitably to the question of the "speed" of convergence, in other words, to an investigation of the difference between the approximating expression and its limit. Specifically, if a sequence of d.f.'s F_n converge to the unit normal d.f. Φ, as in the central limit theorem, what can one say about the "remainder term" F_n(x) − Φ(x)? An adequate estimate of this term is necessary in many mathematical applications, as well as for numerical computation. Under Liapounov's condition there is a neat "order bound" due to Berry and Esseen, who improved upon Liapounov's older result, as follows.

Theorem 7.4.1. Under the hypotheses of Theorem 7.1.2, there is a universal constant A_0 such that

(1) sup_x |F_n(x) − Φ(x)| ≤ A_0 Γ_n,

where F_n is the d.f. of S_n.
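The bound (1) can be checked numerically in a concrete case. The code below is our own illustration for the symmetric Bernoullian summands (mean 0, variance 1, E|X|³ = 1, so Γ_n = n^{−1/2}); the exact d.f. of S_n/√n is computed from binomial weights and compared with Φ.

```python
import math

# Our numerical check of the Berry-Esseen bound for +-1 coin flips:
# sup_x |F_n(x) - Phi(x)| should be of order n^{-1/2} (here Gamma_n = n^{-1/2}).
def phi_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def sup_diff(n):
    # exact d.f. of S_n / sqrt(n) for symmetric +-1 summands
    probs = [math.comb(n, k) / 2.0 ** n for k in range(n + 1)]
    cdf, worst = 0.0, 0.0
    for k in range(n + 1):
        x = (2 * k - n) / math.sqrt(n)
        worst = max(worst, abs(cdf - phi_cdf(x)))   # just below the atom
        cdf += probs[k]
        worst = max(worst, abs(cdf - phi_cdf(x)))   # at the atom
    return worst

for n in (16, 64, 256):
    print(n, round(sup_diff(n) * math.sqrt(n), 3))
```

The rescaled discrepancy sup_diff(n)·√n stays bounded (near 0.4 in this lattice case), consistent with (1).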


In the case of a single sequence of independent and identically distributed r.v.'s {X_j, j ≥ 1} with mean 0, variance σ², and third absolute moment γ < ∞, the right side of (1) reduces to

A_0 · nγ/(nσ²)^{3/2} = A_0 γ/(σ³ n^{1/2}).

H. Cramer and P. L. Hsu have shown that under somewhat stronger conditions, one may even obtain an asymptotic expansion of the form:

F_n(x) = Φ(x) + H_1(x)/n^{1/2} + H_2(x)/n + H_3(x)/n^{3/2} + ...,

where the H's are explicit functions involving the Hermite polynomials. We shall not go into this, as the basic method of obtaining such an expansion is similar to the proof of the preceding theorem, although considerable technical complications arise. For this and other variants of the problem see Cramer [10], Gnedenko and Kolmogorov [12], and Hsu's paper cited at the end of this chapter.

We shall give the proof of Theorem 7.4.1 in a series of lemmas, in which the machinery of operating with ch.f.'s is further exploited and found to be efficient.

Lemma 1. Let F be a d.f., G a real-valued function satisfying the conditions below:

(i) lim_{x→−∞} G(x) = 0, lim_{x→+∞} G(x) = 1;
(ii) G has a derivative that is bounded everywhere: sup_x |G'(x)| ≤ M.

Set

(2) Δ = (1/(2M)) sup_x |F(x) − G(x)|.

Then there exists a real number a such that we have for every T > 0:

(3) 2MTΔ { 3 ∫_0^{TΔ} (1 − cos x)/x² dx − π } ≤ | ∫_{−∞}^{∞} {F(x + a) − G(x + a)} (1 − cos Tx)/x² dx |.

PROOF. Clearly the Δ in (2) is finite, since G is everywhere bounded by (i) and (ii). We may suppose that the left member of (3) is strictly positive, for otherwise there is nothing to prove; hence Δ > 0. Since F − G vanishes at ±∞ by (i), there exists a sequence of numbers {x_n} converging to a finite limit


b such that F(x_n) − G(x_n) converges to 2MΔ or −2MΔ. Hence either F(b) − G(b) = 2MΔ or F(b−) − G(b) = −2MΔ. The two cases being similar, we shall treat the second one. Put a = b − Δ; then if |x| < Δ, we have by (ii) and the mean value theorem of differential calculus:

G(x + a) ≥ G(b) + (x − Δ)M,

and consequently

F(x + a) − G(x + a) ≤ F(b−) − [G(b) + (x − Δ)M] = −M(x + Δ).

It follows that

∫_{−Δ}^{Δ} {F(x + a) − G(x + a)} (1 − cos Tx)/x² dx ≤ −M ∫_{−Δ}^{Δ} (x + Δ)(1 − cos Tx)/x² dx = −2MΔ ∫_0^{Δ} (1 − cos Tx)/x² dx;

| ( ∫_{−∞}^{−Δ} + ∫_{Δ}^{∞} ) {F(x + a) − G(x + a)} (1 − cos Tx)/x² dx | ≤ 2MΔ · 2 ∫_{Δ}^{∞} (1 − cos Tx)/x² dx = 4MΔ ∫_{Δ}^{∞} (1 − cos Tx)/x² dx.

Adding these inequalities, we obtain

∫_{−∞}^{∞} {F(x + a) − G(x + a)} (1 − cos Tx)/x² dx ≤ 2MΔ { −3 ∫_0^{Δ} + 2 ∫_0^{∞} } (1 − cos Tx)/x² dx.

This reduces to (3), since

∫_0^{∞} (1 − cos Tx)/x² dx = πT/2

by (3) of Sec. 6.2, provided that T is so large that the left member of (3) is positive; otherwise (3) is trivial.

Lemma 2. In addition to the assumptions of Lemma 1, we assume that

(iii) G is of bounded variation in (−∞, ∞);
(iv) ∫_{−∞}^{∞} |F(x) − G(x)| dx < ∞.

Let

f(t) = ∫_{−∞}^{∞} e^{itx} dF(x),  g(t) = ∫_{−∞}^{∞} e^{itx} dG(x).


Then we have

(4) Δ ≤ (1/(πM)) ∫_0^T |(f(t) − g(t))/t| dt + 12/(πT).

PROOF. That the integral on the right side of (4) is finite will soon be apparent, and as a Lebesgue integral the value of the integrand at t = 0 may be overlooked. We have, by a partial integration, which is permissible on account of condition (iii):

(5) f(t) − g(t) = −it ∫_{−∞}^{∞} {F(x) − G(x)} e^{itx} dx;

and consequently

(f(t) − g(t))/(−it) · e^{−ita} = ∫_{−∞}^{∞} {F(x + a) − G(x + a)} e^{itx} dx.

In particular, the left member above is bounded for all t ≠ 0 by condition (iv). Multiplying by T − |t| and integrating, we obtain

(6) ∫_{−T}^{T} (f(t) − g(t))/(−it) · e^{−ita} (T − |t|) dt = ∫_{−T}^{T} ∫_{−∞}^{∞} {F(x + a) − G(x + a)} e^{itx} (T − |t|) dx dt.

We may invert the repeated integral by Fubini's theorem and condition (iv), and obtain (cf. Exercise 2 of Sec. 6.2):

| ∫_{−∞}^{∞} {F(x + a) − G(x + a)} (1 − cos Tx)/x² dx | ≤ T ∫_0^T |(f(t) − g(t))/t| dt.

In conjunction with (3), this yields

(7) 2MΔ { 3 ∫_0^{TΔ} (1 − cos x)/x² dx − π } ≤ ∫_0^T |(f(t) − g(t))/t| dt.

The quantity in braces in (7) is not less than

3 { ∫_0^{∞} (1 − cos x)/x² dx − ∫_{TΔ}^{∞} (2/x²) dx } − π = π/2 − 6/(TΔ).

Using this in (7), we obtain (4).

Lemma 2 bounds the maximum difference between two d.f.'s satisfying certain regularity conditions by means of a certain average difference between their ch.f.'s. (It is stated for a function of bounded variation, since this is needed


for the asymptotic expansion mentioned above.) This should be compared with the general discussion around Theorem 6.3.4. We shall now apply the lemma to the specific case of F_n and Φ in Theorem 7.4.1. Let the ch.f. of F_n be f_n, so that

f_n(t) = Π_{j=1}^{k_n} f_nj(t).

Lemma 3. For |t| ≤ 1/(2Γ_n^{1/3}), we have

(8) |f_n(t) − e^{−t²/2}| ≤ Γ_n |t|³ e^{−t²/2}.

PROOF. We shall denote by θ below a "generic" complex number with |θ| ≤ 1, otherwise unspecified and not necessarily the same at each appearance. By Taylor's expansion:

f_nj(t) = 1 − (σ²_nj/2) t² + (θ γ_nj/6) t³.

For the range of t given in the lemma, we have by Liapounov's inequality:

(9) σ_nj |t| ≤ γ_nj^{1/3} |t| ≤ Γ_n^{1/3} |t| ≤ 1/2,  γ_nj |t|³ ≤ Γ_n |t|³ ≤ 1/8,

so that |f_nj(t) − 1| < 1/2 and the principal branch of log f_nj(t) may be taken, with

log f_nj(t) = −(σ²_nj/2) t² + (θ/2) γ_nj |t|³

after the terms from the expansion of the logarithm are absorbed into the θ-term. Summing over j, we obtain

log f_n(t) = −t²/2 + (θ/2) Γ_n |t|³,


or explicitly:

|log f_n(t) + t²/2| ≤ (1/2) Γ_n |t|³.

Since |e^u − 1| ≤ |u| e^{|u|} for all u, it follows that

|f_n(t) e^{t²/2} − 1| ≤ (Γ_n |t|³/2) exp(Γ_n |t|³/2).

Since Γ_n |t|³/2 ≤ 1/16 and e^{1/16} < 2, this implies (8).

Lemma 4. For |t| ≤ 1/(4Γ_n), we have

(10) |f_n(t)| ≤ e^{−t²/3}.

PROOF. We symmetrize (see end of Sec. 6.2) to facilitate the estimation. We have

|f_nj(t)|² = ∫∫ cos t(x − y) dF_nj(x) dF_nj(y),

since |f_nj|² is real. Using the elementary inequalities

cos u ≤ 1 − u²/2 + |u|³/6,  |x − y|³ ≤ 4(|x|³ + |y|³),

we see that the double integral above does not exceed

∫∫ { 1 − (t²/2)(x² − 2xy + y²) + (2/3)|t|³(|x|³ + |y|³) } dF_nj(x) dF_nj(y)
= 1 − σ²_nj t² + (4/3) γ_nj |t|³ ≤ exp( −σ²_nj t² + (4/3) γ_nj |t|³ ).

Multiplying over j, we obtain

|f_n(t)|² ≤ exp( −t² + (4/3) Γ_n |t|³ ).

For |t| ≤ 1/(4Γ_n) we have (4/3) Γ_n |t|³ ≤ t²/3, and (10) follows.

Lemma 5. For |t| ≤ 1/(4Γ_n), we have

(11) |f_n(t) − e^{−t²/2}| ≤ 16 Γ_n |t|³ e^{−t²/3}.


PROOF. If |t| ≤ 1/(2Γ_n^{1/3}), this is implied by (8). If 1/(2Γ_n^{1/3}) < |t| ≤ 1/(4Γ_n), then 8Γ_n|t|³ > 1, and by (10):

|f_n(t) − e^{−t²/2}| ≤ |f_n(t)| + e^{−t²/2} ≤ 2e^{−t²/3} ≤ 16 Γ_n |t|³ e^{−t²/3}.

PROOF OF THEOREM 7.4.1. If Γ_n ≥ 2^{−3/2}, the inequality (1) is trivial provided A_0 ≥ 2^{3/2}, since its left member is at most 1. We may therefore suppose Γ_n < 2^{−3/2}, so that 1/(2Γ_n^{1/3}) < 1/(4Γ_n) and the case division above is as stated. Apply Lemma 2 with F = F_n and G = Φ; then M = sup_x Φ'(x) = 1/√(2π), and the conditions (i) to (iv) are satisfied. Choosing T = 1/(4Γ_n) and using Lemma 5, we obtain

sup_x |F_n(x) − Φ(x)| = 2MΔ ≤ (2/π) ∫_0^T |(f_n(t) − e^{−t²/2})/t| dt + 24M/(πT)
≤ (32/π) Γ_n ∫_0^∞ t² e^{−t²/3} dt + (96/(π√(2π))) Γ_n ≤ A_0 Γ_n.

This establishes (1) and the theorem.

For the estimation of 1 − F_n(x) when x is large, the theorem is of little use, since both 1 − F_n(x) and 1 − Φ(x) are then small; what is wanted in such a case is


an asymptotic evaluation of

(1 − F_n(x_n)) / (1 − Φ(x_n))

as x_n → ∞ more rapidly than indicated above. This requires a different type of approximation by means of "bilateral Laplace transforms", which will not be discussed here.

EXERCISES

1. If F and G are d.f.'s with finite first moments, then

∫_{−∞}^{∞} |F(x) − G(x)| dx < ∞.

[HINT: Use Exercise 18 of Sec. 3.2.]

2. If f and g are ch.f.'s such that f(t) = g(t) for |t| ≤ T, then

∫_{−∞}^{∞} |F(x) − G(x)| dx ≤ 2π/T.

This is due to Esseen (Acta Math., 77 (1944)).

*3. There exists a universal constant A_1 > 0 such that for any sequence of independent, identically distributed integer-valued r.v.'s {X_j} with mean 0 and variance 1, we have

sup_x |F_n(x) − Φ(x)| ≥ A_1 n^{−1/2},

where F_n is the d.f. of (Σ_{j=1}^n X_j)/√n. [HINT: Use Exercise 24 of Sec. 6.4.]

4. Prove that for every x > 0:

(x/(1 + x²)) e^{−x²/2} ≤ ∫_x^{∞} e^{−y²/2} dy ≤ (1/x) e^{−x²/2}.

7.5 Law of the iterated logarithm

The law of the iterated logarithm is a crowning achievement in classical probability theory. It had its origin in attempts to perfect Borel's theorem on normal numbers (Theorem 5.1.3). In its simplest but basic form, this asserts: if N_n(ω) denotes the number of occurrences of the digit 1 in the first n places of the binary (dyadic) expansion of the real number ω in [0, 1], then N_n(ω)/n → 1/2 for almost every ω in Borel measure. What can one say about the deviation N_n(ω) − n/2? The order bounds O(n^{(1/2)+ε}), ε > 0; O((n log n)^{1/2}) (cf.


Theorem 5.4.1); and O((n log log n)^{1/2}) were obtained successively by Hausdorff (1913), Hardy and Littlewood (1914), and Khintchine (1922); but in 1924 Khintchine gave the definitive answer:

lim sup_{n→∞} (N_n(ω) − n/2) / √((n/2) log log n) = 1

for almost every ω. This sharp result with such a fine order of infinity as "log log" earned its celebrated name. No less celebrated is the following extension given by Kolmogorov (1929). Let {X_n, n ≥ 1} be a sequence of independent r.v.'s, S_n = Σ_{j=1}^n X_j; suppose that E(X_n) = 0 for each n and

(1) sup_ω |X_n(ω)| = o( s_n / (log log s_n)^{1/2} ),

where s_n² = σ²(S_n); then we have for almost every ω:

(2) lim sup_{n→∞} S_n(ω) / √(2 s_n² log log s_n) = 1.

The condition (1) was shown by Marcinkiewicz and Zygmund to be of the best possible kind, but an interesting complement was added by Hartman and Wintner that (2) also holds if the X_n's are identically distributed with a finite second moment. Finally, further sharpening of (2) was given by Kolmogorov and by Erdos in the Bernoullian case, and in the general case under exact conditions by Feller; the "last word" being as follows: for any increasing sequence φ_n, we have

P{ S_n(ω) > s_n φ_n i.o. } = 0 or 1

according as the series

Σ_n (φ_n/n) exp(−φ_n²/2)

converges or diverges.

The


proof of such a "strong limit theorem" (bounds with probability one) as the law of the iterated logarithm depends essentially on the corresponding "weak limit theorem" (convergence of distributions) with a sufficiently good estimate of the remainder term.

The vital link just mentioned will be given below as Lemma 1. In the rest of this section "A" will denote a generic strictly positive constant, not necessarily the same at each appearance, and Ā will denote a (generic) constant such that |Ā| ≤ A. We shall also use the notation in the preceding statement of Kolmogorov's theorem, and

γ_n = E{|X_n|³},  Γ_n = (1/s_n³) Σ_{j=1}^n γ_j,

as in Sec. 7.4, but for a single sequence of r.v.'s. Let us set also

φ(λ, x) = (2λ x log log x)^{1/2},  λ > 0, x > 0.

Lemma 1. Suppose that for some ε, 0 < ε < 1, we have

(3) Γ_n ≤ A/(log s_n)^{1+ε}.

Then for each δ, 0 < δ < ε, we have

(4) P{ S_n > φ(1 + δ, s_n²) } ≤ A/(log s_n)^{1+δ},

(5) P{ S_n > φ(1 − δ, s_n²) } ≥ A/(log s_n)^{1−(δ/2)}.

PROOF. By Theorem 7.4.1, we have for each x:

(6) P{S_n > x s_n} = (1/√(2π)) ∫_x^∞ e^{−y²/2} dy + Ā Γ_n.

We have as x → ∞:

(7) ∫_x^∞ e^{−y²/2} dy ~ (1/x) e^{−x²/2}

(see Exercise 4 of Sec. 7.4). Substituting x = √(2(1 ± δ) log log s_n²), the first term on the right side of (6) is, by (7), asymptotically equal (up to a bounded factor) to

1 / ( √(4π(1 ± δ) log log s_n) · (log s_n)^{1±δ} ).


This dominates the second (remainder) term on the right side of (6), by (3), since 0 < δ < ε. Hence (4) and (5) follow as rather weak consequences.

To establish (2), let us write for each fixed δ, 0 < δ < ε:

E_n^+ = {ω: S_n(ω) > φ(1 + δ, s_n²)},
E_n^− = {ω: S_n(ω) > φ(1 − δ, s_n²)},

and proceed by steps.

1°. We prove first that

(8) P{E_n^+ i.o.} = 0

in the notation introduced in Sec. 4.2, by using the convergence part of the Borel-Cantelli lemma there. But it is evident from (4) that the series Σ_n P(E_n^+) is far from convergent, since s_n is expected to be of the order of √n. The main trick is to apply the lemma to a crucial subsequence {n_k} (see Theorem 5.1.2 for a crude form of the trick) chosen to have two properties: first, Σ_k P(E_{n_k}^+) converges, and second, "E_n^+ i.o." already implies "E_{n_k}^+ i.o." nearly, namely if the given δ is slightly decreased. This modified implication is a consequence of a simple but essential probabilistic argument spelled out in Lemma 2 below.

Given c > 1, let n_k be the largest value of n satisfying s_n ≤ c^k, so that

(9) s_{n_k} ≤ c^k < s_{n_k+1}.

Since max_{1≤j≤n} σ_j/s_n → 0 (a consequence of (1)), we have

(10) s_{n_k}/s_{n_{k+1}} → 1/c,  k → ∞.

For n_k < j ≤ n_{k+1}, put

(11) F_j = {ω: |S_{n_{k+1}} − S_j| < s_{n_{k+1}}}.

By Chebyshev's inequality, we have

P(F_j) ≥ 1 − (s²_{n_{k+1}} − s²_{n_k})/s²_{n_{k+1}} → 1/c² > 0;

hence P(F_j) ≥ A > 0 for all sufficiently large k.

Lemma 2. Let {E_j} and {F_j}, 1 ≤ j ≤ n < ∞, be two sequences of events. Suppose that for each j, the event F_j is independent of E_1^c ⋯ E_{j−1}^c E_j, and


that there exists a constant A > 0 such that P(F_j) ≥ A for every j. Then we have

(12) P{ ∪_{j=1}^n E_j F_j } ≥ A · P{ ∪_{j=1}^n E_j }.

PROOF. The left member in (12) is not less than

P{ ∪_{j=1}^n (E_1^c ⋯ E_{j−1}^c E_j F_j) } = Σ_{j=1}^n P(E_1^c ⋯ E_{j−1}^c E_j F_j) = Σ_{j=1}^n P(E_1^c ⋯ E_{j−1}^c E_j) P(F_j)
≥ A Σ_{j=1}^n P(E_1^c ⋯ E_{j−1}^c E_j) = A · P{ ∪_{j=1}^n E_j },

which is the right member in (12).

Applying Lemma 2 to the E_j^+ and the F_j in (11), we obtain

(13) P{ ∪_{j=n_k+1}^{n_{k+1}} E_j^+ F_j } ≥ A · P{ ∪_{j=n_k+1}^{n_{k+1}} E_j^+ }.

It is clear that the event E_j^+ ∩ F_j implies

S_{n_{k+1}} > φ(1 + δ, s_j²) − s_{n_{k+1}},

which is, by (9) and (10), asymptotically greater than

(1/c) φ(1 + (3/4)δ, s²_{n_{k+1}}).

Choose c so close to 1 that (1 + (3/4)δ)/c² > 1 + (δ/2), and put

G_k = {ω: S_{n_{k+1}} > φ(1 + (δ/2), s²_{n_{k+1}})};

note that G_k is just E_{n_{k+1}}^+ with δ replaced by δ/2. The above implication may be written as

E_j^+ F_j ⊂ G_k

for sufficiently large k and all j in the range n_k < j ≤ n_{k+1}; hence we have

(14) ∪_{j=n_k+1}^{n_{k+1}} E_j^+ F_j ⊂ G_k.


It follows from (4) that

  Σ_k P(G_k) ≤ A Σ_k 1/(log s_{n_{k+1}}²)^{1+(δ/2)} ≤ A Σ_k 1/(k log c)^{1+(δ/2)} < ∞.

In conjunction with (13) and (14) we conclude that

  Σ_k P( ⋃_{j=n_k}^{n_{k+1}-1} E_j^+ ) < ∞;

and consequently by the Borel-Cantelli lemma that

  P{ ⋃_{j=n_k}^{n_{k+1}-1} E_j^+ i.o. } = 0.

This is equivalent (why?) to the desired result (8).

2°. Next we prove that with the same subsequence {n_k} but an arbitrary c, if we put t_k² = s_{n_{k+1}}² - s_{n_k}² and

  D_k = {ω : S_{n_{k+1}}(ω) - S_{n_k}(ω) > φ(1 - (δ/2), t_k)},

then we have

(15)  P(D_k i.o.) = 1.

Since the differences S_{n_{k+1}} - S_{n_k}, k ≥ 1, are independent r.v.'s, the divergence part of the Borel-Cantelli lemma is applicable to them. To estimate P(D_k), we may apply Lemma 1 to the sequence {X_{n_k+j}, j ≥ 1}. We have

  t_k² = s_{n_{k+1}}² - s_{n_k}² ~ (1 - 1/c) s_{n_{k+1}}² ~ (1 - 1/c) c^{k+1},

and consequently by (3) the hypothesis of Lemma 1 is satisfied with s_n there replaced by t_k. Hence by (5),

  P(D_k) ≥ A/(log t_k²)^{1-(δ/4)} ≥ A/k^{1-(δ/4)},

and so Σ_k P(D_k) = ∞ and (15) follows.

3°. We shall use (8) and (15) together to prove

(16)  P(E_n^- i.o.) = 1.


This requires just a rough estimate entailing a suitable choice of c. By (8) applied to {-X_n} and (15), for almost every ω the following two assertions are true:

(i) S_{n_{k+1}}(ω) - S_{n_k}(ω) > φ(1 - (δ/2), t_k) for infinitely many k;
(ii) S_{n_k}(ω) > -φ(2, s_{n_k}) for all sufficiently large k.

For such an ω we have then

(17)  S_{n_{k+1}}(ω) > φ(1 - (δ/2), t_k) - φ(2, s_{n_k}) for infinitely many k.

Using (9) and log log t_k² ~ log log s_{n_{k+1}}², we see that the expression in the right side of (17) is asymptotically greater than

  [√((1 - δ/2)(1 - 1/c)) - √(2/c)] φ(1, s_{n_{k+1}}) > φ(1 - δ, s_{n_{k+1}}),

provided that c is chosen sufficiently large. Doing this, we have therefore proved that

(18)  P(E_{n_{k+1}}^- i.o.) = 1,

which certainly implies (16).

4°. The truth of (8) and (16), for each fixed δ, 0 < δ < ε, is precisely the assertion (2) for the sequence {S_n, n ≥ 1}, and the theorem is proved.

Recall that (3) is more than sufficient to ensure the validity of the central limit theorem, namely that S_n/s_n converges in dist. to Φ. Thus the law of the iterated logarithm complements the central limit theorem by circumscribing the extraordinary fluctuations of the sequence {S_n, n ≥ 1}. An immediate consequence is that for almost every ω, the sample sequence S_n(ω) changes sign infinitely often. For much more precise results in this direction see Chapter 8.

In view of the discussion preceding Theorem 7.3.3, one may wonder about the almost everywhere bounds for

  max_{m≤n} S_m,  max_{m≤n} |S_m|,  and so on.
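The fluctuation circumscribed by (2) can be illustrated numerically for the simplest case, a ±1 random walk, where σ_j = 1 and s_n² = n. The sketch below is illustrative only and is not part of the text: the function names, the seed, and the cutoff n > 16 (so that log log s_n² is defined and positive) are choices of this example.

```python
import math
import random

def phi(lmbda, s_sq):
    """phi(lambda, s_n) = sqrt(2 * lambda * s_n^2 * log log s_n^2); assumes s_n^2 > e."""
    return math.sqrt(2.0 * lmbda * s_sq * math.log(math.log(s_sq)))

def walk_ratios(n_steps, seed=0):
    """Simulate S_n for i.i.d. +-1 steps (so s_n^2 = n) and record the
    ratio S_n / phi(1, s_n) along the path."""
    rng = random.Random(seed)
    s, ratios = 0, []
    for n in range(1, n_steps + 1):
        s += rng.choice((-1, 1))
        if n > 16:  # need log log s_n^2 defined and positive
            ratios.append(s / phi(1.0, float(n)))
    return ratios

ratios = walk_ratios(200_000, seed=1)
print(max(ratios))  # by (2), a.e. sample path is eventually below 1 + eps
```

Any single finite simulation only suggests the bound; the theorem concerns the limsup along an infinite path.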


It is interesting to observe that as far as the lim sup_n is concerned, these two functionals behave exactly like S_n itself (Exercise 2 below). However, the question of lim inf_n is quite different. In the case of max_{m≤n} |S_m|, another law of the (inverted) iterated logarithm holds as follows. For almost every ω, we have

  lim inf_{n→∞} (max_{m≤n} |S_m(ω)|) (log log s_n²)^{1/2} / s_n = π/√8.

EXERCISES

…

3. … S_n = Σ_{j=1}^n X_j. Then

  P{ lim inf_{n→∞} |S_n(ω)| / … = 0 } = 1.

[HINT: Consider S_{n_{k+1}} - S_{n_k}, with n_k ~ k^k. A quick proof follows from Theorem 8.3.3 below.]

4. Prove that in Exercise 9 of Sec. 7.3 we have P{S_n = 0 i.o.} = 1.

*5. The law of the iterated logarithm may be used to supply certain counterexamples. For instance, if the X_n's are independent and X_n = ±n^{1/2}/log log n with probability 1/2 each, then S_n/n → 0 a.e., but Kolmogorov's sufficient condition (see case (i) after Theorem 5.4.1) Σ_n σ²(X_n)/n² < ∞ fails.

6. Prove that P{|S_n| > φ(1 - δ, s_n) i.o.} = 1, without use of (8), as follows. Let

  e_k = {ω : |S_{n_k}(ω)| < φ(1 - δ, s_{n_k})};
  f_k = {ω : S_{n_{k+1}}(ω) - S_{n_k}(ω) > φ(1 - δ/2, s_{n_{k+1}})}.


Show that for sufficiently large k the event e_k ∩ f_k implies the complement of e_{k+1}; hence deduce

  P( ⋂_{j=j_0}^{k+1} e_j ) ≤ P(e_{j_0}) Π_{j=j_0}^{k} [1 - P(f_j)],

and show that the product → 0 as k → ∞.

7.6 Infinite divisibility

The weak law of large numbers and the central limit theorem are concerned, respectively, with the convergence in dist. of sums of independent r.v.'s to a degenerate and a normal d.f. It may seem strange that we should be so much occupied with these two apparently unrelated distributions. Let us point out, however, that in terms of ch.f.'s these two may be denoted, respectively, by

  e^{ait}  and  e^{ait - b²t²},

exponentials of polynomials of the first and second degree in (it). This explains the considerable similarity between the two cases, as evidenced particularly in Theorems 6.4.3 and 6.4.4.

Now the question arises: what other limiting d.f.'s are there when small independent r.v.'s are added? Specifically, consider the double array (2) in Sec. 7.1, in which independence in each row and holospoudicity are assumed. Suppose that for some sequence of constants a_n,

  S_n - a_n = Σ_{j=1}^{k_n} X_{nj} - a_n

converges in dist. to F. What is the class of such F's, and when does such a convergence take place? For a single sequence of independent r.v.'s {X_j, j ≥ 1}, similar questions may be posed for the "normed sums" (S_n - a_n)/b_n.

These questions have been answered completely by the work of Lévy, Khintchine, Kolmogorov, and others; for a comprehensive treatment we refer to the book by Gnedenko and Kolmogorov [12]. Here we must content ourselves with a modest introduction to this important and beautiful subject.

We begin by recalling other cases of the above-mentioned limiting distributions, conveniently displayed by their ch.f.'s:

  e^{ait};  e^{ait - b²t²};  e^{λ(e^{it} - 1)}, λ > 0;  e^{-c|t|^α}, 0 < α ≤ 2, c > 0;


among the latter for α = 2. All these are exponentials and have the further property that their "nth roots":

  e^{ait/n},  e^{(1/n)(ait - b²t²)},  e^{(λ/n)(e^{it} - 1)},  e^{-(c/n)|t|^α}

are also ch.f.'s. It is remarkable that this simple property already characterizes the class of distributions we are looking for (although we shall prove only part of this fact here).

DEFINITION OF INFINITE DIVISIBILITY. A ch.f. f is called infinitely divisible iff for each integer n ≥ 1, there exists a ch.f. f_n such that

(1)  f = (f_n)^n.

In terms of d.f.'s, this becomes in obvious notation:

  F = F_n^{n*} = F_n * F_n * ⋯ * F_n  (n factors).

In terms of r.v.'s this means, for each n ≥ 1, in a suitable probability space (why "suitable"?): there exist r.v.'s X and X_{nj}, 1 ≤ j ≤ n, the latter being independent among themselves, such that X has ch.f. f, X_{nj} has ch.f. f_n, and

(2)  X = Σ_{j=1}^n X_{nj}.

X is thus "divisible" into n independent and identically distributed parts, for each n. That such an X, and only such a one, can be the "limit" of sums of small independent terms as described above seems at least plausible.

A vital, though not characteristic, property of an infinitely divisible ch.f. will be given first.

Theorem 7.6.1. An infinitely divisible ch.f. never vanishes (for real t).

PROOF. We shall see presently that a complex-valued ch.f. is troublesome when its "nth root" is to be extracted. So let us avoid this by going to the real, using the Corollary to Theorem 6.1.4. Let f and f_n be as in (1) and write

  g = |f|²,  g_n = |f_n|².

For each t ∈ ℛ¹, g(t) being real and positive, though conceivably vanishing, its real positive nth root is uniquely defined; let us denote it by [g(t)]^{1/n}. Since by (1) we have

  g(t) = [g_n(t)]^n,

and g_n(t) ≥ 0, it follows that

(3)  ∀t: g_n(t) = [g(t)]^{1/n}.


But 0 ≤ g(t) ≤ 1, hence lim_{n→∞} [g(t)]^{1/n} is 0 or 1 according as g(t) = 0 or g(t) ≠ 0. Thus lim_{n→∞} g_n(t) exists for every t, and the limit function, say h(t), can take at most the two possible values 0 and 1. Furthermore, since g is continuous at t = 0 with g(0) = 1, there exists a t_0 > 0 such that g(t) ≠ 0 for |t| ≤ t_0. It follows that h(t) = 1 for |t| ≤ t_0. Therefore the sequence of ch.f.'s g_n converges to the function h, which has just been shown to be continuous at the origin. By the convergence theorem of Sec. 6.3, h must be a ch.f. and so continuous everywhere. Hence h is identically equal to 1, and so by the remark after (3) we have

  ∀t: |f(t)|² = g(t) ≠ 0.

The theorem is proved.

Theorem 7.6.1 immediately shows that the uniform distribution on [-1, 1] is not infinitely divisible, since its ch.f. is sin t/t, which vanishes for some t, although in a literal sense it has infinitely many divisors! (See Exercise 8 of Sec. 6.3.) On the other hand, the ch.f.

  (2 + cos t)/3

never vanishes, but for it (1) fails even when n = 2; when n ≥ 3, the failure of (1) for this ch.f. is an immediate consequence of Exercise 7 of Sec. 6.1, if we notice that the corresponding p.m. consists of exactly 3 atoms.

Now that we have proved Theorem 7.6.1, it seems natural to go back to an arbitrary infinitely divisible ch.f. and establish the generalization of (3):

  f_n(t) = [f(t)]^{1/n}

for some "determination" of the multiple-valued nth root on the right side. This can be done by a simple process of "continuous continuation" of a complex-valued function of a real variable. Although merely an elementary exercise in "complex variables", it has been treated in the existing literature in a cavalier fashion and then misused or abused. For this reason the main propositions will be spelled out in meticulous detail here.

Theorem 7.6.2. Let a complex-valued function f of the real variable t be given. Suppose that f(0) = 1 and that for some T > 0, f is continuous in [-T, T] and does not vanish in the interval. Then there exists a unique (single-valued) function λ of t in [-T, T] with λ(0) = 0 that is continuous there and satisfies

(4)  f(t) = e^{λ(t)},  -T ≤ t ≤ T.
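The "continuous continuation" behind this theorem can be imitated numerically: march along a grid from 0 to t, adding at each step the principal logarithm of the ratio of successive values of f, a ratio that stays near 1 when the grid is fine. A sketch, not part of the text; the function name `distinguished_log` and the uniform grid of `steps` points are choices of this example, and f is assumed continuous and non-vanishing on the segment.

```python
import cmath

def distinguished_log(f, t, steps=1000):
    """Continuous branch lambda(t) with f(t) = exp(lambda(t)) and lambda(0) = 0,
    built by continuation: accumulate principal logs of successive ratios
    f(t_k)/f(t_{k-1}) along a grid from 0 to t."""
    lam, prev = 0j, f(0.0)            # f(0) = 1 is assumed, as in the theorem
    for k in range(1, steps + 1):
        cur = f(t * k / steps)
        lam += cmath.log(cur / prev)  # ratio is near 1 for a fine grid
        prev = cur
    return lam

# Example: f(t) = exp(5it).  The principal logarithm of f(2) jumps by a
# multiple of 2*pi*i, but the continued branch recovers lambda(t) = 5it.
lam = distinguished_log(lambda t: cmath.exp(5j * t), 2.0)
print(lam)  # close to 10j
```

The per-step branch choice is unambiguous as long as each ratio lies in |z - 1| < 1, which mirrors the partition argument in the proof.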


The corresponding statement when [-T, T] is replaced by (-∞, ∞) is also true.

PROOF. Consider the range of f(t), t ∈ [-T, T]; this is a closed set of points in the complex plane. Since it does not contain the origin, we have

(5)  inf_{-T≤t≤T} |f(t) - 0| = ρ_T > 0.

Next, since f is uniformly continuous in [-T, T], there exists a δ_T > 0 such that |f(t) - f(t′)| < ρ_T whenever |t - t′| ≤ δ_T; consequently |f(t)/f(t′) - 1| < 1 for such t, t′. Now choose a partition

(6)  -T = t_{-m} < ⋯ < t_{-1} < t_0 = 0 < t_1 < ⋯ < t_m = T

with max_k (t_{k+1} - t_k) ≤ δ_T. Let L denote the branch of the logarithm defined in |z - 1| < 1 by the power series

(7)  L(z) = Σ_{j=1}^∞ (-1)^{j-1} (z - 1)^j / j,

so that L is continuous there, L(1) = 0, and e^{L(z)} = z. Define

  λ(t) = L(f(t))  for t_{-1} ≤ t ≤ t_1,

and, by induction on k ≥ 1,

  λ(t) = λ(t_k) + L(f(t)/f(t_k))  for t_k ≤ t ≤ t_{k+1},


with a similar expression for t_{-k-1} ≤ t ≤ t_{-k}. Thus (4) is satisfied in [t_{-1}, t_1], and if it is satisfied for t = t_k, then it is satisfied in [t_k, t_{k+1}], since

  e^{λ(t)} = e^{λ(t_k) + L(f(t)/f(t_k))} = f(t_k) · (f(t)/f(t_k)) = f(t).

Thus (4) is satisfied in [-T, T] by induction, and the theorem is proved for such an interval. To prove it for (-∞, ∞) let us observe that, having defined λ in [-n, n], we can extend it to [-n - 1, n + 1] with the previous method, by dividing [n, n + 1], for example, into small equal parts whose length must be chosen dependent on n at each stage (why?). The continuity of λ is clear from the construction.

To prove the uniqueness of λ, suppose that λ′ has the same properties as λ. Since both satisfy equation (4), it follows that for each t, there exists an integer m(t) such that

  λ(t) - λ′(t) = 2πi m(t).

The left side being continuous in t, m(·) must be a constant (why?), which must be equal to m(0) = 0. Thus λ(t) = λ′(t).

Remark. It may not be amiss to point out that λ(t) is a single-valued function of t but not of f(t); see Exercise 7 below.

Theorem 7.6.3. For a fixed T, let each kf, k ≥ 1, as well as f satisfy the conditions for f in Theorem 7.6.2, and denote the corresponding λ by kλ. Suppose that kf converges uniformly to f in [-T, T]; then kλ converges uniformly to λ in [-T, T].

PROOF. Let L be as in (7); then there exists a δ, 0 < δ < 1/2, such that

  |L(z)| < 1  if |z - 1| < δ.

By the hypothesis of uniformity, there exists k_1(T) such that if k ≥ k_1(T), then we have

(8)  sup_{|t|≤T} |kf(t)/f(t) - 1| < δ,

and consequently

(9)  sup_{|t|≤T} |L(kf(t)/f(t))| < 1.

Since e^{kλ(t) - λ(t)} = kf(t)/f(t) = e^{L(kf(t)/f(t))}, for each t with |t| ≤ T there exists an integer km(t) such that

(10)  kλ(t) - λ(t) = L(kf(t)/f(t)) + 2πi km(t).


Since L is continuous in |z - 1| < δ, it follows that km(·) is continuous in |t| ≤ T. Since it is integer-valued and equals 0 at t = 0, it is identically zero. Thus (10) reduces to

(11)  kλ(t) - λ(t) = L(kf(t)/f(t)),  |t| ≤ T.

The function L being continuous at z = 1, the uniform convergence of kf/f to 1 in [-T, T] implies that of kλ - λ to 0, as asserted by the theorem.

Thanks to Theorem 7.6.1, Theorem 7.6.2 is applicable to each infinitely divisible ch.f. f in (-∞, ∞). Henceforth we shall call the corresponding λ the distinguished logarithm of f, and e^{λ(t)/n} the distinguished nth root of f. We can now extract the correct nth root in (1) above.

Theorem 7.6.4. For each n, the f_n in (1) is just the distinguished nth root of f.

PROOF. It follows from Theorem 7.6.1 and (1) that the ch.f. f_n never vanishes in (-∞, ∞), hence its distinguished logarithm λ_n is defined. Taking multiple-valued logarithms in (1), we obtain as in (10):

  ∀t: λ(t) - nλ_n(t) = 2πi m_n(t),

where m_n(·) takes only integer values. We conclude as before that m_n(·) ≡ 0, and consequently

(12)  f_n(t) = e^{λ_n(t)} = e^{λ(t)/n},

as asserted.

Corollary. If f is a positive infinitely divisible ch.f., then for every t the f_n(t) in (1) is just the real positive nth root of f(t).

PROOF. Elementary analysis shows that the real-valued logarithm of a real number x in (0, ∞) is a continuous function of x. It follows that this is a continuous solution of equation (4) in (-∞, ∞). The uniqueness assertion in Theorem 7.6.2 then identifies it with the distinguished logarithm of f, and the corollary follows, since the real positive nth root is the exponential of 1/n times the real logarithm.

As an immediate consequence of (12), we have

  ∀t: lim_{n→∞} f_n(t) = 1.

Thus by Theorem 7.1.1 the double array {X_{nj}, 1 ≤ j ≤ n, 1 ≤ n} giving rise to (2) is holospoudic. We have therefore proved that each infinitely divisible


distribution can be obtained as the limiting distribution of S_n = Σ_{j=1}^n X_{nj} in such an array.

It is trivial that the product of two infinitely divisible ch.f.'s is again such a one, for we have in obvious notation:

  ¹f · ²f = (¹f_n)^n · (²f_n)^n = (¹f_n · ²f_n)^n.

The next proposition lies deeper.

Theorem 7.6.5. Let {kf, k ≥ 1} be a sequence of infinitely divisible ch.f.'s converging everywhere to the ch.f. f. Then f is infinitely divisible.

PROOF. The difficulty is to prove first that f never vanishes. Consider, as in the proof of Theorem 7.6.1: g = |f|², kg = |kf|². For each n ≥ 1, let x^{1/n} denote the real positive nth root of a real positive x. Then we have, by the hypothesis of convergence and the continuity of x^{1/n} as a function of x,

(13)  ∀t: [kg(t)]^{1/n} → [g(t)]^{1/n}.

By the Corollary to Theorem 7.6.4, the left member in (13) is a ch.f. The right member is continuous everywhere. It follows from the convergence theorem for ch.f.'s that [g(·)]^{1/n} is a ch.f. Since g is its nth power, and this is true for each n ≥ 1, we have proved that g is infinitely divisible and so never vanishes. Hence f never vanishes and has a distinguished logarithm λ defined everywhere. Let that of kf be kλ. Since the convergence of kf to f is necessarily uniform in each finite interval (see Sec. 6.3), it follows from Theorem 7.6.3 that kλ → λ everywhere, and consequently

(14)  exp(kλ(t)/n) → exp(λ(t)/n)

for every t. The left member in (14) is a ch.f. by Theorem 7.6.4, and the right member is continuous by the definition of λ. Hence it follows as before that e^{λ(·)/n} is a ch.f. and f is infinitely divisible.

The following alternative proof is interesting. There exists a δ > 0 such that f does not vanish for |t| ≤ δ, hence λ is defined in this interval. For each n, (14) holds uniformly in this interval by Theorem 7.6.3. By Exercise 6 of Sec. 6.3, this is sufficient to ensure the existence of a subsequence from {exp(kλ(t)/n), k ≥ 1} converging everywhere to some ch.f. φ_n. The nth power of this subsequence then converges to (φ_n)^n, but being a subsequence of {kf} it also converges to f. Hence f = (φ_n)^n, and we conclude again that f is infinitely divisible.

Using the preceding theorem, we can construct a wide class of ch.f.'s that are infinitely divisible. For each a > 0 and real u, the function

(15)  φ(t; a, u) = e^{a(e^{iut} - 1)}


is an infinitely divisible ch.f., since it is obtained from the Poisson ch.f. with parameter a by substituting ut for t. We shall call such a ch.f. a generalized Poisson ch.f. A finite product of these:

(16)  Π_{j=1}^k φ(t; a_j, u_j) = exp[ Σ_{j=1}^k a_j (e^{iu_jt} - 1) ]

is then also infinitely divisible. Now if G is any bounded increasing function, the integral ∫_{-∞}^{∞} (e^{itu} - 1) dG(u) may be approximated by sums of the kind appearing as exponent in the right member of (16), for all t in ℛ¹ and indeed uniformly so in every finite interval (why?). It follows that for each such G, the function

(17)  f(t) = exp[ ∫_{-∞}^{∞} (e^{itu} - 1) dG(u) ]

is an infinitely divisible ch.f. Now it turns out that although this falls somewhat short of being the most general form of an infinitely divisible ch.f., we have nevertheless the following qualitative result, which is a complete generalization of (16).

Theorem 7.6.6. For each infinitely divisible ch.f. f, there exists a double array of pairs of real constants (a_{nj}, u_{nj}), 1 ≤ j ≤ k_n, 1 ≤ n, where a_{nj} > 0, such that

(18)  f(t) = lim_{n→∞} Π_{j=1}^{k_n} φ(t; a_{nj}, u_{nj}).

The converse is also true. Thus the class of infinitely divisible d.f.'s coincides with the closure, with respect to vague convergence, of convolutions of a finite number of generalized Poisson d.f.'s.

PROOF. Let f and f_n be as in (1) and let λ be the distinguished logarithm of f, F_n the d.f. corresponding to f_n. We have for each t, as n → ∞:

  n[f_n(t) - 1] = n[e^{λ(t)/n} - 1] → λ(t),

and consequently

(19)  e^{n[f_n(t)-1]} → e^{λ(t)} = f(t).

Actually the first member in (19) is a ch.f. by Theorem 6.5.6, so that the convergence is uniform in each finite interval, but this fact alone is neither necessary nor sufficient for what follows. We have

  n[f_n(t) - 1] = ∫_{-∞}^{∞} (e^{itu} - 1) n dF_n(u).


For each n, nF_n is a bounded increasing function, hence there exists {a_{nj}, u_{nj}; 1 ≤ j ≤ k_n} …


Let us end this section with an interesting example. Put s = σ + it, σ > 1 and t real; consider the Riemann zeta function:

  ζ(s) = Σ_{n=1}^∞ n^{-s} = Π_p (1 - p^{-s})^{-1},

where p ranges over all prime numbers. Fix σ > 1 and define

  f(t) = ζ(σ + it)/ζ(σ).

We assert that f is an infinitely divisible ch.f. For each p and every real t, the complex number 1 - p^{-σ-it} lies within the circle {z: |z - 1| < 1}. Let log z denote that determination of the logarithm with an angle in (-π, π]. By looking at the angles, we see that

  log f(t) = Σ_p [ log(1 - p^{-σ}) - log(1 - p^{-σ-it}) ].

Since

  log(1 - p^{-σ}) - log(1 - p^{-σ-it}) = Σ_{m=1}^∞ (1/(m p^{mσ})) (e^{-imt log p} - 1),

it follows that

  f(t) = lim Π φ(t; 1/(m p^{mσ}), -m log p),

the limit being taken over finite partial products; hence it is an infinitely divisible ch.f.

EXERCISES

…

*3. Let f be a ch.f. such that there exists a sequence of positive integers n_k going to infinity and a sequence of ch.f.'s φ_k satisfying f = (φ_k)^{n_k}; then f is infinitely divisible.

4. Give another proof that the right member of (17) is an infinitely divisible ch.f. by using Theorem 6.5.6.
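Definition (1) is easy to check numerically for the Poisson ch.f. e^{λ(e^{it}-1)}: its distinguished nth root e^{(λ/n)(e^{it}-1)} is itself a Poisson-type ch.f., and its nth power returns f. A sketch, not part of the text; the parameter values are arbitrary choices of this example.

```python
import cmath

def poisson_cf(lam):
    """Ch.f. of the Poisson distribution with parameter lam."""
    return lambda t: cmath.exp(lam * (cmath.exp(1j * t) - 1))

lam, n = 2.5, 7
f = poisson_cf(lam)
f_n = poisson_cf(lam / n)  # distinguished nth root, itself a ch.f.

# relation (1): f = (f_n)^n, checked at a few points
for t in (-3.0, -1.0, 0.0, 0.5, 2.0):
    assert abs(f(t) - f_n(t) ** n) < 1e-12

# Contrast: sin(t)/t vanishes at t = pi, so by Theorem 7.6.1 the
# uniform distribution on [-1, 1] cannot be infinitely divisible.
```

No such finite check proves infinite divisibility, of course; here it merely confirms the algebraic identity behind (1) for this family.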


5. Show that f(t) = (1 - b)/(1 - be^{it}), 0 < b < 1, is an infinitely divisible ch.f. [HINT: Use canonical form.]

6. Show that the d.f. with density β^α Γ(α)^{-1} x^{α-1} e^{-βx}, α > 0, β > 0, in (0, ∞), and 0 otherwise, is infinitely divisible.

7. Carry out the proof of Theorem 7.6.2 specifically for the "trivial" but instructive case f(t) = e^{iat}, where a is a fixed real number.

*8. Give an example to show that in Theorem 7.6.3, if the uniformity of convergence of kf to f is omitted, then kλ need not converge to λ. [HINT: kf(t) = exp{2πi(-1)^k kt(1 + kt)^{-1}}.]

9. Let f(t) = 1 - t, f_k(t) = 1 - t + (-1)^k i t k^{-1}, 0 ≤ t ≤ 2, k ≥ 1. Then f_k never vanishes and converges uniformly to f in [0, 2]. Let √f_k denote the distinguished square root of f_k in [0, 2]. Show that √f_k does not converge in any neighborhood of t = 1. Why is Theorem 7.6.3 not applicable? [This example is supplied by E. Reich.]

*10. Some writers have given the proof of Theorem 7.6.6 by apparently considering each fixed t and using an analogue of (21) without the "sup_{|t|≤n}" there. Criticize this "quick proof". [HINT: Show that the two relations

  ∀t and ∀m: lim_{n→∞} u_{mn}(t) = u_m(t),
  ∀t: lim_{m→∞} u_m(t) = u(t),

do not imply the existence of a sequence {m_n} such that

  ∀t: lim_{n→∞} u_{m_n n}(t) = u(t).

Indeed, they do not even imply the existence of two subsequences {m_ν} and {n_ν} such that

  ∀t: lim_{ν→∞} u_{m_ν n_ν}(t) = u(t).

Thus the extension of Lemma 1 in Sec. 7.2 is false.]

The three "counterexamples" in Exercises 7 to 9 go to show that the cavalierism alluded to above is not to be shrugged off easily.

11. Strengthening Theorem 6.5.5, show that two infinitely divisible ch.f.'s may coincide in a neighborhood of 0 without being identical.

*12. Reconsider Exercise 17 of Sec. 6.4 and try to apply Theorem 7.6.3. [HINT: The latter is not immediately applicable, owing to the lack of uniform convergence. However, show first that if e^{ic_nt} converges for t ∈ A, where m(A) > 0, then it converges for all t. This follows from a result due to Steinhaus, asserting that the difference set A - A contains a neighborhood of 0 (see, e.g., Halmos [4, p. 68]), and from the equation e^{ic_nt} e^{ic_nt′} = e^{ic_n(t+t′)}. Let {b_n}, {b′_n} be any two subsequences of {c_n}; then e^{i(b_n - b′_n)t} → 1 for all t. Since 1 is a


ch.f., the convergence is uniform in every finite interval by the convergence theorem for ch.f.'s. Alternatively, if

  φ(t) = lim_{n→∞} e^{ic_nt},

then φ satisfies Cauchy's functional equation and must be of the form e^{ict}, which is a ch.f. These approaches are fancier than the simple one indicated in the hint for the said exercise, but they are interesting. There is no known quick proof by "taking logarithms", as some authors have done.]

Bibliographical Note

The most comprehensive treatment of the material in Secs. 7.1, 7.2, and 7.6 is by Gnedenko and Kolmogorov [12]. In this as well as nearly all other existing books on the subject, the handling of logarithms must be strengthened by the discussion in Sec. 7.6.

For an extensive development of Lindeberg's method (the operator approach) to infinitely divisible laws, see Feller [13, vol. 2].

Theorem 7.3.2 together with its proof as given here is implicit in

W. Doeblin, Sur deux problèmes de M. Kolmogoroff concernant les chaînes dénombrables, Bull. Soc. Math. France 66 (1938), 210-220.

It was rediscovered by F. J. Anscombe. For the extension to the case where the constant c in (3) of Sec. 7.3 is replaced by an arbitrary r.v., see H. Wittenberg, Limiting distributions of random sums of independent random variables, Z. Wahrscheinlichkeitstheorie 1 (1964), 7-18.

Theorem 7.3.3 is contained in

P. Erdős and M. Kac, On certain limit theorems of the theory of probability, Bull. Am. Math. Soc. 52 (1946), 292-302.

The proof given for Theorem 7.4.1 is based on

P. L. Hsu, The approximate distributions of the mean and variance of a sample of independent variables, Ann. Math. Statistics 16 (1945), 1-29.

This paper contains historical references as well as further extensions.

For the law of the iterated logarithm, see the classic

A. N. Kolmogorov, Über das Gesetz des iterierten Logarithmus, Math. Annalen 101 (1929), 126-136.

For a survey of "classical limit theorems" up to 1945, see

W. Feller, The fundamental limit theorems in probability, Bull. Am. Math. Soc. 51 (1945), 800-832.


Kai Lai Chung, On the maximum partial sums of sequences of independent random variables, Trans. Am. Math. Soc. 64 (1948), 205-233.

Infinitely divisible laws are treated in Chapter 7 of Lévy [11] as the analytical counterpart of his full theory of additive processes, or processes with independent increments (later supplemented by J. L. Doob and K. Itô).

New developments in limit theorems arose from connections with the Brownian motion process in the works by Skorohod and Strassen. For an exposition see references [16] and [22] of General Bibliography.


8  Random walk

8.1 Zero-or-one laws

In this chapter we adopt the notation N for the set of strictly positive integers, and N⁰ for the set of positive integers; used as an index set, each is endowed with the natural ordering and interpreted as a discrete time parameter. Similarly, for each n ∈ N, N_n denotes the ordered set of integers from 1 to n (both inclusive); N_n⁰ that of integers from 0 to n (both inclusive); and N′_n that of integers beginning with n + 1.

On the probability triple (Ω, ℱ, P), a sequence {X_n, n ∈ N}, where each X_n is an r.v. (defined on Ω and finite a.e.), will be called a (discrete parameter) stochastic process. Various Borel fields connected with such a process will now be introduced. For any sub-B.F. 𝒢 of ℱ, we shall write

(1)  X ∈ 𝒢

and use the expression "X belongs to 𝒢" or "𝒢 contains X" to mean that X^{-1}(ℬ¹) ⊂ 𝒢 (see Sec. 3.1 for notation): in the customary language X is said to be "measurable with respect to 𝒢". For each n ∈ N, we define two B.F.'s as follows:


ℱ_n = the augmented B.F. generated by the family of r.v.'s {X_k, k ∈ N_n}; that is, ℱ_n is the smallest B.F. containing all X_k in the family and all null sets;

ℱ′_n = the augmented B.F. generated by the family of r.v.'s {X_k, k ∈ N′_n}.

Recall that the union ⋃_{n=1}^∞ ℱ_n is a field but not necessarily a B.F. The smallest B.F. containing it, or equivalently containing every ℱ_n, n ∈ N, is denoted by

  ℱ_∞ = ⋁_{n=1}^∞ ℱ_n;

it is the B.F. generated by the stochastic process {X_n, n ∈ N}. On the other hand, the intersection ⋂_{n=1}^∞ ℱ′_n is a B.F. denoted also by ⋀_{n=1}^∞ ℱ′_n. It will be called the remote field of the stochastic process and a member of it a remote event.

Since ℱ_∞ ⊂ ℱ, P is defined on ℱ_∞. For the study of the process {X_n, n ∈ N} alone, it is sufficient to consider the reduced triple (Ω, ℱ_∞, P). The following approximation theorem is fundamental.

Theorem 8.1.1. Given ε > 0 and A ∈ ℱ_∞, there exists A_ε ∈ ⋃_{n=1}^∞ ℱ_n such that

(2)  P(A Δ A_ε) ≤ ε.

PROOF. Let 𝒞 be the collection of sets A for which the assertion of the theorem is true. Suppose A_k ∈ 𝒞 for each k ∈ N and A_k ↑ A or A_k ↓ A. Then A also belongs to 𝒞, as we can easily see by first taking k large and then applying the asserted property to A_k. Thus 𝒞 is a monotone class. Since it is trivial that 𝒞 contains the field ⋃_{n=1}^∞ ℱ_n that generates ℱ_∞, 𝒞 must contain ℱ_∞ by the Corollary to Theorem 2.1.2, proving the theorem.

Without using Theorem 2.1.2, one can verify that 𝒞 is closed with respect to complementation (trivial), finite union (by Exercise 1 of Sec. 2.1), and countable union (as increasing limit of finite unions). Hence 𝒞 is a B.F. that must contain ℱ_∞.

It will be convenient to use a particular type of sample space Ω. In the notation of Sec. 3.4, let

  Ω = ⨉_{n=1}^∞ Ω_n,

where each Ω_n is a "copy" of the real line ℛ¹. Thus Ω is just the space of all infinite sequences of real numbers. A point ω will be written as {ω_n, n ∈ N}, and ω_n, as a function of ω, will be called the nth coordinate (function) of ω.


Each Ω_n is endowed with the Euclidean B.F. ℬ¹, and the product Borel field ℱ (= ℱ_∞ in the notation above) on Ω is defined to be the B.F. generated by the finite-product sets of the form

(3)  ⋂_{j=1}^k {ω : ω_{n_j} ∈ B_{n_j}},

where (n_1, …, n_k) is an arbitrary finite subset of N and where each B_{n_j} ∈ ℬ¹. In contrast to the case discussed in Sec. 3.3, however, no restriction is made on the p.m. P on ℱ. We shall not enter here into the matter of a general construction of P. The Kolmogorov extension theorem (Theorem 3.3.6) asserts that on the concrete space-field (Ω, ℱ) just specified, there exists a p.m. P whose projection on each finite-dimensional subspace may be arbitrarily preassigned, subject only to consistency (where one such subspace contains the other). Theorem 3.3.4 is a particular case of this theorem.

In this chapter we shall use the concrete probability space above, of the so-called "function space type", to simplify the exposition, but all the results below remain valid without this specification. This follows from a measure-preserving homomorphism between an abstract probability space and one of the function-space type; see Doob [17, chap. 2].

The chief advantage of this concrete representation of Ω is that it enables us to define an important mapping on the space.

DEFINITION OF THE SHIFT. The shift τ is a mapping of Ω such that

  τ: ω = {ω_n, n ∈ N} → τω = {ω_{n+1}, n ∈ N};

in other words, the image of a point has as its nth coordinate the (n + 1)st coordinate of the original point.

Clearly τ is an ∞-to-1 mapping and it is from Ω onto Ω. Its iterates are defined as usual by composition: τ⁰ = identity, τ^k = τ ∘ τ^{k-1} for k ≥ 1. It induces a direct set mapping τ and an inverse set mapping τ^{-1} according to the usual definitions. Thus

  τ^{-1}A = {ω : τω ∈ A},

and τ^{-n} is the nth iterate of τ^{-1}. If A is the set in (3), then

(4)  τ^{-1}A = ⋂_{j=1}^k {ω : ω_{n_j+1} ∈ B_{n_j}}.

It follows from this that τ^{-1} maps ℱ into ℱ; more precisely,

  ∀A ∈ ℱ: τ^{-n}A ∈ ℱ′_n ⊂ ℱ,  n ∈ N,


where ℱ′_n is the Borel field generated by {ω_k, k > n}. This is obvious by (4) if A is of the form above, and since the class of A for which the assertion holds is a B.F., the result is true in general.

DEFINITION. A set in ℱ is called invariant (under the shift) iff A = τ^{-1}A. An r.v. Y on Ω is invariant iff Y(ω) = Y(τω) for every ω ∈ Ω.

Observe the following general relation, valid for each point mapping τ and the associated inverse set mapping τ^{-1}, each function Y on Ω and each subset A of ℛ¹:

(5)  τ^{-1}{ω : Y(ω) ∈ A} = {ω : Y(τω) ∈ A}.

This follows from τ^{-1} ∘ Y^{-1} = (Y ∘ τ)^{-1}.

We shall need another kind of mapping of Ω. A permutation on N_n is a 1-to-1 mapping of N_n to itself, denoted as usual by

  ( 1,  2,  …,  n
    σ_1, σ_2, …, σ_n ).

The collection of such mappings forms a group with respect to composition. A finite permutation on N is by definition a permutation on a certain "initial segment" N_n of N. Given such a permutation as shown above, we define σω to be the point in Ω whose coordinates are obtained from those of ω by the corresponding permutation, namely

  (σω)_j = ω_{σ_j}, if j ∈ N_n;  (σω)_j = ω_j, if j ∉ N_n.

As usual, σ induces a direct set mapping σ and an inverse set mapping σ^{-1}, the latter being also the direct set mapping induced by the "group inverse" σ^{-1} of σ. In analogy with the preceding definition we have the following.

DEFINITION. A set in ℱ is called permutable iff A = σA for every finite permutation σ on N. A function Y on Ω is permutable iff Y(ω) = Y(σω) for every finite permutation σ and every ω ∈ Ω.

It is fairly obvious that an invariant set is remote and a remote set is permutable; also that each of the collections: all invariant events, all remote events, all permutable events, forms a sub-B.F. of ℱ. If each of these B.F.'s is augmented (see Exercise 20 of Sec. 2.2), the resulting augmented B.F.'s will be called "almost invariant", "almost remote", and "almost permutable", respectively. Finally, the collection of all sets in ℱ of probability either 0 or


8 .1 ZERO-OR-ONE LAWS 1 2 6 71 clearly forms a B .F ., which may be called the "all-or-nothing" field . ThisB .F . will also be referred to as "almost trivial" .Now that we have defined all these concepts for the general stochasticprocess, it must be admitted that they are not very useful without furtherspecifications of the process . We proceed at once to the particular case below .DEFINITION . A sequence of independent r .v .'s will be called an independentprocess ; it is called a stationary independent process iff the r .v .'s have acommon distribution .Aspects of this type of process have been our main object of study,usually under additional assumptions on the distributions . Having christenedit, we shall henceforth focus our attention on "the evolution of the processas a whole" -whatever this phrase may mean . For this type of process, thespecific probability triple described above has been constructed in Sec . 3 .3 .Indeed, ZT= ~~, and the sequence of independent r .v .'s is just that of thesuccessive coordinate functions [a), n c- N), which, however, will also beinterchangeably denoted by {X,,, n E N} . If (p is any Borel measurable function,then {cp(X„ ), n E N) is another such process .The following result is called Kolmogorov's "zero-or-one law" .Theorem 8 .1 .2 . For an independent process, each remote event has probabilityzero or one .PROOF . Let A E nn, 1 J~,t and suppose that DT(A) > 0 ; we are going toprove that ?P(A) = 1 . Since 3„ and uri are independent fields (see Exercise 5of Sec . 3 .3), A is independent of every set in ~„ for each n c- N ; namely, ifM E Un 1 %,, then(6) P(A n m) = ~(A)?)(M) .If we set~A(M) _~(A n m)'~(A)for M E JT, then ETA ( .) is clearly a p .m . (the conditional probability relative toA ; see Chapter 9) . By (6) it coincides with ~~ on Un °_ 1 3 „ and consequentlyalso on -~T by Theorem 2 .2 .3 . 
Hence we may take M to be Λ in (6) to conclude that P(Λ) = P(Λ)², and so P(Λ) = 1.

The usefulness of the notions of shift and permutation in a stationary independent process is based on the next result, which says that both τ⁻¹ and σ are "measure-preserving".
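The crux of the proof above is relation (6): an event determined only by the later coordinates is independent of every event in ℱ_n. A small Monte Carlo sketch of that factorization follows; the coin-flip step law, the seed, the sample size, and the tolerance are illustrative choices of mine, not from the text.

```python
import random

random.seed(0)
TRIALS = 20_000

joint = lam = m = 0
for _ in range(TRIALS):
    xs = [random.choice((-1, 1)) for _ in range(100)]
    in_m = xs[0] == 1            # an event in F_1 (depends on the first coordinate)
    in_lam = sum(xs[50:]) > 0    # an event in F'_50 (depends only on later coordinates)
    m += in_m
    lam += in_lam
    joint += in_m and in_lam

p_m, p_lam, p_joint = m / TRIALS, lam / TRIALS, joint / TRIALS
print(round(p_joint, 3), round(p_m * p_lam, 3))
# Empirical version of (6): P(M and Lam) should match P(M) * P(Lam).
assert abs(p_joint - p_m * p_lam) < 0.02
```

A truly remote event lies in ℱ'_n for every n; the event used here sits in one fixed ℱ'_n, which is exactly what the approximation step of the proof works with.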


Theorem 8.1.3. For a stationary independent process, if Λ ∈ ℱ and σ is any finite permutation, we have

(7)  P(τ⁻¹Λ) = P(Λ);
(8)  P(σΛ) = P(Λ).

PROOF. Define a set function P' on ℱ as follows:

  P'(Λ) = P(τ⁻¹Λ).

Since τ⁻¹ maps disjoint sets into disjoint sets, it is clear that P' is a p.m. For a finite-product set Λ, such as the one in (3), it follows from (4) that

  P'(Λ) = ∏_{j=1}^{k} μ(B_j) = P(Λ).

Hence P and P' coincide also on each set that is the union of a finite number of such disjoint sets, and so on the B.F. ℱ generated by them, according to Theorem 2.2.3. This proves (7); (8) is proved similarly.

The following companion to Theorem 8.1.2, due to Hewitt and Savage, is very useful.

Theorem 8.1.4. For a stationary independent process, each permutable event has probability zero or one.

PROOF. Let Λ be a permutable event. Given ε > 0, we may choose ε_k > 0 so that

  ∑_{k=1}^{∞} ε_k ≤ ε.

By Theorem 8.1.1, there exists Λ_k ∈ ℱ_{n_k} such that P(Λ Δ Λ_k) ≤ ε_k, and we may suppose that n_k increases with k. Let σ_k be the finite permutation that interchanges the blocks (1, …, n_k) and (n_k + 1, …, 2n_k), and put M_k = σ_k Λ_k. Since Λ is permutable, P(Λ Δ M_k) = P(σ_k(Λ Δ Λ_k)) = P(Λ Δ Λ_k) ≤ ε_k.


Applying this to E_k = Λ Δ M_k, and observing the simple identity

  lim sup_k (Λ Δ M_k) = (Λ \ lim inf_k M_k) ∪ (lim sup_k M_k \ Λ),

we deduce that

  P(Λ Δ lim sup_k M_k) ≤ P(lim sup_k (Λ Δ M_k)) ≤ ε.

Since ⋃_{k=m}^{∞} M_k ∈ ℱ'_{n_m}, the set lim sup_k M_k belongs to ⋂_m ℱ'_{n_m}, which is seen to coincide with the remote field. Thus lim sup_k M_k has probability zero or one by Theorem 8.1.2, and the same must be true of Λ, since ε is arbitrary in the inequality above.

Here is a more transparent proof of the theorem based on the metric on the measure space (Ω, ℱ, P) given in Exercise 8 of Sec. 3.2. Since Λ_k and M_k are independent, we have

  P(Λ_k ∩ M_k) = P(Λ_k)P(M_k).

Now Λ_k → Λ and M_k → Λ in the metric just mentioned, hence also

  Λ_k ∩ M_k → Λ ∩ Λ

in the same sense. Since convergence of events in this metric implies convergence of their probabilities, it follows that P(Λ ∩ Λ) = P(Λ)P(Λ), and the theorem is proved.

Corollary. For a stationary independent process, the B.F.'s of almost permutable or almost remote or almost invariant sets all coincide with the all-or-nothing field.

EXERCISES

Ω and ℱ are the infinite product space and field specified above.

1. Find an example of a remote field that is not the trivial one; to make it interesting, insist that the r.v.'s are not identical.
2. An r.v. belongs to the all-or-nothing field if and only if it is constant a.e.
3. If Λ is invariant then Λ = τΛ; the converse is false.
4. An r.v. is invariant [permutable] if and only if it belongs to the invariant [permutable] field.
5. The set of convergence of an arbitrary sequence of r.v.'s {Y_n, n ∈ N} or of the sequence of their partial sums ∑_{j=1}^{n} Y_j are both permutable. Their limits are permutable r.v.'s with domain the set of convergence.


*6. If a_n > 0, lim_n a_n exists, > 0, finite or infinite, and lim_n (a_{n+1}/a_n) = 1, then the set of convergence of {a_n⁻¹ ∑_{j=1}^{n} Y_j} is invariant. If a_n → +∞, the upper and lower limits of this sequence are invariant r.v.'s.

*7. The set {Y_{2n} ∈ A i.o.}, where A ∈ ℬ¹, is remote but not necessarily invariant; the set {∑_{j=1}^{n} Y_j ∈ A i.o.} is permutable but not necessarily remote. Find some other essentially different examples of these two kinds.

8. Find trivial examples of independent processes where the three numbers P(τ⁻¹Λ), P(Λ), P(τΛ) take the values 1, 0, 1; or 0, ½, 1.

9. Prove that an invariant event is remote and a remote event is permutable.

*10. Consider the bi-infinite product space of all bi-infinite sequences of real numbers {ω_n, n ∈ Ñ}, where Ñ is the set of all integers in its natural (algebraic) ordering. Define the shift as in the text with Ñ replacing N, and show that it is 1-to-1 on this space. Prove the analogue of (7).

11. Show that the conclusion of Theorem 8.1.4 holds true for a sequence of independent r.v.'s, not necessarily stationary, but satisfying the following condition: for every j there exists a k > j such that X_k has the same distribution as X_j. [This remark is due to Susan Horn.]

12. Let {X_n, n ≥ 1} be independent r.v.'s with P{X_n = 4⁻ⁿ} = P{X_n = −4⁻ⁿ} = ½. Then the remote field of {S_n, n ≥ 1}, where S_n = ∑_{j=1}^{n} X_j, is not trivial.

8.2 Basic notions

From now on we consider only a stationary independent process {X_n, n ∈ N} on the concrete probability triple specified in the preceding section. The common distribution of X_n will be denoted by μ (p.m.) or F (d.f.); when only this is involved, we shall write X for a representative X_n, thus E(X) for E(X_n).

Our interest in such a process derives mainly from the fact that it underlies another process of richer content. This is obtained by forming the successive partial sums as follows:

(1)  S_n = ∑_{j=1}^{n} X_j,  n ∈ N.

An initial r.v.
S₀ ≡ 0 is adjoined whenever this serves notational convenience, as in X_n = S_n − S_{n−1} for n ∈ N. The sequence {S_n, n ∈ N} is then a very familiar object in this book, but now we wish to find a proper name for it. An officially correct one would be "stochastic process with stationary independent differences"; the name "homogeneous additive process" can also be used. We


have, however, decided to call it a "random walk (process)", although the use of this term is frequently restricted to the case when μ is of the integer lattice type or even more narrowly a Bernoullian distribution.

DEFINITION OF RANDOM WALK. A random walk is the process {S_n, n ∈ N} defined in (1) where {X_n, n ∈ N} is a stationary independent process. By convention we set also S₀ = 0.

A similar definition applies in a Euclidean space of any dimension, but we shall be concerned only with ℛ¹ except in some exercises later.

Let us observe that even for an independent process {X_n, n ∈ N}, its remote field is in general different from the remote field of {S_n, n ∈ N}, where S_n = ∑_{j=1}^{n} X_j. They are almost the same, being both almost trivial, for a stationary independent process by virtue of Theorem 8.1.4, since the remote field of the random walk is clearly contained in the permutable field of the corresponding stationary independent process.

We add that, while the notion of remoteness applies to any process, "(shift-)invariant" and "permutable" will be used here only for the underlying "coordinate process" {ω_n, n ∈ N} or {X_n, n ∈ N}.

The following relation will be much used below, for m < n:

  S_{n−m}(τᵐω) = ∑_{j=1}^{n−m} X_j(τᵐω) = ∑_{j=1}^{n−m} X_{j+m}(ω) = S_n(ω) − S_m(ω).

It follows from Theorem 8.1.3 that S_{n−m} and S_n − S_m have the same distribution. This is obvious directly, since it is just μ^{(n−m)*}.

As an application of the results of Sec. 8.1 to a random walk, we state the following consequence of Theorem 8.1.4.

Theorem 8.2.1. Let B_n ∈ ℬ¹ for each n ∈ N. Then

  P{S_n ∈ B_n i.o.}

is equal to zero or one.

PROOF. If σ is a permutation on N_m, then S_n(σω) = S_n(ω) for n ≥ m, hence the set

  Λ_m = ⋃_{n=m}^{∞} {S_n ∈ B_n}

is unchanged under σ⁻¹ or σ. Since Λ_m decreases as m increases, it follows that

  ⋂_{m=1}^{∞} Λ_m

is permutable, and the theorem is proved.
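In code, the passage from the independent process {X_n} to the walk {S_n} of (1) is a single partial summation. A minimal sketch follows; the ±1 step distribution is only an illustrative choice, nothing in the definition requires it.

```python
import itertools
import random

def random_walk(steps):
    """Partial sums S_0 = 0, S_n = X_1 + ... + X_n, as in (1)."""
    return list(itertools.accumulate(steps, initial=0))

random.seed(1)
xs = [random.choice((-1, 1)) for _ in range(10)]   # one sample of X_1, ..., X_10
s = random_walk(xs)

assert s[0] == 0 and len(s) == 11
# X_n = S_n - S_{n-1}: the steps are recovered as successive differences.
assert all(s[n] - s[n - 1] == xs[n - 1] for n in range(1, 11))
print(s)
```

The adjoined S₀ = 0 corresponds to the `initial=0` argument of `itertools.accumulate`.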


Even for a fixed B = B_n the result is significant, since it is by no means evident that the set {S_n > 0 i.o.}, for instance, is even vaguely invariant or remote with respect to {X_n, n ∈ N} (cf. Exercise 7 of Sec. 8.1). Yet the preceding theorem implies that it is in fact almost invariant. This is the strength of the notion of permutability as against invariance or remoteness.

For any serious study of the random walk process, it is imperative to introduce the concept of an "optional r.v." This notion has already been used more than once in the book (where?) but has not yet been named. Since the basic inferences are very simple and are supposed to be intuitively obvious, it has been the custom in the literature until recently not to make the formal introduction at all. However, the reader will profit by meeting these fundamental ideas for the theory of stochastic processes at the earliest possible time. They will be needed in the next chapter, too.

DEFINITION OF OPTIONAL r.v. An r.v. α is called optional relative to the arbitrary stochastic process {Z_n, n ∈ N} iff it takes strictly positive integer values or +∞ and satisfies the following condition:

(2)  ∀n ∈ N ∪ {∞}: {ω: α(ω) = n} ∈ ℱ_n,

where ℱ_n is the B.F. generated by {Z_k, k ∈ N_n}.

Similarly if the process is indexed by N⁰ (as in Chapter 9), then the range of α will be N⁰. Thus if the index n is regarded as the time parameter, then α effects a choice of time (an "option") for each sample point ω. One may think of this choice as a time to "stop", whence the popular alias "stopping time", but this is usually rather a momentary pause after which the process proceeds again: time marches on!

Associated with each optional r.v. α there are two or three important objects. First, the pre-α field ℱ_α is the collection of all sets in ℱ_∞ of the form

(3)  ⋃_{1 ≤ n ≤ ∞} [{α = n} ∩ Λ_n],

where Λ_n ∈ ℱ_n for each n.


Instead of requiring α to be defined on all Ω but possibly taking the value ∞, we may suppose it to be defined on a set Δ in ℱ. Note that a strictly positive integer n is an optional r.v. The concepts of pre-α and post-α fields reduce in this case to the previous ℱ_n and ℱ'_n.

A vital example of optional r.v. is that of the first entrance time into a given Borel set A:

(5)  α_A(ω) = min{n ∈ N: Z_n(ω) ∈ A} on ⋃_{n=1}^{∞} {ω: Z_n(ω) ∈ A};  = +∞ elsewhere.

To see that this is optional, we need only observe that for each n ∈ N:

  {ω: α_A(ω) = n} = {ω: Z_j(ω) ∈ A^c, 1 ≤ j ≤ n − 1; Z_n(ω) ∈ A},

which clearly belongs to ℱ_n; similarly for n = ∞.

Concepts connected with optionality have everyday counterparts, implicit in phrases such as "within thirty days of the accident (should it occur)". Historically, they arose from "gambling systems", in which the gambler chooses opportune times to enter his bets according to previous observations, experiments, or whatnot. In this interpretation, α + 1 is the time chosen to gamble and is determined by events strictly prior to it. Note that, along with α, α + 1 is also an optional r.v., but the converse is false.

So far the notions are valid for an arbitrary process on an arbitrary triple. We now return to a stationary independent process on the specified triple and extend the notion of "shift" to an "α-shift" as follows: τ^α is a mapping on {α < ∞} such that

(6)  τ^α ω = τⁿ ω on {ω: α(ω) = n}.

Thus the post-α process is just the process {X_n(τ^α ω), n ∈ N}. Recalling that X_n is a mapping on Ω, we may also write

(7)  X_{α+n}(ω) = X_n(τ^α ω) = (X_n ∘ τ^α)(ω)

and regard X_n ∘ τ^α, n ∈ N, as the r.v.'s of the new process. The inverse set mapping (τ^α)⁻¹, to be written more simply as τ^{−α}, is defined as usual:

  τ^{−α}Λ = {ω: τ^α ω ∈ Λ}.

Let us now prove the fundamental theorem about "stopping" a stationary independent process.

Theorem 8.2.2. For a stationary independent process and an almost everywhere finite optional r.v.
α relative to it, the pre-α and post-α fields are independent. Furthermore the post-α process is a stationary independent process with the same common distribution as the original one.
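The first entrance time (5) and the conclusion of Theorem 8.2.2 can both be sketched numerically: after stopping at α, the next step X_{α+1} should still have the common step distribution. In the sketch below the symmetric ±1 steps, the horizon of 200 steps, the seed, and the tolerance are all illustrative assumptions of mine.

```python
import random

def first_entrance(walk_s, in_A):
    """alpha_A of (5): least n >= 1 with S_n in A; None stands for +infinity."""
    for n in range(1, len(walk_s)):
        if in_A(walk_s[n]):
            return n
    return None

random.seed(2)
post_sum, count = 0, 0
for _ in range(5000):
    xs = [random.choice((-1, 1)) for _ in range(200)]
    s = [0]
    for x in xs:
        s.append(s[-1] + x)
    a = first_entrance(s, lambda y: y > 0)   # entrance into A = (0, oo)
    if a is not None and a < len(xs):
        post_sum += xs[a]                    # xs[a] is X_{alpha+1}, the first post-alpha step
        count += 1

post_mean = post_sum / count
print(round(post_mean, 3), count)
# X_{alpha+1} should be +-1 with probability 1/2 each, hence mean near 0,
# even though X_alpha itself is forced to be +1 at the entrance.
assert abs(post_mean) < 0.05
```

Note the contrast built into the experiment: the step *at* time α is biased by the stopping rule, but the steps *after* α are not, which is exactly what the theorem asserts.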


PROOF. Both assertions are summarized in the formula below. For any Λ ∈ ℱ_α, k ∈ N, B_j ∈ ℬ¹, 1 ≤ j ≤ k, we have

(8)  P{Λ; X_{α+j} ∈ B_j, 1 ≤ j ≤ k} = P{Λ} ∏_{j=1}^{k} μ(B_j).

To prove (8), we observe that it follows from the definition of α and ℱ_α that

(9)  Λ ∩ {α = n} = Λ_n ∩ {α = n} ∈ ℱ_n,

where Λ_n ∈ ℱ_n for each n ∈ N. Consequently we have

  P{Λ; α = n; X_{α+j} ∈ B_j, 1 ≤ j ≤ k} = P{Λ_n; α = n; X_{n+j} ∈ B_j, 1 ≤ j ≤ k}
   = P{Λ_n; α = n} P{X_{n+j} ∈ B_j, 1 ≤ j ≤ k}
   = P{Λ; α = n} ∏_{j=1}^{k} μ(B_j),

where the second equation is a consequence of (9) and the independence of ℱ_n and ℱ'_n. Summing over n ∈ N, we obtain (8).

An immediate corollary is the extension of (7) of Sec. 8.1 to an α-shift.

Corollary. For each Λ ∈ ℱ we have

(10)  P(τ^{−α}Λ) = P(Λ).

Just as we iterate the shift τ, we can iterate τ^α. Put α¹ = α, and define α^{k+1} inductively by

  α^{k+1}(ω) = α^k(τ^α ω),  k ∈ N.

Each α^k is finite a.e. if α is. Next, define β₀ = 0, and

  β_k = α¹ + ⋯ + α^k,  k ∈ N.

We are now in a position to state the following result, which will be needed later.

Theorem 8.2.3. Let α be an a.e. finite optional r.v. relative to a stationary independent process. Then the random vectors {V_k, k ∈ N}, where

  V_k(ω) = (α^k(ω), X_{β_{k−1}+1}(ω), …, X_{β_k}(ω)),

are independent and identically distributed.
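The iterates α^k and the epochs β_k can be traced along a single sample path: for A = (0, ∞) the β_k are the successive times at which the walk first exceeds its previous ladder height. The sketch below assumes ±1 steps with up-probability p = 0.7, so that E(α) = 1/(2p − 1) = 2.5 by the classical first-passage computation for the simple walk; it then checks, in the spirit of Theorem 8.2.3, that the observed α^k behave like an i.i.d. sample. All numeric choices here are mine, not the book's.

```python
import random

def ladder_epochs(steps):
    """Successive beta_k for A = (0, oo): times when the walk first exceeds
    the previous ladder height; then alpha^k = beta_k - beta_{k-1}."""
    epochs, s, last_height = [], 0, 0
    for n, x in enumerate(steps, start=1):
        s += x
        if s > last_height:
            epochs.append(n)
            last_height = s
    return epochs

random.seed(3)
steps = [1 if random.random() < 0.7 else -1 for _ in range(200_000)]
betas = ladder_epochs(steps)
alphas = [b - a for a, b in zip([0] + betas, betas)]

mean_alpha = sum(alphas) / len(alphas)
print(round(mean_alpha, 2))
assert abs(mean_alpha - 2.5) < 0.1     # matches E(alpha) = 1/(2p - 1)

# crude iid check: the first and second halves of the alpha-sequence agree
h = len(alphas) // 2
m1 = sum(alphas[:h]) / h
m2 = sum(alphas[h:]) / (len(alphas) - h)
assert abs(m1 - m2) < 0.1
```

The positive drift is what makes E(α) finite here; for the symmetric walk α is still a.e. finite but has infinite expectation.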


PROOF. The independence follows from Theorem 8.2.2 by our showing that V₁, …, V_{k−1} belong to the pre-β_{k−1} field, while V_k belongs to the post-β_{k−1} field. The details are left to the reader; cf. Exercise 6 below.

To prove that V_k and V_{k+1} have the same distribution, we may suppose that k = 1. Then for each n ∈ N, and each n-dimensional Borel set A, we have

  {ω: α²(ω) = n; (X_{α¹+1}(ω), …, X_{α¹+α²}(ω)) ∈ A}
   = {ω: α¹(τ^α ω) = n; (X₁(τ^α ω), …, X_{α¹(τ^α ω)}(τ^α ω)) ∈ A},

since

  X_{α¹}(τ^α ω) = X_{α¹(τ^α ω)}(τ^α ω) = X_{α²(ω)}(τ^α ω) = X_{α¹(ω)+α²(ω)}(ω) = X_{α¹+α²}(ω)

by the quirk of notation that denotes by X_α(·) the function whose value at ω is given by X_{α(ω)}(ω), and by (7) with n = α²(ω). By (5) of Sec. 8.1, the preceding set is the τ^{−α}-image (inverse image under τ^α) of the set

  {ω: α¹(ω) = n; (X₁(ω), …, X_{α¹}(ω)) ∈ A},

and so by (10) has the same probability as the latter. This proves our assertion.

Corollary. The r.v.'s {Y_k, k ∈ N}, where

  Y_k(ω) = ∑_{n=β_{k−1}+1}^{β_k} φ(X_n(ω))

and φ is a Borel measurable function, are independent and identically distributed.

For φ ≡ 1, Y_k reduces to α^k. For φ(x) ≡ x, Y_k = S_{β_k} − S_{β_{k−1}}. The reader is advised to get a clear picture of the quantities α^k, β_k, and Y_k before proceeding further, perhaps by considering a special case such as (5).

We shall now apply these considerations to obtain results on the "global behavior" of the random walk. These will be broad qualitative statements distinguished by their generality without any additional assumptions.

The optional r.v. to be considered is the first entrance time into the strictly positive half of the real line, namely A = (0, ∞) in (5) above. Similar results hold for [0, ∞); and then by taking the negative of each X_n, we may deduce the corresponding result for (−∞, 0] or (−∞, 0). Results obtained in this way will be labeled as "dual" below.
Thus, omitting A from the notation:

(11)  α(ω) = min{n ∈ N: S_n(ω) > 0} on ⋃_{n=1}^{∞} {ω: S_n(ω) > 0};  = +∞ elsewhere;


and

  ∀n ∈ N: {α = n} = {S_j ≤ 0 for 1 ≤ j ≤ n − 1; S_n > 0}.


Theorem 8.2.5. For the general random walk, there are four mutually exclusive possibilities, each taking place a.e.:

(i) ∀n ∈ N: S_n = 0;
(ii) S_n → −∞;
(iii) S_n → +∞;
(iv) −∞ = lim inf_n S_n < lim sup_n S_n = +∞.

PROOF. If X = 0 a.e., then (i) happens. Excluding this, let φ₁ = lim sup_n S_n. Then φ₁ is a permutable r.v., hence a constant c, possibly ±∞, a.e. by Theorem 8.1.4. Since

  lim sup_n S_n = X₁ + lim sup_n (S_n − X₁),

we have φ₁ = X₁ + φ₂, where φ₂(ω) = φ₁(τω) = c a.e. Since X₁ ≢ 0, it follows that c = +∞ or −∞. This means that

  either lim sup_n S_n = +∞ or lim sup_n S_n = −∞.

By symmetry we have also

  either lim inf_n S_n = −∞ or lim inf_n S_n = +∞.

These double alternatives yield the new possibilities (ii), (iii), or (iv), other combinations being impossible.

This last possibility will be elaborated upon in the next section.

EXERCISES

In Exercises 1-6, the stochastic process is arbitrary.

*1. α is optional if and only if ∀n ∈ N: {α ≤ n} ∈ ℱ_n.
*2. For each optional α we have α ∈ ℱ_α and X_α ∈ ℱ_α. If α and β are both optional and α ≤ β, then ℱ_α ⊂ ℱ_β.
3. If α₁ and α₂ are both optional, then so are α₁ ∧ α₂, α₁ ∨ α₂, α₁ + α₂. If α is optional and Δ ∈ ℱ_α, then α_Δ defined below is also optional:

  α_Δ = α on Δ;  = +∞ on Ω \ Δ.

*4. If α is optional and β is optional relative to the post-α process, then α + β is optional (relative to the original process).
5. ∀k ∈ N: α¹ + ⋯ + α^k is optional. [For the α in (11), this has been called the kth ladder variable.]
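The trichotomy among (ii), (iii), and (iv) can be seen numerically by varying the drift of a ±1 walk. The up-probabilities 0.55, 0.45, 0.50, the horizon, the seeds, and the thresholds in the sketch below are illustrative choices, not part of the theorem.

```python
import random

def walk_extremes(p_up, n, seed):
    """Run n steps of +-1 with up-probability p_up; return S_n, max S, min S."""
    rng = random.Random(seed)
    s = s_max = s_min = 0
    for _ in range(n):
        s += 1 if rng.random() < p_up else -1
        s_max = max(s_max, s)
        s_min = min(s_min, s)
    return s, s_max, s_min

N = 100_000
end_pos, _, _ = walk_extremes(0.55, N, seed=4)   # positive drift: case (iii)
end_neg, _, _ = walk_extremes(0.45, N, seed=5)   # negative drift: case (ii)
_, hi, lo = walk_extremes(0.50, N, seed=6)       # mean zero: oscillation, case (iv)

print(end_pos, end_neg, hi, lo)
assert end_pos > 5000 and end_neg < -5000   # the drifted walks escape to +-infinity
assert hi > 50 and lo < -50                 # the symmetric walk visits both far sides
```

Of course no finite simulation proves a limit statement; the point is only that the three qualitative regimes announced by the theorem are already visible at a modest horizon.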


*6. Prove the following relations:

  α^k = α ∘ τ^{β_{k−1}};  τ^{β_{k−1}} ∘ τ^α = τ^{β_k};  X_{β_{k−1}+j} ∘ τ^α = X_{β_k+j}.

7. If α and β are any two optional r.v.'s, then

  X_{β+j}(τ^α ω) = X_{α(ω)+β(τ^α ω)+j}(ω);
  (τ^β ∘ τ^α)(ω) = τ^{β(τ^α ω)+α(ω)}(ω) ≠ τ^{β+α}(ω) in general.

*8. Find an example of two optional r.v.'s α and β such that α ≤ β but ℱ'_α ⊅ ℱ'_β. However, if γ is optional relative to the post-α process and β = α + γ, then indeed ℱ'_α ⊃ ℱ'_β. As a particular case, ℱ'_{β_k} is decreasing (while ℱ_{β_k} is increasing) as k increases.

9. Find an example of two optional r.v.'s α and β such that α < β but β − α is not optional.

10. Generalize Theorem 8.2.2 to the case where the domain of definition and finiteness of α is Δ with 0 < P(Δ) < 1. [This leads to a useful extension of the notion of independence. For a given Δ in ℱ with P(Δ) > 0, two events Λ and M, where M ⊂ Δ, are said to be independent relative to Δ iff P{Δ ∩ Λ ∩ M} = P{Δ ∩ Λ} P_Δ{M}.]

*11. Let {X_n, n ∈ N} be a stationary independent process and {α_k, k ∈ N} a sequence of strictly increasing finite optional r.v.'s. Then {X_{α_k+1}, k ∈ N} is a stationary independent process with the same common distribution as the original process. [This is the gambling-system theorem first given by Doob in 1936.]

12. Prove the Corollary to Theorem 8.2.2.

13. State and prove the analogue of Theorem 8.2.4 with α replaced by α_{[0,∞)}. [The inclusion of 0 in the set of entrance causes a small difference.]

14. In an independent process where all X_n have a common bound, E{α} < ∞ implies E{S_α} < ∞ for each optional α [cf. Theorem 5.5.3].

8.3 Recurrence

A basic question about the random walk is the range of the whole process: ⋃_{n=1}^{∞} {S_n(ω)} for a.e. ω; or, "where does it ever go?" Theorem 8.2.5 tells us that, ignoring the trivial case where it stays put at 0, it either goes off to −∞ or +∞, or fluctuates between them. But how does it fluctuate?
Exercise 9below will show that the random walk can take large leaps from one end tothe other without stopping in any middle range more than a finite number oftimes . On the other hand, it may revisit every neighborhood of every point aninfinite number of times . The latter circumstance calls for a definition .


DEFINITION. The number x ∈ ℛ¹ is called a recurrent value of the random walk {S_n, n ∈ N}, iff for every ε > 0 we have

(1)  P{|S_n − x| < ε i.o.} = 1.

The set of all recurrent values will be denoted by ℜ.

Taking a sequence of ε decreasing to zero, we see that (1) implies the apparently stronger statement that the random walk is in each neighborhood of x i.o. a.e.

Let us also call the number x a possible value of the random walk iff for every ε > 0, there exists n ∈ N such that P{|S_n − x| < ε} > 0. Clearly a recurrent value is a possible value of the random walk (see Exercise 2 below).

Theorem 8.3.1. The set ℜ is either empty or a closed additive group of real numbers. In the latter case it reduces to the singleton {0} if and only if X = 0 a.e.; otherwise ℜ is either the whole ℛ¹ or the infinite cyclic group generated by a nonzero number c, namely {±nc: n ∈ N⁰}.

PROOF. Suppose ℜ ≠ ∅ throughout the proof. To prove that ℜ is a group, let us show that if x is a possible value and y ∈ ℜ, then y − x ∈ ℜ. Suppose not; then there is a strictly positive probability that from a certain value of n on, S_n will not be in a certain neighborhood of y − x. Let us put for z ∈ ℛ¹:

(2)  p_{ε,m}(z) = P{|S_n − z| > ε for all n ≥ m};

so that p_{2ε,m}(y − x) > 0 for some ε > 0 and m ∈ N. Since x is a possible value, for the same ε we have a k such that P{|S_k − x| < ε} > 0. Now the two independent events |S_k − x| < ε and |S_n − S_k − (y − x)| > 2ε together imply that |S_n − y| > ε; hence

(3)  p_{ε,k+m}(y) = P{|S_n − y| > ε for all n ≥ k + m}
      ≥ P{|S_k − x| < ε} P{|S_n − S_k − (y − x)| > 2ε for all n ≥ k + m}.

The last-written probability is equal to p_{2ε,m}(y − x), since S_n − S_k has the same distribution as S_{n−k}. It follows that the first term in (3) is strictly positive, contradicting the assumption that y ∈ ℜ. We have thus proved that ℜ is an additive subgroup of ℛ¹. It is trivial that ℜ as a subset of ℛ¹ is closed in the Euclidean topology.
A well-known proposition (proof?) asserts that the only closed additive subgroups of ℛ¹ are those mentioned in the second sentence of the theorem. Since ℜ is a nonempty group, 0 ∈ ℜ. Unless X ≡ 0 a.e., it has at least one possible value x ≠ 0, and the argument above shows that −x = 0 − x ∈ ℜ and consequently also x = 0 − (−x) ∈ ℜ. Hence ℜ is not a singleton. The theorem is completely proved.
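The group alternative can be illustrated on the integer lattice: steps ±2 generate the group 2Z, while mean-zero steps +5 and −3 have gcd 1 and generate all of Z, so that (by Theorems 8.3.1 and 8.3.4) every integer should be recurrent in the second case. The step laws, seeds, horizons, and thresholds in this sketch are my illustrative choices.

```python
import random
from collections import Counter

def visit_counts(step_sampler, n, seed):
    """Count visits of S_1, ..., S_n to each lattice point."""
    rng = random.Random(seed)
    visits, s = Counter(), 0
    for _ in range(n):
        s += step_sampler(rng)
        visits[s] += 1
    return visits

# Steps +-2: the generated group is 2Z, so only even values are ever visited.
even_walk = visit_counts(lambda r: 2 if r.random() < 0.5 else -2, 200_000, seed=7)
assert all(x % 2 == 0 for x in even_walk)

# Mean-zero steps: +5 with prob 3/8, -3 with prob 5/8 (E X = 15/8 - 15/8 = 0);
# gcd(5, 3) = 1, so the generated group is all of Z.
full_walk = visit_counts(lambda r: 5 if r.random() < 3 / 8 else -3, 200_000, seed=8)
hit = sum(1 for x in range(-5, 6) if full_walk[x] > 0)
print(hit)
assert hit >= 9   # nearly every small integer is actually visited
```

The simulation of course only exhibits visits, not the "i.o. with probability one" of the definition; the latter is what the theorems supply.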


It is clear from the preceding theorem that the key to recurrence is the value 0, for which we have the criterion below.

Theorem 8.3.2. If for some ε > 0 we have

(4)  ∑_n P{|S_n| < ε} < ∞,

then

(5)  P{|S_n| < ε i.o.} = 0

(for the same ε) so that 0 ∉ ℜ. If for every ε > 0 we have

(6)  ∑_n P{|S_n| < ε} = ∞,

then

(7)  P{|S_n| < ε i.o.} = 1

for every ε > 0 and so 0 ∈ ℜ.

Remark. Actually if (4) or (6) holds for any ε > 0, then it holds for every ε > 0; this fact follows from Lemma 1 below but is not needed here.

PROOF. The first assertion follows at once from the convergence part of the Borel-Cantelli lemma (Theorem 4.2.1). To prove the second part consider

  F = lim inf_n {|S_n| > ε};

namely F is the event that |S_n| < ε for only a finite number of values of n. For each ω in F, there is an m(ω) such that |S_n(ω)| > ε for all n > m(ω); it follows that if we consider "the last time that |S_n| < ε", we have

  P(F) = ∑_{m=0}^{∞} P{|S_m| < ε; |S_n| > ε for all n ≥ m + 1}.

Since the two independent events |S_m| < ε and |S_n − S_m| > 2ε together imply that |S_n| > ε, we have

  1 ≥ P(F) ≥ ∑_{m=1}^{∞} P{|S_m| < ε} P{|S_n − S_m| > 2ε for all n ≥ m + 1}
    = ∑_{m=1}^{∞} P{|S_m| < ε} p_{2ε,1}(0)


by the previous notation (2), since S_n − S_m has the same distribution as S_{n−m}. Consequently (6) cannot be true unless p_{2ε,1}(0) = 0. We proceed to extend this to show that p_{2ε,k}(0) = 0 for every k ∈ N. To this aim we fix k and consider the event

  A_m = {|S_m| < ε; |S_n| > ε for all n ≥ m + k};

then A_m and A_{m'} are disjoint whenever m' ≥ m + k and consequently (why?)

  ∑_{m=1}^{∞} P(A_m) ≤ k.

The argument above for the case k = 1 can now be repeated to yield

  k ≥ ∑_{m=1}^{∞} P{|S_m| < ε} p_{2ε,k}(0),

and so p_{2ε,k}(0) = 0 for every ε > 0. Thus

  P(F) = lim_{k→∞} p_{ε,k}(0) = 0,

which is equivalent to (7).

A simple sufficient condition for 0 ∈ ℜ, or equivalently for ℜ ≠ ∅, will now be given.

Theorem 8.3.3. If the weak law of large numbers holds for the random walk {S_n, n ∈ N} in the form that S_n/n → 0 in pr., then ℜ ≠ ∅.

PROOF. We need two lemmas, the first of which is also useful elsewhere.

Lemma 1. For any ε > 0 and m ∈ N we have

(8)  ∑_{n=0}^{∞} P{|S_n| < mε} ≤ 2m ∑_{n=0}^{∞} P{|S_n| < ε}.

PROOF OF LEMMA 1. It is sufficient to prove that if the right member of (8) is finite, then so is the left member and (8) is true. Put

  I = (−ε, ε),  J = [jε, (j + 1)ε),

for a fixed j ∈ N; and denote by φ_I, φ_J the respective indicator functions. Denote also by α the first entrance time into J, as defined in (5) of Sec. 8.2 with Z_n replaced by S_n and A by J. We have

(9)  ∑_{n=0}^{∞} ∫ φ_J(S_n) dP = ∑_{k=1}^{∞} ∫_{{α=k}} [∑_{n=0}^{∞} φ_J(S_n)] dP.


The typical integral on the right side is, by definition of α, equal to

  ∫_{{α=k}} [1 + ∑_{n=k+1}^{∞} φ_J(S_n)] dP ≤ ∫_{{α=k}} [1 + ∑_{n=k+1}^{∞} φ_I(S_n − S_k)] dP,

since on {α = k} we have S_k ∈ J, and S_n ∈ J then implies S_n − S_k ∈ (−ε, ε) = I. By the independence of {α = k} and S_n − S_k for n > k, the last member is equal to

  P{α = k} [1 + ∑_{n=k+1}^{∞} P{S_{n−k} ∈ I}] ≤ P{α = k} ∑_{n=0}^{∞} P{S_n ∈ I},

since S_n − S_k has the same distribution as S_{n−k} and the term 1 corresponds to n = 0. Summing over k in (9), we obtain

  ∑_{n=0}^{∞} P{S_n ∈ J} ≤ ∑_{n=0}^{∞} P{S_n ∈ I}.

The same bound holds for every integer j, positive or negative, and (−mε, mε) is covered by the 2m intervals [jε, (j + 1)ε) for −m ≤ j ≤ m − 1; summing over these j yields (8).

Lemma 2. Let u_n(m) ≥ 0 be defined for n ∈ N⁰ and m ∈ N, and suppose that:

(i) for each n, u_n(·) is increasing;
(ii) there is a constant c such that for every m ∈ N, ∑_{n=0}^{∞} u_n(m) ≤ cm ∑_{n=0}^{∞} u_n(1);
(iii) for every δ > 0, lim_{n→∞} u_n(δn) = 1.

Then

(10)  ∑_{n=0}^{∞} u_n(1) = ∞.

Remark. If (ii) is true for all integer m ≥ 1, then it is true for all real m ≥ 1, with c doubled.

PROOF OF LEMMA 2. Suppose not; then for every A > 0:

  ∞ > ∑_{n=0}^{∞} u_n(1) ≥ (1/cm) ∑_{n=0}^{∞} u_n(m) ≥ (1/cm) ∑_{n=0}^{[Am]} u_n(m)


  ≥ (1/cm) ∑_{n=0}^{[Am]} u_n(A⁻¹n),

by (i), since m ≥ A⁻¹n for n ≤ Am. Letting m → ∞ and applying (iii) with δ = A⁻¹, we obtain

  ∑_{n=0}^{∞} u_n(1) ≥ A/c;

since A is arbitrary, this is a contradiction, which proves the lemma.

To return to the proof of Theorem 8.3.3, we apply Lemma 2 with

  u_n(m) = P{|S_n| < m}.

Then condition (i) is obviously satisfied and condition (ii) with c = 2 follows from Lemma 1. The hypothesis that S_n/n → 0 in pr. may be written as

  u_n(δn) = P{|S_n| < δn} → 1

for every δ > 0 as n → ∞, hence condition (iii) is also satisfied. Thus Lemma 2 yields

  ∑_{n=0}^{∞} P{|S_n| < 1} = +∞.

Applying this to the "magnified" random walk with each X_n replaced by X_n/ε, which does not disturb the hypothesis of the theorem, we obtain (6) for every ε > 0, and so the theorem is proved.

In practice, the following criterion is more expedient (Chung and Fuchs, 1951).

Theorem 8.3.4. Suppose that at least one of E(X⁺) and E(X⁻) is finite. Then ℜ ≠ ∅ if and only if E(X) = 0; otherwise case (ii) or (iii) of Theorem 8.2.5 happens according as E(X) < 0 or > 0.

PROOF. If −∞ ≤ E(X) < 0 or 0 < E(X) ≤ +∞, then by the strong law of large numbers (as amended by Exercise 1 of Sec. 5.4), we have

  S_n/n → E(X) a.e.,

so that either (ii) or (iii) happens as asserted. If E(X) = 0, then the same law or its weaker form Theorem 5.2.2 applies; hence Theorem 8.3.3 yields the conclusion.
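A quick empirical look at the Chung-Fuchs dichotomy: with Gaussian steps, the occupation time of (−1, 1) keeps growing when E(X) = 0 but stays bounded under a drift, mirroring (6) against (4). The N(μ, 1) step law, the seeds, and the numeric thresholds are illustrative assumptions of mine.

```python
import random

def time_near_zero(mu, n, seed):
    """#{ k <= n : |S_k| < 1 } for a walk with N(mu, 1) steps."""
    rng = random.Random(9 + seed)
    s, near = 0.0, 0
    for _ in range(n):
        s += rng.gauss(mu, 1.0)
        near += abs(s) < 1.0
    return near

# Best of a few runs, to damp the large fluctuations of local time.
near0 = max(time_near_zero(0.0, 100_000, seed=k) for k in range(3))   # E X = 0
near1 = max(time_near_zero(0.5, 100_000, seed=k) for k in range(3, 6))  # E X > 0

print(near0, near1)
assert near0 > 20    # mean zero: the walk keeps returning near 0
assert near1 < 20    # drift: it leaves (-1, 1) for good almost immediately
```

Heuristically ∑_n P{|S_n| < 1} grows like √N in the mean-zero case (each term is of order n^{−1/2}), while with drift μ > 0 the terms decay exponentially, which is what the two counts reflect.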


DEFINITION OF RECURRENT RANDOM WALK. A random walk will be called recurrent iff ℜ ≠ ∅; it is degenerate iff ℜ = {0}; and it is of the lattice type iff ℜ is generated by a nonzero element.

The most exact kind of recurrence happens when the X_n's have a common distribution which is concentrated on the integers, such that every integer is a possible value (for some S_n), and which has mean zero. In this case for each integer c we have P{S_n = c i.o.} = 1. For the symmetrical Bernoullian random walk this was first proved by Pólya in 1921.

We shall give another proof of the recurrence part of Theorem 8.3.4, namely that E(X) = 0 is a sufficient condition for the random walk to be recurrent as just defined. This method is applicable in cases not covered by the two preceding theorems (see Exercises 6-9 below), and the analytic machinery of ch.f.'s which it employs opens the way to the considerations in the next section.

The starting point is the integrated form of the inversion formula in Exercise 3 of Sec. 6.2. Taking x = 0, u = ε, and F to be the d.f. of S_n, we have

(11)  P(|S_n| < ε) ≥ (1/ε) ∫_0^ε P(|S_n| < u) du = (1/πε) ∫_{−∞}^{∞} [(1 − cos εt)/t²] f(t)ⁿ dt.

Thus the series in (6) may be bounded below by summing the last expression in (11). The latter does not sum well as it stands, and it is natural to resort to a summability method. The Abelian method suits it well and leads to, for 0 < r < 1:

(12)  ∑_{n=0}^{∞} rⁿ P(|S_n| < ε) ≥ (1/πε) ∫_{−∞}^{∞} [(1 − cos εt)/t²] Re{1/(1 − rf(t))} dt.

Since

(13)  Re{1/(1 − rf(t))} = [1 − r Re f(t)] / |1 − rf(t)|² ≥ 0

and (1 − cos εt)/t² ≥ Cε² for |εt| ≤ 1 and some constant C, it follows that for η ≤ 1/ε the right member of (12) is not less than

(14)  (Cε/π) ∫_{−η}^{η} Re{1/(1 − rf(t))} dt.

Now the existence of E(|X|) implies by Theorem 6.4.2 that 1 − f(t) = o(t) as t → 0. Hence for any given δ > 0 we may choose the η above so that 1 − Re f(t) ≤ δ|t| and |Im f(t)| ≤ δ|t| for |t| ≤ η; then

  |1 − rf(t)|² = (1 − r + r[1 − Re f(t)])² + (r Im f(t))²


   ≤ 2(1 − r)² + 2(rδt)² + (rδt)² = 2(1 − r)² + 3r²δ²t².

The integral in (14) is then not less than

  ∫_{−η}^{η} (1 − r) dt / [2(1 − r)² + 3r²δ²t²] ≥ (1/3) ∫_{−(1−r)⁻¹η}^{(1−r)⁻¹η} ds / (1 + δ²s²),

after the substitution t = (1 − r)s. As r ↑ 1, the right member above tends to π/(3δ); since δ is arbitrary, we have proved that the right member of (12) tends to +∞ as r ↑ 1. Since the series in (6) dominates that on the left member of (12) for every r, it follows that (6) is true and so 0 ∈ ℜ by Theorem 8.3.2.

EXERCISES

f is the ch.f. of μ.

1. Generalize Theorems 8.3.1 and 8.3.2 to ℛ^d. (For d ≥ 3 the generalization is illusory; see Exercise 12 below.)

*2. If a random walk in ℛ^d is recurrent, then every possible value is a recurrent value.

3. Prove the Remark after Theorem 8.3.2.

*4. Assume that P{X₁ = 0} < 1. Prove that x is a recurrent value of the random walk if and only if

  ∑_{n=1}^{∞} P{|S_n − x| < ε} = ∞ for every ε > 0.

5. For a recurrent random walk that is neither degenerate nor of the lattice type, the countable set of points {S_n(ω), n ∈ N} is everywhere dense in ℛ¹ for a.e. ω. Hence prove the following result in Diophantine approximation: if γ is irrational, then given any real x and ε > 0 there exist integers m and n such that |mγ + n − x| < ε.

*6. If there exists a δ > 0 such that (the integral below being real-valued)

  lim_{r↑1} ∫_{−δ}^{δ} dt / [1 − rf(t)] = ∞,

then the random walk is recurrent.

*7. If there exists a δ > 0 such that

  sup_{0<r<1} ∫_{−δ}^{δ} dt / [1 − rf(t)] < ∞,


then the random walk is not recurrent. [HINT: Use Exercise 3 of Sec. 6.2 to show that there exists a constant C(ε) such that

  P(|S_n| < ε) ≤ C(ε) ∫ [(1 − cos ε⁻¹x)/(ε⁻¹x)²] μⁿ*(dx),

where μⁿ* is the distribution of S_n.]


is a strictly positive quadratic form Q in (t₁, t₂, t₃); if ∑_{i=1}^{3} |t_i| is sufficiently small …

*14. Modify Lemma 2 by replacing condition (iii) with the following: there exists d > 0 such that for every b > 1 and m > m(b):

(iii′)  u_n(m) ≥ dm²/n for m² ≤ n ≤ (bm)².

Then (10) is true.

*15. Generalize Theorem 8.3.3 to ℛ² as follows. If the central limit theorem applies in the form that S_n/√n converges in dist. to the unit normal, then the random walk is recurrent. [HINT: Use Exercises 13 and 14 and Exercise 4 of § 4.3. This is sharper than Exercise 11. No proof of Exercise 12 using a similar method is known.]

16. Suppose E(X) = 0, 0 < E(X²) < ∞, and μ is of the integer lattice type; then

  P{S_{n²} = 0 i.o.} = 1.

17. The basic argument in the proof of Theorem 8.3.2 was the "last time in (−ε, ε)". A harder but instructive argument using the "first time" may be given as follows. Put

  f_n^{(m)} = P{|S_j| > ε for m ≤ j ≤ n − 1; |S_n| ≤ ε}.


It follows by a form of Lemma 1 in this section that

  lim_{m→∞} ∑_{n=m}^{∞} f_n^{(m)} ≥ 1/4;

now use Theorem 8.1.4.

18. For an arbitrary random walk, if P{∀n ∈ N: S_n > 0} > 0, then

  ∑_n P{S_n ≤ 0, S_{n+1} > 0} < ∞.

Hence if in addition P{∀n ∈ N: S_n ≤ 0} > 0, then

  ∑_n |P{S_n > 0} − P{S_{n+1} > 0}| < ∞;

and consequently

  ∑_n (−1)ⁿ P{S_n > 0} converges.

[HINT: For the first series, consider the last time that S_n ≤ 0; for the third series, apply Du Bois-Reymond's test. Cf. Theorem 8.4.4 below; this exercise will be completed in Exercise 15 of Sec. 8.5.]

8.4 Fine structure

In this section we embark on a probing in depth of the r.v. α defined in (11) of Sec. 8.2 and some related r.v.'s.

The r.v. α being optional, the key to its introduction is to break up the time sequence into a pre-α and a post-α era, as already anticipated in the terminology employed with a general optional r.v. We do this with a sort of characteristic functional of the process which has already made its appearance in the last section:

(1)  ∑_{n=0}^{∞} rⁿ E(e^{itS_n}) = ∑_{n=0}^{∞} rⁿ f(t)ⁿ = 1/(1 − rf(t)),

where 0 < r < 1, t is real, and f is the ch.f. of X. Applying the principle just enunciated, we break this up into two parts:

  E[∑_{n=0}^{α−1} rⁿ e^{itS_n}] + E[∑_{n=α}^{∞} rⁿ e^{itS_n}],


with the understanding that on the set {α = ∞}, the first sum above is ∑_{n=0}^{∞} while the second is empty and hence equal to zero. Now the second part may be written as

(2)  E[∑_{n=0}^{∞} r^{α+n} e^{itS_{α+n}}] = E[r^α e^{itS_α} ∑_{n=0}^{∞} rⁿ e^{it(S_{α+n} − S_α)}].

It follows from (7) of Sec. 8.2 that

  S_{α+n} − S_α = S_n ∘ τ^α

has the same distribution as S_n, and by Theorem 8.2.2 that for each n it is independent of S_α. [Note that the same fact has been used more than once before, but for a constant α.] Hence the right member of (2) is equal to

  E{r^α e^{itS_α}} ∑_{n=0}^{∞} rⁿ E(e^{itS_n}) = E{r^α e^{itS_α}} · 1/(1 − rf(t)),

where r^α e^{itS_α} is taken to be 0 for α = ∞. Substituting this into (1), we obtain

(3)  [1 − E{r^α e^{itS_α}}] · 1/(1 − rf(t)) = E[∑_{n=0}^{α−1} rⁿ e^{itS_n}].

We have

(4)  1 − E{r^α e^{itS_α}} = 1 − ∑_{n=1}^{∞} rⁿ ∫_{{α=n}} e^{itS_n} dP

and

(5)  E[∑_{n=0}^{α−1} rⁿ e^{itS_n}] = ∑_{k=1}^{∞} ∫_{{α=k}} [∑_{n=0}^{k−1} rⁿ e^{itS_n}] dP
      = 1 + ∑_{n=1}^{∞} rⁿ ∫_{{α>n}} e^{itS_n} dP

by an interchange of summation. Let us record the two power series appearing in (4) and (5) as

  P(r, t) = 1 − E{r^α e^{itS_α}} = ∑_{n=0}^{∞} rⁿ p_n(t);
  Q(r, t) = E[∑_{n=0}^{α−1} rⁿ e^{itS_n}] = ∑_{n=0}^{∞} rⁿ q_n(t),


where p₀(t) = 1, q₀(t) = 1, and for n ≥ 1:

  p_n(t) = −∫_{{α=n}} e^{itS_n} dP = −∫ e^{itx} U_n(dx);
  q_n(t) = ∫_{{α>n}} e^{itS_n} dP = ∫ e^{itx} V_n(dx).

Now U_n(·) = P{α = n; S_n ∈ ·} is a finite measure with support in (0, ∞), while V_n(·) = P{α > n; S_n ∈ ·} is a finite measure with support in (−∞, 0]. Thus each p_n is the Fourier transform of a finite signed measure in (0, ∞), while each q_n is the Fourier transform of a finite measure in (−∞, 0]. Returning to (3), we may now write it as

(6)  P(r, t) · 1/(1 − rf(t)) = Q(r, t).

The next step is to observe the familiar Taylor series:

  1/(1 − x) = exp[∑_{n=1}^{∞} xⁿ/n],  |x| < 1.

Thus we have

(7)  1/(1 − rf(t)) = exp[∑_{n=1}^{∞} (rⁿ/n) ∫_{{S_n>0}} e^{itS_n} dP + ∑_{n=1}^{∞} (rⁿ/n) ∫_{{S_n≤0}} e^{itS_n} dP]
      = f₊(r, t)⁻¹ f₋(r, t),

where

  f₊(r, t) = exp[−∑_{n=1}^{∞} (rⁿ/n) ∫_{{S_n>0}} e^{itS_n} dP],
  f₋(r, t) = exp[+∑_{n=1}^{∞} (rⁿ/n) ∫_{{S_n≤0}} e^{itS_n} dP].

It follows by expanding the exponentials and making absolutely convergent


rearrangements of the resulting double series that
$$f^+(r,t) = 1+\sum_{n=1}^{\infty}\varphi_n(t),\qquad f^-(r,t) = 1+\sum_{n=1}^{\infty}\psi_n(t),$$
where each $\varphi_n$ is the Fourier transform of a measure in $(0,\infty)$, while each $\psi_n$ is the Fourier transform of a measure in $(-\infty,0]$. Substituting (7) into (6) and multiplying through by $f^+(r,t)$, we obtain

(8) $\displaystyle P(r,t)\,f^-(r,t) = Q(r,t)\,f^+(r,t).$

The next theorem below supplies the basic analytic technique for this development, known as the Wiener-Hopf technique.

Theorem 8.4.1. Let
$$P(r,t) = \sum_{n=0}^{\infty} r^n p_n(t),\qquad Q(r,t) = \sum_{n=0}^{\infty} r^n q_n(t),$$
$$P^*(r,t) = \sum_{n=0}^{\infty} r^n p_n^*(t),\qquad Q^*(r,t) = \sum_{n=0}^{\infty} r^n q_n^*(t),$$
where $p_0(t)\equiv q_0(t)\equiv p_0^*(t)\equiv q_0^*(t)\equiv 1$; and for $n\ge 1$, $p_n$ and $p_n^*$ as functions of $t$ are Fourier transforms of measures with support in $(0,\infty)$; $q_n$ and $q_n^*$ as functions of $t$ are Fourier transforms of measures in $(-\infty,0]$. Suppose that for some $r_0>0$ the four power series converge for $r$ in $(0,r_0)$ and all real $t$, and the identity

(9) $\displaystyle P(r,t)\,Q^*(r,t) \equiv P^*(r,t)\,Q(r,t)$

holds there. Then
$$P=P^*,\qquad Q=Q^*.$$
The theorem is also true if $(0,\infty)$ and $(-\infty,0]$ are replaced by $[0,\infty)$ and $(-\infty,0)$, respectively.

PROOF. It follows from (9) and the identity theorem for power series that for every $n\ge 0$:

(10) $\displaystyle \sum_{k=0}^{n} p_k(t)\,q_{n-k}^*(t) = \sum_{k=0}^{n} p_k^*(t)\,q_{n-k}(t).$

Then for $n=1$ equation (10) reduces to the following:
$$p_1(t)-p_1^*(t) = q_1(t)-q_1^*(t).$$


By hypothesis, the left member above is the Fourier transform of a finite signed measure $\nu_1$ with support in $(0,\infty)$, while the right member is the Fourier transform of a finite signed measure $\nu_2$ with support in $(-\infty,0]$. It follows from the uniqueness theorem for such transforms (Exercise 13 of Sec. 6.2) that we must have $\nu_1\equiv\nu_2$, and so both must be identically zero since they have disjoint supports. Thus $p_1=p_1^*$ and $q_1=q_1^*$. To proceed by induction on $n$, suppose that we have proved that $p_j\equiv p_j^*$ and $q_j\equiv q_j^*$ for $0\le j\le n-1$. Then it follows from (10) that
$$p_n(t)+q_n^*(t) \equiv p_n^*(t)+q_n(t).$$
Exactly the same argument as before yields that $p_n\equiv p_n^*$ and $q_n\equiv q_n^*$. Hence the induction is complete and the first assertion of the theorem is proved; the second is proved in the same way.

Applying the preceding theorem to (8), we obtain the next theorem in the case $A=(0,\infty)$; the rest is proved in exactly the same way.

Theorem 8.4.2. If $\alpha=\alpha_A$ is the first entrance time into $A$, where $A$ is one of the four sets $(0,\infty)$, $[0,\infty)$, $(-\infty,0)$, $(-\infty,0]$, then we have

(11) $\displaystyle 1-E\{r^\alpha e^{itS_\alpha}\} = \exp\Bigl\{-\sum_{n=1}^{\infty}\frac{r^n}{n}\int_{\{S_n\in A\}} e^{itS_n}\,dP\Bigr\};$

(12) $\displaystyle E\Bigl\{\sum_{n=0}^{\alpha-1} r^n e^{itS_n}\Bigr\} = \exp\Bigl\{+\sum_{n=1}^{\infty}\frac{r^n}{n}\int_{\{S_n\in A^c\}} e^{itS_n}\,dP\Bigr\}.$

From this result we shall deduce certain analytic expressions involving the r.v. $\alpha$. Before we do that, let us list a number of classical Abelian and Tauberian theorems below for ready reference.

(A) If $c_n\ge 0$ and $\sum_{n=0}^{\infty} c_n r^n$ converges for $0\le r<1$, then

(*) $\displaystyle \lim_{r\uparrow 1}\sum_{n=0}^{\infty} c_n r^n = \sum_{n=0}^{\infty} c_n,$

finite or infinite.

(B) If $c_n$ are complex numbers and $\sum_{n=0}^{\infty} c_n$ converges, then (*) is true.

(C) If $c_n$ are complex numbers such that $c_n = o(n^{-1})$ [or just $O(n^{-1})$] as $n\to\infty$, and the limit in the left member of (*) exists and is finite, then (*) is true.


(D) If $c_n^{(i)}\ge 0$, $\sum_{n=0}^{\infty} c_n^{(i)} r^n$ converges for $0\le r<1$ and diverges for $r=1$, $i=1,2$; and
$$\sum_{k=0}^{n} c_k^{(1)} \sim K\sum_{k=0}^{n} c_k^{(2)}$$
[or more particularly $c_n^{(1)}\sim K c_n^{(2)}$] as $n\to\infty$, where $0<K<+\infty$, then
$$\sum_{n=0}^{\infty} c_n^{(1)} r^n \sim K\sum_{n=0}^{\infty} c_n^{(2)} r^n$$
as $r\uparrow 1$.

(E) If $c_n\ge 0$ and
$$\sum_{n=0}^{\infty} c_n r^n \sim \frac{1}{1-r}$$
as $r\uparrow 1$, then
$$\sum_{k=0}^{n-1} c_k \sim n.$$

Observe that (C) is a partial converse of (B), and (E) is a partial converse of (D). There is also an analogue of (D), which is actually a consequence of (B): if $c_n$ are complex numbers converging to a finite limit $c$, then as $r\uparrow 1$,
$$\sum_{n=0}^{\infty} c_n r^n \sim \frac{c}{1-r}.$$

Proposition (A) is trivial. Proposition (B) is Abel's theorem, and proposition (C) is Tauber's theorem in the "little o" version and Littlewood's theorem in the "big O" version; only the former will be needed below. Proposition (D) is an Abelian theorem, and proposition (E) a Tauberian theorem, the latter being sometimes referred to as that of Hardy-Littlewood-Karamata. All four can be found in the admirable book by Titchmarsh, The Theory of Functions (2nd ed., Oxford University Press, New York, 1939, pp. 9-10, 224 ff.).

Theorem 8.4.3. The generating function of $\alpha$ in Theorem 8.4.2 is given by

(13) $\displaystyle E\{r^\alpha\} = 1-\exp\Bigl\{-\sum_{n=1}^{\infty}\frac{r^n}{n} P[S_n\in A]\Bigr\} = 1-(1-r)\exp\Bigl\{\sum_{n=1}^{\infty}\frac{r^n}{n} P[S_n\in A^c]\Bigr\}.$


We have

(14) $\displaystyle P\{\alpha<\infty\} = 1\ \text{if and only if}\ \sum_{n=1}^{\infty}\frac{1}{n} P[S_n\in A] = \infty;$

in which case

(15) $\displaystyle E\{\alpha\} = \exp\Bigl\{\sum_{n=1}^{\infty}\frac{1}{n} P[S_n\in A^c]\Bigr\}.$

PROOF. Setting $t=0$ in (11), we obtain the first equation in (13), from which the second follows at once through
$$\frac{1}{1-r} = \exp\Bigl\{\sum_{n=1}^{\infty}\frac{r^n}{n}\Bigr\} = \exp\Bigl\{\sum_{n=1}^{\infty}\frac{r^n}{n} P[S_n\in A] + \sum_{n=1}^{\infty}\frac{r^n}{n} P[S_n\in A^c]\Bigr\}.$$
Since
$$\lim_{r\uparrow 1} E\{r^\alpha\} = \lim_{r\uparrow 1}\sum_{n=1}^{\infty} P\{\alpha=n\}\,r^n = \sum_{n=1}^{\infty} P\{\alpha=n\} = P\{\alpha<\infty\}$$
by proposition (A), the middle term in (13) tends to a finite limit, hence also the power series in $r$ there (why?). By proposition (A), the said limit may be obtained by setting $r=1$ in the series. This establishes (14). Finally, setting $t=0$ in (12), we obtain

(16) $\displaystyle E\Bigl\{\sum_{n=0}^{\alpha-1} r^n\Bigr\} = \exp\Bigl\{\sum_{n=1}^{\infty}\frac{r^n}{n} P[S_n\in A^c]\Bigr\}.$

Rewriting the left member in (16) as in (5) and letting $r\uparrow 1$, we obtain

(17) $\displaystyle \lim_{r\uparrow 1}\sum_{n=0}^{\infty} r^n P[\alpha>n] = \sum_{n=0}^{\infty} P[\alpha>n] = E\{\alpha\} \le \infty$

by proposition (A). The right member of (16) tends to the right member of (15) by the same token, proving (15).

When is $E\{\alpha\}$ in (15) finite? This is answered by the next theorem.*

Theorem 8.4.4. Suppose that $X\not\equiv 0$ and at least one of $E(X^+)$ and $E(X^-)$ is finite; then

(18) $E(X)>0 \implies E\{\alpha_{(0,\infty)}\}<\infty;$

(19) $E(X)\le 0 \implies E\{\alpha_{[0,\infty)}\}=\infty.$

* It can be shown that $S_n\to+\infty$ a.e. if and only if $E\{\alpha_{(0,\infty)}\}<\infty$; see A. J. Lemoine, Annals of Probability 2 (1974).
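Formula (15) lends itself to a quick numerical check in a case where the answer is known in closed form. The sketch below (my own illustration, not part of the text) takes the simple walk with $P(X=1)=p=0.7$, $P(X=-1)=q=0.3$ and $A=(0,\infty)$, so that $\alpha$ is the first passage time to $+1$ and, by Wald's equation (since $S_\alpha\equiv 1$), $E\{\alpha\}=1/(p-q)=2.5$; the series in (15) is evaluated with the exact binomial probabilities $P[S_n\le 0]$.

```python
import math

p, q = 0.7, 0.3

def prob_Sn_nonpositive(n):
    # S_n = 2K - n with K ~ Binomial(n, p); hence S_n <= 0 iff K <= n/2
    return sum(math.comb(n, k) * p**k * q**(n - k) for k in range(n // 2 + 1))

# (15): E{alpha} = exp( sum_{n>=1} P[S_n <= 0]/n ), truncated at N = 400;
# the terms decay geometrically (rate 4pq = 0.84 per two steps), so the
# truncation error is negligible.
series = sum(prob_Sn_nonpositive(n) / n for n in range(1, 401))
E_alpha = math.exp(series)
print(E_alpha)   # close to 1/(p - q) = 2.5
```

The agreement with $1/(p-q)$ is a direct consistency check between Theorem 8.4.3 and Wald's equation (assertion (i) of Theorem 8.4.7 below).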


PROOF. If $E(X)>0$, then $P\{S_n\to+\infty\}=1$ by the strong law of large numbers. Hence $P\{\lim_n S_n=-\infty\}=0$, and this implies by the dual of Theorem 8.2.4 that $P\{\alpha_{(-\infty,0)}<\infty\}<1$. Let us sharpen this slightly to $P\{\alpha_{(-\infty,0]}<\infty\}<1$. To see this, write $\alpha'=\alpha_{(-\infty,0]}$ and consider $S_{\alpha'_n}$ as in the proof of Theorem 8.2.4. Clearly $S_{\alpha'_n}\le 0$, and so if $P\{\alpha'<\infty\}=1$, one would have $P\{S_n\le 0\ \text{i.o.}\}=1$, which is impossible. Now apply (14) to $\alpha_{(-\infty,0]}$ and (15) to $\alpha_{(0,\infty)}$ to infer
$$E\{\alpha_{(0,\infty)}\} = \exp\Bigl\{\sum_{n=1}^{\infty}\frac{1}{n} P[S_n\le 0]\Bigr\} < \infty,$$
proving (18). Next if $E(X)=0$, then $P\{\alpha_{(-\infty,0)}<\infty\}=1$ by Theorem 8.3.4. Hence, applying (14) to $\alpha_{(-\infty,0)}$ and (15) to $\alpha_{[0,\infty)}$, we infer
$$E\{\alpha_{[0,\infty)}\} = \exp\Bigl\{\sum_{n=1}^{\infty}\frac{1}{n} P[S_n<0]\Bigr\} = \infty.$$
Finally, if $E(X)<0$, then $P\{\alpha_{[0,\infty)}=\infty\}>0$ by an argument dual to that given above for $\alpha_{(-\infty,0]}$, and so $E\{\alpha_{[0,\infty)}\}=\infty$ trivially.

Incidentally we have shown that the two r.v.'s $\alpha_{(0,\infty)}$ and $\alpha_{[0,\infty)}$ have both finite or both infinite expectations. Comparing this remark with (15), we derive an analytic by-product as follows.

Corollary. We have

(20) $\displaystyle \sum_{n=1}^{\infty}\frac{1}{n} P[S_n=0] < \infty.$

This can also be shown, purely analytically, by means of Exercise 25 of Sec. 6.4.

The astonishing part of Theorem 8.4.4 is the case when the random walk is recurrent, which is the case if $E(X)=0$ by Theorem 8.3.4. Then the set $[0,\infty)$, which is more than half of the whole range, is revisited an infinite number of times. Nevertheless (19) says that the expected time for even one visit is infinite! This phenomenon becomes more paradoxical if one reflects that the same is true for the other half $(-\infty,0]$, and yet in a single step one of the two halves will certainly be visited. Thus we have:
$$\alpha_{(-\infty,0]}\wedge\alpha_{[0,\infty)} = 1,\qquad E\{\alpha_{(-\infty,0]}\} = E\{\alpha_{[0,\infty)}\} = \infty.$$

Another curious by-product concerning the strong law of large numbers is obtained by combining Theorem 8.4.3 with Theorem 8.2.4.


Theorem 8.4.5. $S_n/n\to m$ a.e. for a finite constant $m$ if and only if for every $\varepsilon>0$ we have

(21) $\displaystyle \sum_{n=1}^{\infty}\frac{1}{n} P\Bigl\{\Bigl|\frac{S_n}{n}-m\Bigr|>\varepsilon\Bigr\} < \infty.$

PROOF. Without loss of generality we may suppose $m=0$. We know from Theorem 5.4.2 that $S_n/n\to 0$ a.e. if and only if $E(|X|)<\infty$ and $E(X)=0$. If this is so, consider the stationary independent process $\{X_n',\,n\in N\}$, where $X_n' = X_n-\varepsilon$, $\varepsilon>0$; and let $S_n'$ and $\alpha'=\alpha'_{(0,\infty)}$ be the corresponding r.v.'s for this modified process. Since $E(X')=-\varepsilon$, it follows from the strong law of large numbers that $S_n'\to-\infty$ a.e., and consequently by Theorem 8.2.4 we have $P\{\alpha'<\infty\}<1$. Hence we have by (14) applied to $\alpha'$:

(22) $\displaystyle \sum_{n=1}^{\infty}\frac{1}{n} P[S_n-n\varepsilon>0] < \infty.$

By considering $X_n+\varepsilon$ instead of $X_n-\varepsilon$, we obtain a similar result with "$S_n-n\varepsilon>0$" in (22) replaced by "$S_n+n\varepsilon<0$". Combining the two, we obtain (21) when $m=0$.

Conversely, if (21) is true with $m=0$, then the argument above yields $P\{\alpha'<\infty\}<1$, and so by Theorem 8.2.4, $P\{\lim_{n\to\infty} S_n'=+\infty\}=0$. A fortiori we have
$$\forall\varepsilon>0:\quad P\{S_n'>n\varepsilon\ \text{i.o.}\} = P\{S_n>2n\varepsilon\ \text{i.o.}\} = 0.$$
Similarly we obtain $\forall\varepsilon>0: P\{S_n<-2n\varepsilon\ \text{i.o.}\}=0$, and the last two relations together mean exactly $S_n/n\to 0$ a.e. (cf. Theorem 4.2.2).

Having investigated the stopping time $\alpha$, we proceed to investigate the stopping place $S_\alpha$, where $\alpha<\infty$. The crucial case will be handled first.

Theorem 8.4.6. If $E(X)=0$ and $0<E(X^2)=\sigma^2<\infty$, then

(23) $\displaystyle E\{S_\alpha\} = \frac{\sigma}{\sqrt 2}\exp\Bigl\{\sum_{n=1}^{\infty}\frac{1}{n}\Bigl(\frac12 - P(S_n\in A)\Bigr)\Bigr\} < \infty.$

PROOF. Observe that $E(X)=0$ implies each of the four r.v.'s $\alpha$ is finite a.e. by Theorem 8.3.4. We now switch from Fourier transform to Laplace transform in (11), and suppose for the sake of definiteness that $A=(0,\infty)$. It is readily verified that Theorem 6.6.5 is applicable, which yields

(24) $\displaystyle 1-E\{r^\alpha e^{-\lambda S_\alpha}\} = \exp\Bigl\{-\sum_{n=1}^{\infty}\frac{r^n}{n}\int_{\{S_n>0\}} e^{-\lambda S_n}\,dP\Bigr\}$


for $0\le r<1$, $0\le\lambda<\infty$. Letting $r\uparrow 1$ in (24), we obtain an expression for the Laplace transform of $S_\alpha$, but we must go further by differentiating (24) with respect to $\lambda$ to obtain

(25) $\displaystyle E\{r^\alpha e^{-\lambda S_\alpha} S_\alpha\} = \Bigl(\sum_{n=1}^{\infty}\frac{r^n}{n}\int_{\{S_n>0\}} S_n e^{-\lambda S_n}\,dP\Bigr)\exp\Bigl\{-\sum_{n=1}^{\infty}\frac{r^n}{n}\int_{\{S_n>0\}} e^{-\lambda S_n}\,dP\Bigr\},$

the justification for termwise differentiation being easy, since $E\{|S_n|\}\le n\,E\{|X|\}$. If we now set $\lambda=0$ in (25), the result is

(26) $\displaystyle E\{r^\alpha S_\alpha\} = \Bigl(\sum_{n=1}^{\infty}\frac{r^n}{n}\int_{\{S_n>0\}} S_n\,dP\Bigr)\exp\Bigl\{-\sum_{n=1}^{\infty}\frac{r^n}{n} P[S_n>0]\Bigr\}.$

By Exercise 2 of Sec. 6.4, we have as $n\to\infty$,
$$\frac{1}{n}\int_{\{S_n>0\}} S_n\,dP \sim \frac{\sigma}{\sqrt{2\pi n}},$$
so that the coefficients in the first power series in (26) are asymptotically equal to $\sigma/\sqrt2$ times those of
$$(1-r)^{-1/2} = \sum_{n=0}^{\infty}\frac{1}{2^{2n}}\binom{2n}{n} r^n,$$
since
$$\frac{1}{2^{2n}}\binom{2n}{n} \sim \frac{1}{\sqrt{\pi n}}.$$
It follows from proposition (D) above, together with $(1-r)^{-1/2}=\exp\{\frac12\sum_{n\ge1} r^n/n\}$, that

(27) $\displaystyle E\{r^\alpha S_\alpha\} \sim \frac{\sigma}{\sqrt2}\exp\Bigl\{\sum_{n=1}^{\infty}\frac{r^n}{n}\Bigl(\frac12 - P(S_n>0)\Bigr)\Bigr\}\qquad (r\uparrow 1).$

It remains to prove that the limit above is finite, for then the limit of the power series is also finite (why? it is precisely here that the Laplace transform saves the day for us), and since the coefficients are $o(1/n)$ by the central


limit theorem, and certainly $O(1/n)$ in any event, proposition (C) above will identify it as the right member of (23) with $A=(0,\infty)$.

Now by analogy with (27), replacing $(0,\infty)$ by $(-\infty,0]$ and writing $\alpha_{(-\infty,0]}$ as $\beta$, we have

(28) $\displaystyle E\{S_\beta\} = -\frac{\sigma}{\sqrt2}\lim_{r\uparrow1}\exp\Bigl\{\sum_{n=1}^{\infty}\frac{r^n}{n}\Bigl(\frac12 - P(S_n\le 0)\Bigr)\Bigr\}.$

Clearly the product of the two exponentials in (27) and (28) is just $\exp 0 = 1$, hence if the limit in (27) were $+\infty$, that in (28) would have to be $0$. But since $E(X)=0$ and $E(X^2)>0$, we have $P(X<0)>0$, which implies at once $P(S_\beta<0)>0$ and consequently $E\{S_\beta\}<0$. This contradiction proves that the limits in (27) and (28) must both be finite, and the theorem is proved.

Theorem 8.4.7. Suppose that $X\not\equiv 0$ and at least one of $E(X^+)$ and $E(X^-)$ is finite; and let $\alpha=\alpha_{(0,\infty)}$, $\beta=\alpha_{(-\infty,0]}$.

(i) If $E(X)>0$ but may be $+\infty$, then $E(S_\alpha) = E(\alpha)\,E(X)$.

(ii) If $E(X)=0$, then $E(S_\alpha)$ and $E(S_\beta)$ are both finite if and only if $E(X^2)<\infty$.

PROOF. The assertion (i) is a consequence of (18) and Wald's equation (Theorem 5.5.3 and Exercise 8 of Sec. 5.5). The "if" part of assertion (ii) has been proved in the preceding theorem; indeed we have even "evaluated" $E(S_\alpha)$. To prove the "only if" part, we apply (11) to both $\alpha$ and $\beta$ in the preceding proof and multiply the results together to obtain the remarkable equation:

(29) $\displaystyle \bigl[1-E\{r^\alpha e^{itS_\alpha}\}\bigr]\bigl[1-E\{r^\beta e^{itS_\beta}\}\bigr] = \exp\Bigl\{-\sum_{n=1}^{\infty}\frac{r^n}{n} f(t)^n\Bigr\} = 1-rf(t).$

Setting $r=1$, we obtain for $t\ne 0$:
$$\frac{1-f(t)}{t^2} = \frac{1-E\{e^{itS_\alpha}\}}{-it}\cdot\frac{1-E\{e^{itS_\beta}\}}{+it}.$$
Letting $t\downarrow 0$, the right member above tends to $E\{S_\alpha\}\,E\{-S_\beta\}$ by Theorem 6.4.2. Hence the left member has a real and finite limit. This implies $E(X^2)<\infty$ by Exercise 1 of Sec. 6.4.

8.5 Continuation

Our next study concerns the r.v.'s $M_n$ and $M$ defined in (12) and (13) of Sec. 8.2. It is convenient to introduce a new r.v. $L$, which is the first time
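Equation (29) at $r=1$ can be verified by hand for the simple symmetric walk, a check of my own that is not part of the text: there $S_\alpha\equiv 1$, while $S_\beta$ equals $-1$ or $0$ each with probability $1/2$ (by recurrence, the walk started from $S_1=1$ returns to $0$ a.s.), and $f(t)=\cos t$; the identity then reads $(1-e^{it})\bigl(1-\tfrac12(1+e^{-it})\bigr)=1-\cos t$. The sketch checks this on a grid of $t$:

```python
import cmath

# Simple symmetric walk: f(t) = cos t, S_alpha = 1 a.s.,
# S_beta = -1 or 0 each with probability 1/2.
for k in range(1, 100):
    t = 0.07 * k
    lhs = (1 - cmath.exp(1j * t)) * (1 - 0.5 * (cmath.exp(-1j * t) + 1))
    rhs = 1 - cmath.cos(t)
    assert abs(lhs - rhs) < 1e-12
print("identity (29) at r = 1 verified")
```

Expanding the left member gives $\tfrac12(1-e^{it})(1-e^{-it})=1-\cos t$ exactly, which is why the numerical residual is at machine precision.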


(necessarily belonging to $N_n^0$) that $M_n$ is attained:

(1) $\displaystyle \forall n\in N^0:\quad L_n(\omega) = \min\{k\in N_n^0: S_k(\omega) = M_n(\omega)\};$

note that $L_0=0$. We shall also use the abbreviation

(2) $\alpha = \alpha_{(0,\infty)},\qquad \beta = \alpha_{(-\infty,0]}$

as in Theorems 8.4.6 and 8.4.7.

For each $n$ consider the special permutation below:

(3) $\displaystyle p_n = \begin{pmatrix} 1, & 2, & \ldots, & n\\ n, & n-1, & \ldots, & 1\end{pmatrix},$

which amounts to a "reversal" of the ordered set of indices in $N_n$. Recalling that $\beta\circ p_n$ is the r.v. whose value at $\omega$ is $\beta(p_n\omega)$, and $S_k\circ p_n = S_n - S_{n-k}$ for $1\le k\le n$, we have
$$\{\beta\circ p_n > n\} = \bigcap_{k=1}^{n}\{S_k\circ p_n > 0\} = \bigcap_{k=1}^{n}\{S_n > S_{n-k}\} = \bigcap_{k=0}^{n-1}\{S_n > S_k\} = \{L_n = n\}.$$
It follows from (8) of Sec. 8.1 that

(4) $\displaystyle \int_{\{\beta>n\}} e^{itS_n}\,dP = \int_{\{\beta\circ p_n>n\}} e^{it(S_n\circ p_n)}\,dP = \int_{\{L_n=n\}} e^{itS_n}\,dP.$

Applying (5) and (12) of Sec. 8.4 to $\beta$ and substituting (4), we obtain

(5) $\displaystyle \sum_{n=0}^{\infty} r^n\int_{\{L_n=n\}} e^{itS_n}\,dP = \exp\Bigl\{+\sum_{n=1}^{\infty}\frac{r^n}{n}\int_{\{S_n>0\}} e^{itS_n}\,dP\Bigr\};$

applying (5) and (12) of Sec. 8.4 to $\alpha$ and substituting the obvious relation $\{\alpha>n\} = \{L_n=0\}$, we obtain

(6) $\displaystyle \sum_{n=0}^{\infty} r^n\int_{\{L_n=0\}} e^{itS_n}\,dP = \exp\Bigl\{+\sum_{n=1}^{\infty}\frac{r^n}{n}\int_{\{S_n\le 0\}} e^{itS_n}\,dP\Bigr\}.$


$M$ is finite a.e. if and only if

(8) $\displaystyle \sum_{n=1}^{\infty} n^{-1} P\{S_n>0\} < \infty,$

in which case we have

(9) $\displaystyle E\{e^{itM}\} = \exp\Bigl\{\sum_{n=1}^{\infty}\frac{1}{n}\bigl[E(e^{itS_n^+})-1\bigr]\Bigr\}.$

PROOF. Observe the basic equation that follows at once from the meaning of $L_n$:

(10) $\{L_n=k\} = \{L_k=k\}\cap\{L_{n-k}\circ\tau^k = 0\},$

where $\tau^k$ is the $k$th iterate of the shift. Since the two events on the right side of (10) are independent, we obtain for each $n\in N^0$, and real $t$ and $u$:

(11) $\displaystyle E\{e^{itM_n} e^{iu(S_n-M_n)}\} = \sum_{k=0}^{n}\int_{\{L_n=k\}} e^{itS_k} e^{iu(S_n-S_k)}\,dP = \sum_{k=0}^{n}\int_{\{L_k=k\}} e^{itS_k}\,dP\int_{\{L_{n-k}=0\}} e^{iuS_{n-k}}\,dP.$

It follows that

(12) $\displaystyle \sum_{n=0}^{\infty} r^n E\{e^{itM_n} e^{iu(S_n-M_n)}\} = \Bigl(\sum_{n=0}^{\infty} r^n\int_{\{L_n=n\}} e^{itS_n}\,dP\Bigr)\Bigl(\sum_{n=0}^{\infty} r^n\int_{\{L_n=0\}} e^{iuS_n}\,dP\Bigr).$

Setting $u=0$ in (12) and using (5) as it stands and (6) with $t=0$, we obtain
$$\sum_{n=0}^{\infty} r^n E\{e^{itM_n}\} = \exp\Bigl\{\sum_{n=1}^{\infty}\frac{r^n}{n}\Bigl(\int_{\{S_n>0\}} e^{itS_n}\,dP + \int_{\{S_n\le0\}} 1\,dP\Bigr)\Bigr\},$$
which reduces to (7).

Next, by Theorem 8.2.4, $M<\infty$ a.e. if and only if $P\{\alpha<\infty\}<1$ or equivalently by (14) of Sec. 8.4 if and only if (8) holds. In this case the convergence theorem for ch.f.'s asserts that
$$E\{e^{itM}\} = \lim_{n\to\infty} E\{e^{itM_n}\},$$


and consequently
$$E\{e^{itM}\} = \lim_{r\uparrow1}(1-r)\sum_{n=0}^{\infty} r^n E\{e^{itM_n}\}
= \lim_{r\uparrow1}\exp\Bigl\{-\sum_{n=1}^{\infty}\frac{r^n}{n}\Bigr\}\exp\Bigl\{\sum_{n=1}^{\infty}\frac{r^n}{n} E(e^{itS_n^+})\Bigr\}
= \lim_{r\uparrow1}\exp\Bigl\{\sum_{n=1}^{\infty}\frac{r^n}{n}\bigl[E(e^{itS_n^+})-1\bigr]\Bigr\},$$
where the first equation is by proposition (B) in Sec. 8.4. Since
$$\sum_{n=1}^{\infty}\frac{1}{n}\bigl|E(e^{itS_n^+})-1\bigr| \le 2\sum_{n=1}^{\infty}\frac{1}{n} P[S_n>0] < \infty$$
by (8), the last-written limit above is equal to the right member of (9), by proposition (B). Theorem 8.5.1 is completely proved.

By switching to Laplace transforms in (7) as done in the proof of Theorem 8.4.6 and using proposition (E) in Sec. 8.4, it is possible to give an "analytic" derivation of the second assertion of Theorem 8.5.1 without recourse to the "probabilistic" result in Theorem 8.2.4. This kind of mathematical gambit should appeal to the curious as well as the obstinate; see Exercise 9 below. Another interesting exercise is to establish the following result:

(13) $\displaystyle E(M_n) = \sum_{k=1}^{n}\frac{1}{k} E(S_k^+)$

by differentiating (7) with respect to $t$. But a neat little formula such as (13) deserves a simpler proof, so here it is.

Proof of (13). Writing
$$M_n = \bigvee_{j=0}^{n} S_j$$
and dropping "$dP$" in the integrals below:
$$E(M_n) = \int_{\{M_n>0\}} M_n = \int_{\{M_n>0;\,S_n>0\}}\bigvee_{k=1}^{n} S_k + \int_{\{M_n>0;\,S_n\le0\}}\bigvee_{k=1}^{n} S_k$$
$$= \int_{\{S_n>0\}} S_1 + \int_{\{S_n>0\}}\Bigl[0\vee\bigvee_{k=2}^{n}(S_k-S_1)\Bigr] + \int_{\{M_{n-1}>0;\,S_n\le0\}} M_{n-1},$$
since on $\{S_n>0\}$ we have $M_n = S_1 + [0\vee\bigvee_{k=2}^{n}(S_k-S_1)]$, while on $\{S_n\le0\}$ we have $M_n = M_{n-1}$.


Call the last three integrals $\int_1$, $\int_2$, and $\int_3$. We have on grounds of symmetry
$$\int_1 = \frac{1}{n}\int_{\{S_n>0\}} S_n.$$
Apply the cyclical permutation
$$\begin{pmatrix} 1, & 2, & \ldots, & n\\ 2, & 3, & \ldots, & 1\end{pmatrix}$$
to $\int_2$ to obtain
$$\int_2 = \int_{\{S_n>0\}} M_{n-1}.$$
Obviously, we have
$$\int_3 = \int_{\{M_{n-1}>0;\,S_n\le0\}} M_{n-1}.$$
Gathering these, we obtain
$$E(M_n) = \frac{1}{n}\int_{\{S_n>0\}} S_n + \int_{\{M_{n-1}>0\}} M_{n-1} = \frac{1}{n} E(S_n^+) + E(M_{n-1}),$$
and (13) follows by induction on $n$.
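Formula (13) can also be confirmed by exhaustive enumeration for a short simple symmetric walk; the sketch below is a check of my own, not part of the text. With steps $\pm1$ each path of length $n$ has probability $2^{-n}$, so both members of (13) are finite sums.

```python
from itertools import product

n = 10
paths = list(product((1, -1), repeat=n))   # all 2**n equally likely paths

# Left member of (13): E(M_n) with M_n = max(0, S_1, ..., S_n)
def running_max(path):
    s, m = 0, 0
    for x in path:
        s += x
        m = max(m, s)
    return m

lhs = sum(running_max(p) for p in paths) / 2**n

# Right member of (13): sum over k of E(S_k^+)/k, read off the same paths
rhs = 0.0
for k in range(1, n + 1):
    E_Sk_plus = sum(max(sum(p[:k]), 0) for p in paths) / 2**n
    rhs += E_Sk_plus / k

print(lhs, rhs)
assert abs(lhs - rhs) < 1e-9
```

The two members agree to floating-point precision, as (13) holds exactly for every step distribution and every $n$.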


Recall the notation $m_n=\min_{0\le k\le n}S_k$, $L_n'=\max\{k\in N_n^0: S_k=m_n\}$ (the last time the minimum is attained), and
$$\nu_n = \sum_{k=1}^{n} 1_{\{S_k>0\}},\qquad \nu_n' = \sum_{k=1}^{n} 1_{\{S_k\le0\}},$$
so that $\nu_n+\nu_n'=n$.

Lemma. For each $n\in N^0$, the two random vectors
$$(L_n, S_n)\quad\text{and}\quad (n-L_n', S_n)$$
have the same distribution.

PROOF. This will follow from (14) if we show that

(15) $\forall k\in N_n^0:\quad p_n^{-1}\{L_n=k\} = \{L_n'=n-k\}.$

Now for $p_n\omega$ the first $n+1$ partial sums $S_j(p_n\omega)$, $j\in N_n^0$, are
$$0,\ \omega_n,\ \omega_n+\omega_{n-1},\ \ldots,\ \omega_n+\cdots+\omega_{n-j+1},\ \ldots,\ \omega_n+\cdots+\omega_1,$$
which is the same as
$$S_n-S_n,\ S_n-S_{n-1},\ S_n-S_{n-2},\ \ldots,\ S_n-S_{n-j},\ \ldots,\ S_n-S_0,$$
from which (15) follows by inspection.

Theorem 8.5.2. For each $n\in N^0$, the random vectors

(16) $(L_n, S_n)$ and $(\nu_n, S_n)$

have the same distribution; and the random vectors

(16′) $(L_n', S_n)$ and $(\nu_n', S_n)$

have the same distribution.

PROOF. For $n=0$ there is nothing to prove; for $n=1$, the assertion about (16) is trivially true since $\{L_1=0\} = \{S_1\le0\} = \{\nu_1=0\}$; similarly for (16′). We shall prove the general case by simultaneous induction, supposing that both assertions have been proved when $n$ is replaced by $n-1$. For each $k\in N_{n-1}^0$ and $y\in\mathscr R^1$, let us put
$$G(y) = P\{L_{n-1}=k;\,S_{n-1}\le y\},\qquad H(y) = P\{\nu_{n-1}=k;\,S_{n-1}\le y\}.$$
Then the induction hypothesis implies that $G=H$. Since $X_n$ is independent of $\mathscr F_{n-1}$ and so of the vector $(L_{n-1}, S_{n-1})$, we have for each $x\in\mathscr R^1$:

(17) $\displaystyle P\{L_{n-1}=k;\,S_n\le x\} = \int_{-\infty}^{\infty} F(x-y)\,dG(y) = \int_{-\infty}^{\infty} F(x-y)\,dH(y) = P\{\nu_{n-1}=k;\,S_n\le x\},$


where $F$ is the common d.f. of each $X_n$. Now observe that on the set $\{S_n\le0\}$ we have $L_n=L_{n-1}$ by definition; hence if $x\le0$:

(18) $\{\omega: L_n(\omega)=k;\,S_n(\omega)\le x\} = \{\omega: L_{n-1}(\omega)=k;\,S_n(\omega)\le x\}.$

On the other hand, on the set $\{S_n\le0\}$ we have $\nu_{n-1}=\nu_n$, so that if $x\le0$,

(19) $\{\omega: \nu_n(\omega)=k;\,S_n(\omega)\le x\} = \{\omega: \nu_{n-1}(\omega)=k;\,S_n(\omega)\le x\}.$

Combining (17) with (19), we obtain

(20) $\forall k\in N_n^0,\ x\le0:\quad P\{L_n=k;\,S_n\le x\} = P\{\nu_n=k;\,S_n\le x\}.$

Next if $k\in N_n^0$ and $x>0$, then by similar arguments:
$$\{\omega: L_n'(\omega)=n-k;\,S_n(\omega)>x\} = \{\omega: L_{n-1}'(\omega)=n-k;\,S_n(\omega)>x\},$$
$$\{\omega: \nu_n'(\omega)=n-k;\,S_n(\omega)>x\} = \{\omega: \nu_{n-1}'(\omega)=n-k;\,S_n(\omega)>x\}.$$
Using (16′) when $n$ is replaced by $n-1$, we obtain the analogue of (20):

(21) $\forall k\in N_n^0,\ x>0:\quad P\{L_n'=n-k;\,S_n>x\} = P\{\nu_n'=n-k;\,S_n>x\}.$

The left members in (21) and (22) below are equal by the lemma above, while the right members are trivially equal since $\nu_n+\nu_n'=n$:

(22) $\forall k\in N_n^0,\ x>0:\quad P\{L_n=k;\,S_n>x\} = P\{\nu_n=k;\,S_n>x\}.$

Combining (20) and (22) for $x=0$, we have

(23) $\forall k\in N_n^0:\quad P\{L_n=k\} = P\{\nu_n=k\};$

subtracting (22) from (23), we obtain the equation in (20) for $x>0$; hence it is true for every $x$, proving the assertion about (16). Similarly for (16′), and the induction is complete.

As an immediate consequence of Theorem 8.5.2, the obvious relation (10) is translated into the by-no-means obvious relation (24) below:

Theorem 8.5.3. We have for $k\in N_n^0$

(24) $P\{\nu_n=k\} = P\{\nu_k=k\}\,P\{\nu_{n-k}=0\}.$

If the common distribution of each $X_n$ is symmetric with no atom at zero, then

(25) $\displaystyle \forall k\in N_n^0:\quad P\{\nu_n=k\} = (-1)^n\binom{-\frac12}{k}\binom{-\frac12}{n-k}.$


PROOF. Let us denote the number on the right side of (25), which is equal to
$$\frac{1}{2^{2n}}\binom{2k}{k}\binom{2n-2k}{n-k},$$
by $a_n(k)$. Then for each $n\in N$, $\{a_n(k),\,k\in N_n^0\}$ is a well-known probability distribution. For $n=1$ we have
$$P\{\nu_1=0\} = P\{\nu_1=1\} = \tfrac12 = a_1(0) = a_1(1),$$
so that (25) holds trivially. Suppose now that it holds when $n$ is replaced by $n-1$; then for $k\in N_{n-1}$ we have by (24):
$$P\{\nu_n=k\} = (-1)^k\binom{-\frac12}{k}\,(-1)^{n-k}\binom{-\frac12}{n-k} = a_n(k).$$
It follows that
$$P\{\nu_n=0\} + P\{\nu_n=n\} = 1 - \sum_{k=1}^{n-1} P\{\nu_n=k\} = 1 - \sum_{k=1}^{n-1} a_n(k) = a_n(0) + a_n(n).$$
Under the hypotheses of the theorem, it is clear by considering the dual random walk that the two terms in the first member above are equal; since the two terms in the last member are obviously equal, they are all equal and the theorem is proved.

Stirling's formula and elementary calculus now lead to the famous "arcsin law", first discovered by Paul Lévy (1939) for Brownian motion.

Theorem 8.5.4. If the common distribution of a stationary independent process is symmetric, then we have
$$\forall x\in[0,1]:\quad \lim_{n\to\infty} P\Bigl\{\frac{\nu_n}{n}\le x\Bigr\} = \frac{2}{\pi}\arcsin\sqrt{x} = \frac{1}{\pi}\int_0^x\frac{du}{\sqrt{u(1-u)}}.$$

This limit theorem also holds for an independent, not necessarily stationary process, in which each $X_n$ has mean $0$ and variance $1$ and such that the classical central limit theorem is applicable. This can be proved by the same method (invariance principle) as Theorem 7.3.3.
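Since (25) gives the exact distribution of $\nu_n$, the arcsine limit can be checked deterministically; the sketch below is a numerical illustration of my own, not part of the text. It uses the ratio $a_n(k+1)/a_n(k) = (2k+1)(n-k)/\bigl((k+1)(2n-2k-1)\bigr)$, which follows from the binomial-coefficient form of $a_n(k)$, to accumulate the discrete distribution function and compare it with $(2/\pi)\arcsin\sqrt{x}$.

```python
import math

n, x = 500, 0.25

# a_n(0) = C(2n, n)/2^{2n}, built as a product to avoid huge integers
a = 1.0
for j in range(1, n + 1):
    a *= (2 * j - 1) / (2 * j)

# accumulate P(nu_n <= x*n) via the ratio of consecutive terms of (25)
cdf, k = 0.0, 0
while k <= int(x * n):
    cdf += a
    a *= (2 * k + 1) * (n - k) / ((k + 1) * (2 * n - 2 * k - 1))
    k += 1

limit = (2 / math.pi) * math.asin(math.sqrt(x))
print(cdf, limit)   # both near 1/3
```

At $n=500$ the discrete distribution function already agrees with the arcsine limit to within a few thousandths at interior points such as $x=1/4$.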


EXERCISES

1. Derive (3) of Sec. 8.4 by considering
$$E\{r^\alpha e^{itS_\alpha}\} = \sum_{n=1}^{\infty} r^n\Bigl[\int_{\{\alpha>n-1\}} e^{itS_n}\,dP - \int_{\{\alpha>n\}} e^{itS_n}\,dP\Bigr].$$

*2. Under the conditions of Theorem 8.4.6, show that
$$E\{S_\alpha\} = -\lim_{n\to\infty}\int_{\{\alpha>n\}} S_n\,dP.$$

3. Find an expression for the Laplace transform of $S_\alpha$. Is the corresponding formula for the Fourier transform valid?

*4. Prove (13) of Sec. 8.5 by differentiating (7) there; justify the steps.

5. Prove that
$$\sum_{n=0}^{\infty} r^n P[M_n=0] = \exp\Bigl\{\sum_{n=1}^{\infty}\frac{r^n}{n} P[S_n\le0]\Bigr\}$$
and deduce that
$$P[M=0] = \exp\Bigl\{-\sum_{n=1}^{\infty}\frac{1}{n} P[S_n>0]\Bigr\}.$$

6. If $M<\infty$ a.e., then it has an infinitely divisible distribution.

7. Prove that
$$\sum_{n=0}^{\infty} r^n P\{L_n=0;\,S_n=0\} = \exp\Bigl\{\sum_{n=1}^{\infty}\frac{r^n}{n} P[S_n=0]\Bigr\}.$$
[HINT: One way to deduce this is to switch to Laplace transforms in (6) of Sec. 8.5 and let $\lambda\to\infty$.]

8. Prove that the left member of the equation in Exercise 7 is equal to
$$\Bigl[1-\sum_{n=1}^{\infty} r^n P\{\alpha'=n;\,S_n=0\}\Bigr]^{-1},$$
where $\alpha'=\alpha_{[0,\infty)}$; hence prove its convergence.

*9. Prove that
$$\sum_{n=1}^{\infty}\frac{1}{n} P[S_n>0] < \infty$$


implies $P(M<\infty)=1$ as follows. From the Laplace transform version of (7) of Sec. 8.5, show that for $\lambda>0$,
$$\lim_{r\uparrow1}(1-r)\sum_{n=0}^{\infty} r^n E\{e^{-\lambda M_n}\}$$
exists and is finite, say $=\chi(\lambda)$. Now use proposition (E) in Sec. 8.4 and apply the convergence theorem for Laplace transforms (Theorem 6.6.3).

*10. Define a sequence of r.v.'s $\{Y_n,\,n\in N^0\}$ as follows:
$$Y_0 = 0,\qquad Y_{n+1} = (Y_n + X_{n+1})^+,\quad n\in N^0,$$
where $\{X_n,\,n\in N\}$ is a stationary independent sequence. Prove that for each $n$, $Y_n$ and $M_n$ have the same distribution. [This approach is useful in queuing theory.]

11. If $-\infty < E(X) < 0$, then $\dots$ [HINT: If $V_n = \max_{1\le j\le n} X_j$, then $\dots$]

12. $\dots$ [HINT: Consider $\lim_{r\uparrow1}\sum_{n=0}^{\infty} P\{\nu_n=n\}\,r^n$ and use (24) of Sec. 8.5.]

13. If $E(X)=0$, $E(X^2)=\sigma^2$, $0<\sigma^2<\infty$, then as $n\to\infty$ we have
$$P[\nu_n=0] \sim \frac{e^{c}}{\sqrt{\pi n}},\qquad P[\nu_n=n] \sim \frac{e^{-c}}{\sqrt{\pi n}},$$
where
$$c = \sum_{n=1}^{\infty}\frac{1}{n}\Bigl\{\frac12 - P[S_n>0]\Bigr\}.$$
[HINT: Consider
$$\lim_{r\uparrow1}(1-r)^{1/2}\sum_{n=0}^{\infty} P[\nu_n=0]\,r^n$$


as in the proof of Theorem 8.4.6, and use the following lemma: if $p_n$ is a decreasing sequence of positive numbers such that
$$\sum_{k=1}^{n} p_k \sim 2n^{1/2},$$
then $p_n\sim n^{-1/2}$.]

*14. Prove Theorem 8.5.4.

15. For an arbitrary random walk, we have
$$\sum_{n=1}^{\infty}\frac{(-1)^n}{n}\,P\{S_n>0\}\ \text{converges}.$$
[HINT: Half of the result is given in Exercise 18 of Sec. 8.3. For the remaining case, apply proposition (C) in the $O$-form to equation (5) of Sec. 8.5 with $L_n$ replaced by $\nu_n$ and $t=0$. This result is due to D. L. Hanson and M. Katz.]

Bibliographical Note

For Sec. 8.3, see

K. L. Chung and W. H. J. Fuchs, On the distribution of values of sums of random variables, Mem. Am. Math. Soc. 6 (1951), 1-12.

K. L. Chung and D. Ornstein, On the recurrence of sums of random variables, Bull. Am. Math. Soc. 68 (1962), 30-32.

Theorems 8.4.1 and 8.4.2 are due to G. Baxter; see

G. Baxter, An analytical approach to finite fluctuation problems in probability, J. d'Analyse Math. 9 (1961), 31-70.

Historically, recurrence problems for Bernoullian random walks were first discussed by Pólya in 1921.

The rest of Sec. 8.4, as well as Theorem 8.5.1, is due largely to F. L. Spitzer; see

F. L. Spitzer, A combinatorial lemma and its application to probability theory, Trans. Am. Math. Soc. 82 (1956), 323-339.

F. L. Spitzer, A Tauberian theorem and its probability interpretation, Trans. Am. Math. Soc. 94 (1960), 150-169.

See also his book below, which, however, treats only the lattice case:

F. L. Spitzer, Principles of Random Walk. D. Van Nostrand Co., Inc., Princeton, N.J., 1964.


The latter part of Sec. 8.5 is based on

W. Feller, On combinatorial methods in fluctuation theory, Surveys in Probability and Statistics (the Harald Cramér volume), 75-91. Almqvist & Wiksell, Stockholm, 1959.

Theorem 8.5.3 is due to E. S. Andersen. Feller [13] also contains much material on random walks.


9 Conditioning. Markov property. Martingale

9.1 Basic properties of conditional expectation

If $\Lambda$ is any set in $\mathscr F$ with $P(\Lambda)>0$, we define $P_\Lambda(\cdot)$ on $\mathscr F$ as follows:

(1) $\displaystyle P_\Lambda(E) = \frac{P(\Lambda\cap E)}{P(\Lambda)}.$

Clearly $P_\Lambda$ is a p.m. on $\mathscr F$; it is called the "conditional probability relative to $\Lambda$". The integral with respect to this p.m. is called the "conditional expectation relative to $\Lambda$":

(2) $\displaystyle E_\Lambda(Y) = \int_\Omega Y(\omega)\,P_\Lambda(d\omega) = \frac{1}{P(\Lambda)}\int_\Lambda Y(\omega)\,P(d\omega).$

If $P(\Lambda)=0$, we decree that $P_\Lambda(E)=0$ for every $E\in\mathscr F$. This convention is expedient as in (3) and (4) below.

Let now $\{\Lambda_n,\,n\ge1\}$ be a countable measurable partition of $\Omega$, namely:
$$\Omega = \bigcup_{n=1}^{\infty}\Lambda_n,\qquad \Lambda_n\in\mathscr F,\qquad \Lambda_m\cap\Lambda_n = \varnothing\ \text{if}\ m\ne n.$$


Then we have

(3) $\displaystyle P(E) = \sum_{n=1}^{\infty} P(\Lambda_n\cap E) = \sum_{n=1}^{\infty} P(\Lambda_n)\,P_{\Lambda_n}(E);$

(4) $\displaystyle E(Y) = \sum_{n=1}^{\infty}\int_{\Lambda_n} Y(\omega)\,P(d\omega) = \sum_{n=1}^{\infty} P(\Lambda_n)\,E_{\Lambda_n}(Y);$

provided that $E(Y)$ is defined. We have already used such decompositions before, for example in the proof of Kolmogorov's inequality (Theorem 5.3.1):
$$\int_\Lambda S_n^2\,dP = \sum_{k=1}^{n} P(\Lambda_k)\,E_{\Lambda_k}(S_n^2).$$
Another example is in the proof of Wald's equation (Theorem 5.5.3), where
$$E(S_N) = \sum_{k=1}^{\infty} P(N=k)\,E_{\{N=k\}}(S_N).$$
Thus the notion of conditioning gives rise to a decomposition when a given event or r.v. is considered on various parts of the sample space, on each of which some particular information may be obtained.

Let us, however, reflect for a few moments on the even more elementary example below. From a pack of 52 playing cards one card is drawn and seen to be a spade. What is the probability that a second card drawn from the remaining deck will also be a spade? Since there are 51 cards left, among which are 12 spades, it is clear that the required probability is 12/51. But is this the conditional probability $P_\Lambda(E)$ defined above, where $\Lambda$ = "first card is a spade" and $E$ = "second card is a spade"? According to the definition,
$$P_\Lambda(E) = \frac{P(\Lambda\cap E)}{P(\Lambda)} = \frac{\dfrac{13\cdot12}{52\cdot51}}{\dfrac{13}{52}} = \frac{12}{51},$$
where the denominator and numerator have been separately evaluated by elementary combinatorial formulas. Hence the answer to the question above is indeed "yes"; but this verification would be futile if we did not have another way to evaluate $P_\Lambda(E)$ as first indicated. Indeed, conditional probability is often used to evaluate a joint probability by turning the formula (1) around as follows:
$$P(\Lambda\cap E) = P(\Lambda)\,P_\Lambda(E) = \frac{13}{52}\cdot\frac{12}{51}.$$
In general, it is used to reduce the calculation of a probability or expectation to a modified one which can be handled more easily.
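The card computation is easy to replay by brute force over ordered pairs; the sketch below is a check of my own (not part of the text), using exact rational arithmetic so that the three quantities $P(\Lambda)$, $P(\Lambda\cap E)$, and $P_\Lambda(E)$ come out as the fractions in the display above.

```python
from fractions import Fraction
from itertools import permutations

deck = range(52)
is_spade = lambda c: c < 13        # label cards 0..12 as the spades

pairs = list(permutations(deck, 2))               # ordered draws without replacement
p_A   = Fraction(sum(is_spade(a) for a, b in pairs), len(pairs))
p_AE  = Fraction(sum(is_spade(a) and is_spade(b) for a, b in pairs), len(pairs))

assert p_A  == Fraction(13, 52)
assert p_AE == Fraction(13 * 12, 52 * 51)
assert p_AE / p_A == Fraction(12, 51)             # P_Lambda(E), as in the text
print(p_A, p_AE, p_AE / p_A)
```

The last assertion is exactly formula (1) turned around, as in the multiplication $P(\Lambda\cap E)=P(\Lambda)P_\Lambda(E)$.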


Let $\mathscr G$ be the Borel field generated by a countable partition $\{\Lambda_n\}$, for example that by a discrete r.v. $X$, where $\Lambda_n=\{X=a_n\}$. Given an integrable r.v. $Y$, we define the function $E_{\mathscr G}(Y)$ on $\Omega$ by:

(5) $\displaystyle E_{\mathscr G}(Y) = \sum_n E_{\Lambda_n}(Y)\,1_{\Lambda_n}(\cdot).$

Thus $E_{\mathscr G}(Y)$ is a discrete r.v. that assumes the value $E_{\Lambda_n}(Y)$ on the set $\Lambda_n$, for each $n$. Now we can rewrite (4) as follows:
$$E(Y) = \sum_n\int_{\Lambda_n} E_{\mathscr G}(Y)\,dP = \int_\Omega E_{\mathscr G}(Y)\,dP.$$
Furthermore, for any $\Lambda\in\mathscr G$, $\Lambda$ is the union of a subcollection of the $\Lambda_n$'s (see Exercise 9 of Sec. 2.1), and the same manipulation yields

(6) $\displaystyle \forall\Lambda\in\mathscr G:\quad \int_\Lambda Y\,dP = \int_\Lambda E_{\mathscr G}(Y)\,dP.$

In particular, this shows that $E_{\mathscr G}(Y)$ is integrable. Formula (6) equates two integrals over the same set with an essential difference: while the integrand $Y$ on the left belongs to $\mathscr F$, the integrand $E_{\mathscr G}(Y)$ on the right belongs to the subfield $\mathscr G$. [The fact that $E_{\mathscr G}(Y)$ is discrete is incidental to the nature of $\mathscr G$.] It holds for every $\Lambda$ in the subfield $\mathscr G$, but not necessarily for a set in $\mathscr F\setminus\mathscr G$. Now suppose that there are two functions $\varphi_1$ and $\varphi_2$, both belonging to $\mathscr G$, such that
$$\forall\Lambda\in\mathscr G:\quad \int_\Lambda Y\,dP = \int_\Lambda\varphi_i\,dP,\quad i=1,2.$$
Let $\Lambda=\{\omega: \varphi_1(\omega)>\varphi_2(\omega)\}$, then $\Lambda\in\mathscr G$ and so
$$\int_\Lambda(\varphi_1-\varphi_2)\,dP = 0.$$
Hence $P(\Lambda)=0$; interchanging $\varphi_1$ and $\varphi_2$ above we conclude that $\varphi_1=\varphi_2$ a.e. We have therefore proved that the $E_{\mathscr G}(Y)$ in (6) is unique up to an equivalence. Let us agree to use $E_{\mathscr G}(Y)$ or $E(Y\mid\mathscr G)$ to denote the corresponding equivalence class, and let us call any particular member of the class a "version" of the conditional expectation.

The results above are valid for an arbitrary Borel subfield $\mathscr G$ and will be stated in the theorem below.

Theorem 9.1.1. If $E(|Y|)<\infty$ and $\mathscr G$ is a Borel subfield of $\mathscr F$, then there exists a unique equivalence class of integrable r.v.'s $E(Y\mid\mathscr G)$ belonging to $\mathscr G$ such that (6) holds.


PROOF. Consider the set function $\nu$ on $\mathscr G$:
$$\forall\Lambda\in\mathscr G:\quad \nu(\Lambda) = \int_\Lambda Y\,dP.$$
It is finite-valued and countably additive, hence a "signed measure" on $\mathscr G$. Since it vanishes on every $\Lambda\in\mathscr G$ with $P(\Lambda)=0$, it is absolutely continuous with respect to the restriction of $P$ to $\mathscr G$, and the Radon-Nikodym theorem furnishes an integrable r.v. belonging to $\mathscr G$, unique up to equivalence, whose integral over each $\Lambda\in\mathscr G$ is $\nu(\Lambda)$; this is exactly the requirement (6) on $E(Y\mid\mathscr G)$.


where "$Y''\perp\mathscr G$" means that $E(Y''Z)=0$ for every bounded $Z\in\mathscr G$. In the language of Banach space, $Y'$ is the "projection" of $Y$ on $\mathscr G$ and $Y''$ its "orthogonal complement".

For the Borel field $\mathscr F\{X\}$ generated by the r.v. $X$, we write also $E(Y\mid X)$ for $E(Y\mid\mathscr F\{X\})$; similarly for $E(Y\mid X_1,\ldots,X_n)$. The next theorem clarifies certain useful connections.

Theorem 9.1.2. One version of the conditional expectation $E(Y\mid X)$ is given by $\varphi(X)$, where $\varphi$ is a Borel measurable function on $\mathscr R^1$. Furthermore, if we define the signed measure $\lambda$ on $\mathscr B^1$ by
$$\forall B\in\mathscr B^1:\quad \lambda(B) = \int_{X^{-1}(B)} Y\,dP,$$
and the p.m. of $X$ by $\mu$, then $\varphi$ is one version of the Radon-Nikodym derivative $d\lambda/d\mu$.

PROOF. The first assertion of the theorem is a particular case of the following lemma.

Lemma. If $Z\in\mathscr F\{X\}$, then $Z=\varphi(X)$ for some extended-valued Borel measurable function $\varphi$.

PROOF OF THE LEMMA. It is sufficient to prove this for a bounded positive $Z$ (why?). Then there exists a sequence of simple functions $Z_m$ which increases to $Z$ everywhere, and each $Z_m$ is of the form
$$\sum_{j=1}^{\ell} c_j 1_{\Lambda_j},$$
where $\Lambda_j\in\mathscr F\{X\}$. Hence $\Lambda_j = X^{-1}(B_j)$ for some $B_j\in\mathscr B^1$ (see Exercise 11 of Sec. 3.1). Thus if we take
$$\varphi_m = \sum_{j=1}^{\ell} c_j 1_{B_j},$$
we have $Z_m = \varphi_m(X)$. Since $\varphi_m(X)\to Z$, it follows that $\varphi_m$ converges on the range of $X$. But this range need not be Borel or even Lebesgue measurable (Exercise 6 of Sec. 3.1). To overcome this nuisance, we put
$$\forall x\in\mathscr R^1:\quad \varphi(x) = \limsup_{m\to\infty}\varphi_m(x).$$
Then $Z = \lim_m\varphi_m(X) = \varphi(X)$, and $\varphi$ is Borel measurable, proving the lemma.

To prove the second assertion: given any $B\in\mathscr B^1$, let $\Lambda = X^{-1}(B)$; then by Theorem 3.2.2 we have
$$\int_\Lambda E(Y\mid X)\,dP = \int_\Omega 1_B(X)\varphi(X)\,dP = \int_{\mathscr R^1} 1_B(x)\varphi(x)\,d\mu = \int_B\varphi(x)\,d\mu.$$


Hence by (6),
$$\lambda(B) = \int_\Lambda Y\,dP = \int_B\varphi(x)\,d\mu.$$
This being true for every $B$ in $\mathscr B^1$, it follows that $\varphi$ is a version of the derivative $d\lambda/d\mu$. Theorem 9.1.2 is proved.

As a consequence of the theorem, the function $E(Y\mid X)$ of $\omega$ is constant a.e. on each set on which $X(\omega)$ is constant. By an abuse of notation, the $\varphi(x)$ above is sometimes written as $E(Y\mid X=x)$. We may then write, for example, for each real $c$:
$$\int_{\{X\le c\}} Y\,dP = \int_{(-\infty,c]} E(Y\mid X=x)\,\mu(dx).$$
Generalization to a finite number of $X$'s is straightforward. Thus one version of $E(Y\mid X_1,\ldots,X_n)$ is $\varphi(X_1,\ldots,X_n)$, where $\varphi$ is an $n$-dimensional Borel measurable function, and by $E(Y\mid X_1=x_1,\ldots,X_n=x_n)$ is meant $\varphi(x_1,\ldots,x_n)$.

It is worthwhile to point out the extreme cases of $E(Y\mid\mathscr G)$:
$$E(Y\mid\mathscr J) = E(Y),\qquad E(Y\mid\mathscr F) = Y;\quad \text{a.e.},$$
where $\mathscr J$ is the trivial field $\{\varnothing,\Omega\}$. If $\mathscr G$ is the field generated by one set $\Lambda$: $\{\varnothing,\Lambda,\Lambda^c,\Omega\}$, then $E(Y\mid\mathscr G)$ is equal to $E(Y\mid\Lambda)$ on $\Lambda$ and $E(Y\mid\Lambda^c)$ on $\Lambda^c$. All these equations, as hereafter, are between equivalent classes of r.v.'s.

We shall suppose the pair $(\mathscr F, P)$ to be complete and each Borel subfield of $\mathscr F$ to be augmented (see Exercise 20 of Sec. 2.2). But even if $\mathscr G$ is not augmented and $\tilde{\mathscr G}$ is its augmentation, it follows from the definition that $E(Y\mid\mathscr G) = E(Y\mid\tilde{\mathscr G})$, since an r.v. belonging to $\tilde{\mathscr G}$ is equal to one belonging to $\mathscr G$ almost everywhere (why?). Finally, if $\mathscr F_0$ is a field generating $\mathscr G$, or just a collection of sets whose finite disjoint unions form such a field, then the validity of (6) for each $\Lambda$ in $\mathscr F_0$ is sufficient for (6) as it stands. This follows easily from Theorem 2.2.3.

The next result is basic.

Theorem 9.1.3. Let $Y$ and $YZ$ be integrable r.v.'s and $Z\in\mathscr G$; then we have

(8) $E(YZ\mid\mathscr G) = Z\,E(Y\mid\mathscr G)$ a.e.

[Here "a.e." is necessary, since we have not stipulated to regard $Z$ as an equivalence class of r.v.'s, although conditional expectations are so regarded by definition.
Nevertheless we shall sometimes omit such obvious "a .e .'s"from now on .]
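The defining relation (6) and Theorem 9.1.3 can be checked concretely on a finite sample space. The following sketch is not from the text: it is an invented example in which the Borel subfield G is generated by a finite partition, so that E(Y | G) is simply the average of Y over each cell.

```python
# Sketch (invented example): E(Y | G) for G generated by a finite partition
# {A_1, ..., A_k} of a finite sample space Omega carrying a p.m. P.
# On each cell A_j the conditional expectation equals the average of Y.

def cond_exp(P, Y, partition):
    """Return E(Y | G) as a dict omega -> value, G generated by `partition`."""
    E = {}
    for A in partition:
        pA = sum(P[w] for w in A)
        avg = sum(P[w] * Y[w] for w in A) / pA
        for w in A:
            E[w] = avg
    return E

# Omega = {0,...,5} with uniform P; partition into evens and odds.
Omega = range(6)
P = {w: 1 / 6 for w in Omega}
Y = {w: float(w) for w in Omega}
partition = [[0, 2, 4], [1, 3, 5]]
EY = cond_exp(P, Y, partition)

# Defining relation (6): the integral of E(Y|G) over each A in G equals
# the integral of Y over A.
for A in partition:
    lhs = sum(P[w] * EY[w] for w in A)
    rhs = sum(P[w] * Y[w] for w in A)
    assert abs(lhs - rhs) < 1e-12

# Theorem 9.1.3: E(YZ | G) = Z * E(Y | G) for Z in G, i.e. for Z constant
# on each cell of the partition.
Z = {w: (2.0 if w % 2 == 0 else -1.0) for w in Omega}
EYZ = cond_exp(P, {w: Y[w] * Z[w] for w in Omega}, partition)
for w in Omega:
    assert abs(EYZ[w] - Z[w] * EY[w]) < 1e-12
```

In this finite setting the "a.e." qualifications are vacuous; the example only illustrates the algebraic shape of the defining relation, not the measure-theoretic subtleties discussed in the text.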


316 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

PROOF. As usual we may suppose Y ≥ 0, Z ≥ 0 (see property (ii) below). The proof consists in observing that the right member of (8) belongs to G and satisfies the defining relation for the left member, namely:

(9)    ∀Λ ∈ G: ∫_Λ Z E(Y | G) dP = ∫_Λ ZY dP.

For (9) is true if Z = 1_Λ, where Λ ∈ G, hence it is true if Z is a simple r.v. belonging to G and consequently also for each Z in G by monotone convergence, whether the limits are finite or positive infinite. Note that the integrability of Z E(Y | G) is part of the assertion of the theorem.

Recall that when G is generated by a partition {Λ_n}, we have exhibited a specific version (5) of E(Y | G). Now consider the corresponding P(M | G) as a function of the pair (M, ω):

    P(M | G)(ω) = Σ_n P(M | Λ_n) 1_{Λ_n}(ω).

For each fixed M, as a function of ω this is a specific version of P(M | G). For each fixed ω_0, as a function of M this is a p.m. on F given by P{· | Λ_n} for ω_0 ∈ Λ_n. Let us denote for a moment the family of p.m.'s arising in this manner by C(ω_0, ·). We have then for each integrable r.v. Y and each ω_0 in Ω:

(10)    E(Y | G)(ω_0) = Σ_n E(Y | Λ_n) 1_{Λ_n}(ω_0) = ∫_Ω Y(ω) C(ω_0, dω).

Thus the specific version of E(Y | G) may be evaluated, at each ω_0 ∈ Ω, by integrating Y with respect to the p.m. C(ω_0, ·). In this case the conditional expectation E(· | G) as a functional on integrable r.v.'s is an integral in the literal sense. But in general such a representation is impossible (see Doob [16, Sec. 1.9]) and we must fall back on the defining relations to deduce its properties, usually from the unconditional analogues. Below are some of the simplest examples, in which X and X_n are integrable r.v.'s.

(i) If X ∈ G, then E(X | G) = X a.e.; this is true in particular if X is a constant a.e.
(ii) E(X_1 + X_2 | G) = E(X_1 | G) + E(X_2 | G).
(iii) If X_1 ≤ X_2, then E(X_1 | G) ≤ E(X_2 | G).
(iv) |E(X | G)| ≤ E(|X| | G).
(v) If X_n ≥ 0 and X_n ↑ X, then E(X_n | G) ↑ E(X | G) a.e.


9.1 BASIC PROPERTIES OF CONDITIONAL EXPECTATION | 317

(vii) If |X_n| ≤ Y where E(Y) < ∞ and X_n → X, then E(X_n | G) → E(X | G).

To illustrate, (iii) is proved by observing that for each Λ ∈ G:

    ∫_Λ E(X_1 | G) dP = ∫_Λ X_1 dP ≤ ∫_Λ X_2 dP = ∫_Λ E(X_2 | G) dP,

so that if Λ = {E(X_1 | G) > E(X_2 | G)}, we have P(Λ) = 0. The inequality (iv) may be proved by (ii), (iii), and the equation X = X^+ - X^-. To prove (v), let the limit of E(X_n | G) be Z, which exists a.e. by (iii). Then for each Λ ∈ G, we have by the monotone convergence theorem:

    ∫_Λ Z dP = lim_n ∫_Λ E(X_n | G) dP = lim_n ∫_Λ X_n dP = ∫_Λ X dP.

Thus Z satisfies the defining relation for E(X | G), and it belongs to G with the E(X_n | G)'s, hence Z = E(X | G).

To appreciate the caution that is necessary in handling conditional expectations, let us consider the Cauchy–Schwarz inequality:

    E(|XY| | G)^2 ≤ E(X^2 | G) E(Y^2 | G).

If we try to extend one of the usual proofs based on the positiveness of the quadratic form in λ: E((X + λY)^2 | G), the question arises that for each λ the quantity is defined only up to a null set N_λ, and the union of these over all λ cannot be ignored without comment. The reader is advised to think this difficulty through to its logical end, and then get out of it by restricting the λ's to the rationals. Here is another way out: start from the following trivial inequality:

    |X||Y| / (αβ) ≤ X^2 / (2α^2) + Y^2 / (2β^2),

where α = E(X^2 | G)^{1/2}, β = E(Y^2 | G)^{1/2}, and αβ > 0; apply the operation E{· | G} using (ii) and (iii) above to obtain

    E{|XY| | G} / (αβ) ≤ (1/2) E{X^2 / α^2 | G} + (1/2) E{Y^2 / β^2 | G}.

Now use Theorem 9.1.3 to infer that this can be reduced to

    E{|XY| | G} / (αβ) ≤ α^2 / (2α^2) + β^2 / (2β^2) = 1,

the desired inequality.

The following theorem is a generalization of Jensen's inequality in Sec. 3.2.


318 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

Theorem 9.1.4. If φ is a convex function on R^1 and X and φ(X) are integrable r.v.'s, then for each G:

(11)    φ(E(X | G)) ≤ E(φ(X) | G).

PROOF. If X is a simple r.v. taking the values {y_j} on the sets {Λ_j}, 1 ≤ j ≤ n,


9.1 BASIC PROPERTIES OF CONDITIONAL EXPECTATION | 319

where φ′ is the right-hand derivative of φ. Hence

    φ(X) - φ(E(X | G)) ≥ φ′(E(X | G)){X - E(X | G)}.

The right member may not be integrable; but let Λ = {ω: |E(X | G)| ≤ A} for A > 0. Replace X by X 1_Λ in the above, take expectations of both sides, and let A ↑ ∞. Observe that

    E{φ(X 1_Λ) | G} = E{φ(X) 1_Λ + φ(0) 1_{Λ^c} | G} = E{φ(X) | G} 1_Λ + φ(0) 1_{Λ^c}.

We now come to the most important property of conditional expectation relating to changing fields. Note that when Λ = Ω, the defining relation (6) may be written as

    E(E(Y | G)) = E(Y).

This has an immediate generalization.

Theorem 9.1.5. If Y is integrable and F_1 ⊂ F_2, then

(13)    E(Y | F_1) = E(Y | F_2) if and only if E(Y | F_2) ∈ F_1;

and

(14)    E(E(Y | F_2) | F_1) = E(Y | F_1) = E(E(Y | F_1) | F_2).

PROOF. Since Y satisfies trivially the defining relation for E(Y | F_1), it will be equal to the latter if and only if Y ∈ F_1. Now if we replace our basic F by F_2 and Y by E(Y | F_2), the assertion (13) ensues. Next, since

    E(Y | F_1) ∈ F_1 ⊂ F_2,

the second equation in (14) follows from the same observation. It remains to prove the first equation in (14). Let Λ ∈ F_1, then Λ ∈ F_2; applying the defining relation twice, we obtain

    ∫_Λ E(E(Y | F_2) | F_1) dP = ∫_Λ E(Y | F_2) dP = ∫_Λ Y dP.

Hence E(E(Y | F_2) | F_1) satisfies the defining relation for E(Y | F_1); since it belongs to F_1, it is equal to the latter.

As a particular case, we note, for example,

(15)    E{E(Y | X_1, X_2) | X_1} = E(Y | X_1) = E{E(Y | X_1) | X_1, X_2}.

To understand the meaning of this formula, we may think of X_1 and X_2 as discrete, each producing a countable partition. The superimposition of both


320 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

partitions yields sets of the form {Λ_j ∩ M_k}. The "inner" expectation on the left of (15) is the result of replacing Y by its "average" over each Λ_j ∩ M_k. Now if we replace this average r.v. again by its average over each Λ_j, then the result is the same as if we had simply replaced Y by its average over each Λ_j. The second equation has a similar interpretation.

Another kind of simple situation is afforded by the probability triple (U^n, B^n, m^n) discussed in Example 2 of Sec. 3.3. Let x_1, ..., x_n be the coordinate r.v.'s and y = f(x_1, ..., x_n), where f is (Borel) measurable and integrable. It is easy to see that for 1 ≤ k ≤ n - 1,

    E(y | x_1, ..., x_k) = ∫_0^1 ... ∫_0^1 f(x_1, ..., x_n) dx_{k+1} ... dx_n,

while for k = n, the left side is just y (a.e.). Thus, taking conditional expectation with respect to certain coordinate r.v.'s here amounts to integrating out the other r.v.'s. The first equation in (15) in this case merely asserts the possibility of iterated integration, while the second reduces to a banality, which we leave to the reader to write out.

EXERCISES

1. Prove Lemma 2 in Sec. 7.2 by using conditional probabilities.

2. Let {Λ_n} be a countable measurable partition of Ω, and E ∈ F with P(E) > 0; then we have for each m:

    P(Λ_m | E) = P(Λ_m) P(E | Λ_m) / Σ_n P(Λ_n) P(E | Λ_n).

[This is Bayes' rule.]

*3. If X is an integrable r.v., Y a bounded r.v., and G a Borel subfield, then we have

    E{E(X | G) Y} = E{X E(Y | G)}.

4. Prove Fatou's lemma and Lebesgue's dominated convergence theorem for conditional expectations.

*5. Give an example where E(E(Y | X_1) | X_2) ≠ E(E(Y | X_2) | X_1). [HINT: It is sufficient to give an example where E(X | Y) ≠ E{E(X | Y) | X}; consider an Ω with three points.]

*6. Prove that σ^2(E(Y | G)) ≤ σ^2(Y), where σ^2 is the variance.

7. If the random vector (X, Y) has the probability density function p(·, ·) and X is integrable, then one version of E(X | X + Y = z) is given by

    ∫ x p(x, z - x) dx / ∫ p(x, z - x) dx.
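The "integrating out" picture above, together with the tower property (15), can be illustrated by an invented discrete analogue of the product-space example (the grid {0,1,2,3}^2 with the uniform p.m. stands in for the unit square). All names below are made up for the sketch.

```python
# Sketch (invented example): on the product space {0,1,2,3}^2 with uniform
# p.m., conditioning on the first coordinate amounts to averaging out the
# second -- a discrete analogue of the coordinate-r.v. example above.
from itertools import product

pts = list(product(range(4), repeat=2))
f = lambda x1, x2: x1 * x2 + x1          # a bounded, hence integrable, f

# E(y | x1): average f over the other coordinate.
E_given_x1 = {x1: sum(f(x1, x2) for x2 in range(4)) / 4 for x1 in range(4)}
for x1 in range(4):
    assert E_given_x1[x1] == x1 * 1.5 + x1   # since the mean of x2 is 1.5

# Theorem 9.1.5 (tower property): with G1 = field generated by (x1 mod 2),
# coarser than G2 = field generated by x1, E_{G1}(E_{G2}(y)) = E_{G1}(y).
def E_given_parity(vals):   # vals: dict (x1, x2) -> value
    out = {}
    for r in (0, 1):
        cell = [w for w in pts if w[0] % 2 == r]
        avg = sum(vals[w] for w in cell) / len(cell)
        for w in cell:
            out[w] = avg
    return out

Y = {w: f(*w) for w in pts}
inner = {w: E_given_x1[w[0]] for w in pts}   # E(y | G2) as an r.v. on Omega
lhs = E_given_parity(inner)                   # E_{G1}(E_{G2}(y))
rhs = E_given_parity(Y)                       # E_{G1}(y)
for w in pts:
    assert abs(lhs[w] - rhs[w]) < 1e-12
```

The first loop is the discrete counterpart of the iterated integral displayed above; the second verifies the first equation in (15) for this pair of nested fields.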


9.1 BASIC PROPERTIES OF CONDITIONAL EXPECTATION | 321

*8. In the case above, there exists an integrable function φ(·, ·) with the property that for each B ∈ B^1,

    ∫_B φ(x, y) dy

is a version of P{Y ∈ B | X = x}. [This is called a "conditional density function in the wide sense" and

    ∫_{-∞}^{η} φ(x, y) dy

is the corresponding conditional distribution in the wide sense: it is a version of P{Y ≤ η | X = x}. The term "wide sense" refers to the fact that these are functions on R^1 rather than on Ω; see Doob [16, Sec. I.9].]

9. Let the p(·, ·) above be the 2-dimensional normal density:

    (1 / (2π σ_1 σ_2 √(1 - ρ^2))) exp{ -(1 / (2(1 - ρ^2))) (x^2/σ_1^2 - 2ρxy/(σ_1 σ_2) + y^2/σ_2^2) },

where σ_1 > 0, σ_2 > 0, 0 < ρ < 1. Find the φ mentioned in Exercise 8 and

    ∫_{-∞}^{∞} y φ(x, y) dy.

The latter should be a version of E(Y | X = x); verify it.

10. Let G be a B.F., X and Y two r.v.'s such that

    E(Y^2 | G) = X^2,  E(Y | G) = X.

Then Y = X a.e.

11. As in Exercise 10 but suppose now for any f ∈ C_K:

    E{X^2 | f(X)} = E{Y^2 | f(X)};  E{X | f(X)} = E{Y | f(X)}.

Then Y = X a.e. [HINT: By a monotone class theorem the equations hold for f = 1_B, B ∈ B^1; now apply Exercise 10 with G = F{X}.]

12. Recall that X_n in L^1 converges weakly in L^1 to X iff E(X_n Y) → E(XY) for every bounded r.v. Y. Prove that this implies E(X_n | G) converges weakly in L^1 to E(X | G) for any Borel subfield G of F.

*13. Let S be an r.v. such that P{S > t} = e^{-t}, t > 0. Compute E{S | S ∧ t} and E{S | S ∨ t} for each t > 0.


322 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

9.2 Conditional independence; Markov property

In this section we shall first apply the concept of conditioning to independent r.v.'s and to random walks, the two basic types of stochastic processes that have been extensively studied in this book; then we shall generalize to a Markov process. Another important generalization, the martingale theory, will be taken up in the next three sections.

All the B.F.'s (Borel fields) below will be subfields of F. The B.F.'s {F_α, α ∈ A}, where A is an arbitrary index set, are said to be conditionally independent relative to the B.F. G, iff for any finite collection of sets Λ_1, ..., Λ_n such that Λ_j ∈ F_{α_j} and the α_j's are distinct indices from A, we have

    P(Λ_1 ∩ ... ∩ Λ_n | G) = Π_{j=1}^{n} P(Λ_j | G).

When G is the trivial B.F., this reduces to unconditional independence.

Theorem 9.2.1. For each α ∈ A let F^(α) denote the smallest B.F. containing all F_β, β ∈ A - {α}. Then the F_α's are conditionally independent relative to G if and only if for each α and Λ_α ∈ F_α we have

    P(Λ_α | F^(α) ∨ G) = P(Λ_α | G),

where F^(α) ∨ G denotes the smallest B.F. containing F^(α) and G.

PROOF. It is sufficient to prove this for two B.F.'s F_1 and F_2, since the general result follows by induction (how?). Suppose then that for each Λ ∈ F_1 we have

(1)    P(Λ | F_2 ∨ G) = P(Λ | G).

Let M ∈ F_2, then

    P(ΛM | G) = E{P(ΛM | F_2 ∨ G) | G} = E{P(Λ | F_2 ∨ G) 1_M | G}
    = E{P(Λ | G) 1_M | G} = P(Λ | G) P(M | G),

where the first equation follows from Theorem 9.1.5, the second and fourth from Theorem 9.1.3, and the third from (1). Thus F_1 and F_2 are conditionally independent relative to G. Conversely, suppose the latter assertion is true, then

    E{P(Λ | G) 1_M | G} = P(Λ | G) P(M | G) = P(ΛM | G) = E{P(Λ | F_2 ∨ G) 1_M | G},


9.2 CONDITIONAL INDEPENDENCE; MARKOV PROPERTY | 323

where the first equation follows from Theorem 9.1.3, the second by hypothesis, and the third as shown above. Hence for every Λ ∈ F_1, M ∈ F_2, and Δ ∈ G we have

    ∫_{MΔ} P(Λ | G) dP = ∫_{MΔ} P(Λ | F_2 ∨ G) dP = P(ΛMΔ).

It follows from Theorem 2.1.2 (take the field to be finite disjoint unions of sets like MΔ) or more quickly from Exercise 10 of Sec. 2.1 that this remains true if MΔ is replaced by any set in F_2 ∨ G. The resulting equation implies (1), and the theorem is proved.

When G is trivial and each F_α is generated by a single r.v., we have the following corollary.

Corollary. Let {X_α, α ∈ A} be an arbitrary set of r.v.'s. For each α let F^(α) denote the Borel field generated by all the r.v.'s in the set except X_α. Then the X_α's are independent if and only if: for each α and each B ∈ B^1, we have

    P{X_α ∈ B | F^(α)} = P{X_α ∈ B}  a.e.

An equivalent form of the corollary is as follows: for each integrable r.v. Y belonging to the Borel field generated by X_α, we have

(2)    E{Y | F^(α)} = E{Y}.

This is left as an exercise.

Roughly speaking, independence among r.v.'s is equivalent to the lack of effect by conditioning relative to one another. However, such intuitive statements must be treated with caution, as shown by the following example which will be needed later.

If F_1, F_2, and F_3 are three Borel fields such that F_1 ∨ F_2 is independent of F_3, then for each integrable X ∈ F_1 we have

(3)    E{X | F_2 ∨ F_3} = E{X | F_2}.

Instead of a direct verification, which is left to the reader, it is interesting to deduce this from Theorem 9.2.1 by proving the following proposition.

If F_1 ∨ F_2 is independent of F_3, then F_1 and F_3 are conditionally independent relative to F_2.

To see this, let Λ_1 ∈ F_1, Λ_3 ∈ F_3. Since

    P(Λ_1 Λ_2 Λ_3) = P(Λ_1 Λ_2) P(Λ_3) = ∫_{Λ_2} P(Λ_1 | F_2) P(Λ_3) dP

for every Λ_2 ∈ F_2, we have

    P(Λ_1 Λ_3 | F_2) = P(Λ_1 | F_2) P(Λ_3) = P(Λ_1 | F_2) P(Λ_3 | F_2),

which proves the proposition.


324 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

Next we ask: if X_1 and X_2 are independent, what is the effect of conditioning X_1 + X_2 by X_1?

Theorem 9.2.2. Let X_1 and X_2 be independent r.v.'s with p.m.'s μ_1 and μ_2; then for each B ∈ B^1:

(4)    P{X_1 + X_2 ∈ B | X_1} = μ_2(B - X_1)  a.e.

More generally, if {X_n, n ≥ 1} is a sequence of independent r.v.'s with p.m.'s {μ_n, n ≥ 1}, and S_n = Σ_{j=1}^{n} X_j, then for each B ∈ B^1:

(5)    P{S_n ∈ B | S_1, ..., S_{n-1}} = μ_n(B - S_{n-1}) = P{S_n ∈ B | S_{n-1}}  a.e.

PROOF. To prove (4), since its right member belongs to F{X_1}, it is sufficient to verify that it satisfies the defining relation for its left member. Let Λ ∈ F{X_1}, then Λ = X_1^{-1}(A) for some A ∈ B^1. It follows from Theorem 3.2.2 that

    ∫_Λ μ_2(B - X_1) dP = ∫_A μ_2(B - x_1) μ_1(dx_1).

Applying Fubini's theorem to the right side above, then using Theorem 3.2.3, we obtain

    ∫_A μ_1(dx_1) ∫_{x_1 + x_2 ∈ B} μ_2(dx_2) = ∫∫_{x_1 ∈ A; x_1 + x_2 ∈ B} μ(dx_1, dx_2)
    = ∫_{X_1 ∈ A; X_1 + X_2 ∈ B} dP = P{X_1 ∈ A; X_1 + X_2 ∈ B}.

This establishes (4).

To prove (5), we begin by observing that the second equation has just been proved. Next we observe that since {X_1, ..., X_n} and {S_1, ..., S_n} obviously generate the same Borel field, the left member of (5) is just

    P{S_n ∈ B | X_1, ..., X_{n-1}}.

Now it is trivial that as a function of (X_1, ..., X_{n-1}), S_n "depends on them only through their sum S_{n-1}". It thus appears obvious that the first term in (5) should depend only on S_{n-1}, namely belong to F{S_{n-1}} (rather than the larger F{S_1, ..., S_{n-1}}). Hence the equality of the first and third terms in (5) should be a consequence of the assertion (13) in Theorem 9.1.5. This argument, however, is not rigorous, and requires the following formal substantiation.

Let μ^(n) = μ_1 × ... × μ_n = μ^(n-1) × μ_n and

    Λ = ∩_{j=1}^{n-1} S_j^{-1}(B_j),


9.2 CONDITIONAL INDEPENDENCE; MARKOV PROPERTY | 325

where B_n = B. Sets of the form of Λ generate the Borel field F{S_1, ..., S_{n-1}}. It is therefore sufficient to verify that for each such Λ, we have

    ∫_Λ μ_n(B_n - S_{n-1}) dP = P{Λ; S_n ∈ B_n}.

If we proceed as before and write s_j = Σ_{i=1}^{j} x_i, the left side above is equal to

    ∫_{s_j ∈ B_j, 1 ≤ j ≤ n-1} μ_n(B_n - s_{n-1}) μ^(n-1)(dx_1, ..., dx_{n-1})


326 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

    ∫_{Λ_{n-1}} P{0 ≤ S_n ≤ A | S_1, ..., S_{n-1}} dP = ∫_{Λ_{n-1}} [F_n(A - S_{n-1}) - F_n(0^- - S_{n-1})] dP,

where F_n is the d.f. of μ_n. [A quirk of notation forbids us to write the integrand on the right as P{-S_{n-1} ≤ X_n ≤ A - S_{n-1}}!] Now for each ω_0 in Λ_{n-1}, S_{n-1}(ω_0) ≥ 0, hence

    F_n(A - S_{n-1}(ω_0)) ≤ P{X_n ≤ A} ≤ 1 - δ

by hypothesis. It follows that the integral above is

    ≤ ∫_{Λ_{n-1}} (1 - δ) dP = (1 - δ) P{Λ_{n-1}},

and the first assertion of the theorem follows by iteration. The second is proved similarly by observing that P{X_n + ... + X_{n+m-1} > mA} ≥ δ^m and choosing m so that mA exceeds the length of I. The details are left to the reader.

Let N^0 = {0} ∪ N denote the set of nonnegative integers. For a given sequence of r.v.'s {X_n, n ∈ N^0} let us denote by F_I the Borel field generated by {X_n, n ∈ I}, where I is a subset of N^0, such as [0, n], (n, ∞), or {n}. Thus F_{n}, F_[0,n], and F_(n,∞) have been denoted earlier by F{X_n}, F_n, and F′_n, respectively.

DEFINITION OF MARKOV PROCESS. The sequence of r.v.'s {X_n, n ∈ N^0} is said to be a Markov process or to possess the "Markov property" iff for every n ∈ N^0 and every B ∈ B^1, we have

(6)    P{X_{n+1} ∈ B | X_0, ..., X_n} = P{X_{n+1} ∈ B | X_n}.

This property may be verbally announced as: the conditional distribution (in the wide sense!) of each r.v. relative to all the preceding ones is the same as that relative to the last preceding one. Thus if {X_n} is an independent process as defined in Chapter 8, then both the process itself and the process of the successive partial sums {S_n} are Markov processes by Theorems 9.2.1 and 9.2.2. The latter category includes random walk as a particular case; note that in this case our notation X_n rather than S_n differs from that employed in Chapter 8.

Equation (6) is equivalent to the apparently stronger proposition: for every integrable Y belonging to the field generated by X_{n+1}, we have

(6′)    E{Y | X_0, ..., X_n} = E{Y | X_n}.

It is clear that (6′) implies (6). To see the converse, let Y_m be a sequence of simple r.v.'s belonging to the field generated by X_{n+1} and increasing to Y. By (6) and property (ii)


9.2 CONDITIONAL INDEPENDENCE; MARKOV PROPERTY | 327

of conditional expectation in Sec. 9.1, (6′) is true when Y is replaced by Y_m; hence by property (v) there, it is also true for Y.

The following remark will be repeatedly used. If Y and Z are integrable, Z ∈ F_I, and

    ∫_Λ Y dP = ∫_Λ Z dP

for each Λ of the form

    ∩_{j ∈ I_0} X_j^{-1}(B_j),

where I_0 is an arbitrary finite subset of I and each B_j is an arbitrary Borel set, then Z = E(Y | F_I). This follows from the uniqueness of conditional expectation and a previous remark given before Theorem 9.1.3, since finite disjoint unions of sets of this form generate a field that generates F_I.

If the index n is regarded as a discrete time parameter, as is usual in the theory of stochastic processes, then F_[0,n] is the field of "the past and the present", while F_(n,∞) is that of "the future"; whether the present is adjoined to the past or the future is often a matter of convenience. The Markov property just defined may be further characterized as follows.

Theorem 9.2.4. The Markov property is equivalent to either one of the two propositions below:

(7)    ∀n ∈ N, M ∈ F_(n,∞): P{M | F_[0,n]} = P{M | X_n}.

(8)    ∀n ∈ N, M_1 ∈ F_[0,n], M_2 ∈ F_(n,∞): P{M_1 M_2 | X_n} = P{M_1 | X_n} P{M_2 | X_n}.

These conclusions remain true if F_(n,∞) is replaced by F_[n,∞).

PROOF. To prove that (7) implies (8), let Y_i = 1_{M_i}, i = 1, 2. We then have

(9)    P{M_1 | X_n} P{M_2 | X_n} = E{Y_1 | X_n} E{Y_2 | X_n}
    = E{Y_1 E(Y_2 | X_n) | X_n} = E{Y_1 E(Y_2 | F_[0,n]) | X_n}
    = E{E(Y_1 Y_2 | F_[0,n]) | X_n} = E{Y_1 Y_2 | X_n} = P{M_1 M_2 | X_n},

where the second and fourth equations follow from Theorem 9.1.3, the third from assumption (7), and the fifth from Theorem 9.1.5.

Conversely, to prove that (8) implies (7), let Λ ∈ F_{n}, M_1 ∈ F_[0,n), M_2 ∈ F_(n,∞). By the second equation in (9) applied to the fourth equation below,


328 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

we have

    ∫_{ΛM_1} P(M_2 | X_n) dP = ∫_{ΛM_1} E(Y_2 | X_n) dP = ∫_Λ Y_1 E(Y_2 | X_n) dP
    = ∫_Λ E{Y_1 E(Y_2 | X_n) | X_n} dP = ∫_Λ E(Y_1 | X_n) E(Y_2 | X_n) dP
    = ∫_Λ P(M_1 | X_n) P(M_2 | X_n) dP = ∫_Λ P(M_1 M_2 | X_n) dP = P(ΛM_1M_2).

Since disjoint unions of sets of the form ΛM_1 as specified above generate the Borel field F_[0,n], the uniqueness of P(M_2 | F_[0,n]) shows that it is equal to P(M_2 | X_n), proving (7).

Finally, we prove the equivalence of the Markov property and the proposition (7). Clearly the former is implied by the latter; to prove the converse we shall operate with conditional expectations instead of probabilities and use induction. Suppose that it has been shown that for every n ∈ N, and every bounded f belonging to F_[n+1,n+k], we have

(10)    E(f | F_[0,n]) = E(f | F_{n}).

This is true for k = 1 by (6′). Let g be bounded, g ∈ F_[n+1,n+k+1]; we are going to show that (10) remains true when f is replaced by g. For this purpose it is sufficient to consider a g of the form g_1 g_2, where g_1 ∈ F_[n+1,n+k], g_2 ∈ F_{n+k+1}, both bounded. The successive steps, in slow motion fashion, are as follows:

    E{g | F_[0,n]} = E{E(g | F_[0,n+k]) | F_[0,n]} = E{g_1 E(g_2 | F_[0,n+k]) | F_[0,n]}
    = E{g_1 E(g_2 | F_{n+k}) | F_[0,n]} = E{g_1 E(g_2 | F_{n+k}) | F_{n}}
    = E{g_1 E(g_2 | F_[n,n+k]) | F_{n}} = E{E(g_1 g_2 | F_[n,n+k]) | F_{n}}
    = E{g_1 g_2 | F_{n}} = E{g | F_{n}}.

It is left to the reader to scrutinize each equation carefully for explanation, except that the fifth one is an immediate consequence of the Markov property E{g_2 | F_{n+k}} = E{g_2 | F_[0,n+k]} and (13) of Sec. 9.1. This establishes (7) for M ∈ ∪_{k ≥ 1} F_[n+1,n+k], which is a field generating F_(n,∞). Hence (7) is true (why?).

The last assertion of the theorem is left as an exercise.
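For a finite model the Markov property (6) of partial sums (Theorem 9.2.2) can be verified directly by enumeration. The following sketch is an invented example, not from the text: three independent fair 0-1 steps, with every conditional probability computed by counting equally likely paths.

```python
# Sketch (invented example): check the Markov property (6) by enumeration
# for partial sums S_n = x_1 + ... + x_n of independent fair 0-1 steps.
from itertools import product

paths = list(product([0, 1], repeat=3))   # each path has probability 1/8

def cond_prob(event, given):
    """P(event | given) over equally likely paths; both args are predicates."""
    g = [p for p in paths if given(p)]
    return sum(1 for p in g if event(p)) / len(g)

S = lambda p, n: sum(p[:n])

# P{S_3 = b | S_1 = s1, S_2 = s2} should depend on (s1, s2) only through s2.
for s2 in (0, 1, 2):
    for b in range(4):
        vals = []
        for s1 in (0, 1):
            if any(S(p, 1) == s1 and S(p, 2) == s2 for p in paths):
                vals.append(cond_prob(
                    lambda p: S(p, 3) == b,
                    lambda p: S(p, 1) == s1 and S(p, 2) == s2))
        # every attainable history (s1, s2) gives the same conditional law
        assert all(abs(v - vals[0]) < 1e-12 for v in vals)
```

The inner assertion is exactly the statement that the conditional distribution of S_3 given (S_1, S_2) is a function of S_2 alone, which is the content of (5) in this toy case.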


9.2 CONDITIONAL INDEPENDENCE; MARKOV PROPERTY | 329

The property embodied in (8) may be announced as follows: "The past and the future are conditionally independent given the present". In this form there is a symmetry that is not apparent in the other equivalent forms.

The Markov property has an extension known as the "strong Markov property". In the case discussed here, where the time index is N^0, it is an automatic consequence of the ordinary Markov property, but it is of great conceptual importance in applications. We recall the notions of an optional r.v. α, the r.v. X_α, and the fields F_α and F′_α, which are defined for a general sequence of r.v.'s in Sec. 8.2. Note that here α may take the value 0, which is included in N^0. We shall give the extension in the form (7).

Theorem 9.2.5. Let {X_n, n ∈ N^0} be a Markov process and α a finite optional r.v. relative to it. Then for each M ∈ F′_α we have

(11)    P{M | F_α} = P{M | α, X_α}.

PROOF. Since α ∈ F_α and X_α ∈ F_α (Exercise 2 of Sec. 8.2), the right member above belongs to F_α. To prove (11) it is then sufficient to verify that the right member satisfies the defining relation for the left, when M is of the form

    ∩_{j=1}^{ℓ} {X_{α+j} ∈ B_j},  B_j ∈ B^1, 1 ≤ j ≤ ℓ < ∞.

Put for each n:

    M_n = ∩_{j=1}^{ℓ} {X_{n+j} ∈ B_j} ∈ F_(n,∞).

Now the crucial step is to show that

(12)    Σ_{n=0}^{∞} P{M_n | X_n} 1_{{α=n}} = P{M | α, X_α}.

By the lemma in the proof of Theorem 9.1.2, there exists a Borel measurable function φ_n such that P{M_n | X_n} = φ_n(X_n), from which it follows that the left member of (12) belongs to the Borel field generated by the two r.v.'s α and X_α. Hence we shall prove (12) by verifying that its left member satisfies the defining relation for its right member, as follows. For each m ∈ N and B ∈ B^1, we have

    ∫_{{α=m; X_α ∈ B}} Σ_{n=0}^{∞} P{M_n | X_n} 1_{{α=n}} dP = ∫_{{α=m; X_m ∈ B}} P{M_m | X_m} dP
    = P{α = m; X_m ∈ B; M_m} = P{α = m; X_α ∈ B; M},


330 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

where the second equation follows from an application of (7), and the third from the optionality of α, namely

    {α = m} ∈ F_[0,m].

This establishes (12).

Now let Λ ∈ F_α; then [cf. (3) of Sec. 8.2] we have

    Λ = ∪_{n=0}^{∞} {(α = n) ∩ Λ_n},

where Λ_n ∈ F_[0,n]. It follows that

    P{ΛM} = Σ_{n=0}^{∞} P{α = n; Λ_n; M_n} = Σ_{n=0}^{∞} ∫_{(α=n)∩Λ_n} P{M_n | X_n} dP
    = ∫_Λ Σ_{n=0}^{∞} P{M_n | X_n} 1_{{α=n}} dP = ∫_Λ P{M | α, X_α} dP,

where the third equation is by an application of (7) while the fourth is by (12). This being true for each Λ, we obtain (11). The theorem is proved.

When α is a constant n, it may be omitted from the right member of (11), and so (11) includes (7) as a particular case. It may be omitted also in the homogeneous case discussed below (because then the φ_n above may be chosen to be independent of n).

There is a very general method of constructing a Markov process with given "transition probability functions", as follows. Let P_0(·) be an arbitrary p.m. on (R^1, B^1). For each n ≥ 1 let P_n(·, ·) be a function of the pair (x, B), where x ∈ R^1 and B ∈ B^1, having the measurability properties below:

(a) for each x, P_n(x, ·) is a p.m. on B^1;
(b) for each B, P_n(·, B) ∈ B^1.

It is a consequence of Kolmogorov's extension theorem (see Sec. 3.3) that there exists a sequence of r.v.'s {X_n, n ∈ N^0} on some probability space with the following "finite-dimensional joint distributions": for each 0 ≤


9.2 CONDITIONAL INDEPENDENCE; MARKOV PROPERTY | 331

above to be B^1, and that the resulting collection of p.m.'s are mutually consistent in the obvious sense [see (16) of Sec. 3.3]. But the actual construction of the process {X_n, n ∈ N^0} with these given "marginal distributions" is omitted here, the procedure being similar to but somewhat more sophisticated than that given in Theorem 3.3.4, which is a particular case. Assuming the existence, we will now show that it has the Markov property by verifying (6) briefly.

By Theorem 9.1.5, it will be sufficient to show that one version of the left member of (6) is given by P_{n+1}(X_n, B), which belongs to F_{n} by condition (b) above. Let then

    Λ = ∩_{j=0}^{n} [X_j ∈ B_j]

and μ^(n+1) be the (n + 1)-dimensional p.m. of the random vector (X_0, ..., X_n). It follows from Theorem 3.2.3 and (13) used twice that

    ∫_Λ P_{n+1}(X_n, B) dP = ∫...∫_{B_0 × ... × B_n} P_{n+1}(x_n, B) dμ^(n+1)
    = ∫...∫_{B_0 × ... × B_n × B} P_0(dx_0) Π_{j=1}^{n+1} P_j(x_{j-1}, dx_j) = P(Λ; X_{n+1} ∈ B).

This is what was to be shown.

We call P_0(·) the initial distribution of the Markov process and P_n(·, ·) its "nth-stage transition probability function". The case where the latter is the same for all n ≥ 1 is particularly important, and the corresponding Markov process is said to be "(temporally) homogeneous" or "with stationary transition probabilities". In this case we write, with x = x_0:

(14)    P^(n)(x, B) = ∫...∫_{R^1 × ... × R^1 × B} Π_{j=0}^{n-1} P(x_j, dx_{j+1}),

and call it the "n-step transition probability function"; when n = 1, the qualifier "1-step" is usually dropped. We also put P^(0)(x, B) = 1_B(x). It is easy to see that

(15)    P^(n+1)(x, B) = ∫ P^(n)(y, B) P^(1)(x, dy),

so that all P^(n) are just the iterates of P^(1).

It follows from Theorem 9.2.2 that for the Markov process {S_n, n ∈ N} there, we have

    P_n(x, B) = μ_n(B - x).
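In the countable-state case (the Markov chains of Exercise 11 below) the iteration (15) becomes matrix multiplication: the n-step transition function is the nth power of the 1-step transition matrix, and composing steps gives the Chapman–Kolmogorov relation of Exercise 6. A minimal sketch with an invented 3-state matrix:

```python
# Sketch (invented example): for a finite state space, (14) is the nth
# matrix power of the 1-step transition matrix, and (15) iterates it;
# P^(m+n) = P^(m) P^(n) is the Chapman-Kolmogorov equation.
import numpy as np

P1 = np.array([[0.5, 0.5, 0.0],
               [0.2, 0.3, 0.5],
               [0.0, 0.4, 0.6]])
assert np.allclose(P1.sum(axis=1), 1.0)   # each P(x, .) is a p.m. (cond. (a))

P2 = P1 @ P1                               # 2-step transition function
P3 = P1 @ P1 @ P1                          # 3-step transition function
assert np.allclose(P3, P2 @ P1)            # P^(3) = P^(2) P^(1), as in (15)
assert np.allclose(P3, P1 @ P2)            # P^(3) = P^(1) P^(2), C-K equation
assert np.allclose(P3.sum(axis=1), 1.0)    # rows of every iterate remain p.m.'s
```

Row-stochasticity is preserved under multiplication, which is the matrix form of the remark that each P^(n)(x, ·) is again a p.m.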


332 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

In particular, a random walk is a homogeneous Markov process with the 1-step transition probability function

    P^(1)(x, B) = μ(B - x).

In the homogeneous case Theorem 9.2.4 may be sharpened to read like Theorem 8.2.2, which becomes then a particular case. The proof is left as an exercise.

Theorem 9.2.6. For a homogeneous Markov process and a finite r.v. α which is optional relative to the process, the pre-α and post-α fields are conditionally independent relative to X_α, namely:

    ∀Λ ∈ F_α, M ∈ F′_α: P{ΛM | X_α} = P{Λ | X_α} P{M | X_α}.

Furthermore, the post-α process {X_{α+n}, n ∈ N} is a homogeneous Markov process with the same transition probability function as the original one.

Given a Markov process {X_n, n ∈ N^0}, the distribution of X_0 is a p.m., and for each B ∈ B^1, there is according to Theorem 9.1.2 a Borel measurable function φ_n(·, B) such that

    P{X_{n+1} ∈ B | X_n = x} = φ_n(x, B).

It seems plausible that the function φ_n(·, ·) would correspond to the n-stage transition probability function just discussed. The trouble is that while condition (b) above may be satisfied for each B by a particular choice of φ_n(·, B), it is by no means clear why the resulting collection for varying B would satisfy condition (a). Although it is possible to ensure this by means of conditional distributions in the wide sense alluded to in Exercise 8 of Sec. 9.1, we shall not discuss it here (see Doob [16, Chap. 2]).

The theory of Markov processes is the most highly developed branch of stochastic processes. Special cases such as Markov chains, diffusion, and processes with independent increments have been treated in many monographs, a few of which are listed in the Bibliography at the end of the book.

EXERCISES

*1. Prove that the Markov property is also equivalent to the following proposition: if t_1 < ... < t_n < t_{n+1} are indices in N^0 and B_j, 1 ≤ j ≤ n + 1, are Borel sets, then

    P{X_{t_{n+1}} ∈ B_{n+1} | X_{t_1}, ..., X_{t_n}} = P{X_{t_{n+1}} ∈ B_{n+1} | X_{t_n}}.

In this form we can define a Markov process (X_t) with a continuous parameter t ranging in [0, ∞).


9.2 CONDITIONAL INDEPENDENCE; MARKOV PROPERTY | 333

2. Does the Markov property imply the following (notation as in Exercise 1):

    P{X_{n+1} ∈ B_{n+1} | X_1 ∈ B_1, ..., X_n ∈ B_n} = P{X_{n+1} ∈ B_{n+1} | X_n ∈ B_n}?

3. Unlike independence, the Markov property is not necessarily preserved by a "functional": {f(X_n), n ∈ N^0}. Give an example of this, but show that it is preserved by each one-to-one Borel measurable mapping f.

*4. Prove the strong Markov property in the form of (8).

5. Prove Theorem 9.2.6.

*6. For the P^(n) defined in (14), prove the "Chapman–Kolmogorov equations":

    ∀m ∈ N, n ∈ N: P^(m+n)(x, B) = ∫_{R^1} P^(m)(x, dy) P^(n)(y, B).

7. Generalize the Chapman–Kolmogorov equation in the nonhomogeneous case.

*8. For the homogeneous Markov process constructed in the text, show that for each f ≥ 0 we have

    E{f(X_{m+n}) | X_m} = ∫_{R^1} f(y) P^(n)(X_m, dy).

*9. Let B be a Borel set, f_1(x, B) = P(x, B), and define f_n for n ≥ 2 inductively by

    f_n(x, B) = ∫_{B^c} P(x, dy) f_{n-1}(y, B);

put f(x, B) = Σ_{n=1}^{∞} f_n(x, B). Prove that f(X_n, B) is a version of the conditional probability P{∪_{j=n+1}^{∞} [X_j ∈ B] | X_n} for the homogeneous Markov process with transition probability function P(·, ·).

*10. Using the f defined in Exercise 9, put

    g(x, B) = f(x, B) - Σ_{n=1}^{∞} ∫ P^(n)(x, dy)[1 - f(y, B)].

Prove that g(X_n, B) is a version of the conditional probability

    P{lim sup_j [X_j ∈ B] | X_n}.

*11. Suppose that for a homogeneous Markov process the initial distribution has support in N^0 as a subset of B^1, and that for each i ∈ N^0, the


334 | CONDITIONING. MARKOV PROPERTY. MARTINGALE

transition probability function P(i, ·) also has support in N^0. Thus

    {P(i, j); (i, j) ∈ N^0 × N^0}

is an infinite matrix called the "transition matrix". Show that P^(n) as a matrix is just the nth power of P^(1). Express the probability P{X_{t_k} = i_k, 1 ≤ k ≤ n} in terms of the elements of these matrices. [This is the case of homogeneous Markov chains.]

12. A process {X_n, n ∈ N^0} is said to possess the "rth-order Markov property", where r ≥ 1, iff (6) is replaced by

    P{X_{n+1} ∈ B | X_0, ..., X_n} = P{X_{n+1} ∈ B | X_n, ..., X_{n-r+1}}

for n ≥ r - 1. Show that if r ≤ s, then the rth-order Markov property implies the sth. The ordinary Markov property is the case r = 1.

13. Let Y_n be the random vector (X_n, X_{n+1}, ..., X_{n+r-1}). Then the vector process {Y_n, n ∈ N^0} has the ordinary Markov property (trivially generalized to vectors) if and only if {X_n, n ∈ N^0} has the rth-order Markov property.

14. Let {X_n, n ∈ N^0} be an independent process. Let

    S_n^(1) = Σ_{j=0}^{n} X_j,  S_n^(r) = Σ_{j=0}^{n} S_j^(r-1)

for r > 1. Then {S_n^(r), n ∈ N^0} has the rth-order Markov property. For r = 2, give an example to show that it need not be a Markov process.

15. If {S_n, n ∈ N} is a random walk such that P{S_1 ≠ 0} > 0, then for any finite interval [a, b] there exists an ε < 1 such that

    P{S_j ∈ [a, b], 1 ≤ j ≤ n} ≤ ε^n.

[HINT: This is just Exercise 6 of Sec. 5.5 again.]

16. The same conclusion is true if the random walk above is replaced by a homogeneous Markov process for which, e.g., there exist δ > 0 and η > 0 such that P(x, R^1 - (x - δ, x + δ)) ≥ η for every x.

9.3 Basic properties of smartingales

The sequence of sums of independent r.v.'s has motivated the generalization to a Markov process in the preceding section; in another direction it will now motivate a martingale. Changing our previous notation to conform with later


9.3 BASIC PROPERTIES OF SMARTINGALES | 335

usage, let {x_n, n ∈ N} denote independent r.v.'s with mean zero and write X_n = Σ_{j=1}^{n} x_j for the partial sum. Then we have

    E(X_{n+1} | x_1, ..., x_n) = E(X_n + x_{n+1} | x_1, ..., x_n)
    = X_n + E(x_{n+1} | x_1, ..., x_n) = X_n + E(x_{n+1}) = X_n.

Note that the conditioning with respect to x_1, ..., x_n may be replaced by conditioning with respect to X_1, ..., X_n (why?). Historically, the equation above led to the consideration of dependent r.v.'s {x_n} satisfying the condition

(1)    E(x_{n+1} | x_1, ..., x_n) = 0.

It is astonishing that this simple property should delineate such a useful class of stochastic processes which will now be introduced. In what follows, where the index set for n is not specified, it is understood to be either N or some initial segment N_m of N.

DEFINITION OF MARTINGALE. The sequence of r.v.'s and B.F.'s {X_n, F_n} is called a martingale iff we have for each n:

(a) F_n ⊂ F_{n+1} and X_n ∈ F_n;
(b) E(|X_n|) < ∞;
(c) X_n = E(X_{n+1} | F_n), a.e.

It is called a supermartingale iff the "=" in (c) above is replaced by "≥", and a submartingale iff it is replaced by "≤".
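The motivating computation above (partial sums of independent mean-zero steps form a martingale) can be checked exhaustively in a finite model. The following is an invented sketch, not from the text; it verifies the integrated form of condition (c), i.e. the defining relation (4) below, over every atom of F_n.

```python
# Sketch (invented example): partial sums X_n of independent, mean-zero
# +-1 steps form a martingale; verify the defining relation
#     integral of X_n over A  =  integral of X_m over A,   n < m,
# for every atom A of F_n, by enumerating equally likely paths.
from itertools import product

steps = list(product([-1, 1], repeat=3))      # each path has probability 1/8
X = lambda p, n: sum(p[:n])                   # partial sum X_n on path p

n, m = 2, 3
# Atoms of F_n: sets of paths sharing the same first n steps.
for prefix in product([-1, 1], repeat=n):
    A = [p for p in steps if p[:n] == prefix]
    int_Xn = sum(X(p, n) for p in A) / len(steps)
    int_Xm = sum(X(p, m) for p in A) / len(steps)
    assert abs(int_Xn - int_Xm) < 1e-12
```

Since the atoms generate F_n, equality of the integrals over the atoms is equivalent to condition (c) in this finite setting.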


An equivalent form of (3) is as follows: for each Λ ∈ F_n and n < m, we have

(4) ∫_Λ X_n dP = ∫_Λ X_m dP.

It is often safer to use the explicit formula (4) rather than (3), because conditional expectations can be slippery things to handle. We shall refer to (3) or (4) as the defining relation of a martingale; similarly for the "super" and "sub" varieties.

Let us observe that in the form (3) or (4), the definition of a smartingale is meaningful if the index set N is replaced by any linearly ordered set, with "<" as the order relation.


PROOF. For a martingale we have equality in (5) for any convex φ, hence we may take φ(x) = |x|, |x|^p or |x| log^+ |x| in the proof above.

Thus for a martingale {X_n}, all three transmutations {X_n^+}, {X_n^−} and {|X_n|} are submartingales. For a submartingale {X_n}, nothing is said about the last two.

Corollary 3. If {X_n, F_n} is a supermartingale, then so is {X_n ∧ A, F_n} where A is any constant.

PROOF. We leave it to the reader to deduce this from the theorem, but here is a quick direct proof:

X_n ∧ A ≥ E(X_{n+1} | F_n) ∧ E(A | F_n) ≥ E(X_{n+1} ∧ A | F_n).

It is possible to represent any smartingale as a martingale plus or minus something special. Let us call a sequence of r.v.'s {Z_n, n ∈ N} an increasing process iff it satisfies the conditions:

(i) Z_1 = 0; Z_n ≤ Z_{n+1} for n ≥ 1;
(ii) E(Z_n) < ∞ for each n.

It follows that Z_∞ = lim_n ↑ Z_n exists but may take the value +∞; Z_∞ is integrable if and only if {Z_n} is L^1-bounded as defined above, which means here lim_n E(Z_n) < ∞. This is also equivalent to the uniform integrability of {Z_n} because of (i). We can now state the result as follows.

Theorem 9.3.2. Any submartingale {X_n, F_n} can be written as

(6) X_n = Y_n + Z_n,

where {Y_n, F_n} is a martingale, and {Z_n} is an increasing process.

PROOF. From {X_n} we define its difference sequence as follows:

(7) x_1 = X_1, x_n = X_n − X_{n−1}, n ≥ 2,

so that X_n = Σ_{j=1}^n x_j, n ≥ 1 (cf. the notation in the first paragraph of this section). The defining relation for a submartingale then becomes

E{x_n | F_{n−1}} ≥ 0,

with equality for a martingale. Furthermore, we put

(8) y_1 = x_1, y_n = x_n − E{x_n | F_{n−1}}, Y_n = Σ_{j=1}^n y_j;
    z_1 = 0, z_n = E{x_n | F_{n−1}}, Z_n = Σ_{j=1}^n z_j.


Then clearly x_n = y_n + z_n, and (6) follows by addition. To show that {Y_n, F_n} is a martingale, we may verify that E{y_n | F_{n−1}} = 0 as indicated a moment ago, and this is trivial by Theorem 9.1.5. Since each z_n ≥ 0, it is equally obvious that {Z_n} is an increasing process. The theorem is proved.

Observe that Z_n ∈ F_{n−1} for each n, by definition. This has important consequences; see Exercise 9 below. The decomposition (6) will be called Doob's decomposition. For a supermartingale we need only change the "+" there into "−", since {−Y_n, F_n} is a martingale. The following complement is useful.

Corollary. If {X_n} is L^1-bounded [or uniformly integrable], then both {Y_n} and {Z_n} are L^1-bounded [or uniformly integrable].

PROOF. We have from (6):

E(Z_n) = E(X_n) − E(Y_n) = E(X_n) − E(X_1),
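Doob's decomposition can be computed explicitly in a simple case. The sketch below is my own illustration, not from the book: for a fair ±1 walk S_n, the submartingale X_n = S_n^2 has predictable increments z_n = E(S_n^2 − S_{n−1}^2 | F_{n−1}) = 1 for n ≥ 2 (and z_1 = 0 by convention), so Z_n = n − 1 and Y_n = S_n^2 − (n − 1).

```python
import random

def doob_decomposition(X, predictable_increment):
    # Given a path X_1, X_2, ... of a submartingale and the closed-form
    # predictable increments z_n = E(x_n | F_{n-1}) (with z_1 = 0),
    # return the paths of Y (martingale part) and Z (increasing process).
    Z, Y = [0.0], [X[0]]
    for n in range(1, len(X)):
        Z.append(Z[-1] + predictable_increment(n))
        Y.append(X[n] - Z[-1])
    return Y, Z

# Submartingale X_n = S_n^2 for a fair +/-1 walk; here z_n = 1 for n >= 2.
random.seed(7)
S = []
for _ in range(10):
    S.append((S[-1] if S else 0) + random.choice((-1, 1)))
X = [s * s for s in S]
Y, Z = doob_decomposition(X, lambda n: 1.0)
print(Z)  # [0.0, 1.0, 2.0, ..., 9.0]: increasing, Z_n = n - 1
```

Along every path x_n = y_n + z_n holds by construction; that {Y_n} is a martingale is the content of the proof above, not of this sketch.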


where again {α ≤ n} may be replaced by {α = n}. Writing then

(10) Λ_n = Λ ∩ {α = n},

we have Λ_n ∈ F_n and

Λ = ∪_n Λ_n = ∪_n [{α = n} ∩ Λ_n],

where the index n ranges over N_∞. This is (3) of Sec. 8.2. The reader should now do Exercises 1-4 in Sec. 8.2 to get acquainted with the simplest properties of optionality. Here are some of them which will be needed soon: F_α is a B.F. and α ∈ F_α; if α is optional then so is α ∧ n for each n ∈ N; if α ≤ β where β is also optional then F_α ⊂ F_β; in particular F_{α∧n} ⊂ F_α ∩ F_n, and in fact this inclusion is an equation.

Next we assume X_∞ has been defined and X_∞ ∈ F_∞. We then define X_α as follows:

(11) X_α(ω) = X_{α(ω)}(ω);

in other words,

X_α(ω) = X_n(ω) on {α = n}, n ∈ N_∞.

This definition makes sense for any α taking values in N_∞, but for an optional α we can assert moreover that

(12) X_α ∈ F_α.

This is an exercise the reader should not miss; observe that it is a natural but nontrivial extension of the assumption X_n ∈ F_n for every n. Indeed, all the general propositions concerning optional sampling aim at the same thing, namely to make optional times behave like constant times, or again to enable us to substitute optional r.v.'s for constants. For this purpose conditions must sometimes be imposed either on α or on the smartingale {X_n}. Let us however begin with a perfect case which turns out to be also very important.

We introduce a class of martingales as follows. For any integrable r.v. Y we put

(13) X_n = E(Y | F_n), n ∈ N_∞.

By Theorem 9.1.5, if n < m:

(14) X_n = E{E(Y | F_m) | F_n} = E{X_m | F_n},

which shows {X_n, F_n} is a martingale, not only on N but also on N_∞. The following properties extend both (13) and (14) to optional times.
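The martingale (13)-(14) can be made concrete by enumeration. The sketch below is my own illustration, not from the text: Y is the number of heads in four fair coin flips, F_n is generated by the first n flips, and X_n = E(Y | F_n) is computed by averaging over all extensions; averaging X_2 over the two refinements of an F_1-atom then recovers X_1, as in (14).

```python
from itertools import product
from statistics import mean

FLIPS = 4

def Y(path):
    # an integrable r.v. determined by the whole sequence: number of heads
    return sum(path)

def X(n, head):
    # X_n = E(Y | F_n): average Y over all extensions of the first n flips
    return mean(Y(head + tail) for tail in product((0, 1), repeat=FLIPS - n))

# (14): averaging X_2 over the atoms refining a fixed F_1-atom gives X_1.
ok = all(
    abs(mean(X(2, head + (m,)) for m in (0, 1)) - X(1, head)) < 1e-12
    for head in product((0, 1), repeat=1)
)
print(ok)  # True
```

Here X(1, (h,)) = h + 3/2 and X(2, (h, m)) = h + m + 1, so the tower identity can also be checked by hand.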


Theorem 9.3.3. For any optional α, we have

(15) X_α = E(Y | F_α).

If α ≤ β where β is also optional, then {X_α, F_α; X_β, F_β} forms a two-term martingale.

PROOF. Let us first show that X_α is integrable. It follows from (13) and Jensen's inequality that

|X_n| ≤ E(|Y| | F_n).

Since {α = n} ∈ F_n, we may apply this to get

∫_{{α=n}} |X_α| dP = ∫_{{α=n}} |X_n| dP ≤ ∫_{{α=n}} |Y| dP,


because Λ_j ∈ F_j ⊂ F_k, whereas {β > k} = {β ≤ k}^c ∈ F_k. It follows from the defining relation of a supermartingale that

∫_{Λ_j ∩ {β>k}} X_k dP ≥ ∫_{Λ_j ∩ {β>k}} X_{k+1} dP,

and consequently

∫_{Λ_j ∩ {β≥k}} X_k dP ≥ ∫_{Λ_j ∩ {β=k}} X_k dP + ∫_{Λ_j ∩ {β>k}} X_{k+1} dP.

Rewriting this as

∫_{Λ_j ∩ {β≥k}} X_k dP − ∫_{Λ_j ∩ {β≥k+1}} X_{k+1} dP ≥ ∫_{Λ_j ∩ {β=k}} X_β dP;

summing over k from j to m, where m is an upper bound for β; and then replacing X_j by X_α on Λ_j, we obtain

(17) ∫_{Λ_j ∩ {β≥j}} X_α dP − ∫_{Λ_j ∩ {β≥m+1}} X_{m+1} dP ≥ ∫_{Λ_j ∩ {j≤β≤m}} X_β dP.


Remark. For a martingale {X_n, F_n; n ∈ N_∞} this theorem is contained in Theorem 9.3.3, since we may take the Y in (13) to be the X_∞ here.

PROOF. (a) Suppose first that the supermartingale is positive with X_∞ = 0 a.e. The inequality (17) is true for every m ∈ N, but now the second integral there is positive, so that we have

∫_{Λ_j ∩ {β≥j}} X_α dP ≥ ∫_{Λ_j ∩ {j≤β≤m}} X_β dP.

Letting m → ∞ and summing over j, we obtain the defining relation for the pair {X_α, F_α; X_β, F_β}.

(b) In the general case we put

X'_n = E(X_∞ | F_n), X''_n = X_n − X'_n.

We have X_n ≥ E(X_∞ | F_n) = X'_n by the defining property of supermartingale applied to X_n and X_∞. Hence the difference {X''_n, n ∈ N} is a positive supermartingale with X''_∞ = 0 a.e. By Theorem 9.3.3, {X'_α, F_α; X'_β, F_β} forms a martingale; by case (a), {X''_α, F_α; X''_β, F_β} forms a supermartingale. Hence the conclusion of the theorem follows simply by addition.

The two preceding theorems are the basic cases of Doob's optional sampling theorem. They do not cover all cases of optional sampling (see e.g.


Exercise 11 of Sec. 8.2 and Exercise 11 below), but are adequate for many applications, some of which will be given later.

Martingale theory has its intuitive background in gambling. If X_n is interpreted as the gambler's capital at time n, then the defining property postulates that his expected capital after one more game, played with the knowledge of the entire past and present, is exactly equal to his current capital. In other words, his expected gain is zero, and in this sense the game is said to be "fair". Similarly, a smartingale is a game consistently biased in one direction. Now the gambler may opt to play the game only at certain preferred times, chosen with the benefit of past experience and present observation, but without clairvoyance into the future. [The inclusion of the present status in his knowledge seems to violate raw intuition, but examine the example below and Exercise 13.] He hopes of course to gain advantage by devising such a "system", but Doob's theorem forestalls him, at least mathematically. We have already mentioned such an interpretation in Sec. 8.2 (see in particular Exercise 11 of Sec. 8.2; note that α + 1 rather than α is the optional time there). The present generalization consists in replacing a stationary independent process by a smartingale. The classical problem of "gambler's ruin" illustrates very well the ideas involved, as follows.

Let {S_n, n ∈ N^0} be a random walk in the notation of Chapter 8, and let S_1 have the Bernoullian distribution (1/2)δ_1 + (1/2)δ_{−1}. It follows from Theorem 8.3.4, or the more elementary Exercise 15 of Sec. 9.2, that the walk will almost certainly leave the interval [−a, b], where a and b are strictly positive integers; and since it can move only one unit at a time, it must reach either −a or b. This means that if we set

(20) α = min{n ≥ 1: S_n = −a},  β = min{n ≥ 1: S_n = b},

then γ = α ∧ β is a finite optional r.v.
It follows from the Corollary to Theorem 9.3.4 that {S_{γ∧n}} is a martingale. Now

(21) lim_{n→∞} S_{γ∧n} = S_γ a.e.,

and clearly S_γ takes only the values −a and b. The question is: with what probabilities? In the gambling interpretation: if two gamblers play a fair coin-tossing game and possess, respectively, a and b units of the constant stake as initial capitals, what is the probability of ruin for each?

The answer is immediate ("without any computation"!) if we show first that the two r.v.'s {S_1, S_γ} form a martingale, for then

(22) E(S_γ) = E(S_1) = 0,

which is to say that

−a P{S_γ = −a} + b P{S_γ = b} = 0,


so that the probability of ruin is inversely proportional to the initial capital of the gambler, a most sensible solution.

To show that the pair {S_1, S_γ} forms a martingale we use Theorem 9.3.5, since {S_{γ∧n}, n ∈ N_∞} is a bounded martingale. The more elementary Theorem 9.3.4 is inapplicable, since γ is not bounded. However, there is a simpler way out in this case: (21) and the boundedness just mentioned imply that

E(S_γ) = lim_{n→∞} E(S_{γ∧n}),

and since E(S_{γ∧1}) = E(S_1), (22) follows directly.

The ruin problem belonged to the ancient history of probability theory, and can be solved by elementary methods based on difference equations (see, e.g., Uspensky, Introduction to Mathematical Probability, McGraw-Hill, New York, 1937). The approach sketched above, however, has all the main ingredients of an elaborate modern theory. The little equation (22) is the prototype of a "harmonic equation", and the problem itself is a "boundary-value problem". The steps used in the solution, to wit: the introduction of a martingale, its optional stopping, its convergence to a limit, and the extension of the martingale property to include the limit with the consequent convergence of expectations, are all part of a standard procedure now ensconced in the general theory of Markov processes and the allied potential theory.

EXERCISES

1. The defining relation for a martingale may be generalized as follows. For each optional r.v. α ≤ n, we have E{X_n | F_α} = X_α. Similarly for a smartingale.

*2. If X is an integrable r.v., then the collection of (equivalence classes of) r.v.'s E(X | G), with G ranging over all Borel subfields of F, is uniformly integrable.

3. Suppose {X_n^(k)}, k = 1, 2, are two [super]martingales, α is a finite optional r.v., and X_α^(1) = [≥] X_α^(2). Define X_n = X_n^(1) 1_{{n≤α}} + X_n^(2) 1_{{n>α}};


[HINT: Let x_1 and x'_1 be independent Bernoullian r.v.'s; and x_2 = x'_2 = +1 or −1 according as x_1 + x'_1 = 0 or ≠ 0; notation as in (7).]

7. Find an example of a positive martingale which is not uniformly integrable. [HINT: You win 2^n if it's heads n times in a row, and you lose everything as soon as it's tails.]

8. Find an example of a martingale {X_n} such that X_n → −∞ a.e. This implies that even in a "fair" game one player may be bound to lose an arbitrarily large amount if he plays long enough (and no limit is set to the liability of the other player). [HINT: Try sums of independent but not identically distributed r.v.'s with mean 0.]

*9. Prove that if {Y_n, F_n} is a martingale such that Y_n ∈ F_{n−1}, then for every n, Y_n = Y_1 a.e. Deduce from this result that Doob's decomposition (6) is unique (up to equivalent r.v.'s) under the condition that Z_n ∈ F_{n−1} for every n ≥ 2. If this condition is not imposed, find two different decompositions.

10. If {X_n} is a uniformly integrable submartingale, then for any optional r.v. α we have

(i) {X_{α∧n}} is a uniformly integrable submartingale;
(ii) E(X_1) ≤ E(X_α) ≤ sup_n E(X_n).

[HINT: |X_{α∧n}| ≤ |X_α| + |X_n|.]

*11. Let {X_n, F_n; n ∈ N} be a [super]martingale satisfying the following condition: there exists a constant M such that for every n ≥ 1:

E{|X_n − X_{n−1}| | F_{n−1}} ≤ M a.e.,

where X_0 = 0 and F_0 is trivial. Then for any two optional r.v.'s α and β such that α ≤ β and E(β) < ∞, {X_α, F_α; X_β, F_β} is a [super]martingale. This is another case of optional sampling given by Doob, which includes Wald's equation (Theorem 5.5.3) as a special case. [HINT: Dominate the integrand in the second integral in (17) by Y_β, where X_0 = 0 and Y_m = Σ_{n=1}^m |X_n − X_{n−1}|. We have

E(Y_β) = Σ_{n=1}^∞ ∫_{{β≥n}} |X_n − X_{n−1}| dP ≤ M E(β).]

12. Apply Exercise 11 to the gambler's ruin problem discussed in the text and conclude that for the α in (20) we must have E(α) = +∞.
Verify this by elementary computation.

*13. In the gambler's ruin problem take b = 1 in (20). Compute E(S_{β∧n}) for a fixed n and show that {S_0, S_{β∧n}} forms a martingale. Observe that {S_0, S_β} does not form a martingale and explain in gambling terms the effect of stopping


β at n. This example shows why in optional sampling the option may be taken even with the knowledge of the present moment under certain conditions. In the case here the present (namely β ∧ n) may leave one no choice!

14. In the gambler's ruin problem, suppose that S_1 has the distribution

p δ_1 + (1 − p) δ_{−1}, p ≠ 1/2,

and let d = 2p − 1. Show that E(S_γ) = d E(γ). Compute the probabilities of ruin by using difference equations to deduce E(γ), and vice versa.

15. Prove that for any L^1-bounded smartingale {X_n, F_n, n ∈ N_∞}, and any optional α, we have E(|X_α|) < ∞. [HINT: Prove the result first for a martingale, then use Doob's decomposition.]

*16. Let {X_n, F_n} be a martingale: x_1 = X_1, x_n = X_n − X_{n−1} for n ≥ 2; let v_n ∈ F_{n−1} for n ≥ 1 where F_0 = F_1; now put

T_n = Σ_{j=1}^n v_j x_j.

Show that {T_n, F_n} is a martingale provided that T_n is integrable for every n. The martingale may be replaced by a smartingale if v_n ≥ 0 for every n. As a particular case take v_n = 1_{{n≤α}} where α is an optional r.v.


9.4 INEQUALITIES AND CONVERGENCE | 347

since it takes only a finite number of values, Theorem 9.3.4 shows that the pair {X_α, X_n} forms a submartingale. If we write

M = {max_{1≤j≤n} X_j ≥ λ},

then X_α ≥ λ on M, hence the first inequality follows from

λ P(M) ≤ ∫_M X_α dP ≤ ∫_M X_n dP;

the second is just a cruder consequence.

Similarly let β be the first j such that X_j ≤ −λ if there is such a j in N_n, otherwise let β = n. Put also

M_k = {min_{1≤j≤k} X_j ≤ −λ}.
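The maximal inequality just derived can be probed numerically. The sketch below is my own Monte Carlo illustration, not from the book, using the positive submartingale X_j = |S_j| of a fair ±1 walk and the cruder right-hand side E(X_n^+) mentioned above.

```python
import random

def maximal_inequality_sides(n, lam, trials, seed=1):
    # Monte Carlo estimate of both sides of
    #   lam * P{ max_{1<=j<=n} X_j >= lam }  <=  E(X_n^+)
    # for the submartingale X_j = |S_j|, S a fair +/-1 random walk.
    rng = random.Random(seed)
    hit, right = 0, 0.0
    for _ in range(trials):
        s, m = 0, 0
        for _ in range(n):
            s += rng.choice((-1, 1))
            m = max(m, abs(s))
        if m >= lam:
            hit += 1
        right += abs(s)  # X_n^+ = |S_n| since X_n >= 0
    return lam * hit / trials, right / trials

lhs, rhs = maximal_inequality_sides(n=20, lam=3, trials=20000)
print(lhs <= rhs)  # expected True with a comfortable margin
```

With these parameters the gap between the two sides is sizable, so sampling noise does not threaten the comparison; the exact inequality is of course what the proof above establishes.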


We now come to a new kind of inequality, which will be the tool for proving the main convergence theorem below. Given any sequence of r.v.'s {X_j}, for each sample point ω, the convergence properties of the numerical sequence {X_j(ω)} hinge on the oscillation of the finite segments {X_j(ω), j ∈ N_n} as n → ∞. In particular the sequence will have a limit, finite or infinite, if and only if the number of its oscillations between any two [rational] numbers a and b is finite (depending on a, b and ω). This is a standard type of argument used in measure and integration theory (cf. Exercise 10 of Sec. 4.2). The interesting thing is that for a smartingale, a sharp estimate of the expected number of oscillations is obtainable.

Let a < b. The number ν of "upcrossings" of the interval [a, b] by a numerical sequence {x_1, ..., x_n} is defined as follows. Set

α_1 = min{j: 1 ≤ j ≤ n, x_j ≤ a},
α_2 = min{j: α_1 < j ≤ n, x_j ≥ b};

if either α_1 or α_2 is not defined because no such j exists, we define ν = 0. In general, for k ≥ 2 we set

α_{2k−1} = min{j: α_{2k−2} < j ≤ n, x_j ≤ a},
α_{2k} = min{j: α_{2k−1} < j ≤ n, x_j ≥ b};

if any one of these is undefined, then all the subsequent ones will be undefined. Let α_ℓ be the last defined one, with ℓ = 0 if α_1 is undefined; then ν is defined to be [ℓ/2]. Thus ν is the actual number of successive times that the sequence crosses from ≤ a to ≥ b. Although the exact number is not essential, since a couple of crossings more or less would make no difference, we must adhere to a rigid way of counting in order to be accurate below.

Theorem 9.4.2. Let {X_j, F_j, j ∈ N_n} be a submartingale and −∞ < a < b < ∞.


the formulas below. In the same vein we set α_0 = 1. Observe that α_n = n in any case, so that

X_n − X_1 = X_{α_n} − X_{α_0} = Σ_{j=0}^{n−1} (X_{α_{j+1}} − X_{α_j}) = Σ_{j even} + Σ_{j odd}.

If j is odd and j + 1 ≤ ℓ(ω), then

X_{α_{j+1}}(ω) ≥ b > 0 = X_{α_j}(ω);

if j is odd and j = ℓ(ω), then

X_{α_{j+1}}(ω) = X_n(ω) ≥ 0 = X_{α_j}(ω);

if j is odd and ℓ(ω) < j, then

X_{α_{j+1}}(ω) = X_n(ω) = X_{α_j}(ω).

Hence in all cases we have

(6) Σ_{j odd} (X_{α_{j+1}}(ω) − X_{α_j}(ω)) ≥ Σ_{j odd, j+1 ≤ ℓ(ω)} (X_{α_{j+1}}(ω) − X_{α_j}(ω)) ≥ ν_{[0,b]}(ω) b.

Next, observe that {α_j, 0 ≤ j ≤ n} as modified above is in general of the form 1 = α_0 ≤ α_1 < α_2 < ... < α_ℓ < α_{ℓ+1} = ... = α_n = n, and since constants are optional, this is an increasing sequence of optional r.v.'s. Hence by Theorem 9.3.4, {X_{α_j}, 0 ≤ j ≤ n} is a submartingale so that for each j, E(X_{α_{j+1}} − X_{α_j}) ≥ 0 and consequently

E(Σ_{j even} (X_{α_{j+1}} − X_{α_j})) ≥ 0.

Adding to this the expectations of the extreme terms in (6), we obtain

(7) E(X_n − X_1) ≥ E(ν_{[0,b]}) b,

which is the particular case of (5) under consideration.

In the general case we apply the case just proved to {(X_j − a)^+, j ∈ N_n}, which is a submartingale by Corollary 1 to Theorem 9.3.1. It is clear that the number of upcrossings of [a, b] by the given submartingale is exactly that of [0, b − a] by the modified one. The inequality (7) becomes the first inequality in (5) after the substitutions, and the second one follows since (X_n − a)^+ ≤ X_n^+ + |a|. Theorem 9.4.2 is proved.
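The rigid counting of upcrossings by the α_k can be written out in a few lines. The function below is my own sketch, not from the book: it scans the sequence alternately for a value ≤ a (an odd-indexed α) and then a value ≥ b (an even-indexed α), and counts one upcrossing for each completed pair, which is exactly ν = [ℓ/2].

```python
def upcrossings(xs, a, b):
    # Count nu, the number of upcrossings of [a, b] by x_1, ..., x_n,
    # following the alternating first-passage definition in the text.
    assert a < b
    count, looking_for_low = 0, True
    for x in xs:
        if looking_for_low and x <= a:
            looking_for_low = False     # found alpha_{2k-1}: a point <= a
        elif not looking_for_low and x >= b:
            looking_for_low = True      # found alpha_{2k}: one upcrossing
            count += 1
    return count

print(upcrossings([0, 2, -1, 3, 1, -2, 5], a=0, b=2))  # 3
```

The value 1 in the sample sequence is deliberately ignored: after a descent to ≤ 0 the scan waits for a value ≥ 2, which is the "rigid way of counting" the proof relies on.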


The corresponding result for a supermartingale will be given below; but after such a painstaking definition of upcrossing, we may leave the dual definition of downcrossing to the reader.

Theorem 9.4.3. Let {X_j, F_j; j ∈ N_n} be a supermartingale and let −∞ < a < b < ∞.


is a null set; and so is the union over all such pairs. Since this union contains the set where lim inf_n X_n < lim sup_n X_n, the limit exists a.e. It must be finite a.e. by Fatou's lemma applied to the sequence {|X_n|}.

Corollary. Every uniformly bounded smartingale converges a.e. Every positive supermartingale and every negative submartingale converge a.e.

It may be instructive to sketch a direct proof of Theorem 9.4.4 which is done "by hand", so to speak. This is the original proof given by Doob (1940) for a martingale.

Suppose that the set Λ_{[a,b]} above has probability > η > 0. For each ω in Λ_{[a,b]}, the sequence {X_n(ω), n ∈ N} takes an infinite number of values < a and an infinite number of values > b. Let 1 = n_0 < n_1 < ... and put

Λ_{2j−1} = {min_{n_{2j−2} ≤ i ≤ n_{2j−1}} X_i < a},  Λ_{2j} = {max_{n_{2j−1} ≤ i ≤ n_{2j}} X_i > b}.


Once Theorem 9.4.4 has been proved for a martingale, we can extend it easily to a positive or uniformly integrable supermartingale by using Doob's decomposition. Suppose {X_n} is a positive supermartingale and X_n = Y_n − Z_n as in Theorem 9.3.2. Then 0 ≤ Z_n ≤ Y_n and consequently

E(Z_∞) = lim_{n→∞} E(Z_n) ≤ E(Y_1);

next we have

E(Y_n) = E(X_n) + E(Z_n),


Theorem 9.4.5. The three propositions below are equivalent for a submartingale {X_n, n ∈ N}:

(a) it is a uniformly integrable sequence;
(b) it converges in L^1;
(c) it converges a.e. to an integrable X_∞ such that {X_n, n ∈ N_∞} is a submartingale and E(X_n) converges to E(X_∞).

PROOF. (a) ⇒ (b): under (a) the condition in Theorem 9.4.4 is satisfied so that X_n → X_∞ a.e. This together with uniform integrability implies X_n → X_∞ in L^1 by Theorem 4.5.4 with r = 1.

(b) ⇒ (c): under (b) let X_n → X_∞ in L^1; then E(|X_n|) → E(|X_∞|) < ∞, and X_n → X_∞ a.e. by Theorem 9.4.4. For each Λ ∈ F_n and n < n', we have

∫_Λ X_n dP ≤ ∫_Λ X_{n'} dP

by the defining relation. The right member converges to ∫_Λ X_∞ dP by L^1-convergence, and the resulting inequality shows that {X_n, F_n; n ∈ N_∞} is a submartingale. Since L^1-convergence also implies convergence of expectations, all three conditions in (c) are proved.

(c) ⇒ (a): under (c), {X_n^+, F_n; n ∈ N_∞} is a submartingale; hence we have for every λ > 0:

(10) ∫_{{X_n^+ > λ}} X_n^+ dP ≤ ∫_{{X_n^+ > λ}} X_∞^+ dP,

which shows that {X_n^+, n ∈ N} is uniformly integrable. Since X_n^+ → X_∞^+ a.e., this implies E(X_n^+) → E(X_∞^+). Since by hypothesis E(X_n) → E(X_∞), it follows that E(X_n^−) → E(X_∞^−). This and X_n^− → X_∞^− a.e. imply that {X_n^−} is uniformly integrable by Theorem 4.5.4 for r = 1. Hence so is {X_n}.

Theorem 9.4.6. In the case of a martingale, propositions (a) and (b) above are equivalent to (c') or (d) below:

(c') it converges a.e. to an integrable X_∞ such that {X_n, n ∈ N_∞} is a martingale;
(d) there exists an integrable r.v. Y such that X_n = E(Y | F_n) for each n ∈ N.

PROOF. (b) ⇒ (c') as before; (c') ⇒ (a) as before if we observe that E(X_n) = E(X_∞) for every n in the present case, or more rapidly by considering |X_n| instead of X_n^− as below. (c') ⇒ (d) is trivial, since we may take the Y in (d) to be the X_∞ in (c'). To prove (d) ⇒ (a), let n < n', then by


Theorem 9.1.5:

E(X_{n'} | F_n) = E{E(Y | F_{n'}) | F_n} = E(Y | F_n) = X_n,

hence {X_n, F_n, n ∈ N; Y, F} is a martingale by definition. Consequently {|X_n|, F_n, n ∈ N; |Y|, F} is a submartingale, and we have for each λ > 0:

∫_{{|X_n| > λ}} |X_n| dP ≤ ∫_{{|X_n| > λ}} |Y| dP,

P{|X_n| > λ} ≤ E(|X_n|)/λ ≤ E(|Y|)/λ,


(a) {X_n} is uniformly integrable;
(b) X_n → X_{−∞} in L^1;
(c) {X_n, n ∈ −N_∞} is a submartingale;
(d) lim_{n→−∞} E(X_n) > −∞.

PROOF. We have

λ P{|X_n| > λ} ≤ E(|X_n|) = 2E(X_n^+) − E(X_n);

this implies that {X_n^+} is uniformly integrable. Next if n < m, then

0 ≥ ∫_{{X_n ≤ −λ}} X_n dP = E(X_n) − ∫_{{X_n > −λ}} X_n dP
  ≥ E(X_n) − ∫_{{X_n > −λ}} X_m dP
  = E(X_n − X_m) + E(X_m) − ∫_{{X_n > −λ}} X_m dP
  = E(X_n − X_m) + ∫_{{X_n ≤ −λ}} X_m dP


by the remark after (12). It follows that {X_n^−} is also uniformly integrable, and therefore (a) is proved.

The next result will be stated for the index set N̄ of all integers in their natural order:

N̄ = {..., −n, ..., −2, −1, 0, 1, 2, ..., n, ...}.

Let {F_n} be increasing B.F.'s on N̄, namely: F_n ⊂ F_m if n < m. We may "close" them at both ends by adjoining the B.F.'s below:

F_{−∞} = ⋀_n F_n,  F_∞ = ⋁_n F_n.

Let {Y_n} be r.v.'s indexed by N̄. If the B.F.'s and r.v.'s are only given on N or −N, they can be trivially extended to N̄ by putting F_n = F_1, Y_n = Y_1 for all n ≤ 0, or F_n = F_{−1}, Y_n = Y_{−1} for all n ≥ 0. The following convergence theorem is very useful.

Theorem 9.4.8. Suppose that the Y_n's are dominated by an integrable r.v. Z:

(13) sup_n |Y_n| ≤ Z;

and lim_n Y_n = Y_∞ or Y_{−∞} as n → ∞ or −∞. Then we have

(14a) lim_{n→∞} E{Y_n | F_n} = E{Y_∞ | F_∞};
(14b) lim_{n→−∞} E{Y_n | F_n} = E{Y_{−∞} | F_{−∞}}.

In particular, for a fixed integrable r.v. Y, we have

(15a) lim_{n→∞} E{Y | F_n} = E{Y | F_∞};
(15b) lim_{n→−∞} E{Y | F_n} = E{Y | F_{−∞}};

where the convergence holds also in L^1 in both cases.

PROOF. We prove (15) first. Let X_n = E{Y | F_n}. For n ∈ N, {X_n, F_n} is a martingale already introduced in (13) of Sec. 9.3; the same is true for n ∈ −N. To prove (15a), we apply Theorem 9.4.6 to deduce (c') there. It remains to identify the limit X_∞ with the right member of (15a). For each Λ ∈ F_n, we have

∫_Λ Y dP = ∫_Λ X_n dP = ∫_Λ X_∞ dP.


Hence the equations hold also for Λ ∈ F_∞ (why?), and this shows that X_∞ has the defining property of E(Y | F_∞), since X_∞ ∈ F_∞. Similarly, the limit X_{−∞} in (15b) exists by Theorem 9.4.7; to identify it, we have by (c) there, for each Λ ∈ F_{−∞}:

∫_Λ X_{−∞} dP = ∫_Λ X_n dP = ∫_Λ Y dP.

This shows that X_{−∞} is equal to the right member of (15b).

We can now prove (14a). Put for m ∈ N:

W_m = sup_{n ≥ m} |Y_n − Y_∞|;

then |W_m| ≤ 2Z and lim_{m→∞} W_m = 0 a.e. Applying (15a) to W_m we obtain

lim_{n→∞} E{|Y_n − Y_∞| | F_n} ≤ lim_{n→∞} E{W_m | F_n} = E{W_m | F_∞}.

As m → ∞, the last term above converges to zero by dominated convergence (see (vii) of Sec. 9.1). Hence the first term must be zero and this clearly implies (14a). The proof of (14b) is completely similar.

Although the corollary below is a very special case, we give it for historical interest. It is called Paul Lévy's zero-or-one law (1935) and includes Theorem 8.1.1 as a particular case.

Corollary. If Λ ∈ F_∞, then

(16) lim_{n→∞} P(Λ | F_n) = 1_Λ a.e.

The reader is urged to ponder over the intuitive meaning of this result and judge for himself whether it is "obvious" or "incredible".

EXERCISES

*1. Prove that for any smartingale, we have for each λ > 0:

λ P{sup_n |X_n| > λ} ≤ 3 sup_n E(|X_n|).

For a martingale or a positive or negative smartingale the constant 3 may be replaced by 1.

2. Let {X_n} be a positive supermartingale. Then for almost every ω, X_k(ω) = 0 implies X_n(ω) = 0 for all n ≥ k. [This is the analogue of a minimum principle in potential theory.]

3. Generalize the upcrossing inequality for a submartingale {X_n, F_n} as follows:

E{ν_{[a,b]} | F_1} ≤ (E{(X_n − a)^+ | F_1} − (X_1 − a)^+) / (b − a).


Similarly, generalize the downcrossing inequality for a positive supermartingale {X_n, F_n} as follows:

E{ν̃_{[a,b]} | F_1} ≤ (X_1 ∧ b) / (b − a).

*4. As a sharpening of Theorems 9.4.2 and 9.4.3 we have, for a positive supermartingale {X_n, n ∈ N}:

P{ν_{[a,b]} ≥ k} ≤ (a/b)^k E(X_1 ∧ a)/a.


*9. Let {X_n, F_n, n ∈ N} be a supermartingale satisfying the condition lim_{n→∞} E(X_n) > −∞. Then we have the representation X_n = X'_n + X''_n, where {X'_n, F_n} is a martingale and {X''_n, F_n} is a positive supermartingale such that lim_{n→∞} X''_n = 0 in L^1 as well as a.e. This is the analogue of F. Riesz's decomposition of a superharmonic function, X'_n being the harmonic part and X''_n the potential part. [HINT: Use Doob's decomposition X_n = Y_n − Z_n and put X'_n = Y_n − E(Z_∞ | F_n).]

10. Let {X_n, F_n} be a potential; namely a positive supermartingale such that lim_{n→∞} E(X_n) = 0; and let X_n = Y_n − Z_n be the Doob decomposition [cf. (6) of Sec. 9.3]. Show that

X_n = E(Z_∞ | F_n) − Z_n.

*11. If {X_n} is a martingale or positive submartingale such that sup_n E(X_n^2) < ∞, then {X_n} converges in L^2 as well as a.e.

12. Let {ξ_n, n ∈ N} be a sequence of independent and identically distributed r.v.'s with zero mean and unit variance; and S_n = Σ_{j=1}^n ξ_j. Then for any optional r.v. α relative to {ξ_n} such that E(√α) < ∞, we have E(|S_α|) ≤ √2 E(√α) and E(S_α) = 0. This is an extension of Wald's equation due to Louis Gordon. [HINT: Truncate α and put η_k = (S_k^2/√k) − (S_{k−1}^2/√(k−1)); then

Σ_{k=1}^∞ ∫_{{α≥k}} η_k dP ≤ Σ_{k=1}^∞ P{α ≥ k}/√k ≤ 2E(√α);

now use Schwarz's inequality followed by Fatou's lemma.]

The next two problems are meant to give an idea of the passage from discrete parameter martingale theory to the continuous parameter theory.

13. Let {X_t, F_t; t ∈ [0, 1]} be a continuous parameter supermartingale. For each t ∈ [0, 1] and sequence {t_n} decreasing to t, {X_{t_n}} converges a.e. and in L^1. For each t ∈ [0, 1] and sequence {t_n} increasing to t, {X_{t_n}} converges a.e. but not necessarily in L^1. [HINT: In the second case consider X_{t_n} − E{X_t | F_{t_n}}.]

*14. In Exercise 13 let Q be the set of rational numbers in [0, 1]. For each t ∈ (0, 1) both limits below exist a.e.:
lim_{s↑t, s∈Q} X_s,  lim_{s↓t, s∈Q} X_s.

[HINT: Let {Q_n, n ≥ 1} be finite subsets of Q such that Q_n ↑ Q; and apply the upcrossing inequality to {X_s, s ∈ Q_n}, then let n → ∞.]


9.5 Applications

Although some of the major successes of martingale theory lie in the field of continuous-parameter stochastic processes, which cannot be discussed here, it has also made various important contributions within the scope of this work. We shall illustrate these below with a few that are related to our previous topics, and indicate some others among the exercises.

(I) The notions of "at least once" and "infinitely often"

These have been a recurring theme in Chapters 4, 5, 8, and 9 and play important roles in the theory of random walk and its generalization to Markov processes. Let {X_n, n ∈ N^0} be an arbitrary stochastic process; the notation for fields in Sec. 9.2 will be used. For each n consider the events:

A_n = ∪_{j=n}^∞ {X_j ∈ B_j},

M = ∩_{n=1}^∞ A_n = {X_j ∈ B_j i.o.},

where B_n are arbitrary Borel sets.

Theorem 9.5.1. We have

(1) lim_{n→∞} P{A_{n+1} | F_{[0,n]}} = 1_M a.e.,

where F_{[0,n]} may be replaced by F_{n} or X_n if the process is Markovian.

PROOF. By Theorem 9.4.8, (14a), the limit is

P{M | F_{[0,∞)}} = 1_M.

The next result is a "principle of recurrence" which is useful in Markov processes; it is an extension of the idea in Theorem 9.2.3 (see also Exercises 15 and 16 of Sec. 9.2).

Theorem 9.5.2. Let {X_n, n ∈ N^0} be a Markov process and A_n, B_n Borel sets. Suppose that there exists δ > 0 such that for every n,

(2) P{∪_{j=n+1}^∞ [X_j ∈ B_j] | X_n} ≥ δ a.e. on the set {X_n ∈ A_n};

then we have

(3) P{[X_j ∈ A_j i.o.] \ [X_j ∈ B_j i.o.]} = 0.


PROOF. Let Λ = {X_j ∈ A_j i.o.} and use the notation A_n and M above. We may ignore the null sets in (1) and (2). Then if ω ∈ Λ, our hypothesis implies that

P{A_{n+1} | X_n}(ω) ≥ δ i.o.

In view of (1) this is possible only if ω ∈ M. Thus Λ ⊂ M, which implies (3).

The intuitive meaning of the preceding theorem has been given by Doeblin as follows: if the chance of a pedestrian's getting run over is greater than δ > 0 each time he crosses a certain street, then he will not be crossing it indefinitely (since he will be killed first)! Here {X_n ∈ A_n} is the event of the nth crossing, {X_n ∈ B_n} that of being run over at the nth crossing.

(II) Harmonic and superharmonic functions for a Markov process

Let {X_n, n ∈ N^0} be a homogeneous Markov process as discussed in Sec. 9.2 with the transition probability function P(·, ·). An extended-valued function f on R^1 is said to be harmonic (with respect to P) iff it is integrable with respect to the measure P(x, ·) for each x and satisfies the following "harmonic equation":

(4) ∀x ∈ R^1: f(x) = ∫ P(x, dy) f(y).

It is superharmonic (with respect to P) iff the "=" in (4) is replaced by "≥"; in this case f may take the value +∞.

Lemma. If f is [super]harmonic, then {f(X_n), n ∈ N^0}, where X_0 ≡ x_0 for some given x_0 in R^1, is a [super]martingale.

PROOF. We have, recalling (14) of Sec. 9.2,

E{f(X_n)} = ∫ P^n(x_0, dy) f(y) < ∞,

as is easily seen by iterating (4) and applying an extended form of Fubini's theorem (see, e.g., Neveu [6]). Next we have, upon substituting X_n for x in (4):

f(X_n) = ∫ P(X_n, dy) f(y) = E{f(X_{n+1}) | X_n} = E{f(X_{n+1}) | F_{[0,n]}},

where the second equation follows by Exercise 8 of Sec. 9.2 and the third by the Markov property. This proves the lemma in the harmonic case; the other case is similar. (Why not also the "sub" case?)

The most important example of a harmonic function is the g(·, B) of Exercise 10 of Sec. 9.2 for a given B; that of a superharmonic function is the


f(·, B) of Exercise 9 there. These assertions follow easily from their probabilistic meanings given in the cited exercises, but purely analytic verifications are also simple and instructive. Finally, if for some B we have

π(x) = Σ_{n=0}^∞ P^(n)(x, B) < ∞

for every x, then π(·) is superharmonic and is called the "potential" of the set B.

Theorem 9.5.3. Suppose that the remote field of {X_n, n ∈ N^0} is trivial. Then each bounded harmonic function is a constant a.e. with respect to each μ_n, where μ_n is the p.m. of X_n.

PROOF. By Theorem 9.4.5, f(X_n) converges a.e. to Z such that

{f(X_n), F_{[0,n]}; Z, F_{[0,∞)}}

is a martingale. Clearly Z belongs to the remote field and so is a constant c a.e. Since

f(X_n) = E{Z | F_{[0,n]}},

each f(X_n) is the same constant c a.e. Mapped into R^1, the last assertion becomes the conclusion of the theorem.

(III) The supremum of a submartingale

The first inequality in (1) of Sec. 9.4 is of a type familiar in ergodic theory and leads to the result below, which has been called the "dominated ergodic theorem" by Wiener. In the case where X_n is the sum of independent r.v.'s with mean 0, it is due to Marcinkiewicz and Zygmund. We write ||X||_p for the L^p-norm of X: ||X||_p = {E(|X|^p)}^{1/p}.

Theorem 9.5.4. Let 1 < p < ∞ and 1/p + 1/q = 1. Suppose that {X_n, n ∈ N} is a positive submartingale satisfying the condition

(5) sup_n E(X_n^p) < ∞.

Then sup_{n∈N} X_n ∈ L^p and

(6) ||sup_n X_n||_p ≤ q sup_n ||X_n||_p.

PROOF. The condition (5) implies that {X_n} is uniformly integrable (Exercise 8 of Sec. 4.5), hence by Theorem 9.4.5, X_n → X_∞ a.e. and {X_n, n ∈ N_∞} is a submartingale. Writing Y for sup_n X_n, we have by an obvious


extension of the first equation in (1) of Sec. 9.4:

(7) $\forall \lambda > 0$: $\lambda\, \mathscr{P}\{Y \ge \lambda\} \le \int_{\{Y \ge \lambda\}} X_\infty \, d\mathscr{P}$.

Now it turns out that such an inequality for any two r.v.'s $Y$ and $X_\infty$ implies the inequality $\|Y\|_p \le q \|X_\infty\|_p$, from which (6) follows by Fatou's lemma. This is shown by the calculation below, where $G(\lambda) = \mathscr{P}\{Y > \lambda\}$.

$$E(Y^p) = -\int_0^\infty \lambda^p \, dG(\lambda) = \int_0^\infty p\lambda^{p-1}\, \mathscr{P}\{Y \ge \lambda\} \, d\lambda \le \int_0^\infty p\lambda^{p-2} \left\{\int_{\{Y \ge \lambda\}} X_\infty \, d\mathscr{P}\right\} d\lambda$$
$$= E\left\{X_\infty \int_0^Y p\lambda^{p-2} \, d\lambda\right\} = q\, E\{X_\infty Y^{p-1}\} \le q \|X_\infty\|_p \|Y^{p-1}\|_q = q \|X_\infty\|_p \{E(Y^p)\}^{1/q}.$$

Since we do not yet know $E(Y^p) < \infty$, it is necessary to replace $Y$ first with $Y \wedge m$, where $m$ is a constant, and verify the truth of (7) after this replacement, before dividing through in the obvious way. We then let $m \uparrow \infty$ and conclude (6).

The result above is false for $p = 1$ and is replaced by a more complicated one (Exercise 7 below).

(IV) Convergence of sums of independent r.v.'s

We return to Theorem 5.3.4 and complete the discussion there by showing that the convergence in distribution of the series $\sum_n X_n$ already implies its convergence a.e. This can also be proved by an analytic method based on estimation of ch.f.'s, but the martingale approach is more illuminating.

Theorem 9.5.5. If $\{X_n, n \in N\}$ is a sequence of independent r.v.'s such that $S_n = \sum_{j=1}^n X_j$ converges in distribution as $n \to \infty$, then $S_n$ converges a.e.

PROOF. Let $f_j$ be the ch.f. of $X_j$, so that

$$\varphi_n = \prod_{j=1}^n f_j$$


is the ch.f. of $S_n$. By the convergence theorem of Sec. 6.3, $\varphi_n$ converges everywhere to $\varphi$, the ch.f. of the limit distribution of $S_n$. We shall need only this fact for $|t| \le t_0$, where $t_0$ is so small that $\varphi(t) \ne 0$ for $|t| \le t_0$; then this is also true of $\varphi_n$ for all sufficiently large $n$. For such values of $n$ and a fixed $t$ with $|t| \le t_0$, we define the complex-valued r.v. $Z_n$ as follows:

(8) $Z_n = \dfrac{e^{itS_n}}{\varphi_n(t)}$.

Then each $Z_n$ is integrable; indeed the sequence $\{Z_n\}$ is uniformly bounded. We have for each $n$, if $\mathscr{F}_n$ denotes the Borel field generated by $S_1, \ldots, S_n$:

$$E\{Z_{n+1} \mid \mathscr{F}_n\} = E\left\{\frac{e^{itS_n} e^{itX_{n+1}}}{\varphi_n(t) f_{n+1}(t)} \,\Big|\, \mathscr{F}_n\right\} = \frac{e^{itS_n}}{\varphi_n(t) f_{n+1}(t)}\, E\{e^{itX_{n+1}}\} = \frac{e^{itS_n} f_{n+1}(t)}{\varphi_n(t) f_{n+1}(t)} = Z_n,$$

where the second equation follows from Theorem 9.1.3 and the third from independence. Thus $\{Z_n, \mathscr{F}_n\}$ is a martingale, in the sense that its real and imaginary parts are both martingales. Since it is uniformly bounded, it follows from Theorem 9.4.4 that $Z_n$ converges a.e. This means, for each $t$ with $|t| \le t_0$, there is a set $\Omega_t$ with $\mathscr{P}(\Omega_t) = 1$ such that if $\omega \in \Omega_t$, then the sequence of complex numbers $e^{itS_n(\omega)}/\varphi_n(t)$ converges and so also does $e^{itS_n(\omega)}$. But how does one deduce from this the convergence of $S_n(\omega)$? The argument below may seem unnecessarily tedious, but it is of a familiar and indispensable kind in certain parts of stochastic processes.

Consider $e^{itS_n(\omega)}$ as a function of $(t, \omega)$ in the product space $T \times \Omega$, where $T = [-t_0, t_0]$, with the product measure $m \times \mathscr{P}$, where $m$ is the Lebesgue measure on $T$. Since this function is measurable in $(t, \omega)$ for each $n$, the set $C$ of $(t, \omega)$ for which $\lim_{n\to\infty} e^{itS_n(\omega)}$ exists is measurable with respect to $m \times \mathscr{P}$. Each section of $C$ by a fixed $t$ has full measure $\mathscr{P}(\Omega_t) = 1$ as just shown, hence Fubini's theorem asserts that almost every section of $C$ by a fixed $\omega$ must also have full measure $m(T) = 2t_0$.
This means that there exists an $\tilde\Omega$ with $\mathscr{P}(\tilde\Omega) = 1$, and for each $\omega \in \tilde\Omega$ there is a subset $T_\omega$ of $T$ with $m(T_\omega) = m(T)$, such that if $t \in T_\omega$, then $\lim_{n\to\infty} e^{itS_n(\omega)}$ exists. Now we are in a position to apply Exercise 17 of Sec. 6.4 to conclude the convergence of $S_n(\omega)$ for $\omega \in \tilde\Omega$, thus finishing the proof of the theorem.

According to the preceding proof, due to Doob, the hypothesis of Theorem 9.5.5 may be further weakened to the requirement that the sequence of ch.f.'s of $S_n$ converge on a set of $t$ of strictly positive Lebesgue measure. In particular, if an infinite product $\prod_n f_n$ of ch.f.'s converges on such a set, then it converges everywhere.
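The martingale $Z_n = e^{itS_n}/\varphi_n(t)$ in the proof above has $E\{Z_n\} = 1$ for every $n$, since $E\{e^{itS_n}\} = \varphi_n(t)$. This is not part of the text, but it can be checked exactly in a toy case; the choice of summands $X_j = \pm 2^{-j}$ with fair signs (so that $f_j(t) = \cos(t\,2^{-j})$) is an illustrative assumption, and the sketch enumerates all sign patterns rather than sampling.

```python
import cmath
import math
import itertools

def char_fn_product(t, n):
    # phi_n(t) = prod_{j=1}^n f_j(t), with f_j(t) = cos(t * 2^{-j})
    # for the hypothetical summands X_j = +-2^{-j} with fair signs
    p = 1.0
    for j in range(1, n + 1):
        p *= math.cos(t * 2.0 ** -j)
    return p

def exact_char_fn(t, n):
    # E[exp(i t S_n)] computed exactly over all 2^n equally likely sign patterns
    total = 0j
    for signs in itertools.product((-1, 1), repeat=n):
        s = sum(sg * 2.0 ** -(j + 1) for j, sg in enumerate(signs))
        total += cmath.exp(1j * t * s)
    return total / 2 ** n

t, n = 0.7, 10
print(abs(exact_char_fn(t, n) - char_fn_product(t, n)))  # ~0, so E{Z_n} = 1
```

Here $\varphi_n(t) \ne 0$ because every factor $\cos(t\,2^{-j})$ stays close to 1 for small $t$; this is exactly what the choice of $t_0$ secures in the general proof.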


(V) The strong law of large numbers

Our next example is a new proof of the classical strong law of large numbers in the form of Theorem 5.4.2, (8). This basically different approach, which has more a measure-theoretic than an analytic flavor, is one of the striking successes of martingale theory. It was given by Doob (1949).

Theorem 9.5.6. Let $\{S_n, n \in N\}$ be a random walk (in the sense of Chapter 8) with $E\{|X_1|\} < \infty$. Then we have

$$\lim_{n\to\infty} \frac{S_n}{n} = E\{X_1\} \quad \text{a.e.}$$

PROOF. Recall that $S_n = \sum_{j=1}^n X_j$ and consider for $1 \le k \le n$:

(9) $E\{X_k \mid S_n, S_{n+1}, \ldots\} = E\{X_k \mid \mathscr{G}_n\},$

where $\mathscr{G}_n$ is the Borel field generated by $\{S_j, j \ge n\}$. Thus $\mathscr{G}_n$ decreases as $n$ increases. By Theorem 9.4.8, (15b), the right side of (9) converges to $E\{X_k \mid \mathscr{G}\}$, where $\mathscr{G} = \bigcap_n \mathscr{G}_n$. Now $\mathscr{G}_n$ is also generated by $S_n$ and $\{X_j, j \ge n+1\}$, hence it follows from the independence of the latter from the pair $(X_k, S_n)$ and (3) of Sec. 9.2 that we have

(10) $E\{X_k \mid S_n, X_{n+1}, X_{n+2}, \ldots\} = E\{X_k \mid S_n\} = E\{X_1 \mid S_n\},$

the second equation by reason of symmetry (proof?). Summing over $k$ from 1 to $n$ and taking the average, we infer that

$$\frac{S_n}{n} = E\{X_1 \mid \mathscr{G}_n\},$$

so that if $Y_{-n} = S_n/n$ for $n \in N$, then $\{Y_{-n}, n \in N\}$ is a martingale. In particular,

$$\lim_{n\to\infty} \frac{S_n}{n} = \lim_{n\to\infty} E\{X_1 \mid \mathscr{G}_n\} = E\{X_1 \mid \mathscr{G}\},$$

where the second equation follows from the argument above. On the other hand, the first limit is a remote (even invariant) r.v. in the sense of Sec. 8.1, since for every $m \ge 1$ we have

$$\lim_{n\to\infty} \frac{S_n(\omega)}{n} = \lim_{n\to\infty} \frac{\sum_{j=m}^n X_j(\omega)}{n};$$


hence it must be a constant a.e. by Theorem 8.1.2. [Alternatively we may use Theorem 8.1.4, the limit above being even more obviously a permutable r.v.] This constant must be $E\{E\{X_1 \mid \mathscr{G}\}\} = E\{X_1\}$, proving the theorem.

(VI) Exchangeable events

The method used in the preceding example can also be applied to the theory of exchangeable events. The events $\{E_n, n \in N\}$ are said to be exchangeable iff for every $k \ge 1$, the probability of the joint occurrence of any $k$ of them is the same for every choice of $k$ members from the sequence, namely we have

(11) $\mathscr{P}\{E_{n_1} \cap \cdots \cap E_{n_k}\} = w_k, \quad k \in N,$

for any subset $\{n_1, \ldots, n_k\}$ of $N$. Let us denote the indicator of $E_n$ by $e_n$, and put

$$N_n = \sum_{j=1}^n e_j;$$

then $N_n$ is the number of occurrences among the first $n$ events of the sequence. Denote by $\mathscr{G}_n$ the B.F. generated by $\{N_j, j \ge n\}$, and put $\mathscr{G} = \bigcap_n \mathscr{G}_n$. Then the definition of exchangeability implies that if $n_j \le n$ for $1 \le j \le k$, then

(12) $E\{e_{n_1} \cdots e_{n_k} \mid \mathscr{G}_n\} = \binom{n}{k}^{-1} \sum_{1 \le j_1 < \cdots < j_k \le n} e_{j_1} \cdots e_{j_k}.$


To evaluate the sum above, observe that it is the coefficient of $z^k$ in $\prod_{j=1}^n (1 + e_j z)$. But it is trivial that $1 + e_j z = (1+z)^{e_j}$ since $e_j$ takes only the values 0 and 1, hence

$$\prod_{j=1}^n (1 + e_j z) = \prod_{j=1}^n (1+z)^{e_j} = (1+z)^{N_n}.$$

From this we obtain by comparing the coefficients:

(13) $\sum_{1 \le j_1 < \cdots < j_k \le n} e_{j_1} \cdots e_{j_k} = \binom{N_n}{k},$

so that the conditional expectation in (12) equals $\binom{N_n}{k} \big/ \binom{n}{k}$. I owe this derivation to David Klarner.
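The coefficient comparison behind (13) is a finite combinatorial identity, and not from the text but easy to verify by brute force: for every 0–1 sequence $e_1, \ldots, e_n$ of a small length, the $k$th elementary symmetric sum of the $e_j$ equals $\binom{N_n}{k}$.

```python
from itertools import product, combinations
from math import comb

def elementary_symmetric(e, k):
    # sum of e_{j_1} ... e_{j_k} over all k-subsets j_1 < ... < j_k
    return sum(all(e[j] for j in js) for js in combinations(range(len(e)), k))

n = 6
for e in product((0, 1), repeat=n):   # every indicator pattern of length n
    N = sum(e)                        # N_n, the number of occurrences
    for k in range(n + 1):
        assert elementary_symmetric(e, k) == comb(N, k)
print("coefficient identity checked for all 0-1 sequences of length", n)
```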


Given a martingale $\{X_n, \mathscr{F}_n\}$, write $x_1 = X_1$ and $x_j = X_j - X_{j-1}$ for $j \ge 2$ for its differences, and put $Q_n^2(X) = \sum_{j=1}^n x_j^2$. The sequence $\{Q_n^2(X), n \in N\}$ associated with $X$ is called its squared variation process and is useful in many applications. We begin with the algebraic identity:

(15) $X_n^2 = \sum_{j=1}^n x_j^2 + 2\sum_{j=2}^n X_{j-1} x_j.$

If $X_n \in L^2$ for every $n$, then all terms above are integrable, and we have for each $j$:

(16) $E(X_{j-1} x_j) = E\big(X_{j-1}\, E\{x_j \mid \mathscr{F}_{j-1}\}\big) = 0.$

It follows that

$$E(X_n^2) = \sum_{j=1}^n E(x_j^2) = E(Q_n^2(X)).$$

When $X_n$ is the $n$th partial sum of a sequence of independent r.v.'s with zero mean and finite variance, the preceding formula reduces to the additivity of variances; see (6) of Sec. 5.1.

Now suppose that $\{X_n\}$ is a positive bounded supermartingale such that $0 \le X_n \le A$ for all $n$, where $A$ is a constant. Then the quantity in (16) is negative and bounded below:

$$E(X_{j-1} x_j) = E\big(X_{j-1}\, E\{x_j \mid \mathscr{F}_{j-1}\}\big) \ge A\, E(x_j),$$

since $E\{x_j \mid \mathscr{F}_{j-1}\} \le 0$ and $0 \le X_{j-1} \le A$. In this case we obtain from (15), using $E(X_n^2) \le A\, E(X_n)$:

$$E(Q_n^2) \le E(X_n^2) - 2A\, E(X_n) + 2A\, E(X_1) \le 2A\, E(X_1),$$

so that

(17) $E(Q_\infty^2) \le 2A\, E(X_1) \le 2A^2.$

If $X$ is a positive martingale, then $X \wedge \lambda$ is a supermartingale of the kind just considered, so that (17) is applicable to it. Letting $X^* = \sup_{1 \le n < \infty} X_n$, one can combine (17) for $X \wedge \lambda$ with the maximal inequality for $X^*$.


We have therefore established the inequality:

(18) $\mathscr{P}\{Q_\infty(X) > \lambda\} \le \dfrac{3\, E(X_1)}{\lambda}, \qquad \lambda > 0.$

Our last example concerns the derivation of one measure with respect to another. For each $n$ let $\{\Lambda_j^{(n)}, j\}$ be a countable partition of $\Omega$ into measurable sets, and let $\mathscr{F}_n$ be the Borel field generated by it. Let $\nu$ be a finite measure on $\mathscr{F}$ that is absolutely continuous with respect to $\mathscr{P}$, say $\nu(\Lambda) = \int_\Lambda Y \, d\mathscr{P}$, and put

(21) $X_n = \sum_j \dfrac{\nu(\Lambda_j^{(n)})}{\mathscr{P}(\Lambda_j^{(n)})}\, 1_{\Lambda_j^{(n)}},$

where the fraction is taken to be zero if the denominator vanishes. According to the discussion of Sec. 9.1, we have

$$X_n = E\{Y \mid \mathscr{F}_n\}.$$

Now suppose that the partitions become finer as $n$ increases so that $\{\mathscr{F}_n, n \in N\}$ is an increasing sequence of Borel fields. Then we obtain by Theorem 9.4.8:

$$\lim_{n\to\infty} X_n = E\{Y \mid \mathscr{F}_\infty\} \quad \text{a.e.}$$
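A numerical sketch (not in the text) of this last limit: on the unit interval with Lebesgue measure, take $\nu$ with the assumed density $f(x) = 2x$ and let the $n$th partition consist of the dyadic intervals of rank $n$. Then $X_n(\omega)$ is the increment ratio over the interval containing $\omega$ and converges to $f(\omega)$.

```python
def nu(a, b):
    # assumed measure with density f(x) = 2x on [0, 1): nu((a, b]) = b^2 - a^2
    return b * b - a * a

def increment_ratio(omega, n):
    # X_n(omega): the ratio nu(I)/m(I) over the rank-n dyadic interval
    # I = [j/2^n, (j+1)/2^n) that contains omega
    j = int(omega * 2 ** n)
    a, b = j / 2 ** n, (j + 1) / 2 ** n
    return nu(a, b) / (b - a)

omega = 0.3
for n in (2, 5, 10, 20):
    print(n, increment_ratio(omega, n))  # tends to f(omega) = 0.6
```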


In particular if $Y \in \mathscr{F}_\infty$ we have obtained the Radon–Nikodym derivative $Y = d\nu/d\mathscr{P}$ as the almost everywhere limit of "increment ratios" over a "net":

$$Y(\omega) = \lim_{n\to\infty} \frac{\nu(\Lambda_{j(\omega)}^{(n)})}{\mathscr{P}(\Lambda_{j(\omega)}^{(n)})},$$

where $j(\omega)$ is the unique $j$ such that $\omega \in \Lambda_j^{(n)}$.

If $(\Omega, \mathscr{F}, \mathscr{P})$ is $(\mathscr{U}, \mathscr{B}, m)$ as in Example 2 of Sec. 3.1, and $\nu$ is an arbitrary measure which is absolutely continuous with respect to the Lebesgue measure $m$, we may take the $n$th partition to be $0 = d_0^{(n)} < d_1^{(n)} < \cdots < d_{k_n}^{(n)} = 1$ such that the mesh $\max_j (d_j^{(n)} - d_{j-1}^{(n)})$ tends to zero as $n \to \infty$.

EXERCISES

1. … $> 0$; otherwise $Z_n = 0$. Then $\{Z_n, n \in N\}$ is a martingale that converges a.e. and in $L^1$. [This is from information theory.]


2. Suppose that for each $n$, $\{X_j, 1 \le j \le n\}$ and $\{X'_j, 1 \le j \le n\}$ have respectively the $n$-dimensional probability density functions $p_n$ and $q_n$. Define

$$Y_n = \frac{q_n(X_1, \ldots, X_n)}{p_n(X_1, \ldots, X_n)}$$

if the denominator $> 0$, and $= 0$ otherwise. Then $\{Y_n, n \in N\}$ is a supermartingale that converges a.e. [This is from statistics.]

3. Let $\{Z_n, n \in N^0\}$ be positive integer-valued r.v.'s such that $Z_0 \equiv 1$ and for each $n \ge 1$, the conditional distribution of $Z_n$ given $Z_0, \ldots, Z_{n-1}$ is that of $Z_{n-1}$ independent r.v.'s with the common distribution $\{p_k, k \in N^0\}$, where $p_1 < 1$ and

$$\sum_{k=0}^{\infty} k p_k = m < \infty.$$

Then $\{Z_n/m^n, n \in N\}$ is a martingale that converges a.e. [This is a branching process.]


*7. The analogue of Theorem 9.5.4 for $p = 1$ is as follows: if $X_n \ge 0$ for all $n$, then

$$E\big\{\sup_n X_n\big\} \le \frac{e}{e-1}\Big[1 + \sup_n E\{X_n \log^+ X_n\}\Big],$$

where $\log^+ x = \log x$ if $x \ge 1$ and $0$ if $x \le 1$.

*8. As an example of a more sophisticated application of the martingale convergence theorem, consider the following result due to Paul Lévy. Let $\{X_n, n \in N^0\}$ be a sequence of uniformly bounded r.v.'s; then the two series

$$\sum_n X_n \quad \text{and} \quad \sum_n E\{X_n \mid X_1, \ldots, X_{n-1}\}$$

converge or diverge together. [HINT: Let

$$Y_n = X_n - E\{X_n \mid X_1, \ldots, X_{n-1}\} \quad \text{and} \quad Z_n = \sum_{j=1}^n Y_j.$$

Define $\alpha$ to be the first time that $Z_n > A$ and show that $E(Z_{\alpha \wedge n})$ is bounded in $n$. Apply Theorem 9.4.4 to $\{Z_{\alpha \wedge n}\}$ for each $A$ to show that $Z_n$ converges on the set where $\overline{\lim}_n Z_n < \infty$; similarly also on the set where $\underline{\lim}_n Z_n > -\infty$. The situation is reminiscent of Theorem 8.2.5.]

9. Let $\{Y_k, 1 \le k \le n\}$ be independent r.v.'s with mean zero and finite variances $\sigma_k^2$;

$$S_k = \sum_{j=1}^k Y_j, \qquad s_k^2 = \sum_{j=1}^k \sigma_j^2 > 0, \qquad Z_k = S_k^2 - s_k^2.$$

Prove that $\{Z_k, 1 \le k \le n\}$ is a martingale. Suppose now that all the $Y_k$ are bounded by a constant $A$, and define $\alpha$ and $M$ as in the proof of Theorem 9.4.1, with the $X_k$ there replaced by the $S_k$ here. Prove that

$$s_n^2\, \mathscr{P}(M^c) \le E(s_{\alpha \wedge n}^2) = E(S_{\alpha \wedge n}^2) \le (\lambda + A)^2.$$

Thus we obtain:

$$\mathscr{P}\Big\{\max_{1 \le k \le n} |S_k| \le \lambda\Big\} \le \frac{(\lambda + A)^2}{s_n^2}.$$

10. … Prove that if $E\{(|S_\alpha|/\alpha)\, 1_{\{\alpha < \infty\}}\} < \infty$ …


The result is due to McCabe and Shepp. [HINT:

$$c_n = \prod_{j=1}^n \mathscr{P}\{|X_1| \le j\} \to c > 0;$$

$$\sum_{n=1}^{\infty} \frac{1}{n} \int_{\{\alpha = n\}} |X_n| \, d\mathscr{P} = \sum_{n=1}^{\infty} \frac{c_{n-1}}{n} \int_{\{|X_1| > n\}} |X_1| \, d\mathscr{P}; \qquad \sum_{n=1}^{\infty} \frac{1}{n} \int_{\{\alpha \ge n\}} |S_{n-1}| \, d\mathscr{P} < \infty.]$$

11. Deduce from Exercise 10 that $E\{\sup_n |S_n|/n\} < \infty$ if and only if $E\{|X_1| \log^+ |X_1|\} < \infty$. [HINT: Apply Exercise 7 to the martingale $\{\ldots, S_n/n, \ldots, S_2/2, S_1\}$ in Example (V).]

12. In Example (VI) show that (i) $\mathscr{G}$ is generated by $\eta$; (ii) the events $\{E_n, n \in N\}$ are conditionally independent given $\eta$; (iii) for any $l$ events $E_{n_j}$, $1 \le j \le l$, and any $k \le l$ we have

$$\mathscr{P}\{E_{n_1} \cap \cdots \cap E_{n_k} \cap E_{n_{k+1}}^c \cap \cdots \cap E_{n_l}^c\} = \int_0^1 x^k (1-x)^{l-k} \, G(dx),$$

where $G$ is the distribution of $\eta$.

13. Prove the inequality (19).

*14. Prove that if $\nu$ is a measure on $\mathscr{F}$ that is singular with respect to $\mathscr{P}$, then the $X_n$'s in (21) converge a.e. to zero. [HINT: Show that

$$\nu(\Lambda) \ge \int_\Lambda X_n \, d\mathscr{P} \quad \text{for } \Lambda \in \mathscr{F}_m,\ m \le n,$$

and apply Fatou's lemma. $\{X_n\}$ is a supermartingale, not necessarily a martingale!]

15. In the case of $(\mathscr{U}, \mathscr{B}, m)$, suppose that $\nu = \delta_1$ and the $n$th partition is obtained by dividing $\mathscr{U}$ into $2^n$ equal parts: what are the $X_n$'s in (21)? Use this to "explain away" the St. Petersburg paradox (see Exercise 5 of Sec. 5.2).

Bibliographical Note

Most of the results can be found in Chapter 7 of Doob [17]. Another useful account is given by Meyer [20]. For an early but stimulating account of the connections between random walks and partial differential equations, see

A. Khintchine, Asymptotische Gesetze der Wahrscheinlichkeitsrechnung. Springer-Verlag, Berlin, 1933.


Theorems 9.5.1 and 9.5.2 are contained in

Kai Lai Chung, The general theory of Markov processes according to Doeblin, Z. Wahrscheinlichkeitstheorie 2 (1964), 230–254.

For Theorem 9.5.4 in the case of an independent process, see

J. Marcinkiewicz and A. Zygmund, Sur les fonctions indépendantes, Fund. Math. 29 (1937), 60–90,

which contains other interesting and useful relics.

The following article serves as a guide to the recent literature on martingale inequalities and their uses:

D. L. Burkholder, Distribution function inequalities for martingales, Ann. Probability 1 (1973), 19–42.


Supplement: Measure and Integral

For basic mathematical vocabulary and notation the reader is referred to §1.1 and §2.1 of the main text.

1 Construction of measure

Let $\Omega$ be an abstract space and $\mathscr{J}$ its total Borel field; then $A \in \mathscr{J}$ means $A \subset \Omega$.

DEFINITION 1. A function $\mu^*$ with domain $\mathscr{J}$ and range in $[0, \infty]$ is an outer measure iff the following properties hold:

(a) $\mu^*(\emptyset) = 0$;
(b) (monotonicity) if $A_1 \subset A_2$, then $\mu^*(A_1) \le \mu^*(A_2)$;
(c) (subadditivity) if $\{A_j\}$ is a countable sequence of sets in $\mathscr{J}$, then

$$\mu^*\Big(\bigcup_j A_j\Big) \le \sum_j \mu^*(A_j).$$

DEFINITION 2. Let $\mathscr{F}_0$ be a field in $\Omega$. A function $\mu$ with domain $\mathscr{F}_0$ and range in $[0, \infty]$ is a measure on $\mathscr{F}_0$ iff (a) and the following property hold:


376 | SUPPLEMENT: MEASURE AND INTEGRAL

(d) (additivity) if $\{B_j\}$ is a countable sequence of disjoint sets in $\mathscr{F}_0$ and $\bigcup_j B_j \in \mathscr{F}_0$, then

(1) $\mu\Big(\bigcup_j B_j\Big) = \sum_j \mu(B_j).$

Let us show that the properties (b) and (c) for outer measures hold for a measure $\mu$, provided all the sets involved belong to $\mathscr{F}_0$.

If $A_1 \in \mathscr{F}_0$, $A_2 \in \mathscr{F}_0$, and $A_1 \subset A_2$, then $A_1^c A_2 \in \mathscr{F}_0$ because $\mathscr{F}_0$ is a field; $A_2 = A_1 \cup A_1^c A_2$ and so by (d):

$$\mu(A_2) = \mu(A_1) + \mu(A_1^c A_2) \ge \mu(A_1).$$

Next, if each $A_j \in \mathscr{F}_0$, and furthermore if $\bigcup_j A_j \in \mathscr{F}_0$ (this must be assumed for a countably infinite union because it is not implied by the definition of a field!), then

$$\bigcup_j A_j = A_1 \cup A_1^c A_2 \cup A_1^c A_2^c A_3 \cup \cdots$$

and so by (d), since each member of the disjoint union above belongs to $\mathscr{F}_0$:

$$\mu\Big(\bigcup_j A_j\Big) = \mu(A_1) + \mu(A_1^c A_2) + \mu(A_1^c A_2^c A_3) + \cdots \le \mu(A_1) + \mu(A_2) + \mu(A_3) + \cdots$$

by property (b) just proved.

The symbol $N$ denotes the sequence of natural numbers (strictly positive integers); when used as an index set, it will frequently be omitted as understood. For instance, the index $j$ used above ranges over $N$ or a finite segment of $N$.

Now let us suppose that the field $\mathscr{F}_0$ is a Borel field, to be denoted by $\mathscr{F}$, and that $\mu$ is a measure on it. Then if $A_n \in \mathscr{F}$ for each $n \in N$, the countable union $\bigcup_n A_n$ and countable intersection $\bigcap_n A_n$ both belong to $\mathscr{F}$. In this case we have the following fundamental properties.

(e) (increasing limit) if $A_n \subset A_{n+1}$ for all $n$ and $A_n \uparrow A = \bigcup_n A_n$, then

$$\lim_n \uparrow \mu(A_n) = \mu(A).$$

(f) (decreasing limit) if $A_n \supset A_{n+1}$ for all $n$, $A_n \downarrow A = \bigcap_n A_n$, and for some $n$ we have $\mu(A_n) < \infty$, then

$$\lim_n \downarrow \mu(A_n) = \mu(A).$$
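The disjointification trick used above, $A_1 \cup A_1^c A_2 \cup A_1^c A_2^c A_3 \cup \cdots$, is easy to watch in code. The sketch below is not from the text; it uses the counting measure on finite subsets of $N$ as the measure $\mu$.

```python
def disjointify(sets):
    # replace A_1, A_2, ... by A_1, A_1^c A_2, A_1^c A_2^c A_3, ...:
    # disjoint sets with the same union
    seen, out = set(), []
    for a in sets:
        out.append(set(a) - seen)
        seen |= set(a)
    return out

mu = len  # counting measure on finite sets

As = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 5, 6}]
Bs = disjointify(As)
assert set().union(*Bs) == set().union(*As)
assert all(b.isdisjoint(c) for i, b in enumerate(Bs) for c in Bs[i + 1:])
# additivity (d) over the disjoint pieces vs. subadditivity over the originals:
print(mu(set().union(*As)), sum(mu(b) for b in Bs), sum(mu(a) for a in As))
# -> 6 6 11
```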


The additional assumption in (f) is essential. For a counterexample let $A_n = (n, \infty)$ in $R$; then $A_n \downarrow \emptyset$, the empty set, but the measure (length!) of $A_n$ is $+\infty$ for all $n$, while $\emptyset$ surely must have measure 0. See §3 for the formalities of this trivial example. It can even be made discrete if we use the counting measure $\#$ of natural numbers: let $A_n = \{n, n+1, n+2, \ldots\}$, so that $\#(A_n) = +\infty$ but $\#(\bigcap_n A_n) = 0$.

Beginning with a measure $\mu$ on a field $\mathscr{F}_0$, not a Borel field, we proceed to construct a measure on the Borel field $\mathscr{F}$ generated by $\mathscr{F}_0$, namely the minimal Borel field containing $\mathscr{F}_0$ (see §2.1). This is called an extension of $\mu$ from $\mathscr{F}_0$ to $\mathscr{F}$, when the notation $\mu$ is maintained. Curiously, we do this by first constructing an outer measure $\mu^*$ on the total Borel field $\mathscr{J}$ and then showing that $\mu^*$ is in truth a measure on a certain Borel field to be denoted by $\mathscr{F}^*$ that contains $\mathscr{F}_0$. Then of course $\mathscr{F}^*$ must contain the minimal $\mathscr{F}$, and so $\mu^*$ restricted to $\mathscr{F}$ is an extension of the original $\mu$ from $\mathscr{F}_0$ to $\mathscr{F}$. But we have obtained a further extension to $\mathscr{F}^*$ that is in general "larger" than $\mathscr{F}$ and possesses a further desirable property to be discussed.

DEFINITION 3. Given a measure $\mu$ on a field $\mathscr{F}_0$ in $\Omega$, we define $\mu^*$ on $\mathscr{J}$ as follows, for any $A \in \mathscr{J}$:

(2) $\mu^*(A) = \inf\Big\{\sum_j \mu(B_j) : B_j \in \mathscr{F}_0 \text{ for all } j \text{ and } \bigcup_j B_j \supset A\Big\}.$

A countable (possibly finite) collection of sets $\{B_j\}$ satisfying the conditions indicated in (2) will be referred to below as a "covering" of $A$. The infimum taken over all such coverings exists because the single set $\Omega$ constitutes a covering of $A$, so that

$$0 \le \mu^*(A) \le \mu^*(\Omega) \le +\infty.$$

It is not trivial that $\mu^*(A) = \mu(A)$ if $A \in \mathscr{F}_0$, which is part of the next theorem.

Theorem 1. We have $\mu^* = \mu$ on $\mathscr{F}_0$; $\mu^*$ on $\mathscr{J}$ is an outer measure.

PROOF. Let $A \in \mathscr{F}_0$; then the single set $A$ serves as a covering of $A$, hence $\mu^*(A) \le \mu(A)$.
For any covering $\{B_j\}$ of $A$, we have $AB_j \in \mathscr{F}_0$ and

$$\bigcup_j AB_j = A \in \mathscr{F}_0;$$

hence by property (c) of $\mu$ on $\mathscr{F}_0$ followed by property (b):

$$\mu(A) = \mu\Big(\bigcup_j AB_j\Big) \le \sum_j \mu(AB_j) \le \sum_j \mu(B_j).$$


It follows from (2) that $\mu(A) \le \mu^*(A)$. Thus $\mu^* = \mu$ on $\mathscr{F}_0$.

To prove $\mu^*$ is an outer measure, the properties (a) and (b) are trivial. To prove (c), let $\epsilon > 0$. For each $j$, by the definition of $\mu^*(A_j)$, there exists a covering $\{B_{jk}\}$ of $A_j$ such that

$$\sum_k \mu(B_{jk}) \le \mu^*(A_j) + \frac{\epsilon}{2^j}.$$

The double sequence $\{B_{jk}\}$ is a covering of $\bigcup_j A_j$ such that

$$\sum_j \sum_k \mu(B_{jk}) \le \sum_j \mu^*(A_j) + \epsilon;$$

hence for any $\epsilon > 0$:

$$\mu^*\Big(\bigcup_j A_j\Big) \le \sum_j \mu^*(A_j) + \epsilon,$$

which establishes (c) for $\mu^*$, since $\epsilon$ is arbitrarily small.

With the outer measure $\mu^*$, a class of sets $\mathscr{F}^*$ is associated as follows.

DEFINITION 4. A set $A \subset \Omega$ belongs to $\mathscr{F}^*$ iff for every $Z \subset \Omega$ we have

(3) $\mu^*(Z) = \mu^*(AZ) + \mu^*(A^c Z).$

If in (3) we change "$=$" into "$\ge$", the resulting condition is equivalent, because the reverse inequality holds by property (c) of $\mu^*$.

Theorem 2. $\mathscr{F}^*$ is a Borel field and contains $\mathscr{F}_0$. On $\mathscr{F}^*$, $\mu^*$ is a measure.

PROOF. Let $A \in \mathscr{F}_0$. For any $Z \subset \Omega$ and any $\epsilon > 0$, there exists a covering $\{B_j\}$ of $Z$ such that

(4) $\sum_j \mu(B_j) \le \mu^*(Z) + \epsilon.$

Since $AB_j \in \mathscr{F}_0$, $\{AB_j\}$ is a covering of $AZ$ and $\{A^c B_j\}$ is a covering of $A^c Z$; hence

(5) $\mu^*(AZ) \le \sum_j \mu(AB_j), \qquad \mu^*(A^c Z) \le \sum_j \mu(A^c B_j).$

Since $\mu$ is a measure on $\mathscr{F}_0$, we have for each $j$:

(6) $\mu(AB_j) + \mu(A^c B_j) = \mu(B_j).$


It follows from (4), (5), and (6) that

$$\mu^*(AZ) + \mu^*(A^c Z) \le \mu^*(Z) + \epsilon.$$

Letting $\epsilon \downarrow 0$ establishes the criterion (3) in its "$\ge$" form. Thus $A \in \mathscr{F}^*$, and we have proved that $\mathscr{F}_0 \subset \mathscr{F}^*$.

To prove that $\mathscr{F}^*$ is a Borel field, it is trivial that it is closed under complementation because the criterion (3) is unaltered when $A$ is changed into $A^c$. Next, to show that $\mathscr{F}^*$ is closed under union, let $A \in \mathscr{F}^*$ and $B \in \mathscr{F}^*$. Then for any $Z \subset \Omega$, we have by (3) with $A$ replaced by $B$ and $Z$ replaced by $ZA$ or $ZA^c$:

$$\mu^*(ZA) = \mu^*(ZAB) + \mu^*(ZAB^c);$$
$$\mu^*(ZA^c) = \mu^*(ZA^c B) + \mu^*(ZA^c B^c).$$

Hence by (3) again as written:

$$\mu^*(Z) = \mu^*(ZAB) + \mu^*(ZAB^c) + \mu^*(ZA^c B) + \mu^*(ZA^c B^c).$$

Applying (3) with $Z$ replaced by $Z(A \cup B)$, we have

$$\mu^*(Z(A \cup B)) = \mu^*(Z(A \cup B)A) + \mu^*(Z(A \cup B)A^c) = \mu^*(ZA) + \mu^*(ZA^c B) = \mu^*(ZAB) + \mu^*(ZAB^c) + \mu^*(ZA^c B).$$

Comparing the two preceding equations, we see that

$$\mu^*(Z) = \mu^*(Z(A \cup B)) + \mu^*(Z(A \cup B)^c).$$

Hence $A \cup B \in \mathscr{F}^*$, and we have proved that $\mathscr{F}^*$ is a field.

Now let $\{A_j\}$ be an infinite sequence of sets in $\mathscr{F}^*$; put

$$B_1 = A_1, \qquad B_j = A_j \setminus \bigcup_{i=1}^{j-1} A_i \quad \text{for } j \ge 2.$$

Then $\{B_j\}$ is a sequence of disjoint sets in $\mathscr{F}^*$ (because $\mathscr{F}^*$ is a field) and has the same union as $\{A_j\}$. For any $Z \subset \Omega$, we have for each $n \ge 1$:

$$\mu^*\Big(Z \bigcup_{j=1}^n B_j\Big) = \mu^*\Big(Z\Big(\bigcup_{j=1}^n B_j\Big)B_n\Big) + \mu^*\Big(Z\Big(\bigcup_{j=1}^n B_j\Big)B_n^c\Big) = \mu^*(ZB_n) + \mu^*\Big(Z \bigcup_{j=1}^{n-1} B_j\Big)$$

because $B_n \in \mathscr{F}^*$.


It follows by induction on $n$ that

(7) $\mu^*\Big(Z \bigcup_{j=1}^n B_j\Big) = \sum_{j=1}^n \mu^*(ZB_j).$

Since $\bigcup_{j=1}^n B_j \in \mathscr{F}^*$, we have by (7) and the monotonicity of $\mu^*$:

$$\mu^*(Z) = \mu^*\Big(Z \bigcup_{j=1}^n B_j\Big) + \mu^*\Big(Z\Big(\bigcup_{j=1}^n B_j\Big)^c\Big) \ge \sum_{j=1}^n \mu^*(ZB_j) + \mu^*\Big(Z\Big(\bigcup_{j=1}^\infty B_j\Big)^c\Big).$$

Letting $n \uparrow \infty$ and using property (c) of $\mu^*$, we obtain

$$\mu^*(Z) \ge \sum_{j=1}^\infty \mu^*(ZB_j) + \mu^*\Big(Z\Big(\bigcup_{j=1}^\infty B_j\Big)^c\Big) \ge \mu^*\Big(Z \bigcup_{j=1}^\infty B_j\Big) + \mu^*\Big(Z\Big(\bigcup_{j=1}^\infty B_j\Big)^c\Big),$$

which establishes $\bigcup_{j=1}^\infty B_j \in \mathscr{F}^*$. Thus $\mathscr{F}^*$ is a Borel field.

Finally, let $\{B_j\}$ be a sequence of disjoint sets in $\mathscr{F}^*$. By property (b) of $\mu^*$ and (7) with $Z = \Omega$, we have

$$\mu^*\Big(\bigcup_{j=1}^\infty B_j\Big) \ge \limsup_n \mu^*\Big(\bigcup_{j=1}^n B_j\Big) = \lim_n \sum_{j=1}^n \mu^*(B_j) = \sum_{j=1}^\infty \mu^*(B_j).$$

Combined with property (c) of $\mu^*$, we obtain the countable additivity of $\mu^*$ on $\mathscr{F}^*$, namely the property (d) for a measure:

$$\mu^*\Big(\bigcup_{j=1}^\infty B_j\Big) = \sum_{j=1}^\infty \mu^*(B_j).$$

The proof of Theorem 2 is complete.

2 Characterization of extensions

We have proved that

$$\mathscr{F}_0 \subset \mathscr{F} \subset \mathscr{F}^* \subset \mathscr{J},$$

where some of the "$\subset$" may turn out to be "$=$". Since we have extended the measure $\mu$ from $\mathscr{F}_0$ to $\mathscr{F}^*$ in Theorem 2, what about $\mathscr{F}$? The answer will appear in the sequel.
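On a finite space the whole construction of §1 can be run mechanically: the infimum in (2) and the criterion (3) become finite searches. The sketch below is illustrative only; the four-point space, the field generated by the partition $\{0,1\} \mid \{2,3\}$, and the weights are all assumptions, not from the text.

```python
from itertools import combinations, chain

OMEGA = frozenset(range(4))
# the field generated by the partition {0,1} | {2,3}, with assumed weights
FIELD = [frozenset(), frozenset({0, 1}), frozenset({2, 3}), OMEGA]
MU = {frozenset(): 0, frozenset({0, 1}): 1, frozenset({2, 3}): 2, OMEGA: 3}

def mu_star(A):
    # Definition 3: infimum of sum mu(B_j) over coverings of A by field sets
    best = float("inf")
    for r in range(1, len(FIELD) + 1):
        for cover in combinations(FIELD, r):
            if A <= frozenset(chain.from_iterable(cover)):
                best = min(best, sum(MU[b] for b in cover))
    return best

def measurable(A):
    # the Caratheodory criterion (3), checked against every test set Z
    return all(
        mu_star(Z) == mu_star(A & Z) + mu_star((OMEGA - A) & Z)
        for Z in (frozenset(t) for r in range(5) for t in combinations(OMEGA, r))
    )

print(mu_star(frozenset({0})))        # 1: the cheapest covering is {0, 1}
print(measurable(frozenset({0, 1})))  # True: field sets pass the criterion
print(measurable(frozenset({0})))     # False: {0} splits an atom of the field
```

As expected, a piece of an atom fails the criterion, so in this example $\mathscr{F}_0 = \mathscr{F} = \mathscr{F}^*$ is strictly smaller than $\mathscr{J}$.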


The triple $(\Omega, \mathscr{F}, \mu)$, where $\mathscr{F}$ is a Borel field of subsets of $\Omega$ and $\mu$ is a measure on $\mathscr{F}$, will be called a measure space. It is qualified by the adjective "finite" when $\mu(\Omega) < \infty$, and by the noun "probability" when $\mu(\Omega) = 1$. A more general case is defined below.

DEFINITION 5. A measure $\mu$ on a field $\mathscr{F}_0$ (not necessarily a Borel field) is said to be $\sigma$-finite iff there exists a sequence of sets $\{\Omega_n, n \in N\}$ in $\mathscr{F}_0$ such that $\mu(\Omega_n) < \infty$ for each $n$, and $\bigcup_n \Omega_n = \Omega$. In this case the measure space $(\Omega, \mathscr{F}, \mu)$, where $\mathscr{F}$ is the minimal Borel field containing $\mathscr{F}_0$, is said to be "$\sigma$-finite on $\mathscr{F}_0$".

Theorem 3. Let $\mathscr{F}_0$ be a field and $\mathscr{F}$ the Borel field generated by $\mathscr{F}_0$. Let $\mu_1$ and $\mu_2$ be two measures on $\mathscr{F}$ that agree on $\mathscr{F}_0$. If one of them, hence both, are $\sigma$-finite on $\mathscr{F}_0$, then they agree on $\mathscr{F}$.

PROOF. Let $\{\Omega_n\}$ be as in Definition 5. Define a class $\mathscr{C}$ of subsets of $\Omega$ as follows:

$$\mathscr{C} = \{A \subset \Omega : \mu_1(\Omega_n A) = \mu_2(\Omega_n A) \text{ for all } n \in N\}.$$

Since $\Omega_n \in \mathscr{F}_0$, for any $A \in \mathscr{F}_0$ we have $\Omega_n A \in \mathscr{F}_0$ for all $n$; hence $\mathscr{C} \supset \mathscr{F}_0$. Suppose $A_k \in \mathscr{C}$, $A_k \subset A_{k+1}$ for all $k \in N$ and $A_k \uparrow A$. Then by property (e) of $\mu_1$ and $\mu_2$ as measures on $\mathscr{F}$, we have for each $n$:

$$\mu_1(\Omega_n A) = \lim_k \uparrow \mu_1(\Omega_n A_k) = \lim_k \uparrow \mu_2(\Omega_n A_k) = \mu_2(\Omega_n A).$$

Thus $A \in \mathscr{C}$. Similarly by property (f), and the hypothesis $\mu_1(\Omega_n) = \mu_2(\Omega_n) < \infty$, if $A_k \in \mathscr{C}$ and $A_k \downarrow A$, then $A \in \mathscr{C}$. Therefore $\mathscr{C}$ is closed under both increasing and decreasing limits; hence $\mathscr{C} \supset \mathscr{F}$ by Theorem 2.1.2 of the main text. This implies for any $A \in \mathscr{F}$:

$$\mu_1(A) = \lim_n \uparrow \mu_1(\Omega_n A) = \lim_n \uparrow \mu_2(\Omega_n A) = \mu_2(A)$$

by property (e) once again. Thus $\mu_1$ and $\mu_2$ agree on $\mathscr{F}$.

It follows from Theorem 3 that under the $\sigma$-finite assumption there, the outer measure $\mu^*$ in Theorem 2 restricted to the minimal Borel field $\mathscr{F}$ containing $\mathscr{F}_0$ is the unique extension of $\mu$ from $\mathscr{F}_0$ to $\mathscr{F}$. What about the more extensive extension to $\mathscr{F}^*$? We are going to prove that it is also unique when a further property is imposed on the extension. We begin by defining two classes of special sets in $\mathscr{F}$.
DEFINITION 6. Given the field $\mathscr{F}_0$ of sets in $\Omega$, let

$$\mathscr{F}_{0\sigma\delta} = \Big\{\bigcap_m \bigcup_n B_{mn} : B_{mn} \in \mathscr{F}_0\Big\}, \qquad \mathscr{F}_{0\delta\sigma} = \Big\{\bigcup_m \bigcap_n B_{mn} : B_{mn} \in \mathscr{F}_0\Big\}.$$


Both these collections belong to $\mathscr{F}$ because the Borel field is closed under countable union and intersection, and these operations may be iterated, here twice only, for each collection. If $B \in \mathscr{F}_0$, then $B$ belongs to both $\mathscr{F}_{0\sigma\delta}$ and $\mathscr{F}_{0\delta\sigma}$ because we can take $B_{mn} = B$. Finally, $A \in \mathscr{F}_{0\sigma\delta}$ if and only if $A^c \in \mathscr{F}_{0\delta\sigma}$, because

$$\Big(\bigcap_m \bigcup_n B_{mn}\Big)^c = \bigcup_m \bigcap_n B_{mn}^c.$$

Theorem 4. Let $A \in \mathscr{F}^*$. There exists $B \in \mathscr{F}_{0\sigma\delta}$ such that

$$A \subset B; \qquad \mu^*(A) = \mu^*(B).$$

If $\mu$ is $\sigma$-finite on $\mathscr{F}_0$, then there exists $C \in \mathscr{F}_{0\delta\sigma}$ such that

$$C \subset A; \qquad \mu^*(C) = \mu^*(A).$$

PROOF. For each $m$, there exists $\{B_{mn}\}$ in $\mathscr{F}_0$ such that

$$A \subset \bigcup_n B_{mn}; \qquad \sum_n \mu(B_{mn}) \le \mu^*(A) + \frac{1}{m}.$$

Put

$$B_m = \bigcup_n B_{mn}; \qquad B = \bigcap_m B_m;$$

then $A \subset B$ and $B \in \mathscr{F}_{0\sigma\delta}$. We have

$$\mu^*(B) \le \mu^*(B_m) \le \sum_n \mu(B_{mn}) \le \mu^*(A) + \frac{1}{m}.$$

Letting $m \to \infty$ we obtain $\mu^*(B) \le \mu^*(A)$; the reverse inequality holds by monotonicity since $A \subset B$. This proves the first assertion. Now suppose $\mu$ is $\sigma$-finite on $\mathscr{F}_0$, with $\{\Omega_n\}$ as in Definition 5, and apply the first assertion to $\Omega_n \setminus A$: there exists $B_n \in \mathscr{F}_{0\sigma\delta}$ such that $\Omega_n \setminus A \subset B_n$ and $\mu^*(\Omega_n \setminus A) = \mu^*(B_n)$.


Since $\Omega_n \in \mathscr{F}_0$ and $B_n \in \mathscr{F}_{0\sigma\delta}$, it is easy to verify that $\Omega_n B_n^c \in \mathscr{F}_{0\delta\sigma}$, by the distributive law for the intersection with a union. Put

$$C = \bigcup_n \Omega_n B_n^c.$$

It is trivial that $C \in \mathscr{F}_{0\delta\sigma}$ and

$$A = \bigcup_n \Omega_n A \supset C.$$

Consequently, we have

$$\mu^*(A) \ge \mu^*(C) \ge \liminf_n \mu^*(\Omega_n B_n^c) = \liminf_n \mu^*(\Omega_n A) = \mu^*(A),$$

the last equation owing to property (e) of the measure $\mu^*$. Thus $\mu^*(A) = \mu^*(C)$, and the assertion is proved.

The measure $\mu^*$ on $\mathscr{F}^*$ is constructed from the measure $\mu$ on the field $\mathscr{F}_0$. The restriction of $\mu^*$ to the minimal Borel field $\mathscr{F}$ containing $\mathscr{F}_0$ will henceforth be denoted by $\mu$ instead of $\mu^*$.

In a general measure space $(\Omega, \mathscr{G}, \nu)$, let us denote by $\mathscr{N}(\mathscr{G}, \nu)$ the class of all sets $A$ in $\mathscr{G}$ with $\nu(A) = 0$. They are called the null sets when $\mathscr{G}$ and $\nu$ are understood, or $\nu$-null sets when $\mathscr{G}$ is understood. Beware that if $A \subset B$ and $B$ is a null set, it does not follow that $A$ is a null set, because $A$ may not be in $\mathscr{G}$! This remark introduces the following definition.

DEFINITION 7. The measure space $(\Omega, \mathscr{G}, \nu)$ is called complete iff any subset of a null set is a null set.

Theorem 5. The following three collections of subsets of $\Omega$ are identical:

(i) $A \subset \Omega$ and the outer measure $\mu^*(A) = 0$;
(ii) $A \in \mathscr{F}^*$ and $\mu^*(A) = 0$;
(iii) $A \subset B$ where $B \in \mathscr{F}$ and $\mu(B) = 0$.

It is the collection $\mathscr{N}(\mathscr{F}^*, \mu^*)$.

PROOF. If $\mu^*(A) = 0$, we will prove $A \in \mathscr{F}^*$ by verifying the criterion (3). For any $Z \subset \Omega$, we have by properties (a) and (b) of $\mu^*$:

$$0 \le \mu^*(ZA) \le \mu^*(A) = 0; \qquad \mu^*(ZA^c) \le \mu^*(Z);$$


and consequently by property (c):

$$\mu^*(Z) = \mu^*(ZA \cup ZA^c) \le \mu^*(ZA) + \mu^*(ZA^c) \le \mu^*(Z).$$

Thus (3) is satisfied and we have proved that (i) and (ii) are equivalent.

Next, let $A \in \mathscr{F}^*$ and $\mu^*(A) = 0$. Then we have by the first assertion in Theorem 4 that there exists $B \in \mathscr{F}$ such that $A \subset B$ and $\mu^*(A) = \mu(B)$. Thus $A$ satisfies (iii). Conversely, if $A$ satisfies (iii), then by property (b) of the outer measure: $\mu^*(A) \le \mu^*(B) = \mu(B) = 0$, and so (i) is true.

As a consequence, any subset of an $(\mathscr{F}^*, \mu^*)$-null set is an $(\mathscr{F}^*, \mu^*)$-null set. This is the first assertion in the next theorem.

Theorem 6. The measure space $(\Omega, \mathscr{F}^*, \mu^*)$ is complete. Let $(\Omega, \mathscr{G}, \nu)$ be a complete measure space; $\mathscr{G} \supset \mathscr{F}_0$ and $\nu = \mu$ on $\mathscr{F}_0$. If $\mu$ is $\sigma$-finite on $\mathscr{F}_0$, then $\mathscr{G} \supset \mathscr{F}^*$ and $\nu = \mu^*$ on $\mathscr{F}^*$.

PROOF. Let $A \in \mathscr{F}^*$; then by Theorem 4 there exist $B \in \mathscr{F}_{0\sigma\delta}$ and $C \in \mathscr{F}_{0\delta\sigma}$ such that

(8) $C \subset A \subset B; \qquad \mu(C) = \mu^*(A) = \mu(B).$

Since $\nu = \mu$ on $\mathscr{F}_0$, we have by Theorem 3, $\nu = \mu$ on $\mathscr{F}$. Hence by (8) we have $\nu(B - C) = 0$. Since $A - C \subset B - C$ and $B - C \in \mathscr{G}$, and $(\Omega, \mathscr{G}, \nu)$ is complete, we have $A - C \in \mathscr{G}$ and so $A = C \cup (A - C) \in \mathscr{G}$. Moreover, since $C$, $A$, and $B$ belong to $\mathscr{G}$, it follows from (8) that

$$\mu(C) = \nu(C) \le \nu(A) \le \nu(B) = \mu(B),$$

and consequently by (8) again, $\nu(A) = \mu^*(A)$. The theorem is proved.

To summarize the gist of Theorems 4 and 6: if the measure $\mu$ on the field $\mathscr{F}_0$ is $\sigma$-finite on $\mathscr{F}_0$, then $(\mathscr{F}, \mu)$ is its unique extension to $\mathscr{F}$, and $(\mathscr{F}^*, \mu^*)$ is its minimal complete extension. Here one is tempted to change the notation $\mu$ to $\mu_0$ on $\mathscr{F}_0$!

We will complete the picture by showing how to obtain $(\mathscr{F}^*, \mu^*)$ from $(\mathscr{F}, \mu)$, reversing the order of the previous construction. Given the measure space $(\Omega, \mathscr{F}, \mu)$, let us denote by $\mathscr{C}$ the collection of subsets of $\Omega$ defined as follows: $A \in \mathscr{C}$ iff there exists $B \in \mathscr{N}(\mathscr{F}, \mu)$ such that $A \subset B$. Clearly $\mathscr{C}$ has the "hereditary" property: if $A$ belongs to $\mathscr{C}$, then all subsets of $A$ belong to $\mathscr{C}$; $\mathscr{C}$ is also closed under countable union.
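On a small finite space the hereditary class $\mathscr{C}$, and the completion it produces by adjoining sets of the form $B - C$ with $B$ in the field and $C \in \mathscr{C}$, can be listed exhaustively. The space, field, and weights below are assumptions for illustration; the assertion checks that the value $\mu(B)$ assigned to $B - C$ does not depend on the representation chosen.

```python
from itertools import combinations

OMEGA = frozenset({0, 1, 2})
FIELD = [frozenset(), frozenset({0}), frozenset({1, 2}), OMEGA]
MU = {frozenset(): 0, frozenset({0}): 1, frozenset({1, 2}): 0, OMEGA: 1}

# the hereditary class C: all subsets of mu-null members of the field
null_sets = [b for b in FIELD if MU[b] == 0]
C = {frozenset(s) for b in null_sets
     for r in range(len(b) + 1) for s in combinations(sorted(b), r)}

# the completed collection: all sets B - C, each assigned the value mu(B)
MU_BAR = {}
for b in FIELD:
    for c in C:
        a = b - c
        # the definition is legitimate: every representation gives one value
        assert MU_BAR.setdefault(a, MU[b]) == MU[b]

print(len(MU_BAR))             # 8: here the completion is the full power set
print(MU_BAR[frozenset({2})])  # 0: {2} sits inside the null set {1, 2}
```

The null atom $\{1, 2\}$ makes every subset of $\Omega$ expressible as $B - C$, which is why the completion here is the whole power set.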
Next, we define the collection

(9) $\bar{\mathscr{F}} = \{A \subset \Omega : A = B - C \text{ where } B \in \mathscr{F},\ C \in \mathscr{C}\},$

where the symbol "$-$" denotes strict difference of sets, namely $B - C = BC^c$ where $C \subset B$. Finally we define a function $\bar\mu$ on $\bar{\mathscr{F}}$ as follows, for the $A$


shown in (9):

(10) $\bar\mu(A) = \mu(B).$

We will legitimize this definition and with the same stroke prove the monotonicity of $\bar\mu$. Suppose then

(11) $B_1 - C_1 \subset B_2 - C_2, \qquad B_i \in \mathscr{F},\ C_i \in \mathscr{C},\ i = 1, 2.$

Let $C_1 \subset D \in \mathscr{N}(\mathscr{F}, \mu)$. Then $B_1 \subset B_2 \cup D$ and so $\mu(B_1) \le \mu(B_2 \cup D) = \mu(B_2)$. When the "$\subset$" in (11) is "$=$", we can interchange $B_1$ and $B_2$ to conclude that $\mu(B_1) = \mu(B_2)$, so that the definition (10) is legitimate.

Theorem 7. $\bar{\mathscr{F}}$ is a Borel field and $\bar\mu$ is a measure on $\bar{\mathscr{F}}$.

PROOF. Let $A_n \in \bar{\mathscr{F}}$, $n \in N$, so that $A_n = B_n - C_n$ as in (9). We have then

$$\bigcap_{n=1}^\infty A_n = \Big(\bigcap_{n=1}^\infty B_n\Big) - \Big(\bigcup_{n=1}^\infty C_n\Big).$$

Since the class $\mathscr{C}$ is closed under countable union, this shows that $\bar{\mathscr{F}}$ is closed under countable intersection. Next let $C \subset D$, $D \in \mathscr{N}(\mathscr{F}, \mu)$; then

$$A^c = B^c \cup C = B^c \cup BC = B^c \cup \big(B(D - (D - C))\big) = (B^c \cup BD) - B(D - C).$$

Since $B(D - C) \subset D$, we have $B(D - C) \in \mathscr{C}$; hence the above shows that $\bar{\mathscr{F}}$ is also closed under complementation and therefore is a Borel field. Clearly $\bar{\mathscr{F}} \supset \mathscr{F}$ because we may take $C = \emptyset$ in (9).

To prove that $\bar\mu$ is countably additive on $\bar{\mathscr{F}}$, let $\{A_n\}$ be disjoint sets in $\bar{\mathscr{F}}$. Then

$$A_n = B_n - C_n, \qquad B_n \in \mathscr{F},\ C_n \in \mathscr{C}.$$

There exists $D$ in $\mathscr{N}(\mathscr{F}, \mu)$ containing $\bigcup_n C_n$. Then $\{B_n - D\}$ are disjoint and

$$\bigcup_{n=1}^\infty (B_n - D) \subset \bigcup_{n=1}^\infty A_n \subset \bigcup_{n=1}^\infty B_n.$$

All these sets belong to $\bar{\mathscr{F}}$, and so by the monotonicity of $\bar\mu$:

$$\bar\mu\Big(\bigcup_n (B_n - D)\Big) \le \bar\mu\Big(\bigcup_n A_n\Big) \le \bar\mu\Big(\bigcup_n B_n\Big).$$


Since $\bar\mu = \mu$ on $\mathscr{F}$, the first and third members above are equal to, respectively:

$$\bar\mu\Big(\bigcup_n (B_n - D)\Big) = \mu\Big(\bigcup_n (B_n - D)\Big) = \sum_n \mu(B_n - D) = \sum_n \mu(B_n) = \sum_n \bar\mu(A_n);$$

$$\bar\mu\Big(\bigcup_n B_n\Big) = \mu\Big(\bigcup_n B_n\Big) \le \sum_n \mu(B_n) = \sum_n \bar\mu(A_n).$$

Therefore we have

$$\bar\mu\Big(\bigcup_n A_n\Big) = \sum_n \bar\mu(A_n).$$

Since $\bar\mu(\emptyset) = \bar\mu(\emptyset - \emptyset) = \mu(\emptyset) = 0$, $\bar\mu$ is a measure on $\bar{\mathscr{F}}$.

Corollary. In truth: $\bar{\mathscr{F}} = \mathscr{F}^*$ and $\bar\mu = \mu^*$.

PROOF. For any $A \in \mathscr{F}^*$, by the first part of Theorem 4, there exists $B \in \mathscr{F}$ such that

$$A = B - (B - A), \qquad \mu^*(B - A) = 0.$$

Hence by Theorem 5, $B - A \in \mathscr{C}$ and so $A \in \bar{\mathscr{F}}$ by (9). Thus $\mathscr{F}^* \subset \bar{\mathscr{F}}$. Since $\mathscr{F} \subset \mathscr{F}^*$ and $\mathscr{C} \subset \mathscr{F}^*$ by Theorem 6, we have $\bar{\mathscr{F}} \subset \mathscr{F}^*$ by (9). Hence $\bar{\mathscr{F}} = \mathscr{F}^*$. It follows from the above that $\mu^*(A) = \mu(B) = \bar\mu(A)$. Hence $\mu^* = \bar\mu$ on $\mathscr{F}^* = \bar{\mathscr{F}}$.

The question arises naturally whether we can extend $\mu$ from $\mathscr{F}_0$ to $\mathscr{F}$ directly without the intervention of $\mathscr{F}^*$. This is indeed possible by a method of transfinite induction originally conceived by Borel; see the article by LeBlanc and G. E. Fox, "On the extension of measure by the method of Borel", Canadian Journal of Mathematics, 1956, pp. 516–523. It is technically lengthier than the method of outer measure expounded here.

Although the case of a countable space $\Omega$ can be treated in an obvious way, it is instructive to apply the general theory to see what happens.

Let $\Omega = N \cup \{\omega\}$; $\mathscr{F}_0$ is the minimal field (not Borel field) containing each singleton $n$ in $N$, but not $\omega$. Let $N_f$ denote the collection of all finite subsets of $N$; then $\mathscr{F}_0$ consists of $N_f$ and the complements of members of $N_f$ (with respect to $\Omega$), the latter all containing $\omega$. Let $0 < \mu(n) < \infty$ for all $n \in N$; a measure $\mu$ is defined on $\mathscr{F}_0$ as follows:

$$\mu(A) = \sum_{n \in A} \mu(n) \quad \text{if } A \in N_f; \qquad \mu(A^c) = \mu(\Omega) - \mu(A).$$

We must still define $\mu(\Omega)$. Observe that by the properties of a measure, we have $\mu(\Omega) \ge \sum_{n \in N} \mu(n) = s$, say.
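The infimum that determines the outer measure of $\{\omega\}$ can be watched numerically. The particular weights $\mu(n) = 2^{-n}$ (so that $s = 1$) and the choice $\mu(\Omega) = 3$ are assumptions for illustration; coverings of $\{\omega\}$ by members of $\mathscr{F}_0$ are exactly the co-finite sets $A^c$, and enlarging the finite set $A$ drives $\mu(A^c) = \mu(\Omega) - \mu(A)$ down.

```python
MU_OMEGA = 3.0            # assumed total mass mu(Omega)

def mu_point(n):
    return 2.0 ** -n      # assumed weights mu(n); their sum is s = 1

def mu_star_omega(depth):
    # best covering of {w} found among the co-finite sets A^c with
    # A = {1, ..., n}, n <= depth; mu(A^c) = mu(Omega) - mu(A)
    best = MU_OMEGA
    finite_mass = 0.0
    for n in range(1, depth + 1):
        finite_mass += mu_point(n)
        best = min(best, MU_OMEGA - finite_mass)
    return best

for d in (1, 5, 20, 60):
    print(d, mu_star_omega(d))   # decreases toward mu(Omega) - s = 2
```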


Now we use Definition 3 to determine the outer measure $\mu^*$. It is easy to see that for any $A \subset N$, we have

$$\mu^*(A) = \sum_{n \in A} \mu(n).$$

In particular $\mu^*(N) = s$. Next we have

$$\mu^*(\omega) = \inf_{A \in N_f} \mu(A^c) = \mu(\Omega) - \sup_{A \in N_f} \mu(A),$$

provided $s < \infty$; otherwise the infimum above is $\infty$. Thus we have

$$\mu^*(\omega) = \mu(\Omega) - s \ \text{ if } \mu(\Omega) < \infty; \qquad \mu^*(\omega) = \infty \ \text{ if } \mu(\Omega) = \infty.$$

It follows that for any $A \subset \Omega$:

$$\mu^*(A) = \sum_{n \in A} \mu^*(n),$$

where $\mu^*(n) = \mu(n)$ for $n \in N$. Thus $\mu^*$ is a measure on $\mathscr{J}$; namely, $\mathscr{F}^* = \mathscr{J}$. But it is obvious that $\mathscr{F} = \mathscr{J}$, since $\mathscr{F}$ contains $N$ as a countable union and so contains $\omega$ as a complement. Hence $\mathscr{F} = \mathscr{F}^* = \mathscr{J}$.

If $\mu(\Omega) = \infty$ and $s = \infty$, the extension $\mu^*$ of $\mu$ to $\mathscr{J}$ is not unique, because we can define $\mu(\omega)$ to be any positive number and get an extension. Thus $\mu$ is not $\sigma$-finite on $\mathscr{F}_0$, by Theorem 3. But we can verify this directly when $\mu(\Omega) = \infty$, whether $s = \infty$ or $s < \infty$. Thus in the latter case, $\mu(\omega) = \infty$ is also the unique extension of $\mu$ from $\mathscr{F}_0$ to $\mathscr{J}$. This means that the condition of $\sigma$-finiteness on $\mathscr{F}_0$ is only a sufficient and not a necessary condition for the unique extension.

As a ramification of the example above, let $\Omega = N \cup \{\omega_1\} \cup \{\omega_2\}$, with two extra points adjoined to $N$, but keep $\mathscr{F}_0$ as before. Then $\mathscr{F}$ ($= \mathscr{F}^*$) is strictly smaller than $\mathscr{J}$ because neither $\omega_1$ nor $\omega_2$ belongs to it. From Definition 3 we obtain

$$\mu^*(\omega_1 \cup \omega_2) = \mu^*(\omega_1) = \mu^*(\omega_2).$$

Thus $\mu^*$ is not even two-by-two additive on $\mathscr{J}$ unless the three quantities above are zero. The two points $\omega_1$ and $\omega_2$ form an inseparable couple. We leave it to the curious reader to wonder about other possibilities.

3 Measures in $R$

Let $R = (-\infty, +\infty)$ be the set of real numbers, alias the real line, with its Euclidean topology. For $-\infty \le a < b \le +\infty$,

(12) $(a, b] = \{x \in R : a < x \le b\}$


is an interval of a particular shape, namely open at the left end and closed at the right end. For $b = +\infty$, $(a, +\infty] = (a, +\infty)$ because $+\infty$ is not in $R$. By choice of the particular shape, the complement of such an interval is the union of two intervals of the same shape:

$$(a, b]^c = (-\infty, a] \cup (b, \infty].$$

When $a = b$, of course $(a, a] = \emptyset$ is the empty set. A finite or countably infinite number of such intervals may merge end to end into a single one as illustrated below:

(13) $(0, 2] = (0, 1] \cup (1, 2]; \qquad (0, 1] = \bigcup_{n=1}^{\infty} \Big(\frac{1}{n+1}, \frac{1}{n}\Big].$

Apart from this possibility, the representation of $(a, b]$ is unique.

The minimal Borel field containing all $(a, b]$ will be denoted by $\mathscr{B}$ and called the Borel field of $R$. Since a bounded open interval is the countable union of intervals like $(a, b]$, and any open set in $R$ is the countable union of (disjoint) bounded open intervals, the Borel field $\mathscr{B}$ contains all open sets; hence by complementation it contains all closed sets, in particular all compact sets. Starting from one of these collections, forming countable union and countable intersection successively, a countable number of times, one can build up $\mathscr{B}$ through a transfinite induction.

Now suppose a measure $m$ has been defined on $\mathscr{B}$, subject to the sole assumption that its value for a finite (alias bounded) interval be finite, namely if $-\infty < a < b < +\infty$, then

(14) $0 \le m((a, b]) < \infty.$

We associate a point function $F$ on $R$ with the set function $m$ on $\mathscr{B}$ as follows:

(15) $F(0) = 0$; $F(x) = m((0, x])$ for $x > 0$; $F(x) = -m((x, 0])$ for $x < 0$.

This function may be called the "generalized distribution" for $m$. $F$ is finite everywhere owing to (14), and we see that

(16) $m((a, b]) = F(b) - F(a).$

$F$ is increasing (viz. nondecreasing) in $R$ and so the limits

$$F(+\infty) = \lim_{x \to +\infty} F(x) \le +\infty, \qquad F(-\infty) = \lim_{x \to -\infty} F(x) \ge -\infty$$

both exist. We shall write $\infty$ for $+\infty$ sometimes. Next, $F$ has unilateral limits everywhere, and is right-continuous:

$$F(x-) \le F(x) = F(x+).$$
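A small sketch (not from the text) of the correspondence (15)–(16): take an assumed generalized distribution $F$ with slope 1 and a unit jump at $x = 1$. The set function $m((a, b]) = F(b) - F(a)$ is then additive over intervals merged end to end as in (13), and the half-open shape decides which interval receives the jump.

```python
def F(x):
    # assumed generalized distribution: slope 1 plus a unit jump at x = 1
    return x + (1.0 if x >= 1 else 0.0)

def m(a, b):
    # the measure of the half-open interval (a, b] prescribed by (16)
    return F(b) - F(a)

print(m(0, 2), m(0, 1) + m(1, 2))  # 3.0 3.0: end-to-end intervals merge
print(m(0.5, 1))                   # 1.5: the closed right end captures the jump
print(m(1, 1.5))                   # 0.5: the open left end excludes it
```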


The right-continuity follows from the monotone limit properties (e) and (f) of m and the primary assumption (14). The measure of a single point x is given by

m(x) = F(x) − F(x−).

We shall denote a point and the set consisting of it (singleton) by the same symbol.

The simplest example of F is given by F(x) ≡ x. In this case F is continuous everywhere and (16) becomes

m((a, b]) = b − a.

We can replace (a, b] above by (a, b), [a, b) or [a, b] because m(x) = 0 for each x. This measure is the length of the line-segment from a to b. It was in this classic case that the following extension was first conceived by Émile Borel (1871-1956).

We shall follow the methods in §§1-2, due to H. Lebesgue and C. Caratheodory. Given F as specified above, we are going to construct a measure m on B and a larger Borel field B* that fulfills the prescription (16).

The first step is to determine the minimal field B0 containing all (a, b]. Since a field is closed under finite union, it must contain all sets of the form

(17) B = ∪_{j=1}^n I_j,  I_j = (a_j, b_j], 1 ≤ j ≤ n;  n ∈ N.

Without loss of generality, we may suppose the intervals I_j to be disjoint, by merging intersections as illustrated by

(1, 3] ∪ (2, 4] = (1, 4].

Then it is clear that the complement B^c is of the same form. The union of two sets like B is also of the same form. Thus the collection of all sets like B already forms a field and so it must be B0. Of course it contains (includes) the empty set ∅ = (a, a] and R. However, it does not contain any (a, b) except R, nor any [a, b), [a, b], nor any single point!

Next we define a measure m on B0 satisfying (16). Since the condition (d) in Definition 2 requires it to be finitely additive, there is only one way: for the generic B in (17) with disjoint I_j we must put

(18) m(B) = Σ_{j=1}^n m(I_j) = Σ_{j=1}^n (F(b_j) − F(a_j)).

Having so defined m on B0, we must now prove that it satisfies the condition (d) in toto, in order to proclaim it to be a measure on B0. Namely, if
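The two steps above, merging intersections into disjoint intervals and then applying (18), can be sketched in a few lines. This is my illustration (the function names are not from the text); intervals (a, b] are represented as pairs, and touching or overlapping intervals are merged end to end as in the text's example (1, 3] ∪ (2, 4] = (1, 4].

```python
def normalize(intervals):
    """Merge a finite union of half-open intervals (a, b] into disjoint ones,
    e.g. (1,3] u (2,4] -> (1,4]; end-to-end pieces like (0,1] u (1,2] also merge."""
    ivs = sorted((a, b) for a, b in intervals if a < b)
    merged = []
    for a, b in ivs:
        if merged and a <= merged[-1][1]:   # overlaps or touches the last piece
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

def m(intervals, F=lambda x: x):
    """Eq. (18): m(B) = sum_j (F(b_j) - F(a_j)) over the disjoint pieces.
    The default F(x) = x gives ordinary length."""
    return sum(F(b) - F(a) for a, b in normalize(intervals))
```

For instance, m applied to {(1, 3], (2, 4]} first normalizes to (1, 4] and then returns length 3, so the value does not depend on the representation chosen in (17).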


{B_k, 1 ≤ k ≤ l ≤ ∞} is a finite or countable sequence of disjoint sets in B0, we must prove

(19) m(∪_{k=1}^l B_k) = Σ_{k=1}^l m(B_k)

whenever l is finite, and moreover when l = ∞ and the union ∪_{k=1}^∞ B_k happens to be in B0.

The case for a finite l is really clear. If each B_k is represented as in (17), then the disjoint union of a finite number of them is represented in a similar manner by pooling together all the disjoint I_j's from the B_k's. Then the equation (19) just means that a finite double array of numbers can be summed in two orders.

If that is so easy, what is the difficulty when l = ∞? It turns out, as Borel saw clearly, that the crux of the matter lies in the following fabulous "banality."

Borel's lemma. If −∞ < a < b < +∞ and

(20) (a, b] = ∪_{j=1}^∞ (a_j, b_j],

where a_j < b_j for each j, and the intervals (a_j, b_j] are disjoint, then we have

(21) F(b) − F(a) = Σ_{j=1}^∞ (F(b_j) − F(a_j)).

PROOF. We will first give a transfinite argument that requires knowledge of ordinal numbers. But it is so intuitively clear that it can be appreciated without that prerequisite. Looking at (20) we see there is a unique index j such that b_j = b; name that index k and rename a_k as c_1. By removing (a_k, b_k] = (c_1, b] from both sides of (20) we obtain

(22) (a, c_1] = ∪_{j=1, j≠k}^∞ (a_j, b_j].

This small but giant step shortens the original (a, b] to (a, c_1]. Obviously we can repeat the process and shorten it to (a, c_2] where a ≤ c_2 < c_1 = b, and so by mathematical induction we obtain a sequence a ≤ c_n < ··· < c_2 < c_1 = b. Needless to say, if for some n we have c_n = a, then we have accomplished our purpose, but this cannot happen under our specific assumptions because we have not used up all the infinite number of intervals in the union.


Therefore the process must go on ad infinitum. Suppose then c_n > c_{n+1} for all n ∈ N, so that c_ω = lim_n ↓ c_n exists; then c_ω ≥ a. If c_ω = a (which can easily happen, see (13)), then we are done and (21) follows, although the terms in the series have been gathered step by step in a (possibly) different order.

What if c_ω > a? In this case there is a unique j such that b_j = c_ω; rename the corresponding a_j as c_{ω1}. We have now

(23) (a, c_ω] = ∪_{j=1}^∞ (a'_j, b'_j],

where the (a'_j, b'_j]'s are the leftovers from the original collection in (20) after an infinite number of them have been removed in the process. The interval (c_{ω1}, c_ω] is contained in the reduced new collection and we can begin a new process by first removing it from both sides of (23), then the next, to be denoted by (c_{ω2}, c_{ω1}], and so on. If for some n we have c_{ωn} = a, then (21) is proved because at each step a term in the sum is gathered. Otherwise there exists the limit lim_n ↓ c_{ωn} = c_{ωω} ≥ a. If c_{ωω} = a, then (21) follows in the limit. Otherwise c_{ωω} must be equal to some b_j (why?), and the induction goes on.

Let us spare ourselves the cumbersome notation for the successive well-ordered ordinal numbers. But will this process stop after a countable number of steps, namely, does there exist an ordinal number α of countable cardinality such that c_α = a? The answer is "yes" because there are only countably many intervals in the union (20).

The preceding proof (which may be made logically formal) reveals the possibly complex structure hidden in the "order-blind" union in (20). Borel in his Thèse (1894) adopted a similar argument to prove a more general result that became known as his Covering Theorem (see below). A proof of the latter can be found in any text on real analysis, without the use of ordinal numbers. We will use the covering theorem to give another proof of Borel's lemma, for the sake of comparison (and learning).

This second proof establishes the equation (21) by two inequalities in opposite directions. The first inequality is easy, by considering the first n terms in the disjoint union (20):

F(b) − F(a) ≥ Σ_{j=1}^n (F(b_j) − F(a_j)).

As n goes to infinity we obtain (21) with the "=" replaced by "≥".

The other half is more subtle: the reader should pause and think why? The previous argument with ordinal numbers tells the story.


Borel's covering theorem. Let [a, b] be a compact interval, and (a_j, b_j), j ∈ N, be bounded open intervals, which may intersect arbitrarily, such that

(24) [a, b] ⊂ ∪_{j=1}^∞ (a_j, b_j).

Then there exists a finite integer l such that when l is substituted for ∞ in the above, the inclusion remains valid.

In other words, a finite subset of the original infinite set of open intervals suffices to do the covering. This theorem is also called the Heine-Borel Theorem; see Hardy [1] (in the general bibliography) for two proofs by Besicovitch.

To apply (24) to (20), we must alter the shape of the intervals (a_j, b_j] to fit the picture in (24). Let −∞ < a < b < ∞ and ε > 0. Choose a' in (a, b), and for each j choose b'_j > b_j, such that

(25) F(a') − F(a) < ε/2;  F(b'_j) − F(b_j) < ε/2^{j+1}.

These choices are possible because F is right continuous; and now we have

[a', b] ⊂ ∪_{j=1}^∞ (a_j, b'_j)

as required in (24). Hence by Borel's theorem, there exists a finite l such that

(26) [a', b] ⊂ ∪_{j=1}^l (a_j, b'_j).

From this it follows "easily" that

(27) F(b) − F(a') ≤ Σ_{j=1}^l (F(b'_j) − F(a_j)).

We will spell out the proof by induction on l. When l = 1 it is obvious. Suppose the assertion has been proved for l − 1, l ≥ 2. From (26) as written, there is k, 1 ≤ k ≤ l, such that a_k < a' < b'_k, and so

(28) F(b'_k) − F(a') ≤ F(b'_k) − F(a_k).


If we intersect both sides of (26) with the complement of (a_k, b'_k), we obtain

[b'_k, b] ⊂ ∪_{j=1, j≠k}^l (a_j, b'_j).

Here the number of intervals on the right side is l − 1; hence by the induction hypothesis we have

F(b) − F(b'_k) ≤ Σ_{j=1, j≠k}^l (F(b'_j) − F(a_j)).

Adding this to (28) we obtain (27), and the induction is complete. It follows from (27) and (25) that

F(b) − F(a) ≤ Σ_{j=1}^l (F(b_j) − F(a_j)) + ε.

Beware that the l above depends on ε. However, if we change l to ∞ (back to infinity!) then the infinite series of course does not depend on ε. Therefore we can let ε → 0 to obtain (21) with the "=" there changed to "≤".
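The compactness step in the covering theorem can be mimicked constructively when the open cover is given as a finite list: scan from the left end and greedily pick, among the intervals containing the current point, one reaching farthest right. This greedy extraction is my own illustration of the finite-subcover idea, not the book's proof; the function name is hypothetical.

```python
def finite_subcover(a, b, cover):
    """Extract a finite subcover of [a, b] from a list of open intervals (a_j, b_j),
    assuming the listed intervals do cover [a, b]."""
    chosen, t = [], a
    while t <= b:
        # intervals whose interior contains the current point t
        containing = [iv for iv in cover if iv[0] < t < iv[1]]
        if not containing:
            raise ValueError("listed intervals do not cover [a, b]")
        best = max(containing, key=lambda iv: iv[1])   # farthest right endpoint
        chosen.append(best)
        t = best[1]          # everything up to best[1] is now covered
    return chosen
```

Each step strictly advances t and uses a fresh interval, so the loop terminates; the chosen intervals form the finite subcover asserted by the theorem.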


We will prove

(30) Σ_{i=1}^n m(I_i) = Σ_j Σ_k m(I_{jk}).

For n = 1, (29) is of the form (20), since a countable set of sets can be ordered as a sequence; hence (30) follows by Borel's lemma. In general, simple geometry shows that each I_i in (29) is the union of a subcollection of the I_{jk}'s. This is easier to see if we order the I_i's in algebraic order and, after merging where possible, separate them at nonzero distances. Therefore (30) follows by adding n equations, each of which results from Borel's lemma. This completes the proof of the countable additivity of m on B0, namely (19) is true as stipulated there for l = ∞ as well as l < ∞.

The general method developed in §1 can now be applied to (R, B0, m). Substituting B0 for F0 and m for µ in Definition 3, we obtain the outer measure m*. It is remarkable that the countable additivity of m on B0, for which two painstaking proofs were given above, is used in exactly one place, at the beginning of Theorem 1, to prove that m* = m on B0. Next, we define the Borel field B* as in Definition 4. By Theorem 6, (R, B*, m*) is a complete measure space. By Definition 5, m is σ-finite on B0 because (−n, n] ↑ (−∞, ∞) as n ↑ ∞ and m((−n, n]) is finite by our primary assumption (14). Hence by Theorem 3, the restriction of m* to B is the unique extension of m from B0 to B.

In the most important case where F(x) ≡ x, the measure m on B0 is the length: m((a, b]) = b − a. It was Borel who, around the turn of the twentieth century, first conceived of the notion of a countably additive "length" on an extensive class of sets, now named after him: the Borel field B. A member of this class is called a Borel set. The larger Borel field B* was first constructed by Lebesgue from an outer and an inner measure (see pp. 28-29 of main text). The latter was later bypassed by Caratheodory, whose method is adopted here. A member of B* is usually called Lebesgue-measurable. The intimate relationship between B and B* is best seen from Theorem 7.

The generalization to a generalized distribution function F is sometimes referred to as Borel-Lebesgue-Stieltjes. See §2.2 of the main text for the special case of a probability distribution. The generalization to a Euclidean space of higher dimension presents no new difficulty and is encumbered with tedious geometrical "baggage".

It can be proved that the cardinal number of all Borel sets is that of the real numbers (viz. all points in R), commonly denoted by c (the continuum). On the other hand, if Z is a Borel set of cardinal c with m(Z) = 0, such as the Cantor ternary set (p. 13 of main text), then by the remark preceding Theorem 6, all subsets of Z are Lebesgue-measurable and hence their totality


has cardinal 2^c, which is strictly greater than c (see e.g. [3]). It follows that there are incomparably more Lebesgue-measurable sets than Borel sets. It is however not easy to exhibit a set in B* but not in B; see Exercise No. 15 on p. 15 of the main text for a clue, but that example uses a non-Lebesgue-measurable set to begin with.

Are there non-Lebesgue-measurable sets? Using the Axiom of Choice, we can "define" such a set rather easily; see e.g. [3] or [5]. However, Paul Cohen has proved that the axiom is independent of the other logical axioms known as the Zermelo-Fraenkel system commonly adopted in mathematics; and Robert Solovay has proved that in a certain model without the axiom of choice, all sets of real numbers are Lebesgue-measurable. In the notation of Definition 1 in §1, in that case B* = J and the outer measure m* is a measure on J.

N.B. Although no explicit invocation is made of the axiom of choice in the main text of this book, a weaker version of it under the prefix "countable" must have been casually employed on the q.t. Without the latter, allegedly it is impossible to show that the union of a countable collection of countable sets is countable. This kind of logical finesse is beyond the scope of this book.

4 Integral

The measure space (Ω, F, µ) is fixed. A function f with domain Ω and range in R* = [−∞, +∞] is called F-measurable iff for each real number c we have

{f < c} = {ω ∈ Ω : f(ω) < c} ∈ F.

Of special importance is the positive F-measurable function of the form

(31) f = Σ_j a_j 1_{A_j},

where {A_j} is a countable set of disjoint sets in F with union Ω, and the a_j are positive numbers or +∞. Such a function will be called basic.


DEFINITION 8(a). For the basic function f in (31), its integral is defined to be

(32) E(f) = Σ_j a_j µ(A_j)

and is also denoted by

∫_Ω f dµ = ∫_Ω f(ω) µ(dω).

If a term in (32) is 0·∞ or ∞·0, it is taken to be 0. In particular if f ≡ 0, then E(0) = 0 even if µ(Ω) = ∞. If A ∈ F and µ(A) = 0, then the basic function ∞·1_A + 0·1_{A^c} has integral equal to

∞·0 + 0·µ(A^c) = 0.

We list some of the properties of the integral.

(i) Let {B_j} be a countable set of disjoint sets in F, with union Ω, and {b_j} arbitrary positive numbers or ∞, not necessarily distinct. Then the function

(33) Σ_j b_j 1_{B_j}

is basic, and its integral is equal to

Σ_j b_j µ(B_j).

PROOF. Collect all equal b_j's into a_j and the corresponding B_j's into A_j as in (31). The result follows from the theorem on double series of positive terms: it may be summed in any order to yield a unique sum, possibly +∞.

(ii) If f and g are basic and f ≤ g, then

E(f) ≤ E(g).

In particular if E(f) = +∞, then E(g) = +∞.

PROOF. Let f be as in (31) and g as in (33). The doubly indexed sets {A_j ∩ B_k} are disjoint and their union is Ω. We have, using (i):

E(f) = Σ_j Σ_k a_j µ(A_j ∩ B_k);  E(g) = Σ_k Σ_j b_k µ(A_j ∩ B_k).
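The conventions in Definition 8(a), in particular 0·∞ = ∞·0 = 0, are easy to get wrong in floating-point arithmetic (where 0 * inf is NaN), so here is a minimal sketch of (32) with the convention made explicit. The function names are mine; a basic function is represented by its list of pairs (a_j, µ(A_j)).

```python
INF = float("inf")

def times(a, mu):
    """Product with the convention 0*inf = inf*0 = 0 used in (32)."""
    if a == 0 or mu == 0:
        return 0.0
    return a * mu

def integral_basic(terms):
    """E(f) = sum_j a_j * mu(A_j) for a basic function given as (a_j, mu(A_j)) pairs."""
    return sum(times(a, mu) for a, mu in terms)
```

The text's example, the basic function ∞·1_A + 0·1_{A^c} with µ(A) = 0, then gets integral 0, as it should.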


The order of summation in the second double series may be reversed, and the result follows by the countable additivity of µ.

(iii) If f and g are basic functions, a and b positive numbers, then af + bg is basic and

E(af + bg) = aE(f) + bE(g).

PROOF. It is trivial that af is basic and

E(af) = aE(f).

Hence it is sufficient to prove the result for a = b = 1. Using the double decomposition in (ii), we have

E(f + g) = Σ_j Σ_k (a_j + b_k) µ(A_j ∩ B_k).

Splitting the double series in two and then summing in two orders, we obtain the result.

It is a good time to state a general result that contains the double series theorem used above and another version of it that will be used below.

Double Limit Lemma. Let {C_{jk}; j ∈ N, k ∈ N} be a doubly indexed array of real numbers with the following properties:

(a) for each fixed j, the sequence {C_{jk}; k ∈ N} is increasing in k;
(b) for each fixed k, the sequence {C_{jk}; j ∈ N} is increasing in j.

Then we have

lim_j ↑ lim_k ↑ C_{jk} = lim_k ↑ lim_j ↑ C_{jk} ≤ +∞.

The proof is surprisingly simple. Both repeated limits exist by fundamental analysis. Suppose first that one of these is the finite number C. Then for any ε > 0, there exist j_0 and k_0 such that C_{j_0 k_0} > C − ε. This implies that the other limit is > C − ε. Since ε is arbitrary and the two indices are interchangeable, the two limits must be equal. Next, if the C above is +∞, then changing C − ε into ε^{−1} finishes the same argument.

As an easy exercise, the reader should derive the cited theorem on double series from the Lemma.

Let A ∈ F and f be a basic function. Then the product 1_A f is a basic function and its integral will be denoted by

(34) E(A; f) = ∫_A f(ω) µ(dω) = ∫_A f dµ.


(iv) Let A_n ∈ F, A_n ⊂ A_{n+1} for all n, and A = ∪_n A_n. Then we have

(35) lim_n E(A_n; f) = E(A; f).

PROOF. Denote f by (33), so that 1_A f = Σ_j b_j 1_{A B_j}. By (i),

E(A; f) = Σ_j b_j µ(A B_j),

with a similar equation where A is replaced by A_n. Since µ(A_n B_j) ↑ µ(A B_j) as n ↑ ∞, and the partial sums Σ_{j=1}^m increase as m ↑ ∞, (35) follows by the double limit theorem.

Consider now an increasing sequence {f_n} of basic functions, namely f_n ≤ f_{n+1} for all n. Then f = lim_n ↑ f_n exists and f ∈ F, but of course f need not be basic; and its integral has yet to be defined. By property (ii), the numerical sequence E(f_n) is increasing and so lim_n ↑ E(f_n) exists, possibly equal to +∞. It is tempting to define E(f) to be that limit, but we need the following result to legitimize the idea.

Theorem 8. Let {f_n} and {g_n} be two increasing sequences of basic functions such that

(36) lim_n ↑ f_n = lim_n ↑ g_n

(everywhere in Ω). Then we have

(37) lim_n ↑ E(f_n) = lim_n ↑ E(g_n).

PROOF. Denote the common limit function in (36) by f and put

A = {ω ∈ Ω : f(ω) > 0};

then A ∈ F. Since 0 ≤ g_n ≤ f, we have 1_{A^c} g_n = 0 identically; hence by property (iii):

(38) E(g_n) = E(A; g_n) + E(A^c; g_n) = E(A; g_n).

Fix an n and put for each k ∈ N:

A_k = {ω ∈ Ω : f_k(ω) > ((n−1)/n) g_n(ω)}.

Since f_k ≤ f_{k+1}, we have A_k ⊂ A_{k+1} for all k. We are going to prove that

(39) ∪_{k=1}^∞ A_k = A.


If ω ∈ A_k, then f(ω) ≥ f_k(ω) > ((n−1)/n) g_n(ω) ≥ 0; hence ω ∈ A. On the other hand, if ω ∈ A then

lim_k ↑ f_k(ω) = f(ω) ≥ g_n(ω)

and f(ω) > 0; hence there exists an index k such that

f_k(ω) > ((n−1)/n) g_n(ω)

and so ω ∈ A_k. Thus (39) is proved. By property (ii), since

f_k ≥ 1_{A_k} f_k ≥ 1_{A_k} ((n−1)/n) g_n,

we have

E(f_k) ≥ E(A_k; f_k) ≥ ((n−1)/n) E(A_k; g_n).

Letting k ↑ ∞, we obtain by property (iv):

lim_k ↑ E(f_k) ≥ ((n−1)/n) lim_k E(A_k; g_n) = ((n−1)/n) E(A; g_n) = ((n−1)/n) E(g_n),

where the last equation is due to (38). Now let n ↑ ∞ to obtain

lim_k ↑ E(f_k) ≥ lim_n ↑ E(g_n).

Since {f_n} and {g_n} are interchangeable, (37) is proved.

Corollary. Let f_n and f be basic functions such that f_n ↑ f; then E(f_n) ↑ E(f).

PROOF. Take g_n = f for all n in the theorem.

The class of positive F-measurable functions will be denoted by F+. Such a function can be approximated in various ways by basic functions. It is nice to do so by an increasing sequence, and of course we should approximate all functions in F+ in the same way. We choose a particular mode as follows. Define a function on [0, ∞] by the (uncommon) symbol )·]:

)0] = 0;  )∞] = ∞;  )x] = n − 1 for x ∈ (n − 1, n], n ∈ N.


Thus )π] = 3, )4] = 3. Next we define, for any f ∈ F+, the approximating sequence {f^(m), m ∈ N} by

(40) f^(m)(ω) = )2^m f(ω)] / 2^m.

Each f^(m) is a basic function with range in the set of dyadic (binary) numbers {k/2^m}, where k is a nonnegative integer or ∞. We have f^(m) ≤ f^(m+1) for all m, by the magic property of bisection. Finally f^(m) ↑ f, owing to the left-continuity of the function x → )x].

DEFINITION 8(b). For f ∈ F+, its integral is defined to be

(41) E(f) = lim_m ↑ E(f^(m)).

When f is basic, Definition 8(b) is consistent with 8(a), by the Corollary to Theorem 8. The extension of property (ii) of integrals to F+ is trivial, because f ≤ g implies f^(m) ≤ g^(m). On the contrary, (f + g)^(m) is not f^(m) + g^(m); but since f^(m) + g^(m) ↑ (f + g), it follows from Theorem 8 that

lim_m ↑ E(f^(m) + g^(m)) = lim_m ↑ E((f + g)^(m)),

which yields property (iii) for F+, together with E(a f^(m)) ↑ aE(f) for a > 0. Property (iv) for F+ will be given in an equivalent form as follows.

(iv) For f ∈ F+, the function of sets defined on F by

A → E(A; f)

is a measure.

PROOF. We need only prove that if A = ∪_{n=1}^∞ A_n, where the A_n's are disjoint sets in F, then

E(A; f) = Σ_{n=1}^∞ E(A_n; f).

For a basic f, this follows from properties (iii) and (iv). The extension to F+ can be done by the double limit theorem and is left as an exercise.

There are three fundamental theorems relating the convergence of functions with the convergence of their integrals. We begin with Beppo Levi's theorem on Monotone Convergence (1906), which is the extension of the Corollary to Theorem 8 to F+.
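The bracket )·] and the dyadic approximants (40) can be watched numerically. For finite x > 0, the definition ")x] = n − 1 for x ∈ (n − 1, n]" amounts to ⌈x⌉ − 1; that reformulation, and the function names, are mine.

```python
import math

def lbrack(x):
    """)x] = n - 1 for x in (n-1, n]; )0] = 0.  Equals ceil(x) - 1 for x > 0."""
    if x == 0:
        return 0
    return math.ceil(x) - 1

def approx(f_value, m):
    """f^(m) = )2^m f]/2^m, the dyadic approximant in (40)."""
    return lbrack(2**m * f_value) / 2**m

x = math.pi
vals = [approx(x, m) for m in range(1, 20)]
```

As the text asserts, the sequence increases with m (the bisection property) and approaches x from below, with error at most 2^(−m).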


Theorem 9. Let {f_n} be an increasing sequence of functions in F+ with limit f: f_n ↑ f. Then we have

lim_n ↑ E(f_n) = E(f) ≤ +∞.

PROOF. We have f ∈ F+; hence by Definition 8(b), (41) holds. For each f_n we have, using analogous notation:

(42) lim_m ↑ E(f_n^(m)) = E(f_n).

Since f_n ↑ f, the numbers )2^m f_n(ω)] ↑ )2^m f(ω)] as n ↑ ∞, owing to the left continuity of x → )x]. Hence by the Corollary to Theorem 8,

(43) lim_n ↑ E(f_n^(m)) = E(f^(m)).

It follows that

lim_m ↑ lim_n ↑ E(f_n^(m)) = lim_m ↑ E(f^(m)) = E(f).

On the other hand, it follows from (42) that

lim_n ↑ lim_m ↑ E(f_n^(m)) = lim_n ↑ E(f_n).

Therefore the theorem is proved by the double limit lemma.

From Theorem 9 we derive Lebesgue's theorem in its pristine positive guise.

Theorem 10. Let f_n ∈ F+, n ∈ N. Suppose

(a) lim_n f_n = 0;
(b) E(sup_n f_n) < ∞.

Then

(44) lim_n E(f_n) = 0.

PROOF. Put for n ∈ N:

(45) g_n = sup_{k≥n} f_k.

Then g_n ∈ F+, and as n ↑ ∞, g_n ↓ lim sup_n f_n = 0 by (a); and g_1 = sup_n f_n, so that E(g_1) < ∞ by (b).

Now consider the sequence {g_1 − g_n}, n ∈ N. This is increasing with limit g_1. Hence by Theorem 9, we have

lim_n ↑ E(g_1 − g_n) = E(g_1).


By property (iii) for F+,

E(g_1 − g_n) + E(g_n) = E(g_1).

Substituting into the preceding relation and cancelling the finite E(g_1), we obtain E(g_n) ↓ 0. Since 0 ≤ f_n ≤ g_n, so that 0 ≤ E(f_n) ≤ E(g_n) by property (ii) for F+, (44) follows.

The next result is known as Fatou's lemma, of the same vintage 1906 as Beppo Levi's. It has the virtue of "no assumptions" with the consequent one-sided conclusion, which is however often useful.

Theorem 11. Let {f_n} be an arbitrary sequence of functions in F+. Then we have

(46) E(lim inf_n f_n) ≤ lim inf_n E(f_n).

PROOF. Put for n ∈ N:

g_n = inf_{k≥n} f_k;

then

lim inf_n f_n = lim_n ↑ g_n.

Hence by Theorem 9,

(47) E(lim inf_n f_n) = lim_n ↑ E(g_n).

Since g_n ≤ f_n, we have E(g_n) ≤ E(f_n) and

lim inf_n E(g_n) ≤ lim inf_n E(f_n).

The left member above is in truth the right member of (47); therefore (46) follows as a milder but neater conclusion.

We have derived Theorem 11 from Theorem 9. Conversely, it is easy to go the other way. For if f_n ↑ f, then (46) yields E(f) ≤ lim_n ↑ E(f_n). Since f ≥ f_n, E(f) ≥ lim_n ↑ E(f_n); hence there is equality.

We can also derive Theorem 10 directly from Theorem 11. Using the notation in (45), we have 0 ≤ g_1 − f_n ≤ g_1. Hence by condition (a) and (46),

E(g_1) = E(lim inf_n (g_1 − f_n)) ≤ lim inf_n (E(g_1) − E(f_n)) = E(g_1) − lim sup_n E(f_n),

which yields (44).
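The inequality in Fatou's lemma (46) can be strict. A classical escaping-bump example, my illustration rather than the text's: on (0, 1) with Lebesgue measure, take f_n = n·1_{(0, 1/n)}.

```python
# Lebesgue measure on (0,1); f_n = n on (0, 1/n), 0 elsewhere (a moving bump).
def f(n, x):
    return float(n) if 0.0 < x < 1.0 / n else 0.0

# E(f_n) = n * m((0, 1/n)) = 1 for every n, computed exactly
integrals = [n * (1.0 / n) for n in range(1, 60)]

# at a fixed point x > 0 the bump has passed once n >= 1/x, so liminf_n f_n(x) = 0
x = 0.25
tail = [f(n, x) for n in range(5, 20)]
```

Here lim inf_n f_n ≡ 0, so E(lim inf_n f_n) = 0, while lim inf_n E(f_n) = 1: the mass of the bump escapes to a set of vanishing measure, and (46) holds with strict inequality.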


The three theorems 9, 10, and 11 are intimately woven.

We proceed to the final stage of the integral. For any f ∈ F with range in [−∞, +∞], put

f+ = f on {f ≥ 0}, f+ = 0 on {f < 0};  f− = −f on {f < 0}, f− = 0 on {f ≥ 0}.

Then f+ ∈ F+, f− ∈ F+, and

f = f+ − f−;  |f| = f+ + f−.

By Definition 8(b) and property (iii):

(48) E(|f|) = E(f+) + E(f−).

DEFINITION 8(c). For f ∈ F, its integral is defined to be

(49) E(f) = E(f+) − E(f−),

provided the right side above is defined, namely not ∞ − ∞. We say f is integrable, or f ∈ L¹, iff both E(f+) and E(f−) are finite; in this case E(f) is a finite number. When E(f) exists but f is not integrable, then it must be equal to +∞ or −∞, by (49).

A set A in Ω is called a null set iff A ∈ F and µ(A) = 0. A mathematical proposition is said to hold almost everywhere, or a.e., iff there is a null set A such that it holds outside A, namely in A^c.

A number of important observations are collected below.

Theorem 12. (i) The function f in F is integrable if and only if |f| is integrable; we have

(50) |E(f)| ≤ E(|f|).

(ii) For any f ∈ F and any null set A, we have

(51) E(A; f) = ∫_A f dµ = 0;  E(f) = E(A^c; f) = ∫_{A^c} f dµ.

(iii) If f ∈ L¹, then the set {ω ∈ Ω : |f(ω)| = ∞} is a null set.

(iv) If f ∈ L¹, g ∈ F, and |g| ≤ |f| a.e., then g ∈ L¹.

(v) If f ∈ F, g ∈ F, and g = f a.e., then E(g) exists if and only if E(f) exists, and then E(g) = E(f).

(vi) If µ(Ω) < ∞, then any a.e. bounded F-measurable function is integrable.

PROOF. (i) is trivial from (48) and (49); (ii) follows from

1_A |f| ≤ 1_A · ∞,


so that

0 ≤ E(1_A |f|) ≤ E(1_A · ∞) = µ(A) · ∞ = 0.

This implies (51).

To prove (iii), let

A(n) = {|f| ≥ n}.

Then A(n) ∈ F and

n µ(A(n)) = E(A(n); n) ≤ E(A(n); |f|) ≤ E(|f|).

Hence

(52) µ(A(n)) ≤ (1/n) E(|f|).

Letting n ↑ ∞, so that A(n) ↓ {|f| = ∞}, and since µ(A(1)) ≤ E(|f|) < ∞, we have by property (f) of the measure µ:

µ({|f| = ∞}) = lim_n ↓ µ(A(n)) = 0.

To prove (iv), let |g| ≤ |f| on A^c, where µ(A) = 0. Then

|g| ≤ 1_{A^c} |f| + 1_A · ∞,

and consequently

E(|g|) ≤ E(A^c; |f|) + 0 = E(|f|) < ∞.

To prove (v), apply (ii) with the null set A = {g ≠ f}: E(g) = E(A^c; g) = E(A^c; f) = E(f), in the sense that if either side of (49) is defined, so is the other, and they are equal. Finally (vi) follows from (iv), because a finite constant is integrable when µ(Ω) < ∞.

We record one more consequence for later use. Let f ∈ L¹ and A(n) = {|f| ≥ n} as above; then

(53) lim_n E(A(n); |f|) = 0,

because E(A(n)^c; |f|) ↑ E({|f| < ∞}; |f|) = E(|f|) < ∞, by property (iv) for F+ together with (ii) and (iii). Consequently: if {B_k} is a sequence of sets in F with lim_k µ(B_k) = 0, then lim_k E(B_k; f) = 0.


To prove this we may suppose, without loss of generality, that f ∈ F+. We have then

E(B_k; f) = E(B_k ∩ A(n); f) + E(B_k ∩ A(n)^c; f) ≤ E(A(n); f) + n µ(B_k).

Hence

lim sup_k E(B_k; f) ≤ E(A(n); f),

and the result follows by letting n → ∞ and using (53).

It is convenient to define, for any f in F, a class of functions denoted by C(f), as follows: g ∈ C(f) iff g = f a.e. When (Ω, F, µ) is a complete measure space, such a g is automatically in F. To see this, let B = {g ≠ f}. Our definition of "a.e." means only that B is a subset of a null set A; in plain English, this does not say whether g is equal to f or not anywhere in A − B. However, if the measure is complete, then any subset of a null set is also a null set, so that not only the set B but all its subsets are null, hence in F. Hence for any real number c,

{g < c} = ({f < c} ∩ B^c) ∪ ({g < c} ∩ B) ∈ F,

where the first set belongs to F because f ∈ F, and the second is a subset of the null set B, hence in F by completeness. Thus g ∈ F as asserted.

We now extend properties (ii) and (iii) to L¹, where the functions are no longer positive.

(ii) If f ∈ L¹, g ∈ L¹, and f ≤ g a.e., then E(f) ≤ E(g).

PROOF. We may suppose f ≤ g everywhere, by Theorem 12(v). From f ≤ g we deduce an inequality between positive functions by transposing the negative parts,


as follows:

f+ + g− ≤ g+ + f−.

Applying properties (ii) and (iii) for F+, we obtain

E(f+) + E(g−) ≤ E(g+) + E(f−).

By the assumptions of L¹, all four quantities above are finite numbers. Transposing back, we obtain the desired conclusion.

(iii) If f ∈ L¹, g ∈ L¹, then f + g ∈ L¹, and

E(f + g) = E(f) + E(g).

Let us leave this as an exercise. If we assume only that both E(f) and E(g) exist and that the right member in the equation above is defined, namely not (+∞) + (−∞) or (−∞) + (+∞), does E(f + g) then exist and equal the sum? We leave this as a good exercise for the curious, and return to Theorem 10 in its practical form.

Theorem 10'. Let f_n ∈ F; suppose

(a) lim_n f_n = f a.e.;
(b) there exists φ ∈ L¹ such that for all n: |f_n| ≤ φ a.e.

Then we have

(c) lim_n E(|f_n − f|) = 0.

PROOF. Observe first that

|lim_n f_n| ≤ sup_n |f_n|;  |f_n − f| ≤ |f_n| + |f| ≤ 2φ a.e.,

so that, after discarding a null set, the sequence {|f_n − f|} satisfies the conditions (a) and (b) of Theorem 10; hence (c) follows.
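Theorem 10' can be watched on a concrete case. The example and function names are mine, not the text's: f_n(x) = x^n on [0, 1] with Lebesgue measure, dominated by φ ≡ 1 ∈ L¹, with a.e. limit f = 0; a midpoint Riemann sum stands in for the Lebesgue integral.

```python
def E(g, a=0.0, b=1.0, steps=100_000):
    """Midpoint Riemann sum, standing in for the Lebesgue integral on [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

# E(|f_n - f|) = E(x^n) = 1/(n+1) -> 0, as (c) predicts
ints = [E(lambda x, n=n: x**n) for n in (1, 2, 5, 10, 50)]
```

The computed values decrease toward 0 like 1/(n+1), in accordance with conclusion (c).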


Curiously, the best known part of the theorem is the corollary below, with a fixed B.

Corollary. We have

lim_n ∫_B f_n dµ = ∫_B f dµ

uniformly in B ∈ F.

This is trivial from (c), because, in alternative notation:

|E(B; f_n) − E(B; f)| ≤ E(B; |f_n − f|) ≤ E(|f_n − f|).

5 Applications

The general theory will now be applied to some questions concerning the interchange of limit and integral that arise in classical analysis.

Example 1. Let (I, B*, m) be an interval I with the Lebesgue measure m on the Lebesgue-measurable sets, as constructed in §3, and let {u_k} be a sequence of measurable functions on I.


Suppose the infinite series Σ_k u_k(x) converges in I; then, in the usual notation,

lim_n Σ_{k=1}^n u_k(x) = Σ_{k=1}^∞ u_k(x) = s(x)

exists and is finite. Now suppose each u_k is Lebesgue-integrable; then so is each partial sum s_n, by property (iii) of the integral, and

∫_I s_n(x) dx = Σ_{k=1}^n ∫_I u_k(x) dx.

Question: does the numerical series above converge? And if so, is the sum of the integrals equal to the integral of the sum:

Σ_{k=1}^∞ ∫_I u_k(x) dx = ∫_I (Σ_{k=1}^∞ u_k(x)) dx = ∫_I s(x) dx?

This is the problem of integration term by term.

A very special but important case is when the interval I = [a, b] is compact and the functions u_k are all continuous in I. If we assume that the series Σ_{k=1}^∞ u_k(x) converges uniformly in I, then it follows from elementary analysis that the sequence of partial sums {s_n(x)} is totally bounded, that is,

sup_n sup_x |s_n(x)| = sup_x sup_n |s_n(x)| < ∞.

Since m(I) < ∞, the bounded convergence theorem applies to yield

lim_n ∫_I s_n(x) dx = ∫_I lim_n s_n(x) dx.

The Taylor series of an analytic function always converges uniformly and absolutely in any compact subinterval of its interval of convergence. Thus the result above is fruitful.

Another example of term-by-term integration goes back to Theorem 8.

Example 2. Let u_k ≥ 0, u_k ∈ L¹; then

(54) Σ_{k=1}^∞ ∫ u_k dµ = ∫ (Σ_{k=1}^∞ u_k) dµ.

Let f_n = Σ_{k=1}^n u_k; then f_n ∈ L¹ and f_n ↑ Σ_k u_k = f. Hence by monotone convergence

E(f) = lim_n E(f_n),

that is, (54).
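Equation (54) can be checked on a concrete positive series. The example is mine, not the text's: u_k(x) = x^k on [0, 1/2], k ≥ 0, where Σ_k u_k(x) = 1/(1 − x) and both sides of (54) equal log 2; a midpoint Riemann sum stands in for the Lebesgue integral.

```python
import math

def E(g, a=0.0, b=0.5, steps=50_000):
    """Midpoint Riemann sum standing in for the Lebesgue integral on [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

# left side of (54): sum of the integrals, int_0^{1/2} x^k dx = (1/2)^(k+1)/(k+1)
lhs = sum((0.5 ** (k + 1)) / (k + 1) for k in range(200))

# right side of (54): integral of the sum s(x) = 1/(1-x)
rhs = E(lambda x: 1.0 / (1.0 - x))
```

Both sides come out as log 2 ≈ 0.6931, the series on the left being exactly the classical expansion Σ_{m≥1} (1/2)^m / m of log 2.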


When the u_k are general, the preceding result may be applied to |u_k| to obtain

Σ_{k=1}^∞ ∫ |u_k| dµ = ∫ (Σ_{k=1}^∞ |u_k|) dµ.

If this is finite, then the same is true when |u_k| is replaced by u_k+ and u_k−. It then follows by subtraction that (54) is also true. This result of term-by-term integration may be regarded as a special case of the Fubini-Tonelli theorem (pp. 63-64), where one of the measures is the counting measure on N.

For another perspective, we will apply the Borel-Lebesgue theory of the integral to the older Riemann context.

Example 3. Let (I, B*, m) be as in the preceding example, but let I = [a, b] be compact. Let f be a continuous function on I. Denote by P a partition of I as follows:

a = x_0 < x_1 < x_2 < ··· < x_n = b,

and put

δ(P) = max_{1≤j≤n} (x_j − x_{j−1}).

Choose a point ξ_j in each subinterval (x_{j−1}, x_j] and form the Riemann sum

S(P) = Σ_{j=1}^n f(ξ_j)(x_j − x_{j−1}) = Σ_{j=1}^n f(ξ_j) m((x_{j−1}, x_j]),

which is the integral of a step function equal to f(ξ_j) on (x_{j−1}, x_j]. Since f is uniformly continuous on the compact interval I, as δ(P) → 0 these step functions converge uniformly to f, whatever the choice of the points ξ_j; hence their integrals S(P) converge to the Lebesgue integral ∫_I f dm.


The finite existence of the limit above signifies the Riemann-integrability of f, and the limit is then its Riemann-integral ∫_a^b f(x) dx. Thus we have proved that a continuous function on a compact interval is Riemann-integrable, and its Riemann-integral is equal to the Lebesgue integral. Let us recall that in the new theory, any bounded measurable function is integrable over any bounded measurable set. For example, the function

sin(1/x), x ∈ (0, 1],

being bounded by 1, is integrable. But from the strict Riemannian point of view it has only an "improper" integral, because (0, 1] is not closed and the function is not continuous on [0, 1]; indeed it is not definable there. Yet the limit

lim_{ε↓0} ∫_ε^1 sin(1/x) dx

exists and can be defined to be ∫_0^1 sin(1/x) dx. As a matter of fact, the Riemann sums do converge despite the unceasing oscillation of the function between −1 and 1 as x ↓ 0.

Example 4. The Riemann integral of a function on (0, ∞) is called an "infinite integral" and is definable as follows:

∫_0^∞ f(x) dx = lim_{n→∞} ∫_0^n f(x) dx

when the limit exists and is finite. A famous example is

(56) f(x) = (sin x)/x, x ∈ (0, ∞).

This function is bounded by 1 and is continuous. It can be extended to [0, ∞) by defining f(0) = 1 by continuity. A cute calculation (see §6.2 of the main text) yields the result (useful in Optics):

lim_{n→∞} ∫_0^n (sin x)/x dx = π/2.

By contrast, the function |f| is not Lebesgue-integrable. To show this, we use trigonometry:

(sin x)/x ≥ (1/√2) · 1/((2n+1)π) = c_n for x ∈ I_n = (2nπ + π/4, 2nπ + 3π/4].

Thus for x > 0:

f+(x) ≥ Σ_{n=1}^∞ c_n 1_{I_n}(x).

The right member above is a basic function, with its integral

Σ_n c_n m(I_n) = Σ_n (1/√2) · 1/((2n+1)π) · (π/2) = +∞.
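Both claims in Example 4 can be probed numerically. This is a sketch of mine: a midpoint sum stands in for the truncated integral, and the divergent lower bound Σ_n c_n m(I_n) is summed directly.

```python
import math

def integral(g, a, b, steps):
    """Midpoint Riemann sum of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

# the infinite integral: int_0^N sin(x)/x dx stays near pi/2 for large N
N = 200 * math.pi
si = integral(lambda x: math.sin(x) / x, 0.0, N, 400_000)

# lower bound for E(f+): sum_n c_n * m(I_n) with c_n = (1/sqrt(2))/((2n+1)pi)
# and m(I_n) = pi/2 -- a harmonic-type series whose partial sums grow without bound
def lower_bound(terms):
    return sum((1 / math.sqrt(2)) / ((2 * n + 1) * math.pi) * (math.pi / 2)
               for n in range(1, terms + 1))
```

The truncated integral sits within about 1/N of π/2, while the partial sums of the lower bound keep growing (like a constant times log n), which is the numerical face of E(f+) = +∞.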


It follows that E(f+) = +∞. Similarly E(f−) = +∞; therefore, by Definition 8(c), E(f) does not exist! This example is a splendid illustration of the following

Non-Theorem. Let f ∈ F and f_n = f·1_{(0,n)}, n ∈ N. Then f_n ∈ F and f_n → f as n → ∞. Even when the f_n's are "totally bounded", it does not follow that

(57) lim_n E(f_n) = E(f);

indeed E(f) may not exist.

On the other hand, if we assume, in addition, either (a) f ≥ 0, or (b) E(f) exists, in particular f ∈ L¹, then the limit relation will hold, by Theorems 9 and 10, respectively. The next example falls in both categories.

Example 5. The square of the function f in (56):

f(x)² = ((sin x)/x)², x ∈ R,

is integrable in the Lebesgue sense, and is also improperly integrable in the Riemann sense. We have

f(x)² ≤ 1_{(−1,+1)}(x) + (1/x²) 1_{(−∞,−1)∪(+1,+∞)}(x),

and the function on the right side is integrable; hence so is f².

Incredibly, we have

∫_0^∞ ((sin x)/x)² dx = π/2 = (RI) ∫_0^∞ (sin x)/x dx,

where we have inserted an "RI" to warn against taking the second integral as a Lebesgue integral. See §6.2 for the calculation. So far as I know, nobody has explained the equality of these two integrals.

Example 6. The most notorious example of a simple function that is not Riemann-integrable, and that baffled a generation of mathematicians, is the function 1_Q, where Q is the set of rational numbers. Its Riemann sums can be made to equal any real number between 0 and 1, when we confine Q to the unit interval (0, 1). The function is so totally discontinuous that the Riemannian way of approximating it, horizontally so to speak, fails utterly. But of course it is ludicrous even to consider this indicator function rather than the set Q itself. There was a historical reason for this folly: integration was regarded as the inverse operation to differentiation, so that to integrate was meant to "find the primitive" whose derivative is to be the integrand, for example:

∫_0^ξ x dx = ξ²/2, d(ξ²/2)/dξ = ξ;  ∫_1^ξ (1/x) dx = log ξ, d(log ξ)/dξ = 1/ξ.
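The two assertions of Example 5, integrability of ((sin x)/x)² by comparison and the value π/2, can be checked numerically (a sketch of mine; the tail beyond the truncation point N contributes at most ∫_N^∞ dx/x² = 1/N).

```python
import math

def integral(g, a, b, steps):
    """Midpoint Riemann sum of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

g = lambda x: (math.sin(x) / x) ** 2

# the comparison bound of Example 5: g <= 1 on (-1,1) and g <= 1/x^2 beyond
bound_ok = all(g(x) <= min(1.0, 1.0 / x**2) + 1e-12 for x in (0.5, 2.0, 10.0))

N = 1000.0
val = integral(g, 0.0, N, 500_000)   # tail beyond N contributes at most 1/N = 0.001
```

The truncated integral lands within roughly 1/N of π/2, consistent with the Lebesgue value ∫_0^∞ ((sin x)/x)² dx = π/2.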


A primitive is called an "indefinite integral", and ∫_1^2 (1/x) dx, e.g., is called a "definite integral." Thus the unsolvable problem was to find ∫_0^ξ 1_Q(x) dx, 0 < ξ < 1.

The notion of measure as length, area, and volume is much more ancient than Newton's fluxion (derivative), not to mention the primitive measure of counting with fingers (and toes). The notion of "countable additivity" of a measure, although seemingly natural and facile, somehow did not take hold until Borel saw that

m(Q) = Σ_{q∈Q} m(q) = 0.

There can be no question that the "length" of a single point q is zero. Euclid gave it "zero dimension".

This is the beginning of MEASURE. An INTEGRAL is a weighted measure, as is obvious from Definition 8(a). The rest is approximation, vertically as in Definition 8(b), and convergence, as in all analysis.

As for the connexion with differentiation, Lebesgue made it, and a clue is given in §1.3 of the main text.


General bibliography

[The five divisions below are merely a rough classification and are not meant to be mutually exclusive.]

1. Basic analysis

[1] G. H. Hardy, A course of pure mathematics, 10th ed. Cambridge University Press, New York, 1952.
[2] W. Rudin, Principles of mathematical analysis, 2nd ed. McGraw-Hill Book Company, New York, 1964.
[3] I. P. Natanson, Theory of functions of a real variable (translated from the Russian). Frederick Ungar Publishing Co., New York, 1955.

2. Measure and integration

[4] P. R. Halmos, Measure theory. D. Van Nostrand Co., Inc., Princeton, N.J., 1956.
[5] H. L. Royden, Real analysis. The Macmillan Company, New York, 1963.
[6] J. Neveu, Mathematical foundations of the calculus of probability. Holden-Day, Inc., San Francisco, 1965.

3. Probability theory

[7] Paul Lévy, Calcul des probabilités. Gauthier-Villars, Paris, 1925.
[8] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer-Verlag, Berlin, 1933.


[9] M. Fréchet, Généralités sur les probabilités. Variables aléatoires. Gauthier-Villars, Paris, 1937.
[10] H. Cramér, Random variables and probability distributions, 3rd ed. Cambridge University Press, New York, 1970 [1st ed., 1937].
[11] Paul Lévy, Théorie de l'addition des variables aléatoires, 2nd ed. Gauthier-Villars, Paris, 1954 [1st ed., 1937].
[12] B. V. Gnedenko and A. N. Kolmogorov, Limit distributions for sums of independent random variables (translated from the Russian). Addison-Wesley Publishing Co., Inc., Reading, Mass., 1954.
[13] William Feller, An introduction to probability theory and its applications, vol. 1 (3rd ed.) and vol. 2 (2nd ed.). John Wiley & Sons, Inc., New York, 1968 and 1971 [1st ed. of vol. 1, 1950].
[14] Michel Loève, Probability theory, 3rd ed. D. Van Nostrand Co., Inc., Princeton, N.J., 1963.
[15] A. Rényi, Probability theory. North-Holland Publishing Co., Amsterdam, 1970.
[16] Leo Breiman, Probability. Addison-Wesley Publishing Co., Reading, Mass., 1968.

4. Stochastic processes

[17] J. L. Doob, Stochastic processes. John Wiley & Sons, Inc., New York, 1953.
[18] Kai Lai Chung, Markov chains with stationary transition probabilities, 2nd ed. Springer-Verlag, Berlin, 1967 [1st ed., 1960].
[19] Frank Spitzer, Principles of random walk. D. Van Nostrand Co., Princeton, N.J., 1964.
[20] Paul-André Meyer, Probabilités et potentiel. Hermann (Éditions Scientifiques), Paris, 1966.
[21] G. A. Hunt, Martingales et processus de Markov. Dunod, Paris, 1966.
[22] David Freedman, Brownian motion and diffusion. Holden-Day, Inc., San Francisco, 1971.

5. Supplementary reading

[23] Mark Kac, Statistical independence in probability, analysis and number theory, Carus Mathematical Monograph 12. John Wiley & Sons, Inc., New York, 1959.
[24] A. Rényi, Foundations of probability. Holden-Day, Inc., San Francisco, 1970.
[25] Kai Lai Chung, Elementary probability theory with stochastic processes. Springer-Verlag, Berlin, 1974.


Index

Abelian theorem, 292
Absolutely continuous part of d.f., 12
Adapted, 335
Additivity of variance, 108
Adjunction, 25
a.e., see Almost everywhere
Algebraic measure space, 28
Almost everywhere, 30
Almost invariant, permutable, remote, trivial, 266
Approximation lemmas, 91, 264
Arcsin law, 305
Arctan transformation, 75
Asymptotic density, 24
Asymptotic expansion, 236
Atom, 31, 32
Augmentation, 32
Axiom of continuity, 22
moo, /3, ', 389
Ballot problem, 232
Basic function, 395
Bayes' rule, 320
Belonging (of function to B.F.), 263
Beppo Levi's theorem, 401
Bernstein's theorem, 200
Berry-Esseen theorem, 235
B.F., see Borel field
Bi-infinite sequence, 270
Binary expansion, 60
Bochner-Herglotz theorem, 187
Boole's inequality, 21
Borel-Cantelli lemma, 80
Borel field, 18
  generated, 18
Borel field on R, 388, 394
Borel-Lebesgue measure, 24, 389
Borel-Lebesgue-Stieltjes measure, 394
Borel measurable function, 37
Borel's Covering Theorem, 392
Borel set, 24, 38, 394
Borel's Lemma, 390
Borel's theorem on normal numbers, 109
Boundary value problem, 344
Bounded convergence theorem, 44
Branching process, 371
Brownian motion, 129, 227
CK, C0, CB, C, 91
CB, CU, 157


Canonical representation of infinitely divisible laws, 258
Cantor distribution, 13-15, 129, 174
Carathéodory criterion, 28, 389, 394
Carleman's condition, 103
Cauchy criterion, 70
Cauchy distribution, 156
Cauchy-Schwarz inequality, 50, 317
Central limit theorem, 205
  classical form, 177
  for dependent r.v.'s, 224
  for random number of terms, 226
  general form, 211
  Liapounov form, 208
  Lindeberg-Feller form, 214
  Lindeberg's method, 211
  with remainder, 235
Chapman-Kolmogorov equations, 333
Characteristic function, 150
  list of, 155-156
  multidimensional, 197
Chebyshev's inequality, 50
  strengthened, 404
ch.f., see Characteristic function
Coin-tossing, 111
Complete probability space, 30
Completely monotonic, 200
Concentration function, 160
Conditional density, 321
Conditional distribution, 321
Conditional expectation, 313
  as integration, 316
  as martingale limit, 369
Conditional independence, 322
Conditional probability, 310, 313
Continuity interval, 85, 196
Convergence a.e., 68
  in dist., 96
  in Lp, 71
  in pr., 70
  weak in L1, 73
Convergence theorem:
  for ch.f., 169
  for Laplace transform, 200
Convolution, 152, 157
Coordinate function, 58, 264
Countable, 4
de Finetti's theorem, 367
De Moivre's formula, 220
Density of d.f., 12
d.f., see Distribution function
Differentiation:
  of ch.f., 175
  of d.f., 163
Diophantine approximation, 285
Discrete random variable, 39
Distinguished logarithm, nth root, 255
Distribution function, 7
  continuous, 9
  degenerate, 7
  discrete, 9
  n-dimensional, 53
  of p.m., 30
Dominated convergence theorem, 44
Dominated ergodic theorem, 362
Dominated r.v., 71
Doob's martingale convergence, 351
Doob's sampling theorem, 342
Double array, 205
Double limit lemma, 397
Downcrossing inequality, 350
Egorov's theorem, 79
Empiric distribution, 138
Equi-continuity of ch.f.'s, 169
Equivalent sequences, 112
Event, 54
Exchangeable, event, 366
Existence theorem for independent r.v.'s, 60
Expectation, 41
Extension theorem, see Kolmogorov's extension theorem
F*, 380
F*-measurability, 378
  approximation of F*-measurable set, 382
Fatou's lemma, 45, 402
Feller's dichotomy, 134
Field, 17
Finite-product set, 61
First entrance time, 273
Fourier series, 183
Fourier-Stieltjes transform, 151
  of a complex variable, 203
Fubini's theorem, 63
Functionals of Sn, 228
Function space, 265


Gambler's ruin, 343
Gambling system, 273, 278, see also Optional sampling
Gamma distribution, 158
Generalized Poisson distribution, 257
Glivenko-Cantelli theorem, 140
Harmonic analysis, 164
Harmonic equation, 344
Harmonic function, 361
Helly's extraction principle, 88
Hölder's inequality, 50
Holospoudicity, 207
Homogeneous Markov chain, process, 333
Identically distributed, 37
Independence, 53
  characterizations, 197, 322
Independent process, 267
Indicator, 40
In dist., see In distribution
In distribution, 96
Induced measure, 36
Infinitely divisible, 250
Infinitely often, 77, 360
Infinite product space, 61
Information theory, 148, 414
Inner measure, 28
In pr., see In probability
In probability, 70
Integrable random variable, 43
Integrable function, 403
Integral, 396, 400, 403
  additivity, 406
  convergence theorems, 401, 406-408
Integrated ch.f., 172
Integration term by term, 44, 408
Invariance principle, 228
Invariant, 266
Inverse mapping, 35
Inversion formula for ch.f., 161, 196
Inversions, 220
i.o., see Infinitely often
Jensen's inequality, 50, 317
Jump, 3
Jumping part of d.f., 7
Kolmogorov extension theorem, 64, 265
Kolmogorov's inequalities, 121, 125
Kronecker's lemma, 129
Laplace transform, 196
Large deviations, 214, 241
Lattice d.f., 183
Law of iterated logarithm, 242, 248
Law of large numbers
  converse, 138
  for pairwise independent r.v.'s, 134
  for uncorrelated r.v.'s, 108
  identically distributed case, 133, 295, 365
  strong, 109, 129
  weak, 114, 177
Law of small numbers, 181
Lebesgue measurable, 394
Lebesgue's dominated and bounded convergence, 401, 406
Levi's monotone convergence, 401
Lévy-Cramér theorem, 169
Lévy distance, 98
Lévy's formula, see Canonical representation
Lévy's theorem on series of independent r.v.'s, 126
  strengthened, 363
Liapounov's central limit theorem, 208
Liapounov's inequality, 50
Liminf, Limsup, 75
Lindeberg's method, 211
Lindeberg-Feller theorem, 214
Lower semicontinuous function, 95
Marcinkiewicz-Zygmund inequality, 362
Markov chain, homogeneous, 333
Markov process, 326
  homogeneous, 330
  of rth order, 333
Markov property, 326
  strong, 327


Martingale, 335, see also Submartingale, Supermartingale
  difference sequence, 337
  Krickeberg's decomposition, 358
  of conditional expectations, 338
  stopped, 341
Maximum of Sn, 228, 299
Maximum of submartingale, 346, 362
M.C., see Monotone class
m-dependent, 224
Mean, 49
Measurable with respect to F, 35
Measure
  completion, 384
  extension, 377, 380
  σ-finite, 381
  transfinite construction, 386
  uniqueness, 381
Measure space, 381
  discrete, 386
Median, 117
Method of moments, 103, 178
Metric
  for ch.f.'s, 172
  for convergence in pr., 71
  for d.f.'s, 98
  for measure space, 47
  for p.m.'s, 172
Minimal B.F., 18
Minkowski's inequality, 50
Moment, 49
Moment problem, 103
Monotone class theorems, 19
Monotone convergence theorem, 44, 401
Monotone property of measure, 21
Multinomial distribution, 181
n-dimensional distribution, 53
Negligibility, 206
Non-measurable set, 395
Normal d.f., 104
  characterization of, 180, 182
  convergence to, see Central limit theorem
  kernel, 157
  positive, 104, 232
Normal number, 109, 112
Norming, 206
Null set, 30, 383
Number of positive terms, 302
Number of zeros, 234
Optional r.v., 259, 338
Optional sampling, 338, 346
Order statistics, 147
Orthogonal, 107
Outcome, 57
Outer measure, 28, 375, 377-378
Pairwise independence, 53
Partition, 40
Permutable, 266
p.m., see Probability measure
Poisson d.f., 194
Poisson process, 142
Pólya criterion, 191
Pólya distribution, 158
Positive definite, 187
Positive normal, see Normal
Post-α, 259
Potential, 359, 361
Pre-α, 259
Probability distribution, 36
  on circle, 107
Probability measure, 20
  of d.f., 30
Probability space, triple, 23
Product B.F., measure, space, 58
Projection, 64
Queuing theory, 307
Radon-Nikodym theorem, 313, 369
Random variable, 34, see also Discrete, integrable, simple, symmetric random variable
Random vector, 37
Random walk, 271
  recurrent, 284
Recurrence principle, 360
Recurrent value, 278
Reflection principle, 232
Remainder term, 235
Remote event, field, 264
Renewal theory, 142
Riemann integral, 409


  improper, infinite, 410
  primitive, 411
Riemann-Lebesgue lemma, 167
Riemann-Stieltjes integral, 198
Riemann zeta function, 259
Riesz decomposition, 359
Right continuity, 5
r.v., see Random variable
Sample function, 141
Sample space, 23
  discrete, 23
s.d.f., see Subdistribution function
Second-order stochastic process, 191
Sequentially vaguely compact, 88
Set of convergence, 79
Shift, 265
  optional, 273
Simple random variable, 40
Singleton, 16
Singular d.f., 12
  measure, 31
Singular part of d.f., 12
Smartingale, 335
  closeable, 359
Span, 184
Spectral distribution, 191
St. Petersburg paradox, 120, 373
Spitzer's identity, 299
Squared variation, 367
Stable distribution, 193
Standard deviation, 49
Stationary independent process, 267
Stationary transition probabilities, 331
Step function, 92
Stochastic process, with discrete parameter, 263
Stone-Weierstrass theorem, 92, 198
Stopped (super) martingale, 340-341
Stopping time, see Optional sampling
Subdistribution function, 88
Submartingale, 335
  convergence theorems for, 350
  convex transformation of, 336
  Doob decomposition, 337
  inequalities, 346
Subprobability measure, 85
Subsequences, method of, 109
Superharmonic function, 361
Supermartingale, 335
  optional sampling, 340
Support of d.f., 10
Support of p.m., 32
Symmetric difference (Δ), 16
Symmetric random variable, 165
Symmetric stable law, 193, 250
Symmetrization, 155
System theorem, see Gambling system, Submartingale
Tauberian theorem, 292
Taylor's series, 177
Three series theorem, 125
Tight, 95
Topological measure space, 28
Trace, 23
Transition probability function, 331
Trial, 57
Truncation, 115
Types of d.f., 184
Uncorrelated, 107
Uniform distribution, 30
Uniform integrability, 99
Uniqueness theorem
  for ch.f., 160, 168
  for Laplace transform, 199, 201
  for measure, 30
Upcrossing inequality, 348
Vague convergence, 85, 92
  general criterion, 96
Variance, 49
Wald's equation, 144, 345
Weak convergence, 73
Weierstrass theorem, 145
Wiener-Hopf technique, 291
Zero-or-one law
  Hewitt-Savage, 268
  Kolmogorov, 267
  Lévy, 357
