12.07.2015 Views

Anderson's lemma

Anderson's lemma

Anderson's lemma

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Appendix AAnderson’s <strong>lemma</strong>Examples?Suppose f is a symmetric, unimodal density on the real line and J =[−α, α] is an interval. Let θ ⊕ J = [−α + θ,α + θ] be the translate of J by θ.For which θ is∫F(θ) = {x ∈ θ ⊕ J} f (x) dxa maximum? Symmetry suggests θ = 0 as the maximizing value, which is easyto verify by simple Calculus:dF(θ) = f (α + θ)− f (−α + θ)dθ= 0 when α + θ =−(−α + θ).The multidimensional analog of the problem has just as intuitive a solution, butthe proof is much harder.symm Definition. Say that f is symmetric (about the origin) if f (−x) = f (x)for all x. Say that a set C is symmetric (about the origin) if −x ∈ C wheneverx ∈ C. (That is, C is symmetric if its indicator is a symmetric function.) Afunction f on R k is said to be bowl-shaped if each set { f ≤ t} is symmetricand convex.The main result of this Appendix is the solution to a minimization problemthat is needed to establish minimax properties of estimators for a Gaussian shiftfamily, as in Appendix B/.anderson Lemma. For a nonnegative, bowl-shaped function ρ and Z distributedN(0, I k ), the expectation Pρ(Z + θ) is minimized when θ = 0.Intuitively, the minimum must occur when the bottom of the bowl sits atthe point of highest normal density.The Lemma is a special case of a more general assertion about integrals.convex.min Theorem. Let f and g be nonnegative measurable functions on R k suchthat both { f ≥ t} and {g ≤ t} are symmetric, convex sets for each t ∈ R + .Then the function∫∫G(θ) = g(z + θ)f (z) dz = g(z) f (z − θ)dzis minimized when θ = 0.The proof of the Theorem depends upon a strange-looking inequalityinvolving sums of sets,A ⊕ B ={x + y : x ∈ A, y ∈ B}Even if both A and B are Lebesgue measurable, there is no guarantee thatA ⊕ B is also Lebesgue measurable. To be precise, I should use inner measurewith respect to k-dimensional Lebesgue measure λ k .Asymptopia: 23 March 2001 c○David Pollard 1


Appendix AAnderson’s <strong>lemma</strong>Brunn.Minkowski Brunn-Minkowski Inequality. For measurable subsets A and B of R k ,(λ k (A ⊕ B)) 1/k ≥ (λ k A) 1/k + (λ k B) 1/k .Proof. Consider the case k = 2. The proof for higher dimensions differs onlyin notational details. Abbreviate λ 2 to λ.The argument develops in four steps. First dispose of the easy case whereboth A and B are open rectangles, with sides parallel to the coordinate axes.Call such sets coordinate rectangles. Then argue by induction to extend theresult to finite unions of coordinate rectangles. Approximate by finite opencovers to extend to the case of compact A and B, then complete the proof byan inner approximation.Case (i) Suppose A and B are both coordinate rectangles, with sidelengths l a ,w a and l b ,w b . The sum A ⊕ B is a rectangle with side lengthsl a + l b ,w a + w b . The asserted inequality becomes√(la + l b )(w a + w b ) ≥ √ l a w a + √ l b w b .By homogeneity we may assume w a + w b = 1 = l a + l b , so that the left-handside becomes 1. By virtue of the inequality arithmetic mean ≥ geometric mean(that is, the concavity of the logarithm function), the right-hand side is less than1/ 2 (l a + w a ) + 1 / 2 (l b + w b ) = 1,as asserted.Case (ii) Suppose A is a union of m disjoint open rectangles with sidesparallel to the coordinate axes, and B is a similar union of n rectangles. Argueby induction on m + n. Case (i) covers m + n = 2. So suppose m > 1.If two open coordinate rectangles do not intersect, they must lie on oppositesides of some line parallel to either the x- ory-axis. Thus we may assume,with no loss of generality, that at least one of the A rectangles lies in each ofthe open half-space to the left and right of a line x = x A .DefineA − = A ∩{x < x A },A + = A ∩{x > x A }.θ = λA+λAChoose x B so that the sets B − = B ∩{x < x B } and B+ =B ∩{x > x B } alsohave Lebesgue measures with λB + = θλB. That is, the line x = x B divides Bin the same proportions as the line x = x A divides A.Notice that each of A − and A + is a disjoint union of at most m − 1 openrectangles; each of B − and B + is a disjoint union of at most n open rectangles.The sum A − ⊕ B − lies in {x < x A + x B }; the sum A + ⊕ B + lies in thedisjoint halfspace {x > x A + x B }; and each is a subset of A ⊕ B. By disjointnessand the inductive hypothesis,λ(A ⊕ B) ≥ λ(A − ⊕ B − ) + λ(A + ⊕ B + )(√≥ λA− + √ ) 2 (√λB − + λA+ + √ ) 2λB +( √1 √ √ √ ) 2 (√ √ √ √ ) 2= − θ λA + 1 − θ λB) + θ λA + θ λB= (1 − θ + θ)(√λA +√λB) 2.Take square roots to complete the argument.2 Asymptopia: 23 March 2001 c○David Pollard


Appendix AAnderson’s <strong>lemma</strong>Case (iii) Suppose both A and B are compact. Approximate by sets of theform appearing in case (ii) by arguing as follows. Fix δ>0. LetA δ ={z : d(z, A) 0. Let D s ={g ≤ s}. Each C t and D s is asymmetric, convex set, with λ k (C t ) 0. By Corollary ,(λ k (θ ⊕ Ct ) ∩ Ds c )= λk C t − λ k ((θ ⊕ C t ) ∩ D s )Asymptopia: 23 March 2001 c○David Pollard 3


Appendix AAnderson’s <strong>lemma</strong>is minimized at θ = 0. It follows that∫∫∫G(θ) = {0 < t ≤ f (z − θ)}{0 < s < g(z)} ds dt dz∫∫∫= {z ∈ θ ⊕ C t }{z ∈ Ds c } ds dt dz∫∫= λ k (θ ⊕ C t ) ∩ Ds c ds dtis minimized when θ = 0.For the general case, replace f by the function f R (z) = f (z){|z| ≤R}, towhich the result just established applies, and then pass to the limit as R →∞.NotesCheck Anderson’s original paper: Proc. Amer. Math. Soc. 6 (1955),170-176.Cite Federer for proof of B-M.4 Asymptopia: 23 March 2001 c○David Pollard


Appendix BGaussian shift familiesThis appendix will be expanded. It is only a sketch at the moment. I mightchange the definition of bowl-shaped, by dropping the compactness part.The collection of distributions {N(δ, I k ) : δ ∈ R k } is called a Gaussianshift family. Say that a nonnegative function ρ is bowl-shaped functionif it is lower semicontinuous, and all the sets {L ≤ t} are closed (by lowersemicontinuity), convex, and symmetric about the origin.The minimax criterion seeks a randomized estimator, defined by a familyof probability kernels {K z : z ∈ R k } on R k , to minimizesup Qδ z K z y L(y − δ) ≥ Qz 0 L(z).δ∈R kA proof to show that the minimax estimator takes K z degenerate at z dependson an invariance argument, that makes limiting sense of the idea of a uniformprobability distribution on R k . Consider first an heuristic argument for a specialcase.minimax.heuristic Example. Suppose X is an observation from the N(θ, 1) distribution, with θan unknown real parameter. Let ρ(·) be a nonnegative loss function that growsslowly enough to make P 0 ρ(X) finite, and such that {ρ ≤ t} is an intervalsymmetric about the origin, for each t strictly less than sup ρ. For example,ρ(x) = x 2 or ρ(x) = min(1, x 2 ) would both be acceptable.A minimax estimator for θ is a measurable function T (X) that minimizesthe maximum expected loss,sup P θ ρ(T (X) − θ).θHere is an heuristic argument to show that T (X) = X is minimax, based on theblatantly false assertion that Lebesgue measure on the real line has total massone.First note thatP θ ρ(X − θ) = P 0 ρ(X) for all θ,so that the supremum over θ has no effect when T (X) = X. For the generalT (X), the supremum is greater than∫P θ ρ(T (X) − θ)dθ because ∫ dθ = 1?!∫∫= φ(x)ρ(T (x + θ)− θ)dθ dx where φ = N(0, 1) density∫∫= φ(x)ρ(T (θ) − θ + x) dθ dx by invariance.Asymptopia: 23 March 2001 c○David Pollard 1


Appendix BGaussian shift familiesInterchange the order of integration then consider the inner integral. I hope youwill find it obvious (if not, you could resort to Calculus or Anderson’s Lemmafrom Appendix A/) that ∫ φ(x)ρ(x + β)dx achieves its minimum when β = 0.The inner integral is smallest when T (θ) = θ for all θ. One more applicationof the false heuristic assumption about Lebesgue measure then leads to∫∫∫sup P θ ρ(T (X) − θ) ≥ φ(x)ρ(x) dx dθ = P 0 ρ(X),θ□ as required for the minimax property.almost.unif The same heuristic argument would work in higher dimensions, althoughthe obvious fact about minimization of the inner integral might no longerseem so obvious. A rigorous proof of the fact is actually quite difficult; it wasestablished by Anderson (1955).To replace the false heuristic assumption concerning Lebesgue measure bya rigorous argument, we must work with a sequence of probability measuresthat are almost invariant in an asymptotic sense.Let ψ denote the uniform probability density over the unit ball in R k .Write λ k for Lebesgue measure on R k .Define(t) = λ x |ψ(x + t) − ψ(x)| =C k λ k |{|x + t| ≤1}−{|x| ≤1}|,where C k denotes the Lebesgue measure of the unit ball. Clearly 0 ≤ ≤ 2,and (t) → 0ast → 0. Let m σ denote the probability measure with densityσ −k ψ(x/σ ). Intuitively, as σ →∞, the measure behaves increasingly like themythic uniform distribution. More precisely, if g is measurable and bounded inabsolute value by a constant C then|mσ x g(x + t) − mx σ g(x)| =|λy ψ(y)g(σ y + t) − λ y ψ(y)g(σ y)|≤ λ y |ψ(y − t/σ ) − ψ(y)||g(σ y)|≤ C(−t/σ ).This bound quantifies how far m σ is from having the real invariance property.LAMT.minimax Lemma. Let ρ(·) be a bowl-shaped loss function. Then, for every randomizedestimator {K z },limit.minimax sup Qδ z K z y ρ(y − δ) ≥ Qz 0 ρ(z).δthe supremum running over all δ in R k . More precisely, for each R strictly lessthan Q 0 ρ there exists a finite set D R such thatmax Qδ z K z y ρ(y − δ) > Rδ∈D Rfor every randomized estimator.Proof. Consider first the case where ρ is bounded (0 ≤ ρ ≤ C) and uniformlycontinuous.To prove the stronger assertion, consider firstβ(σ, K ) = sup Qδ z K z y ρ(y − δ).|δ|≤σBound the supremum from below by an average with respect to m σ , and rewritethe Q δ integral as a Q 0 integral, to getβ(σ, K ) ≥ m δ σ Qz 0 K y δ+zρ(y − δ).Invoke inequality with g(δ) = K y δ ρ(y + z − δ) to reduce the right-handside toQ0 z mδ σ g(δ) − CQz 0(−z/σ ).2 Asymptopia: 23 March 2001 c○David Pollard


Appendix CAlmost invariant measures Anderson’s Lemma. Let f be a density with respect to Lebesgue measureon R s , such that:(i) f (x) = f (−x) for all x;(ii) the high density region { f ≥ t} is convex, for each t.For C a convex set symmetric about the origin, let C(θ) ={x + θ : x ∈ C}.Then ∫ {x ∈ C(θ)} f (x) dx is maximized at θ = 0. Corollary. Let ρ be a nonnegative function on R s such that {ρ ≤ t} is convexand symmetric about the origin, for each t. Then ∫ f (x − α)ρ(x − β) dx isminimized when α = β.Proof. By a change of variables, it is enough to consider the case whereα = 0. Let R t ={ρ ≤ t}. Then∫∫∫f (x)ρ(x − β) dx = f (x){0 ≤ t


Appendix CAlmost invariant measures Corollary. There exists a sequence of probability measures {m i } on suchthat the asymptotic invariance property holds for all bounded, measurableg(·) on R k and all x in . Example. Here is a rigorous version of the heuristic argument fromExample . To make things a trifle more interesting, suppose X hasa N(θ, I s ) distribution indexed by an unknown θ in R s . Suppose ρ is anonnegative bowl-shaped loss function.For t < sup ρ, define ρ t = min(t,ρ). Each ρ t is also bowl-shaped; itsatisfies the conditions imposed by Corollary . With φ(·) now denotingthe N(0, I s ) density, the heuristic argument has a rigorous analog,∫sup P θ ρ(T (X) − θ) ≥ m i (dθ)P θ ρ t (T (X) − θ)θ=∫∫m i (dθ)φ(x)ρ t (T (x + θ) − θ) dx,for each m i from Lemma . For each x, the <strong>lemma</strong> gives∫∫m i (dθ)ρ t (T (x +θ)−θ) = o(1)+ m i (dθ)ρ t (T (θ)−θ + x)as i →∞.Boundedness of ρ t lets us invoke a Dominated Convergence argument tointegrate out over x, turning the last double integral from inequality into∫∫o(1) + m i (dθ)φ(x)ρ t (T (θ) − θ + x) dx as i →∞.Notice how the dθ in the heuristic argument has become m i (dθ), at the costof an extra o(1) term. The integral over x is minimized when T (θ) = θ, byvirtue of Corollary . It follows that, for each t < sup ρ,∫∫sup P θ ρ(T (X) − θ) ≥ o(1) + m i (dθ)φ(x)ρ t (x)dxθ∫= o(1) + φ(x)ρ t (x)dx.□The dependence on i has now concentrated into the o(1) term. Let i tend toinfinity, then let t increase to sup ρ, to deduce the minimax property for X asan estimator for θ.Asymptopia: 23 March 2001 c○David Pollard 3


Appendix DVector Lattices[§] 1. DefinitionsA vector space V equipped with a partial order “≤” is called a vector latticeif for each pair x, y in V(i) there is a smallest element z (denoted by x ∨ y) for which x ≤ z andy ≤ z,(ii) there is a largest element w (denoted by x ∧ w) for which x ≤ w andy ≤ w(iii) if x ≤ y then x + z ≤ y + z for all z ∈ V(iv) if x ≤ y and c ∈ R + then cx ≤ cyThe element |x| := x ∨ (−x) is called the modulus of x. The elementx + := x ∨ 0 is called the positive part of x. The element x − := (−x) ∨ 0iscalled the negative part of x.If a vector lattice V is equipped with a norm ‖·‖for which(v) for all x, y ∈ V ,if|x| ≤|y| then ‖x‖ ≤‖y‖then V (equipped with ≤ and ‖·‖) is called a normed vector lattice.L.infty Example. Let S be a set equipped with a sigma-field S. The space L ∞ (S, S)of all bounded, S-measurable real functions on S is a vector lattice for thepointwise ordering:f ≤ g means f (s) ≤ g(s) for all s in SThe supremum norm, defined by□‖ f ‖=sup | f (s)| ,s∈Ssatisfies property (v). The space L ∞ (S, S) is a normed vector lattice.The space L ∞ (S, S) is also complete for the supremum norm, in the sensethat Cauchy sequences converge to bounded, measurable limit functions. Acomplete normed vector lattice is called a Banach lattice.If S is equipped with a topology, the set C(S) of all bdd, continuous realfunctions on S is also a Banach lattice under the pointwise ordering and thesupremum norm. If all functions in C(S) are S-measurable (for example, Scould be the Borel sigma-field) then C(S) is a Banach sublattice of L ∞ (S, S).[§] 2. Basic factsAsymptopia: 4Jan99 c○David Pollard 1


Section D.2Basic factsFact A: For real c, and x, y ∈ V ,{ c(x ∧ y) for c ≥ 0(cx) ∧ (cy) =c(x ∨ y) for c ≤ 0□Fact B: For x, y, z ∈ V ,(x + z) ∧ (y + z) = (x ∧ y) + z□Fact C: For x, y, z ∈ V ,x ∧ y + x ∨ y = x + y + (x − x − y) ∧ (y − x − y) − (−x) ∧ (−y)= x + y□Fact D: For each x ∈ V ,2x + = (x + x) ∨ (x − x) = x + (x ∨ (−x)) = x +|x|2x − = (−x − x) ∨ (x − x) =−x + ((−x) ∨ x) =−x +|x|By addition and subtraction.x = x + − x − |x| =x + + x −From (2x + ) ∨ (2x − ) = (x +|x|) ∨ (−x +|x|) = x ∨ (−x) +|x| =2|x| deducex + ∨ x − =|x| =x + + x −x + ∧ x − = x + + x − − (x + ∨ x − ) = 0□singular Definition. Elements x and y in V are said to be orthogonal (or latticedisjoint) if |x|∧|y| =0. Write x ⊥ y to denote orthogonality. define subsetsX and Y of V to be orthogonal if x ⊥ y for each x in X and y in Y .abs.triangle Lemma. For all x and y in V ,|x + y| ≤|x|+|y| with equality if and only if |x|∧|y| =0Proof. Argue first for the inequality.|x + y| =(x + y) ∨ (−x − y)= (x +|x|+y +|y|) ∨ (|x|−x +|y|−y) −|x|−|y|= (2x + + 2y + ) ∨ (2x − + 2y − ) −|x|−|y|= 2 ( (x + ∨ y + − x + ∧ y + ) ∨ (x − ∨ y − − x − ∧ y − ) ) −|x|−|y|Discard the nonnegative quantities x + ∧ y + and x − ∧ y − , leaving the upperbound2(x + ∨ y + ∨ x − ∨ y − ) −|x|−|y| =2(|x|∨|y|) −|x|−|y|= 2(|x|+|y|−|x|∧|y|) −|x|−|y|≤|x|+|y|The last inequality is an equality if and only if |x| ∧|y| =0. Moreover, if|x|∧|y| =0 then both the terms x + ∧ y + and x − ∧ y − are also zero, which□ implies equality at all steps.splitting.<strong>lemma</strong> Splitting Lemma. If x 1 , x 2 , y 1 , y 2 ∈ V + and x 1 + x 2 = y 1 + y 2 then thereexist x ij ∈ V + such thatu 11 + u 21 = x 1 u 11 + u 12 = y 1u 12 + u 22 = x 2 u 21 + u 22 = y 22 Asymptopia: 4Jan99 c○David Pollard


Appendix DVector LatticesProof.Define the u ij as in the following table:u 11 = x 1 ∧ y 1 u 12 = (y 1 − x 1 ) + y 1u 21 = (y 2 − x 2 ) + u 22 = y 2 ∧ x 2 y 2x 1 x 2 x 1 + x 2 = y 1 + y 2Check the first row sum:x 1 ∧ y 1 + (y 1 − x 1 ) + = (x 1 ∧ y 1 ) + (y 1 ∨ x 1 ) − x 1 = x 1 + y 1 − x 1Interchange subscripts 1 and 2 to check the second row sum. Interchange theroles of x and y, then use the fact that y 2 − x 2 = x 1 − y 1 to check the column□ sums.splitting.ineq Corollary. If w, x 1 , x 2 ∈ V + and w ≤ x 1 + x 2 then there exist w i ∈ V +such that w = w 1 + w 2 and w i ≤ x i for i = 1, 2.□ Proof. let y 1 = w and y 2 = x 1 + x 2 − w.sum.orthog Exercise. Suppose x ∈ V and y 1 , y 2 ∈ V + and x ⊥ y i for i = 1, 2. Showthat x ⊥ (y 1 + y 2 ) and x ⊥ (y 1 − y 2 ).Solution: Suppose 0 ≤ w =|x|∧(y 1 + y 2 ). Split w into a sum w 1 + w 2 ,with 0 ≤ w i ≤ y i . Note w i ≤|x|.w = w 1 ∧|x|+w 2 ∧|x| ≤y 1 ∧|x|+y 2 ∧|x| =0.Thus |x| ⊥(y 1 + y 2 ).The second assertion follows from the first and the fact that |y 1 − y 2 |≤□ y 1 + y 2 .[§] 3. Order-bounded linear functionalslet V be a vector lattice. A linear map λ from V into R is to be an orderbounded linear functional if, for each pair a ≤ b,sup{|λ(x)| : a ≤ x ≤ b} < ∞A linear map λ from V into R is to be an increasing linear functional ifλ(x) ≤ λ(y) whenever x ≤ y. Equivalently, a linear functional is increasing ifλ(x) ≥ 0 for all x ∈ V + . The space V # of all order-bounded linear functionalson V is called the order dual of V .fnal.ordering Definition. Define λ 1 ≤ λ 2 to mean that λ 1 (x) ≤ λ 2 (x) for all x in V + .The key facts about linear functionals are that(i) The order dual V # is a vector lattice.(ii) Each λ in V # can be expressed as a difference of two increasing linearfunctionals.extension Lemma. Let λ : V + → R satisfy(i) λ(αx + βy) = αλ(x) + βλ(y) for all α, β ∈ R + and x, y ∈ V +(ii) For each w in V + ,sup{|λ(x)| :0≤ x ≤ w} < ∞Then λ has a unique extension to an order-bounded linear functional on V .Proof. Define λ(u) = λ(u + ) − λ(u − ). Check that if x 1 − x 2 = y 1 − y 2 thenλ(x 1 ) − λ(x 2 ) = λ(y) 1 − λ(y 2 ). Deduce linearity.Asymptopia: 4Jan99 c○David Pollard 3


Section D.3Order-bounded linear functionalsFor order bounded: if a ≤ x ≤ b then 0 ≤ x + ≤ b + and 0 ≤ x − ≤ a − ,whence|λ(x)| ≤|λ(x + )|+|λ(x − )|≤ sup{|λ(y)| :0≤ y ≤ b + }+sup{|λ(y)| :0≤ y ≤ a − } < ∞.□By virtue of this extension result, there is no harm in referring to therestriction of an order-bounded linear functional to V + as an order-boundedlinear functional on V + . If such a functional maps V + into R + then it isincreasing.fnal.max Theorem. Let λ 1 and λ 2 be order-bounded linear functionals. Then, forv ∈ V + ,(i) (λ 1 ∨ λ 2 )(v) = sup{λ 1 v 1 + λ 2 v 2 : v i ∈ V + and v 1 + v 2 = v}(ii) (λ 1 ∧ λ 2 )(v) = inf{λ 1 v 1 + λ 2 v 2 : v i ∈ V + and v 1 + v 2 = v}The space V # is a vector lattice.Proof. Consider only the assertion for maxima of the two functionals. Thearguments for minima are analogous. Write µ(v) for the expression on theright-hand side of (i).First show that µ is a linear functional, in the sense of Lemma . Itistrivial to check that µ(cv) = cµ(v) for all v ∈ V + and c ∈ R + . To establishadditivity,µ(v + w) = µ(v) + µ(v) for all v, w ∈ V + ,consider pairs in V + with v 1 + v 2 = v and w 1 + w 2 = w. For such pairs,λ 1 (v 1 ) + λ 2 (v 2 ) + λ 1 (w 1 ) + λ 2 (w 2 ) = λ 1 (v 1 + w 1 ) + λ 2 (v 2 + w 2 )Because v 1 + w 1 + v 2 + w 2 = v + w, the right-hand side is less than µ(v + w).Take the supremum over all pairs to deduce thatµ(v) + µ(w) ≤ µ(v + w)For the reverse inequality, consider any pair in V + for which u 1 + u 2 = v + w.Invoke the Splitting Lemma to break u 1 into v 1 + w 1 and u 2 into v 2 + w 2such that v 1 + v 2 = v and w 1 + w 2 = w. Then□λ 1 u 1 + λ 2 u 2 = λ 1 v − 1 + λ 2 v 2 + λ 1 w 1 + λ 2 w 2≤ µ(v) + µ(w)Take a suprememum over u 1 , u 2 to deduce thatµ(v + w) ≤ µ(v) + µ(w)Additivity follows.Next, check that µ is order bounded, in the sense of Lemma .Consider v with 0 ≤ v ≤ w, for fixed w in V + . The supremum defining µ(v)is smaller in absolute value thansup (|λ 1 (v 1 )|+|λ(v 2 )|}) ≤ sup |λ 1 (u)|+ sup |λ 2 (u)| < ∞v 1 +v 2 =v0≤u≤w0≤u≤wThe linear functional µ belongs to V # .If λ ∈ V # and λ ≥ λ 1 and λ ≥ λ 2 , then, for v ∈ V + ,µ(v) ≤ sup{λv 1 + λv 2 : v i ∈ V + and v 1 + v 2 = v} =λ(v)That is, µ ≤ λ; the order bounded linear functional µ is the least upper boundλ 1 ∨ λ 2 .The lattice decomposition λ = λ + − λ − expresses a general λ in V # as adifference of two increasing functionals.4 Asymptopia: 4Jan99 c○David Pollard


Appendix DVector Latticesabs.lam Exercise. For λ in V # and w in V + , show that|λ|(w) = sup λ(u) = sup |λ(u)||u|≤w |u|≤wSolution: From the definition of |λ| as λ ∨ (−λ) we have|λ|(w) = sup{λw 1 − λw 2 : w i ∈ V + and w 1 + w 2 = w}= sup (λv − λ(w − v))0≤v≤wSubstitute u for 2v − w, noting that 0 ≤ v ≤ w if and only if −w ≤ u ≤ w ifand only if |u| ≤w, to reexpress last supremum assup λ(u)|u|≤w□ The last equality is trivial, because |u| =|−u|.[§] 4. Lattices of bounded functionsLet Ɣ be a vector lattice of bounded, real functions on a set S. Assume Ɣcontains the constant function 1. Equip Ɣ with its supremum norm,‖γ ‖=sup |γ(s)|.s∈SAssume Ɣ is closed (in the space of all bounded functions on S) for uniformconvergence. Then Ɣ is a Banach lattice.The norm on Ɣ has the special property thatM.norm ‖γ 1 ∨ γ 2 ‖=max (‖γ 1 ‖, ‖γ 2 ‖) if γ 1 ≥ 0 and γ 2 ≥ 0Such an equality makes Ɣ an example of an abstract M-space. Normedvector lattices satisfying an equality like share many of the propertiesof spaces of bounded functions under a supremum norm. In fact, under mildconditions (Schaefer 1974, Section II.7), abstract M-spaces can be representedas spaces of bounded functions (in fact, the space of all bounded, continuousreal functions on some compact space).bdd.meas Example. Let S be the sigma-field on S generated by Ɣ. IfƔ contains allindicator functions of S-measurable sets, then it must coincide with the spacebm(S, S) of all bounded, S-measurable, real functions on S, because eachfunction in bm(S, S) is a uniform limit of simple functions, ∑ i α i A i ∈ Ɣ.The example of C[0, 1], the space of all bounded, continuous, realfunctions on [0, 1], shows that Ɣ need not always contain all bounded,□ measurable functions.A linear map λ from Ɣ to R is continuous if and only if‖λ‖ := sup{|λ(γ )| : ‖γ ‖≤1} < ∞.It is traditional to write Ɣ ∗ for the space of all such continuous linear functionals.meas.fnal Example. If µ = µ 1 − µ 2 is a signed measure (expressed as a difference oftwo nonnegative measures µ 1 and µ 2 ), defined on a sigma-field S for which allfunctions in Ɣ are S-measurable, then the integral γ ↦→ µ(g) := µ 1 (γ )−µ 2 (γ )defines a continuous linear functional on Ɣ, becausesup |µ(γ )| ≤µ 1 (1) + µ 2 (1)


Section D.4Lattices of bounded functionsLet µ = µ 1 − µ 2 and ν = ν 1 − ν 2 be finite signed measures. LetQ = µ 1 + µ 2 + ν 1 + ν 2 . Each µ i and ν i is absolutely continuous with respectto Q. Write m i and n i for the corresponding densities. Then µ has densitym = m 1 − m 2 and ν has density n = n 1 − n 2 , both with respect to Q.The lattice-theoretic maximum of µ and ν has a closed-form expression::for γ ∈ Ɣ + ,sup {µγ 1 + νγ 2 }= sup Q (mγ 1 + nγ 2 ) = Q ((m ∨ n)γ ) ,γ 1 +γ 2 =γγ 1 +γ 2 =γthe supremum being achieved at γ 1 ={m ≥ n}γ and γ 2 ={m < n}γ .Inparticular, |µ| is the countably additive measure with density |m| with respectto Q. Similarly, µ + has denity m + , and µ − has denisty m − .The measures µ and ν are said to be mutually singular if there exists aset S 0 ∈ S such that |µ|(S 0 ) = 0 =|ν|(S0 c ). Equivalently, Q(|m|∧|n|) = 0, or|µ|∧|ν| =0. That is, mutual singularity is the same concept as orthogonality□ of the corresponding linear functionals.orderbdd.cts Theorem.(i) The spaces Ɣ ∗ and Ɣ # are the same.(ii) ‖λ‖ =|λ|(1) for each λ in Ɣ ∗ .(iii) The space Ɣ ∗ is a Banach lattice.Proof. Notice that the inequalities ‖γ ‖≤1 and |γ |≤1, for a function γ inƔ place the same constraint on γ , namely, that it is everywhere bounded inabsolute value by the constant 1. (Notice though that 1 denotes a real numberin the first inequality and a constant function in the second inequality.) Thusnorm.abs □‖λ‖ =sup{|λ(γ )| : −1 ≤ γ ≤ 1}If λ is order bounded then the right-hand side of is finite, and henceλ ∈ Ɣ ∗ . Conversely, if γ 1 and γ 2 are functions in Ɣ, both bounded in absolutevalue by a constant C, thensup{|λ(γ )| : γ 1 ≤ γ ≤ γ 2 }≤C sup{|λ(γ )| : −1 ≤ γ ≤ 1}Finiteness of ‖λ‖ therefore implies order boundedness.Exercise , with w equal to 1, establishes assertion (ii).The Banach lattice property, that ‖λ 1 ‖≤‖λ 2 ‖ whenever |λ 1 |≤|λ 2 |,follows directly from (ii).Assertion (ii) of the Theorem has another consequence,L.norm ‖λ 1 + λ 2 ‖=‖λ 1 ‖+‖λ 2 ‖ if λ 1 ≥ 0 and λ 2 ≥ 0.Such an equality makes Ɣ # an example of an abstract L-space. Normedvector lattices satisfying an equality like share many of the propertiesof spaces of finite signed measures. In fact every L-space can be representedas a space L 1 (µ), for some Radon measure on a locally compact space. Inthis sense, evry abstract L-space corresponds to the collection of finite signedmeasures that are absolutely continuous with respect to a fixed measure (on avery nice topological space).It is a little mysterious where the countable additivity comes from, forit is not so hard to construct examples where the functionals in Ɣ # are notrepresented by signed measures living on a sigma-field on S. The trick lies inthe choice of the space where µ lives. A small analogy might help.6 Asymptopia: 4Jan99 c○David Pollard


Appendix DVector Latticesanalogy Example. Let S = [0, 1) and let Ɣ denote the space of all functionsobtainable by restricting continuous real functions on [0, 1] to the subinterval[0, 1). For each γ in Ɣ defineλ(g) = lim γ(s)s→1It is easy to see check that λ is an increasing linear functional on Ɣ, but it doesnot correspond to an integral with respect to any countably additive measure:the sequenceγ n (s) = (n − ns)) ∧ 1is nonnegative and it increases pointwise to the constant function 1 functionon S, butλ(γ n ) ≡ 0, which does not converge to λ(1); monotone convergenceis violated.Of course λ does correspond to the integral with respect to the point masssitting at 1, but that measure does not live on S. We need a larger space to□ support the representing measures.Lucien Le Cam uses subsets of abstract L-spaces to represent statisticalexperiments. He has argued (Le Cam 1986, Chapter 1), with some justificationI believe, that many of the technical problems associated with the traditionalsetting of probability measures on a sigma-field of a given sample space can betraced to an unfortunate choice of sample space. He has pointed out that theL-space structure is what really matters, and that there might be many waysto represent a given L-space. Why should we then be inconvenienced with aninital choice of sample space, if it calls forth unnecessary regularity conditionsmerely to make proofs flow smoothly? The downside of his approach is thatfamiliar looking objects, such as Markov kernels, can take on a strange newappearance in the context of an arbitrary sample space. Only with the rightrepresentations with the cunningly chosen sample spaces do measures look liketraditional measures, and random variables look like measurable functions, andrandomizations look like Markov kernels.norm.orthog Exercise. If λ 1 ⊥ λ 2 , for λ i ∈ Ɣ ∗ , show that‖λ 1 + λ 2 ‖=‖λ 1 ‖+‖λ 2 ‖Solution: The norm property requires ‖λ 1 + λ 2 ‖≤‖λ 1 ‖+‖λ 2 ‖. Theequality comes from the L-space property , Lemma , and the fact□ that ‖λ‖ =‖|λ|‖.[§] 5. Bands in a vector latticeA subset W of a vector lattice V is said to be majorized by an element vif w ≤ v for each w in W . The element v is called a majorant. The smallestmajorant, if it exists, is called the supremum of W , and is denoted by sup W .That is, sup W , if it exists, is the v 0 in V for which:(i) w ≤ v 0 for all w in W ;(ii) if v is an element of V for which w ≤ v for all w in W then v 0 ≤ v.A vector lattice for which each majorized set has a supremum is said to beDedekind complete.Asymptopia: 4Jan99 c○David Pollard 7


Section D.5Bands in a vector latticesup.real.line Example. The subset W 0 of all positive rational numbers on the real lineis not majorized within R: there is no real number r that is larger than everypositive rational.The subset W 1 of all rational numbers w for which w 2 ≤ 2 is majorized.For example, 2, 1.5, 1.42, 1.415 are majorants. The smallest majorant is √ 2.The real numbers are Dedekind complete. In fact, the real numbers canbe defined as equivalence classes of majorized subsets of the rationals. Moreprecisely, a real number gets defined by a “Dedekind cut”, the partition ofthe rationals that is eventually identified with the sets of all rationals that aresmaller than the real and all rationals larger than the real), a procedure due to□ Dedekind (Rudin 1976, pages 17–21).cts.fn.sup Example. The set C(R) of all bounded, continuous real functions on the realline is a vector lattice under the pointwise ordering of functions. The subset Wconsisting of all w ∈ C(R) for which{ 1 if t ≤ 0w(t) ≤0 if t > 0is majorized (by the constant function 1, for example), but it has no supremumin C(R).□ The vector lattice C(R) is not Dedkind complete.band.def Definition. A vector sublattice B of a vector lattice V is called a bandin V if(i) for each v in V and each b in B, if|v| ≤|b| then v ∈ B(ii) each nonempty subset of B that has a majorant in V also has asupremum, which belongs to BNotice that the lattice properties of a band follow automatically from (i)and (ii) and its vector space properties: b 1 ∨ b 2 = sup{b 1 , b 2 } and b 1 ∧ b 2 =b 1 + b 2 − (b 1 ∨ b 2 ).orthog.band Example. Suppose V is a Dedekind complete vector lattice. For a subsetW of V , not necessarily a sublattice or even a subspace, its orthogonalcomplement W ⊥ is defined as the set of v ∈ V that are orthogonal to eachelement of W . Assertion: The set W ⊥ is a band. Proof: The vector spaceproperty follows easily from Exercise . Property (i) of bands followsimmediatley from the trivial fact that if |v| ≤|b| and |w| ∧|b| =0 then|w|∧|v| =0. property (ii) is slightly trickier.Suppose X ⊂ W ⊥ is majorized. Write m for its supremum. For afixed x 0 in X, define X 0 ={x ∈ X : x ≥ x 0 }. Notice that m = sup X 0 andm −x 0 = sup x∈X0 (x −x 0 ). (This little trick effectively lets us reduce the problemto the case where X ⊆ V +, so that absolute values cause no trouble.) Fix a win W . We need to show that |w|∧|m| =0. Bound |m| by (m − x 0 ) +|x 0 |.Byassumption |w|∧|x 0 |=0. By virtue of the result in Example , itisnowenough to prove that |w|∧(m − x 0 ) = 0. The distributive law from Problem [3],|w|∧(m − x 0 ) =|w|∧sup(x − x 0 )X 0= sup (|w|∧(x − x 0 ))X 0□= 0 because x − x 0 ∈ W ⊥The band W ⊥⊥ consists of all elements that are orthogonal to the elementsthat are orthogonal to W . Figure out what that means and you will see whyW ⊆ W ⊥⊥ . In fact, as shown in Problem [4], W ⊥⊥ is the smallest bandcontaining W , also known as the band in V generated by W . The Problem usesfacts about projections onto bands (see Section 7).8 Asymptopia: 4Jan99 c○David Pollard


Appendix DVector LatticesL1.band Exercise. Let V = Ɣ # , where Ɣ is the vector lattice of all bounded, S-measurable functions on a set S equipped with a sigma-field S. Let λ be a fixedsigma-finite measure on S. Let B be the subset of V identified with the spaceof all finite signed measures on S that are absolutely continuous with respectto λ. Show that B is a band.Solution: Each µ in B has a density wrt λ. The set B may therefore beidentifed with the Banach space L 1 (λ), the space of equivalence classes ofλ-integrable real functions.Property (i) of bands is easy to prove. Suppose 0 ≤|v| ≤|µ| with v ∈ Vand µ ∈ B. With no loss of generality, suppose 0 ≤ v ≤ µ. The functional vdefines a set function ν on S: the value of ν(A) is just the functional v appliedto the indicator function of A. Linearity of v implies finite additivity of ν.Also, if {A n } is a decreasing sequence of sets in S with intersection ∅ then0 ≤ ν A n ≤ µA n ↓ 0.The set function ν is actually a finite, countably additive measure on S. Theintegral wrt ν coincides with the fnal v on Ɣ: argue from the fact that eachγ ∈ Ɣ is a uniform limit of sinmple functions. That is, we may identify v withthe measure ν. Clearly, ν is absolutely continuous with respect to µ, whichis itself absolutely continuous with respect to λ. Thus ν ∈ B, as required forproperty (i) of bands.For property (ii), suppose that M is a subset of B that is majorized by win V . That is,µ(γ ) ≤ v(γ) for all γ ∈ Ɣ + .Let M 1 denote the set of all finite maxima µ 1 ∨ ...∨ µ n of measures from M.The set M 1 is also majorized by w. IfM 1 has a supremum then so does M,and the two suprema are equal. Why?Define = sup µ∈M1 µ(1), which is bounded above by w1. Chooseµ n ∈ M 1 such that µ n (1) → . Without loss of generality, we may assumeµ 1 ≤ µ 2 ≤ ....The corresponding sequence of densities {m n } with respect to λ increasesalmost surely to a measurable function m, which defines a new measure µ. Inits role as an element of V , the measure µ is majorized by w: for each γ ∈ Ɣ + ,µγ = lim λ(m n g) by monotone convergence≤ w(γ ) because µ n ≤ w for each nTo prove that µ ≥ µ 0 for every µ 0 in M 1 it is enough to show that(µ − µ 0 ) + (1) = 0 for each such µ 0 . From the fact that µ 0 ∨ µ n ∈ M 1 we get ≥ µ 0 ∨ µ n (1) ≥ µ n 1 → whence(µ 0 − µ 0 ) + (1) ≤ (µ 0 − µ n ) + (1) = (µ 0 ∨ µ n )(1) − µ n 1 → 0.Thus µ majorizes M.If ν is another majorant for M 1 then, for each fixed γ ∈ Ɣ + ,µγ = lim µ n γ ≤ νγn□That is, µ is the supremum for M 1 .Remark. The density m (or, more precisely, its equivalence class inL 1 (λ)) is called the essential supremum of the class of densitiescorresponding to the measures in M.Asymptopia: 4Jan99 c○David Pollard 9


Section D.6The band of countably additive measures[§] 6. The band of countably additive measuresLet Ɣ be a vector lattice of bounded functions on a set S, equipped withsupremum norm. Let S be the sigma-field on S generated by Ɣ. As before,let ca + (S, S) denote the set of countably additive (nonnegative) finite measureson S, and ca(S, S) denote the set of finite signed measures (differences ofmeasures from ca + (S, S)) onS. Write L for the dual space Ɣ ∗ = Ɣ # .According to the Daniell-Stone representation theorem (Royden 1968,Chapter 13), there is a one-to-one correspondence between ca + (S, S) and theset of increasing linear functionals µ on Ɣ with the sigma-smoothnessproperty:• if {γ n : n ∈ N} ⊂Ɣ + and γ n ↓ 0 pointwise, then µ(γ n ) ↓ 0.The linear functional is equal to the integral with respect to the representingmeasure.It follows immediately from the sigma-smoothness characterization, thatsolid if λ ∈ L and |λ| ≤µ ∈ ca + (S, S) then |λ| ∈ca + (S, S)(and hence both λ + and λ − are also represented by countably additive measures).In particular, if λ ∈ L + and µ ∈ ca + (S, S) then both µ ∧ λ and (µ − λ) + belongto ca + (S, S).ca.sup Lemma. Let M ⊆ ca + (S, S) have an upper bound λ in L + , that is,0 ≤ µ(γ ) ≤ λ(γ ) for all g ∈ Ɣ +Then there exists a smallest λ 0 in L + with µ ≤ λ 0 for all µ ∈ M. Thefunctional λ 0 is countably additive.Proof. Define M 1 as the set of all finite maxima µ 1 ∨ ...∨ µ k of finitesubcollections of measures from M. The functional λ is also an upper boundfor M 1 .Define = sup µ∈M1 µ(1), which is bounded above by λ1. Chooseµ n ∈ M 1 such that µ n (1) → . Without loss of generality, we may assumeµ 1 ≤ µ 2 ≤ ....Let Q be a finite measure dominating all the µ n , such asQ = ∑ 2 −n µ n /(1 +‖µ n ‖)nWrite m n for the density of µ n with respect to Q. Then the sequence {m n }increases almost surely to a measurable function m for whichQm = lim Qm n = lim µ n 1 = n nDefine λ 0 as the measure with density m with respect to Q.To prove that λ ≥ µ for every µ in M it is enough to show that(λ 0 − µ) + (1) = 0 for each such µ. From the fact that µ ∨ µ n ∈ M 1 we get ≥ µ ∨ µ n (1) ≥ µ n 1 → whence(µ − λ 0 ) + (1) ≤ (µ − µ n ) + (1) = (µ ∨ µ n )(1) − µ n 1 → 0.Thus λ is an upper bound for M.If λ 1 is another upper bound for M then, for each fixed γ ∈ Ɣ + ,λ 0 γ = lim µ n γ ≤ λ 1 γn□That is, λ 0 is the smallest upper bound for M.Property and Lemma identify ca(S, S) as a band in theBanach lattice L.10 Asymptopia: 4Jan99 c○David Pollard


Appendix DVector LatticesProjection onto ca(S, S)For each λ in L + write ψ c (λ) for the largest measure in the set {µ ∈ ca + (S, S) :µ ≤ λ}, and write ψ f (λ) for λ − ψ c (λ).Properties: ψ f (λ) ⊥ ν for all ν ∈ ca + (S, S). The maps are linear, andhave a linear extension to L.[§] 7. Projections onto bands in a vector latticeLet B be a band in a vector lattice V . For each v ∈ V + ,define π B (v) ∈ B asthe supremum of all b ∈ B + for which 0 ≤ b ≤ v.band.proj Theorem. The map v ↦→ π B (v) from V + into B + has an extension to anincreasing linear map from V into B with the property thatsing.part v − π b v ⊥ B for all v in VProof. Consider first the behaviour of π B on V + .The property π B (cv) = cπ B (v), for each c ∈ R + , follows easily from thedefinition of a supremum.If v 1 ,v 2 ∈ V + thenπ B (v 1 + v 2 ) ≥ π B (v 1 ) + π B (v 2 ),because the right-hand side is ≤ v 1 + v 2 . For the reverse inequality, considerany b in B that is ≤ v 1 + v 2 . Invoke the Splitting Lemma to express b asa sum b 1 + b 2 with b i ≤ v i . Then b i ≤ π B (v i ), whence b ≤ π B (v 1 ) + π B (v 2 ).That is, the right-hand side is a majorant for the set of which the left-hand sideis the supremum. The reverse inequality follows.It follows that π B is linear in the sense of Lemma . It has aunique extension to an increasing linear functional on V ,defined by π B (v) =π B (v + ) − π B (v − ).To prove , start again with v ∈ V + . We have only to show that|b|∧(v − π B (v)) = 0 for each b ∈ B. Write b 0 for the last minimum. Then0 ≤ b 0 ≤ (v − π B (v); and b 0 ∈ B by property (ii) of bands. It follows thatv ≥ b 0 + π B (v) ∈ B. From the definition of π B (v) as the supremum, we thenget that π B (v) ≥ b 0 + π B , that is, b 0 = 0.For a general v in V , the special case gives v + − π B (v + ) ⊥ B andv − − π B (v − ) ⊥ B. Exercise then implies that v − π B (v) is orthgonal□ to B.leb.decomp Example. Let λ be a sigma-finite measure on a sigma field S of subsets of aset S, and let µ be a finite signed measure on S. Let Ɣ denote the banach latticeof all bounded, S-measurable, real functions on S. Let B denote the band in Ɣ #corresponding to L 1 (λ). The projection µ B of µ onto B is a countably additivemeasure that is absolutely continuous with respect to λ. The measure µ − µ Bis orthogonal to each member of B. It must concentrate on a set that has zeroλ measure.The split of µ into the absolutely continuous and singular parts is called□ the Lebesgue decomposition of µ.[§] 8. Problemsdistributive [1] (Distributive law for vector lattices) For x, y, z ∈ V , a vector lattice, prove(x ∨ y) ∧ z = (x ∧ z) ∨ (y ∧ z)Asymptopia: 4Jan99 c○David Pollard 11


Section D.8Problemsby following these steps:(i) Let w = x ∨ y. Show that w ∧ z ≥ x ∧ z and w ∧ z ≥ y ∧ z. Deduce thatthe left-hand side of the asserted equality is ≥ the right-hand side.(ii) Let v denote the right-hand side of the asserted equality, and let u equalx ∨ y ∨ z. Show that v ≥ x + z − x ∨ z and v ≥ y + z − y ∨ z. Deducethat v + u ≥ (x + z) ∨ (y + z) = w + z. Use ?? to show that the lastright-hand side equals u + (w ∧ z).distributive2 [2] For x, y, z ∈ V , a vector lattice, prove the other distributive law,(x ∧ y) ∨ z = (x ∨ z) ∧ (y ∨ z)Hint: Replace x, y, z by −x, −y, −z, then use the first distributive law.distributive.sup [3] Let V be a vector lattice with a subset X for which sup X exists. For each v inV , show that sup x∈X (v ∧ x) exists andv ∧ sup X = sup(v ∧ x).x∈XHint: Write u for sup X. It is easy to show that v ∧ u is a majorant for theset {v ∧ x : x ∈ X}. Ifw is another majorant, argue as in Problem [1] thatw ≥ x+v−(x∨v) ≥ x+v−u for all x ∈ X. Deduce that w+u−v ≥ sup X = u,that is, w ≥ v.perpeprp [4] Let B be a band in a Dedekind complete vector lattice V .(i) Show that B ⊥⊥ = B. Hint: Suppose x ∈ B ⊥⊥ . Decompose as x B + x ⊥ ,where x B = π B (x). Argue that x ⊥ is an element of both B ⊥ and B ⊥⊥ .Deduce that x ⊥ is orthogonal to itself, and hence is zero.(ii) For a subset W of V , deduce that W ⊥⊥ is the smallest band containing W .Hint: if B is a band with B ⊇ W show that B ⊥ ⊆ W ⊥ .general.splitting [5] Extend the Splitting Lemma to sums of more than two terms: if x i ∈ V +for i = 1,...,m and y j ∈ V + for j = 1,...,n and ∑ i x i = ∑ j y j then thereexist u ij ∈ V + such that ∑ j u ij = x i and ∑ i u ij = y j for all i and j.[§] 9. NotesSee Schaefer (1986) for an introduction to the theory of vector lattices.See Schaefer (1974) for facts about L-spaces and M-spaces.Aliprantis & Border (1994, Chapter 7) for a quicker tour through the basictheory.Fremlin (1974).Torgersen (1991, Chapter 5).First four(?) chapters of Bourbaki’s Intégration.Kelley & Namioka (1963, Appendix A) for a proof of the representationof abstract L-spaces and M-spaces.ReferencesAliprantis, C. D. & Border, K. C. (1994), Infinite Dimensional Analysis: AHitchhiler’s Guide, Springer-Verlag.Fremlin, D. H. (1974), Topological Riesz Spaces and Measure Theory, CambridgeUniversity Press.Kelley, J. L. & Namioka, I. (1963), Linear Topological Spaces, Van Nostrand.Le Cam, L. (1986), Asymptotic Methods in Statistical Decision Theory,Springer-Verlag, New York.12 Asymptopia: 4Jan99 c○David Pollard


Appendix DVector LatticesRoyden, H. L. (1968), Real Analysis, second edn, Macmillan, New York.Rudin, W. (1976), Principles of Mathematical Analysis, third edn, McGraw-Hill.Schaefer, H. H. (1974), Banach Lattices and Positive Operators, Springer-Verlag.Schaefer, H. H. (1986), Topological Vector Spaces, Springer-Verlag.Torgersen, E. (1991), Comparison of Statistical Experiments, CambridgeUniversity Press.Asymptopia: 4Jan99 c○David Pollard 13


Appendix ETopological vector spaces1. Locally convex topological vector spacesA vector space X equipped with a topology is said to be a topological vectorspace (TVS) if(i) (x 1 , x 2 ) ↦→ x 1 + x + 2 is continuous as a map from X ⊗ X into X(ii) (λ, x) ↦→ λx is continuous as a map from R ⊗ X into XA TVS is said to be locally convex if it is Hausdorff and if every neighborhoodof 0 contains a convex neighborhood of 0. Abbreviate “locally convextopological vector space” to lcTVS. Example. A TVS whose topology is defined by a norm is a lcTVS. The□ balls centered at the origin are convex. Example. Let X be a vector space and Ɣ be a collection of linear functionsfrom X into R that separate points (that is, if x and y are distinct points of Xthen f (x) ≠ f (y) for at least one f in Ɣ). Then the weakest topology on Xmaking all functions in Ɣ continuous makes X a lcTVS. The sets{x ∈ X : | f i (x)| sup λxx∈FIn particular, X ∗ separates points of X.If f is a convex, lower semi-continuous map from X into R ∪ {∞}, theepigraphepi( f ) ={(x, t) ∈ X ⊗ R : t ≥ f (x)}is a closed, convex subset of the lcTVS X ⊗ R. The hyperplanes that separateepi( f ) from points (x, s) with s < f (x) define a collection of continuous linearfunctionals on X whose pointwise supremum equals f . That is, f (x) = sup{x ∗ (x) : f ≥ x ∗ on X}Asymptopia: 4Jan99 c○David Pollard 1


Section E.1Locally convex topological vector spacesBarycenterLet K be a compact, convex subset of a lcTVS X. Let λ be a probabilitymeasure on the Borel sigma-field of K . The barycenter of λ is the point x 0in K for whichλx ∗ = x ∗ (x 0 ) for all x ∗ in X ∗If X = R the barycenter is just the mean, λx.Uniqueness of the barycenter in general follows from the fact that X ∗separates points of X. Existence follows from the separating hyerplane propertyfor Euclidean space. For each finite subset D of X ∗ defineK D ={x ∈ K : x ∗ (x) = λ(x ∗ ) for all x ∗ ∈ D}Each K D is closed, and hence compact. If each K D were nonempty then theintersection of all such K D would be nonempty. A point in the intersectionwould be a barycenter. (So the intersection can contain at most one point.)Suppose D = x1 ∗,...,x k ∗. To prove that K D ≠∅,define a continuouslinear map ψ from X into R k by x ↦→ (x1 ∗(x),...,x k ∗ (x)). The image ψ(K ) iscompact and convex. For K D to be nonempty we need ψ(K ) to contain the pointL = (λx1 ∗,...,λx k ∗). If it didn’t, there would exist a vector α = (a 1,...,α k )such that α ′ L < inf x∈K α ′ ψ(x). The linear functional x ∗ = ∑ i α i xi∗ wouldthen have the property λ(x ∗ ) = α ′ L < inf x ∗ (x),x∈Kan inequality impossible for a probability measure. (Proof borrowed fromAlfsen 1971, page 11).Notice that we didn’t really need to identify λ as a probability measure.It would suffice if λ were an increasing linear functional on the space C(K ) ofall continuous real functions on K , with λ1 = 1. We could then interpret λx ∗as the result of applying λ to the continuous function obtained by restricting x ∗to K . Of course the Riesz Representation Theorem ensures that λ correspondsto a Borel probability measure on K , but one doesn’t need to invoke thatrepresentation in order to prove existence of the barycenter. It was enough thatλ be an increasing, continuous linear functional on cc(K ) with λ(1) = 1.2. Minimax theoremNotation: C(K ) denotes the set of all bounded, continuous real functions on atopological space K , equipped with the norm for uniform convergence. It is alcTVS. Write C + (K ) for the set of nonnegative functions in C(K ). Theorem. Let f be a map from K ⊗C into R∪{∞}, where K is a compact,convex subset of a lcTVS X and C is a convex subset of a vector space Y.Suppose:(i) the map x ↦→ f (x, y) is convex and lower semi-continuous on K ;(ii) the map y ↦→ f (x, y) is concave on C.Then infx∈K supy∈Cf (x, y) = sup infy∈C x∈Kf (x, y)Proof. Clearly the left-hand side of equality is no smaller than theright-hand side. Also, if the right-hand side is infinite the equality holds fortrivial reasons.For convenience of notation define f (x, y) =∞for all x in K c .2 Asymptopia: 4Jan99 c○David Pollard


Appendix ETopological vector spacesSuppose then that M is a finite number for which M ≥ sup inf f (x, y)y∈C x∈KFix an ɛ>0.Write C L (K ) for the set of all restrictions of continuous linear functionalsto K . From , for each y ∈ C, f (x, y) = sup{g(x) : g ∈ C L (K ) and f (·, y) ≥ g}The setG ={g ∈ C(K ) : g ≤ f (·, y) for some y in C}is convex: if g 1 ≤ f (·, y 1 ) and g 2 ≤ f (·, y 2 ), then for α i ≥ 0 and α 1 + α 2 = 1,α 1 g 1 (x) + α 2 g 2 (x) ≤ α 1 f (x, y 1 ) + α 2 f (x, y 2 ) ≤ f (x,α 1 y 1 + α 2 y 2 )by concavity of f (x, ·). The closure G (for the uniform norm) is also convex.The constant function M + 2ɛ cannot belong to G; for if it did then therewould exist a g ∈ G, and hence a y ∈ C, such thatM + ɛ ≤ g(x) ≤ f (x, y) for all x ∈ K ,that is, M + ɛ ≤ inf x∈K f (x, y), in violation of .Invoke the separating hyperplane property with the singleton {M +2ɛ}as the compact, convex set and G as the closed, convex set. Let λ be acontinuous linear functional on C(K ) separating the two sets:□λ(M + 2ɛ) > sup λ(g).The set G has the property that g − ch ∈ G for all g ∈ G, all h ∈ C + (K ), andall c > 0. It follows that the functional λ must be increasing, that is, λ(h) ≥ 0for all h ∈ C + (K ), because(M + 2ɛ)λ(1) >λ(g) − cλ(h)We can also standardize λ so that λ(1) = 1. (How does one rule out thepossibility that λ(1) ≤ 0?) The separation inequality, specialized to linearfunctions in G, then givesM + 2ɛ >sup{λ(g) : g ∈ C L (K ) and g(·) ≤ f (·, y) for some y ∈ C}Let λ have barycenter x 0 , a point in K . Then the last inequality becomesM + 2ɛ >sup{x ∗ (x 0 ) : x ∗ ∈ X ∗ and x ∗ ≤ f (·, y) for some y ∈ C}The envelope property for lower semi-continuous convex functions lets usidentify the right-hand side of the last inequality as sup y∈C f (x 0 , y). That is,M + 2ɛ >sup f (x 0 , y)y∈CThe left-hand side of is less than M + 2ɛ, for each ɛ>0. The assertedequality follows.g∈G3. NotesSee Schaefer (1986) for the theory of topological vector spaces, includingthe separation theorems for lcTVS.See Schaefer (1974) for facts about L-spaces and M-spaces.Torgersen (1991), Millar (1983), Strasser (1985), van der Vaart (1984).Le Cam (1986, Section 2.3) for characterization of δ-distance via comparisonof risks. Le Cam (1964).Asymptopia: 4Jan99 c○David Pollard 3


Section E.3NotesReferencesAlfsen, E. M. (1971), Compact convex sets and boundary integrals, Springer-Verlag.Le Cam, L. (1964), ‘Sufficiency and approximate sufficiency’, Annals ofMathematical Statistics 35, 1419–1455.Le Cam, L. (1986), Asymptotic Methods in Statistical Decision Theory,Springer-Verlag, New York.Millar, P. W. (1983), ‘The minimax principle in asymptotic statistical theory’,Springer Lecture Notes in Mathematics pp. 75–265.Schaefer, H. H. (1974), Banach Lattices and Positive Operators, Springer-Verlag.Schaefer, H. H. (1986), Topological Vector Spaces, Springer-Verlag.Strasser, H. (1985), Mathematical Theory of Statistics: Statistical Experimentsand Asymptotic Decision Theory, De Gruyter, Berlin.Torgersen, E. (1991), Comparison of Statistical Experiments, CambridgeUniversity Press.van der Vaart, A. (1984), Limits of experiments, Lecture notes, December 1994.Part of a book to be published by Cambridge University Press.4 Asymptopia: 4Jan99 c○David Pollard


DRAFT: 30 March 2001 David PollardAppendix FNetsA sequence on a set X is a map i ↦→ x i from the natural numbers N to X. Asubsequence is defined by a map ψ : N → N with the property:• for each i 0 in N there exits a j 0 in N such that ψ(j) ≥ i 0 for all j ≥ j 0 .Sequences adequately describe the topology of metric spaces (or spaces where eachpoint has a countable neighborhood base). For example,a point lies in the theclosure of a set iff it the limit point of a convergent sequence on that set.For more general topological spaces,a more general notion of convergence isneeded. Definition. AsetI is set to be directed by an ordering relation ‘‘≤’’ if(i) if i ≤ j and j ≤ i then i = j(ii) if i ≤ j and j ≤ k then i ≤ k(iii) for each i and j in I there exists a k in I such that i ≤ k and j ≤ kA net (or generalized sequence) on a set X is a map from a directed set into X.I will also write i ≥ j to mean j ≤ i.Properties(i) and (ii) make I a partially ordered set. (See Dudley 1989,page 8for the reason why the the reflexive property, i ≤ i,is omitted.)Suppose π is some property that might hold for points in a set X. (For example,π might indicate that the point belongs to a particular subset.) If {x i : i ∈ I } is a neton X,say that property π holds eventually if there exists an i 0 ∈ I such that π holdfor each x i for i 0 ≤ i. Say that property π holds frequently if for each i 0 ∈ I thereis an i ∈ I such that π hold for each x i .A net {x i : i ∈ I } on a topological space X is said to converge to a point y(written x i → y) if,for each neighborhood V of y,the net is eventually within V . Example. Let A be a subset of a topological space X. A point y is defined tobe in the closure of A if: for each neighborhood V of y the intersection V ∩ A isnonempty. That is,to each neighborhood V there exits an x V in V ∩ A. The set Vof all neighborhoods of y is directed by inclusion: that is, V 1 ≤ V 2 is defined tomean V 1 ⊇ V 2 . The net {x V : V ∈ V} converges to y.Conversely,if a net on A converges to y then y must lie in the closure of A.Why?Pollard@Paris2001 30 March 2001


2 Appendix F: NetsThe points of the closure of A are precisely those expressible as limits of□ convergent nets on A. Exercise. Show that a function f from a topological space X into a topological□ space Y is continuous at a point x iff f (x i ) → f (y) for each net x i → y.Let {x i : i ∈ I } be a net. Each map ψ from another directed set J with theproperty• for each i 0 in I there exits a j 0 in J such that ψ(j) ≥ i 0 for all j ≥ j 0 . Thatis, ψ(j) ≥ i 0 eventually. defines a subnet of the net.Universal subnetsA net {x i : i ∈ I } on X is said to be universal if,for each subset B of X,eitherx i ∈ B eventually or x i ∈ B c eventually.Two important properties make universal nets a very elegant tool for the studyof compactness:(A) Every net has a universal subnet(B) A topological space is compact iff every universal net on it converges.Together,these two properties give to universal nets a role anlogous to the Cantordiagonalization argument for sequences.Proof of (A)The proof uses Zorn’s Lemma (see Dudley 1989,Section 1.5):• Let (P, ≤) be a partially ordered set. Suppose every chain (that is,a subset P 0such that,for all i, j ∈ P 0 ,either i ≤ j or j ≤ i) has an upper bound (that is,there exists an i 0 ∈ P for which i ≤ i 0 for all i ∈ P 0 ). Then P has a maximalelement (that is,there exists an m ∈ P such that there is no p ∈ P with m ≤ pand m ≠ p).The elements of the P in this proof are all the collections p of nonempty subsetsof X with the properties that(i) if B 1 ∈ p and B 2 ∈ p then B 1 ∩ B 2 ∈ p(ii) if B ∈ p then x i ∈ B frequentlyOrder P by inclusion: that is, p 1 ≤ p 2 is defined to mean p 1 ⊆ p 2 .If P 0 is a chain in P then ∪P 0 also belongs to P (Why?); it is an upper boundfor P 0 . By Zorn’s Lemma,there is a maximal p 0 in P.Assertion: For each subset A of X either A ∈ p 0 or A c ∈ p 0 . Proof: SupposeA /∈ p 0 . We cannot have x i ∈ A ∩ B frequently for every B in p 0 ,for thenp = p 0 ∪{A}∪{A ∩ B : b ∈ p 0 } would be a member of P larger than the maximal p 0 .There must exist some B 0 ∈ p 0 for which the net is outside A ∩ B 0 eventually.Similarly,if A c were not in p 0 ,there would exist some B 1 ∈ p 0 for which thenet were outside A c ∩ B 1 eventually. Because B 0 ∩ B 1 would be contained withinthe union of A ∩ B 0 and A c ∩ B 1 ,the net would eventually lie outside B 0 ∩ B 1 ,incontradiction to the defining properties of members of P. The contradiction showsthat A c ∈ P 0 .Pollard@Paris2001 30 March 2001


Appendix F Nets 3Let J denote the set of all pairs (i, A) with x i ∈ A and A ∈ p 0 . Order J by:(i, A) ≤ ( j, B) means i ≤ j and A ⊇ B. Define a subnet of {x i : i ∈ I } by themap ψ(i, A) = i. Assertion: the subnet is universal. Proof: Let A be a subset of X.Suppose,without loss of generality,that A ∈ p 0 . For some i,the pair (i, A) belongsto J. For all ( j, B) ≥ (i, A) we then have x j ∈ B ⊆ A.Proof of (B)Suppose {x i : i ∈ I } is a universal net (on a topological space X),which does notconverge. Then,to each x ∈ X there must exist an open neighborhood G x with theproperty that x i ∈ G c x frequently. Universality forces the stronger property: x i ∈ G c xeventually. Property (iii) of directed sets implies that x i ∈∩ x∈S G c x eventually,foreach finite S ⊂ X. That is,the net is eventaully outside ∪ x∈S G x . If the open cover{G x : x ∈ X} had a finite subcover,the net would have nowhere to go. The space Xcannot be compact.Conversely,suppose X is not compact. Then there exists some open cover Gof X with the property that for each finite H ⊂ G there exits a point x H /∈∪H.Order the finite subsets by inclusion: H 1 ≤ H 2 means H 1 ⊆ H 2 . Then {x H } is anet indexed by the finite subclasses of G. Let {x H(i) : i ∈ I } be a universal subnet.It cannot converge. For if x H(i) → y ∈ X then there would be a G in G containing ywith x H(i) ∈ G eventually. But when G is a member of the class H(i),as it must beeventually,the point x H(i) is,by definition,outside ∪H(i) ⊇ G.NotesKelley (1955,Chapter 2) has explained the theory of nets under the title “Moore-Smith convergence”,which identifies the originators of the idea.ReferencesDudley,R. M. (1989),Real Analysis and Probability,Wadsworth,Belmont,Calif.Kelley,J. (1955),General Topology,may 1970 edn,Van Nostrand,Princeton,NJ.Pollard@Paris2001 30 March 2001

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!