04.01.2015

WEEK 8 - Of the Clux


WEEK 8

Expectation

Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $X$ be a random variable on this probability space and $\mu_X$ its distribution. If its cumulative distribution function (c.d.f.) $F_X(y) = \mu_X((-\infty, y))$ is differentiable, then we define the probability density function (p.d.f.) $f_X$ as the derivative of $F_X$. In this case, we know that the expectation of the random variable is defined as

$$E(X) = \int_{\mathbb{R}} x f_X(x)\,dx.$$

This is well defined provided that $f_X$ is measurable and $\int_{\mathbb{R}} |x| f_X(x)\,dx < \infty$. However, if the random variable takes only finitely (or countably) many values $\{a_1, \dots, a_p\}$, then the c.d.f. is a step function that is not differentiable everywhere (in particular, it is not differentiable at $a_1, \dots, a_p$). Then, we have seen in previous years that the expectation is defined as the sum

$$E(X) = \sum_{i=1}^{p} a_i\, P(X = a_i).$$

However, it is possible to define a random variable that does not fit in any of the above categories (consider, for example, the random variable with a uniform distribution on the Cantor set, as defined in homework 6). How do we define its expectation? Can we find a way to define the expectation for all random variables, without using their p.d.f. or c.d.f. but only their distribution $\mu_X$? It turns out that this is possible by using the Lebesgue construction of the integral, only now it is not with respect to the Lebesgue measure but with respect to the measure $\mu_X$:

Definition 0.1. Let $X$ be a random variable defined on a probability space $(\Omega, \mathcal{F}, P)$. We define the expectation of $X$ with respect to the probability measure $P$, $E(X)$, as follows:

(i) if $X = \mathbf{1}_A$ is an indicator random variable, i.e. a random variable that is 1 on a set $A \in \mathcal{F}$ and zero otherwise, then

$$E(\mathbf{1}_A) = P(A).$$

(ii) if $X = \sum_{i=1}^{n} a_i \mathbf{1}_{A_i}$ for some $A_i \in \mathcal{F}$, $i = 1, \dots, n$, is a simple random variable, i.e. a random variable that only takes finitely many values, then

$$E\left(\sum_{i=1}^{n} a_i \mathbf{1}_{A_i}\right) = \sum_{i=1}^{n} a_i\, P(A_i).$$

(iii) if $X$ is a non-negative random variable, then

$$E(X) = \lim_{n\to\infty} E(X_n),$$

for $(X_n)_{n\ge1}$ a sequence of simple random variables defined as:

$$X_n(\omega) = \begin{cases} \dfrac{i-1}{2^n}, & \text{if } \dfrac{i-1}{2^n} \le X(\omega) < \dfrac{i}{2^n} \text{ for } i = 1, \dots, n2^n,\\[4pt] n, & \text{if } X(\omega) \ge n,\\[4pt] 0, & \text{otherwise}. \end{cases} \tag{1}$$

So,

$$
\begin{aligned}
E(X) &= \lim_{n\to\infty}\left(\sum_{i=1}^{n2^n} \frac{i-1}{2^n}\, P\left(\frac{i-1}{2^n} \le X < \frac{i}{2^n}\right) + n\,P(X \ge n)\right)\\
&= \lim_{n\to\infty}\left(\sum_{i=1}^{n2^n} \frac{i-1}{2^n}\, \mu_X\left(\left[\frac{i-1}{2^n}, \frac{i}{2^n}\right)\right) + n\,\mu_X([n, +\infty))\right).
\end{aligned}
\tag{2}
$$

(iv) if $X$ is any random variable, we can always write it as a difference between two non-negative random variables

$$X = X^+ - X^-,$$

where $X^+ = X\mathbf{1}_{X\ge0}$ and $X^- = -X\mathbf{1}_{X<0}$.
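The construction in (1) and (2) can be checked numerically. The Python snippet below is a minimal sketch, not part of the original notes: `dyadic_approx` evaluates $X_n(\omega)$ from the value $x = X(\omega) \ge 0$, and `expectation_simple` evaluates the bracket in (2) from a c.d.f., using the fact that with the convention $F_X(y) = \mu_X((-\infty, y))$ we have $\mu_X([a, b)) = F_X(b) - F_X(a)$. The exponential distribution with rate 1 (so $E(X) = 1$) is an assumed example, and all function names are hypothetical.

```python
import math

def dyadic_approx(x: float, n: int) -> float:
    """X_n(omega) from (1), computed from the value x = X(omega) >= 0."""
    if x < 0:
        raise ValueError("(1) is stated for non-negative X")
    if x >= n:
        return float(n)                      # branch X(omega) >= n
    # here (i-1)/2^n <= x < i/2^n with i - 1 = floor(x * 2^n)
    return math.floor(x * 2**n) / 2**n

def cdf_exp(y: float) -> float:
    """F_X(y) = mu_X((-inf, y)) for an Exponential(1) variable (assumed example)."""
    return 1.0 - math.exp(-y) if y > 0 else 0.0

def expectation_simple(cdf, n: int) -> float:
    """E(X_n): the bracket in (2) for a fixed n."""
    h = 2.0 ** -n
    total = 0.0
    for i in range(1, n * 2**n + 1):
        lo, hi = (i - 1) * h, i * h
        total += lo * (cdf(hi) - cdf(lo))    # (i-1)/2^n * mu_X([lo, hi))
    return total + n * (1.0 - cdf(n))        # + n * mu_X([n, +inf))

print([dyadic_approx(math.pi, n) for n in range(1, 6)])  # [1.0, 2.0, 3.0, 3.125, 3.125]
approx = [expectation_simple(cdf_exp, n) for n in range(1, 11)]
print(approx)   # increasing in n, approaching E(X) = 1
```

Since $0 \le X - X_n < 2^{-n}$ on $\{X < n\}$, the gap $E(X) - E(X_n)$ is at most $2^{-n} + E[(X - n)\mathbf{1}_{X \ge n}]$, which for this exponential example is $2^{-n} + e^{-n}$.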


We now state the Radon-Nikodym theorem:

Theorem 0.5. A σ-finite measure $\mu$ on $\mathcal{B}(\mathbb{R})$ is absolutely continuous with respect to the Lebesgue measure if and only if there exists a measurable function $f : \mathbb{R} \to [0, \infty)$ such that

$$\forall A \in \mathcal{B}(\mathbb{R}),\quad \mu(A) = \int_{\mathbb{R}} \mathbf{1}_A(x) f(x)\,dx = \int_A f(x)\,dx.$$

We call $f$ the Radon-Nikodym derivative of the measure $\mu$ with respect to the Lebesgue measure and we often write $f = \dfrac{d\mu}{dx}$.

We will show that if $X$ has a p.d.f. $f_X$, then

$$E(X) = \int_{\mathbb{R}} x f_X(x)\,dx \tag{3}$$

as expected.

(i) We define

$$\tilde\mu_X(A) = \int_A f_X(x)\,dx, \qquad A \in \mathcal{B}(\mathbb{R}).$$

It is easy to check that this is a probability measure. Its c.d.f. will be

$$\tilde F_X(y) = \int_{-\infty}^{y} f_X(x)\,dx = \int_{-\infty}^{y} F_X'(x)\,dx = F_X(y),$$

since $f_X$ is the derivative of $F_X$ and $F_X(-\infty) = 0$. So the measures $\mu_X$ and $\tilde\mu_X$ have the same c.d.f. and, by the uniqueness part of Carathéodory's extension theorem, the two measures coincide on $\mathcal{B}(\mathbb{R})$:

$$\forall A \in \mathcal{B}(\mathbb{R}),\quad \mu_X(A) = \tilde\mu_X(A).$$

We conclude that $\mu_X$ is absolutely continuous w.r.t. the Lebesgue measure and its Radon-Nikodym derivative is $\dfrac{d\mu_X}{dx} = f_X$.

(ii) Suppose $X \ge 0$. Then, by (2),

$$
\begin{aligned}
E(X) &= \lim_{n\to\infty}\left(\sum_{i=1}^{n2^n} \frac{i-1}{2^n}\, \mu_X\left(\left[\frac{i-1}{2^n}, \frac{i}{2^n}\right)\right) + n\,\mu_X([n, +\infty))\right)\\
&= \lim_{n\to\infty}\left(\sum_{i=1}^{n2^n} \frac{i-1}{2^n} \int_{\mathbb{R}} \mathbf{1}_{[\frac{i-1}{2^n}, \frac{i}{2^n})}(x)\, f_X(x)\,dx + n \int_{\mathbb{R}} \mathbf{1}_{[n,+\infty)}(x)\, f_X(x)\,dx\right)\\
&= \lim_{n\to\infty} \int_{\mathbb{R}} \left(\sum_{i=1}^{n2^n} \frac{i-1}{2^n}\, \mathbf{1}_{[\frac{i-1}{2^n}, \frac{i}{2^n})}(x) + n\,\mathbf{1}_{[n,+\infty)}(x)\right) f_X(x)\,dx.
\end{aligned}
$$

It is easy to see that, for $x \ge 0$, the function inside the parentheses increases with $n$ and converges to $x$, while for $x < 0$ it is identically 0. Using the Monotone Convergence Theorem, which we state below, we can take the limit inside the integral and we get

$$E(X) = \int_0^\infty x f_X(x)\,dx = \int_{\mathbb{R}} x f_X(x)\,dx,$$

where the last equality holds because $\int_{-\infty}^{0} f_X(x)\,dx = P(X < 0) = 0$, so $f_X = 0$ almost everywhere on $(-\infty, 0)$. This is (3), as promised. It is now easy to check that (3) is also true for any random variable (not necessarily non-negative).
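Both steps of the argument can be checked numerically on a concrete density. The sketch below is illustrative only: it assumes the density $f_X(x) = 2x$ on $[0, 1]$ (so $F_X(y) = y^2$ there and $E(X) = 2/3$), uses a plain midpoint rule for the Lebesgue integrals, and all function names are hypothetical.

```python
def density(x: float) -> float:
    """Assumed example p.d.f.: f_X(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def cdf(y: float) -> float:
    """Its c.d.f. F_X(y) = y^2 on [0, 1] (continuous, so the (-inf, y) convention is harmless)."""
    return 0.0 if y <= 0.0 else (1.0 if y >= 1.0 else y * y)

def midpoint(g, a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

def expectation_dyadic(n: int) -> float:
    """E(X_n) from (2), with mu_X([a, b)) = F_X(b) - F_X(a)."""
    h = 2.0 ** -n
    s = sum((i - 1) * h * (cdf(i * h) - cdf((i - 1) * h)) for i in range(1, n * 2**n + 1))
    return s + n * (1.0 - cdf(float(n)))

# Step (i): tilde F_X(y), the integral of f_X up to y, reproduces F_X(y)
for y in (0.25, 0.5, 0.9):
    print(y, midpoint(density, 0.0, y), cdf(y))

# Step (ii): the limit in (2) matches the right-hand side of (3), here 2/3
print(expectation_dyadic(12), midpoint(lambda x: x * density(x), 0.0, 1.0))
```

The dyadic value stays slightly below $2/3$ (by roughly $2^{-13}$ at $n = 12$), as it must, since $X_n \le X$.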

We define a discrete random variable in the usual way:

Definition 0.6. $X$ is called a discrete random variable if there exists a finite or countable set $A \in \mathcal{B}(\mathbb{R})$ ($|A| \le |\mathbb{N}|$) such that $\mu_X(A) = 1$.

Clearly, the distribution of a discrete random variable is not absolutely continuous with respect to the Lebesgue measure. In fact, they are mutually singular: we say that a measure $\mu$ on $\mathcal{B}(\mathbb{R})$ and the Lebesgue measure are mutually singular if there exists an $A \in \mathcal{B}(\mathbb{R})$ such that $\mathrm{Leb}(A) = 0$ and $\mu(A^c) = 0$. For a discrete $X$, the countable set $A$ of Definition 0.6 does the job: $\mathrm{Leb}(A) = 0$ and $\mu_X(A^c) = 0$.

We said that if $X$ is a discrete random variable, its distribution and the Lebesgue measure are mutually singular. The converse is not true: for example, the random variable with uniform distribution on the Cantor set (see homework 6) is not discrete, but its distribution and the Lebesgue measure are mutually singular: the uniform distribution on the Cantor set gives measure 1 to the Cantor set, while the Lebesgue measure of the Cantor set is 0.
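The Cantor example is exactly the situation Definition 0.1 was designed for: there is no p.d.f. and no list of atoms, yet (2) only needs $\mu_X$ of intervals. The sketch below assumes that the uniform distribution on the Cantor set from homework 6 is the standard one, whose c.d.f. is the Cantor function (ternary digit 2 mapped to binary digit 1, constant on the removed middle thirds); the function names are hypothetical. By symmetry ($X$ and $1 - X$ have the same law), $E(X) = 1/2$.

```python
def cantor_cdf(y: float, depth: int = 40) -> float:
    """Cantor function, used as the c.d.f. of the uniform distribution on the
    Cantor set (assumed to match homework 6). Truncated after `depth` ternary
    digits, so the error is about 2**-depth plus floating-point error."""
    if y <= 0.0:
        return 0.0
    if y >= 1.0:
        return 1.0
    value, weight = 0.0, 0.5
    for _ in range(depth):
        y *= 3.0
        digit = int(y)                  # next ternary digit of y
        y -= digit
        if digit == 1:                  # y lies in a removed middle third (or at its left end)
            return value + weight
        value += weight * (digit // 2)  # ternary digit 2 -> binary digit 1
        weight /= 2.0
    return value

def expectation_simple(cdf, n: int) -> float:
    """E(X_n) from (2), using mu_X([a, b)) = F(b) - F(a) (valid since F is continuous)."""
    h = 2.0 ** -n
    total = sum((i - 1) * h * (cdf(i * h) - cdf((i - 1) * h))
                for i in range(1, n * 2**n + 1))
    return total + n * (1.0 - cdf(float(n)))

print([round(expectation_simple(cantor_cdf, n), 4) for n in (2, 4, 6, 8)])
# increases towards E(X) = 1/2
```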

It is often convenient to write the distribution of a discrete random variable as a linear combination of delta measures. A delta measure on some $x \in \mathbb{R}$, denoted by $\delta_x$, is defined as

$$\forall A \in \mathcal{B}(\mathbb{R}),\quad \delta_x(A) = \begin{cases} 1, & \text{if } x \in A,\\ 0, & \text{otherwise}. \end{cases}$$

If the distribution of a random variable $X$ is $\delta_x$ for some $x \in \mathbb{R}$, then we know that whatever the outcome of the random experiment, $X = x$ with probability one. Thus, the expectation of any measurable function $g$ of $X$ will be $E\,g(X) = g(x)$.

If $X$ is a discrete random variable taking values in some set $(a_i)_{i=1}^{p}$ (with the possibility that $p = \infty$), i.e.

$$X = \sum_{i=1}^{p} a_i \mathbf{1}_{A_i}, \quad \text{where } A_i = X^{-1}(\{a_i\}),$$

then its distribution $\mu_X$ can be written as

$$\mu_X = \sum_{i=1}^{p} p_i\, \delta_{a_i}, \quad \text{where } p_i = P(A_i).$$

The expectation of $X$ will then be

$$E(X) = \sum_{i=1}^{p} a_i\, p_i$$

(when $p = \infty$, provided the series converges absolutely), which is consistent with the definition we used before.
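Representing $\mu_X$ as a weighted sum of delta measures translates directly into a finite table of atoms and weights. The sketch below stores $\mu_X = \sum_i p_i \delta_{a_i}$ as a Python dict `{a_i: p_i}`, with exact rational weights; the fair-die distribution and the helper names are assumptions made only for illustration.

```python
from fractions import Fraction

# mu_X = sum_i p_i * delta_{a_i}, stored as {a_i: p_i}.
# Assumed example: a fair six-sided die, a_i = i and p_i = 1/6.
mu_X = {a: Fraction(1, 6) for a in range(1, 7)}

def delta(x, A) -> int:
    """delta_x(A) = 1 if x is in A, 0 otherwise (A given as a finite Python set)."""
    return 1 if x in A else 0

def measure(mu, A) -> Fraction:
    """mu_X(A) = sum_i p_i * delta_{a_i}(A)."""
    return sum((p * delta(a, A) for a, p in mu.items()), Fraction(0))

def expectation(mu, g=lambda x: x) -> Fraction:
    """E g(X) = sum_i g(a_i) * p_i; with g the identity this is E(X) = sum_i a_i p_i."""
    return sum((g(a) * p for a, p in mu.items()), Fraction(0))

print(measure(mu_X, {2, 4, 6}))                         # 1/2
print(expectation(mu_X))                                # 7/2
print(expectation(mu_X, lambda x: x * x))               # 91/6
print(expectation({5: Fraction(1)}, lambda x: x * x))   # 25, i.e. g(5) under delta_5
```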



Monotone and Dominated Convergence Theorems

In order to define the expectation of a non-negative random variable $X$, we have used a particular sequence of partitions of $\mathbb{R}_+$ corresponding to the sequence of simple functions $(X_n)_{n\ge1}$ defined in (1). However, any other sequence of nested partitions of $[0, \infty)$ whose mesh tends to 0 and whose right endpoint tends to infinity would have given the same result. This is a consequence of the Monotone Convergence Theorem. Before stating this theorem, we have to define almost sure (a.s.) convergence with respect to a measure $\mu$:

Definition 0.7. Let $X_n$ and $X$ be random variables defined on $(\Omega, \mathcal{F}, P)$. We say that a sequence of random variables $(X_n)_{n\ge1}$ converges almost surely (a.s.) to $X$ with respect to $P$ if the set of $\omega$'s for which it does not converge to $X(\omega)$ has measure 0:

$$P(\{\omega : X_n(\omega) \text{ does not converge to } X(\omega)\}) = 0$$

or, equivalently,

$$P(\{\omega : X_n(\omega) \to X(\omega)\}) = 1.$$

Note that a.s. convergence with respect to any measure is weaker than pointwise convergence: if $X_n(\omega) \to X(\omega)$ as $n \to \infty$ for all $\omega$, then the set of all $\omega$ for which $X_n(\omega)$ does not converge to $X(\omega)$ is the empty set and, consequently, has measure zero.

Theorem 0.8 (Monotone Convergence Theorem). Let $X_n$ be an increasing sequence of non-negative random variables defined on a probability space $(\Omega, \mathcal{F}, P)$. If the sequence $X_n$ converges almost surely to a random variable $X$, then

$$\lim_{n\to\infty} E(X_n) = E(X).$$

This theorem has the following consequence: suppose $X$ is a non-negative measurable random variable defined on $(\Omega, \mathcal{F}, P)$. Let $(y_i^{(n)})_{i=0}^{N_n}$ be any sequence of partitions of $[0, \infty)$, i.e.

$$0 = y_0^{(n)} < y_1^{(n)} < y_2^{(n)} < \cdots < y_{N_n-1}^{(n)} < y_{N_n}^{(n)}$$

and $y_{N_n}^{(n)} \to \infty$, such that

$$\lim_{n\to\infty} \max_{i=1,\dots,N_n} \left(y_i^{(n)} - y_{i-1}^{(n)}\right) = 0 \quad\text{and}\quad \{y_i^{(n)}\}_{i=1}^{N_n} \subseteq \{y_i^{(n+1)}\}_{i=1}^{N_{n+1}}.$$

For every $n \ge 1$, we define $X_n^y$ as follows:

$$X_n^y(\omega) = \sum_{i=1}^{N_n} y_{i-1}^{(n)}\, \mathbf{1}_{X^{-1}\left([y_{i-1}^{(n)}, y_i^{(n)})\right)}(\omega).$$

Then, one can show using the same arguments as before that $(X_n^y)_{n\ge1}$ is an increasing sequence of non-negative random variables that converges pointwise to $X$. Thus, by the monotone convergence theorem,

$$\lim_{n\to\infty} E X_n^y = E X,$$

i.e. the limit of the expectations of the simple functions $X_n^y$ corresponding to a partition $(y_i^{(n)})_{i=0}^{N_n}$ that approximate $X$ is independent of the choice of partitions, as promised!
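The independence from the choice of partitions can be observed numerically. The sketch below compares two nested families, the dyadic points $i/2^n$ and a hypothetical triadic alternative $i/3^n$, each running from 0 to $y_{N_n}^{(n)} = n$; both satisfy the mesh, nesting and $y_{N_n}^{(n)} \to \infty$ conditions above. As in the definition of $X_n^y$, there is no tail term $n\mathbf{1}_{X \ge n}$. The exponential distribution with rate 1 ($E(X) = 1$) is an assumed example.

```python
import math

def cdf_exp(y: float) -> float:
    """F_X(y) for an Exponential(1) variable (assumed example), so E(X) = 1."""
    return 1.0 - math.exp(-y) if y > 0 else 0.0

def expectation_partition(cdf, points) -> float:
    """E(X_n^y) = sum_i y_{i-1} * mu_X([y_{i-1}, y_i)) for the partition 0 = y_0 < ... < y_N."""
    return sum(a * (cdf(b) - cdf(a)) for a, b in zip(points, points[1:]))

def dyadic(n: int) -> list:
    """Nested partition y_i = i / 2^n, i = 0, ..., n 2^n."""
    return [i / 2**n for i in range(n * 2**n + 1)]

def triadic(n: int) -> list:
    """Hypothetical alternative nested partition y_i = i / 3^n, i = 0, ..., n 3^n."""
    return [i / 3**n for i in range(n * 3**n + 1)]

d = [expectation_partition(cdf_exp, dyadic(n)) for n in range(1, 9)]
t = [expectation_partition(cdf_exp, triadic(n)) for n in range(1, 9)]
for n, (a, b) in enumerate(zip(d, t), start=1):
    print(n, round(a, 5), round(b, 5))   # both columns increase towards E(X) = 1
```

The two sequences differ for every finite $n$, but both increase to the same limit $E(X)$, which is the content of the monotone convergence argument above.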

Another corollary of the monotone convergence theorem is the following:

Corollary 0.9. Suppose that the Riemann-Stieltjes integral of a function $f$ exists. Then, its Lebesgue integral will be equal to the Riemann-Stieltjes integral.

Proof. Suppose $f \ge 0$. Since the Riemann-Stieltjes integral exists, it will be equal to

$$\int_{\mathbb{R}} f(x)\,dx = \lim_{n\to\infty} \int_{\mathbb{R}} \sum_{i=1}^{N_n} \left(\inf_{z_i^{(n)} \in (x_{i-1}^{(n)}, x_i^{(n)}]} f(z_i^{(n)})\right) \mathbf{1}_{(x_{i-1}^{(n)}, x_i^{(n)}]}(x)\,dx$$

for $(x_i^{(n)})_{i=0}^{N_n}$ a sequence of partitions satisfying

$$\lim_{n\to\infty} \max_{i=1,\dots,N_n} \left(x_i^{(n)} - x_{i-1}^{(n)}\right) = 0 \quad\text{and}\quad \{x_i^{(n)}\}_{i=1}^{N_n} \subseteq \{x_i^{(n+1)}\}_{i=1}^{N_{n+1}}.$$

Remember that since the Riemann-Stieltjes integral exists, it is independent of the choice of partition. The functions

$$f_n(x) = \sum_{i=1}^{N_n} \left(\inf_{z_i^{(n)} \in (x_{i-1}^{(n)}, x_i^{(n)}]} f(z_i^{(n)})\right) \mathbf{1}_{(x_{i-1}^{(n)}, x_i^{(n)}]}(x)$$

are non-negative, increasing in $n$ (the partitions are nested, and an infimum over a smaller interval is larger) and converge to $f$ almost surely with respect to the Lebesgue measure. Indeed, for the Riemann-Stieltjes integral to exist, $f$ has to be continuous outside a set of Lebesgue measure zero (for example, outside finitely many points). $f_n$ might not converge to $f$ at the discontinuities, but since they form a set of measure 0, this does not affect the integrals. Thus, by the monotone convergence theorem, the Lebesgue integral will also be equal to the limit of the integrals of $f_n$, and thus equal to the Riemann-Stieltjes integral.
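The step functions $f_n$ in the proof are easy to compute when the infimum is known in closed form. The sketch below makes two assumptions that are not in the notes: the domain is $[0, 1]$ with dyadic nodes $x_i = i/2^n$, and $f$ is continuous and non-decreasing, so that the infimum of $f$ on $(x_{i-1}, x_i]$ equals $f(x_{i-1})$. With $f(x) = x^2$ the integrals of $f_n$ increase towards the Riemann integral $1/3$, as the monotone convergence argument predicts.

```python
def lower_step_integral(f, n: int) -> float:
    """Integral over [0, 1] of f_n = sum_i (inf of f on (x_{i-1}, x_i]) * 1_{(x_{i-1}, x_i]},
    with dyadic nodes x_i = i / 2^n.  Assumes f is continuous and non-decreasing,
    so the infimum on (x_{i-1}, x_i] equals f(x_{i-1})."""
    h = 2.0 ** -n
    return sum(f((i - 1) * h) * h for i in range(1, 2**n + 1))

sums = [lower_step_integral(lambda x: x * x, n) for n in range(1, 13)]
print(sums[-1])   # slightly below the Riemann integral 1/3
```

For this particular $f$ the values can be checked by hand: with $N = 2^n$ cells, the integral of $f_n$ equals $\frac{1}{3} - \frac{1}{2N} + \frac{1}{6N^2}$.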

The monotone convergence theorem allows us to interchange limits and expectations. Another theorem that allows us to do that is the Dominated Convergence Theorem:

Theorem 0.10 (Dominated Convergence Theorem). Let $X_n$ be a sequence of random variables defined on a probability space $(\Omega, \mathcal{F}, P)$. If the sequence $X_n$ is bounded almost surely by a random variable $Y$, i.e. $P(|X_n| \le Y) = 1$ for all $n$, with $EY < \infty$, and the $X_n$ converge almost surely to a random variable $X$, then

$$\lim_{n\to\infty} E(X_n) = E(X).$$

Example 1. We are not always allowed to interchange limits and integrals. For example, let $f_n(x) = n\,\mathbf{1}_{(0, \frac{1}{n})}(x)$. Then,

$$\forall x \in [0, 1],\quad \lim_{n\to\infty} f_n(x) = 0$$

(for $x > 0$, $f_n(x) = 0$ as soon as $n \ge 1/x$), while

$$\forall n \ge 1,\quad \int_0^1 f_n(x)\,dx = n \cdot \frac{1}{n} = 1.$$

Consequently,

$$0 = \int_0^1 \lim_{n\to\infty} f_n(x)\,dx \ne \lim_{n\to\infty} \int_0^1 f_n(x)\,dx = 1.$$

Note that the sequence $(f_n)_{n\ge1}$ is neither increasing nor dominated by an integrable function (for $x \in (0, 1)$, $\sup_n f_n(x) \ge \frac{1}{x} - 1$, which is not integrable on $[0, 1]$), and thus neither the monotone convergence theorem nor the dominated convergence theorem applies.
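Example 1 can be reproduced with exact rational arithmetic, which avoids any floating-point doubt at the endpoint $1/n$. The sketch evaluates $f_n$ at a fixed point and uses the closed form $n \cdot \frac{1}{n}$ for its integral; the helper names are hypothetical.

```python
from fractions import Fraction

def f_n(n: int, x: Fraction) -> int:
    """f_n(x) = n * 1_{(0, 1/n)}(x), evaluated exactly with rationals."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_f_n(n: int) -> Fraction:
    """Exact integral of f_n over [0, 1]: n times the length 1/n of (0, 1/n)."""
    return n * Fraction(1, n)

x = Fraction(1, 100)
print([f_n(n, x) for n in (10, 99, 100, 1000)])            # [10, 99, 0, 0]
print(all(integral_f_n(n) == 1 for n in range(1, 1001)))   # True
```

The pointwise values vanish once $n \ge 1/x$, while every integral stays equal to 1, so the mass escapes towards $x = 0$ rather than disappearing.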
