WEEK 8

Expectation
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space. Let $X$ be a random variable on this probability space and $\mu_X$ its distribution. If its cumulative distribution function (c.d.f.) $F_X(y) = \mu_X((-\infty, y))$ is differentiable, then we define the probability density function (p.d.f.) $f_X$ as the derivative of $F_X$. In this case, we know that the expectation of the random variable is defined as
\[ E(X) = \int_{\mathbb{R}} x f_X(x)\,dx. \]
This is well defined provided that $f_X$ is measurable. However, if the random variable takes only finitely (or countably) many values $\{a_1, \dots, a_p\}$, then the c.d.f. is a step function that is not differentiable everywhere (in particular, it is not differentiable at $\{a_1, \dots, a_p\}$). Then, as we have seen in previous years, the expectation is defined as the sum
\[ E(X) = \sum_{i=1}^{p} a_i \mathbb{P}(X = a_i). \]
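For a concrete instance (a fair six-sided die, our own choice of example), the finite sum above evaluates directly:

```python
# Expectation of a fair six-sided die via E(X) = sum_i a_i * P(X = a_i).
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6  # uniform probabilities

expectation = sum(a * p for a, p in zip(values, probs))
print(expectation)  # 3.5
```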
However, it is possible to define a random variable that does not fit in any of the above categories (consider, for example, the random variable with a uniform distribution on the Cantor set, as defined in homework 6). How do we define its expectation? Can we find a way to define the expectation for all random variables, without using their p.d.f. or c.d.f. but only their distribution $\mu_X$? It turns out that this is possible by using the Lebesgue construction of the integral, only now it is not with respect to the Lebesgue measure but the measure $\mu_X$:
Definition 0.1. Let $X$ be a random variable defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. We define the expectation of $X$ with respect to the probability measure $\mathbb{P}$, $E(X)$, as follows:

(i) if $X = \mathbf{1}_A$ is an indicator random variable, i.e. a random variable that is $1$ on a set $A \in \mathcal{F}$ and zero otherwise, then
\[ E(\mathbf{1}_A) = \mathbb{P}(A). \]

(ii) if $X = \sum_{i=1}^{n} a_i \mathbf{1}_{A_i}$ for some $A_i \in \mathcal{F}$, $i = 1, \dots, n$, is a simple random variable, i.e. a random variable that only takes finitely many values, then
\[ E\left( \sum_{i=1}^{n} a_i \mathbf{1}_{A_i} \right) = \sum_{i=1}^{n} a_i \mathbb{P}(A_i). \]

(iii) if $X$ is a non-negative random variable, then
\[ E(X) = \lim_{n \to \infty} E(X_n), \]
for $(X_n)_{n \ge 1}$ a sequence of simple random variables defined as:
\[ X_n(\omega) = \begin{cases} \dfrac{i-1}{2^n}, & \text{if } \dfrac{i-1}{2^n} \le X(\omega) < \dfrac{i}{2^n}, \text{ for } i = 1, \dots, n2^n \\ n, & \text{if } X(\omega) \ge n \\ 0, & \text{otherwise.} \end{cases} \tag{1} \]
So,
\[ \begin{aligned} E(X) &= \lim_{n \to \infty} \left( \sum_{i=1}^{n2^n} \frac{i-1}{2^n} \mathbb{P}\left( \frac{i-1}{2^n} \le X < \frac{i}{2^n} \right) + n \mathbb{P}(X \ge n) \right) \\ &= \lim_{n \to \infty} \left( \sum_{i=1}^{n2^n} \frac{i-1}{2^n} \mu_X\left( \left[ \frac{i-1}{2^n}, \frac{i}{2^n} \right) \right) + n \mu_X([n, +\infty)) \right). \end{aligned} \tag{2} \]
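The dyadic approximation in (1)-(2) is easy to compute when the distribution is known. A minimal sketch, assuming $X \sim \mathrm{Uniform}[0,1]$ (our own choice of example), where $\mu_X\left(\left[\frac{i-1}{2^n}, \frac{i}{2^n}\right)\right) = 2^{-n}$ for intervals contained in $[0,1]$ and $0$ beyond:

```python
# Dyadic approximation E(X_n) from (2) for X ~ Uniform[0, 1].
def expectation_of_X_n(n):
    total = 0.0
    for i in range(1, n * 2**n + 1):
        lo, hi = (i - 1) / 2**n, i / 2**n
        # mass of Uniform[0,1] on [lo, hi): interval length clipped to [0, 1]
        mass = max(0.0, min(hi, 1.0) - min(lo, 1.0))
        total += lo * mass
    # the tail term n * mu_X([n, inf)) vanishes here since X <= 1
    return total

for n in (2, 5, 10):
    print(n, expectation_of_X_n(n))  # increases toward E(X) = 0.5
```

In this case $E(X_n) = (1 - 2^{-n})/2$, so the values increase to $1/2$ exactly as (2) promises.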
(iv) if $X$ is any random variable, we can always write it as a difference between two non-negative random variables
\[ X = X^+ - X^-, \]
where $X^+ = X \mathbf{1}_{X \ge 0}$ and $X^- = -X \mathbf{1}_{X < 0}$, and we define $E(X) = E(X^+) - E(X^-)$, provided that at least one of the two expectations is finite.
We now state the Radon-Nikodym theorem:

Theorem 0.5. A measure $\mu$ on $\mathcal{B}(\mathbb{R})$ is absolutely continuous with respect to the Lebesgue measure if and only if there exists a measurable function $f : \mathbb{R} \to \mathbb{R}$ such that
\[ \forall A \in \mathcal{B}(\mathbb{R}), \quad \mu(A) = \int_{\mathbb{R}} \mathbf{1}_A(x) f(x)\,dx = \int_A f(x)\,dx. \]
We call $f$ the Radon-Nikodym derivative of the measure $\mu$ with respect to the Lebesgue measure and we often write $f = \frac{d\mu}{dx}$.
We will show that if $X$ has a p.d.f. $f_X$, then
\[ E(X) = \int_{\mathbb{R}} x f_X(x)\,dx \tag{3} \]
as expected.

(i) We define
\[ \tilde{\mu}_X(A) = \int_A f_X(x)\,dx. \]
It is easy to check that this is a probability measure. Its c.d.f. will be
\[ \tilde{F}_X(y) = \int_{-\infty}^{y} f_X(x)\,dx = \int_{-\infty}^{y} F_X'(x)\,dx = F_X(y) \]
since $f_X$ is the derivative of $F_X$ and $F_X(-\infty) = 0$. So, the measures $\mu_X$ and $\tilde{\mu}_X$ have the same c.d.f. and by Carathéodory's extension theorem, the two measures should be the same on $\mathcal{B}(\mathbb{R})$:
\[ \forall A \in \mathcal{B}(\mathbb{R}), \quad \mu_X(A) = \tilde{\mu}_X(A). \]
We conclude that $\mu_X$ is absolutely continuous w.r.t. the Lebesgue measure and its Radon-Nikodym derivative is $\frac{d\mu_X}{dx} = f_X$.
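The identity $\mu_X(A) = \int_A f_X(x)\,dx$ can be checked numerically. A small sketch, assuming $X \sim \mathrm{Exp}(1)$ (our own choice of example), for which $f_X(x) = e^{-x}$ on $[0, \infty)$ and $\mu_X([a, b]) = e^{-a} - e^{-b}$ in closed form:

```python
import math

# Check mu_X([a, b]) = int_a^b f_X(x) dx for X ~ Exp(1),
# comparing the closed-form mass with a midpoint Riemann sum of the density.
def mass_via_density(a, b, steps=100_000):
    # midpoint Riemann sum of f_X(x) = exp(-x) over [a, b]
    h = (b - a) / steps
    return sum(math.exp(-(a + (k + 0.5) * h)) for k in range(steps)) * h

a, b = 0.5, 2.0
exact = math.exp(-a) - math.exp(-b)
approx = mass_via_density(a, b)
print(exact, approx)  # the two values agree to high precision
```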
(ii) Suppose $X \ge 0$. Then,
\[ \begin{aligned} E(X) &= \lim_{n \to \infty} \left( \sum_{i=1}^{n2^n} \frac{i-1}{2^n} \mu_X\left( \left[ \frac{i-1}{2^n}, \frac{i}{2^n} \right) \right) + n \mu_X([n, +\infty)) \right) \\ &= \lim_{n \to \infty} \left( \sum_{i=1}^{n2^n} \frac{i-1}{2^n} \int_{\mathbb{R}} \mathbf{1}_{\left[ \frac{i-1}{2^n}, \frac{i}{2^n} \right)}(x) f_X(x)\,dx + n \int_{\mathbb{R}} \mathbf{1}_{[n,+\infty)}(x) f_X(x)\,dx \right) \\ &= \lim_{n \to \infty} \int_{\mathbb{R}} \left( \sum_{i=1}^{n2^n} \frac{i-1}{2^n} \mathbf{1}_{\left[ \frac{i-1}{2^n}, \frac{i}{2^n} \right)}(x) + n \mathbf{1}_{[n,+\infty)}(x) \right) f_X(x)\,dx. \end{aligned} \]
It is easy to see that the limit of the function inside the parenthesis is $x$. Using the Monotone Convergence Theorem, which we state below, we can take the limit inside the integral and we get
\[ E(X) = \int_0^{\infty} x f_X(x)\,dx \]
as promised. It is now easy to check that (3) is also true for any random variable (not necessarily non-negative).
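Formula (3) can also be checked directly. A sketch, again assuming $X \sim \mathrm{Exp}(1)$ (our own choice of example), whose expectation is $1$:

```python
import math

# Evaluate E(X) = int_0^inf x f_X(x) dx for X ~ Exp(1), f_X(x) = exp(-x),
# truncating the integral at T = 50 (the tail beyond T is negligible).
def expectation_from_density(T=50.0, steps=200_000):
    # midpoint Riemann sum of x * exp(-x) over [0, T]
    h = T / steps
    return sum((k + 0.5) * h * math.exp(-(k + 0.5) * h) for k in range(steps)) * h

print(expectation_from_density())  # close to 1.0
```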
We define a discrete random variable in the usual way:

Definition 0.6. $X$ is called a discrete random variable if there exists a finite or countable set $A \in \mathcal{B}(\mathbb{R})$ ($|A| \le |\mathbb{N}|$) such that $\mu_X(A) = 1$.

Clearly, the distribution of a discrete random variable is not absolutely continuous with respect to the Lebesgue measure. In fact, they are mutually singular: we say that a measure $\mu$ on $\mathcal{B}(\mathbb{R})$ and the Lebesgue measure are mutually singular if there exists an $A \in \mathcal{B}(\mathbb{R})$ such that $\mathrm{Leb}(A) = 0$ and $\mu(A^c) = 0$.

We said that if $X$ is a discrete random variable, its distribution and the Lebesgue measure are mutually singular. The opposite is not true: for example, the random variable with uniform distribution on the Cantor set (see homework 6) is not discrete, but its distribution and the Lebesgue measure are mutually singular: the uniform distribution on the Cantor set has measure 1 on the Cantor set, while the Lebesgue measure of the Cantor set is 0.
It is often convenient to write the distribution of a discrete random variable as a linear combination of delta measures: a delta measure at some $x \in \mathbb{R}$, denoted by $\delta_x$, is defined as
\[ \forall A \in \mathcal{B}(\mathbb{R}), \quad \delta_x(A) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{otherwise.} \end{cases} \]
If the distribution of a random variable $X$ is $\delta_x$ for some $x \in \mathbb{R}$, then we know that, whatever the outcome of the random experiment, $X = x$ with probability one. Thus, the expectation of any measurable function $g$ of $X$ will be $E g(X) = g(x)$.

If $X$ is a discrete random variable taking values in some set $(a_i)_{i=1}^{p}$ (with the possibility that $p = \infty$), i.e.
\[ X = \sum_{i=1}^{p} a_i \mathbf{1}_{A_i}, \quad \text{where } A_i = X^{-1}(\{a_i\}), \]
then its distribution $\mu_X$ can be written as
\[ \mu_X = \sum_{i=1}^{p} p_i \delta_{a_i}, \quad \text{where } p_i = \mathbb{P}(A_i). \]
The expectation of $X$ will then be
\[ E(X) = \sum_{i=1}^{p} a_i p_i, \]
which is consistent with the definition we used before.
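The delta-measure representation suggests a natural computational encoding: store $\mu_X = \sum_i p_i \delta_{a_i}$ as a map from atoms to masses, so that $E g(X) = \sum_i g(a_i) p_i$ for any function $g$. A minimal sketch (the two-coin-flip distribution is our own example):

```python
# A discrete distribution mu_X = sum_i p_i * delta_{a_i}, stored as {a_i: p_i}.
mu = {0: 0.25, 1: 0.5, 2: 0.25}  # e.g. number of heads in two fair coin flips

def expect(g, mu):
    """E g(X) = sum_i g(a_i) * p_i for a discrete distribution mu."""
    return sum(g(a) * p for a, p in mu.items())

print(expect(lambda a: a, mu))      # E(X)   = 1.0
print(expect(lambda a: a * a, mu))  # E(X^2) = 1.5
```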
4
Monotone and Dominated Convergence Theorems

In order to define the expectation of a non-negative random variable $X$, we have used a particular sequence of partitions of $\mathbb{R}^+$ corresponding to the sequence of simple functions $(X_n)_{n \ge 1}$ defined in (1). However, any other sequence of partitions would have given the same result. This is a consequence of the Monotone Convergence Theorem. Before stating this theorem, we have to define almost sure (a.s.) convergence with respect to a measure $\mu$:
Definition 0.7. Let $X_n$ and $X$ be random variables defined on $(\Omega, \mathcal{F}, \mathbb{P})$. We say that a sequence of random variables $(X_n)_{n \ge 1}$ converges almost surely (a.s.) to $X$ with respect to $\mathbb{P}$ if the set of $\omega$'s for which it does not converge to $X(\omega)$ has measure 0:
\[ \mathbb{P}\left( \{\omega : X_n(\omega) \text{ does not converge to } X(\omega)\} \right) = 0 \]
or, equivalently,
\[ \mathbb{P}\left( \{\omega : X_n(\omega) \to X(\omega)\} \right) = 1. \]
Note that a.s. convergence with respect to any measure is weaker than pointwise convergence: if $X_n(\omega) \to X(\omega)$ as $n \to \infty$ for all $\omega$, then the set of all $\omega$ for which $X_n(\omega)$ does not converge to $X(\omega)$ will be the empty set and, consequently, will have measure zero.
Theorem 0.8 (Monotone Convergence Theorem). Let $X_n$ be an increasing sequence of non-negative random variables defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. If the sequence $X_n$ converges almost surely to a random variable $X$, then
\[ \lim_{n \to \infty} E(X_n) = E(X). \]
This theorem has the following consequence: suppose $X$ is a non-negative measurable random variable defined on $(\Omega, \mathcal{F}, \mathbb{P})$. Let $(y_i^{(n)})_{i=0}^{N_n}$ be any sequence of partitions of $[0, \infty)$, i.e.
\[ 0 = y_0^{(n)} < y_1^{(n)} < y_2^{(n)} < \dots < y_{N_n - 1}^{(n)} < y_{N_n}^{(n)} \]
with $y_{N_n}^{(n)} \to \infty$, such that
\[ \lim_{n \to \infty} \max_{i=1,\dots,N_n} \left( y_i^{(n)} - y_{i-1}^{(n)} \right) = 0 \quad \text{and} \quad \{y_i^{(n)}\}_{i=1}^{N_n} \subseteq \{y_i^{(n+1)}\}_{i=1}^{N_{n+1}}. \]
For every $n \ge 1$, we define $X_n^y$ as follows:
\[ X_n^y = \sum_{i=1}^{N_n} y_{i-1}^{(n)} \mathbf{1}_{X^{-1}\left( \left[ y_{i-1}^{(n)}, y_i^{(n)} \right) \right)}. \]
Then, one can show using the same arguments as before that $(X_n^y)_{n \ge 1}$ is an increasing sequence of non-negative random variables that converges pointwise to $X$. Thus, by the monotone convergence theorem,
\[ \lim_{n \to \infty} E X_n^y = E X, \]
i.e. the limit of the expectation of the simple functions $X_n^y$ corresponding to a partition $(y_i^{(n)})_{i=0}^{N_n}$ that approximates $X$ is independent of the choice of partitions, as promised!
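This partition independence can be seen numerically. A sketch, assuming $X \sim \mathrm{Uniform}[0,1]$ and two refining partition families of our own choosing (uniform meshes of step $2^{-n}$ and $3^{-n}$; each mesh refines the previous one within its family):

```python
# Approximating E(X) for X ~ Uniform[0, 1] with simple functions on a
# uniform mesh of m cells: X takes value (i-1)/m on [(i-1)/m, i/m),
# each cell carrying mass 1/m.
def lower_sum(m):
    return sum(((i - 1) / m) * (1 / m) for i in range(1, m + 1))

# Two different refining partition families, dyadic and triadic:
for n in (1, 2, 3, 4):
    print(lower_sum(2**n), lower_sum(3**n))  # both columns tend to E(X) = 0.5
```

Here `lower_sum(m)` equals $(1 - 1/m)/2$, so both families converge to the same limit $1/2$, even though neither set of mesh points contains the other.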
Another corollary of the monotone convergence theorem is the following:

Corollary 0.9. Suppose that the Riemann-Stieltjes integral of a function $f$ exists. Then, its Lebesgue integral will be equal to the Riemann-Stieltjes integral.

Proof. Suppose $f \ge 0$. Since the Riemann-Stieltjes integral exists, it will be equal to
\[ \int_{\mathbb{R}} f(x)\,dx = \lim_{n \to \infty} \int_{\mathbb{R}} \left( \sum_{i=1}^{N_n} \inf_{z_i^{(n)} \in \left( x_{i-1}^{(n)}, x_i^{(n)} \right]} f\left( z_i^{(n)} \right) \mathbf{1}_{\left( x_{i-1}^{(n)}, x_i^{(n)} \right]}(x) \right) dx \]
for $(x_i^{(n)})_{i=0}^{N_n}$ a sequence of partitions of $\mathbb{R}$ satisfying
\[ \lim_{n \to \infty} \max_{i=1,\dots,N_n} \left( x_i^{(n)} - x_{i-1}^{(n)} \right) = 0 \quad \text{and} \quad \{x_i^{(n)}\}_{i=1}^{N_n} \subseteq \{x_i^{(n+1)}\}_{i=1}^{N_{n+1}}. \]
Remember that since the Riemann-Stieltjes integral exists, it will be independent of the choice of partition. The functions
\[ f_n(x) = \sum_{i=1}^{N_n} \inf_{z_i^{(n)} \in \left( x_{i-1}^{(n)}, x_i^{(n)} \right]} f\left( z_i^{(n)} \right) \mathbf{1}_{\left( x_{i-1}^{(n)}, x_i^{(n)} \right]}(x) \]
will be non-negative, increasing, and will converge to $f$ almost surely (recall that for the Riemann-Stieltjes integral to exist, $f$ has to be continuous or have at most finitely many discontinuities). $f_n$ might not converge to $f$ at the discontinuities, but since there are only finitely many of them, their measure is 0. Thus, by the monotone convergence theorem, the Lebesgue integral will also be equal to the limit of the integrals of $f_n$ and thus equal to the Riemann-Stieltjes integral.
The monotone convergence theorem allows us to interchange limits and expectations. Another theorem that allows us to do that is the Dominated Convergence Theorem:

Theorem 0.10 (Dominated Convergence Theorem). Let $X_n$ be a sequence of random variables defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$. If the sequence $X_n$ is bounded almost surely by a random variable $Y$, i.e. $\mathbb{P}(|X_n| \le Y) = 1$ and $E Y < \infty$, and the $X_n$ converge almost surely to a random variable $X$, then
\[ \lim_{n \to \infty} E(X_n) = E(X). \]
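A small numerical sketch of dominated convergence, using $f_n(x) = x^n$ on $[0, 1]$ (our own choice of example): the sequence is dominated by the integrable constant $1$, converges to $0$ for every $x < 1$, and indeed the integrals $\frac{1}{n+1}$ shrink to the integral of the limit, which is $0$.

```python
# Dominated convergence on [0, 1]: f_n(x) = x^n, |f_n| <= 1, f_n -> 0 a.e.,
# and int_0^1 f_n = 1/(n+1) -> 0 = int_0^1 (lim f_n).
def integral_fn(n, steps=10_000):
    # midpoint Riemann sum of x^n over [0, 1]
    h = 1.0 / steps
    return sum(((k + 0.5) * h) ** n for k in range(steps)) * h

for n in (1, 5, 50):
    print(integral_fn(n), 1 / (n + 1))  # numeric vs exact, both shrink to 0
```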
Example 1. We are not always allowed to interchange limits and integrals. For example, let $f_n(x) = n \mathbf{1}_{(0, \frac{1}{n})}(x)$. Then,
\[ \forall x \in [0, 1], \quad \lim_{n \to \infty} f_n(x) = 0, \]
while
\[ \forall n \ge 1, \quad \int_0^1 f_n(x)\,dx = n \cdot \frac{1}{n} = 1. \]
Consequently,
\[ 0 = \int_0^1 \lim_{n \to \infty} f_n(x)\,dx \ne \lim_{n \to \infty} \int_0^1 f_n(x)\,dx = 1. \]
Note that the sequence $(f_n(x))_{n \ge 1}$ is neither dominated by an integrable function nor increasing, and thus neither the monotone convergence theorem nor the dominated convergence theorem applies.
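The failure of the interchange can be seen concretely (a small sketch of the example above):

```python
# Example 1: f_n = n * 1_{(0, 1/n)} on [0, 1].
def f(n, x):
    return n if 0 < x < 1 / n else 0

# every integral over [0, 1] equals n * (1/n) = 1 ...
print([n * (1 / n) for n in (1, 10, 100)])  # each entry is 1.0

# ... while at any fixed x > 0 the values are eventually 0:
print([f(n, 0.01) for n in (10, 50, 100, 200)])  # [10, 50, 0, 0]
```

The mass of $f_n$ escapes into an ever-taller spike on an ever-smaller interval, which is exactly what the hypotheses of the two convergence theorems rule out.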
7