
Chapter 2: Markov Chains and Queues in Discrete Time

L. Breuer
University of Kent

1 Definition

Let X_n with n ∈ N_0 denote random variables on a discrete space E. The sequence X = (X_n : n ∈ N_0) is called a stochastic chain. If P is a probability measure such that

    P(X_{n+1} = j | X_0 = i_0, ..., X_n = i_n) = P(X_{n+1} = j | X_n = i_n)    (1)

for all i_0, ..., i_n, j ∈ E and n ∈ N_0, then the sequence X shall be called a Markov chain on E. The probability measure P is called the distribution of X, and E is called the state space of X.

If the conditional probabilities P(X_{n+1} = j | X_n = i_n) are independent of the time index n ∈ N_0, then we call the Markov chain X homogeneous and denote

    p_{ij} := P(X_{n+1} = j | X_n = i)

for all i, j ∈ E. The probability p_{ij} is called the transition probability from state i to state j. The matrix P := (p_{ij})_{i,j ∈ E} shall be called the transition matrix of the chain X. Condition (1) is referred to as the Markov property.

Example 1 If (X_n : n ∈ N_0) are random variables on a discrete space E which are stochastically independent and identically distributed (shortly: iid), then the chain X = (X_n : n ∈ N_0) is a homogeneous Markov chain.

Example 2 Discrete random walk
Set E := Z and let (S_n : n ∈ N) be a sequence of iid random variables with values in Z and distribution π. Define X_0 := 0 and X_n := \sum_{k=1}^n S_k for all n ∈ N. Then the chain X = (X_n : n ∈ N_0) is a homogeneous Markov chain with transition probabilities p_{ij} = π_{j-i}. This chain is called a discrete random walk.


Example 3 Bernoulli process
Set E := N_0 and choose any parameter 0 < p < 1. The definitions X_0 := 0 as well as

    p_{ij} := \begin{cases} p, & j = i + 1 \\ 1 - p, & j = i \end{cases}

for i ∈ N_0 determine a homogeneous Markov chain X = (X_n : n ∈ N_0). It is called a Bernoulli process with parameter p.
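Both examples are straightforward to simulate. Below is a minimal sketch (illustrative code, not part of the original notes) that draws one trajectory of a discrete random walk with steps ±1 and one of a Bernoulli process; the step law and the parameter p are arbitrary choices.

```python
import random

def random_walk(n_steps, p=0.5):
    """Simulate a discrete random walk on Z with steps +1 (prob. p) and -1."""
    x, path = 0, [0]
    for _ in range(n_steps):
        x += 1 if random.random() < p else -1
        path.append(x)
    return path

def bernoulli_process(n_steps, p=0.3):
    """Simulate a Bernoulli process on N_0: move up with prob. p, stay otherwise."""
    x, path = 0, [0]
    for _ in range(n_steps):
        x += 1 if random.random() < p else 0
        path.append(x)
    return path

print(random_walk(10))
print(bernoulli_process(10))
```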

So far, all examples have been chosen to be homogeneous. The following theorem shows that there is a good reason for this:

Theorem 1 Let X = (X_n : n ∈ N_0) be a Markov chain on a discrete state space E. Then there is a homogeneous Markov chain X' = (X'_n : n ∈ N_0) on the state space E × N_0 such that X_n = pr_1(X'_n) for all n ∈ N_0, with pr_1 denoting the projection to the first dimension.

Proof: Let X be a Markov chain with transition probabilities

    p_{n;ij} := P(X_{n+1} = j | X_n = i)

which may depend on the time instant n. Define the two-dimensional random variables X'_n := (X_n, n) for all n ∈ N_0 and denote the resulting distribution of the chain X' = (X'_n : n ∈ N_0) by P'. By definition we obtain X_n = pr_1(X'_n) for all n ∈ N_0.

Further, P'(X'_0 = (i, k)) = δ_{k0} · P(X_0 = i) holds for all i ∈ E, and all transition probabilities

    p'_{(i,k),(j,l)} = P'(X'_{k+1} = (j, l) | X'_k = (i, k)) = δ_{l,k+1} · p_{k;ij}

can be expressed without a time index. Hence the Markov chain X' is homogeneous. □

Because of this result, we will from now on treat only homogeneous Markov chains and omit the adjective "homogeneous".

Let P denote the transition matrix of a Markov chain on E. Then as an immediate consequence of its definition we obtain p_{ij} ∈ [0, 1] for all i, j ∈ E and \sum_{j∈E} p_{ij} = 1 for all i ∈ E. A matrix P with these properties is called a stochastic matrix on E. In the following we shall demonstrate that, given an initial distribution, a Markov chain is uniquely determined by its transition matrix. Thus any stochastic matrix defines a family of Markov chains.


Theorem 2 Let X denote a homogeneous Markov chain on E with transition matrix P. Then the relation

    P(X_{n+1} = j_1, ..., X_{n+m} = j_m | X_n = i) = p_{i,j_1} · ... · p_{j_{m-1},j_m}

holds for all n ∈ N_0, m ∈ N, and i, j_1, ..., j_m ∈ E.

Proof: This is easily shown by induction on m. For m = 1 the statement holds by definition of P. For m > 1 we can write

    P(X_{n+1} = j_1, ..., X_{n+m} = j_m | X_n = i)
      = \frac{P(X_{n+1} = j_1, ..., X_{n+m} = j_m, X_n = i)}{P(X_n = i)}
      = \frac{P(X_{n+1} = j_1, ..., X_{n+m} = j_m, X_n = i)}{P(X_{n+1} = j_1, ..., X_{n+m-1} = j_{m-1}, X_n = i)}
        × \frac{P(X_{n+1} = j_1, ..., X_{n+m-1} = j_{m-1}, X_n = i)}{P(X_n = i)}
      = P(X_{n+m} = j_m | X_n = i, X_{n+1} = j_1, ..., X_{n+m-1} = j_{m-1}) · p_{i,j_1} · ... · p_{j_{m-2},j_{m-1}}
      = p_{j_{m-1},j_m} · p_{i,j_1} · ... · p_{j_{m-2},j_{m-1}}

because of the induction hypothesis and the Markov property. □

Let π be a probability distribution on E with P(X_0 = i) = π_i for all i ∈ E. Then theorem 2 immediately yields

    P(X_0 = j_0, X_1 = j_1, ..., X_m = j_m) = π_{j_0} · p_{j_0,j_1} · ... · p_{j_{m-1},j_m}    (2)

for all m ∈ N and j_0, ..., j_m ∈ E. The chain with this distribution P is denoted by X^π and called the π-version of X. The probability measure π is called the initial distribution for X.

Theorem 2 and the extension theorem by Tulcea (see appendix 6) show that a Markov chain is uniquely determined by its transition matrix and its initial distribution. Whenever the initial distribution π is not important or understood from the context, we will simply write X instead of X^π. However, in an exact manner the notation X denotes the family of all the versions X^π of X, indexed by their initial distribution π.


Theorem 3 Let X denote a homogeneous Markov chain with transition matrix P. Then the relation

    P(X_{n+m} = j | X_n = i) = P^m(i, j)

holds for all m, n ∈ N_0 and i, j ∈ E, with P^m(i, j) denoting the (i, j)th entry of the mth power of the matrix P. In particular, P^0 equals the identity matrix.

Proof: This follows by induction on m. For m = 1 the statement holds by definition of P. For m > 1 we can write

    P(X_{n+m} = j | X_n = i)
      = \frac{P(X_{n+m} = j, X_n = i)}{P(X_n = i)}
      = \sum_{k∈E} \frac{P(X_{n+m} = j, X_{n+m-1} = k, X_n = i)}{P(X_{n+m-1} = k, X_n = i)} × \frac{P(X_{n+m-1} = k, X_n = i)}{P(X_n = i)}
      = \sum_{k∈E} P(X_{n+m} = j | X_{n+m-1} = k, X_n = i) · P^{m-1}(i, k)
      = \sum_{k∈E} p_{kj} · P^{m-1}(i, k) = P^m(i, j)

because of the induction hypothesis and the Markov property. □

Thus the probabilities for transitions in m steps are given by the mth power of the transition matrix P. The rule P^{m+n} = P^m P^n for the multiplication of matrices and theorem 3 lead to the decompositions

    P(X_{m+n} = j | X_0 = i) = \sum_{k∈E} P(X_m = k | X_0 = i) · P(X_n = j | X_0 = k)

which are known as the Chapman–Kolmogorov equations.
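Theorem 3 translates directly into computation: the m-step transition probabilities are the entries of the matrix power P^m. A minimal sketch (assuming a small finite state space and using numpy; the matrix below is an arbitrary example, not from the notes) that also verifies the Chapman–Kolmogorov equations for one pair (m, n):

```python
import numpy as np

# Transition matrix of a small chain on E = {0, 1, 2}; an arbitrary example.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])

def m_step(P, m):
    """P^m(i, j) = P(X_{n+m} = j | X_n = i) by theorem 3."""
    return np.linalg.matrix_power(P, m)

m, n = 3, 4
# Chapman-Kolmogorov: P^{m+n} = P^m P^n, entrywise the decomposition above.
assert np.allclose(m_step(P, m + n), m_step(P, m) @ m_step(P, n))
print(m_step(P, m + n))
```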

For later purposes we will need a relation closely related to the Markov property, which is called the strong Markov property. Let τ denote a random variable with values in N_0 ∪ {∞}, such that the condition

    P(τ ≤ n | X) = P(τ ≤ n | X_0, ..., X_n)    (3)

holds for all n ∈ N_0. Such a random variable is called a (discrete) stopping time for X. The defining condition means that the probability for the event {τ ≤ n} depends only on the evolution of the chain until time n. In other words, the determination of a stopping time does not require any knowledge of the future. Now the strong Markov property is stated in

Theorem 4 Let X denote a Markov chain and τ a stopping time for X with P(τ < ∞) = 1. Then the relation

    P(X_{τ+m} = j | X_0 = i_0, ..., X_τ = i_τ) = P(X_m = j | X_0 = i_τ)

holds for all m ∈ N and i_0, ..., i_τ, j ∈ E.

Proof: The fact that the stopping time τ is finite and may assume only countably many values can be exploited in the transformation

    P(X_{τ+m} = j | X_0 = i_0, ..., X_τ = i_τ)
      = \sum_{n=0}^∞ P(τ = n, X_{τ+m} = j | X_0 = i_0, ..., X_τ = i_τ)
      = \sum_{n=0}^∞ P(X_{τ+m} = j | τ = n, X_0 = i_0, ..., X_τ = i_τ) × P(τ = n | X_0 = i_0, ..., X_τ = i_τ)
      = \sum_{n=0}^∞ P(X_{n+m} = j | X_n = i_τ) · P(τ = n | X)
      = \sum_{n=0}^∞ P(τ = n | X) · P(X_m = j | X_0 = i_τ)

which yields the statement, as τ is finite with probability one. □

2 Classification of States

Let X denote a Markov chain with state space E and transition matrix P. We call a state j ∈ E accessible from a state i ∈ E if there is a number m ∈ N_0 with P(X_m = j | X_0 = i) > 0. This relation shall be denoted by i → j. If for two states i, j ∈ E the relations i → j and j → i hold, then i and j are said to communicate, in notation i ↔ j.


Theorem 5 The relation ↔ of communication between states is an equivalence relation.

Proof: Because of P^0 = I, communication is reflexive. Symmetry holds by definition. Thus it remains to show transitivity. For this, assume i ↔ j and j ↔ k for three states i, j, k ∈ E. This means that there are numbers m, n ∈ N_0 with P^m(i, j) > 0 and P^n(j, k) > 0. Hence, by the Chapman–Kolmogorov equation, we obtain

    P(X_{m+n} = k | X_0 = i) = \sum_{h∈E} P(X_m = h | X_0 = i) · P(X_n = k | X_0 = h)
                             ≥ P(X_m = j | X_0 = i) · P(X_n = k | X_0 = j) > 0

which proves i → k. The remaining proof of k → i is completely analogous. □

Because of this result and the countability of E, we can partition the state space E of a Markov chain into countably many equivalence classes with respect to the communication of states. Any such equivalence class shall be called a communication class. A communication class C ⊂ E that does not allow access to states outside itself, i.e. for which the implication

    i → j, i ∈ C ⇒ j ∈ C

holds, is called closed. If a closed equivalence class consists of only one state, then this state shall be called absorbing. If a Markov chain has only one communication class, i.e. if all states communicate, then it is called irreducible. Otherwise it is called reducible.

Example 4 Let X denote a discrete random walk (see example 2) with the specification π_1 = p and π_{-1} = 1 - p for some parameter 0 < p < 1. Then X is irreducible.

Example 5 The Bernoulli process (see example 3) with non-trivial parameter 0 < p < 1 is reducible to the highest degree: every state x ∈ N_0 forms its own communication class. None of these classes is closed, thus there are no absorbing states.

Theorem 6 Let X be a Markov chain with state space E and transition matrix P. Let C = {c_n : n ∈ I} ⊂ E with I ⊂ N be a closed communication class. Define the matrix P' by its entries p'_{ij} := p_{c_i,c_j} for all i, j ∈ I. Then P' is stochastic.


Proof: By definition, p'_{ij} ∈ [0, 1] for all i, j ∈ I. Since C is closed, p_{c_i,k} = 0 for all i ∈ I and k ∉ C. This implies

    \sum_{j∈I} p'_{ij} = \sum_{j∈I} p_{c_i,c_j} = 1 - \sum_{k∉C} p_{c_i,k} = 1

for all i ∈ I, as P is stochastic. □

Thus the restriction of a Markov chain X with state space E to the states of one of its closed communication classes C defines a new Markov chain with state space C. If the states are relabeled according to their affiliation to a communication class, the transition matrix of X can be displayed in a block matrix form as

    P = \begin{pmatrix}
          Q & Q_1 & Q_2 & Q_3 & Q_4 & \ldots \\
          0 & P_1 & 0   & 0   & 0   & \ldots \\
          0 & 0   & P_2 & 0   & 0   & \ldots \\
          0 & 0   & 0   & P_3 & 0   & \ldots \\
          \vdots & & \ddots & & \ddots &
        \end{pmatrix}    (4)

with P_n being stochastic matrices on the closed communication classes C_n. The first row contains the transition probabilities starting from communication classes that are not closed.

Let X denote a Markov chain with state space E. In the rest of this section we shall investigate distribution and expectation of the following random variables: Define τ_j as the stopping time of the first visit to the state j ∈ E, i.e.

    τ_j := min{n ∈ N : X_n = j}

Denote the distribution of τ_j by

    F_k(i, j) := P(τ_j = k | X_0 = i)

for all i, j ∈ E and k ∈ N.

Lemma 1 The conditional distribution of the first visit to the state j ∈ E, given an initial state X_0 = i, can be determined iteratively by

    F_k(i, j) = \begin{cases} p_{ij}, & k = 1 \\ \sum_{h≠j} p_{ih} F_{k-1}(h, j), & k ≥ 2 \end{cases}

for all i, j ∈ E.


Proof: For k = 1, the definition yields

    F_1(i, j) = P(τ_j = 1 | X_0 = i) = P(X_1 = j | X_0 = i) = p_{ij}

for all i, j ∈ E. For k ≥ 2, conditioning upon X_1 yields

    F_k(i, j) = P(X_1 ≠ j, ..., X_{k-1} ≠ j, X_k = j | X_0 = i)
      = \sum_{h≠j} P(X_1 = h | X_0 = i) × P(X_2 ≠ j, ..., X_{k-1} ≠ j, X_k = j | X_0 = i, X_1 = h)
      = \sum_{h≠j} p_{ih} · P(X_1 ≠ j, ..., X_{k-2} ≠ j, X_{k-1} = j | X_0 = h)

due to the Markov property. □

Now define

    f_{ij} := P(τ_j < ∞ | X_0 = i) = \sum_{k=1}^∞ F_k(i, j)    (5)

for all i, j ∈ E, which represents the probability of ever visiting state j after beginning in state i. Summing up over all k ∈ N in the formula of Lemma 1 leads to

    f_{ij} = p_{ij} + \sum_{h≠j} p_{ih} f_{hj}    (6)

for all i, j ∈ E. The proof is left as an exercise.
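Lemma 1 together with equation (5) yields a simple truncation scheme for computing first-passage quantities numerically. The sketch below (illustrative; it assumes a finite state space and a user-chosen truncation level K) iterates the recursion for F_k and accumulates the sum in (5); for a transient target the truncated sum approximates f_ij from below.

```python
import numpy as np

def first_passage(P, j, K=200):
    """Return (F, f): F[k-1][i] = F_k(i, j) for k = 1..K via Lemma 1,
    and f[i] = sum over k of F_k(i, j), a truncated version of equation (5)."""
    Q = P.copy()
    Q[:, j] = 0.0          # forbids entering j early: sum runs over h != j
    F = [P[:, j].copy()]   # F_1(i, j) = p_ij
    for _ in range(K - 1):
        F.append(Q @ F[-1])  # F_k(i, j) = sum_{h != j} p_ih F_{k-1}(h, j)
    f = np.sum(F, axis=0)
    return F, f

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
F, f = first_passage(P, j=2)
print(f)  # probabilities of ever visiting state 2 from each initial state
```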

Define N_j as the random variable of the total number of visits to the state j ∈ E. Expression (6) is useful for computing the distribution of N_j:

Theorem 7 Let X denote a Markov chain with state space E. The total number of visits to a state j ∈ E under the condition that the chain starts in state i is given by

    P(N_j = m | X_0 = j) = f_{jj}^{m-1} (1 - f_{jj})

and for i ≠ j

    P(N_j = m | X_0 = i) = \begin{cases} 1 - f_{ij}, & m = 0 \\ f_{ij} f_{jj}^{m-1} (1 - f_{jj}), & m ≥ 1 \end{cases}

Thus the distribution of N_j is modified geometric.


Proof: Define τ_j^{(1)} := τ_j and τ_j^{(k+1)} := min{n > τ_j^{(k)} : X_n = j} for all k ∈ N, with the convention that min ∅ = ∞. Note that τ_j^{(k)} = ∞ implies τ_j^{(l)} = ∞ for all l > k.

Then the sequence (τ_j^{(k)} : k ∈ N) is a sequence of stopping times. The event {N_j = m} is the same as the intersection of the events {τ_j^{(k)} < ∞} for k = 1, ..., M and {τ_j^{(M+1)} = ∞}, with M = m if i ≠ j and M = m - 1 if i = j. Now this event can be further described by the intersection of the events {τ_j^{(k+1)} - τ_j^{(k)} < ∞} for k = 0, ..., M - 1 and {τ_j^{(M+1)} - τ_j^{(M)} = ∞}, with M as above and the convention τ_j^{(0)} := 0.

The subevent {τ_j^{(k+1)} - τ_j^{(k)} < ∞} has probability f_{ij} for k = 0 and, because of the strong Markov property (see theorem 4), probability f_{jj} for k > 0. The probability for {τ_j^{(M+1)} - τ_j^{(M)} = ∞} is 1 - f_{ij} for M = 0 and 1 - f_{jj} for M > 0. Once more the strong Markov property is the reason for the independence of the subevents. Now multiplication of the probabilities leads to the formulae in the statement. □

Summing over all m in the above theorem leads to

Corollary 1 For all j ∈ E, the zero–one law

    P(N_j < ∞ | X_0 = j) = \begin{cases} 1, & f_{jj} < 1 \\ 0, & f_{jj} = 1 \end{cases}

holds, i.e. depending on f_{jj} there are almost certainly infinitely many visits to a state j ∈ E.

This result gives rise to the following definitions: A state j ∈ E is called recurrent if f_{jj} = 1 and transient otherwise. Let us further define the potential matrix R = (r_{ij})_{i,j∈E} of the Markov chain by its entries

    r_{ij} := E(N_j | X_0 = i)

for all i, j ∈ E. Thus an entry r_{ij} gives the expected number of visits to the state j ∈ E under the condition that the chain starts in state i ∈ E. As such, r_{ij} can be computed by

    r_{ij} = \sum_{n=0}^∞ P^n(i, j)    (7)

for all i, j ∈ E. The results in theorem 7 and corollary 1 yield


Corollary 2 For all i, j ∈ E the relations

    r_{jj} = (1 - f_{jj})^{-1}  and  r_{ij} = f_{ij} r_{jj}

hold, with the conventions 0^{-1} := ∞ and 0 · ∞ := 0 included. In particular, the expected number r_{jj} of visits to the state j ∈ E is finite if j is transient and infinite if j is recurrent.
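For transient states, formula (7) and corollary 2 can be checked against each other numerically. A small sketch (illustrative only; the absorbing chain below is a made-up example):

```python
import numpy as np

# A chain where states 0 and 1 are transient (state 2 is absorbing).
P = np.array([[0.4, 0.4, 0.2],
              [0.3, 0.3, 0.4],
              [0.0, 0.0, 1.0]])

# Potential matrix via formula (7), truncated at a large horizon; the tail
# is geometrically small because states 0 and 1 are transient.
R = sum(np.linalg.matrix_power(P, n) for n in range(500))

# Hitting probabilities f_i0 from equation (6): f_00 = p_00 + p_01 * f_10
# and f_10 = p_10 + p_11 * f_10 (the absorbing state never returns, f_20 = 0).
f10 = P[1, 0] / (1 - P[1, 1])
f00 = P[0, 0] + P[0, 1] * f10

# Corollary 2: r_00 = (1 - f_00)^(-1) and r_10 = f_10 * r_00.
print(R[0, 0], 1 / (1 - f00))    # both approx. 2.3333
print(R[1, 0], f10 / (1 - f00))  # both approx. 1.0
```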

Theorem 8 Recurrence and transience of states are class properties with respect to the relation ↔. Furthermore, a recurrent communication class is always closed.

Proof: Assume that i ∈ E is transient and i ↔ j. Then there are numbers m, n ∈ N with 0 < P^m(i, j) ≤ 1 and 0 < P^n(j, i) ≤ 1. The inequalities

    \sum_{k=0}^∞ P^k(i, i) ≥ \sum_{h=0}^∞ P^{m+h+n}(i, i) ≥ P^m(i, j) P^n(j, i) \sum_{k=0}^∞ P^k(j, j)

now imply r_{jj} < ∞ because of representation (7). According to corollary 2 this means that j is transient, too.

If j is recurrent, then the same inequalities lead to

    r_{ii} ≥ P^m(i, j) P^n(j, i) r_{jj} = ∞

which signifies that i is recurrent, too. Since the above arguments are symmetric in i and j, the proof of the first statement is complete.

For the second statement assume that i ∈ E belongs to a communication class C ⊂ E and p_{ij} > 0 for some state j ∈ E \ C. Then

    f_{ii} = p_{ii} + \sum_{h≠i} p_{ih} f_{hi} ≤ 1 - p_{ij} < 1

according to formula (6), since f_{ji} = 0 (otherwise i ↔ j). Thus i is transient, which proves the second statement. □

Theorem 9 If the state j ∈ E is transient, then lim_{n→∞} P^n(i, j) = 0, regardless of the initial state i ∈ E.

Proof: If the state j is transient, then the first equation in corollary 2 yields r_{jj} < ∞. The second equation in the same corollary now implies r_{ij} < ∞, which by the representation (7) completes the proof. □


3 Stationary Distributions

Let X denote a Markov chain with state space E and π a measure on E. If P(X_n = i) = P(X_0 = i) = π_i for all n ∈ N and i ∈ E, then X^π is called stationary, and π is called a stationary measure for X. If furthermore π is a probability measure, then it is called a stationary distribution for X.

Theorem 10 Let X denote a Markov chain with state space E and transition matrix P. Further, let π denote a probability distribution on E with πP = π, i.e.

    π_i = \sum_{j∈E} π_j p_{ji}  and  \sum_{j∈E} π_j = 1

for all i ∈ E. Then π is a stationary distribution for X. If π is a stationary distribution for X, then πP = π holds.

Proof: Let P(X_0 = i) = π_i for all i ∈ E. Then P(X_n = i) = P(X_0 = i) for all n ∈ N and i ∈ E follows by induction on n. The case n = 1 holds by assumption, and the induction step follows by the induction hypothesis and the Markov property. The last statement is obvious. □

The following examples show some features of stationary distributions:

Example 6 Let the transition matrix of a Markov chain X be given by

    P = \begin{pmatrix}
          0.8 & 0.2 & 0   & 0   \\
          0.2 & 0.8 & 0   & 0   \\
          0   & 0   & 0.4 & 0.6 \\
          0   & 0   & 0.6 & 0.4
        \end{pmatrix}

Then π = (0.5, 0.5, 0, 0), π' = (0, 0, 0.5, 0.5), as well as any convex combination of them are stationary distributions for X. This shows that a stationary distribution does not need to be unique.

Example 7 Bernoulli process (see example 3)
The transition matrix of a Bernoulli process has the structure

    P = \begin{pmatrix}
          1-p & p   & 0   & 0 & \ldots \\
          0   & 1-p & p   & 0 & \ddots \\
          0   & 0   & 1-p & p & \ddots \\
          \vdots & & \ddots & \ddots & \ddots
        \end{pmatrix}


Hence πP = π implies first

    π_0 · (1 - p) = π_0  ⇒  π_0 = 0

since 0 < p < 1. Assume that π_n = 0 for some n ∈ N_0. This and the condition πP = π further imply for π_{n+1}

    π_n · p + π_{n+1} · (1 - p) = π_{n+1}  ⇒  π_{n+1} = 0

which completes an induction argument proving π_n = 0 for all n ∈ N_0. Hence the Bernoulli process does not have a stationary distribution.

Example 8 The solution of πP = π and \sum_{j∈E} π_j = 1 is unique for

    P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}

with 0 < p < 1. Thus there are transition matrices which have exactly one stationary distribution.
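Numerically, theorem 10 says a stationary distribution is a solution of the linear system πP = π together with the normalization \sum_j π_j = 1. A minimal sketch (with numpy; illustrative, not part of the notes) for the matrix of example 8 with p = 0.3:

```python
import numpy as np

p = 0.3
P = np.array([[1 - p, p],
              [p, 1 - p]])

# pi P = pi  <=>  (P^T - I) pi^T = 0; append the normalization sum(pi) = 1
# and solve the resulting overdetermined system by least squares.
n = P.shape[0]
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.append(np.zeros(n), 1.0)
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)  # [0.5, 0.5], the unique stationary distribution
```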

The question of existence and uniqueness of a stationary distribution is one of the most important problems in the theory of Markov chains. A simple answer can be given in the transient case (cf. example 7):

Theorem 11 A transient Markov chain (i.e. a Markov chain with transient states only) has no stationary distribution.

Proof: Assume that πP = π holds for some distribution π and take any enumeration E = (s_n : n ∈ N) of the state space E. Choose any index m ∈ N with π_{s_m} > 0. Since \sum_{n=1}^∞ π_{s_n} = 1 is bounded, there is an index M > m such that \sum_{n=M}^∞ π_{s_n} < π_{s_m}. Set ε := π_{s_m} - \sum_{n=M}^∞ π_{s_n}. According to theorem 9, there is an index N ∈ N such that P^n(s_i, s_m) < ε for all i ≤ M and n ≥ N. Then the stationarity of π implies

    π_{s_m} = \sum_{i=1}^∞ π_{s_i} P^N(s_i, s_m)
            = \sum_{i=1}^{M-1} π_{s_i} P^N(s_i, s_m) + \sum_{i=M}^∞ π_{s_i} P^N(s_i, s_m)
            < ε + \sum_{i=M}^∞ π_{s_i} = π_{s_m}

which is a contradiction. □

For the recurrent case, a finer distinction will be necessary. While the expected total number r_{jj} of visits to a recurrent state j ∈ E is always infinite (see corollary 2), there are differences in the rate of visits to a recurrent state. In order to describe these, define N_i(n) as the number of visits to state i until time n. Further define for a recurrent state i ∈ E the mean time

    m_i := E(τ_i | X_0 = i)

until the first visit to i (after time zero) under the condition that the chain starts in i. By definition m_i > 0 for all i ∈ E. The elementary renewal theorem (which will be proven later) states that

    lim_{n→∞} \frac{E(N_i(n) | X_0 = j)}{n} = \frac{1}{m_i}    (8)

for all recurrent i ∈ E and independently of j ∈ E provided j ↔ i, with the convention 1/∞ := 0. Thus the asymptotic rate of visits to a recurrent state is determined by the mean recurrence time of this state. This gives reason to the following definition: A recurrent state i ∈ E with m_i = E(τ_i | X_0 = i) < ∞ will be called positive recurrent, otherwise i is called null recurrent. The distinction between positive and null recurrence is supported by the equivalence relation ↔, as shown in

Theorem 12 Positive recurrence and null recurrence are class properties with respect to the relation of communication between states.

Proof: Assume that i ↔ j for two states i, j ∈ E and i is null recurrent. Thus there are numbers m, n ∈ N with P^n(i, j) > 0 and P^m(j, i) > 0. Because of the representation E(N_i(k) | X_0 = i) = \sum_{l=0}^k P^l(i, i), we obtain

    0 = lim_{k→∞} \frac{\sum_{l=0}^k P^l(i, i)}{k}
      ≥ lim_{k→∞} \frac{\sum_{l=0}^{k-m-n} P^l(j, j)}{k} · P^n(i, j) P^m(j, i)
      = lim_{k→∞} \frac{k-m-n}{k} · \frac{\sum_{l=0}^{k-m-n} P^l(j, j)}{k-m-n} · P^n(i, j) P^m(j, i)
      = lim_{k→∞} \frac{\sum_{l=0}^k P^l(j, j)}{k} · P^n(i, j) P^m(j, i)
      = \frac{P^n(i, j) P^m(j, i)}{m_j}

and thus m_j = ∞, which signifies the null recurrence of j. □

Thus we can call a communication class positive recurrent or null recurrent. In the former case, a construction of a stationary distribution is given in

Theorem 13 Let i ∈ E be positive recurrent and define the mean first visit time m_i := E(τ_i | X_0 = i). Then a stationary distribution π is given by

    π_j := m_i^{-1} · \sum_{n=0}^∞ P(X_n = j, τ_i > n | X_0 = i)

for all j ∈ E. In particular, π_i = m_i^{-1} and π_k = 0 for all states k outside of the communication class belonging to i.

Proof: First of all, π is a probability measure since

    \sum_{j∈E} \sum_{n=0}^∞ P(X_n = j, τ_i > n | X_0 = i) = \sum_{n=0}^∞ \sum_{j∈E} P(X_n = j, τ_i > n | X_0 = i)
                                                         = \sum_{n=0}^∞ P(τ_i > n | X_0 = i) = m_i


The particular statements in the theorem are obvious from theorem 8 and the definition of π. The stationarity of π is shown as follows. First we obtain

    π_j = m_i^{-1} · \sum_{n=0}^∞ P(X_n = j, τ_i > n | X_0 = i)
        = m_i^{-1} · \sum_{n=1}^∞ P(X_n = j, τ_i ≥ n | X_0 = i)
        = m_i^{-1} · \sum_{n=1}^∞ P(X_n = j, τ_i > n - 1 | X_0 = i)

since X_0 = X_{τ_i} = i in the conditioning set {X_0 = i}. Because of

    P(X_n = j, τ_i > n - 1 | X_0 = i)
      = \frac{P(X_n = j, τ_i > n - 1, X_0 = i)}{P(X_0 = i)}
      = \sum_{k∈E} \frac{P(X_n = j, X_{n-1} = k, τ_i > n - 1, X_0 = i)}{P(X_{n-1} = k, τ_i > n - 1, X_0 = i)} × \frac{P(X_{n-1} = k, τ_i > n - 1, X_0 = i)}{P(X_0 = i)}
      = \sum_{k∈E} p_{kj} P(X_{n-1} = k, τ_i > n - 1 | X_0 = i)

we can transform further

    π_j = m_i^{-1} · \sum_{n=1}^∞ \sum_{k∈E} p_{kj} P(X_{n-1} = k, τ_i > n - 1 | X_0 = i)
        = \sum_{k∈E} p_{kj} · m_i^{-1} \sum_{n=0}^∞ P(X_n = k, τ_i > n | X_0 = i) = \sum_{k∈E} π_k p_{kj}

which completes the proof. □


Theorem 14 Let X denote an irreducible, positive recurrent Markov chain. Then X has a unique stationary distribution.

Proof: Existence has been shown in theorem 13. Uniqueness of the stationary distribution can be seen as follows. Let π denote the stationary distribution as constructed in theorem 13 and i the positive recurrent state that served as recurrence point for π. Further, let ν denote any stationary distribution for X. Then there is a state j ∈ E with ν_j > 0 and a number m ∈ N with P^m(j, i) > 0, since X is irreducible. Consequently we obtain

    ν_i = \sum_{k∈E} ν_k P^m(k, i) ≥ ν_j P^m(j, i) > 0

Hence we can multiply ν by a scalar factor c such that c · ν_i = π_i = 1/m_i. Denote ν̃ := c · ν.

Let P̃ denote the transition matrix P without the ith column, i.e. we define the (j, k)th entry of P̃ by p̃_{jk} = p_{jk} if k ≠ i and zero otherwise. Denote further the Dirac measure on i by δ^i, i.e. δ^i_j = 1 if i = j and zero otherwise. Then the stationary distribution π can be represented by π = m_i^{-1} · δ^i \sum_{n=0}^∞ P̃^n.

We first claim that m_i ν̃ = δ^i + m_i ν̃ P̃. This is clear for the entry ν̃_i and easily seen for ν̃_j with j ≠ i because in this case (ν̃ P̃)_j = c · (νP)_j = ν̃_j. Now we can proceed with the same argument to see that

    m_i ν̃ = δ^i + (δ^i + m_i ν̃ P̃) P̃ = δ^i + δ^i P̃ + m_i ν̃ P̃^2 = ...
          = δ^i \sum_{n=0}^∞ P̃^n = m_i π

Hence ν̃ already is a probability measure and the scalar factor must be c = 1. This yields ν = ν̃ = π and thus the statement. □

Remark 1 At a closer look, the assumption of irreducibility may be relaxed to some extent. For example, if there is exactly one closed positive recurrent communication class and a set of transient and inaccessible states (i.e. states j for which there is no state i with i → j), then the above statement still holds although X is not irreducible.

A first consequence of the uniqueness is the following simpler representation of the stationary distribution:


Theorem 15 Let X denote an irreducible, positive recurrent Markov chain. Then the stationary distribution π of X is given by

    π_j = m_j^{-1} = \frac{1}{E(τ_j | X_0 = j)}

for all j ∈ E.

Proof: Since all states in E are positive recurrent, the construction in theorem 13 can be pursued for any initial state j. This yields π_j = m_j^{-1} for all j ∈ E. The statement now follows from the uniqueness of the stationary distribution. □

Corollary 3 For an irreducible, positive recurrent Markov chain, the stationary probability π_j of a state j coincides with its asymptotic rate of recurrence, i.e.

    lim_{n→∞} \frac{E(N_j(n) | X_0 = i)}{n} = π_j

for all j ∈ E and independently of i ∈ E. Further, if an asymptotic distribution p = lim_{n→∞} P(X_n = ·) does exist, then it coincides with the stationary distribution. In particular, it is independent of the initial distribution of X.

Proof: The first statement immediately follows from equation (8). For the second statement, it suffices to employ E(N_j(n) | X_0 = i) = \sum_{l=0}^n P^l(i, j). If an asymptotic distribution p does exist, then for any initial distribution ν we obtain

    p_j = lim_{n→∞} (νP^n)_j = \sum_{i∈E} ν_i lim_{n→∞} P^n(i, j)
        = \sum_{i∈E} ν_i lim_{n→∞} \frac{\sum_{l=0}^n P^l(i, j)}{n} = \sum_{i∈E} ν_i π_j = π_j

independently of ν. □
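Corollary 3 can be observed empirically: the fraction of time a simulated trajectory spends in state j converges to π_j, whatever the initial state. A short sketch (illustrative; it reuses the two-state matrix of example 8, whose stationary distribution is (0.5, 0.5)):

```python
import random

p = 0.3
P = [[1 - p, p], [p, 1 - p]]  # example 8; stationary distribution (0.5, 0.5)

def simulate(P, n_steps, x0=0):
    """Count the visits N_j(n) to each state along one trajectory."""
    x, visits = x0, [0] * len(P)
    for _ in range(n_steps):
        visits[x] += 1
        x = random.choices(range(len(P)), weights=P[x])[0]
    return visits

n = 100_000
visits = simulate(P, n)
print([v / n for v in visits])  # both close to 0.5, independent of x0
```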

4 Restricted Markov Chains

Now let F ⊂ E denote any subset of the state space E. Define τ_F(k) to be the stopping time of the kth visit of X to the set F, i.e.

    τ_F(k + 1) := min{n > τ_F(k) : X_n ∈ F}


with τ_F(0) := 0. If X is recurrent, then the strong Markov property (theorem 4) ensures that the chain X^F = (X^F_n : n ∈ N) with X^F_n := X_{τ_F(n)} is a recurrent Markov chain, too. It is called the Markov chain restricted to F. In case of positive recurrence, we can obtain the stationary distribution of X^F from the stationary distribution of X in a simple manner:

Theorem 16 If the Markov chain X is positive recurrent, then the stationary distribution of X^F is given by

    π^F_j = \frac{π_j}{\sum_{k∈F} π_k}

for all j ∈ F.

Proof: Choose any state i ∈ F and recall from theorem 13 the expression

    π_j := m_i^{-1} · \sum_{n=0}^∞ P(X_n = j, τ_i > n | X_0 = i)

which holds for all j ∈ F. For π^F_j we can perform the same construction with respect to the chain X^F. By the definition of X^F it is clear that the number of visits to the state j between two consecutive visits to i is the same for the chains X and X^F. Hence the sum expression for π^F_j, which is the expectation of that number of visits, remains the same as for π_j. The other factor m_i^{-1} in the formula above is independent of j and serves only as a normalization constant, i.e. in order to secure that \sum_{j∈E} π_j = 1. Hence for a construction of π^F_j with respect to X^F this constant needs to be replaced by (m_i · \sum_{k∈F} π_k)^{-1}, which then yields the statement. □

Theorem 17 Let X = (X_n : n ∈ N_0) denote an irreducible and positive recurrent Markov chain with discrete state space E. Further let F ⊂ E denote any subset of E, and X^F the Markov chain restricted to F. Denote

    τ_F := min{n ∈ N : X_n ∈ F}

Then a measure ν on E is stationary for X if and only if ν' = (ν_i : i ∈ F) is stationary for X^F and

    ν_j = \sum_{k∈F} ν_k \sum_{n=0}^∞ P(X_n = j, τ_F > n | X_0 = k)    (9)

for all j ∈ E \ F.


Proof: Due to theorem 16 it suffices to prove equation (9) for j ∈ E \ F. Choose any state i ∈ F and define

    τ_i := min{n ∈ N : X_n = i}

According to theorem 13 the stationary measure ν for X is given by

    ν_j = ν_i · \sum_{n=0}^∞ P(X_n = j, τ_i > n | X_0 = i) = ν_i · E_i\left( \sum_{n=0}^{τ_i - 1} 1_{X_n = j} \right)

for j ∈ E \ F, where E_i denotes the conditional expectation given X_0 = i. Define further

    τ^F_i := min{n ∈ N : X^F_n = i}

Because of the strong Markov property we can proceed as

    ν_j = ν_i · E_i\left( \sum_{n=0}^{τ^F_i - 1} E_{X^F_n}\left( \sum_{m=0}^{τ_F - 1} 1_{X_m = j} \right) \right)
        = ν_i · \sum_{k∈F} E_i\left( \sum_{n=0}^{τ^F_i - 1} 1_{X^F_n = k} \right) · E_k\left( \sum_{m=0}^{τ_F - 1} 1_{X_m = j} \right)

Regarding the restricted Markov chain X^F, theorem 13 states that

    E_i\left( \sum_{n=0}^{τ^F_i - 1} 1_{X^F_n = k} \right) = \sum_{n=0}^∞ P(X^F_n = k, τ^F_i > n | X^F_0 = i) = \frac{ν_k}{ν_i}

for all k ∈ F. Hence we obtain

    ν_j = \sum_{k∈F} ν_k \sum_{n=0}^∞ P(X_n = j, τ_F > n | X_0 = k)

which was to be proven. □

5 Conditions for Positive Recurrence

In the third part of this course we will need some results on the behaviour of a Markov chain on a finite subset of its state space. As a first fundamental result we state


Theorem 18 An irreducible Markov chain with finite state space F is positive recurrent.

Proof: For all n ∈ N and i ∈ F we have \sum_{j∈F} P^n(i, j) = 1. Hence it is not possible that lim_{n→∞} P^n(i, j) = 0 for all j ∈ F. Thus there is one state h ∈ F such that r_{hh} = \sum_{n=0}^∞ P^n(h, h) = ∞, which means by corollary 2 that h is recurrent and by irreducibility that the chain is recurrent.

If the chain were null recurrent, then according to the relation in (8)

    lim_{n→∞} \frac{1}{n} \sum_{k=1}^n P^k(i, j) = 0

would hold for all j ∈ F, independently of i because of irreducibility. But this would imply that lim_{n→∞} P^n(i, j) = 0 for all j ∈ F, which contradicts our first observation in this proof. Hence the chain must be positive recurrent. □

For irreducible Markov chains the condition E(τ_i | X_0 = i) < ∞ implies positive recurrence of state i and hence positive recurrence of the whole chain. Writing τ_F for the time of the first visit to the set F, we now can state the following generalization of this condition:

Theorem 19 Let X denote an irreducible Markov chain with state space E and let F ⊂ E be a finite subset of E. The chain X is positive recurrent if and only if E(τ_F | X_0 = i) < ∞ for all i ∈ F.

Proof: If X is positive recurrent, then E(τ_F | X_0 = i) ≤ E(τ_i | X_0 = i) < ∞ for all i ∈ F, by the definition of positive recurrence.

Now assume that E(τ_F | X_0 = i) < ∞ for all i ∈ F. Define the stopping times σ(i) := min{k ∈ N : X^F_k = i} and random variables Y_k := τ_F(k) - τ_F(k - 1). Since F is finite, m := max_{j∈F} E(τ_F | X_0 = j) < ∞. We shall denote the conditional expectation given X_0 = i by E_i. For i ∈ F we now obtain

    E(τ_i | X_0 = i) = E_i\left( \sum_{k=1}^{σ(i)} Y_k \right) = \sum_{k=1}^∞ E_i\left( E(Y_k | X_{τ_F(k-1)}) · 1_{k ≤ σ(i)} \right)
                     ≤ m · \sum_{k=1}^∞ P(σ(i) ≥ k | X_0 = i) = m · E(σ(i) | X_0 = i)

Since F is finite, X^F is positive recurrent by theorem 18. Hence we know that E(σ(i) | X_0 = i) < ∞, and thus E(τ_i | X_0 = i) < ∞, which shows that X is positive recurrent. □

An often difficult problem is to determine whether a given Markov chain is positive recurrent or not. Concerning this, we now introduce one of the most important criteria for the existence of stationary distributions of Markov chains occurring in queueing theory. It is known as Foster's criterion.

Theorem 20 Let X denote an irreducible Markov chain with countable state space E and transition matrix P. Further let F denote a finite subset of E. If there is a function h : E → R with inf{h(i) : i ∈ E} > -∞, such that the conditions

    \sum_{k∈E} p_{ik} h(k) < ∞  and  \sum_{k∈E} p_{jk} h(k) ≤ h(j) - ε

hold for some ε > 0 and all i ∈ F and j ∈ E \ F, then X is positive recurrent.

Proof: Without loss of generality we can assume h(i) ≥ 0 for all i ∈ E, since otherwise we only need to increase h by a suitable constant. Define the stopping time τ_F := min{n ∈ N_0 : X_n ∈ F}. First we observe that

    E(h(X_{n+1}) · 1_{τ_F > n+1} | X_0, ..., X_n) ≤ E(h(X_{n+1}) · 1_{τ_F > n} | X_0, ..., X_n)
      = 1_{τ_F > n} · \sum_{k∈E} p_{X_n,k} h(k)
      ≤ 1_{τ_F > n} · (h(X_n) - ε)
      = h(X_n) · 1_{τ_F > n} - ε · 1_{τ_F > n}

holds for all n ∈ N_0, where the first equality is due to (13). We now proceed with

    0 ≤ E(h(X_{n+1}) · 1_{τ_F > n+1} | X_0 = i)
      = E(E(h(X_{n+1}) · 1_{τ_F > n+1} | X_0, ..., X_n) | X_0 = i)
      ≤ E(h(X_n) · 1_{τ_F > n} | X_0 = i) - ε P(τ_F > n | X_0 = i)
      ≤ ...
      ≤ E(h(X_0) · 1_{τ_F > 0} | X_0 = i) - ε · \sum_{k=0}^n P(τ_F > k | X_0 = i)

which holds for all i ∈ E \ F and n ∈ N_0. For n → ∞ this implies

    E(τ_F | X_0 = i) = \sum_{k=0}^∞ P(τ_F > k | X_0 = i) ≤ h(i)/ε < ∞

for i ∈ E \ F. Now the mean return time to the state set F is bounded by

    E(τ_F | X_0 = i) = \sum_{j∈F} p_{ij} + \sum_{j∈E\F} p_{ij} E(τ_F + 1 | X_0 = j)
                     ≤ 1 + ε^{-1} \sum_{j∈E} p_{ij} h(j) < ∞

for all i ∈ F, which completes the proof. □
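To see how Foster's criterion is applied, consider a random walk on N_0 reflected at zero that moves up with probability p and down with probability 1 - p. With the test function h(i) = i and F = {0}, the drift condition outside F reads p - (1 - p) ≤ -ε, so the criterion holds whenever p < 1/2. The sketch below (an illustrative numerical check of the drift, not part of the original notes) evaluates \sum_k p_{jk} h(k) - h(j):

```python
def drift(j, p, h=lambda i: i):
    """Compute sum_k p_jk h(k) - h(j) for the reflected random walk on N_0
    with p_{j,j+1} = p, p_{j,j-1} = 1 - p, and p_{0,0} = 1 - p at the boundary."""
    if j == 0:
        return p * h(1) + (1 - p) * h(0) - h(0)
    return p * h(j + 1) + (1 - p) * h(j - 1) - h(j)

p = 0.4  # downward drift: the chain should be positive recurrent
# Foster's criterion with h(i) = i: drift <= -eps outside F = {0},
# finite drift inside F. Here the drift equals 2p - 1 for all j >= 1.
eps = min(-drift(j, p) for j in range(1, 50))
print(eps > 0, drift(0, p))  # True (eps = 0.2), finite drift on F
```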

Notes

Markov chains originate from a series of papers written by A. Markov at the beginning of the 20th century. His first application is given here as exercise 3. However, methods and terminology at that time were very different from today's presentations.

The literature on Markov chains is perhaps the most extensive in the field of stochastic processes. This is not surprising, as Markov chains form a simple and useful starting point for the introduction of other processes.

Textbook presentations are given in Feller [3], Breiman [1], Karlin and Taylor [5], or Çinlar [2], to name but a few. The treatment in Ross [7] contains the useful concept of time-reversible Markov chains. An exhaustive introduction to Markov chains on general state spaces and conditions for their positive recurrence is given in Meyn and Tweedie [6].

Exercise 1 Let (X_n : n ∈ N_0) be a family of iid random variables with discrete state space. Show that X = (X_n : n ∈ N_0) is a homogeneous Markov chain.

Exercise 2 Let (X_n : n ∈ N_0) be iid random variables on N_0 with probabilities a_i := P(X_n = i) for all n, i ∈ N_0. The event X_n > max(X_0, ..., X_{n-1}) for n ≥ 1 is called a record at time n. Define T_i as the time of the ith record, i.e. T_0 := 0 and T_{i+1} := min{n ∈ N : X_n > X_{T_i}} for all i ∈ N_0. Denote the ith record value by R_i := X_{T_i}. Show that (R_i : i ∈ N_0) and ((R_i, T_i) : i ∈ N_0) are Markov chains by determining their transition probabilities.

Exercise 3 Diffusion model by Bernoulli and Laplace
The following is a stochastic model for the flow of two incompressible fluids between two containers: Two boxes contain m balls each. Of these 2m balls, b are black and the others are white. The system is said to be in state i if the first box contains i black balls. A state transition is performed by choosing one ball out of each box at random (meaning here that each ball is chosen with equal probability) and then interchanging the two. Derive a Markov chain model for the system and determine the transition probabilities.

Exercise 4 Let X denote a Markov chain with m < ∞ states. Show that if state j is accessible from state i, then it is accessible in at most m - 1 transitions.

Exercise 5 Let p = (p_n : n ∈ N_0) be a discrete probability distribution and define

    P = \begin{pmatrix}
          p_0 & p_1 & p_2 & \ldots \\
              & p_0 & p_1 & \ddots \\
              &     & p_0 & \ddots \\
              &     &     & \ddots
        \end{pmatrix}

with all non-specified entries being zero. Let X denote a Markov chain with state space N_0 and transition matrix P. Derive an expression (in terms of discrete convolutions) for the transition probabilities P(X_{n+m} = j | X_n = i) with n, m ∈ N_0 and i, j ∈ N_0. Apply the result to the special case of a Bernoulli process (see example 3).

Exercise 6 Prove equation (6).

Exercise 7 Prove the equation P^n(i, j) = \sum_{k=1}^n F_k(i, j) P^{n-k}(j, j) for all n ∈ N and i, j ∈ E.

Exercise 8 Let X denote a Markov chain with state space E = {1, ..., 10} and transition matrix

    P = \begin{pmatrix}
          1/2 & 0   & 1/2 & 0   & 0   & 0 & 0   & 0   & 0   & 0   \\
          0   & 1/3 & 0   & 0   & 0   & 0 & 2/3 & 0   & 0   & 0   \\
          1   & 0   & 0   & 0   & 0   & 0 & 0   & 0   & 0   & 0   \\
          0   & 0   & 0   & 0   & 1   & 0 & 0   & 0   & 0   & 0   \\
          0   & 0   & 0   & 1/3 & 1/3 & 0 & 0   & 0   & 1/3 & 0   \\
          0   & 0   & 0   & 0   & 0   & 1 & 0   & 0   & 0   & 0   \\
          0   & 0   & 0   & 0   & 0   & 0 & 1/4 & 0   & 3/4 & 0   \\
          0   & 0   & 1/4 & 1/4 & 0   & 0 & 0   & 1/4 & 0   & 1/4 \\
          0   & 1   & 0   & 0   & 0   & 0 & 0   & 0   & 0   & 0   \\
          0   & 1/3 & 0   & 0   & 1/3 & 0 & 0   & 0   & 0   & 1/3
        \end{pmatrix}

Reorder the states according to their communication classes and determine the resulting form of the transition matrix as in representation (4). Determine further a transition graph, in which an arrow i → j means that f_{ij} > 0.

Exercise 9 Prove equation (7).
Hint: Derive a representation of N_j in terms of the random variables

    A_n := \begin{cases} 1, & X_n = j \\ 0, & X_n ≠ j \end{cases}

Exercise 10 Prove corollary 2.

Exercise 11 Prove remark 1.

Exercise 12 A server's up time is k time units with probability p_k = 2^{-k}, k ∈ N. After failure the server is immediately replaced by an identical new one. The up time of the new server is of course independent of the behaviour of all preceding servers.
Let X_n denote the remaining up time of the server at time n ∈ N_0. Determine the transition probabilities for the Markov chain X = (X_n : n ∈ N_0) and determine the stationary distribution of X.

Exercise 13 Let P denote the transition matrix of an irreducible Markov chain X with discrete state space E = F ∪ F^c, where F^c = E \ F. Write P in block notation as

    P = \begin{pmatrix} P_{FF} & P_{FF^c} \\ P_{F^cF} & P_{F^cF^c} \end{pmatrix}

Show that the Markov chain X^F restricted to the state space F has transition matrix

    P^F = P_{FF} + P_{FF^c} (I - P_{F^cF^c})^{-1} P_{F^cF}

with I denoting the identity matrix on F^c.
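The identity in exercise 13 can be checked numerically (which is of course no proof): since (I - P_{F^cF^c})^{-1} = \sum_n P_{F^cF^c}^n, the closed form must agree with a sum over the lengths of excursions through F^c. A sketch with numpy (the matrix below is an arbitrary irreducible example, not from the notes):

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
F, Fc = [0, 1], [2]

PFF   = P[np.ix_(F, F)];  PFFc  = P[np.ix_(F, Fc)]
PFcF  = P[np.ix_(Fc, F)]; PFcFc = P[np.ix_(Fc, Fc)]

# Closed form from the exercise.
PF = PFF + PFFc @ np.linalg.inv(np.eye(len(Fc)) - PFcFc) @ PFcF

# Same thing as a (truncated) sum over excursion lengths through F^c:
# (I - P_{F^cF^c})^{-1} = sum_n P_{F^cF^c}^n.
series = sum(np.linalg.matrix_power(PFcFc, n) for n in range(200))
assert np.allclose(PF, PFF + PFFc @ series @ PFcF)
print(PF, PF.sum(axis=1))  # rows sum to 1: P^F is stochastic
```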


Exercise 14 Let X denote a Markov chain with state space E = {0, ..., m} and transition matrix

    P = \begin{pmatrix}
          p_{00} & p_{01} &        &           &        \\
          p_{10} & p_{11} & p_{12} &           &        \\
                 & p_{21} & p_{22} & p_{23}    &        \\
                 &        & \ddots & \ddots    & \ddots \\
                 &        &        & p_{m,m-1} & p_{mm}
        \end{pmatrix}

where p_{ij} > 0 for |i - j| = 1. Show that the stationary distribution π of X is uniquely determined by

    π_n = π_0 · \prod_{i=1}^n \frac{p_{i-1,i}}{p_{i,i-1}}  and  π_0 = \left( \sum_{j=0}^m \prod_{i=1}^j \frac{p_{i-1,i}}{p_{i,i-1}} \right)^{-1}

for all n = 1, ..., m.
Use this result to determine the stationary distribution of the Bernoulli–Laplace diffusion model with b = m (see exercise 3).
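The product formula above is easy to evaluate. A small sketch (illustrative; the birth and death probabilities are arbitrary placeholders) computing π for a birth–death chain on {0, ..., m}:

```python
import numpy as np

def birth_death_stationary(up, down):
    """Stationary distribution of a birth-death chain on {0, ..., m}:
    up[i] = p_{i,i+1} for i = 0..m-1 and down[i-1] = p_{i,i-1} for i = 1..m.
    Implements pi_n = pi_0 * prod_{i=1}^n p_{i-1,i}/p_{i,i-1}, normalized."""
    m = len(up)
    w = np.ones(m + 1)
    for n in range(1, m + 1):
        w[n] = w[n - 1] * up[n - 1] / down[n - 1]
    return w / w.sum()

# Example: m = 4, constant birth prob. 0.3 and death prob. 0.5.
pi = birth_death_stationary(up=[0.3] * 4, down=[0.5] * 4)
print(pi, pi.sum())
```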

Exercise 15 Show that the second condition in theorem 20 can be substituted by the condition

    \sum_{j∈E} p_{ij} h(j) ≤ h(i) - 1  for all i ∈ E \ F.

Exercise 16 Show the following complement to theorem 20: Let P denote the transition matrix of a positive recurrent Markov chain with discrete state space E. Then there is a function h : E → R and a finite subset F ⊂ E such that

    \sum_{j∈E} p_{ij} h(j) < ∞  for all i ∈ F, and
    \sum_{j∈E} p_{ij} h(j) ≤ h(i) - 1  for all i ∈ E \ F.

Hint: Consider the conditional expectation of the remaining time until returning to a fixed set F of states.

Exercise 17 For the discrete, non-negative random walk with transition matrix

    P = \begin{pmatrix}
          p_{00} & p_{01} &        &        \\
          p_{10} & 0      & p_{12} &        \\
                 & p_{10} & 0      & p_{12} \\
                 &        & \ddots & \ddots
        \end{pmatrix}

determine the criterion of positive recurrence according to theorem 20.

6 Extension Theorems

Throughout this book, our basic stochastic tools are either sequences of random variables (such as Markov chains or Markov renewal chains) or even uncountable families of random variables (such as Markov processes, renewal processes, or semi-regenerative processes). It is essential for our models that these random variables are dependent, and in fact we define them in terms of conditional probabilities, i.e. via their dependence structure.

It is then an immediate question whether a probability measure P exists that satisfies all the postulates in the definition of a stochastic sequence or process. This question is vital as it concerns the very existence of the tools we are using.

6.1 Stochastic chains

Let (S, B) denote a measurable space, μ a probability measure on (S, B), and P_n, n ∈ N, stochastic kernels on (S, B). The latter means that for every n ∈ N, P_n : S × B → [0, 1] is a function that satisfies

(K1) For every x ∈ S, P_n(x, ·) is a probability measure on (S, B).

(K2) For every A ∈ B, the function P_n(·, A) is B-measurable.

Define S^∞ as the set of all sequences x = (x_n : n ∈ N_0) with x_n ∈ S for all n ∈ N_0. A subset of S^∞ having the form

    C_{n_1,...,n_k}(A) = {x ∈ S^∞ : (x_{n_1}, ..., x_{n_k}) ∈ A}

with k ∈ N, n_1 < ... < n_k ∈ N_0, and A ∈ B^k, is called a cylinder (with coordinates n_1, ..., n_k and base A). The set C of all cylinders in S^∞ forms an algebra of sets. Define B^∞ := σ(C) as the minimal σ-algebra containing C.

Now we can state the extension theorem for sequences of random variables, which is proven in Gikhman and Skorokhod [4], section II.4.

Theorem 21 There is a probability measure P on (S^∞, B^∞) satisfying

    P(C_{0,...,k}(A_0 × ... × A_k)) = \int_{A_0} dμ(x_0) \int_{A_1} P_1(x_0, dx_1) ...
        \int_{A_{k-1}} P_{k-1}(x_{k-2}, dx_{k-1}) P_k(x_{k-1}, A_k)    (10)

for all k ∈ N_0 and A_0, ..., A_k ∈ B. The measure P is uniquely determined by the system (10) of equations.

The first part of the theorem above justifies our definitions of Markov chains and Markov renewal chains. The second part states in particular that a Markov chain is uniquely determined by its initial distribution and its transition matrix.

Based on this result, we may define a stochastic chain with state space S as a sequence (X_n : n ∈ N_0) of S-valued random variables which are distributed according to a probability measure P on (S^∞, B^∞).

7 Conditional Expectations and Probabilities

Let (Ω, A, P) denote a probability space and (S, B) a measurable space. A random variable is a measurable mapping X : Ω → S, which means that X^{-1}(B) ∈ A for all B ∈ B. In other words, X is a random variable if and only if X^{-1}(B) ⊂ A. In stochastic models, a random variable usually gives information on a certain phenomenon, e.g. the number of users in a queue at some specific time.

Consider any real-valued random variable X : (Ω, A) → (R, B), B denoting the Borel σ-algebra on R, which is integrable or non-negative. While the random variable X itself yields the full information, a rather small piece of information on X is given by its expectation

    E(X) := \int_Ω X dP

The conditional expectation is a concept that yields a degree of information which lies between the full information X and its expectation E(X).

To motivate the definition, we first observe that the distribution P^X = P ◦ X^{-1} of X is a measure on the sub-σ-algebra X^{-1}(B) of A, i.e. in order to compute

    P(X ∈ B) = P^X(B) = \int_{X^{-1}(B)} dP

we need to evaluate the measure P on sets

    A := X^{-1}(B) ∈ X^{-1}(B) ⊂ A

On the other hand, the expectation E(X) is an evaluation of P on the set Ω = X^{-1}(S) only. Thus we can say that the expectation employs P only on the trivial σ-algebra {∅, Ω}, while X itself employs P on the σ-algebra X^{-1}(B) generated by X.

Now we take any sub-σ-algebra C ⊂ A. According to the Radon–Nikodym theorem there is a random variable X_0 : Ω → S with X_0^{-1}(B) ⊂ C and

    \int_C X_0 dP = \int_C X dP    (11)

for all C ∈ C. This we call the conditional expectation of X under C and write

    E(X | C) := X_0

A conditional expectation is P-almost certainly uniquely determined by (11). Typical special cases and examples are

Example 9 For C = {∅, Ω}, the conditional expectation equals the expectation, i.e. E(X | C) = E(X). For any σ-algebra C with X^{-1}(B) ⊂ C we obtain E(X | C) = X.

Example 10 Let I denote any index set and (Y_i : i ∈ I) a family of random variables. For the σ-algebra C = σ(⋃_{i∈I} Y_i^{-1}(B)) generated by (Y_i : i ∈ I), we write

    E(X | Y_i : i ∈ I) := E(X | C)

By definition we obtain for a σ-algebra C ⊂ A, random variables X and Y, and real numbers α and β

    E(αX + βY | C) = αE(X | C) + βE(Y | C)

For σ-algebras C_1 ⊂ C_2 ⊂ A we obtain

    E(E(X | C_2) | C_1) = E(E(X | C_1) | C_2) = E(X | C_1)    (12)

Let C_1 and C_2 denote sub-σ-algebras of A, C := σ(C_1 ∪ C_2), and X an integrable random variable. If σ(X^{-1}(B) ∪ C_1) and C_2 are independent, then

    E(X | C) = E(X | C_1)

If X and Y are integrable random variables and X^{-1}(B) ⊂ C, then

    E(XY | C) = X · E(Y | C)    (13)


Conditional probabilities are special cases of conditional expectations. Define the indicator function of a measurable set A ∈ A by

    1_A(x) := \begin{cases} 1, & x ∈ A \\ 0, & x ∉ A \end{cases}

Such a function is a random variable, since

    1_A^{-1}(B) = {∅, A, A^c, Ω} ⊂ A

with A^c := Ω \ A denoting the complement of the set A. Let C denote a sub-σ-algebra of A. The conditional expectation of 1_A is called the conditional probability of A. We write

    P(A | C) := E(1_A | C)

Immediate properties of conditional probabilities are

    0 ≤ P(A | C) ≤ 1,  P(∅ | C) = 0,  P(Ω | C) = 1
    A_1 ⊂ A_2 ⇒ P(A_1 | C) ≤ P(A_2 | C)

all of which hold P-almost certainly. For a sequence (A_n : n ∈ N) of disjoint measurable sets, i.e. A_n ∈ A for all n ∈ N and A_i ∩ A_j = ∅ for i ≠ j, we obtain

    P\left( \bigcup_{n=1}^∞ A_n \,\middle|\, C \right) = \sum_{n=1}^∞ P(A_n | C)

P-almost certainly. Let X : (Ω, A) → (R, B) denote a non-negative or integrable random variable and Y : (Ω, A) → (Ω', A') a random variable. Then there is a measurable function g : (Ω', A') → (R, B) with

    E(X | Y) = g ◦ Y

This is P^Y-almost certainly determined by

    \int_{A'} g dP^Y = \int_{Y^{-1}(A')} X dP

for all A' ∈ A'. Then we can define the conditional expectation of X given Y = y as g(y). We write

    E(X | Y = y) := g(y)

for all y ∈ Ω'.


References

[1] L. Breiman. Probability. Philadelphia, PA: SIAM, 1968.

[2] E. Çinlar. Introduction to Stochastic Processes. Englewood Cliffs, NJ: Prentice-Hall, 1975.

[3] W. Feller. An Introduction to Probability Theory and Its Applications. Vol. I. New York: John Wiley and Sons, 1950.

[4] I. Gikhman and A. Skorokhod. The Theory of Stochastic Processes I. Berlin: Springer, 1974.

[5] S. Karlin and H. M. Taylor. A First Course in Stochastic Processes. 2nd ed. New York: Academic Press, 1975.

[6] S. Meyn and R. Tweedie. Markov Chains and Stochastic Stability. Berlin: Springer-Verlag, 1993.

[7] S. Ross. Stochastic Processes. New York: John Wiley & Sons, 1983.
