beamer - Vrije Universiteit Amsterdam
Stochastic Operations Research
Lecture 4: Discrete-time Markov Chains (Part II)
(Chapter 3)
A.A.N. Ridder
Department EOR
Vrije Universiteit Amsterdam
Homepage: http://personal.vu.nl/a.a.n.ridder/sor/default.htm
21 November 2012
c⃝ Ad Ridder (VU) SOR– Fall 2012 1 / 36
Topics
1. §3.3 Equilibrium Probabilities
2. §3.3.3 Markov Reward Theorem
3. §3.4.1 Computation by an Iterative Method
4. §3.5 Theoretical Considerations
c⃝ Ad Ridder (VU) SOR– Fall 2012 2 / 36
Example
\[
P = \begin{pmatrix}
0.6 & 0.4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0.3 & 0.2 & 0 & 0.5 & 0 & 0 & 0 \\
0 & 0.1 & 0.1 & 0.4 & 0 & 0.2 & 0.2 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0.5 & 0.5 \\
0 & 0 & 0 & 0 & 0 & 0.3 & 0.7 & 0 & 0
\end{pmatrix}
\]
Transient states $T = \{1, 2, 3\}$; irreducible recurrent set $R_1 = \{4, 5\}$ with period 2; irreducible aperiodic recurrent set $R_2 = \{6, 7, 8, 9\}$ (see picture next page).
c⃝ Ad Ridder (VU) SOR– Fall 2012 3 / 36
State-transition Diagram of Example
[Figure: state-transition diagram of the nine-state example chain; the arcs carry the transition probabilities of $P$.]
c⃝ Ad Ridder (VU) SOR– Fall 2012 4 / 36
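Recap
▶ State $j$ transient
\[
\begin{aligned}
&\Leftrightarrow\ f_{jj} = \mathbb{P}(\text{ever returning to } j \mid X_0 = j) < 1 \\
&\Leftrightarrow\ \sum_{n=1}^{\infty} p^{(n)}_{jj} = \mathbb{E}[\text{number of visits to } j \mid X_0 = j] < \infty \\
&\Rightarrow\ \sum_{n=1}^{\infty} p^{(n)}_{ij} = \mathbb{E}[\text{number of visits to } j \mid X_0 = i] < \infty \quad (\text{for all } i) \\
&\Rightarrow\ \lim_{n\to\infty} p^{(n)}_{ij} = 0 \quad (\text{for all } i)
\end{aligned}
\]
▶ State $j$ recurrent
\[
\begin{aligned}
&\Leftrightarrow\ f_{jj} = \mathbb{P}(\text{ever returning to } j \mid X_0 = j) = 1 \\
&\Leftrightarrow\ \sum_{n=1}^{\infty} p^{(n)}_{jj} = \mathbb{E}[\text{number of visits to } j \mid X_0 = j] = \infty
\end{aligned}
\]
c⃝ Ad Ridder (VU) SOR– Fall 2012 5 / 36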
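The series criterion in the recap can be checked numerically on the example of slide 3. The sketch below is an illustration only, not part of the lecture material: the function names and the horizon of 200 steps are assumptions made here. It accumulates the partial sums of $p^{(n)}_{jj}$ for the transient state 1 and for the recurrent state 8.

```python
# Transition matrix of the nine-state example (state s is stored at index s - 1).
P = [
    [0.6, 0.4, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0.3, 0.2, 0, 0.5, 0, 0, 0],
    [0, 0.1, 0.1, 0.4, 0, 0.2, 0.2, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0.5, 0.5],
    [0, 0, 0, 0, 0, 0.3, 0.7, 0, 0],
]


def step(dist, p):
    """One transition: row vector dist multiplied by the matrix p."""
    n = len(p)
    return [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]


def expected_visits(p, j, horizon):
    """Partial sum of p_jj^(n) for n = 1..horizon (hypothetical helper)."""
    dist = [0.0] * len(p)
    dist[j] = 1.0
    total = 0.0
    for _ in range(horizon):
        dist = step(dist, p)
        total += dist[j]
    return total


visits_state_1 = expected_visits(P, 0, 200)  # transient state: the sum stays bounded
visits_state_8 = expected_visits(P, 7, 200)  # recurrent state: the sum grows with the horizon
print(visits_state_1, visits_state_8)
```

State 1 can only return to itself through its self-loop, so $p^{(n)}_{11} = 0.6^n$ and the sum converges to $0.6/0.4 = 1.5$, whereas the partial sums for state 8 keep growing roughly linearly in the horizon.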
Recurrent States
Theorem 3.5.3(a)
Suppose that $R \subset I$ is an irreducible recurrent set of states; then $f_{ij} = 1$ for all $i, j \in R$.
Proof:
1. According to Lemma 3.5.2, $i$ and $j$ communicate; i.e., $\{m : p^{(m)}_{ji} > 0\} \neq \emptyset$;
2. Let $r = \min\{m : p^{(m)}_{ji} > 0\}$. Then
\[
0 = 1 - f_{jj} = \mathbb{P}\Big(\bigcap_{n=1}^{\infty} \{X_n \neq j\} \,\Big|\, X_0 = j\Big)
\geq p^{(r)}_{ji}\, \mathbb{P}\Big(\bigcap_{n=1}^{\infty} \{X_n \neq j\} \,\Big|\, X_0 = i\Big) = p^{(r)}_{ji} (1 - f_{ij}).
\]
Since $p^{(r)}_{ji} > 0$, it follows that $f_{ij} = 1$.
c⃝ Ad Ridder (VU) SOR– Fall 2012 6 / 36
Proper First-Passage Times
Corollary
$f_{ij} = 1$ if either
(i). $i, j \in R$, the same recurrent irreducible set;
(ii). $i \in T$ transient, $j \in R$ a recurrent irreducible set, and $f_{iR} = 1$.
Theorem 3.5.7(a)
Suppose that the unichain condition holds (see lecture 3) with $|T| < \infty$; then $f_{ij} = 1$ for all $i \in I$ and $j \in R$.
Proof for $i \in T$:
\[
\begin{aligned}
1 - f_{iR} &= \mathbb{P}(\text{chain never reaches } R \mid X_0 = i) \\
&= \mathbb{P}\Big(\bigcap_{k=1}^{\infty} \{X_k \in T\} \,\Big|\, X_0 = i\Big) = \lim_{n\to\infty} \mathbb{P}\Big(\bigcap_{k=1}^{n} \{X_k \in T\} \,\Big|\, X_0 = i\Big) \\
&= \lim_{n\to\infty} \mathbb{P}(X_n \in T \mid X_0 = i) = \lim_{n\to\infty} \sum_{j\in T} p^{(n)}_{ij}
\overset{\text{finite sum}}{=} \sum_{j\in T} \lim_{n\to\infty} p^{(n)}_{ij} = 0.
\end{aligned}
\]
c⃝ Ad Ridder (VU) SOR– Fall 2012 7 / 36
Compute Powers of P
Example slide 3:
\[
P^{128} = \begin{pmatrix}
0 & 0 & 0 & 0.184 & 0.161 & 0.049 & 0.115 & 0.328 & 0.164 \\
0 & 0 & 0 & 0.126 & 0.219 & 0.049 & 0.115 & 0.328 & 0.164 \\
0 & 0 & 0 & 0.064 & 0.419 & 0.039 & 0.091 & 0.259 & 0.129 \\
0 & 0 & 0 & 1.000 & 0.000 & 0.000 & 0.000 & 0.000 & 0.000 \\
0 & 0 & 0 & 0.000 & 1.000 & 0.000 & 0.000 & 0.000 & 0.000 \\
0 & 0 & 0 & 0.000 & 0.000 & 0.075 & 0.175 & 0.500 & 0.250 \\
0 & 0 & 0 & 0.000 & 0.000 & 0.075 & 0.175 & 0.500 & 0.250 \\
0 & 0 & 0 & 0.000 & 0.000 & 0.075 & 0.175 & 0.500 & 0.250 \\
0 & 0 & 0 & 0.000 & 0.000 & 0.075 & 0.175 & 0.500 & 0.250
\end{pmatrix}
\]
c⃝ Ad Ridder (VU) SOR– Fall 2012 8 / 36
Compute Powers of P
\[
P^{129} = \begin{pmatrix}
0 & 0 & 0 & 0.161 & 0.184 & 0.049 & 0.115 & 0.328 & 0.164 \\
0 & 0 & 0 & 0.219 & 0.126 & 0.049 & 0.115 & 0.328 & 0.164 \\
0 & 0 & 0 & 0.419 & 0.064 & 0.039 & 0.091 & 0.259 & 0.129 \\
0 & 0 & 0 & 0.000 & 1.000 & 0.000 & 0.000 & 0.000 & 0.000 \\
0 & 0 & 0 & 1.000 & 0.000 & 0.000 & 0.000 & 0.000 & 0.000 \\
0 & 0 & 0 & 0.000 & 0.000 & 0.075 & 0.175 & 0.500 & 0.250 \\
0 & 0 & 0 & 0.000 & 0.000 & 0.075 & 0.175 & 0.500 & 0.250 \\
0 & 0 & 0 & 0.000 & 0.000 & 0.075 & 0.175 & 0.500 & 0.250 \\
0 & 0 & 0 & 0.000 & 0.000 & 0.075 & 0.175 & 0.500 & 0.250
\end{pmatrix}
\]
c⃝ Ad Ridder (VU) SOR– Fall 2012 9 / 36
Compute Powers of P
\[
P^{130} = \begin{pmatrix}
0 & 0 & 0 & 0.184 & 0.161 & 0.049 & 0.115 & 0.328 & 0.164 \\
0 & 0 & 0 & 0.126 & 0.219 & 0.049 & 0.115 & 0.328 & 0.164 \\
0 & 0 & 0 & 0.064 & 0.419 & 0.039 & 0.091 & 0.259 & 0.129 \\
0 & 0 & 0 & 1.000 & 0.000 & 0.000 & 0.000 & 0.000 & 0.000 \\
0 & 0 & 0 & 0.000 & 1.000 & 0.000 & 0.000 & 0.000 & 0.000 \\
0 & 0 & 0 & 0.000 & 0.000 & 0.075 & 0.175 & 0.500 & 0.250 \\
0 & 0 & 0 & 0.000 & 0.000 & 0.075 & 0.175 & 0.500 & 0.250 \\
0 & 0 & 0 & 0.000 & 0.000 & 0.075 & 0.175 & 0.500 & 0.250 \\
0 & 0 & 0 & 0.000 & 0.000 & 0.075 & 0.175 & 0.500 & 0.250
\end{pmatrix}
\]
c⃝ Ad Ridder (VU) SOR– Fall 2012 10 / 36
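The powers shown on slides 8 to 10 can be reproduced with a few lines of code. The following is a minimal sketch using repeated squaring in plain Python; the helper names `mat_mul` and `mat_pow` are assumptions of this illustration and do not come from the lecture.

```python
# Transition matrix of the nine-state example (state s is stored at index s - 1).
P = [
    [0.6, 0.4, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0.3, 0.2, 0, 0.5, 0, 0, 0],
    [0, 0.1, 0.1, 0.4, 0, 0.2, 0.2, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0.5, 0.5],
    [0, 0, 0, 0, 0, 0.3, 0.7, 0, 0],
]


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, n):
    """p raised to the power n (n >= 0) by repeated squaring."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    base = p
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result


P128 = mat_pow(P, 128)
P129 = mat_pow(P, 129)
for row in P128:
    print(" ".join(f"{x:.3f}" for x in row))
```

Comparing `P128` and `P129` makes the period-2 behaviour of $R_1 = \{4, 5\}$ visible: columns 4 and 5 swap from one power to the next, while the columns of the aperiodic class $R_2$ have already settled.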
Observations
We might conclude
1. $\lim_{n\to\infty} p^{(n)}_{ij} = 0$ for transient $j$ (and all $i$);
2. $\lim_{n\to\infty} p^{(n)}_{ij} = \pi_j$ for $i, j$ in the same aperiodic irreducible recurrent subset $R$;
3. $\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} p^{(k)}_{ij} = \pi_j$ for $i, j$ in the same irreducible recurrent subset $R$;
4. Where $\sum_{j\in R} \pi_j = 1$;
5. $\lim_{n\to\infty} p^{(n)}_{ij} = f_{iR}\,\pi_j$ for $j \in R$ an aperiodic irreducible recurrent subset, and $i \notin R$ (for instance transient);
6. $\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} p^{(k)}_{ij} = f_{iR}\,\pi_j$ for $j \in R$ an irreducible recurrent subset, and $i \notin R$ (for instance transient).
Property 1 is known from the previous lecture; proofs (partly) of 2-6 are on the following slides.
c⃝ Ad Ridder (VU) SOR– Fall 2012 11 / 36
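Observation 5 can be checked against the first row of $P^{128}$. The sketch below is an illustration under stated assumptions: it computes $f_{iR_2}$ for the transient states by the fixed-point iteration $h \leftarrow r + Qh$ (a technique chosen here, not taken from the slides), with $Q$ the transitions inside $T$ and $r$ the one-step probabilities of jumping from $T$ into $R_2$, and takes the values $\pi_j$ from rows 6 to 9 of $P^{128}$.

```python
# Transient states T = {1, 2, 3} stored at indices 0..2.
Q = [
    [0.6, 0.4, 0.0],  # transitions that stay inside T
    [0.0, 0.0, 0.3],
    [0.0, 0.1, 0.1],
]
r = [0.0, 0.5, 0.4]  # one-step probabilities of entering R2 = {6, 7, 8, 9}
pi_R2 = [0.075, 0.175, 0.5, 0.25]  # limits read off rows 6-9 of P^128


def absorption_probabilities(q, r, sweeps=200):
    """Fixed-point iteration h <- r + Q h for h_i = f_{i,R2} (hypothetical helper)."""
    h = [0.0] * len(r)
    for _ in range(sweeps):
        h = [r[i] + sum(q[i][j] * h[j] for j in range(len(h)))
             for i in range(len(h))]
    return h


f_R2 = absorption_probabilities(Q, r)
limits_from_state_1 = [f_R2[0] * p for p in pi_R2]
print([round(x, 3) for x in limits_from_state_1])
```

Solving the two equations by hand gives $f_{1R_2} = 19/29 \approx 0.655$, and the products $f_{1R_2}\pi_j$ agree with the last four entries in row 1 of $P^{128}$ up to rounding.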
Empirical State Averages
Recall the mean return time $\mu_{jj} = \mathbb{E}[\tau_{jj}] = \mathbb{E}[\inf\{n \geq 1 : X_n = j\} \mid X_0 = j]$.
Equation (3.3.4)
For recurrent states $j \in I$
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \mathbf{1}\{X_k = j \mid X_0 = j\} = \frac{1}{\mu_{jj}} \quad \text{(with probability 1)}.
\]
Proof: let $0 = S_0 < S_1 < S_2 < \cdots$ be the consecutive returns to $j$. Due to the Markov property, the interreturn periods $T_n = S_n - S_{n-1}$ are IID, distributed as $\tau_{jj}$. Due to recurrence, $\tau_{jj}$ is a proper random variable. Apply the SLLN (or Lemma 2.2.2):
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \mathbf{1}\{X_k = j \mid X_0 = j\} = \lim_{n\to\infty} \frac{n}{S_n} = \lim_{n\to\infty} \frac{n}{\sum_{k=1}^{n} T_k} = \frac{1}{\mu_{jj}}.
\]
This also holds for null-recurrent states, for which $\mu_{jj} = \infty$!
c⃝ Ad Ridder (VU) SOR– Fall 2012 12 / 36
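Equation (3.3.4) is a statement about a single sample path, so it can be illustrated by simulation. The sketch below simulates the recurrent class $R_2$ of the slide-3 example started in state 8; the seed, the run length of 100,000 steps and the function name are arbitrary choices made for this illustration.

```python
import random

# Recurrent class R2 = {6, 7, 8, 9} of the example, treated as a chain of its own.
STATES = [6, 7, 8, 9]
ROWS = {
    6: [0.0, 0.0, 1.0, 0.0],
    7: [0.0, 0.0, 1.0, 0.0],
    8: [0.0, 0.0, 0.5, 0.5],
    9: [0.3, 0.7, 0.0, 0.0],
}


def simulate_visits(start, target, steps, seed=2012):
    """Count the visits to `target` during `steps` transitions from `start`."""
    rng = random.Random(seed)
    state, visits = start, 0
    for _ in range(steps):
        state = rng.choices(STATES, weights=ROWS[state])[0]
        if state == target:
            visits += 1
    return visits


n = 100_000
visits = simulate_visits(8, 8, n)
fraction = visits / n      # empirical state average, should be near 1 / mu_88
mean_return = n / visits   # corresponding estimate of mu_88
print(fraction, mean_return)
```

For this class the limiting value for state 8 is 0.5 (see the rows of $P^{128}$ on slide 8), so the empirical fraction should be close to 0.5 and the estimated mean return time close to 2.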
Mean Return Times and Probabilities
Definition
For all states $j \in I$:
\[
\pi_j = \frac{1}{\mu_{jj}}
\]
Note that
▶ For transient and null-recurrent states $j$: $\pi_j = 0$;
▶ For positive recurrent states $j$:
\[
\mu_{jj} = 1 + \sum_{i \neq j} p_{ji}\, \mathbb{E}[\inf\{n : X_n = j\} \mid X_0 = i] \geq 1 \ \Rightarrow\ 0 < \pi_j \leq 1
\]
▶ We will see that for irreducible positive recurrent sets $R$, $(\pi_j)_{j\in R}$ forms a probability mass function; i.e., $\sum_{j\in R} \pi_j = 1$.
c⃝ Ad Ridder (VU) SOR– Fall 2012 13 / 36
Probabilistic Averages I
Theorem 3.3.1 (first part)
For all states $j \in I$
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} p^{(k)}_{jj} = \pi_j
\]
Proof for recurrent $j$: apply bounded convergence (see book p. 439)
\[
\begin{aligned}
\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} p^{(k)}_{jj}
&= \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \mathbb{P}(X_k = j \mid X_0 = j)
= \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}\big[\mathbf{1}\{X_k = j \mid X_0 = j\}\big] \\
&= \lim_{n\to\infty} \mathbb{E}\Big[\frac{1}{n} \sum_{k=1}^{n} \mathbf{1}\{X_k = j \mid X_0 = j\}\Big]
\overset{!}{=} \mathbb{E}\Big[\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \mathbf{1}\{X_k = j \mid X_0 = j\}\Big] = \pi_j,
\end{aligned}
\]
where the last equality uses Equation (3.3.4) and $\pi_j = 1/\mu_{jj}$.
Proof for transient $j$: $\lim_{n\to\infty} p^{(n)}_{jj} = 0 \Rightarrow \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} p^{(k)}_{jj} = 0$.
c⃝ Ad Ridder (VU) SOR– Fall 2012 14 / 36
Probabilistic Averages II
Theorem 3.3.1 (second part)
For all states $i, j \in I$
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} p^{(k)}_{ij} = f_{ij}\,\pi_j
\]
Proof: apply (3.2.12):
\[
\begin{aligned}
\frac{1}{n} \sum_{k=1}^{n} p^{(k)}_{ij}
&= \frac{1}{n} \sum_{k=1}^{n} \sum_{\ell=1}^{k} f^{(\ell)}_{ij}\, p^{(k-\ell)}_{jj}
= \frac{1}{n} \sum_{\ell=1}^{n} f^{(\ell)}_{ij} \sum_{k=\ell}^{n} p^{(k-\ell)}_{jj} \\
&= \sum_{\ell=1}^{n} f^{(\ell)}_{ij} \Big(\frac{n-\ell}{n}\Big) \Big(\frac{1}{n-\ell} \sum_{k=0}^{n-\ell} p^{(k)}_{jj}\Big)
\end{aligned}
\]
Take $n \to \infty$.
c⃝ Ad Ridder (VU) SOR– Fall 2012 15 / 36
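Unlike the powers $P^n$ themselves, the averages $\frac{1}{n}\sum_{k=1}^{n} p^{(k)}_{ij}$ also converge for the periodic class $R_1$. The sketch below computes one row of this average for the slide-3 example by repeated vector-matrix products; the function name and the choice $n = 2000$ are assumptions of this illustration.

```python
# Transition matrix of the nine-state example (state s is stored at index s - 1).
P = [
    [0.6, 0.4, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0.3, 0.2, 0, 0.5, 0, 0, 0],
    [0, 0.1, 0.1, 0.4, 0, 0.2, 0.2, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0.5, 0.5],
    [0, 0, 0, 0, 0, 0.3, 0.7, 0, 0],
]


def cesaro_average(p, start, n):
    """Row `start` of (1/n) * (P + P^2 + ... + P^n)."""
    size = len(p)
    dist = [0.0] * size
    dist[start] = 1.0
    total = [0.0] * size
    for _ in range(n):
        dist = [sum(dist[i] * p[i][j] for i in range(size)) for j in range(size)]
        total = [t + d for t, d in zip(total, dist)]
    return [t / n for t in total]


avg_from_1 = cesaro_average(P, 0, 2000)  # transient start: limit f_ij * pi_j
avg_from_4 = cesaro_average(P, 3, 2000)  # start inside the periodic class R1
print([round(x, 3) for x in avg_from_1])
print([round(x, 3) for x in avg_from_4])
```

Starting in state 4 the average for state 4 is 0.5 although $p^{(n)}_{44}$ alternates between 0 and 1. Starting in state 1 the averages approach $f_{1j}\pi_j$, for example $(10/29)\cdot 0.5 \approx 0.172$ for $j = 4$, which is the mean of the alternating entries 0.184 and 0.161 on slides 8 and 9.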
Probabilistic Averages III
Corollary
Suppose that $R$ is an irreducible set of recurrent states. For all states $i, j \in R$
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} p^{(k)}_{ij} = \pi_j
\]
(i.e., independent of the initial state provided that the initial state is in $R$).
Proof: $f_{ij} = 1$ for all $i, j \in R$ (see slide 7). Apply the previous slide.
c⃝ Ad Ridder (VU) SOR– Fall 2012 16 / 36
Probabilistic Averages IV
Corollary
Suppose that the Markov chain satisfies the unichain condition with a finite set of transient states ($|T| < \infty$). Then for all states $i, j \in I$
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} p^{(k)}_{ij} = \pi_j
\]
(i.e., independent of the initial state).
Proof: $\pi_j = 0$ for transient $j$; and $f_{ij} = 1$ for recurrent $j$ (see slide 7). Then apply slide 15.
c⃝ Ad Ridder (VU) SOR– Fall 2012 17 / 36
Finite Recurrent Sets
Corollary
Suppose that the Markov chain satisfies the unichain condition with a finite set of transient states ($|T| < \infty$) and with a finite irreducible set of recurrent states ($|R| < \infty$). Then
\[
\sum_{j\in I} \pi_j = \sum_{j\in R} \pi_j = 1
\]
Proof: for any $i \in I$ and all $n$ we have $\sum_{j\in I} p^{(n)}_{ij} = 1$. Thus
\[
\sum_{j\in I} \pi_j = \sum_{j\in I} \Big(\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} p^{(k)}_{ij}\Big)
\overset{\text{finite set}}{=} \lim_{n\to\infty} \frac{1}{n} \sum_{j\in I} \sum_{k=1}^{n} p^{(k)}_{ij}
= \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \sum_{j\in I} p^{(k)}_{ij} = 1
\]
Use $\pi_j = 0$ for transient $j$.
Note: all states in $R$ are positive recurrent (see also slide 20); and $(\pi_j)_{j\in R}$ forms a probability mass function on $R$.
c⃝ Ad Ridder (VU) SOR– Fall 2012 18 / 36
Infinite Recurrent Sets
Consider the situation of the previous slide, but with $|R| = \infty$. Without proof:
▶ When $R$ is null-recurrent: $\pi_j = 0$ for all $j \in I$;
▶ When $R$ is positive recurrent: $\sum_{j\in I} \pi_j = \sum_{j\in R} \pi_j = 1$; i.e., $(\pi_j)_{j\in R}$ forms a probability mass function.
c⃝ Ad Ridder (VU) SOR– Fall 2012 19 / 36
Finite Irreducible Sets
Corollary
Finite irreducible sets consist of positive recurrent states only.
Proof: Let $C$ be a finite irreducible set. That is,
▶ For any $i \in C$ it holds that $\sum_{j\in C} p_{ij} = 1$; thus, also for all $n$, $\sum_{j\in C} p^{(n)}_{ij} = 1$;
▶ All states in $C$ communicate;
▶ All states have the same classification.
Suppose the states are transient or null-recurrent; then all $\pi_j = 0$. This gives a contradiction:
\[
0 = \sum_{j\in C} \pi_j = \sum_{j\in C} \Big(\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} p^{(k)}_{ij}\Big)
\overset{\text{finite set}}{=} \lim_{n\to\infty} \frac{1}{n} \sum_{j\in C} \sum_{k=1}^{n} p^{(k)}_{ij}
= \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \sum_{j\in C} p^{(k)}_{ij} = 1
\]
c⃝ Ad Ridder (VU) SOR– Fall 2012 20 / 36
Infinite Transient or Nonrecurrent Sets
▶ Infinite irreducible sets can be transient or null-recurrent.
▶ Example of a Random Walk.
▶ $I = \{0, 1, \ldots\}$, $p \in (0, 1)$, $q = 1 - p$.
\[
P = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & \cdots \\
q & 0 & p & 0 & 0 & \cdots \\
0 & q & 0 & p & 0 & \cdots \\
0 & 0 & q & 0 & p & \cdots \\
\vdots & \vdots & \vdots & \ddots & \ddots & \ddots
\end{pmatrix}
\]
▶ If $p > q$ the chain is transient; if $q = p = 0.5$ the chain is null-recurrent; if $p < q$ the chain is positive recurrent.
c⃝ Ad Ridder (VU) SOR– Fall 2012 21 / 36
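For the positive recurrent case $p < q$ the equilibrium probabilities of this random walk can be written down explicitly. The sketch below is an illustration derived here rather than taken from the slides: equating the probability flow across each cut between $i$ and $i+1$ gives $\pi_0 = q\,\pi_1$ and $p\,\pi_i = q\,\pi_{i+1}$ for $i \geq 1$, and normalisation then yields $\pi_0 = (q-p)/(2q)$. The code checks the balance equations and the total mass on a truncation to 200 states (the value $p = 0.3$ and the truncation level are arbitrary).

```python
def random_walk_equilibrium(p, size):
    """First `size` equilibrium probabilities of the reflected random walk, p < q."""
    q = 1.0 - p
    if not p < q:
        raise ValueError("positive recurrence needs p < q")
    pi0 = (q - p) / (2.0 * q)   # normalising constant derived from the cut equations
    pi = [pi0, pi0 / q]         # pi_0 * 1 = q * pi_1
    while len(pi) < size:
        pi.append(pi[-1] * p / q)  # p * pi_i = q * pi_{i+1} for i >= 1
    return pi[:size]


p = 0.3
q = 1.0 - p
pi = random_walk_equilibrium(p, 200)


def inflow(j):
    """Sum over i of pi_i * p_ij, valid for 0 <= j < len(pi) - 1."""
    if j == 0:
        return q * pi[1]
    if j == 1:
        return pi[0] * 1.0 + q * pi[2]
    return p * pi[j - 1] + q * pi[j + 1]


max_residual = max(abs(inflow(j) - pi[j]) for j in range(199))
total_mass = sum(pi)
print(pi[:4], max_residual, total_mass)
```

The geometric factor $p/q$ is smaller than 1 exactly when $p < q$, which is why the probabilities are summable only in the positive recurrent case.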
Equilibrium
Suppose that the Markov chain satisfies the unichain condition with a finite set of transient states ($|T| < \infty$). For all states $j \in I$ (and arbitrary $r \in I$):
\[
\begin{aligned}
(\pi P)_j &= \sum_{i\in I} \pi_i p_{ij} = \sum_{i\in I} \Big(\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} p^{(k)}_{ri}\Big) p_{ij} \\
&\overset{!}{=} \lim_{n\to\infty} \sum_{i\in I} \frac{1}{n} \sum_{k=1}^{n} p^{(k)}_{ri}\, p_{ij}
= \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \sum_{i\in I} p^{(k)}_{ri}\, p_{ij} \\
&= \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} p^{(k+1)}_{rj}
= \lim_{n\to\infty} \frac{1}{n} \sum_{k=2}^{n+1} p^{(k)}_{rj}
= \lim_{n\to\infty} \frac{1}{n} \Big(\sum_{k=1}^{n+1} p^{(k)}_{rj} - p_{rj}\Big) \\
&= \lim_{n\to\infty} \Big(\frac{n+1}{n} \cdot \frac{1}{n+1} \sum_{k=1}^{n+1} p^{(k)}_{rj} - \frac{1}{n}\, p_{rj}\Big) = \pi_j
\end{aligned}
\]
c⃝ Ad Ridder (VU) SOR– Fall 2012 22 / 36
Equilibrium Distribution
Definition 3.3.2
A probability distribution $(\pi_j)_{j\in I}$ is an equilibrium distribution for the Markov chain if
\[
\pi_j = \sum_{i\in I} \pi_i p_{ij} \quad (j \in I)
\]
Corollary
Let $\pi$ be an equilibrium distribution for the Markov chain $\{X_n, n = 0, 1, \ldots\}$ and suppose that the chain starts in equilibrium; then it remains in equilibrium; i.e.,
\[
\mathbb{P}(X_0 = j) = \pi_j \ \ \forall j \in I \quad \Rightarrow \quad \mathbb{P}(X_n = j) = \pi_j \ \ \forall j \in I,\ \forall n = 1, 2, \ldots
\]
c⃝ Ad Ridder (VU) SOR– Fall 2012 23 / 36
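The corollary can be verified directly on the recurrent class $R_2 = \{6, 7, 8, 9\}$ of the slide-3 example, whose equilibrium probabilities appear as the rows of $P^{128}$ on slide 8. The sketch below is a small illustration with names chosen here: it starts the chain in $\pi$, propagates the distribution for 50 steps, and records how far it has moved.

```python
# Recurrent class R2 = {6, 7, 8, 9} of the example as a four-state chain.
P_R2 = [
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.3, 0.7, 0.0, 0.0],
]
pi = [0.075, 0.175, 0.5, 0.25]  # values read off rows 6-9 of P^128


def step(dist, p):
    """Distribution of X_{n+1} given the distribution of X_n."""
    n = len(p)
    return [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]


dist = list(pi)           # P(X_0 = j) = pi_j
for _ in range(50):       # distributions of X_1, ..., X_50
    dist = step(dist, P_R2)
max_drift = max(abs(a - b) for a, b in zip(dist, pi))

one_step_from_8 = step([0.0, 0.0, 1.0, 0.0], P_R2)  # a start outside equilibrium does move
print(max_drift, one_step_from_8)
```

Starting from $\pi$ the distribution does not change (up to floating-point rounding), whereas a point mass on state 8 is spread over states 8 and 9 after one step.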
Existence and Uniqueness of Equilibrium Distribution
Theorem 3.3.2 & Theorem 3.5.9
Assume the unichain condition with a finite transient set and a positive recurrent set (cf. Assumption 3.3.1). Then the probabilistic long-run averages $(\pi_j)$ (see slide 15) form the unique equilibrium distribution.
Proof: for finite recurrent sets:
▶ Existence follows from slides 22 and 18;
▶ Uniqueness: suppose that $(x_j)_j$ satisfies $x_j = \sum_{i\in I} x_i p_{ij}$. See page 128 (book) for concluding that $x_j = c\,\pi_j$; i.e., $\pi$ is the only distribution.
Infinite recurrent sets: more involved.
c⃝ Ad Ridder (VU) SOR– Fall 2012 24 / 36
Limiting Probabilities
Equation (3.5.11)
A. For transient states $j$ and any initial state $i \in I$
\[
\lim_{n\to\infty} p^{(n)}_{ij} = 0
\]
B. For recurrent aperiodic states $j$ and any initial state $i \in I$
\[
\lim_{n\to\infty} p^{(n)}_{ij} = f_{ij}\,\pi_j
\]
Proof: see lecture 3 for A. Part B is more advanced and outside the scope of this course.
c⃝ Ad Ridder (VU) SOR– Fall 2012 25 / 36
Unichain Case
Corollary
Suppose that the Markov chain satisfies the unichain condition with a finite set of transient states ($|T| < \infty$) and with an irreducible set $R$ of aperiodic recurrent states. Then for all states $i, j \in I$
\[
\lim_{n\to\infty} p^{(n)}_{ij} = \pi_j
\]
(i.e., independent of the initial state).
Proof: $\pi_j = 0$ for $j \in T$; and $f_{ij} = 1$ for $j \in R$ (see slide 7). Then apply the previous slide.
Note: in case of positive recurrence $\pi$ is a probability distribution ($\sum_j \pi_j = 1$), see slide 24.
c⃝ Ad Ridder (VU) SOR– Fall 2012 26 / 36
Summary
Suppose Assumption 3.3.1 (equivalently, unichain with finite transient set $T$ and positive recurrent set $R$). Suppose aperiodicity of the recurrent states. Then
1. There is a unique equilibrium distribution $\pi = (\pi_j)_{j\in I}$:
\[
\pi = \pi P
\]
2. $\pi_j = 0$ for $j \in T$ and $\pi_j > 0$ for $j \in R$;
3. $\pi$ is the limiting distribution:
\[
\lim_{n\to\infty} p^{(n)}_{ij} = \pi_j
\]
4. $\pi$ is the long-run average distribution:
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} p^{(k)}_{ij} = \pi_j
\]
c⃝ Ad Ridder (VU) SOR– Fall 2012 27 / 36
§3.3.3 Markov Chains with Rewards
▶ Let $f : I \to \mathbb{R}$ be a reward or cost function;
▶ $\sum_{k=1}^{n} f(X_k)$ is the total reward up to time $n$;
▶ $\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} f(X_k)$ is the long-run average reward per unit of time;
▶ We wish to have an ergodic (or Markov-reward) property
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} f(X_k) = \sum_{j\in I} \pi_j f(j) \quad \text{(w.p. 1)}
\]
c⃝ Ad Ridder (VU) SOR– Fall 2012 28 / 36
Finite Unichain Case
Ergodic Theorem Finite Case
Suppose that the Markov chain satisfies the unichain condition with a finite set of transient states ($|T| < \infty$) and with a finite irreducible set of recurrent states ($|R| < \infty$). Then
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} f(X_k) = \sum_{j\in I} \pi_j f(j) \quad \text{(w.p. 1)}
\]
Proof: let $r$ be an arbitrary initial state
\[
\frac{1}{n} \sum_{k=1}^{n} f(X_k) = \frac{1}{n} \sum_{k=1}^{n} \sum_{j\in I} \mathbf{1}\{X_k = j \mid X_0 = r\}\, f(j)
= \sum_{j\in I} \Big(\frac{1}{n} \sum_{k=1}^{n} \mathbf{1}\{X_k = j \mid X_0 = r\}\Big) f(j).
\]
Take $n \to \infty$; interchanging the limit and the finite sum is allowed; apply the empirical state average property (slide 12).
c⃝ Ad Ridder (VU) SOR– Fall 2012 29 / 36
Full Case
Ergodic Theorem 3.3.3 and 3.5.11
Assume
(i). Unichain with finite transient set;
(ii). $\sum_{j\in I} |f(j)|\,\pi_j < \infty$.
Then
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} f(X_k) = \sum_{j\in I} \pi_j f(j) \quad \text{(w.p. 1)}
\]
Proof: see book.
c⃝ Ad Ridder (VU) SOR– Fall 2012 30 / 36
Expected Reward
▶ See Remark 3.3.1 in book;
▶ The result of the previous slide also holds for the expected cost:
\[
\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \mathbb{E}[f(X_k)] = \sum_{j\in I} \pi_j f(j)
\]
▶ Proof: apply dominated convergence.
c⃝ Ad Ridder (VU) SOR– Fall 2012 31 / 36
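The expected-reward version of the ergodic property can be checked numerically without simulation. The sketch below uses the recurrent class $R_2$ of the slide-3 example with its equilibrium probabilities from slide 8; the reward values in `f`, the starting state and the horizon are hypothetical choices made only for this illustration.

```python
# Recurrent class R2 = {6, 7, 8, 9} of the example as a four-state chain.
P_R2 = [
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.3, 0.7, 0.0, 0.0],
]
pi = [0.075, 0.175, 0.5, 0.25]  # equilibrium probabilities read off P^128
f = [1.0, 2.0, 0.0, 4.0]        # hypothetical rewards for states 6, 7, 8, 9


def average_expected_reward(p, rewards, start, n):
    """(1/n) * sum_{k=1..n} E[f(X_k) | X_0 = start], via the k-step distributions."""
    size = len(p)
    dist = [0.0] * size
    dist[start] = 1.0
    total = 0.0
    for _ in range(n):
        dist = [sum(dist[i] * p[i][j] for i in range(size)) for j in range(size)]
        total += sum(d * r for d, r in zip(dist, rewards))
    return total / n


long_run = sum(p_j * f_j for p_j, f_j in zip(pi, f))       # sum_j pi_j f(j)
finite_avg = average_expected_reward(P_R2, f, 2, 5000)      # start in state 8
print(long_run, finite_avg)
```

With these rewards $\sum_j \pi_j f(j) = 0.075 + 0.35 + 0 + 1 = 1.425$, and the finite-horizon average approaches this value at rate $O(1/n)$.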
§3.4.1 Computation of Equilibrium Probabilities
▶ Given $P$ finite, irreducible, thus positive recurrent;
▶ Problem: compute the equilibrium distribution $\pi$;
▶ Linear system
\[
\begin{cases}
\sum_{i\in I} \pi_i p_{ij} = \pi_j & (j \in I) \\
\sum_{j\in I} \pi_j = 1
\end{cases}
\]
▶ In many applications we deal with large state spaces but sparse matrices $P$; it is then efficient to use an iterative method;
▶ Jacobi or Gauss-Seidel.
c⃝ Ad Ridder (VU) SOR– Fall 2012 32 / 36
Rewrite
▶ Rewrite to classic form $Ax = b$;
\[
\pi^{T} P = \pi^{T} \ \Leftrightarrow\ (I - P^{T})\,\pi = 0
\]
▶ Delete a row (why?!) and add $\sum_{j\in I} \pi_j = 1$.
▶ Example
\[
P = \begin{pmatrix}
0.4 & 0.4 & 0.0 & 0.2 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\
0.0 & 0.5 & 0.3 & 0.0 & 0.2 & 0.0 & 0.0 & 0.0 & 0.0 \\
0.5 & 0.0 & 0.4 & 0.0 & 0.0 & 0.1 & 0.0 & 0.0 & 0.0 \\
0.0 & 0.0 & 0.0 & 0.4 & 0.3 & 0.0 & 0.3 & 0.0 & 0.0 \\
0.0 & 0.0 & 0.0 & 0.0 & 0.2 & 0.0 & 0.5 & 0.3 & 0.0 \\
0.0 & 0.0 & 0.0 & 0.5 & 0.0 & 0.1 & 0.0 & 0.0 & 0.4 \\
0.6 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.2 & 0.2 & 0.0 \\
0.0 & 0.5 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.4 & 0.1 \\
0.0 & 0.0 & 0.7 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.3
\end{pmatrix}
\]
c⃝ Ad Ridder (VU) SOR– Fall 2012 33 / 36
Example (cont'd)
\[
A = \begin{pmatrix}
0.6 & 0.0 & -0.5 & 0.0 & 0.0 & 0.0 & -0.6 & 0.0 & 0.0 \\
-0.4 & 0.5 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & -0.5 & 0.0 \\
0.0 & -0.3 & 0.6 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & -0.7 \\
-0.2 & 0.0 & 0.0 & 0.6 & 0.0 & -0.5 & 0.0 & 0.0 & 0.0 \\
0.0 & -0.2 & 0.0 & -0.3 & 0.8 & 0.0 & 0.0 & 0.0 & 0.0 \\
0.0 & 0.0 & -0.1 & 0.0 & 0.0 & 0.9 & 0.0 & 0.0 & 0.0 \\
0.0 & 0.0 & 0.0 & -0.3 & -0.5 & 0.0 & 0.8 & 0.0 & 0.0 \\
0.0 & 0.0 & 0.0 & 0.0 & -0.3 & 0.0 & -0.2 & 0.6 & 0.0 \\
1.0 & 1.0 & 1.0 & 1.0 & 1.0 & 1.0 & 1.0 & 1.0 & 1.0
\end{pmatrix},
\qquad
b = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}
\]
c⃝ Ad Ridder (VU) SOR– Fall 2012 34 / 36
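For a state space this small the system $Ax = b$ can also be solved directly, which gives a reference solution for any iterative method. The sketch below is one possible implementation, not the method of the lecture: it builds $A = I - P^{T}$ from the matrix $P$ of slide 33, replaces the last row by the normalisation equation, and applies Gaussian elimination with partial pivoting (the function name is an assumption of this illustration).

```python
# Transition matrix of the irreducible nine-state example of slide 33.
P = [
    [0.4, 0.4, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.3, 0.0, 0.2, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.4, 0.0, 0.0, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.0, 0.3, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.2, 0.0, 0.5, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.5, 0.0, 0.1, 0.0, 0.0, 0.4],
    [0.6, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.0],
    [0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.4, 0.1],
    [0.0, 0.0, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.3],
]


def equilibrium_by_elimination(p):
    """Solve (I - P^T) x = 0 with the last equation replaced by sum(x) = 1."""
    n = len(p)
    a = [[(1.0 if i == j else 0.0) - p[j][i] for j in range(n)] for i in range(n)]
    a[n - 1] = [1.0] * n
    b = [0.0] * (n - 1) + [1.0]
    m = [row + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry of this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


pi = equilibrium_by_elimination(P)
residual = max(abs(sum(pi[i] * P[i][j] for i in range(9)) - pi[j]) for j in range(9))
print([round(x, 4) for x in pi], residual)
```

One balance equation can be dropped because the rows of $P$ sum to 1, so the nine balance equations are linearly dependent; the residual check confirms that the dropped equation is satisfied as well.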
Gauss-Seidel Method
▶ Construct a sequence of vectors $x^{(0)}, x^{(1)}, \ldots$ by $x^{(0)}_i = 1/|I|$, and for $k = 0, 1, 2, \ldots$:
\[
x^{(k+1)}_i = \Big(b_i - \sum_{j<i} a_{ij}\, x^{(k+1)}_j - \sum_{j>i} a_{ij}\, x^{(k)}_j\Big) \Big/ a_{ii}.
\]
c⃝ Ad Ridder (VU) SOR– Fall 2012 35 / 36
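The update rule translates almost literally into code when the vector is overwritten in place, so that components $j < i$ already hold their new values. The sketch below is a generic illustration, not the lecture's implementation. Gauss-Seidel does not converge for every matrix, so the demonstration uses a small strictly diagonally dominant system that is hypothetical and chosen only because convergence is guaranteed there.

```python
def gauss_seidel(a, b, x0, sweeps):
    """Run `sweeps` Gauss-Seidel sweeps for a x = b, starting from x0."""
    n = len(b)
    x = list(x0)
    for _ in range(sweeps):
        for i in range(n):
            # x[j] for j < i was already updated in this sweep; x[j] for j > i is old.
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / a[i][i]
    return x


# Hypothetical test system with exact solution (1, 2, 3).
A_demo = [
    [4.0, 1.0, 0.0],
    [1.0, 5.0, 2.0],
    [0.0, 2.0, 6.0],
]
b_demo = [6.0, 17.0, 22.0]
x = gauss_seidel(A_demo, b_demo, [1.0 / 3.0] * 3, 100)
print(x)
```

The same function accepts the $9 \times 9$ matrix $A$ and vector $b$ of the example with starting vector $x^{(0)}_i = 1/9$; for that system convergence should be monitored, for instance through the change between sweeps, rather than assumed.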
Exercises
Chapter 3 (pp. 134-138):
3.10, 3.11, 3.12, 3.14, 3.16, 3.17
c⃝ Ad Ridder (VU) SOR– Fall 2012 36 / 36