
CHAPTER 2. CONTROLLED MARKOV CHAINS

2.1.2 Classes of Control Policies

Admissible Control Policies

Let $H_0 := X$ and $H_t = H_{t-1} \times K$ for $t = 1, 2, \ldots$. We let $h_t$ denote an element of $H_t$, where $h_t = \{x_{[0,t]}, u_{[0,t-1]}\}$.

A deterministic admissible control policy $\Pi$ is a sequence of functions $\{\gamma_t\}$ from $H_t$ to $U$; in this case $u_t = \gamma_t(h_t)$. A randomized control policy is a sequence $\Pi = \{\Pi_t, \ t \geq 0\}$ of maps $\Pi_t : H_t \to \mathcal{P}(U)$ (with $\mathcal{P}(U)$ being the space of probability measures on $U$) such that
$$\Pi_t(u_t \in U(x_t) \mid h_t) = 1, \qquad \forall h_t \in H_t.$$
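A history-dependent (admissible) policy can be sketched in code; the state and action sets and the particular rule $\gamma_t$ below are illustrative choices, not from the notes:

```python
# h_t collects the state path x_[0,t] and the past actions u_[0,t-1].
def make_history(states, actions):
    # An element of H_t contains exactly one more state than actions.
    assert len(states) == len(actions) + 1
    return (tuple(states), tuple(actions))

# A deterministic admissible policy is a sequence of maps gamma_t: H_t -> U.
# This illustrative rule uses the full history, not just the current state.
def gamma(history):
    states, actions = history
    return (sum(states) + sum(actions)) % 2  # action set U = {0, 1}

h2 = make_history([0, 1, 1], [1, 0])  # an element of H_2
u2 = gamma(h2)                        # u_2 = gamma_2(h_2)
```

The point of the sketch is only that $\gamma_t$ may consult the entire trajectory, which is what the Markov and stationary classes below restrict.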

Markov Control Policies

A policy is randomized Markov if
$$P^\Pi_{x_0}(u_t \in C \mid h_t) = \Pi_t(u_t \in C \mid x_t), \qquad C \in \mathcal{B}(U).$$
Hence, the control action depends only on the state and the time, and not on the past history. If the control strategy is deterministic, that is, if
$$\Pi_t(u_t = f_t(x_t) \mid x_t) = 1$$
for some function $f_t$, the control policy is said to be deterministic Markov.

Stationary Control Policies

A policy is randomized stationary if
$$P^\Pi_{x_0}(u_t \in C \mid h_t) = \Pi(u_t \in C \mid x_t), \qquad C \in \mathcal{B}(U).$$
Hence, the control action depends only on the state, and not on the past history or the time. If the control strategy is deterministic, that is, if $\Pi(u_t = f(x_t) \mid x_t) = 1$ for some function $f$, the control policy is said to be deterministic stationary.
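The three restricted policy classes can be contrasted in a short sketch; the state/action sets and the randomly generated kernels are placeholders for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2

# Randomized Markov policy: one row-stochastic matrix per time step,
# pi[t][x] is a distribution Pi_t(. | x_t) over actions.
pi = [rng.dirichlet(np.ones(n_actions), size=n_states) for t in range(5)]

def markov_policy(t, x):
    # Samples u_t ~ Pi_t(. | x_t): depends on time t and state x only.
    return rng.choice(n_actions, p=pi[t][x])

# Randomized stationary policy: drops the time index, the same kernel
# Pi(. | x) is used at every t.
pi_stat = rng.dirichlet(np.ones(n_actions), size=n_states)

def stationary_policy(t, x):
    return rng.choice(n_actions, p=pi_stat[x])

# Deterministic stationary policy: a single function f: X -> U,
# i.e. Pi(u_t = f(x_t) | x_t) = 1.
f = rng.integers(n_actions, size=n_states)

def deterministic_stationary_policy(t, x):
    return int(f[x])
```

Note how each class narrows the previous one: the Markov policy reads `(t, x)`, the stationary policies ignore `t`, and the deterministic one collapses the distribution to a point mass.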

2.2 Markov Chain Induced by a Markov Policy

The following is an important result:

Theorem 2.2.1 Let the control policy be randomized Markov. Then, the controlled Markov chain becomes a Markov chain in $X$, that is, the state process itself becomes a Markov chain:
$$P^\Pi_{x_0}(x_{t+1} \in B \mid x_t, x_{t-1}, \ldots, x_0) = Q^{\Pi_t}(x_{t+1} \in B \mid x_t), \qquad B \in \mathcal{B}(X), \ t \geq 1, \ P\text{-a.s.}$$

Pro<strong>of</strong>: Let us consider the case where U is countable. Let B ∈ B(X). It follows that,<br />

Px Π 0<br />

(x t+1 ∈ B|x t , x t−1 , . . . , x 0 )<br />

= ∑ Px Π 0<br />

(x t+1 ∈ B, u t |x t , x t−1 , . . . , x 0 )<br />

u t<br />

= ∑ Px Π 0<br />

(x t+1 ∈ B|u t , x t , x t−1 , . . . , x 0 )Px Π 0<br />

(u t |x t , x t−1 , . . .,x 0 )<br />

u t<br />

= ∑ u t<br />

Q Π x 0<br />

(x t+1 ∈ B|u t , x t )π t (u t |x t )<br />

= ∑ u t<br />

Q Π x 0<br />

(x t+1 ∈ B, u t |x t )<br />

= Q Π t (x t+1 ∈ B|x t ) (2.3)
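The identity $Q^{\Pi_t}(x_{t+1} \in B \mid x_t) = \sum_{u_t} Q(x_{t+1} \in B \mid x_t, u_t)\,\Pi_t(u_t \mid x_t)$ can be checked numerically; the controlled kernel and the policy below are randomly generated placeholders, not a specific model from the notes:

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions = 3, 2

# Controlled transition kernel Q(x' | x, u): shape (actions, states, states),
# with Q[u, x] a probability distribution over the next state.
Q = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))

# Randomized Markov policy at a fixed time t: Pi_t(u | x), shape (states, actions).
Pi_t = rng.dirichlet(np.ones(n_actions), size=n_states)

# Induced kernel of the state process, as in (2.3):
#   Q^{Pi_t}(x' | x) = sum_u Q(x' | x, u) * Pi_t(u | x)
Q_induced = np.einsum('uxy,xu->xy', Q, Pi_t)

# Each row is again a probability distribution, so the state process is
# itself a Markov chain with transition matrix Q_induced.
print(np.allclose(Q_induced.sum(axis=1), 1.0))  # prints True
```

Averaging the controlled kernel over the policy yields a row-stochastic matrix, which is exactly the sense in which a Markov policy induces a Markov chain on $X$.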
