Lecture Notes - Department of Mathematics and Statistics - Queen's ...
Queen's University

Mathematics and Engineering and Mathematics and Statistics

Control of Stochastic Systems

MATH/MTHE 472/872 Lecture Notes

Serdar Yüksel

December 19, 2013
ii<br />
This document is a set <strong>of</strong> supplemental lecture notes that has been used for Math 472/872: Control <strong>of</strong> Stochastic<br />
Systems, at Queen’s University, since 2009.<br />
Serdar Yüksel
Contents

1 Review of Probability
  1.1 Introduction
  1.2 Measurable Space
    1.2.1 Borel σ-field
    1.2.2 Measurable Function
    1.2.3 Measure
    1.2.4 The Extension Theorem
    1.2.5 Integration
    1.2.6 Fatou's Lemma, the Monotone Convergence Theorem and the Dominated Convergence Theorem
  1.3 Probability Space and Random Variables
    1.3.1 Discrete-Time Stochastic Processes
    1.3.2 More on Random Variables and Probability density functions
    1.3.3 Independence and Conditional Probability
  1.4 Markov Chains
  1.5 Exercises

2 Controlled Markov Chains
  2.1 Controlled Markov Models
    2.1.1 Fully Observed Markov Control Problem Model
    2.1.2 Classes of Control Policies
  2.2 Markov Chain Induced by a Markov Policy
  2.3 Performance of Policies
    2.3.1 Partially Observed Model
  2.4 Exercises

3 Classification of Markov Chains
  3.1 Countable State Space Markov Chains
    3.1.1 Recurrence and Transience
  3.2 Stability and Invariant Distributions
    3.2.1 Invariant Measures via an Occupational Characterization
  3.3 Uncountable (Complete, Separable, Metric) State Spaces
    3.3.1 Chapman-Kolmogorov Equations
    3.3.2 Invariant Distributions for Uncountable Spaces
    3.3.3 Existence of an Invariant Distribution
    3.3.4 Dobrushin's Ergodic Coefficient for Uncountable State Space Chains
    3.3.5 Ergodic Theorem for Uncountable State Space Chains
  3.4 Further Results on the Existence of an Invariant Probability Measure
    3.4.1 Case with the Feller condition
    3.4.2 Case without the Feller condition
  3.5 Exercises

4 Martingales and Foster-Lyapunov Criteria for Stabilization of Markov Chains
  4.1 Martingales
    4.1.1 More on Expectations and Conditional Probability
    4.1.2 Some Properties of Conditional Expectation
    4.1.3 Discrete-Time Martingales
    4.1.4 Doob's Optional Sampling Theorem
    4.1.5 An Important Martingale Convergence Theorem
    4.1.6 Proof of the Birkhoff Individual Ergodic Theorem
    4.1.7 This section is optional: Further Martingale Theorems
    4.1.8 Azuma-Hoeffding Inequality for Martingales with Bounded Increments
  4.2 Stability of Markov Chains: Foster-Lyapunov Techniques
    4.2.1 Criterion for Positive Harris Recurrence
    4.2.2 Criterion for Finite Expectations
    4.2.3 Criterion for Recurrence
    4.2.4 On small and petite sets
    4.2.5 Criterion for Transience
    4.2.6 State Dependent Drift Criteria: Deterministic and Random-Time
  4.3 Convergence Rates to Equilibrium
    4.3.1 Lyapunov conditions: Geometric ergodicity
    4.3.2 Subgeometric ergodicity
  4.4 Conclusion
  4.5 Exercises

5 Dynamic Programming
  5.1 Bellman's Principle of Optimality
  5.2 Discussion: Why are Markov Policies Important? Why are optimal policies deterministic for such settings?
  5.3 Existence of Minimizing Selectors and Measurability
  5.4 Infinite Horizon Optimal Control Problems
    5.4.1 Discounted Cost
  5.5 Linear Quadratic Gaussian Problem
  5.6 Exercises

6 Partially Observed Markov Decision Processes
  6.1 Enlargement of the State-Space and the Construction of a Controlled Markov Chain
  6.2 Linear Quadratic Gaussian Case and Kalman Filtering
    6.2.1 Separation of Estimation and Control
  6.3 Estimation and Kalman Filtering
    6.3.1 Optimal Control of Partially Observed LQG Systems
  6.4 Partially Observed Markov Decision Processes in General Spaces
  6.5 Exercises

7 The Average Cost Problem
  7.1 Average Cost and the Average Cost Optimality Equation (ACOE)
  7.2 The Discounted Cost Approach to the Average Cost Problem
    7.2.1 Borel State and Action Spaces [Optional]
  7.3 Linear Programming Approach to Average Cost Markov Decision Problems
  7.4 Discussion for More General State Spaces
    7.4.1 Extreme Points and the Optimality of Deterministic Policies
    7.4.2 Sample-Path Optimality
  7.5 Exercises

8 Team Decision Theory and Information Structures
  8.1 Exercises

A On the Convergence of Random Variables
  A.1 Limit Events and Continuity of Probability Measures
  A.2 Borel-Cantelli Lemma
  A.3 Convergence of Random Variables
    A.3.1 Convergence almost surely (with probability 1)
    A.3.2 Convergence in Probability
    A.3.3 Convergence in Mean-square
    A.3.4 Convergence in Distribution
Chapter 1

Review of Probability

1.1 Introduction
Before discussing controlled Markov chains, we first review some preliminaries from probability theory.

Many events in the physical world are uncertain; that is, given the knowledge available up to a certain time, the entire process is not accurately predictable. Probability theory attempts to develop an understanding of such uncertainty in a consistent way, subject to a number of properties that a measure of uncertainty should satisfy. The predictions of probability theory may not be physically correct, but they are mathematically precise and consistent.
Examples of stochastic processes include: a) the temperature in a city at noon on each day of October 2013 (this process takes values in R^31); b) the sequence of outputs of a communication channel modeled by additive scalar Gaussian noise when the input is given by x = {x_1, ..., x_n} (the output process lives in R^n); c) infinitely many copies of a coin flip process (living in {H, T}^∞); d) the trajectory of a plane flying from point A to point B (taking values in C([t_0, t_f]), the space of continuous paths in R^3 with x_{t_0} = A and x_{t_f} = B); e) the exchange rate between the Canadian dollar and the American dollar over a given time index set T.
Some of these processes take values in countable spaces, and some do not. When the state space in which a random variable takes values is uncountable, further technical intricacies arise. These are best addressed with a precise characterization of probability and random variables.

Hence, probability theory can be used to model uncertainty in the real world in a consistent way, according to some properties that we expect such measures should admit. The theory may not model the physical world accurately, however. In the following, we develop a rigorous definition of probability.
1.2 Measurable Space

Let X be a collection of points. Let F be a collection of subsets of X with the following properties, so that F is a σ-field:

• X ∈ F

• If A ∈ F, then X \ A ∈ F

• If A_k ∈ F, k = 1, 2, 3, ..., then ∪_{k=1}^∞ A_k ∈ F

By De Morgan's laws, F then has to be closed under countable intersections as well.
If the third item above is required to hold only for finitely many unions or intersections, then the collection of subsets is said to be a field.

With the above, (X, F) is termed a measurable space (that is, we can associate a measure with this space, which we will discuss shortly). For example, the full power set of any set is a σ-field.
Recall the following: a set is countable if its elements can be put in one-to-one correspondence with a subset of the integers. Examples of countable spaces are finite sets and the set Q of rational numbers. An example of an uncountable set is the set R of real numbers.
A σ-field J is generated by a collection of sets A if J is the smallest σ-field containing the sets in A; in this case, we write J = σ(A). Such a σ-field always exists.

A σ-field is to be considered as an abstract collection of sets. It is in general not possible to obtain the elements of a σ-field through countably many operations of unions, intersections and complements starting from a generating set.
In this course, we will be interested in σ-fields which are countably generated. Such σ-fields have certain desirable properties related to the existence of conditional probability measures, as well as properties related to the extension theorem and the construction of measures.

Sub-σ-fields can be interpreted as representing information about an underlying process. We will discuss this interpretation further; it will be a recurring theme in our discussions in the context of stochastic control problems.
1.2.1 Borel σ-field

An important class of σ-fields is the Borel σ-field on a metric (or, more generally, topological) space. Such a σ-field is the one generated by the open sets. The term open naturally depends on the space being considered. For this course, we will mainly consider the space of real numbers and complete, separable metric spaces. Recall that in a metric space with metric d, a set U is open if for every x ∈ U there exists some ε > 0 such that {y : d(x, y) < ε} ⊂ U.

The Borel σ-field on R is the one generated by sets of the form (a, b), that is, the one generated by open intervals. We will denote the Borel σ-field on a space X by B(X). We may take a, b to be rational numbers, so that the σ-field is countably generated.

Not all subsets of R are Borel sets, that is, elements of the Borel σ-field.
We can also define a Borel σ-field on a product space. Let X be a complete, separable, metric space such as R. Let X^Z denote the infinite product of X. If this space is endowed with the product metric, the open sets are sets of the form ∏_{i∈Z} A_i, where only finitely many of these sets are not equal to X and the remaining ones are open. We define cylinder sets in this product space as

B_{[A_m, m∈I]} = {x ∈ X^Z : x_m ∈ A_m, m ∈ I}, A_m ∈ B(X),

where I ⊂ Z with |I| < ∞, that is, I has finitely many elements. Thus, in the above, if x ∈ B_{[A_m, m∈I]}, then x_m ∈ A_m for each m ∈ I and the remaining coordinates are arbitrary. The σ-field generated by such cylinder sets is the Borel σ-field on the product space. Such a construction is very important for stochastic processes.
1.2.2 Measurable Function

If (X, B(X)) and (Y, B(Y)) are measurable spaces, we say a mapping h : X → Y is measurable if

h^{-1}(B) := {x : h(x) ∈ B} ∈ B(X), ∀B ∈ B(Y).
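On finite spaces this definition can be tested directly: list a σ-field on the domain that is strictly smaller than the power set, and check whether every preimage lands in it. A sketch with hypothetical maps h1, h2 (not from the notes):

```python
def preimage(h, domain, B):
    """h^{-1}(B) = {x in domain : h(x) in B}."""
    return frozenset(x for x in domain if h[x] in B)

def is_measurable(h, domain, F_X, F_Y):
    """h is measurable iff the preimage of every set in F_Y lies in F_X."""
    return all(preimage(h, domain, B) in F_X for B in F_Y)

X = {1, 2, 3, 4}
# A sigma-field on X strictly smaller than the power set (atoms {1,2} and {3,4}):
F_X = {frozenset(), frozenset(X), frozenset({1, 2}), frozenset({3, 4})}
# The power set of {'a', 'b'} on the range:
F_Y = {frozenset(), frozenset({'a', 'b'}), frozenset({'a'}), frozenset({'b'})}

h1 = {1: 'a', 2: 'a', 3: 'b', 4: 'b'}   # constant on the atoms of F_X
h2 = {1: 'a', 2: 'b', 3: 'b', 4: 'b'}   # splits the atom {1, 2}
print(is_measurable(h1, X, F_X, F_Y))   # True
print(is_measurable(h2, X, F_X, F_Y))   # False: h2^{-1}({'a'}) = {1} is not in F_X
```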
1.2.3 Measure

A positive measure µ on (X, B(X)) is a function from B(X) to [0, ∞] which is countably additive, that is, for A_k ∈ B(X) with A_k ∩ A_j = ∅ for k ≠ j:

µ(∪_{k=1}^∞ A_k) = ∑_{k=1}^∞ µ(A_k).

Definition 1.2.1 µ is a probability measure if it is positive and µ(X) = 1.

Definition 1.2.2 A measure µ is finite if µ(X) < ∞, and σ-finite if there exists a collection of subsets A_k such that X = ∪_{k=1}^∞ A_k with µ(A_k) < ∞ for all k.
On the real line R, the Lebesgue measure is defined on the Borel σ-field (in fact, on a somewhat larger σ-field) so that for A = (a, b), µ(A) = b − a. The Borel σ-field is a subset of the collection of Lebesgue measurable sets; that is, there exist Lebesgue measurable sets which are not Borel sets. For a definition and the construction of Lebesgue measurable sets, see [4].
1.2.4 The Extension Theorem

Theorem 1.2.1 (Carathéodory's Extension Theorem) Let M be a field of subsets of X, and suppose that there exists a set function P on M satisfying: (i) P(∅) = 0; (ii) there exist countably many sets A_n ∈ M such that X = ∪_n A_n with P(A_n) < ∞; (iii) for pairwise disjoint sets A_n ∈ M, if ∪_n A_n ∈ M, then P(∪_n A_n) = ∑_n P(A_n). Then, there exists a unique measure P′ on the σ-field generated by M, σ(M), which is consistent with P on M.
The above is useful since, when one wishes to verify that two measures are equal, it suffices to check that they agree on a generating field, and not necessarily on the entire σ-field.

When we apply the above to probability measures defined on a product space, and consider the field consisting of finitely many unions and intersections of finite-dimensional Borel sets together with the empty set, we obtain the following.
Theorem 1.2.2 (Kolmogorov's Extension Theorem) Let X be a complete, separable metric space and let µ_n be a probability measure on X^n, the n-fold product of X, for each n = 1, 2, ..., such that

µ_n(A_1 × A_2 × · · · × A_n) = µ_{n+1}(A_1 × A_2 × · · · × A_n × X)

for every n and every sequence of Borel sets A_k. Then, there exists a unique probability measure µ on (X^∞, B(X^∞)) which is consistent with each of the µ_n's.
For example, given a consistent family of finite-dimensional distributions on a product space, one can define a measure on the product space which is consistent with all the finite-dimensional distributions. Likewise, we can construct the Lebesgue measure by defining it on finitely many unions and intersections of intervals of the form (a, b] or [a, b) and extending this to the Borel σ-field.
1.2.5 Integration

Let h be a non-negative measurable function from (X, B(X)) to (R, B(R)). The Lebesgue integral of h with respect to a measure µ can be defined in three steps.

First, for A ∈ B(X), define 1_{x∈A} (also written 1_{(x∈A)} or 1_A(x)) as the indicator function of the event x ∈ A; that is, the function takes the value 1 if x ∈ A, and 0 otherwise. In this case, ∫_X 1_{x∈A} µ(dx) =: µ(A).

Next, call h a simple function if there exist A_1, A_2, ..., A_n, all in B(X), and positive numbers b_1, b_2, ..., b_n such that h(x) = ∑_{k=1}^n b_k 1_{x∈A_k}; for such h, set ∫_X h(x)µ(dx) = ∑_{k=1}^n b_k µ(A_k).

Finally, for any given non-negative measurable h, there exists a sequence of simple functions h_n such that h_n(x) → h(x) monotonically, that is, h_{n+1}(x) ≥ h_n(x). We define the Lebesgue integral as the limit:

∫_X h(x)µ(dx) := lim_{n→∞} ∫_X h_n(x)µ(dx).

We note that the space of simple functions is dense in the space of bounded measurable functions on R under the supremum norm.
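The construction above can be illustrated numerically. A sketch, assuming X = [0, 1] with the Lebesgue measure and h(x) = x² (illustrative choices): the standard dyadic simple functions h_n(x) = ⌊2ⁿh(x)⌋/2ⁿ increase monotonically to h, and their integrals approach ∫₀¹ x² dx = 1/3.

```python
from math import sqrt

def simple_integral(n):
    """Integral of the dyadic simple approximation h_n of h(x) = x^2
    on [0, 1] with respect to the Lebesgue measure.

    h_n takes the value k/2^n on A_k = {x : k/2^n <= x^2 < (k+1)/2^n};
    since h is increasing, mu(A_k) has the closed form
    sqrt((k+1)/2^n) - sqrt(k/2^n) (capped at the right endpoint 1)."""
    total = 0.0
    levels = 2 ** n
    for k in range(levels):             # h <= 1, so no truncation at level n is needed
        lo = sqrt(k / levels)           # left endpoint of A_k
        hi = sqrt(min((k + 1) / levels, 1.0))
        total += (k / levels) * (hi - lo)
    return total

for n in (2, 4, 8, 12):
    print(n, simple_integral(n))        # increases monotonically toward 1/3
```

Since h_n ≤ h pointwise, every value printed stays below 1/3, and the error at level n is at most 2⁻ⁿ.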
There are three important convergence theorems which we will not discuss in detail; their statements are given next.
1.2.6 Fatou's Lemma, the Monotone Convergence Theorem and the Dominated Convergence Theorem
Theorem 1.2.3 (Monotone Convergence Theorem) If µ is a σ-finite positive measure on (X, B(X)) and {f_n, n ∈ Z_+} is a sequence of measurable functions from X to R which converge pointwise and monotonically to f, that is, 0 ≤ f_n(x) ≤ f_{n+1}(x) for all n and

lim_{n→∞} f_n(x) = f(x)

for µ-almost every x, then

∫_X f(x)µ(dx) = lim_{n→∞} ∫_X f_n(x)µ(dx).
Theorem 1.2.4 (Fatou's Lemma) If µ is a σ-finite positive measure on (X, B(X)) and {f_n, n ∈ Z_+} is a sequence of non-negative measurable functions from X to R, then

∫_X liminf_{n→∞} f_n(x)µ(dx) ≤ liminf_{n→∞} ∫_X f_n(x)µ(dx).
Theorem 1.2.5 (Dominated Convergence Theorem) If (i) µ is a σ-finite positive measure on (X, B(X)); (ii) g(x) ≥ 0 is a Borel measurable function such that

∫_X g(x)µ(dx) < ∞;

and (iii) {f_n, n ∈ Z_+} is a sequence of measurable functions from X to R which satisfy |f_n(x)| ≤ g(x) for µ-almost every x and lim_{n→∞} f_n(x) = f(x), then

∫_X f(x)µ(dx) = lim_{n→∞} ∫_X f_n(x)µ(dx).

Note that the monotone convergence theorem imposes no boundedness restriction, whereas the dominated convergence theorem requires a domination condition. On the other hand, for the dominated convergence theorem, the pointwise convergence does not have to be monotone.
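The inequality in Fatou's lemma can be strict. A sketch with the standard example f_n = 1_{[n, n+1)} on (R, B(R), Lebesgue): each ∫ f_n dµ = 1, yet f_n(x) → 0 for every x, so the integral of the liminf is 0 < 1 (the numerical stand-in for the integral below is an assumption of the sketch, not part of the lemma):

```python
def f(n, x):
    """f_n = indicator of [n, n+1): non-negative, and f_n(x) -> 0 for every fixed x."""
    return 1.0 if n <= x < n + 1 else 0.0

def lebesgue_integral(n, step=0.001, upper=200.0):
    """Riemann-sum stand-in for the Lebesgue integral of f_n over R
    (valid here since f_n vanishes outside [0, upper])."""
    return sum(f(n, k * step) * step for k in range(int(upper / step)))

# Right-hand side of Fatou: every integral equals 1, so liminf of integrals is 1.
print([round(lebesgue_integral(n), 2) for n in range(5)])

# Left-hand side: liminf_n f_n(x) = 0 at every x, since f_n(x) = 0 once n > x.
xs = [0.0, 2.5, 10.0, 99.9]
tail_inf = [min(f(n, x) for n in range(int(x) + 1, int(x) + 50)) for x in xs]
print(tail_inf)   # all zero, so the integral of the liminf is 0 < 1
```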
1.3 Probability Space and Random Variables

Let (Ω, F) be a measurable space. If P is a probability measure on it, then the triple (Ω, F, P) forms a probability space. Here Ω is a set called the sample space, and F is called the event space; it is a σ-field of subsets of Ω.

Let (E, E) be another measurable space and X : (Ω, F, P) → (E, E) be a measurable map. We call X an E-valued random variable. The image of P under X is a probability measure on (E, E), called the law of X. The σ-field generated by the events {{ω : X(ω) ∈ A}, A ∈ E} is called the σ-field generated by X and is denoted by σ(X).
Consider a coin flip process, with possible outcomes {H, T}, heads or tails. We have a good intuitive understanding of the setting when someone tells us that a coin flip takes the value H with probability 1/2. Based on the definition of a random variable, we then view a coin flip process as a deterministic function from some space (Ω, F, P) to the space of heads and tails. Here, P denotes the uncertainty measure (think of the initial condition of the coin when it is being flipped, the flow of air, the surface where the coin lands, etc.; we summarize all these aspects, and all the uncertainty in the universe, by the abstract space (Ω, F, P)).
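This viewpoint can be made concrete with a standard (illustrative) choice of probability space: take Ω = [0, 1) with the Lebesgue (uniform) measure, so that the coin flip is the deterministic function X(ω) = H for ω < 1/2 and T otherwise, and all randomness resides in the draw of ω.

```python
import random

def X(omega):
    """A coin flip as a deterministic, measurable function of omega in [0, 1)."""
    return 'H' if omega < 0.5 else 'T'

# P(X = H) = Lebesgue measure of X^{-1}({H}) = [0, 1/2), which is 1/2.
# Monte Carlo check: draw omega uniformly and record the frequency of heads.
random.seed(0)
n = 100_000
heads = sum(X(random.random()) == 'H' for _ in range(n))
print(heads / n)   # close to 0.5
```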
1.3.1 Discrete-Time Stochastic Processes

One can define a sequence of random variables as a single random variable living in the product set; that is, we can consider {x_1, x_2, ..., x_N} as an individual random variable X which is (E × E × · · · × E)-valued in the measurable space (E × E × · · · × E, B(E × E × · · · × E)), where now the σ-fields are defined on the product set. In this case, the probability measure induced by a random sequence is defined on the σ-field B(E × E × · · · × E). We can extend the above to the case where the index set is Z_+, that is, the process is infinite dimensional.
Let X be a complete, separable, metric space, and let B(X) denote the Borel σ-field of subsets of X. Let Σ = X^T denote the sequence space of all one-sided (T = Z_+) or two-sided (T = Z) infinite sequences drawn from X. Thus, if T = Z and x ∈ Σ, then x = {..., x_{−1}, x_0, x_1, ...} with x_i ∈ X.

Let X_n : Σ → X denote the coordinate function, so that X_n(x) = x_n. Let B(Σ) denote the smallest σ-field containing all cylinder sets of the form {x : x_i ∈ B_i, m ≤ i ≤ n}, where B_i ∈ B(X), for all integers m, n. By the extension theorem, we can define a probability measure on B(Σ) through its characterization on these finite-dimensional cylinder sets.
A similar characterization also applies for continuous-time stochastic processes, where T is uncountable. The extension requires more delicate arguments, since finite-dimensional characterizations alone are too weak to uniquely determine a probability measure on a sufficiently rich σ-field. Such technicalities arise in the discussion of continuous-time Markov chains and controlled processes, typically requiring a construction from a separable product space.
1.3.2 More on Random Variables and Probability density functions

Consider a probability space (X, B(X), P) and an R-valued random variable U measurable with respect to (X, B(X)). This random variable induces a probability measure µ on B(R) such that for every interval (a, b]:

µ((a, b]) = P(U ∈ (a, b]) = P({x : U(x) ∈ (a, b]}).
When U is R-valued, the expectation of U is defined as

E[U] = ∫_R x µ(dx).

We define F(x) = µ((−∞, x]) as the cumulative distribution function of U. If the derivative of F exists, we call this derivative the density function of U, and denote it by p(x) (the distribution function is not always differentiable, for example when the random variable takes values only in the integers). If a density function exists, we can also write

E[U] = ∫_R x p(x) dx.
If a probability density function p exists, the law µ is said to be absolutely continuous with respect to the Lebesgue measure λ. In particular, the density function p is the Radon-Nikodym derivative of µ with respect to the Lebesgue measure, in the sense that for all Borel A: ∫_A p(x)λ(dx) = µ(A).
A probability density does not always exist. In particular, whenever there is a probability mass at a given point, a density does not exist: in R, if for some x, µ({x}) > 0, then we say there is a probability mass at x, and a density function does not exist.
Some examples of commonly encountered random variables are as follows:

• Gaussian (N(µ, σ²)):

p(x) = (1/(σ√(2π))) e^{−(x−µ)²/(2σ²)}.

• Exponential (with rate λ > 0):

F(x) = 1 − e^{−λx}, p(x) = λe^{−λx}, x ≥ 0.

• Uniform on [a, b] (U([a, b])):

F(x) = (x − a)/(b − a), p(x) = 1/(b − a), x ∈ [a, b].

• Binomial (B(n, p)):

p(k) = (n choose k) p^k (1 − p)^{n−k}, k = 0, 1, ..., n.

If n = 1, we also call a binomial random variable a Bernoulli random variable.
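As a quick sanity check on the binomial expression (with illustrative parameters n = 10, p = 0.3), the probabilities sum to 1 and the mean comes out to the standard value np:

```python
from math import comb

def binomial_pmf(n, p):
    """p(k) = (n choose k) p^k (1-p)^(n-k), for k = 0, ..., n."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

pmf = binomial_pmf(10, 0.3)
print(sum(pmf))                                  # probabilities sum to 1
print(sum(k * pk for k, pk in enumerate(pmf)))   # mean equals n*p = 3
print(binomial_pmf(1, 0.3))                      # n = 1: the Bernoulli case
```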
1.3.3 Independence and Conditional Probability

Consider A, B ∈ B(X) such that P(B) > 0. The quantity

P(A|B) = P(A ∩ B) / P(B)

is called the conditional probability of event A given B. As a function of A, this is itself a probability measure. If

P(A|B) = P(A),

then A and B are said to be independent events.

A countable collection of events {A_n} is independent if for any finite sub-collection A_{i_1}, A_{i_2}, ..., A_{i_m}, we have

P(A_{i_1}, A_{i_2}, ..., A_{i_m}) = P(A_{i_1})P(A_{i_2}) · · · P(A_{i_m}).

Here, we use the notation P(A, B) = P(A ∩ B). A sequence of events is said to be pairwise independent if for any pair (A_m, A_n), m ≠ n: P(A_m, A_n) = P(A_m)P(A_n).

Pairwise independence is weaker than independence.
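A classical example of this gap, checked exhaustively: two independent fair coin flips, with A = "first flip is heads", B = "second flip is heads", and C = "the two flips agree". Each pair of events is independent, but the triple is not.

```python
from fractions import Fraction
from itertools import product

# Sample space: two independent fair coin flips, each outcome with probability 1/4.
Omega = list(product('HT', repeat=2))
P = {w: Fraction(1, 4) for w in Omega}

A = {w for w in Omega if w[0] == 'H'}      # first flip is heads
B = {w for w in Omega if w[1] == 'H'}      # second flip is heads
C = {w for w in Omega if w[0] == w[1]}     # the two flips agree

def prob(E):
    return sum(P[w] for w in E)

# Each pair is independent:
for E, F in [(A, B), (A, C), (B, C)]:
    print(prob(E & F) == prob(E) * prob(F))   # True for all three pairs

# But the triple is not: P(A ∩ B ∩ C) = 1/4, while P(A)P(B)P(C) = 1/8.
print(prob(A & B & C), prob(A) * prob(B) * prob(C))
```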
1.4 Markov Chains

As discussed in Section 1.3.1, a sequence of random variables can be viewed as a single random variable taking values in the product set, with the induced probability measure defined on the σ-field B(E × E × · · · × E). If the probability measure for this sequence is such that

P(x_{k+1} ∈ A_{k+1} | x_k, x_{k−1}, ..., x_0) = P(x_{k+1} ∈ A_{k+1} | x_k), P-a.s.,

then {x_k} is said to be a Markov chain. Thus, for a Markov chain, the current state is sufficient to predict the future; past variables are not needed.
As an example, consider the following linear system:<br />
x t+1 = ax t + w t ,
where {w_t} is a sequence of independent random variables. This sequence is Markov. Another way to construct a Markov chain is the following: Let {x_t, t ≥ 0} be a random sequence with state space (X, B(X)), defined on a probability space (Ω, F, P), where B(X) denotes the Borel σ−field on X, Ω is the sample space, F a sigma field of subsets of Ω, and P a probability measure. For x ∈ X and D ∈ B(X), we let P(x, D) := P(x_{t+1} ∈ D | x_t = x) denote the transition probability from x to D, that is, the probability of the event {x_{t+1} ∈ D} given that x_t = x. If the probability of the same event given some history of the past (and the present) does not depend on the past, and hence is given by the same quantity, the chain is a Markov chain.
Thus, the Markov chain is completely determined by the transition probability and the probability of the initial state, P(x_0) = p_0. Hence, the probability of the event {x_{t+1} ∈ D} for any t can be computed recursively by starting at t = 0, with P(x_1 ∈ D) = ∑_x P(x_1 ∈ D | x_0 = x)P(x_0 = x), and iterating with a similar formula for t = 1, 2, . . ..
The above requires some justification: Kolmogorov's extension theorem is one way to successively extend the probability measures to the infinite product space.
We will continue our discussion on Markov chains after discussing controlled Markov chains.
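The recursive computation above is easy to carry out for a finite state space; the sketch below propagates the distribution of x_t by repeated application of the recursion (the 2-state transition matrix and initial distribution are illustrative choices, not from the notes):

```python
# Propagate the distribution of a finite-state Markov chain:
# P(x_{t+1} = j) = sum_i P(x_{t+1} = j | x_t = i) P(x_t = i).
# The transition matrix P and initial distribution p0 are made-up examples.
P = [[0.9, 0.1],
     [0.4, 0.6]]
p0 = [1.0, 0.0]  # start deterministically in state 0

def step(p, P):
    # one application of the recursion: p_{t+1}(j) = sum_i p(i) P[i][j]
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

p = p0
for t in range(3):
    p = step(p, P)
print(p)  # distribution of x_3
```

Each step is a row-vector/matrix product, so the distribution at any time is computable from p_0 and P alone, as the text asserts.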
1.5 Exercises
Exercise 1.5.1 a) Let F_β be a σ−field of subsets of some space X for all β ∈ Γ, where Γ is a family of indices. Let
F = ⋂_{β∈Γ} F_β.
Show that F is also a σ−field.
For a space X on which a distance metric is defined, the Borel σ−field is generated by the collection of open sets. This means that the Borel σ−field is the smallest σ−field containing the open sets, and as such it is the intersection of all σ-fields containing the open sets. On R, the Borel σ−field is the smallest σ−field generated by the open intervals.
b) Is the set of rational numbers an element of the Borel σ-field on R? Is the set of irrational numbers an element?
c) Let X be a countable set. On this set, let us define a metric as follows:
d(x, y) = 0 if x = y, and d(x, y) = 1 if x ≠ y.
Show that the Borel σ−field on X is generated by the individual sets {x}, x ∈ X.
d) Consider the distance function d(x, y) defined as above, now defined on R. Is the σ−field generated by the open sets of this metric the same as the Borel σ−field on R?
Exercise 1.5.2 a) Let X and Y be real-valued random variables defined on a given probability space. Show that X² and X + Y are also random variables.
b) Let F be a σ−field of subsets of a set X and let A ∈ F. Prove that {A ∩ B, B ∈ F} is a σ−field over A (that is, a σ−field of subsets of A).
Exercise 1.5.3 Consider a random variable X which takes values in [0, 1] and is uniformly distributed. We know that for every set S = [a, b), 0 ≤ a ≤ b ≤ 1, the probability of the random variable X taking values in S is P(X ∈ [a, b)) = U([a, b]) = b − a. Consider now the following question: Does every subset S ⊂ [0, 1] admit a probability in the form above? In the following we will provide a counterexample, known as the Vitali set.
We can show that a set picked as follows does not admit a measure: Let us define an equivalence relation such that x ≡ y if x − y ∈ Q. This equivalence relation divides [0, 1] into disjoint equivalence classes. Let A be a subset which picks exactly one element from each equivalence class (here, we adopt what is known as the Axiom of Choice). Since A contains an element from each equivalence class, each point of [0, 1] is contained in the union ∪_{q∈Q} (A ⊕ q).
For otherwise there would be a point x not covered by any equivalence class; but x is equivalent at least to the element of A chosen from its own class. Furthermore, since A contains only one point from each equivalence class, the sets A ⊕ q are disjoint: for otherwise two of them, A ⊕ q and A ⊕ q′ with q ≠ q′, would include a common point x, so that x − q and x − q′ would both be in A; this is a contradiction, since x − q and x − q′ are distinct points belonging to the same equivalence class, and A contains at most one point from each class. We expect the uniform distribution to be shift-invariant, therefore P(A) = P(A ⊕ q). But [0, 1] = ∪_q (A ⊕ q). Since a countable sum of identical non-negative terms can only be ∞ or 0, the contradiction follows: we cannot associate a probability with this set.
Chapter 2
Controlled Markov Chains
In the following, we discuss controlled Markov models.
2.1 Controlled Markov Models
Consider the following model:
x_{t+1} = f(x_t, u_t, w_t), (2.1)
where x_t is an X-valued state variable, u_t is a U-valued control action variable, {w_t} is a W-valued i.i.d. noise process, and f is a measurable function. We assume that X, U, W are subsets of Polish spaces. The model above in (2.1) contains (see [8]) the class of all stochastic processes which satisfy the following for all Borel sets B ∈ B(X), t ≥ 0, and all realizations x_[0,t], u_[0,t]:
P(x_{t+1} ∈ B | x_[0,t] = a_[0,t], u_[0,t] = b_[0,t]) = T(x_{t+1} ∈ B | a_t, b_t), (2.2)
where T(· | x, u) is a stochastic kernel from X × U to X. A stochastic process which satisfies (2.2) is called a controlled Markov chain.
2.1.1 Fully Observed Markov Control Problem Model
A Fully Observed Markov Control Problem is a five-tuple
(X, U, {U(x), x ∈ X}, T, c),
where
• X is the state space, a subset of a Polish space.
• U is the action space, a subset of a Polish space.
• K = {(x, u) : u ∈ U(x) ∈ B(U), x ∈ X} is the set of state-control pairs that are feasible. There might be different states where different control actions are possible.
• T is a state transition kernel, that is, T(A | x_t, u_t) = P(x_{t+1} ∈ A | x_t, u_t).
• c : K → R is the cost.
Consider for now that the objective to be minimized is given by J(x_0, Π) := E^Π_{ν_0}[∑_{t=0}^{T−1} c(x_t, u_t)], where ν_0 is the initial probability measure, that is, x_0 ∼ ν_0. The goal is to find a policy Π* (to be defined next) such that
J(x_0, Π*) ≤ J(x_0, Π), ∀Π,
where Π ranges over the admissible policies to be defined below. Such a Π* is an optimal policy. Here Π can also be called the strategy, or law.
2.1.2 Classes of Control Policies
Admissible Control Policies
Let H_0 := X, H_t = H_{t−1} × K for t = 1, 2, . . .. We let h_t denote an element of H_t, where h_t = {x_[0,t], u_[0,t−1]}. A deterministic admissible control policy Π is a sequence of functions {γ_t} with γ_t : H_t → U; in this case u_t = γ_t(h_t). A randomized control policy is a sequence Π = {Π_t, t ≥ 0} with Π_t : H_t → P(U) (where P(U) is the space of probability measures on U) such that
Π_t(u_t ∈ U(x_t) | h_t) = 1, ∀h_t ∈ H_t.
Markov Control Policies
A policy is randomized Markov if
P^Π_{x_0}(u_t ∈ C | h_t) = Π_t(u_t ∈ C | x_t), C ∈ B(U).
Hence, the control action only depends on the state and the time, and not on the past history. If the control strategy is deterministic, that is, if
Π_t(u_t = f_t(x_t) | x_t) = 1
for some function f_t, the control policy is said to be deterministic Markov.
Stationary Control Policies
A policy is randomized stationary if
P^Π_{x_0}(u_t ∈ C | h_t) = Π(u_t ∈ C | x_t), C ∈ B(U).
Hence, the control action only depends on the state, and not on the past history or on time. If the control strategy is deterministic, that is, if Π(u_t = f(x_t) | x_t) = 1 for some function f, the control policy is said to be deterministic stationary.
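The nesting of these policy classes can be seen schematically from the arguments each class is allowed to use; the binary-valued maps below are hypothetical examples of ours, not from the notes:

```python
# Hypothetical deterministic policies on a binary state/action space,
# distinguished only by what they are allowed to look at.

def admissible_policy(h_t):
    # h_t = (x_0, u_0, x_1, u_1, ..., x_t): may use the full history
    return sum(h_t) % 2

def markov_policy(t, x_t):
    # may use only the current state and the time index
    return (x_t + t) % 2

def stationary_policy(x_t):
    # may use only the current state; the same map at every time
    return x_t % 2

# A stationary policy induces a Markov policy (ignore t), and a Markov
# policy induces an admissible policy (ignore everything but x_t and t):
as_markov = lambda t, x_t: stationary_policy(x_t)
as_admissible = lambda h_t: markov_policy((len(h_t) - 1) // 2, h_t[-1])

h = (1, 0, 1)          # x_0 = 1, u_0 = 0, x_1 = 1, so here t = 1
print(as_markov(1, 1), as_admissible(h))
```

The two wrapper lambdas are the containment Π_S ⊂ Π_M ⊂ Π_A made concrete: restricting the information a policy uses never takes it outside the larger class.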
2.2 Markov Chain Induced by a Markov Policy
The following is an important result:
Theorem 2.2.1 Let the control policy be randomized Markov. Then, the controlled Markov chain becomes a Markov chain in X, that is, the state process itself becomes a Markov chain:
P^Π_{x_0}(x_{t+1} ∈ B | x_t, x_{t−1}, . . . , x_0) = Q^Π_t(x_{t+1} ∈ B | x_t), B ∈ B(X), t ≥ 1, P-a.s.
Proof: Let us consider the case where U is countable. Let B ∈ B(X). It follows that
P^Π_{x_0}(x_{t+1} ∈ B | x_t, x_{t−1}, . . . , x_0)
= ∑_{u_t} P^Π_{x_0}(x_{t+1} ∈ B, u_t | x_t, x_{t−1}, . . . , x_0)
= ∑_{u_t} P^Π_{x_0}(x_{t+1} ∈ B | u_t, x_t, x_{t−1}, . . . , x_0) P^Π_{x_0}(u_t | x_t, x_{t−1}, . . . , x_0)
= ∑_{u_t} Q^Π_{x_0}(x_{t+1} ∈ B | u_t, x_t) Π_t(u_t | x_t)
= ∑_{u_t} Q^Π_{x_0}(x_{t+1} ∈ B, u_t | x_t)
= Q^Π_t(x_{t+1} ∈ B | x_t). (2.3)
The essential point here is that the control only depends on x_t, and since x_{t+1} depends stochastically only on x_t and u_t, the desired result follows. In the above, we used the law of total probability and the properties of conditioning. ⋄
We note that, if the applied control policy is not only Markov but also stationary, then the induced Markov chain in X is time-homogeneous, that is, the stochastic kernel Q^Π_t(·|·) does not depend on time.
2.3 Performance of Policies
Consider a Markov control problem with an objective given as the minimization of
J(ν_0, Π) = E^Π_{ν_0}[∑_{t=0}^{T−1} c(x_t, u_t)],
where ν_0 denotes the distribution of x_0. If x_0 = x, we then write
J(x_0, Π) = E^Π_{x_0}[∑_{t=0}^{T−1} c(x_t, u_t)] = E^Π[∑_{t=0}^{T−1} c(x_t, u_t) | x_0].
Such a cost problem is known as a Finite Horizon Optimal Control problem.
We will also study costs of the following form:
J(ν_0, Π) = (1/T) E^Π_{ν_0}[∑_{t=0}^{T−1} c(x_t, u_t)].
Such a problem is known as the Average Cost Optimal Control Problem.
Finally, we will consider costs of the following form:
J(ν_0, Π) = E^Π_{ν_0}[∑_{t=0}^{T−1} β^t c(x_t, u_t)],
for some β ∈ (0, 1). This is called a Discounted Optimal Control Problem.
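The three criteria can be compared numerically by simulation; the Monte Carlo sketch below uses a made-up 2-state, 2-action model (the kernel, stage cost, policy, and parameter values are our own illustrative choices, not from the notes):

```python
import random

# Monte Carlo estimates of the finite-horizon, average, and discounted
# costs of a fixed deterministic stationary policy, for a toy model.
random.seed(0)

def transition(x, u):
    # P(x_{t+1} = 1 | x, u): the bad state is stickier under action 0
    p1 = 0.8 if u == 0 else 0.3
    return 1 if random.random() < p1 else 0

def c(x, u):
    # stage cost in [0, 1]: state 1 is costly, action 1 costs a little extra
    return 0.1 * u if x == 0 else 1.0

policy = lambda x: 1 if x == 1 else 0   # a deterministic stationary policy
T, beta, runs = 50, 0.9, 2000

total = disc = 0.0
for _ in range(runs):
    x, J, Jd = 0, 0.0, 0.0
    for t in range(T):
        u = policy(x)
        J += c(x, u)
        Jd += beta ** t * c(x, u)
        x = transition(x, u)
    total += J
    disc += Jd
finite_horizon = total / runs
average_cost = finite_horizon / T
discounted = disc / runs
print(finite_horizon, average_cost, discounted)
```

Since the stage cost lies in [0, 1], the average cost is bounded by 1 and the discounted cost by 1/(1 − β), which is a useful sanity check on the estimates.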
Let Π_A denote the class of admissible policies, Π_M the class of Markov policies, and Π_S the class of stationary policies. These policies can be either randomized or deterministic. Since the sets of policies are progressively shrinking,
Π_S ⊂ Π_M ⊂ Π_A,
in a general setting we have the following:
inf_{Π∈Π_A} J(ν_0, Π) ≤ inf_{Π∈Π_M} J(ν_0, Π) ≤ inf_{Π∈Π_S} J(ν_0, Π).
We will show, however, that for the optimal control of a Markov chain, under mild conditions, Markov policies are always optimal (that is, there is no loss in optimality in restricting the policies to be Markov); that is, it is sufficient to consider Markov policies. This is an important result in stochastic control. That is,
inf_{Π∈Π_A} J(ν_0, Π) = inf_{Π∈Π_M} J(ν_0, Π).
We will also show that, under somewhat more restrictive conditions, stationary policies are optimal (that is, there is no loss in optimality in restricting the policies to be stationary). This will typically require T → ∞:
inf_{Π∈Π_A} J(ν_0, Π) = inf_{Π∈Π_S} J(ν_0, Π).
Furthermore, we will show that, under some conditions,
inf_{Π∈Π_S} J(ν_0, Π)
is independent of the initial distribution (or initial condition) on x_0.
The last two results are computationally very important, as there are powerful computational algorithms that allow one to obtain such stationary policies.
In the following set of notes, we will first consider further properties of Markov chains, since under a Markov control policy, the controlled state process becomes a Markov chain. We will then return to controlled Markov chains and the development of optimal control policies.
The classification of Markov chains in the next topic will implicitly characterize the set of problems for which stationary policies contain optimal admissible policies.
2.3.1 Partially Observed Model
Consider the following model:
x_{t+1} = f(x_t, u_t, w_t), y_t = g(x_t, v_t).
Here, as before, x_t is the state, u_t ∈ U is the control, (w_t, v_t) ∈ W × V are second order, zero-mean, i.i.d. noise processes, and w_t is independent of v_t. In addition to the previous fully observed model, y_t denotes an observation variable taking values in Y, a subset of R^n in the context of this review. The controller only has causal access to the observation component {y_t} of the process. An admissible policy {Π_t} is measurable with respect to σ({y_s, s ≤ t}). We denote the observed history space as H_0 := Y, H_t = H_{t−1} × Y × U. Hence, the set of (wide-sense) causal control policies are such that P(u(h_t) ∈ U | h_t) = 1, ∀h_t ∈ H_t.
One could transform a partially observed Markov Decision Problem into a Fully Observed Markov Decision Problem via an enlargement of the state space. In particular, we obtain via the properties of total probability the following dynamical recursion:
π_t(A) := P(x_t ∈ A | y_[0,t], u_[0,t−1])
= [∫_X π_{t−1}(dx_{t−1}) ∫_A r(y_t | x_t) P(dx_t | x_{t−1}, u_{t−1})] / [∫_X π_{t−1}(dx_{t−1}) ∫_X r(y_t | x_t) P(dx_t | x_{t−1}, u_{t−1})],
where we assume that ∫_B r(y|x)dy = P(y_t ∈ B | x_t = x) for any B ∈ B(Y), with r denoting the observation density. The conditional measure process becomes a controlled Markov chain in P(X), which we endow with the weak convergence topology.
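For a finite state space the recursion above reduces to a finite sum followed by a normalization; a minimal sketch (the two-state kernels P, observation probabilities r, and the action/observation sequence below are illustrative assumptions of ours):

```python
# A discrete-state instance of the filtering recursion:
# pi_t(x) is proportional to r(y_t | x) * sum_{x'} pi_{t-1}(x') P(x | x', u_{t-1}).
P = {0: [[0.9, 0.1], [0.2, 0.8]],   # P[u][x_prev][x_next], one matrix per action
     1: [[0.5, 0.5], [0.5, 0.5]]}
r = [[0.7, 0.3],                    # r[x][y]: observation probabilities
     [0.2, 0.8]]

def filter_step(pi_prev, u_prev, y):
    # unnormalized update (the numerator), then divide by the normalizer
    unnorm = [sum(pi_prev[xp] * P[u_prev][xp][x] for xp in range(2)) * r[x][y]
              for x in range(2)]
    Z = sum(unnorm)   # the denominator of the recursion
    return [w / Z for w in unnorm]

pi = [0.5, 0.5]       # prior on x_0
for u, y in [(0, 1), (0, 0), (1, 1)]:
    pi = filter_step(pi, u, y)
print(pi)             # posterior P(x_3 = . | observations, actions)
```

The denominator Z is exactly the conditional likelihood of the new observation, so each step keeps π_t a probability vector, mirroring the normalization in the displayed recursion.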
Theorem 2.3.1 The process {π_t, u_t} is a controlled Markov chain. That is, under any admissible control policy, given the action at time t ≥ 0 and π_t, π_{t+1} is conditionally independent of {π_s, u_s, s ≤ t − 1}.
Let the cost function to be minimized be
E^Π_{x_0}[∑_{t=0}^{T−1} c(x_t, u_t)],
where E^Π_{x_0}[·] denotes the expectation over all sample paths with initial state given by x_0 under policy Π.
We transform the system into a fully observed Markov model as follows. Define the new cost as
˜c(π, u) = ∫_X c(x, u)π(dx), π ∈ P(X).
The stochastic transition kernel q is given by:
q(dx, dy | π, u) = ∫_X P(dx, dy | x′, u)π(dx′), π ∈ P(X),
and this kernel can be decomposed as q(dx, dy | π, u) = P(dy | π, u)P(dx | π, u, y).
The second term here is the filtering equation, mapping (π, u, y) ∈ (P(X) × U × Y) to P(X). It follows that (P(X), U, K, ˜c) defines a completely observable controlled Markov process. Here, we have
K(B | π, u) = ∫_Y 1_{(P(·|π,u,y)∈B)} P(dy | π, u), ∀B ∈ B(P(X)),
with 1_{(·)} denoting the indicator function.
2.4 Exercises
Exercise 2.4.1 Suppose that there are two decision makers DM1 and DM2. Suppose that the information available to DM1 is a random variable Y_1 and the information available to DM2 is Y_2, where these random variables are measurable on a probability space (Ω, F, P).
Suppose that the sigma-field generated by Y_1 is a subset of the sigma-field generated by Y_2, that is, σ(Y_1) ⊂ σ(Y_2). Further, suppose that the decision makers wish to minimize the following cost function:
E[c(ω, u_i)],
where c : Ω × U → R_+ is a measurable cost function, with u_i ∈ U, a Borel space. Here, for i = 1, 2, u_i = γ_i(y_i) is generated by a measurable function γ_i on the sigma-field generated by the random variable Y_i. Let Γ_i denote the space of all such measurable policies.
Prove that
inf_{γ_1∈Γ_1} E[c(ω, U_1)] ≥ inf_{γ_2∈Γ_2} E[c(ω, U_2)].
Exercise 2.4.2 Consider a line of customers at a service station (such as at an airport, a grocery store, or a communication network where customers are packets). Let L_t be the length of the line, that is, the total number of customers waiting in the line.
Let there be M_t servers serving the customers at time t, and let there be a manager (controller) who decides on the number of servers to be present. Let each of the servers be able to serve N customers per time-stage. The dynamics of the line can be expressed as follows:
L_{t+1} = L_t + A_t − M_t N 1_{(L_t ≥ NM_t)},
where 1_{(E)} is the indicator function for event E, i.e., it is equal to zero if E does not occur and is equal to 1 otherwise. In the equation above, A_t is the number of customers that have just arrived at time t. We assume {A_t} to be an independent process with a Poisson distribution with mean λ, that is,
P(A_t = k) = (λ^k e^{−λ}) / k!, k ∈ {0, 1, 2, . . .}.
The manager only has access to the information vector
I_t = {L_0, L_1, . . . , L_t; A_0, A_1, . . . , A_{t−1}}
while implementing his policies. A consultant proposes a number of possible policies to be adopted by the manager:
According to Policy A, the number of servers is given by
M_t = (L_t + L_{t−1} + L_{t−3}) / 2.
According to Policy B, the number of servers is
M_t = (L_{t+1} + L_t + L_{t−1}) / 2.
According to Policy C,
M_t = ⌈λ + 0.1⌉ 1_{(t≥10)}.
Finally, according to Policy D, the update is
M_t = ⌈λ + 0.1⌉.
a) Which of these policies are admissible, that is, measurable with respect to the σ−field generated by I_t? Which are Markov, or stationary?
b) Consider the model above, and suppose policy D is adopted by the manager. Consider the Markov chain induced by the policy, that is, a Markov chain with the following dynamics:
L_{t+1} = L_t + A_t − ⌈λ + 0.1⌉ N 1_{(L_t ≥ ⌈λ+0.1⌉N)},
with A_t as described in the earlier part.
Is this Markov chain irreducible?
Is there an absorbing set in this Markov chain? If so, what is it?
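One can experiment with the induced dynamics numerically before answering; the sketch below (with illustrative values of λ and N, and Knuth's classical sampler for Poisson arrivals) is only a simulation aid, not a solution to the exercise:

```python
import math
import random

# Simulate the queue under Policy D with Poisson(lambda) arrivals.
# lam, N, and the horizon T are illustrative choices.
random.seed(2)
lam, N, T = 3.0, 2, 10000

def poisson(lam):
    # Knuth's method for sampling a Poisson(lam) random variable
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

M = math.ceil(lam + 0.1)   # Policy D: a constant number of servers
L_t = 0
max_len = 0
for t in range(T):
    A = poisson(lam)
    # service of M*N customers occurs only when the line is long enough
    L_t = L_t + A - M * N * (1 if L_t >= N * M else 0)
    max_len = max(max_len, L_t)
print(M, max_len)
```

Tracking quantities such as the maximum line length over a long run gives a feel for which sets of states are revisited, which is the question parts b) asks about.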
Chapter 3
Classification of Markov Chains
3.1 Countable State Space Markov Chains
In this chapter, we first review Markov chains where the state takes values in a finite set or a countable space.
A Markov chain {x_t} on a countable state space is a collection of random variables {x_t, t ∈ Z_+}, where each of the variables takes values in a countable set X. As such, we will call the collection of such variables a chain Φ which takes values in the (one-sided or two-sided) infinite-product space X^∞. The Borel σ−field generated on the infinite space is the one generated by the finite dimensional sets of the form
{x ∈ X^∞ : x_[m,n] = {a_m, a_{m+1}, . . . , a_n}, a_k ∈ X, m ≤ k ≤ n, m, n ∈ Z_+}.
We assume that ν_0 is an initial distribution for the Markov chain, that is, the distribution of x_0.
As such, the process Φ = {x_0, x_1, . . . , x_n, . . .} is a (time-homogeneous) Markov chain and satisfies:
P_{ν_0}(x_0 = a_0, x_1 = a_1, x_2 = a_2, . . . , x_n = a_n)
= ν_0(x_0 = a_0)P(x_1 = a_1 | x_0 = a_0)P(x_2 = a_2 | x_1 = a_1) · · · P(x_n = a_n | x_{n−1} = a_{n−1}). (3.1)
If the initial condition is known to be x_0, we use P_{x_0}(···) in place of P_{ν_0}(···).
We could also write the evolution in terms of a matrix:
P(i, j) = P(x_{t+1} = j | x_t = i) ≥ 0, ∀i, j ∈ X.
Here P(·, ·) is also a probability transition kernel, that is, for every i ∈ X, P(i, ·) is a probability measure on X. Thus, ∑_j P(i, j) = 1.
By induction, we can verify that
P^k(i, j) := P(x_{t+k} = j | x_t = i) = ∑_{m∈X} P(i, m)P^{k−1}(m, j).
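The k-step recursion is simply matrix multiplication, P^k = P · P^{k−1}; a quick check on an illustrative 2-state matrix (the entries are our own example):

```python
# Compute multi-step transition probabilities by the recursion
# P^k(i, j) = sum_m P(i, m) P^{k-1}(m, j), i.e., matrix multiplication.
P = [[0.9, 0.1],
     [0.4, 0.6]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

P2 = matmul(P, P)        # two-step transition probabilities
P3 = matmul(P, P2)       # three-step, via P . P^2
P3_alt = matmul(P2, P)   # the same matrix, via P^2 . P
print(P2)
```

That P · P² and P² · P agree is the Chapman-Kolmogorov consistency of the k-step kernels, and each row of P^k remains a probability distribution.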
In the following, we characterize Markov chains based on transience, recurrence, and communication. We then consider the issue of the existence of an invariant distribution. Later, we will extend the analysis to uncountable state space Markov chains.
Let us consider a discrete-time, time-homogeneous Markov process living in a countable state space X.
Communication
If there exists an integer k such that P(x_{t+k} = j | x_t = i) = P^k(i, j) > 0, and another integer l such that P(x_{t+l} = i | x_t = j) = P^l(j, i) > 0, then state i communicates with state j.
A set C ⊂ X is said to be communicating if every two elements (states) of C communicate with each other. If every state of X communicates with every other state, the chain is said to be irreducible.
The period of a state i is defined to be the greatest common divisor of {k > 0 : P^k(i, i) > 0}. A Markov chain is called aperiodic if the period of every state is 1.
In the following, throughout the notes, we will assume that the Markov process under consideration is aperiodic unless mentioned otherwise.
Absorbing Set
A set C is called absorbing if P(i, C) = 1 for all i ∈ C. That is, if the state is in C, then the state cannot leave the set C.
A Markov chain on a finite space X is irreducible if the smallest absorbing set is the entire space X itself.
A Markov chain in X is indecomposable if X does not contain two disjoint absorbing sets.
Occupation, Hitting and Stopping Times
For any set A ⊂ X, the occupation time η_A is the number of visits of Φ to the set A:
η_A = ∑_{t=1}^∞ 1_{(x_t∈A)},
where 1_{(E)} denotes the indicator function for an event E, that is, it takes the value 1 when E takes place, and is otherwise 0.
Define
τ_A = min{k > 0 : x_k ∈ A};
this is the first time that the state visits the set A, known as the hitting time.
We define also a return time:
τ_A = min{k > 0 : x_k ∈ A, x_0 ∈ A}.
A Brief Discussion on Stopping Times
The variable τ_A defined above is an example of a stopping time:
Definition 3.1.1 A function τ from a measurable space (X^∞, B(X^∞)) to (N_+, B(N_+)) is a stopping time if for all n ∈ N_+, the event {τ = n} ∈ σ(x_0, x_1, x_2, . . . , x_n), that is, the event is in the sigma-field generated by the random variables up to time n.
Any realistic decision takes place at a time which is measurable in this sense. For example, if an investor wants to stop investing when he stops profiting, he can stop at the time when the investment loses value; this is a stopping time. But if an investor claims to stop investing when the investment is at its peak, this is not a stopping time, because to find out whether the investment is at its peak, the next state value should be known, and this is not measurable in a causal fashion.
One important property of Markov chains is the so-called Strong Markov Property. This says the following: Consider a Markov chain which evolves on a countable set. If we sample this chain according to a stopping time rule, the chain starting from the sampled instant is again a Markov chain:
Proposition 3.1.1 For a (time-homogeneous) Markov chain in a countable state space X, the strong Markov property always holds. That is, the sampled chain itself is a Markov chain: if τ is a stopping time with P(τ < ∞) = 1, then for any m ∈ N_+:
P(x_{τ+m} = a | x_τ = b_0, x_{τ−1} = b_1, . . .) = P(x_{τ+m} = a | x_τ = b_0) = P^m(b_0, a),
for all a, b_0, b_1, · · ·.
3.1.1 Recurrence and Transience
Let us define
U(x, A) := E_x[∑_{t=1}^∞ 1_{(x_t∈A)}] = ∑_{t=1}^∞ P^t(x, A),
and let us define
L(x, A) := P_x(τ_A < ∞),
which is the probability of the chain visiting the set A once the process starts at state x.
These two are important characterizations for Markov chains.
These two are important characterizations for Markov chains.<br />
Definition 3.1.2 A set A ⊂ X is recurrent if the Markov chain visits A infinitely <strong>of</strong>ten (in expectation), when<br />
the process starts in A. This is equivalent to<br />
E x [η A ] = ∞, ∀x ∈ A (3.2)<br />
That is, if the chain starts at a given state x ∈ A, it comes back to the set A, <strong>and</strong> does so infinitely <strong>of</strong>ten. If a<br />
state is not recurrent, it is transient.<br />
Definition 3.1.3 A set A ⊂ X is positive recurrent if the Markov chain visits A infinitely <strong>of</strong>ten, when the<br />
process starts in A <strong>and</strong> in addition:<br />
E x [τ A ] < ∞, ∀x ∈ A<br />
Definition 3.1.4 A state α is transient if<br />
The above is equivalent to the condition that<br />
which in turn is implied by<br />
U(α, α) = E α [η α ] < ∞, (3.3)<br />
∞∑<br />
P i (α, α) < ∞,<br />
i=1<br />
P i (τ i < ∞) < 1.<br />
While discussing recurrence, there is an equivalent, <strong>and</strong> <strong>of</strong>ten easier to check condition: If E i [τ i ] < ∞, then the<br />
state {i} is positive recurrent. If L(i, i) = P i (τ i < ∞) = 1, but E i [τ i ] = ∞, then i is recurrent (also called, null<br />
recurrent).<br />
The reader should connect the above with the strong Markov property: once the process hits a state, it starts afresh from that state as if the past never happened; the process recurs.
We have the following; the proof is presented later in the chapter, see Theorem 3.3.1.
Theorem 3.1.1 If P_i(τ_i < ∞) = 1, then P_i(η_i = ∞) = 1.
There is a more general notion of recurrence, named Harris recurrence, which is exactly the condition that P_i(η_i = ∞) = 1. We will investigate this further below while studying uncountable state space Markov chains; however, one needs to note that even for countable state space chains Harris recurrence is stronger than recurrence.
For a finite state Markov chain it is not difficult to verify that (3.3) is equivalent to L(i, i) < 1, since P(τ_i(k) < ∞) = P(τ_i(k − 1) < ∞)P(τ_i(1) < ∞) and E[η] = ∑_k P(η ≥ k).
If every state in X is recurrent, then the chain is recurrent; if all are transient, then the chain is transient.
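The criterion (3.3) can be checked numerically on a toy chain with one transient and one absorbing state (the 2-state matrix below is an illustrative choice of ours):

```python
# State 0 stays with prob. 1/2 and is otherwise absorbed at state 1.
# Then U(0,0) = sum_{t>=1} P^t(0,0) = sum_{t>=1} (1/2)^t = 1 < infinity,
# so state 0 is transient; P^t(1,1) = 1 for all t, so state 1 is recurrent.
P = [[0.5, 0.5],
     [0.0, 1.0]]

def matpow_entry(P, k, i, j):
    # (P^k)(i, j) by repeated left-multiplication of the i-th unit row vector
    row = [1.0 if m == i else 0.0 for m in range(len(P))]
    for _ in range(k):
        row = [sum(row[m] * P[m][c] for m in range(len(P)))
               for c in range(len(P))]
    return row[j]

U00 = sum(matpow_entry(P, t, 0, 0) for t in range(1, 60))        # converges to 1
partial_U11 = sum(matpow_entry(P, t, 1, 1) for t in range(1, 60))  # grows like t
print(U00, partial_U11)
```

The truncated sum for state 0 is already essentially its finite limit, while the partial sums for state 1 grow without bound, matching the transient/recurrent dichotomy of Definition 3.1.4.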
3.2 Stability and Invariant Distributions
Stability is an important concept, but it has different meanings in different contexts. In a stochastic setting, stability means that the state does not grow large too often, and the average over time (temporal average) converges to a statistical average.
If a Markov chain starts at a given time, in the long run the chain may forget its initial condition; that is, the probability distribution at a later time will be less and less dependent on the initial distribution. Given an initial state distribution π_0, the probability distribution of the state at time 1 is given by
π_1 = π_0 P,
and for t > 1:
π_{t+1} = π_t P = π_0 P^{t+1}.
One important property of Markov chains is whether the above iteration leads to a fixed point in the set of probability measures. Such a fixed point π is called an invariant distribution. A distribution of a countable state Markov chain is invariant if
π = πP.
This is equivalent to
π(j) = ∑_{i∈X} π(i)P(i, j), ∀j ∈ X.
We note that, if such a π exists, it must be expressible as π = π_0 lim_{t→∞} P^t, for some π_0. Clearly, π_0 can be π itself, but often π_0 can be any initial distribution under irreducibility conditions which will be discussed further. Invariant distributions are especially important in networking problems and stochastic control, due to the Ergodic Theorem (which shows that temporal averages converge to statistical averages), which we will discuss later in the semester.
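For a finite chain the fixed point π = πP can be approximated by simply iterating the distribution recursion; the 2-state matrix below is an illustrative choice whose invariant distribution is (0.8, 0.2):

```python
# Approximate the invariant distribution pi = pi P by iterating pi_{t+1} = pi_t P.
P = [[0.9, 0.1],
     [0.4, 0.6]]

def step(p):
    # one iteration of the distribution recursion
    return [sum(p[i] * P[i][j] for i in range(2)) for j in range(2)]

pi = [1.0, 0.0]           # an arbitrary initial distribution
for _ in range(500):
    pi = step(pi)
print(pi)                  # approx [0.8, 0.2]
```

For this chain the second eigenvalue of P is 0.5, so the iteration converges geometrically and the limit is independent of the initial distribution, in line with the irreducibility discussion above.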
3.2.1 Invariant Measures via an Occupational Characterization
Theorem 3.2.1 For a Markov chain, if there exists an element i such that E_i[τ_i] < ∞, the following is an invariant measure:
µ(j) = E[∑_{k=0}^{τ_i−1} 1_{(x_k=j)} | x_0 = i] / E_i[τ_i], ∀j ∈ X.
The invariant measure is unique if the chain is irreducible.
Proof: We show that
E[∑_{k=0}^{τ_i−1} 1_{(x_k=j)} | x_0 = i] / E_i[τ_i] = ∑_s P(s, j) E[∑_{k=0}^{τ_i−1} 1_{(x_k=s)} | x_0 = i] / E_i[τ_i].
Note that E[1_{(x_{t+1}=j)}] = P(x_{t+1} = j). Hence,
∑_s P(s, j) E[∑_{k=0}^{τ_i−1} 1_{(x_k=s)} | x_0 = i] / E_i[τ_i]
= E[∑_{k=0}^{τ_i−1} ∑_s P(s, j) 1_{(x_k=s)} | x_0 = i] / E_i[τ_i]
= E[∑_{k=0}^{τ_i−1} ∑_s 1_{(x_k=s)} E[1_{(x_{k+1}=j)} | x_k = s, x_0 = i] | x_0 = i] / E_i[τ_i]
= E[∑_{k=0}^{τ_i−1} E[∑_s 1_{(x_k=s)} 1_{(x_{k+1}=j)} | x_k, x_0 = i] | x_0 = i] / E_i[τ_i]
= E[∑_{k=0}^{τ_i−1} ∑_s 1_{(x_k=s)} 1_{(x_{k+1}=j)} | x_0 = i] / E_i[τ_i] (3.4)
= E[∑_{k=0}^{τ_i−1} 1_{(x_{k+1}=j)} | x_0 = i] / E_i[τ_i] = E_i[∑_{k=1}^{τ_i} 1_{(x_k=j)}] / E_i[τ_i]
= E_i[∑_{k=0}^{τ_i−1} 1_{(x_k=j)}] / E_i[τ_i] = µ(j),
where we use the fact that the number of visits to a given state does not change whether we include or exclude time k = 0 or k = τ_i, since any state j ≠ i is not visited at these times (and x_0 = x_{τ_i} = i). Here, (3.4) follows from the law of iterated expectations, see Theorem 4.1.3. This concludes the proof. ⊓⊔
Remark: If E_i[τ_i] < ∞, then the above measure becomes a probability measure, since

∑_j μ(j) = ∑_j E[∑_{k=0}^{τ_i−1} 1_{x_k=j} | x_0 = i] / E_i[τ_i] = E_i[τ_i] / E_i[τ_i] = 1,

where we use ∑_j ∑_{k=0}^{τ_i−1} 1_{x_k=j} = τ_i. For example, for a (symmetric, one-dimensional) random walk, E_i[τ_i] = ∞, and hence there does not exist an invariant probability measure. But the walk has an invariant measure: μ(i) = K for a constant K. The reader can verify this. ⊓⊔
Theorem 3.2.2 For an irreducible Markov chain, positive recurrence implies the existence of an invariant distribution with π(i) = 1/E_i[τ_i]. Furthermore, the invariant distribution is unique.

Proof: From Theorem 3.2.1, it follows that π(i) = 1/E_i[τ_i] holds for an invariant measure; finiteness of E_i[τ_i] is precisely positive recurrence (for an irreducible finite state Markov chain this holds automatically). We will prove the uniqueness result for the case where P(i, j) > ε for some ε > 0, for all i, j ∈ X.
First, we will show that the support of an invariant distribution has to be the entire state space; that is, the invariant measure must be non-zero on every state. Suppose i is such that π(i) = 0 for an invariant distribution π. Then, for X \ {i},

π(X \ {i}) = ∑_{j∈X\{i}} π(j)P(j, X \ {i}) + 0 · P(i, X \ {i}) ≤ (∑_{j∈X\{i}} π(j)) max_{j∈X\{i}} P(j, X \ {i}) < π(X \ {i})(1 − ε),

since P(j, i) > ε for all j ∈ X, so that P(j, X \ {i}) < 1 − ε. As π(X \ {i}) = 1 when π(i) = 0, this is a contradiction.

We now show that there cannot be two different invariant distributions. Let π and π′ be two different invariant distributions. By the previous argument, they have the same support set (all of X). It must be that for some linear functions F, G:

Fπ(i) = G[π(j), j ∈ X \ {i}]
and

Fπ′(i) = G[π′(j), j ∈ X \ {i}].

Let us define J := {i : π(i) ≥ π′(i)}. Then, for some linear functions with positive coefficients,

F_J[π(j), j ∈ J] = G_{−J}[π(j), j ∈ X \ J],
F_J[π′(j), j ∈ J] = G_{−J}[π′(j), j ∈ X \ J],

and by linearity, the same holds for the difference:

F_J[π(j) − π′(j), j ∈ J] = G_{−J}[π(j) − π′(j), j ∈ X \ J].

But the entries of the vector on the left hand side are all non-negative, while those on the right hand side are negative: since each of π and π′ sums to 1, if the differences on the left were not all zero, the differences on the right would have to be strictly negative. Since the matrices multiplying the vectors have all-positive entries (this is where irreducibility is used; we skip the proof that the matrix elements are all positive, which follows from the equation π = πP and the assumption that all entries of P are bounded below), this is a contradiction. As such, the two distributions must be identical.
⊓⊔
Invariant Distribution via Dobrushin's Ergodic Coefficient

One interesting result we have observed is the following: for every positive recurrent state i,

lim_{n→∞} P^n(i, i) = 1/E_i[τ_i].

Consider the iteration π_{t+1} = π_t P. We would like to know when this iteration converges to a limit. We first review some notions from analysis.
Review of Vector Spaces

Definition 3.2.1 A linear space is a space which is closed under addition and multiplication by a scalar.

Definition 3.2.2 A normed linear space X is a linear vector space on which a functional (a mapping from X to R, that is, a member of R^X) called a norm is defined such that:

• ||x|| ≥ 0 for all x ∈ X, and ||x|| = 0 if and only if x is the null element of X;
• ||x + y|| ≤ ||x|| + ||y||, for all x, y ∈ X;
• ||αx|| = |α| ||x||, for all α ∈ R and all x ∈ X.

Definition 3.2.3 In a normed linear space X, an infinite sequence of elements {x_n} converges to an element x if the sequence {||x_n − x||} converges to zero.

Definition 3.2.4 A metric defined on a set X is a function d : X × X → R such that:

• d(x, y) ≥ 0 for all x, y ∈ X, and d(x, y) = 0 if and only if x = y;
• d(x, y) = d(y, x) for all x, y ∈ X;
• d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ X.

Definition 3.2.5 A metric space (X, d) is a set X equipped with a metric d.

A normed linear space is also a metric space, with metric d(x, y) = ||x − y||.
Definition 3.2.6 A sequence {x_n} in a normed space X is Cauchy if for every ε > 0, there exists an N such that ||x_n − x_m|| ≤ ε for all n, m ≥ N.

The important observation on Cauchy sequences is that every convergent sequence is Cauchy; however, not every Cauchy sequence is convergent, because the limit might not live in the space where the sequence itself lives. This brings up the issue of completeness:

Definition 3.2.7 A normed linear space X is complete if every Cauchy sequence in X has a limit in X. A complete normed linear space is called a Banach space.

A map T from a complete normed linear space X to itself is called a contraction if for some 0 ≤ ρ < 1,

||T(x) − T(y)|| ≤ ρ||x − y||, ∀x, y ∈ X.

Theorem 3.2.3 A contraction map on a Banach space has a unique fixed point.

Proof: {T^n(x)} forms a Cauchy sequence and, by completeness, has a limit x*; since a contraction is continuous, T(x*) = lim_n T^{n+1}(x) = x*. Uniqueness follows since two fixed points x*, y* would satisfy ||x* − y*|| = ||T(x*) − T(y*)|| ≤ ρ||x* − y*||, forcing x* = y*.
⊓⊔
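As a toy numerical example (not from the notes), T(x) = x/2 + 1 on R is a contraction with modulus ρ = 1/2; iterating it from any starting point converges to the unique fixed point x = 2:

```python
def T(x):
    """A toy contraction on R with modulus rho = 1/2: |T(x) - T(y)| = |x - y|/2."""
    return 0.5 * x + 1.0

x = 100.0                # any starting point works
for _ in range(60):      # the iterates T^n(x) form a Cauchy sequence
    x = T(x)
# The unique fixed point solves x = x/2 + 1, i.e. x = 2.
```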
Contraction Mapping via Dobrushin's Ergodic Coefficient

Consider a countable state Markov chain. Define

δ(P) = min_{i,k} ∑_j min(P(i, j), P(k, j)).

Observe that for two scalars a, b,

|a − b| = a + b − 2 min(a, b).

Let us define for a vector v the l_1 norm

||v||_1 = ∑_i |v_i|.

It is known that the set of all countable index real-valued vectors (that is, functions which map Z → R) with a finite l_1 norm,

{v : ||v||_1 < ∞},

is a complete normed linear space, and as such is a Banach space (every Cauchy sequence has a limit in the space).

With these observations, we state the following:
Theorem 3.2.4 (Dobrushin) For any probability measures π, π′, it follows that

||πP − π′P||_1 ≤ (1 − δ(P)) ||π − π′||_1.

Proof: Let ψ(i) = π(i) − min(π(i), π′(i)) for all i, and let ψ′(i) = π′(i) − min(π(i), π′(i)). Then

||π − π′||_1 = ||ψ − ψ′||_1 = 2||ψ||_1 = 2||ψ′||_1,

since the sum ∑_i (π(i) − π′(i)) = ∑_i (ψ(i) − ψ′(i)) is zero, and

∑_i |π(i) − π′(i)| = ||ψ||_1 + ||ψ′||_1.
Now, noting that ψ and ψ′ are componentwise non-negative with ||ψ||_1 = ||ψ′||_1,

||πP − π′P||_1 = ||ψP − ψ′P||_1
= ∑_j |∑_i ψ(i)P(i, j) − ∑_k ψ′(k)P(k, j)|
= (1/||ψ′||_1) ∑_j |∑_k ∑_i ψ(i)ψ′(k)P(i, j) − ∑_i ∑_k ψ(i)ψ′(k)P(k, j)|   (3.5)
≤ (1/||ψ′||_1) ∑_j ∑_k ∑_i ψ(i)ψ′(k) |P(i, j) − P(k, j)|   (3.6)
= (1/||ψ′||_1) ∑_k ∑_i ψ(i)ψ′(k) ∑_j |P(i, j) − P(k, j)|
= (1/||ψ′||_1) ∑_k ∑_i ψ(i)ψ′(k) {∑_j P(i, j) + P(k, j) − 2 min(P(i, j), P(k, j))}   (3.7)
≤ (1/||ψ′||_1) ∑_k ∑_i ψ(i)ψ′(k) (2 − 2δ(P))   (3.8)
= ||ψ||_1 (2 − 2δ(P))   (3.9)
= ||π − π′||_1 (1 − δ(P)).   (3.10)

In the above, (3.5) follows from multiplying and dividing by ||ψ′||_1 = ∑_k ψ′(k), (3.6) from taking the absolute value inside the sums, (3.7) from the relation |a − b| = a + b − 2 min(a, b), (3.8) from the definition of δ(P), and finally (3.9) from the l_1 norms of ψ, ψ′.

As such, the map π ↦ πP is a contraction if δ(P) > 0. In essence, one then proves that the sequence {π_0 P^m} is Cauchy; as every Cauchy sequence in a Banach space has a limit, this sequence has a limit, and the limit is the invariant distribution. The contraction property also guarantees that the limit is unique. ⊓⊔
It should also be noted that Dobrushin's theorem tells us how fast the sequence of probability distributions {π_0 P^n} converges to the invariant distribution for an arbitrary π_0: the l_1 distance contracts by a factor of at least (1 − δ(P)) at every step, so convergence is geometric.
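The coefficient δ(P) and the contraction bound can be checked numerically; the kernel below is an illustrative example with all entries positive, so δ(P) > 0:

```python
# Illustrative kernel (not from the notes) with all entries positive.
P = [[0.6, 0.3, 0.1],
     [0.2, 0.5, 0.3],
     [0.3, 0.3, 0.4]]
n = len(P)

def dobrushin(P):
    """delta(P) = min_{i,k} sum_j min(P(i,j), P(k,j))."""
    return min(sum(min(P[i][j], P[k][j]) for j in range(n))
               for i in range(n) for k in range(n))

def apply_kernel(pi):
    """pi -> pi P."""
    return [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]

def l1(v):
    return sum(abs(c) for c in v)

delta = dobrushin(P)                       # here delta(P) = 0.6
pi, pi2 = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
lhs = l1([a - b for a, b in zip(apply_kernel(pi), apply_kernel(pi2))])
rhs = (1 - delta) * l1([a - b for a, b in zip(pi, pi2)])
# lhs <= rhs confirms ||pi P - pi' P||_1 <= (1 - delta(P)) ||pi - pi'||_1.
```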
Ergodic Theorem for Countable State Space Chains

For a Markov chain which has a unique invariant distribution μ, we have that, almost surely,

lim_{T→∞} (1/T) ∑_{t=1}^T f(x_t) = ∑_i f(i)μ(i), ∀ bounded f : X → R.

This is called the ergodic theorem, due to Birkhoff, and is a very powerful result: in essence, this property is what makes the connection between stochastic control and the long-horizon behaviour of Markov chains. In particular, for a stationary control policy leading to a unique invariant distribution μ(x, u) on state-action pairs, with bounded costs, it follows that, almost surely,

lim_{T→∞} (1/T) ∑_{t=1}^T c(x_t, u_t) = ∑_{x,u} c(x, u)μ(x, u),

for every real-valued bounded c. The ergodic theorem is what makes a dynamic optimization problem equivalent to a static optimization problem under mild technical conditions. This will set the core ideas in the convex analytic approach and the linear programming approach that will be discussed later.
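The equality of temporal and statistical averages can be seen in simulation. The two-state chain below is a hypothetical example (with P(0,0) = 0.7, P(1,1) = 0.8, whose invariant distribution is μ = (0.4, 0.6)) and f is an arbitrary bounded function:

```python
import random

random.seed(1)
# Illustrative two-state chain; move to state 0 with probability p0[x].
p0 = {0: 0.7, 1: 0.2}
f = {0: 5.0, 1: -1.0}      # arbitrary bounded function on the state space

x, total, T = 0, 0.0, 300_000
for _ in range(T):
    total += f[x]
    x = 0 if random.random() < p0[x] else 1

temporal_avg = total / T                      # (1/T) sum_t f(x_t)
statistical_avg = 0.4 * f[0] + 0.6 * f[1]     # sum_i f(i) mu(i) = 1.4
```

For long horizons the two averages agree up to Monte Carlo error, which is the content of the ergodic theorem.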
3.3 Uncountable (Complete, Separable, Metric) State Spaces

We now briefly extend the above definitions and discussions to an uncountable state space setting.

Let {x_t, t ∈ Z_+} be a Markov chain with a complete, separable, metric state space (X, B(X)), defined on a probability space (Ω, F, P), where B(X) denotes the Borel σ-field on X, Ω is the sample space, F a sigma field of subsets of Ω, and P a probability measure. Let P(x, D) := P(x_{t+1} ∈ D | x_t = x) denote the transition probability from x to D, that is, the probability of the event {x_{t+1} ∈ D} given that x_t = x.
3.3.1 Chapman-Kolmogorov Equations

Consider a Markov chain with transition probability P(x, D), that is,

P(x_{t+1} ∈ D | x_t = x) = P(x, D).

We can compute P(x_{t+k} ∈ D | x_t = x) inductively as follows:

P(x_{t+k} ∈ D | x_t = x) = ∫ ... ∫ P(x_t, dx_{t+1}) P(x_{t+1}, dx_{t+2}) ... P(x_{t+k−2}, dx_{t+k−1}) P(x_{t+k−1}, D).

As such, we have for all n ≥ 1,

P^n(x, A) = P(x_{t+n} ∈ A | x_t = x) = ∫_X P^{n−1}(x, dy) P(y, A).

The justification for the above construction will be discussed further with regard to conditional expectations in the next chapter.
Definition 3.3.1 A Markov chain is μ-irreducible if for any set B ∈ B(X) with μ(B) > 0 and any x ∈ X, there exists some integer n > 0, possibly depending on B and x, such that P^n(x, B) > 0, where P^n(x, B) is the transition probability in n stages, that is, P(x_{t+n} ∈ B | x_t = x).

One popular case uses the Lebesgue measure:

Definition 3.3.2 A real-valued Markov chain is Lebesgue-irreducible if for any non-empty open set B ⊂ R and any x ∈ R, there exists some integer n > 0, possibly depending on B and x, such that P^n(x, B) > 0.

Example: Linear system with a drift. Consider the linear system

x_{t+1} = a x_t + w_t.

This chain is Lebesgue-irreducible if w_t is a Gaussian random variable, since a Gaussian density is positive everywhere.

A ϕ-irreducible Markov chain is aperiodic if for any x ∈ X and any B ∈ B(X) satisfying ϕ(B) > 0, there exists n_0 = n_0(x, B) such that

P^n(x, B) > 0 for all n ≥ n_0.

See Meyn and Tweedie [23] for a detailed discussion on periodicity of Markov chains and its connection with small sets, to be discussed shortly.

The definitions for recurrence and transience follow those in the countable state space setting.
Definition 3.3.3 A Markov chain is called recurrent if it is μ-irreducible and

E_x[∑_{t=1}^∞ 1_{x_t∈A}] = ∑_{t=1}^∞ P^t(x, A) = ∞, ∀x ∈ X,

whenever μ(A) > 0 and A ∈ B(X).

While studying chains in a general (uncountable) state space, we use a stronger recurrence definition: a set A ∈ B(X) is Harris recurrent if the Markov chain visits A infinitely often with probability 1 when the process starts in A.

Definition 3.3.4 A set A ∈ B(X) is Harris recurrent if

P_x(η_A = ∞) = 1, ∀x ∈ A, (3.11)

where η_A = ∑_{t=1}^∞ 1_{x_t∈A} denotes the number of visits to A.

Theorem 3.3.1 Harris recurrence of a set A is equivalent to

P_x[τ_A < ∞] = 1, ∀x ∈ A.
Proof: We now prove one direction of this claim ([Meyn-Tweedie, Chapter 9]). Let τ_A(1) be the first time the state hits A, and τ_A(k) the time of the k-th visit. By the strong Markov property, the Markov chain sampled at the successive visit times τ_A(1), τ_A(2), and so on is also a Markov chain; let Q be the transition kernel of this sampled chain. Now, for x ∈ A, the probability of τ_A(2) < ∞ can be computed as

P_x(τ_A(2) < ∞) = ∫_A Q(x, dy) P_y(τ_A(1) < ∞) = ∫_A Q(x, dy) = 1, (3.12)

using the hypothesis P_y(τ_A(1) < ∞) = 1 for all y ∈ A. By induction, for every n ∈ Z_+,

P_x(τ_A(n+1) < ∞) = ∫_A Q(x, dy) P_y(τ_A(n) < ∞) = 1. (3.13)

Now,

P_x(η_A ≥ k) = P_x(τ_A(k) < ∞),

since visiting a set at least k times is the same as returning to it k times when the initial state x is in the set. As such,

P_x(η_A ≥ k) = 1, ∀k ∈ Z_+.

Define B_k = {ω ∈ Ω : η_A(ω) ≥ k}; then B_{k+1} ⊂ B_k, and by the continuity of probability, P_x(η_A = ∞) = P_x(∩_k B_k) = lim_{k→∞} P_x(B_k) = 1.

The proof of the other direction of the equivalence is left as an exercise to the reader.
⊓⊔
Definition 3.3.5 A Markov chain is Harris recurrent if the chain is μ-irreducible and every set A ∈ B(X) with μ(A) > 0 is Harris recurrent. If the chain admits an invariant probability measure, then the chain is called positive Harris recurrent.

Remark: Harris recurrence is stronger than recurrence: recurrence constrains an expectation (the expected number of visits), while Harris recurrence constrains a probability (the event of infinitely many visits). Consider the following example on the positive integers: let P(1, 1) = 1 and, for x ≥ 2, P(x, x+1) = 1 − 1/x² and P(x, 1) = 1/x². Then P_x(τ_1 = ∞) = ∏_{t≥x} (1 − 1/t²) > 0, so the chain is not Harris recurrent. It is ϕ-irreducible for the measure ϕ(A) = 1_{1∈A}, and the expected number of visits to 1 is infinite. Hence, this chain is recurrent but not Harris recurrent.
⊓⊔
If a set is not recurrent, it is transient. A set A is transient if

U(x, A) := E_x[η_A] < ∞, ∀x ∈ A. (3.14)

Note that this is equivalent to

∑_{i=1}^∞ P^i(x, A) < ∞, ∀x ∈ A.
3.3.2 Invariant Distributions for Uncountable Spaces

Definition 3.3.6 For a Markov chain with transition probability defined as before, a probability measure π is invariant on the Borel space (X, B(X)) if

π(D) = ∫_X P(x, D) π(dx), ∀D ∈ B(X).

A Harris recurrent chain always admits an invariant measure, but this measure might not be a probability measure.

Uncountable state space chains act like countable ones when there is an atom α ∈ X: a set on which the transition kernel does not depend on the starting point, so that the atom behaves like a single state of a countable chain, carrying a probability mass.

Definition 3.3.7 A set α is called an atom if there exists a probability measure ν such that

P(x, A) = ν(A), ∀x ∈ α, ∀A ∈ B(X).

If the chain is μ-irreducible and μ(α) > 0, then α is called an accessible atom.
In case there is an accessible atom α, we have the following:

Theorem 3.3.2 For a μ-irreducible Markov chain for which E_α[τ_α] < ∞, the following is the invariant probability measure:

π(A) = E_α[∑_{k=0}^{τ_α−1} 1_{x_k∈A}] / E_α[τ_α], ∀A ∈ B(X).
In case an atom is not present, one needs further machinery:

Definition 3.3.8 (Tweedie) A set A ⊂ X is n-small on (X, B(X)) if for some positive measure μ,

P^n(x, B) ≥ μ(B), ∀x ∈ A, ∀B ∈ B(X),

where B(X) denotes the (Borel) sigma-field on X.

Definition 3.3.9 (Meyn-Tweedie) A set A ⊂ X is μ-petite on (X, B(X)) if for some distribution T on N (the set of natural numbers) and some positive measure μ,

∑_{n=0}^∞ P^n(x, B) T(n) ≥ μ(B), ∀x ∈ A, ∀B ∈ B(X).
Remark 3.3.1 Small sets are very important in generalizing the results for countable state space Markov chains to more general spaces. Meyn and Tweedie present a discussion of periodicity based on the greatest common divisor of the set of possible return times to a given small set; such an approach lets one generalize the definitions of aperiodicity and periodicity.
Theorem 3.3.3 (Theorem 5.5.3 of [25], Lemma 17 of [28]) For an aperiodic and irreducible Markov chain {x_t}, every petite set is small.

A maximal irreducibility measure ψ is an irreducibility measure such that for every other irreducibility measure φ, we have ψ(B) = 0 ⇒ φ(B) = 0 for all B ∈ B(X). Define B^+(X) = {A ∈ B(X) : ψ(A) > 0}, where ψ is a maximal irreducibility measure.

Theorem 3.3.4 For an aperiodic and irreducible Markov chain, every petite set is petite with a maximal irreducibility measure as the minorizing measure and a sampling distribution T with finite mean.
Nummelin's Splitting Technique

The results on recurrence apply to uncountable state space chains with no atom, provided there is a small set or a petite set. Suppose a set A is 1-small with minorizing measure ν, and let δ = ν(X). Define a process z_t = (x_t, a_t), z_t ∈ X × {0, 1}; that is, we enlarge the probability space. When x_t ∉ A, (a_t, x_t) evolve independently from each other. When x_t ∈ A, we pick a Bernoulli random variable: with probability δ the state visits A × {1}, and with probability 1 − δ it visits A × {0}. From A × {1}, the transition for the next time stage is given by ν(dx_{t+1})/δ, and from A × {0} by

(P(dx_{t+1} | x_t) − ν(dx_{t+1})) / (1 − δ).

With this choice of δ = ν(X), the set A × {1} is an accessible atom (every point in it makes the same transition), and one can verify that the marginal distribution of the original Markov process {x_t} has not been altered.

The following can be established using the above construction:

Proposition 3.3.1 If

sup_{x∈A} E[min(t > 0 : x_t ∈ A) | x_0 = x] < ∞,

then

sup_{z∈A×{1}} E[min(t > 0 : z_t ∈ A × {1}) | z_0 = z] < ∞.
3.3.3 Existence of an Invariant Distribution

We state the final theorem on the invariant distributions of Markov chains:

Theorem 3.3.5 (Meyn-Tweedie) Consider a Markov process {x_t} taking values in X. If there is a Harris recurrent set A which is also a μ-petite set for some positive measure μ, and if the set satisfies

sup_{x∈A} E[min(t > 0 : x_t ∈ A) | x_0 = x] < ∞,

then the Markov chain is positive Harris recurrent and it admits a unique invariant distribution.

In this case, the invariant measure satisfies the following, which is a generalization of Kac's Lemma [14]:

Theorem 3.3.6 For a μ-irreducible Markov chain with a unique invariant probability measure π, the following holds for any C ∈ B(X) with π(C) > 0:

π(A) = ∫_C π(dx) E_x[∑_{k=0}^{τ_C−1} 1_{x_k∈A}], ∀A ∈ B(X), μ(A) > 0.
The above can also be extended to compute expected values of functions of the Markov state. The formula can be verified along the same lines as the countable state space case (see Theorem 3.2.1).

Example: Linear system with a drift. The linear system

x_{t+1} = a x_t + w_t,

with w_t an i.i.d. zero-mean Gaussian noise sequence, is positive Harris recurrent if |a| < 1, null recurrent if |a| = 1, and transient if |a| > 1. In the positive Harris recurrent case, every open set is visited with probability 1 infinitely often.
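The stable case |a| < 1 is easy to see in simulation (parameters below are illustrative): the invariant law is N(0, σ²/(1 − a²)), and empirical moments along a long trajectory match it, as the ergodic theorem predicts.

```python
import random

random.seed(2)
a, sigma = 0.5, 1.0        # illustrative parameters with |a| < 1
# For |a| < 1 and Gaussian w_t, the chain x_{t+1} = a x_t + w_t is
# positive Harris recurrent with invariant law N(0, sigma^2 / (1 - a^2)).
x, n = 0.0, 200_000
s1 = s2 = 0.0
for _ in range(n):
    x = a * x + random.gauss(0.0, sigma)
    s1 += x
    s2 += x * x

mean_hat = s1 / n                       # empirical mean along the path
var_hat = s2 / n - (s1 / n) ** 2        # empirical variance along the path
var_theory = sigma ** 2 / (1 - a ** 2)  # = 4/3 for these parameters
```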
3.3.4 Dobrushin's Ergodic Coefficient for Uncountable State Space Chains

We can extend Dobrushin's contraction result to the uncountable state space case. By the identity |a − b| = a + b − 2 min(a, b), Dobrushin's coefficient can also be written (for the countable state space case) as

δ(P) = 1 − (1/2) max_{i,k} ∑_j |P(i, j) − P(k, j)|.

In case P(x, dy) is the stochastic transition kernel of a real-valued Markov process whose transitions admit densities (that is, P(x, A) = ∫_A P(x, y) dy for a density P(x, ·)), the expression

δ(P) = 1 − (1/2) sup_{x,z} ∫ |P(x, y) − P(z, y)| dy

is Dobrushin's ergodic coefficient for R-valued Markov processes.

As such, if δ(P) > 0, then the iterations π_t(·) = ∫_R π_{t−1}(z) P(z, ·) dz converge to a unique fixed point.
3.3.5 Ergodic Theorem for Uncountable State Space Chains

Likewise, for the uncountable setting, if μ(·) is the invariant probability distribution of a positive Harris recurrent Markov chain, it follows that

lim_{T→∞} (1/T) ∑_{t=1}^T c(x_t) = ∫ c(x) μ(dx), ∀c ∈ L_1(μ) := {f : ∫ |f(x)| μ(dx) < ∞},

almost surely. We will prove this theorem after we discuss martingales.
3.4 Further Results on the Existence of an Invariant Probability Measure

3.4.1 Case with the Feller condition

A Markov chain is weak Feller if ∫_X P(dz|x) v(z) is continuous in x for every continuous and bounded v on X.

Theorem 3.4.1 Let {x_t} be a weak Feller Markov process living in a compact subset of a complete, separable metric space. Then {x_t} admits an invariant distribution.

Proof. The proof follows from the observation that the space of probability measures on such a compact set is tight, hence weakly sequentially compact. Consider the sequence of expected occupation measures μ_T = (1/T) ∑_{t=0}^{T−1} μ_0 P^t, for an arbitrary initial distribution μ_0. There exists a subsequence μ_{T_k} which converges weakly to some μ*. It follows that for every continuous and bounded function f, along this subsequence,

⟨μ_T, f⟩ := ∫ μ_T(dx) f(x) → ⟨μ*, f⟩.
Likewise,

⟨μ_T, Pf⟩ := ∫ μ_T(dx) (∫ P(dy|x) f(y)) → ⟨μ*, Pf⟩.

Now, by a telescoping cancellation,

(μ_T − μ_T P)(f) = (1/T) E_{μ_0}(∑_{k=0}^{T−1} P^k_x f − ∑_{k=0}^{T−1} P^{k+1}_x f) = (1/T) E_{μ_0}(f(x_0) − f(x_T)) → 0. (3.15)

Thus,

(μ_T − μ_T P)(f) = ⟨μ_T, f⟩ − ⟨μ_T, Pf⟩ → ⟨μ* − μ*P, f⟩ = 0,

so μ* is an invariant probability measure.
⊓⊔

Lasserre [19] gives the following example to emphasize the importance of the Feller property: consider a Markov chain evolving in [0, 1] given by P(x, {x/2}) = 1 for all x ≠ 0 and P(0, {1}) = 1. This chain does not admit an invariant probability measure; note that the kernel fails to be weak Feller at x = 0.

Remark 3.4.1 Lasserre presents a class of chains, quasi-Feller chains, which are weak Feller except on a closed set of points N such that P(x, N) = 0 for all x ∈ D = X \ N and P(x, N) < 1 for x ∈ N; on D, the chain is weak Feller.
3.4.2 Case without the Feller condition

One can also relax the weak Feller condition.

Theorem 3.4.2 [20] For a Markov chain satisfying a uniform countable additivity condition, as in (4.7) in the next chapter, there exists an invariant probability measure if and only if there exist a non-negative function g with

lim_{n→∞} inf_{x∉K_n} g(x) = ∞

and an initial probability measure ν_0 such that sup_t E_{ν_0}[g(x_t)] < ∞.

The proof of this result follows from an observation similar to (3.15), but with weak convergence replaced by setwise convergence:

Lemma 3.4.3 [7, Thm. 4.7.25] Let μ be a finite measure on a measurable space (T, A). Assume a set of probability measures Ψ ⊂ P(T) satisfies

P ≤ μ, for all P ∈ Ψ.

Then Ψ is setwise precompact.

A sequence of occupation measures under (4.7) has a subsequence which converges setwise, and the limit of this subsequence is invariant. As an example, consider a system of the form

x_{t+1} = f(x_t) + w_t, (3.16)

where w_t admits a distribution with a bounded density function which is positive everywhere. If the state process has a uniformly bounded second moment (so that g(x) = x² works in Theorem 3.4.2), then the system admits an invariant probability measure, which is unique.
3.5 Exercises

Exercise 3.5.1 Prove that if {x_t} is irreducible, then all states have the same period.

Exercise 3.5.2 Prove that

P_x(τ_A = 1) = P(x, A),

and for n > 1,

P_x(τ_A = n) = ∑_{i∉A} P(x, i) P_i(τ_A = n − 1).

Exercise 3.5.3 Show that irreducibility of a Markov chain in a finite state space implies that every non-empty set A and every x satisfy U(x, A) = ∞.

Exercise 3.5.4 Show that for an irreducible Markov chain, either the entire chain is transient, or it is recurrent.

Exercise 3.5.5 If P_x(τ_A < ∞) < 1 for some x ∈ A, show that A is transient.

Exercise 3.5.6 If P_x(τ_A < ∞) = 1 for all x ∈ A, show that A is Harris recurrent.

Exercise 3.5.7 Prove that the discrete-time random walk on Z^k defined by the transition kernel P(i, j) = (1/2k) 1_{|i−j|=1} is recurrent for k = 1 and k = 2.
Chapter 4

Martingales and Foster-Lyapunov Criteria for Stabilization of Markov Chains

4.1 Martingales

In this chapter, we first discuss a number of martingale theorems. Only a few of these will be essential for the course; some others are presented for the sake of completeness and will not be needed within our scope.

These results are very important for understanding the stabilization of controlled stochastic systems, and they will also pave the way to the optimization of dynamical systems. The second half of this chapter is on the stabilization of Markov chains.
4.1.1 More on Expectations and Conditional Probability

Let (Ω, F, P) be a probability space and let G be a sub-σ-field of F. Let X be an R-valued random variable measurable with respect to (Ω, F) with a finite absolute expectation, that is,

E[|X|] = ∫_Ω |X(ω)| P(dω) < ∞,

where ω ∈ Ω. We call such random variables integrable.

We say that Ξ is the conditional expectation random variable E[X|G] of X given G (Ξ is also called a version of the conditional expectation) if

i) Ξ is G-measurable;
ii) for every A ∈ G,

E[1_A Ξ] = E[1_A X],

where

E[1_A X] = ∫_Ω X(ω) 1_{ω∈A} P(dω).

For example, if the information that we know about a process is whether an event A ∈ F happened or not, then

X_A := E[X|A] = (1/P(A)) ∫_A P(dω) X(ω).

If the information we have is that A did not take place,

X_{A^C} := E[X|A^C] = (1/P(Ω \ A)) ∫_{Ω\A} P(dω) X(ω).
Thus, the conditional expectation given the sigma-field generated by A (which is F_A = {∅, Ω, A, Ω \ A}) is given by

E[X|F_A] = X_A 1_{ω∈A} + X_{A^C} 1_{ω∉A}.

We note that conditional probability can be written as

P(A|G) = E[1_A | G];

hence, conditional probability is a special case of conditional expectation.

Let {B_i} be a finite (or countable) partition of Ω which generates the σ-field G. The conditional expectation E[X|G] must take the same value at any two points in the same cell B_i, since otherwise E[X|G] would not be measurable on G. As such, for every cell B of the partition, we have

∫_B X(ω) P(dω) = ∫_B E[X|B] P(dω) = E[X|B] ∫_B P(dω) = P(B) E[X|B].

The notion of conditional expectation is key for the development of stochastic processes which evolve according to a transition kernel. It is also useful for optimal decision making when only partial information is available with regard to a random variable.
The following discussion is optional until the next subsection.<br />
Theorem 4.1.1 (Radon-Nikodym) Let µ and ν be two σ-finite positive measures on (Ω, F) such that ν(A) = 0 implies µ(A) = 0 (that is, µ is absolutely continuous with respect to ν). Then, there exists a measurable function f : Ω → R such that for every A ∈ F:<br />
µ(A) = ∫_A f(ω) ν(dω).<br />
The representation above is unique up to sets of measure zero. With the above discussion, the conditional expectation E[X|F′] exists for any sub-σ-field F′ ⊂ F.<br />
It is a useful exercise now to consider the σ-field generated by an observation variable, and what a conditional expectation means in this case.<br />
Theorem 4.1.2 Let X be an X-valued random variable, where X is a complete, separable, metric space, and let Y be another Y-valued random variable. Then, the random variable X is F_Y -measurable (where F_Y is the σ-field generated by Y) if and only if there exists a measurable function f : Y → X such that X(ω) = f(Y(ω)).<br />
Proof: We prove the more difficult direction. Suppose first that X takes values in a countable space {a_1, a_2, ..., a_n, ...}. Write A_n = X^{−1}(a_n). Since X is measurable on F_Y, A_n ∈ F_Y. Let us now make these sets disjoint: define C_1 = A_1 and C_n = A_n \ ∪_{i=1}^{n−1} A_i. These sets are disjoint, and they all live in F_Y. Thus, we can construct a measurable function defined by f(ω) = a_n if ω ∈ C_n.<br />
The issue is now what happens when X lives in an uncountable, complete, separable, metric space such as R. We can define countably many sets which cover the entire state space (such as intervals with rational endpoints, to cover R). Let us construct a sequence of partitions Γ_n = {B_{i,n}}, with ∪_i B_{i,n} = X, which get finer and finer (such as, for R, the intervals [k/n, (k + 1)/n), k ∈ Z). As above, we can define a measurable function f_n for every such partition. Furthermore, f_n(ω) converges pointwise to a function f(ω), and the pointwise limit of measurable functions is measurable.<br />
⋄<br />
With the above, the expectation E[X|Y = y_0] can be defined; this conditional expectation is a measurable function of Y.<br />
4.1.2 Some Properties of Conditional Expectation<br />
One very important property is given by the following.<br />
Iterated expectations:
Theorem 4.1.3 If H ⊂ G ⊂ F and X is F-measurable, then it follows that:<br />
E[E[X|G]|H] = E[X|H].<br />
Proof: The proof follows by taking a set A ∈ H, which is then also in G and F. Let η be the conditional expectation variable with respect to H. Then it follows that<br />
E[1_A η] = E[1_A X].<br />
Now let η′ = E[X|G]. Then, it must be that E[1_A η′] = E[1_A X] for all A ∈ G, and hence for all A ∈ H. Thus, the two conditional expectations agree.<br />
⋄<br />
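The iterated-expectations identity can be verified exactly on a small finite space. The following sketch (an arbitrary example with three fair coin flips, using exact rational arithmetic) compares E[E[X|G]|H] with E[X|H] for nested σ-fields generated by partitions:

```python
from itertools import product
from fractions import Fraction

# Omega = all outcomes of 3 fair coin flips; X = total number of heads.
omega = list(product([0, 1], repeat=3))
X = {w: sum(w) for w in omega}

def cond_exp_given(X, blocks):
    """E[X | sigma-field generated by the partition `blocks`] (uniform measure)."""
    Xi = {}
    for B in blocks:
        m = sum(X[w] for w in B) * Fraction(1, len(B))
        for w in B:
            Xi[w] = m
    return Xi

def blocks_by(key):
    """Partition of Omega into cells on which `key` is constant."""
    cells = {}
    for w in omega:
        cells.setdefault(key(w), []).append(w)
    return list(cells.values())

G = blocks_by(lambda w: w[:2])  # G: knows the first two flips
H = blocks_by(lambda w: w[:1])  # H: knows only the first flip, so H ⊂ G

inner = cond_exp_given(X, G)        # E[X|G]
lhs = cond_exp_given(inner, H)      # E[E[X|G]|H]
rhs = cond_exp_given(X, H)          # E[X|H]
assert lhs == rhs                   # tower property holds exactly
```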
4.1.3 Discrete-Time Martingales<br />
Let (Ω, F, P) be a probability space. An increasing family {F_n} of sub-σ-fields of F is called a filtration. A sequence of random variables {X_n} on (Ω, F, P) is said to be adapted to {F_n} if X_n is F_n-measurable, that is,<br />
X_n^{−1}(D) = {ω ∈ Ω : X_n(ω) ∈ D} ∈ F_n for all Borel D.<br />
This holds, for example, if F_n = σ(X_m, m ≤ n), n ≥ 0.<br />
Given a filtration {F_n} and a sequence of real random variables adapted to it, (X_n, F_n) is said to be a martingale if<br />
E[|X_n|] < ∞<br />
and<br />
E[X_{n+1}|F_n] = X_n.<br />
We will occasionally take the σ-fields to be F_n = σ(X_1, X_2, ..., X_n). For example, M_t = Σ_{k=0}^{t} ε_k, where {ε_k} is a zero-mean i.i.d. random sequence, is a martingale for every t ∈ N.<br />
A fair game is a Martingale when it is played up to a finite time.<br />
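The martingale property of the partial-sum example can be checked by exact enumeration. The sketch below (with ±1 increments, an arbitrary choice of zero-mean i.i.d. sequence) groups all sample paths by their prefix — the information in F_t — and verifies E[M_{t+1}|F_t] = M_t:

```python
from itertools import product
from fractions import Fraction

# M_t = sum_{k<=t} eps_k with eps_k = ±1 equally likely (zero-mean, i.i.d.).
T = 5
paths = list(product([-1, 1], repeat=T))  # each path has probability 2^-T

# Group paths by their prefix (the information in F_t = sigma(eps_1..eps_t))
# and check E[M_{t+1} | F_t] = M_t exactly on every prefix.
for t in range(T - 1):
    prefixes = {}
    for path in paths:
        prefixes.setdefault(path[:t + 1], []).append(path)
    for prefix, group in prefixes.items():
        m_t = sum(prefix)  # M_t is determined by the prefix
        m_next = sum(Fraction(sum(path[:t + 2])) for path in group) / len(group)
        assert m_next == m_t
```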
Let n > m ∈ Z_+. Then, since F_m ⊂ F_n, any A ∈ F_m is also in F_n. Thus, if X_n is a martingale sequence,<br />
E[1_A X_n] = E[1_A X_{n−1}] = · · · = E[1_A X_m].<br />
Thus, E[X_n|F_m] = X_m.<br />
If we have that<br />
E[X_n|F_m] ≥ X_m,<br />
then {X_n} is called a submartingale. And, if<br />
E[X_n|F_m] ≤ X_m,<br />
then {X_n} is called a supermartingale.<br />
A useful concept related to filtrations is that of a stopping time, which we discussed while studying Markov chains. A stopping time is a random time T whose occurrence is measurable with respect to the filtration, in the sense that for each n ∈ N, {T ≤ n} ∈ F_n.<br />
4.1.4 Doob’s Optional Sampling Theorem<br />
Theorem 4.1.4 Suppose (X_n, F_n) is a martingale sequence, and ρ, τ ≤ n are bounded stopping times with ρ ≤ τ. Then,<br />
E[X_τ|F_ρ] = X_ρ.<br />
Proof: We observe that<br />
E[X_τ − X_ρ|F_ρ] = E[ Σ_{k=ρ}^{τ−1} (X_{k+1} − X_k) | F_ρ ]<br />
= E[ Σ_{k=ρ}^{τ−1} E[X_{k+1} − X_k|F_k] | F_ρ ] = E[ Σ_{k=ρ}^{τ−1} 0 | F_ρ ] = 0. (4.1)<br />
⋄<br />
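The optional sampling conclusion E[X_τ] = E[X_0] can be checked exactly for a simple random walk and a bounded stopping time. In the sketch below (an arbitrary example), τ is the first time the walk hits level ±2, truncated at N, so {τ ≤ n} depends only on the first n steps:

```python
from itertools import product
from fractions import Fraction

# Simple random walk X_n (a martingale) and the bounded stopping time
# tau = min(first time |X_n| = 2, N); optional sampling gives E[X_tau] = E[X_0] = 0.
N = 6
paths = list(product([-1, 1], repeat=N))

def x_tau(path):
    x = 0
    for step in path:
        x += step
        if abs(x) == 2:      # {tau <= n} depends only on the first n steps
            return x
    return x                  # tau = N if level ±2 is never reached

expectation = sum(Fraction(x_tau(p)) for p in paths) / len(paths)
assert expectation == 0       # exact, by enumeration over all 2^N paths
```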
Theorem 4.1.5 Let {X_n} be a sequence of F_n-adapted integrable real random variables. Then, the following are equivalent: (i) (X_n, F_n) is a submartingale. (ii) If T, S are bounded stopping times with T ≥ S (almost surely), then E[X_S] ≤ E[X_T].<br />
Proof: Let S ≤ T ≤ n. Now, note that<br />
E[X_T − X_S|F_S] = Σ_{k=1}^{n} E[ 1_{(T≥k)} 1_{(S<k)} (X_k − X_{k−1}) | F_S ]<br />
= Σ_{k=1}^{n} E[ 1_{(T≥k)} 1_{(S<k)} E[X_k − X_{k−1}|F_{k−1}] | F_S ] ≥ 0, (4.2)<br />
where the second equality holds since {T ≥ k} ∩ {S < k} ∈ F_{k−1}, and the inequality follows from the submartingale property. Taking expectations yields (ii); the converse direction follows by taking S and T to be deterministic times.<br />
⋄<br />
4.1.5 Doob's Upcrossing Lemma<br />
For reals a < b, define the successive crossing times T_1 = min{n ≥ 0 : X_n ≤ a}, T_2 = min{n > T_1 : X_n ≥ b}, and inductively T_{2m+1} = min{n > T_{2m} : X_n ≤ a}, T_{2m+2} = min{n > T_{2m+1} : X_n ≥ b}, each truncated at N. Let ζ_N(a, b) denote the number of upcrossings of [a, b] completed by time N.<br />
Theorem 4.1.6 Let {X_n} be a supermartingale sequence. Then,<br />
E[ζ_N(a, b)] ≤ (E[|X_N|] + |a|) / (b − a).<br />
Proof: At time N there are three possibilities: the process may end below a, between a and b, or above b. Each time it rises above b after having been below a, an upcrossing is completed. In view of this, we may proceed as follows. Let β_N := max{m : T_{2m−1} ≤ N}; by the truncation, either T_{2β_N} = N or T_{2β_N−1} = N. Here, β_N is measurable on σ(X_1, X_2, ..., X_N).<br />
Then, by the supermartingale property and the optional sampling theorem,<br />
0 ≥ E[ Σ_{i=1}^{β_N} (X_{T_{2i}} − X_{T_{2i−1}}) ]<br />
= E[ ( Σ_{i=1}^{β_N} (X_{T_{2i}} − X_{T_{2i−1}}) ) 1_{{T_{2β_N}=N}} ] + E[ ( Σ_{i=1}^{β_N} (X_{T_{2i}} − X_{T_{2i−1}}) ) 1_{{T_{2β_N−1}=N}} ]<br />
= E[ ( Σ_{i=1}^{β_N−1} (X_{T_{2i}} − X_{T_{2i−1}}) + X_N − X_{T_{2β_N−1}} ) 1_{{T_{2β_N}=N}} ] + E[ ( Σ_{i=1}^{β_N−1} (X_{T_{2i}} − X_{T_{2i−1}}) ) 1_{{T_{2β_N−1}=N}} ]<br />
= E[ Σ_{i=1}^{β_N−1} (X_{T_{2i}} − X_{T_{2i−1}}) ] + E[ (X_N − X_{T_{2β_N−1}}) 1_{{T_{2β_N}=N}} ]. (4.3)<br />
Thus,<br />
E[ Σ_{i=1}^{β_N−1} (X_{T_{2i}} − X_{T_{2i−1}}) ] ≤ −E[ (X_N − X_{T_{2β_N−1}}) 1_{{T_{2β_N}=N}} ] ≤ E[(a − X_N)], (4.4)<br />
since X_{T_{2β_N−1}} ≤ a. On the other hand, each completed upcrossing satisfies X_{T_{2i}} − X_{T_{2i−1}} ≥ b − a, so that<br />
E[ Σ_{i=1}^{β_N−1} (X_{T_{2i}} − X_{T_{2i−1}}) ] ≥ E[β_N − 1](b − a).<br />
It follows that ζ_N(a, b) = β_N − 1 satisfies<br />
E[ζ_N(a, b)](b − a) ≤ E[(a − X_N)] ≤ |a| + E[|X_N|],<br />
and the result follows.<br />
⋄<br />
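The upcrossing bound can be checked exactly by enumeration for the simple random walk (a martingale, hence a supermartingale). In the sketch below, N, a, and b are arbitrary choices, and both sides of the inequality are computed in exact arithmetic:

```python
from itertools import product
from fractions import Fraction

# Exact check of E[zeta_N(a,b)] <= (E|X_N| + |a|)/(b-a) for the ±1 random walk.
N, a, b = 8, -1, 1

def upcrossings(path, a, b):
    """Completed upcrossings of [a, b]: moves from a level <= a up to >= b."""
    count, below, x = 0, False, 0
    values = [0]
    for step in path:
        x += step
        values.append(x)
    for v in values:
        if v <= a:
            below = True
        elif v >= b and below:
            count += 1
            below = False
    return count

paths = list(product([-1, 1], repeat=N))
n = Fraction(len(paths))
mean_zeta = sum(Fraction(upcrossings(p, a, b)) for p in paths) / n
mean_absXN = sum(Fraction(abs(sum(p))) for p in paths) / n
assert mean_zeta <= (mean_absXN + abs(a)) / Fraction(b - a)
```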
Recall that a sequence of random variables {X_n} defined on a probability space (Ω, F, P) converges to X almost surely (a.s.) if<br />
P(ω : lim_{n→∞} X_n(ω) = X(ω)) = 1.<br />
Theorem 4.1.7 Suppose X_n is a supermartingale and sup_{n≥0} E[|X_n|] < ∞. Then lim_{n→∞} X_n = X exists (almost surely) and E[|X|] < ∞. The same result applies to submartingales, by considering −X_n, which is a supermartingale, under the condition sup_{n≥0} E[max(0, X_n)] < ∞.<br />
Proof: The proof follows from Doob's upcrossing lemma. Suppose X_n does not converge almost surely. Then, with positive probability, limsup X_n ≠ liminf X_n; in particular, there exist reals b > a such that limsup X_n ≥ b and liminf X_n ≤ a with positive probability. By the upcrossing lemma, for every N,<br />
E[ζ_N(a, b)] ≤ (E[|X_N|] + |a|) / (b − a) < ∞.<br />
This is true for every N. Since ζ_N(a, b) is monotonically increasing in N, by the monotone convergence theorem it follows that<br />
lim_{N→∞} E[ζ_N(a, b)] = E[ lim_{N→∞} ζ_N(a, b) ] ≤ (sup_N E[|X_N|] + |a|) / (b − a) < ∞.<br />
Hence the total number of upcrossings of [a, b] is finite with probability one (it cannot be infinite with non-zero probability, as its expectation is bounded). Since this holds for every pair a < b, X_n must converge almost surely; otherwise, there would be infinitely many upcrossings of some interval.<br />
⋄<br />
Theorem 4.1.8 (Submartingale Convergence Theorem) Suppose X_n is a submartingale and sup_{n≥0} E[|X_n|] < ∞. Then X := lim_{n→∞} X_n exists (almost surely) and E[|X|] < ∞.<br />
Proof: Note that sup_{n≥0} E[|X_n|] < ∞ is a sufficient condition in Theorem 4.1.7 both for a submartingale and for a supermartingale. Hence X_n → X almost surely. For finiteness, suppose E[|X|] = ∞. Then liminf |X_n| = lim |X_n| = |X| = ∞ a.s., and by Fatou's lemma,<br />
limsup E[|X_n|] ≥ E[liminf |X_n|] = ∞.<br />
But this is a contradiction, as we assumed that sup_n E[|X_n|] < ∞.<br />
⋄<br />
4.1.6 Proof of the Birkhoff Individual Ergodic Theorem<br />
This will be discussed in class.<br />
4.1.7 This section is optional: Further Martingale Theorems<br />
This section is optional. If you wish not to read it, please proceed to the discussion on stabilization of Markov chains.<br />
Theorem 4.1.9 Let X_n be a martingale such that X_n converges to X in L_1, that is, E[|X_n − X|] → 0. Then,<br />
X_n = E[X|F_n], n ∈ N.<br />
We will use the following while studying the convex analytic method. Let us define uniform integrability.<br />
Definition 4.1.1 A sequence of random variables {X_n} is uniformly integrable if<br />
lim_{K→∞} sup_n E[ |X_n| 1_{{|X_n|≥K}} ] = 0.<br />
This implies that<br />
sup_n E[|X_n|] < ∞.<br />
Suppose that, for some ε > 0,<br />
sup_n E[|X_n|^{1+ε}] < ∞.<br />
This implies that the sequence is uniformly integrable, since<br />
sup_n E[ |X_n| 1_{{|X_n|≥K}} ] ≤ sup_n E[ (|X_n|/K)^ε |X_n| 1_{{|X_n|≥K}} ] ≤ (1/K^ε) sup_n E[|X_n|^{1+ε}] → 0 as K → ∞.<br />
The following is a very important result:<br />
Theorem 4.1.10 If X_n is a uniformly integrable martingale, then X = lim_{n→∞} X_n exists almost surely and in L_1, and X_n = E[X|F_n].<br />
Optional Sampling Theorem For Uniformly Integrable Martingales<br />
Theorem 4.1.11 Suppose (X_n, F_n) is a uniformly integrable martingale sequence, and ρ, τ are stopping times with ρ ≤ τ. Then,<br />
E[X_τ|F_ρ] = X_ρ.<br />
Proof: By uniform integrability and Theorem 4.1.10, {X_t} has an almost sure and L_1 limit; call it X_∞. It follows that E[X_∞|F_τ] = X_τ, and<br />
E[E[X_∞|F_τ]|F_ρ] = E[X_∞|F_ρ] = X_ρ,<br />
where the left-hand side also equals E[X_τ|F_ρ]. Hence E[X_τ|F_ρ] = X_ρ.<br />
⋄<br />
4.1.8 Azuma-Hoeffding Inequality for Martingales with Bounded Increments<br />
The following is an important concentration result:<br />
Theorem 4.1.12 Let X_t be a martingale sequence such that |X_t − X_{t−1}| ≤ c for every t, almost surely. Then for any x > 0,<br />
P( |X_t − X_0| / t ≥ x ) ≤ 2 e^{−tx²/(2c²)}.<br />
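The concentration bound can be compared against the exact tail of the ±1 random walk (bounded increments with c = 1). The sketch below, with arbitrary choices of t and x, computes the exact probability from binomial coefficients and checks it against the Azuma-Hoeffding bound:

```python
from math import exp, comb

# Exact tail P((X_t - X_0)/t >= x) for the ±1 random walk (c = 1),
# compared against the Azuma-Hoeffding bound 2*exp(-t*x**2/(2*c**2)).
t, x, c = 10, 0.6, 1.0

# S_t = 2k - t when k of the t steps are +1, so S_t >= t*x iff 2k - t >= t*x.
tail = sum(comb(t, k) for k in range(t + 1) if 2 * k - t >= t * x) / 2 ** t
bound = 2.0 * exp(-t * x ** 2 / (2.0 * c ** 2))
assert tail <= bound  # the (one-sided) tail is below the two-sided bound
```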
4.2 Stability of Markov Chains: Foster-Lyapunov Techniques<br />
A Markov chain’s stability can be characterized by drift conditions, as we discuss below in detail.<br />
4.2.1 Criterion for Positive Harris Recurrence<br />
Theorem 4.2.1 (Foster-Lyapunov for Positive Recurrence) [23] Let S be a petite set, b ∈ R, and V(.) be a mapping from X to R_+. Let {x_n} be an irreducible Markov chain on X. If the following is satisfied for all x ∈ X:<br />
E[V(x_{t+1})|x_t = x] = ∫_X P(x, dy)V(y) ≤ V(x) − 1 + b 1_{{x∈S}}, (4.5)<br />
then a unique invariant probability measure exists for the Markov chain.<br />
Proof: We will provide a proof assuming that S is such that sup_{x∈S} V(x) < ∞. (The proof is applicable without this assumption as well, with further details on replacing S by a petite set which satisfies it; that is, for every petite set we can find one on which V is bounded. See [23] for details.) Define M_0 := V(x_0) and, for t ≥ 1,<br />
M_t := V(x_t) − Σ_{i=0}^{t−1} (−1 + b 1_{(x_i∈S)}).<br />
It follows that<br />
E[M_{t+1}|x_s, s ≤ t] ≤ M_t, ∀t ≥ 0.<br />
Furthermore, it can be shown that E[|M_t|] < ∞ for all t. Thus, {M_t} is a supermartingale. Define a stopping time τ_N = min(N, τ), where τ = min{i > 0 : x_i ∈ S}. This stopping time is bounded. Hence, we have, by the martingale optional sampling theorem:<br />
E[M_{τ_N}] ≤ E[M_0].<br />
Hence, we obtain<br />
E_{x_0}[ Σ_{i=0}^{τ_N−1} 1 ] ≤ V(x_0) + b E_{x_0}[ Σ_{i=0}^{τ_N−1} 1_{(x_i∈S)} ].<br />
Thus, E_{x_0}[τ_N] ≤ V(x_0) + b (the sum of indicators contributes at most the single term at i = 0), and by the monotone convergence theorem,<br />
lim_{N→∞} E_{x_0}[τ_N] = E_{x_0}[τ] ≤ V(x_0) + b.<br />
Now, if we had that<br />
sup_{x∈S} V(x) < ∞,<br />
the proof would be complete in view of Theorem 3.3.5 and the fact that E_x[τ] < ∞ for any x, leading to the Harris recurrence of S.<br />
Typically, this condition is satisfied. However, when it is not easy to verify directly, we may need additional steps to construct a suitable petite set. In the following, we consider this. Following [23], Chapter 11, define for some l ∈ Z_+<br />
V_C(l) = {x : V(x) ≤ l}.<br />
We will show that B := V_C(l) is itself a petite set, which is recurrent and satisfies the uniform finite-mean-return property. Since S is petite with some sampling distribution a and measure ν, we have that<br />
K_a(x, B) ≥ 1_{{x∈S}} ν(B), x ∈ X,<br />
where K_a(x, B) = Σ_i a(i) P^i(x, B), and hence<br />
1_{{x∈S}} ≤ (1/ν(B)) K_a(x, B).<br />
Now,<br />
E_x[τ_B] ≤ V(x) + b E_x[ Σ_{k=0}^{τ_B−1} 1_{{x_k∈S}} ]<br />
≤ V(x) + b E_x[ Σ_{k=0}^{τ_B−1} (1/ν(B)) K_a(x_k, B) ]<br />
= V(x) + (b/ν(B)) E_x[ Σ_{k=0}^{τ_B−1} Σ_i a(i) P^i(x_k, B) ]<br />
= V(x) + (b/ν(B)) Σ_i a(i) E_x[ Σ_{k=0}^{τ_B−1} 1_{{x_{k+i}∈B}} ]<br />
≤ V(x) + (b/ν(B)) Σ_i a(i)(1 + i),<br />
where the last inequality follows since the process can be in B at most once between times 1 and τ_B − 1, so that x_{k+i} ∈ B for at most 1 + i indices k ∈ {0, ..., τ_B − 1}. Now, the sampling distribution a in the petiteness property can be adjusted so that Σ_i a(i) i < ∞ (by Proposition 5.5.6 of [23]), leading to the result that<br />
sup_{x∈B} E_x[τ_B] ≤ sup_{x∈B} V(x) + (b/ν(B)) Σ_i a(i)(1 + i) =: K < ∞,<br />
so that B can serve as the recurrent set. This set is furthermore recurrent since K < ∞, and as a result P_x(τ_B > n) ≤ K/n for any n, by Markov's inequality. Finally, since S is petite, so is B.<br />
⋄<br />
There are other versions of Foster-Lyapunov criteria.<br />
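Drift condition (4.5) can be verified in closed form for simple models. The sketch below uses an arbitrary example — a reflected random walk with negative-mean ±1 increments — where V, p, and S = {0} are choices made here, not part of the theorem; V is scaled so the drift off S is exactly −1:

```python
# Drift condition (4.5) for a reflected random walk x_{t+1} = max(x_t + w_t, 0),
# with w_t = +1 w.p. p and -1 w.p. 1-p, p = 0.3 (negative mean increments).
# With V(x) = x / 0.4 and S = {0}: E[V(x_{t+1}) | x_t = x] <= V(x) - 1 + b*1{x in S}.
p = 0.3

def V(x):
    return x / 0.4            # scaled so the drift off S equals -1 exactly

def drift(x):
    """E[V(x_{t+1}) | x_t = x] for the reflected walk."""
    up, down = max(x + 1, 0), max(x - 1, 0)
    return p * V(up) + (1 - p) * V(down)

b = drift(0) - (V(0) - 1)     # smallest b making the inequality hold on S = {0}
for x in range(0, 200):       # check (4.5) on a range of states
    indicator = 1 if x == 0 else 0
    assert drift(x) <= V(x) - 1 + b * indicator + 1e-9
```

Here b = 1.75, and the theorem then guarantees a unique invariant probability measure for this chain.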
4.2.2 Criterion for Finite Expectations<br />
Theorem 4.2.2 (Comparison Theorem) Let V(.) : X → R_+ and f, g : X → R_+ be such that E[f(x_t)] < ∞ and E[g(x_t)] < ∞ for each t. Let {x_n} be a Markov chain on X. If the following is satisfied:<br />
∫_X P(x, dy)V(y) ≤ V(x) − f(x) + g(x), ∀x ∈ X,<br />
then, for any stopping time τ:<br />
E[ Σ_{t=0}^{τ−1} f(x_t) ] ≤ V(x_0) + E[ Σ_{t=0}^{τ−1} g(x_t) ].<br />
The proof of the above follows from the construction of a supermartingale sequence and the monotone convergence theorem. The above also allows the computation of useful bounds. For example, if g(x) = b 1_{{x∈A}} and τ = τ_A is the return time to A, then one obtains that E[Σ_{t=0}^{τ−1} f(x_t)] ≤ V(x_0) + b. In view of the invariant distribution properties, if f(x) ≥ 1, this provides a bound on ∫ π(dx)f(x).<br />
Theorem 4.2.3 (Foster-Lyapunov Moment Bound) [23] Let S be a petite set, b ∈ R, V(.) : X → R_+, and f(.) : X → [1, ∞). Let {x_n} be a Markov chain on X. If the following is satisfied:<br />
∫_X P(x, dy)V(y) ≤ V(x) − f(x) + b 1_{{x∈S}}, ∀x ∈ X,<br />
then,<br />
lim_{T→∞} (1/T) Σ_{t=0}^{T−1} f(x_t) = lim_{t→∞} E[f(x_t)] = ∫ µ(dx)f(x) < ∞,<br />
almost surely, where µ(.) is the invariant probability measure on X.<br />
Recall that the invariant distribution π satisfies, for any D ∈ B(X) (see Theorem 3.3.6),<br />
π(D) = ∫_A π(dx) E_x[ Σ_{t=0}^{τ_A−1} 1_{{x_t∈D}} ].<br />
Thus, for a bounded measurable function f, by the property that simple functions are dense in the space of measurable, bounded functions under the supremum norm, it follows that<br />
∫ π(dx)f(x) = ∫_A π(dx) E_x[ Σ_{t=0}^{τ_A−1} f(x_t) ].<br />
Thus, if we can verify that E_x[Σ_{t=0}^{τ_A−1} f(x_t)] is uniformly bounded for all x ∈ A, we can obtain a bound on π(f) = ∫ f(x)π(dx).<br />
Another approach would be the following. By Theorem 4.2.2, taking the deterministic time T as the stopping time,<br />
limsup_T (1/T) E_{x_0}[ Σ_{k=0}^{T} f(x_k) ] ≤ limsup_T (1/T)(V(x_0) + bT) = b.<br />
It follows, by Fatou's lemma and the monotone convergence theorem, that<br />
limsup_T (1/T) E_π[ Σ_{k=0}^{T} f(x_k) ] ≤ ∫ π(dx) E_x[ limsup_T (1/T) Σ_{k=0}^{T} f(x_k) ] ≤ b.<br />
We have the following corollary to Theorem 4.2.3.<br />
Corollary 4.2.1 In particular, under the conditions of Theorem 4.2.3,<br />
∫ π(dx)f(x) = ∫_S π(dx) E_x[ Σ_{t=0}^{τ_S−1} f(x_t) ] ≤ ∫_S π(dx) (V(x) + b),<br />
which lets one obtain a bound on the expectation of f(x).<br />
See Chapter 14 of [23] for further details.<br />
We need to ensure, however, that there exists an invariant measure; this is why we require that f : X → [1, ∞), where 1 can be replaced with any positive number.<br />
4.2.3 Criterion for Recurrence<br />
Theorem 4.2.4 (Foster-Lyapunov for Recurrence) Let S be a compact set, b < ∞, and V(.) be an inf-compact function on X, that is, {x : V(x) ≤ α} is compact for all α ∈ R_+. Let {x_n} be an irreducible Markov chain on X. If the following is satisfied:<br />
∫_X P(x, dy)V(y) ≤ V(x) + b 1_{{x∈S}}, ∀x ∈ X, (4.6)<br />
then, with τ_S = min{t > 0 : x_t ∈ S}, P_x(τ_S < ∞) = 1 for all x ∈ X; that is, the Markov chain is Harris recurrent.<br />
Proof: Define two stopping times, τ_S and τ_{B_N}, where B_N = {x : V(x) ≥ N}. Note that the sequence M_t = V(x_t) behaves as a supermartingale for t = 0, 1, ..., min(τ_S, τ_{B_N}) − 1, and is uniformly integrable up to τ_{B_N}; a variation of the optional sampling theorem applies for stopping times which are not necessarily bounded by a given finite number with probability one. Note that, due to irreducibility, min(τ_S, τ_{B_N}) < ∞ with probability 1. Now, it follows that for x ∉ S ∪ B_N, since the minimum value of the Lyapunov function when exiting into B_N is N,<br />
V(x) ≥ E_x[ V(x_{min(τ_S, τ_{B_N})}) ] ≥ P_x(τ_{B_N} < τ_S) N + P_x(τ_{B_N} ≥ τ_S) M<br />
for some finite M ≥ 0. Hence,<br />
P_x(τ_{B_N} < τ_S) ≤ V(x)/N.<br />
We also have that P(min(τ_S, τ_{B_N}) = ∞) = 0, since the chain is irreducible and will leave any compact set in finite time. As a consequence, we have that<br />
P_x(τ_S = ∞) ≤ P_x(τ_{B_N} < τ_S) ≤ V(x)/N,<br />
and taking the limit as N → ∞, P_x(τ_S = ∞) = 0.<br />
⊓⊔<br />
If S is furthermore petite, then once the petite set is visited, any other set with a positive measure (under the irreducibility measure) is visited with probability 1 infinitely often.<br />
4.2.4 On small and petite sets<br />
Petiteness may be difficult to verify directly. In the following, we present two conditions that may be used to establish petiteness.<br />
By [23], p. 131: for a Markov chain with transition kernel P and a probability measure K on the natural numbers, if there exists a sub-stochastic kernel N(·, ·) such that N(·, E) is lower semi-continuous for every E ∈ B(X) and Σ_{n=0}^{∞} P^n(x, E) K(n) ≥ N(x, E), then the chain is called a T-chain.<br />
Theorem 4.2.5 [23] For a T −chain which is irreducible, every compact set is petite.
For a countable state space, under irreducibility, every finite set S in (4.5) is petite.<br />
The assumptions of petiteness and irreducibility can be relaxed for the existence of an invariant probability measure. Tweedie [31] considers the following. If S is such that the uniform countable additivity condition<br />
lim_{n→∞} sup_{x∈S} P(x, B_n) = 0 (4.7)<br />
is satisfied for B_n ↓ ∅, then (4.5) implies the existence of an invariant probability measure, and there exist at most finitely many invariant probability measures. By Proposition 5.5.5(iii) of Meyn-Tweedie [23], under irreducibility, the Harris recurrent component of the space can be expressed as a countable union of petite sets C_n. By Lemma 4 of Tweedie (2001), under uniform countable additivity, any finite union ∪_{i=1}^{M} C_i is uniformly accessible from S. Therefore, if the Markov chain is irreducible, condition (4.7) implies that the set S is petite. For a large class of applications, this may be easier to verify than establishing the T-chain property. Under further conditions (such as when S is compact and the V used in a drift criterion has compact level sets), one can take the B_n to be subsets of a compact set, making the verification even simpler.<br />
4.2.5 Criterion for Transience<br />
Criteria for transience are somewhat more difficult to establish. One convenient way is to construct a stopping time sequence and show that the state does not return to some set infinitely often. We state the following.<br />
Theorem 4.2.6 ([23], [16]) Let V : X → R_+. If there exists a set A such that E[V(x_{t+1})|x_t = x] ≤ V(x) for all x ∉ A, and ∃ x̄ ∉ A such that V(x̄) < inf_{z∈A} V(z), then {x_t} is not recurrent, in the sense that P_{x̄}(τ_A < ∞) < 1.<br />
Proof: Let x = x̄. The proof follows from observing that V(x) ≥ ∫_y V(y)P(x, dy) ≥ (inf_{z∈A} V(z)) P(x, A) + ∫_{y∉A} V(y)P(x, dy) ≥ (inf_{z∈A} V(z)) P(x, A). It thus follows that<br />
P_x(τ_A < 2) = P(x, A) ≤ V(x) / (inf_{z∈A} V(z)).<br />
Likewise,<br />
V(x̄) ≥ ∫_y V(y)P(x̄, dy)<br />
≥ (inf_{z∈A} V(z)) P(x̄, A) + ∫_{y∉A} ( ∫_s V(s)P(y, ds) ) P(x̄, dy)<br />
≥ (inf_{z∈A} V(z)) P(x̄, A) + ∫_{y∉A} P(x̄, dy) ( (inf_{s∈A} V(s)) P(y, A) + ∫_{s∉A} V(s)P(y, ds) )<br />
≥ (inf_{z∈A} V(z)) P(x̄, A) + ∫_{y∉A} P(x̄, dy) (inf_{s∈A} V(s)) P(y, A)<br />
= (inf_{z∈A} V(z)) ( P(x̄, A) + ∫_{y∉A} P(x̄, dy) P(y, A) ). (4.8)<br />
Thus, observing that P({ω : τ_A(ω) < 3}) = P(x̄, A) + ∫_{y∉A} P(x̄, dy)P(y, A), we obtain<br />
P_{x̄}(τ_A < 3) ≤ V(x̄) / (inf_{z∈A} V(z)).<br />
Iterating, this holds for any n:<br />
P_{x̄}(τ_A < n) ≤ V(x̄) / (inf_{z∈A} V(z)) < 1.<br />
Continuity of probability measures (defining B_n = {ω : τ_A < n}, observing B_n ⊂ B_{n+1}, and that lim_n P(τ_A < n) = P(∪_n B_n) = P(τ_A < ∞) < 1) now leads to transience.<br />
⋄
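Theorem 4.2.6 can be verified in closed form for the upward-biased random walk on Z. In the sketch below (an arbitrary example; the choices of p, A, and V are made here), V(x) = (q/p)^x is harmonic for the walk, so the drift condition holds with equality off A, and V(1) lies strictly below inf_A V:

```python
# Transience criterion for the asymmetric random walk on Z with upward bias:
# P(x, x+1) = p = 0.7, P(x, x-1) = q = 0.3. Take A = {x : x <= 0} and
# V(x) = (q/p)**x, which is harmonic: E[V(x_{t+1}) | x_t = x] = V(x).
p, q = 0.7, 0.3

def V(x):
    return (q / p) ** x

# Check E[V(x_{t+1}) | x_t = x] <= V(x) for x outside A (here: exact equality).
for x in range(1, 50):
    assert abs(p * V(x + 1) + q * V(x - 1) - V(x)) < 1e-12

# V is decreasing, so inf_{z in A} V(z) = V(0) = 1, and x_bar = 1 gives V(1) < 1;
# by Theorem 4.2.6 the chain started at 1 therefore avoids A forever with
# positive probability.
inf_A = V(0)
assert V(1) < inf_A
```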
Observe the difference between the inf-compactness condition, leading to recurrence, and the above condition, leading to non-recurrence.<br />
We finally note that a convenient way to verify instability or transience is to construct an appropriate martingale sequence.<br />
4.2.6 State-Dependent Drift Criteria: Deterministic and Random-Time<br />
In many applications, the controllers act on a system intermittently. For this case, we have the following results [33], which extend the deterministic state-dependent results presented in [23], [24].<br />
Let τ_z, z ≥ 0, be a sequence of stopping times, measurable on a filtration, possibly generated by the state process.<br />
Theorem 4.2.7 [33] Suppose that x is a ϕ-irreducible Markov chain. Suppose moreover that there are functions V : X → (0, ∞), δ : X → [1, ∞), f : X → [1, ∞), a petite set C on which V is bounded, and a constant b ∈ R, such that the following hold:<br />
E[V(x_{τ_{z+1}}) | F_{τ_z}] ≤ V(x_{τ_z}) − δ(x_{τ_z}) + b 1_{{x_{τ_z}∈C}},<br />
E[ Σ_{k=τ_z}^{τ_{z+1}−1} f(x_k) | F_{τ_z} ] ≤ δ(x_{τ_z}), z ≥ 0. (4.9)<br />
Then x is positive Harris recurrent, and moreover π(f) < ∞, with π being the invariant distribution. ⊓⊔<br />
By taking f(x) = 1 for all x ∈ X, we obtain the following corollary to Theorem 4.2.7.<br />
Corollary 4.2.2 [33] Suppose that x is a ϕ-irreducible Markov chain. Suppose moreover that there is a function V : X → (0, ∞), a petite set C on which V is bounded, and a constant b ∈ R, such that the following hold:<br />
E[V(x_{τ_{z+1}}) | F_{τ_z}] ≤ V(x_{τ_z}) − 1 + b 1_{{x_{τ_z}∈C}},<br />
sup_{z≥0} E[τ_{z+1} − τ_z | F_{τ_z}] < ∞. (4.10)<br />
Then x is positive Harris recurrent. ⊓⊔<br />
More on invariant probability measures<br />
Without the irreducibility condition, if the chain is weak Feller and (4.5) holds with S compact, then there exists at least one invariant probability measure, as discussed in Section 3.4.<br />
Theorem 4.2.8 [33] Suppose that x is a Feller Markov chain, not necessarily ϕ-irreducible. If (4.9) holds with C compact, then there exists at least one invariant probability measure. Moreover, there exists c < ∞ such that, under any invariant probability measure π,<br />
E_π[f(x)] = ∫_X π(dx)f(x) ≤ c. (4.11)<br />
4.3 Convergence Rates to Equilibrium<br />
In addition to obtaining bounds on the rate of convergence through Dobrushin's coefficient, one powerful approach is through the Foster-Lyapunov drift conditions.<br />
Regularity and ergodicity are closely related concepts, through the work of Meyn and Tweedie [25], [26] and Tuominen and Tweedie [30].<br />
Definition 4.3.1 A set A ∈ B(X) is called (f, r)-regular if<br />
sup_{x∈A} E_x[ Σ_{k=0}^{τ_B−1} r(k) f(x_k) ] < ∞<br />
for all B ∈ B^+(X). A finite measure ν on B(X) is called (f, r)-regular if<br />
E_ν[ Σ_{k=0}^{τ_B−1} r(k) f(x_k) ] < ∞<br />
for all B ∈ B^+(X), and a point x is called (f, r)-regular if the measure δ_x is (f, r)-regular.<br />
This leads to a lemma relating regular distributions to regular atoms.<br />
Lemma 4.3.1 If a Markov chain {x_t} has an atom α ∈ B^+(X) and an (f, r)-regular distribution λ, then α is an (f, r)-regular set.<br />
Definition 4.3.2 (f-norm) For a function f : X → [1, ∞), the f-norm of a measure µ defined on (X, B(X)) is given by<br />
‖µ‖_f = sup_{g:|g|≤f} | ∫ µ(dx)g(x) |.<br />
The total variation norm ‖.‖_TV is the f-norm with f ≡ 1.<br />
Coupling inequality and moments of return times to a small set. The main idea behind the coupling inequality is to bound the total variation distance between the distributions of two random variables by the probability that they differ; the more coupled the two random variables are, the more their distributions match. Let X and Y be two jointly distributed random variables on a space X with marginal distributions µ_x and µ_y, respectively. Then we can bound the total variation between the distributions by the probability that the two variables are not equal:<br />
‖µ_x − µ_y‖_TV = sup_A |µ_x(A) − µ_y(A)|<br />
= sup_A |P(X ∈ A, X = Y) + P(X ∈ A, X ≠ Y) − P(Y ∈ A, X = Y) − P(Y ∈ A, X ≠ Y)|<br />
= sup_A |P(X ∈ A, X ≠ Y) − P(Y ∈ A, X ≠ Y)|<br />
≤ P(X ≠ Y).<br />
The coupling inequality is useful in discussions of ergodicity when used in conjunction with parallel Markov chains, as in Theorem 4.1 of [28]. One creates two Markov chains having the same one-step transition probabilities. Let {x_n} and {x′_n} be two Markov chains that have probability transition kernel P(x, ·), and let C be an (m, δ, ν)-small set. We use the coupling construction provided by Roberts and Rosenthal.<br />
Let x_0 = x and x′_0 ∼ π(·), where π(·) is the invariant distribution of both Markov chains.<br />
1. If x_n = x′_n, then x_{n+1} = x′_{n+1} ∼ P(x_n, ·).<br />
2. Else, if (x_n, x′_n) ∈ C × C, then with probability δ, x_{n+m} = x′_{n+m} ∼ ν(·); with probability 1 − δ, independently<br />
x_{n+m} ∼ (1/(1 − δ)) (P^m(x_n, ·) − δν(·)),<br />
x′_{n+m} ∼ (1/(1 − δ)) (P^m(x′_n, ·) − δν(·)).<br />
3. Else, independently x_{n+m} ∼ P^m(x_n, ·) and x′_{n+m} ∼ P^m(x′_n, ·).<br />
The in-between states x_{n+1}, ..., x_{n+m−1}, x′_{n+1}, ..., x′_{n+m−1} are distributed conditionally given x_n, x_{n+m}, x′_n, x′_{n+m}.<br />
By the coupling inequality and the previous discussion of Nummelin's splitting technique, we have ‖P^n(x, ·) − π(·)‖_TV ≤ P(x_n ≠ x′_n).<br />
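The coupling inequality itself can be verified exactly on a finite space. The sketch below uses an arbitrarily chosen joint distribution for (X, Y), computes both marginals, evaluates the total variation distance as a supremum over all events, and checks it against P(X ≠ Y):

```python
from itertools import combinations
from fractions import Fraction

# Coupling inequality on a finite space: ||mu_x - mu_y||_TV <= P(X != Y).
# The joint distribution of (X, Y) below is an arbitrary example.
space = [0, 1, 2]
joint = {(0, 0): Fraction(3, 10), (1, 1): Fraction(2, 10),
         (0, 2): Fraction(1, 10), (2, 1): Fraction(2, 10),
         (2, 2): Fraction(2, 10)}

mu_x = {s: sum(p for (a, b), p in joint.items() if a == s) for s in space}
mu_y = {s: sum(p for (a, b), p in joint.items() if b == s) for s in space}

# Total variation distance: the supremum of |mu_x(A) - mu_y(A)| over events A.
tv = max(abs(sum(mu_x[s] - mu_y[s] for s in A))
         for r in range(len(space) + 1)
         for A in combinations(space, r))
p_diff = sum(p for (a, b), p in joint.items() if a != b)
assert tv <= p_diff
```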
4.3.1 Lyapunov conditions: Geometric ergodicity

Roberts and Rosenthal use this approach to establish geometric ergodicity.

An irreducible Markov chain satisfies the univariate drift condition if there are constants λ ∈ (0, 1) and b < ∞, along with a function V : X → [1, ∞) and a small set C, such that

PV ≤ λV + b1_C.   (4.12)
One can now define a bivariate drift condition for two independent copies of a Markov chain with a small set C. This condition requires that there exist a function h : X × X → [1, ∞) and α > 1 such that

P̄h(x, y) ≤ h(x, y)/α,   (x, y) ∉ C × C,

where

P̄h(x, y) := ∫_X ∫_X h(z, w) P(x, dz) P(y, dw).

If the univariate drift condition (4.12) holds and C = {x : V(x) ≤ d} for some d > b/(1 − λ) − 1, then the bivariate drift condition is satisfied for h(x, y) = (1/2)(V(x) + V(y)) and α^{−1} = λ + b/(d + 1) < 1.

The bivariate condition ensures that the process (x_n, x′_n) hits the set C × C, from where the two chains may be coupled. The coupling inequality then leads to the desired conclusion.
Theorem 4.3.3 (Theorem 9 of [28]) Suppose {x_t} is an aperiodic, irreducible Markov chain with invariant distribution π(·). Suppose C is a (1, ε, ν)-small set and V : X → [1, ∞) satisfies the univariate drift condition (4.12) with constants λ ∈ (0, 1) and b < ∞, with V(x) < ∞ for some x ∈ X. Then {x_t} is geometrically ergodic.
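As a sanity check of the univariate drift condition, the following sketch (my own illustration, not part of the notes) verifies PV ≤ λV + b·1_C numerically for a reflected random walk on {0, …, N} with downward drift, using the exponential Lyapunov function V(x) = z^x. The values z = 1.2, λ = 0.95, b = 0.2 and C = {0} are trial choices.

```python
import numpy as np

N, p, q = 50, 0.3, 0.7              # up/down probabilities; q > p gives negative drift
P = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    P[x, min(x + 1, N)] += p        # step up (reflected at N)
    P[x, max(x - 1, 0)] += q        # step down (reflected at 0)

z = 1.2
V = z ** np.arange(N + 1)           # Lyapunov function V(x) = z^x >= 1
lam, b = 0.95, 0.2
C = (np.arange(N + 1) == 0)         # candidate small set C = {0}

PV = P @ V                          # (PV)(x) = sum_y P(x, y) V(y)
drift_holds = bool(np.all(PV <= lam * V + b * C + 1e-12))
# Interior states: (PV)(x)/V(x) = p*z + q/z ~ 0.943 < 0.95 = lambda;
# at x = 0 the excess is absorbed by the b * 1_C term.
```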
As a side remark, we note the following observation. If the univariate drift condition holds, then the sequence of random variables

M_n = λ^{−n} V(x_n) − Σ_{k=0}^{n−1} λ^{−(k+1)} b 1_C(x_k)

is a supermartingale, and thus we have the inequality

E_{x_0}[λ^{−n} V(x_n)] ≤ V(x_0) + E_{x_0}[ Σ_{k=0}^{n−1} λ^{−(k+1)} b 1_C(x_k) ]   (4.13)

for all n and x_0 ∈ X, and with Theorem 3.3.4 we also have

E_x[λ^{−τ_B}] ≤ V(x) + c(B)

for all B ∈ B^+(X).
4.3.2 Subgeometric ergodicity

Here, we review the class of subgeometric rate functions (see Section 4 of [16], Section 5 of [12], [23], [13], [30]). Let Λ_0 be the family of functions r : N → R_{>0} such that

r is non-decreasing, r(1) ≥ 2,

and

log r(n) / n ↓ 0 as n → ∞.
The second condition implies that for all r ∈ Λ_0, if n > m > 0, then

n log r(n + m) ≤ n log r(n) + m log r(n) ≤ n log r(n) + n log r(m),

so that

r(m + n) ≤ r(m) r(n) for all m, n ∈ N.   (4.14)
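For a concrete instance, r(n) = 2(n + 1)² is a polynomial (hence subgeometric) rate: it is non-decreasing, r(1) = 8 ≥ 2, log r(n)/n decreases to 0, and the submultiplicativity (4.14) can be checked directly. The sketch below (an illustration of mine, not from the notes) verifies these properties numerically over a finite range.

```python
import math

def r(n):
    # Candidate subgeometric rate function r(n) = 2 (n+1)^2.
    return 2.0 * (n + 1) ** 2

# r is non-decreasing and r(1) >= 2.
monotone = all(r(n) <= r(n + 1) for n in range(1, 200))
base = r(1) >= 2

# log r(n) / n decreases (to 0).
ratios = [math.log(r(n)) / n for n in range(1, 200)]
decreasing = all(a >= b for a, b in zip(ratios, ratios[1:]))

# Submultiplicativity (4.14): r(m + n) <= r(m) r(n).
submult = all(r(m + n) <= r(m) * r(n) + 1e-9
              for m in range(1, 100) for n in range(1, 100))
```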
The class of subgeometric rate functions Λ defined in [30] is the class of sequences r for which there exists a sequence r_0 ∈ Λ_0 such that

0 < liminf_{n→∞} r(n)/r_0(n) ≤ limsup_{n→∞} r(n)/r_0(n) < ∞.

The main theorem we cite on subgeometric rates of convergence is due to Tuominen and Tweedie [30].
Theorem 4.3.4 (Theorem 2.1 of [30]) Suppose that {x_t}_{t∈N} is an irreducible and aperiodic Markov chain on state space X with stationary transition probabilities given by P. Let f : X → [1, ∞) and r ∈ Λ be given. The following are equivalent:

(i) there exists a petite set C ∈ B(X) such that

sup_{x∈C} E_x[ Σ_{k=0}^{τ_C − 1} r(k) f(x_k) ] < ∞;

(ii) there exists a sequence (V_n) of functions V_n : X → [0, ∞], a petite set C ∈ B(X) and b ∈ R_+ such that V_0 is bounded on C,

V_0(x) = ∞ ⇒ V_1(x) = ∞,

and

P V_{n+1} ≤ V_n − r(n) f + b r(n) 1_C,  n ∈ N;

(iii) there exists an (f, r)-regular set A ∈ B^+(X);

(iv) there exists a full absorbing set S which can be covered by a countable number of (f, r)-regular sets.
Theorem 4.3.5 [30] If a Markov chain {x_t} satisfies Theorem 4.3.4 for (f, r), then r(n) ‖P^n(x_0, ·) − π(·)‖_f → 0 as n → ∞.

Thus, we observe that the hitting time to a small set is a very important quantity in characterizing not only the existence of an invariant probability measure, but also how fast a Markov chain converges to equilibrium.
4.4 Conclusion

This concludes our discussion of controlled Markov chains via martingale methods. We will revisit one more application of martingales while discussing the convex analytic approach to controlled Markov problems.
4.5 Exercises

Exercise 4.5.1 Let G = {Ω, ∅}. Then, show that E[X|G] = E[X]; and if G = F, then E[X|F] = X.

Exercise 4.5.2 Let X be a random variable such that P(|X| < K) = 1 for some K ∈ R, that is, X takes values in a compact set. Let Y_n = X + W_n for n ∈ Z_+, where the W_n are independent noise variables.

Let F_n be the σ-field generated by Y_0, Y_1, …, Y_n.

Does lim_{n→∞} E[X|F_n] exist almost surely? Prove your statement.

Can you replace the boundedness assumption on X with E[|X|] < ∞?
Exercise 4.5.3 Consider the following two-server system:

x¹_{t+1} = max(x¹_t + A¹_t − u¹_t, 0)
x²_{t+1} = max(x²_t + A²_t + u¹_t 1_{(u¹_t ≤ x¹_t + A¹_t)} − u²_t, 0),   (4.15)

where 1_{(·)} denotes the indicator function and A¹_t, A²_t are independent and identically distributed (i.i.d.) random variables with geometric distributions, that is, for i = 1, 2,

P(A^i_t = k) = p_i (1 − p_i)^k,  k ∈ {0, 1, 2, …},

for some scalars p_1, p_2 such that E[A¹_t] = 1.5 and E[A²_t] = 1.

a) Suppose the control actions u¹_t, u²_t are such that u¹_t + u²_t ≤ 5 for all t ∈ N_+ and u¹_t, u²_t ∈ R_+. At any given time t, the controller has to decide on u¹_t and u²_t knowing {x¹_s, x²_s, s ≤ t} but not knowing A¹_t, A²_t.

Is this server system stochastically stabilizable by some policy, that is, does there exist an invariant distribution under some control policy?

If your answer is positive, provide a control policy and show that there exists a unique invariant distribution. If your answer is negative, precisely state why.

b) Repeat part a where now we restrict u¹_t, u²_t ∈ Z_+, that is, the actions are to be integer valued.
Exercise 4.5.4 a) Consider a controlled Markov chain with the following dynamics:

x_{t+1} = a x_t + b u_t + w_t,

where w_t is a zero-mean Gaussian noise with finite variance and a, b ∈ R are the system dynamics coefficients. One admissible controller policy (that is, the policy at time t is measurable with respect to σ(x_0, x_1, …, x_t) and is a mapping to R) is the following:

u_t = − ((a + 0.5)/b) x_t.

Show that {x_t}, under this policy, has a unique invariant distribution.

b) Consider a similar setup to the one earlier, with b = 1:

x_{t+1} = a x_t + u_t + w_t,

where w_t is a zero-mean Gaussian noise with finite variance, and a ∈ R is a known number.

This time, suppose we would like to find a control policy such that

lim_{t→∞} E[x²_t] < ∞.

Further, suppose we restrict the set of control policies to be linear and time-invariant, that is, of the form u(x_t) = k x_t for some k ∈ R.

Find the set of all k values for which the state has a finite second moment, that is, find

{k ∈ R : lim_{t→∞} E[x²_t] < ∞},

as a function of a.

Hint: Use the Foster-Lyapunov criteria.
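For part a), the given policy yields the closed-loop recursion x_{t+1} = −0.5 x_t + w_t, a stable AR(1) process whose stationary variance is σ_w²/(1 − 0.25). A quick simulation sketch (the values a = 1.7, b = 2, σ_w = 1 are my own illustrative choices) confirms this numerically; it is not a substitute for the requested proof.

```python
import numpy as np

rng = np.random.default_rng(3)
a, b, sigma_w, T = 1.7, 2.0, 1.0, 200_000

w = rng.normal(0.0, sigma_w, size=T)
x = 0.0
xs = np.empty(T)
for t in range(T):
    u = -(a + 0.5) / b * x                 # the policy u_t = -((a + 0.5)/b) x_t
    x = a * x + b * u + w[t]               # closed loop: x_{t+1} = -0.5 x_t + w_t
    xs[t] = x

emp_var = xs.var()
target = sigma_w**2 / (1 - 0.25)           # stationary variance 4/3 of AR(1) with pole -0.5
```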
Exercise 4.5.5 Prove Birkhoff's Ergodic Theorem for a countable state space; that is, the result that for a Markov chain {x_t} living in a countable space X, which has a unique invariant distribution µ, the following holds almost surely:

lim_{T→∞} (1/T) Σ_{t=1}^T f(x_t) = Σ_i f(i) µ(i)
for all bounded functions f : X → R.

Hint: You may proceed as follows. Define the occupation measures, for all T and A ∈ B(X):

v_T(A) = (1/T) Σ_{t=0}^{T−1} 1_{(x_t ∈ A)},  ∀A ∈ B(X).

Now, define:

F_t(A) = Σ_{s=1}^t 1_{(x_s ∈ A)} − t Σ_{x∈X} P^π(A|x) v_t(x)
       = Σ_{s=1}^t 1_{(x_s ∈ A)} − Σ_{s=0}^{t−1} Σ_{x∈X} P^π(A|x) 1_{(x_s = x)}.   (4.16)
And verify that, for t ≥ 2,

E[F_t(A) | F_{t−1}]
= E[ ( Σ_{s=1}^t 1_{(x_s ∈ A)} − Σ_{s=0}^{t−1} Σ_{x∈X} P^π(A|x) 1_{(x_s = x)} ) | F_{t−1} ]
= E[ ( 1_{(x_t ∈ A)} − Σ_{x∈X} P^π(A|x) 1_{(x_{t−1} = x)} ) | F_{t−1} ]
  + ( Σ_{s=1}^{t−1} 1_{(x_s ∈ A)} − Σ_{s=0}^{t−2} Σ_{x∈X} P^π(A|x) 1_{(x_s = x)} )
= 0 + ( Σ_{s=1}^{t−1} 1_{(x_s ∈ A)} − Σ_{s=0}^{t−2} Σ_{x∈X} P^π(A|x) 1_{(x_s = x)} )   (4.17)
= F_{t−1}(A),   (4.18)

where the last equality follows from the fact that E[1_{(x_t ∈ A)} | F_{t−1}] = P(x_t ∈ A | F_{t−1}).
Furthermore,

|F_t(A) − F_{t−1}(A)| ≤ 1.

Now, we have a martingale sequence with bounded increments. We will invoke a martingale convergence theorem which is applicable to such martingales. By a version of the martingale stability theorem, it follows that

lim_{t→∞} (1/t) F_t(A) = 0.

You now need to complete the remaining steps. You can use the Azuma-Hoeffding inequality and the Borel-Cantelli Lemma to complete them, or you may use any other method you wish.
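To see the statement in action before proving it, the following sketch (my own illustrative example, not part of the exercise) runs a two-state chain with P(0→1) = 0.1 and P(1→0) = 0.2, whose unique invariant distribution is µ = (2/3, 1/3), and compares the time average of f(x) = x with the spatial average Σ_i f(i)µ(i) = 1/3.

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])       # invariant distribution mu = (2/3, 1/3)
T = 200_000

x, total = 0, 0
for _ in range(T):
    # stay with probability P[x, x], otherwise switch states
    x = x if rng.random() < P[x, x] else 1 - x
    total += x                   # accumulate f(x_t) with f(i) = i

time_avg = total / T             # should approach sum_i f(i) mu(i) = 1/3
```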
Exercise 4.5.6 Consider a queueing process with i.i.d. Poisson arrivals and departures, with arrival mean λ and service mean µ, and suppose the process is such that when a customer leaves the queue, with probability p (independent of time) it comes back to the queue. That is, the dynamics of the system satisfy

L_{t+1} = max(L_t + A_t − N_t + p_t N_t, 0),  t ∈ N,

where E[A_t] = λ, E[N_t] = µ and E[p_t] = p.

For what values of µ, λ is such a system stochastically stable? Prove your statement.
Figure 4.1: A router (with action u) directing incoming customers to Station 1 and Station 2.
Exercise 4.5.7 Consider a two-station server network, where a router routes the incoming traffic, as depicted in Figure 4.1.

Let L¹_t, L²_t denote the number of customers in stations 1 and 2 at time t. Let the dynamics be given by the following:

L¹_{t+1} = max(L¹_t + u_t A_t − N¹_t, 0),  t ∈ N,
L²_{t+1} = max(L²_t + (1 − u_t) A_t − N²_t, 0),  t ∈ N.

Customers arrive according to an independent Bernoulli process A_t with mean λ; that is, P(A_t = 1) = λ and P(A_t = 0) = 1 − λ. Here u_t ∈ [0, 1] is the router action.

Station 1 has a Bernoulli service process N¹_t with mean n_1, and Station 2 has a Bernoulli service process N²_t with mean n_2.

Suppose that the router decides on u_t by the following algorithm: if a customer arrives, the router simply sends the incoming customer to the shortest queue.

Find necessary and sufficient conditions (on λ, n_1, n_2) for this algorithm to lead to a stochastically stable system which satisfies lim_{t→∞} E[L¹_t + L²_t] < ∞.
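A simulation sketch of this routing system follows (the parameter values λ = 0.5, n₁ = n₂ = 0.4 are my own illustrative choices satisfying λ < n₁ + n₂; this is an experiment, not the requested proof):

```python
import numpy as np

rng = np.random.default_rng(7)
lam, n1, n2, T = 0.5, 0.4, 0.4, 100_000

L1 = L2 = 0
total = 0
for t in range(T):
    A  = rng.random() < lam          # Bernoulli arrival
    N1 = rng.random() < n1           # Bernoulli service at station 1
    N2 = rng.random() < n2           # Bernoulli service at station 2
    u  = 1 if L1 <= L2 else 0        # route the arrival to the shorter queue
    L1 = max(L1 + u * A - N1, 0)
    L2 = max(L2 + (1 - u) * A - N2, 0)
    total += L1 + L2

mean_total = total / T               # time-average queue length stays small when stable
```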
Chapter 5

Dynamic Programming

In this chapter, we introduce the notion of dynamic programming for stochastic systems. Let us first recall a few notions discussed earlier.

Recall that a Markov control model is a five-tuple

(X, U, {U_t(x), x ∈ X}, Q, c_t)

such that X is a (complete and separable) metric space, U is the control action space, and U_t(x) is the set of admissible control actions when the state is x at time t, so that

K_t = {(x, u) : x ∈ X, u ∈ U_t(x)},

a subset of X × U, is the set of feasible state-control pairs; Q is a stochastic kernel on X given K_t. Finally, c_t : K_t → R is the cost function.

In case U_t(x) and c_t(·) do not depend on t, we drop the time index; the Markov model is then called a stationary Markov model.
If Π = {Π_t} is a randomized Markov policy, we define the cost function

c_t(x_t, Π) = ∫_{U(x_t)} c(x_t, u) Π_t(du|x_t)

and the transition function

Q_t(dz|x_t, Π) = ∫_{U(x_t)} Q(dz|x_t, u) Π_t(du|x_t).

Let, as in Chapter 2, Π_A denote the set of all admissible policies.
Recall that Π = {Π_t, 0 ≤ t ≤ N − 1} is a policy. Consider the following expected cost:

J(Π, x) = E^Π_x[ Σ_{t=0}^{N−1} c(x_t, u_t) + c_N(x_N) ],

where c_N(·) is the terminal cost function. Define

J*(x) := inf_{Π ∈ Π_A} J(Π, x).

As earlier, let h_t = {x_[0,t], u_[0,t−1]} denote the history, or the information process.

The goal is to find an admissible policy, if one exists, such that J*(x) is attained. We note that the infimum value is not always attained.
Before we proceed further, we note that we could express the cost as

J(Π, x) = E^Π_x[ c(x_0, u_0) + E^Π[ c(x_1, u_1) + E^Π[ c(x_2, u_2) + ⋯ + E^Π[ c(x_{N−1}, u_{N−1}) + c_N(x_N) | h_{N−1} ] ⋯ | h_1 ] | h_0 ] ].   (5.1)
Thus,

J(Π, x) = E^Π_x[ Σ_{t=0}^{N−1} c(x_t, u_t) + c_N(x_N) ]
= E^Π_x[ Σ_{t=0}^{k−1} c(x_t, u_t) + E^Π[ c(x_k, u_k) + Σ_{t=k+1}^{N−1} c(x_t, u_t) + c_N(x_N) | h_k ] | h_0 ].   (5.2)

The above follows from the properties of conditioning discussed in Chapters 1 and 4. Also, observe that for all t and measurable functions g_t,

E[g_t(x_{t+1}) | h_t, u_t] = E[g_t(x_{t+1}) | x_t, u_t].

This follows from the controlled Markov property.
5.1 Bellman's Principle of Optimality

Let {J_t(x_t)} be a sequence of functions on X defined by

J_N(x) = c_N(x),

and for 0 ≤ t ≤ N − 1,

J_t(x) = min_{u ∈ U_t(x)} { c_t(x, u) + ∫_X J_{t+1}(z) Q_t(dz|x, u) }.

Suppose these functions admit a minimum and are measurable. We will discuss a number of sufficient conditions for when this is possible. Let there be minimizing selectors which are deterministic and denoted by {f_t(x)}, and let the minimum cost be equal to

J_t(x) = c_t(x, f_t(x)) + ∫_X J_{t+1}(z) Q_t(dz|x, f_t(x)).

Then we have the following:
Theorem 5.1.1 The policy Π* = {f_0, f_1, …, f_{N−1}} is optimal, and the optimal cost is equal to

J*(x) = J_0(x).

We compare the cost generated by the above policy with the cost obtained by any other policy.
Proof: We provide the proof by a backwards induction method. Consider the time stage t = N − 1. For this stage, the optimal cost is equal to

J_{N−1}(x) = min_u { c_{N−1}(x, u) + ∫_X J_N(z) Q_{N−1}(dz|x, u) }.

Suppose a cost C*_{N−1}(x_{N−1}) is achieved by some policy η_{N−1}; we claim that C*_{N−1}(x_{N−1}) ≥ J_{N−1}(x_{N−1}). Indeed,

C*_{N−1}(x_{N−1}) = c_{N−1}(x_{N−1}, η_{N−1}(x_{N−1})) + ∫_X J_N(z) Q_{N−1}(dz|x_{N−1}, η_{N−1}(x_{N−1}))
≥ min_u { c_{N−1}(x_{N−1}, u) + ∫_X J_N(z) Q_{N−1}(dz|x_{N−1}, u) } = J_{N−1}(x_{N−1}).   (5.3)
Now, we move to time stage N − 2. In this case, the cost to be minimized satisfies

c_{N−2}(x_{N−2}, u_{N−2}) + ∫_X C*_{N−1}(z) Q_{N−2}(dz|x_{N−2}, u_{N−2})
≥ min_u { c_{N−2}(x_{N−2}, u) + ∫_X J_{N−1}(z) Q_{N−2}(dz|x_{N−2}, u) } =: J_{N−2}(x_{N−2}).   (5.4)

Thus, the cost achieved by any other policy will keep being greater than or equal to the one achieved by the minimizing policy. We can show, by induction, that the recursion holds for all 0 ≤ t ≤ N − 2. ⊓⊔
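The backward induction above can be checked directly on a small finite model. The sketch below (the random costs and kernels are illustrative choices of mine) computes J_0 by the dynamic programming recursion and verifies that it matches the minimum over all deterministic Markov policies evaluated by brute force.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
nS, nA, N = 3, 2, 3                               # states, actions, horizon

Q = rng.random((nA, nS, nS))
Q /= Q.sum(axis=2, keepdims=True)                 # Q[a, x, :] = next-state distribution
c = rng.random((nS, nA))                          # stage cost c(x, a)
cN = rng.random(nS)                               # terminal cost

# Backward induction: J_N = c_N, J_t(x) = min_a { c(x,a) + sum_y Q(a,x,y) J_{t+1}(y) }.
J = cN.copy()
for _ in range(N):
    J = np.min(c + np.einsum('axy,y->xa', Q, J), axis=1)
J_dp = J                                          # this is J_0

# Brute force over all deterministic Markov policies (one map x -> a per stage).
best = np.full(nS, np.inf)
for flat in itertools.product(range(nA), repeat=nS * N):
    f = np.array(flat).reshape(N, nS)
    V = cN.copy()
    for t in reversed(range(N)):
        a = f[t]
        V = np.array([c[x, a[x]] + Q[a[x], x] @ V for x in range(nS)])
    best = np.minimum(best, V)

matches = bool(np.allclose(J_dp, best))
```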
5.2 Discussion: Why are Markov Policies Important? Why are Optimal Policies Deterministic for such Settings?

We observe that when there is an optimal solution, there is an optimal solution which is Markov. Let us provide a more insightful way to derive this property.

In the following, we will follow David Blackwell's [5] and Hans Witsenhausen's ideas:
Theorem 5.2.1 (Blackwell's Principle of Irrelevant Information) Let X, Y, U be complete, separable metric spaces, let P be a probability measure on B(X × Y), and let c : X × U → R be a Borel measurable cost function. Then, for any Borel measurable function γ : X × Y → U, there exists another Borel measurable function γ* : X → U such that

∫_X c(x, γ*(x)) P(dx) ≤ ∫_{X×Y} c(x, γ(x, y)) P(dx, dy).

In particular, policies that depend only on x are optimal.
Proof: We will construct a γ* given γ. Let us write

∫_{X×Y} c(x, γ(x, y)) P(dx, dy) = ∫_X ( ∫_U c(x, u) P_γ(du|x) ) P(dx),
where P_γ(u ∈ D|x) = ∫_Y 1_{(γ(x,y) ∈ D)} P(dy|x). Consider h_γ(x) := ∫_U c(x, u) P_γ(du|x). Let

D = {(x, u) ∈ X × U : c(x, u) ≤ h_γ(x)};

D is a Borel set since c(x, u) − h_γ(x) is a Borel measurable function.
a) Suppose the space U is countable, and enumerate its elements as {u_k, k = 1, 2, …}. In this case, we could define

D_i = {x ∈ X : (x, u_i) ∈ D},  i = 1, 2, …,

and define

γ*(x) = u_k  if x ∈ D_k \ (∪_{i=1}^{k−1} D_i),  k = 1, 2, … .

Such a function is measurable, by construction.
b) Suppose now the space U is separable, but assume that c(x, u) is continuous in u.

Suppose that the space U is partitioned into a countable collection of disjoint sets K_n = {K¹_n, K²_n, …, K^m_n, …}, with elements u^k_n ∈ K^k_n in the sets. In this case, we could define, by an enumeration of the control actions in K_n,

D_k = {x ∈ X : (x, u^k_n) ∈ D},

and define

γ_n(x) = u^k_n  if x ∈ D_k \ (∪_{i=1}^{k−1} D_i).

We can make K_n finer by replacing the above with K_{n+1} such that K^i_{n+1} ⊂ K^m_n for some m (for example, if U = R, let K_n = { 2^{−n} [⌊u 2^n⌋, ⌈u 2^n⌉) : u ∈ R }).

Each of the γ_n functions is measurable. Furthermore, by a careful enumeration, since the image partitions become finer, lim_{n→∞} γ_n(x) exists pointwise. Being a pointwise limit of measurable functions, the limit is also measurable; define this limit as γ*. By continuity, it then follows that (x, γ*(x)) ∈ D, since lim_{n→∞} c(x, γ_n(x)) = c(x, γ*(x)) ≤ h_γ(x).

This completes the proof for the case when c is continuous in u.
c) For the general case, when c is not necessarily continuous, we define D_x = {u : (x, u) ∈ D}. It follows that P_γ(D_x|x) > 0, for otherwise we would have ∫_U c(x, u) P_γ(du|x) > h_γ(x), a contradiction. It then follows from Theorem 2 of [6] that we can construct a measurable function γ* whose graph satisfies (x, γ*(x)) ∈ D. We do not provide the detailed proof for this case. ⊓⊔
We also have the following discussion.

Theorem 5.2.2 Let {(x_t, u_t)} be a controlled Markov chain. Consider the minimization of E[Σ_{t=0}^{T−1} c(x_t, u_t)] over all control policies which are sequences of causally measurable functions of {x_s, s ≤ t}, for all t ≥ 0. Suppose further that the cost function is measurable and the transition kernel is such that ∫ Φ(dx_{t+1}|x_t, u_t) f(x_{t+1}) is measurable on U for every measurable and bounded function f on X. If an optimal control policy exists, there is no loss in restricting policies to be Markov, that is, policies which only use the current state x_t and the time information t.
Proof: Let us define a history process {h_t}, where h_t = {x_0, …, x_t} for t ≥ 0. Let there be an optimal policy given by {η_t, t ≥ 0} such that u_t = η_t(h_t). It follows that

P(x_{t+1} ∈ B | x_t)
= ∫_U ∫_{X^t} Φ(x_{t+1} ∈ B | u_t, x_t) 1_{(u_t = η_t(h_t))} P(dh_t | x_t)
= ∫_U Φ(x_{t+1} ∈ B | u, x_t) ( ∫_{X^t} 1_{(η_t(h_t) ∈ du)} P(dh_t | x_t) )
= ∫_U Φ(x_{t+1} ∈ B | u, x_t) P̃(du | x_t),   (5.5)
where

P̃(du | x_t) = ∫_{X^t} 1_{(η_t(h_t) ∈ du)} P(dh_t | x_t).
Thus, we may replace any history-dependent policy with a possibly randomized Markov policy.

By hypothesis, an optimal solution exists. If the solution is randomized, then there exists some distribution κ_t on U such that P(u_t ∈ B | x_t) = κ_t(B | x_t), and the optimal cost can be expressed as a dynamic programming recursion for some sequence of functions {J_t, t ≥ 0} on X with

J_t(x_t) = ∫_U κ_t(du | x_t) ( c(x_t, u) + ∫_X Φ(dx_{t+1} | x_t, u) J_{t+1}(x_{t+1}) ).

Define

M(x_t, u_t) = c(x_t, u_t) + ∫_X Φ(dx_{t+1} | x_t, u_t) J_{t+1}(x_{t+1}).

If inf_{u_t ∈ U} M(x_t, u_t) admits a solution, the distribution κ_t can be taken to be a Dirac-delta distribution at a minimizing value. By hypothesis, a solution exists P-a.s. Thus, an optimal policy depends only on the state value and the time variable (the dependency on time is due to the time-dependent nature of J_t). ⊓⊔
5.3 Existence of Minimizing Selectors and Measurability

The above dynamic programming arguments hold when there exist minimizing control policies (selectors measurable with respect to the Borel field on X).

Measurable Selection Hypothesis: There exists a sequence of functions J_t : X → R satisfying

J_t(x) = min_{u ∈ U_t(x)} ( c(x, u) + ∫_X J_{t+1}(y) Q(dy|x, u) ),

for all x ∈ X and t ∈ {0, 1, 2, …, N − 1}, with

J_N(x_N) = c_N(x_N).

Furthermore, there exist measurable functions f_t such that

J_t(x) = c(x, f_t(x)) + ∫_X J_{t+1}(y) Q(dy|x, f_t(x)).

⊓⊔
Recall that a set in a normed linear space is (sequentially) compact if every sequence in the set has a converging subsequence.

Condition 1: The cost function to be minimized, c(x, u), is continuous on X × U; U_t(x) = U is compact; and ∫_X Q(dy|x, u) v(y) is a continuous function on X × U for every continuous and bounded v on X. ⊓⊔

Condition 2: For every x ∈ X, the cost function to be minimized, c(x, u), is continuous on U; U_t(x) is compact; and ∫_X Q(dy|x, u) v(y) is a continuous function on U for every bounded, measurable function v on X. ⊓⊔

Theorem 5.3.1 Under Condition 1 or Condition 2, there exists an optimal solution, the Measurable Selection Hypothesis applies, and there exists a minimizing control policy f_t with f_t(x) ∈ U_t(x).

Furthermore, under Condition 1, the function J_t(x) is continuous if c_N(x_N) is continuous.
The result follows from the two lemmas below.

Lemma 5.3.1 A continuous function f : X → R over a compact set A ⊂ X admits a minimum.

Proof: Let δ = inf_{x∈A} f(x). Let {x_i} be a sequence such that f(x_i) converges to δ. Since A is compact, {x_i} must have a converging subsequence {x_{i(n)}}. Let the limit of this subsequence be x_0. Then it follows that {x_{i(n)}} → x_0 and thus, by continuity, {f(x_{i(n)})} → f(x_0). As such, f(x_0) = δ. ⊓⊔

To see why compactness is important, consider

inf_{x∈A} 1/x

for A = [1, 2). How about for A = [1, ∞)? In both cases there does not exist an x value in the specified set which attains the infimum.
Lemma 5.3.2 Let U be compact, and let c(x, u) be continuous on X × U. Then min_u c(x, u) is continuous on X.

Proof: Let x_n → x, let u_n be optimal for x_n, and let u be optimal for x. Such optimal action values exist as a result of the compactness of U and continuity. Now,

| min_u c(x_n, u) − min_u c(x, u) | ≤ max( c(x_n, u) − c(x, u), c(x, u_n) − c(x_n, u_n) ).   (5.6)

The first term above converges to zero since c is continuous in (x, u). The second converges as well. Suppose otherwise. Then, for some ε > 0, there exists a subsequence such that

c(x, u_{k_n}) − c(x_{k_n}, u_{k_n}) ≥ ε.

Consider the sequence (x_{k_n}, u_{k_n}). Since U is compact, there exists a further subsequence (x_{k′_n}, u_{k′_n}) which converges to (x, u′) for some u′. Hence, along this subsequence, c(x_{k′_n}, u_{k′_n}) and c(x, u_{k′_n}) both converge to c(x, u′), leading to a contradiction. ⋄

Compactness is a crucial component for this result.
The textbook by Hernandez-Lerma and Lasserre provides more general conditions. We did not address here the issue of the existence of measurable functions; we refer the reader again to Schäl [29]. We, however, state the following result:

Lemma 5.3.3 Let c(x, u) be a continuous function in u for every x, and let U be a compact set. Then, there exists a Borel measurable function f : X → U such that

c(x, f(x)) = min_{u∈U} c(x, u).

Proof: Let

˜c(x) := min_{u∈U} c(x, u).

We first observe that ˜c is Borel measurable: it is sufficient to prove that {x : ˜c(x) ≤ α} is Borel for every α ∈ R, which follows from the continuity of c and the compactness of U. Define D = {(x, u) : c(x, u) ≤ ˜c(x)}; this set is a Borel set. We can now construct a measurable function which lives in this set, using the property that the control action space is a separable, complete metric space, as discussed in the proof of Theorem 5.2.1. Note that the continuity of c in u ensures that the graph issue discussed there does not occur here, since lim_{n→∞} c(x, γ_n(x)) = c(x, γ*(x)) ≤ ˜c(x). ⊓⊔
We can replace the compactness condition with an inf-compactness condition, and modify Condition 1 as below:

Condition 3: For every x ∈ X, the cost function to be minimized, c(x, u), is continuous on X × U and non-negative; {u : c(x, u) ≤ α} is compact for all α > 0 and all x ∈ X; and ∫_X Q(dy|x, u) v(y) is a continuous function on X × U for every continuous and bounded v. ⊓⊔

Theorem 5.3.2 Under Condition 3, the Measurable Selection Hypothesis applies.

Remark: We could relax the continuity condition and replace it with lower semi-continuity. A function f is lower semi-continuous at x_0 if liminf_{x→x_0} f(x) ≥ f(x_0). ⊓⊔
Essentially what is needed is the existence of a minimizing control function, or a selector, which is optimal for every x_t ∈ X. In class we solved the following problem with q > 0, r > 0, p_T > 0:

inf_Π E^Π_x[ Σ_{t=0}^{T−1} (q x²_t + r u²_t) + p_T x²_T ]

for a linear system

x_{t+1} = a x_t + u_t + w_t,

where w_t is a zero-mean, i.i.d. Gaussian random variable with variance σ²_w. We showed, by the method of completing the squares, that

J_t(x_t) = P_t x²_t + Σ_{k=t}^{T−1} P_{k+1} σ²_w,

where

P_t = q + P_{t+1} a² − (P_{t+1} a)²/(P_{t+1} + r),  P_T = p_T,

and the optimal control policy is

u_t = − (P_{t+1} a)/(P_{t+1} + r) x_t.

Note that the optimal control policy is Markov (as it uses only the current state).
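The backward Riccati recursion is immediate to implement. The sketch below (the values a = 1.2, q = r = p_T = 1 are my own illustrative choices) iterates P_t backwards, checks that it converges to the fixed point of the stationary Riccati equation, and checks that the resulting closed-loop gain is stabilizing.

```python
a, q, r, pT, T = 1.2, 1.0, 1.0, 1.0, 50

P = pT                                 # P_T
gains = []
for _ in range(T):                     # iterate t = T-1 down to 0
    gains.append(a * P / (P + r))      # gain at time t uses P_{t+1}
    P = q + a * a * P - (a * P) ** 2 / (P + r)

P0 = P                                 # P_0 after T backward steps
# Stationary Riccati fixed-point residual: P = q + a^2 P - (aP)^2 / (P + r).
residual = abs(q + a * a * P0 - (a * P0) ** 2 / (P0 + r) - P0)
# Closed-loop coefficient a + k_0 with k_0 = -a P_1 / (P_1 + r); |.| < 1 means stable.
closed_loop = a - gains[-1]
```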
5.4 Infinite Horizon Optimal Control Problems

When the time horizon becomes unbounded, we cannot directly invoke dynamic programming in the form considered earlier. In such settings, we look for conditions for optimality. We will investigate such conditions in this section.

The infinite horizon problems that we will consider belong to two classes: discounted cost and average cost problems. We first discuss the discounted cost problem. The average cost problem is discussed in Chapter 7.

5.4.1 Discounted Cost

One popular class of problems is that in which future costs are discounted: the future is less important than today, in part due to the uncertainty in the future, as well as due to the economic understanding that the current value of a good is more important than its value in the future.
The discounted cost function is given as

J(ν_0, Π) = E^Π_{ν_0}[ Σ_{t=0}^{T−1} β^t c(x_t, u_t) ],

for some β ∈ (0, 1).
If there exists a policy Π* which minimizes this cost, the policy is said to be optimal.

We can consider the infinite horizon problem by taking the limit (when c is non-negative)

J(ν_0, Π) = lim_{T→∞} E^Π_{ν_0}[ Σ_{t=0}^{T−1} β^t c(x_t, u_t) ]

and invoking the Monotone Convergence Theorem:

J(ν_0, Π) = E^Π_{ν_0}[ Σ_{t=0}^∞ β^t c(x_t, u_t) ],

for some β ∈ (0, 1).

Note that

J(x_0, Π) = E^Π_{x_0}[ c(x_0, u_0) + E^Π[ Σ_{t=1}^∞ β^t c(x_t, u_t) | x_0, u_0 ] ],

or

J(x_0, Π) = E^Π_{x_0}[ c(x_0, u_0) + β E^Π[ Σ_{t=1}^∞ β^{t−1} c(x_t, u_t) | x_0, u_0 ] ],

or

J(x_0, Π) = E^Π_{x_0}[ c(x_0, u_0) + β E^Π[ J(x_1, Π) | x_0, u_0 ] ].

A function v : X → R is a solution to the infinite horizon problem if it satisfies

v(x) = min_{u∈U} { c(x, u) + β ∫_X v(y) Q(dy|x, u) }

for all x ∈ X. We call v(·) a value function.
Now, let J(x) = inf_{Π∈Π_A} J(x, Π). We observe that the cost of starting at any position, at any time k, leads to a recursion where the future cost is multiplied by β. This can be observed by applying the Dynamic Programming equations discussed earlier: for a given finite horizon truncation, we work with the sequential updates:<br />
\[ J_t(x_t) = \min_{u_t \in U}\Big\{c(x_t,u_t) + \beta \int_X J_{t+1}(x_{t+1}) Q(dx_{t+1}|x_t,u_t)\Big\}, \]<br />
and hope that this converges to a limit.<br />
Under measurable selection conditions and a boundedness condition on the cost function, there exists a solution to the discounted cost problem. In particular, the following holds:<br />
Theorem 5.4.1 Suppose the cost function is bounded, non-negative, and one of the measurable selection conditions (Condition 1, Condition 2 or Condition 3) applies. Then, there exists a solution to the discounted cost problem. Furthermore, the optimal cost (value function) is obtained by a successive iteration of policies:<br />
\[ v_n(x) = \min_{u \in U}\Big\{c(x,u) + \beta \int_X v_{n-1}(y) Q(dy|x,u)\Big\}, \quad \forall x,\ n \ge 1, \]<br />
with v_0(x) = 0.<br />
Before presenting the proof, we state the following two lemmas. The first one is on the exchange of the order of the minimum and limits.<br />
Lemma 5.4.1 [Hernandez-Lerma and Lasserre] Let V_n(x,u) ↑ V(x,u) pointwise. Suppose that V_n and V are continuous in u and that U(x) is compact for every x ∈ X. Then,<br />
\[ \lim_{n\to\infty} \min_{u \in U(x)} V_n(x,u) = \min_{u \in U(x)} V(x,u). \]<br />
Proof. The proof follows from essentially the same arguments as in the proof of Lemma 5.3.2. Let u*_n solve min_{u∈U(x)} V_n(x,u). Note that<br />
\[ \Big| \min_{u\in U(x)} V_n(x,u) - \min_{u\in U(x)} V(x,u) \Big| \le V(x,u^*_n) - V_n(x,u^*_n), \quad (5.7) \]<br />
since V_n(x,u) ↑ V(x,u). Now, suppose that<br />
\[ V(x,u^*_n) - V_n(x,u^*_n) > \epsilon \quad (5.8) \]<br />
along a subsequence n_k. There exists a further subsequence n'_k such that u*_{n'_k} → ū for some ū. Now, for every n'_k ≥ n, since V_n is monotonically increasing:<br />
\[ V(x,u^*_{n'_k}) - V_{n'_k}(x,u^*_{n'_k}) \le V(x,u^*_{n'_k}) - V_{n}(x,u^*_{n'_k}). \]<br />
However, V(x,·) and, for a fixed n, V_n(x,·) are continuous, hence the two terms on the right hand side converge to:<br />
\[ V(x,\bar u) - V_n(x,\bar u). \]<br />
For every fixed x and ū, and every ε > 0, we can find a sufficiently large n such that V(x,ū) − V_n(x,ū) ≤ ε/2. Hence (5.8) cannot hold.<br />
⊓⊔<br />
Lemma 5.4.2 The space of bounded measurable functions X → R endowed with the ||·||_∞ norm, that is,<br />
\[ L_\infty(X) = \{f : \|f\|_\infty = \sup_x |f(x)| < \infty\}, \]<br />
is a Banach space.<br />
Proof of Theorem 5.4.1. We obtain the optimal solution as a limit of discounted cost problems with a finite horizon T. By dynamic programming, we obtain the recursion for every T as:<br />
\[ J^T_T(x) = 0, \qquad J^T_t(x) = T(J^T_{t+1})(x) = \min_{u\in U}\Big\{c(x,u) + \beta \int_X J^T_{t+1}(y) Q(dy|x,u)\Big\}. \]<br />
This sequence will lead to a solution for a T-stage discounted cost problem. We then look for a solution as T → ∞ and invoke Lemma 5.4.1, since J^T_t(x) ↑ J^∞_t(x) for any t ∈ Z_+. Note also that J^∞_1(x) = J^∞_2(x) = J^∞_3(x) and so on. Thus, the sequence of such solutions leads to a solution for the infinite horizon problem. This will satisfy:<br />
\[ J^\infty_0(x) = T(J^\infty)(x) = \min_{u\in U}\Big\{c(x,u) + \beta \int_X J^\infty(y) Q(dy|x,u)\Big\}. \]<br />
Now, the next step is to obtain a solution to this fixed point equation. We will obtain the solution using an iteration, known as the value iteration algorithm. We observe that the function J^∞(·) lives in L_∞(X) (since the cost is bounded, there is a uniform bound for every x). We show that the iteration given by<br />
\[ T(v)(x) = \min_{u\in U}\Big\{c(x,u) + \beta \int_X v(y) Q(dy|x,u)\Big\} \]<br />
is a contraction in L_∞(X). Indeed,<br />
\[ \|T(v) - T(v')\|_\infty = \sup_{x\in X} |T(v)(x) - T(v')(x)| \]<br />
\[ = \sup_{x\in X} \Big| \min_{u\in U}\Big\{c(x,u) + \beta\int_X v(y)Q(dy|x,u)\Big\} - \min_{u\in U}\Big\{c(x,u) + \beta\int_X v'(y)Q(dy|x,u)\Big\} \Big| \]<br />
\[ = \sup_{x\in X} \bigg( 1_{A_1}\Big( \min_{u\in U}\Big\{c(x,u)+\beta\int_X v(y)Q(dy|x,u)\Big\} - \min_{u\in U}\Big\{c(x,u)+\beta\int_X v'(y)Q(dy|x,u)\Big\} \Big) + 1_{A_2}\Big( -\min_{u\in U}\Big\{c(x,u)+\beta\int_X v(y)Q(dy|x,u)\Big\} + \min_{u\in U}\Big\{c(x,u)+\beta\int_X v'(y)Q(dy|x,u)\Big\} \Big) \bigg) \]<br />
\[ \le \sup_{x\in X} \bigg( 1_{A_1}\Big( c(x,u^*) + \beta\int_X v(y)Q(dy|x,u^*) - c(x,u^*) - \beta\int_X v'(y)Q(dy|x,u^*) \Big) + 1_{A_2}\Big( -c(x,u^{**}) - \beta\int_X v(y)Q(dy|x,u^{**}) + c(x,u^{**}) + \beta\int_X v'(y)Q(dy|x,u^{**}) \Big) \bigg) \]<br />
\[ \le \sup_{x\in X} \Big( 1_{A_1}\, \beta\int_X \big(v(y) - v'(y)\big)Q(dy|x,u^*) \Big) + \sup_{x\in X} \Big( 1_{A_2}\, \beta\int_X \big(v'(y) - v(y)\big)Q(dy|x,u^{**}) \Big) \]<br />
\[ \le \beta \|v - v'\|_\infty \sup_{x\in X}\Big( 1_{A_1}\int_X Q(dy|x,u^*) + 1_{A_2}\int_X Q(dy|x,u^{**}) \Big) = \beta\|v - v'\|_\infty. \quad (5.9) \]<br />
Here A_1 denotes the event that<br />
\[ \min_{u\in U}\Big\{c(x,u) + \beta\int_X v(y)Q(dy|x,u)\Big\} \ge \min_{u\in U}\Big\{c(x,u) + \beta\int_X v'(y)Q(dy|x,u)\Big\}, \]<br />
and A_2 denotes the complementary event; u** is the minimizing control for \(\{c(x,u)+\beta\int_X v(y)Q(dy|x,u)\}\) and u* is the minimizer for \(\{c(x,u)+\beta\int_X v'(y)Q(dy|x,u)\}\).<br />
⊓⊔<br />
This iteration is known as the value iteration algorithm.<br />
Remark 5.4.1 We note that if the cost is not bounded, one can define a weighted sup-norm: ||c||_f = sup_x |c(x)/f(x)|, where f is a positive function. The discussions above apply in this context as well.<br />
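As an illustration, the following is a minimal sketch of the value iteration algorithm on a small finite model; the states, actions, costs and kernel below are invented for the example. The successive differences of the iterates shrink geometrically, consistent with the contraction estimate (5.9).<br />

```python
import numpy as np

# A hypothetical 3-state, 2-action model (for illustration only):
# c[x, u] is the stage cost and Q[u][x, y] = Q(y | x, u) is the kernel.
beta = 0.9
c = np.array([[1.0, 2.0],
              [0.5, 1.5],
              [2.0, 0.2]])
Q = np.array([[[0.8, 0.2, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.3, 0.7]],
              [[0.5, 0.5, 0.0],
               [0.2, 0.6, 0.2],
               [0.1, 0.1, 0.8]]])

def T(v):
    """Bellman operator: (Tv)(x) = min_u { c(x,u) + beta * sum_y Q(y|x,u) v(y) }."""
    # Q[u] @ v is, for each action u, the vector of expected continuation costs.
    return np.min(c + beta * np.stack([Q[u] @ v for u in range(2)], axis=1), axis=1)

v = np.zeros(3)                  # v_0 = 0, as in Theorem 5.4.1
for n in range(500):
    v_new = T(v)
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new

# The limit is a fixed point of T (the value function).
assert np.allclose(T(v), v, atol=1e-8)
```

The contraction property itself can be checked numerically: for any two vectors v, v', one has ||T(v) − T(v')||_∞ ≤ β||v − v'||_∞.<br />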
5.5 Linear Quadratic Gaussian Problem<br />
Consider the following linear system:<br />
\[ x_{t+1} = A x_t + B u_t + w_t, \]<br />
where x ∈ R^n, u ∈ R^m and w ∈ R^n. Suppose {w_t} is i.i.d. Gaussian with a given covariance matrix E[w_t w_t^T] = W for all t ≥ 0.<br />
The goal is to obtain<br />
\[ \inf_{\Pi} J(\Pi, x), \]<br />
where<br />
\[ J(\Pi, x) = E^{\Pi}_{x}\bigg[\sum_{t=0}^{T-1}\big(x_t^T Q x_t + u_t^T R u_t\big) + x_T^T Q_T x_T\bigg], \]<br />
with R, Q, Q_T > 0 (that is, these matrices are positive definite).<br />
In class, we obtained the Dynamic Programming recursion for the optimal control problem.<br />
Theorem 5.5.1 The optimal control is linear and has the form:<br />
\[ u_t = -(B^T P_{t+1} B + R)^{-1} B^T P_{t+1} A\, x_t, \]<br />
where P_t solves the Discrete-Time Riccati Equation:<br />
\[ P_t = Q + A^T P_{t+1} A - A^T P_{t+1} B (B^T P_{t+1} B + R)^{-1} B^T P_{t+1} A, \]<br />
with final condition P_T = Q_T.<br />
Theorem 5.5.2 As T → ∞ in the optimization problem above, if (A, B) is controllable and, with Q = C^T C, (A, C) is observable, then the optimal policy is stationary. Under the stated controllability and observability assumptions, the Riccati recursion<br />
\[ P_t = Q + A^T P_{t+1} A - A^T P_{t+1} B (B^T P_{t+1} B + R)^{-1} B^T P_{t+1} A \]<br />
converges to the unique fixed point P, which is furthermore positive definite.<br />
We observe that, if the system has noise and we consider an average cost optimization problem, the effect of the noise will stay bounded, and the same solution with P, and the induced control policy, will be optimal for the minimization of the expected average cost:<br />
\[ \lim_{T\to\infty} \frac{1}{T}\, E^{\Pi}_{x}\bigg[\sum_{t=0}^{T-1}\big(x_t^T Q x_t + u_t^T R u_t\big) + x_T^T Q_T x_T\bigg]. \]<br />
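A minimal numerical sketch of the backward Riccati recursion of Theorem 5.5.1 on an invented two-dimensional example (the matrices A, B, Q, R, Q_T below are assumptions for illustration). Iterated backward long enough, P_t approaches the stationary solution P of Theorem 5.5.2.<br />

```python
import numpy as np

# Hypothetical system and cost matrices for illustration.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)            # Q = C^T C with C = I, so (A, C) is observable
R = np.array([[1.0]])
Q_T = np.eye(2)

def riccati_step(P_next):
    """One backward step of the Discrete-Time Riccati Equation."""
    S = B.T @ P_next @ B + R
    return Q + A.T @ P_next @ A - A.T @ P_next @ B @ np.linalg.inv(S) @ B.T @ P_next @ A

def gain(P_next):
    """Optimal feedback gain K, so that u_t = -K x_t."""
    return np.linalg.inv(B.T @ P_next @ B + R) @ B.T @ P_next @ A

P = Q_T                  # final condition P_T = Q_T
for _ in range(500):     # backward in time
    P = riccati_step(P)

# P has (numerically) converged to the unique positive definite fixed point,
# and the resulting stationary policy stabilizes the plant.
assert np.allclose(P, riccati_step(P), atol=1e-8)
assert np.all(np.linalg.eigvalsh(P) > 0)
K = gain(P)
assert np.all(np.abs(np.linalg.eigvals(A - B @ K)) < 1.0)
```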
5.6 Exercises<br />
Exercise 5.6.1 Consider the following problem considered in class. An investor's wealth dynamics is given by the following:<br />
x t+1 = u t w t ,<br />
where {w_t} is an i.i.d. R_+-valued stochastic process with E[w_t] = 1. The investor has access to the past and current wealth information and his previous actions. The goal is to maximize:<br />
\[ J(x_0, \Pi) = E^{\Pi}_{x_0}\bigg[\sum_{t=0}^{T-1} \sqrt{x_t - u_t}\bigg]. \]<br />
The investor’s action set for any given x is: U(x) = [0, x].<br />
Formulate the problem as an optimal stochastic control problem by clearly identifying the state, the control action spaces, the information available at the controller, the transition kernel and a cost functional mapping the actions and states to R.<br />
Find an optimal policy.<br />
Hint: For α ≥ 0, \(\sqrt{x-u} + \alpha\sqrt{u}\) is a concave function of u for 0 ≤ u ≤ x, and its maximum is computed by setting the derivative of \(\sqrt{x-u} + \alpha\sqrt{u}\) to zero.<br />
Exercise 5.6.2 Consider an investment problem in a stock market. Suppose there are N possible goods and suppose these goods have prices {p^i_t, i = 1, …, N} at time t taking values in a countable set P. Further, suppose that the stocks evolve stochastically according to a Markov transition such that for every 1 ≤ i ≤ N<br />
\[ P(p^i_{t+1} = m \,|\, p^i_t = n) = P_i(n, m). \]<br />
Suppose that there is a transaction cost of c for every buying and selling of each good. Suppose that the investor can only hold on to at most M units of stock. At time t, the investor has access to his current and past price information and his investment allocations up to time t.<br />
When a transaction is made, the following might occur. If the investor sells one unit of good i, he spends c dollars for the transaction and adds p^i_t dollars to his account. If he buys one unit of good j, then he takes out p^j_t dollars per unit from his account and also takes out c dollars. For every such transaction per unit, there is a cost of c dollars. If he does nothing, there is no cost. Assume that the investor can make at most one transaction per time stage.<br />
Suppose the investor initially has an unlimited amount of money and a given stock allocation.<br />
The goal is to maximize the total worth of the stock at the final time stage t = T ∈ Z_+, minus the total transaction costs, plus the money earned in between.<br />
a) Formulate the problem as a stochastic control problem by clearly identifying the state, the control actions, the information available at the controller, the transition kernel and a cost functional mapping the actions and states to R.<br />
b) By Dynamic Programming, generate a recursion which will lead to an optimal control policy.<br />
Exercise 5.6.3 Consider a special case of the above problem, where the objective is to maximize the final worth of the investment and minimize the transaction cost. Furthermore, suppose there is only one uniform stock price with a transition kernel given by<br />
\[ P(p_{t+1} = m \,|\, p_t = n) = P(n, m). \]<br />
The investor can again hold at most M units of the good, but he might wish to purchase more, or sell some.<br />
a) Formulate the problem as a stochastic control problem by clearly identifying the state, the control actions, the transition kernel and a cost functional mapping the actions and states to R.<br />
b) By Dynamic Programming, generate a recursion which will lead to an optimal control policy.<br />
c) What is the optimal control policy?<br />
Exercise 5.6.4 A fishery manager annually has x_t units of fish and sells u_t x_t of them, where u_t ∈ [0, 1]. With the remaining stock, the next year's production is given by the following model:<br />
\[ x_{t+1} = w_t x_t (1 - u_t) + w_t, \]<br />
with x_0 given and {w_t} an independent, identically distributed sequence of random variables with w_t ≥ 0 for all t, so that E[w_t] = \(\tilde w\) ≥ 0.<br />
The goal is to maximize the profit over the time horizon 0 ≤ t ≤ T − 1. At time T, he sells all of the fish.<br />
a) Formulate the problem as an optimal stochastic control problem by clearly identifying the state, the control actions, the information available at the controller, the transition kernel and a cost functional mapping the actions and states to R.<br />
b) Does there exist an optimal policy? If it does, compute the optimal control policy as a dynamic programming recursion.<br />
Exercise 5.6.5 Consider the following linear system:<br />
\[ x_{t+1} = A x_t + B u_t + w_t, \]<br />
where x ∈ R^n, u ∈ R^m and w ∈ R^n. Suppose {w_t} is i.i.d. Gaussian with a given covariance matrix E[w_t w_t^T] = W for all t ≥ 0.<br />
The goal is to obtain<br />
\[ \inf_{\Pi} J(\Pi, x), \]<br />
where<br />
\[ J(\Pi, x) = E^{\Pi}_{x}\bigg[\sum_{t=0}^{T-1}\big(x_t^T Q x_t + u_t^T R u_t\big) + x_T^T Q_T x_T\bigg], \]<br />
with R, Q, Q_T > 0 (that is, these matrices are positive definite).<br />
a) Show that there exists an optimal policy.<br />
b) Obtain the Dynamic Programming recursion for the optimal control problem. Is the optimal control policy Markov? Is it stationary?<br />
c) For T → ∞, if (A, B) is controllable and, with Q = C^T C, (A, C) is observable, prove that the optimal policy is stationary.<br />
Exercise 5.6.6 Consider an unemployed person who will have to work for years t = 1, 2, …, 10 if she takes a job at any given t.<br />
Suppose that in each year in which she remains unemployed, she may be offered a good job that pays 10 dollars per year (with probability 1/4); she may be offered a bad job that pays 4 dollars per year (with probability 1/4); or she may not be offered a job (with probability 1/2). These job offer events are independent from year to year (that is, the job market is represented by an independent sequence of random variables for every year).<br />
Once she accepts a job, she will remain in that job for the rest of the ten years. That is, for example, she cannot switch from the bad job to the good job.<br />
Suppose the goal is to maximize the expected total earnings in ten years, starting from year 1 up to year 10 (including year 10).<br />
Should this person accept a good job or a bad job at any given year? What is her best policy?<br />
State the problem as a Markov Decision Problem, identifying the state space, the action space and the transition kernel.<br />
Obtain the optimal solution.<br />
Chapter 6<br />
Partially Observed Markov Decision<br />
Processes<br />
As discussed earlier in Chapter 2, consider a system of the form:<br />
\[ x_{t+1} = f(x_t, u_t, w_t), \qquad y_t = g(x_t, v_t). \]<br />
Here, x_t is the state, u_t ∈ U is the control, and (w_t, v_t) ∈ W × V are second order, zero-mean, i.i.d. noise processes with w_t independent of v_t. We also assume that the state noise w_t either has a probability mass function, or admits a probability measure which is absolutely continuous with respect to the Lebesgue measure; this will ensure that the probability measure admits a density function. Hence, the notation p(x) will denote either the probability mass for discrete-valued spaces or the probability density function for uncountable spaces. The controller only has causal access to the second component {y_t} of the process: a policy Π is a sequence of functions measurable with respect to σ(I_t), where I_t = {y_{[0,t]}, u_{[0,t-1]}}. We denote the observed history space as: H_0 := P, H_t = H_{t-1} × Y × U. Here P denotes the space of probability measures on X, which we assume to be a Borel space. Hence, the set of randomized causal control policies is such that P(u(h_t) ∈ U | I_t) = 1 for all I_t ∈ H_t.<br />
6.1 Enlargement of the State-Space and the Construction of a Controlled Markov Chain<br />
One can transform a partially observed Markov Decision Problem into a fully observed Markov Decision Problem via an enlargement of the state space. In particular, we obtain via the properties of total probability the following dynamical recursion:<br />
\[ \pi_t(x) := P(x_t = x \,|\, y_{[0,t]}, u_{[0,t-1]}) = \frac{\sum_{x_{t-1}\in X} \pi_{t-1}(x_{t-1}) P(y_t|x_t) P(x_t|x_{t-1},u_{t-1}) P(u_{t-1}|y_{[0,t-1]},u_{[0,t-2]})}{\sum_{x_t\in X}\sum_{x_{t-1}\in X} \pi_{t-1}(x_{t-1}) P(y_t|x_t) P(x_t|x_{t-1},u_{t-1}) P(u_{t-1}|y_{[0,t-1]},u_{[0,t-2]})}. \]<br />
Here, the summations need to be exchanged with integrals in case the variables live in an uncountable space.<br />
The conditional measure process becomes a controlled Markov chain in P.<br />
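A minimal sketch of this belief (conditional measure) update for a finite state space, on an invented two-state, two-action, two-observation example (the kernels below are assumptions for illustration). The policy term P(u_{t-1} | ·) cancels between the numerator and the denominator, so it is omitted.<br />

```python
import numpy as np

# Hypothetical finite model: P_trans[u][x, x'] = P(x_t = x' | x_{t-1} = x, u_{t-1} = u),
# P_obs[x, y] = P(y_t = y | x_t = x).
P_trans = np.array([[[0.9, 0.1],
                     [0.2, 0.8]],
                    [[0.5, 0.5],
                     [0.5, 0.5]]])
P_obs = np.array([[0.8, 0.2],
                  [0.3, 0.7]])

def belief_update(pi, u, y):
    """pi_t from pi_{t-1}, the applied control u and the new observation y."""
    # Numerator: sum_{x_{t-1}} pi_{t-1}(x_{t-1}) P(x_t|x_{t-1},u) P(y_t|x_t).
    unnormalized = (pi @ P_trans[u]) * P_obs[:, y]
    # Denominator: the same quantity summed over x_t.
    return unnormalized / unnormalized.sum()

pi = np.array([0.5, 0.5])          # prior
pi = belief_update(pi, u=0, y=1)   # one filtering step
assert abs(pi.sum() - 1.0) < 1e-12 # the update is again a probability measure on X
```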
Theorem 6.1.1 The process {π t , u t } is a controlled Markov chain. That is, under any admissible control policy,<br />
given the action at time t ≥ 0 <strong>and</strong> π t , π t+1 is conditionally independent from {π s , u s , s ≤ t − 1}.<br />
Proof: Suppose X is countable. Let D ∈ B(P(X)). Writing π_{t+1} via the filtering recursion,<br />
\[ P(\pi_{t+1} \in D \,|\, \pi_s, u_s, s \le t) = P\bigg( \frac{\sum_{x_t} \pi_t(x_t) P(y_{t+1}|x_{t+1}) P(x_{t+1}|x_t,u_t) P(u_t|y_{[0,t]},u_{[0,t-1]})}{\sum_{x_{t+1}}\sum_{x_t} \pi_t(x_t) P(y_{t+1}|x_{t+1}) P(x_{t+1}|x_t,u_t) P(u_t|y_{[0,t]},u_{[0,t-1]})} \in D \,\bigg|\, \pi_s, u_s, s \le t \bigg) \]<br />
\[ = P\bigg( \frac{\sum_{x_t} \pi_t(x_t) P(y_{t+1}|x_{t+1}) P(x_{t+1}|x_t,u_t)}{\sum_{x_{t+1}}\sum_{x_t} \pi_t(x_t) P(y_{t+1}|x_{t+1}) P(x_{t+1}|x_t,u_t)} \in D \,\bigg|\, \pi_s, u_s, s \le t \bigg) \]<br />
\[ = P\bigg( \frac{\sum_{x_t} \pi_t(x_t) P(y_{t+1}|x_{t+1}) P(x_{t+1}|x_t,u_t)}{\sum_{x_{t+1}}\sum_{x_t} \pi_t(x_t) P(y_{t+1}|x_{t+1}) P(x_{t+1}|x_t,u_t)} \in D \,\bigg|\, \pi_t, u_t \bigg), \]<br />
where the second equality follows since the policy terms cancel between the numerator and the denominator, and the last equality follows since the expression depends on the past only through (π_t, u_t). ⋄<br />
Let the cost function to be minimized be<br />
\[ E^{\Pi}_{x_0}\bigg[\sum_{t=0}^{T-1} c(x_t, u_t)\bigg], \]<br />
where E^Π_{x_0}[·] denotes the expectation over all sample paths with initial state given by x_0 under policy Π. We can transform the system into a fully observed Markov model as follows. Since<br />
\[ E^{\Pi}_{x_0}\bigg[\sum_{t=0}^{T-1} c(x_t, u_t)\bigg] = E^{\Pi}_{x_0}\bigg[\sum_{t=0}^{T-1} E[c(x_t,u_t)|I_t]\bigg], \]<br />
we can write the per-stage cost function as<br />
\[ \tilde c(\pi, u) = \sum_{x \in X} c(x,u)\pi(x), \qquad \pi \in P. \quad (6.1) \]<br />
The stochastic transition kernel q is given by:<br />
\[ q(x, y \,|\, \pi, u) = \sum_{x' \in X} P(x, y | x', u)\, \pi(x'), \qquad \pi \in P, \]<br />
and this kernel can be decomposed as q(x, y|π, u) = P(y|π, u)P(x|π, u, y).<br />
The second term here is the filtering equation, mapping (π, u, y) ∈ (P × U × Y) to P. It follows that (P, U, K, ˜c) defines a completely observable controlled Markov process. Here, we have<br />
\[ K(B | \pi, u) = \sum_{y \in Y} 1_{\{P(\,\cdot\,|\pi,u,y) \in B\}}\, P(y | \pi, u), \qquad \forall B \in \sigma(P). \]<br />
As such, one can obtain the optimal solution by using the filtering equation as a sufficient statistic in a centralized setting, as Markov policies (policies that use the Markov state as their sufficient statistic) are optimal for the control of Markov chains, under well-studied sufficiency conditions for the existence of optimal selectors.<br />
We call control policies which use π as their information to generate the control separated control policies. This will become clearer in the context of Linear Quadratic Gaussian control systems.<br />
Since in our analysis earlier, we had assumed that the state belongs to a complete, separable, metric space, the<br />
analysis here follows the analysis in the previous chapter, provided that the measurability, continuity, compactness<br />
conditions are verified.<br />
6.2 Linear Quadratic Gaussian Case and Kalman Filtering<br />
Consider the following linear system:<br />
\[ x_{t+1} = A x_t + B u_t + w_t, \]<br />
and<br />
\[ y_t = C x_t + v_t, \]<br />
where x ∈ R^n, u ∈ R^m, w ∈ R^n, y ∈ R^p and v ∈ R^p. Suppose {w_t} and {v_t} are i.i.d. Gaussian random vectors with given covariance matrices E[w_t w_t^T] = W and E[v_t v_t^T] = V for all t ≥ 0.<br />
The goal is to obtain<br />
\[ \inf_{\Pi} J(\Pi, \mu_0), \]<br />
where<br />
\[ J(\Pi, \mu_0) = E^{\Pi}_{\mu_0}\bigg[\sum_{t=0}^{T-1}\big(x_t^T Q x_t + u_t^T R u_t\big) + x_T^T Q_T x_T\bigg], \]<br />
with R > 0 and Q, Q_T ≥ 0 (that is, R is positive definite and Q, Q_T are positive semi-definite).<br />
We observe that the optimal control is linear and has the form:<br />
\[ u_t = -(B^T K_{t+1} B + R)^{-1} B^T K_{t+1} A\, E[x_t | I_t], \]<br />
where K_t solves the Discrete-Time Riccati Equation:<br />
\[ K_t = Q + A^T K_{t+1} A - A^T K_{t+1} B (B^T K_{t+1} B + R)^{-1} B^T K_{t+1} A, \]<br />
with final condition K_T = Q_T. We obtained<br />
\[ J_t(I_t) = E[x_t^T K_t x_t] + \sum_{k=t}^{T-1}\Big( E\big[(x_k - E[x_k|I_k])^T Q_k (x_k - E[x_k|I_k])\big] + E[\tilde w_k^T K_{k+1} \tilde w_k] \Big), \quad (6.2) \]<br />
where ˜w_t = Σ_{t|t−1} C^T (C Σ_{t|t−1} C^T + V)^{-1} (y_t − C E[x_t|I_{t−1}]) and Σ is to be derived below.<br />
6.2.1 Separation of Estimation and Control<br />
In the above problem, we observed that the optimal control has a separation structure: the controller first estimates the state, and then applies its control action, regarding the estimate as the state itself.<br />
We say that control has a dual effect if the control affects the estimation quality. Typically, when the dual effect is absent, the separation principle observed above applies. In the most general problems, however, the dual effect of the control is present; that is, depending on the control, the estimation quality at the controller will be affected. As an example, consider a linear system controlled over the Internet, where the controller applies a control but does not know whether the packet reaches the destination or not. In this case, the control signal which was intended to be sent does affect the estimation error quality.<br />
6.3 Estimation and Kalman Filtering<br />
Lemma 6.3.1 Let x, y be random variables with finite second moments and let R > 0. The function which solves inf_{g(·)} E[(x − g(y))^T R (x − g(y))] is G(y) = E[x|y].<br />
Proof: Let G(y) = E[x|y] + h(y) for some measurable h. Then, P-a.s., writing x − G(y) = (x − E[x|y]) − h(y),<br />
\[ E\big[(x - E[x|y] - h(y))^T R (x - E[x|y] - h(y)) \,\big|\, y\big] \]<br />
\[ = E\big[(x - E[x|y])^T R (x - E[x|y]) \,\big|\, y\big] + h(y)^T R\, h(y) - 2\, E\big[(x - E[x|y])^T \,\big|\, y\big] R\, h(y) \]<br />
\[ = E\big[(x - E[x|y])^T R (x - E[x|y]) \,\big|\, y\big] + h(y)^T R\, h(y) \]<br />
\[ \ge E\big[(x - E[x|y])^T R (x - E[x|y]) \,\big|\, y\big], \quad (6.3) \]<br />
since E[x − E[x|y] | y] = 0 and R > 0. ⋄<br />
We note that the above also admits a Hilbert space formulation. Let H denote the space of random variables with finite second moments, on which the inner product 〈x, y〉 = E[x^T R y] is defined. Let M_H be the closed subspace of H consisting of the random variables which are measurable on σ(y) and have finite second moments. Then, the Projection Theorem leads to the observation that an optimal estimate g(y) minimizing ||x − g(y)||², denoted here by G(y), is one which satisfies:<br />
\[ \langle x - G(y), h(y) \rangle = E\big[(x - G(y))^T R\, h(y)\big] = 0, \qquad \forall h \in M_H. \]<br />
The conditional expectation satisfies this since:<br />
\[ E\big[(x - E[x|y])^T R\, h(y)\big] = E\Big[E\big[(x - E[x|y])^T \,\big|\, y\big] R\, h(y)\Big] = 0, \]<br />
since, P-a.s., E[(x − E[x|y])^T | y] = 0.<br />
For a linear Gaussian system, the state process has a Gaussian probability measure. A Gaussian probability measure can be uniquely identified by its mean and covariance. This makes the analysis for estimating a Gaussian random variable particularly simple to perform, since the conditional estimate of a partially observed (through an additive Gaussian noise) Gaussian random variable is a linear function of the observed variable.<br />
Recall that a Gaussian measure with mean µ and covariance matrix K_{XX} has the following density:<br />
\[ p(x) = \frac{1}{(2\pi)^{n/2} |K_{XX}|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T K_{XX}^{-1} (x-\mu)}. \]<br />
Lemma 6.3.2 Let x, y be zero-mean, jointly Gaussian random vectors; E[x|y] is linear in y (if not zero-mean, then the conditional expectation is affine). Let<br />
\[ K := \begin{bmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{bmatrix}, \]<br />
where Σ_{XX} = E[xx^T], Σ_{XY} = E[xy^T] and Σ_{YY} = E[yy^T]. Then, E[x|y] = Σ_{XY} Σ_{YY}^{-1} y and<br />
\[ E\big[(x - E[x|y])(x - E[x|y])^T\big] = \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{-1} \Sigma_{XY}^T =: D. \]<br />
Proof: By Bayes' rule and the fact that the processes admit densities, p(x|y) = p(x,y)/p(y). The inverse K^{-1} is also symmetric (since the eigenvectors are the same and the eigenvalues are inverted); write it as<br />
\[ K^{-1} = \begin{bmatrix} \Psi_{XX} & \Psi_{XY} \\ \Psi_{YX} & \Psi_{YY} \end{bmatrix}. \]<br />
Thus,<br />
\[ \frac{p(x,y)}{p(y)} = C\, \frac{e^{-\frac{1}{2}\left(x^T \Psi_{XX} x + 2 x^T \Psi_{XY} y + y^T \Psi_{YY} y\right)}}{e^{-\frac{1}{2} y^T \Sigma_{YY}^{-1} y}}. \]<br />
By the method of completing the squares, one obtains a quadratic exponent of the form<br />
\[ p(x|y) = Q(y)\, e^{-\frac{1}{2}(x - Hy)^T D^{-1} (x - Hy)}, \]<br />
where Q(y) depends only on y. Since ∫ p(x|y) dx = 1, it follows that<br />
\[ Q(y) = \frac{1}{(2\pi)^{n/2} |D|^{1/2}}, \]<br />
which is in fact independent of y. ⋄<br />
Remark 6.3.1 Even if the random variables are not Gaussian (but zero-mean), through another Hilbert space formulation and an application of the Projection Theorem, the expression Σ_{XY} Σ_{YY}^{-1} y is the best linear estimate. One can naturally generalize this to random variables with non-zero mean.<br />
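A small numerical sketch of Lemma 6.3.2, with an invented scalar joint covariance for illustration: sampling jointly Gaussian (x, y), the estimation error of the linear estimate Σ_{XY} Σ_{YY}^{-1} y is (up to Monte Carlo error) orthogonal to y, and its variance matches D.<br />

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented joint covariance for zero-mean (x, y), both scalar for simplicity.
Sxx, Sxy, Syy = 2.0, 0.8, 1.0
K = np.array([[Sxx, Sxy],
              [Sxy, Syy]])

# Sample zero-mean jointly Gaussian pairs (x, y).
z = rng.multivariate_normal(np.zeros(2), K, size=200_000)
x, y = z[:, 0], z[:, 1]

x_hat = (Sxy / Syy) * y            # E[x|y] = Sigma_XY Sigma_YY^{-1} y
err = x - x_hat
D = Sxx - Sxy**2 / Syy             # predicted error covariance

# Orthogonality: the error is uncorrelated with y ...
assert abs(np.mean(err * y)) < 0.02
# ... and its variance matches D = Sigma_XX - Sigma_XY Sigma_YY^{-1} Sigma_XY^T.
assert abs(np.var(err) - D) < 0.05
```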
We will derive the Kalman filter in the following. The following two lemmas are instrumental.<br />
Lemma 6.3.3 If E[x] = 0 and z_1, z_2 are independent, zero-mean Gaussian random vectors, then E[x|z_1, z_2] = E[x|z_1] + E[x|z_2].<br />
Proof: The proof follows by writing z = [z_1, z_2] and E[x|z] = Σ_{XZ} Σ_{ZZ}^{-1} z, where Σ_{ZZ} is block diagonal by independence. ⋄<br />
Lemma 6.3.4 E[(x − E[x|y])(x − E[x|y])^T] does not depend on y and is given by D above.<br />
Now, we can move on to the derivation of the Kalman filter. Consider:<br />
\[ x_{t+1} = A x_t + w_t, \qquad y_t = C x_t + v_t, \]<br />
with E[w_t w_t^T] = W and E[v_t v_t^T] = V.<br />
Define:<br />
\[ m_t = E[x_t | y_{[0,t-1]}], \qquad \Sigma_{t|t-1} = E\big[(x_t - m_t)(x_t - m_t)^T \,\big|\, y_{[0,t-1]}\big]. \]<br />
Theorem 6.3.1 The following holds:<br />
\[ m_{t+1} = A m_t + A \Sigma_{t|t-1} C^T (C \Sigma_{t|t-1} C^T + V)^{-1} (y_t - C m_t), \]<br />
\[ \Sigma_{t+1|t} = A \Sigma_{t|t-1} A^T + W - (A \Sigma_{t|t-1} C^T)(C \Sigma_{t|t-1} C^T + V)^{-1} (C \Sigma_{t|t-1} A^T), \]<br />
with m_0 = E[x_0] and Σ_{0|−1} = E[x_0 x_0^T].<br />
Proof: Since x_{t+1} = A x_t + w_t,<br />
\[ m_{t+1} = E[A x_t + w_t \,|\, y_{[0,t]}] = E\big[A x_t + w_t \,\big|\, y_{[0,t-1]},\, y_t - E[y_t|y_{[0,t-1]}]\big] \]<br />
\[ = A m_t + E\big[A(x_t - m_t) \,\big|\, y_{[0,t-1]}\big] + E\big[A(x_t - m_t) \,\big|\, y_t - E[y_t|y_{[0,t-1]}]\big] + E\big[w_t \,\big|\, y_t - E[y_t|y_{[0,t-1]}]\big] \]<br />
\[ = A m_t + E\big[A(x_t - m_t) + w_t \,\big|\, y_t - E[y_t|y_{[0,t-1]}]\big], \]<br />
where we used Lemma 6.3.3 with the independent components y_{[0,t-1]} and the innovation y_t − E[y_t|y_{[0,t-1]}] = y_t − C m_t, together with E[A(x_t − m_t)|y_{[0,t-1]}] = 0.<br />
We now apply Lemma 6.3.2: with x = A(x_t − m_t) + w_t and y = y_t − C m_t, the formula E[x|y] = Σ_{xy} Σ_{yy}^{-1} y leads to:<br />
\[ m_{t+1} = A m_t + A \Sigma_{t|t-1} C^T (C \Sigma_{t|t-1} C^T + V)^{-1} (y_t - C m_t). \]<br />
Likewise, the error recursion<br />
\[ x_{t+1} - m_{t+1} = A(x_t - m_t) + w_t - A \Sigma_{t|t-1} C^T (C \Sigma_{t|t-1} C^T + V)^{-1} (y_t - C m_t) \]<br />
leads to, after a few lines of calculations, the covariance update:<br />
\[ \Sigma_{t+1|t} = A \Sigma_{t|t-1} A^T + W - (A \Sigma_{t|t-1} C^T)(C \Sigma_{t|t-1} C^T + V)^{-1} (C \Sigma_{t|t-1} A^T). \]<br />
⋄<br />
The above is the celebrated Kalman filter.<br />
Define now<br />
\[ \tilde m_t = E[x_t | y_{[0,t]}] = m_t + E[x_t - m_t \,|\, y_{[0,t]}] = m_t + E\big[x_t - m_t \,\big|\, y_t - E[y_t|y_{[0,t-1]}]\big]. \]<br />
It follows from m_t = A \tilde m_{t-1} that<br />
\[ \tilde m_t = A \tilde m_{t-1} + \Sigma_{t|t-1} C^T (C \Sigma_{t|t-1} C^T + V)^{-1} (y_t - C A \tilde m_{t-1}). \]<br />
We observe that the zero-mean variable x_t − ˜m_t is orthogonal to I_t, in the sense that the error is independent of the information available at the controller; since the information available is Gaussian, independence and orthogonality are equivalent.<br />
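A minimal numerical sketch of the recursions in Theorem 6.3.1, propagating m_t and Σ_{t|t−1} along a simulated trajectory; the system matrices below are invented for the example.<br />

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical system with a scalar observation, for illustration.
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
C = np.array([[1.0, 0.0]])
W = 0.01 * np.eye(2)                 # E[w_t w_t^T]
V = np.array([[0.1]])                # E[v_t v_t^T]

def kalman_step(m, Sigma, y):
    """One step of Theorem 6.3.1: (m_t, Sigma_{t|t-1}, y_t) -> (m_{t+1}, Sigma_{t+1|t})."""
    S = C @ Sigma @ C.T + V
    gain = A @ Sigma @ C.T @ np.linalg.inv(S)
    m_next = A @ m + gain @ (y - C @ m)
    Sigma_next = A @ Sigma @ A.T + W - gain @ (C @ Sigma @ A.T)
    return m_next, Sigma_next

x = np.zeros((2, 1))                 # true state
m = np.zeros((2, 1))                 # m_0 = E[x_0]
Sigma = np.eye(2)                    # Sigma_{0|-1}
for t in range(500):
    y = C @ x + rng.normal(0.0, np.sqrt(V[0, 0]), size=(1, 1))
    m, Sigma = kalman_step(m, Sigma, y)
    x = A @ x + rng.multivariate_normal([0.0, 0.0], W).reshape(2, 1)

# Consistent with Theorem 6.3.2, the covariance recursion has converged.
_, Sigma2 = kalman_step(m, Sigma, C @ x)
assert np.allclose(Sigma2, Sigma, atol=1e-6)
```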
We observe that the recursions in Theorem 6.3.1 remind us of the recursions in Theorem 5.5.2. This leads to the following result.<br />
Theorem 6.3.2 Suppose that W = BB^T, that (A, B) is controllable and (A, C) is observable, and that V > 0. Then, the recursions for the covariance matrices Σ_{(·)} in Theorem 6.3.1 converge to a unique solution which is positive definite.<br />
6.3.1 Optimal Control of Partially Observed LQG Systems<br />
With this observation, in class we reformulated the quadratic optimization problem in terms of ˜m_t, u_t and x_t − ˜m_t; the optimal control problem is equivalent to the control of the fully observed state ˜m_t, with an additive, time-varying, independent Gaussian noise process.<br />
In particular, the cost:<br />
\[ J(\Pi, \mu_0) = E^{\Pi}_{\mu_0}\bigg[\sum_{t=0}^{T-1}\big(x_t^T Q x_t + u_t^T R u_t\big) + x_T^T Q_T x_T\bigg] \]<br />
writes as:<br />
\[ E^{\Pi}_{\mu_0}\bigg[\sum_{t=0}^{T-1}\big(\tilde m_t^T Q \tilde m_t + u_t^T R u_t\big) + \tilde m_T^T Q_T \tilde m_T\bigg] + E^{\Pi}_{\mu_0}\bigg[\sum_{t=0}^{T-1}(x_t - \tilde m_t)^T Q (x_t - \tilde m_t) + (x_T - \tilde m_T)^T Q_T (x_T - \tilde m_T)\bigg] \]<br />
for the fully observed system:<br />
\[ \tilde m_t = A \tilde m_{t-1} + \bar w_t, \]<br />
with<br />
\[ \bar w_t = \Sigma_{t|t-1} C^T (C \Sigma_{t|t-1} C^T + V)^{-1} (y_t - C A \tilde m_{t-1}). \]<br />
That we can take the error term (x_t − ˜m_t) out of the summation is a consequence of what is known as the separation of estimation and control; a more special version is known as the certainty equivalence principle. In class, we observed that the absence of a dual effect plays a key part in this analysis, leading to the separation of estimation and control principle, in taking E[(x_t − ˜m_t)^T Q (x_t − ˜m_t)] out of the conditioning on I_t.<br />
Thus, the optimal control has the following form for all time stages:

u_t = −( B^T K_{t+1} B + R )^{−1} B^T K_{t+1} A E[x_t | I_t],

and the optimal cost will be

˜m_0^T P_0 ˜m_0 + E[ ∑_{k=0}^{T−1} ¯w_k^T P_{k+1} ¯w_k ].
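A minimal simulation sketch of the separated controller for a scalar system of the assumed standard form x_{t+1} = A x_t + B u_t + w_t, y_t = C x_t + v_t (all numerical values are illustrative assumptions, not from the notes): the backward Riccati recursion produces the gains K_t, a Kalman filter produces E[x_t | I_t], and the control applies the LQR gain to the estimate.

```python
import random

# Sketch of the separated LQG controller for a scalar system of the assumed form
#   x_{t+1} = A x_t + B u_t + w_t,   y_t = C x_t + v_t.
# All numerical values are illustrative assumptions.
A, B, C, Q, R, W, V, T = 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 20

# Backward control Riccati recursion; terminal weight Q_T = Q here.
K = [0.0] * (T + 1)
K[T] = Q
for t in range(T - 1, -1, -1):
    K[t] = Q + A * K[t + 1] * A - (B * K[t + 1] * A) ** 2 / (B * K[t + 1] * B + R)

def lqr_gain(t):
    # scalar form of u_t = -(B^T K_{t+1} B + R)^{-1} B^T K_{t+1} A E[x_t | I_t]
    return (B * K[t + 1] * A) / (B * K[t + 1] * B + R)

def simulate(seed=0):
    """Certainty-equivalent control: the LQR gain acts on the Kalman estimate."""
    rng = random.Random(seed)
    x, m, sigma, cost = rng.gauss(0.0, 1.0), 0.0, 1.0, 0.0
    for t in range(T):
        u = -lqr_gain(t) * m                        # uses only E[x_t | I_t]
        cost += Q * x * x + R * u * u
        x = A * x + B * u + rng.gauss(0.0, W ** 0.5)
        mp, sp = A * m + B * u, A * sigma * A + W   # predict
        y = C * x + rng.gauss(0.0, V ** 0.5)
        gain = sp * C / (C * sp * C + V)            # Kalman gain
        m, sigma = mp + gain * (y - C * mp), (1.0 - gain * C) * sp
    return cost + Q * x * x

print(simulate() >= 0.0)  # a nonnegative sample-path cost
```

Note that the controller never sees x_t directly; the estimator and the control gains are computed entirely separately, as the separation principle asserts.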
6.4 Partially Observed Markov Decision Processes in General Spaces<br />
Topological Properties <strong>of</strong> Space <strong>of</strong> Probability Measures<br />
In class we discussed three convergence notions: weak convergence, setwise convergence, and convergence under total variation.
Let P(R^N) denote the family of all probability measures on (R^N, B(R^N)) for some N ∈ N. Let {µ_n, n ∈ N} be a sequence in P(R^N).
A sequence {µ_n} is said to converge to µ ∈ P(R^N) weakly if

∫_{R^N} c(x) µ_n(dx) → ∫_{R^N} c(x) µ(dx)

for every continuous and bounded c : R^N → R. On the other hand, {µ_n} is said to converge to µ ∈ P(R^N) setwise if

∫_{R^N} c(x) µ_n(dx) → ∫_{R^N} c(x) µ(dx)
for every measurable <strong>and</strong> bounded c : R N → R. Setwise convergence can also be defined through pointwise<br />
convergence on Borel subsets <strong>of</strong> R N (see, e.g., [18]), that is<br />
µ n (A) → µ(A), for all A ∈ B(R N )<br />
since the space <strong>of</strong> simple functions are dense in the space <strong>of</strong> bounded <strong>and</strong> measurable functions under the<br />
supremum norm.<br />
For two probability measures µ, ν ∈ P(R^N), the total variation metric is given by

‖µ − ν‖_TV := 2 sup_{B∈B(R^N)} |µ(B) − ν(B)| = sup_{f: ‖f‖_∞≤1} | ∫ f(x) µ(dx) − ∫ f(x) ν(dx) |,   (6.5)

where the supremum in the last expression is over all measurable real f such that ‖f‖_∞ = sup_{x∈R^N} |f(x)| ≤ 1. A sequence {µ_n} is said to converge to µ ∈ P(R^N) in total variation if ‖µ_n − µ‖_TV → 0.
Setwise convergence is equivalent to pointwise convergence on Borel sets whereas total variation requires uniform<br />
convergence on Borel sets. Thus these three convergence notions are in increasing order <strong>of</strong> strength: convergence<br />
in total variation implies setwise convergence, which in turn implies weak convergence.<br />
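The gaps between the three notions can be seen already on point masses: δ_{1/n} → δ_0 weakly, but not setwise (test the set A = {0}) and not in total variation (‖δ_{1/n} − δ_0‖_TV = 2 for every n). A small numerical illustration follows; integration against a point mass is simply evaluation at the point.

```python
import math

# mu_n = delta_{1/n} converges to mu = delta_0 weakly, but neither setwise
# nor in total variation.
def integrate(c, point):
    # integral of c against the point mass at `point`
    return c(point)

c_cont = math.cos                    # a continuous and bounded test function

def c_meas(x):                       # bounded, measurable, discontinuous at 0
    return 1.0 if x == 0 else 0.0    # this is the indicator of A = {0}

ns = (1, 10, 100, 1000)
weak_gap = [abs(integrate(c_cont, 1.0 / n) - integrate(c_cont, 0.0)) for n in ns]
setwise_gap = [abs(integrate(c_meas, 1.0 / n) - integrate(c_meas, 0.0)) for n in ns]
tv_distance = 2.0  # 2 sup_B |mu_n(B) - mu(B)| is attained at B = {0} for every n

print(weak_gap[-1] < 1e-6)   # True: the weak-convergence gap vanishes
print(setwise_gap[-1])       # stays at 1.0: no setwise convergence
```

Since the setwise gap on A = {0} never decreases, neither setwise convergence nor (a fortiori) total variation convergence holds, while every continuous bounded test function certifies weak convergence.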
Weak convergence may not be strong enough for certain measurable selection conditions. One could use other<br />
metrics, such as total variation, or other topologies, such as the one generated by setwise convergence.<br />
Total variation is a stringent notion of convergence. For example, a sequence of discrete probability measures never converges in total variation to a probability measure which admits a density function with respect to the Lebesgue measure; furthermore, the space of probability measures under the total variation metric is not separable. Setwise convergence also induces a topology on the space of probability measures and channels, which is not easy to work with, since the space under this convergence is not metrizable ([15, p. 59]).
However, the space <strong>of</strong> probability measures on a complete, separable, metric (Polish) space endowed with the<br />
topology <strong>of</strong> weak convergence is itself a complete, separable, metric space [3]. The Prohorov metric, for example,<br />
can be used to metrize this space.<br />
In the following, we will verify the sufficiency <strong>of</strong> weak convergence.<br />
The discussion in the previous chapter for countable spaces applies also to Polish spaces. In particular, with π denoting a conditional probability measure, π_t(A) = P(x_t ∈ A | I_t), we can define a new cost as

˜c(π, u) = ∑_{x∈X} c(x, u) π(x),  π ∈ P(X).
Let Γ(S) be the set of all Borel measurable and bounded functions from S to R. We first wish to find a topology on P(S) under which functions of the form

Θ := { π ↦ ∫_{x∈S} π(dx) f(x), f ∈ Γ(S) }

are measurable on P(S).
By the above definitions, it is evident that both setwise convergence and total variation are sufficient for measurability of cost functions of the form ∫ π(dx) f(x), since these are (sequentially) continuous on P(S) for every f ∈ Γ(S). However, as we state in the following, the cost function ˜c : P(X) × U → R defined in (6.1) is measurable under the weak convergence topology as well (for a proof, see Bogachev (p. 215 [7])).
Theorem 6.4.1 Let S be a metric space and let M be the set of all measurable and bounded functions f : S → R for which

∫ π(dx) f(x)

defines a measurable function on P(S) under the weak convergence topology. Then, M is the same as the set Γ(S) of all bounded and measurable functions.
This discussion is useful to establish that, under the weak convergence topology, (π_t, u_t) forms a controlled Markov chain for dynamic programming purposes.
6.5 Exercises<br />
Exercise 6.5.1 Consider a linear system with the following dynamics:<br />
x t+1 = ax t + u t + w t ,<br />
<strong>and</strong> let the controller have access to the observations given by:<br />
y t = x t + v t .<br />
Here {w t , v t , t ∈ Z} are independent, zero-mean, Gaussian r<strong>and</strong>om variables, with variances E[w 2 ] <strong>and</strong> E[v 2 ].<br />
The controller at time t ∈ Z has access to I t = {y s , u s , s ≤ t − 1} ∪ {y t }.<br />
The initial state has a Gaussian distribution with zero mean and variance E[x_0^2]; we denote this distribution by ν_0. We wish to find, for some r > 0,

inf_Π J(x_0, Π) = E^Π_{ν_0}[ ∑_{t=0}^{3} ( x_t^2 + r u_t^2 ) ].

Compute the optimal control policy and the optimal cost. It suffices to provide a recursive form.

Hint: Show that the optimal control has a separation structure. Compute the conditional estimate through the Kalman Filter.
Exercise 6.5.2 Consider a Markov Decision Problem set up as follows. Let there be two possible links S^1, S^2 that a decision maker can use to send information packets to a destination. Each link is Markov, and for each of the links we have S^i ∈ {0, 1}, i = 1, 2; here, 0 denotes a good link and 1 denotes a bad link. If the decision maker picks one link, he can find out the state of that link. Suppose the goal is to maximize

E[ ∑_{t=0}^{T−1} ( 1_{{U_t=1}} 1_{{S^1_t=1}} + 1_{{U_t=2}} 1_{{S^2_t=1}} ) ].

Each of the links evolves according to a Markov model, independent of the other link.
The only information the decision maker has is the initial probability measure on the links and the actions it has taken. Obtain a dynamic programming recursion by clearly identifying the controlled Markov chain and the cost function. Is the value function convex? Can you deduce any structural properties of the optimal policies?
Chapter 7<br />
The Average Cost Problem<br />
Consider the following average cost problem:

J(x, Π) = limsup_{T→∞} (1/T) E^Π_x[ ∑_{t=0}^{T−1} c(x_t, u_t) ].
This is an important problem in applications where one is concerned with the long-term behavior, unlike the discounted cost setup, where the emphasis is effectively on the costs of the earlier stages.
7.1 Average Cost <strong>and</strong> the Average Cost Optimality Equation (ACOE)<br />
To study this problem, one approach is to establish the existence of a solution to an Average Cost Optimality Equation. We will discuss conditions under which the ACOE can be obtained. Later, we will discuss alternative approaches for obtaining optimal solutions.
We now discuss the Average Cost Optimality Equation (ACOE) for infinite horizon problems.<br />
Considering a sequence <strong>of</strong> normalized finite horizon optimization problems, a dynamic programming recursion<br />
<strong>of</strong> the form:<br />
J T (x 0 ) = inf<br />
u 0<br />
(c(x 0 , u 0 )) + (T − 1)E[J T −1 (x 1 )|x 0 , u 0 ])<br />
with J 0 (·) = 0, leads to the following result. If one takes T to infinity, the following discussion seems natural.<br />
Theorem 7.1.1 a) [Ross] If there exists a bounded function f and a constant g such that

g + f(x) = min_{u∈U} ( c(x, u) + ∫ f(y) P(dy|x, u) ),  x ∈ X,

then there exists a stationary deterministic policy Π* so that

g = J(x, Π*) = min_{Π∈Π_A} J(x, Π),

where

J(x, Π) = limsup_{T→∞} (1/T) E^Π_x[ ∑_{t=0}^{T−1} c(x_t, u_t) ].

b) If g, considered above, is not a constant and depends on x, and we have

g(x) + f(x) = min_{u∈U} ( c(x, u) + ∫ f(y) P(dy|x, u) ),  x ∈ X,
with

g(x) = min_u ∫ g(y) P(dy|x, u),

and the pointwise minimizations are achieved by a measurable function Π*, then

limsup_{T→∞} (1/T) E^{Π*}_x[ ∑_{t=0}^{T−1} g(x_t) ] ≤ inf_Π limsup_{T→∞} (1/T) E^Π_x[ ∑_{t=0}^{T−1} c(x_t, u_t) ].
Proof: We prove a). For b), which follows from a similar construction, see the text of Hernandez-Lerma and Lasserre. For any policy Π,

E^Π_x[ ∑_{t=1}^{n} ( f(x_t) − E^Π[ f(x_t) | x_{[0,t−1]}, u_{[0,t−1]} ] ) ] = 0.

Now,
E^Π[ f(x_t) | x_{[0,t−1]}, u_{[0,t−1]} ] = ∫_y f(y) P(x_t ∈ dy | x_{t−1}, u_{t−1})   (7.1)

= c(x_{t−1}, u_{t−1}) + ∫_y f(y) P(dy | x_{t−1}, u_{t−1}) − c(x_{t−1}, u_{t−1})   (7.2)

≥ min_{u_{t−1}∈U} ( c(x_{t−1}, u_{t−1}) + ∫_y f(y) P(dy | x_{t−1}, u_{t−1}) ) − c(x_{t−1}, u_{t−1})   (7.3)

= g + f(x_{t−1}) − c(x_{t−1}, u_{t−1}).   (7.4)
The above hold with equality if Π* is adopted, since Π* achieves the pointwise minimum. Hence,

0 ≤ (1/n) E^Π_x[ ∑_{t=1}^{n} ( f(x_t) − g − f(x_{t−1}) + c(x_{t−1}, u_{t−1}) ) ],

and

g ≤ E^Π_x[f(x_n)]/n − E^Π_x[f(x_0)]/n + (1/n) ∑_{t=1}^{n} E^Π_x[ c(x_{t−1}, u_{t−1}) ],

with equality under Π*.
⊓⊔<br />
We note that the boundedness condition on f can be relaxed, provided that E^Π_x[f(x_n)]/n → 0 under every policy and every x ∈ X.
7.2 The Discounted Cost Approach to the Average Cost Problem<br />
The average cost criterion emphasizes the asymptotic behavior of the per-stage costs, whereas the discounted cost criterion emphasizes the costs of the earlier stages. However, under technical restrictions, one can show that, in the limit as the discount factor converges to 1, one can obtain a solution to the average cost optimization problem. We now state one such condition below.
Theorem 7.2.1 Consider a controlled Markov chain where the state and action spaces are finite, and for every deterministic policy the entire state space is a recurrent set. Let

J_β(x) = inf_{Π∈Π_A} E^Π_x[ ∑_{t=0}^{∞} β^t c(x_t, u_t) ],

and suppose that Π*_n is an optimal deterministic policy for J_{β_n}(x). Then, there exists some Π* which is optimal for every β sufficiently close to 1, and is also optimal for the average cost

J(x) = inf_{Π∈Π_A} limsup_{T→∞} (1/T) E^Π_x[ ∑_{t=0}^{T−1} c(x_t, u_t) ].
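The statement can be checked numerically on a small assumed two-state, two-action model: as β ↑ 1, the discounted-optimal deterministic policy stops changing, and (1 − β) J_β(x) approaches the average cost of that policy. All model data below are illustrative assumptions.

```python
# Vanishing-discount check on an assumed two-state, two-action model:
# as beta -> 1, the discounted-optimal deterministic policy stabilizes and
# (1 - beta) * J_beta(x) approaches the average cost of that policy.
X, U = (0, 1), (0, 1)
P = {(0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8],
     (1, 0): [0.5, 0.5], (1, 1): [0.1, 0.9]}
c = {(0, 0): 1.0, (0, 1): 0.2, (1, 0): 2.0, (1, 1): 1.5}

def discounted(beta, iters=20000):
    """Value iteration for J_beta and the induced deterministic policy."""
    J = [0.0, 0.0]
    for _ in range(iters):
        J = [min(c[(x, u)] + beta * sum(P[(x, u)][y] * J[y] for y in X)
                 for u in U) for x in X]
    policy = tuple(min(U, key=lambda u: c[(x, u)]
                       + beta * sum(P[(x, u)][y] * J[y] for y in X)) for x in X)
    return J, policy

def average_cost(policy):
    """Average cost via the closed-form invariant measure of a 2-state chain."""
    p01 = P[(0, policy[0])][1]
    p10 = P[(1, policy[1])][0]
    pi0 = p10 / (p01 + p10)
    return pi0 * c[(0, policy[0])] + (1 - pi0) * c[(1, policy[1])]

betas = (0.9, 0.99, 0.999)
results = [discounted(beta) for beta in betas]
policies = [pol for _, pol in results]
scaled = [(1 - beta) * J[0] for beta, (J, _) in zip(betas, results)]

g = average_cost(policies[-1])
print(policies[-1] == policies[-2])   # the optimal policy has stabilized
print(abs(scaled[-1] - g) < 1e-2)     # (1 - beta) J_beta(0) is close to g
```

Since every deterministic policy here induces a recurrent chain, the policy optimal for β near 1 is also average cost optimal, as the theorem asserts.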
7.2.1 Borel State <strong>and</strong> Action Spaces [Optional]<br />
In the following, we assume that c is bounded, measurable selection hypothesis hold <strong>and</strong> X is a compact subset<br />
<strong>of</strong> a Polish space <strong>and</strong> U is a compact subset <strong>of</strong> a Polish space.<br />
Consider the value function for a discounted cost problem as discussed in Section 5.4.1:

J_β(x) = min_{u∈U} { c(x, u) + β ∫_X J_β(y) Q(dy|x, u) },  ∀x ∈ X.

Let x_0 be an arbitrary state and for all x ∈ X consider

J_β(x) − J_β(x_0) = min_{u∈U} ( c(x, u) + β ∫ P(dx′|x, u) ( J_β(x′) − J_β(x_0) ) ) − (1 − β) J_β(x_0).   (7.5)
As argued in Section 5.4.1, this has a solution for every β ∈ (0, 1).<br />
Now suppose that J_β(x) − J_β(x_0) is uniformly (over β) equi-continuous and X is compact. By the Arzelà-Ascoli Theorem, taking β ↑ 1 along some sequence {β_n}, for some subsequence {β_{n_k}} we have J_{β_{n_k}}(x) − J_{β_{n_k}}(x_0) → η(x), and by a Tauberian theorem [1], (1 − β_{n_k}) J_{β_{n_k}}(x_0) → ζ*, possibly along a further subsequence. This limit will then satisfy the Average Cost Optimality Equation (ACOE)

η(x) = min_{u∈U} ( c(x, u) + ∫ P(dx′|x, u) η(x′) − ζ* ).   (7.6)
Since under the hypothesis of the setting we have that lim_{T→∞} sup_Π (1/T) E_x[c(x_T, u_T)] = 0 for every admissible x, the ACOE implies that ζ* is the optimal cost and η is the value function (see Theorem 7.1.1 and Theorem 5.2.4 in [17]).
Remark 7.2.1 If ∑_{n=0}^{∞} ‖P^n(x, ·) − P^n(x′, ·)‖_TV ≤ K(x, x′) < ∞ under every optimal policy for β* ≤ β < 1, for some 0 ≤ β* < 1, and K is uniformly equi-continuous, then the condition for the ACOE is satisfied. It is worth noting that if, under an optimal policy, there exists an atom α (or a pseudo-atom in view of the splitting technique discussed in Chapter 3) so that in some finite expected time, for any two initial conditions x, x′, the processes are coupled in the sense described in Chapter 3, then the condition holds, since after hitting α the processes will be coupled and equal, leading to a difference in the costs only until hitting the atom. If this difference is further continuous in x, x′, so that the Arzelà-Ascoli Theorem applies, the optimality condition holds. For example, the univariate drift condition (4.12) is sufficient for the ergodicity condition (by Theorem 15.0.1 of [23], geometric ergodicity is equivalent to the satisfaction of this drift condition).
We also note that for the above arguments to hold, there does not need to be a single invariant distribution.<br />
Here in (7.6), the pair x <strong>and</strong> x 0 should be picked as a function <strong>of</strong> the reachable set under a given sequence <strong>of</strong><br />
policies. The analysis for such a condition is tedious in general since for every β a different optimal policy will<br />
typically be adopted; however, for certain applications the reachable set from a given point may be independent<br />
<strong>of</strong> the control policy applied.<br />
7.3 Linear Programming Approach to Average Cost Markov Decision<br />
Problems<br />
The convex analytic approach of Manne [21] and Borkar [10] (later generalized by Hernandez-Lerma and Lasserre [17]) is a powerful approach to the optimization of infinite-horizon problems. It is particularly effective in proving results on the optimality of stationary policies, and it leads to a (possibly infinite-dimensional) linear program. The approach is particularly effective for constrained optimization problems and infinite horizon average cost optimization problems. It avoids the use of dynamic programming.
In this chapter, we will consider average cost Markov Decision Problems. In particular, we are interested in the minimization

inf_{Π∈Π_A} limsup_{T→∞} (1/T) E^Π_{x_0}[ ∑_{t=1}^{T} c(x_t, u_t) ],   (7.7)

where E^Π_{x_0}[·] denotes the expectation over all sample paths with initial state given by x_0 under the admissible policy Π.

In class, we considered the finite space setting where both X and U were finite sets. We will restrict the analysis to such a setup, and briefly discuss later the generalization to Polish spaces.

We study the limit distribution of the following occupation measures:

v_T(D) = (1/T) ∑_{t=1}^{T} 1_{{(x_t, u_t) ∈ D}},  D ∈ B(X × U).
Define an F_t-measurable process, with A ∈ B(X):

F_t(A) = ∑_{s=1}^{t} 1_{{x_s ∈ A}} − t ∑_{X×U} P(A|x, u) v_t(x, u).

We have that

E[ F_t(A) | F_{t−1} ] = F_{t−1}(A),  ∀t ≥ 0,

and {F_t(A)} is a martingale sequence.
Furthermore, F_t(A) is a bounded-increment martingale since |F_t(A) − F_{t−1}(A)| ≤ 1. Hence, for every T > 2, {F_1(A), . . ., F_T(A)} forms a martingale sequence with uniformly bounded increments, and we can invoke the Azuma-Hoeffding inequality to show that for all x > 0,

P( | F_t(A)/t | ≥ x ) ≤ 2 e^{−2x²t}.
Finally, invoking the Borel-Cantelli Lemma [9] for the summability of the estimate above, that is,

∑_{t=1}^{∞} 2 e^{−2x²t} < ∞,  ∀x > 0,

we deduce that

lim_{t→∞} F_t(A)/t = 0  a.s.

Thus,

lim_{T→∞} ( v_T(A) − ∑_{X×U} P(A|x, u) v_T(x, u) ) = 0,  A ⊂ X.
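This convergence can be observed by simulation: run a finite chain under an arbitrary admissible (here, time-varying and randomized) policy and monitor the residual v_T(A) − ∑_{x,u} P(A|x, u) v_T(x, u); by the martingale argument above, it decays at rate roughly 1/√T. The kernel and the policy below are assumed examples.

```python
import random

# Empirical occupation measures asymptotically satisfy the constraint
# defining G:  v_T(A) - sum_{x,u} P(A|x,u) v_T(x,u) -> 0.
# The kernel P and the time-varying randomized policy are assumed examples.
P = {(0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8],
     (1, 0): [0.5, 0.5], (1, 1): [0.1, 0.9]}

def residual(T, seed=1):
    """|v_T({0}) - sum_{x,u} P({0}|x,u) v_T(x,u)| along a simulated path."""
    rng = random.Random(seed)
    x = 0
    pair_counts = {key: 0 for key in P}   # T * v_T(x, u)
    visits_0 = 0                          # T * v_T({0})
    for t in range(T):
        u = rng.choice((0, 1)) if t % 3 else 1   # an arbitrary admissible policy
        pair_counts[(x, u)] += 1
        visits_0 += (x == 0)
        x = 0 if rng.random() < P[(x, u)][0] else 1
    v_T_0 = visits_0 / T
    drift = sum(P[key][0] * n / T for key, n in pair_counts.items())
    return abs(v_T_0 - drift)

print(residual(100_000) < 0.02)  # small residual for large T
```

Note that the policy used is neither stationary nor deterministic; the residual vanishes under any admissible policy, which is exactly the content of the martingale argument.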
Define

G_X = { v ∈ P(X × U) : v(B × U) = ∑_{x,u} P(x_{t+1} ∈ B | x, u) v(x, u), B ∈ B(X) }.

Further define

G = { v ∈ P(X × U) : ∃ Π ∈ Π_S, v(A) = ∑_{x,u} P^Π( (x_{t+1}, u_{t+1}) ∈ A | x ) v(x, u), A ∈ B(X × U) }.

It is evident that G ⊂ G_X since there are fewer restrictions for G_X. We can show that these two sets are indeed equal. In particular, for v ∈ G_X, if we write v(x, u) = π(x) P(u|x), then we can construct a consistent element of G via v(B × C) = ∑_{x∈B} π(x) P(C|x).
7.3. LINEAR PROGRAMMING APPROACH TO AVERAGE COST MARKOV DECISION PROBLEMS 75<br />
Thus, every converging sequence v_t will satisfy the above equation (note that we do not claim that every sequence converges), and hence any converging sequence of v_T(·) will asymptotically live in the set G.

This is where finiteness is helpful: if the state space were countable, we would also have to consider sequences which possibly converge to infinity.

Thus, under any sequence of admissible policies, every converging occupation measure sequence converges to the set G.
Lemma 7.3.1 Under any admissible policy, with probability 1, any converging sequence {v t } will converge to<br />
the set G.<br />
Let us define

γ* = inf_{v∈G} ∑ v(x, u) c(x, u).
The above optimality also holds in a stronger sense under positive recurrence conditions. Consider the following:

inf_Π limsup_{T→∞} (1/T) ∑_{t=1}^{T} c(x_t, u_t),   (7.8)

where there is no expectation. The above is known as the sample path cost.

Now, we have that

liminf_{T→∞} ⟨v_T, c⟩ ≥ γ*,

since for any sequence v_{T_k} which converges to the liminf value, there exists a further subsequence (due to the weak compactness of the space of occupation measures) which has a weak limit, and this weak limit is in G. By Fatou's Lemma:

lim_{T_k→∞} ⟨v_{T_k}, c⟩ = ⟨ lim_{T_k→∞} v_{T_k}, c ⟩ ≥ γ*.

Here, ⟨v, c⟩ := ∑ v(x, u) c(x, u).

Likewise, for the average cost problem,

liminf_{T→∞} E[⟨v_T, c⟩] ≥ E[ liminf_{T→∞} ⟨v_T, c⟩ ] ≥ γ*.
Lemma 7.3.2 [Proposition 9.2.5 in [22]] The space G is convex and closed. Let G_e denote the set of extreme points of G. A measure is in G_e if and only if two conditions are satisfied: i) the control policy inducing it is deterministic; ii) under this deterministic policy, the induced Markov chain is irreducible on the support set of the invariant measure (there cannot be two invariant measures under this deterministic policy).
Proof:

Consider η, an invariant measure in G which is non-extremal. Then, this measure can be written as a convex combination of two measures:

η(x, u) = Θ η_1(x, u) + (1 − Θ) η_2(x, u),

π(x) = Θ π_1(x) + (1 − Θ) π_2(x).

Let φ_1(u|x), φ_2(u|x) be two policies leading to ergodic occupation measures η_1 and η_2 and invariant measures π_1 and π_2, respectively. Then, there exists γ(·) such that

φ(u|x) = γ(x) φ_1(u|x) + (1 − γ(x)) φ_2(u|x),

where under φ the invariant occupation measure η is attained.

There are two ways this can be satisfied: i) φ is randomized, or ii) if φ is not randomized, with φ_1 = φ_2 deterministic for all x, then it must be that π_1 and π_2 are different measures with different support sets. In this case, the measures η_1 and η_2 do not correspond to ergodic occupation measures (in the sense of positive Harris recurrence).
We can show that for the countable state space case a converse also holds; that is, if a policy is randomized, it cannot lead to an extreme point unless it has the same invariant measure under some deterministic policy. That is, extreme points are achieved by deterministic policies. Let there be a policy P which randomizes between two deterministic policies P_1 and P_2 at a state α, such that with probability θ, P_1 is chosen, and with probability 1 − θ, P_2 is chosen. Let v, v_1, v_2 be the invariant measures corresponding to P, P_1, P_2. Here, α is an accessible atom if E^{P_i}_α[τ_α] < ∞ and P^{P_i}(x, B) = P^{P_i}(y, B) for all x, y ∈ α, where τ_α = min(k > 0 : x_k = α) is the return time to α. In this case, the expected occupation measure can be shown to be equal to

v(x, u) = ( E^{P_1}_α[τ_α] θ v_1(x, u) + E^{P_2}_α[τ_α] (1 − θ) v_2(x, u) ) / ( E^{P_1}_α[τ_α] θ + E^{P_2}_α[τ_α] (1 − θ) ),   (7.9)

which is a convex combination of v_1 and v_2.

Hence, a randomized policy cannot lead to an extreme point in the space of invariant occupation measures. We note that, under irreducibility assumptions, the above argument applies even if one randomizes at only a single state.
The above does not easily extend to uncountable settings. In Section 3.2 <strong>of</strong> [10], there is a discussion for a<br />
specialized setting.<br />
⊓⊔

We can deduce that, for such a finite state space, an optimal policy is deterministic:
Theorem 7.3.1 Let X, U be finite spaces. Furthermore, let, for every stationary policy, there be a unique invariant distribution supported on a single communicating class. In this case, a sample path optimal policy and an average cost optimal policy exist. These policies are deterministic and stationary.
Proof: G is a compact set, so there exists an optimal v. Furthermore, by some further study, one realizes that the set G is closed and convex, and that its extreme points are obtained by deterministic policies. ⊓⊔
The set <strong>of</strong> extreme points will contain optimal solutions for this case. However, one needs to be careful in<br />
characterizing the set <strong>of</strong> extreme points <strong>of</strong> the set <strong>of</strong> invariant occupation measures.<br />
Let µ* be the optimal occupation measure (this exists, as discussed above, since the state space is finite, and thus G is compact, and ∑_{X×U} µ(x, u) c(x, u) is continuous in µ(·, ·)). This induces an optimal policy π(u|x) as

π(u|x) = µ*(x, u) / ∑_{u∈U} µ*(x, u).
Thus, we can find the optimal policy conveniently, without using dynamic programming.<br />
In class, a simple example was provided.<br />
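Since the extreme points of G are induced by deterministic policies, for a small finite model the program inf_{v∈G} ⟨v, c⟩ can be solved by enumerating the deterministic stationary policies, computing the invariant occupation measure each one induces, and reading off the minimizer. A sketch on an assumed two-state, two-action model (kernel and cost are illustrative, not from the notes):

```python
from itertools import product

# Convex-analytic approach on an assumed two-state, two-action model:
# enumerate deterministic stationary policies (the extreme points of G),
# compute each induced invariant occupation measure, and minimize <v, c>.
P = {(0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8],
     (1, 0): [0.5, 0.5], (1, 1): [0.1, 0.9]}
c = {(0, 0): 1.0, (0, 1): 0.2, (1, 0): 2.0, (1, 1): 1.5}

def occupation_measure(policy):
    # Invariant distribution of the induced two-state chain, in closed form.
    p01 = P[(0, policy[0])][1]
    p10 = P[(1, policy[1])][0]
    pi = (p10 / (p01 + p10), p01 / (p01 + p10))
    # v(x, u) = pi(x) 1{u = policy(x)} is the extreme point of G it induces.
    return {(x, u): pi[x] * (u == policy[x]) for x, u in P}

best_policy = min(product((0, 1), repeat=2),
                  key=lambda pol: sum(occupation_measure(pol)[k] * c[k] for k in c))
gamma_star = sum(occupation_measure(best_policy)[k] * c[k] for k in c)
print(best_policy, round(gamma_star, 4))  # → (0, 0) 1.1667
```

The deterministic policy attaining the minimum, together with γ* = ⟨v, c⟩ at its occupation measure, is obtained without any dynamic programming recursion.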
7.4 Discussion for More General State Spaces<br />
We now discuss a more general setting where these spaces are Polish spaces. The main difference is that, while in class we only considered the occupation measures

v_T(A) = (1/T) ∑_{t=0}^{T−1} 1_{{(x_t, u_t) ∈ A}},  ∀A ∈ B(X × U),

and the convergence of v_T to some set, for an uncountable setting we use test functions to arrive at the same result. Let φ : X → R be a continuous and bounded function. Define

v_T(φ) = (1/T) ∑_{t=1}^{T} φ(x_t).
Define an F_t-measurable process, with π an admissible control policy (not necessarily stationary or Markov):

F_t(φ) = ∑_{s=1}^{t} φ(x_s) − t ∫_{X×U} ( ∫_X φ(x′) P(dx′|x, u) ) v_t(dx, du).   (7.10)

We define G_X to be the following set in this case:

G_X = { η ∈ P(X × U) : η(D × U) = ∫_{X×U} P(D|z) η(dz),  ∀D ∈ B(X) }.
The above holds under a class of technical conditions:

1. (A) The state process lives in a compact space X. The control space U is also compact.

2. (A′) The cost function satisfies the following condition: lim_{K_n↑X} inf_{u, x∉K_n} c(x, u) = ∞ (here, we assume that the space X is locally compact, or σ-compact).

3. (B) There exists a policy leading to a finite cost.

4. (C) The cost function is continuous in x and u.

5. (D) The transition kernel is weakly continuous in the sense that ∫ Q(dz|x, u) v(z) is continuous in both x and u, for every continuous and bounded function v.
Theorem 7.4.1 Under the above Assumptions A, B, C, <strong>and</strong> D (or A’, B, C, <strong>and</strong> D) there exists a solution to the<br />
optimal control problem given in (7.8) for almost every initial condition under the optimal invariant probability<br />
measure.<br />
The key step is the observation that every expected occupation measure sequence has a weakly<br />
converging subsequence. If this exists, then the limit <strong>of</strong> such a converging sequence will be in G <strong>and</strong> the<br />
analysis presented for the finite state space case will be applicable.<br />
Let ν_T(A) = E[v_T(A)]. For C, without the boundedness assumption, the argument follows from the observation that, for ν_n → ν weakly, for every lower semi-continuous function c (see [17]),

liminf_{n→∞} ∫ ν_n(dx, du) c(x, u) ≥ ∫ ν(dx, du) c(x, u).

However, by sequential compactness, every sequence has a weakly converging subsequence, and the result follows.
The issue here is that a sequence ν_n of measures on the space of invariant measures may have a limiting probability measure, but this limit may not correspond to an invariant measure under a stationary policy. The weak continuity of the kernel, and the separability of the space of continuous functions on a compact set, allow for this generalization. This ensures that every sequence has a weakly converging subsequence; in particular, there exists an optimal occupation measure.
Lemma 7.4.1 There exists an optimal occupation measure in G under the Assumptions above.<br />
Proof: The problem has now reduced to

min_µ ∫ µ(dx, du) c(x, u)  s.t.  µ ∈ G.
78 CHAPTER 7. THE AVERAGE COST PROBLEM<br />
The set <strong>of</strong> invariant measures is weakly sequentially compact <strong>and</strong> the integral is lower semi-continuous on the<br />
set <strong>of</strong> measures.<br />
⊓⊔<br />
Following the Lemma above, there exists an optimal occupation measure, say v(dx, du). This defines the optimal stationary control policy through the decomposition

µ(du|x) = v(dx, du) / ∫_U v(dx, du),

for all x such that ∫_U v(dx, du) > 0.
There is a final consideration of reachability; that is, whether, from any initial state or an initial occupation set, the region on which the optimal policy is defined is reached (see [1]).

If the Markov chain is stable and irreducible, then the optimal cost is independent of where the chain starts, since in finite time the states on which the optimal occupation measure has support can be reached.

If this Markov chain is not irreducible, then the optimal stationary policy is only optimal when the state process starts at locations from which, in finite time, the optimal stationary policy becomes applicable. This is particularly useful for problems where one has control over in which state to start the system.
7.4.1 Extreme Points <strong>and</strong> the Optimality <strong>of</strong> Deterministic Policies<br />
Let µ be an invariant measure in G which is not extreme. This means that there exist κ ∈ (0, 1) and invariant occupation measures µ_1, µ_2 ∈ G such that

µ(dx, du) = κ µ_1(dx, du) + (1 − κ) µ_2(dx, du).

Let µ(dx, du) = P(du|x) µ(dx) and, for i = 1, 2, µ_i(dx, du) = P^i(du|x) µ_i(dx). Then,

µ(dx) = κ µ_1(dx) + (1 − κ) µ_2(dx).

Note that µ(dx) = 0 ⟹ µ_i(dx) = 0 for i = 1, 2. As a consequence, the Radon-Nikodym derivative of µ_i with respect to µ exists. Let (dµ_i/dµ)(x) = f^i(x). Then

P(du|x) = P^1(du|x) f^1(x) κ + P^2(du|x) f^2(x) (1 − κ)

is well-defined, with f^1(x) κ + (1 − κ) f^2(x) = 1. As a result, we have that

P(du|x) = P^1(du|x) η(x) + P^2(du|x) (1 − η(x)),

with the randomization kernel η(x) = f^1(x) κ. As in Meyn [22], there are two ways such a representation is possible: either P(du|x) is randomized, or P^1(du|x) = P^2(du|x) is deterministic but there are multiple invariant probability measures under P^1.
The converse direction can also be obtained for the countable state space case: if a measure in G is extreme, then it corresponds to a deterministic policy. The result also holds for the uncountable state space case under further technical conditions. See Section 3.2 of Borkar [10] for further discussion.
Remark 7.4.1 If there is an atom (or a pseudo-atom constructed through Nummelin’s splitting technique) which<br />
is visited under every stationary policy, then, as in the arguments leading to (7.9), one can deduce that on this<br />
atom there cannot be a randomized policy which leads to an extreme point of the set of invariant occupation<br />
measures.
7.4.2 Sample-Path Optimality<br />
To apply the convex analytic approach, we require that under any admissible policy the set of sample-path<br />
occupation measures be tight, for almost every sample-path realization. If this can be established, then<br />
the result goes through not only for the expected cost but also for the sample-path average cost.<br />
7.5 Exercises<br />
Exercise 7.5.1 Consider the convex analytic method we discussed in class. Let X, U be countable sets and<br />
consider the occupation measure:<br />
v_T(A) = (1/T) ∑_{t=0}^{T−1} 1_{(x_t ∈ A)}, ∀A ∈ B(X).<br />
While proving that the limit of such a measure process lives in a specific set, we used the following result. Let F_t<br />
be the σ-field generated by {x_s, u_s, s ≤ t}. Define the F_t-measurable process<br />
F_t(A) = ∑_{s=1}^{t} 1_{(x_s ∈ A)} − t ∑_{(x,u) ∈ X×U} P(A|x, u) v_t(x, u),<br />
where the process evolves under some arbitrary but admissible control policy Π. Show that<br />
{F_t(A), t ∈ {1, 2, . . ., T}} is a martingale sequence for every T ∈ Z_+.<br />
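The martingale property comes from the increments: E[1_{(x_s ∈ A)} | F_{s−1}] = P(A | x_{s−1}, u_{s−1}), so each increment has zero conditional mean. A Monte Carlo sanity check of the consequence E[F_t(A)] = 0 for all t (the kernel, the policy, and the set A below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative controlled kernel P[u, x, x'] = P(x' | x, u) and a stationary
# randomized policy pi[x, u]; both are assumptions for this sketch.
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.3, 0.7]]])
pi = np.array([[0.7, 0.3], [0.4, 0.6]])

A = [0]                  # the event {x_t in A}
T, N = 25, 5000          # horizon and number of independent sample paths
F = np.zeros((N, T + 1))
for n in range(N):
    x = 0
    for t in range(1, T + 1):
        u = rng.choice(2, p=pi[x])
        x_next = rng.choice(2, p=P[u, x])
        # martingale increment: 1_{(x_t in A)} - P(A | x_{t-1}, u_{t-1})
        F[n, t] = F[n, t - 1] + float(x_next in A) - P[u, x, A].sum()
        x = x_next

# For a martingale started at 0, E[F_t(A)] = 0 for every t; the empirical
# means should be near zero (of order 1/sqrt(N)).
print(np.abs(F.mean(axis=0)).max())
```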
Exercise 7.5.2 Let, for a Markov control problem, x_t ∈ X, u_t ∈ U, where X and U are finite sets denoting the<br />
state space and the action space, respectively. Consider the optimal control problem of the minimization of<br />
lim_{T→∞} (1/T) E^Π_x [ ∑_{t=0}^{T−1} c(x_t, u_t) ],<br />
where c is a bounded function. Further assume that under any stationary control policy, the state transition<br />
kernel P(x_{t+1}|x_t, u_t) leads to an irreducible Markov chain.<br />
Does there exist an optimal control policy? Can you propose a way to find the optimal policy?<br />
Is the optimal policy also sample-path optimal?
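Since X and U are finite and every stationary policy yields an irreducible chain, one concrete way to find an optimal policy (which, by the convex analytic approach, may be taken deterministic and stationary) is to enumerate the finitely many deterministic stationary policies, compute each induced invariant distribution, and compare average costs. A sketch with an assumed two-state, two-action kernel and cost:

```python
import itertools
import numpy as np

# Illustrative finite average-cost MDP (kernel and cost are assumptions):
# P[u, x, x'] = P(x' | x, u), c[x, u] the one-stage cost.
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.3, 0.7]]])
c = np.array([[1.0, 3.0], [0.0, 2.0]])

def invariant(M):
    """Invariant distribution of an irreducible stochastic matrix M."""
    w, v = np.linalg.eig(M.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

best = None
# Enumerate deterministic stationary policies d: X -> U.
for d in itertools.product(range(2), repeat=2):
    M = np.array([P[d[x], x] for x in range(2)])   # chain induced by d
    pi = invariant(M)
    J = sum(pi[x] * c[x, d[x]] for x in range(2))  # average cost of d
    if best is None or J < best[0]:
        best = (J, d)

print(best)   # minimal average cost and a policy achieving it
```

By irreducibility, the ergodic theorem for Markov chains makes the sample-path average cost under the chosen policy equal its expected average cost almost surely, which is the key observation for the sample-path optimality question.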
Chapter 8<br />
Team Decision Theory and Information Structures<br />
We will primarily follow Chapters 2, 3 and 4 of [32] for this topic.<br />
8.1 Exercises<br />
Exercise 8.1.1 Consider a common probability space on which the information available to two decision makers<br />
DM 1 and DM 2 is defined, with I_1 available at DM 1 and I_2 available at DM 2.<br />
R. J. Aumann [2] defines an event E to be common knowledge between two decision makers DM 1 and<br />
DM 2 if, whenever E happens, DM 1 knows E, DM 2 knows E, DM 1 knows that DM 2 knows E, DM 2 knows that<br />
DM 1 knows E, and so on.<br />
Suppose that one claims that an event E is common knowledge if and only if E ∈ σ(I_1) ∩ σ(I_2), where σ(I_1)<br />
denotes the σ-field generated by the information I_1, and likewise for σ(I_2).<br />
Is this argument correct? Provide an answer with precise arguments. You may wish to consult [2], [27], [11] and<br />
Chapter 12 of [32].<br />
Exercise 8.1.2 Consider the following static team decision problem with dynamics:<br />
x_1 = a x_0 + b_1 u^1_0 + b_2 u^2_0 + w_0,<br />
y^1_0 = x_0 + v^1_0,<br />
y^2_0 = x_0 + v^2_0.<br />
Here v^1_0, v^2_0, w_0 are independent, Gaussian, zero-mean with finite variances.<br />
Let γ^i : R → R be the policies of the controllers: u^1_0 = γ^1_0(y^1_0), u^2_0 = γ^2_0(y^2_0).<br />
Find<br />
min_{γ^1, γ^2} E^{γ^1, γ^2}_{ν_0} [ x_1^2 + ρ_1 (u^1_0)^2 + ρ_2 (u^2_0)^2 ],<br />
where ν_0 is a zero-mean Gaussian distribution and ρ_1, ρ_2 > 0.<br />
Find an optimal team policy γ = {γ 1 , γ 2 }.<br />
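By Radner’s theorem (the cost is strictly convex quadratic and the primitives are jointly Gaussian), the unique optimal team policy is linear, u^i_0 = g_i y^i_0, and person-by-person optimality reduces to a 2×2 linear system in (g_1, g_2). A numerical sketch, where all parameter values are assumptions chosen for illustration:

```python
import numpy as np

# Radner: strictly convex quadratic cost + jointly Gaussian primitives =>
# the unique optimal team policy is linear, u^i_0 = g_i * y^i_0.
# All parameter values below are assumptions for illustration.
a, b1, b2 = 1.0, 1.0, 1.0
rho1, rho2 = 1.0, 1.0
sx2, s12, s22, sw2 = 1.0, 0.5, 0.5, 1.0  # Var(x0), Var(v1), Var(v2), Var(w0)

# Person-by-person optimality: d/dg_i E[cost] = 0 gives a 2x2 linear system,
# using E[y_i^2] = sx2 + si2 and E[x0 y_i] = E[y1 y2] = sx2.
M = np.array([[(b1**2 + rho1) * (sx2 + s12), b1 * b2 * sx2],
              [b1 * b2 * sx2, (b2**2 + rho2) * (sx2 + s22)]])
r = -np.array([a * b1 * sx2, a * b2 * sx2])
g = np.linalg.solve(M, r)   # optimal gains (-0.25 each for these parameters)

# Monte Carlo sanity check: perturbing a gain should increase the cost.
rng = np.random.default_rng(1)
def cost(g1, g2, n=200_000):
    x0 = rng.normal(0.0, np.sqrt(sx2), n)
    y1 = x0 + rng.normal(0.0, np.sqrt(s12), n)
    y2 = x0 + rng.normal(0.0, np.sqrt(s22), n)
    w0 = rng.normal(0.0, np.sqrt(sw2), n)
    x1 = a * x0 + b1 * g1 * y1 + b2 * g2 * y2 + w0
    return np.mean(x1**2 + rho1 * (g1 * y1)**2 + rho2 * (g2 * y2)**2)

assert cost(*g) < cost(g[0] + 0.2, g[1])
```

Note that convexity and smoothness are what allow the stationarity conditions above to characterize the global team optimum, which is exactly the point of Exercise 8.1.5 below.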
Exercise 8.1.3 Consider a linear Gaussian system with mutually independent and i.i.d. noises:<br />
x_{t+1} = A x_t + ∑_{j=1}^{L} B^j u^j_t + w_t,<br />
y^i_t = C^i x_t + v^i_t, 1 ≤ i ≤ L, (8.1)<br />
with the one-step delayed observation sharing pattern.<br />
Construct a controlled Markov chain for the team decision problem: first show that one could have<br />
{y^1_t, y^2_t, . . . , y^L_t, P(dx_t | y^1_{[0,t−1]}, y^2_{[0,t−1]}, . . . , y^L_{[0,t−1]})}<br />
as the state of the controlled Markov chain.<br />
Consider the following problem:<br />
E^γ_{ν_0} [ ∑_{t=0}^{T−1} c(x_t, u^1_t, · · · , u^L_t) ].<br />
For this problem, if at time t ≥ 0 each of the decision makers (say DM i) has access to<br />
P(dx_t | y^1_{[0,t−1]}, y^2_{[0,t−1]}, . . . , y^L_{[0,t−1]}) and its local observations y^i_{[0,t]}, show that the decision<br />
makers can obtain a solution in which the optimal decision rules use only<br />
{P(dx_t | y^1_{[0,t−1]}, y^2_{[0,t−1]}, . . . , y^L_{[0,t−1]}), y^i_t}.<br />
What if they do not have access to P(dx_t | y^1_{[0,t−1]}, y^2_{[0,t−1]}, . . . , y^L_{[0,t−1]}), and only have access to<br />
y^i_{[0,t]}? What would be a sufficient statistic for each decision maker at each time stage?<br />
Exercise 8.1.4 Two decision makers, Alice and Bob, wish to control a system:<br />
x_{t+1} = a x_t + u^a_t + u^b_t + w_t,<br />
y^a_t = x_t + v^a_t,<br />
y^b_t = x_t + v^b_t,<br />
where u^a_t, y^a_t are the control actions and the observations of Alice, u^b_t, y^b_t are those for Bob, and<br />
v^a_t, v^b_t, w_t are independent zero-mean Gaussian random variables with finite variance. Suppose the goal is<br />
to minimize, for some T ∈ Z_+:<br />
E^{Π^a, Π^b}_{x_0} [ ∑_{t=0}^{T−1} x_t^2 + r_a (u^a_t)^2 + r_b (u^b_t)^2 ],<br />
for r_a, r_b > 0, where Π^a, Π^b denote the policies adopted by Alice and Bob. Let the local information available<br />
to Alice be I^a_t = {y^a_s, u^a_s, s ≤ t − 1} ∪ {y^a_t}, and let I^b_t = {y^b_s, u^b_s, s ≤ t − 1} ∪ {y^b_t} be the<br />
information available at Bob at time t.<br />
Consider an n-step delayed information sharing pattern: the information available at Alice at time t is<br />
I^a_t ∪ I^b_{t−n},<br />
and the information available at Bob is<br />
I^b_t ∪ I^a_{t−n}.<br />
State whether the following are true or false:<br />
a) If Alice and Bob share all the information they have (with n = 0), then the optimal controls must be<br />
linear.<br />
b) Typically, in such problems, Bob can try to send information to Alice through his actions to improve her<br />
estimate of the state. When is it the case that Alice cannot benefit from such signaling, that is, for what values<br />
of n is there no need for Bob to signal information this way?<br />
c) If Alice and Bob share all information they have with a delay of 2, then their optimal control policies can be<br />
written as<br />
u^a_t = f_a(E[x_t | I^a_{t−2}, I^b_{t−2}], y^a_{t−1}, y^a_t),<br />
u^b_t = f_b(E[x_t | I^a_{t−2}, I^b_{t−2}], y^b_{t−1}, y^b_t),<br />
for some functions f_a, f_b. Here, E[·|·] denotes the conditional expectation.<br />
d) If Alice and Bob share all information they have with a delay of 0, then their optimal control policies can be<br />
written as<br />
u^a_t = f_a(E[x_t | I^a_t, I^b_t]),<br />
u^b_t = f_b(E[x_t | I^a_t, I^b_t]),<br />
for some functions f_a, f_b. Here, E[·|·] denotes the conditional expectation.<br />
Exercise 8.1.5 In Radner’s theorem, explain why continuous differentiability of the cost function (in addition<br />
to its convexity) is needed for person-by-person optimality of a team policy to imply global optimality.
Appendix A<br />
On the Convergence of Random Variables<br />
A.1 Limit Events and Continuity of Probability Measures<br />
Given a sequence of events A_1, A_2, . . . ∈ F, define:<br />
limsup_n A_n = ∩_{n=1}^{∞} ∪_{k=n}^{∞} A_k,<br />
liminf_n A_n = ∪_{n=1}^{∞} ∩_{k=n}^{∞} A_k.<br />
An element belongs to the superior limit if and only if it is in infinitely many of the A_n; it belongs to the<br />
inferior limit if and only if it is in all but finitely many of the A_n.<br />
The limit of a sequence of sets exists if the above limits are equal.<br />
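A quick illustration of the two limits (the alternating sequence here is an example chosen for these notes' purposes, not from the text):

```latex
% Take A_n = [0,1) for n even and A_n = [1,2) for n odd. Then
\limsup_n A_n = \bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} A_k = [0,2),
\qquad
\liminf_n A_n = \bigcup_{n=1}^{\infty}\bigcap_{k=n}^{\infty} A_k = \emptyset,
```

since every point of [0, 2) lies in infinitely many of the A_n, while no point lies in all but finitely many of them; the limit of this sequence of sets does not exist.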
We have the following result:<br />
Theorem A.1.1 For a sequence of events A_n:<br />
P(liminf_n A_n) ≤ liminf_n P(A_n) ≤ limsup_n P(A_n) ≤ P(limsup_n A_n).<br />
We have the following regarding continuity of probability measures:<br />
Theorem A.1.2 (i) For a sequence of events A_n with A_n ⊂ A_{n+1},<br />
lim_{n→∞} P(A_n) = P(∪_{n=1}^{∞} A_n).<br />
(ii) For a sequence of events A_n with A_{n+1} ⊂ A_n,<br />
lim_{n→∞} P(A_n) = P(∩_{n=1}^{∞} A_n).<br />
A.2 Borel-Cantelli Lemma<br />
Theorem A.2.1 (i) If ∑_n P(A_n) converges, then P(limsup_n A_n) = 0. (ii) If {A_n} are independent and if<br />
∑_n P(A_n) = ∞, then P(limsup_n A_n) = 1.<br />
Exercise A.2.1 Let {A_n} be a sequence of independent events, where A_n is the event that the nth coin flip is<br />
heads. What is the probability that there are infinitely many heads if P(A_n) = 1/n^2?<br />
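Since ∑_n 1/n² = π²/6 < ∞, the first Borel–Cantelli lemma gives P(limsup_n A_n) = 0: with probability one, only finitely many heads occur. A small simulation (the horizon and number of trials are arbitrary choices for this sketch) is consistent with this, with total head counts staying small:

```python
import random

random.seed(0)

# P(A_n) = 1/n^2 and sum_n 1/n^2 = pi^2/6 < infinity, so by the first
# Borel-Cantelli lemma only finitely many heads occur almost surely.
def total_heads(n_max=10_000):
    return sum(random.random() < 1.0 / n**2 for n in range(1, n_max + 1))

counts = [total_heads() for _ in range(100)]
print(min(counts), max(counts))   # totals stay small in every trial
```

Note that P(A_1) = 1, so every trial records at least one head, and the expected total over all n is about 1.64.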
An important application <strong>of</strong> the above is the following:<br />
Theorem A.2.2 Let Z_n, n ∈ N, and Z be random variables such that for every ǫ > 0,<br />
∑_n P(|Z_n − Z| ≥ ǫ) < ∞.<br />
Then,<br />
P({ω : lim_{n→∞} Z_n(ω) = Z(ω)}) = 1.<br />
That is, Z_n converges to Z with probability 1.<br />
A.3 Convergence <strong>of</strong> R<strong>and</strong>om Variables<br />
A.3.1 Convergence almost surely (with probability 1)<br />
Definition A.3.1 A sequence <strong>of</strong> r<strong>and</strong>om variables X n converges almost surely to a r<strong>and</strong>om variable X if P({ω :<br />
lim n→∞ X n (ω) = X(ω)}) = 1.<br />
A.3.2 Convergence in Probability<br />
Definition A.3.2 A sequence of random variables X_n converges in probability to a random variable X if<br />
lim_{n→∞} P(|X_n − X| ≥ ǫ) = 0 for every ǫ > 0.<br />
A.3.3 Convergence in Mean-square<br />
Definition A.3.3 A sequence <strong>of</strong> r<strong>and</strong>om variables X n converges in the mean-square sense to a r<strong>and</strong>om variable<br />
X if lim n→∞ E[|X n − X| 2 ] = 0.<br />
A.3.4 Convergence in Distribution<br />
Definition A.3.4 Let X n be a r<strong>and</strong>om variable with cumulative distribution function F n , <strong>and</strong> X be a r<strong>and</strong>om<br />
variable with cumulative distribution function F. A sequence <strong>of</strong> r<strong>and</strong>om variables X n converges in distribution<br />
(or weakly) to a r<strong>and</strong>om variable X if lim n→∞ F n (x) = F(x) for all points <strong>of</strong> continuity <strong>of</strong> F.<br />
Theorem A.3.1 a) Convergence in the almost sure sense implies convergence in probability. b) Convergence in<br />
the mean-square sense implies convergence in probability. c) If X_n → X in probability, then X_n → X in distribution.<br />
We also have partial converses for the above results:<br />
Theorem A.3.2 a) If P(|X_n| ≤ Y) = 1 for some random variable Y with E[Y^2] < ∞, and if X_n → X in<br />
probability, then X_n → X in mean-square. b) If X_n → X in probability, then there exists a subsequence X_{n_k}<br />
which converges to X almost surely. c) If X_n → X and X_n → Y in probability, in mean-square, or almost surely,<br />
then P(X = Y) = 1.<br />
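The converse of a) in Theorem A.3.1 fails, and part b) above is sharp, as the classical "typewriter" sequence on ([0, 1], Borel, Lebesgue) shows: indicator blocks of shrinking length sweep across [0, 1], so X_n → 0 in probability, yet every ω is hit once per sweep, so X_n(ω) converges for no ω, while the subsequence X_{2^k} taken at the start of each sweep does converge almost surely. A small sketch (the construction is the standard example, coded here for illustration):

```python
import math
import random

random.seed(2)

# "Typewriter" sequence: for n in the k-th sweep (2^k <= n < 2^{k+1}),
# X_n is the indicator of a block of length 2^{-k} sliding across [0,1).
# P(|X_n| >= eps) = 2^{-k} -> 0 (convergence in probability), yet every
# omega is hit exactly once per sweep, so X_n(omega) = 1 infinitely often.
def X(n, omega):
    k = int(math.log2(n))        # sweep index
    j = n - 2**k                 # block position within sweep k
    return 1 if j / 2**k <= omega < (j + 1) / 2**k else 0

omega = random.random()
hits_per_level = [sum(X(n, omega) for n in range(2**k, 2**(k + 1)))
                  for k in range(12)]
print(hits_per_level)            # exactly one hit in every sweep

# The subsequence X_{2^k} (first block of each sweep) converges a.s. to 0,
# since X_{2^k}(omega) = 1 only if omega < 2^{-k}.
```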
A sequence of random variables is uniformly integrable if:<br />
lim_{K→∞} sup_n E[|X_n| 1_{(|X_n| ≥ K)}] = 0.<br />
Note that, if {X_n} is uniformly integrable, then sup_n E[|X_n|] < ∞.<br />
Theorem A.3.3 Under uniform integrability, convergence in the almost sure sense implies convergence in mean-square.<br />
Theorem A.3.4 If X_n → X in probability, there exists some subsequence X_{n_k} which converges to X almost<br />
surely.<br />
Theorem A.3.5 (Skorohod’s representation theorem) Let X_n → X in distribution. Then there exist a<br />
sequence of random variables Y_n and a random variable Y such that X_n and Y_n have the same cumulative<br />
distribution functions, X and Y have the same cumulative distribution functions, and Y_n → Y almost surely.<br />
With the above, we can prove the following result.<br />
Theorem A.3.6 The following are equivalent: i) X_n converges to X in distribution. ii) E[f(X_n)] → E[f(X)]<br />
for all continuous and bounded functions f. iii) The characteristic functions Φ_n(u) := E[e^{iuX_n}] converge to<br />
Φ(u) := E[e^{iuX}] for every u ∈ R.
Bibliography<br />
[1] A. Arapostathis, V. S. Borkar, E. Fernandez-Gaucherand, M. K. Ghosh, and S. I. Marcus. Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J. Control and Optimization, 31:282–344, 1993.<br />
[2] R. J. Aumann. Agreeing to disagree. Annals of Statistics, 4:1236–1239, 1976.<br />
[3] P. Billingsley. Convergence of Probability Measures. John Wiley, New York, 1968.<br />
[4] P. Billingsley. Probability and Measure. Wiley, 3rd ed., New York, 1995.<br />
[5] D. Blackwell. Memoryless strategies in finite-stage dynamic programming. Annals of Mathematical Statistics, 35:863–865, 1964.<br />
[6] D. Blackwell and C. Ryll-Nardzewski. Non-existence of everywhere proper conditional distributions. Annals of Mathematical Statistics, 34:223–225, 1963.<br />
[7] V. I. Bogachev. Measure Theory. Springer-Verlag, Berlin, 2007.<br />
[8] V. S. Borkar. White-noise representations in stochastic realization theory. SIAM J. on Control and Optimization, 31:1093–1102, 1993.<br />
[9] V. S. Borkar. Probability Theory: An Advanced Course. Springer, New York, 1995.<br />
[10] V. S. Borkar. Convex analytic methods in Markov decision processes. In Handbook of Markov Decision Processes, E. A. Feinberg, A. Shwartz (Eds.), pages 347–375. Kluwer, Boston, MA, 2001.<br />
[11] A. Brandenburger and E. Dekel. Common knowledge with probability 1. J. Mathematical Economics, 16:237–245, 1987.<br />
[12] S. B. Connor and G. Fort. State-dependent Foster–Lyapunov criteria for subgeometric convergence of Markov chains. Stoch. Process. Appl., 119:4176–4193, 2009.<br />
[13] R. Douc, G. Fort, E. Moulines, and P. Soulier. Practical drift conditions for subgeometric rates of convergence. Ann. Appl. Probab., 14:1353–1377, 2004.<br />
[14] R. Durrett. Probability: Theory and Examples, volume 3. Cambridge University Press, 2010.<br />
[15] J. K. Ghosh and R. V. Ramamoorthi. Bayesian Nonparametrics. Springer, New York, 2003.<br />
[16] M. Hairer. Convergence of Markov processes. Lecture Notes, 2010.<br />
[17] O. Hernandez-Lerma and J. B. Lasserre. Discrete-Time Markov Control Processes. Springer, 1996.<br />
[18] O. Hernandez-Lerma and J. B. Lasserre. Markov Chains and Invariant Probabilities. Birkhäuser, Basel, 2003.<br />
[19] J. B. Lasserre. Invariant probabilities for Markov chains on a metric space. Statistics and Probability Letters, 34:259–265, 1997.<br />
[20] J. Liu and E. Susko. On strict stationarity and ergodicity of a non-linear ARMA model. Journal of Applied Probability, pages 363–373, 1992.<br />
[21] A. S. Manne. Linear programming and sequential decisions. Management Science, 6:259–267, April 1960.<br />
[22] S. P. Meyn. Control Techniques for Complex Networks. Cambridge University Press, 2007.<br />
[23] S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Springer-Verlag, London, 1993.<br />
[24] S. P. Meyn and R. L. Tweedie. State-dependent criteria for convergence of Markov chains. Ann. Appl. Prob., 4:149–168, 1994.<br />
[25] S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Springer, 1993.<br />
[26] S. P. Meyn and R. L. Tweedie. State-dependent criteria for convergence of Markov chains. Annals Appl. Prob., 4(1):149–168, 1994.<br />
[27] L. Nielsen. Common knowledge, communication and convergence of beliefs. Mathematical Social Sciences, 8:1–14, 1984.<br />
[28] G. O. Roberts and J. S. Rosenthal. General state space Markov chains and MCMC algorithms. Probability Surveys, 1:20–71, 2004.<br />
[29] M. Schäl. Conditions for optimality in dynamic programming and for the limit of n-stage optimal policies to be optimal. Z. Wahrscheinlichkeitstheorie verw. Gebiete, 32:179–196, 1975.<br />
[30] P. Tuominen and R. L. Tweedie. Subgeometric rates of convergence of f-ergodic Markov chains. Adv. Appl. Prob., 26(3):775–798, September 1994.<br />
[31] R. L. Tweedie. Drift conditions and invariant measures for Markov chains. Stochastic Processes and Their Applications, 92:345–354, 2001.<br />
[32] S. Yüksel and T. Başar. Stochastic Networked Control Systems: Stabilization and Optimization under Information Constraints. Birkhäuser, Boston, MA, 2013.<br />
[33] S. Yüksel and S. P. Meyn. Random-time, state-dependent stochastic drift for Markov chains and application to stochastic stabilization over erasure channels. IEEE Transactions on Automatic Control, 58:47–59, January 2013.