
MATH3013: Simulation and Queues

Athanassios (Thanos) N. Avramidis

School of Mathematics
University of Southampton
Highfield, Southampton, United Kingdom

April 2012


Contents

1 Probability
  1.1 Preliminaries
  1.2 Probability Spaces*
  1.3 Random Variables and Distributions
    1.3.1 Discrete distributions
    1.3.2 Continuous distributions
  1.4 Conditional Probability
  1.5 Independence
  1.6 Mean and Variance, and Sums
  1.7 Probability Generating Functions
  1.8 Two Major Limit Theorems
  1.9 Questions / Exercises
  1.10 Solutions to Exercises

2 Poisson and Related Processes
  2.1 Preliminaries
  2.2 Residual Time and Memoryless Property
  2.3 Counting Processes
  2.4 The Poisson Process
    2.4.1 Distribution of counts (the N_t process)
    2.4.2 Distribution of Times (the X_n and S_n)
    2.4.3 Simulating a Poisson Process
    2.4.4 Merging Poisson Processes
  2.5 Poisson Process of General Rate Function
    2.5.1 Thinning a Poisson Process
    2.5.2 Simulating an NHPP via Thinning
  2.6 Estimating a Poisson Process: Brief View
  2.7 Large-t N_t/t via Strong Law of Large Numbers
  2.8 Exercises
  2.9 Solutions to Exercises

3 Queues
  3.1 Preliminaries
    3.1.1 Queues: Terminology and Global Assumptions
  3.2 The Birth-Death Process
  3.3 Continuous-Time Markov Chains (CTMCs)
    3.3.1 Generator
    3.3.2 Time-dependent Distribution and Kolmogorov's Differential System
  3.4 Long-Run Behaviour
    3.4.1 Long-Run Average Cost
    3.4.2 Explicit Stationary Distributions for Specific Models
  3.5 Arrivals That See Time Averages
    3.5.1 A Steady-State Delay Distribution
  3.6 Little's Law
  3.7 Exercises
  3.8 Solutions to Exercises

4 Sampling from Distributions
  4.1 Preliminaries
  4.2 Inversion
    4.2.1 Calculating the Inverse Explicitly: Examples
  4.3 Acceptance-Rejection
    4.3.1 Method and Theory
    4.3.2 Feasibility and Efficiency
  4.4 Exercises
  4.5 Solutions to Exercises

5 Guidance for Exam Preparation


Chapter 1

Probability

This chapter builds foundations in probability, the main mathematical tool behind queues and stochastic simulation. Because the material is a foundation rather than an end in itself, the examination places less emphasis on it. Guidance for the examination is given in Chapter 5.

1.1 Preliminaries

The expression "x := y" defines x as being equal to y, while "x =: y" defines y as being equal to x.

1.2 Probability Spaces*

A random experiment involves an outcome that cannot be determined in advance. The set of all possible outcomes is called the certain event and is denoted Ω.

An event is a subset of the certain event. An event A is said to occur if and only if the observed outcome ω of the experiment is an element of the set A.

Example 1.1 Consider the experiment of flipping a coin once. The two possible outcomes are "Heads" and "Tails", and a natural certain event is the set {H, T}.

Given a certain event Ω and an event A, the event that occurs if and only if A does not occur is called the complement of A and is denoted A^c. That is,

A^c = {ω ∈ Ω : ω ∉ A}.

Given two events A and B, their union is the event "A or B", also written

A ∪ B = {ω ∈ Ω : ω ∈ A or ω ∈ B}.


The intersection of A and B is the event "A and B", also written

A ∩ B = {ω ∈ Ω : ω ∈ A and ω ∈ B}.

The operations of complement, union, and intersection arise frequently and give new events, for which we observe the following identities:

(A ∪ B)^c = A^c ∩ B^c,   (A ∩ B)^c = A^c ∪ B^c.   (1.1)

The first says "not (A or B)" equals "(not A) and (not B)". The second says "not (A and B)" equals "(not A) or (not B)".

The set containing no elements is called the empty event and is denoted ∅. Note that Ω^c = ∅ and ∅^c = Ω.

Event A is said to imply event B, written A ⊂ B, if every element of A belongs to B; in other words, the occurrence of A makes the occurrence of B certain. This is also written as B ⊃ A.

The union of several (i.e., more than two) events means the occurrence of at least one of the events, and the intersection of several events means the occurrence of all the events.

Two events A and B are called disjoint if they cannot happen simultaneously; equivalently, they have no elements in common, i.e.,

A ∩ B = ∅.

A family of events is called disjoint if every pair of them is disjoint.

The following is our definition of probability.

Definition 1.2 Let Ω be a certain event and let P be a function that assigns a number to each event. Then P is called a probability provided that

1. for any event A, 0 ≤ P(A) ≤ 1;

2. P(Ω) = 1;

3. for any sequence A_1, A_2, . . . of disjoint events,

P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).   (1.2)

The following summarises properties of a probability P.

Proposition 1.3 (a) P(∅) = 0.

(b) For any event A, P(A) + P(A^c) = 1.

(c) If events A and B satisfy A ⊂ B, then P(A) ≤ P(B).

(d) If events A_1, A_2, . . . , A_n are disjoint, then P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i).

Proof. Left as exercise. ✷


1.3 Random Variables and Distributions

Definition 1.4 A random variable (rv) X with values in the set E is a function that assigns a value X(ω) in E to each outcome ω in Ω.

We will refer to the set E as the support of X. In all cases of interest to us, E is one of the following: (a) the set of integer numbers; (b) the set of real numbers; (c) a subset of these sets.

Any set containing a support is also a support, so it is most useful to prescribe the smallest possible support. An example of a discrete random variable is the number, X, of successes during n independent trials each having success probability p, i.e., X has the Binomial(n, p) distribution. Here, the smallest possible support is the set {0, 1, . . . , n}.

Example 1.1 (continued) Define X by putting X(H) = 1, X(T) = −1. Then X is a random variable with support {−1, 1}.

Notationally, we generally abbreviate P({ω : X(ω) ≤ b}) as P(X ≤ b). The function F defined by

F(b) = P(X ≤ b),   −∞ < b < ∞,

is called the (Cumulative) Distribution Function of the random variable X. We call F in short the cdf of X.

Example 1.1 (continued) Suppose the probability of "Heads" is 0.6. We have

P(X = −1) = P({T}) = 1 − P({H}) = 0.4,   P(X = 1) = P({H}) = 0.6,

and thus the cdf of X is

F(b) = P(X ≤ b) =
    0,    b < −1,
    0.4,  −1 ≤ b < 1,
    1,    1 ≤ b.          (1.3)

A cdf has the following properties.

Proposition 1.5 If F is a cdf of a finite-valued rv, then:

(a) F is nondecreasing.

(b) F is right-continuous.

(c) F(−∞) = lim_{x→−∞} F(x) = 0.

(d) F(∞) = lim_{x→∞} F(x) = 1.

Proof. We only prove (a). If a ≤ b, then

{X ≤ a} ⊂ {X ≤ b}   (1.4)

and Proposition 1.3(c) gives

P(X ≤ a) ≤ P(X ≤ b).   (1.5)

✷

1.3.1 Discrete distributions

In Example 1.1 the support had two points. Consider now the general case of a discrete rv X with support the ordered points

x_1 < x_2 < x_3 < . . . .   (1.6)

Then, for each k = 1, 2, 3, . . ., the cdf at x_k is the sum of the probabilities associated to the values less than or equal to x_k:

F(x_k) = Σ_{j : x_j ≤ x_k} P(X = x_j) = Σ_{j : j ≤ k} P(X = x_j).

Because each term in the sum is non-negative, we have F(x_1) ≤ F(x_2) ≤ . . . . Thus, the cdf is the non-decreasing step function

F(x) =
    0,        x < x_1,
    F(x_1),   x_1 ≤ x < x_2,
    F(x_2),   x_2 ≤ x < x_3,
    . . .
    F(x_k),   x_k ≤ x < x_{k+1},
    . . .                           (1.7)

In the opposite direction, the individual probabilities follow from the cdf as P(X = x_k) = F(x_k) − F(x_{k−1}) for all k (with the convention F(x_0) := 0).

1.3.2 Continuous distributions

A major class of non-discrete distributions has a support that is not countable, meaning it cannot be enumerated, i.e., cannot be represented as in (1.6). A typical such support is a subinterval of the real numbers, [a, b] with a < b. (Enumerating this set is impossible because for any real number x_1 > a, there exists a real number x_2 such that a < x_2 < x_1.)

Suppose further that the cdf F is differentiable at x, i.e., the left- and right-derivatives of F at x are equal, so we can define

F′(x) = lim_{ε→0+} [F(x + ε) − F(x)]/ε = lim_{ε→0+} [F(x) − F(x − ε)]/ε = lim_{ε→0+} [F(x + ε) − F(x − ε)]/(2ε).   (1.8)

The last numerator is P(x − ε < X ≤ x + ε), so F′(x) describes the "probabilistic intensity" of falling "arbitrarily close to x". For this reason, it is called the probability density function (pdf) (of F and of the associated rv).

The cdf F and its pdf F′ are mirrors of each other: they are linked by a differentiation step when going from F to F′ (by definition), and by an integration step when going from F′ to F (this is the content of the Fundamental Theorem of Calculus (FTC)). The integration link is

F(b) = F(b) − F(−∞) = ∫_{−∞}^b F′(t) dt,   −∞ < b < ∞.   (1.9)

Alternatively, we could calculate the function 1 − F(b) = P(X > b) (called the tail probability or complementary cdf) as

1 − F(b) = F(∞) − F(b) = ∫_b^∞ F′(t) dt,   −∞ < b < ∞.

The integral in (1.9) is the area under the graph of F′ between −∞ and b. If F′ changes form anywhere there, then the integral (area) is typically calculated piecewise. The following is a minimal example illustrating this.

Example 1.6 Equiprobable outcomes on [2, 4] ∪ [5, 6] (i.e., the union of these real intervals) are described by the pdf

f(x) = 1/3 for 2 ≤ x ≤ 4,   and   f(x) = 1/3 for 5 ≤ x ≤ 6.

F(x) is calculated piecewise as the area under f. The answer is

F(x) =
    (x − 2)/3,         2 ≤ x ≤ 4,
    2/3,               4 < x ≤ 5,
    2/3 + (x − 5)/3,   5 < x ≤ 6.

The Uniform Distribution  Let a < b. The Unif(a, b) distribution has support [a, b] and constant pdf. Thus, the cdf F satisfies F(a) = 0 and F(b) = 1. The constant value of the pdf, call it c, is determined from

1 = F(b) = ∫_a^b c dt = c(b − a),

i.e., c = 1/(b − a). Thus,

F(x) = ∫_a^x 1/(b − a) dt = (x − a)/(b − a),   a ≤ x ≤ b.   (1.10)

In particular, the Unif(0, 1) distribution has cdf F(x) = x and pdf f(x) = 1 with support [0, 1].

The Exponential Distribution  Let λ > 0. We write X ∼ Expon(λ) and read "X has the exponential distribution with rate λ" to mean that X has support [0, ∞) and has tail probability

P(X > x) = e^{−λx},   x > 0.   (1.11)

Equivalently, the cdf is

P(X ≤ x) = 1 − e^{−λx},   x ≥ 0,

and the pdf is

(d/dx)(1 − e^{−λx}) = λe^{−λx},   x > 0.
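As an aside (not part of the original text; the sampling fact used here reappears in Section 2.4.3), a minimal MATLAB sketch checking the tail formula (1.11) empirically, using the fact that (−1/λ)log(U) with U ∼ Unif(0, 1) has the Expon(λ) distribution:

    % Empirically check P(X > x) = exp(-lambda*x), eq. (1.11),
    % using samples X = -log(U)/lambda with U ~ Unif(0,1).
    lambda = 2;  x = 0.7;  R = 1e6;
    X = -log(rand(R, 1))/lambda;   % R samples of Expon(lambda)
    fprintf('P(X > %.1f): empirical %.4f, exact %.4f\n', x, mean(X > x), exp(-lambda*x));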


1.4 Conditional Probability

Let Ω be a certain event and let P be a probability on it.

Definition 1.7 Let B be an event such that P(B) > 0. For any event A, the conditional probability of A given B, written P(A|B), is a number satisfying

(a) 0 ≤ P(A|B) ≤ 1;

(b) P(A ∩ B) = P(A|B)P(B).

Remark 1.8 Fix B such that P(B) > 0. Then P(A|B), viewed as a function of A, satisfies all the conditions in Definition 1.2. That is:

1. 0 ≤ P(A|B) ≤ 1;

2. P(Ω|B) = 1;

3. for any sequence A_1, A_2, . . . of disjoint events, P(∪_{i=1}^∞ A_i | B) = Σ_{i=1}^∞ P(A_i|B).

That is, the probability P(·|B) satisfies all the usual properties of the unconditional probability, P(·).

The intuition is: knowledge that the event B has occurred forces us to revise the probability of all events. A key exception to this happens when A and B are independent, as will be seen shortly.

We now give a fundamental idea for probabilistic calculations.

Definition 1.9 (Partition.) The set of events {B_1, B_2, . . .} is called a partition if it satisfies:

1. (Disjointness.) B_i ∩ B_j = ∅ for all i ≠ j (B_i and B_j cannot happen simultaneously).

2. (Exhaustiveness.) ∪_{i=1}^∞ B_i = Ω. (One of the B_i must happen.)

Theorem 1.10 (Law of Total Probability (LTP)). Let B_1, B_2, . . . be a partition, i.e., the B_i are disjoint and exhaustive. Then, for any event A,

P(A) = Σ_{i=1}^∞ P(A|B_i)P(B_i).   (1.12)

Proof. Write

A = ∪_{i=1}^∞ (A ∩ B_i)   (1.13)

and take probabilities, noting that the A ∩ B_i are disjoint because the B_i are disjoint, to obtain

P(A) = Σ_{i=1}^∞ P(A ∩ B_i).

Finally, replace P(A ∩ B_i) by P(A|B_i)P(B_i). ✷
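A quick illustration of (1.12), with numbers chosen purely for illustration: suppose a call is routed to server group B_1 with probability P(B_1) = 0.3 and to group B_2 with probability P(B_2) = 0.7, and the call is resolved (event A) with conditional probabilities P(A|B_1) = 0.5 and P(A|B_2) = 0.2. Then the LTP gives P(A) = 0.5 · 0.3 + 0.2 · 0.7 = 0.29.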


1.5 Independence

Definition 1.11 Events A and B are said to be independent if

P(A ∩ B) = P(A)P(B).   (1.14)

If (1.14) holds, then Definition 1.7 gives P(A|B) = P(A) (and also P(B|A) = P(B)), i.e., the conditional probabilities equal the unconditional ones. The intuition behind independence is that the occurrence of one event does not contain information about the occurrence of the other.

Definition 1.12 The random variables X_1 and X_2 are said to be independent if

the events {X_1 ≤ b_1} and {X_2 ≤ b_2} are independent for all b_1, b_2.   (1.15)

It turns out that (1.15) implies that all "interesting" events about X_1 and X_2 are independent (e.g., events where "≤" is replaced by "≥" or by "=" are also independent).


1.6 Mean and Variance, and Sums

Definition 1.13 Suppose X is a discrete random variable with support {1, 2, 3, . . .} and its distribution is P(X = i) = p_i for all i. The expected value (mean) of X is

E[X] = 1p_1 + 2p_2 + 3p_3 + · · · = Σ_{i=1}^∞ i p_i.   (1.16)

E[X] can be seen to be the area on the (x, y) plane bounded below by the cdf of X, bounded above by the line y = 1, and bounded on the left by the line x = 0.

Definition 1.14 Suppose X is a random variable with mean µ. The variance of X is the mean square deviation of X from its mean µ:

Var(X) = E[(X − µ)^2].   (1.17)

Here is how summation of random variables affects the mean and variance.

Fact 1.15  1. For any random variable X and constants a, b,

E[a + bX] = a + bE[X].   (1.18)

2. For any random variables X_1 and X_2,

E[X_1 + X_2] = E[X_1] + E[X_2].   (1.19)

Fact 1.16 If X_1 and X_2 are independent random variables, then

Var(X_1 + X_2) = Var(X_1) + Var(X_2).   (1.20)

Properties (1.19) and (1.20) extend to any sum of finitely many random variables, as can be seen by mathematical induction; that is,

E[X_1 + X_2 + . . . + X_n] = E[X_1] + E[X_2] + . . . + E[X_n],

Var(X_1 + X_2 + . . . + X_n) = Var(X_1) + Var(X_2) + . . . + Var(X_n) for independent X's.


1.7 Probability Generating Functions

Definition 1.17 Let X be a discrete random variable with support {0, 1, . . .} and probabilities p_n := P(X = n), where Σ_{n=0}^∞ p_n = 1. The probability generating function (pgf) of X is the function

G(z) = Σ_{n=0}^∞ p_n z^n,   0 ≤ z ≤ 1.   (1.21)

The pgf contains the distribution of X, as stated below.

Fact 1.18 The function G and its derivatives of all orders are continuous functions on [0, 1].

Write the n-th order derivative of G as G^(n), and put G^(0) = G. We have:

G^(1)(z) = (d/dz) G(z) = Σ_{n=0}^∞ (d/dz) p_n z^n = Σ_{n=1}^∞ p_n n z^{n−1},   (1.22)

G^(2)(z) = (d/dz) G^(1)(z) = Σ_{n=1}^∞ (d/dz) p_n n z^{n−1} = Σ_{n=2}^∞ p_n n(n − 1) z^{n−2}.   (1.23)

The derivatives of higher order work analogously. At z = 0, powers 0^k appear; they equal zero unless k = 0, where 0^0 = 1. That is, we get

G(0) = p_0,   G^(1)(0) = p_1,   G^(2)(0) = 2p_2,

and, in the same way, G^(n)(0) = n! p_n for all n > 0. We now see all of the following:

Fact 1.19 The pgf gives the distribution as

p_0 = G(0),   p_n = G^(n)(0)/n!,   n = 1, 2, . . . .   (1.24)

Fact 1.20 G(1) = Σ_{n=0}^∞ p_n = 1, directly from (1.21).

Fact 1.21 The pgf determines the mean of X as

E[X] = Σ_{n=1}^∞ p_n n = G^(1)(1), from (1.22).
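As a quick worked example (added here for illustration; the distribution itself is introduced in Section 2.1): if X has the Poisson distribution with mean λ, i.e., p_n = e^{−λ} λ^n / n!, then

G(z) = Σ_{n=0}^∞ e^{−λ} (λz)^n / n! = e^{−λ} e^{λz} = e^{λ(z−1)},

so G(1) = 1 (consistent with Fact 1.20) and E[X] = G^(1)(1) = λ e^{λ(z−1)} evaluated at z = 1, which equals λ (consistent with Fact 1.21).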


1.8 Two Major Limit Theorems

For independent and identically distributed (iid) random variables X_1, X_2, . . ., there are strong theorems about their average and their sum, as the number n being averaged or summed goes to infinity. The first theorem is about the average.

Theorem 1.22 (Strong Law of Large Numbers (SLLN)) If X_1, X_2, . . . are independent and identically distributed (iid) random variables with finite mean µ, then their average, X̄_n = (1/n) Σ_{i=1}^n X_i, converges to µ in a probabilistic sense:

P( lim_{n→∞} X̄_n = µ ) = 1.   (1.25)

In words, the average converges to the mean with probability one, or w.p. 1.

The second theorem is about the sum, S_n = X_1 + X_2 + · · · + X_n, as n goes to infinity. Here is an example where the theorem is useful.

Example 1.23 A person, JD, arrives at a single-server system and finds n = 100 customers ahead, one of them already in service. Suppose the individual customer service (processing) times are iid random variables X_1, X_2, . . ., with mean µ = 5 and variance σ^2 = 4. Can we approximate JD's waiting time in queue?

In general, the in-service customer has a remaining service time, S_rem, whose distribution depends on the elapsed time in service (unless service times are exponentially distributed, as we will see). But provided n is not small, the sum of n service times should not be affected too much by any one of them. Let us then pretend the in-service customer's remaining service time is like a "fresh" one, so JD's waiting time is the sum S_100 = X_1 + X_2 + . . . + X_100, where the X_i are iid with mean 5 and variance 4.

One approximation could be based, roughly, on the SLLN:

S_100 = 100 · (1/100)(X_1 + . . . + X_100) ≈ 100 · µ = 500,   (1.26)

since the average in the middle is approximately µ = 5. Using only the mean service time, we got a deterministic approximation of S_100.

The Central Limit Theorem, (1.27) below, says that the distribution of S_n is approximately Normal with mean

E[S_n] = nµ, by (1.19),

and variance

Var(S_n) = nσ^2, by (1.20).

Thus, S_100 ∼approx N(500, 400), where ∼approx means "is distributed approximately as", and N(µ, σ^2) denotes a Normal distribution with mean µ and variance σ^2.
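To see this approximation in action, here is a minimal MATLAB sketch (not part of the original notes). It assumes, purely for illustration, that the service times are uniform on an interval chosen to have mean 5 and variance 4; any distribution with these two moments would do.

    % Simulate S_100 = sum of 100 iid service times with mean 5, variance 4,
    % and compare with the CLT approximation N(500, 400).
    R = 1e5;                         % number of replications
    n = 100; mu = 5; sigma2 = 4;
    halfwidth = sqrt(3*sigma2);      % Unif(mu-h, mu+h) has variance h^2/3 = 4
    X = mu - halfwidth + 2*halfwidth*rand(R, n);   % R-by-n matrix of service times
    S = sum(X, 2);                   % R samples of S_100
    fprintf('empirical mean %.1f (CLT: %d), empirical var %.1f (CLT: %d)\n', ...
            mean(S), n*mu, var(S), n*sigma2);
    % empirical P(S > 520) versus the normal approximation 1 - Phi((520-500)/20):
    fprintf('P(S>520): empirical %.3f, CLT approx %.3f\n', mean(S > 520), 1 - normcdf(1));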


The CLT is recorded below.

Theorem 1.24 (Central Limit Theorem (CLT)) Let X_1, X_2, . . . be independent and identically distributed (iid) random variables with mean µ and variance σ^2 < ∞, and put S_n = X_1 + X_2 + · · · + X_n. Then

(S_n − nµ)/(σ√n) →D N(0, 1)   as n → ∞,

where →D means convergence in distribution. More practically,

S_n ∼approx N(nµ, nσ^2) =D nµ + σ√n N(0, 1)   as n → ∞,   (1.27)

where =D means equality in distribution.¹ The term nµ is deterministic and the term σ√n N(0, 1) is stochastic.

Remarks:

1. No assumption is made about the distribution of the X_i.

2. Assuming µ ≠ 0, we can see that as n → ∞, the stochastic effect becomes (in the limit) negligible relative to the deterministic effect. To see this, divide the multiplier of the stochastic term N(0, 1) by the deterministic term:

√n σ / (nµ) = σ/(µ√n),

and note this goes to zero.

¹ Recall that X ∼ N(0, 1) is equivalent to aX + b ∼ N(b, a^2), for any a, b.


1.9 Questions / Exercises

Indications: "P": proof; "S": short answer that refers appropriately to definitions.

1. Two standard dice are thrown, and the outcomes X_1, X_2 (a number in {1, 2, . . . , 6} for each) are recorded. Assume X_1 and X_2 are independent. Are X_1 and X_1 + X_2 independent random variables? Explain carefully.

2. For a random variable with pdf f(·), the expected value can be defined as

µ = E[X] = ∫_{−∞}^{∞} x f(x) dx.   (1.28)

Its variance can then be defined as

Var(X) = ∫_{−∞}^{∞} (x − µ)^2 f(x) dx.   (1.29)

Compute the mean and variance of the exponential distribution seen in Section 1.3.2. Hint: Note the identity Var(X) = E[(X − µ)^2] = E[X^2 − 2µX + µ^2] = E[X^2] − 2µE[X] + µ^2 = E[X^2] − µ^2.

3. Out of n servers in a system (e.g., persons answering calls at a call center), seen at a certain time point, say at noon, some are temporarily absent (e.g., they are in the bathroom). Assume individual-server absences occur independently of each other with probability p. Let N be the number of absent servers. Express the mean and variance of N in terms of n and p, and use the CLT to approximate the distribution of N.

4. In Example 1.23, let the number of customers ahead (the number of rv's in the sum) be n. Express the probability that the approximated (i.e., normally-distributed) waiting time is at least 10% above its mean, writing it as a function of the N(0, 1) cdf and n; then compute it explicitly for n = 4, 16, 64, and 256. Determine the limit of this as n → ∞.

5. (S) Suppose a customer arrives at a system where multiple servers each serve their own queue, and chooses to join the shortest queue. Suggest assumptions, focusing on the customer service times seen as random variables, that would explain this customer behaviour. Strive for the weakest assumption(s) possible. Then, assuming shorter waiting is preferred, suggest a situation where join-the-shortest-queue would not necessarily make sense.

6. (P)

(a) Let X and Y be independent random variables, each with support the integer numbers. Show that

E[XY] = E[X]E[Y].   (1.30)

Hint: Put A_i := {X = i} and B_j := {Y = j}. Use the fact that A_i and B_j are independent events for all i and j.

(b) Suppose X_1 and X_2 are rvs, and let µ_1, µ_2 be their respective means. Observe

Var(X_1 + X_2) = E[(X_1 − µ_1 + X_2 − µ_2)^2].

Show that when X_1 and X_2 are discrete and independent, the above equals Var(X_1) + Var(X_2). Hint: expand the square appropriately, and use properties of E[·], including (1.30).


7. (P) Throughout, X is a real-valued random variable with cdf F, and b is a real number. We give an idea why F is right-continuous at b (the result in Proposition 1.5(b)). If the events A_1, A_2, . . . "decrease to" A, meaning that A_1 ⊃ A_2 ⊃ A_3 ⊃ . . . and A = ∩_{n=1}^∞ A_n, then it can be shown that

P(A) = lim_{n→∞} P(A_n).   (1.31)

Now, note that the events A_n = {X ≤ b + 1/n}, n = 1, 2, 3, . . ., decrease to the event {X ≤ b}. Applying (1.31),

F(b) = P(X ≤ b) = lim_{n→∞} P(A_n) = lim_{n→∞} F(b + 1/n),

i.e., F is right-continuous at b.

(a) If the events A_1, A_2, . . . "increase to" A, meaning that A_1 ⊂ A_2 ⊂ A_3 ⊂ . . . and A = ∪_{n=1}^∞ A_n, then again (1.31) holds. Using this property, show that

P(X < b) = F(b−),   (1.32)

where F(b−) = lim_{x→b−} F(x), the left-limit of F at b.

(b) Show that

P(X = b) = F(b) − F(b−).   (1.33)

(c) Define X to be a continuous random variable if

P(X = b) = 0 for all b.   (1.34)

Restate this definition via a property of F, the cdf of X.

1.10 Solutions to Exercises

1. Short argument: as X_1 + X_2 contains X_1, we do not expect X_1 and X_1 + X_2 to be independent. As a more careful argument, we exhibit one conditional probability about X_1 + X_2, given an event about X_1, that differs from the unconditional one: P(X_1 + X_2 = 8 | X_1 = 1) = 0, versus the (easily seen) P(X_1 + X_2 = 8) > 0. The inequality of the two shows that independence fails.
2. We need the integration-by-parts rule²

∫_a^b f(x) g′(x) dx = f(b)g(b) − f(a)g(a) − ∫_a^b f′(x) g(x) dx,   (1.35)

where f′(x) = (d/dx) f(x), and likewise for g. The calculation is:

E[X] = ∫_0^∞ y λe^{−λy} dy = (1/λ) ∫_0^∞ x e^{−x} dx

(change of variable x = λy), where the last integral is

I := ∫_0^∞ x e^{−x} dx = x(−e^{−x})|_0^∞ − ∫_0^∞ 1 · (−e^{−x}) dx = 0 + ∫_0^∞ e^{−x} dx = 1

(by (1.35) with f(x) = x, g(x) = −e^{−x}, a = 0, b = ∞). Thus, E[X] = 1/λ.

Now, calculate E[X^2] using similar steps:

E[X^2] = ∫_0^∞ y^2 λe^{−λy} dy = (1/λ^2) ∫_0^∞ x^2 e^{−x} dx

(again x = λy), and the last integral is, using (1.35) with f(x) = x^2 and g, a, b as above,

∫_0^∞ x^2 e^{−x} dx = x^2(−e^{−x})|_0^∞ − ∫_0^∞ (2x)(−e^{−x}) dx = 0 + 2I = 2.

Thus, E[X^2] = 2/λ^2, and now Var(X) = 2/λ^2 − (1/λ)^2 = 1/λ^2.

² This is obtained from f(x)g(x)|_a^b = ∫_a^b (fg)′(x) dx = ∫_a^b f′(x)g(x) dx + ∫_a^b f(x)g′(x) dx.

3. Let I_j be random, taking value 1 if server j is absent, and 0 otherwise. The number of absent servers is N = I_1 + I_2 + . . . + I_n. By assumption, the I_j are independent and identically distributed, with P(I_1 = 1) = p and P(I_1 = 0) = 1 − p. The CLT gives N ∼approx Normal(E[N], Var(N)), where

E[N] = nE[I_1],   Var(N) = nVar(I_1),
E[I_1] = 1 · p + 0 · (1 − p) = p,
Var(I_1) = (1 − p)^2 · p + (0 − p)^2 · (1 − p) = p(1 − p).

4. By the CLT, the approximated waiting time when there are n customers ahead is X ∼ N(nµ, nσ^2), or equivalently (X − nµ)/(√n σ) ∼ N(0, 1).

The exercise inadvertently asked about the event "X is 10% above its mean". This event has probability zero, as will be seen. Now we calculate the probability that X is at least 10% above its mean, i.e., of the event {X ≥ 1.1nµ}:

P(X ≥ 1.1nµ) = P(X > 1.1nµ) = P( (X − nµ)/(√n σ) > 0.1nµ/(√n σ) ) = 1 − Φ( 0.1nµ/(√n σ) ),

where Φ(·) is the N(0, 1) cdf. In the first step we used P(X = 1.1nµ) = 0.³

As n → ∞, the argument inside Φ goes to ∞, so the probability goes to 1 − Φ(∞) = 0. For calculations, I used matlab and the code "n=[4 16 64 256]; proba= 1 - normcdf(0.1*5*n./(sqrt(4*n)))". The probabilities are, in order, 0.3085, 0.1587, 0.0228, and 0.00003167.

³ This follows from (1.34) and the continuity of the normal distribution.

5. Assume first-come first-serve discipline and let n_i be the number of people at server i upon arrival of a test customer X. Assume X joins one server immediately, and does not switch to another server. If service times are identically distributed, then, by (1.19), the mean waiting time of X at server i is n_i b, where b is the mean service time. Minimising the mean waiting time is then equivalent to minimising n_i across i, i.e., joining the shortest queue. Independence of service times is not necessary in this argument. If a server tends to be "faster" than others, then the assumption "identically distributed service times across servers" does not seem sensible. Note: The argument generalises: if the mean service time (per customer) at server i is b_i, then the mean waiting time at i is n_i b_i, and we could minimise this.

6.(a) We have

E[XY] = Σ_i Σ_j i j P(X = i, Y = j).   (1.36)

Using the independence, this equals Σ_i Σ_j i j P(X = i)P(Y = j) = Σ_i i P(X = i) Σ_j j P(Y = j) = E[X]E[Y].

Note: it might be more natural to sum only over the distinct outcomes of XY, unlike (1.36) above, but this again results in (1.36), as we now explain. For any k, write {XY = k} as the union (logical "or") of the disjoint events ∪_{(i,j): ij=k} {X = i, Y = j} (for example, {XY = 3} = {X = 1, Y = 3} ∪ {X = 3, Y = 1}). Thus P(XY = k) = Σ_{(i,j): ij=k} P(X = i, Y = j), and (1.36) follows from

E[XY] = Σ_k k P(XY = k) = Σ_k Σ_{(i,j): ij=k} i j P(X = i, Y = j) = Σ_i Σ_j i j P(X = i, Y = j).

(b) (X_1 − µ_1 + X_2 − µ_2)^2 = (X_1 − µ_1)^2 + (X_2 − µ_2)^2 + 2(X_1 − µ_1)(X_2 − µ_2). The mean of this equals

E[(X_1 − µ_1)^2] + E[(X_2 − µ_2)^2] + 2E[(X_1 − µ_1)(X_2 − µ_2)]   by (1.19) and (1.18)
= Var(X_1) + Var(X_2) + 2E[X_1 − µ_1]E[X_2 − µ_2]   by part (a) and the definition of Var(·),

and the rightmost term is zero (E[X_1 − µ_1] = E[X_1] − µ_1 = 0).

7.(a) The events A_n = {X ≤ b − 1/n}, n = 1, 2, . . ., increase to the event {X < b}, so (1.31) gives

P(X < b) = lim_{n→∞} P(A_n),

which is lim_n F(b − 1/n) = F(b−).

(b) Now, {X ≤ b} = {X < b} ∪ {X = b}, and the events {X < b} and {X = b} are disjoint, so, by Proposition 1.3(d), P(X ≤ b) = P(X < b) + P(X = b), i.e., F(b) = F(b−) + P(X = b), which is (1.33).

(c) From (1.33) and (1.34) for a fixed b, we have

F(b) − F(b−) = P(X = b) = 0,

i.e., F is left-continuous at b, which is equivalent to F being continuous at b because F is always right-continuous. The re-stated definition reads: a continuous random variable is one whose cdf is an everywhere-continuous function.


Chapter 2

Poisson and Related Processes

2.1 Preliminaries

Function Order  If a function f satisfies

lim_{h→0} f(h)/h = 0,

then we write f(h) = o(h). Examples:

• h^2: h^2/h = h → 0, so h^2 is o(h).

• √h: √h/h = 1/√h → ∞, so √h is not o(h).

It is easy to check that:

1. f(h) = o(h) ⇒ cf(h) = o(h) for any constant c.

2. f(h) = o(h), g(h) = o(h) ⇒ f(h) + g(h) = o(h).

Later on, we encounter expressions such as λh + o(h), where λ > 0. These mean to say that the o(h) term becomes negligible compared to λh in the limit as h → 0 (o(h)/h tends to zero, whereas λh/h = λ > 0). In particular, the sign in front of o(h) is irrelevant, as

lim_{h→0} (λh + o(h))/h = lim_h (λh − o(h))/h = lim_h λh/h = λ.

(Notation: the limit detail is dropped after first use.)

Expansion of the Exponential Function  Expanding the exponential function around zero via Taylor's Theorem with remainder of order 2,

e^x = 1 + x + o(x).   (2.1)


The Poisson Distribution  The discrete random variable N with support {0, 1, 2, . . .} is said to have the Poisson distribution with mean λ if

P(N = n) = e^{−λ} λ^n / n!,   n = 0, 1, 2, . . . .   (2.2)

We write this in short as N ∼ Poisson(λ). A calculation gives E[N] = λ and Var(N) = λ.
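As a quick check of the mean (the variance calculation is similar, via E[N(N − 1)] = λ^2):

E[N] = Σ_{n=1}^∞ n e^{−λ} λ^n / n! = λ Σ_{n=1}^∞ e^{−λ} λ^{n−1} / (n − 1)! = λ,

since the last sum is the total Poisson probability mass and equals 1.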


2.2 Residual Time and Memoryless Property

We are interested in the time X until a specified event happens, having in mind events such as customer arrivals and customer service completions. X is modelled as a positive-real-valued random variable with known distribution.

Suppose an observer knows that, as of s time units ago, the event has not occurred. Then the time until the event occurs, called the residual time, is X − s. The observer is then interested in the conditional distribution of X − s given X > s:

P(X − s > t | X > s),   t ≥ 0.   (2.3)

For s = 5, for example, this is the function

P(X − 5 > t | X > 5),   t ≥ 0.

In general this distribution changes with s, reflecting a "memory" mechanism. A key exception is described below.

Definition 2.1 A positive-valued random variable X is said to be memoryless if it satisfies

P(X − s > t | X > s) = P(X > t),   s, t ≥ 0.   (2.4)

This says that the conditional distribution of X − s given X > s equals the unconditional distribution of X; we write this in short as

(X − s | X > s) =D X.   (2.5)

Proposition 2.2 X satisfies (2.4) ⇔ X has an exponential distribution.

Proof of the "⇐": Observe that

P(X − s > t | X > s) = P(X > s + t | X > s) = P(X > s + t, X > s)/P(X > s) = P(X > s + t)/P(X > s);

now for X ∼ Expon(λ), i.e., satisfying (1.11), this equals

e^{−λ(s+t)}/e^{−λs} = e^{−λt} = P(X > t),

i.e., (2.4) holds. The proof of "⇒" is omitted.

Example 2.3 Suppose we are putting out (servicing) fires, and X is the time required to service a fire, in minutes. Suppose we try to predict when the service will finish given that service started s minutes ago, and our predictor is the mean remaining service time,

g(s) = E[X − s | X > s].

First, we give a simulation method for estimating g(s).


1. Fix s. Obtain random samples of (X − s | X > s), as in (2.6) below, independently of each other.

2. Calculate ĝ(s) as the sample average. This is an estimate of g(s).

How do we obtain random samples of (X − s | X > s)? The simplest method is:

Sample (randomly) X; if X > s, record X − s; otherwise, reject (no record is kept).   (2.6)

The distribution of X can affect g(s) a lot. Suppose X ∼ N(40, 100), a Normal distribution with mean 40 and standard deviation 10. One million trials of the form (2.6) gave the following estimates (which are accurate for our purpose):

  s    ĝ(s)
  20   20.5
  30   12.9
  40    8.0
  50    5.2
  60    3.7

In contrast, suppose X ∼ Expon(1/40), the Exponential distribution with the same mean as the Normal, 40. Then, from (2.5), g(s) = E[X] = 40 for all s. This is very different: for example, for a service that began s = 40 minutes ago, the Normal gives a mean remaining service time of 8.0, much smaller than the Exponential's 40.
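A minimal MATLAB sketch of the estimator above (not in the original notes; the sample size matches the one million trials mentioned, and the variable names are illustrative):

    % Estimate g(s) = E[X - s | X > s] by rejection, for X ~ Normal(40, 10^2).
    s = 40;                    % conditioning time
    R = 1e6;                   % number of trials of the form (2.6)
    X = 40 + 10*randn(R, 1);   % samples of X
    kept = X(X > s) - s;       % accepted residual times X - s
    ghat = mean(kept);         % estimate of g(s); should be near 8.0 for s = 40
    fprintf('g(%d) estimated as %.2f from %d accepted samples\n', s, ghat, numel(kept));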


2.3 Counting Processes

How do we model events occurring "randomly" over time? Let t denote time, and let it range from 0 to infinity.

One approach focuses on the (cumulative) count of events over time:

N_t = number of events that occur after time 0 and up to time t,   t ≥ 0.

Another approach focuses on the times of events:

S_n = time of occurrence of the n-th event,   n = 1, 2, . . . ,   (2.7)

or, equivalently, the inter-event times (times between successive events):

X_1 = S_1 = time of 1st event
X_2 = S_2 − S_1 = time between 1st and 2nd events
. . .
X_n = S_n − S_{n−1} = time between the (n − 1)-st and n-th events,   n = 1, 2, 3, . . .

(Put S_0 = 0 so that X_n = S_n − S_{n−1} holds for n = 1 as well.) We make some natural assumptions:

1. N_0 = 0; that is, the counting starts at time zero.

2. N_t is integer-valued, and it is a nondecreasing function of t.

3. X_n > 0 for all n; that is, events occur one at a time; two events cannot happen at the same time.

As the counts (N_t : t ≥ 0) and the event times (S_n : n = 0, 1, 2, . . .) are alternative descriptions of the same (random) experiment, they are connected by

{"time of the n-th event" > t} ⇔ {"# events in (0, t]" < n}   for all n and t,

i.e., we have the equality of events

{S_n > t} = {N_t < n}   (2.8)

and

{N_t = n} = {S_n ≤ t < S_{n+1}}   (2.9)

for any n = 0, 1, 2, . . . and t ≥ 0. These event equalities are fundamental to calculations later on.


2.4 The Poisson Process

We study a special counting process, the Poisson, that is analytically simple and commonly used.

2.4.1 Distribution of counts (the N_t process)

For s, t ≥ 0, we have by definition

N_{s+t} − N_s = # (number of) events that occur after s and up to s + t.

This is the count or increment on (the time interval) (s, s + t]. The description is via the (probabilistic) behaviour of such counts, particularly on disjoint (non-overlapping) intervals.

Definition 2.4 The counting process (N_t : t ≥ 0) is called a Poisson process of rate λ if:

1. (Independence) The counts on disjoint intervals are independent rv's.

2. (Identical distribution, or stationarity) The distribution of N_{s+t} − N_s depends on the interval's length, t, and not on the startpoint, s.

3. (Small-interval behaviour)

P(N_h = 1) = λh + o(h),   P(N_h ≥ 2) = o(h),   as h → 0.   (2.10)

Here is another definition:

Definition 2.5 The counting process (N_t : t ≥ 0) is called a Poisson process of rate λ if:

1. (Independence) The counts on disjoint intervals are independent rv's.

2. (Poisson distribution) The count in any interval of length t (t ≥ 0) has the Poisson distribution with mean λt. That is, for any s, t ≥ 0,

P(N_{s+t} − N_s = n) = P(N_t = n) = e^{−λt} (λt)^n / n!,   n = 0, 1, . . .   (2.11)

For s, t ≥ 0, the intervals (0, s] and (s, s + t] are disjoint, so N_s and N_{s+t} − N_s are independent.

Theorem 2.6 Definition 2.4 ⇔ Definition 2.5.


Proof. Proof of "⇒". It suffices to show that the functions

p_n(t) = P(N_t = n),   t ≥ 0,   n = 0, 1, 2, . . . ,

are as specified on the right of (2.11). We will show this by writing and solving differential equations linking the p_n(t) and their derivatives, (d/dt)p_n(t), across n. As N_0 = 0, these functions at time 0 are:

p_0(0) = 1,   p_n(0) = 0,   n = 1, 2, . . . .

We put p_{−1}(t) = 0 for all t; this function has no physical meaning and serves only to simplify the notation.

Fix h > 0 and put D_h = N_{t+h} − N_t. Note D_h =D N_h, by the identical-distribution assumption. Fix n. Then

{N_{t+h} = n} = {N_t = n, D_h = 0} or {N_t = n − 1, D_h = 1} or {D_h ≥ 2, N_t = n − D_h},   (2.12)

where the "or"s are between disjoint events (as the value of D_h is different across them). Now note

P(D_h ≥ 2, N_t = n − D_h) ≤ P(D_h ≥ 2) = o(h)   by (2.10).

Taking probabilities in (2.12) and using the above gives

p_n(t + h) = P(N_t = n, D_h = 0) + P(N_t = n − 1, D_h = 1) + o(h),   (2.13)

and the probabilities on the right can be written

P(N_t = n, D_h = 0) = P(N_t = n)P(D_h = 0)   by independence
                    = p_n(t)[1 − λh + o(h)]   by (2.10),

and, similarly,

P(N_t = n − 1, D_h = 1) = P(N_t = n − 1)P(D_h = 1)   by independence
                        = p_{n−1}(t)[λh + o(h)]   by (2.10).

Thus (2.13) gives (substitute the above, re-arrange, and divide by h):

[p_n(t + h) − p_n(t)]/h = (−λ + o(h)/h) p_n(t) + (λ + o(h)/h) p_{n−1}(t) + o(h)/h.

Now take limits as h ↓ 0 (i.e., h decreases to zero); the o(h)/h terms tend to zero, so

(d/dt) p_n(t) = −λ p_n(t) + λ p_{n−1}(t),   n = 0, 1, 2, . . . .   (2.14)

To solve this set of differential equations, multiply both sides by e^{λt} and re-arrange to obtain:

λ e^{λt} p_{n−1}(t) = e^{λt} (d/dt) p_n(t) + λ e^{λt} p_n(t) = (d/dt)( e^{λt} p_n(t) ).   (2.15)

The above can be solved for p_n(t) in the order n = 0, 1, 2, . . ., as follows. For n = 0: p_{−1}(t) = 0, and analytical integration of (2.15) gives

0 = (d/dt)( e^{λt} p_0(t) )  ⇒  e^{λt} p_0(t) = c  ⇒  p_0(t) = c e^{−λt},   t ≥ 0,

for some constant c, and the requirement p_0(0) = 1 gives c = 1. Now we use induction on n. Assume

p_{n−1}(t) = e^{−λt} (λt)^{n−1} / (n − 1)!.

Putting this into the left side of (2.15),

λ^n t^{n−1} / (n − 1)! = (d/dt)( e^{λt} p_n(t) ).   (2.16)

Integrating analytically,

e^{λt} p_n(t) = λ^n t^n / n! + c

for some constant c, and the condition p_n(0) = 0 gives c = 0. This concludes the induction step and the proof.

Proof of "⇐": Using (2.11) and expressing the exponentials as in (2.1), we find

P(N_h = 0) = e^{−λh} = 1 − λh + o(h),
P(N_h = 1) = e^{−λh} λh = λh + o(h),
P(N_h ≥ 2) = 1 − P(N_h = 0) − P(N_h = 1) = 1 − e^{−λh} − e^{−λh} λh = o(h).

✷

Example 2.7 (Calculating joint probabilities.) Let (N_t : t ≥ 0) be a Poisson process of rate λ. For times 0 = t_0 < t_1 < t_2 < . . . < t_k and natural numbers n_1 ≤ n_2 ≤ . . . ≤ n_k, probabilities of the form

P(N_{t_1} = n_1, N_{t_2} = n_2, . . . , N_{t_k} = n_k)

are calculable via the independence and Poisson-distribution properties (with n_0 := 0):

P(N_{t_1} = n_1, N_{t_2} = n_2, . . . , N_{t_k} = n_k)
  = P(N_{t_1} − N_{t_0} = n_1, N_{t_2} − N_{t_1} = n_2 − n_1, . . . , N_{t_k} − N_{t_{k−1}} = n_k − n_{k−1})
  = P(N_{t_1} − N_{t_0} = n_1) P(N_{t_2} − N_{t_1} = n_2 − n_1) · · · P(N_{t_k} − N_{t_{k−1}} = n_k − n_{k−1})   by independence
  = Π_{i=1}^k e^{−λ(t_i − t_{i−1})} [λ(t_i − t_{i−1})]^{n_i − n_{i−1}} / (n_i − n_{i−1})!   by (2.11).
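For instance (an illustrative calculation, not in the original): with λ = 2, k = 2, t_1 = 1, t_2 = 3, n_1 = 1 and n_2 = 4,

P(N_1 = 1, N_3 = 4) = P(N_1 = 1) P(N_3 − N_1 = 3) = [e^{−2} · 2] · [e^{−4} · 4^3/3!] ≈ 0.053.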


2.4.2 Distribution of Times (the X_n and S_n)

Theorem 2.8 The counting process N = (N_t : t ≥ 0) is a Poisson process of rate λ ⇔ the associated inter-event times, X_1 = S_1, X_2 = S_2 − S_1, X_3 = S_3 − S_2, . . . , are independent exponentially distributed rv's of rate λ.

Proof. (Partial proof.) Apply (2.8) for n = 1:

{S_1 > t} = {N_t = 0}.   (2.17)

Taking probabilities,

P(S_1 > t) = P(N_t = 0) = e^{−λt},   t ≥ 0,

where the last step holds by (2.11). That is, S_1 has the Expon(λ) distribution, as claimed. The remainder is a sketch of the remaining proof. Using the independence property of the N_t process, it can be shown that S_2 − S_1 is independent of S_1; moreover, using the stationarity property, it can be shown that S_2 − S_1 has the same distribution as S_1. Similarly, one can show that for any n, S_n − S_{n−1} is independent of the corresponding differences for any smaller n, and moreover it has the same distribution as S_1. ✷

Now we give the distribution of S_n by taking probabilities in (2.8):

P(S_n > t) = P(N_t < n) = P(∪_{k=0}^{n−1} {N_t = k}) = Σ_{k=0}^{n−1} P(N_t = k) = Σ_{k=0}^{n−1} e^{−λt} (λt)^k / k!,   t ≥ 0.   (2.18)

(Note in step 3 that the events {N_t = k} are disjoint across k.) In the last step, we used the Poisson formula, (2.11). The above distribution is known as Gamma(n, λ) and also as Erlang-n.


2.4.3 Simulating a Poisson Process

To simulate a Poisson process, the result from Theorem 2.8 is applied. Specifically, we simulate the inter-event times by using the fact that for any λ > 0, a random sample of the Expon(λ) distribution can be obtained as (−1/λ) log(U), where log denotes the natural logarithm and U ∼ Unif(0, 1) denotes a pseudo-random number. Note that log(U) is always negative, so the result is always positive, as it should be.

Method 2.9 (How to sample a Poisson process on a finite interval.) Given T > 0 and λ > 0, the event times of a Poisson process of rate λ on the interval [0, T] can be simulated based on the inter-event times, as follows. Set S_0 = 0, and for n = 1, 2, . . ., set S_n = S_{n−1} + X_n, where X_n, the n-th inter-event time, is sampled as (−1/λ) log(U_n), with U_n ∼ Unif(0, 1) independent of all other U's; stop as soon as S_n > T.
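A minimal MATLAB sketch of Method 2.9 (the parameter values and variable names are illustrative, not from the notes):

    % Sample the event times of a Poisson process of rate lambda on [0, T]
    % via iid Expon(lambda) inter-event times (Method 2.9 / Theorem 2.8).
    lambda = 1.5;  T = 10;
    S = [];               % event times found so far
    t = 0;                % running time (current S_n)
    while true
        t = t + (-1/lambda)*log(rand);   % add X_n = (-1/lambda)*log(U_n)
        if t > T, break; end             % stop as soon as S_n > T
        S(end+1) = t;                    %#ok<AGROW>  record the event time
    end
    fprintf('generated %d events on [0, %g]; expected about %g\n', numel(S), T, lambda*T);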


2.4.4 Merging Poisson Processes<br />

Suppose we merge the events <strong>of</strong> two independent Poisson processes <strong>of</strong> rates λ 1 <strong>and</strong><br />

λ 2 (i.e., all the N t are independent across the two; or, equivalently, all the X n , or all<br />

the S n , are independent across the two). Put<br />

Nt i = number <strong>of</strong> events <strong>of</strong> process i that occur up to time t, i = 1, 2<br />

N t = Nt 1 + Nt 2 = number <strong>of</strong> events <strong>of</strong> the merged process that occur up to time t<br />

Then the merged process (N t : t ≥ 0) is Poisson <strong>of</strong> rate λ 1 + λ 2 . This property<br />

generalises to merging any finite number <strong>of</strong> processes, by induction. That is:<br />

Proposition 2.10 The merging <strong>of</strong> the events <strong>of</strong> independent Poisson processes gives<br />

a Poisson process <strong>of</strong> rate equal to the sum <strong>of</strong> the rates <strong>of</strong> the merged processes.<br />

Sketch <strong>of</strong> pro<strong>of</strong> (merge two processes for simplicity):<br />

1. The merged process has independent counts on disjoint intervals; this comes<br />

from the independence <strong>of</strong> counts across the processes being merged <strong>and</strong> across<br />

disjoint time intervals for the same process.<br />

2. Check that the merged process satisfies (2.10) with the rate claimed.<br />

Application 2.11 Customer JD arrives to find q customers waiting in queue, at a system of k non-idling servers (servers do not idle if work is available), where customer service times are independent Expon(µ) rvs. Assuming the first-come first-served discipline, how long is JD's waiting time?

1. JD joins the queue in position q + 1 (due to the first-come first-served discipline). Thus, with time 0 being the time he joins, his waiting time equals the time of the (q + 1)-st (customer) departure (service completion).

2. While JD waits, each server is working (due to no-server-idling). The independent Expon(µ) service times mean that the departures (i.e., the counting process of departure events) at a particular server form a Poisson process of rate µ (Theorem 2.8), independent of the departures at other servers.

3. Departures from the system are simply the merging of departures at individual servers, so they form a Poisson process of rate k times µ (Proposition 2.10).

4. The distribution of the time of the (q + 1)-st event of a Poisson process was derived earlier; see (2.18). So JD's waiting time has the Gamma(q + 1, kµ) distribution, that is, tail probability function (2.18) with n = q + 1 and λ = kµ.
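As an added sketch (not from the notes), the tail of JD's waiting-time distribution can be evaluated directly from the Gamma/Erlang tail (2.18) with n = q + 1 and λ = kµ; the function name is illustrative.

    import math

    def jd_wait_tail(q, k, mu, t):
        # P(waiting time > t) for Gamma(q+1, k*mu), i.e. (2.18) with n = q+1, lam = k*mu.
        lam = k * mu
        return sum(math.exp(-lam * t) * (lam * t) ** j / math.factorial(j)
                   for j in range(q + 1))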


2.5 Poisson Process of General Rate Function

So far, the constant λ was the probabilistic rate of an event occurrence in any small time interval. The forthcoming model replaces this constant by a general (non-negative) function of time, λ(u), u ≥ 0.

Definition 2.12 The counting process (N_t : t ≥ 0) is called a Poisson process of (with) rate function λ(t) if:

1. (Independence) The counts on disjoint intervals are independent rv's.

2. (Small-interval behaviour)

P(N_{s+h} − N_s = 1) = λ(s)h + o(h),  P(N_{s+h} − N_s ≥ 2) = o(h),  as h → 0.  (2.19)

The term Non-Homogeneous (or time-inhomogeneous) Poisson Process (NHPP) indicates the rate function is non-constant.

In this case the distribution of N_t is again Poisson, but its mean is now the mean function

m(t) = ∫_0^t λ(u) du,  (2.20)

that is, the area under the graph of the rate function from 0 to t. By solving differential equations like those in the proof of Theorem 2.6, we obtain the summary result below:

Proposition 2.13 In a Poisson process of rate function λ(·), the count N_e − N_s has the Poisson distribution with mean ∫_s^e λ(u) du = m(e) − m(s). That is,

P(N_e − N_s = n) = e^{−[m(e)−m(s)]} [m(e) − m(s)]^n / n!,  n = 0, 1, . . .  (2.21)

Proof. Left as Exercise 3. ✷

Example: Exam E03 3. Recognise it is an NHPP and identify the rate function; the solution is then standard, exactly the main result above for s = 0.
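For illustration (an addition to the notes), the probability (2.21) can be evaluated for any rate function by computing m(e) − m(s) with numerical quadrature; the helper below is a sketch that assumes scipy is available.

    import math
    from scipy.integrate import quad

    def nhpp_count_pmf(rate_fn, s, e, n):
        # P(N_e - N_s = n) for an NHPP with rate function rate_fn, via (2.21).
        mean, _ = quad(rate_fn, s, e)        # m(e) - m(s) = integral of the rate
        return math.exp(-mean) * mean ** n / math.factorial(n)

    # e.g. nhpp_count_pmf(lambda t: 1 + math.sin(math.pi * t), 0.0, 2.0, 3)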

2.5.1 Thinning a Poisson Process

Suppose:

1. N = (N_t : t ≥ 0) is a Poisson process of rate K.

2. Conditional on any event of N occurring at time t, accept the event with probability p(t) and reject it otherwise, doing so independently of anything else. (The acceptance step is simulated via pseudo-random numbers later.)

Let Ñ_t be the number of accepted events occurring up to time t inclusive. Call (Ñ_t : t ≥ 0) the thinned process.

Proposition 2.14 (Thinning a Poisson process.) The process (Ñ_t : t ≥ 0) is a Poisson process of rate function Kp(t).

Sketch of proof (assume the function p(·) is continuous, for simplicity):

1. The thinned process has independent counts on disjoint intervals; this comes from the independence of counts of the original process together with the independence of the acceptance/rejection random variables from everything else.

2. Check that the thinned process satisfies (2.19) with the rate claimed.

2.5.2 Simulating an NHPP via Thinning

Problem 2.15 Simulate (or sample) on a given time interval [0, T] a non-homogeneous Poisson process of given rate function λ(t), 0 ≤ t ≤ T.

If the rate function is piece-wise constant, a simple solution is: simulate a constant-rate process on each interval where the rate is constant, via Method 2.9.

If the rate function is not piecewise constant (examples include: linear with a nonzero slope, non-constant polynomial, trigonometric, exponential), then a simple solution is as follows.

Method 2.16 (The Thinning Method.) To simulate the event times of a Poisson process of rate function λ(t) on the interval [0, T], do:

1. Calculate the rate

K = max_{0 ≤ t ≤ T} λ(t) < ∞.  (2.22)

2. Using Method 2.9, simulate the event times of a Poisson process of constant rate K on the given interval. That is, set S_0 = 0, and for n = 1, 2, . . ., set S_n = S_{n−1} + (−1/K) log(U_n), where U_n ∼ Unif(0, 1) is independent of other U's, stopping as soon as S_n > T. The set S_1, S_2, . . . , S_{n−1} is a random sample of the event times of a Poisson process of rate K on the given interval.

3. (Thinning step.) For i = 1, 2, . . . , n − 1, accept S_i with probability λ(S_i)/K (reject with the remaining probability; note that λ(S_i)/K is always between 0 and 1, by the choice of K). The acceptance/rejection is done by randomly sampling V_i ∼ Unif(0, 1) independently of everything else and accepting if V_i ≤ λ(S_i)/K. The set of accepted S's is a random sample of the event times of a Poisson process of rate function λ(t) on the given interval.
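A minimal Python sketch of Method 2.16 (an addition; it reuses the sim_poisson_times helper sketched after Method 2.9, and assumes the user supplies a valid upper bound K on the rate):

    import random

    def sim_nhpp_thinning(rate_fn, K, T, rng=random.random):
        # Thinning (Method 2.16): K must satisfy K >= rate_fn(t) for all t in [0, T].
        candidates = sim_poisson_times(K, T, rng)     # step 2: constant-rate K process
        return [s for s in candidates                 # step 3: accept s w.p. rate_fn(s)/K
                if rng() <= rate_fn(s) / K]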

2.6 Estimating a Poisson Process: Brief View

Suppose we want to estimate (construct) the functions λ(t) and m(t) from given event times s_1 < s_2 < . . . < s_n. Note:

• m() can always be found by integrating λ() as in (2.20).

• Obtaining λ(t) from m() is possible, assuming m is differentiable at t: λ(t) = (d/dt) m(t).

• A constant rate λ is equivalent to a linear mean function, m(t) = λt.

As a start, consider:

Step-function estimate of m(t).

ˆm(t) = observed count up to t = number of s's that are ≤ t.  (2.23)

That is,

ˆm(s_k) = k,  k = 1, 2, . . . , n,

and it is constant between s_{k−1} and s_k. Unattractively, it is impossible to infer λ() by differentiation¹.

One way to get a reasonable estimate of λ() is by a slight revision of the above:

Piece-wise linear estimate of m(t) (linear in-between event times). Define the function ˜m(t) to be as above at the event times, i.e.,

˜m(s_k) = k,  k = 1, 2, . . . , n,

and estimate ˜λ as the slope of ˜m:

˜λ(t) = [˜m(s_k) − ˜m(s_{k−1})] / (s_k − s_{k−1}) = 1 / (s_k − s_{k−1})  for s_{k−1} ≤ t < s_k.  (2.24)

If the ˜λ(t) are close between adjacent (s_{k−1}, s_k] intervals, we could average them, losing little information in the averaging, to simplify the estimate.

¹ ˆm(t) has derivative zero at all points other than the s's; at the s's it is not differentiable.
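As a small added sketch (not in the original notes), the piece-wise linear estimate (2.24) is simply the reciprocal of each inter-event gap; here the convention s_0 = 0 with ˜m(0) = 0 is an assumption made for the first interval.

    def rate_estimate(event_times):
        # Piece-wise linear estimate (2.24): on (s_{k-1}, s_k] the estimated rate is
        # 1 / (s_k - s_{k-1}); time 0 is treated as s_0 (an assumption).
        s = [0.0] + list(event_times)
        return [(s[k - 1], s[k], 1.0 / (s[k] - s[k - 1])) for k in range(1, len(s))]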


2.7 Large-t N_t/t via Strong Law of Large Numbers

Let N_t be the count of events up to time t. Assume

A1. Inter-event times are independent, identically distributed, with mean µ > 0.

A2. lim_{t→∞} N_t = ∞.

What is the average number of events per unit time, N_t/t, for t large?

For any t,

S_{N_t} ≤ t < S_{N_t + 1}.

Dividing by N_t,

S_{N_t} / N_t ≤ t / N_t < ( S_{N_t + 1} / (N_t + 1) ) · ( (N_t + 1) / N_t ).  (2.25)

By the SLLN (Theorem 1.22), both the right and left side of (2.25) converge to µ as t → ∞, with probability one. Hence the middle must converge to the same, with probability one (the event of convergence of the left and right, call it A, implies the event of convergence of the middle, call it B; thus 1 = P(A) ≤ P(B), proving P(B) = 1). We conclude

lim_{t→∞} N_t / t = 1/µ  w.p. 1.  (2.26)
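As an optional added check (not part of the notes), the limit (2.26) can be verified empirically by simulating iid exponential inter-event times with a given mean; the sketch below is illustrative.

    import random

    def check_rate_limit(mean_gap, T, seed=1):
        # Empirical N_T / T for iid Expon inter-event times with mean mean_gap;
        # by (2.26) this should be close to 1 / mean_gap for large T.
        rng = random.Random(seed)
        t, n = 0.0, 0
        while True:
            t += rng.expovariate(1.0 / mean_gap)   # inter-event time with mean mean_gap
            if t > T:
                return n / T
            n += 1

    # e.g. check_rate_limit(2.0, 1e6) is close to 0.5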


2.8 Exercises

1. Let N = (N_t : t ≥ 0) be a Poisson process of rate λ = 0.4.

i Calculate: (a) P(N_5 = 3); (b) P(N_5 = 3, N_{15} = 7); (c) P(N_{15} = 7 | N_5 = 3); (d) P(N_5 = 3 | N_{15} = 7).

ii Let X be the time between two successive events of this process. Specify the distribution of X fully, including its mean. Write P(X ≤ 0.5) and P(X ≤ 2) explicitly.

iii State, briefly, how the results in (i) change if the process is Poisson of mean function m() with m(0) = 0 as usual.

2. In Proposition 2.10, two independent Poisson processes of rates λ_1 and λ_2 are merged; N_t^i is the count of process i up to time t; and N_t = N_t^1 + N_t^2 is the count of the merged process. Show that as h → 0, we have P(N_h = 2) = o(h) and P(N_h = 1) = (λ_1 + λ_2)h + o(h). Note: similar, slightly more complex calculations are seen later in (3.1) to (3.3).

3. (NHPP)

(a) (Distribution of N_t.) For the (NHPP) N_t in Definition 2.12, write p_n(t) = P(N_t = n), n = 0, 1, 2, . . . (put p_{−1}(t) = 0 for later convenience), and show that these functions must satisfy the set of differential equations

(d/dt) p_n(t) = −λ(t) p_n(t) + λ(t) p_{n−1}(t),  n = 0, 1, 2, . . . .  (2.27)

Then verify that the functions p_n(t) = e^{−m(t)} [m(t)]^n / n!, n = 0, 1, 2, . . . satisfy (2.27). Note that (d/dt) m(t) = λ(t). Note p_0(0) = 1 and p_n(0) = 0 for n > 0, effectively saying that N_0 = 0.

(b) (Distribution of S_n.) Working as for the constant-rate process, show that the distribution of S_n for the general case above is

P(S_n > t) = Σ_{k=0}^{n−1} e^{−m(t)} [m(t)]^k / k!,  t ≥ 0,

so the distribution of the time of the 1st event, S_1, is

P(S_1 > t) = e^{−m(t)},  t ≥ 0.  (2.28)

4. (Simulating Poisson processes.)

(a) Simulate the event times of a Poisson process of rate λ = 2 during [0, T = 2] based on assumed pseudo-random numbers {.8187, .2466, .5488, .3679, .4066}.

(b) Simulate the event times of the NHPP of rate function λ(t) = 1 + sin(πt), 0 ≤ t ≤ 2, where π ≈ 3.14159, by thinning the process sampled previously. For the acceptance test, assume pseudo-random numbers 0.7, 0.4, 0.2, 0.6, adding your own if needed.

(c) It can be shown that a random variable with cdf F(x) can be simulated by solving for x the equation F(x) = U, where U ∼ Unif(0, 1) (a pseudo-random number). Using this property together with (2.28), show how the time of the 1st event of the NHPP in (b) may be simulated. You may use that ∫_0^t sin(πu) du = (1/π)[1 − cos(πt)].




2.9 Solutions to Exercises

1. For s ≤ e, N_e − N_s has the Poisson distribution with mean λ(e − s), the rate λ times the length of the interval, e − s. Recall the Poisson distribution is given in (2.2).

i (a) Recall the assumption N_0 = 0, so N_5 = N_5 − N_0; here e − s = 5, so N_5 ∼ Poisson(2) and thus P(N_5 = 3) = e^{−2} 2^3 / 3!.

(b) In calculating this joint probability, note that N_5 and N_{15} refer to overlapping time intervals, so they are not independent. The "trick" is to write N_{15} = N_5 + (N_{15} − N_5) and use that N_{15} − N_5 is independent of N_5 and Poisson(4)-distributed (e − s = 15 − 5 = 10, times rate, 0.4). Thus

P(N_5 = 3, N_{15} = 7) = P(N_5 = 3, N_{15} − N_5 = 7 − 3) = P(N_5 = 3) P(N_{15} − N_5 = 4) = e^{−2} (2^3/3!) · e^{−4} (4^4/4!).

(c)

P(N_{15} = 7 | N_5 = 3) = P(N_{15} = 7, N_5 = 3) / P(N_5 = 3) = P(N_{15} − N_5 = 4)   [by (b), the P(N_5 = 3) cancels out]
                        = e^{−4} 4^4 / 4!.

(d)

P(N_5 = 3 | N_{15} = 7) = P(N_5 = 3, N_{15} = 7) / P(N_{15} = 7) = [ e^{−2} (2^3/3!) · e^{−4} (4^4/4!) ] / [ e^{−6} 6^7/7! ].

ii We know X ∼ Expon(0.4), whose mean is 1/0.4 (seen elsewhere). Then

P(X ≤ 1/2) = 1 − e^{−0.4 · (1/2)} = 1 − e^{−0.2},
P(X ≤ 2) = 1 − e^{−0.4 · 2} = 1 − e^{−0.8}.

iii In the Poisson process with mean function m(), N_e − N_s has the Poisson distribution with mean m(e) − m(s), and such counts on disjoint intervals are independent, as with the constant-rate process. Thus, we only modify the Poisson means; the mean of N_5 is m(5) − m(0) = m(5) and the mean of N_{15} − N_5 is m(15) − m(5).

2. The key idea is to express the events of interest as (note superscripts are not powers):

{N_h = 2} = {N_h^1 = 2, N_h^2 = 0} ∪ {N_h^1 = 1, N_h^2 = 1} ∪ {N_h^1 = 0, N_h^2 = 2},
{N_h = 1} = {N_h^1 = 1, N_h^2 = 0} ∪ {N_h^1 = 0, N_h^2 = 1},  (2.29)

where the events on the right are disjoint. Taking probabilities,

P(N_h = 2) = P(N_h^1 = 2, N_h^2 = 0) + P(N_h^1 = 1, N_h^2 = 1) + P(N_h^1 = 0, N_h^2 = 2)
           = P(N_h^1 = 2)P(N_h^2 = 0) + P(N_h^1 = 1)P(N_h^2 = 1) + P(N_h^1 = 0)P(N_h^2 = 2)   by independence.

The first term is ≤ P(N_h^1 = 2) = o(h); likewise, the third term is ≤ P(N_h^2 = 2) = o(h). The middle term is

[λ_1 h + o(h)][λ_2 h + o(h)] = o(h)

(terms λ_i h o(h) and λ_1 λ_2 h^2 are o(h)), as required. Similarly,

P(N_h = 1) = P(N_h^1 = 1, N_h^2 = 0) + P(N_h^1 = 0, N_h^2 = 1)
           = P(N_h^1 = 1)P(N_h^2 = 0) + P(N_h^1 = 0)P(N_h^2 = 1)   by independence
           = [λ_1 h + o(h)][1 − λ_2 h + o(h)] + [1 − λ_1 h + o(h)][λ_2 h + o(h)]
           = λ_1 h + λ_2 h + o(h),

where, again, various original terms are combined into the o(h).

3.(a) The count from t to t + h, denoted D_{t,h} = N_{t+h} − N_t, is independent of N_t by assumption. The essential difference relative to the homogeneous case is that the distribution of D_{t,h} depends on both t and h, whereas that of D_h there depended only on h. The derivation mimics closely the steps from (2.12) to (2.14), "correcting" for the difference above.

{N_{t+h} = n} = {N_t = n, D_{t,h} = 0} or {N_t = n − 1, D_{t,h} = 1} or {D_{t,h} ≥ 2, N_t = n − D_{t,h}},

where the events on the right are disjoint, so the left probability is the sum of the events' probabilities on the right. Work these out:

P(D_{t,h} ≥ 2, N_t = n − D_{t,h}) ≤ P(D_{t,h} ≥ 2) = o(h)   by (2.19),
P(N_t = n, D_{t,h} = 0) = p_n(t)[1 − λ(t)h + o(h)]   by independence and (2.19), and
P(N_t = n − 1, D_{t,h} = 1) = p_{n−1}(t)[λ(t)h + o(h)]   by independence and (2.19).

Putting these together,

p_n(t + h) = p_n(t)[1 − λ(t)h + o(h)] + p_{n−1}(t)[λ(t)h + o(h)] + o(h).

Exactly as in the constant-rate case (move p_n(t) from right to left, divide by h, take limits as h → 0), we get (2.27). To verify, differentiate the given functions p_n(t) = e^{−m(t)} [m(t)]^n / n!:

(d/dt) ( e^{−m(t)} [m(t)]^n / n! ) = ( (d/dt) e^{−m(t)} ) [m(t)]^n / n! + e^{−m(t)} (d/dt) ( [m(t)]^n / n! )
  = −e^{−m(t)} ( (d/dt) m(t) ) [m(t)]^n / n! + e^{−m(t)} ( n [m(t)]^{n−1} / n! ) (d/dt) m(t)
  = −λ(t) p_n(t) + λ(t) p_{n−1}(t),

as required.
dt m(t)


(b)

P(S_n > t) = P(N_t < n) = Σ_{k=0}^{n−1} P(N_t = k) = Σ_{k=0}^{n−1} e^{−m(t)} [m(t)]^k / k!.

4.(a) The event times are simulated iteratively as S_0 = 0, S_i = S_{i−1} + X_i for i = 1, 2, . . ., stopping as soon as S_i > T = 2, where the inter-event times X_i are simulated by the formula X_i = (−1/λ) log(U_i), the U_i being Unif(0, 1) random numbers (Method 2.9). We obtain:

i   U_i     X_i = −(1/λ) log(U_i)   S_i = S_{i−1} + X_i
1   .8187   0.1                     0.1
2   .2466   0.7                     0.8
3   .5488   0.3                     1.1
4   .3679   0.5                     1.6
5   .4066   0.45                    2.05

(b) Following Method 2.16, compute the maximum: K = max_{0≤t≤2} [1 + sin(πt)] = 2. The event times in (a) may be used as the times to be thinned because they were sampled with the appropriate rate, 2. Denoting by V_i the given pseudo-random numbers, calculate:

S_i   λ(S_i)/K   V_i   Is V_i ≤ λ(S_i)/K?
0.1   0.65       0.7   No
0.8   0.79       0.4   Yes
1.1   0.34       0.2   Yes
1.6   0.02       0.6   No

Thus, the simulated process has events at times 0.8 and 1.1 only.
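The two tables above can be reproduced with a few lines of Python (an added sketch, not part of the solution):

    import math

    lam, T = 2.0, 2.0
    U = [0.8187, 0.2466, 0.5488, 0.3679, 0.4066]   # given pseudo-random numbers
    V = [0.7, 0.4, 0.2, 0.6]                       # acceptance-test numbers

    # (a) constant-rate process: S_i = S_{i-1} - (1/lam) * log(U_i), stop once S_i > T
    S, s = [], 0.0
    for u in U:
        s += -math.log(u) / lam
        if s > T:
            break
        S.append(round(s, 2))                      # [0.1, 0.8, 1.1, 1.6]

    # (b) thin with rate function 1 + sin(pi t) and K = 2
    rate = lambda t: 1 + math.sin(math.pi * t)
    accepted = [t for t, v in zip(S, V) if v <= rate(t) / 2]   # [0.8, 1.1]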

(c) The mean function is

m(t) = ∫_0^t (1 + sin(πu)) du = t + (1/π)[1 − cos(πt)].

Aim for the point at which the cdf of S_1 equals U:

F(t) = 1 − e^{−m(t)} = U  ⇔  m(t) = − log(1 − U).

The explicit point t is not pursued here.




Chapter 3

Queues

3.1 Preliminaries

Vectors are column vectors. The transpose of a vector p is denoted p^T.

We often take limits as the length h of a time interval goes to zero; we write this as "lim_{h→0}", or even as "lim" when the condition is obvious.

A probability distribution with support a set E is called in short a distribution on E.

3.1.1 Queues: Terminology and Global Assumptions

Jobs (customers) arrive at a system at which a number of servers are stationed. A job may have to wait in a waiting area (queue) until a server becomes available. After being served in a service area, jobs leave. The system includes both these areas.

1. The times between arrivals are independent identically distributed (iid) random variables.

2. The service times are random variables and independent of the inter-arrival times.

3. No idling. Server(s) will not idle if jobs are waiting.

4. M/M/c/k means Memoryless inter-arrival times, Memoryless service times, c servers, and a maximum of k jobs waiting in queue, where k = ∞ if not specified. "Memoryless" means that these times are exponentially distributed.

5. The inter-arrival times and the service times have means that are positive and finite.


3.2 The Birth-Death Process

Denote X_t the number of jobs in the system at time t, also called the state. We are interested in the (stochastic) process X = (X_t : t ≥ 0). If there is a maximum allowed number in the system, say k, then X_t takes values in the finite set E = {0, 1, 2, . . . , k}, 0 being the lowest state, reflecting an empty system. If there is no such maximum, then X_t takes values in the infinite set E = {0, 1, 2, . . .}. A birth-death process arises as follows:

1. Whenever X_t = i, events whose effect is to increase X by one ("births") occur according to a Poisson process B = (B_t) of (birth) rate λ_i (the time T_B until the next birth is an Expon(λ_i) random variable).

2. In addition, again given X_t = i, events whose effect is to decrease X by one ("deaths") occur according to a Poisson process D = (D_t) of (death) rate µ_i (the time T_D until the next death is an Expon(µ_i) random variable).

3. These Poisson processes are independent.

The process X moves as follows. The next state of X, as well as the timing of the state change, are the result of the competition between the birth and death processes. This means that: (a) Starting from state i, the time when the process X first changes state, T = min(T_B, T_D), has an Expon(λ_i + µ_i) distribution (as seen in merging); and (b) The next state is i + 1 with probability λ_i/(λ_i + µ_i) (a birth happens before a death), or i − 1 with the remaining probability (a death happens before a birth).

For each i, we now derive conditional (transition) probabilities of the future state of the process given X_t = i, h time units in the future. Put:

• B = number of births (occurring) in (t, t + h], and D = number of deaths.

• N := B + D is the number of events (births plus deaths) in (t, t + h].

Then,

P(X_{t+h} = i + 1 | X_t = i) = P(B − D = 1 | X_t = i)
  = P(B − D = 1, N = 1 | X_t = i) + P(B − D = 1, N ≥ 2 | X_t = i)   [the second term is o(h)]
  = P(B = 1, D = 0 | X_t = i) + o(h)
  = P(B = 1 | X_t = i) P(D = 0 | X_t = i) + o(h)
  = [λ_i h + o(h)][1 − µ_i h + o(h)] + o(h) = λ_i h + o(h),  (3.1)


P(X_{t+h} = i − 1 | X_t = i) = P(B − D = −1 | X_t = i)
  = P(B − D = −1, N = 1 | X_t = i) + P(B − D = −1, N ≥ 2 | X_t = i)   [the second term is o(h)]
  = P(B = 0, D = 1 | X_t = i) + o(h)
  = P(B = 0 | X_t = i) P(D = 1 | X_t = i) + o(h)
  = [1 − λ_i h + o(h)][µ_i h + o(h)] + o(h) = µ_i h + o(h),  (3.2)

and X changes by 2 or more with probability that is negligible as h → 0:

P(X_{t+h} = j | X_t = i) ≤ P(N ≥ 2 | X_t = i) = o(h)  for |j − i| ≥ 2.  (3.3)

Consequently, X remains at the same state with the probability that remains, i.e., P(X_{t+h} = i | X_t = i) = 1 − (λ_i + µ_i)h + o(h).

The distribution of X_t can be expressed via differential equations obtained similarly to those for the Poisson process, (2.14). In these equations, to be seen later for a more general process, the following limits appear:

q_{i,j} := lim_{h→0} P(X_{t+h} = j | X_t = i) / h = { λ_i if j = i + 1;  µ_i if j = i − 1;  0 otherwise (j ∉ {i − 1, i + 1}) },  j ≠ i.  (3.4)

The q_{i,j}, sometimes called transition rates or probability rates, are probabilities of transitions (state changes) relative to time. Note that if an i → j transition requires at least 2 events in order to happen, then q_{i,j} = 0. On the other hand, if i → j is caused by a single event, then q_{i,j} is the rate of the underlying process.

The transition rates are sometimes summarised in a transition diagram, as in Figure 3.1.

[Figure: states 0, 1, 2, . . . , n − 1, n, n + 1, . . . drawn in a row, with rightward arrows labelled by the birth rates λ_0, λ_1, . . . , λ_{n−1}, λ_n, . . . and leftward arrows labelled by the death rates µ_1, µ_2, . . . , µ_n, µ_{n+1}, . . .]

Figure 3.1: Transition diagram of a birth-death process.


3.3 Continuous-Time Markov Chains (CTMCs)

Definition 3.1 The stochastic process (X_t : t ≥ 0) is called a continuous-time Markov chain (CTMC) with state space (set of possible states) E if for all i, j, i_1, . . . , i_k ∈ E, all t ≥ 0, and all p_1, p_2, . . . , p_k ≤ s,

P(X_{s+t} = j | X_{p_1} = i_1, . . . , X_{p_k} = i_k, X_s = i) = P(X_{s+t} = j | X_s = i).  (3.5)

If the right side of (3.5) does not depend on s, then the CTMC is called homogeneous.

In words, given the process' history up to time s, the conditional distribution of X at future time points is the same as the conditional distribution given only its present value, X_s. Yet in other words, we can say that the X process is memoryless, because its future behaviour depends on its history only through the present; the strict past is irrelevant. We are mainly interested in homogeneous CTMCs, where

p_{i,j}(t) := P(X_{s+t} = j | X_s = i)

is a function of i, j and t.

3.3.1 Generator

We assume there exist q_{i,j} such that

q_{i,j} := lim_{h→0} p_{i,j}(h) / h,  i, j ∈ E, i ≠ j.

That is, q_{i,j} is a "probability rate" of going from i to j. The birth-death process seen earlier is a special case of a homogeneous CTMC with q_{i,j} being the λ_i for j = i + 1, the µ_i for j = i − 1, and 0 otherwise.

We also define

q_i = lim_{h→0} P(X_{t+h} ≠ X_t | X_t = i) / h = lim_{h→0} [1 − p_{i,i}(h)] / h = lim_{h→0} Σ_{j∈E, j≠i} p_{i,j}(h) / h = Σ_{j∈E, j≠i} ( lim_{h→0} p_{i,j}(h) / h ) = Σ_{j∈E, j≠i} q_{i,j},

where we will always assume the interchange between limit and summation is valid (it could fail, but only if E is infinite and under further unusual conditions). (We used that Σ_{j∈E} p_{i,j}(t) = 1 for all i and t, a consequence of the fact that X_t must take a value in E.) That is, q_i is a "probability rate" of leaving i (to go anywhere else). We summarise these in the matrix

Q = [q_{i,j}]_{i,j∈E},


where we put q_{i,i} = −q_i. Q is called the (CTMC) generator and is central to later calculations. It can be calculated without taking limits, after a little experience.

Example 3.2 A repair shop has two workers named fast (F) and slow (S). Jobs arrive according to a Poisson process of rate 4. Service times are exponentially distributed with mean 1/3 and 1/2 at F and S respectively. Whenever both F and S are available to process a job, preference is given to F. Arrivals that find a job waiting are lost. Focus on the system state over time, with the following possible states:

Numeric Code   State Description
1              empty
2              F working, S idle
3              S working, F idle
4              both servers working, 0 jobs in queue
5              both servers working, 1 job in queue

A detailed derivation of the generator would resemble that done for the birth-death process in (3.1) to (3.3). Here, a 5 → 4 transition is caused by a departure at F or S, i.e., an event of the process that merges these two event types, whose rate is 3 + 2 = 5 (merging result). Thus, similar to (3.2), p_{5,4}(h) = 5h + o(h) as h → 0, so q_{5,4} = lim_{h→0} p_{5,4}(h)/h = 5. Consider now q_{5,3}. Similar to (3.3), p_{5,3}(h) = o(h) because 5 → 3 requires at least two departures, so q_{5,3} = lim_{h→0} p_{5,3}(h)/h = 0. And so on for other i, j.

Generally, a systematic calculation of the generator can go as follows. For each type of event (Poisson process), write the corresponding rate, and list all transitions caused by a single such event in the form "from i to j", for specified i and j. Here this gives:

Causing Event    Rate   List of State Changes
Arrival          4      from 1 to 2, from 2 to 4, from 3 to 4, from 4 to 5
Departure at F   3      from 2 to 1, from 4 to 3, from 5 to 4
Departure at S   2      from 3 to 1, from 4 to 2, from 5 to 4

Then, for each i and j ≠ i, locate all i → j transitions listed and find q_{i,j} as the sum of the corresponding rates. Here this gives (zeros shown explicitly):

Q = [ −4    4    0    0    0
       3   −7    0    4    0
       2    0   −6    4    0
       0    2    3   −9    4
       0    0    0  3+2   −5 ]
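The systematic calculation just described is easy to automate. The following Python sketch (an addition to the notes, with numpy assumed available) rebuilds the generator above from the event table.

    import numpy as np

    # Each event type contributes its rate to q_{i,j} for every transition i -> j it causes.
    events = [
        (4, [(1, 2), (2, 4), (3, 4), (4, 5)]),   # arrivals, rate 4
        (3, [(2, 1), (4, 3), (5, 4)]),           # departures at F, rate 3
        (2, [(3, 1), (4, 2), (5, 4)]),           # departures at S, rate 2
    ]
    Q = np.zeros((5, 5))
    for rate, transitions in events:
        for i, j in transitions:
            Q[i - 1, j - 1] += rate              # states are numbered 1..5
    np.fill_diagonal(Q, -Q.sum(axis=1))          # q_{i,i} = -q_i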


3.3.2 Time-dependent Distribution and Kolmogorov's Differential System

Assume the process state is known at time 0, e.g., X_0 = 5. Put p_i(t) = P(X_t = i). We focus on the vector p(t) = (p_i(t))_{i∈E}, which is the distribution at time t of X_t. The whole distribution is related to that at previous times via

p_i(t + h) = P(X_{t+h} = i)
  = Σ_{k∈E} P(X_{t+h} = i, X_t = k)
  = Σ_{k∈E} P(X_t = k) P(X_{t+h} = i | X_t = k)
  = Σ_{k∈E} p_k(t) p_{k,i}(h)
  = p_i(t) p_{i,i}(h) + Σ_{k∈E, k≠i} p_k(t) p_{k,i}(h),   t, h ≥ 0

(Theorem 1.10, Law of Total Probability). Subtract p_i(t) from both left and right, and divide by h:

[p_i(t + h) − p_i(t)] / h = −p_i(t) [1 − p_{i,i}(h)] / h + Σ_{k∈E, k≠i} p_k(t) p_{k,i}(h) / h.

Take limits as h → 0. In the rightmost term, we assume the limit can pass inside the sum (which is the case if E is finite), i.e.,

lim_{h→0} Σ_{k∈E, k≠i} p_k(t) p_{k,i}(h) / h = Σ_{k∈E, k≠i} p_k(t) ( lim_{h→0} p_{k,i}(h) / h ) = Σ_{k∈E, k≠i} p_k(t) q_{k,i},

and arrive at Kolmogorov's differential system:

p′_i(t) = −p_i(t) q_i + Σ_{k∈E, k≠i} p_k(t) q_{k,i},   i ∈ E,  (3.6)

where p′_i(t) = (d/dt) p_i(t).

Supposing for example X_0 = 5, we would have the initial condition p_5(0) = 1 and p_j(0) = 0 for j ≠ 5, and p(t) should satisfy (3.6) together with the initial condition.
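For a finite state space, (3.6) is the linear system p′(t) = Q^T p(t) and can be integrated numerically. The sketch below is an addition to the notes; it assumes scipy is available and reuses the Q array of Example 3.2 built in the earlier sketch.

    import numpy as np
    from scipy.integrate import solve_ivp

    def kolmogorov_forward(Q, p0, t_end):
        # Integrate p'(t) = Q^T p(t), i.e. the system (3.6), from p(0) = p0 to time t_end.
        sol = solve_ivp(lambda t, p: Q.T @ p, (0.0, t_end), p0, dense_output=True)
        return sol.y[:, -1]                    # the distribution p(t_end)

    p0 = np.array([0, 0, 0, 0, 1.0])           # initial condition X_0 = 5
    # e.g. kolmogorov_forward(Q, p0, 10.0) is already close to the stationary distribution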


3.4 Long-Run Behaviour

With X_t being a count of jobs in the system at time t, or something similar (repair-shop example), and all events causing changes to X "being" independent Poisson processes (inter-event times having exponential distributions), (X_t : t ≥ 0) will be a CTMC.

The theory of (Homogeneous) CTMCs is rich, especially for time going to ∞. It is built upon a corresponding theory for discrete-time Markov chains (t integer). The theory is technical and will not be seen in depth. A key summary result, Fact 3.3 below, is our basic supporting theory.

1_A is the indicator function, valued 1 on the set (event) A and 0 elsewhere. Thus, for example, 1_{X_u = i} indicates if the process X at time u is at state i. Put

T_t^i = time from 0 to t that X is in state i = ∫_0^t 1_{X_u = i} du.  (3.7)

Fact 3.3 The CTMC (X_t : t ≥ 0) corresponding to "interesting" and "stable" queues converges, regardless of the initial state X_0, as follows.

(a)

lim_{t→∞} p_i(t) = π_i,  lim_{t→∞} p′_i(t) = 0,  i ∈ E.  (3.8)

(b) Putting (3.8) into (3.6), we have the balance equations

π_i q_i = Σ_{k∈E: k≠i} π_k q_{k,i},  i ∈ E  (3.9)

(π^T Q = 0). If the equation set (3.9) together with the normalising equation

Σ_{i∈E} π_i = 1  (3.10)

has a solution (π_i)_{i∈E}, then the solution is unique and π_i > 0 for all i. This is called the stationary (or steady-state or limit) distribution of the CTMC.

(c) If a stationary distribution (π_i)_{i∈E} exists, then

lim_{t→∞} T_t^i / t = π_i  w.p. 1 for all i ∈ E,  (3.11)

and as a consequence of π_i > 0 we have

lim_{t→∞} T_t^i = ∞,  i ∈ E.  (3.12)


Result (3.11) is basic to calculations below. For this reason, we want at least some idea why it holds. We argue why the corresponding mean converges to π_i:

E T_t^i = E[ ∫_0^t 1_{X_u = i} du ] = ∫_0^t E[1_{X_u = i}] du   (interchange of E[] and ∫ assumed valid)
        = ∫_0^t p_i(u) du.

Then

lim_{t→∞} E[ ∫_0^t 1_{X_u = i} du ] / t = lim_{t→∞} ( ∫_0^t p_i(u) du ) / t = π_i,

the last step following from lim_{u→∞} p_i(u) = π_i, which is said in (3.8).

As π_i is the (long-run) fraction of time spent at state i, the balance equation (3.9) for state i says

(fraction of time at i) × q_i = Σ_{k∈E, k≠i} (fraction of time at k) × q_{k,i},

the left side being a "rate out of i" and the right side being a "rate into i".


3.4.1 Long-Run Average Cost

Proofs here are non-examinable. (2.26) will be used heavily, for arrival events.¹

Cost a Deterministic Function of System State

Suppose that whenever X_u = i, we incur a cost f(i) per unit time, where f is a given function (while thinking "cost" may help, f could represent anything, e.g., a gain). Then the cost per unit time, up to time t, is

( ∫_0^t f(X_u) du ) / t = Σ_{i∈E} f(i) ( ∫_0^t 1_{X_u = i} du ) / t.

Thus the long-run cost per unit time is

lim_{t→∞} Σ_{i∈E} f(i) ( ∫_0^t 1_{X_u = i} du ) / t = Σ_{i∈E} f(i) ( lim_{t→∞} ( ∫_0^t 1_{X_u = i} du ) / t ) = Σ_{i∈E} f(i) π_i   w.p. 1  (3.13)

by (3.11) in the last step, and by assuming that the limit can pass inside the sum, which is the case if E is finite. This is abbreviated as "average of f(X)". Note this is E[f(X)] for the random variable X whose distribution is (π_i)_{i∈E}.

We now apply this.

Example 3.2 (continued) Find the following long-run quantities: (a) the average number of jobs in the system; (b) the average number of busy servers; and (c) the fraction of time that both servers are busy.

To identify the stationary distribution, solve Σ_{i=1}^5 π_i = 1 together with (3.9), i.e. (the generator was given previously):

State   Rate Out = Rate In
1       4π_1 = 2π_3 + 3π_2
2       7π_2 = 4π_1 + 2π_4
3       6π_3 = 3π_4
4       9π_4 = 4π_2 + 4π_3 + 5π_5
5       5π_5 = 4π_4

Obtaining the solution is then straightforward.²

(a) Following the cost result (3.13), the average number of jobs in the system is 1 · (π_2 + π_3) + 2π_4 + 3π_5 (f(1) = 0; f(2) = f(3) = 1; f(4) = 2; f(5) = 3).

(b) 1 · (π_2 + π_3) + 2(π_4 + π_5) (f(1) = 0; f(2) = f(3) = 1; f(4) = f(5) = 2).

(c) π_4 + π_5 (f is an indicator function, valued 1 at states 4 and 5, and 0 elsewhere).

¹ Assume all inter-arrival times are finite; then the arrival times S_n are finite for all n; then the number of events up to t, N_t = Σ_{n=1}^∞ 1_{S_n ≤ t}, satisfies lim_{t→∞} N_t = ∞, verifying the A2 there.

² The solution is the vector (65, 60, 40, 80, 64)/309.
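As an added numerical check (not part of the notes), the balance equations can be solved by replacing one of them with the normalising equation; this reproduces the solution (65, 60, 40, 80, 64)/309 quoted in the footnote and evaluates (a)–(c).

    import numpy as np

    Q = np.array([[-4,  4,  0,  0,  0],      # generator of Example 3.2
                  [ 3, -7,  0,  4,  0],
                  [ 2,  0, -6,  4,  0],
                  [ 0,  2,  3, -9,  4],
                  [ 0,  0,  0,  5, -5]], dtype=float)

    A = np.vstack([Q.T[:-1], np.ones(5)])    # four balance equations + normalisation
    pi = np.linalg.solve(A, np.array([0, 0, 0, 0, 1.0]))   # == (65, 60, 40, 80, 64)/309

    avg_jobs  = pi[1] + pi[2] + 2 * pi[3] + 3 * pi[4]      # (a)
    avg_busy  = pi[1] + pi[2] + 2 * (pi[3] + pi[4])        # (b)
    both_busy = pi[3] + pi[4]                              # (c)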


Cost Events

We consider here costs that arise as counts of certain cost events. In the first model, while the system state is i, cost events are a Poisson process with rate c_i. Put

C_t^i = number of cost events that occur while the state is i, up to time t,  (3.14)

so C_t = Σ_{i∈E} C_t^i is the number of cost events up to time t. Write

C_t / t = Σ_{i∈E} (C_t^i / T_t^i)(T_t^i / t),

and recall T_t^i is the time spent at i, see (3.7). The fractions on the right converge:

1. T_t^i / t converges (w.p. 1) to the state-i stationary probability, π_i, by (3.11).

2. C_t^i / T_t^i converges (w.p. 1) to the underlying rate, c_i, by (2.26) (the assumption there that time goes to ∞ is checked by (3.12)).

Thus

lim_{t→∞} C_t / t = Σ_{i∈E} ( lim_{t→∞} C_t^i / T_t^i ) ( lim_{t→∞} T_t^i / t ) = Σ_{i∈E} c_i π_i,  (3.15)

where the "w.p. 1" will be dropped for convenience.

In the second model, assume a cost event is triggered for each arrival if and only if it finds the system (X_t) in one of the states in a set A (for example, A could have a single "system is full" state). Let C_t be the number of cost events up to t; then the average number of cost events per unit time is

C_t / t = Σ_{i∈A} (N_t^i / N_t)(N_t / t),

where

N_t^i = number of arrivals that find the system in state i, up to time t,  (3.16)

and N_t is the number of arrivals up to time t. As t → ∞:

1. N_t / t converges (w.p. 1) to the arrival rate, call it λ, again by (2.26).

2. We will later see a Theorem called PASTA (Poisson Arrivals see Time Averages) that ensures that N_t^i / N_t converges (w.p. 1) to the state-i stationary probability, π_i.

Thus

lim_{t→∞} C_t / t = Σ_{i∈A} ( lim_{t→∞} N_t^i / N_t ) ( lim_{t→∞} N_t / t ) = Σ_{i∈A} π_i λ.  (3.17)


Example 3.4 Jobs (customers) arrive to a system according to a Poisson process of rate λ = 5. There are two servers, and service times at either server are exponentially distributed with mean 1/µ = 1/3 (rate µ = 3), and independent of everything else. At most two jobs may wait for service, so a job that arrives to find 2 jobs waiting balks, meaning it does not join the queue. Moreover, each job i abandons, meaning it leaves without being served, as soon as its waiting time in queue is equal to Y_i, where the Y_i are exponentially distributed with mean 1/η = 1/2 (rate η = 2), independently of everything else.

(The system state, X_t, defined as the number of jobs in the system, thus takes values in E = {0, 1, 2, 3, 4} and the independent exponential distributions imply that (X_t : t ≥ 0) is a CTMC.) We want to calculate:

(a) The long-run average number of abandons per unit time.

(b) The long-run average number of balks per unit time.

First, calculate the generator. Single events that cause a state change are the arrivals, departures of served customers, and (what is new here) abandons, which act in the same way as departures, decreasing the state by one. The balk events do not change the state.

The rate of abandons depends on the state: when the state has k jobs in queue (importantly, the state determines k: k = 1 in state 3, k = 2 in state 4, and k = 0 in the other states), and since η is the rate at which an individual job abandons, the abandon rate is kη (merging property). Summing the rates of the events causing the same state transition (as usual), we find the generator is

Q = [  ·      λ      0      0        0
       µ      ·      λ      0        0
       0     2µ      ·      λ        0
       0      0    2µ+η     ·        λ
       0      0      0    2µ+2η      ·  ]

(the main diagonal, shown as "·", is implicit, and the other entries are zero). This is a birth-death process (i.e., we have zeros everywhere except on the diagonals above and below the main diagonal). In Section 3.4.2 below, we derive the stationary distribution of a general birth-death process (with infinite state space); this essentially also shows how the finite-state-space solution can be obtained. So we do not pursue a solution here. Denote the stationary distribution (π_i)_{i=0}^4.

(a) Following the first cost result, (3.15), the long-run average number of abandons per unit time is ηπ_3 + 2ηπ_4 (cost rates c_3 = η, c_4 = 2η, and 0 at other states).

(b) Following the second cost result, (3.17), the long-run average number of balks per unit time is π_4 λ (A = {4}).
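As an added sketch (the notes do not pursue the numbers), the finite birth-death solution anticipated in Section 3.4.2 below, π_{i+1} = π_i λ_i/µ_{i+1}, gives the π's for this example, which can then be plugged into (a) and (b):

    import numpy as np

    lam, mu, eta = 5.0, 3.0, 2.0
    birth = [lam, lam, lam, lam]                           # rates 0->1, 1->2, 2->3, 3->4
    death = [mu, 2 * mu, 2 * mu + eta, 2 * mu + 2 * eta]   # rates 1->0, 2->1, 3->2, 4->3

    pi = [1.0]
    for b, d in zip(birth, death):                         # pi_{i+1} = pi_i * lambda_i / mu_{i+1}
        pi.append(pi[-1] * b / d)
    pi = np.array(pi) / sum(pi)                            # normalise

    abandons_per_unit_time = eta * pi[3] + 2 * eta * pi[4]   # (a)
    balks_per_unit_time    = lam * pi[4]                     # (b)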

♣ Study the following. E03 2, Two Stations in Tandem. Involves thinning and thereby a relatively tricky calculation of the generator.


3.4.2 Explicit Stationary Distributions for Specific Models

Birth-Death Process with Infinite State Space

The generator is in (3.4), so the balance equations become

π_0 λ_0 = π_1 µ_1,
π_i (λ_i + µ_i) = π_{i−1} λ_{i−1} + π_{i+1} µ_{i+1},  i = 1, 2, . . . .  (3.18)

Solving iteratively gives

π_i λ_i = π_{i+1} µ_{i+1},  i = 0, 1, . . . ,  (3.19)

i.e., π_{i+1} = π_i λ_i / µ_{i+1}, from which we obtain (proceeding down to state zero)

π_k = π_{k−1} (λ_{k−1}/µ_k) = π_{k−2} (λ_{k−2}/µ_{k−1})(λ_{k−1}/µ_k) = . . . = π_0 (λ_0 λ_1 · · · λ_{k−1}) / (µ_1 µ_2 · · · µ_k),  k = 1, 2, . . . .  (3.20)

Normalise, i.e., require (3.10) (substitute the π's in terms of π_0):

Σ_{i=0}^∞ π_i = π_0 ( 1 + λ_0/µ_1 + λ_0 λ_1/(µ_1 µ_2) + . . . ) = π_0 ( 1 + Σ_{k=1}^∞ (λ_0 λ_1 · · · λ_{k−1}) / (µ_1 µ_2 · · · µ_k) ) = 1.  (3.21)

We see that a unique solution (stationary distribution) exists if and only if

1 + Σ_{k=1}^∞ (λ_0 λ_1 · · · λ_{k−1}) / (µ_1 µ_2 · · · µ_k) < ∞.  (3.22)

It is then given by (3.20), with π_0 determined from (3.21).


Application 3.5 Our standard notation for the M/M/c model will be an arrival rate denoted λ (i.e., inter-arrival times exponentially distributed with mean 1/λ) and an individual-server service rate denoted µ (i.e., service times exponentially distributed with mean 1/µ). With X_t the number of jobs in the system at time t, X = (X_t : t ≥ 0) is a birth-death process with state space {0, 1, 2, . . .}, as there is infinite waiting space. As servers are non-idling, µ_n, the death rate when X_t = n, is the rate of departures aggregated over all busy servers, i.e., the individual-server departure rate, µ, times the number of busy servers; thus

µ_n = { nµ, n < c;  cµ, n ≥ c }.

The birth (arrival) rate is λ, regardless of X_t; that is, λ_n = λ for all n. Putting

ρ := λ / (cµ),

the birth-death solution (3.20) becomes

π_n = π_0 λ^n / (µ · 2µ · 3µ · · · nµ) = π_0 (cρ)^n / n!,   n ≤ c,
π_n = π_c ρ^{n−c} = π_0 c^c ρ^n / c!,                       n ≥ c

(the two branches agree at n = c), and π_0 must satisfy

1 = Σ_{n=0}^{c−1} π_n + Σ_{n=c}^∞ π_n = π_0 [ Σ_{n=0}^{c−1} (cρ)^n / n! + ((cρ)^c / c!) Σ_{n=c}^∞ ρ^{n−c} ].  (3.23)

Thus, a stationary distribution exists if and only if Σ_{j=0}^∞ ρ^j < ∞, i.e., if and only if ρ < 1. In this case, Σ_{j=0}^∞ ρ^j = 1/(1 − ρ), and

π_0 = [ Σ_{n=0}^{c−1} (cρ)^n / n! + (cρ)^c / (c!(1 − ρ)) ]^{−1}.  (3.24)

For future reference, for c = 1 (M/M/1 model), we get π_0 = 1 − ρ (the sum is 1, the second term is ρ/(1 − ρ)), and we find π_n = (1 − ρ)ρ^n for all n.
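The following added sketch evaluates (3.23)–(3.24) numerically; mmc_pi returns π_0, . . . , π_N for a user-chosen truncation level N (an illustrative choice, since the state space is infinite).

    import math

    def mmc_pi(lam, mu, c, N):
        # Stationary probabilities pi_0..pi_N of the M/M/c queue via (3.23)-(3.24); needs rho < 1.
        rho = lam / (c * mu)
        pi0 = 1.0 / (sum((c * rho) ** n / math.factorial(n) for n in range(c))
                     + (c * rho) ** c / (math.factorial(c) * (1 - rho)))
        return [pi0 * (c * rho) ** n / math.factorial(n) if n <= c
                else pi0 * c ** c * rho ** n / math.factorial(c)
                for n in range(N + 1)]

    # e.g. mmc_pi(0.5, 1.0, 1, 5) gives (1 - rho) * rho**n with rho = 0.5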

With X_t being the number of jobs in the system in the finite-waiting-space model M/M/c/k and in extensions that allow balks and abandons (with exponentially distributed patience times), a stationary distribution exists for any ρ (as there are only c + k + 1 π's); it is not difficult to compute by using the general birth-death solution (3.20) and (3.21).

♣ Study the following M/M/c-type problems. • E04 1, except for the question on waiting times (last paragraph): M/M/1 versus an M/M/2 with half-as-fast servers, i.e., same overall speed. • E06 1: compare three separate M/M/1 systems to a centralised M/M/2 that handles all work.


Transitions Beyond Infinite-Case Birth-Death, Solvable via PGFs

♣ Study the following.

• Exercise 6. Then recognise that E06 3, E07 2, E10 3 are all similar to this. The modelling novelty is to allow batch arrivals. An infinite-state-space CTMC arises, with structure more complex than the birth-death case. Solvable via the pgf of the stationary distribution, as shown in the exercise.

• E04 2. This is the M/E_k/1 Model, where E_k refers to the Erlang-k distribution, meaning that the service time is the sum of k independent exponentially distributed service stages (of known rate each). A reduction to an infinite-state-space CTMC is done, the state variable now counting stages (in the system) rather than jobs. The balance equations are exactly as in Exercise 6 (but the state variable has different meaning across the two problems), and thus so is the pgf of the stationary distribution. In answering questions involving the number of jobs, we must translate from a job count to (a set of) stage counts: for example, if k = 2, then the state "1 job in the system" is equivalent to "1 stage in the system" or "2 stages in the system".


3.5 Arrivals That See Time Averages

We now state carefully a key theorem that states, under conditions, the equality of long-run customer averages to associated stationary probabilities.

Theorem 3.6 (Arrivals See Time Averages (ASTA)) Let X = (X_t : t ≥ 0) be the number of jobs in the system, assumed to be a CTMC with state space E and stationary distribution (π_i)_{i∈E}. Write T_j for the j-th arrival time. Assume that for all t, the arrival-counting process (essentially the T_j's) forward from t is independent of the history of X up to t. Then

lim_{n→∞} ( Σ_{j=1}^n 1_{X_{T_j} = i} ) / n = π_i  w.p. 1,  i ∈ E.  (3.25)

Taking expected value of the left side results in the indicator random variables being replaced by probabilities, so the above gives

lim_{n→∞} ( Σ_{j=1}^n P(X_{T_j} = i) ) / n = π_i.  (3.26)

If the arrivals are a Poisson process, then the theorem's assumptions are true, and this special case is called PASTA, the added "P" standing for "Poisson". In Exercises and Exams from Year 2007/08 onwards, PASTA is mentioned when used. In pre-2007/08 exams, PASTA is implicitly assumed without being mentioned.

With deterministic arrival and service times, the ASTA conclusions tend to fail: although there exists a long-run fraction of time in state i, it tends to differ from the long-run fraction of arrivals that find the system in this state.


3.5.1 A Steady-State Delay Distribution

Problem 3.7 Customers arrive to an M/M/1 queue at rate λ = 1/2 per hour. Service discipline is first come first served (FCFS). Two rates of service are possible. The faster rate of service is µ = 1 per hour and costs £40 per hour; the slower service rate is µ = 0.75 per hour and costs £30 per hour. A cost of £200 is incurred for each customer that waits in queue (i.e., excluding service) more than one hour. Find which service rate gives the lowest average cost per hour.

We will analyse a slightly more general problem, where we replace the constant "1" (the "1 hour") by any s ≥ 0, and assume the M/M/c model for c general. Let W_j be the delay (time spent in queue) of the j-th job (customer). Write

C_t = number of arrivals up to t that wait more than s = Σ_{j=1}^{N_t} 1_{W_j > s}.

Similar to the cost model that gave (3.17), write

C_t / t = (C_t / N_t)(N_t / t),

where N_t is the number of arrivals up to t. Now, as t → ∞, N_t/t converges (w.p. 1) to the arrival rate, as seen in (2.26), and lim_{t→∞} N_t = ∞. We attempt to guess lim_{t→∞} C_t/N_t from the corresponding (limit of) expected value:

F(s) := lim_{t→∞} E[ C_t / N_t ] = lim_{t→∞} E[ (1/N_t) Σ_{j=1}^{N_t} 1_{W_j > s} ] = lim_{n→∞} (1/n) Σ_{j=1}^n P(W_j > s),  s ≥ 0.  (3.27)

We call F() a steady-state delay distribution (it is usually a proper distribution).

To find F, let T_j be the arrival time of the j-th job, and condition on the number of jobs it finds in the system, denoted X_{T_j}:

P(W_j > s) = Σ_{k=0}^∞ P(W_j > s | X_{T_j} = k) P(X_{T_j} = k)  (3.28)

for any s ≥ 0 (this is the Law of Total Probability (LTP), Theorem 1.10, with partition events {X_{T_j} = k}, k = 0, 1, 2, . . .).

Supposing the system has c servers, it suffices to sum over k ≥ c, since otherwise the waiting time is zero; thus

F(s) = lim_{n→∞} (1/n) Σ_{j=1}^n Σ_{k=0}^∞ P(W_j > s | X_{T_j} = c + k) P(X_{T_j} = c + k)   by (3.28)
     = Σ_{k=0}^∞ P(W > s | X = c + k) · ( lim_{n→∞} (1/n) Σ_{j=1}^n P(X_{T_j} = c + k) )  (3.29)

(limit passed inside sum), where we abbreviate W_j as W and X_{T_j} as X, as j does not affect the probability. Now note that:

• The term in parenthesis, lim_{n→∞} (1/n) Σ_{j=1}^n P(X_{T_j} = c + k), equals the stationary probability that the process X is at state c + k, by PASTA, (3.26).

• The conditional probability has been considered, under the First-Come First-Served (FCFS) discipline, in Application 2.11: given X = c + k with k ≥ 0, W has a Gamma distribution. Specifically, assuming Expon(µ) service times, plugging the Gamma result from there and the π's from (3.23) into (3.29) and simplifying gives

F(s) = ( π_c / (1 − ρ) ) e^{−(cµ−λ)s},  s ≥ 0,  (3.30)

where F(0) = π_c/(1 − ρ) = Σ_{n=c}^∞ π_n is the stationary probability that all servers are busy. The formula describes the steady-state delay as a mixed distribution with two components: with probability 1 − F(0), it is zero; with the remaining probability, F(0), it is distributed as Expon(cµ − λ).

The answers to Problem 3.7 are now direct from (3.30) and the M/M/1 stationary distribution, given following (3.24). In the slow system, π_c/(1 − ρ) = ρ = 2/3,

F_slow(1) = (2/3) e^{−(1/4)·1} ≈ 0.519,

and the long-run average cost per hour is 30 + 200 λ F_slow(1) = £81.9. Exactly in the same way, in the fast system, π_c/(1 − ρ) = ρ = 1/2,

F_fast(1) = (1/2) e^{−(1/2)·1} ≈ 0.303,

and the long-run average cost per hour is 40 + 200 λ F_fast(1) = £70.3. Thus the faster system has lower cost.
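An added sketch of the cost comparison (reproducing the numbers above; the hourly rates and £ figures are those of Problem 3.7):

    import math

    def hourly_cost(service_cost, lam, mu, s=1.0, penalty=200.0):
        # M/M/1 case of (3.30): P(delay > s) = rho * exp(-(mu - lam) * s), rho = lam / mu.
        rho = lam / mu
        tail = rho * math.exp(-(mu - lam) * s)
        return service_cost + penalty * lam * tail

    slow = hourly_cost(30.0, 0.5, 0.75)   # about 81.9
    fast = hourly_cost(40.0, 0.5, 1.0)    # about 70.3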

♣ Study the following. Exams E03 1, E07 3 (last 10 points), E11 4(b) are all essentially the above analysis.


3.6 Little's Law

Little's Law states the existence of three long-run limits and a link between them. We need the following.

N_t   number of arrivals up to time t
X_t   number of jobs present in the system at time t
W_j   time spent in the system by job j

Very roughly speaking, the central idea is that there exist (random) times such that all the processes "probabilistically restart" themselves, independently of the past, and moreover, the processes go through infinitely many such "independent identically distributed" cycles. Taking these times as

τ_i = i-th time the system becomes empty after having been non-empty, i = 1, 2, . . . ,

where we put τ_0 = 0, and putting

A_i = ∫_{τ_{i−1}}^{τ_i} X_u du = "area" of the X process during the i-th cycle,
Ñ_i = N_{τ_i} − N_{τ_{i−1}} = number of arrivals during the i-th cycle,

the following is a set of precise supporting assumptions:

A1. The cycle lengths (τ_i − τ_{i−1})_{i=1}^∞ are iid. The areas (A_i)_{i=1}^∞ are iid. The arrival counts (Ñ_i)_{i=1}^∞ are iid.

A2. E[τ_1] < ∞ (finite mean cycle length) and E[Ñ_1] < ∞ (finite mean number of arrivals per cycle).

A3. The cycle containing t, R_t := min{i : τ_i ≥ t}, and the cycle during which job k arrives, C_k := min{i : N_{τ_i} ≥ k}, satisfy

lim_{t→∞} R_t = ∞,  lim_{k→∞} C_k = ∞,  w.p. 1.

Theorem 3.8 (Little's Law) Assume A1 to A3. Then,

(i) Long-run average number in system:

lim_{t→∞} ( ∫_0^t X_u du ) / t = E[A_1] / E[τ_1] =: L  w.p. 1.  (3.31)

(ii) Long-run average arrival rate:

lim_{t→∞} N_t / t = E[Ñ_1] / E[τ_1] =: λ  w.p. 1.  (3.32)


(iii) Long-run average waiting time:

lim_{k→∞} ( Σ_{i=1}^k W_i ) / k = E[A_1] / E[Ñ_1] =: W  w.p. 1.  (3.33)

Thus, in particular, L = λW.

Proof. The proof is non-examinable and given for completeness.

(i) For any t and for n = R_t being the cycle containing t, we have τ_{n−1} ≤ t ≤ τ_n and

A_1 + . . . + A_{n−1} ≤ ∫_0^t X_u du ≤ A_1 + . . . + A_n.

Dividing the latter inequality by the (reversed) former inequality,

(A_1 + . . . + A_{n−1}) / τ_n ≤ ( ∫_0^t X_u du ) / t ≤ (A_1 + . . . + A_n) / τ_{n−1}.  (3.34)

We will take limits of the bounds as t → ∞ and thus n = R_t → ∞. Consider the lower bound (left side of (3.34)) carefully. The numerator is a sum of a large number of A's, n − 1 of them, and the denominator τ_n = Σ_{i=1}^n (τ_i − τ_{i−1}) is the sum of n cycle lengths, suggesting a "Strong Law of Large Numbers" effect for both. Indeed, as n → ∞,

(A_1 + . . . + A_{n−1}) / τ_n = [ (A_1 + . . . + A_{n−1}) / (n − 1) ] · [ (n − 1) / n ] · [ 1 / ( Σ_{i=1}^n (τ_i − τ_{i−1}) / n ) ],  (3.35)

where the first factor → E[A_1] w.p. 1, the second factor → 1, and the third factor → 1/E[τ_1 − τ_0] = 1/E[τ_1] w.p. 1, by using the SLLN for the areas (A's) and for the cycle lengths (the τ_i − τ_{i−1}'s). That is, the above converges to E[A_1]/E[τ_1] w.p. 1. Checking that the upper bound (right side of (3.34)) has the same limit, the proof is complete.

(ii) The ideas are similar to (i). For any t and for n = R_t, we have

N_{τ_{n−1}} / τ_n ≤ N_t / t ≤ N_{τ_n} / τ_{n−1},  (3.36)

and the limit of the lower bound as t → ∞ (thus n = R_t → ∞) is

lim_{n→∞} N_{τ_{n−1}} / τ_n = lim_{n→∞} [ (Ñ_1 + . . . + Ñ_{n−1}) / (n − 1) ] · [ (n − 1) / n ] · [ n / τ_n ] = E[Ñ_1] / E[τ_1]  w.p. 1,  (3.37)

by using the SLLN for the Ñ's and for the cycle lengths. Checking that the upper bound (right side of (3.36)) has the same limit, the proof is complete.

(iii) Since a job always departs by the end of the cycle during which it arrives, we have

Σ_{i=1}^{N_{τ_n}} W_i = A_1 + . . . + A_n  for all n.


Consider the k-th arrival <strong>and</strong> let n = C k be the cycle during which it occurs. Then,<br />

N τn−1 ≤ k ≤ N τn , <strong>and</strong><br />

A 1 + . . . + A n−1<br />

N τn<br />

=<br />

∑ Nτn−1<br />

i=1 W i<br />

N τn<br />

≤<br />

∑ k<br />

i=1 W i<br />

k<br />

≤<br />

∑ Nτn<br />

i=1 W i<br />

N τn−1<br />

= A 1 + . . . + A n<br />

N τn−1<br />

. (3.38)<br />

The limit <strong>of</strong> the lower bound as k → ∞ (thus n = C k → ∞) is<br />

A 1 + . . . + A n−1<br />

lim<br />

n→∞ N τn<br />

= lim<br />

n→∞<br />

A 1 + . . . + A n−1<br />

τ n<br />

lim<br />

n→∞<br />

τ n<br />

= E[A 1] E[τ 1 ]<br />

N τn E[τ 1 ]<br />

E[Ñ1] = E[A 1]<br />

E[Ñ1] w.p. 1<br />

(the first limit is proved in (3.35) <strong>and</strong> the second limit is essentially proved in (3.37)).<br />

Checking that the upper bound (right side <strong>of</strong> (3.38)) has the same limit, the pro<strong>of</strong> is<br />

complete.<br />

✷<br />
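Since this is a simulation module, the three limits in Theorem 3.8 can also be checked empirically. Below is a minimal discrete-event sketch (our own illustration, not from the notes: it assumes an M/M/1 queue with FIFO service, and the function name simulate_mm1 and its parameters are ours) that estimates L, λ and W from one long sample path; for λ = 0.5 and µ = 1 the estimates should be close to 1, 0.5 and 2, consistent with L = λW.

```python
import random

def simulate_mm1(lam, mu, horizon, seed=1):
    # Simulate an M/M/1 queue on [0, horizon] and report the three long-run
    # averages in Little's Law: time-average number in system, arrival rate,
    # and average time in system per departed job.
    rng = random.Random(seed)
    t, x = 0.0, 0                        # current time, number in system
    area = 0.0                           # integral of X_u du
    arrivals, departed = 0, 0
    arrival_times = []                   # arrival epoch of each job (FIFO order)
    total_wait = 0.0
    next_arrival = rng.expovariate(lam)
    next_departure = float("inf")
    while True:
        t_next = min(next_arrival, next_departure, horizon)
        area += x * (t_next - t)
        t = t_next
        if t >= horizon:
            break
        if next_arrival <= next_departure:            # arrival event
            arrivals += 1
            arrival_times.append(t)
            x += 1
            if x == 1:
                next_departure = t + rng.expovariate(mu)
            next_arrival = t + rng.expovariate(lam)
        else:                                         # departure event (FIFO)
            total_wait += t - arrival_times[departed]
            departed += 1
            x -= 1
            next_departure = (t + rng.expovariate(mu)) if x > 0 else float("inf")
    return area / horizon, arrivals / horizon, total_wait / departed

print(simulate_mm1(lam=0.5, mu=1.0, horizon=100000.0))   # roughly (1.0, 0.5, 2.0)
```

Jobs still in the system at the horizon are simply not counted in the waiting-time average; over a long horizon this has a negligible effect on the estimates.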

Application 3.9 Based on the π's in (3.23) for the M/M/c model, we give formulas for certain (long-run) averages for the system and for the queue only (i.e., excluding service), using Little's Law to go from average numbers to average waiting times (for which we previously had no theory). The long-run average number of jobs in the M/M/c queue, call it L_q, is (apply (3.13) with cost function "# in queue", f(n) = n − c for n ≥ c and 0 otherwise):

L_q = average # in queue = ∑_{n=c}^∞ (n − c)π_n = π_c ∑_{j=1}^∞ j ρ^j = π_c ρ/(1 − ρ)².   (3.39)

Let W and W_q be the (long-run) average waiting times in the system and in queue, respectively. These are linked as follows: for any job j, the time in the system is the sum of W_{j,q} = time in queue plus S_j = time in service. Then

W := lim_{n→∞} (1/n) ∑_{j=1}^n W_j = lim_{n→∞} (1/n) ∑_{j=1}^n W_{j,q} + lim_{n→∞} (1/n) ∑_{j=1}^n S_j = W_q + 1/µ   w.p. 1,   (3.40)

the W_q arising by definition, and the 1/µ arising by the SLLN for the S_j. Then, letting L = average # in system = ∑_{n=1}^∞ n π_n, we have the links

L = λW   (Little's Law for the system)
  = λ(W_q + 1/µ)
  = L_q + λ/µ   (via Little's Law for the queue).   (3.41)

³ For ρ < 1, we have ∑_{i=1}^k iρ^i = ρ ∑_{i=1}^k (d/dρ)ρ^i = ρ (d/dρ) ∑_{i=0}^k ρ^i = ρ (d/dρ)[(1 − ρ^{k+1})/(1 − ρ)] = ρ [1 − (k + 1)ρ^k + kρ^{k+1}]/(1 − ρ)²; taking limits as k → ∞, we have ∑_{i=1}^∞ iρ^i = ρ/(1 − ρ)², which justifies the last equality in (3.39).
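As a quick numerical companion to (3.39)–(3.41), here is a minimal sketch (our own illustration; it assumes the standard M/M/c stationary distribution of (3.23)–(3.24), and the function name mmc_averages is ours, not from the notes) that computes L_q, W_q, W and L and lets one check L = λW numerically.

```python
from math import factorial

def mmc_averages(lam, mu, c):
    # Long-run averages for the M/M/c queue via (3.39) and (3.41).
    # Assumes rho = lam/(c*mu) < 1 so that the stationary distribution exists.
    rho = lam / (c * mu)
    a = lam / mu                                     # offered load, a = c*rho
    pi0 = 1.0 / (sum(a**n / factorial(n) for n in range(c))
                 + a**c / (factorial(c) * (1.0 - rho)))
    pi_c = pi0 * a**c / factorial(c)
    Lq = pi_c * rho / (1.0 - rho) ** 2               # (3.39): average number in queue
    Wq = Lq / lam                                    # Little's Law applied to the queue
    W = Wq + 1.0 / mu                                # (3.40): add the mean service time
    L = lam * W                                      # Little's Law for the system
    return L, Lq, W, Wq

# Exercise 3 below: lam = 5, mu = 6, c = 2 gives L = 120/119, roughly 1.0084.
print(mmc_averages(5.0, 6.0, 2))
```

For c = 1 the formulas reduce to L = ρ/(1 − ρ), matching the M/M/1 calculation in the solutions.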


♣ Study the following problems, which involve average waiting times and Little's Law. • E04 1, remainder on waiting times. • Exercise 3: comparison of M/M/1 versus M/M/2 (each server has the same speed) in terms of a (long-run average) cost that has waiting and staffing components.


3.7 Exercises

Recall: (i) e^x = ∑_{k=0}^∞ x^k/k!. (ii) For 0 ≤ ρ < 1, ∑_{i=0}^∞ ρ^i = 1/(1 − ρ).

1. (Random Balking.) Consider an M/M/1 queue with arrival rate λ and service rate µ. An arrival that finds i jobs in the system acts randomly, independently of everything else, as follows: with probability p_i, it joins the system; with probability 1 − p_i, it balks, i.e., does not join. Let X_t be the number of jobs in the system at time t. (i) Determine lim_{h→0} P(X_{t+h} = i + 1 | X_t = i)/h carefully, showing all your work. (ii) Identify q_{ij} := lim_{h→0} P(X_{t+h} = j | X_t = i)/h for all i ≠ j; no derivation is required. Given that (X_t) is a birth-death process, identify the birth and death rates. (iii) For the case p_i = 1/(i + 1) for all i = 0, 1, . . ., find the stationary distribution of (X_t) in terms of λ and µ.

2. (Number of servers adapts to the number of customers in the system.) Customers arrive at a system according to a Poisson process of rate λ. With X the number of customers in the system, assume that the number of servers is one whenever X ≤ m and two whenever X > m. Assume service times are Expon(µ). Show that the stationary distribution of X has the form

π_k = π_0 (2ρ)^k,   k = 1, 2, . . . , m;
π_k = π_0 (2ρ)^m ρ^{k−m},   k = m + 1, m + 2, . . . ,   (3.42)

where ρ = λ/(2µ), identifying results you use without proof. How can π_0 be determined?

3. Customers arrive at a system at rate 5 per hour, and the average time to service them is 1/6 hours. It costs 8x per hour to have x customers in the system (waiting cost) plus 5c per hour to have c servers (server cost). Find the c in {1, 2} that minimises the long-run average cost. State any assumptions made. Hint: Use (3.41) and the M/M/c stationary distribution, (3.23).

4. For the M/M/c model with arrivals of rate λ and with Expon(µ) service times, verify (3.30) by combining Application 2.11 (the waiting time when encountering k customers in queue has the Gamma(k + 1, cµ) distribution, see (2.18)) and the fact (from Example 3.5) that π_{c+k} = π_c ρ^k for k = 0, 1, . . ., where ρ = λ/(cµ). Hint: In the resulting doubly-indexed sum, change the order of summation.

5. (Failure of ASTA conclusions in a deterministic queue.) Let X_t be the number of jobs in a single-server system at time t, where X_0 = 0, arrivals occur at times 0, 10, 20, 30, . . ., and service times are 9 for each job. Here, the function (X_t : t ≥ 0) is deterministic. State it and determine the following limits:
(a) The long-run fraction of arrivals that find the server busy.
(b) The long-run fraction of time the server is busy.
Does the ASTA Theorem apply in this case?

6. (Solving certain balance equations via probability generating functions.) Jobs arrive in a system in batches of size b > 1, with batch-arrival events occurring according to a Poisson process of rate λ. There is one server, at which service times are Expon(µ) rv's, independent of everything else. There is infinite waiting space. Let X_t be the number of jobs in the system at time t.

(i) Given that (X_t) is a CTMC with values in {0, 1, 2, . . .}, briefly argue that its stationary distribution {π_i}_{i=0}^∞, when it exists, satisfies the following.

State            Rate Out = Rate In
0                π_0 λ = π_1 µ
i (1 ≤ i < b)    π_i (λ + µ) = π_{i+1} µ
i (b ≤ i)        π_i (λ + µ) = π_{i+1} µ + π_{i−b} λ

(ii) Let G(z) = ∑_{i=0}^∞ π_i z^i be the probability generating function (pgf) of the distribution {π_i}_{i=0}^∞. Show that G(z) = N(z)/D(z), where N(z) = µπ_0(1 − z) and D(z) = λz^{b+1} − (λ + µ)z + µ. Hint: Multiply the state-i equation above by z^i, for each i, then sum over all i (from 0 to ∞), then make appropriate "corrections".

(iii) To find π_0 we require 1 = ∑_{i=0}^∞ π_i = G(1) (normalising equation). It is given that G(1) = lim_{z↑1} G(z) (z increases to 1; continuity of G(·), Fact 1.18). Using L'Hôpital's rule, work out the limit as a function of π_0, and thus show that

π_0 = 1 − bλ/µ.

Note that the remaining π_i's are then determined by (1.24).

3.8 Solutions to Exercises

1. Short answer to (i). Arrivals happen with rate λ. When X_t = i, an arrival joins with probability p_i, so "customer-joins-the-system" events happen with rate λp_i.

Full answer to (i). Let A = # of arrivals in (t, t + h] and D = # of departures in the same interval. I indicates if a given arrival joins (1 = yes, 0 = no); I is independent of everything else, and P(I = 1 | X_t = i) = p_i. Then, intending h → 0,

P(X_{t+h} = i + 1 | X_t = i)
  = P(A = 1, D = 0, I = 1 | X_t = i) + P(A + D ≥ 2 and other conditions | X_t = i)   [the last term is o(h)]
  = P(A = 1 | X_t = i) P(D = 0 | X_t = i) P(I = 1 | X_t = i) + o(h)   by independence
  = [λh + o(h)][1 − µh + o(h)] p_i + o(h)
  = λp_i h + o(h),

where we need not specify the "other conditions". Thus

lim_{h→0} P(X_{t+h} = i + 1 | X_t = i)/h = lim_{h→0} (λp_i + o(h)/h) = λp_i.

Note: The thinning Proposition 2.14 is a similar idea.

(ii) The set of values that X_t may take is {0, 1, 2, . . .}. In (i) we showed q_{ij} = λp_i for j = i + 1 and all i. Similar to the derivations (3.1) to (3.3), we see that q_{ij} = µ for all i > 0 and j = i − 1; and q_{ij} = 0 for all other i ≠ j. That is, (X_t) is a birth-death process with birth rates λ_i = λp_i for all i and death rates µ_i = µ for all i > 0.

(iii) From the general birth-death solution (3.20) and p_i = 1/(i + 1) we have

π_k = π_0 [λ · (λ/2) · (λ/3) · · · (λ/k)]/µ^k = π_0 ρ^k/k!,   k = 1, 2, . . . ,

where ρ = λ/µ. Normalising gives π_0 = (1 + ∑_{k=1}^∞ ρ^k/k!)^{−1} = (e^ρ)^{−1} = e^{−ρ}.

2. With X_t the number of customers in the system at time t, (X_t) is a birth-death process taking values in {0, 1, 2, . . .}, with birth rates λ_n = λ for all n, and death rates µ_n = µ for n ≤ m and µ_n = 2µ for n > m. Then, using the standard birth-death result (3.20), we obtain the stated equations. To determine π_0, insert the π_i from (3.42) into the normalising equation ∑_{i=0}^∞ π_i = 1 and solve for π_0.

3. We assume that inter-arrival and service times are exponentially distributed and independent. Then the one-server and two-server systems are the M/M/1 and M/M/2 models, respectively. In our standard notation, λ = 5 and µ = 6.

By "cost" we mean long-run average cost per hour, in £. The waiting cost is 8L, where L, the long-run average number of customers in the system, is a known function of ρ = λ/(cµ), via L_q, from (3.39) and (3.41). The cost is 8L + 5c.

M/M/1 calculation. Using (3.23), that is, the stationary distribution for the M/M/c model, for c = 1, we have π_1 = (1 − ρ)ρ. Then, by (3.39), L_q = π_1 ρ/(1 − ρ)² = ρ²/(1 − ρ); and (3.41) becomes L = ρ²/(1 − ρ) + ρ = ρ/(1 − ρ). Then ρ = λ/µ = 5/6, L = 5, and the cost is 8L + 5·1 = 45.

M/M/2 calculation. Again by (3.23), this time for c = 2, we have π_2 = π_0 2ρ², where, from (3.24),

π_0 = [1 + 2ρ + 2ρ²/(1 − ρ)]^{−1} = (1 − ρ)/(1 + ρ).

Then (3.39) becomes

L_q = π_2 ρ/(1 − ρ)² = (π_0 2ρ²) ρ/(1 − ρ)² = 2ρ³/(1 − ρ²)

and (3.41) becomes

L = 2ρ³/(1 − ρ²) + 2ρ = 2ρ/(1 − ρ²).

We have ρ = λ/(2µ) = 5/12, L = 120/119 ≈ 1.0084, and the total cost is 8L + 5·2 ≈ 18.067. Thus, the 2-server system has lower cost.

4. First, the conditional probability in (3.29) is

P(W > s | X = c + k) = ∑_{i=0}^k e^{−cµs} (cµs)^i/i!,   s ≥ 0, k = 0, 1, . . .

(Application 2.11; Gamma(k + 1, cµ) tail probability, (2.18)). Recall that we have used ASTA to claim that lim_{n→∞} (1/n) ∑_{j=1}^n P(X_{T_j} = c + k) = π_{c+k}. Now, by (3.23) we have π_{c+k} = π_c ρ^k for k = 0, 1, . . ., where ρ = λ/(cµ). Thus (3.29) becomes

F(s) = ∑_{k=0}^∞ ∑_{i=0}^k e^{−cµs} (cµs)^i/i! · π_c ρ^k
     = π_c e^{−cµs} ∑_{i=0}^∞ (cµs)^i/i! ∑_{k=i}^∞ ρ^k   (reversed the summation order)
     = π_c e^{−cµs} ∑_{i=0}^∞ [(cµs)^i/i!] ρ^i/(1 − ρ)
     = [π_c/(1 − ρ)] e^{−(cµ−λ)s}   (e^x = ∑_{i=0}^∞ x^i/i! for x = cµsρ = λs).

5. For t from 0 to 9 we have X_t = 1, then for t from 9 to 10 we have X_t = 0; and this pattern of "cycles", of length 10 each, repeats forever. (We choose not to specify whether X_t is 0 or 1 at the times when it jumps from one value to the other, as this does not matter.) Thus:
(a) All arriving jobs find the server idle (X_t = 0 just before each arrival time), so the long-run fraction of arrivals that find the server busy is 0.
(b) The long-run fraction of time that the server is busy (equivalently, that X_t = 1) is 9/10.
The (a) and (b) above are, respectively, the left and right side in the ASTA Theorem's conclusion, (3.25). The theorem does not apply here.

6. (i) State changes occur as follows: when at state i, a batch-arrival event changes the state to i + b, while a service-completion event changes the state to i − 1; thus the generator entries are: q_{i,i+b} = λ for all i; q_{i,i−1} = µ for all i > 0; and q_{ij} = 0 for other i ≠ j. The stated equations are the usual balance equations (3.9) associated to this generator.

(ii) Multiplying and summing as suggested gives

λπ_0 + (λ + µ) ∑_{i=1}^∞ π_i z^i = µ ∑_{i=0}^∞ π_{i+1} z^i + λ ∑_{i=b}^∞ π_{i−b} z^i
                                 = (µ/z) ∑_{i=0}^∞ π_{i+1} z^{i+1} + λz^b ∑_{i=b}^∞ π_{i−b} z^{i−b},

where ∑_{i=1}^∞ π_i z^i = G(z) − π_0, ∑_{i=0}^∞ π_{i+1} z^{i+1} = G(z) − π_0, and ∑_{i=b}^∞ π_{i−b} z^{i−b} = G(z). The key idea above is that each of the infinite sums in the first equation gives G(z) after the "correction" steps seen in the second equation. Now re-arrange to solve for G:

G(z)[λ + µ − µ/z − λz^b] = µπ_0 (1 − 1/z)   ⇒   G(z) = µπ_0(1 − z)/[λz^{b+1} − (λ + µ)z + µ].

(iii) The derivatives of N(·) and D(·) are N′(z) = −µπ_0 and D′(z) = (b + 1)λz^b − λ − µ. Using L'Hôpital's rule, 1 = G(1) = lim_{z↑1} G(z) = N′(1)/D′(1) = −µπ_0/(bλ − µ), hence π_0 = 1 − bλ/µ.


Chapter 4

Sampling from Distributions

We study some general methods for simulating (sampling) a value from a given univariate distribution, with support contained in the real numbers. The source of randomness is a pseudo-random number generator, assumed to return independent samples from the Unif(0, 1) distribution, defined after (1.10).

4.1 Preliminaries

F will generally denote a cdf. Given a cdf F, we write "X ∼ F" to mean "X is a sample of the rv whose cdf is F". Given a pdf f, we write "X ∼ f" to mean "X is a sample of the rv whose pdf is f". The notions of "inf" (infimum) and "min" are taken to coincide. Likewise, "sup" (supremum) and "max" will coincide.

4.2 Inversion

Definition 4.1 The inverse of the cdf F is the function

F^{−1}(u) = inf{x : F(x) ≥ u} = min{x : F(x) ≥ u},   0 < u < 1.

To give a more explicit definition of the inverse, note first that any cdf F has a left limit everywhere (as it is non-decreasing), i.e., for any real a we can put

F(a−) = lim_{x→a−} F(x).

Definition 4.2 Let F be a cdf.

(a) If F is continuous and strictly increasing on an interval (a, b), then, for any u in (F(a), F(b)), define F^{−1}(u) as the unique x in (a, b) that solves F(x) = u.

(b) If F has a jump at a, i.e., F(a−) < F(a), then, for any u in (F(a−), F(a)], define F^{−1}(u) = a.

Remark 4.3 In case (a), the existence of x follows from the Intermediate Value Theorem, and the uniqueness results from the strictly-increasing assumption. In this case, F(F^{−1}(u)) = u for all u in (0, 1). In case (b), where F has a jump at a, note that F(F^{−1}(u)) = F(a) > u for u in (F(a−), F(a)), so F^{−1} is not the standard inverse function.

The inversion method returns F^{−1}(U), where U ∼ Unif(0, 1) is determined by a pseudo-random-number generator. We now show the method's correctness.

Proposition 4.4 Let F be a cdf and let F^{−1} be its inverse. If U ∼ Unif(0, 1), then F^{−1}(U) has cdf F.

Proof. The key fact is the equivalence

F(x) ≥ u ⇔ x ≥ F^{−1}(u),   (4.1)

which is a consequence of the fact that F is a nondecreasing, right-continuous function. We omit a detailed proof of (4.1). Now

P(F^{−1}(U) ≤ x) = P(U ≤ F(x))   by (4.1)
                 = F(x)   since U ∼ Unif(0, 1),

i.e., F^{−1}(U) has cdf F. ✷

Remark 4.5 On a computer, provided only that we can evaluate F(x) at any x, F^{−1}(u) can easily be computed for any u, via a bracketing/bisection method, for example. This method is simple, but beyond our scope.
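Remark 4.5 can be made concrete with a minimal bisection sketch (our own illustration, not from the notes): given any cdf F that we can evaluate, it approximates F^{−1}(u) = min{x : F(x) ≥ u} by first bracketing and then repeatedly halving the bracket.

```python
import math

def cdf_inverse(F, u, tol=1e-9):
    # Approximate F^{-1}(u) = min{x : F(x) >= u} for 0 < u < 1 by bisection.
    # Step 1: find a bracket [lo, hi] with F(lo) < u <= F(hi).
    lo, hi = -1.0, 1.0
    while F(lo) >= u:
        lo *= 2.0
    while F(hi) < u:
        hi *= 2.0
    # Step 2: bisect; the invariant F(lo) < u <= F(hi) is preserved.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) >= u:
            hi = mid
        else:
            lo = mid
    return hi

# Check against the exact Expon(2) inverse, -log(1 - u)/2, at u = 0.5.
expon2_cdf = lambda x: 1.0 - math.exp(-2.0 * x) if x > 0 else 0.0
print(cdf_inverse(expon2_cdf, 0.5), -math.log(0.5) / 2.0)
```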

4.2.1 Calculating the Inverse Explicitly: Examples

Inversion of a discrete cdf. Section 1.3.1 explained that a discrete distribution can always be reduced to ordered support points x_1 < x_2 < x_3 < . . ., each x occurring with positive probability, and the corresponding cdf F satisfies F(x_1) < F(x_2) < F(x_3) < . . .. Here, the inverse of F is (Definition 4.2(b)):

F^{−1}(u) = x_1   for 0 < u ≤ F(x_1),
            x_2   for F(x_1) < u ≤ F(x_2),
            . . .
            x_k   for F(x_{k−1}) < u ≤ F(x_k),
            . . .
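In code, this piecewise inverse is just a search for the first support point whose cdf value reaches u. A minimal sketch (our own; it assumes the support points and their probabilities are supplied as lists):

```python
import random

def sample_discrete(xs, ps):
    # Inversion for a discrete distribution: xs are the ordered support points
    # x_1 < x_2 < ..., ps the corresponding probabilities (summing to 1).
    u = random.random()
    cumulative = 0.0
    for x, p in zip(xs, ps):
        cumulative += p              # cumulative = F(x)
        if u <= cumulative:
            return x
    return xs[-1]                    # guard against rounding error in the p's

# Example: P(X = 1) = 0.2, P(X = 2) = 0.5, P(X = 3) = 0.3.
print(sample_discrete([1, 2, 3], [0.2, 0.5, 0.3]))
```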

The inverse of some simple continuous cdf's is calculated explicitly below.


Example 4.6 Consider the Unif(5, 8) distribution. By (1.10), the cdf is

F(x) = (x − 5)/3,   5 ≤ x ≤ 8.

Definition 4.2(a) applies on the entire support, so solve

(x − 5)/3 = u ⇔ x = 5 + 3u = F^{−1}(u).

Example 4.7 In Example 1.6 we found that X = "equiprobable outcome on [2, 4] ∪ [5, 6]" has cdf

F(x) = (x − 2)/3         for 2 ≤ x ≤ 4,
       2/3               for 4 < x ≤ 5,
       2/3 + (x − 5)/3   for 5 < x ≤ 6.

This cdf is continuous, so its inverse is found by solving F(x) = u (Definition 4.2(a)). For u in (F(2), F(4)), x must be between 2 and 4, so solve

(x − 2)/3 = u ⇔ x = 2 + 3u = F^{−1}(u).

For u in (F(5), F(6)), x must be between 5 and 6, so solve

2/3 + (x − 5)/3 = u ⇔ x = 3 + 3u = F^{−1}(u).

Finally, F^{−1}(2/3) = 4. In summary,

F^{−1}(u) = 2 + 3u for u ≤ 2/3;   3 + 3u for u > 2/3.

Example 4.8 Consider the Expon(λ) distribution. Recall that the cdf is

F(x) = 1 − e^{−λx},   x ≥ 0.

Definition 4.2(a) applies on the entire support, so solve

1 − e^{−λx} = u ⇔ x = −(1/λ) log(1 − u) = F^{−1}(u).
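As a short illustration (our own sketch), Example 4.8 in code:

```python
import math, random

def sample_expon(lam):
    # Inversion for Expon(lam): F^{-1}(u) = -log(1 - u)/lam.
    u = random.random()
    return -math.log(1.0 - u) / lam
```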

♣ Study the following. • Exercise 1 (part (b) = inversion of the Geometric distribution = E11 3(a)). • E04 4(i): Triangular distribution with support parametrised by some α. Helps emphasise that in solving F(x) = u, only x in the support are relevant. Failing to apply this restriction, as F(x) is a quadratic, there are two solutions, and the one outside the support is irrelevant. • E03 4 (inversion part, first 5 lines). • E10 2(c) (mixture of exponentials with disjoint support).


4.3 Acceptance-Rejection

4.3.1 Method and Theory

The problem is to sample from a given distribution with pdf f and support S. That is, the output X should satisfy, for all x ∈ S,

lim_{ε→0+} P(x − ε < X ≤ x + ε)/(2ε) = f(x),   or, less precisely,   P(X ∈ dx) = f(x)dx,   (4.2)

where dx means an infinitesimal (i.e., arbitrarily small) interval containing x.

The acceptance-rejection (A/R) method samples from another pdf, g (it is assumed known how to sample from g), and rejects samples in a way so that accepted samples have the desired pdf, f. There is the key requirement that there exists a finite constant K such that

a(x) := f(x)/(Kg(x)) ≤ 1   for all x ∈ S.   (4.3)

The need for this is explained in Remark 4.9 below. The sampling works as follows:

1. Sample Y ∼ g and U ∼ Unif(0, 1), independently of any other samples.
2. If U ≤ a(Y), set X ← Y (accept) and exit. Otherwise (reject), return to step 1.

Note that the output X is the Y conditioned by the acceptance event

A = {U ≤ a(Y)}.

g is called the instrumental (trial, candidate, proposal) pdf, and Kg the envelope.
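The two steps translate directly into a short rejection loop. The following generic sketch (our own illustration, not from the notes) takes a sampler for g and the acceptance function a(·) = f/(Kg), assumed to satisfy (4.3), and returns one accepted sample.

```python
import random

def accept_reject(sample_g, a):
    # sample_g(): returns one sample Y from the trial pdf g.
    # a(y): acceptance function f(y)/(K*g(y)), assumed to lie in [0, 1].
    while True:
        y = sample_g()                   # Step 1: Y ~ g
        u = random.random()              #         U ~ Unif(0, 1)
        if u <= a(y):                    # Step 2: accept with probability a(Y)
            return y

# As in Example 4.11 below: Triangular(1, 1, 6) target with Unif(1, 6) proposal,
# K = 2 and a(x) = (6 - x)/5.
print(accept_reject(lambda: random.uniform(1.0, 6.0), lambda y: (6.0 - y) / 5.0))
```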

That X has pdf f can be seen roughly as follows:

P(X ∈ dx) = P(Y ∈ dx | A) = P((Y ∈ dx) ∩ A)/P(A) = P(A | Y ∈ dx) P(Y ∈ dx)/P(A)
          = P(U ≤ a(Y) | Y ∈ dx) g(x)dx/P(A) = a(x)g(x)dx/P(A) = [f(x)/(K P(A))] dx.

For simplicity, assume the above is a pdf; then it must integrate to one, so

1 = ∫_S f(x)/(K P(A)) dx = [1/(K P(A))] ∫_S f(x)dx = 1/(K P(A))

(as the pdf f has support S, so ∫_S f(x)dx = 1). That is, the "less precisely" form in the right of (4.2) is true. Moreover, we see that

P(A) = 1/K   (4.4)

and that K ≥ 1 always.


Remark 4.9 The step P(U ≤ a(Y) | Y ∈ dx) = a(x) in the proof holds only if a(x) ≤ 1 for all x (since a(x) > 1 cannot be a probability). This is why (4.3) is required.

A more careful proof of correctness of A/R shows the left of (4.2), as follows.

Lemma 4.10 Assume that a(·) satisfies

|Y − x| ≤ ε ⇒ a(x) − Mε ≤ a(Y) ≤ a(x) + Mε,

where M < ∞ can be taken as the maximum of the absolute derivative (slope) of a(·). Then

lim_{ε→0+} P(|Y − x| ≤ ε | A)/(2ε) = f(x)/(K P(A)),   (4.5)

provided that f(x) > 0 and g(x) < ∞, hence a(x) > 0.

Proof. Write the conditional probability in (4.5) as P(B)/P(A), where

B = {|Y − x| ≤ ε, U ≤ a(Y)}.

The main idea is to bound the event B by a subset and a superset whose probabilities tend to the correct limit. The bounds are

{|Y − x| ≤ ε, U ≤ a(x) − Mε} ⊂ B ⊂ {|Y − x| ≤ ε, U ≤ a(x) + Mε}.

Taking probabilities,

P(|Y − x| ≤ ε, U ≤ a(x) − Mε) ≤ P(B) ≤ P(|Y − x| ≤ ε, U ≤ a(x) + Mε)
⇒ P(|Y − x| ≤ ε) P(U ≤ a(x) − Mε) ≤ P(B) ≤ P(|Y − x| ≤ ε) P(U ≤ a(x) + Mε)
⇒ P(|Y − x| ≤ ε)[a(x) − Mε] ≤ P(B) ≤ P(|Y − x| ≤ ε)[a(x) + Mε]

by the independence of Y and U, and by then using that U ∼ Unif(0, 1). Dividing by 2εP(A) throughout, we have lower and upper bounds for our target:

P(|Y − x| ≤ ε)[a(x) − Mε]/(2εP(A)) ≤ P(B)/(2εP(A)) ≤ P(|Y − x| ≤ ε)[a(x) + Mε]/(2εP(A)).   (4.6)

Now, take limits as ε → 0+ and observe:

• lim_{ε→0+} P(|Y − x| ≤ ε)/(2ε) = g(x), as Y has pdf g;
• lim_{ε→0+} [a(x) − Mε] = lim_{ε→0+} [a(x) + Mε] = a(x) > 0.

Thus, both the lower and upper bound in (4.6) converge to g(x)a(x)/P(A) = f(x)/[K P(A)], and thus so does the middle. ✷


4.3.2 Feasibility and Efficiency

Each trial is accepted with probability P(A) = 1/K, independently of other trials. Thus, the number of trials until acceptance has the Geometric(P(A)) distribution, whose mean is 1/P(A) = K. Let us think "maximum sampling efficiency" equals "minimum mean number of trials until acceptance", i.e., "maximum P(A)", i.e., "minimum K". The constraint (4.3) on K can be rewritten as K ≥ sup_{x∈S} f(x)/g(x) = max_{x∈S} f(x)/g(x), so the minimum K that satisfies the constraint is

K = sup_{x∈S} f(x)/g(x) = max_{x∈S} f(x)/g(x).   (4.7)

The A/R method always requires setup, meaning choosing g and then determining K by solving this maximisation problem. A considerable limitation is that if the K in (4.7) is infinite, then A/R is impossible for this f and g; see Example 4.13 below.

♣ Study the following. • E07 4. Develops an A/R sampler for the target density proportional to x^{α−1}e^{−x}, x > 0 (the Gamma(α, 1) distribution), for the case α < 1. Note the density goes to ∞ as x → 0. The problem statement gives the envelope, and the problem is then straightforward. Choosing the envelope was a more subtle problem, which was not asked here, because of the requirement of the existence of a finite K; this was achieved by the envelope choice (up to a proportionality constant) x^{α−1} for x < 1, and e^{−x} (the Expon(1) pdf) for x > 1. The envelope is sampled by inversion (that is, inverting the associated cdf).

Example 4.11 The pdf to sample from is

f(x) = (2/25)(6 − x) for 1 ≤ x ≤ 6, and 0 otherwise.

This is the Triangular(1, 1, 6) distribution, the three parameters being minimum, mode, and maximum. Let g be the pdf of the Unif(1, 6) distribution, i.e., the constant 1/5 on [1, 6]. As setup, we must find (4.7). Focus on f(x)/g(x) = (2/5)(6 − x) in the support of f, i.e., [1, 6], and see that it attains the maximum value of 2 at x = 1. Thus K = 2, a(x) = (6 − x)/5, and P(A) = 1/K = 1/2. To see the sampling in action, suppose the sampled (Y, U) pairs are (4.9, 0.3) and (1.7, 0.8). Then calculate:

Y     U     a(Y)   Is U ≤ a(Y)?
4.9   0.3   0.22   No
1.7   0.8   0.86   Yes

Here, two trials were needed until acceptance. The number 1.7 is a random sample from the pdf f.


When f and g involve the exponential function, it is typically easier to optimise log(f/g) (log = "natural logarithm", the inverse function to the exponential one) rather than f/g, as the former has simpler derivatives. A maximiser of f/g is also a maximiser of log(f/g), and vice versa. The following is a simple illustration of this.

Example 4.12 Suppose we want to sample from the pdf f(x) = (2/π)^{1/2} e^{−x²/2} with support (0, ∞). (This is the pdf of |Z|, where |·| denotes absolute value and Z ∼ N(0, 1), the standard normal distribution (mean 0, variance 1).) Consider acceptance-rejection with g the pdf of the Expon(1) distribution, i.e., g(x) = e^{−x} on (0, ∞). To find the optimal K, maximise log[f(x)/g(x)] = log(2/π)/2 + x − x²/2 on (0, ∞). The maximiser is x = 1, giving K = (2e/π)^{1/2} and acceptance probability 1/K ≈ 0.76.
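A minimal sketch of Example 4.12 (our own; note that with K = (2e/π)^{1/2} the acceptance function f(y)/[Kg(y)] simplifies to exp(−(y − 1)²/2)):

```python
import math, random

def sample_halfnormal():
    # A/R for f(x) = sqrt(2/pi) * exp(-x^2/2), x > 0, with Expon(1) proposal.
    # With K = sqrt(2e/pi), the acceptance function is a(y) = exp(-(y - 1)^2 / 2).
    while True:
        y = -math.log(1.0 - random.random())        # Expon(1) sample by inversion
        if random.random() <= math.exp(-(y - 1.0) ** 2 / 2.0):
            return y
```

About 76% of trials are accepted, matching 1/K above.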

We write f(x) ∝ h(x) on a given support S to mean that f(x) = h(x)/c, where c = ∫_S h(t)dt does not matter in the argument.

Example 4.13 The Gamma(α, λ) distribution (shape α > 0, scale λ > 0), seen earlier for α integer, has pdf f(x) ∝ x^{α−1}e^{−λx} with support x > 0. Consider A/R with f and g being Gamma with respective shapes a, b and common scale. Then f(x)/g(x) ∝ x^{a−b}. If we choose b < a, then max_{x>0} x^{a−b} = ∞, since x^{a−b} → ∞ as x → ∞, and A/R is impossible. If we choose b > a, then max_{x>0} x^{a−b} = ∞, now because x^{a−b} → ∞ as x → 0, so again A/R is impossible.

A general case where A/R is possible is when the pdf f has support (a, b) with a and b both finite, and moreover f is bounded, i.e., max_{a<x<b} f(x) < ∞; then the Unif(a, b) pdf can serve as g, and (4.7) gives the finite constant K = (b − a) max_{a<x<b} f(x).


4.4 Exercises

1. Describe in full the inversion method of simulating each of the distributions below.
(a) Uniform on the interval (−5, 5).
(b) Suppose we do independent trials, where each trial succeeds with probability p, and let X be the number of trials up to and including the first success. Then X is said to have the Geometric(p) distribution. Derive the cdf of X and then describe how X can be sampled by the inversion method.

2. Based on the definition of the Geometric(p) distribution given above, give a method for simulating it based only on an unlimited supply of Unif(0, 1) pseudo-random numbers. Hint: Simulate the success indicators; neither inversion nor acceptance-rejection is needed.

3. We want to sample from the Gamma(a, 1) distribution whose pdf is

f_a(x) = (1/Γ(a)) x^{a−1} e^{−x},   x > 0,

where Γ(a) = ∫_0^∞ x^{a−1}e^{−x} dx. Consider acceptance-rejection methods based on the family of trial pdf's

g_λ(x) = λe^{−λx},

where λ may depend on a. For given a > 1, show that across envelopes of the form Kg_λ, 0 < λ < 1, the choice λ = 1/a maximises the acceptance probability. Then state the resulting acceptance probability. Hint: By maximising f_a(x)/g_λ(x) across x, determine the corresponding best A/R constant, K = K*(a, λ). Then, with a being fixed, minimise K*(a, λ) across λ. It will be easier to work with logarithms of functions when seeking the min or max.

4.5 Solutions to Exercises

1. In each case, we first write down the cdf F explicitly, and then find the inverse F^{−1} explicitly. Having done that, the random sample from F is F^{−1}(U), where U is a Unif(0, 1) (pseudo-)random number.

(a) Let f be the pdf. The uniform distribution on [−5, 5] has a pdf that is a constant c > 0 on this interval and 0 elsewhere. Find c:

1 = ∫_{−∞}^∞ f(t) dt = ∫_{−5}^5 c dt = 10c ⇒ c = 1/10.

That is,

f(t) = 1/10 for −5 ≤ t ≤ 5, and 0 otherwise.

Thus, the cdf is

F(x) = ∫_{−∞}^x f(t) dt = ∫_{−5}^x (1/10) dt = (x + 5)/10   for −5 ≤ x ≤ 5.

Find the inverse by solving for x:

(x + 5)/10 = u ⇔ x = −5 + 10u = F^{−1}(u).

(b) (i) Find the cdf. To this end, the easiest way is:

P(X > k) = P(first k trials failed) = (1 − p)^k,   k = 0, 1, 2, . . . .

The cdf, F, is

F(k) = P(X ≤ k) = 1 − P(X > k) = 1 − (1 − p)^k,   k = 0, 1, 2, . . . .

(ii) To state the inverse explicitly, we need to find the smallest integer k satisfying

1 − (1 − p)^k ≥ u ⇔ (1 − p)^k ≤ 1 − u ⇔ k log(1 − p) ≤ log(1 − u) ⇔ k ≥ log(1 − u)/log(1 − p).

(In the last step, the division by log(1 − p) < 0 changed the direction of the inequality.) Thus, F^{−1}(u) = ⌈log(1 − u)/log(1 − p)⌉, where ⌈x⌉ is the standard ceiling function (the smallest integer that is ≥ x). Inversion returns F^{−1}(U), where U ∼ Unif(0, 1).
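A sketch of this inversion formula (our own; the max(1, ...) simply guards the measure-zero case u = 0 that a pseudo-random generator can return):

```python
import math, random

def sample_geometric_inversion(p):
    # Inversion for Geometric(p): F^{-1}(u) = ceil(log(1 - u) / log(1 - p)).
    u = random.random()
    return max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)))
```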

2. Let U_i ∼ Unif(0, 1), i = 1, 2, . . ., be independent (determined by a pseudo-random-number generator). Return the first i such that U_i ≤ p.
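In code (our own sketch):

```python
import random

def sample_geometric_trials(p):
    # Simulate Bernoulli(p) success indicators; return the index of the first success.
    i = 1
    while random.random() > p:
        i += 1
    return i
```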

3. For fixed a and λ, the smallest possible K is K*(a, λ) = max_{x>0} r_{a,λ}(x), where

r_{a,λ}(x) := f_a(x)/g_λ(x) = x^{a−1} e^{(λ−1)x}/(λΓ(a)),   x > 0.

Maximise log r_{a,λ}(x) over the support of f_a, i.e., x > 0:

log r_{a,λ}(x) = (a − 1) log x + (λ − 1)x − log[λΓ(a)],
(d/dx) log r_{a,λ}(x) = (a − 1)/x + (λ − 1) = 0 ⇒ x*(a, λ) = (a − 1)/(1 − λ),
(d²/dx²) log r_{a,λ}(x) = −(a − 1)/x² < 0.

Thus the x*(a, λ) above is the maximiser. (Note that the given constraints on a and λ imply x* > 0, which is in the support. If it was not in the support, it would be irrelevant; it is the maximum over the support that matters.)

For fixed a, maximising the acceptance probability is equivalent to minimising the A/R constant K*(a, λ) over λ; first, form the logarithm:

log K*(a, λ) = log r_{a,λ}(x*) = (a − 1) log(a − 1) − (a − 1) log(1 − λ) + (1 − a) − log λ − log Γ(a).

Now minimise this over 0 < λ < 1:

∂ log K*/∂λ = (a − 1)/(1 − λ) − 1/λ = 0 ⇒ λ*(a) = 1/a,
∂² log K*/∂λ² = (a − 1)/(1 − λ)² + 1/λ² > 0.

Thus λ*(a) above is the minimiser. Putting K*(a) = K*(a, λ*), i.e., the constant associated to the best trial pdf, g_{λ*(a)}, we have

log K*(a) := log K*(a, λ*(a))
           = (a − 1) log(a − 1) − (a − 1) log[(a − 1)/a] + (1 − a) + log a − log Γ(a)
           = a log a + (1 − a) − log Γ(a)   after cancellations,

i.e., K*(a) = a^a e^{1−a}/Γ(a). The acceptance probability is, as always, 1/K*(a).
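A sketch of the resulting sampler for a > 1 (our own illustration): the proposal is Expon(1/a), and since x*(a, 1/a) = a, the acceptance function r_{a,1/a}(y)/K*(a) simplifies to (y/a)^{a−1} exp((1/a − 1)(y − a)).

```python
import math, random

def sample_gamma(a):
    # A/R for Gamma(a, 1), a > 1, with the optimal Expon(1/a) trial pdf of Exercise 3.
    # Mean number of trials until acceptance is K*(a) = a^a e^{1-a} / Gamma(a).
    while True:
        u = 1.0 - random.random()                    # u in (0, 1]
        y = -a * math.log(u)                         # Expon(1/a) sample by inversion
        accept = (y / a) ** (a - 1) * math.exp((1.0 / a - 1.0) * (y - a))
        if random.random() <= accept:
            return y
```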



Chapter 5

Guidance for Exam Preparation

Questions 1 and 2 are compulsory. Of Questions 3 and 4, only the best answer will count.

Generally, it is good to state clearly any assumptions made or missing information, so as to show what you know, even if incomplete.

More detail is given below for each question. Listed is the material (including past exam papers) that can be given higher priority. "E04 4(i)" means "Exam of June 2004, Question 4(i)", and so on.

Question 1 (a): 9 points. The Exponential Distribution and Section 2.2, particularly the memoryless property (2.4), its derivation and significance. Exam E11 4(a).

Question 1 (b): Modelling, 31 points. A modelling question, typically involving the following.

1. Identify an appropriate CTMC, its generator, and the equations that determine the stationary distribution, assuming one exists. A classic model is a birth-death process with finite or infinite state space.

2. Given the stationary distribution of this CTMC, calculate performance by applying some or all of (3.13), (3.15), (3.17), and Little's Law in the form L = λW. Examples 3.2 and 3.4 and Application 3.9. Chapter-3 Exercise 2.

Worked examples to study: E03 2: two stations in tandem. E08 3: finite-state-space CTMC; the policy of preferring A over B, together with the question "find the long-run fraction of time A is busy", suggests that the state should indicate the idle/busy state of each server separately. E04 1: M/M/1 versus an M/M/2 with half-as-fast servers, i.e., the same overall speed. E06 1: centralised versus decentralised M/M/· systems. E08 1: M/M/1. E11 1: M/M/2. Chapter-3 Exercise 3: a comparison of M/M/1 versus M/M/2 involving waiting and server costs.

The following problems should be studied only with regard to obtaining the balance equations. The explicit stationary distribution (the solution to these equations) via probability generating functions (pgf's) is not examinable. E04 2: the M/E_k/1 model. Exercise 6, E06 3, E07 2, E10 3: batch arrivals.


Question 2: Chapter 4, 26 points.

1. Section 4.2, especially Definitions 4.1 and 4.2 and Proposition 4.4. Exams E10 2(a)-(b), E11 3(b).
2. Applying the inversion method. Exercise 1. Exams E03 4, E04 4(i), E07 4, E08 4, E10 2(c), E11 2(a), E11 3(a).
3. Section 4.3. Exam E11 3(c).
4. Applying the acceptance-rejection method. Exercise 3. Exams E03 4, E06 4(b) excluding the "discuss" part, E07 4, E08 4, E09 2(b), E10 4(a), E11 2(b).

Question 3: Chapter 2, 34 points. Emphasis on definitions and on analysis similar or identical to the notes. Study in priority order:

1. The Poisson-process basics: Definitions 2.4 and 2.5, calculations as in Example 2.7 and Exercise 1. Proof of the "⇒" part in Theorem 2.6, which essentially is the proof of (2.11). Section 2.4.2. Exams E04 3, E07 3 (first 15 points).
2. Merging of Poisson processes, Section 2.4.4, especially Proposition 2.10 and Exercise 2. Poisson process of general rate function, Section 2.5. Exercises 3, 4. To solve E03 3, recognise it is an NHPP and identify the rate function; the solution is then standard, obtained exactly as in Exercise 3. Exam E10 4(b).
3. Convergence as in Section 2.7.

Typical partial question:

Consider a certain randomly occurring event, and let N_t be the number of events that occur up to time t. State appropriate conditions about the N_t such that (N_t : t ≥ 0) is a Poisson process. The conditions are about: (i) independence; (ii) stationarity, meaning that certain distributions do not depend on time; and (iii) the probabilities that one and two events occur, respectively, in an interval of length h, as h → 0.
(11 points)

Question 4: Chapter 3, 34 points. Emphasis on definitions and on analysis similar or identical to the notes. Study in priority order:

1. Continuous-time Markov chains, Section 3.3. The birth-death process, Section 3.2. Derivation of Kolmogorov-type differential equations (3.6), or something similar. Exam E06 2.
2. Long-run behaviour, Sections 3.4 and 3.5, especially Section 3.5.1. Exams E03 1, E07 3 (last 10 points), E11 4(b).
3. Careful statement of Little's Law, as in Theorem 3.8.

Typical partial question:

Put X_t for the number of jobs in a system at time t, and assume that for h ≥ 0 we have X_{t+h} = X_t + B − D, where, conditional on X_t = i, the random variables B and D (counts of new "births" and "deaths" during (t, t + h], respectively) are independent with

P(B = 1 | X_t = i) = λ_i h + o(h),   P(B ≥ 2 | X_t = i) = o(h),
P(D = 1 | X_t = i) = µ_i h + o(h),   P(D ≥ 2 | X_t = i) = o(h),   i > 0.

1. Determine, showing all your work,

lim_{h→0} P(X_{t+h} = i + 1 | X_t = i)/h

for any i.
(8 points)

2. Determine, showing all your work,

lim_{h→0} P(X_{t+h} = j | X_t = i)/h

for |j − i| ≥ 2, i.e., the target state j differs from the starting state i by 2 or more.
(9 points)
