
MATH3013: Simulation and Queues

Athanassios (Thanos) N. Avramidis

School of Mathematics
University of Southampton
Highfield, Southampton, United Kingdom

April 2012


Contents

1 Probability
  1.1 Preliminaries
  1.2 Probability Spaces*
  1.3 Random Variables and Distributions
    1.3.1 Discrete distributions
    1.3.2 Continuous distributions
  1.4 Conditional Probability
  1.5 Independence
  1.6 Mean and Variance, and Sums
  1.7 Probability Generating Functions
  1.8 Two Major Limit Theorems
  1.9 Questions / Exercises
  1.10 Solutions to Exercises

2 Poisson and Related Processes
  2.1 Preliminaries
  2.2 Residual Time and Memoryless Property
  2.3 Counting Processes
  2.4 The Poisson Process
    2.4.1 Distribution of counts (the N_t process)
    2.4.2 Distribution of Times (the X_n and S_n)
    2.4.3 Simulating a Poisson Process
    2.4.4 Merging Poisson Processes
  2.5 Poisson Process of General Rate Function
    2.5.1 Thinning a Poisson Process
    2.5.2 Simulating an NHPP via Thinning
  2.6 Estimating a Poisson Process: Brief View
  2.7 Large-t N_t/t via Strong Law of Large Numbers
  2.8 Exercises
  2.9 Solutions to Exercises

3 Queues
  3.1 Preliminaries
    3.1.1 Queues: Terminology and Global Assumptions
  3.2 The Birth-Death Process
  3.3 Continuous-Time Markov Chains (CTMCs)
    3.3.1 Generator
    3.3.2 Time-dependent Distribution and Kolmogorov's Differential System
  3.4 Long-Run Behaviour
    3.4.1 Long-Run Average Cost
    3.4.2 Explicit Stationary Distributions for Specific Models
  3.5 Arrivals That See Time Averages
    3.5.1 A Steady-State Delay Distribution
  3.6 Little's Law
  3.7 Exercises
  3.8 Solutions to Exercises

4 Sampling from Distributions
  4.1 Preliminaries
  4.2 Inversion
    4.2.1 Calculating the Inverse Explicitly: Examples
  4.3 Acceptance-Rejection
    4.3.1 Method and Theory
    4.3.2 Feasibility and Efficiency
  4.4 Exercises
  4.5 Solutions to Exercises

5 Guidance for Exam Preparation


Chapter 1

Probability

This chapter builds foundations in probability, the main mathematical tool behind queues and stochastic simulation. Because the material is a foundation rather than an end in itself, the examination places less emphasis on it. Guidance for the examination is given in Chapter 5.

1.1 Preliminaries

The expression "x := y" defines x as being equal to y, while "x =: y" defines y as being equal to x.

1.2 Probability Spaces*

A random experiment involves an outcome that cannot be determined in advance. The set of all possible outcomes is called the certain event and is denoted Ω.

An event is a subset of the certain event. An event A is said to occur if and only if the observed outcome ω of the experiment is an element of the set A.

Example 1.1 Consider the experiment of flipping a coin once. The two possible outcomes are "Heads" and "Tails", and a natural certain event is the set {H, T}.

Given a certain event Ω and an event A, the event that occurs if and only if A does not occur is called the complement of A and is denoted A^c. That is,

A^c = {ω ∈ Ω : ω ∉ A}.

Given two events A and B, their union is the event "A or B", also written

A ∪ B = {ω ∈ Ω : ω ∈ A or ω ∈ B}.


The intersection of A and B is the event "A and B", also written

A ∩ B = {ω ∈ Ω : ω ∈ A and ω ∈ B}.

The operations of complement, union, and intersection arise frequently and give new events, for which we observe the following identities:

(A ∪ B)^c = A^c ∩ B^c,   (A ∩ B)^c = A^c ∪ B^c.   (1.1)

The first says "not (A or B)" equals "(not A) and (not B)". The second says "not (A and B)" equals "(not A) or (not B)".

The set containing no elements is called the empty event and is denoted ∅. Note that Ω^c = ∅ and ∅^c = Ω.

Event A is said to imply event B, written A ⊂ B, if every element of A belongs to B; in other words, the occurrence of A makes the occurrence of B certain. This is also written as B ⊃ A.

The union of several (i.e., more than two) events means the occurrence of at least one of the events, and the intersection of several events means the occurrence of all the events.

Two events A and B are called disjoint if they cannot happen simultaneously; equivalently, they have no elements in common, i.e.,

A ∩ B = ∅.

A family of events is called disjoint if every pair of them is disjoint.

The following is our definition of probability.

Definition 1.2 Let Ω be a certain event and let P be a function that assigns a number to each event. Then P is called a probability provided that

1. for any event A, 0 ≤ P(A) ≤ 1;

2. P(Ω) = 1;

3. for any sequence A_1, A_2, . . . of disjoint events,

P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).   (1.2)

The following summarises properties of a probability P.

Proposition 1.3 (a) P(∅) = 0.

(b) For any event A, P(A) + P(A^c) = 1.

(c) If events A and B satisfy A ⊂ B, then P(A) ≤ P(B).

(d) If events A_1, A_2, . . . , A_n are disjoint, then P(∪_{i=1}^n A_i) = Σ_{i=1}^n P(A_i).

Proof. Left as exercise. ✷


1.3 Random Variables and Distributions

Definition 1.4 A random variable (rv) X with values in the set E is a function that assigns a value X(ω) in E to each outcome ω in Ω.

We will refer to the set E as the support of X. In all cases of interest to us, E is one of the following: (a) the set of integer numbers; (b) the set of real numbers; (c) a subset of these sets.

Any set containing a support is also a support, so it is most useful to prescribe the smallest possible support. An example of a discrete random variable is the number, X, of successes during n independent trials each having success probability p, i.e., X has the Binomial(n, p) distribution. Here, the smallest possible support is the set {0, 1, . . . , n}.

Example 1.1 (continued) Define X by putting X(H) = 1, X(T) = −1. Then X is a random variable with support {−1, 1}.

Notationally, we generally abbreviate P({ω : X(ω) ≤ b}) as P(X ≤ b). The function F defined by

F(b) = P(X ≤ b),   −∞ < b < ∞,

is called the (Cumulative) Distribution Function of the random variable X. We call F in short the cdf of X.

Example 1.1 (continued) Suppose the probability of "Heads" is 0.6. We have

P(X = −1) = P({T}) = 1 − P({H}) = 0.4,   P(X = 1) = P({H}) = 0.6,

and thus the cdf of X is

F(b) = P(X ≤ b) =
    0,    b < −1,
    0.4,  −1 ≤ b < 1,
    1,    1 ≤ b.          (1.3)

A cdf has the following properties.

Proposition 1.5 If F is a cdf of a finite-valued rv, then:

(a) F is nondecreasing.

(b) F is right-continuous.

(c) F(−∞) = lim_{x→−∞} F(x) = 0.

(d) F(∞) = lim_{x→∞} F(x) = 1.

Proof. We only prove (a). If a ≤ b, then

{X ≤ a} ⊂ {X ≤ b}   (1.4)

and Proposition 1.3(c) gives

P(X ≤ a) ≤ P(X ≤ b).   (1.5)

✷

1.3.1 Discrete distributions

In Example 1.1 the support had two points. Consider now the general case of a discrete rv X with support the ordered points

x_1 < x_2 < x_3 < . . . .   (1.6)

Then, for each k = 1, 2, 3, . . ., the cdf at x_k is the sum of the probabilities associated to the values less than or equal to x_k:

F(x_k) = Σ_{j : x_j ≤ x_k} P(X = x_j) = Σ_{j : j ≤ k} P(X = x_j).

Because each term in the sum is non-negative, we have F(x_1) ≤ F(x_2) ≤ . . . . Thus, the cdf is the non-decreasing step function

F(x) =
    0,        x < x_1,
    F(x_1),   x_1 ≤ x < x_2,
    F(x_2),   x_2 ≤ x < x_3,
    . . .
    F(x_k),   x_k ≤ x < x_{k+1},
    . . .                           (1.7)

In the opposite direction, the individual probabilities follow from the cdf as P(X = x_k) = F(x_k) − F(x_{k−1}) for all k (with the convention F(x_0) := 0).

1.3.2 Continuous distributions

A major class of non-discrete distributions has a support that is not countable, meaning it cannot be enumerated, i.e., cannot be represented as in (1.6). A typical such support is a subinterval of the real numbers, [a, b] with a < b. (Enumerating this set is impossible because for any real number x_1 > a, there exists a real number x_2 such that a < x_2 < x_1.)

Suppose further that the cdf F is differentiable at x, i.e., the left- and right-derivatives of F at x are equal, so we can define

F′(x) = lim_{ε→0+} [F(x + ε) − F(x)]/ε = lim_{ε→0+} [F(x) − F(x − ε)]/ε = lim_{ε→0+} [F(x + ε) − F(x − ε)]/(2ε).   (1.8)

The last numerator is P(x − ε < X ≤ x + ε), so F′(x) describes the "probabilistic intensity" of falling "arbitrarily close to x". For this reason, it is called the probability density function (pdf) (of F and of the associated rv).

The cdf F and its pdf F′ are mirrors of each other: they are linked by a differentiation step when going from F to F′ (by definition), and by an integration step when going from F′ to F (this is the content of the Fundamental Theorem of Calculus (FTC)). The integration link is

F(b) = F(b) − F(−∞) = ∫_{−∞}^b F′(t) dt,   −∞ < b < ∞.   (1.9)

Alternatively, we could calculate the function 1 − F(b) = P(X > b) (called the tail probability or complementary cdf) as

1 − F(b) = F(∞) − F(b) = ∫_b^∞ F′(t) dt,   −∞ < b < ∞.

The integral in (1.9) is the area under the graph of F′ between −∞ and b. If F′ changes form anywhere there, then the integral (area) is typically calculated piecewise. The following is a minimal example illustrating this.

Example 1.6 Equiprobable outcomes on [2, 4] ∪ [5, 6] (i.e., the union of these real intervals) are described by the pdf

f(x) = 1/3 for 2 ≤ x ≤ 4,   and   f(x) = 1/3 for 5 ≤ x ≤ 6.

F(x) is calculated piecewise as the area under f. The answer is

F(x) =
    (x − 2)/3,         2 ≤ x ≤ 4,
    2/3,               4 < x ≤ 5,
    2/3 + (x − 5)/3,   5 < x ≤ 6.

The Uniform Distribution  Let a < b. The Unif(a, b) distribution has support [a, b] and constant pdf. Thus, the cdf F satisfies F(a) = 0 and F(b) = 1. The constant value of the pdf, call it c, is determined from

1 = F(b) = ∫_a^b c dt = c(b − a),

i.e., c = 1/(b − a). Thus,

F(x) = ∫_a^x 1/(b − a) dt = (x − a)/(b − a),   a ≤ x ≤ b.   (1.10)

In particular, the Unif(0, 1) distribution has cdf F(x) = x and pdf f(x) = 1 with support [0, 1].

The Exponential Distribution  Let λ > 0. We write X ∼ Expon(λ) and read "X has the exponential distribution with rate λ" to mean that X has support [0, ∞) and has tail probability

P(X > x) = e^{−λx},   x > 0.   (1.11)

Equivalently, the cdf is

P(X ≤ x) = 1 − e^{−λx},   x ≥ 0,

and the pdf is

(d/dx)(1 − e^{−λx}) = λe^{−λx},   x > 0.
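As an aside (not part of the original text; the sampling fact used here reappears in Section 2.4.3), a minimal MATLAB sketch checking the tail formula (1.11) empirically, using the fact that (−1/λ)log(U) with U ∼ Unif(0, 1) has the Expon(λ) distribution:

    % Empirically check P(X > x) = exp(-lambda*x), eq. (1.11),
    % using samples X = -log(U)/lambda with U ~ Unif(0,1).
    lambda = 2;  x = 0.7;  R = 1e6;
    X = -log(rand(R, 1))/lambda;   % R samples of Expon(lambda)
    fprintf('P(X > %.1f): empirical %.4f, exact %.4f\n', x, mean(X > x), exp(-lambda*x));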


1.4 Conditional Probability

Let Ω be a certain event and let P be a probability on it.

Definition 1.7 Let B be an event such that P(B) > 0. For any event A, the conditional probability of A given B, written P(A|B), is a number satisfying

(a) 0 ≤ P(A|B) ≤ 1;

(b) P(A ∩ B) = P(A|B)P(B).

Remark 1.8 Fix B such that P(B) > 0. Then P(A|B), viewed as a function of A, satisfies all the conditions in Definition 1.2. That is:

1. 0 ≤ P(A|B) ≤ 1;

2. P(Ω|B) = 1;

3. for any sequence A_1, A_2, . . . of disjoint events, P(∪_{i=1}^∞ A_i | B) = Σ_{i=1}^∞ P(A_i|B).

That is, the probability P(·|B) satisfies all the usual properties of the unconditional probability, P(·).

The intuition is: knowledge that the event B has occurred forces us to revise the probability of all events. A key exception to this happens when A and B are independent, as will be seen shortly.

We now give a fundamental idea for probabilistic calculations.

Definition 1.9 (Partition.) The set of events {B_1, B_2, . . .} is called a partition if it satisfies:

1. (Disjointness.) B_i ∩ B_j = ∅ for all i ≠ j (B_i and B_j cannot happen simultaneously).

2. (Exhaustiveness.) ∪_{i=1}^∞ B_i = Ω. (One of the B_i must happen.)

Theorem 1.10 (Law of Total Probability (LTP)). Let B_1, B_2, . . . be a partition, i.e., the B_i are disjoint and exhaustive. Then, for any event A,

P(A) = Σ_{i=1}^∞ P(A|B_i)P(B_i).   (1.12)

Proof. Write

A = ∪_{i=1}^∞ (A ∩ B_i)   (1.13)

and take probabilities, noting that the A ∩ B_i are disjoint because the B_i are disjoint, to obtain

P(A) = Σ_{i=1}^∞ P(A ∩ B_i).

Finally, replace P(A ∩ B_i) by P(A|B_i)P(B_i). ✷
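A quick illustration of (1.12), with numbers chosen purely for illustration: suppose a call is routed to server group B_1 with probability P(B_1) = 0.3 and to group B_2 with probability P(B_2) = 0.7, and the call is resolved (event A) with conditional probabilities P(A|B_1) = 0.5 and P(A|B_2) = 0.2. Then the LTP gives P(A) = 0.5 · 0.3 + 0.2 · 0.7 = 0.29.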


1.5 Independence

Definition 1.11 Events A and B are said to be independent if

P(A ∩ B) = P(A)P(B).   (1.14)

If (1.14) holds, then Definition 1.7 gives P(A|B) = P(A) (and also P(B|A) = P(B)), i.e., the conditional probabilities equal the unconditional ones. The intuition behind independence is that the occurrence of one event does not contain information about the occurrence of the other.

Definition 1.12 The random variables X_1 and X_2 are said to be independent if

the events {X_1 ≤ b_1} and {X_2 ≤ b_2} are independent for all b_1, b_2.   (1.15)

It turns out that (1.15) implies that all "interesting" events about X_1 and X_2 are independent (e.g., events where "≤" is replaced by "≥" or by "=" are also independent).


1.6 Mean and Variance, and Sums

Definition 1.13 Suppose X is a discrete random variable with support {1, 2, 3, . . .} and its distribution is P(X = i) = p_i for all i. The expected value (mean) of X is

E[X] = 1p_1 + 2p_2 + 3p_3 + · · · = Σ_{i=1}^∞ i p_i.   (1.16)

E[X] can be seen to be the area on the (x, y) plane bounded below by the cdf of X, bounded above by the line y = 1, and bounded on the left by the line x = 0.

Definition 1.14 Suppose X is a random variable with mean µ. The variance of X is the mean square deviation of X from its mean µ:

Var(X) = E[(X − µ)^2].   (1.17)

Here is how summation of random variables affects the mean and variance.

Fact 1.15  1. For any random variable X and constants a, b,

E[a + bX] = a + bE[X].   (1.18)

2. For any random variables X_1 and X_2,

E[X_1 + X_2] = E[X_1] + E[X_2].   (1.19)

Fact 1.16 If X_1 and X_2 are independent random variables, then

Var(X_1 + X_2) = Var(X_1) + Var(X_2).   (1.20)

Properties (1.19) and (1.20) extend to any sum of finitely many random variables, as can be seen by mathematical induction; that is,

E[X_1 + X_2 + . . . + X_n] = E[X_1] + E[X_2] + . . . + E[X_n],

Var(X_1 + X_2 + . . . + X_n) = Var(X_1) + Var(X_2) + . . . + Var(X_n) for independent X's.


1.7 Probability Generating Functions

Definition 1.17 Let X be a discrete random variable with support {0, 1, . . .} and probabilities p_n := P(X = n), where Σ_{n=0}^∞ p_n = 1. The probability generating function (pgf) of X is the function

G(z) = Σ_{n=0}^∞ p_n z^n,   0 ≤ z ≤ 1.   (1.21)

The pgf contains the distribution of X, as stated below.

Fact 1.18 The function G and its derivatives of all orders are continuous functions on [0, 1].

Write the n-th order derivative of G as G^(n), and put G^(0) = G. We have:

G^(1)(z) = (d/dz) G(z) = Σ_{n=0}^∞ (d/dz) p_n z^n = Σ_{n=1}^∞ p_n n z^{n−1},   (1.22)

G^(2)(z) = (d/dz) G^(1)(z) = Σ_{n=1}^∞ (d/dz) p_n n z^{n−1} = Σ_{n=2}^∞ p_n n(n − 1) z^{n−2}.   (1.23)

The derivatives of higher order work analogously. At z = 0, powers 0^k appear; they equal zero unless k = 0, where 0^0 = 1. That is, we get

G(0) = p_0,   G^(1)(0) = p_1,   G^(2)(0) = 2p_2,

and, in the same way, G^(n)(0) = n! p_n for all n > 0. We now see all of the following:

Fact 1.19 The pgf gives the distribution as

p_0 = G(0),   p_n = G^(n)(0)/n!,   n = 1, 2, . . . .   (1.24)

Fact 1.20 G(1) = Σ_{n=0}^∞ p_n = 1, directly from (1.21).

Fact 1.21 The pgf determines the mean of X as

E[X] = Σ_{n=1}^∞ p_n n = G^(1)(1), from (1.22).
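As a quick worked example (added here for illustration; the distribution itself is introduced in Section 2.1): if X has the Poisson distribution with mean λ, i.e., p_n = e^{−λ} λ^n / n!, then

G(z) = Σ_{n=0}^∞ e^{−λ} (λz)^n / n! = e^{−λ} e^{λz} = e^{λ(z−1)},

so G(1) = 1 (consistent with Fact 1.20) and E[X] = G^(1)(1) = λ e^{λ(z−1)} evaluated at z = 1, which equals λ (consistent with Fact 1.21).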


1.8 Two Major Limit Theorems

For independent and identically distributed (iid) random variables X_1, X_2, . . ., there are strong theorems about their average and their sum, as the number n being averaged or summed goes to infinity. The first theorem is about the average.

Theorem 1.22 (Strong Law of Large Numbers (SLLN)) If X_1, X_2, . . . are independent and identically distributed (iid) random variables with finite mean µ, then their average, X̄_n = (1/n) Σ_{i=1}^n X_i, converges to µ in a probabilistic sense:

P( lim_{n→∞} X̄_n = µ ) = 1.   (1.25)

In words, the average converges to the mean with probability one, or w.p. 1.

The second theorem is about the sum, S_n = X_1 + X_2 + · · · + X_n, as n goes to infinity. Here is an example where the theorem is useful.

Example 1.23 A person, JD, arrives at a single-server system and finds n = 100 customers ahead, one of them already in service. Suppose the individual customer service (processing) times are iid random variables X_1, X_2, . . ., with mean µ = 5 and variance σ^2 = 4. Can we approximate JD's waiting time in queue?

In general, the in-service customer has a remaining service time, S_rem, whose distribution depends on the elapsed time in service (unless service times are exponentially distributed, as we will see). But provided n is not small, the sum of n service times should not be affected too much by any one of them. Let us then pretend the in-service customer's remaining service time is like a "fresh" one, so JD's waiting time is the sum S_100 = X_1 + X_2 + . . . + X_100, where the X_i are iid with mean 5 and variance 4.

One approximation could be based, roughly, on the SLLN:

S_100 = 100 · (1/100)(X_1 + . . . + X_100) ≈ 100 · µ = 500,   (1.26)

since the average in the middle is approximately µ = 5. Using only the mean service time, we got a deterministic approximation of S_100.

The Central Limit Theorem, (1.27) below, says that the distribution of S_n is approximately Normal with mean

E[S_n] = nµ, by (1.19),

and variance

Var(S_n) = nσ^2, by (1.20).

Thus, S_100 ∼approx N(500, 400), where ∼approx means "is distributed approximately as", and N(µ, σ^2) denotes a Normal distribution with mean µ and variance σ^2.
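To see this approximation in action, here is a minimal MATLAB sketch (not part of the original notes). It assumes, purely for illustration, that the service times are uniform on an interval chosen to have mean 5 and variance 4; any distribution with these two moments would do.

    % Simulate S_100 = sum of 100 iid service times with mean 5, variance 4,
    % and compare with the CLT approximation N(500, 400).
    R = 1e5;                         % number of replications
    n = 100; mu = 5; sigma2 = 4;
    halfwidth = sqrt(3*sigma2);      % Unif(mu-h, mu+h) has variance h^2/3 = 4
    X = mu - halfwidth + 2*halfwidth*rand(R, n);   % R-by-n matrix of service times
    S = sum(X, 2);                   % R samples of S_100
    fprintf('empirical mean %.1f (CLT: %d), empirical var %.1f (CLT: %d)\n', ...
            mean(S), n*mu, var(S), n*sigma2);
    % empirical P(S > 520) versus the normal approximation 1 - Phi((520-500)/20):
    fprintf('P(S>520): empirical %.3f, CLT approx %.3f\n', mean(S > 520), 1 - normcdf(1));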


The CLT is recorded below.

Theorem 1.24 (Central Limit Theorem (CLT)) Let X_1, X_2, . . . be independent and identically distributed (iid) random variables with mean µ and variance σ^2 < ∞, and put S_n = X_1 + X_2 + · · · + X_n. Then

(S_n − nµ)/(σ√n) →D N(0, 1)   as n → ∞,

where →D means convergence in distribution. More practically,

S_n ∼approx N(nµ, nσ^2) =D nµ + σ√n N(0, 1)   as n → ∞,   (1.27)

where =D means equality in distribution.¹ The term nµ is deterministic and the term σ√n N(0, 1) is stochastic.

Remarks:

1. No assumption is made about the distribution of the X_i.

2. Assuming µ ≠ 0, we can see that as n → ∞, the stochastic effect becomes (in the limit) negligible relative to the deterministic effect. To see this, divide the multiplier of the stochastic term N(0, 1) by the deterministic term:

√n σ / (nµ) = σ/(µ√n),

and note this goes to zero.

¹ Recall that X ∼ N(0, 1) is equivalent to aX + b ∼ N(b, a^2), for any a, b.


1.9 Questions / Exercises

Indications: "P": proof; "S": short answer that refers appropriately to definitions.

1. Two standard dice are thrown, and the outcomes X_1, X_2 (a number in {1, 2, . . . , 6} for each) are recorded. Assume X_1 and X_2 are independent. Are X_1 and X_1 + X_2 independent random variables? Explain carefully.

2. For a random variable with pdf f(·), the expected value can be defined as

µ = E[X] = ∫_{−∞}^{∞} x f(x) dx.   (1.28)

Its variance can then be defined as

Var(X) = ∫_{−∞}^{∞} (x − µ)^2 f(x) dx.   (1.29)

Compute the mean and variance of the exponential distribution seen in Section 1.3.2. Hint: Note the identity Var(X) = E[(X − µ)^2] = E[X^2 − 2µX + µ^2] = E[X^2] − 2µE[X] + µ^2 = E[X^2] − µ^2.

3. Out of n servers in a system (e.g., persons answering calls at a call center), seen at a certain time point, say at noon, some are temporarily absent (e.g., they are in the bathroom). Assume individual-server absences occur independently of each other with probability p. Let N be the number of absent servers. Express the mean and variance of N in terms of n and p, and use the CLT to approximate the distribution of N.

4. In Example 1.23, let the number of customers ahead (the number of rv's in the sum) be n. Express the probability that the approximated (i.e., normally-distributed) waiting time is at least 10% above its mean, writing it as a function of the N(0, 1) cdf and n; then compute it explicitly for n = 4, 16, 64, and 256. Determine the limit of this as n → ∞.

5. (S) Suppose a customer arrives at a system where multiple servers each serve their own queue, and chooses to join the shortest queue. Suggest assumptions, focusing on the customer service times seen as random variables, that would explain this customer behaviour. Strive for the weakest assumption(s) possible. Then, assuming shorter waiting is preferred, suggest a situation where join-the-shortest-queue would not necessarily make sense.

6. (P)

(a) Let X and Y be independent random variables, each with support the integer numbers. Show that

E[XY] = E[X]E[Y].   (1.30)

Hint: Put A_i := {X = i} and B_j := {Y = j}. Use the fact that A_i and B_j are independent events for all i and j.

(b) Suppose X_1 and X_2 are rvs, and let µ_1, µ_2 be their respective means. Observe

Var(X_1 + X_2) = E[(X_1 − µ_1 + X_2 − µ_2)^2].

Show that when X_1 and X_2 are discrete and independent, the above equals Var(X_1) + Var(X_2). Hint: expand the square appropriately, and use properties of E[·], including (1.30).


7. (P) Throughout, X is a real-valued random variable with cdf F, and b is a real number. We give an idea why F is right-continuous at b (the result in Proposition 1.5(b)). If the events A_1, A_2, . . . "decrease to" A, meaning that A_1 ⊃ A_2 ⊃ A_3 ⊃ . . . and A = ∩_{n=1}^∞ A_n, then it can be shown that

P(A) = lim_{n→∞} P(A_n).   (1.31)

Now, note that the events A_n = {X ≤ b + 1/n}, n = 1, 2, 3, . . ., decrease to the event {X ≤ b}. Applying (1.31),

F(b) = P(X ≤ b) = lim_{n→∞} P(A_n) = lim_{n→∞} F(b + 1/n),

i.e., F is right-continuous at b.

(a) If the events A_1, A_2, . . . "increase to" A, meaning that A_1 ⊂ A_2 ⊂ A_3 ⊂ . . . and A = ∪_{n=1}^∞ A_n, then again (1.31) holds. Using this property, show that

P(X < b) = F(b−),   (1.32)

where F(b−) = lim_{x→b−} F(x), the left-limit of F at b.

(b) Show that

P(X = b) = F(b) − F(b−).   (1.33)

(c) Define X to be a continuous random variable if

P(X = b) = 0 for all b.   (1.34)

Restate this definition via a property of F, the cdf of X.

1.10 Solutions to Exercises

1. Short argument: as X_1 + X_2 contains X_1, we do not expect X_1 and X_1 + X_2 to be independent. As a more careful argument, we exhibit one conditional probability about X_1 + X_2, given an event about X_1, that differs from the unconditional one: P(X_1 + X_2 = 8 | X_1 = 1) = 0, versus the (easily seen) P(X_1 + X_2 = 8) > 0. The inequality of the two shows that independence fails.
2. We need the integration-by-parts rule²

∫_a^b f(x) g′(x) dx = f(b)g(b) − f(a)g(a) − ∫_a^b f′(x) g(x) dx,   (1.35)

where f′(x) = (d/dx) f(x), and likewise for g. The calculation is:

E[X] = ∫_0^∞ y λe^{−λy} dy = (1/λ) ∫_0^∞ x e^{−x} dx

(change of variable x = λy), where the last integral is

I := ∫_0^∞ x e^{−x} dx = x(−e^{−x})|_0^∞ − ∫_0^∞ 1 · (−e^{−x}) dx = 0 + ∫_0^∞ e^{−x} dx = 1

(by (1.35) with f(x) = x, g(x) = −e^{−x}, a = 0, b = ∞). Thus, E[X] = 1/λ.

Now, calculate E[X^2] using similar steps:

E[X^2] = ∫_0^∞ y^2 λe^{−λy} dy = (1/λ^2) ∫_0^∞ x^2 e^{−x} dx

(again x = λy), and the last integral is, using (1.35) with f(x) = x^2 and g, a, b as above,

∫_0^∞ x^2 e^{−x} dx = x^2(−e^{−x})|_0^∞ − ∫_0^∞ (2x)(−e^{−x}) dx = 0 + 2I = 2.

Thus, E[X^2] = 2/λ^2, and now Var(X) = 2/λ^2 − (1/λ)^2 = 1/λ^2.

² This is obtained from f(x)g(x)|_a^b = ∫_a^b (fg)′(x) dx = ∫_a^b f′(x)g(x) dx + ∫_a^b f(x)g′(x) dx.

3. Let I_j be random, taking value 1 if server j is absent, and 0 otherwise. The number of absent servers is N = I_1 + I_2 + . . . + I_n. By assumption, the I_j are independent and identically distributed, with P(I_1 = 1) = p and P(I_1 = 0) = 1 − p. The CLT gives N ∼approx Normal(E[N], Var(N)), where

E[N] = nE[I_1],   Var(N) = nVar(I_1),
E[I_1] = 1 · p + 0 · (1 − p) = p,
Var(I_1) = (1 − p)^2 · p + (0 − p)^2 · (1 − p) = p(1 − p).

4. By the CLT, the approximated waiting time when there are n customers ahead is X ∼ N(nµ, nσ^2), or equivalently (X − nµ)/(√n σ) ∼ N(0, 1).

The exercise inadvertently asked about the event "X is 10% above its mean". This event has probability zero, as will be seen. Now we calculate the probability that X is at least 10% above its mean, i.e., of the event {X ≥ 1.1nµ}:

P(X ≥ 1.1nµ) = P(X > 1.1nµ) = P( (X − nµ)/(√n σ) > 0.1nµ/(√n σ) ) = 1 − Φ( 0.1nµ/(√n σ) ),

where Φ(·) is the N(0, 1) cdf. In the first step we used P(X = 1.1nµ) = 0.³

As n → ∞, the argument inside Φ goes to ∞, so the probability goes to 1 − Φ(∞) = 0. For calculations, I used matlab and the code "n=[4 16 64 256]; proba= 1 - normcdf(0.1*5*n./(sqrt(4*n)))". The probabilities are, in order, 0.3085, 0.1587, 0.0228, and 0.00003167.

³ This follows from (1.34) and the continuity of the normal distribution.

5. Assume first-come first-serve discipline and let n_i be the number of people at server i upon arrival of a test customer X. Assume X joins one server immediately, and does not switch to another server. If service times are identically distributed, then, by (1.19), the mean waiting time of X at server i is n_i b, where b is the mean service time. Minimising the mean waiting time is then equivalent to minimising n_i across i, i.e., joining the shortest queue. Independence of service times is not necessary in this argument. If a server tends to be "faster" than others, then the assumption "identically distributed service times across servers" does not seem sensible. Note: The argument generalises: if the mean service time (per customer) at server i is b_i, then the mean waiting time at i is n_i b_i, and we could minimise this.

6.(a) We have

E[XY] = Σ_i Σ_j i j P(X = i, Y = j).   (1.36)

Using the independence, this equals Σ_i Σ_j i j P(X = i)P(Y = j) = Σ_i i P(X = i) Σ_j j P(Y = j) = E[X]E[Y].

Note: it might be more natural to sum only over the distinct outcomes of XY, unlike (1.36) above, but this again results in (1.36), as we now explain. For any k, write {XY = k} as the union (logical "or") of the disjoint events ∪_{(i,j): ij=k} {X = i, Y = j} (for example, {XY = 3} = {X = 1, Y = 3} ∪ {X = 3, Y = 1}). Thus P(XY = k) = Σ_{(i,j): ij=k} P(X = i, Y = j), and (1.36) follows from

E[XY] = Σ_k k P(XY = k) = Σ_k Σ_{(i,j): ij=k} i j P(X = i, Y = j) = Σ_i Σ_j i j P(X = i, Y = j).

(b) (X_1 − µ_1 + X_2 − µ_2)^2 = (X_1 − µ_1)^2 + (X_2 − µ_2)^2 + 2(X_1 − µ_1)(X_2 − µ_2). The mean of this equals

E[(X_1 − µ_1)^2] + E[(X_2 − µ_2)^2] + 2E[(X_1 − µ_1)(X_2 − µ_2)]   by (1.19) and (1.18)
= Var(X_1) + Var(X_2) + 2E[X_1 − µ_1]E[X_2 − µ_2]   by part (a) and the definition of Var(·),

and the rightmost term is zero (E[X_1 − µ_1] = E[X_1] − µ_1 = 0).

7.(a) The events A_n = {X ≤ b − 1/n}, n = 1, 2, . . ., increase to the event {X < b}, so (1.31) gives

P(X < b) = lim_{n→∞} P(A_n),

which is lim_n F(b − 1/n) = F(b−).

(b) Now, {X ≤ b} = {X < b} ∪ {X = b}, and the events {X < b} and {X = b} are disjoint, so, by Proposition 1.3(d), P(X ≤ b) = P(X < b) + P(X = b), i.e., F(b) = F(b−) + P(X = b), which is (1.33).

(c) From (1.33) and (1.34) for a fixed b, we have

F(b) − F(b−) = P(X = b) = 0,

i.e., F is left-continuous at b, which is equivalent to F being continuous at b because F is always right-continuous. The re-stated definition reads: a continuous random variable is one whose cdf is an everywhere-continuous function.


Chapter 2

Poisson and Related Processes

2.1 Preliminaries

Function Order  If a function f satisfies

lim_{h→0} f(h)/h = 0,

then we write f(h) = o(h). Examples:

• h^2: h^2/h = h → 0, so h^2 is o(h).

• √h: √h/h = 1/√h → ∞, so √h is not o(h).

It is easy to check that:

1. f(h) = o(h) ⇒ cf(h) = o(h) for any constant c.

2. f(h) = o(h), g(h) = o(h) ⇒ f(h) + g(h) = o(h).

Later on, we encounter expressions such as λh + o(h), where λ > 0. These mean to say that the o(h) term becomes negligible compared to λh in the limit as h → 0 (o(h)/h tends to zero, whereas λh/h = λ > 0). In particular, the sign in front of o(h) is irrelevant, as

lim_{h→0} (λh + o(h))/h = lim_h (λh − o(h))/h = lim_h λh/h = λ.

(Notation: the limit detail is dropped after first use.)

Expansion of the Exponential Function  Expanding the exponential function around zero via Taylor's Theorem with remainder of order 2,

e^x = 1 + x + o(x).   (2.1)


The Poisson Distribution  The discrete random variable N with support {0, 1, 2, . . .} is said to have the Poisson distribution with mean λ if

P(N = n) = e^{−λ} λ^n / n!,   n = 0, 1, 2, . . . .   (2.2)

We write this in short as N ∼ Poisson(λ). A calculation gives E[N] = λ and Var(N) = λ.
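As a quick check of the mean (the variance calculation is similar, via E[N(N − 1)] = λ^2):

E[N] = Σ_{n=1}^∞ n e^{−λ} λ^n / n! = λ Σ_{n=1}^∞ e^{−λ} λ^{n−1} / (n − 1)! = λ,

since the last sum is the total Poisson probability mass and equals 1.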


2.2 Residual Time and Memoryless Property

We are interested in the time X until a specified event happens, having in mind events such as customer arrivals and customer service completions. X is modelled as a positive-real-valued random variable with known distribution.

Suppose an observer knows that, as of s time units ago, the event has not occurred. Then the time until the event occurs, called the residual time, is X − s. The observer is then interested in the conditional distribution of X − s given X > s:

P(X − s > t | X > s),   t ≥ 0.   (2.3)

For s = 5, for example, this is the function

P(X − 5 > t | X > 5),   t ≥ 0.

In general this distribution changes with s, reflecting a "memory" mechanism. A key exception is described below.

Definition 2.1 A positive-valued random variable X is said to be memoryless if it satisfies

P(X − s > t | X > s) = P(X > t),   s, t ≥ 0.   (2.4)

This says that the conditional distribution of X − s given X > s equals the unconditional distribution of X; we write this in short as

(X − s | X > s) =D X.   (2.5)

Proposition 2.2 X satisfies (2.4) ⇔ X has an exponential distribution.

Proof of the "⇐": Observe that

P(X − s > t | X > s) = P(X > s + t | X > s) = P(X > s + t, X > s)/P(X > s) = P(X > s + t)/P(X > s);

now for X ∼ Expon(λ), i.e., satisfying (1.11), this equals

e^{−λ(s+t)}/e^{−λs} = e^{−λt} = P(X > t),

i.e., (2.4) holds. The proof of "⇒" is omitted.

Example 2.3 Suppose we are putting out (servicing) fires, and X is the time required to service a fire, in minutes. Suppose we try to predict when the service will finish given that service started s minutes ago, and our predictor is the mean remaining service time,

g(s) = E[X − s | X > s].

First, we give a simulation method for estimating g(s).


1. Fix s. Obtain random samples of (X − s | X > s), as in (2.6) below, independently of each other.

2. Calculate ĝ(s) as the sample average. This is an estimate of g(s).

How do we obtain random samples of (X − s | X > s)? The simplest method is:

Sample (randomly) X; if X > s, record X − s; otherwise, reject (no record is kept).   (2.6)

The distribution of X can affect g(s) a lot. Suppose X ∼ N(40, 100), a Normal distribution with mean 40 and standard deviation 10. One million trials of the form (2.6) gave the following estimates (which are accurate for our purpose):

  s    ĝ(s)
  20   20.5
  30   12.9
  40    8.0
  50    5.2
  60    3.7

In contrast, suppose X ∼ Expon(1/40), the Exponential distribution with the same mean as the Normal, 40. Then, from (2.5), g(s) = E[X] = 40 for all s. This is very different: for example, for a service that began s = 40 minutes ago, the Normal gives a mean remaining service time of 8.0, much smaller than the Exponential's 40.
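A minimal MATLAB sketch of the estimator above (not in the original notes; the sample size matches the one million trials mentioned, and the variable names are illustrative):

    % Estimate g(s) = E[X - s | X > s] by rejection, for X ~ Normal(40, 10^2).
    s = 40;                    % conditioning time
    R = 1e6;                   % number of trials of the form (2.6)
    X = 40 + 10*randn(R, 1);   % samples of X
    kept = X(X > s) - s;       % accepted residual times X - s
    ghat = mean(kept);         % estimate of g(s); should be near 8.0 for s = 40
    fprintf('g(%d) estimated as %.2f from %d accepted samples\n', s, ghat, numel(kept));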


2.3 Counting Processes

How do we model events occurring "randomly" over time? Let t denote time, and let it range from 0 to infinity.

One approach focuses on the (cumulative) count of events over time:

N_t = number of events that occur after time 0 and up to time t,   t ≥ 0.

Another approach focuses on the times of events:

S_n = time of occurrence of the n-th event,   n = 1, 2, . . . ,   (2.7)

or, equivalently, the inter-event times (times between successive events):

X_1 = S_1 = time of 1st event
X_2 = S_2 − S_1 = time between 1st and 2nd events
. . .
X_n = S_n − S_{n−1} = time between the (n − 1)-st and n-th events,   n = 1, 2, 3, . . .

(Put S_0 = 0 so that X_n = S_n − S_{n−1} holds for n = 1 as well.) We make some natural assumptions:

1. N_0 = 0; that is, the counting starts at time zero.

2. N_t is integer-valued, and it is a nondecreasing function of t.

3. X_n > 0 for all n; that is, events occur one at a time; two events cannot happen at the same time.

As the counts (N_t : t ≥ 0) and the event times (S_n : n = 0, 1, 2, . . .) are alternative descriptions of the same (random) experiment, they are connected by

{"time of the n-th event" > t} ⇔ {"# events in (0, t]" < n}   for all n and t,

i.e., we have the equality of events

{S_n > t} = {N_t < n}   (2.8)

and

{N_t = n} = {S_n ≤ t < S_{n+1}}   (2.9)

for any n = 0, 1, 2, . . . and t ≥ 0. These event equalities are fundamental to calculations later on.


2.4 The Poisson Process

We study a special counting process, the Poisson, that is analytically simple and commonly used.

2.4.1 Distribution of counts (the N_t process)

For s, t ≥ 0, we have by definition

N_{s+t} − N_s = # (number of) events that occur after s and up to s + t.

This is the count or increment on (the time interval) (s, s + t]. The description is via the (probabilistic) behaviour of such counts, particularly on disjoint (non-overlapping) intervals.

Definition 2.4 The counting process (N_t : t ≥ 0) is called a Poisson process of rate λ if:

1. (Independence) The counts on disjoint intervals are independent rv's.

2. (Identical distribution, or stationarity) The distribution of N_{s+t} − N_s depends on the interval's length, t, and not on the startpoint, s.

3. (Small-interval behaviour)

P(N_h = 1) = λh + o(h),   P(N_h ≥ 2) = o(h),   as h → 0.   (2.10)

Here is another definition:

Definition 2.5 The counting process (N_t : t ≥ 0) is called a Poisson process of rate λ if:

1. (Independence) The counts on disjoint intervals are independent rv's.

2. (Poisson distribution) The count in any interval of length t (t ≥ 0) has the Poisson distribution with mean λt. That is, for any s, t ≥ 0,

P(N_{s+t} − N_s = n) = P(N_t = n) = e^{−λt} (λt)^n / n!,   n = 0, 1, . . .   (2.11)

For s, t ≥ 0, the intervals (0, s] and (s, s + t] are disjoint, so N_s and N_{s+t} − N_s are independent.

Theorem 2.6 Definition 2.4 ⇔ Definition 2.5.


Proof. Proof of "⇒". It suffices to show that the functions

p_n(t) = P(N_t = n),   t ≥ 0,   n = 0, 1, 2, . . . ,

are as specified on the right of (2.11). We will show this by writing and solving differential equations linking the p_n(t) and their derivatives, (d/dt)p_n(t), across n. As N_0 = 0, these functions at time 0 are:

p_0(0) = 1,   p_n(0) = 0,   n = 1, 2, . . . .

We put p_{−1}(t) = 0 for all t; this function has no physical meaning and serves only to simplify the notation.

Fix h > 0 and put D_h = N_{t+h} − N_t. Note D_h =D N_h, by the identical-distribution assumption. Fix n. Then

{N_{t+h} = n} = {N_t = n, D_h = 0} or {N_t = n − 1, D_h = 1} or {D_h ≥ 2, N_t = n − D_h},   (2.12)

where the "or"s are between disjoint events (as the value of D_h is different across them). Now note

P(D_h ≥ 2, N_t = n − D_h) ≤ P(D_h ≥ 2) = o(h)   by (2.10).

Taking probabilities in (2.12) and using the above gives

p_n(t + h) = P(N_t = n, D_h = 0) + P(N_t = n − 1, D_h = 1) + o(h),   (2.13)

and the probabilities on the right can be written

P(N_t = n, D_h = 0) = P(N_t = n)P(D_h = 0)   by independence
                    = p_n(t)[1 − λh + o(h)]   by (2.10),

and, similarly,

P(N_t = n − 1, D_h = 1) = P(N_t = n − 1)P(D_h = 1)   by independence
                        = p_{n−1}(t)[λh + o(h)]   by (2.10).

Thus (2.13) gives (substitute the above, re-arrange, and divide by h):

[p_n(t + h) − p_n(t)]/h = (−λ + o(h)/h) p_n(t) + (λ + o(h)/h) p_{n−1}(t) + o(h)/h.

Now take limits as h ↓ 0 (i.e., h decreases to zero); the o(h)/h terms tend to zero, so

(d/dt) p_n(t) = −λ p_n(t) + λ p_{n−1}(t),   n = 0, 1, 2, . . . .   (2.14)

To solve this set of differential equations, multiply both sides by e^{λt} and re-arrange to obtain:

λ e^{λt} p_{n−1}(t) = e^{λt} (d/dt) p_n(t) + λ e^{λt} p_n(t) = (d/dt)( e^{λt} p_n(t) ).   (2.15)

The above can be solved for p_n(t) in the order n = 0, 1, 2, . . ., as follows. For n = 0: p_{−1}(t) = 0, and analytical integration of (2.15) gives

0 = (d/dt)( e^{λt} p_0(t) )  ⇒  e^{λt} p_0(t) = c  ⇒  p_0(t) = c e^{−λt},   t ≥ 0,

for some constant c, and the requirement p_0(0) = 1 gives c = 1. Now we use induction on n. Assume

p_{n−1}(t) = e^{−λt} (λt)^{n−1} / (n − 1)!.

Putting this into the left side of (2.15),

λ^n t^{n−1} / (n − 1)! = (d/dt)( e^{λt} p_n(t) ).   (2.16)

Integrating analytically,

e^{λt} p_n(t) = λ^n t^n / n! + c

for some constant c, and the condition p_n(0) = 0 gives c = 0. This concludes the induction step and the proof.

Proof of "⇐": Using (2.11) and expressing the exponentials as in (2.1), we find

P(N_h = 0) = e^{−λh} = 1 − λh + o(h),
P(N_h = 1) = e^{−λh} λh = λh + o(h),
P(N_h ≥ 2) = 1 − P(N_h = 0) − P(N_h = 1) = 1 − e^{−λh} − e^{−λh} λh = o(h).

✷

Example 2.7 (Calculating joint probabilities.) Let (N_t : t ≥ 0) be a Poisson process of rate λ. For times 0 = t_0 < t_1 < t_2 < . . . < t_k and natural numbers n_1 ≤ n_2 ≤ . . . ≤ n_k, probabilities of the form

P(N_{t_1} = n_1, N_{t_2} = n_2, . . . , N_{t_k} = n_k)

are calculable via the independence and Poisson-distribution properties (with n_0 := 0):

P(N_{t_1} = n_1, N_{t_2} = n_2, . . . , N_{t_k} = n_k)
  = P(N_{t_1} − N_{t_0} = n_1, N_{t_2} − N_{t_1} = n_2 − n_1, . . . , N_{t_k} − N_{t_{k−1}} = n_k − n_{k−1})
  = P(N_{t_1} − N_{t_0} = n_1) P(N_{t_2} − N_{t_1} = n_2 − n_1) · · · P(N_{t_k} − N_{t_{k−1}} = n_k − n_{k−1})   by independence
  = Π_{i=1}^k e^{−λ(t_i − t_{i−1})} [λ(t_i − t_{i−1})]^{n_i − n_{i−1}} / (n_i − n_{i−1})!   by (2.11).
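For instance (an illustrative calculation, not in the original): with λ = 2, k = 2, t_1 = 1, t_2 = 3, n_1 = 1 and n_2 = 4,

P(N_1 = 1, N_3 = 4) = P(N_1 = 1) P(N_3 − N_1 = 3) = [e^{−2} · 2] · [e^{−4} · 4^3/3!] ≈ 0.053.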


2.4.2 Distribution of Times (the X_n and S_n)

Theorem 2.8 The counting process N = (N_t : t ≥ 0) is a Poisson process of rate λ ⇔ the associated inter-event times, X_1 = S_1, X_2 = S_2 − S_1, X_3 = S_3 − S_2, . . . , are independent exponentially distributed rv's of rate λ.

Proof. (Partial proof.) Apply (2.8) for n = 1:

{S_1 > t} = {N_t = 0}.   (2.17)

Taking probabilities,

P(S_1 > t) = P(N_t = 0) = e^{−λt},   t ≥ 0,

where the last step holds by (2.11). That is, S_1 has the Expon(λ) distribution, as claimed. The remainder is a sketch of the remaining proof. Using the independence property of the N_t process, it can be shown that S_2 − S_1 is independent of S_1; moreover, using the stationarity property, it can be shown that S_2 − S_1 has the same distribution as S_1. Similarly, one can show that for any n, S_n − S_{n−1} is independent of the corresponding differences for any smaller n, and moreover it has the same distribution as S_1. ✷

Now we give the distribution of S_n by taking probabilities in (2.8):

P(S_n > t) = P(N_t < n) = P(∪_{k=0}^{n−1} {N_t = k}) = Σ_{k=0}^{n−1} P(N_t = k) = Σ_{k=0}^{n−1} e^{−λt} (λt)^k / k!,   t ≥ 0.   (2.18)

(Note in step 3 that the events {N_t = k} are disjoint across k.) In the last step, we used the Poisson formula, (2.11). The above distribution is known as Gamma(n, λ) and also as Erlang-n.


2.4.3 Simulating a Poisson Process

To simulate a Poisson process, the result from Theorem 2.8 is applied. Specifically, we simulate the inter-event times by using the fact that for any λ > 0, a random sample of the Expon(λ) distribution can be obtained as (−1/λ) log(U), where log denotes the natural logarithm and U ∼ Unif(0, 1) denotes a pseudo-random number. Note that log(U) is always negative, so the result is always positive, as it should be.

Method 2.9 (How to sample a Poisson process on a finite interval.) Given T > 0 and λ > 0, the event times of a Poisson process of rate λ on the interval [0, T] can be simulated based on the inter-event times, as follows. Set S_0 = 0, and for n = 1, 2, . . ., set S_n = S_{n−1} + X_n, where X_n, the n-th inter-event time, is sampled as (−1/λ) log(U_n), with U_n ∼ Unif(0, 1) independent of all other U's; stop as soon as S_n > T.
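A minimal MATLAB sketch of Method 2.9 (the parameter values and variable names are illustrative, not from the notes):

    % Sample the event times of a Poisson process of rate lambda on [0, T]
    % via iid Expon(lambda) inter-event times (Method 2.9 / Theorem 2.8).
    lambda = 1.5;  T = 10;
    S = [];               % event times found so far
    t = 0;                % running time (current S_n)
    while true
        t = t + (-1/lambda)*log(rand);   % add X_n = (-1/lambda)*log(U_n)
        if t > T, break; end             % stop as soon as S_n > T
        S(end+1) = t;                    %#ok<AGROW>  record the event time
    end
    fprintf('generated %d events on [0, %g]; expected about %g\n', numel(S), T, lambda*T);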


2.4.4 Merging Poisson Processes<br />

Suppose we merge the events <strong>of</strong> two independent Poisson processes <strong>of</strong> rates λ 1 <strong>and</strong><br />

λ 2 (i.e., all the N t are independent across the two; or, equivalently, all the X n , or all<br />

the S n , are independent across the two). Put<br />

Nt i = number <strong>of</strong> events <strong>of</strong> process i that occur up to time t, i = 1, 2<br />

N t = Nt 1 + Nt 2 = number <strong>of</strong> events <strong>of</strong> the merged process that occur up to time t<br />

Then the merged process (N t : t ≥ 0) is Poisson <strong>of</strong> rate λ 1 + λ 2 . This property<br />

generalises to merging any finite number <strong>of</strong> processes, by induction. That is:<br />

Proposition 2.10 The merging <strong>of</strong> the events <strong>of</strong> independent Poisson processes gives<br />

a Poisson process <strong>of</strong> rate equal to the sum <strong>of</strong> the rates <strong>of</strong> the merged processes.<br />

Sketch <strong>of</strong> pro<strong>of</strong> (merge two processes for simplicity):<br />

1. The merged process has independent counts on disjoint intervals; this comes<br />

from the independence <strong>of</strong> counts across the processes being merged <strong>and</strong> across<br />

disjoint time intervals for the same process.<br />

2. Check that the merged process satisfies (2.10) with the rate claimed.<br />

Application 2.11 Customer JD arrives to find q customers waiting in queue, at a system of k non-idling servers (servers do not idle if work is available), where customer service times are independent Expon(µ) rvs. Assuming the first-come first-served discipline, how long is JD's waiting time?

1. JD joins the queue in position q + 1 (due to the first-come first-served discipline). Thus, with time 0 being the time he joins, his waiting time equals the time of the (q + 1)-st (customer) departure (service completion).

2. While JD waits, each server is working (due to no-server-idling). The independent Expon(µ) service times mean that the departures (i.e., the counting process of departure events) at a particular server form a Poisson process of rate µ (Theorem 2.8), independent of the departures at other servers.

3. Departures from the system are simply the merging of departures at individual servers, so they form a Poisson process of rate k times µ (Proposition 2.10).

4. The distribution of the time of the (q + 1)-st event of a Poisson process was derived earlier; see (2.18). So JD's waiting time has the Gamma(q + 1, kµ) distribution, that is, tail probability function (2.18) with n = q + 1 and λ = kµ.
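As an added sketch (not from the notes), the tail of JD's waiting-time distribution can be evaluated directly from the Gamma/Erlang tail (2.18) with n = q + 1 and λ = kµ; the function name is illustrative.

    import math

    def jd_wait_tail(q, k, mu, t):
        # P(waiting time > t) for Gamma(q+1, k*mu), i.e. (2.18) with n = q+1, lam = k*mu.
        lam = k * mu
        return sum(math.exp(-lam * t) * (lam * t) ** j / math.factorial(j)
                   for j in range(q + 1))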


2.5 Poisson Process of General Rate Function

So far, the constant λ was the probabilistic rate of an event occurrence in any small time interval. The forthcoming model replaces this constant by a general (non-negative) function of time, λ(u), u ≥ 0.

Definition 2.12 The counting process (N_t : t ≥ 0) is called a Poisson process of (with) rate function λ(t) if:

1. (Independence) The counts on disjoint intervals are independent rv's.

2. (Small-interval behaviour)

P(N_{s+h} − N_s = 1) = λ(s)h + o(h),  P(N_{s+h} − N_s ≥ 2) = o(h),  as h → 0.  (2.19)

The term Non-Homogeneous (or time-inhomogeneous) Poisson Process (NHPP) indicates the rate function is non-constant.

In this case the distribution of N_t is again Poisson, but its mean is now the mean function

m(t) = ∫_0^t λ(u) du,  (2.20)

that is, the area under the graph of the rate function from 0 to t. By solving differential equations like those in the proof of Theorem 2.6, we obtain the summary result below:

Proposition 2.13 In a Poisson process of rate function λ(·), the count N_e − N_s has the Poisson distribution with mean ∫_s^e λ(u) du = m(e) − m(s). That is,

P(N_e − N_s = n) = e^{−[m(e)−m(s)]} [m(e) − m(s)]^n / n!,  n = 0, 1, . . .  (2.21)

Proof. Left as Exercise 3. ✷

Example: Exam E03 3. Recognise it is an NHPP and identify the rate function; the solution is then standard, exactly the main result above for s = 0.
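For illustration (an addition to the notes), the probability (2.21) can be evaluated for any rate function by computing m(e) − m(s) with numerical quadrature; the helper below is a sketch that assumes scipy is available.

    import math
    from scipy.integrate import quad

    def nhpp_count_pmf(rate_fn, s, e, n):
        # P(N_e - N_s = n) for an NHPP with rate function rate_fn, via (2.21).
        mean, _ = quad(rate_fn, s, e)        # m(e) - m(s) = integral of the rate
        return math.exp(-mean) * mean ** n / math.factorial(n)

    # e.g. nhpp_count_pmf(lambda t: 1 + math.sin(math.pi * t), 0.0, 2.0, 3)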

2.5.1 Thinning a Poisson Process

Suppose:

1. N = (N_t : t ≥ 0) is a Poisson process of rate K.

2. Conditional on any event of N occurring at time t, accept the event with probability p(t) and reject it otherwise, doing so independently of anything else. (The acceptance step is simulated via pseudo-random numbers later.)

Let Ñ_t be the number of accepted events occurring up to time t inclusive. Call (Ñ_t : t ≥ 0) the thinned process.

Proposition 2.14 (Thinning a Poisson process.) The process (Ñ_t : t ≥ 0) is a Poisson process of rate function Kp(t).

Sketch of proof (assume the function p(·) is continuous, for simplicity):

1. The thinned process has independent counts on disjoint intervals; this comes from the independence of counts of the original process together with the independence of the acceptance/rejection random variables from everything else.

2. Check that the thinned process satisfies (2.19) with the rate claimed.

2.5.2 Simulating an NHPP via Thinning

Problem 2.15 Simulate (or sample) on a given time interval [0, T] a non-homogeneous Poisson process of given rate function λ(t), 0 ≤ t ≤ T.

If the rate function is piece-wise constant, a simple solution is: simulate a constant-rate process on each interval where the rate is constant, via Method 2.9.

If the rate function is not piecewise constant (examples include: linear with a nonzero slope, non-constant polynomial, trigonometric, exponential), then a simple solution is as follows.

Method 2.16 (The Thinning Method.) To simulate the event times of a Poisson process of rate function λ(t) on the interval [0, T], do:

1. Calculate the rate

K = max_{0 ≤ t ≤ T} λ(t) < ∞.  (2.22)

2. Using Method 2.9, simulate the event times of a Poisson process of constant rate K on the given interval. That is, set S_0 = 0, and for n = 1, 2, . . ., set S_n = S_{n−1} + (−1/K) log(U_n), where U_n ∼ Unif(0, 1) is independent of other U's, stopping as soon as S_n > T. The set S_1, S_2, . . . , S_{n−1} is a random sample of the event times of a Poisson process of rate K on the given interval.

3. (Thinning step.) For i = 1, 2, . . . , n − 1, accept S_i with probability λ(S_i)/K (reject with the remaining probability; note that λ(S_i)/K is always between 0 and 1, by the choice of K). The acceptance/rejection is done by randomly sampling V_i ∼ Unif(0, 1) independently of everything else and accepting if V_i ≤ λ(S_i)/K. The set of accepted S's is a random sample of the event times of a Poisson process of rate function λ(t) on the given interval.
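A minimal Python sketch of Method 2.16 (an addition; it reuses the sim_poisson_times helper sketched after Method 2.9, and assumes the user supplies a valid upper bound K on the rate):

    import random

    def sim_nhpp_thinning(rate_fn, K, T, rng=random.random):
        # Thinning (Method 2.16): K must satisfy K >= rate_fn(t) for all t in [0, T].
        candidates = sim_poisson_times(K, T, rng)     # step 2: constant-rate K process
        return [s for s in candidates                 # step 3: accept s w.p. rate_fn(s)/K
                if rng() <= rate_fn(s) / K]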

2.6 Estimating a Poisson Process: Brief View

Suppose we want to estimate (construct) the functions λ(t) and m(t) from given event times s_1 < s_2 < . . . < s_n. Note:

• m() can always be found by integrating λ() as in (2.20).

• Obtaining λ(t) from m() is possible, assuming m is differentiable at t: λ(t) = (d/dt) m(t).

• A constant rate λ is equivalent to a linear mean function, m(t) = λt.

As a start, consider:

Step-function estimate of m(t).

ˆm(t) = observed count up to t = number of s's that are ≤ t.  (2.23)

That is,

ˆm(s_k) = k,  k = 1, 2, . . . , n,

and it is constant between s_{k−1} and s_k. Unattractively, it is impossible to infer λ() by differentiation¹.

One way to get a reasonable estimate of λ() is by a slight revision of the above:

Piece-wise linear estimate of m(t) (linear in-between event times). Define the function ˜m(t) to be as above at the event times, i.e.,

˜m(s_k) = k,  k = 1, 2, . . . , n,

and estimate ˜λ as the slope of ˜m:

˜λ(t) = [˜m(s_k) − ˜m(s_{k−1})] / (s_k − s_{k−1}) = 1 / (s_k − s_{k−1})  for s_{k−1} ≤ t < s_k.  (2.24)

If the ˜λ(t) are close between adjacent (s_{k−1}, s_k] intervals, we could average them, losing little information in the averaging, to simplify the estimate.

¹ ˆm(t) has derivative zero at all points other than the s's; at the s's it is not differentiable.
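As a small added sketch (not in the original notes), the piece-wise linear estimate (2.24) is simply the reciprocal of each inter-event gap; here the convention s_0 = 0 with ˜m(0) = 0 is an assumption made for the first interval.

    def rate_estimate(event_times):
        # Piece-wise linear estimate (2.24): on (s_{k-1}, s_k] the estimated rate is
        # 1 / (s_k - s_{k-1}); time 0 is treated as s_0 (an assumption).
        s = [0.0] + list(event_times)
        return [(s[k - 1], s[k], 1.0 / (s[k] - s[k - 1])) for k in range(1, len(s))]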


2.7 Large-t N_t/t via Strong Law of Large Numbers

Let N_t be the count of events up to time t. Assume

A1. Inter-event times are independent, identically distributed, with mean µ > 0.

A2. lim_{t→∞} N_t = ∞.

What is the average number of events per unit time, N_t/t, for t large?

For any t,

S_{N_t} ≤ t < S_{N_t + 1}.

Dividing by N_t,

S_{N_t} / N_t ≤ t / N_t < ( S_{N_t + 1} / (N_t + 1) ) · ( (N_t + 1) / N_t ).  (2.25)

By the SLLN (Theorem 1.22), both the right and left side of (2.25) converge to µ as t → ∞, with probability one. Hence the middle must converge to the same, with probability one (the event of convergence of the left and right, call it A, implies the event of convergence of the middle, call it B; thus 1 = P(A) ≤ P(B), proving P(B) = 1). We conclude

lim_{t→∞} N_t / t = 1/µ  w.p. 1.  (2.26)
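As an optional added check (not part of the notes), the limit (2.26) can be verified empirically by simulating iid exponential inter-event times with a given mean; the sketch below is illustrative.

    import random

    def check_rate_limit(mean_gap, T, seed=1):
        # Empirical N_T / T for iid Expon inter-event times with mean mean_gap;
        # by (2.26) this should be close to 1 / mean_gap for large T.
        rng = random.Random(seed)
        t, n = 0.0, 0
        while True:
            t += rng.expovariate(1.0 / mean_gap)   # inter-event time with mean mean_gap
            if t > T:
                return n / T
            n += 1

    # e.g. check_rate_limit(2.0, 1e6) is close to 0.5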


2.8 Exercises

1. Let N = (N_t : t ≥ 0) be a Poisson process of rate λ = 0.4.

i Calculate: (a) P(N_5 = 3); (b) P(N_5 = 3, N_{15} = 7); (c) P(N_{15} = 7 | N_5 = 3); (d) P(N_5 = 3 | N_{15} = 7).

ii Let X be the time between two successive events of this process. Specify the distribution of X fully, including its mean. Write P(X ≤ 0.5) and P(X ≤ 2) explicitly.

iii State, briefly, how the results in (i) change if the process is Poisson of mean function m() with m(0) = 0 as usual.

2. In Proposition 2.10, two independent Poisson processes of rates λ_1 and λ_2 are merged; N_t^i is the count of process i up to time t; and N_t = N_t^1 + N_t^2 is the count of the merged process. Show that as h → 0, we have P(N_h = 2) = o(h) and P(N_h = 1) = (λ_1 + λ_2)h + o(h). Note: similar, slightly more complex calculations are seen later in (3.1) to (3.3).

3. (NHPP)

(a) (Distribution of N_t.) For the (NHPP) N_t in Definition 2.12, write p_n(t) = P(N_t = n), n = 0, 1, 2, . . . (put p_{−1}(t) = 0 for later convenience), and show that these functions must satisfy the set of differential equations

(d/dt) p_n(t) = −λ(t) p_n(t) + λ(t) p_{n−1}(t),  n = 0, 1, 2, . . . .  (2.27)

Then verify that the functions p_n(t) = e^{−m(t)} [m(t)]^n / n!, n = 0, 1, 2, . . . satisfy (2.27). Note that (d/dt) m(t) = λ(t). Note p_0(0) = 1 and p_n(0) = 0 for n > 0, effectively saying that N_0 = 0.

(b) (Distribution of S_n.) Working as for the constant-rate process, show that the distribution of S_n for the general case above is

P(S_n > t) = Σ_{k=0}^{n−1} e^{−m(t)} [m(t)]^k / k!,  t ≥ 0,

so the distribution of the time of the 1st event, S_1, is

P(S_1 > t) = e^{−m(t)},  t ≥ 0.  (2.28)

4. (Simulating Poisson processes.)

(a) Simulate the event times of a Poisson process of rate λ = 2 during [0, T = 2] based on assumed pseudo-random numbers {.8187, .2466, .5488, .3679, .4066}.

(b) Simulate the event times of the NHPP of rate function λ(t) = 1 + sin(πt), 0 ≤ t ≤ 2, where π ≈ 3.14159, by thinning the process sampled previously. For the acceptance test, assume pseudo-random numbers 0.7, 0.4, 0.2, 0.6, adding your own if needed.

(c) It can be shown that a random variable with cdf F(x) can be simulated by solving for x the equation F(x) = U, where U ∼ Unif(0, 1) (a pseudo-random number). Using this property together with (2.28), show how the time of the 1st event of the NHPP in (b) may be simulated. You may use that ∫_0^t sin(πu) du = (1/π)[1 − cos(πt)].




2.9 Solutions to Exercises

1. For s ≤ e, N_e − N_s has the Poisson distribution with mean λ(e − s), the rate λ times the length of the interval, e − s. Recall the Poisson distribution is given in (2.2).

i (a) Recall the assumption N_0 = 0, so N_5 = N_5 − N_0; here e − s = 5, so N_5 ∼ Poisson(2) and thus P(N_5 = 3) = e^{−2} 2^3 / 3!.

(b) In calculating this joint probability, note that N_5 and N_{15} refer to overlapping time intervals, so they are not independent. The "trick" is to write N_{15} = N_5 + (N_{15} − N_5) and use that N_{15} − N_5 is independent of N_5 and Poisson(4)-distributed (e − s = 15 − 5 = 10, times rate, 0.4). Thus

P(N_5 = 3, N_{15} = 7) = P(N_5 = 3, N_{15} − N_5 = 7 − 3) = P(N_5 = 3) P(N_{15} − N_5 = 4) = e^{−2} (2^3/3!) · e^{−4} (4^4/4!).

(c)

P(N_{15} = 7 | N_5 = 3) = P(N_{15} = 7, N_5 = 3) / P(N_5 = 3) = P(N_{15} − N_5 = 4)   [by (b), the P(N_5 = 3) cancels out]
                        = e^{−4} 4^4 / 4!.

(d)

P(N_5 = 3 | N_{15} = 7) = P(N_5 = 3, N_{15} = 7) / P(N_{15} = 7) = [ e^{−2} (2^3/3!) · e^{−4} (4^4/4!) ] / [ e^{−6} 6^7/7! ].

ii We know X ∼ Expon(0.4), whose mean is 1/0.4 (seen elsewhere). Then

P(X ≤ 1/2) = 1 − e^{−0.4 · (1/2)} = 1 − e^{−0.2},
P(X ≤ 2) = 1 − e^{−0.4 · 2} = 1 − e^{−0.8}.

iii In the Poisson process with mean function m(), N_e − N_s has the Poisson distribution with mean m(e) − m(s), and such counts on disjoint intervals are independent, as with the constant-rate process. Thus, we only modify the Poisson means; the mean of N_5 is m(5) − m(0) = m(5) and the mean of N_{15} − N_5 is m(15) − m(5).

2. The key idea is to express the events of interest as (note superscripts are not powers):

{N_h = 2} = {N_h^1 = 2, N_h^2 = 0} ∪ {N_h^1 = 1, N_h^2 = 1} ∪ {N_h^1 = 0, N_h^2 = 2},
{N_h = 1} = {N_h^1 = 1, N_h^2 = 0} ∪ {N_h^1 = 0, N_h^2 = 1},  (2.29)

where the events on the right are disjoint. Taking probabilities,

P(N_h = 2) = P(N_h^1 = 2, N_h^2 = 0) + P(N_h^1 = 1, N_h^2 = 1) + P(N_h^1 = 0, N_h^2 = 2)
           = P(N_h^1 = 2)P(N_h^2 = 0) + P(N_h^1 = 1)P(N_h^2 = 1) + P(N_h^1 = 0)P(N_h^2 = 2)   by independence.

The first term is ≤ P(N_h^1 = 2) = o(h); likewise, the third term is ≤ P(N_h^2 = 2) = o(h). The middle term is

[λ_1 h + o(h)][λ_2 h + o(h)] = o(h)

(terms λ_i h o(h) and λ_1 λ_2 h^2 are o(h)), as required. Similarly,

P(N_h = 1) = P(N_h^1 = 1, N_h^2 = 0) + P(N_h^1 = 0, N_h^2 = 1)
           = P(N_h^1 = 1)P(N_h^2 = 0) + P(N_h^1 = 0)P(N_h^2 = 1)   by independence
           = [λ_1 h + o(h)][1 − λ_2 h + o(h)] + [1 − λ_1 h + o(h)][λ_2 h + o(h)]
           = λ_1 h + λ_2 h + o(h),

where, again, various original terms are combined into the o(h).

3.(a) The count from t to t + h, denoted D_{t,h} = N_{t+h} − N_t, is independent of N_t by assumption. The essential difference relative to the homogeneous case is that the distribution of D_{t,h} depends on both t and h, whereas that of D_h there depended only on h. The derivation mimics closely the steps from (2.12) to (2.14), "correcting" for the difference above.

{N_{t+h} = n} = {N_t = n, D_{t,h} = 0} or {N_t = n − 1, D_{t,h} = 1} or {D_{t,h} ≥ 2, N_t = n − D_{t,h}},

where the events on the right are disjoint, so the left probability is the sum of the events' probabilities on the right. Work these out:

P(D_{t,h} ≥ 2, N_t = n − D_{t,h}) ≤ P(D_{t,h} ≥ 2) = o(h)   by (2.19),
P(N_t = n, D_{t,h} = 0) = p_n(t)[1 − λ(t)h + o(h)]   by independence and (2.19), and
P(N_t = n − 1, D_{t,h} = 1) = p_{n−1}(t)[λ(t)h + o(h)]   by independence and (2.19).

Putting these together,

p_n(t + h) = p_n(t)[1 − λ(t)h + o(h)] + p_{n−1}(t)[λ(t)h + o(h)] + o(h).

Exactly as in the constant-rate case (move p_n(t) from right to left, divide by h, take limits as h → 0), we get (2.27). To verify, differentiate the given functions p_n(t) = e^{−m(t)} [m(t)]^n / n!:

(d/dt) ( e^{−m(t)} [m(t)]^n / n! ) = ( (d/dt) e^{−m(t)} ) [m(t)]^n / n! + e^{−m(t)} (d/dt) ( [m(t)]^n / n! )
  = −e^{−m(t)} ( (d/dt) m(t) ) [m(t)]^n / n! + e^{−m(t)} ( n [m(t)]^{n−1} / n! ) (d/dt) m(t)
  = −λ(t) p_n(t) + λ(t) p_{n−1}(t),

as required.
dt m(t)


(b)

P(S_n > t) = P(N_t < n) = Σ_{k=0}^{n−1} P(N_t = k) = Σ_{k=0}^{n−1} e^{−m(t)} [m(t)]^k / k!.

4.(a) The event times are simulated iteratively as S_0 = 0, S_i = S_{i−1} + X_i for i = 1, 2, . . ., stopping as soon as S_i > T = 2, where the inter-event times X_i are simulated by the formula X_i = (−1/λ) log(U_i), the U_i being Unif(0, 1) random numbers (Method 2.9). We obtain:

i   U_i     X_i = −(1/λ) log(U_i)   S_i = S_{i−1} + X_i
1   .8187   0.1                     0.1
2   .2466   0.7                     0.8
3   .5488   0.3                     1.1
4   .3679   0.5                     1.6
5   .4066   0.45                    2.05

(b) Following Method 2.16, compute the maximum: K = max_{0≤t≤2} [1 + sin(πt)] = 2. The event times in (a) may be used as the times to be thinned because they were sampled with the appropriate rate, 2. Denoting by V_i the given pseudo-random numbers, calculate:

S_i   λ(S_i)/K   V_i   Is V_i ≤ λ(S_i)/K?
0.1   0.65       0.7   No
0.8   0.79       0.4   Yes
1.1   0.34       0.2   Yes
1.6   0.02       0.6   No

Thus, the simulated process has events at times 0.8 and 1.1 only.
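The two tables above can be reproduced with a few lines of Python (an added sketch, not part of the solution):

    import math

    lam, T = 2.0, 2.0
    U = [0.8187, 0.2466, 0.5488, 0.3679, 0.4066]   # given pseudo-random numbers
    V = [0.7, 0.4, 0.2, 0.6]                       # acceptance-test numbers

    # (a) constant-rate process: S_i = S_{i-1} - (1/lam) * log(U_i), stop once S_i > T
    S, s = [], 0.0
    for u in U:
        s += -math.log(u) / lam
        if s > T:
            break
        S.append(round(s, 2))                      # [0.1, 0.8, 1.1, 1.6]

    # (b) thin with rate function 1 + sin(pi t) and K = 2
    rate = lambda t: 1 + math.sin(math.pi * t)
    accepted = [t for t, v in zip(S, V) if v <= rate(t) / 2]   # [0.8, 1.1]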

(c) The mean function is

m(t) = ∫_0^t (1 + sin(πu)) du = t + (1/π)[1 − cos(πt)].

Aim for the point at which the cdf of S_1 equals U:

F(t) = 1 − e^{−m(t)} = U  ⇔  m(t) = − log(1 − U).

The explicit point t is not pursued here.




Chapter 3

Queues

3.1 Preliminaries

Vectors are column vectors. The transpose of a vector p is denoted p^T.

We often take limits as the length h of a time interval goes to zero; we write this as "lim_{h→0}", or even as "lim" when the condition is obvious.

A probability distribution with support a set E is called in short a distribution on E.

3.1.1 Queues: Terminology and Global Assumptions

Jobs (customers) arrive at a system at which a number of servers are stationed. A job may have to wait in a waiting area (queue) until a server becomes available. After being served in a service area, jobs leave. The system includes both these areas.

1. The times between arrivals are independent identically distributed (iid) random variables.

2. The service times are random variables and independent of the inter-arrival times.

3. No idling. Server(s) will not idle if jobs are waiting.

4. M/M/c/k means Memoryless inter-arrival times, Memoryless service times, c servers, and a maximum of k jobs waiting in queue, where k = ∞ if not specified. "Memoryless" means that these times are exponentially distributed.

5. The inter-arrival times and the service times have means that are positive and finite.


3.2 The Birth-Death Process

Denote X_t the number of jobs in the system at time t, also called the state. We are interested in the (stochastic) process X = (X_t : t ≥ 0). If there is a maximum allowed number in the system, say k, then X_t takes values in the finite set E = {0, 1, 2, . . . , k}, 0 being the lowest state, reflecting an empty system. If there is no such maximum, then X_t takes values in the infinite set E = {0, 1, 2, . . .}. A birth-death process arises as follows:

1. Whenever X_t = i, events whose effect is to increase X by one ("births") occur according to a Poisson process B = (B_t) of (birth) rate λ_i (the time T_B until the next birth is an Expon(λ_i) random variable).

2. In addition, again given X_t = i, events whose effect is to decrease X by one ("deaths") occur according to a Poisson process D = (D_t) of (death) rate µ_i (the time T_D until the next death is an Expon(µ_i) random variable).

3. These Poisson processes are independent.

The process X moves as follows. The next state of X, as well as the timing of the state change, are the result of the competition between the birth and death processes. This means that: (a) Starting from state i, the time when the process X first changes state, T = min(T_B, T_D), has an Expon(λ_i + µ_i) distribution (as seen in merging); and (b) The next state is i + 1 with probability λ_i/(λ_i + µ_i) (a birth happens before a death), or i − 1 with the remaining probability (a death happens before a birth).

For each i, we now derive conditional (transition) probabilities of the future state of the process given X_t = i, h time units in the future. Put:

• B = number of births (occurring) in (t, t + h], and D = number of deaths.

• N := B + D is the number of events (births plus deaths) in (t, t + h].

Then,

P(X_{t+h} = i + 1 | X_t = i) = P(B − D = 1 | X_t = i)
  = P(B − D = 1, N = 1 | X_t = i) + P(B − D = 1, N ≥ 2 | X_t = i)   [the second term is o(h)]
  = P(B = 1, D = 0 | X_t = i) + o(h)
  = P(B = 1 | X_t = i) P(D = 0 | X_t = i) + o(h)
  = [λ_i h + o(h)][1 − µ_i h + o(h)] + o(h) = λ_i h + o(h),  (3.1)


P(X_{t+h} = i − 1 | X_t = i) = P(B − D = −1 | X_t = i)
  = P(B − D = −1, N = 1 | X_t = i) + P(B − D = −1, N ≥ 2 | X_t = i)   [the second term is o(h)]
  = P(B = 0, D = 1 | X_t = i) + o(h)
  = P(B = 0 | X_t = i) P(D = 1 | X_t = i) + o(h)
  = [1 − λ_i h + o(h)][µ_i h + o(h)] + o(h) = µ_i h + o(h),  (3.2)

and X changes by 2 or more with probability that is negligible as h → 0:

P(X_{t+h} = j | X_t = i) ≤ P(N ≥ 2 | X_t = i) = o(h)  for |j − i| ≥ 2.  (3.3)

Consequently, X remains at the same state with the probability that remains, i.e., P(X_{t+h} = i | X_t = i) = 1 − (λ_i + µ_i)h + o(h).

The distribution of X_t can be expressed via differential equations obtained similarly to those for the Poisson process, (2.14). In these equations, to be seen later for a more general process, the following limits appear:

q_{i,j} := lim_{h→0} P(X_{t+h} = j | X_t = i) / h = { λ_i if j = i + 1;  µ_i if j = i − 1;  0 otherwise (j ∉ {i − 1, i + 1}) },  j ≠ i.  (3.4)

The q_{i,j}, sometimes called transition rates or probability rates, are probabilities of transitions (state changes) relative to time. Note that if an i → j transition requires at least 2 events in order to happen, then q_{i,j} = 0. On the other hand, if i → j is caused by a single event, then q_{i,j} is the rate of the underlying process.

The transition rates are sometimes summarised in a transition diagram, as in Figure 3.1.

[Figure: states 0, 1, 2, . . . , n − 1, n, n + 1, . . . drawn in a row, with rightward arrows labelled by the birth rates λ_0, λ_1, . . . , λ_{n−1}, λ_n, . . . and leftward arrows labelled by the death rates µ_1, µ_2, . . . , µ_n, µ_{n+1}, . . .]

Figure 3.1: Transition diagram of a birth-death process.


3.3 Continuous-Time Markov Chains (CTMCs)

Definition 3.1 The stochastic process (X_t : t ≥ 0) is called a continuous-time Markov chain (CTMC) with state space (set of possible states) E if for all i, j, i_1, . . . , i_k ∈ E, all t ≥ 0, and all p_1, p_2, . . . , p_k ≤ s,

P(X_{s+t} = j | X_{p_1} = i_1, . . . , X_{p_k} = i_k, X_s = i) = P(X_{s+t} = j | X_s = i).  (3.5)

If the right side of (3.5) does not depend on s, then the CTMC is called homogeneous.

In words, given the process' history up to time s, the conditional distribution of X at future time points is the same as the conditional distribution given only its present value, X_s. Yet in other words, we can say that the X process is memoryless, because its future behaviour depends on its history only through the present; the strict past is irrelevant. We are mainly interested in homogeneous CTMCs, where

p_{i,j}(t) := P(X_{s+t} = j | X_s = i)

is a function of i, j and t.

3.3.1 Generator

We assume there exist q_{i,j} such that

q_{i,j} := lim_{h→0} p_{i,j}(h) / h,  i, j ∈ E, i ≠ j.

That is, q_{i,j} is a "probability rate" of going from i to j. The birth-death process seen earlier is a special case of a homogeneous CTMC with q_{i,j} being the λ_i for j = i + 1, the µ_i for j = i − 1, and 0 otherwise.

We also define

q_i = lim_{h→0} P(X_{t+h} ≠ X_t | X_t = i) / h = lim_{h→0} [1 − p_{i,i}(h)] / h = lim_{h→0} Σ_{j∈E, j≠i} p_{i,j}(h) / h = Σ_{j∈E, j≠i} ( lim_{h→0} p_{i,j}(h) / h ) = Σ_{j∈E, j≠i} q_{i,j},

where we will always assume the interchange between limit and summation is valid (it could fail, but only if E is infinite and under further unusual conditions). (We used that Σ_{j∈E} p_{i,j}(t) = 1 for all i and t, a consequence of the fact that X_t must take a value in E.) That is, q_i is a "probability rate" of leaving i (to go anywhere else). We summarise these in the matrix

Q = [q_{i,j}]_{i,j∈E},


where we put q_{i,i} = −q_i. Q is called the (CTMC) generator and is central to later calculations. It can be calculated without taking limits, after a little experience.

Example 3.2 A repair shop has two workers named fast (F) and slow (S). Jobs arrive according to a Poisson process of rate 4. Service times are exponentially distributed with mean 1/3 and 1/2 at F and S respectively. Whenever both F and S are available to process a job, preference is given to F. Arrivals that find a job waiting are lost. Focus on the system state over time, with the following possible states:

Numeric Code   State Description
1              empty
2              F working, S idle
3              S working, F idle
4              both servers working, 0 jobs in queue
5              both servers working, 1 job in queue

A detailed derivation of the generator would resemble that done for the birth-death process in (3.1) to (3.3). Here, a 5 → 4 transition is caused by a departure at F or S, i.e., an event of the process that merges these two event types, whose rate is 3 + 2 = 5 (merging result). Thus, similar to (3.2), p_{5,4}(h) = 5h + o(h) as h → 0, so q_{5,4} = lim_{h→0} p_{5,4}(h)/h = 5. Consider now q_{5,3}. Similar to (3.3), p_{5,3}(h) = o(h) because 5 → 3 requires at least two departures, so q_{5,3} = lim_{h→0} p_{5,3}(h)/h = 0. And so on for other i, j.

Generally, a systematic calculation of the generator can go as follows. For each type of event (Poisson process), write the corresponding rate, and list all transitions caused by a single such event in the form "from i to j", for specified i and j. Here this gives:

Causing Event    Rate   List of State Changes
Arrival          4      from 1 to 2, from 2 to 4, from 3 to 4, from 4 to 5
Departure at F   3      from 2 to 1, from 4 to 3, from 5 to 4
Departure at S   2      from 3 to 1, from 4 to 2, from 5 to 4

Then, for each i and j ≠ i, locate all i → j transitions listed and find q_{i,j} as the sum of the corresponding rates. Here this gives (zeros shown explicitly):

Q = [ −4    4    0    0    0
       3   −7    0    4    0
       2    0   −6    4    0
       0    2    3   −9    4
       0    0    0  3+2   −5 ]
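The systematic calculation just described is easy to automate. The following Python sketch (an addition to the notes, with numpy assumed available) rebuilds the generator above from the event table.

    import numpy as np

    # Each event type contributes its rate to q_{i,j} for every transition i -> j it causes.
    events = [
        (4, [(1, 2), (2, 4), (3, 4), (4, 5)]),   # arrivals, rate 4
        (3, [(2, 1), (4, 3), (5, 4)]),           # departures at F, rate 3
        (2, [(3, 1), (4, 2), (5, 4)]),           # departures at S, rate 2
    ]
    Q = np.zeros((5, 5))
    for rate, transitions in events:
        for i, j in transitions:
            Q[i - 1, j - 1] += rate              # states are numbered 1..5
    np.fill_diagonal(Q, -Q.sum(axis=1))          # q_{i,i} = -q_i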


3.3.2 Time-dependent Distribution and Kolmogorov's Differential System

Assume the process state is known at time 0, e.g., X_0 = 5. Put p_i(t) = P(X_t = i). We focus on the vector p(t) = (p_i(t))_{i∈E}, which is the distribution at time t of X_t. The whole distribution is related to that at previous times via

p_i(t + h) = P(X_{t+h} = i)
  = Σ_{k∈E} P(X_{t+h} = i, X_t = k)
  = Σ_{k∈E} P(X_t = k) P(X_{t+h} = i | X_t = k)
  = Σ_{k∈E} p_k(t) p_{k,i}(h)
  = p_i(t) p_{i,i}(h) + Σ_{k∈E, k≠i} p_k(t) p_{k,i}(h),   t, h ≥ 0

(Theorem 1.10, Law of Total Probability). Subtract p_i(t) from both left and right, and divide by h:

[p_i(t + h) − p_i(t)] / h = −p_i(t) [1 − p_{i,i}(h)] / h + Σ_{k∈E, k≠i} p_k(t) p_{k,i}(h) / h.

Take limits as h → 0. In the rightmost term, we assume the limit can pass inside the sum (which is the case if E is finite), i.e.,

lim_{h→0} Σ_{k∈E, k≠i} p_k(t) p_{k,i}(h) / h = Σ_{k∈E, k≠i} p_k(t) ( lim_{h→0} p_{k,i}(h) / h ) = Σ_{k∈E, k≠i} p_k(t) q_{k,i},

and arrive at Kolmogorov's differential system:

p′_i(t) = −p_i(t) q_i + Σ_{k∈E, k≠i} p_k(t) q_{k,i},   i ∈ E,  (3.6)

where p′_i(t) = (d/dt) p_i(t).

Supposing for example X_0 = 5, we would have the initial condition p_5(0) = 1 and p_j(0) = 0 for j ≠ 5, and p(t) should satisfy (3.6) together with the initial condition.
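For a finite state space, (3.6) is the linear system p′(t) = Q^T p(t) and can be integrated numerically. The sketch below is an addition to the notes; it assumes scipy is available and reuses the Q array of Example 3.2 built in the earlier sketch.

    import numpy as np
    from scipy.integrate import solve_ivp

    def kolmogorov_forward(Q, p0, t_end):
        # Integrate p'(t) = Q^T p(t), i.e. the system (3.6), from p(0) = p0 to time t_end.
        sol = solve_ivp(lambda t, p: Q.T @ p, (0.0, t_end), p0, dense_output=True)
        return sol.y[:, -1]                    # the distribution p(t_end)

    p0 = np.array([0, 0, 0, 0, 1.0])           # initial condition X_0 = 5
    # e.g. kolmogorov_forward(Q, p0, 10.0) is already close to the stationary distribution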


3.4 Long-Run Behaviour

With X_t being a count of jobs in the system at time t, or something similar (repair-shop example), and all events causing changes to X "being" independent Poisson processes (inter-event times having exponential distributions), (X_t : t ≥ 0) will be a CTMC.

The theory of (Homogeneous) CTMCs is rich, especially for time going to ∞. It is built upon a corresponding theory for discrete-time Markov chains (t integer). The theory is technical and will not be seen in depth. A key summary result, Fact 3.3 below, is our basic supporting theory.

1_A is the indicator function, valued 1 on the set (event) A and 0 elsewhere. Thus, for example, 1_{X_u = i} indicates if the process X at time u is at state i. Put

T_t^i = time from 0 to t that X is in state i = ∫_0^t 1_{X_u = i} du.  (3.7)

Fact 3.3 The CTMC (X_t : t ≥ 0) corresponding to "interesting" and "stable" queues converges, regardless of the initial state X_0, as follows.

(a)

lim_{t→∞} p_i(t) = π_i,  lim_{t→∞} p′_i(t) = 0,  i ∈ E.  (3.8)

(b) Putting (3.8) into (3.6), we have the balance equations

π_i q_i = Σ_{k∈E: k≠i} π_k q_{k,i},  i ∈ E  (3.9)

(π^T Q = 0). If the equation set (3.9) together with the normalising equation

Σ_{i∈E} π_i = 1  (3.10)

has a solution (π_i)_{i∈E}, then the solution is unique and π_i > 0 for all i. This is called the stationary (or steady-state or limit) distribution of the CTMC.

(c) If a stationary distribution (π_i)_{i∈E} exists, then

lim_{t→∞} T_t^i / t = π_i  w.p. 1 for all i ∈ E,  (3.11)

and as a consequence of π_i > 0 we have

lim_{t→∞} T_t^i = ∞,  i ∈ E.  (3.12)


Result (3.11) is basic to calculations below. For this reason, we want at least some idea why it holds. We argue why the corresponding mean converges to π_i:

E T_t^i = E[ ∫_0^t 1_{X_u = i} du ] = ∫_0^t E[1_{X_u = i}] du   (interchange of E[] and ∫ assumed valid)
        = ∫_0^t p_i(u) du.

Then

lim_{t→∞} E[ ∫_0^t 1_{X_u = i} du ] / t = lim_{t→∞} ( ∫_0^t p_i(u) du ) / t = π_i,

the last step following from lim_{u→∞} p_i(u) = π_i, which is said in (3.8).

As π_i is the (long-run) fraction of time spent at state i, the balance equation (3.9) for state i says

(fraction of time at i) × q_i = Σ_{k∈E, k≠i} (fraction of time at k) × q_{k,i},

the left side being a "rate out of i" and the right side being a "rate into i".


3.4.1 Long-Run Average Cost

Proofs here are non-examinable. (2.26) will be used heavily, for arrival events.¹

Cost a Deterministic Function of System State

Suppose that whenever X_u = i, we incur a cost f(i) per unit time, where f is a given function (while thinking "cost" may help, f could represent anything, e.g., a gain). Then the cost per unit time, up to time t, is

( ∫_0^t f(X_u) du ) / t = Σ_{i∈E} f(i) ( ∫_0^t 1_{X_u = i} du ) / t.

Thus the long-run cost per unit time is

lim_{t→∞} Σ_{i∈E} f(i) ( ∫_0^t 1_{X_u = i} du ) / t = Σ_{i∈E} f(i) ( lim_{t→∞} ( ∫_0^t 1_{X_u = i} du ) / t ) = Σ_{i∈E} f(i) π_i   w.p. 1  (3.13)

by (3.11) in the last step, and by assuming that the limit can pass inside the sum, which is the case if E is finite. This is abbreviated as "average of f(X)". Note this is E[f(X)] for the random variable X whose distribution is (π_i)_{i∈E}.

We now apply this.

Example 3.2 (continued) Find the following long-run quantities: (a) the average number of jobs in the system; (b) the average number of busy servers; and (c) the fraction of time that both servers are busy.

To identify the stationary distribution, solve Σ_{i=1}^5 π_i = 1 together with (3.9), i.e. (the generator was given previously):

State   Rate Out = Rate In
1       4π_1 = 2π_3 + 3π_2
2       7π_2 = 4π_1 + 2π_4
3       6π_3 = 3π_4
4       9π_4 = 4π_2 + 4π_3 + 5π_5
5       5π_5 = 4π_4

Obtaining the solution is then straightforward.²

(a) Following the cost result (3.13), the average number of jobs in the system is 1 · (π_2 + π_3) + 2π_4 + 3π_5 (f(1) = 0; f(2) = f(3) = 1; f(4) = 2; f(5) = 3).

(b) 1 · (π_2 + π_3) + 2(π_4 + π_5) (f(1) = 0; f(2) = f(3) = 1; f(4) = f(5) = 2).

(c) π_4 + π_5 (f is an indicator function, valued 1 at states 4 and 5, and 0 elsewhere).

¹ Assume all inter-arrival times are finite; then the arrival times S_n are finite for all n; then the number of events up to t, N_t = Σ_{n=1}^∞ 1_{S_n ≤ t}, satisfies lim_{t→∞} N_t = ∞, verifying the A2 there.

² The solution is the vector (65, 60, 40, 80, 64)/309.
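As an added numerical check (not part of the notes), the balance equations can be solved by replacing one of them with the normalising equation; this reproduces the solution (65, 60, 40, 80, 64)/309 quoted in the footnote and evaluates (a)–(c).

    import numpy as np

    Q = np.array([[-4,  4,  0,  0,  0],      # generator of Example 3.2
                  [ 3, -7,  0,  4,  0],
                  [ 2,  0, -6,  4,  0],
                  [ 0,  2,  3, -9,  4],
                  [ 0,  0,  0,  5, -5]], dtype=float)

    A = np.vstack([Q.T[:-1], np.ones(5)])    # four balance equations + normalisation
    pi = np.linalg.solve(A, np.array([0, 0, 0, 0, 1.0]))   # == (65, 60, 40, 80, 64)/309

    avg_jobs  = pi[1] + pi[2] + 2 * pi[3] + 3 * pi[4]      # (a)
    avg_busy  = pi[1] + pi[2] + 2 * (pi[3] + pi[4])        # (b)
    both_busy = pi[3] + pi[4]                              # (c)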


Cost Events

We consider here costs that arise as counts of certain cost events. In the first model, while the system state is i, cost events are a Poisson process with rate c_i. Put

C_t^i = number of cost events that occur while the state is i, up to time t,  (3.14)

so C_t = Σ_{i∈E} C_t^i is the number of cost events up to time t. Write

C_t / t = Σ_{i∈E} (C_t^i / T_t^i)(T_t^i / t),

and recall T_t^i is the time spent at i, see (3.7). The fractions on the right converge:

1. T_t^i / t converges (w.p. 1) to the state-i stationary probability, π_i, by (3.11).

2. C_t^i / T_t^i converges (w.p. 1) to the underlying rate, c_i, by (2.26) (the assumption there that time goes to ∞ is checked by (3.12)).

Thus

lim_{t→∞} C_t / t = Σ_{i∈E} ( lim_{t→∞} C_t^i / T_t^i ) ( lim_{t→∞} T_t^i / t ) = Σ_{i∈E} c_i π_i,  (3.15)

where the "w.p. 1" will be dropped for convenience.

In the second model, assume a cost event is triggered for each arrival if and only if it finds the system (X_t) in one of the states in a set A (for example, A could have a single "system is full" state). Let C_t be the number of cost events up to t; then the average number of cost events per unit time is

C_t / t = Σ_{i∈A} (N_t^i / N_t)(N_t / t),

where

N_t^i = number of arrivals that find the system in state i, up to time t,  (3.16)

and N_t is the number of arrivals up to time t. As t → ∞:

1. N_t / t converges (w.p. 1) to the arrival rate, call it λ, again by (2.26).

2. We will later see a Theorem called PASTA (Poisson Arrivals see Time Averages) that ensures that N_t^i / N_t converges (w.p. 1) to the state-i stationary probability, π_i.

Thus

lim_{t→∞} C_t / t = Σ_{i∈A} ( lim_{t→∞} N_t^i / N_t ) ( lim_{t→∞} N_t / t ) = Σ_{i∈A} π_i λ.  (3.17)


Example 3.4 Jobs (customers) arrive to a system according to a Poisson process of rate λ = 5. There are two servers, and service times at either server are exponentially distributed with mean 1/µ = 1/3 (rate µ = 3), and independent of everything else. At most two jobs may wait for service, so a job that arrives to find 2 jobs waiting balks, meaning it does not join the queue. Moreover, each job i abandons, meaning it leaves without being served, as soon as its waiting time in queue is equal to Y_i, where the Y_i are exponentially distributed with mean 1/η = 1/2 (rate η = 2), independently of everything else.

(The system state, X_t, defined as the number of jobs in the system, thus takes values in E = {0, 1, 2, 3, 4} and the independent exponential distributions imply that (X_t : t ≥ 0) is a CTMC.) We want to calculate:

(a) The long-run average number of abandons per unit time.

(b) The long-run average number of balks per unit time.

First, calculate the generator. Single events that cause a state change are the arrivals, departures of served customers, and (what is new here) abandons, which act in the same way as departures, decreasing the state by one. The balk events do not change the state.

The rate of abandons depends on the state: when the state has k jobs in queue (importantly, the state determines k: k = 1 in state 3, k = 2 in state 4, and k = 0 in the other states), and since η is the rate at which an individual job abandons, the abandon rate is kη (merging property). Summing the rates of the events causing the same state transition (as usual), we find the generator is

Q = [  ·      λ      0      0        0
       µ      ·      λ      0        0
       0     2µ      ·      λ        0
       0      0    2µ+η     ·        λ
       0      0      0    2µ+2η      ·  ]

(the main diagonal, shown as "·", is implicit, and the other entries are zero). This is a birth-death process (i.e., we have zeros everywhere except on the diagonals above and below the main diagonal). In Section 3.4.2 below, we derive the stationary distribution of a general birth-death process (with infinite state space); this essentially also shows how the finite-state-space solution can be obtained. So we do not pursue a solution here. Denote the stationary distribution (π_i)_{i=0}^4.

(a) Following the first cost result, (3.15), the long-run average number of abandons per unit time is ηπ_3 + 2ηπ_4 (cost rates c_3 = η, c_4 = 2η, and 0 at other states).

(b) Following the second cost result, (3.17), the long-run average number of balks per unit time is π_4 λ (A = {4}).
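As an added sketch (the notes do not pursue the numbers), the finite birth-death solution anticipated in Section 3.4.2 below, π_{i+1} = π_i λ_i/µ_{i+1}, gives the π's for this example, which can then be plugged into (a) and (b):

    import numpy as np

    lam, mu, eta = 5.0, 3.0, 2.0
    birth = [lam, lam, lam, lam]                           # rates 0->1, 1->2, 2->3, 3->4
    death = [mu, 2 * mu, 2 * mu + eta, 2 * mu + 2 * eta]   # rates 1->0, 2->1, 3->2, 4->3

    pi = [1.0]
    for b, d in zip(birth, death):                         # pi_{i+1} = pi_i * lambda_i / mu_{i+1}
        pi.append(pi[-1] * b / d)
    pi = np.array(pi) / sum(pi)                            # normalise

    abandons_per_unit_time = eta * pi[3] + 2 * eta * pi[4]   # (a)
    balks_per_unit_time    = lam * pi[4]                     # (b)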

♣ Study the following. E03 2, Two Stations in Tandem. Involves thinning and thereby a relatively tricky calculation of the generator.


3.4.2 Explicit Stationary Distributions for Specific Models

Birth-Death Process with Infinite State Space

The generator is in (3.4), so the balance equations become

π_0 λ_0 = π_1 µ_1,
π_i (λ_i + µ_i) = π_{i−1} λ_{i−1} + π_{i+1} µ_{i+1},  i = 1, 2, . . . .  (3.18)

Solving iteratively gives

π_i λ_i = π_{i+1} µ_{i+1},  i = 0, 1, . . . ,  (3.19)

i.e., π_{i+1} = π_i λ_i / µ_{i+1}, from which we obtain (proceeding down to state zero)

π_k = π_{k−1} (λ_{k−1}/µ_k) = π_{k−2} (λ_{k−2}/µ_{k−1})(λ_{k−1}/µ_k) = . . . = π_0 (λ_0 λ_1 · · · λ_{k−1}) / (µ_1 µ_2 · · · µ_k),  k = 1, 2, . . . .  (3.20)

Normalise, i.e., require (3.10) (substitute the π's in terms of π_0):

Σ_{i=0}^∞ π_i = π_0 ( 1 + λ_0/µ_1 + λ_0 λ_1/(µ_1 µ_2) + . . . ) = π_0 ( 1 + Σ_{k=1}^∞ (λ_0 λ_1 · · · λ_{k−1}) / (µ_1 µ_2 · · · µ_k) ) = 1.  (3.21)

We see that a unique solution (stationary distribution) exists if and only if

1 + Σ_{k=1}^∞ (λ_0 λ_1 · · · λ_{k−1}) / (µ_1 µ_2 · · · µ_k) < ∞.  (3.22)

It is then given by (3.20), with π_0 determined from (3.21).


Application 3.5 Our standard notation for the M/M/c model will be an arrival rate denoted λ (i.e., inter-arrival times exponentially distributed with mean 1/λ) and an individual-server service rate denoted µ (i.e., service times exponentially distributed with mean 1/µ). With X_t the number of jobs in the system at time t, X = (X_t : t ≥ 0) is a birth-death process with state space {0, 1, 2, . . .}, as there is infinite waiting space. As servers are non-idling, µ_n, the death rate when X_t = n, is the rate of departures aggregated over all busy servers, i.e., the individual-server departure rate, µ, times the number of busy servers; thus

µ_n = { nµ, n < c;  cµ, n ≥ c }.

The birth (arrival) rate is λ, regardless of X_t; that is, λ_n = λ for all n. Putting

ρ := λ / (cµ),

the birth-death solution (3.20) becomes

π_n = π_0 λ^n / (µ · 2µ · 3µ · · · nµ) = π_0 (cρ)^n / n!,   n ≤ c,
π_n = π_c ρ^{n−c} = π_0 c^c ρ^n / c!,                       n ≥ c

(the two branches agree at n = c), and π_0 must satisfy

1 = Σ_{n=0}^{c−1} π_n + Σ_{n=c}^∞ π_n = π_0 [ Σ_{n=0}^{c−1} (cρ)^n / n! + ((cρ)^c / c!) Σ_{n=c}^∞ ρ^{n−c} ].  (3.23)

Thus, a stationary distribution exists if and only if Σ_{j=0}^∞ ρ^j < ∞, i.e., if and only if ρ < 1. In this case, Σ_{j=0}^∞ ρ^j = 1/(1 − ρ), and

π_0 = [ Σ_{n=0}^{c−1} (cρ)^n / n! + (cρ)^c / (c!(1 − ρ)) ]^{−1}.  (3.24)

For future reference, for c = 1 (M/M/1 model), we get π_0 = 1 − ρ (the sum is 1, the second term is ρ/(1 − ρ)), and we find π_n = (1 − ρ)ρ^n for all n.
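The following added sketch evaluates (3.23)–(3.24) numerically; mmc_pi returns π_0, . . . , π_N for a user-chosen truncation level N (an illustrative choice, since the state space is infinite).

    import math

    def mmc_pi(lam, mu, c, N):
        # Stationary probabilities pi_0..pi_N of the M/M/c queue via (3.23)-(3.24); needs rho < 1.
        rho = lam / (c * mu)
        pi0 = 1.0 / (sum((c * rho) ** n / math.factorial(n) for n in range(c))
                     + (c * rho) ** c / (math.factorial(c) * (1 - rho)))
        return [pi0 * (c * rho) ** n / math.factorial(n) if n <= c
                else pi0 * c ** c * rho ** n / math.factorial(c)
                for n in range(N + 1)]

    # e.g. mmc_pi(0.5, 1.0, 1, 5) gives (1 - rho) * rho**n with rho = 0.5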

With X_t being the number of jobs in the system in the finite-waiting-space model M/M/c/k and in extensions that allow balks and abandons (with exponentially distributed patience times), a stationary distribution exists for any ρ (as there are only c + k + 1 π's); it is not difficult to compute by using the general birth-death solution (3.20) and (3.21).

♣ Study the following M/M/c-type problems. • E04 1, except for the question on waiting times (last paragraph): M/M/1 versus an M/M/2 with half-as-fast servers, i.e., same overall speed. • E06 1: compare three separate M/M/1 systems to a centralised M/M/2 that handles all work.


Transitions Beyond Infinite-Case Birth-Death, Solvable via PGFs

♣ Study the following.

• Exercise 6. Then recognise that E06 3, E07 2, E10 3 are all similar to this. The modelling novelty is to allow batch arrivals. An infinite-state-space CTMC arises, with structure more complex than the birth-death case. Solvable via the pgf of the stationary distribution, as shown in the exercise.

• E04 2. This is the M/E_k/1 Model, where E_k refers to the Erlang-k distribution, meaning that the service time is the sum of k independent exponentially distributed service stages (of known rate each). A reduction to an infinite-state-space CTMC is done, the state variable now counting stages (in the system) rather than jobs. The balance equations are exactly as in Exercise 6 (but the state variable has different meaning across the two problems), and thus so is the pgf of the stationary distribution. In answering questions involving the number of jobs, we must translate from a job count to (a set of) stage counts: for example, if k = 2, then the state "1 job in the system" is equivalent to "1 stage in the system" or "2 stages in the system".


3.5 Arrivals That See Time Averages

We now state carefully a key theorem that states, under conditions, the equality of long-run customer averages to associated stationary probabilities.

Theorem 3.6 (Arrivals See Time Averages (ASTA)) Let X = (X_t : t ≥ 0) be the number of jobs in the system, assumed to be a CTMC with state space E and stationary distribution (π_i)_{i∈E}. Write T_j for the j-th arrival time. Assume that for all t, the arrival-counting process (essentially the T_j's) forward from t is independent of the history of X up to t. Then

lim_{n→∞} ( Σ_{j=1}^n 1_{X_{T_j} = i} ) / n = π_i  w.p. 1,  i ∈ E.  (3.25)

Taking expected value of the left side results in the indicator random variables being replaced by probabilities, so the above gives

lim_{n→∞} ( Σ_{j=1}^n P(X_{T_j} = i) ) / n = π_i.  (3.26)

If the arrivals are a Poisson process, then the theorem's assumptions are true, and this special case is called PASTA, the added "P" standing for "Poisson". In Exercises and Exams from Year 2007/08 onwards, PASTA is mentioned when used. In pre-2007/08 exams, PASTA is implicitly assumed without being mentioned.

With deterministic arrival and service times, the ASTA conclusions tend to fail: although there exists a long-run fraction of time in state i, it tends to differ from the long-run fraction of arrivals that find the system in this state.


3.5.1 A Steady-State Delay Distribution

Problem 3.7 Customers arrive to an M/M/1 queue at rate λ = 1/2 per hour. Service discipline is first come first served (FCFS). Two rates of service are possible. The faster rate of service is µ = 1 per hour and costs £40 per hour; the slower service rate is µ = 0.75 per hour and costs £30 per hour. A cost of £200 is incurred for each customer that waits in queue (i.e., excluding service) more than one hour. Find which service rate gives the lowest average cost per hour.

We will analyse a slightly more general problem, where we replace the constant "1" (the "1 hour") by any s ≥ 0, and assume the M/M/c model for c general. Let W_j be the delay (time spent in queue) of the j-th job (customer). Write

C_t = number of arrivals up to t that wait more than s = Σ_{j=1}^{N_t} 1_{W_j > s}.

Similar to the cost model that gave (3.17), write

C_t / t = (C_t / N_t)(N_t / t),

where N_t is the number of arrivals up to t. Now, as t → ∞, N_t/t converges (w.p. 1) to the arrival rate, as seen in (2.26), and lim_{t→∞} N_t = ∞. We attempt to guess lim_{t→∞} C_t/N_t from the corresponding (limit of) expected value:

F(s) := lim_{t→∞} E[ C_t / N_t ] = lim_{t→∞} E[ (1/N_t) Σ_{j=1}^{N_t} 1_{W_j > s} ] = lim_{n→∞} (1/n) Σ_{j=1}^n P(W_j > s),  s ≥ 0.  (3.27)

We call F() a steady-state delay distribution (it is usually a proper distribution).

To find F, let T_j be the arrival time of the j-th job, and condition on the number of jobs it finds in the system, denoted X_{T_j}:

P(W_j > s) = Σ_{k=0}^∞ P(W_j > s | X_{T_j} = k) P(X_{T_j} = k)  (3.28)

for any s ≥ 0 (this is the Law of Total Probability (LTP), Theorem 1.10, with partition events {X_{T_j} = k}, k = 0, 1, 2, . . .).

Supposing the system has c servers, it suffices to sum over k ≥ c, since otherwise the waiting time is zero; thus

F(s) = lim_{n→∞} (1/n) Σ_{j=1}^n Σ_{k=0}^∞ P(W_j > s | X_{T_j} = c + k) P(X_{T_j} = c + k)   by (3.28)
     = Σ_{k=0}^∞ P(W > s | X = c + k) · ( lim_{n→∞} (1/n) Σ_{j=1}^n P(X_{T_j} = c + k) )  (3.29)

(limit passed inside sum), where we abbreviate W_j as W and X_{T_j} as X, as j does not affect the probability. Now note that:

• The term in parenthesis, lim_{n→∞} (1/n) Σ_{j=1}^n P(X_{T_j} = c + k), equals the stationary probability that the process X is at state c + k, by PASTA, (3.26).

• The conditional probability has been considered, under the First-Come First-Served (FCFS) discipline, in Application 2.11: given X = c + k with k ≥ 0, W has a Gamma distribution. Specifically, assuming Expon(µ) service times, plugging the Gamma result from there and the π's from (3.23) into (3.29) and simplifying gives

F(s) = ( π_c / (1 − ρ) ) e^{−(cµ−λ)s},  s ≥ 0,  (3.30)

where F(0) = π_c/(1 − ρ) = Σ_{n=c}^∞ π_n is the stationary probability that all servers are busy. The formula describes the steady-state delay as a mixed distribution with two components: with probability 1 − F(0), it is zero; with the remaining probability, F(0), it is distributed as Expon(cµ − λ).

The answers to Problem 3.7 are now direct from (3.30) and the M/M/1 stationary distribution, given following (3.24). In the slow system, π_c/(1 − ρ) = ρ = 2/3,

F_slow(1) = (2/3) e^{−(1/4)·1} ≈ 0.519,

and the long-run average cost per hour is 30 + 200 λ F_slow(1) = £81.9. Exactly in the same way, in the fast system, π_c/(1 − ρ) = ρ = 1/2,

F_fast(1) = (1/2) e^{−(1/2)·1} ≈ 0.303,

and the long-run average cost per hour is 40 + 200 λ F_fast(1) = £70.3. Thus the faster system has lower cost.
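An added sketch of the cost comparison (reproducing the numbers above; the hourly rates and £ figures are those of Problem 3.7):

    import math

    def hourly_cost(service_cost, lam, mu, s=1.0, penalty=200.0):
        # M/M/1 case of (3.30): P(delay > s) = rho * exp(-(mu - lam) * s), rho = lam / mu.
        rho = lam / mu
        tail = rho * math.exp(-(mu - lam) * s)
        return service_cost + penalty * lam * tail

    slow = hourly_cost(30.0, 0.5, 0.75)   # about 81.9
    fast = hourly_cost(40.0, 0.5, 1.0)    # about 70.3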

♣ Study the following. Exams E03 1, E07 3 (last 10 points), E11 4(b) are all essentially the above analysis.


3.6 Little's Law

Little's Law states the existence of three long-run limits and a link between them. We need the following.

N_t   number of arrivals up to time t
X_t   number of jobs present in the system at time t
W_j   time spent in the system by job j

Very roughly speaking, the central idea is that there exist (random) times such that all the processes "probabilistically restart" themselves, independently of the past, and moreover, the processes go through infinitely many such "independent identically distributed" cycles. Taking these times as

τ_i = i-th time the system becomes empty after having been non-empty, i = 1, 2, . . . ,

where we put τ_0 = 0, and putting

A_i = ∫_{τ_{i−1}}^{τ_i} X_u du = "area" of the X process during the i-th cycle,
Ñ_i = N_{τ_i} − N_{τ_{i−1}} = number of arrivals during the i-th cycle,

the following is a set of precise supporting assumptions:

A1. The cycle lengths (τ_i − τ_{i−1})_{i=1}^∞ are iid. The areas (A_i)_{i=1}^∞ are iid. The arrival counts (Ñ_i)_{i=1}^∞ are iid.

A2. E[τ_1] < ∞ (finite mean cycle length) and E[Ñ_1] < ∞ (finite mean number of arrivals per cycle).

A3. The cycle containing t, R_t := min{i : τ_i ≥ t}, and the cycle during which job k arrives, C_k := min{i : N_{τ_i} ≥ k}, satisfy

lim_{t→∞} R_t = ∞,  lim_{k→∞} C_k = ∞,  w.p. 1.

Theorem 3.8 (Little's Law) Assume A1 to A3. Then,

(i) Long-run average number in system:

lim_{t→∞} ( ∫_0^t X_u du ) / t = E[A_1] / E[τ_1] =: L  w.p. 1.  (3.31)

(ii) Long-run average arrival rate:

lim_{t→∞} N_t / t = E[Ñ_1] / E[τ_1] =: λ  w.p. 1.  (3.32)


(iii) Long-run average waiting time:

lim_{k→∞} ( Σ_{i=1}^k W_i ) / k = E[A_1] / E[Ñ_1] =: W  w.p. 1.  (3.33)

Thus, in particular, L = λW.

Proof. The proof is non-examinable and given for completeness.

(i) For any t and for n = R_t being the cycle containing t, we have τ_{n−1} ≤ t ≤ τ_n and

A_1 + . . . + A_{n−1} ≤ ∫_0^t X_u du ≤ A_1 + . . . + A_n.

Dividing the latter inequality by the (reversed) former inequality,

(A_1 + . . . + A_{n−1}) / τ_n ≤ ( ∫_0^t X_u du ) / t ≤ (A_1 + . . . + A_n) / τ_{n−1}.  (3.34)

We will take limits of the bounds as t → ∞ and thus n = R_t → ∞. Consider the lower bound (left side of (3.34)) carefully. The numerator is a sum of a large number of A's, n − 1 of them, and the denominator τ_n = Σ_{i=1}^n (τ_i − τ_{i−1}) is the sum of n cycle lengths, suggesting a "Strong Law of Large Numbers" effect for both. Indeed, as n → ∞,

(A_1 + . . . + A_{n−1}) / τ_n = [ (A_1 + . . . + A_{n−1}) / (n − 1) ] · [ (n − 1) / n ] · [ 1 / ( Σ_{i=1}^n (τ_i − τ_{i−1}) / n ) ],  (3.35)

where the first factor → E[A_1] w.p. 1, the second factor → 1, and the third factor → 1/E[τ_1 − τ_0] = 1/E[τ_1] w.p. 1, by using the SLLN for the areas (A's) and for the cycle lengths (the τ_i − τ_{i−1}'s). That is, the above converges to E[A_1]/E[τ_1] w.p. 1. Checking that the upper bound (right side of (3.34)) has the same limit, the proof is complete.

(ii) The ideas are similar to (i). For any t and for n = R_t, we have

N_{τ_{n−1}} / τ_n ≤ N_t / t ≤ N_{τ_n} / τ_{n−1},  (3.36)

and the limit of the lower bound as t → ∞ (thus n = R_t → ∞) is

lim_{n→∞} N_{τ_{n−1}} / τ_n = lim_{n→∞} [ (Ñ_1 + . . . + Ñ_{n−1}) / (n − 1) ] · [ (n − 1) / n ] · [ n / τ_n ] = E[Ñ_1] / E[τ_1]  w.p. 1,  (3.37)

by using the SLLN for the Ñ's and for the cycle lengths. Checking that the upper bound (right side of (3.36)) has the same limit, the proof is complete.

(iii) Since a job always departs by the end of the cycle during which it arrives, we have

Σ_{i=1}^{N_{τ_n}} W_i = A_1 + . . . + A_n  for all n.


Consider the k-th arrival <strong>and</strong> let n = C k be the cycle during which it occurs. Then,<br />

N τn−1 ≤ k ≤ N τn , <strong>and</strong><br />

A 1 + . . . + A n−1<br />

N τn<br />

=<br />

∑ Nτn−1<br />

i=1 W i<br />

N τn<br />

≤<br />

∑ k<br />

i=1 W i<br />

k<br />

≤<br />

∑ Nτn<br />

i=1 W i<br />

N τn−1<br />

= A 1 + . . . + A n<br />

N τn−1<br />

. (3.38)<br />

The limit <strong>of</strong> the lower bound as k → ∞ (thus n = C k → ∞) is<br />

A 1 + . . . + A n−1<br />

lim<br />

n→∞ N τn<br />

= lim<br />

n→∞<br />

A 1 + . . . + A n−1<br />

τ n<br />

lim<br />

n→∞<br />

τ n<br />

= E[A 1] E[τ 1 ]<br />

N τn E[τ 1 ]<br />

E[Ñ1] = E[A 1]<br />

E[Ñ1] w.p. 1<br />

(the first limit is proved in (3.35) <strong>and</strong> the second limit is essentially proved in (3.37)).<br />

Checking that the upper bound (right side <strong>of</strong> (3.38)) has the same limit, the pro<strong>of</strong> is<br />

complete.<br />

✷<br />
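Since this is a simulation module, the three limits in Theorem 3.8 can also be checked empirically. Below is a minimal discrete-event sketch (our own illustration, not from the notes: it assumes an M/M/1 queue with FIFO service, and the function name simulate_mm1 and its parameters are ours) that estimates L, λ and W from one long sample path; for λ = 0.5 and µ = 1 the estimates should be close to 1, 0.5 and 2, consistent with L = λW.

```python
import random

def simulate_mm1(lam, mu, horizon, seed=1):
    # Simulate an M/M/1 queue on [0, horizon] and report the three long-run
    # averages in Little's Law: time-average number in system, arrival rate,
    # and average time in system per departed job.
    rng = random.Random(seed)
    t, x = 0.0, 0                        # current time, number in system
    area = 0.0                           # integral of X_u du
    arrivals, departed = 0, 0
    arrival_times = []                   # arrival epoch of each job (FIFO order)
    total_wait = 0.0
    next_arrival = rng.expovariate(lam)
    next_departure = float("inf")
    while True:
        t_next = min(next_arrival, next_departure, horizon)
        area += x * (t_next - t)
        t = t_next
        if t >= horizon:
            break
        if next_arrival <= next_departure:            # arrival event
            arrivals += 1
            arrival_times.append(t)
            x += 1
            if x == 1:
                next_departure = t + rng.expovariate(mu)
            next_arrival = t + rng.expovariate(lam)
        else:                                         # departure event (FIFO)
            total_wait += t - arrival_times[departed]
            departed += 1
            x -= 1
            next_departure = (t + rng.expovariate(mu)) if x > 0 else float("inf")
    return area / horizon, arrivals / horizon, total_wait / departed

print(simulate_mm1(lam=0.5, mu=1.0, horizon=100000.0))   # roughly (1.0, 0.5, 2.0)
```

Jobs still in the system at the horizon are simply not counted in the waiting-time average; over a long horizon this has a negligible effect on the estimates.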

Application 3.9 Based on the π's in (3.23) for the M/M/c model, we give formulas for certain (long-run) averages for the system and for the queue only (i.e., excluding service), using Little's Law to go from average numbers to average waiting times (for which we previously had no theory). The long-run average number of jobs in the M/M/c queue, call it L_q, is (apply (3.13) with cost function "# in queue", f(n) = n − c for n ≥ c and 0 otherwise):

L_q = average # in queue = ∑_{n=c}^∞ (n − c)π_n = π_c ∑_{j=1}^∞ j ρ^j = π_c ρ/(1 − ρ)².   (3.39)

Let W and W_q be the (long-run) average waiting times in the system and in queue, respectively. These are linked as follows: for any job j, the time in the system is the sum of W_{j,q} = time in queue plus S_j = time in service. Then

W := lim_{n→∞} (1/n) ∑_{j=1}^n W_j = lim_{n→∞} (1/n) ∑_{j=1}^n W_{j,q} + lim_{n→∞} (1/n) ∑_{j=1}^n S_j = W_q + 1/µ   w.p. 1,   (3.40)

the W_q arising by definition, and the 1/µ arising by the SLLN for the S_j. Then, letting L = average # in system = ∑_{n=1}^∞ n π_n, we have the links

L = λW   (Little's Law for the system)
  = λ(W_q + 1/µ)
  = L_q + λ/µ   (via Little's Law for the queue).   (3.41)

³ For ρ < 1, we have ∑_{i=1}^k iρ^i = ρ ∑_{i=1}^k (d/dρ)ρ^i = ρ (d/dρ) ∑_{i=0}^k ρ^i = ρ (d/dρ)[(1 − ρ^{k+1})/(1 − ρ)] = ρ [1 − (k + 1)ρ^k + kρ^{k+1}]/(1 − ρ)²; taking limits as k → ∞, we have ∑_{i=1}^∞ iρ^i = ρ/(1 − ρ)², which justifies the last equality in (3.39).
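As a quick numerical companion to (3.39)–(3.41), here is a minimal sketch (our own illustration; it assumes the standard M/M/c stationary distribution of (3.23)–(3.24), and the function name mmc_averages is ours, not from the notes) that computes L_q, W_q, W and L and lets one check L = λW numerically.

```python
from math import factorial

def mmc_averages(lam, mu, c):
    # Long-run averages for the M/M/c queue via (3.39) and (3.41).
    # Assumes rho = lam/(c*mu) < 1 so that the stationary distribution exists.
    rho = lam / (c * mu)
    a = lam / mu                                     # offered load, a = c*rho
    pi0 = 1.0 / (sum(a**n / factorial(n) for n in range(c))
                 + a**c / (factorial(c) * (1.0 - rho)))
    pi_c = pi0 * a**c / factorial(c)
    Lq = pi_c * rho / (1.0 - rho) ** 2               # (3.39): average number in queue
    Wq = Lq / lam                                    # Little's Law applied to the queue
    W = Wq + 1.0 / mu                                # (3.40): add the mean service time
    L = lam * W                                      # Little's Law for the system
    return L, Lq, W, Wq

# Exercise 3 below: lam = 5, mu = 6, c = 2 gives L = 120/119, roughly 1.0084.
print(mmc_averages(5.0, 6.0, 2))
```

For c = 1 the formulas reduce to L = ρ/(1 − ρ), matching the M/M/1 calculation in the solutions.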


♣ Study the following problems, which involve average waiting times and Little's Law. • E04 1, remainder on waiting times. • Exercise 3: comparison of M/M/1 versus M/M/2 (each server has the same speed) in terms of a (long-run average) cost that has waiting and staffing components.


3.7 Exercises

Recall: (i) e^x = ∑_{k=0}^∞ x^k/k!. (ii) For 0 ≤ ρ < 1, ∑_{i=0}^∞ ρ^i = 1/(1 − ρ).

1. (Random Balking.) Consider an M/M/1 queue with arrival rate λ and service rate µ. An arrival that finds i jobs in the system acts randomly, independently of everything else, as follows: with probability p_i, it joins the system; with probability 1 − p_i, it balks, i.e., does not join. Let X_t be the number of jobs in the system at time t. (i) Determine lim_{h→0} P(X_{t+h} = i + 1 | X_t = i)/h carefully, showing all your work. (ii) Identify q_{ij} := lim_{h→0} P(X_{t+h} = j | X_t = i)/h for all i ≠ j; no derivation is required. Given that (X_t) is a birth-death process, identify the birth and death rates. (iii) For the case p_i = 1/(i + 1) for all i = 0, 1, . . ., find the stationary distribution of (X_t) in terms of λ and µ.

2. (Number of servers adapts to the number of customers in the system.) Customers arrive at a system according to a Poisson process of rate λ. With X the number of customers in the system, assume that the number of servers is one whenever X ≤ m and two whenever X > m. Assume service times are Expon(µ). Show that the stationary distribution of X has the form

π_k = π_0 (2ρ)^k,   k = 1, 2, . . . , m;
π_k = π_0 (2ρ)^m ρ^{k−m},   k = m + 1, m + 2, . . . ,   (3.42)

where ρ = λ/(2µ), identifying results you use without proof. How can π_0 be determined?

3. Customers arrive at a system at rate 5 per hour, and the average time to service them is 1/6 hours. It costs 8x per hour to have x customers in the system (waiting cost) plus 5c per hour to have c servers (server cost). Find the c in {1, 2} that minimises the long-run average cost. State any assumptions made. Hint: Use (3.41) and the M/M/c stationary distribution, (3.23).

4. For the M/M/c model with arrivals of rate λ and with Expon(µ) service times, verify (3.30) by combining Application 2.11 (the waiting time when encountering k customers in queue has the Gamma(k + 1, cµ) distribution, see (2.18)) and the fact (from Example 3.5) that π_{c+k} = π_c ρ^k for k = 0, 1, . . ., where ρ = λ/(cµ). Hint: In the resulting doubly-indexed sum, change the order of summation.

5. (Failure of ASTA conclusions in a deterministic queue.) Let X_t be the number of jobs in a single-server system at time t, where X_0 = 0, arrivals occur at times 0, 10, 20, 30, . . ., and service times are 9 for each job. Here, the function (X_t : t ≥ 0) is deterministic. State it and determine the following limits:
(a) The long-run fraction of arrivals that find the server busy.
(b) The long-run fraction of time the server is busy.
Does the ASTA Theorem apply in this case?

6. (Solving certain balance equations via probability generating functions.) Jobs arrive in a system in batches of size b > 1, with batch-arrival events occurring according to a Poisson process of rate λ. There is one server, at which service times are Expon(µ) rv's, independent of everything else. There is infinite waiting space. Let X_t be the number of jobs in the system at time t.

(i) Given that (X_t) is a CTMC with values in {0, 1, 2, . . .}, briefly argue that its stationary distribution {π_i}_{i=0}^∞, when it exists, satisfies the following.

State            Rate Out = Rate In
0                π_0 λ = π_1 µ
i (1 ≤ i < b)    π_i (λ + µ) = π_{i+1} µ
i (b ≤ i)        π_i (λ + µ) = π_{i+1} µ + π_{i−b} λ

(ii) Let G(z) = ∑_{i=0}^∞ π_i z^i be the probability generating function (pgf) of the distribution {π_i}_{i=0}^∞. Show that G(z) = N(z)/D(z), where N(z) = µπ_0(1 − z) and D(z) = λz^{b+1} − (λ + µ)z + µ. Hint: Multiply the state-i equation above by z^i, for each i, then sum over all i (from 0 to ∞), then make appropriate "corrections".

(iii) To find π_0 we require 1 = ∑_{i=0}^∞ π_i = G(1) (normalising equation). It is given that G(1) = lim_{z↑1} G(z) (z increases to 1; continuity of G(·), Fact 1.18). Using L'Hôpital's rule, work out the limit as a function of π_0, and thus show that

π_0 = 1 − bλ/µ.

Note that the remaining π_i's are then determined by (1.24).

3.8 Solutions to Exercises

1. Short answer to (i). Arrivals happen with rate λ. When X_t = i, an arrival joins with probability p_i, so "customer-joins-the-system" events happen with rate λp_i.

Full answer to (i). Let A = # of arrivals in (t, t + h] and D = # of departures in the same interval. I indicates if a given arrival joins (1 = yes, 0 = no); I is independent of everything else, and P(I = 1 | X_t = i) = p_i. Then, intending h → 0,

P(X_{t+h} = i + 1 | X_t = i)
  = P(A = 1, D = 0, I = 1 | X_t = i) + P(A + D ≥ 2 and other conditions | X_t = i)   [the last term is o(h)]
  = P(A = 1 | X_t = i) P(D = 0 | X_t = i) P(I = 1 | X_t = i) + o(h)   by independence
  = [λh + o(h)][1 − µh + o(h)] p_i + o(h)
  = λp_i h + o(h),

where we need not specify the "other conditions". Thus

lim_{h→0} P(X_{t+h} = i + 1 | X_t = i)/h = lim_{h→0} (λp_i + o(h)/h) = λp_i.

Note: The thinning Proposition 2.14 is a similar idea.

(ii) The set of values that X_t may take is {0, 1, 2, . . .}. In (i) we showed q_{ij} = λp_i for j = i + 1 and all i. Similar to the derivations (3.1) to (3.3), we see that q_{ij} = µ for all i > 0 and j = i − 1; and q_{ij} = 0 for all other i ≠ j. That is, (X_t) is a birth-death process with birth rates λ_i = λp_i for all i and death rates µ_i = µ for all i > 0.

(iii) From the general birth-death solution (3.20) and p_i = 1/(i + 1) we have

π_k = π_0 [λ · (λ/2) · (λ/3) · · · (λ/k)]/µ^k = π_0 ρ^k/k!,   k = 1, 2, . . . ,

where ρ = λ/µ. Normalising gives π_0 = (1 + ∑_{k=1}^∞ ρ^k/k!)^{−1} = (e^ρ)^{−1} = e^{−ρ}.

2. With X_t the number of customers in the system at time t, (X_t) is a birth-death process taking values in {0, 1, 2, . . .}, with birth rates λ_n = λ for all n, and death rates µ_n = µ for n ≤ m and µ_n = 2µ for n > m. Then, using the standard birth-death result (3.20), we obtain the stated equations. To determine π_0, insert the π_i from (3.42) into the normalising equation ∑_{i=0}^∞ π_i = 1 and solve for π_0.

3. We assume that inter-arrival and service times are exponentially distributed and independent. Then the one-server and two-server systems are the M/M/1 and M/M/2 models, respectively. In our standard notation, λ = 5 and µ = 6.

By "cost" we mean long-run average cost per hour, in £. The waiting cost is 8L, where L, the long-run average number of customers in the system, is a known function of ρ = λ/(cµ), via L_q, from (3.39) and (3.41). The cost is 8L + 5c.

M/M/1 calculation. Using (3.23), that is, the stationary distribution for the M/M/c model, for c = 1, we have π_1 = (1 − ρ)ρ. Then, by (3.39), L_q = π_1 ρ/(1 − ρ)² = ρ²/(1 − ρ); and (3.41) becomes L = ρ²/(1 − ρ) + ρ = ρ/(1 − ρ). Then ρ = λ/µ = 5/6, L = 5, and the cost is 8L + 5·1 = 45.

M/M/2 calculation. Again by (3.23), this time for c = 2, we have π_2 = π_0 2ρ², where, from (3.24),

π_0 = [1 + 2ρ + 2ρ²/(1 − ρ)]^{−1} = (1 − ρ)/(1 + ρ).

Then (3.39) becomes

L_q = π_2 ρ/(1 − ρ)² = (π_0 2ρ²) ρ/(1 − ρ)² = 2ρ³/(1 − ρ²)

and (3.41) becomes

L = 2ρ³/(1 − ρ²) + 2ρ = 2ρ/(1 − ρ²).

We have ρ = λ/(2µ) = 5/12, L = 120/119 ≈ 1.0084, and the total cost is 8L + 5·2 ≈ 18.067. Thus, the 2-server system has lower cost.

4. First, the conditional probability in (3.29) is

P(W > s | X = c + k) = ∑_{i=0}^k e^{−cµs} (cµs)^i/i!,   s ≥ 0, k = 0, 1, . . .

(Application 2.11; Gamma(k + 1, cµ) tail probability, (2.18)). Recall that we have used ASTA to claim that lim_{n→∞} (1/n) ∑_{j=1}^n P(X_{T_j} = c + k) = π_{c+k}. Now, by (3.23) we have π_{c+k} = π_c ρ^k for k = 0, 1, . . ., where ρ = λ/(cµ). Thus (3.29) becomes

F(s) = ∑_{k=0}^∞ ∑_{i=0}^k e^{−cµs} (cµs)^i/i! · π_c ρ^k
     = π_c e^{−cµs} ∑_{i=0}^∞ (cµs)^i/i! ∑_{k=i}^∞ ρ^k   (reversed the summation order)
     = π_c e^{−cµs} ∑_{i=0}^∞ [(cµs)^i/i!] ρ^i/(1 − ρ)
     = [π_c/(1 − ρ)] e^{−(cµ−λ)s}   (e^x = ∑_{i=0}^∞ x^i/i! for x = cµsρ = λs).

5. For t from 0 to 9 we have X_t = 1, then for t from 9 to 10 we have X_t = 0; and this pattern of "cycles", of length 10 each, repeats forever. (We choose not to specify whether X_t is 0 or 1 at the times when it jumps from one value to the other, as this does not matter.) Thus:
(a) All arriving jobs find the server idle (X_t = 0 just before each arrival time), so the long-run fraction of arrivals that find the server busy is 0.
(b) The long-run fraction of time that the server is busy (equivalently, that X_t = 1) is 9/10.
The (a) and (b) above are, respectively, the left and right side in the ASTA Theorem's conclusion, (3.25). The theorem does not apply here.

6. (i) State changes occur as follows: when at state i, a batch-arrival event changes the state to i + b, while a service-completion event changes the state to i − 1; thus the generator entries are: q_{i,i+b} = λ for all i; q_{i,i−1} = µ for all i > 0; and q_{ij} = 0 for other i ≠ j. The stated equations are the usual balance equations (3.9) associated to this generator.

(ii) Multiplying and summing as suggested gives

λπ_0 + (λ + µ) ∑_{i=1}^∞ π_i z^i = µ ∑_{i=0}^∞ π_{i+1} z^i + λ ∑_{i=b}^∞ π_{i−b} z^i
                                 = (µ/z) ∑_{i=0}^∞ π_{i+1} z^{i+1} + λz^b ∑_{i=b}^∞ π_{i−b} z^{i−b},

where ∑_{i=1}^∞ π_i z^i = G(z) − π_0, ∑_{i=0}^∞ π_{i+1} z^{i+1} = G(z) − π_0, and ∑_{i=b}^∞ π_{i−b} z^{i−b} = G(z). The key idea above is that each of the infinite sums in the first equation gives G(z) after the "correction" steps seen in the second equation. Now re-arrange to solve for G:

G(z)[λ + µ − µ/z − λz^b] = µπ_0 (1 − 1/z)   ⇒   G(z) = µπ_0(1 − z)/[λz^{b+1} − (λ + µ)z + µ].

(iii) The derivatives of N(·) and D(·) are N′(z) = −µπ_0 and D′(z) = (b + 1)λz^b − λ − µ. Using L'Hôpital's rule, 1 = G(1) = lim_{z↑1} G(z) = N′(1)/D′(1) = −µπ_0/(bλ − µ), hence π_0 = 1 − bλ/µ.


Chapter 4

Sampling from Distributions

We study some general methods for simulating (sampling) a value from a given univariate distribution, with support contained in the real numbers. The source of randomness is a pseudo-random number generator, assumed to return independent samples from the Unif(0, 1) distribution, defined after (1.10).

4.1 Preliminaries

F will generally denote a cdf. Given a cdf F, we write "X ∼ F" to mean "X is a sample of the rv whose cdf is F". Given a pdf f, we write "X ∼ f" to mean "X is a sample of the rv whose pdf is f". The notions of "inf" (infimum) and "min" are taken to coincide. Likewise, "sup" (supremum) and "max" will coincide.

4.2 Inversion

Definition 4.1 The inverse of the cdf F is the function

F^{−1}(u) = inf{x : F(x) ≥ u} = min{x : F(x) ≥ u},   0 < u < 1.

To give a more explicit definition of the inverse, note first that any cdf F has a left limit everywhere (as it is non-decreasing), i.e., for any real a we can put

F(a−) = lim_{x→a−} F(x).

Definition 4.2 Let F be a cdf.

(a) If F is continuous and strictly increasing on an interval (a, b), then, for any u in (F(a), F(b)), define F^{−1}(u) as the unique x in (a, b) that solves F(x) = u.

(b) If F has a jump at a, i.e., F(a−) < F(a), then, for any u in (F(a−), F(a)], define F^{−1}(u) = a.

Remark 4.3 In case (a), the existence of x follows from the Intermediate Value Theorem, and the uniqueness results from the strictly-increasing assumption. In this case, F(F^{−1}(u)) = u for all u in (0, 1). In case (b), where F has a jump at a, note that F(F^{−1}(u)) = F(a) > u for u in (F(a−), F(a)), so F^{−1} is not the standard inverse function.

The inversion method returns F^{−1}(U), where U ∼ Unif(0, 1) is determined by a pseudo-random-number generator. We now show the method's correctness.

Proposition 4.4 Let F be a cdf and let F^{−1} be its inverse. If U ∼ Unif(0, 1), then F^{−1}(U) has cdf F.

Proof. The key fact is the equivalence

F(x) ≥ u ⇔ x ≥ F^{−1}(u),   (4.1)

which is a consequence of the fact that F is a nondecreasing, right-continuous function. We omit a detailed proof of (4.1). Now

P(F^{−1}(U) ≤ x) = P(U ≤ F(x))   by (4.1)
                 = F(x)   since U ∼ Unif(0, 1),

i.e., F^{−1}(U) has cdf F. ✷

Remark 4.5 On a computer, provided only that we can evaluate F(x) at any x, F^{−1}(u) can easily be computed for any u, via a bracketing/bisection method, for example. This method is simple, but beyond our scope.
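Remark 4.5 can be made concrete with a minimal bisection sketch (our own illustration, not from the notes): given any cdf F that we can evaluate, it approximates F^{−1}(u) = min{x : F(x) ≥ u} by first bracketing and then repeatedly halving the bracket.

```python
import math

def cdf_inverse(F, u, tol=1e-9):
    # Approximate F^{-1}(u) = min{x : F(x) >= u} for 0 < u < 1 by bisection.
    # Step 1: find a bracket [lo, hi] with F(lo) < u <= F(hi).
    lo, hi = -1.0, 1.0
    while F(lo) >= u:
        lo *= 2.0
    while F(hi) < u:
        hi *= 2.0
    # Step 2: bisect; the invariant F(lo) < u <= F(hi) is preserved.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) >= u:
            hi = mid
        else:
            lo = mid
    return hi

# Check against the exact Expon(2) inverse, -log(1 - u)/2, at u = 0.5.
expon2_cdf = lambda x: 1.0 - math.exp(-2.0 * x) if x > 0 else 0.0
print(cdf_inverse(expon2_cdf, 0.5), -math.log(0.5) / 2.0)
```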

4.2.1 Calculating the Inverse Explicitly: Examples

Inversion of a discrete cdf. Section 1.3.1 explained that a discrete distribution can always be reduced to ordered support points x_1 < x_2 < x_3 < . . ., each x occurring with positive probability, and the corresponding cdf F satisfies F(x_1) < F(x_2) < F(x_3) < . . .. Here, the inverse of F is (Definition 4.2(b)):

F^{−1}(u) = x_1   for 0 < u ≤ F(x_1),
            x_2   for F(x_1) < u ≤ F(x_2),
            . . .
            x_k   for F(x_{k−1}) < u ≤ F(x_k),
            . . .
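In code, this piecewise inverse is just a search for the first support point whose cdf value reaches u. A minimal sketch (our own; it assumes the support points and their probabilities are supplied as lists):

```python
import random

def sample_discrete(xs, ps):
    # Inversion for a discrete distribution: xs are the ordered support points
    # x_1 < x_2 < ..., ps the corresponding probabilities (summing to 1).
    u = random.random()
    cumulative = 0.0
    for x, p in zip(xs, ps):
        cumulative += p              # cumulative = F(x)
        if u <= cumulative:
            return x
    return xs[-1]                    # guard against rounding error in the p's

# Example: P(X = 1) = 0.2, P(X = 2) = 0.5, P(X = 3) = 0.3.
print(sample_discrete([1, 2, 3], [0.2, 0.5, 0.3]))
```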

The inverse of some simple continuous cdf's is calculated explicitly below.


Example 4.6 Consider the Unif(5, 8) distribution. By (1.10), the cdf is

F(x) = (x − 5)/3,   5 ≤ x ≤ 8.

Definition 4.2(a) applies on the entire support, so solve

(x − 5)/3 = u ⇔ x = 5 + 3u = F^{−1}(u).

Example 4.7 In Example 1.6 we found that X = "equiprobable outcome on [2, 4] ∪ [5, 6]" has cdf

F(x) = (x − 2)/3         for 2 ≤ x ≤ 4,
       2/3               for 4 < x ≤ 5,
       2/3 + (x − 5)/3   for 5 < x ≤ 6.

This cdf is continuous, so its inverse is found by solving F(x) = u (Definition 4.2(a)). For u in (F(2), F(4)), x must be between 2 and 4, so solve

(x − 2)/3 = u ⇔ x = 2 + 3u = F^{−1}(u).

For u in (F(5), F(6)), x must be between 5 and 6, so solve

2/3 + (x − 5)/3 = u ⇔ x = 3 + 3u = F^{−1}(u).

Finally, F^{−1}(2/3) = 4. In summary,

F^{−1}(u) = 2 + 3u for u ≤ 2/3;   3 + 3u for u > 2/3.

Example 4.8 Consider the Expon(λ) distribution. Recall that the cdf is

F(x) = 1 − e^{−λx},   x ≥ 0.

Definition 4.2(a) applies on the entire support, so solve

1 − e^{−λx} = u ⇔ x = −(1/λ) log(1 − u) = F^{−1}(u).
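As a short illustration (our own sketch), Example 4.8 in code:

```python
import math, random

def sample_expon(lam):
    # Inversion for Expon(lam): F^{-1}(u) = -log(1 - u)/lam.
    u = random.random()
    return -math.log(1.0 - u) / lam
```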

♣ Study the following. • Exercise 1 (part (b) = inversion of the Geometric distribution = E11 3(a)). • E04 4(i): Triangular distribution with support parametrised by some α. Helps emphasise that in solving F(x) = u, only x in the support are relevant. Failing to apply this restriction, as F(x) is a quadratic, there are two solutions, and the one outside the support is irrelevant. • E03 4 (inversion part, first 5 lines). • E10 2(c) (mixture of exponentials with disjoint support).


4.3 Acceptance-Rejection

4.3.1 Method and Theory

The problem is to sample from a given distribution with pdf f and support S. That is, the output X should satisfy, for all x ∈ S,

lim_{ε→0+} P(x − ε < X ≤ x + ε)/(2ε) = f(x),   or, less precisely,   P(X ∈ dx) = f(x)dx,   (4.2)

where dx means an infinitesimal (i.e., arbitrarily small) interval containing x.

The acceptance-rejection (A/R) method samples from another pdf, g (it is assumed known how to sample from g), and rejects samples in a way so that accepted samples have the desired pdf, f. There is the key requirement that there exists a finite constant K such that

a(x) := f(x)/(Kg(x)) ≤ 1   for all x ∈ S.   (4.3)

The need for this is explained in Remark 4.9 below. The sampling works as follows:

1. Sample Y ∼ g and U ∼ Unif(0, 1), independently of any other samples.
2. If U ≤ a(Y), set X ← Y (accept) and exit. Otherwise (reject), return to step 1.

Note that the output X is the Y conditioned by the acceptance event

A = {U ≤ a(Y)}.

g is called the instrumental (trial, candidate, proposal) pdf, and Kg the envelope.
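The two steps translate directly into a short rejection loop. The following generic sketch (our own illustration, not from the notes) takes a sampler for g and the acceptance function a(·) = f/(Kg), assumed to satisfy (4.3), and returns one accepted sample.

```python
import random

def accept_reject(sample_g, a):
    # sample_g(): returns one sample Y from the trial pdf g.
    # a(y): acceptance function f(y)/(K*g(y)), assumed to lie in [0, 1].
    while True:
        y = sample_g()                   # Step 1: Y ~ g
        u = random.random()              #         U ~ Unif(0, 1)
        if u <= a(y):                    # Step 2: accept with probability a(Y)
            return y

# As in Example 4.11 below: Triangular(1, 1, 6) target with Unif(1, 6) proposal,
# K = 2 and a(x) = (6 - x)/5.
print(accept_reject(lambda: random.uniform(1.0, 6.0), lambda y: (6.0 - y) / 5.0))
```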

That X has pdf f can be seen roughly as follows:

P(X ∈ dx) = P(Y ∈ dx | A) = P((Y ∈ dx) ∩ A)/P(A) = P(A | Y ∈ dx) P(Y ∈ dx)/P(A)
          = P(U ≤ a(Y) | Y ∈ dx) g(x)dx/P(A) = a(x)g(x)dx/P(A) = [f(x)/(K P(A))] dx.

For simplicity, assume the above is a pdf; then it must integrate to one, so

1 = ∫_S f(x)/(K P(A)) dx = [1/(K P(A))] ∫_S f(x)dx = 1/(K P(A))

(as the pdf f has support S, so ∫_S f(x)dx = 1). That is, the "less precisely" form in the right of (4.2) is true. Moreover, we see that

P(A) = 1/K   (4.4)

and that K ≥ 1 always.


Remark 4.9 The step P(U ≤ a(Y) | Y ∈ dx) = a(x) in the proof holds only if a(x) ≤ 1 for all x (since a(x) > 1 cannot be a probability). This is why (4.3) is required.

A more careful proof of correctness of A/R shows the left of (4.2), as follows.

Lemma 4.10 Assume that a(·) satisfies

|Y − x| ≤ ε ⇒ a(x) − Mε ≤ a(Y) ≤ a(x) + Mε,

where M < ∞ can be taken as the maximum of the absolute derivative (slope) of a(·). Then

lim_{ε→0+} P(|Y − x| ≤ ε | A)/(2ε) = f(x)/(K P(A)),   (4.5)

provided that f(x) > 0 and g(x) < ∞, hence a(x) > 0.

Proof. Write the conditional probability in (4.5) as P(B)/P(A), where

B = {|Y − x| ≤ ε, U ≤ a(Y)}.

The main idea is to bound the event B by a subset and a superset whose probabilities tend to the correct limit. The bounds are

{|Y − x| ≤ ε, U ≤ a(x) − Mε} ⊂ B ⊂ {|Y − x| ≤ ε, U ≤ a(x) + Mε}.

Taking probabilities,

P(|Y − x| ≤ ε, U ≤ a(x) − Mε) ≤ P(B) ≤ P(|Y − x| ≤ ε, U ≤ a(x) + Mε)
⇒ P(|Y − x| ≤ ε) P(U ≤ a(x) − Mε) ≤ P(B) ≤ P(|Y − x| ≤ ε) P(U ≤ a(x) + Mε)
⇒ P(|Y − x| ≤ ε)[a(x) − Mε] ≤ P(B) ≤ P(|Y − x| ≤ ε)[a(x) + Mε]

by the independence of Y and U, and by then using that U ∼ Unif(0, 1). Dividing by 2εP(A) throughout, we have lower and upper bounds for our target:

P(|Y − x| ≤ ε)[a(x) − Mε]/(2εP(A)) ≤ P(B)/(2εP(A)) ≤ P(|Y − x| ≤ ε)[a(x) + Mε]/(2εP(A)).   (4.6)

Now, take limits as ε → 0+ and observe:

• lim_{ε→0+} P(|Y − x| ≤ ε)/(2ε) = g(x), as Y has pdf g;
• lim_{ε→0+} [a(x) − Mε] = lim_{ε→0+} [a(x) + Mε] = a(x) > 0.

Thus, both the lower and upper bound in (4.6) converge to g(x)a(x)/P(A) = f(x)/[K P(A)], and thus so does the middle. ✷


4.3.2 Feasibility and Efficiency

Each trial is accepted with probability P(A) = 1/K, independently of other trials. Thus, the number of trials until acceptance has the Geometric(P(A)) distribution, whose mean is 1/P(A) = K. Let us think "maximum sampling efficiency" equals "minimum mean number of trials until acceptance", i.e., "maximum P(A)", i.e., "minimum K". The constraint (4.3) on K can be rewritten as K ≥ sup_{x∈S} f(x)/g(x) = max_{x∈S} f(x)/g(x), so the minimum K that satisfies the constraint is

K = sup_{x∈S} f(x)/g(x) = max_{x∈S} f(x)/g(x).   (4.7)

The A/R method always requires setup, meaning choosing g and then determining K by solving this maximisation problem. A considerable limitation is that if the K in (4.7) is infinite, then A/R is impossible for this f and g; see Example 4.13 below.

♣ Study the following. • E07 4. Develops an A/R sampler for the target density proportional to x^{α−1}e^{−x}, x > 0 (the Gamma(α, 1) distribution), for the case α < 1. Note the density goes to ∞ as x → 0. The problem statement gives the envelope, and the problem is then straightforward. Choosing the envelope was a more subtle problem, which was not asked here, because of the requirement of the existence of a finite K; this was achieved by the envelope choice (up to a proportionality constant) x^{α−1} for x < 1, and e^{−x} (the Expon(1) pdf) for x > 1. The envelope is sampled by inversion (that is, inverting the associated cdf).

Example 4.11 The pdf to sample from is

f(x) = (2/25)(6 − x) for 1 ≤ x ≤ 6, and 0 otherwise.

This is the Triangular(1, 1, 6) distribution, the three parameters being minimum, mode, and maximum. Let g be the pdf of the Unif(1, 6) distribution, i.e., the constant 1/5 on [1, 6]. As setup, we must find (4.7). Focus on f(x)/g(x) = (2/5)(6 − x) in the support of f, i.e., [1, 6], and see that it attains the maximum value of 2 at x = 1. Thus K = 2, a(x) = (6 − x)/5, and P(A) = 1/K = 1/2. To see the sampling in action, suppose the sampled (Y, U) pairs are (4.9, 0.3) and (1.7, 0.8). Then calculate:

Y     U     a(Y)   Is U ≤ a(Y)?
4.9   0.3   0.22   No
1.7   0.8   0.86   Yes

Here, two trials were needed until acceptance. The number 1.7 is a random sample from the pdf f.


When f and g involve the exponential function, it is typically easier to optimise log(f/g) (log = "natural logarithm", the inverse function to the exponential one) rather than f/g, as the former has simpler derivatives. A maximiser of f/g is also a maximiser of log(f/g), and vice versa. The following is a simple illustration of this.

Example 4.12 Suppose we want to sample from the pdf f(x) = (2/π)^{1/2} e^{−x²/2} with support (0, ∞). (This is the pdf of |Z|, where |·| denotes absolute value and Z ∼ N(0, 1), the standard normal distribution (mean 0, variance 1).) Consider acceptance-rejection with g the pdf of the Expon(1) distribution, i.e., g(x) = e^{−x} on (0, ∞). To find the optimal K, maximise log[f(x)/g(x)] = log(2/π)/2 + x − x²/2 on (0, ∞). The maximiser is x = 1, giving K = (2e/π)^{1/2} and acceptance probability 1/K ≈ 0.76.
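A minimal sketch of Example 4.12 (our own; note that with K = (2e/π)^{1/2} the acceptance function f(y)/[Kg(y)] simplifies to exp(−(y − 1)²/2)):

```python
import math, random

def sample_halfnormal():
    # A/R for f(x) = sqrt(2/pi) * exp(-x^2/2), x > 0, with Expon(1) proposal.
    # With K = sqrt(2e/pi), the acceptance function is a(y) = exp(-(y - 1)^2 / 2).
    while True:
        y = -math.log(1.0 - random.random())        # Expon(1) sample by inversion
        if random.random() <= math.exp(-(y - 1.0) ** 2 / 2.0):
            return y
```

About 76% of trials are accepted, matching 1/K above.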

We write f(x) ∝ h(x) on a given support S to mean that f(x) = h(x)/c, where c = ∫_S h(t)dt does not matter in the argument.

Example 4.13 The Gamma(α, λ) distribution (shape α > 0, scale λ > 0), seen earlier for α integer, has pdf f(x) ∝ x^{α−1}e^{−λx} with support x > 0. Consider A/R with f and g being Gamma with respective shapes a, b and common scale. Then f(x)/g(x) ∝ x^{a−b}. If we choose b < a, then max_{x>0} x^{a−b} = ∞, since x^{a−b} → ∞ as x → ∞, and A/R is impossible. If we choose b > a, then max_{x>0} x^{a−b} = ∞, now because x^{a−b} → ∞ as x → 0, so again A/R is impossible.

A general case where A/R is possible is when the pdf f has support (a, b) with a and b both finite, and moreover f is bounded, i.e., max_{a<x<b} f(x) < ∞; then the Unif(a, b) pdf can serve as g, and (4.7) gives the finite constant K = (b − a) max_{a<x<b} f(x).


4.4 Exercises

1. Describe in full the inversion method of simulating each of the distributions below.
(a) Uniform on the interval (−5, 5).
(b) Suppose we do independent trials, where each trial succeeds with probability p, and let X be the number of trials up to and including the first success. Then X is said to have the Geometric(p) distribution. Derive the cdf of X and then describe how X can be sampled by the inversion method.

2. Based on the definition of the Geometric(p) distribution given above, give a method for simulating it based only on an unlimited supply of Unif(0, 1) pseudo-random numbers. Hint: Simulate the success indicators; neither inversion nor acceptance-rejection is needed.

3. We want to sample from the Gamma(a, 1) distribution whose pdf is

f_a(x) = (1/Γ(a)) x^{a−1} e^{−x},   x > 0,

where Γ(a) = ∫_0^∞ x^{a−1}e^{−x} dx. Consider acceptance-rejection methods based on the family of trial pdf's

g_λ(x) = λe^{−λx},

where λ may depend on a. For given a > 1, show that across envelopes of the form Kg_λ, 0 < λ < 1, the choice λ = 1/a maximises the acceptance probability. Then state the resulting acceptance probability. Hint: By maximising f_a(x)/g_λ(x) across x, determine the corresponding best A/R constant, K = K*(a, λ). Then, with a being fixed, minimise K*(a, λ) across λ. It will be easier to work with logarithms of functions when seeking the min or max.

4.5 Solutions to Exercises

1. In each case, we first write down the cdf F explicitly, and then find the inverse F^{−1} explicitly. Having done that, the random sample from F is F^{−1}(U), where U is a Unif(0, 1) (pseudo-)random number.

(a) Let f be the pdf. The uniform distribution on [−5, 5] has a pdf that is a constant c > 0 on this interval and 0 elsewhere. Find c:

1 = ∫_{−∞}^∞ f(t) dt = ∫_{−5}^5 c dt = 10c ⇒ c = 1/10.

That is,

f(t) = 1/10 for −5 ≤ t ≤ 5, and 0 otherwise.

Thus, the cdf is

F(x) = ∫_{−∞}^x f(t) dt = ∫_{−5}^x (1/10) dt = (x + 5)/10   for −5 ≤ x ≤ 5.

Find the inverse by solving for x:

(x + 5)/10 = u ⇔ x = −5 + 10u = F^{−1}(u).

(b) (i) Find the cdf. To this end, the easiest way is:

P(X > k) = P(first k trials failed) = (1 − p)^k,   k = 0, 1, 2, . . . .

The cdf, F, is

F(k) = P(X ≤ k) = 1 − P(X > k) = 1 − (1 − p)^k,   k = 0, 1, 2, . . . .

(ii) To state the inverse explicitly, we need to find the smallest integer k satisfying

1 − (1 − p)^k ≥ u ⇔ (1 − p)^k ≤ 1 − u ⇔ k log(1 − p) ≤ log(1 − u) ⇔ k ≥ log(1 − u)/log(1 − p).

(In the last step, the division by log(1 − p) < 0 changed the direction of the inequality.) Thus, F^{−1}(u) = ⌈log(1 − u)/log(1 − p)⌉, where ⌈x⌉ is the standard ceiling function (the smallest integer that is ≥ x). Inversion returns F^{−1}(U), where U ∼ Unif(0, 1).
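A sketch of this inversion formula (our own; the max(1, ...) simply guards the measure-zero case u = 0 that a pseudo-random generator can return):

```python
import math, random

def sample_geometric_inversion(p):
    # Inversion for Geometric(p): F^{-1}(u) = ceil(log(1 - u) / log(1 - p)).
    u = random.random()
    return max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)))
```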

2. Let U_i ∼ Unif(0, 1), i = 1, 2, . . ., be independent (determined by a pseudo-random-number generator). Return the first i such that U_i ≤ p.
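In code (our own sketch):

```python
import random

def sample_geometric_trials(p):
    # Simulate Bernoulli(p) success indicators; return the index of the first success.
    i = 1
    while random.random() > p:
        i += 1
    return i
```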

3. For fixed a and λ, the smallest possible K is K*(a, λ) = max_{x>0} r_{a,λ}(x), where

r_{a,λ}(x) := f_a(x)/g_λ(x) = x^{a−1} e^{(λ−1)x}/(λΓ(a)),   x > 0.

Maximise log r_{a,λ}(x) over the support of f_a, i.e., x > 0:

log r_{a,λ}(x) = (a − 1) log x + (λ − 1)x − log[λΓ(a)],
(d/dx) log r_{a,λ}(x) = (a − 1)/x + (λ − 1) = 0 ⇒ x*(a, λ) = (a − 1)/(1 − λ),
(d²/dx²) log r_{a,λ}(x) = −(a − 1)/x² < 0.

Thus the x*(a, λ) above is the maximiser. (Note that the given constraints on a and λ imply x* > 0, which is in the support. If it was not in the support, it would be irrelevant; it is the maximum over the support that matters.)

For fixed a, maximising the acceptance probability is equivalent to minimising the A/R constant K*(a, λ) over λ; first, form the logarithm:

log K*(a, λ) = log r_{a,λ}(x*) = (a − 1) log(a − 1) − (a − 1) log(1 − λ) + (1 − a) − log λ − log Γ(a).

Now minimise this over 0 < λ < 1:

∂ log K*/∂λ = (a − 1)/(1 − λ) − 1/λ = 0 ⇒ λ*(a) = 1/a,
∂² log K*/∂λ² = (a − 1)/(1 − λ)² + 1/λ² > 0.

Thus λ*(a) above is the minimiser. Putting K*(a) = K*(a, λ*), i.e., the constant associated to the best trial pdf, g_{λ*(a)}, we have

log K*(a) := log K*(a, λ*(a))
           = (a − 1) log(a − 1) − (a − 1) log[(a − 1)/a] + (1 − a) + log a − log Γ(a)
           = a log a + (1 − a) − log Γ(a)   after cancellations,

i.e., K*(a) = a^a e^{1−a}/Γ(a). The acceptance probability is, as always, 1/K*(a).
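A sketch of the resulting sampler for a > 1 (our own illustration): the proposal is Expon(1/a), and since x*(a, 1/a) = a, the acceptance function r_{a,1/a}(y)/K*(a) simplifies to (y/a)^{a−1} exp((1/a − 1)(y − a)).

```python
import math, random

def sample_gamma(a):
    # A/R for Gamma(a, 1), a > 1, with the optimal Expon(1/a) trial pdf of Exercise 3.
    # Mean number of trials until acceptance is K*(a) = a^a e^{1-a} / Gamma(a).
    while True:
        u = 1.0 - random.random()                    # u in (0, 1]
        y = -a * math.log(u)                         # Expon(1/a) sample by inversion
        accept = (y / a) ** (a - 1) * math.exp((1.0 / a - 1.0) * (y - a))
        if random.random() <= accept:
            return y
```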



Chapter 5

Guidance for Exam Preparation

Questions 1 and 2 are compulsory. Of Questions 3 and 4, only the best answer will count.

Generally, it is good to state clearly any assumptions made or missing information, so as to show what you know, even if incomplete.

More detail is given below for each question. Listed is the material (including past exam papers) that can be given higher priority. "E04 4(i)" means "Exam of June 2004, Question 4(i)", and so on.

Question 1 (a): 9 points. The Exponential Distribution and Section 2.2, particularly the memoryless property (2.4), its derivation and significance. Exam E11 4(a).

Question 1 (b): Modelling, 31 points. A modelling question, typically involving the following.

1. Identify an appropriate CTMC, its generator, and the equations that determine the stationary distribution, assuming one exists. A classic model is a birth-death process with finite or infinite state space.

2. Given the stationary distribution of this CTMC, calculate performance by applying some or all of (3.13), (3.15), (3.17), and Little's Law in the form L = λW. Examples 3.2 and 3.4 and Application 3.9. Chapter-3 Exercise 2.

Worked examples to study: E03 2: two stations in tandem. E08 3: finite-state-space CTMC; the policy of preferring A over B, together with the question "find the long-run fraction of time A is busy", suggests that the state should indicate the idle/busy state of each server separately. E04 1: M/M/1 versus an M/M/2 with half-as-fast servers, i.e., the same overall speed. E06 1: centralised versus decentralised M/M/· systems. E08 1: M/M/1. E11 1: M/M/2. Chapter-3 Exercise 3: a comparison of M/M/1 versus M/M/2 involving waiting and server costs.

The following problems should be studied only with regard to obtaining the balance equations. The explicit stationary distribution (the solution to these equations) via probability generating functions (pgf's) is not examinable. E04 2: the M/E_k/1 model. Exercise 6, E06 3, E07 2, E10 3: batch arrivals.


Question 2: Chapter 4, 26 points.

1. Section 4.2, especially Definitions 4.1 and 4.2 and Proposition 4.4. Exams E10 2(a)-(b), E11 3(b).
2. Applying the inversion method. Exercise 1. Exams E03 4, E04 4(i), E07 4, E08 4, E10 2(c), E11 2(a), E11 3(a).
3. Section 4.3. Exam E11 3(c).
4. Applying the acceptance-rejection method. Exercise 3. Exams E03 4, E06 4(b) excluding the "discuss" part, E07 4, E08 4, E09 2(b), E10 4(a), E11 2(b).

Question 3: Chapter 2, 34 points. Emphasis on definitions and on analysis similar or identical to the notes. Study in priority order:

1. The Poisson-process basics: Definitions 2.4 and 2.5, calculations as in Example 2.7 and Exercise 1. Proof of the "⇒" part in Theorem 2.6, which essentially is the proof of (2.11). Section 2.4.2. Exams E04 3, E07 3 (first 15 points).
2. Merging of Poisson processes, Section 2.4.4, especially Proposition 2.10 and Exercise 2. Poisson process of general rate function, Section 2.5. Exercises 3, 4. To solve E03 3, recognise it is an NHPP and identify the rate function; the solution is then standard, obtained exactly as in Exercise 3. Exam E10 4(b).
3. Convergence as in Section 2.7.

Typical partial question:

Consider a certain randomly occurring event, and let N_t be the number of events that occur up to time t. State appropriate conditions about the N_t such that (N_t : t ≥ 0) is a Poisson process. The conditions are about: (i) independence; (ii) stationarity, meaning that certain distributions do not depend on time; and (iii) the probabilities that one and two events occur, respectively, in an interval of length h, as h → 0.
(11 points)

Question 4: Chapter 3, 34 points. Emphasis on definitions and on analysis similar or identical to the notes. Study in priority order:

1. Continuous-time Markov chains, Section 3.3. The birth-death process, Section 3.2. Derivation of Kolmogorov-type differential equations (3.6), or something similar. Exam E06 2.
2. Long-run behaviour, Sections 3.4 and 3.5, especially Section 3.5.1. Exams E03 1, E07 3 (last 10 points), E11 4(b).
3. Careful statement of Little's Law, as in Theorem 3.8.

Typical partial question:

Put X_t for the number of jobs in a system at time t, and assume that for h ≥ 0 we have X_{t+h} = X_t + B − D, where, conditional on X_t = i, the random variables B and D (counts of new "births" and "deaths" during (t, t + h], respectively) are independent with

P(B = 1 | X_t = i) = λ_i h + o(h),   P(B ≥ 2 | X_t = i) = o(h),
P(D = 1 | X_t = i) = µ_i h + o(h),   P(D ≥ 2 | X_t = i) = o(h),   i > 0.

1. Determine, showing all your work,

lim_{h→0} P(X_{t+h} = i + 1 | X_t = i)/h

for any i.
(8 points)

2. Determine, showing all your work,

lim_{h→0} P(X_{t+h} = j | X_t = i)/h

for |j − i| ≥ 2, i.e., the target state j differs from the starting state i by 2 or more.
(9 points)
