
The Memoryless Property

Theorem 1. Let X be an exponential random variable with parameter λ > 0. Then X has the memoryless property, which means that for any two real numbers a, b > 0,

P(X > a + b | X > b) = P(X > a).

WARNING: This is not saying that P(X > a + b | X > b) = P(X > a + b). That would mean, literally, that the "future" values of X are independent of the past, which is not correct. It is simply saying, intuitively, that the additional amount by which X exceeds b does not "remember" the past; the events {X > a + b} and {X > b} themselves are certainly not independent.

Proof. First, we'll derive an expression for P(X > t) for any t > 0. Since the density of X is λe^{−λx} for x ≥ 0 and 0 for x < 0,

P(X > t) = 1 − P(X ≤ t) = 1 − ∫_{0}^{t} λe^{−λx} dx = 1 − λ[−e^{−λx}/λ]_{0}^{t} = 1 + e^{−λt} − 1 = e^{−λt}.
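As a quick sanity check, here is a minimal numerical sketch, assuming NumPy and SciPy are available; the values of lam and t are arbitrary illustrative choices.

```python
# Minimal check that P(X > t) = e^{-lam*t}. SciPy parameterizes the
# exponential distribution by scale = 1/lambda.
import numpy as np
from scipy.stats import expon

lam, t = 0.5, 3.0
print(expon.sf(t, scale=1/lam))  # P(X > t) via SciPy's survival function
print(np.exp(-lam * t))          # e^{-lam*t}; the two values agree
```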

Now, compute P(X > a + b | X > b) using the definition of conditional probability.

P(X > a + b | X > b) = P({X > a + b} ∩ {X > b}) / P(X > b)
                     = P(X > a + b) / P(X > b)
                     = e^{−λ(a+b)} / e^{−λb}
                     = e^{−λa} e^{−λb} / e^{−λb}
                     = e^{−λa}

But we just showed that e^{−λa} is exactly P(X > a). Thus, we see that P(X > a + b | X > b) = P(X > a). □
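To see the property in action, here is a minimal Monte Carlo sketch, assuming NumPy is available; the values of lam, a, and b are arbitrary illustrative choices.

```python
# Minimal Monte Carlo sketch estimating both sides of the memoryless
# property P(X > a + b | X > b) = P(X > a) for an exponential variable.
import numpy as np

rng = np.random.default_rng(0)
lam, a, b = 1.5, 0.7, 2.0
x = rng.exponential(scale=1/lam, size=1_000_000)

# Conditional probability, estimated from the samples that survive past b.
lhs = np.mean(x[x > b] > a + b)
# Unconditional probability P(X > a).
rhs = np.mean(x > a)

print(lhs, rhs, np.exp(-lam * a))  # all three should be close to e^{-lam*a}
```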

Some Remarks on the Consequences and Interpretations of the Memoryless Property

First of all, keep in mind that this property is not a universal property. It does not hold for all continuous random variables. Moreover, it would obviously not apply in all physical situations. For instance, suppose X is the lifetime of a car engine, measured in miles driven. If the engine has lasted 200,000 miles, we might not expect, based on actual physical experience, that the probability that the engine lasts another 100,000 miles is the same as the probability that the engine lasts 100,000 miles from the time it was first built. That is, we would probably not expect to have

P(X > 300,000 | X > 200,000) = P(X > 300,000 − 200,000) = P(X > 100,000).

But if empirical data showed that the lifetime of a car engine was, in fact, exponentially distributed, then this property would indeed hold, whether it matches your intuition or not.

In fact, this property, and the questions that were raised in class regarding it, bring up an important philosophical point in probability theory. Probabilities do not exist vacuously, and they are not universal from one situation to another. They carry with them some particular distribution or form. When you estimate or compute probabilities in real-life, everyday situations, you are, whether you realize it or not, implicitly imposing some sample space and probability mass/density function on the quantity you are computing. For instance, you would not estimate the probability that you get a particular hand in a game of poker in the same way you would estimate the probability that you have to wait 10 or more minutes in the checkout line at the grocery store. Each of those situations implicitly carries with it a distinct notion of what probability means (or, rather, how it is distributed and how to compute it) in the particular context.

Consequently, because different probabilities behave in different ways, you cannot always fall back on your intuition. Just because your gut tells you something about probabilities in one instance doesn't mean that it holds in all instances. This is a common probabilistic fallacy.

Take, for example, a problem like the one we discussed in class. Let X be the lifetime of a machine, machine component, battery, or anything else that has a finite lifetime. It may very well be the case that X is not memoryless. For example, if it is known that the machine has already lasted 20 hours, it might not be reasonable, in this particular physical context, to assume that the probability that the machine lasts at least another 15 hours (or that it lasts at least 35 hours total, given that it has lasted 20 hours) is equal to the probability that it lasts at least 15 hours from the time it starts. That is, we might not have

P(X > 20 + 15 | X > 20) = P(X > 15).

But whether or not this holds depends on the probability distribution you are assuming applies to X. On the other hand, if you are told, or if you know based on empirical data, that the lifetime of this machine is exponentially distributed, then the above equation would indeed hold, whether it seems intuitive to you or not. It would hold because you are imposing the condition that X is exponentially distributed, which means, regardless of what we think or feel should be right, that X is memoryless. If the lifetime of the machine is not memoryless, then we simply wouldn't describe its lifetime using an exponential random variable; we would use some other random variable.

In other words, the memoryless property is a specific property of exponential random variables. If it seems counterintuitive to you in certain physical circumstances, and if your intuition is correct in those circumstances, then that doesn't mean that the memoryless property is wrong. It just means that whatever quantity you're measuring in that particular problem is not an exponential random variable.

If this property still seems strange and counterintuitive to you, though, perhaps it will help to think of it in discrete terms, since there is one more random variable that has the memoryless property. Some basic examples follow after that.

Another Memoryless Distribution: Geometric Random Variables

The geometric random variable also has the memoryless property. This should make intuitive sense given how we interpret exponential random variables. Exponential random variables usually measure the time until some event occurs. On the other hand, recall that geometric random variables describe the first occurrence of a particular event (i.e., a "success") in a Bernoulli experiment.

Theorem 2. If X is a geometric random variable with parameter p, then X has the memoryless property, which means that for any two positive integers m, n ≥ 1,

P(X > m + n | X > n) = P(X > m).

Intuitively, this can be interpreted as follows. If you have a Bernoulli experiment of identical, independent trials, each of which results in one of two outcomes with respective probabilities p and 1 − p (e.g., either a 1 or a 0, or a "success" or a "failure," etc.), then X describes when the first "success" occurs. Now, suppose, for instance, you run through 10 trials of this experiment, all of which have been "failures" or 0s. Just because you've gotten 10 failures in a row to begin the experiment doesn't mean, probabilistically, that you should expect a success to be more likely (or a failure to be less likely) on the 11th trial. If we go purely on intuition, which, as I've tried to point out above, isn't always the best route in probability, you might expect that we should "eventually have to get a success," so the longer we go without one, the more likely we are to get one next.

But that's not true!

Because X is a geometric random variable, which has the memoryless property, it actually doesn't matter how many consecutive failures we get. The probability that the first success takes more than any given number of additional trials is the same as it was at the beginning of the sequence! If the intuition worked in this case, then X wouldn't be a geometric random variable, because it wouldn't be memoryless. But it is geometric, so we have an example where the everyday intuition we try to apply to probabilities fails us.
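To make this concrete, here is a quick worked instance; the choice p = 1/2 (a fair coin) is an arbitrary illustration, and it uses the survival formula P(X > n) = (1 − p)^{n} derived in the proof below. Given ten straight failures, the probability of going more than three additional trials without a success is

P(X > 13 | X > 10) = (1/2)^{13} / (1/2)^{10} = (1/2)^{3} = 1/8 = P(X > 3),

exactly the probability of starting fresh and seeing more than three failures.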

Proof. The probability mass function of X is p(i) = P(X = i) = p(1 − p)^{i−1}. As we did before, we will find an expression for P(X > n) to make things easier.

P(X > n) = 1 − P(X ≤ n) = 1 − ∑_{k=1}^{n} p(1 − p)^{k−1} = 1 − p ∑_{k=0}^{n−1} (1 − p)^{k} = 1 − p · [1 − (1 − p)^{n}] / [1 − (1 − p)]

⟹ P(X > n) = (1 − p)^{n}

Proceeding as in Theorem 1, we have the following.

P(X > m + n | X > n) = P({X > m + n} ∩ {X > n}) / P(X > n)
                     = P(X > m + n) / P(X > n)
                     = (1 − p)^{m+n} / (1 − p)^{n}
                     = (1 − p)^{m}
                     = P(X > m).
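As with the exponential case, a minimal Monte Carlo sketch can check this numerically, assuming NumPy is available; p, m, and n are arbitrary illustrative choices.

```python
# Minimal Monte Carlo check of the geometric memoryless property.
import numpy as np

rng = np.random.default_rng(1)
p, m, n = 0.3, 4, 10
x = rng.geometric(p, size=1_000_000)  # trial number of the first success

lhs = np.mean(x[x > n] > m + n)  # P(X > m + n | X > n)
rhs = np.mean(x > m)             # P(X > m)

print(lhs, rhs, (1 - p) ** m)    # both estimates should be near (1-p)^m
```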

Some Basic Examples

Memoryless random variables like the exponential random variable may seem strange, but they actually describe many real-world phenomena. For instance, the time between arrivals of customers at a store, drive-thru, or any comparable service center is commonly modeled as an exponential random variable. You might expect more customers to arrive at certain times of the day than at others, but just because a customer hasn't arrived in, say, the last 20 minutes doesn't mean that you should expect one in the next 5 minutes with any more likelihood than you would have 20 minutes ago.

By the same reasoning, the time between telephone calls received by a particular phone can be modeled as an exponential random variable. Ignoring circumstances where you're expecting a planned phone call, and regardless of your average frequency of calls, just because you haven't received a call in, say, the last hour doesn't mean that you're any more likely to receive a call in the next 10 minutes than you were at the beginning of the hour.

On the other hand, to admittedly contradict one of my own examples from class, the lifetime of a cell phone battery would not be well modeled by an exponential random variable. It works fine as a mathematical example, but experience tells us that, in physical terms, the longer a battery has been used and recharged, the more likely it is to fail soon. You could model the battery's lifetime with an exponential random variable if you simply chose to, and it would even be appropriate for short times. But in the long run, modeling the overall lifetime exponentially wouldn't be a good idea.

Also, you should read through Example 5d in Section 5.5, and I encourage you to read the portion of Section 4.7 that discusses Poisson processes to see how the exponential random variable is used to model the time between occurrences of an event. In particular, notice that if X is a Poisson random variable with parameter λ (which means λ is the average/expected number of events that occur in a particular time interval), it follows from the Poisson p.m.f. that

P(X = 1) = λe^{−λ}.

This is just the p.d.f. of an exponential random variable with parameter λ evaluated at x = 1. If we let Y be this exponential random variable, then for a small increment Δx, the probability that Y lands just above 1 is approximately

P(1 < Y < 1 + Δx) ≈ f_Y(1)Δx = P(X = 1)Δx.
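For example, with λ = 2 (an arbitrary illustrative value), the Poisson p.m.f. gives P(X = 1) = 2e^{−2} ≈ 0.271, and the Exponential(2) density gives f_Y(1) = 2e^{−2} as well.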

This shows the connection between the Poisson and exponential random variables. In particular, it shows why we interpret the expected values the way we do for the Poisson and exponential random variables. If we expect λ events to occur in a given time interval of length T (from a Poisson(λ) random variable), then we expect them to occur at a rate of λ/T events per unit time, which means we would expect T/λ units of time between events. If we set T = 1, we see that the expected value of a Poisson random variable is the expected number of events per unit time, and the reciprocal of that, 1/λ, is the expected amount of time between events (units of time per event). This is, intuitively, why the expected values of Poisson(λ) random variables and Exponential(λ) random variables are, respectively, λ and 1/λ: reciprocals of each other. Each variable measures an inverse aspect of the same process.
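To see this connection concretely, here is a minimal simulation sketch, assuming NumPy is available: it builds event times as running sums of Exponential(λ) gaps and checks that the number of events landing in one unit of time behaves like a Poisson(λ) count, including the value P(X = 1) = λe^{−λ} computed above. The value of lam and the array sizes are arbitrary illustrative choices.

```python
# Minimal sketch: exponential inter-event gaps produce Poisson counts.
import numpy as np

rng = np.random.default_rng(2)
lam, reps = 3.0, 100_000

# Event times are cumulative sums of exponential gaps; 50 gaps per run is
# essentially always more than will fall inside [0, 1] when lam = 3.
gaps = rng.exponential(scale=1/lam, size=(reps, 50))
arrivals = np.cumsum(gaps, axis=1)
counts = (arrivals <= 1.0).sum(axis=1)  # events in one unit of time

print(counts.mean())                    # ≈ lam, the Poisson mean
print((counts == 1).mean())             # empirical P(X = 1)
print(lam * np.exp(-lam))               # lam * e^{-lam} ≈ 0.149
```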
