Tricky Real-Life Problems of Probability

Written by: Arman ÖZCAN

ISBN: 978-625-400-611-1

Typesetting and Printing:

KÖSELECİLER 1933

Magic Digital Center

Ulubatlı Hasan Bulvarı Güzeler İş Mrk. No.102/A Osmangazi / BURSA

Tel: 0224 25 25 717 Fax: 0224 250 04 67

www.koseleciler.com.tr info@koseleciler.com.tr

To my sibling, my mother and my aunt

Contents

Preface

1. Let’s Make a Deal

2. The Stylish Aunt Lily’s Bracelets

3. Double Birthday

4. Either Boy or Girl

5. Tennis Players Waiting for Retirement

6. Waiting For The Yellow-throated Sparrow

7. Tag Day in the Kindergarten

8. The Unreliable Test

9. Flirting by Arranged Coincidences

10. A. irvinus

11. Nancy the Caretta

12. Marble Gamble

13. Picking Seats in the Theater

14. Buses on the Kadıköy - Bostancı Line

15. COVID-19 Swim Ring

16. Club, Bowling Alley or Theater

17. Smith Eats Turkey After Beating His Opponent

18. Is It Possible to Have Alzheimer’s and Keep It a Secret?

Preface

Probability is often neither well understood nor well liked in school.

This is because it is different from other areas of mathematics. There is no

standard formula to use in a probability problem, unlike, for example, in a calculus problem. As opposed to problems of algebra, probability

problems are usually misleading and have counter-intuitive results. There

is uncertainty and randomness in a probability problem, something which

would not happen in a problem of geometry. Because of these reasons,

one has to be creative and willing to invest a long time in order to answer

problems of probability. This, in turn, is a perfect reason to love problems

of probability. They incite us to rack our brains.

This book consists of eighteen probability problems. I will ask you first

to try solving each problem on your own. Then, I will show you the solution to

the problem in a detailed manner. I have also added additional problems

at the end of many of the chapters. These additional problems are closely related to the main problem they follow. You will be able to use the

methods and concepts throughout the book in solving these additional

problems.

I have taken inspiration from many different sources in writing out

these problems. These sources include the internet, several books, and

problems I have learned from mathematicians. In the book, I have included

not only famous problems that have puzzled mathematicians for

years, but also other original problems that I have come up with on my

own. I have also been scrupulous in choosing problems that incorporated

a variety of problem-solving techniques in probability.

The problems in this book share many important characteristics. First

off, you will notice that all of the problems have a small backstory within

them. Throughout the book, you will experience a range of emotions. You

will feel sad when Stylish Aunt Lily gets her bracelets stolen. You will be

elated when Nancy the Caretta makes it to the sea. You will be anxious

for the children getting on the swim ring, disregarding social distancing

during the COVID-19 pandemic. You will get to know many characters

from Allison the party-lover to Smith the turkey-lover, from microbiologist

Ms. Irvin to O’Neill the bird watcher. These little stories were meant to

make the book more fun, and also to relate mathematical questions to

real life. I, for one, think that relating mathematics to real life is key in

fostering a love of mathematics. Our textbooks are usually not successful in this regard. We have never actually run across someone buying 452 bananas like the characters in our textbooks, have we? The result of such unrealistic

questions is children who complain saying “How will mathematics ever be

useful in real life?” If you are one of these people, I believe this book will

change your opinion on mathematics.

Another common characteristic of the problems in the book is the fact that they are all highly misleading. In many of the questions, the first

answer you come up with will likely be incorrect. This is especially the

case with the problems that seem the simplest. Sometimes you will be

so sure of the answer you have found that the actual solution will not

convince you. At this point, you might go so far as to say that the solution provided in the book is erroneous. In fact, I ran across this very situation even before the book was finished. My mother is extremely sure

that the solutions to the first and the fourth problems are completely

wrong. To this day, she remains unconvinced.

But I advise you not to be like my mother and to try to approach

the problems without any preconceptions. If you find the solution to be

too easy, do not trust your answer and try to second-guess yourself before reading the solution.

On the contrary, if a problem appears too difficult, do not give up. I

assure you that almost all of the problems in the book have a simple

method of solution. It might not always be easy to see the shortcuts that

lead to these simple solutions. However, it will certainly be child’s play once you set foot on the right track.

All of the solutions are explained in a manner that any high school

student can understand. I’m positive that even a successful middle school

student can grasp many of the solutions. If you can understand the solution

of a problem easily after reading it, this shows that you can solve

that problem on your own, without any help. Always keep in mind that

you have the potential to solve any problem in this book, and never give

up thinking.

To be perfectly honest, many of the problems will appear to be difficult

at first sight. It is completely natural for you to fail to find the answers

in some of the questions. What would be unnatural is for you to be able

to get everything right on your first try. Even I, the one who penned this

book, failed on many of the questions when I first came across

them.

In the end, what is important is pondering on the problems and learning

new concepts. This is why I don’t want you to skip directly to the

solutions after reading the problems. Please take a moment to try and

solve the problems on your own. Take one day, or even a whole week

if necessary. Only when you are completely satisfied with your answer,

should you move on to the solution. Never forget that the fun part of

mathematics is not reaching the answer, but the action of thinking beforehand.

This is exactly why I wrote this book in the first place: to get my

readers to think. In my humble opinion, thinking is an action which

pleases. And the foundation of mathematics is thinking. That is why I

believe that anyone can love mathematics. I would go so far as to say that

anyone will love mathematics, provided they are introduced to it correctly.

This book aims to be a proper introduction. Regardless of your age, or

your interest in mathematics, I am sure that it will give you pleasure to

think on the problems in this book, and you will leave as a better lover of

mathematics.

Without further ado, let us move on to the problems.

Let’s Make a Deal

Matt is a contestant in the latest episode of the popular game show

Let’s Make a Deal, aired every Sunday. In this show, the contestants

return home either with a brand-new sports car, or an adorable little goat.

The charming host of the show, Monty Hall, presents the contestants with

three closed doors. Behind one of the doors lies the sports car, and the

other two each have a goat behind them. However, only Monty Hall knows

behind which door the sports car is. Mr. Hall asks Matt to choose his

lucky door. After Matt makes his decision, Mr. Hall goes over to the two

remaining doors and as he does in every episode, opens the one with the

goat behind it. This gets the audience stoked, because now everybody

knows that the car must be either behind Matt’s door, or the other one.

Monty Hall turns to Matt, grinning, and asks:

“Would you like me to open the door that you initially chose, or would

you rather change your choice to the other one?”

Now, even though Matt loves animals, he isn’t one to say no to a brand-new sports car, so he would very much like to make the right choice. What

should he do to increase his chance of getting the car? Should he insist on

his initial decision or should he choose the other door? Is cunning Monty

Hall trying to walk him into a trap by making him change his mind? Does

it even really matter which door he chooses?

Solution

I would like to touch on the history behind this problem before

moving on to the solution. This problem was first posed in a letter sent

by Steve Selvin to the scientific journal the American Statistician in 1975.

Monty Hall was the creator and the long-running host of the American

game show Let’s Make a Deal, first aired in 1963. Today, this problem

is known as the Monty Hall problem. It was answered by Marilyn vos

Savant in Parade magazine’s Ask Marilyn column in 1990 and the problem

gained public recognition afterwards. Numerous mathematicians who

claimed vos Savant’s answer was incorrect wrote follow-up letters to the

magazine, some of which were more insulting and derisive than critical. Ultimately, it turned out that vos Savant¹ was right.

Those who opposed Marilyn vos Savant claimed that changing the

choice of door had no effect on the probability of finding the car. They

suggested that since the car was equally likely to be behind one of the two

doors, both doors had an equal probability of 1/2. However, this is an

erroneous way of thinking. Surprising, right? We’re only getting started.

We’re going to run into many more surprises throughout the book.

If the contestant wants their best shot at winning the car, they should

always opt to switch to the other door, as doing so will give them a 2/3

chance of winning. Similarly, sticking with the initial choice results only

in a 1/3 chance of getting the car. Although it is true that the car must

be behind one of the two remaining doors, this does not actually imply

that the car is equally likely to be behind them.

Let’s rewind back to the start of the show to think the problem through.

The contestant is presented with three identical, closed doors.

[Figure: three identical closed doors, numbered 1, 2 and 3]

¹ Some sources list Marilyn vos Savant as having the highest recorded intelligence quotient (IQ) in history.

Without loss of generality, let’s assume that they choose the first door

out of the initial three. There are three different cases, depending on

which door actually contains the car: The car is either behind the first

door, or the second door, or the third door. As it stands, all three of these

cases are equally likely to occur, thus having a probability of 1/3.

In each of these three cases, the game host will go on to open one of

the other two doors to reveal a goat.

As this case analysis shows, in two of the three cases, switching to the other

door leads to winning the car. In only one of them, switching results

in getting a goat. Hence, contestants who switch have a 2/3 chance of

winning the car, while contestants who stick to their initial choice have

only a 1/3 chance.

Another way of thinking about the solution is as follows: if the contestant

initially chose one of the two doors containing goats, switching to

the other door will win them the car. However, if the initial door is the

one that contains the car, switching will lead to getting a goat. Since the

probability of choosing a goat in the initial decision is 2/3, the contestant has a 2/3 chance of winning, should they switch.

Therefore, Matt should switch to the other door in order to increase

his chance of winning the car from 1/3 to 2/3. Otherwise, he might just

lose out on being the GOAT² of the show!

Let’s view the problem from yet another perspective. Matt had a 1/3

chance of choosing the right door on his first try. When Monty Hall opened

one of the other two doors and revealed a goat, no additional information

was actually revealed to Matt, because no matter which door Matt chose,

the host would open one of the doors containing a goat. We already

possess this information before the show even starts, since this is what

Mr. Hall does on every episode. (I have no idea why the audience gets

excited regardless.) Since no new information is revealed, the probability

for Matt to choose the right door on the first try does not change and

remains 1/3. Then, he has a 1/3 chance of winning the car if he doesn’t

switch. Hence, if he switches, his probability of winning is 2/3, from

1 − 1/3.

We could also think about it like this: Initially, there was a 2/3 chance

for the car to be behind one of the two doors that Matt didn’t choose.

When Monty Hall opened one of these doors, this 2/3 chance concentrated on the other door. Now, this other door has a 2/3 chance of being the right one.

Finally, I will alter the question a bit to give you a more intuitive

sense for the solution of the problem. Let’s think for a second that there

are not three, but a hundred doors to choose from. As before, there is

only one car behind one of the doors, and the other 99 contain adorable

goats. Again, the contestant has the right to choose one door and Monty

Hall then proceeds to open 98 doors with goats in them one by one, and

again the audience goes wild. Would you still insist on the same door

you chose first, or grow suspicious of that single door Monty Hall did not

open? I would bet that almost everyone would choose to switch to the

other door if the problem was originally worded like this. Why would you

insist on your initial choice anyway? There is nothing special about it, it

had a 1/100 chance of containing the car the moment you chose it and it

still has a 1/100 chance after the doors are opened. However, if you switch to

the other door, you will now have a 99/100 chance to win the car. That’s

almost a certain win! As you see, when the number of doors is increased, we can intuitively understand that switching doors is a clever decision. Regardless, the logic still stands even when there are only three doors.

² Greatest of all time.
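The switch-vs-stay argument is also easy to check numerically. Below is a minimal Monte Carlo sketch of the game (my addition, not from the book; the helper name `play` is arbitrary), covering both the three-door game and the hundred-door variant:

```python
import random

def play(n_doors=3, switch=True, trials=100_000):
    """Estimate the win probability of one strategy over many rounds."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(n_doors)   # door hiding the car
        pick = random.randrange(n_doors)  # contestant's first choice
        # The host opens n_doors - 2 goat doors among the unpicked ones,
        # leaving exactly one other closed door. Switching therefore
        # wins exactly when the first pick was wrong.
        wins += (pick != car) if switch else (pick == car)
    return wins / trials

print(play(switch=True))               # close to 2/3
print(play(switch=False))              # close to 1/3
print(play(n_doors=100, switch=True))  # close to 99/100
```

Running it a few times shows the switching strategy hovering around 2/3 with three doors and near certainty with a hundred, matching the analysis above.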

Here’s the important point in this problem: The host, Monty, knows

exactly what is behind each door and always opens the door containing

the goat from the two doors that the contestant didn’t choose. This means

that this event is not random; it is not dependent on chance.³ The host

always opens the door with a goat. Knowing whether an event is random

or not can seriously change the answer of a problem. We will encounter

this concept several times in the following pages, so you’d better get ready

for it.

By the way, I think I now understand why the audience goes wild

every time Mr. Hall opens a door. These people must be getting paid to

applaud and to overreact!

Additional Problem

Three partners-in-crime Joe, Joey and Johnny (intentionally named so

in order to cause additional confusion) have a huge score from a robbery

they carried out last night. Unfortunately, they are caught soon after in

their hideout and taken to the court of the king. The king’s verdict is

harsh: two of these men are to be executed the next day and only one

will be allowed to survive. Upon hearing this, Joe, being the most fainthearted

of the three, faints. He is taken to a cell to spend the night. Joe

soon breaks down and asks the warden overseeing his cell to tell him one of

the two people set to be executed the following day. The warden tells him

that Johnny will be executed. Joe is ecstatic to hear this, as this would

mean that his and Joey’s chances to live have now increased to 1/2. The

following morning, Joe runs into Joey and tells him of the situation. Joey, being an avid mathematician when he isn’t busy robbing stores, ridicules

Joe and tells him his calculation is all wrong. Dumbfounded, Joe looks

after Joey as he cheerfully walks away.

Why is Joe wrong? What are Joe and Joey’s chances to live, respectively?

This problem was originally published by Martin Gardner in the Mathematical

Games column in Scientific American in 1959. It is known for

its similarity to the Monty Hall problem.

³ Because the probability that Monty’s door contains a goat is 1.

The Bracelets of Stylish Aunt Lily

Aunt Lily is packing up for summer. Since she is particularly fond of

her jewelry, she takes her bracelets with her wherever she goes. Therefore,

she decides to pack one of her jewelry boxes with her. Aunt Lily has three

jewelry boxes and each one contains two bracelets:

The first one contains two silver bracelets.

The second one contains one gold bracelet and one silver bracelet.

The third one contains two gold bracelets.

Aunt Lily randomly takes one of the bracelets from one of her jewelry

boxes and puts it on. She notices that the bracelet she’s put on is made

of gold. She puts the remaining bracelet in her bag, along with the box

it’s in.

During her trip, Aunt Lily’s bag gets stolen. After recovering from

the initial shock, she begins to wonder whether the other bracelet in her

stolen bag was made of silver or gold. (You know how much gold costs

nowadays.)

What is the probability that the other bracelet inside Aunt Lily’s stolen

bag is a gold one? Should she worry?

Note that other than the material they’re made of, all of Aunt Lily’s

bracelets are identical.

Solution

This problem is the same in essence as Bertrand’s box paradox, first posed by the French mathematician Joseph Bertrand in 1889. In the original statement of the

problem, the three boxes contain not two bracelets, but two coins.

The answer to this problem is not 1/2. I will proceed to show why the

answer is not 1/2, by following through with the logic of those who think

it is 1/2. Since the randomly chosen bracelet is gold, the box the bracelet was taken out of has to be GG or GS (denoting each box by its contents: GG is the box with two gold bracelets, GS the mixed box, and SS the box with two silver ones). Let’s assume that the probability

of the box being GS is 1/2. However, this is only valid in the case where

the randomly chosen bracelet is gold. If the bracelet was, instead, silver,

the box would have to have been either SS or GS. In this case, following

the same logic, we would have to say that the probability of the box being

GS is again, 1/2. But this would mean that whichever kind the randomly

chosen bracelet is, the probability of it being taken from a GS box is 1/2.

This is obviously false, since we have 1/3 chance of choosing the GS box

in a random selection out of three identical boxes. This shows that the

answer to the problem cannot be 1/2. The correct answer is in fact 2/3, that

is, there is a 2/3 chance that Aunt Lily’s stolen bracelet is gold.

Before we move on to explain why the answer is 2/3, let us first name

all of the bracelets. Let the gold bracelets be G1, G2 and G3. Similarly, let the silver bracelets be S1, S2 and S3. Suppose these bracelets are distributed to the boxes in the following way:

    GG Box: G1 and G2
    GS Box: G3 and S3
    SS Box: S1 and S2

If we didn’t know that the bracelet Aunt Lily chose to put on is gold,

all six of the bracelets would have an equal chance of 1/6 to be the one

she chose.

However, we are given the information that Aunt Lily’s chosen bracelet is gold. Therefore, it must be one of G1, G2 or G3. All three of these bracelets are equally likely to be the one she put on, so they all have a 1/3 chance.

Hence, the bracelet Aunt Lily put on,

has a 1/3 chance to be G1, making G2 the other bracelet in the box,

has a 1/3 chance to be G2, making G1 the other bracelet in the box,

has a 1/3 chance to be G3, making S3 the other bracelet in the box.

Thus, the other bracelet in the box has a 2/3 chance to be gold, and

a 1/3 chance to be silver. Let’s now take a cleaner look at the solution

with the following figure.

This figure shows six equally likely cases, two for each of the three

different boxes. In each case, the bracelet Aunt Lily chose is a different

one, shown with an arrow pointing to it. In three of these six cases, Aunt

Lily’s chosen bracelet is gold. In two of these three cases, the bracelet is

from the box GG and in one case it is from the box GS. Therefore if Aunt

Lily’s randomly chosen bracelet is gold, the other bracelet in the box has

a 2/3 chance to be also gold.

Now let us take a different approach to the problem. Suppose the

events in the problem occur 600 consecutive times. In this scenario, each

of the three boxes will be chosen, on average, 200 times. Each of the two

bracelets in each box will be chosen, on average, 100 times. Since we

are given that the chosen bracelet is gold, we are only interested in 300 of

those cases in which the chosen bracelet is gold. Out of these 300 cases,

200 are when one of the bracelets from the GG box is chosen, and the

other 100 are when the single gold bracelet from the GS box is chosen.

Therefore, in 200 of 300 cases, the other bracelet in the box will be

gold and in the other 100, it will be silver. This means that there is a

2/3 chance for the stolen bracelet to be gold. Aunt Lily has a right to be

concerned. It is twice as likely for one of her gold bracelets to be stolen!
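This 600-repetition bookkeeping translates directly into a simulation. A short sketch (my addition, not from the book) estimating the conditional probability:

```python
import random

# Denote the boxes by their contents: G = gold bracelet, S = silver bracelet.
boxes = [('G', 'G'), ('G', 'S'), ('S', 'S')]

def other_is_gold(trials=100_000):
    """Among the trials where the first randomly drawn bracelet is gold,
    estimate how often the bracelet left in the same box is also gold."""
    gold_first = 0
    both_gold = 0
    for _ in range(trials):
        box = list(random.choice(boxes))      # pick a box at random
        drawn = box.pop(random.randrange(2))  # draw one of its two bracelets
        if drawn == 'G':
            gold_first += 1
            both_gold += (box[0] == 'G')
    return both_gold / gold_first

print(other_is_gold())  # close to 2/3, not 1/2
```

Note that the code never filters the boxes in advance; it simply discards the trials where the first bracelet is silver, exactly as the 600-trial argument discards the 300 uninteresting cases.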

Finally, let’s try to get a more intuitive understanding of the solution.

Since the chosen bracelet is gold, the box it came out of has to be either

GG or GS. This is easy to grasp. But the error arises when we falsely

think that these two cases are equally likely. When the GG box is chosen,

it is certain that one of the bracelets taken out of it randomly will be gold.

However, when the GS box is chosen, there is only a 1/2 chance that a

random selection out of it will yield a gold bracelet. That is, only half of the draws from the GS box yield a gold bracelet. A gold bracelet is therefore twice as likely to have come out of the GG box as out of the GS box. When we get this straight, we can understand

why the randomly chosen bracelet is more likely (to be precise, twice as

likely) to come from the GG box than the GS box. In short, the other

bracelet in the box has a 2/3 chance to be gold.

Additional Problem

The traditional camel wrestling tournament carried out each year in

Çanakkale is once again met with great anticipation from enthusiasts.

The tournament has been home to amazing spectacles, with fierce camels fighting tooth and nail for the title. The champion of the tournament will

be determined with a final play-off between the three strongest camels

left in the tournament. These three camels are not equal in strength and

in head-to-head matches, the stronger camel always defeats its opponent.

Out of these three camels, every pair will fight each other once. The

audience is in place and the camels are ready for the fight. In the first

match, Manfred the Gobbler and Tsunami Jeremy face each other and

Manfred is the victor. In the second match, Tsunami Jeremy will face

Hurricane Harry. What is the probability of Hurricane Harry beating

Tsunami Jeremy?

Double Birthday

Have you ever met someone with the same birthday as you? Or, do

you have any friends that have the same birthday as you? If you are

a student, do you think you have classmates you share exactly the same

birthday with? In this problem, the probability for two individuals to

share the same birthday is examined.

In order to have a higher than 50 percent chance of having two different

individuals with the same birthday in a classroom, what should be the

smallest number of people in the classroom?

First of all, I want you to guess the result of this problem. Following

that, try to solve the problem by calculation and compare your result with

your initial guess.

Assume that there is no one born on February 29 and that there are

365 days in a year. An individual is equally likely to be born on any day

of the year. Also, two people sharing the same birthday doesn’t indicate

that they were born exactly on the same date. Having the same birthday

implies that two people were born on the same day of the same month.

Two individuals both born on the 18th of June, for example.

Solution

What is your guess? Around 200? Or something like 150? Maybe as

low as 100?

To have more than 50 percent probability of two individuals sharing

the same birthday in a classroom, there must be at least 23 people. If there

are 57 people, the probability increases up to 99 percent. In other words,

it is almost a certainty to find a pair of people sharing the same birthday

in a group of 57! How can this be possible? How is the probability so high

even with this few people? Let’s explain with mathematics.

This problem can be solved via two different methods. The first one

is to directly calculate the probability for at least two people to share the

same birthday. However, this method is impractical and takes too much

time. One will realize that with this method, increasing the number of

people in the classroom will lead to an incredibly steep increase in the

number of cases that must be considered.

The second method is to calculate the probability of each person in the class being born on a different day and to subtract that value from the total probability, namely 1. In this way, the probability of at least two

people having the same birthday can be found. This method, which is

applied by subtracting the probability of the undesired cases from all cases,

is practical in many probability problems, because this approach takes

considerably less time in comparison with the direct calculation method.

We should first calculate the probability of each person to be born on

a day different than the rest of the group, starting from the first person

up to the nth one so that no two people in the classroom have identical

birthdays:

First person might be born on any day of the year: 365/365, that is, 1.

The probability of the second person to be born on a day different than the first one: 364/365.

The probability of the third person to be born on a day different than the first two: 363/365.

The probability of the fourth person to be born on a day different than the first three: 362/365.

...

The probability of the nth person to be born on a day different than the former ones: (366 − n)/365.

The probabilities of all the cases should be multiplied in order to calculate

the probability of having a group of n people with distinct birthdays:

P(having n people with distinct birthdays) = (365/365) × (364/365) × (363/365) × · · · × ((366 − n)/365)

This calculation can be expressed easily by using factorial notation.¹

P(having n people with distinct birthdays) = 365! / (365^n · (365 − n)!)

The probability of having at least two people with identical birthdays

can be calculated by subtracting this expression from the total probability,

namely 1 . Assume P n is the probability of having at least two people born

on the same day in a class of n people.

Pn = 1 − 365! / (365^n · (365 − n)!)

This formula gives the probability of having people that share the

same birthday in a group of n. Now, to be able to solve the problem, the

number n at which the probability exceeds 50% for the first time should

be determined.

After several tries plugging in numbers to the calculator, we find that

n is equal to 23.

P23 = 1 − 365! / (365^23 · 342!) ≈ 50.73%

Moreover, the probability exceeds 99% if there are 57 people in the

class and it reaches 99.9% with a class size of 77 people.
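The formula for Pn can be evaluated directly on a computer instead of by repeated calculator tries. A short sketch (my addition, not from the book; the helper name `p_shared` is arbitrary) that finds the 23-person threshold and reproduces the 57-person figure:

```python
from math import prod

def p_shared(n):
    """P(at least two of n people share a birthday), assuming 365
    equally likely birthdays: 1 - (365/365)(364/365)...((366-n)/365)."""
    return 1 - prod((365 - k) / 365 for k in range(n))

# Smallest class size for which the probability exceeds 50%:
n = next(n for n in range(2, 366) if p_shared(n) > 0.5)
print(n, round(p_shared(n) * 100, 2))  # 23 50.73
print(round(p_shared(57) * 100, 1))    # 99.0
```

The same function can be probed at any class size, which makes it easy to see how steeply the probability climbs between 20 and 60 people.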

You may think that there must be more than half of 365 people so that the probability reaches 50%. The reason for this misconception is a misunderstanding of the problem. Most people seem to think this is what is asked in the problem: “What should be the smallest number of people in a classroom so that the probability of having someone with the same birthday as the class president exceeds 50%?” If the problem were asked like this, there would have to be more than half of 365 people in the class in order to obtain a probability higher than 50%. In such a class, the probability for each person not to share the same birthday with the class president would be 364/365. In that case, if there were at least 253 people in the classroom, in addition to the president, the probability of someone sharing a birthday with the class president would be higher than 50%:

1 − (364/365)^253 ≈ 50.05%

¹ The factorial operation is written as n!, which indicates the product of all integers from 1 up to n.
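The threshold for this “class president” variant can be checked with a few lines of code as well (my addition, not from the book; the helper name `p_match_president` is arbitrary):

```python
# P(at least one of m classmates shares the president's birthday),
# with 365 equally likely birthdays: 1 - (364/365)^m.
def p_match_president(m):
    return 1 - (364 / 365) ** m

# Smallest m for which the probability exceeds 50%:
m = next(m for m in range(1, 1000) if p_match_president(m) > 0.5)
print(m, round(p_match_president(m) * 100, 2))  # 253 50.05
```

Comparing this with the 23-person answer above makes the gap between the two readings of the problem concrete: matching a fixed person takes 253 classmates, while matching any pair takes only 23 people.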

However, this is not what’s being asked. In this problem, we are concerned

with the probability of having any two people with the same birthday.

There are numerous pairs in the class one can choose from. For

example, the total number of different pairs which can be chosen from a group of 23 people is (23 choose 2), namely 253. It is sufficient that any one of these 253 pairs share a birthday. Therefore, knowing that there are 253 pairs even in a small class of 23 people, it is normal that the probability of having a pair with the same birthday exceeds 50%.

In Turkey, the average class size is 29. The probability of finding a

pair with identical birthdays is around 68% in a class of 29 people.

P29 = 1 − 365! / (365^29 · 336!) ≈ 68%

However, this calculation was on the assumption that a person is

equally likely to be born on any of the 365 days of the year. In real

life, the birth rate in summer is higher than it is in winter. Therefore, the

probability of finding a pair with identical birthdays will be higher than

68% in a class of 29 people. In other words, it is very likely to find a pair

with the same birthday in an average class in Turkey. If you are a student,

you may try to find your own birthday twin in your own classroom!

The solution of this problem is nothing extraordinary. It’s only a bit

against the common perception of people. Intuition can easily mislead,

and people are naturally not very good at estimating probabilities.

Therefore, most of the time, initial estimates in such probability

problems turn out in the end to be seriously wrong. You may have had

the same thing happen in the first three problems, and this will likely

happen very often in the later ones. Therefore, one should think twice about the answers to the questions that follow. But don’t get upset if your answer

is wrong. Being wrong means one is in pursuit of the correct answer.

As a matter of fact, even successful mathematicians are sometimes

mistaken about questions of probability. To illustrate, Paul Erdős, one of

the greatest mathematicians of the 20 th century, was wrong in his solution

of the Monty Hall problem, the first problem in this book. Erdős had

thought in his solution that switching the door had no advantage. Moreover,

Erdős denied the correct answer even after the right solution was explained to him. As one can see, questions of probability may confuse

even the greatest mathematicians like Erdős.

If we don’t want to make mistakes while solving probability problems,

we must support our intuition about the result with sound mathematics.

Additional Problem

Allison, who loves partying, finds out that there is a huge party in his

neighborhood. Only one person will be exempt from the entrance fee of

the party. This one lucky person will be selected as follows:

On the day of the party, people will line up in a row in front of the

entrance. The birthday of each person coming will be noted down one

by one. The first person who has the same birthday as anyone already inside will enter the party for free. Anyone after this person will be subject

to the entrance fee. Where should Allison, who is as stingy as he is fond of parties, stand in the line so that he maximizes his probability of

getting a free entrance?

Assume that people are lined up randomly and there are no cheaters.

(For instance, there is no cheating-Andy showing up with his twin.)

Either Boy or Girl

Tim, Mike, Bob and George are close friends from high school. It’s

now been years since they graduated, and they are all fathers now. Each
of them has two children.

1) If we know that Tim has a daughter, what is the probability that

he also has a son?

2) If Mike has a daughter who was born on a Tuesday, what is the probability

that he also has a son?

3) If Bob says his favorite child is his daughter, what is the probability

that his less favored child is male?

4) If we know that George has a daughter born during a full moon,

what is the probability that George also has a son? (A full moon occurs

roughly every 29.5 days.)

Remark: Every father with two children secretly has a favorite child.

Also, no father discriminates by gender.

Another remark: One child being born on a Tuesday and another

being born on a Tuesday are independent events. Similarly, one child

being born during a full moon and another being born during a full moon

are independent events.

Solution

It might be hard to accept the solutions to these highly confusing four

questions, even after you read through them. Since all four of them are

so similar in nature, most people, at first glance, think that they are just

asking the same question. However, there are small but very important

details that differentiate them. We will first acknowledge those details. In

probability theory, we might end up making mistakes if we don’t pay close

attention to the way a question is worded; neglecting the details provided
often leads to erroneous solutions. For instance, in the four problems above,
if we fail to take into consideration the extra information provided and the
way in which that information is obtained, we may arrive at a false solution.

You will soon see what I mean with "the way in which the information

is obtained". Before starting with the correct solution, let’s first handle

the incorrect answer that most people initially come up with. Yes, if you

thought that the answer to all four of these questions is 1/2, then like

most people, you are wrong. Actually, 1/2 is the answer to only one of

the questions. This might be completely against what you have learned

up to now, and your sense of intuition. But don’t be too quick to argue.

I assure you will be convinced if you follow through the entire solution,

without any bias.

The first of these four questions is an infamous question known as the

Boy or Girl paradox, first posed by Martin Gardner in 1959. The other

three are only variants of the first. In fact, you too, can create any new

variant to your liking.

Let’s start with the solution to the first question, since it is the key to

all others. There are four cases for Tim having two children:

MM, where both children are male,
MF, where the older child is male and the younger child is female,
FM, where the older child is female and the younger child is male,
FF, where both children are female.

Out of these four cases, all have an equal 1/4 chance to have occurred.

However, since we are given that Tim has a daughter, we can remove the

case where both children are male, MM, from the set of possible cases.

Thus, there are only 3 possible cases left:

{MF, FM, FF}

For Tim’s two children, these three cases are equally likely to have

occurred. Tim has a son in two of these three cases, so the probability

that Tim has a son is 2/3. We can see this in the figure below.

This is the short explanation for the first question. But I sense that

you are not satisfied yet. Most people object to this solution with the

following arguments: Why do we consider the cases MF and FM as
different? Don’t they both mean that one child is male and the other is
female? Does it really matter which one is older? It does matter.

The simplest way to explain this is to turn the problem into one of

heads or tails. Let’s say instead of making two children, we are flipping

a coin two times. (This will save us a lot of time.) In this experiment,

getting a tail will correspond to having a son and getting a head will

correspond to having a daughter. With this analogy, the equivalent to our

problem becomes:

A coin is flipped twice. If we know that the coin landed on heads at

least once, what is the probability that the coin landed on tails once?

You have probably run into many questions similar to this in your

math courses. In order to solve this problem, we need to consider the

following four cases, each of which have an equal 1/4 chance to occur:

{HH, HT, TH, TT}

The HT and TH cases are different from each other. In the HT case,

the first flip has resulted in a head and the second in a tail. The chance of

getting a head on the first flip is 1/2, and so is the chance of getting a tail

on the second flip, thus making the probability of the HT case occurring

1/4. In the TH case, the first flip has resulted in a tail and the second
flip has resulted in a head. Similarly, the chance of TH occurring is
1/2 · 1/2 = 1/4. Thanks to this analogy, we can see why the MF and FM cases
in our original problem are separate cases, each having 1/4 probability to

occur.

Let’s go on with the coin flip analogy. If we know that at least one

head turned up, the TT case is impossible to have occurred. The possible
cases are HH, HT or TH. Since these three are equally likely events, we

conclude that the probability of getting a tail in one of the two flips is

2/3. We came to a similar conclusion in our original question, with the

result being again 2/3.

I would like to demonstrate a second way of solving the problem. Think

of 1000 families that have two children, like Tim’s. On average, 250 of

these families will have two daughters, 500 will have one daughter and one

son, and 250 will have two sons.

The number of families that have at least one daughter is 750. Out of

these 750, 500 also have a son. Therefore, the probability for a family that

has two children, one of them being a daughter, to have a son is 500/750,

that is, 2/3.

It is beneficial to note that if the information given in the question

stated that the older child was female and asked us the probability of the

younger one being male, then the answer would indeed be 1/2, as the

events in this question are independent. As I’m sure everyone knows, the

gender of one child has no influence on the gender of the next child to be

born, and the probability of each gender is 1/2 for both children. However,

in our problem, the situation is a little different. Because in our problem,

we are given that out of Tim’s two children, at least one is female. This
female child can be the younger or the older of the two. This is where
the difference lies.

By the way, you can try the coin flip experiment I told you about by

yourself. Just flip a coin twice consecutively and note down the results.

Repeat this experiment at least 15 times. You will see that you get one
head and one tail roughly twice as often as you get two heads. This
means that if you get a head at least once in two coin

flips, the probability of you getting a tail in those two flips is 2/3. With

this experiment, we simulate the problem at hand and see that the correct

answer is, indeed, 2/3.
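If you would rather let a computer do the flipping, here is a short Monte Carlo sketch of the same experiment (the 50/50, independent gender assumption is the one the problem itself makes):

```python
import random

random.seed(1)  # fixed seed so the run is reproducible
families_with_daughter = 0
of_those_with_son = 0
for _ in range(100_000):
    # each child is a boy (M) or a girl (F) with equal probability
    children = [random.choice("MF"), random.choice("MF")]
    if "F" in children:                # condition: at least one daughter
        families_with_daughter += 1
        if "M" in children:            # does the family also have a son?
            of_those_with_son += 1
print(of_those_with_son / families_with_daughter)   # close to 2/3
```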

Let’s move on to the second question. Is there a difference between

knowing that Mike has a daughter, and that Mike has a daughter born

on a Tuesday? That is, does knowing on which day Mike’s daughter was

born change anything? This child had to be born on some day of the week.

She could have instead been born on a Wednesday or Sunday. So can we

suppose that knowing she was born on a Tuesday is of no importance? If

so, is the answer to this question also 2/3 like the first one?

No, the answer to this question is not 2/3. Let’s see why.

Let’s start by specifying the notation that will be used in the solution.
Let M denote male and F denote female, as before. Let the numbers 1
through 7, written as subscripts of M or F, denote the day of the week on
which a child is born: Monday will be denoted by 1 and Sunday by 7. For
instance, a girl born on a Tuesday will be denoted by F2.

As in the previous question, let’s determine the cases that meet

the given criteria. In the previous question, one child had to be female

and we went over the FM, MF, FF cases that applied. In this question,

the criterion is to have a girl born on a Tuesday. Therefore, the cases in

which one of Mike’s children is an F2 will be suitable. Let us make a table

of all the cases that meet our criteria.

MF       FM       FF
M1F2     F2M1     F2F1   F1F2
M2F2     F2M2     F2F2   F2F2
M3F2     F2M3     F2F3   F3F2
M4F2     F2M4     F2F4   F4F2
M5F2     F2M5     F2F5   F5F2
M6F2     F2M6     F2F6   F6F2
M7F2     F2M7     F2F7   F7F2

As you may have noticed, the F2F2 case in this table, denoting two daughters
both born on Tuesdays, has been accounted for twice. We need to delete
one of the two from the table.

MF       FM       FF
M1F2     F2M1     F2F1   F1F2
M2F2     F2M2     F2F2
M3F2     F2M3     F2F3   F3F2
M4F2     F2M4     F2F4   F4F2
M5F2     F2M5     F2F5   F5F2
M6F2     F2M6     F2F6   F6F2
M7F2     F2M7     F2F7   F7F2

For each of these 27 cases, there is at least one daughter born on a

Tuesday, so any of these could be the denotation of Mike’s two children.

But only one of these is the actual denotation of Mike’s children. Then

what is the chance for each of these cases to occur? In fact, all 27 of these
cases are equally likely, because each case specifies a gender and a day of
the week for each of the two children, and every gender-day combination is
equally likely for a child.

In fourteen of these equally likely 27 cases, Mike has one son and one

daughter. Hence, the probability of Mike having a son is 14/27. This

probability is roughly equal to 51.85%, slightly over 1/2.
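To double-check the 14/27 result empirically, here is a small Monte Carlo sketch (gender and weekday assumed uniform and independent, as the problem’s remarks state):

```python
import random

random.seed(1)
matching = 0    # families with at least one girl born on a Tuesday
with_son = 0    # ...of those, families that also have a son
for _ in range(1_000_000):
    # each child: (gender, weekday), both uniform and independent (0 = Monday)
    kids = [(random.choice("MF"), random.randrange(7)) for _ in range(2)]
    if ("F", 1) in kids:                      # a girl born on a Tuesday
        matching += 1
        if any(g == "M" for g, _ in kids):
            with_son += 1
print(with_son / matching)   # close to 14/27 ≈ 0.5185
```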

Let’s now move on to the third question. The first question taught us

that we need to be careful in considering the information provided to us

in dealing with questions of probability. The second one showed us that

information appearing to be insignificant can actually change the result

of the problem. The question we will be dealing now will show us that

we need to take into consideration the way information is obtained in

calculating the probability of an event.

In the third question, we are told that Bob has two children and his

favorite child is female, and asked for the probability of the second child

being male. In saying that his favorite child is female, Bob has actually

given us the information that he has a daughter. Isn’t this question equivalent

to the first one, then? In the first question, we were told that Tim,

who has two children, had a daughter. It looks as though these questions

are asking the same thing, right? Let’s dig in and see.

Let’s think of 1000 families with two children, as we did in the first

question. On average, 250 of these families will have two sons. Since we’re
only interested in families that have at least one daughter, like Bob’s, we
can eliminate these 250. Out of the remaining 750, 500 have one daughter and

one son and 250 have two daughters. In the families with two daughters,

the father’s favorite child is clearly female, since there is no male child to

favor. Now comes the critical point in the solution. In the

500 families with one daughter and one son, the favorite child has a 1/2

chance to be female and a 1/2 chance to be male. Thus, in those families

with one daughter and one son, the daughter is the favorite child in only

half of them. That is, 250 of such families favor their daughter over their

son. Hence, there are 500 families in total, where the favorite child is

female. 250 of these families have two daughters, and the other 250 have

one daughter and one son.

This means that in families with two children where the favorite child is

a daughter, the chance for the second favorite child to be male is 250/500,

that is, 1/2. Bob’s family is one such family, so the chance for Bob’s

second favorite child to be male is also 1/2. Meaning, the probability of

Bob having a son is 1/2.

As I told you in the beginning, in questions of probability such as this

one, the way information is obtained is of critical importance, and small

details can change the solution drastically. If we were to ask Bob, "Do

you have a daughter?" and were told "yes", then the probability of Bob’s

second child being a son would be 2/3.

However, when we ask Bob "What is the gender of your favorite child?"

and are told "female", the probability of Bob’s second child being male

is 1/2. This question is in fact equivalent to asking "Can you randomly

state the gender of one of your children?". Notice that when we state

the question like this, we leave a bit more randomness in it, because if

this question were to be asked to a father with a son and a daughter, he

could have answered saying his favorite child is his son. In that case, we

wouldn’t have known for certain if he had a daughter. When a father

with one daughter and one son is asked whether he has a daughter, he will

always respond yes. But only half of such fathers will state their daughter

as their favorite child. In our question, since Bob states that his favorite

child is female, the probability of Bob having two daughters is slightly

higher than that in the first question. That is why in this question, the

probability of Bob’s second child being male, 1/2, is slightly lower than

that in the first question, 2/3.
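The contrast between the two ways of obtaining the information can itself be simulated. In the sketch below, scenario A conditions on the answer to "Do you have a daughter?", while scenario B conditions on a uniformly chosen favorite child turning out to be a girl (the uniform-favorite assumption mirrors the problem’s remark that no father discriminates by gender):

```python
import random

random.seed(1)
a_cond = a_son = 0   # scenario A: father confirms he has a daughter
b_cond = b_son = 0   # scenario B: father's favorite child happens to be a girl
for _ in range(500_000):
    kids = [random.choice("MF"), random.choice("MF")]
    if "F" in kids:
        a_cond += 1
        if "M" in kids:
            a_son += 1
    if random.choice(kids) == "F":   # favorite picked independently of gender
        b_cond += 1
        if "M" in kids:
            b_son += 1
print(a_son / a_cond)   # close to 2/3
print(b_son / b_cond)   # close to 1/2
```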

Thus we have answered the first three questions in the problem. Now

it’s time to move on to the fourth. However, I will not be solving this question

now, but rather during the solution of the seventh problem in the book,

because the solution to this question requires another concept called conditional

probability. Actually, all four of the questions in this problem are

questions of conditional probability and can be solved using the conditional

probability formula.

If you are already familiar with the concept of conditional probability,

you might go ahead and try to solve it with the conditional probability

formula. You can reach a solution by taking a very similar approach to

the one we took in the second question. I encourage you to try to solve it

now using your preferred method. If you fail, I ask you to remain patient

until the seventh problem. You will be able to see everything clearly from

then on. After we’ve learned about conditional probability, I will ask you to

come back to this point in the book and solve the four questions in this

problem using the conditional probability concept.

Tennis Players Waiting for Retirement

Like many sports, tennis takes a toll on a player’s body. As tennis

players grow older, their performances start to decline and their probability

of deciding to retire increases. In the past, most tennis players chose

to retire as early as when they were 30 years old. However, nowadays

with the developments in technology, especially in sports medicine, they

are able to keep their body in shape to play tennis for longer. Thus, the

age of retirement has gone up. A survey company has conducted extensive
research on this topic to determine just how far the age of retirement
has risen. Thanks to this research, we now know the probability that
a tennis player is still playing professional tennis after a given age.

Here are the exact results of the research: A player who is exactly 31

years old has a 75% chance to still be playing tennis after 2 years, and 45%

after 3 years. A player who is exactly 32 years old has a 50% chance to

still be playing tennis after 2 years. Lastly, a player who is exactly 33

years old has a 20% chance to be playing tennis after 2 years.

Cesar and Moses are two professional tennis players. Cesar is exactly

32 years old and Moses is exactly 34 years old. What is the probability

that after one year, Moses is still playing tennis and Cesar has retired?

Solution

Let us first fix a notation to be used throughout the solution: let

P (n, m) be the probability that a tennis player who is currently n years
old is still playing tennis at age m. Then, the information

we possess can be represented as:

P (31, 33) = 3/4

P (31, 34) = 9/20

P (32, 34) = 1/2

P (33, 35) = 1/5

Let’s make sense of the information at hand before starting off. Consider

for instance the expression P (31, 33): the probability that a 31-year-old
player is still playing tennis after two years is 3/4. This is equivalent
to saying that a 31-year-old tennis player has a 1/4 chance to retire within
two years. In other words, the value 1 − P (n, m) corresponds to the
probability that an n-year-old player retires before the age of m.

Let’s start with the solution. We have some probability rates pertaining

to tennis players between the ages of 31 and 35. We would like to

obtain two probability rates that are not given. Firstly, we would like to

find the probability that a 32-year-old player is still playing when they are

33 years old, that is, P (32, 33). Next, we would like to find the probability

that a 34-year-old player is not playing tennis when they hit 35, that is,

the value 1 − P (34, 35).

For a 31-year-old player to still be playing tennis when they are 34,

they need to first be playing tennis when they are 33, then they need to

still be playing when they are 34. This means that P (31, 34) actually

corresponds exactly to P (31, 33) and P (33, 34) occurring consecutively.

From this perspective, we can clearly see that the value of P (31, 34) will

be lower than that of P (31, 33). This already makes sense intuitively, since

a player always has a chance to retire between the ages of 33 and 34. Now,

since we know the values of P (31, 34) and P (31, 33), we can obtain the

value to P (33, 34):

P (31, 34) = P (31, 33) · P (33, 34)

9/20 = 3/4 · P (33, 34)

P (33, 34) = 3/5

Therefore, the probability of a 33-year-old player to be playing tennis

at 34 is 3/5. We have found the value for P (33, 34). Recall that we are

already given the value of P (32, 34), that is, the chance that a 32-year-old
player is still playing tennis when they are 34. Thus, by following the

same logic as before, we can find the probability P (32, 33), that is, the

probability that a 32-year-old player is still playing at the age of 33.

P (32, 34) = P (32, 33) · P (33, 34)

1/2 = P (32, 33) · 3/5

P (32, 33) = 5/6

This means that a 32-year-old player has a 5/6 chance to be playing

tennis when they are 33 years old. If you recall, this is actually one

of the probabilities the question asked us to find. Now we’ll go after

P (34, 35), the other value we are asked to find. Just above, we have

already calculated the probability for a 33-year-old player to be playing

at age 34. Combining this with the value P (33, 35), which we are already

given, we can calculate the probability of a 34-year-old player to be playing

at 35.

P (33, 35) = P (33, 34) · P (34, 35)

1/5 = 3/5 · P (34, 35)

P (34, 35) = 1/3

Hence we see that a 34-year-old player has a 1/3 chance to be playing

tennis a year later. We can rephrase this as follows: a 34-year-old player

will retire in the next year with a 2/3 chance.

To sum up, we have found the values to P (32, 33), P (33, 34), P (34, 35)

with the information we were given in the problem. These expressions

equate, respectively, to 5/6, 3/5 and 1/3. If we subtract these values from

1, we obtain the probability for these tennis players to not be playing

tennis in the next year, that is, the probability that they retire during the

next year. In short, the probabilities that 32-, 33- and 34-year-old players

retire in the next year are 1/6 , 2/5 and 2/3. These results go along with

what we already know, since as a tennis player gets older, their probability

of retiring within the next year increases.

Additionally, we can calculate the value to P (31, 32) with what we

have, even though it is of no value to the problem at hand. However, if

you go ahead and calculate its value, you will find that it evaluates to

9/10.

Anyway, since we have already found all of the probabilities of worth

to the problem at hand, we only need to go over what is asked in the

problem. We are asked to find the probability that 32-year-old Cesar

retires within the next year, and the probability that 34-year-old Moses

still plays after one year. Since Cesar is 32 years old, his probability of

retiring in the following year is shown by the expression 1 − P (32, 33).

Similarly, since Moses is 34 years old, his probability of still playing after

one year is expressed by P (34, 35). Thus, to find the probability that these

two independent events both occur, we are to multiply their values:

P (Cesar retires within one year, Moses still plays) = (1 − P (32, 33)) · P (34, 35)

= 1/6 · 1/3

= 1/18

Hence, the probability that Moses is still playing and that Cesar has

retired a year later is 1/18.
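The whole chain of calculations fits in a few lines of exact arithmetic. The sketch below simply replays the solution’s steps; the variable names are mine:

```python
from fractions import Fraction

# survival probabilities given in the problem, P(n, m)
P_31_33 = Fraction(3, 4)
P_31_34 = Fraction(9, 20)
P_32_34 = Fraction(1, 2)
P_33_35 = Fraction(1, 5)

# chain rule used in the solution: P(n, m) = P(n, l) * P(l, m) for n < l < m
P_33_34 = P_31_34 / P_31_33           # 3/5
P_32_33 = P_32_34 / P_33_34           # 5/6
P_34_35 = P_33_35 / P_33_34           # 1/3

# Cesar (32) retires within a year AND Moses (34) keeps playing
answer = (1 - P_32_33) * P_34_35
print(answer)                         # 1/18
```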

Waiting For The Yellow-throated Sparrow

Bird watcher O’Neill goes to Cape May, New Jersey in order to watch

rare bird species. O’Neill, who usually takes a whole day to watch a single

bird species, aims to take a picture of the elusive yellow-throated sparrow

today. He is willing to spend up to 6 hours on the lookout for this bird.

The probability for the yellow-throated sparrow to show up within 6 hours

is 27.1%. What is the probability then, that O’Neill, who has already set

up his equipment and started waiting, sees this bird within the first two

hours of waiting?

Note: For a yellow-throated sparrow to show up means that at least

one individual of that species visits the watching area at least once. Also,

the probability of this bird to be seen is constant per unit time.

Solution

It does not seem likely that bird watcher O’Neill sees the yellow-throated
sparrow within two hours. But how likely exactly is it? Let

us calculate.

The information that the probability of seeing the yellow-throated

sparrow is constant per unit time is of great importance. In other words,

the probability of this bird to be inside the area watched by O’Neill at

any time is constant. If this weren’t the case, the given information would

not be sufficient to solve the problem.

Let P (n) denote the probability of seeing the bird within n hours. We

are given that P (6) = 27.1% and we are asked the value of P (2). In other

words, given that P (6) = 27.1%, we need to find the probability of seeing

the bird within two hours.

Knowing that the probability of seeing the bird is constant per unit

time, we can divide 6 hours into equal time periods in which the probabilities

of seeing the bird are equivalent. Since P (2) is asked in the problem,

we should find the value of P (6) in terms of P (2) .

The probability of not seeing the bird in 2 hours is 1 − P (2) . The

probability of not seeing the bird in the 2 hours following these first 2

hours is, again, 1 − P (2), and these disjoint periods are independent of
one another. Then, we can say that the probability of not seeing the bird
within 4 hours is (1 − P (2))^2. Similarly, the probability of not seeing
the bird in the 2 hours following the first 4 hours is 1 − P (2). Then
again, the probability of not seeing the bird within 6 hours is (1 − P (2))^3.
If we subtract this from 1, we will obtain the probability of seeing the
bird within 6 hours, that is, P (6).

P (6) = 1 − (1 − P (2))^3

The value of P (6) is given as 27.1% in the problem. Let us plug this
value in for P (6):

271/1000 = 1 − (1 − P (2))^3

(1 − P (2))^3 = 729/1000

After taking the cube root of both sides of the equation, we get

1 − P (2) = (729/1000)^(1/3) = 9/10

P (2) = 1/10

As a result, the probability of seeing the yellow-throated sparrow in

two hours is 10%. That is to say, the wildlife photographer O’Neill
will most probably have to wait longer than 2 hours. He definitely should

wait, though. I doubt he will get another chance to spot a yellow-throated

sparrow.
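The computation is easy to replay in code. The `p_within` helper below is my own generalization of the same idea to an arbitrary waiting window, valid under the problem’s assumption of a constant per-unit-time chance of appearing:

```python
p6 = 0.271                       # given: P(bird appears within 6 hours)
p2 = 1 - (1 - p6) ** (1 / 3)     # cube root: 6 hours = three 2-hour blocks
print(round(p2, 6))              # 0.1

def p_within(t, p6=0.271):
    # probability of at least one appearance within t hours
    return 1 - (1 - p6) ** (t / 6)
```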

There is actually another method to solve this problem. To be able to

understand this method, you must have a good command of two concepts:

probability mass function and Poisson distribution. If you are not familiar

with these concepts, you may go on and ignore this method. Or, you may

come back and read this part after doing a little research on these topics.

First, we should consider the number of appearances of the yellow-throated
sparrow as a random variable 1 . This random variable obeys a

Poisson distribution. But what is the Poisson distribution? You should
have a notion of the Binomial distribution to understand the Poisson
distribution, because the Poisson distribution is in fact a limiting case of
the Binomial distribution. Assume that we are to carry out n experiments and

each experiment has probability p of succeeding and 1 − p of failing. In
that case, the total number of successful experiments obeys a Binomial
distribution. If we denote the total number of successful experiments by

X, the probability mass function of X is as follows:

P (X = k) = [n!/((n − k)! k!)] · p^k · (1 − p)^(n−k)

Let us consider a very special case of Binomial distribution. Suppose

that the total number of experiments, n, tends to infinity whereas the

average total number of successful experiments, np, remains constant.

1 Random variables may assume different values depending on the outcome of a random process. In general,

random variables are denoted with capital letters whereas the specific values that random

variables take on are denoted with lower-case letters. For example, the expression

X = x means that the random variable X is equal to the specific value x. The function
that shows, for each possible value, the probability of the random variable
being equal to that value is called the "probability distribution".

Also, let λ denote np. Then, the probability of each experiment being
successful, p = λ/n, tends to zero. In that case,
the Binomial probability mass function should be expressed by taking the

following limit:

lim_{n→∞} P (X = k) = lim_{n→∞} [n!/((n − k)! k!)] · (λ/n)^k · (1 − λ/n)^(n−k)

Let’s factor the multiplicative constants out of the limit and take the

limits separately after dividing the expression in two.

lim_{n→∞} P (X = k) = (λ^k/k!) · lim_{n→∞} [n!/((n − k)! n^k)] · lim_{n→∞} (1 − λ/n)^(n−k)

If we take the limits of these two terms respectively,

lim_{n→∞} P (X = k) = (λ^k/k!) · 1 · e^(−λ)

While calculating e^(−λ), we used the equality lim_{n→∞} (1 − a/n)^n = e^(−a).
If we multiply the terms,

lim_{n→∞} P (X = k) = e^(−λ) · λ^k / k!

This probability mass function we have just obtained is known as the

probability mass function of Poisson distribution. Poisson distribution

gives us the probability distribution of the total number of successful experiments

for processes that contain numerous independent trials, each of

which is very unlikely to succeed. For example, in a hotel, the total number

of calls from clients in an hour obeys Poisson distribution. Because

there are many seconds 2 in an hour, the probability for the hotel phone to

ring in any of those seconds is very low. Furthermore, it can be assumed

that all of the calls are independent of each other. Therefore, it makes

sense to use the Poisson distribution to model this example.

In this problem, the probability of the yellow-throated sparrow being seen

in any second is extremely low. However, there are a lot of seconds in

six hours, and the probabilities for each second are totally independent of

each other. Hence, if we denote the total number of appearances as X,

then X will obey Poisson distribution. Let λ denote the average number

2 I used "seconds" to illustrate this point with a concrete example. Normally, the

time unit should be taken to be infinitesimally small.

of appearances of the bird in six hours. Then, the probability of the bird

not appearing in six hours can be calculated as follows:

P (X = 0) = e^(−λ) · λ^0 / 0! = e^(−λ)

If we subtract the probability of the bird not appearing from 1, we find

the probability for the bird to appear at least once in six hours on the

field:

P (the bird appears at least once in six hours) = 1 − P (X = 0) = 1 − e^(−λ)

We are already given that this probability is 27.1% . So, we can find

the value of λ in the equation.

1 − e^(−λ) = 0.271
e^(−λ) = 0.729
λ = − ln 0.729

On average, the total number of appearances of the yellow-throated

sparrow in six hours is − ln 0.729. Then, the average number of appearances
of the bird in two hours is (− ln 0.729)/3. If we denote the total number

of appearances in two hours as Y , Y obeys the Poisson distribution as

well. We want to find the probability for the bird to appear at least once

in two hours. Then, we should calculate the value of 1 − P (Y = 0) .

P (the sparrow appears at least once in two hours) = 1 − P (Y = 0)
= 1 − e^((ln 0.729)/3)
= 1 − (0.729)^(1/3)
= 1 − 0.9 = 0.1

As we can see, the probability for the bird to be seen in two hours
turns out again to be 1/10.
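The Poisson-process reading can also be checked by simulation: under it, the waiting time until the first appearance is exponentially distributed, so drawing waiting times directly reproduces the 10% figure (the rate below is the one implied by the given 27.1%):

```python
import math
import random

random.seed(1)
# appearance rate per hour implied by P(at least one appearance in 6 h) = 0.271
rate_per_hour = -math.log(0.729) / 6
trials = 200_000
# count runs where the first appearance falls inside the first two hours
seen_within_2h = sum(random.expovariate(rate_per_hour) < 2 for _ in range(trials))
print(seen_within_2h / trials)   # close to 0.1
```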

Tag Day in the Kindergarten

A new kindergarten is in town, using the recently in-trend Montessori

method of education. Even though it has just been opened, it has already

drawn the attention of many families with small children. As of a week
ago, 4 boys and a certain number of girls were enrolled. Yesterday, a

new kid was enrolled, and even made friends with the other kids.

Today is Tuesday, also known as tag day. Tag is the favorite game

among the kids in the kindergarten. The teacher has let the children out

into the schoolyard so that they can play tag. Then, one of the kids is
selected randomly to be "it", and that kid turns out to be a boy. What is
the probability, then, that the last kid to be enrolled in the kindergarten
is a boy?

Solution

If we were asked the probability that the new kid was a boy, or a girl

for that matter, without being given any additional information, then the

answer would clearly have been 1/2 for both. However, the problem is not

that simple. We are told that the randomly selected kid to be "it", is a

boy. Then, what we are actually asked can be expressed as follows: If a

random selection out of a group which contains the new kid results in a

boy being chosen, what is the probability that the new kid is a boy?

I would like you to first limit the answer to a reasonably small interval

using your intuition. Since the randomly chosen child is a boy, shouldn’t

it be more likely for the new kid to be a boy than a girl? Let’s extend

this idea as such to see it more clearly: Suppose we make not one, but

ten random selections out of the group and each time, the selection turns

out to be a boy. In such a case, doesn’t it make more sense for the newly
enrolled kid to also be a boy? That is, since the randomly chosen child

is a boy, we could intuitively say that the chance for the new kid to be a

boy is higher than 1/2.

Let’s take a look at another example to see why the probability of

being a boy is higher than the probability of being a girl. First assume

that there is a certain number of girls and no boys in the kindergarten. If

the selection for "it" results in a boy after the arrival of the new kid, then

it is certain that the new kid is a boy. In other words, the probability of

the new kid to be a boy has increased from 1/2 to 1.
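Before we finish the calculation, this intuition is easy to confirm by simulation. The sketch below takes k = 4 girls for concreteness (an assumption of mine; the choice of k will turn out not to matter):

```python
import random

random.seed(1)
k = 4                 # assumed number of girls enrolled before the new kid
it_is_boy = 0
new_kid_also_boy = 0
for _ in range(400_000):
    new_kid = random.choice("MF")
    boys = 4 + (new_kid == "M")          # 4 original boys, plus the new kid if male
    # "it" is chosen uniformly among all k + 5 children
    if random.randrange(k + 5) < boys:   # "it" turned out to be a boy
        it_is_boy += 1
        if new_kid == "M":
            new_kid_also_boy += 1
print(new_kid_also_boy / it_is_boy)      # noticeably above 1/2
```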

Now, having been convinced that knowing the randomly chosen kid is

a boy increases the probability of the new kid being a boy, let’s continue

on with the solution. We now expect to find a result higher than 1/2.

The event that the randomly chosen "it" is a boy can occur in two

different ways. Either the new kid is a girl and "it" is a boy, or the new

kid is a boy and "it" is also a boy. The probabilities for these two cases

are distinct, and their sum gives us the probability that "it" is a
boy:

P ("it" is a boy) = P (new kid is a boy ∩ "it" is a boy) + P (new kid is a girl ∩ "it" is a boy)

Before the new kid is enrolled, we know that there are exactly 4 boys

in the kindergarten. The number of girls is unknown. We could assume,

for a reasonable guess, that there are 4 girls as well. But instead, in order

to solve the problem in a generalized fashion, let’s suppose there are k

girls. Actually, this choice for k does not even matter, as you will see in a

second. So, the total number of children in the kindergarten will be k+5

after the arrival of the new kid, and the probability for "it" to be a boy

will be (Number of boys)/(k + 5). If we had no further information, the probability for

the new kid to be a boy or a girl would both be 1/2. Thus, the probability

of the new kid being a boy and "it" being a boy is expressed by:

P(new kid is a boy ∩ "it" is a boy) = (1/2) · 5/(k + 5)

and the probability of the new kid being a girl and "it" being a boy is

expressed by:

P(new kid is a girl ∩ "it" is a boy) = (1/2) · 4/(k + 5)

The sum of these two values gives us the probability of "it" being a

boy. Therefore,

P ("it" is a boy) = 1 2 ·

5

k + 5 + 1 2 · 4

k + 5

If we use the distributive property, this expression becomes

P ("it" is a boy) = 1 2 ·

9

k + 5

Thus we have found the probability that "it", chosen randomly among

all of the children in the kindergarten, is a boy. This probability is the

sum of the probabilities of two cases, these being the cases where the new

kid is a girl and where the new kid is a boy. We are already given the

information that "it" is a boy, and are asked for the probability of the

new kid to be a boy. Then, what we need to do is to find the proportion

of the cases where the new kid is a boy, among the cases where "it" is a

boy. In other words, we need to divide the probability that both "it" and the new kid are boys by the probability that "it" is a boy.

In doing so, since we are given that "it" is a boy, we will have found the

probability that the new kid is a boy. That is great news, because this is

the answer we are after!

P(new kid is a boy | "it" is a boy) = P(new kid is a boy ∩ "it" is a boy) / P("it" is a boy)

                                    = [(1/2) · 5/(k + 5)] / [(1/2) · 9/(k + 5)]

Notice that the above expression can be simplified: the k + 5 terms in the numerator and the denominator cancel out. Thus,

it turns out that the value of k, that is, the initial number of girls has no

effect on the result. If we had handwavily assumed that there are 4 girls

just because there were 4 boys, we would have actually ended up with the

same result. This is the remaining expression after simplification:

P(new kid is a boy | "it" is a boy) = 5/9

In words, this result says that given that the child randomly chosen out of all the children is a boy, there is a 5/9 probability that the

newly-enrolled child is a boy. Or, we could say that in 5 out of 9 cases

where the randomly selected kid is a boy, the new kid is expected to also

be a boy.
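This 5/9 result is easy to check with a quick Monte Carlo simulation. The sketch below is my own illustration, with an arbitrary choice of 10 girls: it repeatedly enrolls a new kid, picks a random "it", and looks only at the trials where "it" turned out to be a boy.

```python
import random

def estimate(k_girls=10, trials=200_000, seed=1):
    """Estimate P(new kid is a boy | "it" is a boy) by simulation."""
    rng = random.Random(seed)
    it_boy = 0                 # trials where the chosen "it" is a boy
    it_boy_and_new_boy = 0     # ... and the new kid is a boy as well
    for _ in range(trials):
        new_is_boy = rng.random() < 0.5   # new kid: boy or girl, 1/2 each
        boys = 4 + new_is_boy             # 4 boys before the new kid arrived
        total = 4 + k_girls + 1           # everyone, including the new kid
        if rng.random() < boys / total:   # the randomly chosen "it" is a boy
            it_boy += 1
            it_boy_and_new_boy += new_is_boy
    return it_boy_and_new_boy / it_boy

print(estimate())             # close to 5/9 ≈ 0.556
print(estimate(k_girls=40))   # changing k barely moves the estimate
```

Running it with different values of `k_girls` illustrates the cancellation we just saw: the initial number of girls has no effect on the answer.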

This problem is actually a problem of conditional probability and the

method we have used is nothing other than the conditional probability

formula. Let us then talk a little about conditional probability. The

conditional probability of event A given B indicates the probability of

event A occurring, given that event B has occurred. In our problem,

event B corresponds to "it" being a boy, and event A corresponds to the

new kid being a boy. In order to understand how conditional probability

is calculated, we first need to understand the following equation:

P (B)P (A|B) = P (A ∩ B)

Both A and B occurring is essentially equivalent to B occurring first, and then A occurring given that B has occurred. Therefore, the above equation holds. If we rearrange

the terms,

P(A|B) = P(A ∩ B) / P(B)

we get the renowned conditional probability formula. The conditional

probability of A given B is the probability of both A and B occurring

divided by the probability of B occurring. It is the proportion of cases

where A occurs as well, in all cases where B occurs.

In fact, we are already very familiar with conditional probability, because

we have come across many such problems. Here’s a very simple one:

“A die is rolled. If we know that the outcome is an even number, what

is the probability that it’s a prime number?” I bet many of us have come

across such questions at least once in our lives. In solving these problems,

we use the conditional probability formula, even if we don’t realize it. It’s

only when such a problem is worded in a slightly unfamiliar way that we

fail to realize it has to do with conditional probability.
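The die question can be settled by plain enumeration, which is all the conditional probability formula really does. A minimal sketch (the variable names are mine):

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}                 # a fair die
even = {n for n in outcomes if n % 2 == 0}    # event B: outcome is even
prime = {2, 3, 5}                             # event A: outcome is prime

def p(event):
    return Fraction(len(event), len(outcomes))

# P(A|B) = P(A ∩ B) / P(B)
p_prime_given_even = p(prime & even) / p(even)
print(p_prime_given_even)   # 1/3, since the only even prime is 2
```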

The problem we just solved is really no different than a typical conditional

probability problem. When we place the probabilities of the given

events properly into the conditional probability formula, we get exactly

the same equation as the one we just got. The only reason I chose not to

proceed directly by plugging into the formula was to give a better intuition

as to how the formula is derived in the first place.

A = new kid is a boy

B = "it" is a boy

P(new kid is a boy | "it" is a boy) = P(new kid is a boy ∩ "it" is a boy) / P("it" is a boy)

I would now like to present a related theorem, namely, Bayes’ theorem.

Bayes’ theorem is a theorem derived directly from the conditional probability

formula. The formula and the theorem are basically only different

perspectives for handling the same problem. In some cases, it is more

practical to use the formula, and in others using Bayes’ theorem saves us

a great deal of time. In order to state Bayes’ theorem, let us first rewrite

the conditional probability formula:

P(A|B) = P(A ∩ B) / P(B)

Now, let us only replace the term P (A ∩ B) with P (B|A) · P (A). That

is, surprisingly, all we need.

P(A|B) = P(B|A) · P(A) / P(B)

This is the statement of Bayes’ theorem. As you can see, Bayes’ theorem

and conditional probability are heavily intertwined. Now, let’s move

on to an example where using Bayes’ theorem comes in handy.

A study shows that only 10% of all happy people are rich. Does this

statistic show that wealth does not bring happiness?

It is completely baseless and wrong to draw such a conclusion based

on this information alone. Let’s demonstrate this using Bayes’ theorem.

Suppose that of all people, 5% can be named rich. Suppose also that of all people, 40% can be named happy. In other words, a person has a 4/10 chance to be happy and a 1/20 chance to be rich.

R = being rich
H = being happy

P(H|R) = P(R|H) · P(H) / P(R)

       = (10% · 40%) / 5%

       = 80%

This shows that 80% of all rich people are happy. Well, it turns out

in the end that money can in fact buy happiness. Perhaps we should stop

fooling ourselves.
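We can verify this arithmetic with exact fractions, using the priors assumed in the calculation above (5% rich, 40% happy — both invented for the example):

```python
from fractions import Fraction

p_rich = Fraction(5, 100)               # assumed prior P(R)
p_happy = Fraction(40, 100)             # assumed prior P(H)
p_rich_given_happy = Fraction(10, 100)  # the given statistic P(R|H)

# Bayes' theorem: P(H|R) = P(R|H) · P(H) / P(R)
p_happy_given_rich = p_rich_given_happy * p_happy / p_rich
print(p_happy_given_rich)   # 4/5, i.e. 80% of rich people are happy
```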

An additional, important use of Bayes’ theorem is to update the probability

of an event occurring when new information about the event is

present. This method is called Bayesian inference. There are, for example,

algorithms which use Bayesian inference to detect whether an e-mail is

spam or not. Bayesian inference is the basis of Bayesian statistics. We also

use a Bayesian way of thinking in our daily lives frequently, if sometimes

inadvertently. People tend to make a mental update of the probability

of an event when new information is revealed. Bayes’ theorem is only a

mathematical formalization of this natural process. In the next problem,

you will be asked to formalize a given problem mathematically, making use

of Bayes’ theorem. Well, of course you always have the option of using the

conditional probability formula, since the two are quasi-interchangeable.

Though, do try to use Bayes’ theorem to get the hang of it.

If you recall, we had said that we were going to solve the third question

of the fourth problem using a method presented in this problem. We were

supposed to use the concept of conditional probability. Let us remind ourselves of the question:

If we know that George has a daughter born during a full moon, what

is the probability that George also has a son? (A full moon occurs roughly

every 29.5 days.)

Even if you were not aware when you first encountered this question,

surely now you can instantly tell that it is a classic question of conditional

probability. We are given the condition that George, who has two children,

has a daughter born during a full moon, and we are asked the probability

that George has a son. Let’s build up the conditional probability formula.

A = George has a son

B = George has a daughter born during a full moon

P(A|B) = P(A ∩ B) / P(B)

Before calculating the values of P (A ∩ B) and P (B), let’s first fix

the notation that we will use. Let S denote a son, D denote a daughter

not born during a full moon, and D_F denote a daughter born during a

full moon. Let us demonstrate George’s two children using this notation,

where the first letter denotes the older child and the second letter denotes

the younger. For example, if George has an older son, and a younger

daughter born during a full moon, we would write SD_F.

Therefore, we can say that the value of P(A ∩ B) is the sum of the values of P(SD_F) and P(D_F S).

Similarly, P(B) is the sum of P(SD_F), P(D_F S), P(DD_F), P(D_F D) and P(D_F D_F).

Now that we have expressed P(A ∩ B) and P(B), we can plug them into our equation for P(A|B).

P(A|B) = [P(SD_F) + P(D_F S)] / [P(SD_F) + P(D_F S) + P(DD_F) + P(D_F D) + P(D_F D_F)]

A child has a 1/2 probability to be a boy or a girl. But what is the

probability that a child is born during a full moon? We are told that a

full moon occurs every 29.5 days, so the probability that a child is born during a full moon is 1/29.5, or 2/59. Then, the probability of George having a daughter born during a full moon (D_F) is (1/2) · (2/59), that is, 1/59. Similarly, the probability of George having a daughter not born during a full moon (D) is (1/2) · (57/59), that is, 57/118. Now that we have evaluated all of the expressions

that we need, we can calculate P (A|B).

P(A|B) = [(1/2) · (1/59) + (1/59) · (1/2)] / [(1/2) · (1/59) + (1/59) · (1/2) + (57/118) · (1/59) + (1/59) · (57/118) + (1/59) · (1/59)]

       = (1/2 + 1/2) / (1/2 + 1/2 + 57/118 + 57/118 + 1/59)

       = 1 / (234/118)

       = 59/117 ≈ 50.43%

In other words, the probability of George having a son, given that he has a daughter born during a full moon, is 59/117 ≈ 50.43%. That’s only slightly higher than 1/2. If you

recall, we actually found a similar result in the second question of the

same problem. In that question, the condition on Mike’s daughter was not a full-moon birth but a Tuesday birth. A child had a 1/7 chance to be born on a Tuesday, and the result of that question turned out to be

51.85%. In this question, our probability at hand is 2/59 and the result

is slightly lower than 51.85%: 50.43%. You can clearly see that these

two questions are identical in nature, the only difference being that the

probability of the given event is different: 1/7 in one, and 2/59 in the

other. As this probability gets smaller, the result approaches 1/2. We can

see this by taking the limit of the conditional probability formula as the

given probability goes to 0.
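We can make this limiting behavior concrete with a short computation. The helper below is my own generalization: it takes the probability q of the extra trait (being born during a full moon, being born on a Tuesday, ...) and evaluates the same five-case formula with exact fractions.

```python
from fractions import Fraction

def p_son_given_marked_daughter(q):
    """P(at least one son | at least one daughter with a trait of probability q)."""
    s = Fraction(1, 2)             # son
    d_f = Fraction(1, 2) * q       # daughter with the trait (D_F)
    d = Fraction(1, 2) * (1 - q)   # daughter without the trait (D)
    num = s * d_f + d_f * s                    # cases SD_F and D_F S
    den = num + d * d_f + d_f * d + d_f * d_f  # plus DD_F, D_F D, D_F D_F
    return num / den

print(p_son_given_marked_daughter(Fraction(2, 59)))  # 59/117 ≈ 50.43% (full moon)
print(p_son_given_marked_daughter(Fraction(1, 7)))   # 14/27 ≈ 51.85% (Tuesday)
print(float(p_son_given_marked_daughter(Fraction(1, 10**6))))  # approaches 1/2
```

The last line shows the limit claim numerically: as q shrinks, the conditional probability creeps down toward 1/2.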

We can actually solve the other three questions in the fourth problem

using the conditional probability formula. In fact, when we were solving

them before, we already used the logic behind conditional probability.

How exactly did we do it? We considered the ratio of desired cases to all

cases. This is exactly the essence of conditional probability. But now I

ask you to revisit these three and solve them using the conditional probability

formula. This will definitely help you better understand the fourth

problem.

In fact, even the first and second problems in the book are conditional

probability problems as well. Again, you can go ahead and try to solve

those using the conditional probability formula.

Additional Problem

Prince Boris has fallen in love with the lady he saw at the ball last

week. The lady who, he recalls, had blond hair and green eyes, dropped

one of her shoes at the stairs to the palace while leaving the ball at exactly

midnight. Boris picked up the shoe, which was exactly a size 43, and kept

it, hoping to see the lady again. He ordered all of his servants to go and

look for the lady. Since there are two million women around the same age

as the lady he saw in the prince’s domain, they had quite a task on their

hands.

Fortunately, they managed to find a young lady matching the description.

This young lady had blond hair and green eyes, and wore size 43

shoes. One of the servants, who happened to be a mathematician, had

calculated that the probability for a woman from this land to simultaneously have blond hair, green eyes and size 43 shoes was 1/10000. This led the servants to think that it was highly likely that they had

found the right lady. Why are they mistaken?

The Unreliable Test

Kate is a young lady who works out regularly and follows a healthy diet.

Furthermore, she never skips her yearly medical check-up. Unfortunately,

something unexpected has happened this year. After receiving her checkup

results, she found out that she has a fatal disease. However, she didn’t

panic immediately. After a little research, she found out that this disease has a prevalence of 0.1%. Moreover, after researching the credibility of the test, Kate found out that this test has an accurate

diagnosis rate of 99%. In other words, this test has a 1% probability to

diagnose a healthy person inaccurately as ill, or to diagnose a sick person

as healthy. What is the probability then, for Kate to actually be ill?

Should she ignore the test result and move on with her life? Or should

she start her treatment immediately so as not to lose any further time?

Since Kate is a cautious person, she decided to undergo the test once

more. However, the second test gave the same result as well. Now, what

is the probability that Kate is sick for real? Do you think that she is in

trouble?

Solution

To begin, we will attack the first question. We know that the probability that an individual is affected by this disease is 1/1000, whereas the probability that they are not is 999/1000. Since we have no distinguishing information about Kate, we may say that her initial probability of being ill is 1/1000, just like anyone else’s. Besides, if Kate is ill, the test has a 99/100 chance to diagnose her illness accurately, whereas if she is healthy, it has a 1/100 chance to give a false positive.

With this in hand, we can create a table of the 4 possible situations, along

with their probabilities:

                         Ill                    Healthy
Positive Test Result     (1/1000) · (99/100)    (999/1000) · (1/100)
Negative Test Result     (1/1000) · (1/100)     (999/1000) · (99/100)

Since the test Kate has taken gave a positive result, there are two

possible cases for Kate. The first case is that Kate is ill and the test gave

a correct, positive result. The second case is that Kate is healthy but

the test gave a positive result inaccurately. What is being asked in this

question is Kate’s probability to be ill. Therefore, we need to find the

probability of the first situation.

In that case, in order to find Kate’s chance of actually being ill, we

need to divide the combined probability of the test result being positive

and Kate being ill by the total probability, which is the probability of

a positive test result. Thus we will find the proportion of the number

of cases in which Kate is ill for real, among the two scenarios where the

outcome of the test is positive. This ratio gives us the probability for Kate

to really have this disease, given that her test result is positive. Since this

is a question of conditional probability, we can make use of Bayes’ theorem

to make our lives easier.

P = the event that the test result is positive
H = the event that Kate is ill
H′ = the event that Kate is not ill

P(H|P) = P(H) · P(P|H) / P(P)

       = P(H) · P(P|H) / [P(H) · P(P|H) + P(H′) · P(P|H′)]

       = [(1/1000) · (99/100)] / [(1/1000) · (99/100) + (999/1000) · (1/100)]

       = 99/1098 ≈ 0.09

As a result, the probability for Kate to be ill is only 9%, much less

scary than it first seemed.
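The same computation takes only a few lines of Python with exact fractions (a sketch; the variable names are mine):

```python
from fractions import Fraction

prevalence = Fraction(1, 1000)   # prior probability of being ill
accuracy = Fraction(99, 100)     # P(correct result), for ill and healthy alike

p_pos_and_ill = prevalence * accuracy                  # ill, true positive
p_pos_and_healthy = (1 - prevalence) * (1 - accuracy)  # healthy, false positive

# Bayes: P(ill | positive) = P(ill and positive) / P(positive)
posterior = p_pos_and_ill / (p_pos_and_ill + p_pos_and_healthy)
print(posterior, float(posterior))  # 99/1098 reduces to 11/122 ≈ 0.09
```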

Let’s move on to the second question now. Kate took the same test for

the second time and got yet another positive result. In this scenario, we

want to find out Kate’s probability of being ill. Before we start, it should

be clear that this time, Kate’s probability to be ill is higher than that in

the previous question. It’s safe to say that each additional positive test

result supports the hypothesis that Kate is actually ill.

The second test has an accurate diagnosis rate of 99%, just like the

first one. Let’s make another table of probabilities just as in the first

question. However, there is something that we need to be careful about

while creating the table: Kate’s probability to be ill. Before taking the

first test, Kate’s chance to be ill was 1/1000. Because we didn’t know

anything special about Kate at all and our only information was that this

disease has a prevalence of 0.1%, we assumed that Kate would be sick with

a probability of 1/1000. However, we now have additional information to

consider, namely, the result of the first test. The positive result of the

first test has updated Kate’s probability of being ill to 99/1098.

Since we already know all the probability values, let’s create the table

of the 4 situations that can occur, depending on the result of the second

test Kate has taken.

                         Ill                     Healthy
Positive Test Result     (99/1098) · (99/100)    (999/1098) · (1/100)
Negative Test Result     (99/1098) · (1/100)     (999/1098) · (99/100)

Since we know that the second test gave a positive result, we need to

consider two cases out of all four. We can figure out, again by using Bayes’ theorem, the probability that Kate is ill given that the test result is positive:

P = the event that the test result is positive
H = the event that Kate is ill
H′ = the event that Kate is not ill

P(H|P) = P(H) · P(P|H) / [P(H) · P(P|H) + P(H′) · P(P|H′)]

       = [(99/1098) · (99/100)] / [(99/1098) · (99/100) + (999/1098) · (1/100)]

       = 9801/10800 = 0.9075

As a result, after the second positive test result, Kate’s probability to

be ill has increased from 9% to 91%. Indeed, as we have said, this result

makes sense, because each positive test result should in fact increase the

chance that Kate actually has this disease. As a matter of fact, each

test Kate has undergone will change her probability to be ill, regardless

of whether it has given a positive result or not. We’ve mentioned before

how Bayes’ theorem enables us to update the probability of an event with

each extra information about the occurrence of that event. Recall that we

called this functionality of Bayes’ theorem Bayesian inference.
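This updating process is naturally expressed as a loop: each positive result re-runs the same Bayes computation with the previous posterior as the new prior. A sketch (the `update` helper is mine):

```python
from fractions import Fraction

ACCURACY = Fraction(99, 100)   # the test's accurate diagnosis rate

def update(prior):
    """Posterior probability of illness after one more positive result."""
    num = prior * ACCURACY
    return num / (num + (1 - prior) * (1 - ACCURACY))

p = Fraction(1, 1000)   # start from the prevalence
for test in range(1, 4):
    p = update(p)
    print(test, float(p))
# after 1 positive test:  ≈ 0.0902
# after 2 positive tests: ≈ 0.9075
# after 3 positive tests: ≈ 0.9990
```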

Flirting by Arranged Coincidences

Dwight and Lucy are two young people who have fallen in love with each other. They are yet to confess their feelings for each other. For now, they

are trying to get closer by pretending to run into each other by accident.

There is a park near where Dwight works. On some days, Dwight and

Lucy go to that park in hopes of running into each other. Today is one

of those days. Dwight and Lucy will be coming to the park after work,

some time between 5 PM and 6 PM and stay for 15 minutes, completely

unaware of each other’s arrivals. If they don’t happen to see each other

in this period, they will lose hope and leave. What is the probability that

they see each other in the park?

Solution

For convenience, let’s start by denoting 5 PM as minute 0, and 6 PM

as minute 60. Dwight and Lucy will be arriving at the park any time

between the 0th and the 60th minutes and will be staying for 15 minutes.

I will solve this problem with two methods. I will first use my least

favorite method:

Let us first calculate the probabilities of the cases where Dwight arrives

at the park earlier than Lucy and they see each other:

In any of the cases where Dwight arrives within the first 45 minutes,

Lucy has to arrive within the 15 minutes that follow Dwight’s arrival. The

probability of Dwight arriving in the first 45 minutes is 45/60, namely, 3/4.

Lucy’s probability of arriving in the 15 minutes following Dwight’s arrival

is 15/60, namely, 1/4. That is to say that in this case, the probability of

the two seeing each other is

(3/4) · (1/4) = 3/16

But then, there is of course the possibility of Dwight arriving in the

last 15 minutes and Lucy arriving afterwards. In these cases, Lucy has to

arrive at the park less than 15 minutes after Dwight’s arrival. The time

period in which Lucy has to arrive shrinks linearly from 15 minutes

all the way to 0 minutes, depending on when exactly Dwight has arrived.

To explain, if Dwight happens to arrive exactly on the 45th minute, then

Lucy can arrive at any of the remaining 15 minutes in order for the two

to meet. However, if Dwight arrives at the 60th minute mark, there are

0 minutes left for Lucy to arrive, that is to say, the two can certainly

not see each other. As Dwight’s moment of arrival goes up from the 45th

minute to the 60th minute, the remaining time in which Lucy has to arrive

decreases linearly from 15 minutes down to 0 minutes. We can say that on average, Lucy has 7.5 minutes to arrive at the park. In short, in the case where Dwight arrives within the final 15 minutes, Lucy must, on average, arrive within 7.5 minutes in order for the two to run into each other.

The probability of Dwight arriving within the final 15 minutes is 15/60,

that is, 1/4. Similarly, the probability of Lucy arriving within 7.5 minutes after Dwight’s arrival is 7.5/60, that is, 1/8. Therefore, in this case, the probability of

the two seeing each other is

(1/4) · (1/8) = 1/32

The sum of the two values we found is

3/16 + 1/32 = 7/32

This value is valid for the cases where Dwight arrives earlier than Lucy.

However, naturally Lucy can arrive earlier than Dwight. Since these two

cases are symmetric, we can simply multiply our value by two to find the

result:

2 · 7/32 = 7/16

The probability of Dwight and Lucy seeing each other at the park is

7/16.

Now, let’s move on to the much more charming second method.

The second method involves approaching the problem geometrically.

Let’s think of the coordinate plane. Let the random variable X represent

the moment Dwight arrives at the park, shown on the horizontal axis of the

coordinate plane; and let the random variable Y represent the time Lucy

arrives at the park, shown on the vertical axis of the coordinate plane.¹

The possible pairs of values where the two can arrive at the park will form

the area of a square on the coordinate plane, because the variables X and

Y both take values between 0 and 60. For the two to be able to see each

other, there can be at most a difference of 15 minutes between the two’s

arrival. This means that on our coordinate plane, Dwight and Lucy can

meet only at the points where the inequality |X − Y | ≤ 15 is satisfied.

Let’s build this coordinate plane and color in the area where the inequality

is satisfied.

¹ The random variables X and Y follow a continuous uniform distribution between the values 0 and 60.

The colored area inside the square shows all of the pairs of values where

the two can meet. Then, in order to calculate the probability for the two to see each other, we need to divide the colored area by the area of the square, which represents all of the possible cases:

P(Dwight and Lucy meeting) = [60 · 60 − (45·45/2 + 45·45/2)] / (60 · 60) = 1575/3600 = 7/16

In the end, we have found the same answer as in the first method, in

an easier, and I might add, classier way.
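As a sanity check, here is a Monte Carlo version of the problem (a sketch; the trial count and seed are arbitrary):

```python
import random

rng = random.Random(7)
trials = 200_000
meetings = sum(
    abs(rng.uniform(0, 60) - rng.uniform(0, 60)) <= 15  # arrivals within 15 min
    for _ in range(trials)
)
print(meetings / trials)   # close to 7/16 = 0.4375
```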

Additional Problem

Arnold, Emma and Mary will each come to the same park at some

point in time between 5 PM and 6 PM and wait for 15 minutes. If at any

point in this one hour period, all three of them happen to be at the park

simultaneously, they will play cards. But only two of them being there

does not suffice. In this case, the two friends will just wave at each other

and walk away when their 15 minutes is done. What is the probability for

the three to play cards?

Additional Problem

Dr. Holt is a dentist with three children. Even though Dr. Holt

insists that sweets are unhealthy for teeth, his children still continue to

love sweets. Further, they crack open their sweets with their teeth in order

to eat them. Dr. Holt loves his children too much to turn their requests

down. Every month, he buys them a stick of candy that’s 50 centimeters

long and breaks it into three pieces before handing it out to his children.

Today is one of the days where he buys sweets. However, Dr. Holt

drops the candy and the stick splits into three pieces at two random

points. Dr. Holt wants to share these three pieces between his children but

does not want one to over-eat and get a cavity. Because of this, he decides

that in case one of the pieces is longer than the two others combined, no

one will get sweets.

What is the probability that Dr. Holt gives out the sweets to his children?

Note that any point on the candy stick is equally likely to be a split

point. And of course, Dr. Holt remembers to wash the candy before

handing it out.

A. irvinus

Microbiologist Ms. Irvin has spent the last few months in her lab.

She has finally succeeded in her work. She has produced a new species

of bacteria with a synthetic genome. This new life form she created, she

calls A. irvinus. These bacteria live in a petri dish, in the form of a colony.

Loving these bacteria as if they are her own children, Ms. Irvin takes the

petri dish home. However, by misfortune, Ms. Irvin’s mother thinks that

the petri dish is a dirty dish and cleans it in the washbasin. The whole

colony is now out there, somewhere in nature.

The colony consists of 50 bacteria. In nature, an A. irvinus bacterium

has a 1/3 chance to perish and a 2/3 chance to reproduce by splitting in

two at the end of each minute. What is the probability that the A. irvinus

survives in nature, that is to say, does not become extinct?

As to what happened to Ms. Irvin: nobody believed that she had managed to produce a new species of synthetic bacteria, and she had to retire early.

Solution

Notice that there is no time limit in the question of whether the bacteria are eradicated or not. For a species to go extinct means that after a

certain length of time, no individual of that species remains alive. Then,

not going extinct requires the population of the colony to be greater than

zero at all times. So, in order to calculate the probability that the colony of 50 bacteria never goes extinct, we need to find a result that is independent

of time.

Let P(n) be the probability that a colony of n bacteria survives. Notice that

this probability is defined independent of time.

Let us start out by calculating the probability that a single bacterium

survives in nature, namely, P(1). At the end of one minute, this bacterium

will either split in two or die out. Meaning, with a 1/3 probability, it

will become extinct (its probability of surviving will be 0) and with a 2/3

probability, it will build a colony of two bacteria which has a survival

probability of P (2). We obtain the following equation:

P(1) = (1/3) · 0 + (2/3) · P(2)

We can think of the 2-bacteria colony we obtain when our initial bacterium

splits in two as two separate colonies of a single bacterium. The

probability for each of these two bacteria, by themselves, to survive is

P (1), and the probability that they go extinct is 1 − P (1). Then, the

probability that both of these two bacteria go extinct, independent of

each other, is (1 − P(1))^2. When we subtract this value from 1, we find the probability for the two bacteria not to go extinct. That is, the probability that the colony of 2 bacteria lives on is 1 − (1 − P(1))^2. We can

now plug the value of P (2) into our equation.

P(1) = (1/3) · 0 + (2/3) · (1 − (1 − P(1))^2)

If we simplify this second-degree equation, we get the following:

P(1) · (2P(1) − 1) = 0

We observe from this equation that the possible values of P (1) are

0 and 1/2. The probability that a single bacterium survives in nature

obviously cannot be equal to 0.¹ Therefore, the value of P(1) will be

1/2. That is, the probability that a single A. irvinus bacterium survives

in nature is 1/2.
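The structure of this argument can also be checked numerically. The extinction probability q = 1 − P(1) of a line started by a single bacterium satisfies q = 1/3 + (2/3)q^2 (the bacterium dies now, or it splits and both daughter lines eventually die out), and iterating this map from q = 0 converges to the smallest non-negative root (a sketch):

```python
# Extinction probability of a line started by one bacterium:
# q = 1/3 + (2/3) * q**2  -- die immediately, or both daughter lines die out.
q = 0.0
for _ in range(200):           # iterate the map; it converges to the
    q = 1/3 + (2/3) * q * q    # smallest non-negative fixed point
print(q)   # ≈ 0.5, hence P(1) = 1 - q = 1/2
```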

However, the problem asks us the survival probability of a colony of

50 bacteria. This means that we need to find the value of P (50). Let us

then think of the 50-bacteria colony as 50 colonies of single bacteria. As

we have previously shown, each of these single bacteria has a 1/2 chance

to survive. Then the probability for each of them to go extinct is also 1/2.

Hence, the probability for all of these bacteria to go extinct is (1/2)^50. This means that the probability for them not to go extinct is 1 − (1/2)^50. That is,

P(50) = 1 − (1 − P(1))^50 = 1 − (1/2)^50

Then, the probability for a colony consisting of 50 bacteria to survive

is equal to 1 − (1/2)^50. If we were to write this number out as a decimal, it

is equal to 0.99999999999999911182158029987477. That means that a 50-

bacteria colony is highly likely to survive out in nature. This is intuitively

clear, as 50 bacteria, each of which has a 2/3 chance per minute to split and reproduce, are highly unlikely to all die out.

Ms. Irvin might have missed the opportunity of making a groundbreaking

achievement, but her creation lives on. At least she has made

her mark on the world. Although it is doubtful whether the A. irvinus

will remember their creator.

Finally, I would like to generalize this problem. Notice that the magic

number 50 actually has nothing special about it. If the number of bacteria

in the colony were 150 instead of 50, this wouldn’t have changed anything

in the solution of the problem. The survival probability of a colony of n

bacteria can be found by the following formula:

P(n) = 1 − (1 − P(1))^n

¹ You may be asking yourself why the probability for a single bacterium to survive cannot be 0. We will better understand why in the solutions of the following two problems. But for now, you may answer this question indirectly by showing why the probability for a single bacterium to go extinct cannot be 1.

Since in our problem P(1) = 1/2,

P(n) = 1 − (1/2)^n

Nancy the Caretta

Today is a big day for Nancy who is a sea turtle belonging to the

species Caretta caretta. She succeeded in hatching after 2 months of

incubation. Nancy can’t wait to reach the sea just like any other baby

sea turtle. However, due to light pollution, Nancy has difficulty in finding

her way to the sea. Therefore, at each step she goes towards the sea with

a probability of 1/4, whereas she goes in the opposite direction with a

probability of 3/4. Knowing that Nancy started her walk one step away

from the seashore, what is her probability to eventually make it to the

sea?

Solution

Let’s denote Nancy’s position on her way to the sea with the variable x

on a half-line. Assume that each one of Nancy’s steps changes her position

by one unit. Let x = 0 designate the endpoint of the half-line, that is, the

position of the seashore, and x = 1 Nancy’s initial position.

Nancy decreases her x-value with each step she takes towards the sea

whereas she increases her position with every step in the opposite direction.

We are asked Nancy’s probability of eventually reaching the sea, or

in other words, Nancy’s probability of ever being at the position x = 0.

Notice first that the total number of steps Nancy must take to reach

the seashore is an odd number. Nancy’s initial distance from the sea is 1 unit,

which is an odd number. The total number of steps that Nancy takes

must be an odd number for her displacement to also be an odd number.

This means that she can reach the sea only if the total number steps that

she takes is an odd number such as 1, 3, 5, 7, . . . .

Nancy’s probability of reaching the sea with her first step is 1/4. Her

probability of reaching the sea in three steps is 3/64, and the order of steps

that she needs to follow is 1 → 2 → 1 → 0. Similarly, she can reach the

sea in five steps with a total probability of 9/512 in the following ways:

1 → 2 → 1 → 2 → 1 → 0 or 1 → 2 → 3 → 2 → 1 → 0 . To take it

further, her probability of reaching the sea in seven steps is 135/16384. In

fact, Nancy has a certain probability to reach the seashore for every odd

number. We can find Nancy’s probability of reaching the sea if we sum all

of these probabilities. For each odd number of the form 2n + 1 (n ∈ N), there are C_n = (1/(n+1)) · C(2n, n) different orderings for Nancy's steps, where C(2n, n) denotes the binomial coefficient "2n choose n". These numbers C_n are called the Catalan numbers. I will soon explain why the number of possible orderings is equal to these numbers. Each such ordering occurs with probability (3/4)^n · (1/4)^(n+1), because n + 1 of the 2n + 1 steps need to be taken towards the sea whereas n steps need to be taken in the opposite direction. Then, for each odd number of the form 2n + 1, Nancy's probability of reaching the sea in exactly 2n + 1 steps is

(1/(n+1)) · C(2n, n) · (3/4)^n · (1/4)^(n+1)

If we sum the probabilities contributed by each odd number, we find Nancy's probability of reaching the sea as the following infinite sum:

∑_{n=0}^{∞} (1/(n+1)) · C(2n, n) · (3/4)^n · (1/4)^(n+1)

To be able to evaluate this sum, we need to use the generating function of the Catalan numbers C_n = (1/(n+1)) · C(2n, n). Generating functions have the following form:

f(x) = ∑_{n=0}^{∞} a_n x^n

If you have not seen a generating function before, you should probably do

some research on the topic. The generating function of Catalan numbers,

c (x), has the following value:

c(x) = ∑_{n=0}^{∞} (1/(n+1)) · C(2n, n) · x^n = (1 − √(1 − 4x)) / (2x)

Then, Nancy's probability of reaching the sea can be written as follows:

P(Nancy reaches the sea) = (1/4) · c(3/16)

For the rest, we only need to calculate the value of c(3/16):

c(3/16) = (1 − √(1 − 4 · (3/16))) / (2 · (3/16)) = (1 − 1/2) / (3/8) = 4/3

As a result, the probability of Nancy ever reaching the sea is (1/4) · (4/3) = 1/3.
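Before moving on, the walk can also be checked directly by simulation. The sketch below is a minimal Monte Carlo estimate (the function name and the cutoff value are my own choices, not from the text); walks that drift far from the shore are counted as never returning, since the chance of coming back from distance 60 is (1/3)^60, which is negligible.

```python
import random

def reaches_sea(q=0.25, start=1, cutoff=60):
    """Simulate one of Nancy's walks: with probability q she steps toward
    the sea (x decreases by 1), otherwise away from it. Returns True if
    she reaches x = 0; walks drifting past `cutoff` are treated as lost,
    since the return probability from there is astronomically small."""
    x = start
    while 0 < x < cutoff:
        x += -1 if random.random() < q else 1
    return x == 0

random.seed(7)
trials = 20_000
estimate = sum(reaches_sea() for _ in range(trials)) / trials
print(estimate)  # ≈ 1/3
```

The estimate lands within a percent or so of 1/3, in line with the generating-function calculation above.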

Let us explain why there are (1/(n+1)) · C(2n, n) different step orderings in which Nancy reaches the sea. We have already said that these numbers are called the Catalan numbers, which are used prominently in the area of combinatorics. There are many ways in which we can define these

numbers. I prefer to define them as follows: In how many different ways

can n pairs of parentheses be placed correctly, assuming that each pair

consists of one left-hand and one right-hand bracket? To illustrate, (())()

is a correct order for n = 3 whereas ())()( is incorrect. To put it differently,

the total number of right-hand parentheses must never exceed the total

number of left-hand parentheses while reading from left to right. There are (1/(n+1)) · C(2n, n) different ways, namely the Catalan number, to place these n pairs of parentheses in correct order. I will not prove this statement here, because the proof is easy to find in various resources. In this problem,

Nancy’s displacement in the last step must be from x = 1 to x = 0 so that

she can reach the sea in 2n + 1 total steps. Nancy may follow any step

order unless she is at the position x = 0 in the 2n steps before the last

one. In other words, the total number of Nancy’s steps towards the sea

cannot be more than that of in the opposite direction before her last step.

In case you noticed the rule of the ordering, it is similar to the one we

applied to n pairs of parentheses. As a matter of fact, they are exactly the

same. Left-hand brackets correspond to going in the opposite direction of

the sea whereas right-hand brackets correspond to going towards the sea.

In other words, in this problem, there are (1/(n+1)) · C(2n, n) different orderings which enable Nancy to reach the sea in 2n + 1 steps.
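For small n, this count can be verified by brute force, enumerating every sequence of 2n + 1 steps and keeping those that first touch x = 0 on the final step. A sketch (the helper name is mine):

```python
from itertools import product
from math import comb

def count_orderings(n):
    """Count step sequences of length 2n + 1 that take Nancy from x = 1
    to x = 0, touching x = 0 only on the very last step."""
    count = 0
    for steps in product((-1, +1), repeat=2 * n + 1):
        x, valid = 1, True
        for i, s in enumerate(steps):
            x += s
            if x == 0 and i < 2 * n:  # reached the sea too early
                valid = False
                break
        if valid and x == 0:
            count += 1
    return count

for n in range(5):
    catalan = comb(2 * n, n) // (n + 1)
    print(n, count_orderings(n), catalan)  # the two counts agree
```

For n = 0, 1, 2, 3, 4 the brute-force count matches the Catalan numbers 1, 1, 2, 5, 14.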

I went over this method just to introduce you to Catalan numbers and

to give you a new way of looking at things. If it, unfortunately, confused

you, you may completely ignore this method. As you know by now, this

book is dedicated to solving each problem in the simplest and most efficient

way. We will now move on to a method which is much more elegant and

simpler to grasp.

Let us designate Nancy's probability to reach the sea starting from the position x = n as P_n. Then, Nancy's probability to reach the sea starting from the position x = 1 is P_1. Nancy has a 1/4 probability to take her first step towards the sea and reach it. With 3/4 probability she takes her first step in the opposite direction of the sea and arrives at the position x = 2. If she arrives at the position x = 2, her probability of reaching the sea will now become P_2. Then what we have is the following equation involving P_1:

P_1 = 1/4 + (3/4) · P_2

To be able to find P 1 , we must express P 2 in terms of P 1 . Therefore,

we need to define P 1 in a different way. According to our first definition,

P 1 is Nancy’s probability to start from the position x = 1 and arrive at the

position x = 0 . Isn’t P 1 equal, then, to her probability to start from any

position and arrive at the position one unit to the left of the initial one?

In other words, it is equal to her probability to start from the position

x = n and end up at the position x = n − 1 .

Consequently, the x = 1 and x = 0 positions in the definition of P 1 are

not any different than any pair of positions adjacent to each other. So,

the probability would remain the same even if the definition were from the

position x = 5 to the position x = 4 . In that case, if we want to define

P 1 in a different way, we can describe P 1 as Nancy’s probability to go

from her initial position to the position that is one unit closer to the sea.

Similarly, we can describe P 2 as Nancy’s probability to arrive at a position

that is two units closer to the sea than her initial position. As a result, P n

stands for Nancy’s chance to go from her initial position to the final one

that is n units closer to the sea. Hence, Nancy’s probability to go from the

position x = 2 to the position x = 1 must be equal to P 1 . Therefore, her

probability to reach the sea will be equal to her initial probability, which

is P_1. To put it differently, P_2 is equal to the probability of arriving twice in a row at a position which is one unit closer to the sea, namely (P_1)^2. If we substitute (P_1)^2 for P_2 in the equation which is used to express P_1,

P_1 = 1/4 + (3/4) · (P_1)^2

0 = 3(P_1)^2 − 4P_1 + 1

0 = (3P_1 − 1)(P_1 − 1)

there are two possible values of P_1:

P_1 = 1 or P_1 = 1/3

You may have intuitively noticed that 1/3 is the correct answer among

these two options. However, there is an alternative to plain intuition in

finding this result. Nancy the Caretta has a certain possibility to reach

the sea for each odd number of total steps. We find the probability that

Nancy reaches the sea after adding up each of these possibilities. If we

add up all of these probabilities one by one, we will see that the total sum

converges to 1 or 1/3. The correct answer is the one to which the sum

converges. Let’s sum Nancy’s probability to reach the sea for a couple of

odd numbers:

P_1 = P(Reaching in 1 step) + P(Reaching in 3 steps) + P(Reaching in 5 steps) + . . .

= 0.25 + P (Reaching in 3 steps) + P (Reaching in 5 steps) + . . .

= 0.296875 + P (Reaching in 5 steps) + P (Reaching in 7 steps) + . . .

= 0.314453125 + P (Reaching in 7 steps) + P (Reaching in 9 steps) + . . .

= 0.32269287109375 + P (Reaching in 9 steps) + P (Reaching in 11 steps) + . . .

As can be seen, the probability converges to 1/3, not 1. Then, we can

say that P 1 is equal to 1/3.

P_1 = 1/3
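The partial sums listed above can be reproduced exactly with rational arithmetic. A short sketch using Python's Fraction type to avoid rounding:

```python
from fractions import Fraction
from math import comb

total = Fraction(0)
for n in range(40):
    # probability of reaching the sea in exactly 2n + 1 steps:
    # Catalan(n) orderings, each with probability (3/4)^n * (1/4)^(n+1)
    catalan = comb(2 * n, n) // (n + 1)
    total += catalan * Fraction(3, 4) ** n * Fraction(1, 4) ** (n + 1)
    if n <= 3:
        print(n, float(total))
print(float(total))  # the 40-term partial sum is already very close to 1/3
```

The first four printed values reproduce the decimal partial sums listed above (0.25, 0.296875, 0.314453125, 0.32269287109375).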

There is a more reliable method to understand which answer is correct in such dilemmas. Let's assume that p is Nancy's probability to take steps in the opposite direction of the sea whereas q is her probability to move towards the sea. We know that p + q = 1, since exactly one of these two events occurs at each step. If we write down and solve the equations the same way as we did before, we will again end up with two different answers. You may try to solve the equations for p and q yourself. The two solutions are 1 and q/p. Since q = 1/4 in this problem, one of the solutions is 1 while the other one is 1/3.

Which one of 1 and q/p is the correct answer to the problem? Notice that q/p is greater than 1 when q > 1/2. Knowing that the probability of an event cannot be greater than 1, we should pick 1 as the correct answer instead of q/p if q > 1/2. If q = 1/2, both expressions give the same answer. In other words, Nancy's probability to reach the sea is 1 if q = 1/2. Which one of these two values should be used when q < 1/2? First off, we know that Nancy naturally has 0 probability to reach the

sea if q = 0 . Secondly, Nancy has a probability of 1 to reach the sea

when q = 1/2 . We may infer from these facts that Nancy’s probability of

reaching the sea will increase gradually from 0 to 1 as we increase the value

of q from 0 to 1/2. The same logic applies to q/p. That is to say, the value of q/p will gradually increase from 0 to 1 as the value of q increases from 0 to 1/2. Therefore, we must choose q/p as the correct answer if q < 1/2.

As a result, the function which gives the probability that Nancy reaches

the sea, depending on q, will be a continuous function. If we chose 1 as

the correct answer for any q in the range 0 ≤ q < 1/2 , this probability

function would be discontinuous and hence inaccurate. 1 To sum up, we will take 1 as the answer if q ≥ 1/2 and q/p as the answer if q < 1/2. We will prove in the next question why we must make this assumption.

Now, let us plot Nancy's probability to reach the sea, P_1, as a function of q:

[Figure: P_1 versus q. The curve follows q/p, rising from 0 at q = 0 to 1 at q = 0.5, and stays at 1 for 0.5 ≤ q ≤ 1.]

In this problem, q is equal to 1/4. Since this value is smaller than 1/2, we will use q/p in order to find the answer. If we substitute 1/4 for q and 3/4 for p, we obtain 1/3, the same result as we had found earlier.

As a result, Nancy the Caretta, who comes out of her egg one unit away from the seashore, has a 1/3 probability to reach the sea. Knowing

that her probability to take a step towards the sea is 1/4, the value 1/3

makes sense. However, since Caretta caretta is an endangered species,

1/3 is a lower probability than desired. Therefore, we need to avoid light

pollution to help little sea turtles like Nancy find their way to the sea.

Decreasing light pollution increases the value of q, so that sea turtles can

reach the sea safe and sound.

Returning to mathematics, we can calculate Nancy’s probability of

reaching the sea for different starting positions.

We found out that P 1 is equal to the probability that Nancy arrives

at a position one unit closer to the sea. If Nancy hatches n units away

1 While making this deduction, we assumed that the function giving Nancy’s probability

to reach the sea must be continuous.

from the sea, she needs to get n steps closer to the sea in order to reach it. Then, the event measured by P_1 needs to occur n times consecutively. Therefore, Nancy, who hatches n steps away from the sea, has the following probability to reach the sea:

P_n = (P_1)^n

If Nancy hatches at the position x = n, her probability to reach the sea can be calculated with the formula below:

P_n = (1/3)^n
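The first-step equations P_i = (1/4)·P_{i−1} + (3/4)·P_{i+1} can also be solved numerically on a truncated line, with absorption at x = 0 and a far boundary standing in for "infinitely far away". A sketch (the cutoff N and the sweep count are arbitrary choices of mine):

```python
# Fixed-point iteration for P_i = (1/4)*P_{i-1} + (3/4)*P_{i+1},
# with P_0 = 1 (the sea) and P_N = 0 at a hypothetical far cutoff N.
N = 200
P = [0.0] * (N + 1)
P[0] = 1.0
for _ in range(5_000):           # plenty of sweeps to converge
    for i in range(1, N):
        P[i] = 0.25 * P[i - 1] + 0.75 * P[i + 1]

for n in (1, 2, 3):
    print(n, round(P[n], 6), round((1 / 3) ** n, 6))  # pairs agree
```

The numerical values match (1/3)^n to six decimals, confirming the formula above.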

Recall the previous problem in the book. That problem is highly related

to the one we have just solved. In the previous one, we had tried to

find the probability of a species of bacteria to survive in nature. We ended

up with two possible values for this probability: 0 and 1/2. Nonetheless,

we chose 1/2 as the correct answer. Why was 1/2 the correct answer

whereas 0 was incorrect? This problem has a strong resemblance to the

current problem: Suppose that Nancy takes a step towards the sea with a

1/3 probability whereas she moves in the opposite direction with a probability

of 2/3 (the values in the previous problem). In that case, what is

Nancy's probability to never reach the sea? Indeed, this problem is identical to the previous one. The scenario where the bacteria survive corresponds to the scenario where Nancy never reaches the sea. In this problem, q = 1/3, since Nancy has a 1/3 probability to walk towards the sea. Therefore, we find her probability to reach the sea as 1/2 by using the expression q/p. Consequently, she has a 1 − 1/2 = 1/2 probability to never reach the sea. Therefore, the probability of a colony of bacteria to survive in the previous question must also be taken as 1/2, and not 0.

Additional Problem

Nancy has an Anatolian shepherd friend named Yolanda, who lives in

a big house with a yard. Yolanda is a very outgoing dog and she always

wants to go to neighbors’ yards to meet other dogs. The collar around

her neck is the biggest obstacle between her and her dreams. There are rows of houses to the left and right of Yolanda's house, which represent a

perfect opportunity to make many new friends. One day, Yolanda is fed

up with staying at home and she rips off her collar in order to go on a

quest for new friends. She is now free and can go wherever she wants!

She is equally likely to go towards the left or the right. After each step,

she moves on to another neighbor’s house again with a fifty-fifty chance of

going either to the left or to the right. She is so freedom-drunk that she

might revisit the houses which she has already been to without realizing.

At each step, she randomly chooses a direction to move to.

Yolanda’s owner soon realizes that his pup is on the loose. So, he

decides to wait at home in case she comes back, so that he can tie her

back up.

What is the probability then, that Yolanda gets to make a new friend?

Sorry, we can’t actually calculate that with the information given. The

actual question is this:

What is the probability that Yolanda stumbles back home and loses

her freedom once again?

Additional Problem

Another friend of Nancy, Max the Kitten is busy chasing a butterfly

right now. At the end of each minute, the butterfly has a 3/10 probability

to move one meter to the right, a 3/10 probability to move one meter to

the left, and a 4/10 probability to stay where she is.

Max follows this butterfly wherever she flies. Nonetheless, Max’s owner

doesn’t allow him to go far away. What is the probability that at the end

of 10 minutes, Max is at most 5 meters away from where he started?

Marble Gamble

Chuck and Harry love playing marbles with the kids in their neighborhood.

There are several games that they mainly play. Harry’s favorite is

called Archboard while Chuck’s favorite is Simple Ringer. In each of these

games, the winner takes one of their opponent’s marbles and adds it to

theirs. Chuck and Harry have won many of these games and increased

their number of marbles. But they are too ambitious and want to add

even more to their tallies. Since they both dream of being marble-rich

fast, they decided to play a game against each other. This game involves

repeated coin tosses: each time they get a tail, Chuck will take one marble

from Harry and each time they get a head, Harry will take one from

Chuck. This game will go on until one of the two runs out of marbles. At

the beginning of the game, Chuck holds 400 marbles and Harry holds 300.

So, the winner of this game will be the owner of 700 marbles and become

the ultimate marble-kingpin, and the loser will lose everything and be left

to question their life choices.

What is the probability that Chuck wins the game?

If you manage to solve the problem, you can try to generalize the

solution for a further challenge, by assuming that Chuck has n marbles

and Harry has m.

Solution

Initially, Chuck has 400 marbles and Harry has 300. If at the end of a

series of coin tosses, Chuck reaches 700 marbles and Harry has none left,

then Chuck wins. On the contrary, if Chuck ever runs out of marbles,

then he loses and Harry wins. Then at any point during this game, both

will have a number of marbles between 0 and 700.

Let P n denote the probability for a player with n marbles to win at any

point during the game. Since Chuck has 400 marbles initially, his chance

to win will be P 400 . Also, because Chuck has an equal 1/2 chance to win or

lose each coin toss, his probability of having lost t marbles before winning

t marbles is 1/2. Similarly, his probability of having won t marbles before

losing t marbles is also 1/2. You might be confused at this point, but

using an actual number instead of a variable might help you understand.

Suppose t has value 300. Chuck, who starts out having 400 marbles, has

a 1/2 probability to have gone down to 100 marbles before ending up

with 700 marbles. Likewise, he has a 1/2 chance to have gone up to 700

marbles before ending up with 100. These two probabilities are equal. In

fact, these two scenarios are completely symmetrical, in both 300 marbles

will have been won or lost. To put it simply for one last time, Chuck

has a 1/2 probability to reach 700 marbles (and win) first, and yet again

a 1/2 probability to go down to 100 marbles. By our notation, Chuck’s

probability of winning the game when he has 100 marbles is P 100 , and his

probability of winning the game when he has 700 marbles is P 700 . With

these in hand, we can show the probability for Chuck to win the game in

an equation:

P_400 = (1/2) · P_100 + (1/2) · P_700

P 700 is the probability that Chuck wins the game when he has 700 marbles,

which is clearly equal to 1.

P_400 = (1/2) · P_100 + 1/2

Let's find the value of P_100 in a similar fashion. Chuck has a 1/2

probability to have gone down to 0 marbles before reaching 200. Similarly,

his probability of having hit 200 marbles before going down to 0 is also

1/2. Additionally, we know that having 0 marbles means losing the game,

so Chuck's probability of winning the game when he has 0 marbles is 0.

P_100 = (1/2) · P_200 + (1/2) · P_0

P_100 = (1/2) · P_200

We can get a similar equation for P 200 . Starting with 200 marbles,

Chuck has a 1/2 probability to go down to 0 marbles and 1/2 probability

to go up to 400.

P_200 = (1/2) · P_400 + (1/2) · P_0

P_200 = (1/2) · P_400

Writing down the three equations we’ve obtained:

P_400 = (1/2) · P_100 + 1/2

P_100 = (1/2) · P_200

P_200 = (1/2) · P_400

In this system of equations, we can write P_100 in terms of P_200, and P_200 can be written in terms of P_400. This means that we can write P_100 in terms of P_400 in order to obtain a first-degree equation with a single unknown.

P_400 = (1/8) · P_400 + 1/2

P_400 = 4/7

Chuck has 400 marbles at the start of the game, so the value we are

looking for is P 400 , which is equal to 4/7. Chuck has a 4/7 probability to

win the game.
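The 4/7 can be sanity-checked with a quick simulation. A full 400-versus-300 game lasts about 120,000 tosses on average, so the sketch below plays a scaled-down 4-versus-3 game instead; with a fair coin the winning chance depends only on each player's share of the marbles, so the target value is the same 4/7. (The function name and trial count are my own choices.)

```python
import random

def chuck_wins(chuck, harry):
    """Play one fair-coin marble game to the very end; True if Chuck wins."""
    total = chuck + harry
    while 0 < chuck < total:
        chuck += 1 if random.random() < 0.5 else -1  # one coin toss
    return chuck == total

random.seed(3)
trials = 100_000
wins = sum(chuck_wins(4, 3) for _ in range(trials))
print(wins / trials)  # ≈ 4/7 ≈ 0.5714
```

The estimate comes out within a fraction of a percent of 4/7.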

We have found the result, but what would we have done differently if

the given numbers were different? Or what if the coin used in the game

was biased? Can we find a generalized solution to this problem that takes

into account the numbers of marbles and the probability of winning each

round? That is exactly what we will be doing now. We will use a simple

approach to reach a general formula. You will later see that this formula

is a very practical one that is frequently encountered. I’m very confident

that what you are about to learn will impress you, assuming you like

mathematics.

The most popular variant of this problem is known as the gambler's ruin

problem. In this problem, two gamblers who start out with an equal

amount of money in hand play consecutive games where the winner takes

one dollar from the loser. The game ends when one side runs out of

money. Gambler’s ruin, and all of its variations can be examined under

the concept of random walk. But what is a random walk? In simple

terms, a random walk is a mathematical object which is used to describe

a process consisting of a succession of random steps. In fact, the previous

two questions were also concerned with random walks. You will soon see

that the formula we will be constructing in this problem works perfectly

in the solution of the two before. Let’s get going.

Let Chuck have n and Harry have m marbles. Also, to account for

the possibility of a biased coin, let p denote the probability that Chuck

wins a coin toss, instead of being restricted to 1/2. Similarly, let q denote

the probability that Harry wins a coin toss. Since exactly one of them wins each toss, the sum of p and q is equal to 1. We have sufficiently

generalized the question, let’s now move on to finding Chuck’s probability

to win.

Starting out with n marbles, Chuck will lose the game if his number

of marbles goes down to 0 and he will win if he reaches n + m marbles.

Let n + m = k for convenience. Then the number of marbles that Chuck

has all throughout the game will always be in the range from 0 to k. Let

P i denote the probability of Chuck winning the game if he has i marbles,

at any point during the game (0 ≤ i ≤ k). With this notation, Chuck’s

probability of winning the game at the start is P n . Notice also that P 0 is

equal to 0 and P k to 1.

Chuck, who has i marbles at some point during the game, will have a

p probability to win and a q probability to lose the next coin toss. If he

wins the coin toss, he will have i + 1 marbles and his probability of winning the game will become P_{i+1}. Similarly, if he loses the coin toss, he will have i − 1 marbles and his probability of winning the game will become P_{i−1}. Then, we can express Chuck's probability of winning the game when he has i marbles, namely P_i, by the following equation:

P_i = p · P_{i+1} + q · P_{i−1}

This equation is a recurrence relation: an equation in which the terms of a sequence are expressed in terms of other terms of the same sequence. It is possible to find an explicit formula for a sequence from its recurrence relation. I will go into detail on this

method at the very end of the solution, but we will be following a simpler

method for now. We know that p + q = 1 holds for this equation. Hence,

there is no problem with substituting (p + q) · P_i for P_i.

(p + q) · P_i = p · P_{i+1} + q · P_{i−1}

p · P_i + q · P_i = p · P_{i+1} + q · P_{i−1}

If we gather all of the terms with the same coefficients on the same

side and use the distributive property,

q · P_i − q · P_{i−1} = p · P_{i+1} − p · P_i

q · (P_i − P_{i−1}) = p · (P_{i+1} − P_i)

P_{i+1} − P_i = (q/p) · (P_i − P_{i−1})

P_{i+1} = P_i + (q/p) · (P_i − P_{i−1})

Let us write 1 for i in the equation that we have obtained.

P_2 = P_1 + (q/p) · (P_1 − P_0)

Since P_0 is equal to 0,

P_2 = (1 + q/p) · P_1

Like so, we have expressed the term P_2 in terms of P_1. Now let us write 2 for i.

P_3 = P_2 + (q/p) · (P_2 − P_1)

Now we plug the expression for P_2 in terms of P_1 into the equation above.

P_3 = (1 + q/p) · P_1 + (q/p) · ((1 + q/p) · P_1 − P_1)

P_3 = (1 + q/p) · P_1 + (q/p) · (q/p) · P_1

P_3 = P_1 · (1 + q/p + (q/p)^2)

Thus, we have expressed P_3 also in terms of P_1. Likewise, if we write 3 for i, then we can obtain P_4 in terms of P_1 as well.

P_4 = P_3 + (q/p) · (P_3 − P_2)

P_4 = P_1 · (1 + q/p + (q/p)^2) + (q/p) · (q/p)^2 · P_1

P_4 = P_1 · (1 + q/p + (q/p)^2 + (q/p)^3)

You can most likely guess what values we will get if we continue down the same path, plugging in values for i.

P_2 = P_1 · (1 + q/p)

P_3 = P_1 · (1 + q/p + (q/p)^2)

P_4 = P_1 · (1 + q/p + (q/p)^2 + (q/p)^3)

. . .

P_i = P_1 · (1 + q/p + (q/p)^2 + (q/p)^3 + · · · + (q/p)^(i−1))

Then, the expression of the general term P i in terms of P 1 turns out

to be as noted above. The same result can be obtained using the method

of proof by induction.

We already knew that P_k has value 1. Now we know the expression of P_k in terms of P_1. We can use these two equalities to move further.

1 = P_k = P_1 · (1 + q/p + (q/p)^2 + (q/p)^3 + · · · + (q/p)^(k−1))

This equation lets us write P_1 in terms of p and q.

P_1 = 1 / (1 + q/p + (q/p)^2 + (q/p)^3 + · · · + (q/p)^(k−1))

We can write P_i in terms of P_1, and P_1 in terms of p and q. Then, logically, we can write P_i in terms of p and q. This means that when Chuck has i marbles (0 ≤ i ≤ k), we can express his probability of winning the game in terms of p and q, which are his probabilities of winning and losing each coin toss.

At this point we need to consider two separate cases and proceed accordingly. The first case is when p and q are equal, so the fraction q/p equals 1. The second case is when p and q are distinct.

We will first handle the case p = q = 1/2, where both players have an equal chance to win or lose each coin toss.

P_1 = 1 / (1 + q/p + (q/p)^2 + (q/p)^3 + · · · + (q/p)^(k−1))

P_1 = 1 / (1 + 1 + 1 + · · · + 1)    (k ones in the denominator)

P_1 = 1/k

Now that we know the value of P_1, we can also find the values of all of the terms we have previously written in terms of P_1. Recall how we expressed P_i in terms of P_1:

P_i = P_1 · (1 + q/p + (q/p)^2 + · · · + (q/p)^(i−1))

P_i = P_1 · (1 + 1 + 1 + · · · + 1)    (i ones)

P_i = i/k

Thus we have written P_i in terms of i and k. Chuck has n marbles at the start, so let us write n instead of i. This way, we will find Chuck's probability to win at the start of the game.

P_n = n/k    (1st formula, p = q = 1/2)

This means that in a game where the probabilities of winning and

losing at each round are equal, the ratio of the number of marbles a player

has (n) to the total number of marbles (k) gives us that player’s chances

of winning the game.

Now let us consider the more complex case where p ≠ q.

P_1 = 1 / (1 + q/p + (q/p)^2 + (q/p)^3 + · · · + (q/p)^(k−1))

The denominator of this fraction is a finite geometric series with ratio q/p. A geometric series is a series where the ratio of consecutive terms is constant. For instance, 1 + 2 + 4 + 8 + · · · is an example of a geometric series. In the above equation, the geometric series is of the form 1 + q/p + (q/p)^2 + (q/p)^3 + · · · + (q/p)^(k−1).

We do not need any further information on geometric series for now.

Let us move on by naming the geometric series in the denominator of P_1 as S:

S = 1 + q/p + (q/p)^2 + (q/p)^3 + · · · + (q/p)^(k−1)

Multiplying both sides by −q/p,

−(q/p) · S = −q/p − (q/p)^2 − (q/p)^3 − · · · − (q/p)^k

Adding these two equations,

S − (q/p) · S = 1 + q/p + (q/p)^2 + · · · + (q/p)^(k−1) − q/p − (q/p)^2 − · · · − (q/p)^k

Simplifying,

S − (q/p) · S = 1 − (q/p)^k

S = (1 − (q/p)^k) / (1 − q/p)

As a result, we have found the closed formula for the geometric series S. Now rewriting P_1,

P_1 = 1 / ((1 − (q/p)^k) / (1 − q/p))

P_1 = (1 − q/p) / (1 − (q/p)^k)

Now let's go back to the equation expressing P_i in terms of P_1.

P_i = P_1 · (1 + q/p + (q/p)^2 + (q/p)^3 + · · · + (q/p)^(i−1))

We have expressed the P_1 term in this equation in terms of p, q and k. We can carry out the same process for the finite geometric series in this equation.

1 + q/p + (q/p)^2 + (q/p)^3 + · · · + (q/p)^(i−1) = (1 − (q/p)^i) / (1 − q/p)

Then we can express P_i with the following equation:

P_i = (1 − q/p) / (1 − (q/p)^k) × (1 − (q/p)^i) / (1 − q/p)

Simplifying,

P_i = (1 − (q/p)^i) / (1 − (q/p)^k)

With this equation, which gives us the value of P_i for any i, we can find Chuck's probability to win the game as long as we know the values of p and q, and the total number of marbles in the game. Chuck starts out with n marbles, so let's plug in n for i.

P_n = (1 − (q/p)^n) / (1 − (q/p)^k)    (2nd formula, p ≠ q)

Thus, we have constructed a formula that shows us the probability of a

player who starts the game with n marbles to win when the probabilities p

and q are not equal. We have also found the two formulae which are used

in the solution of this problem and its variations. Let’s combine these two

formulae and write the probability P n as a piecewise function.

P_n = n/k                              if p = q = 1/2

P_n = (1 − (q/p)^n) / (1 − (q/p)^k)    if p ≠ q
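The piecewise formula translates directly into a small function (a sketch; the name `win_probability` is mine):

```python
def win_probability(n, k, p):
    """Chance that a player holding n of the k marbles wins the game,
    when they win each individual toss with probability p (q = 1 - p)."""
    if p == 0.5:
        return n / k
    r = (1 - p) / p                      # the ratio q/p
    return (1 - r ** n) / (1 - r ** k)

print(win_probability(400, 700, 0.50))  # 4/7 ≈ 0.5714
print(win_probability(400, 700, 0.49))  # a tiny per-toss bias is devastating
```

Note that the `p == 0.5` comparison works here only because the literal 0.5 is exact in floating point; for a computed p one would test |p − 0.5| < ε instead.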

We now know how to find the probability of each player winning in a

gambler’s ruin-like problem. We only need to know how many items the

players initially hold, and the probability of winning and losing each step

of the game (the values p and q).

It makes sense to test our formulae to see if they work for our original

problem. Since Chuck’s probability of winning or losing each coin toss is

equal with an unbiased coin, we should use the first formula.

P_n = n/k    (1st formula, p = q = 1/2)

Chuck has 400 marbles to start out with and there are a total of 700

marbles in the game, so the first formula tells us that Chuck’s probability

is simply 400/700, which is of course equal to the 4/7 that we have found

earlier.

P_400 = 400/700 = 4/7

We can also use our findings to reach some real-life conclusions: In the

long run, gambling is a terrible idea. Suppose that the gambler and the

gambling house have an equal probability to win a game, so that we can use the simpler first formula, P_n = n/k. Let's assume the house has n dollars and the player has m. You can safely assume that a gambling house has many times the amount that a player has, so n is much larger than m. In the long run, the probability that the house wins and the player loses is n/(n + m). Since n is much larger than m, this probability is close to 1, and the player is near-certain to go bankrupt in the long run. Adding to this the fact that

in real life, the house has a higher probability to win a game anyway, the

gambler has virtually no chance in the long run. It might be possible to

make a short term profit, but in the long term the house always wins.

Now I will correlate this problem with the previous two problems. In

essence, all of these three can be described as problems of random walk

in discrete time. Hence we can use similar methods in their solutions.

We will first analyze a fictional game using the two formulae we have

constructed.

Suppose we are gambling against an opponent and that our budget

is n units while our opponent’s is infinite. On each round, we take one

unit from our opponent with p probability and we give one unit to our

opponent with 1 − p probability. It is not possible for us to win this game,

since our opponent will never run out of money. However, it might still

be possible for us to not lose by keeping the game going on indefinitely.

We can find the probability to stall the game with the two formulae we

have constructed. What we normally call the probability to win will be

the probability to stall the game.

If p < 1/2, then we need to use the second formula. The ratio q/p in the formula will be greater than 1 and the total money k will approach infinity, which will cause the denominator of the fraction in the formula to approach negative infinity. This means that our probability to stall the game will approach 0.

lim_{k→∞} (1 − (q/p)^n) / (1 − (q/p)^k) = 0    (p < 1/2)

If p = 1/2, we need to use the first formula. But the fact that the total amount of money in the game approaches infinity causes our probability of stalling the game to approach zero yet again.

lim_{k→∞} n/k = 0    (p = 1/2)

If p > 1/2, again, we need to use the second formula. This time the ratio q/p will be less than 1 and, since the total amount of money, k, approaches infinity, the denominator of the fraction in the formula will approach 1. Then, the numerator gives us the probability to stall the game ad infinitum. We have

lim_{k→∞} (1 − (q/p)^n) / (1 − (q/p)^k) = (1 − (q/p)^n) / 1 = 1 − (q/p)^n    (p > 1/2)

Now we can correlate this problem with the two before. I will only go

over the previous one, but the method holds for both.

In the previous problem, the initial distance of Nancy the Caretta from

the sea was only one step, and there was an infinite distance from where

she stood to the opposite direction of the sea. At each step, Nancy had a

1/4 chance to move toward the sea and a 3/4 chance to move away from it.

Assume that a player has as many units of money as Nancy’s distance

to the sea. This means that the player has 1 unit at the start of the

game. Following the same analogy, the player’s opponent will have the

same amount of money as Nancy's distance away from the sea, that is,

infinity. On each round, the player will have a 3/4 probability to take one

unit from their opponent, and a 1/4 probability to give their opponent one

unit. We can now incorporate the suitable one from our formulae. In this

problem, the value 3/4 corresponds to p. Then we need to use the formula

for the p > 1/2 case and also plug in 1 for n, because the player’s initial

budget is one unit.

$$\lim_{k\to\infty} \frac{1-(q/p)^1}{1-(q/p)^k} = 1 - \frac{q}{p}, \qquad (p > 1/2)$$

So, the probability for this player to stall the game indefinitely will be 1 − q/p. In other words, the probability that Nancy never makes it to the sea is 1 − q/p. We have p = 3/4 and q = 1/4, so q/p is equal to 1/3. This confirms the result we found earlier. Now, if you will, you can use this method to solve the 10th problem in the book.
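Before moving on, we can sanity-check this result with a short Monte Carlo sketch. This is my own illustration, not part of the book's argument; the function name and the step cap are assumptions, with the cap standing in for "forever". It simulates the walk from one step away, with probability 3/4 of moving away from the sea:

```python
import random

def nancy_reaches_sea(p_away=3/4, start=1, cap=500, rng=random):
    """Simulate one walk; True if the distance to the sea ever hits 0.

    The walk drifts away from the sea, so a walk still alive after
    `cap` steps has only a vanishing chance of ever returning; we
    treat it as never reaching the sea.
    """
    d = start
    for _ in range(cap):
        if d == 0:
            return True
        d += 1 if rng.random() < p_away else -1
    return d == 0

random.seed(1)
trials = 10_000
estimate = sum(nancy_reaches_sea() for _ in range(trials)) / trials
print(estimate)  # should land near q/p = 1/3
```

With the drift away from the sea, a walk that survives the cap is almost surely one that escapes for good, so the estimated reaching probability should be close to q/p = 1/3, and the never-reaching probability close to 2/3.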

This method also resolves an ambiguity we came across during the two previous questions. We had found two results for Nancy's probability to reach the sea: 1 and q/p. We then had to question which result corresponded to which value of q. We have just shown that for p > 1/2, that is, q < 1/2, we need to consider q/p as the correct result. Otherwise, for the p ≤ 1/2 case, the probability of stalling the game will be 0. We can see this from the two results we have just found.

$$\lim_{k\to\infty} \frac{1-(q/p)^n}{1-(q/p)^k} = 0, \qquad (p < 1/2)$$

$$\lim_{k\to\infty} \frac{n}{k} = 0, \qquad (p = 1/2)$$

Then in the case where p ≤ 1/2, so that q ≥ 1/2 holds, our probability of not being able to stall the game forever is 1. In Nancy's case, that means eventually reaching the sea. In that problem, Nancy's probability of making it to the sea had turned out to be 1 when q ≥ 1/2, and q/p when q < 1/2. That is exactly what we had found, but now we have a sound proof for it.

Lastly, I want to mention an additional method for those who are

really curious. This is clearly a more advanced method, so forgive me as I

will be a little hand-wavy with the demonstration. Also, chances are you

have not come across such a method before, so I urge you to consult other

sources if you want to learn.

Recall that we had the equation P_i = pP_{i+1} + qP_{i−1} in the solution of the problem. I had told you that such a kind of equation was called a recurrence relation. Recurrence relations are equations where the nth term of a sequence (a_n) is expressed in terms of the k previous terms of the same sequence. Recurrence relations have the following general form:

$$a_n = c_1 a_{n-1} + c_2 a_{n-2} + c_3 a_{n-3} + \cdots + c_k a_{n-k}$$

where each c_i is a constant coefficient.

We can use the recurrence relation of a sequence to find an explicit formula for its terms. For this method, we look for solutions of the form ar^n. Suppose that our P_n in the problem can be written in the form ar^n. Then, we can write the equation P_i = pP_{i+1} + qP_{i−1} as follows:

$$ar^n = par^{n+1} + qar^{n-1}$$

Dividing each side by ar^{n−1},

$$r = pr^2 + q$$

$$0 = pr^2 - r + q$$

This equation is called the characteristic equation of the recurrence

relation. We can decompose this equation in the following way to find its

roots, the suitable values of r.

$$p(r-1)\left(r - \frac{q}{p}\right) = 0$$

$$r = 1 \quad \text{or} \quad r = \frac{q}{p}$$

Then the suitable values of r are 1 and q/p. We know that when p and q are equal, both of these values are equal to 1. But let us first suppose that p and q are distinct. Then, we will have two distinct values for r. How do we know which one is the correct result? Instead of choosing between these values, we need to write P_n in such a way that it incorporates both values of r:

$$P_n = \alpha (1)^n + \beta \left(\frac{q}{p}\right)^n$$

Since we know that P_0 = 0 and P_k = 1, we can find the α and β constants in the equation. If we substitute 0 and k for n in the formula, we can see that

$$P_n = \frac{1-(q/p)^n}{1-(q/p)^k} \qquad (\text{2nd formula},\ p \neq q)$$

Now for the other case, suppose that p and q are equal. In this case, the single possible value for r is 1. This means that our characteristic equation has a double root. In such a case, the form of the formula of our sequence will be a little different:

$$P_n = \alpha'(1)^n + \beta' n (1)^n$$

Again, using the values of P_0 and P_k, we can find the constant coefficients α′ and β′:

$$P_n = \frac{n}{k} \qquad (\text{1st formula},\ p = q = 1/2)$$

This means that we can reach the same explicit formulae for the sequence using the recurrence relation embedded in the problem. It is possible to use this method on more complex recurrences. In our problem, the characteristic equation was only of the second degree; however, the explicit formula can still be found with equations of higher degree, as long as their roots can be found.

This method is used extensively, especially in discrete mathematics, and is very practical. It can even be used to find an explicit formula for the Fibonacci sequence. This lets us do marvellous things, such as directly calculating the 1000th term of the Fibonacci sequence. Now it's your turn to try to use this method to find a formula for the Fibonacci sequence. Just as a starting point, write out the Fibonacci sequence as a recurrence relation, and proceed with finding its characteristic equation.
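As a head start on that exercise, here is a sketch of where it leads (my own code, not the book's; it uses floating-point arithmetic, so the rounding trick is only exact for moderate n — the 1000th term would call for exact arithmetic). The recurrence F_n = F_{n−1} + F_{n−2} gives the characteristic equation r² = r + 1, and its two roots produce the closed form known as Binet's formula:

```python
from math import sqrt

# Characteristic equation of F(n) = F(n-1) + F(n-2): r^2 = r + 1,
# with roots phi = (1 + sqrt 5)/2 and psi = (1 - sqrt 5)/2.
phi = (1 + sqrt(5)) / 2
psi = (1 - sqrt(5)) / 2

def fib_closed(n):
    # The boundary values F(0) = 0 and F(1) = 1 force the constants
    # alpha = -beta = 1/sqrt(5) in F(n) = alpha*phi^n + beta*psi^n.
    return round((phi**n - psi**n) / sqrt(5))

def fib_iter(n):
    # Plain iteration, used here only to cross-check the closed form.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

assert all(fib_closed(n) == fib_iter(n) for n in range(50))
print(fib_closed(30))  # 832040
```

Since |psi| < 1, the psi^n term shrinks toward zero, which is why rounding phi^n / sqrt(5) already gives the integer answer.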

Picking Seats in a Theater

Arthur has managed to find a ticket for the first screening of the highly-anticipated movie called "The Mystery of Pi". There will be 100 viewers

in this grand premiere of the movie. Unfortunately, Arthur is running the

risk of being late, as he is currently stuck in traffic.

After some time, the traffic clears up and Arthur manages to make it

to the theater on time. There is already a huge line at the entrance when

he arrives, and he joins the line as the last person, that is, the 100 th . The

viewers will be let in one by one, get seated at their seats according to the

numbers on their tickets, and the screening will then commence.

However, the first person to enter the hall freaks out and takes a random

seat, regardless of what is written on their ticket. The other attendees

that follow keep entering one by one and either take the seat that is written

on their ticket if it’s empty, or take another random seat if their own

seat is occupied. What is the probability that Arthur, the 100 th person

to enter the hall, takes his own seat? If you manage to solve this one, try

solving it for the 99 th entrant to the hall.

Solution

This problem is another one of those pesky-looking but actually surprisingly easy-to-solve problems, the kind of which we have seen many in the book. In fact, there is one extremely simple way to solve the problem that does not even require any calculation. I could not come up with that method when I first solved this problem, but maybe you can. Regardless, I will first solve the problem using a more systematic and calculation-driven method, and then move on to the simple method. Naturally, there

are many other ways to approach the problem, as is the case with any

problem in mathematics. Even the most unexpected of ways can lead to

the right result, provided that we appeal to no fallacies along the way.

In problems such as this one, it is important to make sense of what we are given and to make the right definitions before attacking the problem. Let us first list the information at hand and fix our notation.

The line at the entrance starts with the 1st person at the very front and ends with the 100th person at the very end. Each of the entrants has a distinct seat number written on their ticket, indicating where they must sit. Let seat #1 be the seat that the first entrant must take, seat #2 the seat that the second entrant must take... and seat #n the seat that the nth entrant must take. With this notation, Arthur's seat is #100.

We are told that the first entrant takes a random seat, and that the remaining entrants take their own seat if it is free and a random seat if it is occupied. With this at hand, we are asked the probability that the 100th entrant takes seat #100. In order for the 100th entrant to be able to take his own seat, seat #100 must be free by the time he enters the hall. This means that what we are asked is actually the probability that seat #100 is free when the 100th entrant enters the hall.

If someone has sat at seat #100 before the 100th entrant enters, then it makes no difference who it is, as the 100th entrant will not be able to take his own seat. What matters to us is whether the seat is free or occupied, not who occupies it. This is critical to understand, because we will now change the rules of the problem according to this observation:

If any of the entrants following the first one find that their seat is

occupied by someone else, they get angry and make the occupier leave their

seat. They take their own seat, and the former occupier randomly takes

another free seat. What is the probability, then, that the 100 th entrant

finds his seat empty when it’s his turn to sit?

If the problem is phrased in this way, the only person to be able to skip

freely from one seat to another will be the first entrant to the hall. All

of the others will be sitting at their correct places when the 100 th entrant

enters. The first entrant will have to get up and take a random seat each

time they occupy the wrong seat. With this change made, the only one

that can sit at the wrong seat will be the first entrant.

This change simplifies the question by a great deal, but the problem

after the change is completely equivalent to the original problem. Recall

that we are only concerned with whether or not the seat #100 is occupied

in the end, and not with who potentially occupies it. This version of the

problem would still be valid if the person in question was the n th entrant

and not necessarily the 100 th one. Again, we would only be concerned

with whether or not the seat #n is occupied, and not with who potentially

occupies it.

Now we have a simpler problem that is still equivalent to the

original one. Let’s proceed with this modified version in the rest

of the solution.

In this modified version of the question, let’s think of the case where

the nth entrant (n ≠ 1) finds their seat to be occupied. Since only the first

entrant can sit at a wrong spot, the occupier of the seat #n will be the

first entrant. This means that the seat #1 is currently empty. Also, all

of the entrants from the 2 nd to the (n − 1) th are sitting at their correct

spots. According to the new rules we have set, the n th entrant will force

the 1 st entrant to get up and take another random seat, and they will sit

at their correct place. This means that when the 1 st entrant is forced to

get up, all of the seats up to #n, except for #1, will be occupied. So

the 1 st entrant will have (100 − n + 1), that is, (101 − n) empty seats to

randomly choose from. These are, namely, seat #1 and all of the seats

with numbers higher than n.

When the 1st entrant leaves seat #n, they will be equally likely to sit at each of the (101 − n) remaining seats, so every single seat will have a probability of 1/(101 − n) to be chosen.

Now that we have a better grasp of the problem, we can start the solution. When it is the 100th entrant's turn to take a seat, only one seat will be empty: either seat #1 or seat #100. We want to know the probability that seat #100 is empty. This probability is not that easy to compute directly, so it is better to resort to a method that we have used in many of the previous problems: let's find the probability that seat #100 is occupied when the 100th entrant enters, and subtract that from the total probability, 1.

It is very important to make the right definitions in probability problems.

Let us now make a definition:

Let P (n) denote the probability that the n th entrant finds seat #n

occupied when it’s their turn to sit. We can also think of P (n) as the

probability that the 1 st entrant takes seat #n.

This definition makes sense for all n except n = 1. But what do you think P(1) is equal to? When the first entrant enters the hall, they take a random seat, just as if their own seat were occupied. Thus, we can say that the first entrant finds his seat occupied with certainty, that is, P(1) = 1.

The value we are interested in is P (100). However, let us first think of

how we can find the general term P (n) for any n.

The case where the 1 st entrant takes the seat #n can be divided into

sub-cases depending on the last seat they had taken before seat #n. The

question to be asked is: Which seat was the 1 st entrant sitting at right

before getting up and moving to seat #n?

The 1 st entrant might have been sitting at seat #(n − 1) before taking seat #n.

The 1 st entrant might have been sitting at seat #(n − 2) before taking seat #n.

The 1 st entrant might have been sitting at seat #(n − 3) before taking seat #n.

...

The 1 st entrant might have been sitting at seat #2 before taking seat #n.

The 1st entrant might have sat directly at seat #n as soon as they entered the hall.

All of these cases end up with the 1 st entrant sitting at seat #n. These

are all distinct cases, each having a probability to occur. Then, in order

to find the probability of the 1 st entrant taking seat #n, P (n), we need

to sum the probabilities of each of these cases:

P(n) =
P(the 1st entrant takes seat #n right after seat #(n − 1))
+ P(the 1st entrant takes seat #n right after seat #(n − 2))
+ P(the 1st entrant takes seat #n right after seat #(n − 3))
+ ...
+ P(the 1st entrant takes seat #n right after seat #2)
+ P(the 1st entrant takes seat #n directly as their first seat)

How will we calculate the probabilities of these cases, whose sum is equal to P(n)? All of these cases conform to the same general form: The

1 st entrant taking seat #n right after seat #m. So what we are actually

looking for is:

P (The 1 st entrant taking seat #n right after seat #m)

By our definition, the probability for the 1st entrant to take seat #m is P(m). Also, as we have found earlier, the probability for the 1st entrant to take seat #n right after leaving seat #m is 1/(101 − m). Therefore, the probability that the 1st entrant takes seat #n right after taking seat #m is the product of these two:

P(the 1st entrant takes seat #n right after seat #m) = P(m) · 1/(101 − m)

Thus, we can express P(n) as follows:

$$P(n) = \frac{P(n-1)}{101-(n-1)} + \frac{P(n-2)}{101-(n-2)} + \frac{P(n-3)}{101-(n-3)} + \cdots + \frac{P(2)}{101-2} + \frac{P(1)}{100}$$

$$P(n) = \sum_{k=1}^{n-1} \frac{P(k)}{101-k}$$

Thus we have found a way to write P(n) in terms of the probability values preceding it. Let's write out the value of P(n) for a few values of n.

$$P(1) = 1$$
$$P(2) = \frac{P(1)}{100}$$
$$P(3) = \frac{P(2)}{99} + \frac{1}{100}$$
$$P(4) = \frac{P(3)}{98} + \frac{P(2)}{99} + \frac{1}{100}$$
$$P(5) = \frac{P(4)}{97} + \frac{P(3)}{98} + \frac{P(2)}{99} + \frac{1}{100}$$
$$\vdots$$

Notice that we can rewrite the same values as follows:

$$P(1) = 1$$
$$P(2) = \frac{P(1)}{100}$$
$$P(3) = \frac{P(2)}{99} + P(2)$$
$$P(4) = \frac{P(3)}{98} + P(3)$$
$$P(5) = \frac{P(4)}{97} + P(4)$$
$$\vdots$$

This means that it is possible to express P(n) only in terms of P(n − 1). Thus, we can express any P(n) where n > 2 in the following form:

$$P(n) = \frac{P(n-1)}{102-n} + P(n-1)$$

Carrying out the addition on the right-hand side,

$$P(n) = \frac{(103-n) \cdot P(n-1)}{102-n}$$

Then, the values for P(n) will be as follows:

$$P(1) = 1$$
$$P(2) = \frac{1}{100}$$
$$P(3) = \frac{100}{99} P(2)$$
$$P(4) = \frac{99}{98} P(3)$$
$$\vdots$$
$$P(99) = \frac{4}{3} P(98)$$
$$P(100) = \frac{3}{2} P(99)$$

Let's substitute (4/3)P(98) for P(99) in the expression for P(100). Then, let's repeat the same process by substituting (5/4)P(97) for P(98). Reiterating this until there are no terms of the form P(n) left yields the following equation:

$$P(100) = \frac{3}{2} \cdot \frac{4}{3} \cdot \frac{5}{4} \cdots \frac{98}{97} \cdot \frac{99}{98} \cdot \frac{100}{99} \cdot \frac{1}{100}$$

This is quite an exhilarating result. Carrying out the simplifications, we get:

$$P(100) = \frac{1}{2}$$

As a result, we have found that the probability for the 1st entrant to take seat #100 is 1/2. In other words, the probability for the 100th entrant to find his seat occupied by the 1st entrant when it is his turn to sit is 1/2. Then, his probability of finding his seat empty is also 1/2, which is the result that we are after. The probability that Arthur, the 100th entrant to the hall, finds his seat empty is 1/2.
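The telescoping product above is easy to check mechanically. A few lines of Python with exact fractions — my own sketch, not part of the book's solution — run the recurrence P(n) = ((103 − n)/(102 − n)) · P(n − 1) upward from P(2) = 1/100:

```python
from fractions import Fraction

# P[n]: probability that the 1st entrant ends up in seat #n,
# computed from the recurrence derived in the text.
P = {2: Fraction(1, 100)}
for n in range(3, 101):
    P[n] = Fraction(103 - n, 102 - n) * P[n - 1]

print(P[100], P[99])  # 1/2 1/3
```

The exact values confirm both results at once, and in fact show the clean pattern P(n) = 1/(102 − n) for every n from 2 to 100.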

Notice that we can find any entrant's probability of finding their seat empty, not just that of the 100th. If we carry out the exact same process for P(99), for instance, we get the following:

$$P(99) = \frac{4}{3} \cdot \frac{5}{4} \cdot \frac{6}{5} \cdots \frac{98}{97} \cdot \frac{99}{98} \cdot \frac{100}{99} \cdot \frac{1}{100} = \frac{1}{3}$$

This means that the probability for the 99 th entrant to find their seat

occupied is 1/3. Then their probability to find their seat empty is 2/3.

By this point, you should be able to get a feel for what the other entrants' probabilities of taking their own seats are. Following the same logic as in the calculation of P(100) and P(99), we see that for each n ≠ 1, the probability that the nth entrant finds their own seat occupied is 1/(102 − n). Subtracting this from 1, the probability that the nth entrant takes their own seat is found to be (101 − n)/(102 − n). That is,

The probability that the 100 th entrant takes his own seat is 1/2.

The probability that the 99 th entrant takes their own seat is 2/3.

The probability that the 98 th entrant takes their own seat is 3/4.

...

The probability that the 4 th entrant takes their own seat is 97/98.

The probability that the 3 rd entrant takes their own seat is 98/99.

The probability that the 2 nd entrant takes their own seat is 99/100.
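These values can also be confirmed by simulating the original rules directly. The sketch below is my own illustration (function names and trial counts are arbitrary); it seats the entrants exactly as the problem states and estimates how often a given entrant ends up in their own seat:

```python
import random

def takes_own_seat(target, n_seats=100, rng=random):
    """Simulate the original process and report whether entrant
    `target` (2 <= target <= n_seats) ends up in their own seat."""
    free = list(range(1, n_seats + 1))
    seat = rng.choice(free)          # the 1st entrant sits anywhere
    free.remove(seat)
    for i in range(2, target + 1):
        # take your own seat if it is free, otherwise a random free one
        choice = i if i in free else rng.choice(free)
        free.remove(choice)
        if i == target:
            return choice == i

random.seed(7)
trials = 5_000
est_100 = sum(takes_own_seat(100) for _ in range(trials)) / trials
est_99 = sum(takes_own_seat(99) for _ in range(trials)) / trials
print(est_100, est_99)  # close to 1/2 and 2/3
```

With a few thousand trials, the estimates land close to the exact values 1/2 and 2/3 derived above.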

Now we'll approach the problem with the ingenious second method that lets us find any entrant's probability of taking their own seat without requiring any calculation. We can carry this method out on the original phrasing of the problem. In the problem, when it is the 100th entrant's turn to sit, there will only be 2 available seats left: either seat #1 or seat #100. Up until the last entrant, every entrant who has to pick a random seat is equally likely to choose seat #1 and seat #100. 1

This applies also for the 100 th entrant, in whose case there are only two

available seats left: seat #1 and seat #100. This means that either of

these two seats will have a 1/2 probability to be free. Then, the 100 th

entrant has a 1/2 probability to be able to get seated in his own seat.

1 Please make sure you clearly understand this statement before moving on.

We can reason in the same way for the other entrants as well. There is an important fact to consider: there are some seats that are certain to be occupied when the nth entrant enters the hall. These are, namely, all of the seats from #2 to #(n − 1), and there are n − 2 of them in total. Out of the other 100 − (n − 2) = 102 − n seats, exactly one more must also be occupied, because n − 1 seats in total are occupied when the nth person enters the hall. We can say that up to the nth entrant, each time someone has to pick a random seat, they are equally likely to end up at any of the (102 − n) seats in question. This means that when the nth entrant gets to make their choice, the probability for seat #n to be occupied will be 1/(102 − n), so its probability to be available is (101 − n)/(102 − n). This is in accordance with the result we have found using the first method.

To sum up, the probability for the nth entrant to be seated at their own spot is (101 − n)/(102 − n). Thus, for Arthur, the 100th and final entrant to the theater hall, the probability of sitting at his own seat is 1/2.

Buses on the Kadıköy - Bostancı Line

Kadıköy and Bostancı are two neat, high-profile residential areas in

Istanbul.

Emma walks from Kadıköy to Bostancı each day on the way home from

work. She is normally able to walk this 9-kilometer route quite easily, but

today she feels exhausted after the first 3 kilometers. She decides to take a

bus for the rest of the way. However, as a lot of bus drivers are on strike

today, there is only a single working bus on the Kadıköy - Bostancı line.

This bus drives back and forth between Kadıköy and Bostancı without

ever stopping.

Emma wants to take the bus for the remaining 6 kilometers to her

house. As she stands, she is already at a bus stop on the route of the bus,

so she starts waiting right where she is. What is the probability that the

bus is headed for Bostancı the first time Emma sees it?

The same thing happens on another day. However this time, the only

difference is that the bus drivers’ strike is over, and there are n active buses

instead of 1. These n buses all drive back and forth between Kadıköy and

Bostancı independent of each other and without any stoppages. What

is the probability, in this case, that the first bus Emma sees happens to

be headed towards Bostancı? (Even if you do not manage to solve this

one, think about whether or not the answer is different from that of the

previous question.)

Assume that stopping at a bus stop takes zero time for a bus and that

all of the buses are moving with an equal, constant velocity.

Solution

The first of these two related questions is a simple one, maybe even

the simplest in this book. It requires virtually no calculation. The second

one requires us to think a little differently, but still is simple in essence.

The first of these questions was originally posed by two physicists called

Marvin Stern and George Gamow. The two observed that, peculiarly, the

elevators which arrived first at the higher stories of a building were usually

headed upwards, and the elevators which arrived first at the lower stories

were headed downwards. Upon this realization, many tried to make sense

of why this was the case, which gave way to the first question of this

problem. Of course, the way it is asked here is phrased a little differently,

but be assured that it is definitely still the same question. The second

question is a variant of the first one. You will see at the end that there is

a further, third question which is related to the first two. It is given as an

additional problem, and you are expected to solve it by yourself.

Let’s start with the first question. To get it out of the way, the wrong

answer to this question is 1/2. Just because there are two possible cases

does not mean that they are equally likely cases. This wrong way of

thinking is reminiscent of the Monty Hall problem, where there being two

doors to choose from did not mean that each door had a 1/2 probability

to be the right one.

But how exactly will we answer this question? Let us model the way between

Kadıköy and Bostancı as a line segment. If we designate Kadıköy’s

position by x = 0 and Bostancı’s position by x = 9, the position at which

Emma is waiting for the bus becomes x = 3. Clearly, the x-value shows

the distance from Kadıköy in kilometers.

The bus is shuttling constantly between Kadıköy and Bostancı, that is,

between x = 0 and x = 9. When Emma starts her wait, the bus can be

anywhere between these two positions. If it happens to be to the left (on

the line segment) of Emma when she starts to wait, regardless of which

way it was headed at that point, it will be headed towards the right, to

Bostancı, when it arrives at Emma’s location. Likewise, if it happens to be

to the right of Emma the moment she starts waiting, it will definitely be

headed towards the left, to Kadıköy, when it arrives at Emma’s position.

So, the direction of the bus when Emma first sees it is solely dependent

on where she starts her wait. We are asked the probability that the bus is

headed towards Bostancı when she first sees it. In this case, we need the

bus to be to the left of Emma when she starts waiting. The probability

of this being true is 3/9, or 1/3.

The second question is exactly the same as the first one except for

one difference: there are now n independent buses in circulation. What

do you think the probability is in this case? Is it still 1/3 or something

entirely different?

When the number of buses in circulation increases, the probability

that the first bus Emma sees is headed towards Bostancı will also increase

and gradually approach 1/2. There is a brilliant way to grasp the intuition

behind this. After we get the intuition, we will go on to derive the formula

which gives us our desired result for n buses.

Let’s refer back to the line segment model we have just used in order

to start off.

As the number of buses in circulation increases arbitrarily, there will

be a bus at each point on this line segment. Also, on average, half of these

buses will be headed towards the left and the other half will be headed

towards the right. Since there is an extremely large number of buses on

the route, there will be many buses in the neighborhood of where Emma

is standing. This means that Emma will see a bus almost instantly after

she starts waiting. Because both her left and right sides are bustling with

buses, the bus she first sees will have a half-and-half probability of being

headed towards the right and left. This means that as the number of

buses on the line increases without bound, the probability for the first bus

she sees to be headed towards Bostancı will approach 1/2. Notice also

that in this case, the position at which she is waiting has no importance,

because wherever she stands, there will be many buses in the immediate

neighborhood. We will soon see all of these inferences reflected in a neat

little formula that we will derive.

We now have an intuitive sense of why the asked probability increases from 1/3 toward 1/2 as the number of active buses increases. Now, we need to find the exact probability when there are exactly n working buses. In order to find this, we need to change our model from a line segment into a circle.

Take two copies of the 9-kilometer route and glue them together at Kadıköy and Bostancı to form a circle of circumference 18 kilometers: one half, with positions marked x, represents buses traveling toward Bostancı, and the other half, with positions marked x′, represents buses traveling toward Kadıköy. Let us suppose that all of the buses are moving clockwise on this circle. By doing so, we accurately model the buses going back and forth between Kadıköy and Bostancı. Emma's stop now corresponds to two points of the circle: x = 3 on the Bostancı-bound half and x′ = 3 on the Kadıköy-bound half. If a bus arrives at x = 3 before one arrives at x′ = 3, the first bus that Emma sees is moving towards Bostancı. Otherwise, if a bus arrives at x′ = 3 before one arrives at x = 3, the first bus that Emma sees is moving towards Kadıköy.

Let us now split the circle into two arcs. Let A denote the large arc

of 12 kilometers between the points x = 9 and x = 3, and B denote the

smaller arc of 6 kilometers between the same points.

You will soon see why we split the circle as such. We can now start

analyzing the question.

Notice that any bus on arc A will always arrive at Emma's location earlier than any bus on arc B, because any bus on arc A has at most 6 kilometers of travel left before reaching Emma, while any bus on arc B has at least 6 kilometers left. Hence, if there is at least one bus on arc A, we can completely ignore all of the buses on arc B.

Then let's suppose first that there is at least one bus on arc A. What is the probability, in this case, that the first bus Emma sees is headed towards Bostancı? In order to find this, let's represent arc A as a line segment again, and divide it right in the middle to obtain two smaller intervals, each 6 kilometers in length. We will call these intervals interval 1 and interval 2.

On arc A, which we have shown here as a line segment, the buses will be moving from the left towards the right. Each bus on this line segment has a 1/2 probability to be in interval 1 and a 1/2 probability to be in interval 2. The end of interval 1 is x = 3, and the end of interval 2 is x′ = 3, and both correspond to the position where Emma is standing. Then we can say that as long as there is at least one bus on arc A, there is a 1/2 probability that a bus arrives at x = 3 before one arrives at x′ = 3. Then, supposing there is at least one bus on arc A, the probability that the first bus Emma sees is moving towards Bostancı is 1/2. This leads to the following question: what is the probability that there is at least one bus on arc A? We can do the same old trick by finding the probability of there being no buses on arc A and subtracting it from 1. The probability that there are no buses on arc A is equal to (6/18)^n = (1/3)^n. Then, the probability that there is at least one bus on arc A is 1 − (1/3)^n. As we have said, supposing that there is at least one bus on arc A, the probability that the first bus Emma sees is headed towards Bostancı is 1/2. Then, putting these two together, the probability that there is at least one bus on arc A and that the first bus Emma sees is moving towards Bostancı is:

$$\frac{1 - \left(\frac{1}{3}\right)^n}{2}$$

Now, let's analyze the case where there are no buses on arc A. This is equivalent to saying that all of the buses are on arc B. If all of the buses are on arc B, then it is certain that a bus will arrive at x′ = 3 before one arrives at x = 3. In other words, the first bus that Emma sees is certain to be headed towards Kadıköy. Then, if there are no buses on arc A, it is impossible for Emma's first bus to be moving towards Bostancı.

In short, for the first bus that Emma sees to be headed towards Bostancı, there has to be at least one bus on arc A. We have just calculated the probability that there is at least one bus on arc A and that the first bus Emma sees is headed towards Bostancı. Then, when there are n active buses, the probability P(n) that the first bus Emma sees is going towards Bostancı is equal to the following:

$$P(n) = \frac{1 - \left(\frac{1}{3}\right)^n}{2}$$

Now that we have found the solution, let's analyze it and see if it makes sense. Before deriving the formula, we had made some intuitive observations about the conditions the result should satisfy. As we estimated, as the total number of active buses, n, increases, the probability that the first bus Emma sees is moving to Bostancı increases. Also, as the number of buses on the line gets arbitrarily large, our probability approaches 1/2 asymptotically. We can see this by examining the following limit:

$$\lim_{n\to\infty} \frac{1 - \left(\frac{1}{3}\right)^n}{2} = \frac{1}{2}$$

From this expression, we can also see that Emma's waiting position becomes irrelevant as the number of active buses approaches infinity. The only quantity in the expression that depends on Emma's position is the base 1/3 of the exponential term: it is the fraction of the circle covered by arc B, and it varies between 0 and 1/2 depending on where Emma stands. However, as we know, r^n approaches 0 as n approaches infinity for any real number r between 0 and 1. Then, as the number of buses approaches infinity, our probability will approach 1/2 regardless of where Emma is standing.

Finally, if we substitute 1 for n, we can find the result when there is only one working bus on the line:

$$P(1) = \frac{1 - \frac{1}{3}}{2} = \frac{1}{3}$$

This is exactly what we were asked in the first question, and it agrees with the result we had previously found, 1/3.
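The formula is also easy to confirm by simulation. The sketch below is my own illustration (names and trial counts are arbitrary): each bus gets a uniform position on the 9-kilometer route and a random direction, and we track which way the first bus to pass Emma's stop is heading:

```python
import random

def first_bus_heads_to_bostanci(n, emma=3.0, route=9.0, rng=random):
    """Drop n buses at random positions/directions and report whether
    the first one to pass Emma's stop is heading toward Bostanci."""
    best_t, best_dir = float("inf"), 0
    for _ in range(n):
        u = rng.uniform(0.0, route)   # position, km from Kadikoy
        d = rng.choice([+1, -1])      # +1 = toward Bostanci
        if d == +1:
            # reaches Emma directly, or turns around at Bostanci first
            t, arrive = (emma - u, +1) if u <= emma else (2 * route - u - emma, -1)
        else:
            # reaches Emma directly, or turns around at Kadikoy first
            t, arrive = (u - emma, -1) if u >= emma else (u + emma, +1)
        if t < best_t:
            best_t, best_dir = t, arrive
    return best_dir == +1

random.seed(3)
trials = 20_000
for n in (1, 2, 5):
    est = sum(first_bus_heads_to_bostanci(n) for _ in range(trials)) / trials
    exact = (1 - (1 / 3) ** n) / 2
    print(n, est, exact)  # estimates should track the formula
```

For n = 1 the estimate hovers around 1/3, and for larger n it climbs toward 1/2, just as the formula predicts.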

Additional Problem

Suppose all of the same details are valid, however this time, as Emma

starts her wait, there are n active buses on the route making their last

round. This means that there are n buses on the line, each having a

random position and a random direction, which will drive on until they

reach either Kadıköy or Bostancı, and then stop altogether. What is

the probability that Emma sees a bus at all, and that bus happens to be

headed towards Bostancı?

COVID-19 Swim Ring

Olivia's father bought her a swim ring that is 5 meters in circumference,

as a graduation gift. Thrilled by this gift, Olivia took three of

her friends to the seaside to try her new swim ring out. According to a

warning on the instruction manual of the swim ring, if four people try to

get on the same quarter of the ring, the ring capsizes. Of course, the kids

were too excited to read the instruction manual and wanted to get on the

ring as quickly as possible. They each got on a random point on the ring.

What is the probability that the swim ring capsizes?

What is the probability that the quadrilateral whose vertices are marked

by the points occupied by the children comprises the center of the ring?

At this time, Olivia’s father was worried at home because he had realized

that he forgot to tell Olivia to practice social distancing, due to

the ongoing COVID-19 pandemic. He was sure that he had seen one of

Olivia’s friends coughing constantly in the last few days. He grew increasingly

anxious and decided to make a calculation of probability. What is

the probability that the four kids who get on a random point of the ring

respect social distancing? For the sake of this question, if the arc of the

circle between any two kids on the ring is at least 1 meter long, then social

distancing is respected.

Assume that the swim ring is a perfect circle. Also, all three questions

of the problem are independent of each other.

Solution

The first question in this problem is inspired by a problem seen on

the website Brilliant. The other two were authored by me. In my humble

opinion, these three questions are the ones with the most elegant solutions

in this book.

Like all of the other problems in this book, this one has a very simple

solution. However, it naturally isn't always easy to devise these simple

solutions. The first question of this problem is a good example of this fact.

The first time I saw this, I immediately thought that I would have to take

integrals to find the result. It is possible to find the solution using that

method, however it took me much longer to see the other, much easier

method. I will start out with the simple method, then I will also tell you

about the one involving integrals.

Four children will all take random spots on the circumference of the

ring. We are interested in their probability of being in the same quarter.

Let us designate the four children with the letters A, B, C, D. Let A be

the first one to get their place on the ring. They are free to take any point

on the ring, so it is up to the three other kids to pick the same quarter

as A. Let us mark a quarter of the circle, starting from A’s position and

going clockwise.


If the other three children place themselves on this quarter of the

circle, then all four will be in the same quarter and the ring will capsize.

Independent of each other, each child has a 1/4 probability of taking

a spot on this quarter of the circle. Thus, the probability of the three

simultaneously being on this quarter is $\left(\frac{1}{4}\right)^{3}$.

Notice that in this illustration of the ring, we assumed that the kid at

the "start" of the arc, that is at the most anticlockwise position on the

arc, was A. However, it could have been any of the four kids at the start

of the arc. Thus, we need to multiply the probability that we have found

by 4. As a result, the probability that all four of the kids get on the same

quarter of the ring turns out to be $4 \cdot \left(\frac{1}{4}\right)^{3}$, which is equal to $\frac{1}{16}$. So, the

probability for the ring to capsize is 1/16.

Thus we have found the answer to the first question in the problem.

A valid question to ask is how the solution would change if we had not

four, but n children. In fact, we can use the exact same method. We

place one out of the n children on the ring, mark a quarter of the ring

starting from the first kid’s position and going clockwise, and each of the

remaining n − 1 children will have a 1/4 probability to be in this marked

quarter. Then, the probability of n children taking random spots on the

ring to all be in the same quarter, $P_{1/4}(n)$, is the following:

$$P_{1/4}(n) = n \cdot \left(\frac{1}{4}\right)^{n-1}$$
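If you would like to double-check the 1/16 answer experimentally, here is a short Monte Carlo sketch. It is my own addition (the helper name and the gap test are not from the book); it uses the fact that n points on a circle all fit inside some quarter-arc exactly when the largest circular gap between neighboring points is at least three quarters of the circumference.

```python
import random

def same_quarter_prob(n, trials=100_000, seed=1):
    """Estimate the probability that n uniform random points on a
    circle of circumference 1 all fit inside some quarter-arc."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pts = sorted(rng.random() for _ in range(n))
        # circular gaps between neighboring points
        gaps = [b - a for a, b in zip(pts, pts[1:])]
        gaps.append(1 - pts[-1] + pts[0])
        # all points lie in an arc of length 1/4 iff some gap >= 3/4
        if max(gaps) >= 0.75:
            hits += 1
    return hits / trials

print(same_quarter_prob(4))   # theory: 4 * (1/4)**3 = 1/16 = 0.0625
```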

What if we were asked the same question with n children not for a

quarter of the ring, but for an arbitrary arc? If this arc is not greater in

length than half of the circle, then the same exact method is still valid.

For instance, the probability for n children to get on the same half of the

ring is $n \cdot \left(\frac{1}{2}\right)^{n-1}$.

However, this line of thinking does not work for arcs

greater than half of the circle, because in that case some possible positions

of the children are counted more than once. This leads to a value higher

than the correct answer. As long as the given arc is no larger than one half

of the circle, if the given arc makes up a fraction k of the circumference

(0 ≤ k ≤ 1/2), then we can use the following formula to find the probability

of having all n children on the same arc:

$$P_k(n) = n \cdot k^{\,n-1}$$

This is the simple way of solving the first question of the problem. Now,

for those who are interested, I will go over another way which involves

integration but still is not very complex. It would be of great help if you

had some prior knowledge of probability theory, especially on the topic of

probability density functions. If you are not familiar with this concept, I

recommend you do some light reading on it before moving on.

Let’s assume that, again, n children are to climb up on this circle with

a circumference of 5 meters, and calculate the probability that all of them

get on the same quarter. The first thing we need to do is to select two

children to mark the two extremities of the quarter that will comprise all

children. We can do this selection in n(n − 1) different ways. We will then

proceed to find the probability that the remaining n − 2 children all pick

a spot on the arc between these two children.

If we call the length of this arc X, X must be between 0 and 5/4

meters long. The random variable X is uniformly distributed, and its

probability density function is the constant function 1/5 m⁻¹. Each of the

n − 2 children except for the two at the extremities will have a probability

of X/5 to be on this arc of the circle with length X. Then we need to take

the integral over X from 0 to 5/4.

$$P_{1/4}(n) = n(n-1)\int_0^{5/4} \frac{1}{5}\left(\frac{x}{5}\right)^{n-2} dx = \frac{n(n-1)}{5^{n-1}}\int_0^{5/4} x^{n-2}\,dx = \frac{n(n-1)}{5^{n-1}}\cdot\frac{1}{n-1}\cdot\left(\frac{5}{4}\right)^{n-1}$$

When we carry out the simplifications, we find the same formula for $P_{1/4}(n)$

as before:

$$P_{1/4}(n) = n \cdot \left(\frac{1}{4}\right)^{n-1}$$
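Those who distrust their integration can also approximate the integral numerically. The sketch below is my own addition: it evaluates n(n−1) times the integral of (1/5)(x/5)^(n−2) from 0 to 5/4 with a midpoint Riemann sum and recovers n·(1/4)^(n−1).

```python
def quarter_prob_integral(n, steps=100_000):
    """Midpoint Riemann sum for n(n-1) * integral from 0 to 5/4 of
    (1/5) * (x/5)^(n-2) dx, the same-quarter probability."""
    a, b = 0.0, 5 / 4
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h   # midpoint of the i-th subinterval
        total += (1 / 5) * (x / 5) ** (n - 2) * h
    return n * (n - 1) * total

print(quarter_prob_integral(4))   # closed form: 4 * (1/4)**3 = 0.0625
```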

Now we can move on to the second question of the problem. A 3-

dimensional variant of this question was asked in the 1992 edition of

the Putnam Mathematical Competition, where undergraduate college students

from the US and Canada compete. In that version of the problem,

four random points on the surface of a sphere were chosen, and the probability

that the tetrahedron whose vertices lie on these points comprises

the center of the sphere was asked. Contrary to what it looks like, there

exists a simple way of solving this problem as well, however, it is very

difficult to perceive. I advise you to try to solve it anyway. If you fail, you

can always look up the answer on the internet. I assure you, you will be

captivated by how elegant the solution is.

I added in the second question of this problem because its solution is

directly related to the solution of the first one. Of course, as we have said

before, there are many ways to solve a problem and you might have taken

a completely different route in solving the first one than I did. However, I

will only present here the method which is connected to the first question.

In order to make this connection, we first need to make an important

realization: If all four of the points on the circle lie on the same semicircle,

then the quadrilateral formed by these four points will not comprise the

center. In order for the center to lie inside the quadrilateral, at least one

point must lie in a different half of the circle than the others. You can see

this in the illustration below.

While solving the previous question, we had found a formula that gives

us the probability of n randomly-picked points lying on the same half of

the circle: $n \cdot \left(\frac{1}{2}\right)^{n-1}$. When we substitute 4 for n in this formula,

we get 1/2. Subtracting this value from 1, we find the probability that

not all four points lie on the same semicircle. This is equivalent to the

probability that the quadrilateral formed by these four points comprises

the center. So, our result is 1/2.

To generalize this result, the probability that the n-gon formed by n

children randomly picking spots on the circumference of a circle comprises

the center of the circle is

$$1 - n \cdot \left(\frac{1}{2}\right)^{n-1}$$
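The 1/2 result for four points can also be checked by simulation; the sketch below is my own addition, not part of the original solution. It uses the observation from the text that the polygon misses the center exactly when all points fall on one semicircle, which happens exactly when some circular gap between neighboring points is at least half the circumference.

```python
import random

def contains_center_prob(n=4, trials=100_000, seed=2):
    """Estimate the probability that the polygon with n uniform
    random vertices on a circle contains the circle's center."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pts = sorted(rng.random() for _ in range(n))
        gaps = [b - a for a, b in zip(pts, pts[1:])]
        gaps.append(1 - pts[-1] + pts[0])
        # center is inside iff the points do NOT all lie on one
        # semicircle, i.e. iff every circular gap is < 1/2
        if max(gaps) < 0.5:
            hits += 1
    return hits / trials

print(contains_center_prob(4))   # theory: 1 - 4 * (1/2)**3 = 1/2
```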

Thus we have concluded the second question, and now we move on to

the final one. As I penned this question, the COVID-19 pandemic was

still in full swing. I hope that by the time you are reading these lines, the

pandemic is well over. If it is still ongoing, please do continue to respect

social distancing. Don’t overindulge like the brats in this question and

randomly jump up on a swim ring with your friends. Surely, it cannot be

that difficult. Actually, I might have gotten this excited as well if I had a

swim ring that is 5 meters in circumference, so I might not be the one to

judge.

Also, maybe we don’t have to worry at all. Maybe the children in the

question managed to respect social distancing when they got on randomly.

We will now see how probable this is. However, let’s start out by answering

the same question but with two, and with three children before, so that

we get a better sense of how to approach the actual question with four

children.

We are told that in order for social distancing to be respected, the

arc between any two children must be at least 1 meter long. Also, the

circumference of the ring that the children will climb up on is 5 meters.

Let’s first assume that only two children get on the swim ring. The

first kid is free to climb up on any point on the ring. The position of the

second kid must be at least 1 meter in arc length away from the first one.

In other words, the second kid must not pick a spot on the 2-meter-long

arc on the circle, which has the position of the first kid as the midpoint.

As long as the second kid picks a point on the remaining 3-meter-long arc

to climb up on, social distancing is respected. This has a probability of

3/5 to occur. You can see an illustration of this case below. Without loss

of generality, the first kid to climb up on the ring is assumed to be on the

uppermost point of the circle.

[Illustration: the 3-meter arc available to the second child (probability 3/5)]

Now we move on to the case with three children. Let us designate

these children with letters A, B, C. The first kid to get on the ring, A, is

free once again to pick any point. Calling this point the 0-meter point,

let’s go clockwise around the circumference of the circle and number some

points. Since the circumference of the circle is 5 meters, the position of A

will also be the 5-meter point.

[Illustration: the ring with child A at the 0(5)-meter point]

We can also picture this circle as a line segment which starts from

the point A and again, ends with the point A. Now, let X and Y be

the positions of B and C on the circle. These two independent random

variables take random real values in the interval [0, 5].¹ There are some

criteria which must be met in order for the social distancing to be respected.

First off, concerning the distance between A and the other two,

the inequality 1 ≤ X, Y ≤ 4 must hold. Second, between the children B

and C, the inequality |X − Y | ≥ 1 must hold. I assume you have an idea

of how to proceed after determining these two inequalities. We will use

the same method as the one that we have used in the solution of the ninth

¹ They are actually random variables that follow a uniform distribution

between 0 and 5. The word “random” is used here in this sense.

problem. We will plot these X and Y variables on the horizontal and vertical

axes of the coordinate plane. Recall that they show us the positions

of B and C, respectively. Then, we will divide the area which respects the

inequalities 1 ≤ X, Y ≤ 4 and |X − Y | ≥ 1 by the total area. The total

area is 25 unit squares, because both X and Y take values between 0 and

5. The colored area in the coordinate plane below is the area where the

stated inequalities hold.

As can be seen on the coordinate plane, the area where the inequalities

hold is 4 unit squares. Then, the probability that social distancing is

respected with 3 children on the ring is 4/25.

We have found that the probability of social distancing being respected

is 3/5 with two children, and 4/25 with three children. Now we have the

correct mindset to go after what is actually asked in the question, that is,

when four children get on the ring randomly.

In the same way as before, let us designate the four children with the

letters A, B, C and D. A can pick any point on the ring to get on. We

will call A’s position the 0-meter point and number the circumference of

the circle.

[Illustration: the ring with child A at the 0(5)-meter point]

Let X, Y and Z designate the positions of B, C and D, respectively.

X, Y, Z will take on random real values in the interval [0, 5].

At this point, if we were to plot out the points as before, we would need

a 3-dimensional coordinate space. This is impractical to visualize, both

mentally and on paper, and it is impossible when there are more than

four children. Hence, instead of this geometrical approach, we will adopt

another approach which works well with any number of children. This

approach involves triple integrals. This might be a foreign concept for

some readers, but it is very powerful and handy. Also, in essence, a triple

integral is only three integrals taken in a row, from inside to out.

We will first suppose that X < Y < Z. With four people on the swim ring,

there will be some conditions on each variable so that the

social distancing standards are met. The first is the inequality 1 ≤ X ≤

2, the second is the inequality X + 1 ≤ Y ≤ 3 and the third is the

inequality Y + 1 ≤ Z ≤ 4. As long as the positions of B, C and D respect

these inequalities, social distancing standards will be met. As we have

mentioned before, each of these three random variables (X, Y, Z) has the

constant function 1/5 m⁻¹ as its probability density function. Now the

only thing left for us to do is to build the triple integral. Let P be the

probability we are interested in when the supposition X < Y < Z holds.

$$P = \int_1^2 \int_{x+1}^3 \int_{y+1}^4 \left(\frac{1}{5}\ \mathrm{m}^{-1}\right)^{3} dz\, dy\, dx = \frac{1}{125} \int_1^2 \int_{x+1}^3 \int_{y+1}^4 1\, dz\, dy\, dx$$

In order to evaluate this triple integral, we will evaluate each integral

one by one starting from inside to outwards.

$$P = \frac{1}{125} \int_1^2 \int_{x+1}^3 (3 - y)\, dy\, dx = \frac{1}{125} \int_1^2 \left(\frac{x^2}{2} - 2x + 2\right) dx = \frac{1}{125} \cdot \frac{1}{6} = \frac{1}{750}$$

This is the result under the supposition that X < Y < Z. However,

in reality, this ordering can be made in 3! different ways. To account

for this, we need to multiply the value that we have found by 6. Then, the

correct result turns out to be 6/750 = 1/125. It turns out, after all, that Olivia's

father is right to be worried. But it is not exactly Olivia's fault either. After

all, children only care about getting together with friends and having

fun, not about social distancing!
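All three values, 3/5, 4/25 and 1/125, can be confirmed with a short simulation; the code below is my own sketch, not part of the original solution. It relies on the fact that everyone keeps an arc distance of at least 1 meter exactly when every gap between neighboring children, going around the ring, is at least 1 meter.

```python
import random

def distancing_prob(n, c=5.0, s=1.0, trials=100_000, seed=3):
    """Estimate the probability that n uniform random points on a
    circle of circumference c are pairwise at least s apart in arc."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pts = sorted(rng.uniform(0, c) for _ in range(n))
        gaps = [b - a for a, b in zip(pts, pts[1:])]
        gaps.append(c - pts[-1] + pts[0])
        # distancing holds iff every neighboring gap is at least s
        if min(gaps) >= s:
            hits += 1
    return hits / trials

for n in (2, 3, 4):
    print(n, distancing_prob(n))   # theory: 3/5, 4/25, 1/125
```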

Also notice that you can use this integration method for any number

of children. The only thing that will be different is the number of multiple

integrals. Geometrically, the increasing number of integrals corresponds to

the increasing number of dimensions.

This was the solution to the third question, or at least one part of it.

The real exciting part now follows. The method using multiple integrals

was a little bit advanced. However, as is the case with each problem in

the book, there is a simple method to this third question as well. We will

derive a formula that gives us the answer to this question and all of its

variants. However, our derivation will depend mainly on intuition and not

rigorous proof.

To generalize, let’s say that the circumference of the circle is c, the

required social distance in arc length is s, and the number of children is

n.

Assume that all n children take their place on the circle randomly.

Choosing any point on the circle, let us move clockwise, numbering all of

the children as 1, 2, 3, 4, . . . , n. These children will divide the circle into

n arcs. Let $x_i$ denote the length of the arc between the i-th and (i + 1)-th

children.²


Regardless of how the children are distributed on the circle, the sum

of all $x_i$ will equal the circumference of the circle, c:

$$x_1 + x_2 + x_3 + \ldots + x_n = c$$

Let us now take into account what is needed for the social distancing

standards to be met. There are to be no children within an s-meter range to

either side of each child. To make sure everyone complies with this, we

should think of the children not as single points, but as arcs of the circle

that are s meters in length, with the actual position of the children being

the midpoint of these arcs. This model makes sure that there is an empty

space of s/2 meters to either side of each child. Placing these arcs

on the circle will make sure that there is at least a distance of s meters

between any two children.

Each of the n children will correspond to an s meter long arc. When

we place these n arcs on the circle randomly, the circle will be divided into

n other arcs. Similar to what we did before, let us express the lengths of

these arcs by $y_1, y_2, y_3, \ldots, y_n$. However, this time the sum of all of these

lengths will be c − n · s, since we subtract all n arcs of length s from the

circumference c.

² As the only exception, we will call the length of the arc between the n-th and 1st

children $x_n$.


Thus, regardless of how the arcs are placed on the circle, the following

equation will hold true:

$$y_1 + y_2 + y_3 + \ldots + y_n = c - n \cdot s$$

You can think of these two equations in the following way: when n

children get on the swim ring randomly, they can be distributed in any

way along the length c. However, in order for them to be able to meet

the social distancing standards, they must be distributed on the length of

c − n · s. We could say that social distancing restricts where children can

be placed on the circle into a smaller region than the whole circumference.

Then, the probability of n children adhering to social distancing can be

thought of as (c − n·s)/c, because out of all the cases that can occur on the

length c, only those that occur on the length c − n · s can meet the social

distancing standards. But this does not give us the correct answer by itself.

We also need to take the (n − 1)-th power of this value to get the correct

answer. This is due to the fact that the equations $x_1 + x_2 + x_3 + \ldots + x_n = c$

and $y_1 + y_2 + y_3 + \ldots + y_n = c - n \cdot s$ represent regions in (n − 1)-dimensional

space, because both equations consist of n variables. To sum up, the

probability for the children to adhere to social distancing is

$$\left(\frac{c - n \cdot s}{c}\right)^{n-1}$$

When it comes to the two equations representing regions in (n − 1)-dimensional

space, you can think of it in the following way: In our

previous method of solution, when we were dealing with four children, we

had to work with a triple integral or use a three-dimensional coordinate

space. Similarly, when we had two children at hand, we were working on

the coordinate plane. From these examples, you can guess that we will

be dealing with an (n − 1)-dimensional space when we have n children. You

might be confused as to why we had to take the (n − 1) th power instead

of multiplying by n − 1. Just think of it this way: When we want to

find the volume of a cube with an edge length of three meters, we do

not multiply by 3, but we take the 3 rd power. The length of one edge

is one-dimensional. On the other hand, the volume of a cube is in three

dimensions. Similarly, the expression (c − n·s)/c in our problem is the ratio of

two lengths, however, we actually want the (n − 1) dimensional equivalent

of this value, which requires us to take its (n − 1) th power.

Thus, if we let P(c, n, s) denote the probability that n children randomly

picking a point on a circle with circumference c are all distant from

each other by at least s in arc length, then

$$P(c, n, s) = \left(\frac{c - n \cdot s}{c}\right)^{n-1}$$

If you do not feel convinced by this formula, you have every right not to

be, as the explanation here was far from a serious proof. But I hope

that you have at least understood why it makes sense. We can now try

to reproduce the earlier results we have found in order to see whether the

formula agrees in results.

In our question, the circumference of the circle (c) was 5 meters and

the social distancing distance (s) was 1 meter. With these conditions:

The probability that a single child respects social distancing is $\left(\frac{5 - 1\cdot1}{5}\right)^{1-1} = 1$.

The probability that two children respect social distancing is $\left(\frac{5 - 2\cdot1}{5}\right)^{2-1} = \frac{3}{5}$.

The probability that three children respect social distancing is $\left(\frac{5 - 3\cdot1}{5}\right)^{3-1} = \frac{4}{25}$.

The probability that four children respect social distancing is $\left(\frac{5 - 4\cdot1}{5}\right)^{4-1} = \frac{1}{125}$.

As you can see, the formula that we have derived agrees with every result.
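The same check can be scripted in a couple of lines; this is my own addition, using exact fractions so that nothing is lost to rounding. It evaluates P(c, n, s) = ((c − n·s)/c)^(n−1) for the values in our question.

```python
from fractions import Fraction

def p_distancing(c, n, s):
    """P(c, n, s) = ((c - n*s) / c) ** (n - 1): the probability that n
    children on a ring of circumference c keep arc distance s."""
    return Fraction(c - n * s, c) ** (n - 1)

for n in range(1, 5):
    print(n, p_distancing(5, n, 1))   # 1, 3/5, 4/25, 1/125
```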

Additional Problem

I had told you that the first and the third questions in this problem

were independent of each other. Now we will change this with a slight

modification and get a bit of conditional probability involved. If we

are given that the four children who pick a random spot to climb up on the

ring are not in the same quarter of the circle, what is the probability that

they are properly social distancing? (Hint: You can use Bayes’ theorem.)

Club, Bowling Alley or Theater

Sam is a very social young lady with a large circle of friends. They

meet up every week on Sundays. During these meet-ups, they go to one

of three locations: the theater, the bowling alley, or the club.

If they went to the theater last Sunday, they have a 20% probability

to go to the theater again, a 30% probability to go to the bowling alley,

and a 50% probability to go to the club.

If they went to the bowling alley last Sunday, they have a 40% probability

to go to the bowling alley again, a 30% probability to go to the

theater, and a 30% probability to go to the club.

If they went to the club last Sunday, they have a 40% probability to

go to the club again, a 20% probability to go to the theater, and a 40%

probability to go to the bowling alley.

This Sunday, Sam and her friends went to the theater. What is the

probability, then, that they go to the theater again two weeks later? 5

weeks later? 50 weeks later?

Bonus Problem: In the long run, what percentage of all Sundays will

Sam and her friends have spent in the theater?

Note that in the solution of this problem, Markov chains and a type

of matrix known as a transition matrix will be used. If you have no prior

knowledge of matrices, you should at least get a brief idea of what a matrix

is, and how matrix multiplication is carried out. It also helps to use a

calculator for the matrix operations.

Solution

If you try to solve this problem by hand, you will quickly realize that

it takes a great deal of time. This is only one of the reasons why we will be

using matrices and letting calculators carry out the matrix operations in

this solution. I will first provide a quick explanation on matrices, however

I will only go over the bare-bones basics that you need to know in order

to understand the solution to this problem. Matrices are the foundation

of linear algebra and have a huge range of usage. We will only deal with a

small portion of this range, a portion which is used in probability theory.

I have tried to abstain from math-heavy jargon while explaining matrices,

as I have done in the rest of the book. So, it is safe to regard the

following explanation as only a simple introduction.

What is a matrix? In the most basic terms, you can think of matrices

as a table of numbers. Every matrix has a certain number of rows and

columns. What you see below is the general form of a matrix:

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

There are m rows and n columns in this matrix. We say that this

matrix has dimensions m × n. In the intersection of each row and column,

there is an element. These elements are named $a_{ij}$ ($i \in \{1, 2, \ldots, m\}$,

$j \in \{1, 2, \ldots, n\}$), depending on which row and column they belong to.

There are some operations defined on matrices. For instance, two matrices

with the same dimensions can be added. This operation, called

matrix addition, can be done by adding up the elements in the two matrices

which are in the same row and column. Another operation is

the multiplication of a matrix by a real number. This operation, called

scalar multiplication, involves multiplying all of the elements in a matrix

by the same real number.

One other operation is the multiplication of two matrices. In order for

two matrices to be multiplied, the number of columns of the first matrix

must be equal to the number of rows of the second matrix. For example,

let A be an m × n matrix and let B be an n × k matrix. In this case, the

product AB is defined, because both the number of columns of A and the

number of rows of B are equal to n. The product of this multiplication

will be a matrix of dimensions m × k. That is,

$$A_{m \times n}\, B_{n \times k} = C_{m \times k}$$

Notice that if we were to switch the places of A and B, the matrix

multiplication would not have been defined. Matrix multiplication is not

a commutative operation. That is, when we change the order of the two

matrices in the multiplication, we might get a different result or no result

at all (AB ≠ BA).

How do we carry out matrix multiplication? I will first show you the

general definition of matrix multiplication, then go over an example. Let

A be an m × n matrix and B be an n × k matrix.

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \qquad B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1k} \\ b_{21} & b_{22} & \cdots & b_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \cdots & b_{nk} \end{bmatrix}$$

If we call the product of these two matrices as C (that is, AB = C )

the matrix C will be as follows:

$$C = \begin{bmatrix} a_{11}b_{11} + \cdots + a_{1n}b_{n1} & a_{11}b_{12} + \cdots + a_{1n}b_{n2} & \cdots & a_{11}b_{1k} + \cdots + a_{1n}b_{nk} \\ a_{21}b_{11} + \cdots + a_{2n}b_{n1} & a_{21}b_{12} + \cdots + a_{2n}b_{n2} & \cdots & a_{21}b_{1k} + \cdots + a_{2n}b_{nk} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1}b_{11} + \cdots + a_{mn}b_{n1} & a_{m1}b_{12} + \cdots + a_{mn}b_{n2} & \cdots & a_{m1}b_{1k} + \cdots + a_{mn}b_{nk} \end{bmatrix}$$

That is, if we designate the element on the i-th row and j-th column of C

as $c_{ij}$, then $c_{ij}$ will be equal to

$$c_{ij} = \sum_{t=1}^{n} a_{it}\, b_{tj}$$

You might be wondering why matrix multiplication is defined in this

way. Why don’t we just multiply the elements in the two matrices which

are in the same position, as is the case with matrix addition? There is a

very real and satisfying explanation as to why matrix multiplication is defined

in this way. However, you need more than an elementary knowledge

of linear algebra in order to understand this explanation. For now, just

accept it as it is and move on. Let us go on to work an example. Taking

two matrices with dimensions 3 × 2 and 2 × 3,

$$A_{3\times2} = \begin{bmatrix} 1 & 2 \\ 4 & 5 \\ 3 & 5 \end{bmatrix} \qquad B_{2\times3} = \begin{bmatrix} 3 & 2 & 5 \\ 2 & 4 & 3 \end{bmatrix} \qquad A_{3\times2}\, B_{2\times3} = C_{3\times3}$$

I have colored two elements each from matrices A and B. The colored

elements of A are in the first row, whereas those of B are in the first column.

Then, the element in the intersection of the first row and the first column

in the product matrix C will be calculated as follows: the elements in

yellow will be multiplied, then the elements in green will be multiplied,

and finally the two products will be added. That is, 1 · 3 + 2 · 2 = 7.

This value is the element in the intersection of the first row and the first

column of C. The other elements of C will also be calculated much in the

same manner, and the result will be the following:

$$C = \begin{bmatrix} 1\cdot3+2\cdot2 & 1\cdot2+2\cdot4 & 1\cdot5+2\cdot3 \\ 4\cdot3+5\cdot2 & 4\cdot2+5\cdot4 & 4\cdot5+5\cdot3 \\ 3\cdot3+5\cdot2 & 3\cdot2+5\cdot4 & 3\cdot5+5\cdot3 \end{bmatrix} = \begin{bmatrix} 7 & 10 & 11 \\ 22 & 28 & 35 \\ 19 & 26 & 30 \end{bmatrix}$$
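If you would like to check this product yourself without a calculator, the row-by-column rule translates directly into a few lines of Python. This snippet is my own sketch (the function name is mine, not from the book):

```python
def matmul(A, B):
    """Multiply matrices given as lists of rows, using the rule
    c_ij = sum over t of a_it * b_tj."""
    assert len(A[0]) == len(B), "inner dimensions must match"
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [4, 5], [3, 5]]
B = [[3, 2, 5], [2, 4, 3]]
print(matmul(A, B))   # [[7, 10, 11], [22, 28, 35], [19, 26, 30]]
```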

I think you now have at least a slight idea of how matrix multiplication

works. If you are still confused, you can consult the internet or other

resources. Anyway, why did we learn about matrices in the first place?

How does matrix multiplication have anything to do with this problem?

We will soon see that we can show the probability we are looking for

very easily by means of matrix multiplication. But won’t it take too long

to do all the matrix multiplications? Fortunately, we have calculators

that carry out matrix multiplications very quickly, and they are easily

accessible online. This will make our lives much easier than attacking

the question by hand. Let us start with the solution now.

Let T, B and C designate the theater, the bowling alley and the club,

respectively. Let X_0 be the location that the group visited this Sunday.

We are given that the group visited the theater this Sunday. We can thus

say that X_0 = T. Let X_n be the location that the group visits n Sundays

after this one. For instance, the group will visit X_1 next week and X_2

the week after, and so on. Each X_n is in reality a random variable.¹ The

variable X_n will take one of the values in the set {T, B, C}. We are asked

the three following probabilities:

$$P(X_2 = T) = ? \qquad P(X_5 = T) = ? \qquad P(X_{50} = T) = ?$$

We are given several probability values in the problem. If we know

where the group went on the (n − 1) th Sunday, we know the probabilities

of where they will go on the n th Sunday. Notice that in order to estimate

where the group will go on the next Sunday, it is sufficient to know where

they went this Sunday; knowing where they went on earlier Sundays is of

no importance. In other words, in order to guess where the group will go

on any Sunday, the only information we need is where they went on the

previous Sunday. We can express this in the following way:

$$P(X_n = j \mid X_{n-1} = i, X_{n-2} = i_{n-2}, \ldots, X_1 = i_1, X_0 = i_0) = P(X_n = j \mid X_{n-1} = i)$$

The equation we have written above is known as the Markov property in

probability theory. The sequence of random variables X 0 , X 1 , X 2 , . . . , X n

that we have just constructed is known as a Markov chain. A Markov

chain is a stochastic model describing a sequence of possible events in

which the probability of each event depends only on the state attained in

the previous event. Stochastic processes are, in simple terms, processes

that evolve randomly as time progresses. The random walks encountered

in the previous questions of the book were, in fact, also stochastic processes.

Markov chains are a model frequently used in areas such as mathematics,

physics and economics to model memory-less stochastic processes. There

are mainly two types of Markov chains: continuous-time and discrete-time.

The one we need in our problem is discrete-time, because the group

goes to a location once a week. Each of these three choices of location is

called a state of the Markov chain. In every Markov chain, the probability

of going from one state to another is defined. These probabilities are

called transition probabilities and are shown as $q_{ij}$. The value $q_{ij}$ shows

the probability of moving from state i to state j in the next step, and is

[1] A random variable is a variable which can take multiple different values, each with a certain probability.

defined mathematically in the following manner:

q_ij = P(X_n = j | X_{n-1} = i)

In our problem, we can show each venue and the transition probabilities

between venues with the following model:

In Markov chains, the transition probabilities are shown in a special kind of matrix called a transition matrix. Let us think of a Markov chain where the set of states is {1, 2, 3, . . . , m}. The transition matrix for this chain will be of dimensions m × m. Each row and column of the transition matrix represents a state of the Markov chain. The rows tell us what the current state is, and the columns tell us what the next state will be. That is, the element in the ith row and jth column of the transition matrix is the probability q_ij, the probability of transitioning from state i to state j in the next step. Denoting the transition matrix by the letter Q from now on, the transition matrix in our problem will be as follows:

         S      B      D
    S [ 2/10   3/10   5/10 ]
Q = B [ 3/10   4/10   3/10 ]
    D [ 2/10   4/10   4/10 ]

The first row and column of this transition matrix correspond to the theater S, the second row and column correspond to the bowling alley B, and the third row and column correspond to the club D. The elements of this matrix are the transition probabilities between venues. The element in the (i, j) position of the matrix shows the probability of the group going to j after having last been to i. For example, the element in the (1, 2) position of the matrix is the probability q_12, or in other words, the probability q_SB. This element is the probability that the group goes to the bowling alley, provided that they last went to the theater. As we are already given that this probability is 3/10, that is what we have entered in the matrix.

Notice that the sum of all the elements in each row of the transition matrix equals 1. We have already said that the rows show us the current state, and the columns show us the state one step later. In all cases, we will move from the current state to some state (possibly the same one). It makes sense that summing the probabilities of all the possible cases yields 1. This is valid for all transition matrices: the sum of all the elements in each row will always equal 1.
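As a quick aside, this row-sum property is easy to check by machine. Here is a small sketch in Python (my own illustration, not part of the book) that stores Q with exact fractions and verifies that every row sums to 1:

```python
from fractions import Fraction as F

# Transition matrix Q of the Sunday-outing chain; states in order S, B, D.
Q = [
    [F(2, 10), F(3, 10), F(5, 10)],  # from S (theater)
    [F(3, 10), F(4, 10), F(3, 10)],  # from B (bowling alley)
    [F(2, 10), F(4, 10), F(4, 10)],  # from D (club)
]

# In any valid transition matrix, each row is a probability distribution.
for row in Q:
    assert sum(row) == 1

print("all row sums equal 1")
```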

We now have the transition matrix for our problem. What will we do now? Each element in the transition matrix gives us a probability for the next step. What do we have to do if we want to know the probabilities of two steps later? That is, how do we calculate the probability of moving to state j from state i in two steps?

P(X_2 = j | X_0 = i)

To start out from state i and to move to state j in two steps, all we need is X_0 = i and X_2 = j to hold. The intermediate state X_1 can be any state. Then, using the transition probabilities q_ij, we can write the following equation:

P(X_2 = j | X_0 = i) = Σ_{k=1}^{m} q_ik q_kj

What the summation symbol on the right-hand side does is the following: we multiply the probability of moving from state i to an intermediate state k, q_ik, by the probability of moving from state k to state j, q_kj. To account for all of the possible states k can be, we use the summation symbol. That is, we account for the fact that X_1 = k, where k can be any state in the state set of the Markov chain. This expression with the summation symbol might remind you of something. We had just

with the summation symbol might remind you of something. We had just

said that each element (c ij ) of the product of two matrices was equal to

this:

n∑

c ij = a it b tj

t=1

The a_it in this expression is an element of the first matrix, and b_tj is an element of the second matrix. The upper bound of the summation symbol, n, is equal to the number of columns of the first matrix and the number of rows of the second matrix. Therefore the expression

P(X_2 = j | X_0 = i) = Σ_{k=1}^{m} q_ik q_kj

can be thought of as the element in the (i, j) position of the product of two transition matrices. If we call our transition matrix Q, we can designate the product of the transition matrix with itself by Q^2. Let q_ij^(2) be the element in the (i, j) position of the matrix Q^2. Then,

q_ij^(2) = Σ_{k=1}^{m} q_ik q_kj

Let's take a moment to understand what we have just found. The value q_ij^(2) is the probability of moving from the current state i to state j in two steps; that is, the probability of the desired transition happening in two steps. Taking the square of the transition matrix, the element in position (i, j) shows us the probability of moving from state i to state j in two steps. We can try and see whether this is true with the

given values in the problem. We can calculate the probability of the group

going to the theater, then going to the theater again two weeks later using

the matrix-squaring method, and then without using matrices. If we find

the two results to be the same, then chances are we are on the right track.

Q^2 = [ 2/10  3/10  5/10 ] [ 2/10  3/10  5/10 ]
      [ 3/10  4/10  3/10 ] [ 3/10  4/10  3/10 ]
      [ 2/10  4/10  4/10 ] [ 2/10  4/10  4/10 ]

For example, the element in the (1, 1) position of the product is 2/10 · 2/10 + 3/10 · 3/10 + 5/10 · 2/10. Carrying out all nine such sums,

Q^2 = [ 23/100  38/100  39/100 ]
      [ 24/100  37/100  39/100 ]
      [ 24/100  38/100  38/100 ]

Recall that the first row and column represented the theater. Then,

the element in the first row and first column of the matrix Q^2 will give us

the probability that the group goes to the theater, and goes to the theater

again two weeks later. As we can see, the value in this position is 23/100.

How would we have found this probability without using matrices? We

already know that they first go to the theater. There are three venues they

can go to in the next week: the theater with 2/10 probability, the bowling

alley with 3/10 probability and the club with 5/10 probability. Depending

on where they go the next week, their probability of going to the theater

in the week after that will be 2/10, 3/10 and 2/10, respectively. Then, the

probability for the group to go to the theater two weeks after their first

visit will be

2/10 · 2/10 + 3/10 · 3/10 + 5/10 · 2/10 = 23/100

Observe that we have found the same value as the element in the (1, 1) position of Q^2. This holds for any element in the matrix Q^2.
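If you prefer to let a computer carry out the nine sums q_ik · q_kj, a sketch like the following (my own, using exact fractions rather than the book's hand computation) reproduces the (1, 1) entry of Q^2:

```python
from fractions import Fraction as F

Q = [[F(2, 10), F(3, 10), F(5, 10)],
     [F(3, 10), F(4, 10), F(3, 10)],
     [F(2, 10), F(4, 10), F(4, 10)]]

def mat_mul(A, B):
    """Multiply two square matrices: c_ij = sum over k of a_ik * b_kj."""
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

Q2 = mat_mul(Q, Q)
print(Q2[0][0])  # theater now, theater again two weeks later
```

Running this prints 23/100, matching the hand computation.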

How would it work if we were interested in the probabilities not of two weeks, but of n weeks later? You can probably guess that in that case, we would have to take the nth power of the transition matrix. Indeed, the element in the (i, j) position of the matrix Q^n gives us the probability that the group goes to venue j n weeks after going to venue i. We can name this probability q_ij^(n) and express it in the following equation:

q_ij^(n) = P(X_n = j | X_0 = i)

This means that as long as we know the current situation, we can know

the probabilities of what will happen in n steps by taking the nth power

of the transition matrix. I sincerely hope that this makes sense to you,

especially along with the definition of matrix multiplication. In case it

does not, you can try to prove the above equation by induction.

Let us return to the problem. We have just found that the probability for the group to go to the theater two weeks later is 23/100. However, we are also asked the same probability for 5 weeks and for 50 weeks later. It is nearly impossible to find these two values without using matrices; it simply takes an incredible amount of time and calculation. However, it is very easy to find them by taking the 5th and 50th powers of the transition matrix. Of course,

we will do this by using a calculator. This is why the use of matrices is

so practical. We can express problems that are impractical to solve by

hand in matrices and use calculators to carry out matrix operations very

quickly.
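A calculator is one option; a few lines of Python work just as well. The sketch below (my own illustration, not part of the book) computes Q^5 and Q^50 by repeated multiplication with exact fractions and reads off the (1, 1) entries:

```python
from fractions import Fraction as F

Q = [[F(2, 10), F(3, 10), F(5, 10)],
     [F(3, 10), F(4, 10), F(3, 10)],
     [F(2, 10), F(4, 10), F(4, 10)]]

def mat_mul(A, B):
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

def mat_pow(A, n):
    """n-th power of A by repeated multiplication (n >= 1)."""
    P = A
    for _ in range(n - 1):
        P = mat_mul(P, A)
    return P

q5 = mat_pow(Q, 5)[0][0]    # P(theater 5 weeks later | theater now)
q50 = mat_pow(Q, 50)[0][0]  # P(theater 50 weeks later | theater now)
print(float(q5), float(q50))
```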

Recall the transition matrix we had in our problem.

Q = [ 2/10  3/10  5/10 ]
    [ 3/10  4/10  3/10 ]
    [ 2/10  4/10  4/10 ]

Let us calculate the 5th and 50th powers of the matrix with the help of a calculator.

Q^5 = [ 0.23762  0.37623  0.38615 ]
      [ 0.23763  0.37624  0.38613 ]
      [ 0.23762  0.37624  0.38614 ]

Q^50 = [ 0.23762376237623762351  0.37623762376237623724  0.38613861386138613822 ]
       [ 0.23762376237623762351  0.37623762376237623724  0.38613861386138613822 ]
       [ 0.23762376237623762351  0.37623762376237623724  0.38613861386138613822 ]

In both of the matrices, we need to look at the element in the first row and the first column. When we take a look at Q^5, we see that the probability of the group going to the theater 5 weeks later is 0.23762. Similarly, the same value is 0.23762376237623762351 for Q^50. Thus we

have concluded our solution. However, I want to elaborate on a few more

related topics and to solve the bonus problem. We first need to make two

important realizations.

The probability for the group to go to the theater two weeks later is

0.23, five weeks later it is 0.23762, and fifty weeks later it is 0.23762376237623762351.

Notice that all three of these values are very close to each other. As the

weeks go on, it seems like the probability of the group going to the theater

approaches a certain number. You can probably guess that this number is approximately 0.2376, which we can write exactly as the fraction 24/101. So, given that Sam and her friends go to the theater on the first week, their probability of going to the theater again a very long time later is 24/101. In the same fashion, their probabilities of going to the bowling alley and the club turn out to be 0.3762 and 0.3861, which can be written as 38/101 and 39/101, respectively.
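These long-run fractions can also be checked empirically. The sketch below (my own illustration, with an arbitrarily chosen seed) simulates the chain for 200,000 Sundays and counts how often each venue comes up; the empirical frequencies land close to 24/101, 38/101 and 39/101:

```python
import random

random.seed(1)

# Transition probabilities for states S, B, D (theater, bowling alley, club).
Q = {"S": [("S", 0.2), ("B", 0.3), ("D", 0.5)],
     "B": [("S", 0.3), ("B", 0.4), ("D", 0.3)],
     "D": [("S", 0.2), ("B", 0.4), ("D", 0.4)]}

def step(state):
    """Draw the next state according to the current state's row of Q."""
    r = random.random()
    acc = 0.0
    for nxt, p in Q[state]:
        acc += p
        if r < acc:
            return nxt
    return nxt  # guard against floating-point rounding at the boundary

counts = {"S": 0, "B": 0, "D": 0}
state = "S"                      # the group starts at the theater
for _ in range(200_000):
    state = step(state)
    counts[state] += 1

for venue in "SBD":
    print(venue, counts[venue] / 200_000)
```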

The second important realization is this: the higher the power we take of the transition matrix, the more similar the rows of the resulting matrix become. We can easily observe this phenomenon in the matrices Q^5 and Q^50. The rows of these two matrices already look remarkably similar to each other. In fact, the three rows of Q^50 appear to be identical. In reality, they are not exactly the same, but they are so close that the calculator shows them as equal. As the weeks go on, that is, as we take larger powers of the transition matrix, all the rows of the matrix will become equal. What exactly does this mean? Recall

that each row showed which venue the group initially went to. As all of

the rows become equal, the initial choice of venue loses its importance.

Regardless of which venue the group first went to, in the long run, the

probabilities of going to each of the three venues will approach constant

values. We can show this in the following manner, using the concept of

limits:

lim_{n→∞} Q^n = [ 0.2376  0.3762  0.3861 ]
                [ 0.2376  0.3762  0.3861 ]
                [ 0.2376  0.3762  0.3861 ]

lim_{n→∞} Q^n = [ 24/101  38/101  39/101 ]
                [ 24/101  38/101  39/101 ]
                [ 24/101  38/101  39/101 ]

That is, regardless of where the group first went, in the long run the probabilities of going to S, B and D turn out to be 24/101, 38/101 and 39/101, respectively. Then, it becomes redundant to repeat the same row three times in a 3 × 3 matrix. We can show the same result in a single row, with a 1 × 3 row-matrix.[2] If we call this row-matrix d,

d = [ 24/101 38/101 39/101 ]

This vector shows the probabilities of going to S, B and D in the long run, that is, the probabilities that the random variable lim_{n→∞} X_n is equal to each of the three states in {S, B, D}. This vector is called the stationary distribution of the Markov chain. We will elaborate on this concept in a little while.

In a Markov chain, for any integer n (n ∈ {0, 1, 2, . . .}), we can define a vector showing the probabilities for the random variable X_n to be equal to each of the states in the set of states. This vector is called the marginal distribution vector. Letting M_n be the marginal distribution vector of the random variable X_n in our problem, M_n will be as follows:

M_n = [ P(X_n = S)  P(X_n = B)  P(X_n = D) ]

Hence, we can say that the vector d we have just found is equal to the vector lim_{n→∞} M_n:

lim_{n→∞} M_n = d = [ 24/101  38/101  39/101 ]

We can now find the vector M_0. As we are given in the problem, the first venue the group goes to is the theater. This means P(X_0 = S) = 1 in our notation. Then we can write M_0, the marginal distribution vector of the random variable X_0, as follows:

M_0 = [ 1  0  0 ]

In Markov chains, if we multiply the marginal distribution vector of the random variable X_n, M_n, with the transition matrix, the result is the vector M_{n+1}, namely, the marginal distribution vector of the random variable X_{n+1}. We can express this fact as follows:

M_{n+1} = M_n Q

I will not prove this equation, but the proof comes straight from the definition of matrix multiplication. You can go ahead and write out the proof if you wish. It is possible to generalize this equation further. If we multiply M_0 with the nth power of the transition matrix, we find the vector M_n:

M_n = M_0 Q^n

[2] Matrices which have a single row or a single column are also called vectors.

We can use this equation to find the marginal distribution vector of the random variable X_2 in our problem.

M_2 = M_0 Q^2

M_2 = [ 1  0  0 ] [ 23/100  38/100  39/100 ]
                  [ 24/100  37/100  39/100 ]
                  [ 24/100  38/100  38/100 ]

M_2 = [ 23/100  38/100  39/100 ]

We had defined the first element of the vector M_n as P(X_n = S). Then, the first element of M_2, 23/100, will give us the probability that the group goes to the theater two weeks later. This is in accordance with the result we found earlier. However, we did not need to find the marginal distribution vector earlier. It was sufficient to look at the first row and column of Q^2. This is because M_0 = [ 1  0  0 ]. That is, we knew for certain that the group initially went to the theater.
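The update M_{n+1} = M_n Q is a single vector-matrix product. The following sketch (mine, not the book's) applies it twice to M_0 = [1 0 0] and recovers M_2:

```python
from fractions import Fraction as F

Q = [[F(2, 10), F(3, 10), F(5, 10)],
     [F(3, 10), F(4, 10), F(3, 10)],
     [F(2, 10), F(4, 10), F(4, 10)]]

def vec_mat(v, A):
    """Row vector times matrix: (vA)_j = sum over i of v_i * a_ij."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A))]

M = [F(1), F(0), F(0)]   # M_0: the group starts at the theater for certain
for _ in range(2):       # apply M_{n+1} = M_n Q twice
    M = vec_mat(M, Q)

print(M)  # M_2
```

The first entry of the printed vector is 23/100, as found above.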

Anyway, all of this information was given in order to explain the concept of stationary distribution I had just mentioned. But what exactly is a stationary distribution? The stationary distribution of a Markov chain is a marginal distribution that stays the same regardless of how many steps are taken.

The stationary distribution of a Markov chain is expressed as a

row-vector. Each element of this vector is the probability of one of the

states in the state set of the Markov chain. This is why the elements of

the stationary distribution vector always sum up to 1. When the stationary

distribution vector is multiplied with the transition matrix, the result

is the stationary distribution vector itself. That is, the transition matrix

does not alter the stationary distribution vector. For this reason, if the

initial marginal distribution vector is equal to the stationary distribution

vector, then the marginal distribution vector will stay equal to the stationary

distribution vector for all further steps. This is where the name

“stationary” comes from.

dQ = d

The vector d which satisfies the above equation is called the stationary distribution vector of the Markov chain. You might wonder whether every Markov chain has a stationary distribution, and if it does, whether it is unique. If in a Markov chain it is possible to move from any state to any other state in a finite number of steps, then there exists a unique stationary distribution for this Markov chain. Markov chains which satisfy this condition are called irreducible.

In the Markov chain belonging to our problem, it is possible to move

from any state to another in a finite number of steps. This means that

our Markov chain will have a unique stationary distribution vector. Let

us now find it, then:

d [ 2/10  3/10  5/10 ]
  [ 3/10  4/10  3/10 ] = d
  [ 2/10  4/10  4/10 ]

The stationary distribution vector d will be of dimension 1 × 3, since there are only three states. If we write the stationary distribution vector with three variables, as [ x  y  z ], we can write two equations as follows:

[ x  y  z ] [ 2/10  3/10  5/10 ]
            [ 3/10  4/10  3/10 ] = [ x  y  z ]
            [ 2/10  4/10  4/10 ]

x + y + z = 1

Carrying out the matrix multiplication,

[ (2x + 3y + 2z)/10   (3x + 4y + 4z)/10   (5x + 3y + 4z)/10 ] = [ x  y  z ]

[ (2x + 3y + 2z)   (3x + 4y + 4z)   (5x + 3y + 4z) ] = [ 10x  10y  10z ]

Thus we obtain 4 equations.

2x + 3y + 2z = 10x

3x + 4y + 4z = 10y

5x + 3y + 4z = 10z

x + y + z = 1

When we solve this system of equations, we find the values x =

24/101, y = 38/101, z = 39/101. Then the stationary distribution vector

of our Markov chain will be the following:

d = [ 24/101 38/101 39/101 ]

Multiplying this vector with the transition matrix, we see that we get

the vector itself.

[ 24/101  38/101  39/101 ] [ 2/10  3/10  5/10 ]
                           [ 3/10  4/10  3/10 ] = [ 24/101  38/101  39/101 ]
                           [ 2/10  4/10  4/10 ]
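The defining equation dQ = d is also easy to verify by machine. The sketch below (my own illustration, not part of the book) multiplies the candidate vector by Q with exact fractions and confirms that it comes back unchanged:

```python
from fractions import Fraction as F

Q = [[F(2, 10), F(3, 10), F(5, 10)],
     [F(3, 10), F(4, 10), F(3, 10)],
     [F(2, 10), F(4, 10), F(4, 10)]]

d = [F(24, 101), F(38, 101), F(39, 101)]

# dQ: the j-th entry is sum over i of d_i * q_ij
dQ = [sum(d[i] * Q[i][j] for i in range(3)) for j in range(3)]

assert dQ == d          # d is left unchanged by the transition matrix
assert sum(d) == 1      # d is a probability distribution
print("d is the stationary distribution")
```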

We had already found the vector d earlier without using the equation

dQ = d. We had used the fact that taking arbitrarily large powers of the

transition matrix gave us the values [ 24/101 38/101 39/101 ] in each

row of the resulting matrix. That is, in the long run, the probabilities of going to S, B and D turned out to be 24/101, 38/101 and 39/101, regardless of where the group went first. We can express this as such: regardless of what the initial marginal distribution vector is, the long-term marginal distribution vector will become [ 24/101  38/101  39/101 ]. That is, the long-term marginal distribution vector will be equal to the stationary distribution vector.

Therefore, we can say that the marginal distribution of the random variable lim_{n→∞} X_n, namely the vector lim_{n→∞} M_n, will be equal to the stationary distribution vector.

lim_{n→∞} M_n = lim_{n→∞} M_0 Q^n = d = dQ

Hence, in our problem the probability for the group to go to the theater after a long time will be equal to 24/101. We can also say that Sam and her friends will go to the theater on 24 Sundays out of every 101. This is the answer to the bonus problem. Notice that if we were not given that the group initially went to the theater, the marginal distribution vector would have been different. However, the probability of going to the theater in the long run would still have been 24/101, because as we have said, whatever the initial marginal distribution vector is, it will always converge to the stationary distribution in the long run.

Does the marginal distribution of every Markov chain converge to the stationary distribution in the long run? In order for this to hold, the Markov chain must have one more property in addition to irreducibility, and this property is called aperiodicity. If all of the elements of at least one power of the transition matrix of a Markov chain are greater than zero, then the Markov chain is aperiodic. Since the Markov chain in our problem satisfies both of these conditions, its marginal distribution converges to its stationary distribution. This is why we were able to find the stationary distribution by two different methods. The first method is to solve the equation dQ = d, and the second method is to evaluate the limit lim_{n→∞} M_n. In either case, the stationary distribution vector is found to be [ 24/101  38/101  39/101 ].

Thus we have concluded this colossus of a problem. I was not sure whether to include this problem in the book, as it requires knowledge in a more specific area of mathematics. Without prior knowledge of linear algebra and matrices, it is practically impossible to solve this problem. I finally decided to include it, since just knowing about Markov chains simplifies the question by a great deal; with that knowledge, this question is just as simple as the others in the book. The other reason is

because of the prominence and practicality of Markov chains in modelling

stochastic processes. It would be unforgivable to not mention Markov

chains in a book on probability. If you want a more in-depth explanation of

the concepts explained in this problem, you can find many great resources

online.

Additional Problem

Kevin, the video game addict, has three favorite games he spends all of his time on: Age of Empires, Dota 2 and Counter-Strike. Every day, Kevin plays one of these three games.

If he plays Age of Empires on any day, on the following day he has a

15% probability to play Age of Empires, a 45% probability to play Dota

2, and a 40% probability to play Counter-Strike.

If he plays Dota 2 on any day, on the following day he has a 30%

probability to play Age of Empires, a 35% probability to play Dota 2, and

a 35% probability to play Counter-Strike.

If he plays Counter-Strike on any day, on the following day he has a

55% probability to play Age of Empires, a 25% probability to play Dota

2, and a 20% probability to play Counter-Strike.

Which game will Kevin play the most in the long run? What percentage

of his days will he have spent playing this game in the long run?
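Once you have worked the problem out by hand, a sketch like the following (my own, not part of the book) can be used to check your answer: it takes a large power of Kevin's transition matrix, so that every row converges to the stationary distribution:

```python
# States in order: Age of Empires, Dota 2, Counter-Strike.
P = [[0.15, 0.45, 0.40],
     [0.30, 0.35, 0.35],
     [0.55, 0.25, 0.20]]

def mat_mul(A, B):
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
            for i in range(m)]

# A high power of P: every row converges to the stationary distribution.
M = P
for _ in range(63):
    M = mat_mul(M, P)

games = ["Age of Empires", "Dota 2", "Counter-Strike"]
for g, p in zip(games, M[0]):
    print(f"{g}: {p:.4f}")
```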

Smith Eats Turkey After Beating His Opponent

In a game of bowling, knocking down all the pins is called a "strike". Additionally, consecutive strikes have special names of their own. For example, two strikes in a row are called a "double", whereas three strikes in a row are called a "turkey". You might be wondering where the name "turkey" came from. In the past, people who scored three strikes in a row were granted a turkey. That is why three strikes in a row are called a "turkey". Today, however, since it is much more common to score three strikes in a row, you will not be granted a real turkey if you manage to score one.

Let us move on to the problem. On a Sunday morning, Kevin and Smith decided to have a bowling competition between themselves. Kevin gave Smith a handicap because he thinks that he is super cool. They agreed on the terms as follows: Smith and Kevin will throw their bowling shots simultaneously on different lanes. All of the pins will be reset after each shot. Kevin will try to score a "turkey" whereas Smith will try to score a "double". They will keep playing until one of them reaches his goal. The one who reaches his goal first is the winner. The loser will buy a turkey for the winner, because this duo, and especially Smith, loves eating turkey. If they reach their goals at the same time, the game will end in a tie and they will eat the turkey together.

Each of them has a 50% probability of striking on any shot. Shots are independent of each other, and Kevin's shots are independent of Smith's. What is Smith's probability of winning the game? In other words, what is Smith's probability of enjoying a well-done turkey by himself this evening?

Solution

The reason this problem seems difficult is that there are two independent

processes. Kevin and Smith are throwing shots independently. We

must analyze both processes simultaneously to find their probabilities of winning. Moreover, we don't know when the competition is going to end. In other words, there may be 2 shots or 2,000 shots until the end. Therefore, the calculated probabilities must account for every possible number of shots.

Fortunately, there is a common method to solve such problems. You

will be able to solve many problems with the help of this method once you

learn it.

Let's determine the mathematical notation to be used in the solution. Assume that Kevin has scored k strikes in a row whereas Smith has scored s strikes in a row. We may display the score before any shot as (k, s). There are 6 possible situations that can occur before the end of the competition:

{(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (2, 1)}

The situation at the beginning of the game is (0, 0). After each shot of the pair, this situation will keep changing until the end of the game. On any shot, s will increase by 1 if Smith scores a strike. If he cannot, his strike series will reset and become s = 0. The same applies to k. The game ends when either k reaches 3 or s reaches 2. As a result, if we have one of the situations below after any shot, the game ends.

{(3, 0), (3, 1), (3, 2), (2, 2), (1, 2), (0, 2)}

The most important characteristic of this game is that past moves have no significance. Smith's probability of winning the game depends solely on his situation at that moment. We have already said that the situation was (0, 0) at the beginning of the game. If the situation (0, 0) is reached again, Smith and Kevin will have the same probabilities of winning as at the start. Put differently, past situations have no impact on the current state of the game.

Assume that Smith's probability of winning the game is P(k, s) while the game is at situation (k, s). Since the situation at the beginning was (0, 0), Smith's probability of winning is P(0, 0). After one round from the situation (0, 0), the situation will turn into (0, 0), (1, 0), (0, 1) or (1, 1), each with probability 1/4. We can therefore express P(0, 0) in an equation as follows:

P(0, 0) = (1/4)P(0, 0) + (1/4)P(1, 0) + (1/4)P(0, 1) + (1/4)P(1, 1)

For the other situations (k, s) attainable during the game, we can write similar equations for P(k, s):

P(1, 0) = (1/4)P(0, 0) + (1/4)P(2, 0) + (1/4)P(0, 1) + (1/4)P(2, 1)
P(0, 1) = (1/4)P(0, 0) + (1/4)P(1, 0) + (1/4)P(0, 2) + (1/4)P(1, 2)
P(1, 1) = (1/4)P(0, 0) + (1/4)P(2, 0) + (1/4)P(0, 2) + (1/4)P(2, 2)
P(2, 0) = (1/4)P(0, 0) + (1/4)P(3, 0) + (1/4)P(0, 1) + (1/4)P(3, 1)
P(2, 1) = (1/4)P(0, 0) + (1/4)P(3, 0) + (1/4)P(0, 2) + (1/4)P(3, 2)

We have already defined P(k, s) as Smith's probability of winning the game while the game is at situation (k, s). Therefore, P(3, 0) and P(3, 1) must be equal to 0, because in these situations Kevin has already scored a turkey and won the game. Moreover, P(3, 2) is also equal to 0, because it means that Kevin and Smith have reached their goals simultaneously and the game ends in a tie. Additionally, P(0, 2), P(1, 2) and P(2, 2) are equal to 1, because in these situations Smith has already scored a double and wins the game. After substituting these values, we obtain a system of 6 equations in 6 unknowns:

P(0, 0) = (1/4)P(0, 0) + (1/4)P(1, 0) + (1/4)P(0, 1) + (1/4)P(1, 1)
P(1, 0) = (1/4)P(0, 0) + (1/4)P(2, 0) + (1/4)P(0, 1) + (1/4)P(2, 1)
P(0, 1) = (1/4)P(0, 0) + (1/4)P(1, 0) + 1/2
P(1, 1) = (1/4)P(0, 0) + (1/4)P(2, 0) + 1/2
P(2, 0) = (1/4)P(0, 0) + (1/4)P(0, 1)
P(2, 1) = (1/4)P(0, 0) + 1/4

Although it takes time, you may solve this system without using a calculator. As an example, I will show you my solution steps. Starting from the third equation, we will substitute in the expression for P(1, 0).

P(0, 1) = (1/4)P(0, 0) + (1/4)P(1, 0) + 1/2
16P(0, 1) = 4P(0, 0) + 4P(1, 0) + 8

After this step, let's rewrite 4P(1, 0):

16P(0, 1) = 4P(0, 0) + P(0, 0) + P(0, 1) + P(2, 0) + P(2, 1) + 8
15P(0, 1) = 5P(0, 0) + P(2, 0) + P(2, 1) + 8
60P(0, 1) = 20P(0, 0) + 4P(2, 0) + 4P(2, 1) + 32

Now, let's rewrite 4P(2, 0) and 4P(2, 1):

60P(0, 1) = 20P(0, 0) + P(0, 0) + P(0, 1) + P(0, 0) + 33
59P(0, 1) = 22P(0, 0) + 33

In this way, we have written P(0, 1) in terms of P(0, 0) by making use of the third equation. Let's now use the first equation to write P(0, 1) in terms of P(0, 0) a second way.

P(0, 0) = (1/4)P(0, 0) + (1/4)P(1, 0) + (1/4)P(0, 1) + (1/4)P(1, 1)
3P(0, 0) = P(1, 0) + P(0, 1) + P(1, 1)
12P(0, 0) = 4P(0, 1) + 4P(1, 0) + 4P(1, 1)

Let's rewrite 4P(1, 0) and 4P(1, 1) in the last equation:

12P(0, 0) = 4P(0, 1) + 2P(0, 0) + 2P(2, 0) + P(0, 1) + P(2, 1) + 2
10P(0, 0) = 5P(0, 1) + 2P(2, 0) + P(2, 1) + 2
40P(0, 0) = 20P(0, 1) + 8P(2, 0) + 4P(2, 1) + 8

Finally, let's rewrite 8P(2, 0) and 4P(2, 1):

40P(0, 0) = 20P(0, 1) + 2P(0, 0) + 2P(0, 1) + P(0, 0) + 1 + 8

If we rearrange the equation, we will be able to write P(0, 1) in terms of P(0, 0):

22P(0, 1) = 37P(0, 0) − 9

So, we have written P(0, 1) in terms of P(0, 0) in two different ways. Let's write down these two equations together:

22P(0, 1) = 37P(0, 0) − 9
59P(0, 1) = 22P(0, 0) + 33

Once we solve this system of two equations, we find the value of P(0, 0) to be 1257/1699.

As a result, Smith's probability of winning the bowling game is 1257/1699, approximately 74%.

This is the answer to this problem. One may easily find the other probabilities of the form P(k, s), since we know the value of P(0, 0):

P(0, 0) = 1257/1699,  P(1, 0) = 1021/1699,  P(0, 1) = 1419/1699
P(1, 1) = 1331/1699,  P(2, 0) = 669/1699,   P(2, 1) = 739/1699
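These exact values can be sanity-checked by simulation. The sketch below (my own illustration, with an arbitrarily chosen seed) plays the game 200,000 times and estimates Smith's probability of winning; the estimate lands near 1257/1699 ≈ 0.7398:

```python
import random

random.seed(7)

def play_once():
    """Simulate one game; return True if Smith wins outright (no tie)."""
    k = s = 0  # current strike streaks of Kevin and Smith
    while True:
        k = k + 1 if random.random() < 0.5 else 0  # Kevin's shot
        s = s + 1 if random.random() < 0.5 else 0  # Smith's shot
        if k == 3 or s == 2:
            # Smith wins only if he has his double and Kevin has no turkey.
            return s == 2 and k < 3

trials = 200_000
wins = sum(play_once() for _ in range(trials))
print(wins / trials)  # should be close to 1257/1699
```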

You can use the same logic in similar problems. However, the more situations there are, the harder it becomes to solve the system of equations. If you remember, we had only 6 possible situations during the game in this problem:

{(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (2, 1)}

If there were more situations, the system of equations would become

more complicated. How can we solve the problem in such a situation?

I will now show two additional methods for solving this problem and even more complex versions of it. Both methods involve matrices. I won't explain everything in detail, since these are mostly topics of linear algebra, but of course you may want to look into them further. The first method is to represent the system of equations in matrix form. The second one is to model the game as a Markov chain.

Let’s start with the first method. Representing a system of equations

in a matrix form is actually not a different solution method. The ultimate

goal is still to solve the system of equations. However, employing matrices makes it very easy to solve the same system, because we can make our computers do the work for us. Only systems of linear equations can be shown in matrix form. Therefore, we consider only systems of linear equations.[1] The system of equations in this problem is indeed linear.

Assume that there are m linear equations in n unknowns. Let x_j denote the unknowns, a_ij the coefficients of the unknowns, and b_i the constants on the right-hand side of the system (i ∈ {1, 2, . . . , m}, j ∈ {1, 2, . . . , n}).

a_11 x_1 + a_12 x_2 + . . . + a_1n x_n = b_1
a_21 x_1 + a_22 x_2 + . . . + a_2n x_n = b_2
  ⋮
a_m1 x_1 + a_m2 x_2 + . . . + a_mn x_n = b_m

Assume there is a matrix of size m × n containing the coefficients of each equation.

[ a_11  a_12  ...  a_1n ]
[ a_21  a_22  ...  a_2n ]
[  ⋮     ⋮          ⋮   ]
[ a_m1  a_m2  ...  a_mn ]

This matrix is called the matrix of coefficients. Now, assume there is a column vector containing all of the unknowns.[2]

[ x_1 ]
[ x_2 ]
[  ⋮  ]
[ x_n ]

[1] Systems of linear equations consist of equations with only first-order terms.

[2] A column vector is a matrix comprising only a single column.

This matrix is called the matrix of unknowns. If we multiply the matrix of coefficients by the matrix of unknowns, we will obtain a column vector consisting of the terms on the left-hand side of each equation in the system of equations.

[ a_11  a_12  ...  a_1n ] [ x_1 ]   [ a_11 x_1 + a_12 x_2 + ... + a_1n x_n ]
[ a_21  a_22  ...  a_2n ] [ x_2 ]   [ a_21 x_1 + a_22 x_2 + ... + a_2n x_n ]
[  ⋮     ⋮          ⋮   ] [  ⋮  ] = [                  ⋮                   ]
[ a_m1  a_m2  ...  a_mn ] [ x_n ]   [ a_m1 x_1 + a_m2 x_2 + ... + a_mn x_n ]

In the system of equations, the left-hand side of each equation is naturally equal to the right-hand side. The column vector whose entries are the constants on the right-hand side of the equations is called the matrix of constants. So, the product of the matrix of coefficients and the matrix of unknowns will give the matrix of constants.

[ a_11  a_12  ...  a_1n ] [ x_1 ]   [ b_1 ]
[ a_21  a_22  ...  a_2n ] [ x_2 ]   [ b_2 ]
[  ⋮     ⋮          ⋮   ] [  ⋮  ] = [  ⋮  ]
[ a_m1  a_m2  ...  a_mn ] [ x_n ]   [ b_m ]

This is the general representation of a system of equations as a multiplication of matrices. However, it is also possible to show a system of equations in a single matrix. This representation is known as the augmented matrix. In an augmented matrix, the matrix of constants is placed next to the matrix of coefficients. As a result, the whole system of equations can be represented in a single matrix.

$$\left[\begin{array}{cccc|c}
a_{11} & a_{12} & \dots & a_{1n} & b_1 \\
a_{21} & a_{22} & \dots & a_{2n} & b_2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn} & b_m
\end{array}\right]$$

Now that we know how to represent a system of equations in matrix form, let's do it for the system of equations in our problem. First, we write it as a matrix multiplication.

$$\begin{bmatrix}
3/4 & -1/4 & -1/4 & -1/4 & 0 & 0 \\
-1/4 & 1 & -1/4 & 0 & -1/4 & -1/4 \\
-1/4 & -1/4 & 1 & 0 & 0 & 0 \\
-1/4 & 0 & 0 & 1 & -1/4 & 0 \\
-1/4 & 0 & -1/4 & 0 & 1 & 0 \\
-1/4 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} P(0,0) \\ P(1,0) \\ P(0,1) \\ P(1,1) \\ P(2,0) \\ P(2,1) \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 1/2 \\ 1/2 \\ 0 \\ 1/4 \end{bmatrix}$$

Let’s show it as an augmented matrix.

$$\left[\begin{array}{cccccc|c}
3/4 & -1/4 & -1/4 & -1/4 & 0 & 0 & 0 \\
-1/4 & 1 & -1/4 & 0 & -1/4 & -1/4 & 0 \\
-1/4 & -1/4 & 1 & 0 & 0 & 0 & 1/2 \\
-1/4 & 0 & 0 & 1 & -1/4 & 0 & 1/2 \\
-1/4 & 0 & -1/4 & 0 & 1 & 0 & 0 \\
-1/4 & 0 & 0 & 0 & 0 & 1 & 1/4
\end{array}\right]$$

The augmented matrix is commonly used to represent systems of equations. There are a few operations we can perform on an augmented matrix without changing the equality in the system. These are called elementary row operations, and there are three of them: 1) You may interchange the positions of two rows. 2) You may multiply a row by a nonzero constant. 3) You may add a multiple of one row to another row. You may apply these elementary row operations as many times as you want, because they do not change the equality. By using these operations on the augmented matrix, you may simplify the system of equations and find the unknowns. Exactly the same logic underlies two methods known as Gaussian elimination and Gauss-Jordan elimination. Other methods exist as well; Cramer's rule, the inverse matrix method, and the Jacobi method are a few examples. However, these methods are beyond the scope of this book. You may look them up later if you wish and try to solve this problem using one of them. For the moment, I suggest you solve it using elementary row operations. You may try to simplify the augmented matrix to the form below with the help of the elementary row operations:

$$\left[\begin{array}{cccccc|c}
1 & 0 & 0 & 0 & 0 & 0 & 1257/1699 \\
0 & 1 & 0 & 0 & 0 & 0 & 1021/1699 \\
0 & 0 & 1 & 0 & 0 & 0 & 1419/1699 \\
0 & 0 & 0 & 1 & 0 & 0 & 1331/1699 \\
0 & 0 & 0 & 0 & 1 & 0 & 669/1699 \\
0 & 0 & 0 & 0 & 0 & 1 & 739/1699
\end{array}\right]$$

Simplifying the augmented matrix to the form above is called the Gauss-Jordan elimination method. The aim of this method is to turn the elements on the diagonal of the coefficient part of the augmented matrix into 1 while turning the others into 0. Afterwards, if we read the augmented matrix back as a system of equations, we directly obtain the six unknowns.

$$P(0,0) = \frac{1257}{1699} \qquad P(1,0) = \frac{1021}{1699} \qquad P(0,1) = \frac{1419}{1699}$$
$$P(1,1) = \frac{1331}{1699} \qquad P(2,0) = \frac{669}{1699} \qquad P(2,1) = \frac{739}{1699}$$

In fact, there isn’t much difference between solving a system of equations

with or without a matrix. For example, the changes we initially

made on the system of equations are very similar to the elementary row

operations. In other words, matrix is a practical way to display and solve

a system of equations.

The main advantage of the matrix form is that computers can do the solving for us. Today, computers are routinely employed to solve systems of equations, and the algorithms they use are among the matrix methods I just mentioned. There are several websites that solve systems of equations; you may enter the system of equations of this problem into one of them and have it solved.
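If you'd rather not do the row reduction by hand, a short program can carry out exactly these elementary row operations. Below is a minimal sketch of my own (not from the book) that applies Gauss-Jordan elimination to the augmented matrix above, using exact fractions so that no rounding creeps in.

```python
from fractions import Fraction as F

# Augmented matrix of the system above: coefficients | constants, as exact fractions.
M = [
    [F(3, 4),  F(-1, 4), F(-1, 4), F(-1, 4), F(0),     F(0),     F(0)],
    [F(-1, 4), F(1),     F(-1, 4), F(0),     F(-1, 4), F(-1, 4), F(0)],
    [F(-1, 4), F(-1, 4), F(1),     F(0),     F(0),     F(0),     F(1, 2)],
    [F(-1, 4), F(0),     F(0),     F(1),     F(-1, 4), F(0),     F(1, 2)],
    [F(-1, 4), F(0),     F(-1, 4), F(0),     F(1),     F(0),     F(0)],
    [F(-1, 4), F(0),     F(0),     F(0),     F(0),     F(1),     F(1, 4)],
]

n = len(M)
for col in range(n):
    # Elementary row operation 1: swap a row with a nonzero pivot into place.
    piv = next(r for r in range(col, n) if M[r][col] != 0)
    M[col], M[piv] = M[piv], M[col]
    # Elementary row operation 2: scale the pivot row so the pivot becomes 1.
    M[col] = [x / M[col][col] for x in M[col]]
    # Elementary row operation 3: subtract multiples of the pivot row from the others.
    for r in range(n):
        if r != col and M[r][col] != 0:
            f = M[r][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]

# The last column now holds P(0,0), P(1,0), P(0,1), P(1,1), P(2,0), P(2,1).
print([str(row[-1]) for row in M])
# ['1257/1699', '1021/1699', '1419/1699', '1331/1699', '669/1699', '739/1699']
```

The loop is nothing more than the three elementary row operations applied in a fixed order, which is precisely the Gauss-Jordan method described above.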

Now, I will explain another solution method which again makes use of matrices. The bowling competition can be modeled as a Markov chain. We have already noted that there are 12 possible states until the game ends:

{(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2), (2, 2), (1, 2), (0, 2)}

However, (0, 2), (1, 2), and (2, 2) amount to the same thing: Smith wins the game in all of them, so they can be merged into a single victory state. Similarly, Kevin wins in the states (3, 0) and (3, 1); let's merge them into the defeat state. Last but not least, let's call (3, 2) the tie state. Denote the victory state by V, the tie state by T, and the defeat state by D. If we conceptualize the competition as a Markov chain, its state space is:

{(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (2, 1), V, T, D}

If the chain is in one of the states V, T, D during the competition, the state will not change, because all three are absorbing states. Let $X_n$ denote the state after Kevin and Smith have each taken their nth shot. If

$$P(X_{n+1} = i \mid X_n = i) = 1,$$

then i is an absorbing state. It is impossible to transition to another state after reaching an absorbing state. In our problem, it makes sense that V, T, D are absorbing states: once a competition reaches victory, defeat, or tie, it ends and there are no more state transitions.

If a Markov chain has at least one absorbing state, and it is possible to start from any non-absorbing state and reach an absorbing state in a finite number of steps, then this Markov chain is called an absorbing Markov chain. In an absorbing Markov chain, an absorbing state will be entered in a finite number of steps and never left again. The Markov chain of this problem is an absorbing Markov chain; therefore, in the long run, it will be absorbed by one of the states V, T, D.

The general form of the transition matrix of an absorbing Markov chain is written as the combination of the following four matrices:

$$Q = \begin{bmatrix} A & B \\ 0 & I \end{bmatrix}$$

The matrix A in this general form is t × t, the matrix B is t × s, the matrix I is the s × s identity matrix, and the matrix 0 is the s × t zero matrix. The identity matrix is a square matrix with 1's on its diagonal and 0's elsewhere; the zero matrix has all elements equal to 0. (Here t is the number of non-absorbing states and s the number of absorbing states, with the non-absorbing states listed first.)

Let us now form the transition matrix of our problem using this general form.

$$Q = \begin{array}{c|ccccccccc}
 & (0,0) & (1,0) & (0,1) & (1,1) & (2,0) & (2,1) & V & T & D \\
\hline
(0,0) & 1/4 & 1/4 & 1/4 & 1/4 & 0 & 0 & 0 & 0 & 0 \\
(1,0) & 1/4 & 0 & 1/4 & 0 & 1/4 & 1/4 & 0 & 0 & 0 \\
(0,1) & 1/4 & 1/4 & 0 & 0 & 0 & 0 & 2/4 & 0 & 0 \\
(1,1) & 1/4 & 0 & 0 & 0 & 1/4 & 0 & 2/4 & 0 & 0 \\
(2,0) & 1/4 & 0 & 1/4 & 0 & 0 & 0 & 0 & 0 & 2/4 \\
(2,1) & 1/4 & 0 & 0 & 0 & 0 & 0 & 1/4 & 1/4 & 1/4 \\
V & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
T & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
D & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}$$
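Before turning to the analytic shortcut, note that a transition matrix like this can also be checked by brute force. The sketch below (my own illustration, not part of the book) simulates the contest as a Markov chain, stepping from state to state with the probabilities in each row, and estimates how often the chain is absorbed by V.

```python
import random

# States in the order used in the transition matrix above.
states = ["(0,0)", "(1,0)", "(0,1)", "(1,1)", "(2,0)", "(2,1)", "V", "T", "D"]

# Transition probabilities copied row by row from the matrix (each row sums to 1).
Q = [
    [1/4, 1/4, 1/4, 1/4, 0, 0, 0, 0, 0],
    [1/4, 0, 1/4, 0, 1/4, 1/4, 0, 0, 0],
    [1/4, 1/4, 0, 0, 0, 0, 2/4, 0, 0],
    [1/4, 0, 0, 0, 1/4, 0, 2/4, 0, 0],
    [1/4, 0, 1/4, 0, 0, 0, 0, 0, 2/4],
    [1/4, 0, 0, 0, 0, 0, 1/4, 1/4, 1/4],
    [0, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 1],
]

def run_contest(rng):
    """Step the chain from (0,0) until it hits an absorbing state."""
    s = 0
    while s < 6:  # the first six states are the non-absorbing ones
        s = rng.choices(range(9), weights=Q[s])[0]
    return states[s]

rng = random.Random(2024)
trials = 200_000
wins = sum(run_contest(rng) == "V" for _ in range(trials))
print(wins / trials)  # should land near 1257/1699, about 0.740
```

Seeing the empirical frequency settle near 1257/1699 is a good sanity check that the matrix was copied correctly.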

We are asked for the probability that Smith beats Kevin, that is, that the Markov chain enters the absorbing state V at some step.³ If the Markov chain enters the V state at any step, it will remain in the V state for all of the steps that follow. Then, the probability that the Markov chain is in the V state after infinitely many steps is equal to the probability that it has entered and been absorbed by the V state at some step, and this is equal to the probability of Smith beating Kevin. So we can write the value we are after as follows:

$$\lim_{n\to\infty} P(X_n = V) = \,?$$

We know that Smith and Kevin start the contest with the initial state

(0, 0). Therefore, we can say that X 0 = (0, 0). Thus, the probability we

need to find is actually the following:

$$\lim_{n\to\infty} P(X_n = V \mid X_0 = (0,0))$$

In order to find this probability, we need to calculate an arbitrarily large power of the transition matrix of our Markov chain. That is, we need to find the matrix $\lim_{n\to\infty} Q^n$. For this, we could use a practical linear algebra method called diagonalization, or just use a calculator as before. However, we need neither, because there exists a much simpler way of calculating arbitrarily large powers of the transition matrix of an absorbing Markov chain. Recall that we had expressed the general form of an absorbing Markov chain in the following way:

$$Q = \begin{bmatrix} A & B \\ 0 & I \end{bmatrix}$$

The arbitrarily large power of this transition matrix is called the stationary transition matrix. The stationary transition matrix is denoted $\bar{Q}$ and calculated in the following way:

$$\lim_{n\to\infty} Q^n = \bar{Q} = \begin{bmatrix} 0 & (I-A)^{-1}B \\ 0 & I \end{bmatrix}$$

I will not go on to prove why it is calculated in this way, because I do not want to indulge in linear algebra too much. If you are interested, you can always look it up. In this formula, $(I-A)^{-1}$ is the matrix obtained by first subtracting A from the identity matrix and then taking the inverse of the result.⁴ The stationary transition matrix tells us how the absorbing Markov chain behaves in the long run, and it gives us a simple way of finding an arbitrarily large power of the transition matrix without the use of computers.

³ Each step in this problem is a simultaneous throw by Kevin and Smith.
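To see the formula in action, here is a small sketch of mine (not the book's) that builds A and B from the transition matrix of our problem and computes $(I-A)^{-1}B$ with exact fractions, by row-reducing $(I-A)$ augmented with B.

```python
from fractions import Fraction as F

# A: the 6x6 block of transitions among non-absorbing states; B: transitions into V, T, D.
A = [[F(1, 4), F(1, 4), F(1, 4), F(1, 4), F(0),    F(0)],
     [F(1, 4), F(0),    F(1, 4), F(0),    F(1, 4), F(1, 4)],
     [F(1, 4), F(1, 4), F(0),    F(0),    F(0),    F(0)],
     [F(1, 4), F(0),    F(0),    F(0),    F(1, 4), F(0)],
     [F(1, 4), F(0),    F(1, 4), F(0),    F(0),    F(0)],
     [F(1, 4), F(0),    F(0),    F(0),    F(0),    F(0)]]
B = [[F(0),    F(0),    F(0)],
     [F(0),    F(0),    F(0)],
     [F(1, 2), F(0),    F(0)],
     [F(1, 2), F(0),    F(0)],
     [F(0),    F(0),    F(1, 2)],
     [F(1, 4), F(1, 4), F(1, 4)]]

n = 6
# Augment (I - A) with B; after full row reduction the right block is (I - A)^(-1) B.
M = [[(F(1) if i == j else F(0)) - A[i][j] for j in range(n)] + B[i] for i in range(n)]
for col in range(n):
    piv = next(r for r in range(col, n) if M[r][col] != 0)
    M[col], M[piv] = M[piv], M[col]
    M[col] = [x / M[col][col] for x in M[col]]
    for r in range(n):
        if r != col and M[r][col] != 0:
            f = M[r][col]
            M[r] = [a - f * b for a, b in zip(M[r], M[col])]

absorb = [row[n:] for row in M]  # row i: probabilities of ending in V, T, D
print([str(x) for x in absorb[0]])  # ['1257/1699', '81/1699', '361/1699']
```

Each row of the resulting block sums to 1, reflecting the fact that the chain is certain to be absorbed by one of V, T, D.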

Anyway, I have found the stationary transition matrix of our problem for us. If you wish, you can use a calculator to evaluate $\lim_{n\to\infty} Q^n$ and find the same matrix.

$$\bar{Q} = \begin{array}{c|ccccccccc}
 & (0,0) & (1,0) & (0,1) & (1,1) & (2,0) & (2,1) & V & T & D \\
\hline
(0,0) & 0 & 0 & 0 & 0 & 0 & 0 & 1257/1699 & 81/1699 & 361/1699 \\
(1,0) & 0 & 0 & 0 & 0 & 0 & 0 & 1021/1699 & 155/1699 & 523/1699 \\
(0,1) & 0 & 0 & 0 & 0 & 0 & 0 & 1419/1699 & 59/1699 & 221/1699 \\
(1,1) & 0 & 0 & 0 & 0 & 0 & 0 & 1331/1699 & 29/1699 & 339/1699 \\
(2,0) & 0 & 0 & 0 & 0 & 0 & 0 & 669/1699 & 35/1699 & 995/1699 \\
(2,1) & 0 & 0 & 0 & 0 & 0 & 0 & 739/1699 & 445/1699 & 515/1699 \\
V & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
T & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
D & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}$$

As you can observe from the stationary transition matrix, regardless of

the initial state, the probability of being in any other state than V, T, D is

0 in the long run. This is because the V, T, D states are absorbing states

and the Markov chain is certain to be absorbed by one of these three in

the long run.

The answer to the problem, $\lim_{n\to\infty} P(X_n = V \mid X_0 = (0,0))$, can be found

using the stationary transition matrix. The probability of starting with

initial state (0, 0) and being in the state V after a long while is shown

by the element in the first row and seventh column of the stationary

transition matrix. That is, there is a 1257/1699 probability that this

Markov chain will be absorbed by the state V . Hence, the probability

that Smith triumphs over Kevin in this contest is 1257/1699. This agrees

with the result we have found earlier.

What would have changed if the marginal distribution of the random variable $X_0$ were different? What would we have done differently if the probability of starting in the (0, 0) state were not 1? Let the marginal distribution vector of the random variable $X_n$ be $M_n$. In the previous problem, we stated that in order to find the long-term marginal distribution vector, we have to multiply the initial marginal distribution vector by an arbitrarily large power of the transition matrix. The initial marginal distribution is $M_0$. So,

$$\lim_{n\to\infty} M_n = \lim_{n\to\infty} M_0 Q^n$$

⁴ The inverse of a matrix A, written $A^{-1}$, is the matrix which yields the identity matrix when multiplied with A.

But we now know that for absorbing Markov chains, we can substitute the stationary transition matrix for $\lim_{n\to\infty} Q^n$:

$$\lim_{n\to\infty} M_n = M_0 \bar{Q}$$

Not all of the rows of a stationary transition matrix will be equal.

You can observe this fact by looking at the stationary transition matrix

that we have just found. Since the rows of a stationary transition matrix

are not equal, changing the initial marginal distribution in an absorbing

Markov chain will result in a different marginal distribution in the long

run. That is, the long-term marginal distribution will be shaped by the

initial marginal distribution. You can see this fact reflected in the equation

above. This should already make sense because depending on the initial

marginal distribution, the probability that the Markov chain is absorbed

by some absorbing states is much higher than others.

Recall what we learned during the solution of the previous problem: taking arbitrarily large powers of irreducible and aperiodic Markov chains resulted in matrices whose rows were all equal. With such Markov chains, regardless of the initial marginal distribution, the stationary distribution in the long run was always the same, and it could be expressed by the row vector d satisfying the equation dQ = d.

However, absorbing Markov chains are neither irreducible nor aperiodic, so the same cannot be said for them. There are infinitely many stationary distribution vectors satisfying the equation dQ = d. The long-term marginal distribution of an absorbing Markov chain will be one of the stationary distribution vectors satisfying this equation, and exactly which one is determined by the initial marginal distribution. Letting M be the stationary distribution vector which equals the long-term marginal distribution vector, M is expressed by:

$$\lim_{n\to\infty} M_n = M = M_0 \bar{Q}$$

Also, the stationary distribution vector M will naturally satisfy the equation MQ = M.

In short, to find the stationary distribution vector obtained in the long run in an absorbing Markov chain, we need to multiply the initial marginal distribution vector by the stationary transition matrix. In the resulting stationary distribution vector, only the elements corresponding to absorbing states will be greater than 0, because the Markov chain is certain to be absorbed by one of the absorbing states in the long run.

Additional Problem

Eddy and Freddy decide to play a game involving coin tosses. A coin

will be tossed consecutively until the end of the game, and the results will

be noted down from left to right on a piece of paper, H for heads and T

for tails.

Given that Eddy wins if the sequence HT comes up first, and Freddy wins if the sequence TT comes up first, what is the probability that Eddy wins?

Given that Eddy wins if the sequence HTH comes up first, and Freddy wins if the sequence TTH comes up first, what is the probability that Eddy wins?

Given that Eddy wins if the sequence HTHHHTHT comes up first, and Freddy wins if the sequence THTTTHTH comes up first, what is the probability that Eddy wins?
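If you want to check your answers, these sequence races are easy to simulate. The helper below is my own sketch (not from the book): it generates symbols until one of the two target sequences appears as a suffix of the history.

```python
import random

def race(seq_a, seq_b, rng, symbols="HT"):
    """Generate random symbols until seq_a or seq_b appears; report the winner."""
    k = max(len(seq_a), len(seq_b))
    history = ""
    while True:
        history = (history + rng.choice(symbols))[-k:]  # keep only the useful suffix
        if history.endswith(seq_a):
            return "a"
        if history.endswith(seq_b):
            return "b"

rng = random.Random(5)
trials = 100_000
eddy = sum(race("HT", "TT", rng) == "a" for _ in range(trials))
print(eddy / trials)  # estimate of P(Eddy wins) in the first game
```

The same helper handles the die games below: pass symbols="123456" and targets such as "165" and "215".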

Additional Problem

Eddy and Freddy have grown tired of the previous game and started

another one. In this game, a die will be cast and the results will be noted

down from left to right on a piece of paper.

Eddy and Freddy are very smart. Eddy has an IQ of 165, and Freddy

has an IQ of 215. Given that whoever’s IQ is written on the piece of paper

first wins the game, what is the probability that Freddy wins?

Given that if the same number comes up twice in a row Eddy wins,

and if the sum of any two numbers that come up consecutively is 7 Freddy

wins, what is the probability that Freddy wins?

Eddy is born on 25.12.2464 and Freddy is born on 23.11.2646. What

is the probability that Freddy’s birth date comes up before Eddy’s? (that

is, the sequence 23112646 before 25122464)

Is It Possible to Have Alzheimer’s and Keep It a Secret?

Do you remember Matt, whom we mentioned in the first problem? He

had joined a game show named “Let’s Make a Deal” and got pretty confused

about which door to pick. Do you want to know how this contest

ended? Let me explain.

Matt thought that the host was trying to set him up and insisted on

the first door he had chosen. However, there was a goat behind this door

and Matt was devastated. Nonetheless, he decided to take advantage of

this situation and started a goat farm. He is satisfied with his life right

now and earns a great deal of money. He reads up on probability theory

in his free time so that he never makes a mistake such as the one he made

in the contest.

Matt later understood why switching doors in the contest would have been more advantageous. He wanted to rejoin the show; however, he was not allowed. Therefore, he got his sister Amy to sign up for the show.

Today is the big day for Amy. She picked one of the three doors to start

out with, as usual. Now, Monty Hall, the host, will open the door containing

a goat. However, there is a slight problem. Monty does not remember

what the doors contain, because he has undiagnosed Alzheimer’s disease.

Nevertheless, trying to keep his cool, he opened one of the doors that Amy had not chosen. Fortunately, there was indeed a goat behind the door he opened, and Monty breathed a huge sigh of relief.

Now, as always, Amy has the right to stick with the first door she

has chosen or to switch it. Amy certainly does not want to lose like her

brother did. What do you think she should do? What is her probability

of winning the car if she sticks with her initial choice?

Solution

This problem is a variation of the first problem and, surprisingly, is one of the most misleading problems in this book. Even if you think that you

have completely understood the solution of the first problem, it is highly

probable that with a slight difference in the question, you end up with the

wrong answer. This is quite common when we are dealing with probability

problems. You need to consider even the most insignificant detail, as if

you are solving a puzzle.

Recall what we had said in the solution of the first question: To know

whether an event occurred by chance or not may alter the answer of the

question. We will see the true value of this sentence after solving this

question.

The scenario in this problem is very similar to that in the original

problem. In fact, in the final scene of both problems there are two closed

doors and one opened door with a goat behind it. It seems reasonable to

think that the answer to both questions might be the same. However, this

is not true.

In the original problem, the probability that there is a goat behind

the contestant’s initial choice was 1/3. However, in that problem, the

host was guaranteed to open one of the doors with a goat. In the current

problem, whether the host will open a door with a goat or not depends on

chance. Since the door that the host randomly opened had a goat behind

it, the probability that the contestant’s initial choice contains a car will

be slightly higher than 1/3. We can thus understand intuitively that the

two problems will have different answers.

Let us now approach the problem with mathematics. This problem is

indeed a simple conditional probability question. We have solved similar

problems before in this book.

If Amy’s initial choice contains the car, then the door that the host

opens is certain to contain a goat. The probability for this case is 1 3 · 1.

On the other hand, if Amy’s initial choice contains a goat, then there is a

1/2 probability for the door that the host opens to contain a goat. The

probability for this case is 2 3 · 1

2

. The case where the door that the host

opens contains a goat can only occur with these two case. However, we

are only interested in the probability of the first case. In other words, we

are looking for the probability of Amy’s door containing a car, given that

the door which the host opened randomly contained a goat. We can use

the conditional probability formula here:

A: There is a car behind Amy's first door
B: There is a goat behind the door randomly opened by the host

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{\frac{1}{3} \cdot 1}{\frac{1}{3} \cdot 1 + \frac{2}{3} \cdot \frac{1}{2}} = \frac{\frac{1}{3}}{\frac{1}{3} + \frac{1}{3}} = \frac{1}{2}$$

As a result, knowing that there is a goat behind the door randomly opened by Monty, there is a 1/2 probability that Amy's door contains the car. If so, the probability that the car is behind the other door is also 1/2. Consequently, neither switching from the initial door nor sticking with it makes any difference.
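A quick simulation (my own sketch, not the book's) makes this concrete: we keep only the shows in which the forgetful host happened to reveal a goat, and check how often Amy's original door hides the car in those shows.

```python
import random

rng = random.Random(3)
goat_shows = 0  # shows where the forgetful host happened to reveal a goat
amy_wins = 0    # among those, shows where Amy's first door hides the car

for _ in range(300_000):
    car = rng.randrange(3)                                # car placed at random
    amy = rng.randrange(3)                                # Amy's initial pick
    host = rng.choice([d for d in range(3) if d != amy])  # host picks blindly
    if host == car:
        continue  # the host revealed the car: discard this show
    goat_shows += 1
    amy_wins += (amy == car)

print(amy_wins / goat_shows)  # close to 1/2, not 1/3
```

Dropping the `if host == car` filter and forcing the host to always reveal a goat reproduces the original problem, where the same ratio falls to 1/3.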

Monty did not have Alzheimer’s disease in the first problem of this

book. Therefore, each time (with a probability of 1) he would open the

door which contained a goat. Unfortunately, he got older by the time you

reached the last problem of the book and now has Alzheimer’s disease.

Thus, when Amy's initial door hides a goat, his probability of correctly opening a door with a goat behind it has dropped from 1 to 1/2. This is exactly why these two problems have different

answers. As you can see, the answer of a probability problem changes

depending on whether an event occurs randomly or deterministically.

Interestingly enough, the contest in this problem appears to be no

different than the contest in the first problem. Imagine you are a viewer

in the show. When Monty opens the door with a goat behind, you would

have thought that he had done that deliberately, like he always does. So

you would think that switching doors would lead to a 2/3 probability of

getting the car. However, in reality, this is not true, because Monty has

picked the door randomly and only lucked out by actually finding the goat.

But the only way you could know that is if you were Monty himself, or

some sort of omniscient being. In probability problems, knowing by what

probability an event has occurred is just as important as knowing whether

it has occurred. Even though the same exact event has occurred in both

problems (the host opening the door containing a goat), the probabilities

of this event occurring are different (1 and 1/2). This leads to different

results.

“At a young age, Arman Özcan wrote a book that would be a valuable

resource for the readers. He brings together the most surprising and

interesting probability problems and presents their solutions with a clear

language. I congratulate him and wish his readers fun hours.”

- Professor Ali Nesin, the founder of Nesin Mathematics Village

"Arman Özcan challenges his readers with beautiful problems of the

world of probability. With his humorous writing style, difficult problems

turn into enjoyable stories. I congratulate Arman Özcan, who wrote this

book at a young age and indicated that he would make remarkable

accomplishments in the future."

- Emrehan Halıcı, the president of Turkish Intelligence Foundation (TZV)