
From: R.E. Strauss, Statistical Hypothesis Testing in the Biological Sciences, in preparation.

All rights reserved.

3. Basic probability concepts

It is evident that the primes are randomly distributed but, unfortunately, we don't know what 'random' means. [R.C. Vaughan, 1990]

When there is no explanation, they give it a name, which immediately explains everything. [H. Fabing and R. Mar, 1937]

Probability is a concept used to express the likelihood of an event. Qualitatively, adjectives such as "remote", "poor", "weak", "good" or "strong" are often used to indicate the relative likelihood of an event. Quantitatively, probability is expressed as a fraction or decimal value between 0 and 1 or, equivalently, as a percentage between 0% and 100%. A probability of 0 implies that the event cannot occur, while a probability of 1 implies that it is certain to occur.

Note that in the paragraph above I said that probability is used to express the likelihood of an event, as if use of the term likelihood served to define the term probability. Actually, the terms probability and likelihood are often used interchangeably (although likelihood also has a more restricted technical definition in terms of conditional probabilities), and it is very difficult to formulate an explicit definition for probability. This is seen in the following pairs of definitions from several recent technical dictionaries:

HarperCollins Dictionary of Statistics:

Probability: a number between 0 and 1 which represents how likely some event is to occur.

Likelihood: the probability of a specified outcome.

The Cambridge Dictionary of Statistics:

Probability: a measurement that denotes the likelihood that an event occurred simply by chance.

Likelihood: the degree of certainty of an event occurring. Likelihood can be stated as a probability.

Dictionary of Statistics Methodology:

Probability: When an event can occur in a finite number of discrete outcomes, the probability of an event is the ratio of the number of ways in which the event can occur to the total number of possibilities, assuming that each of them is equally likely.

Likelihood: A measurement of how often and probable an event might occur. It is often used as a synonym for probability and frequency, especially in a qualitative context. See Probability.

The term probability actually has several different meanings, both colloquial and technical, but in statistics it is used in two very different ways (Folks, 1981): (1) as an intrinsic property of some system that does not depend on our knowledge of the system; and (2) as a measure of personal belief in a statement about the system. The first usage implies that the probability would exist even if humans weren't here to estimate it. (If a tree falls in the forest with a certain probability and no one is around to hear it fall, does the probability still exist?) The second usage implies that an assessment of the probability of an event might vary from person to person. (If I believe that the probability of rain tomorrow is 20%, while you believe it to be 30%, do we disagree?) We will pursue these different meanings below, but for now it is important to stress that they have led to sometimes heated controversy among statisticians, arising either from disagreement over the meaning of the term or from failure to distinguish between the two senses of the term.

Although we generally regard concepts such as sampling and probability to be relatively recent, the conception of equally likely events, independence of events, and the use of probability in making decisions date back many centuries. Rabinovitch (1970) described an early use of inferred probabilistic reasoning in the 12th century by Maimonides, a theologian, physician and philosopher. Arbuthnot (1710), etc.

Events

The term event is used rather loosely in discussing probability. Informally, we think of an event as an occurrence of some sort, such as an historical event (e.g., the 1969 Apollo moon landing) or physical phenomenon (e.g., aurora borealis). In statistics, we expand that usage to include specific outcomes of probabilistic experiments, such as the head or tail of a flipped coin, the face value of a card drawn from a deck, the number of individuals of a particular species in a sample from a community, or whether a particular treated individual does or does not succumb to a disease. By a probabilistic experiment we mean an experiment whose outcome is not predictable in advance.

There are two senses in which the term event is commonly used in statistics. If we think about drawing a card from a deck, event might be used to describe the act of drawing the card, which is also called an experiment. Note that this is a trivial meaning for the term experiment, compared to how it is used in science, but is commonly used in the statistics literature. Event might also be used to describe the outcome of drawing the card, such as "three of spades". This ambiguity can be useful if the meaning is evident from the context, but can sometimes lead to confusion. I will use event to indicate a collection (or set) of one or more outcomes of an experiment, and will often use the synonym outcome when the event consists of a single outcome. The flipping of a coin is an event, with the possible outcomes of 'head' or 'tail'. If the coin is fair, then (by definition) the two outcomes are equally likely.

These uses of event and outcome are consistent with the statistical concept of sampling from a population. Flipping a fair coin 10 times is conceptually equivalent to drawing 10 outcomes (observations or entities) from an infinite population of outcomes that are either 'heads' or 'tails', in equal proportions.

In estimating probabilities it is useful to classify events as simple or compound. A simple event cannot be subdivided further into component events, while a compound event comprises two or more simple events. Whether a particular event is simple or compound may depend on the purpose or context of the experiment. For a card drawn from a deck, for example, the event 'diamond' is simple by this definition, but 'queen of diamonds' can be considered to be compound because the card is both a 'queen' (one event) and a 'diamond' (a second event). Alternatively, 'queen of diamonds' might be considered to be simple because it describes one particular outcome out of 52. Similarly, for an organism sampled from a population, the event 'female' is simple, but 'adult female' can be considered to be either compound or simple, depending on context. We will consider this distinction in more detail below.

Sample spaces

The set of all possible outcomes of an experiment is known as the sample space of the experiment, denoted by S. The simplest sample spaces are those in which the outcomes are discrete. If the experiment consists of flipping a coin, then S = {H, T}, where H means that the outcome of the toss is a head, and T that it is a tail. If the experiment consists of tossing a die, then the sample space is S = {1, 2, 3, 4, 5, 6}. If the experiment consists of flipping two coins (either sequentially or simultaneously), the sample space is S = {(H, H), (H, T), (T, H), (T, T)}.

The sample space for throwing two dice is:

S = { (1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
      (2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
      (3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
      (4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
      (5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
      (6,1), (6,2), (6,3), (6,4), (6,5), (6,6) }

Similarly, if the experiment consists of forming a diploid condition at a Mendelian locus with two alleles by sampling and combining two gametes, the sample space of the alleles is S = {A1, A2} and the sample space of the diploid locus is S = {(A1, A1), (A1, A2), (A2, A1), (A2, A2)}.

Sample spaces can also be defined for continuous scales. If the experiment consists of measuring the lifetime of an organism, then there are an infinite number of possible outcomes: S = {t : 0 < t < ∞}, where t is the time of death.
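For discrete experiments like these, the sample space can be enumerated directly. A minimal sketch in Python (the variable names are illustrative, not from the text):

```python
from itertools import product

# Sample spaces for a single coin flip and a single die toss
coin = {"H", "T"}
die = {1, 2, 3, 4, 5, 6}

# Flipping two coins: all ordered pairs of H/T outcomes
two_coins = set(product(coin, repeat=2))

# Throwing two dice: 6 x 6 = 36 ordered pairs
two_dice = set(product(die, repeat=2))

# A diploid genotype formed by combining two gametes at a two-allele locus
alleles = {"A1", "A2"}
genotypes = set(product(alleles, repeat=2))

print(len(two_coins))   # 4 ordered outcomes
print(len(two_dice))    # 36 ordered outcomes
print(len(genotypes))   # 4 ordered allele pairs
```

Treating the pairs as ordered matches the sample spaces above, where (H, T) and (T, H) are distinct outcomes.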


Mutually exclusive events

In probability theory, a set of events {E1, E2, …, En} is said to be mutually exclusive if the occurrence of any one of them automatically implies the non-occurrence of the remaining n − 1 events. In other words, two mutually exclusive events cannot both occur; at most one of the events can occur. The events comprising a sample space are mutually exclusive.

A related concept is that of being collectively exhaustive, which means that at least one of the events must occur.
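In set terms, mutually exclusive events are disjoint, and collectively exhaustive events together cover the sample space. A small sketch using events defined on the die's sample space (the events 'odd', 'even', and 'low' are invented for illustration):

```python
# Events defined as subsets of a die's sample space
S = {1, 2, 3, 4, 5, 6}
odd = {1, 3, 5}
even = {2, 4, 6}
low = {1, 2, 3}

# Mutually exclusive: the events share no outcomes
print(odd & even)         # set(): 'odd' and 'even' cannot both occur
print(odd & low)          # {1, 3}: 'odd' and 'low' are NOT mutually exclusive

# Collectively exhaustive: together the events cover all of S
print((odd | even) == S)  # True: at least one of them must occur
```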

Probability concepts

Probabilities are always expressed as fractional or decimal values, but there are important distinctions in how those values are obtained. For simple events, probabilities may be derived classically based on the concept of fairness, may be observed empirically, or may be assigned subjectively. These distinctions have important implications for statistical analyses.

Classical probabilities

Some events are defined so narrowly that the probabilities of their outcomes can be deduced logically. The simplest such case involves a sample space having a finite set of discrete events, with each event having the same probability. Such probabilities can be called classical probabilities. Few scientific situations are of this sort, but the derivation of classical probabilities is worth examining because the predictability of simple processes may provide insight into the more complicated situations that are so common in science.

Throwing dice is one of man's oldest known probabilistic endeavors. Because it is a cube, a modern die has six labeled faces, and so a single throw of the die results in a face value from 1 to 6. The sample space is thus S = {1, 2, 3, 4, 5, 6}. What is the probability of obtaining, say, a 5? If all face values are equally likely to occur, then a 5 is one possible outcome out of six, and its probability is therefore 1/6, or approximately 0.17. Note that to assign this probability we need to assume that all face values are equally likely to occur. This is the strong assumption of fairness, and in the case of a die it is unlikely to be strictly true because of inhomogeneities in the material of the die. However, it may be true enough for practical purposes, sufficiently true that we are willing to live with the assumption that the die is fair.

If we define an event to be the throwing of two dice, then there are more possible outcomes. Considering only the sum of the face values, there are 11 possible outcomes: 2 through 12. A naïve dice player might assume that these 11 events are all equally likely; however, a craps player who believed this would probably lose a lot of money. The probability of obtaining a 5, for example, is not simply 1/11 (approximately 0.09), because there are a number of different ways that this outcome can be generated. The throwing of two dice is a compound event, and the possible outcomes (sums) are shown in the following table.



                        Value of first die
Value of second die     1    2    3    4    5    6
                 1      2    3    4    5    6    7
                 2      3    4    5    6    7    8
                 3      4    5    6    7    8    9
                 4      5    6    7    8    9   10
                 5      6    7    8    9   10   11
                 6      7    8    9   10   11   12

The table shows that there are 36 possible outcomes, four of which result in a sum of 5 for the two dice. If the dice are fair then all 36 outcomes are equally likely to occur, and the probability of a 5 on one throw of two fair dice is 4/36, or approximately 0.11. The most likely outcome from throwing two dice simultaneously is a sum of 7, which has probability 6/36 = 1/6, or approximately 0.17. Since there is only one way to throw a sum of 12 (= 6 + 6), its probability is 1/36, or approximately 0.03.
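The counts in the table can be verified by enumerating all 36 equally likely outcomes. A short sketch (the function name is illustrative):

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely (first die, second die) outcomes
outcomes = list(product(range(1, 7), repeat=2))

def pr_sum(s):
    """Classical probability that the two face values sum to s."""
    favorable = sum(1 for a, b in outcomes if a + b == s)
    return Fraction(favorable, len(outcomes))

print(pr_sum(5))    # 4/36, printed in lowest terms as 1/9
print(pr_sum(7))    # 6/36 = 1/6, the most likely sum
print(pr_sum(12))   # 1/36
```

Using exact fractions rather than floats keeps the counts visible: 4 favorable outcomes out of 36, and so on.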

These examples suggest a general definition for classical probability:

The classical probability of an outcome is the number of ways that outcome can occur, relative to the total number of ways that all outcomes of an event can be generated.

Or, more concisely,

Pr(outcome) = (number of ways outcome can occur) / (number of ways all outcomes can occur)

where Pr(outcome) is read as "the probability of the outcome". If we call a particular outcome A, this can be written as

Pr(A) = n(A) / n.
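Applied to the card-drawing example mentioned earlier, this definition amounts to counting favorable outcomes among equally likely ones. A sketch (the deck representation and helper function are assumptions for illustration):

```python
from fractions import Fraction

# A standard 52-card deck as (rank, suit) pairs, all equally likely to be drawn
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = [(r, s) for r in ranks for s in suits]

def pr(event):
    """Classical probability: n(A) / n, where event is a predicate on cards."""
    n_A = sum(1 for card in deck if event(card))
    return Fraction(n_A, len(deck))

print(pr(lambda c: c[1] == "diamonds"))      # 13/52 = 1/4
print(pr(lambda c: c == ("Q", "diamonds")))  # 1/52
```

The same function covers both the simple event 'diamond' and the narrower event 'queen of diamonds'; only the count of favorable outcomes changes.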

The most common use of classical probabilities in statistics is to serve as null hypotheses. For example, when we flip a fair coin, the probability of observing a head is the same as the probability of a tail; both are ½. If we have reason to believe that the coin is not fair, the assumption of fairness is translated into a null hypothesis of equal probability, which can then be tested using empirical probabilities (described in the following section). In an ecological study of predator behavior in which the predator has four potential prey species, an initial and simplistic null hypothesis might be that the prey species are equally likely to be selected by the predator, each with a probability of ¼. This can be tested using empirical probabilities.

Empirical probabilities

Empirical probabilities are values derived by sampling, and are based on the concept of long-run or asymptotic relative frequency. The relative frequency of a particular outcome is the proportion of that outcome that we observe in a sample of observations. The long-run relative frequency is the proportion that we observe as the sample becomes very large. In most circumstances, as the sample size increases toward infinity, the relative frequency of the outcome in the sample will converge on its true value in the population. After estimating the long-run relative frequency, we then turn the situation around and interpret the frequency as the probability of observing the outcome of interest by sampling a single observation from the population.
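This convergence of the relative frequency toward the underlying probability can be illustrated by simulating flips of a fair coin. A sketch (seeded for reproducibility; the 0.5 threshold encodes the assumed fairness of the coin):

```python
import random

random.seed(1)  # reproducible sequence of simulated flips

def relative_freq_heads(n_flips):
    """Proportion of heads in n_flips simulated flips of a fair coin."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The relative frequency wanders widely for small samples but
# settles near the true probability 0.5 as the sample grows
for n in (10, 1000, 100000):
    print(n, relative_freq_heads(n))
```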

Say that we want to estimate the allele frequencies of individuals for the MN blood group in a particular human population. The three possible genotypes are MM, MN, and NN. We could sample a number of individuals (the more, the better for this purpose, as long as the sample is representative of the population) and determine the genotype of each. From these we could in turn estimate the frequencies of the alleles M and N. For example, the following table summarizes the classic data of Race and Sanger (1962) on MN genotypes from a British population.

Phenotype                       M       MN      N
Genotype                        MM      MN      NN      Total
Number (absolute frequency)     363     634     282     1279
Relative frequency              0.284   0.496   0.220   1.000

Based on these data, the relative frequencies of the M and N alleles can be estimated as the homozygote frequencies plus half of the heterozygote frequency:

p_M = f_MM + ½ f_MN = 0.284 + ½(0.496) = 0.532
p_N = f_NN + ½ f_MN = 0.220 + ½(0.496) = 0.468

where p indicates an allele frequency and f a genotype relative frequency. The sample sizes are rather large, so we can assume (for now) that these estimates are reasonably precise. We can now interpret the relative frequencies as probabilities. The p_M value of 0.532 indicates that our best estimate of the probability of drawing a single M allele from the gene pool is 53.2%. In other words, for every 1000 alleles drawn from the gene pool, 532 will, on average, be M alleles. Derivation of probabilities from relative frequencies is logically circular, of course, but still valuable.
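The arithmetic above can be reproduced directly from the genotype counts in the table. A sketch:

```python
# Race and Sanger (1962) genotype counts from the table above
counts = {"MM": 363, "MN": 634, "NN": 282}
total = sum(counts.values())           # 1279 individuals

# Genotype relative frequencies
f = {g: n / total for g, n in counts.items()}

# Allele frequency = homozygote frequency + half the heterozygote frequency
p_M = f["MM"] + 0.5 * f["MN"]
p_N = f["NN"] + 0.5 * f["MN"]

print(round(p_M, 3))   # 0.532
print(round(p_N, 3))   # 0.468
```

Since every individual carries two alleles, p_M and p_N necessarily sum to 1.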

The strong assumption of fairness also applies to empirical probabilities: all entities are assumed to be equally likely to occur. Thus empirical probabilities are based on logical probabilities assigned to individual observations. For example, if we use a mark-recapture method to estimate the number of fish living within a lake, the estimate will almost certainly be based on the strong assumption that every fish in the lake stands an equal chance of being caught. A sample in which every individual has an equal probability of selection is a random sample. We will see that this "assumption of randomness" is basic to many statistical procedures. If the sample is not truly random in this sense, then the long-run relative frequency of the outcome in the sample will not converge on its true value in the population. The relative frequency (and thus the estimated probability) of the outcome will then be biased, but we will be unaware of the nature and extent of the bias. Minimizing bias is one of the main objectives of sampling theory.

The classical and empirical definitions of probability are basically identical, but because the total number of potential observations can be large, the terminology is modified slightly:

The empirical probability of an outcome is the observed frequency of that outcome, relative to the total number of observations.

Pr(outcome) = observed frequency / total frequency

Pr(A) = n(A) / n

The variable n in this case represents the number of observations rather than the number of distinct outcomes. For obvious reasons this is known as the frequency (or frequentist) definition of probability. As a definition, it is equally valid no matter what the observed and total frequencies. However, as an estimate of some "true" population frequency, it is more precise for larger than for smaller total frequencies.

For example, say that we are trying to estimate the sex ratio of a (biological) population of field mice living in a particular large field, which we define to be our statistical population. The sex ratio (expressed in this case as the ratio of males to total number of individuals) can be viewed as the probability of randomly drawing a male from the population, assuming that all individuals are equally likely to be caught. (Is this a reasonable assumption?) We might estimate this probability by sampling 10 individuals from the population and discovering that 6 are males, giving Pr(male) = 6/10 = 0.6. Or we might sample 1000 individuals and discover that 513 are males, giving Pr(male) = 513/1000 = 0.513. It is reasonable to conclude that the second estimate is somehow better than the first, because it is based on a larger sample. We will show later (when discussing confidence intervals) in what sense the second estimate is better than the first.
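A simulation previews the sense in which the larger sample is better: repeated samples of size 10 scatter far more widely around the underlying proportion than repeated samples of size 1000. A sketch (the true proportion of 0.5 is an assumption made purely for the simulation):

```python
import random
import statistics

random.seed(2)
TRUE_P = 0.5   # assumed true proportion of males, for simulation only

def estimate(n):
    """Relative frequency of males in a simulated random sample of size n."""
    return sum(random.random() < TRUE_P for _ in range(n)) / n

# Spread of the estimate across 500 repeated samples, for two sample sizes
spread_10 = statistics.stdev(estimate(10) for _ in range(500))
spread_1000 = statistics.stdev(estimate(1000) for _ in range(500))

print(spread_10)     # roughly 0.16: single small-sample estimates vary widely
print(spread_1000)   # roughly 0.016: large-sample estimates cluster near 0.5
```

Both estimators are unbiased here; the larger sample is better only in the sense that its estimates are more tightly concentrated, which is what confidence intervals will quantify later.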

One potential use of empirical probabilities is to test null hypotheses that specify explicit values for population parameters. For example, our null hypothesis might be that the population of field mice consists of half females and half males – i.e., that the probabilities of randomly drawing a female or drawing a male from the population are both ½. We might posit these probabilities simply because there are two classes of individuals, which are assumed to be equally likely (the classical frequencies). Or, if we are more informed, we might posit the probabilities because Fisher's (1930) sex-ratio model predicts for most populations a stable equilibrium with equal numbers of females and males. In either case, the null-hypothesis values of ½ and ½ can be tested by randomly sampling individuals from the population and determining the relative frequencies, which are then used as estimates of probabilities to compare against the null-hypothesis values. This raises the issue of just how different the observed frequencies have to be from the postulated values to reject the null hypothesis. We will return to this example when discussing statistical tests of hypotheses.



A consequence of the frequency definition of probability is that, for a frequentist, probability is defined as a state of nature that is independent of a person's cognitions or beliefs. As a result, if two frequentists disagree about the value of a probability, at least one of them must be wrong.

Subjective probabilities

Unlike classical and empirical probabilities, which are derived either from knowledge of potential outcomes or from data, a subjective (or degree-of-belief, or subjectivist) probability describes an individual's personal judgment about how likely a particular event is to occur. It is based not on a precise computation, but rather on a reasonable assessment of the situation by a knowledgeable person. Subjective probabilities can be based on qualitative factors, previous experience in similar situations, and intuition. The subjectivist interpretation of probability differs from the frequentist interpretation in two important ways. First, the subjectivist defines probability as a belief an observer has about nature, rather than as a state of nature independent of an observer. Second, under the subjectivist view, beliefs about the value of a probability may legitimately differ among observers.

For example, meteorologists often assign probabilities to particular weather conditions, such as "The probability of rain tomorrow is 20%". Although meteorological models are becoming increasingly sophisticated and may be used to estimate frequentist probabilities by running thousands of trials, traditionally a meteorologist would consider winds, pressure areas and fronts, adjacent weather systems, and perhaps a decent amount of intuition to estimate the probability of rain. The estimate would apply implicitly to a specific area of interest (domain). The probability of rain on the surface of the earth is likely to be close to 1, and in my back yard it may be close to 0, while still being 20% for the region in which I live.

Regardless of how the probability is estimated, we typically use the information subjectively to make decisions about how to dress and whether to carry rain gear. If I'm walking, I might decide to carry a coat and umbrella if the probability of rain is greater than 50%. If I'm walking and wearing a suit, I might decide to carry a coat and umbrella if the probability of rain is greater than 60%. If I'm driving, however, I might skip the coat and grab an umbrella only if the probability is greater than 70% or 80%. My decision would be based on past experience. A person more concerned about getting wet might carry an umbrella whenever the probability of rain is greater than 30%.

As an example of how the assessment of a subjective probability might differ for different individuals, consider the case of three people who have different information about the outcome of a specific toss of a fair coin. Mary, who saw the flip of the coin come up heads, would say the probability that the coin landed heads is 1. Javier, who didn't see the outcome of the coin flip, might say that the probability that the coin landed heads is ½. And Xiao, who didn't see the coin flip but did see the look on Mary's face immediately following the flip, might say that the probability that the coin landed heads is ¾. Under the subjectivist definition of probability all three individuals would be equally correct, as long as their beliefs were consistent with the axioms of probability.

Although their quality depends on the experience of the person producing them, subjective probabilities can be perfectly acceptable estimates of "true" probabilities. However, whether subjective probabilities should be used in further statistical analyses is a subject of some controversy. Statisticians of the frequentist (also called objectivist or classical) school of thought reject their use, while those of the Bayesian school accept them. We will discuss this contrast in more detail below.

Compound events

Multiplication rule for simultaneous probabilities

Addition rule

Statistical independence

Counting rules: permutations and combinations

Conditional probabilities

Odds ratios

Likelihood and maximum likelihood

Bayesian probabilities

Bayes' theorem

Random variables

