3. Basic probability concepts
3. Basic probability concepts
3. Basic probability concepts
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
frequency (and thus estimated <strong>probability</strong>) of the outcome will then be biased, but we will be<br />
unaware of the nature and extent of the bias. Minimizing bias is one of the main objectives of<br />
sampling theory.<br />
The classical and empirical definitions of <strong>probability</strong> are basically identical, but because the<br />
total number of potential observations can be large, the terminology is modified slightly:<br />
The empirical <strong>probability</strong> of an outcome is observed frequency of that outcome, relative<br />
to the total number of observations.<br />
Pr ( outcome)<br />
observed frequency<br />
=<br />
total frequency<br />
nA ( )<br />
Pr( A)<br />
=<br />
n<br />
The variable n in the case represents the number of observations rather than the number of<br />
distinct outcomes. For obvious reasons this is known as the frequency (or frequentist) definition<br />
of <strong>probability</strong>. As a definition, it is equally valid no matter what the observed and total<br />
frequencies. However, as an estimate of some “true” population frequency, it is more precise for<br />
larger than for smaller total frequencies.<br />
For example, say that we are trying to estimate the sex ratio of a (biological) population of<br />
field mice living in a particular large field, which we define to be our statistical population. The<br />
sex ratio (expressed in this case as the ratio of males to total number of individuals) can be<br />
viewed as the <strong>probability</strong> of randomly drawing a male from the population, assuming that all<br />
individuals are equally likely to be caught. (Is this a reasonable assumption) We might<br />
estimate this <strong>probability</strong> by sampling 10 individuals from the population and discovering that 6<br />
are males, giving Pr( male ) = 6 /10 = 0.6 . Or we might sample 1000 individuals and discover<br />
that 513 are males, giving Pr( male ) = 513/1000 = 0.513 . It is reasonable to conclude that the<br />
second estimate is somehow better than the first, because it is larger. We will show later (when<br />
discussing confidence intervals) in what sense the second estimate is better than the first.<br />
One potential use of empirical probabilities is to test null hypotheses that specify explicit<br />
values for population parameters. For example, our null hypothesis might be that the population<br />
of field mice consists of half females and half males – i.e., that the probabilities of randomly<br />
drawing a female or drawing a male from the population are both ½. We might posit these<br />
probabilities simply because there are two classes of individuals, which are assumed to be<br />
equally likely (the classical frequencies). Or, if we are more informed, we might posit the<br />
probabilities because Fisher’s (1930) sex-ratio model predicts for most populations a stable<br />
equilibrium with equal numbers of females and males. In either case, the null-hypothesis values<br />
of ½ and ½ can be tested by randomly sampling individuals from the population and determining<br />
the relative frequencies, which are then used as estimates of probabilities to compare against the<br />
null-hypothesis values. This raises the issue of just how different the observed frequencies have<br />
to be from the postulated values to reject the null hypothesis. We will return to this example<br />
when discussing statistical tests of hypotheses.<br />
7