25.11.2014 Views

Biostatistics


114 CHAPTER 4 PROBABILITY DISTRIBUTIONS

To help us understand the nature of the distribution of a continuous random variable, let us consider the data presented in Table 1.4.1 and Figure 2.3.2. In the table we have 189 values of the random variable, age. The histogram of Figure 2.3.2 was constructed by locating specified points on a line representing the measurement of interest and erecting a series of rectangles, whose widths were the distances between two specified points on the line, and whose heights represented the number of values of the variable falling between the two specified points. The intervals defined by any two consecutive specified points we called class intervals. As was noted in Chapter 2, subareas of the histogram correspond to the frequencies of occurrence of values of the variable between the horizontal scale boundaries of these subareas. This provides a way whereby the relative frequency of occurrence of values between any two specified points can be calculated: merely determine the proportion of the histogram’s total area falling between the specified points. This can be done more conveniently by consulting the relative frequency or cumulative relative frequency columns of Table 2.3.2.
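The area calculation described above can be sketched in Python. This is only an illustration: the ages and class boundaries below are made up and are not the 189 values of Table 1.4.1, and the function names are hypothetical.

```python
from bisect import bisect_right


def histogram_counts(values, edges):
    """Count the values falling in each class interval [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    return counts


def area_proportion(counts, edges, lo, hi):
    """Proportion of the histogram's total area between class boundaries lo and hi."""
    # Each rectangle's area is its width times its height (the class frequency).
    areas = [(edges[i + 1] - edges[i]) * c for i, c in enumerate(counts)]
    inside = sum(
        a for i, a in enumerate(areas) if lo <= edges[i] and edges[i + 1] <= hi
    )
    return inside / sum(areas)


# Hypothetical ages (an assumption for illustration, not the book's data).
ages = [31, 34, 38, 42, 45, 47, 51, 53, 56, 58,
        62, 64, 67, 71, 75, 33, 48, 55, 59, 66]
edges = [30, 40, 50, 60, 70, 80]  # assumed class boundaries of equal width

counts = histogram_counts(ages, edges)
prop = area_proportion(counts, edges, 40, 60)
print(counts, prop)
```

When all class intervals have the same width, the area proportion equals the proportion of values counted between the two points, which is why a relative frequency column gives the same answer with less work.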

Imagine now the situation where the number of values of our random variable is very large and the width of our class intervals is made very small. The resulting histogram could look like that shown in Figure 4.5.1.

If we were to connect the midpoints of the cells of the histogram in Figure 4.5.1 to form a frequency polygon, clearly we would have a much smoother figure than the frequency polygon of Figure 2.3.4.
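The effect of many values and narrow class intervals can be explored with a small simulation. The sketch below assumes, purely for illustration, that the values come from a standard normal population; the text does not say which distribution underlies Figure 4.5.1. The function names, sample sizes, and interval widths are also assumptions. Rectangle heights are scaled to relative frequency per unit width so that the total area of each histogram is one.

```python
import math
import random


def density_histogram(values, lo, hi, width):
    """Return (midpoints, heights) with heights scaled so the total area is 1."""
    k = round((hi - lo) / width)
    counts = [0] * k
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), k - 1)] += 1
    m = sum(counts)  # number of values inside [lo, hi)
    mids = [lo + (i + 0.5) * width for i in range(k)]
    heights = [c / (m * width) for c in counts]
    return mids, heights


def normal_pdf(x):
    """Standard normal density, the assumed population curve."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def max_gap(mids, heights):
    """Largest vertical distance between the polygon's vertices and the curve."""
    return max(abs(h - normal_pdf(x)) for x, h in zip(mids, heights))


rng = random.Random(0)  # fixed seed so the run is repeatable
small = [rng.gauss(0.0, 1.0) for _ in range(200)]
large = [rng.gauss(0.0, 1.0) for _ in range(200_000)]

coarse = density_histogram(small, -4.0, 4.0, 1.0)  # few values, wide intervals
fine = density_histogram(large, -4.0, 4.0, 0.1)    # many values, narrow intervals

coarse_area = sum(h * 1.0 for h in coarse[1])
fine_area = sum(h * 0.1 for h in fine[1])

print("coarse: area", coarse_area, "max gap", max_gap(*coarse))
print("fine:   area", fine_area, "max gap", max_gap(*fine))
```

The midpoints and heights returned by `density_histogram` are the vertices of the frequency polygon. Under these assumptions, the vertices of the fine histogram should typically lie much closer to the smooth curve than those of the coarse one, while both histograms keep a total area of one.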

In general, as the number of observations, n, approaches infinity, and the width of the class intervals approaches zero, the frequency polygon approaches a smooth curve such as is shown in Figure 4.5.2. Such smooth curves are used to represent graphically the distributions of continuous random variables. This has some important consequences when we deal with probability distributions. First, the total area under the curve is equal to one, as was true with the histogram, and the relative frequency of occurrence of values between any two points on the x-axis is equal to the total area bounded by the curve, the x-axis, and perpendicular lines erected at the two points on the x-axis. See Figure 4.5.3. The

FIGURE 4.5.1 A histogram resulting from a large number of values and small class intervals (vertical axis: f(x); horizontal axis: x).
