
Statistical Methods in Medical Research 4ed


40 Describing data

Therefore the mean of the deviations $x_i - \bar{x}$ will always be zero. We could, however, take the mean of the deviations ignoring their sign, i.e. counting them all as positive. These quantities are called the absolute values of the deviations and are denoted by $|x_i - \bar{x}|$. Their mean, $\sum |x_i - \bar{x}|/n$, is called the mean deviation. This measure has the drawback of being difficult to handle mathematically, and we shall not consider it any further in this book.
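A minimal sketch of the two quantities just described, using a small made-up set of heights in cm (the data and the helper names `deviations` and `mean_deviation` are hypothetical, for illustration only). It checks that the deviations from the mean cancel out and computes the mean of their absolute values.

```python
def deviations(xs):
    """Return the deviations x_i - mean of each observation from the sample mean."""
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def mean_deviation(xs):
    """Mean deviation: the mean of the absolute deviations, sum(|x_i - mean|) / n."""
    devs = deviations(xs)
    return sum(abs(d) for d in devs) / len(devs)


heights = [160.0, 165.0, 170.0, 175.0, 180.0]  # hypothetical heights in cm

devs = deviations(heights)
print(sum(devs) / len(devs))    # mean of the signed deviations: 0.0
print(mean_deviation(heights))  # 6.0
```

With real data the mean of the signed deviations may differ from zero by a tiny floating-point rounding error rather than being exactly 0.0.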

Another way of getting over the difficulty caused by the positive and negative signs is to square the deviations. The mean value of the squared deviations is called the variance and is a most important measure in statistics. Its formula is

$$\text{Variance} = \frac{\sum (x_i - \bar{x})^2}{n}. \tag{2.1}$$

The numerator is often called the sum of squares about the mean. The variance is measured in the square of the units in which $x$ is measured. For example, if $x$ is height in cm, the variance will be measured in cm². It is convenient to have a measure of variation expressed in the original units of $x$, and this can easily be done by taking the square root of the variance. This quantity is known as the standard deviation, and its formula is

$$\text{Standard deviation} = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n}}. \tag{2.1a}$$
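Formulas (2.1) and (2.1a) translate directly into code. The sketch below, again on hypothetical heights in cm with illustrative helper names, computes the sum of squares about the mean, the variance with divisor $n$ and its square root, and shows the point about units: expressing the same heights in metres divides the standard deviation by 100 but the variance by 10 000.

```python
import math


def sum_of_squares(xs):
    """Sum of squares about the mean: sum of (x_i - mean)**2."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def variance_n(xs):
    """Variance with divisor n, as in formula (2.1)."""
    return sum_of_squares(xs) / len(xs)


def sd_n(xs):
    """Standard deviation with divisor n, as in formula (2.1a)."""
    return math.sqrt(variance_n(xs))


heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]  # hypothetical data, cm
heights_m = [h / 100 for h in heights_cm]          # the same data in metres

print(sum_of_squares(heights_cm))   # 250.0
print(variance_n(heights_cm))       # 50.0, in cm^2
print(round(sd_n(heights_cm), 2))   # 7.07, in cm
# Rescaling x by 1/100 rescales the SD by 1/100 but the variance by 1/10000.
print(round(sd_n(heights_m), 4))    # 0.0707, in m
```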

In practice, in calculating variances and standard deviations, the $n$ in the denominator is almost always replaced by $n - 1$. The reason for this is that in applying the methods of statistical inference, developed later in this book, it is useful to regard the collection of observations as being a sample drawn from a much larger group of possible readings. The large group is often called a population. When we calculate a variance or a standard deviation we may wish not merely to describe the variation in the sample with which we are dealing, but also to estimate as best we can the variation in the population from which the sample is supposed to have been drawn. In a certain respect (see §5.1) a better estimate of the population variance is obtained by using a divisor $n - 1$ instead of $n$. Thus, we shall almost always use the formula for the estimated variance or sample variance:

$$\text{Estimated variance, } s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}, \tag{2.2}$$

and, similarly,<br />

$$\text{Estimated standard deviation, } s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}. \tag{2.2a}$$
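The sketch below implements (2.2) and (2.2a) and adds a small simulation of why the divisor $n - 1$ is preferred. We assume here that the "certain respect" is unbiasedness: averaged over many samples, the sum of squares divided by $n$ comes out at about $(n-1)/n$ times the population variance, while dividing by $n - 1$ comes out at about the population variance itself. The normal population, its standard deviation of 10, the sample size of 5 and the helper names are all illustrative assumptions, not taken from the text.

```python
import math
import random


def estimated_variance(xs):
    """s^2: sum of squares about the mean divided by n - 1, formula (2.2)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def estimated_sd(xs):
    """s: square root of the estimated variance, formula (2.2a)."""
    return math.sqrt(estimated_variance(xs))


def average_variance_estimates(pop_sd=10.0, n=5, trials=20000, seed=1):
    """Average the divisor-n and divisor-(n-1) variances over many samples of
    size n from an assumed normal population with variance pop_sd**2."""
    rng = random.Random(seed)
    total_n = total_n1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, pop_sd) for _ in range(n)]
        s2 = estimated_variance(sample)
        total_n1 += s2
        total_n += s2 * (n - 1) / n  # same sum of squares, divided by n instead
    return total_n / trials, total_n1 / trials


heights = [160.0, 165.0, 170.0, 175.0, 180.0]  # hypothetical heights in cm
print(estimated_variance(heights))      # 62.5
print(round(estimated_sd(heights), 2))  # 7.91
# Population variance is 100 here; divisor n averages near 80, divisor n-1 near 100.
print(average_variance_estimates())
```

For the same data, $s^2$ is always $n/(n-1)$ times the divisor-$n$ variance, so the difference matters most for small samples.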

Having established the convention, we shall very often omit the word 'estimated' and refer to $s^2$ and $s$ as 'variance' and 'standard deviation', respectively.
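Software often follows the same convention. In Python's standard `statistics` module, for example, `statistics.variance` and `statistics.stdev` use the divisor $n - 1$, as in (2.2) and (2.2a), while `statistics.pvariance` and `statistics.pstdev` use the divisor $n$, as in (2.1) and (2.1a). The heights below are hypothetical.

```python
import statistics

heights = [160.0, 165.0, 170.0, 175.0, 180.0]  # hypothetical heights in cm

# Divisor n - 1: the estimated variance s^2 and standard deviation s.
print(statistics.variance(heights))          # 62.5
print(round(statistics.stdev(heights), 2))   # 7.91
# Divisor n: formulas (2.1) and (2.1a).
print(statistics.pvariance(heights))         # 50.0
print(round(statistics.pstdev(heights), 2))  # 7.07
```

When using any statistics package it is worth checking which divisor a given function applies, since the names alone do not always make it obvious.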
