01.06.2013 Views

Statistical Methods in Medical Research 4ed


2.3 Summarizing numerical data

frequencies are also tabulated. These give the percentages of the total who are
younger than the lower limit of the following interval, that is, 9.8% of the
subjects are in the age groups 25–34 and 35–44 and so are younger than 45.
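The calculation of cumulative relative frequencies can be sketched as follows. This is a minimal illustration only: the function name `cumulative_percentages` and the counts are hypothetical and are not taken from the table under discussion.

```python
from itertools import accumulate


def cumulative_percentages(lower_limits, frequencies):
    """Pair each group's lower limit with the percentage of subjects below it.

    The running total after group i counts everyone who falls below the
    lower limit of group i + 1, which is the quantity described in the text.
    """
    total = sum(frequencies)
    running = list(accumulate(frequencies))
    return [
        (lower_limits[i + 1], 100.0 * running[i] / total)
        for i in range(len(frequencies) - 1)
    ]


# Hypothetical age groups 25-34, 35-44, 45-54, 55-64 with made-up counts.
limits = [25, 35, 45, 55]
counts = [2, 8, 30, 60]
result = cumulative_percentages(limits, counts)
for limit, percent in result:
    print(f"younger than {limit}: {percent:.1f}%")
```

With these invented counts, the first two groups together hold 10 of the 100 subjects, so 10% are younger than 45.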

The advantages in presenting numerical data in the form of a frequency
distribution rather than a long list of individual observations are too obvious to
need stressing. On the other hand, if there are only a few observations, a frequency
distribution will be of little value since the number of readings falling into each
group will be too small to permit any meaningful pattern to emerge.

We now consider in more detail the practical task of forming a frequency
distribution. If the variable is to be grouped, a decision will have to be taken
about the end-points of the groups. For convenience these should be chosen, as
far as possible, to be 'round' numbers. For distributions of age, for example, it is
customary to use multiples of 5 or 10 as the boundaries of the groups. Care
should be taken in deciding in which group to place an observation falling on one
of the group boundaries, and the decision must be made clear to the reader.
Usually such an observation is placed in the group of which the observation is
the lower limit. For example, in Table 2.3 a count of 20 lesions would be placed
in the group 20–, which includes all counts between 20 and 29, and this convention
is indicated by the notation used for the groups.
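The lower-limit convention can be sketched in a few lines. The sketch assumes groups of equal width 10 starting at 0; the helper names `group_lower_limit` and `frequency_distribution` and the lesion counts are hypothetical, chosen only to show where boundary values fall.

```python
from collections import Counter


def group_lower_limit(value, width=10, origin=0):
    """Return the lower limit of the group containing `value`.

    A value equal to a boundary is placed in the group of which it is the
    lower limit, so 20 belongs to the group '20-' and not to '10-'.
    """
    return origin + ((value - origin) // width) * width


def frequency_distribution(values, width=10, origin=0):
    """Count the observations in each group, keyed by group lower limit."""
    tally = Counter(group_lower_limit(v, width, origin) for v in values)
    return dict(sorted(tally.items()))


# Hypothetical lesion counts, including the boundary values 20 and 30.
lesion_counts = [3, 20, 29, 30, 17, 20, 41]
dist = frequency_distribution(lesion_counts)
for lower, count in dist.items():
    print(f"{lower}-: {count}")
```

Groups containing no observations do not appear in the result; whether to list them with a frequency of zero is a matter of presentation.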

How many groups should there be? No clear-cut rule can be given. To
provide a useful, concise indication of the nature of the distribution, fewer
than five groups will usually be too few and more than 20 will usually be too
many. Again, if too large a number of groups is chosen, the investigator may find
that many of the groups contain frequencies which are too small to provide any
regularity in the shape of the distribution. For a given size of grouping interval
this difficulty will become more acute as the total number of observations is
reduced, and the choice of grouping interval may, therefore, depend on this
number. If in doubt, the grouping interval may be chosen smaller than that to
be finally used, and groups may be amalgamated in the most appropriate way
after the distribution has been formed.
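One simple way of amalgamating groups is to merge adjacent groups of a fine distribution into wider groups of equal width. The sketch below assumes that the group lower limits are multiples of the fine width measured from 0 and that the wider interval is a whole multiple of the finer one; the function name `amalgamate` and the counts are hypothetical. Unequal amalgamation, which may sometimes be more appropriate, is not covered.

```python
def amalgamate(fine_counts, fine_width, coarse_width):
    """Merge a fine frequency distribution into wider groups.

    `fine_counts` maps each group's lower limit to its frequency.
    Each fine group is added to the coarse group whose interval contains it.
    """
    if coarse_width % fine_width != 0:
        raise ValueError("coarse_width must be a multiple of fine_width")
    merged = {}
    for lower, count in sorted(fine_counts.items()):
        coarse_lower = (lower // coarse_width) * coarse_width
        merged[coarse_lower] = merged.get(coarse_lower, 0) + count
    return merged


# Hypothetical distribution formed with an interval of 5.
fine = {0: 1, 5: 0, 10: 2, 15: 4, 20: 3, 25: 1}
coarse = amalgamate(fine, fine_width=5, coarse_width=10)
print(coarse)
```

The total frequency is unchanged by amalgamation, which gives a quick check that no group has been lost or counted twice.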

If the original data are contained in a computer file, a frequency distribution
can readily be formed by use of a statistical package. If the measurements are
available only as a list on paper, the counts should be made by going systematically
through the list, 'tallying' each measurement into its appropriate group.
The whole process should be repeated as a check. The alternative method of
taking each group in turn and counting the observations falling into that group is
not to be recommended, as it requires the scanning of the list of observations
once for each group (or more than once if a check is required) and thus
encourages mistakes.

If the number of observations is not too great (say, fewer than about 50), a
frequency distribution can be depicted graphically by a diagram such as Fig. 2.5.
Here each individual observation is represented by a dot or some other mark
