01.06.2013 Views

Statistical Methods in Medical Research 4ed


46 Describing data

against their effect may be sought. We shall discuss in §4.2 the idea of estimating the mean of a large population from a sample of observations drawn from it. In most situations we should be content to do this by calculating the mean of the sample values, but we might sometimes seek an estimator that is less influenced than the sample mean by occasional outliers. This approach is called robust estimation, and a wide range of such estimators has been suggested. In §2.4 we commended the sample median as a measure of location on the grounds that it is less influenced by outliers than is the mean. For positive-valued observations, the logarithmic transformation described in §2.5, and the use of the geometric mean, would have a similar effect in damping down the effect of outlying high values, but unfortunately it would have the opposite effect of exaggerating the effect of outlying low values. One of the most widely used robust measures of location is the trimmed mean, obtained by omitting some of the most extreme observations (for example, a fixed proportion in each tail) and taking the mean of the rest. These estimators are remarkably efficient for samples from normal distributions (§3.8), and better than the sample mean for distributions 'contaminated' with a moderate proportion of outliers. The choice of method (e.g. the proportion to be trimmed from the tails) is not entirely straightforward, and the precision of the resulting estimator may be difficult to determine.
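The measures of location compared above can be illustrated with a minimal Python sketch. The function name `trimmed_mean`, the 10% trimming proportion and the two small samples are assumptions made purely for illustration; they are not taken from the text, and the choice of trimming proportion remains a matter of judgement.

```python
from statistics import mean, median, geometric_mean


def trimmed_mean(values, proportion=0.1):
    """Mean after omitting a fixed proportion of observations from each tail.

    `proportion` is the fraction dropped from EACH tail (an illustrative
    default, not a recommendation).
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must satisfy 0 <= proportion < 0.5")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # observations dropped per tail
    return mean(ordered[k:len(ordered) - k])


# Hypothetical data: nine typical values plus one outlier.
typical = [4.1, 4.4, 4.6, 4.8, 5.0, 5.1, 5.3, 5.5, 5.8]
sample_high = typical + [25.0]   # one outlying high value
sample_low = typical + [0.05]    # one outlying low value


def summarise(values):
    """Collect the four measures of location discussed in the text."""
    return {
        "mean": mean(values),
        "median": median(values),
        "geometric_mean": geometric_mean(values),
        "trimmed_mean": trimmed_mean(values, 0.1),
    }


summary_high = summarise(sample_high)
summary_low = summarise(sample_low)

for label, summary in (("high outlier", summary_high), ("low outlier", summary_low)):
    print(label, {name: round(value, 3) for name, value in summary.items()})
```

For the sample with the high outlier, the mean is 6.96, whereas the median is 5.05 and the 10% trimmed mean is about 5.06. In the sample with the low outlier, the geometric mean is pulled down further than the arithmetic mean, which is the exaggeration of outlying low values noted above.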

Similar methods are available for more complex problems, although the choice of method is, as in the simpler case of the mean, often arbitrary, and the details of the analysis may be complicated. See Draper and Smith (1998, Chapter 25) for further discussion in the case of multiple regression (§11.6).
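As one illustration of how the same idea carries over to regression, the sketch below fits a straight line by iteratively reweighted least squares with Huber-type weights. This technique is chosen here only as an example and is not named in the text. The function names, the tuning constant 1.345, the scale estimate (median absolute residual divided by 0.6745), the fixed number of iterations and the data are all assumptions, and the one-predictor case is used for brevity.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def huber_line_fit(x, y, c=1.345, iterations=50):
    """Straight-line fit that downweights observations with large residuals.

    Observations whose absolute residual exceeds c * scale receive a weight
    below 1; all others keep weight 1 (Huber-type weights, assumed here).
    """
    weights = [1.0] * len(x)
    for _ in range(iterations):
        intercept, slope = weighted_line_fit(x, y, weights)
        residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust scale estimate from the median absolute residual.
        scale = median(abs(r) for r in residuals) / 0.6745
        if scale == 0:
            break
        cutoff = c * scale
        weights = [1.0 if abs(r) <= cutoff else cutoff / abs(r) for r in residuals]
    return intercept, slope


# Hypothetical data lying close to y = 1 + 2x, with the last value grossly in error.
x = list(range(10))
y = [1.2, 2.9, 5.1, 7.0, 8.8, 11.1, 13.0, 14.9, 17.2, 49.0]

ols_intercept, ols_slope = weighted_line_fit(x, y, [1.0] * len(x))
robust_intercept, robust_slope = huber_line_fit(x, y)

print("least squares slope:", round(ols_slope, 3))
print("robust slope:", round(robust_slope, 3))
```

With these data the ordinary least-squares slope is about 3.63, well above the value of 2 followed by the other nine points, and the reweighted fit lies closer to 2. As the text cautions, the tuning constant and scale estimate are to some extent arbitrary choices.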
