Statistical Methods in Medical Research 4ed


The bootstrap – basic ideas and standard errors

Most statistical analyses attempt to make inferences about a population on the basis of a sample drawn from that population. The distribution function F of a variable x in the population is almost always unknown. Analyses usually proceed by focusing attention on specific summaries of F, such as the mean $\mu_F$, which are, of course, also unknown (the subscript F is used to emphasize the dependence of the mean on the underlying distribution). Most analyses estimate parameters such as $\mu_F$ from random samples drawn from the population, using estimators that have usually been constructed in a sensible way and which have been found to have good properties (e.g. minimum variance, unbiased, etc.). Naturally the values of the estimates will vary from sample to sample, i.e. the estimates are affected by sampling variation (see §4.1). This source of variability is usually quantified through the relevant standard error, which for the mean is simply $\sigma_F/\sqrt{n}$, or more precisely through the estimate of the standard error, $s/\sqrt{n}$.
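As a concrete sketch of the conventional calculation (not from the text; the function name is hypothetical), the estimate $s/\sqrt{n}$ can be computed directly, with $s^2$ using the usual $n-1$ denominator:

```python
import math

def standard_error_of_mean(sample):
    """Conventional estimates of the mean and its standard error, s / sqrt(n).

    Illustrative helper (not from the text): s**2 uses the usual n - 1
    denominator, so it is the standard unbiased estimator of the variance.
    """
    n = len(sample)
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return xbar, math.sqrt(s2 / n)

xbar, se = standard_error_of_mean([1, 2, 3, 4, 5])
# xbar = 3.0; se = sqrt(2.5 / 5) = sqrt(0.5)
```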

The bootstrap approaches this problem from a rather different perspective. For the problem of estimating a mean and its associated standard error, it essentially reproduces the usual estimators. However, the new perspective permits estimation in a very wide variety of situations, including many that have proved intractable for conventional methods. Rather than trying to make inferences about F by estimating summaries such as the mean and variance, the bootstrap approach estimates F directly and then estimates any relevant summaries through the estimated F. Various estimates, $\hat{F}$, of F are possible and some of these will be discussed briefly below. However, the most general form and the only one that we consider in detail is the empirical distribution function (EDF), which is a discrete distribution that gives probability $1/n$ to each sample point $x_i$ ($i = 1, \ldots, n$). Using the formula for the expectation of a probability distribution

(3.6), the mean of the EDF is $\mu_{\hat{F}}$, which is

$$\mu_{\hat{F}} = \frac{1}{n} x_1 + \frac{1}{n} x_2 + \cdots + \frac{1}{n} x_n$$

i.e. $\mu_{\hat{F}} = \bar{x}$, the sample mean, which is the usual estimator of $\mu_F$. The same type of argument gives the variance of $\hat{F}$ as $\sigma^2_{\hat{F}} = n^{-1} \sum (x_i - \bar{x})^2$, which again is the usual estimator, apart from the denominator being $n$ rather than $n - 1$. This discrepancy arises because $\bar{x}$ is the mean of $\hat{F}$, rather than merely an estimator of it. The bootstrap estimate of the standard error of the mean is then $\sigma_{\hat{F}}/\sqrt{n}$, which is $s/\sqrt{n}$ but for the factor $\sqrt{n/(n-1)}$.

The bootstrap has provided estimators of the mean and its standard error that are virtually the same as the conventional ones but has done so by first estimating the distribution function. Because $\hat{F}$ is a known function, properties such as its mean and variance can be computed and these used as estimators.
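To make the plug-in idea concrete, here is a minimal sketch (function names assumed, not from the text) that builds the EDF as a discrete distribution placing probability 1/n on each sample point, and reads the bootstrap estimates of the mean and its standard error directly off it:

```python
import math

def edf(sample):
    """Empirical distribution function: Fhat(t) = #{x_i <= t} / n.

    A discrete distribution giving probability 1/n to each sample point.
    """
    n = len(sample)
    return lambda t: sum(1 for x in sample if x <= t) / n

def plug_in_mean_and_se(sample):
    """Bootstrap (plug-in) estimates taken from the EDF.

    The EDF mean is the sample mean; the EDF variance divides by n
    (not n - 1), because xbar is the exact mean of Fhat rather than
    merely an estimate of it.
    """
    n = len(sample)
    mu_hat = sum(sample) / n                              # mu_Fhat = xbar
    var_hat = sum((x - mu_hat) ** 2 for x in sample) / n  # sigma^2_Fhat
    return mu_hat, math.sqrt(var_hat / n)                 # sigma_Fhat / sqrt(n)

Fhat = edf([1, 2, 3, 4])
# Fhat(2.5) = 0.5, Fhat(4) = 1.0

mu_hat, se_boot = plug_in_mean_and_se([1, 2, 3, 4, 5])
# se_boot differs from the conventional s / sqrt(n) only by the
# sqrt(n / (n - 1)) factor noted above.
```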
