01.03.2013 Views

Applied Statistics Using SPSS, STATISTICA, MATLAB and R


3.6 Bootstrap Estimation

about the bootstrap technique is that it also often works for other statistics for which no theory on sampling distribution is available. As a matter of fact, the bootstrap distribution usually (for a not too small original sample size, say n > 50) has the same shape and spread as the original sampling distribution, but is centred at the original statistic value rather than the true parameter value.
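The centring property described above can be checked numerically. The following is a minimal Python sketch, not taken from the book: the data are hypothetical (an arbitrary skewed sample, not the PRT data), and the helper name `bootstrap_distribution` is introduced only for illustration. It resamples with replacement, then compares the centre of the bootstrap means with the original sample mean.

```python
import random
import statistics

def bootstrap_distribution(sample, statistic, m=1000, seed=0):
    """Return m bootstrap replicates of `statistic` (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(m)]

# Hypothetical skewed data, for illustration only (not the PRT data).
data_rng = random.Random(1)
sample = [data_rng.expovariate(1 / 350) for _ in range(100)]

boot_means = bootstrap_distribution(sample, statistics.mean)
original_mean = statistics.mean(sample)
centre = statistics.mean(boot_means)      # centre of the bootstrap distribution
bias_estimate = centre - original_mean    # bootstrap estimate of bias
se_boot = statistics.stdev(boot_means)    # bootstrap standard error

print(f"original mean = {original_mean:.2f}")
print(f"bootstrap centre = {centre:.2f}, bias = {bias_estimate:.2f}, SEboot = {se_boot:.2f}")
```

The bootstrap centre should lie close to the original sample mean, not to the population mean used to generate the data, which is the point made in the text.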

Figure 3.7. a) Histogram of the PRT data; b) Histogram of the bootstrap means. (Vertical axes: n; horizontal axes: x in a, x* in b.)

Suppose that the bootstrap distribution of a statistic, w, is approximately normal and that the bootstrap estimate of bias is small. We then compute a two-sided bootstrap confidence interval at α risk, for the parameter that corresponds to the statistic, by the following formula:

$$ w \pm t_{n-1,\,1-\alpha/2}\,\mathrm{SE}_{\mathrm{boot}} $$

We may use the percentiles of the normal distribution, instead of the Student’s t<br />

distribution, whenever m is very large.<br />
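As an illustration of the interval formula, here is a minimal Python sketch under stated assumptions. It uses the normal percentile in place of the Student's t percentile (the substitution the text allows), because the Python standard library has no t quantile function; a t quantile could be obtained instead from, for example, `scipy.stats.t.ppf`. The function name `bootstrap_normal_ci` and the data are hypothetical.

```python
import random
import statistics
from statistics import NormalDist

def bootstrap_normal_ci(sample, statistic, alpha=0.05, m=1000, seed=0):
    """Two-sided interval w +/- z * SEboot, with z the normal 1 - alpha/2 percentile."""
    rng = random.Random(seed)
    n = len(sample)
    w = statistic(sample)                       # statistic on the original sample
    replicates = [statistic(rng.choices(sample, k=n)) for _ in range(m)]
    se_boot = statistics.stdev(replicates)      # spread of the bootstrap replicates
    z = NormalDist().inv_cdf(1 - alpha / 2)     # normal percentile, used instead of t
    return w - z * se_boot, w + z * se_boot

# Hypothetical data, for illustration only.
data_rng = random.Random(42)
sample = [data_rng.gauss(50, 10) for _ in range(80)]

low, high = bootstrap_normal_ci(sample, statistics.mean)
print(f"95% bootstrap CI for the mean: [{low:.2f}, {high:.2f}]")
```

The interval is only appropriate under the conditions stated above: an approximately normal bootstrap distribution and a small bootstrap estimate of bias.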

The question naturally arises: how large must the number of bootstrap samples, m, be in order to obtain a reliable bootstrap distribution with reliable values of SEboot? A good rule of thumb, based on theoretical and practical evidence, is to choose m ≥ 200.

The following examples illustrate the computation of confidence intervals using<br />

the bootstrap technique.<br />

Example 3.9<br />

Q: Consider the percentage of lime, CaO, in the composition of clays, a sample of which constitutes the Clays’ dataset. Compute the confidence interval at 95% level of the two-tail 5% trimmed mean and discuss the results. (The two-tail 5% trimmed mean disregards 10% of the cases, 5% at each of the tails.)
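To make the statistic in the question concrete, the following Python sketch computes a two-tail trimmed mean and its bootstrap standard error. It is not the book's solution: the data are hypothetical (not the Clays’ dataset), the function name `trimmed_mean` is introduced for illustration, and the number of cases dropped per tail is rounded down, a convention that may differ between statistical packages.

```python
import random
import statistics

def trimmed_mean(values, percent=5):
    """Two-tail trimmed mean: drop `percent` % of the cases at each tail."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100      # cases dropped per tail (rounded down)
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)

# With 20 cases, one case is dropped at each tail, so the outlier is ignored.
small = list(range(1, 20)) + [1000]
print(trimmed_mean(small))                 # 10.5

# Hypothetical data standing in for a real sample; m = 200 bootstrap samples.
rng = random.Random(0)
sample = [rng.gauss(10, 3) for _ in range(60)]
replicates = [trimmed_mean(rng.choices(sample, k=len(sample))) for _ in range(200)]
se_boot = statistics.stdev(replicates)
print(f"trimmed mean = {trimmed_mean(sample):.3f}, SEboot = {se_boot:.3f}")
```

No sampling-distribution theory is needed for the trimmed mean here; the spread of the bootstrap replicates supplies the standard error directly.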

