01.04.2014 Views

Chapter 14 - Bootstrap Methods and Permutation Tests - WH Freeman

Chapter 14 - Bootstrap Methods and Permutation Tests - WH Freeman

Chapter 14 - Bootstrap Methods and Permutation Tests - WH Freeman

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

<strong>14</strong>-<strong>14</strong> CHAPTER <strong>14</strong> <strong>Bootstrap</strong> <strong>Methods</strong> <strong>and</strong> <strong>Permutation</strong> <strong>Tests</strong><br />

bias<br />

bootstrap<br />

estimate of bias<br />

• Center: A statistic is biased as an estimate of the parameter if its sampling<br />

distribution is not centered at the true value of the parameter. We<br />

can check bias by seeing whether the bootstrap distribution of the statistic<br />

is centered at the value of the statistic for the original sample.<br />

More precisely, the bias of a statistic is the difference between the mean<br />

of its sampling distribution <strong>and</strong> the true value of the parameter. The bootstrap<br />

estimate of bias is the difference between the mean of the bootstrap<br />

distribution <strong>and</strong> the value of the statistic in the original sample.<br />

• Spread: The bootstrap st<strong>and</strong>ard error of a statistic is the st<strong>and</strong>ard deviation<br />

of its bootstrap distribution. The bootstrap st<strong>and</strong>ard error estimates<br />

the st<strong>and</strong>ard deviation of the sampling distribution of the statistic.<br />

<strong>Bootstrap</strong> t confidence intervals<br />

If the bootstrap distribution of a statistic shows a normal shape <strong>and</strong> small<br />

bias, we can get a confidence interval for the parameter by using the bootstrap<br />

st<strong>and</strong>ard error <strong>and</strong> the familiar t distribution. An example will show how<br />

this works.<br />

We are interested in the selling prices of residential real estate in<br />

EXAMPLE <strong>14</strong>.4<br />

Seattle, Washington. Table <strong>14</strong>.1 displays the selling prices of a r<strong>and</strong>om<br />

sample of 50 pieces of real estate sold in Seattle during 2002, as recorded by the<br />

county assessor. 6 Unfortunately, the data do not distinguish residential property from<br />

commercial property. Most sales are residential, but a few large commercial sales in<br />

a sample can greatly increase the sample mean selling price.<br />

Figure <strong>14</strong>.6 shows the distribution of the sample prices. The distribution is far<br />

from normal, with a few high outliers that may be commercial sales. The sample is<br />

small, <strong>and</strong> the distribution is highly skewed <strong>and</strong> “contaminated” by an unknown number<br />

of commercial sales. How can we estimate the center of the distribution despite<br />

these difficulties?<br />

The first step is to ab<strong>and</strong>on the mean as a measure of center in favor of a<br />

statistic that is more resistant to outliers. We might choose the median, but<br />

in this case we will use a new statistic, the 25% trimmed mean.<br />

TABLE <strong>14</strong>.1<br />

Sellingprices for Seattle real estate, 2002 ($1000s)<br />

<strong>14</strong>2 175 197.5 <strong>14</strong>9.4 705 232 50 <strong>14</strong>6.5 155 1850<br />

132.5 215 116.7 244.9 290 200 260 449.9 66.407 164.95<br />

362 307 266 166 375 244.95 210.95 265 296 335<br />

335 1370 256 <strong>14</strong>8.5 987.5 324.5 215.5 684.5 270 330<br />

222 179.8 257 252.95 <strong>14</strong>9.95 225 217 570 507 190

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!