01.04.2014 Views

Chapter 14 - Bootstrap Methods and Permutation Tests - WH Freeman


FIGURE 14.10 The first 254 winning numbers in the New Jersey Pick-It Lottery and the payoffs for each. To see patterns we use least-squares regression (line) and a scatterplot smoother (curve). [Scatterplot of Payoff (vertical axis, 200 to 800) against Number (horizontal axis, 0 to 1000), with a regression line and a smooth curve.]

Although all numbers are equally likely to win, numbers chosen by fewer people have bigger payoffs if they win because the prize is shared among fewer tickets. Figure 14.10 is a scatterplot of the first 254 winning numbers and their payoffs. What patterns can we see?
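To make the sharing argument concrete, here is a minimal sketch in Python. It assumes a fixed prize pool split equally among the winning tickets. The text says only that the prize is shared, so the equal split, the function name `payoff_per_ticket`, and the dollar and ticket figures are all hypothetical and chosen purely for illustration.

```python
def payoff_per_ticket(prize_pool, winning_tickets):
    """Split a fixed prize pool equally among the winning tickets (assumed rule)."""
    if winning_tickets <= 0:
        raise ValueError("need at least one winning ticket")
    return prize_pool / winning_tickets


# Hypothetical figures, not taken from the lottery data.
pool = 100_000.0
popular = payoff_per_ticket(pool, 400)    # a number many players chose
unpopular = payoff_per_ticket(pool, 150)  # a number few players chose

print(f"popular number pays   {popular:.2f} per ticket")
print(f"unpopular number pays {unpopular:.2f} per ticket")
```

Both numbers are equally likely to be drawn. Only the number of tickets sharing the pool differs, and that alone produces the difference in payoff.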

The straight line in Figure 14.10 is the least-squares regression line. The line shows a general trend of higher payoffs for larger winning numbers. The curve in the figure was fitted to the plot by a scatterplot smoother that follows local patterns in the data rather than being constrained to a straight line. The curve suggests that there were larger payoffs for numbers in the intervals 000 to 100, 400 to 500, 600 to 700, and 800 to 999. When people pick “random” numbers, they tend to choose numbers starting with 2, 3, 5, or 7, so these numbers have lower payoffs. This pattern disappeared after 1976; it appears that players noticed the pattern and changed their number choices.
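The contrast between a global straight-line fit and a local smoother can be sketched in a few lines of Python. The text does not say which scatterplot smoother was used, so the moving-window average below is a simple stand-in, not the method behind Figure 14.10. The data are synthetic, and the names `least_squares_line`, `window_smoother`, and `half_width` are assumptions made for this illustration.

```python
import random


def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def window_smoother(x, y, grid, half_width=100):
    """At each grid value, average y over points whose x lies within half_width."""
    curve = []
    for g in grid:
        nearby = [yi for xi, yi in zip(x, y) if abs(xi - g) <= half_width]
        curve.append(sum(nearby) / len(nearby) if nearby else float("nan"))
    return curve


# Synthetic stand-in data (NOT the lottery data): 254 numbers with payoffs.
rng = random.Random(0)
numbers = [rng.randint(0, 999) for _ in range(254)]
payoffs = [250 + 0.1 * n + rng.gauss(0, 80) for n in numbers]

intercept, slope = least_squares_line(numbers, payoffs)
grid = list(range(0, 1000, 50))
smooth = window_smoother(numbers, payoffs, grid)

print(f"line: payoff = {intercept:.1f} + {slope:.3f} * number")
for g, s in zip(grid, smooth):
    print(f"number {g:3d}: smoothed payoff {s:.1f}")
```

The line is determined by all the points at once, so it can show only an overall trend. The smoother uses only nearby points at each location, which is what lets it pick up local rises and dips.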

Are the patterns displayed by the scatterplot smoother just chance? We can use the bootstrap distribution of the smoother’s curve to get an idea of how much random variability there is in the curve. Each resample “statistic” is now a curve rather than a single number. Figure 14.11 shows the curves that result from applying the smoother to 20 resamples from the 254 data points in Figure 14.10. The original curve is the thick line. The spread of the resample curves about the original curve shows the sampling variability of the output of the scatterplot smoother.
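The resampling procedure just described can be sketched as follows. Each resample draws 254 (number, payoff) pairs with replacement from the data and reapplies the smoother, giving one curve per resample. This is a sketch under stated assumptions: the data are synthetic, the moving-window average stands in for the unspecified smoother, and `bootstrap_curves` is a hypothetical helper name.

```python
import random


def window_smoother(x, y, grid, half_width=100):
    """At each grid value, average y over points whose x lies within half_width."""
    curve = []
    for g in grid:
        nearby = [yi for xi, yi in zip(x, y) if abs(xi - g) <= half_width]
        curve.append(sum(nearby) / len(nearby) if nearby else float("nan"))
    return curve


def bootstrap_curves(x, y, grid, n_resamples=20, seed=0):
    """Smooth n_resamples bootstrap resamples of the (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    curves = []
    for _ in range(n_resamples):
        # Resample whole points (pairs) with replacement, same size as the data.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        curves.append(window_smoother(xs, ys, grid))
    return curves


# Synthetic stand-in data (NOT the lottery data): 254 numbers with payoffs.
data_rng = random.Random(1)
numbers = [data_rng.randint(0, 999) for _ in range(254)]
payoffs = [250 + 0.1 * n + data_rng.gauss(0, 80) for n in numbers]

grid = list(range(0, 1000, 50))
original = window_smoother(numbers, payoffs, grid)
curves = bootstrap_curves(numbers, payoffs, grid)

# Pointwise spread of the 20 resample curves at each grid value.
spread = [
    max(c[j] for c in curves) - min(c[j] for c in curves)
    for j in range(len(grid))
]
for g, o, s in zip(grid, original, spread):
    print(f"number {g:3d}: original {o:.1f}, resample range {s:.1f}")
```

Plotting the 20 curves together with the original gives a picture like the one described for Figure 14.11. The pointwise range printed here is only a rough numerical summary of that spread.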

Nearly all the bootstrap curves mimic the general pattern of the original smoother curve, showing, for example, the same low average payoffs for numbers in the 200s and 300s. This suggests that these patterns are real, not just chance.
