20.03.2013 Views

From Algorithms to Z-Scores - matloff - University of California, Davis


CHAPTER 14. INTRODUCTION TO MODEL BUILDING

is the same for all distributions having a density. This fact (whose proof is related to the general method for simulating random variables having a given density, in Section 5.7) tells us that, without knowing anything about the distribution of X, we can be sure that M has the same distribution. And it turns out that

$$F_M(1.358\,n^{-1/2}) = 0.95 \tag{14.24}$$

Define “upper” and “lower” functions around the empirical cdf $\widehat{F}_X$:

$$U(t) = \widehat{F}_X(t) + 1.358\,n^{-1/2}, \qquad L(t) = \widehat{F}_X(t) - 1.358\,n^{-1/2} \tag{14.25}$$

So, what (14.23) and (14.24) tell us is

$$0.95 = P(\text{the curve } F_X \text{ is entirely between } U \text{ and } L) \tag{14.26}$$

So, the pair of curves $(L(t), U(t))$ is called a 95% confidence band for $F_X$.
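As a rough illustration, the band can be computed directly from a sample. The sketch below is not from the text: it assumes the center curve in (14.25) is the empirical cdf of the sample, and the names `empirical_cdf` and `confidence_band` and the simulated exponential data are hypothetical choices made for the example.

```python
import bisect
import math
import random


def empirical_cdf(sample):
    """Return a function t -> fraction of sample values <= t."""
    xs = sorted(sample)
    n = len(xs)

    def F_hat(t):
        # bisect_right gives the number of sorted values that are <= t
        return bisect.bisect_right(xs, t) / n

    return F_hat


def confidence_band(sample, c=1.358):
    """Return (L, U), the lower and upper curves of (14.25).

    c = 1.358 is the constant from (14.24) for a 95% band.
    """
    n = len(sample)
    F_hat = empirical_cdf(sample)
    half_width = c / math.sqrt(n)

    def lower(t):
        return F_hat(t) - half_width

    def upper(t):
        return F_hat(t) + half_width

    return lower, upper


if __name__ == "__main__":
    # Hypothetical data: 200 draws from an exponential distribution, rate 1.
    random.seed(0)
    data = [random.expovariate(1.0) for _ in range(200)]
    L, U = confidence_band(data)
    for t in (0.5, 1.0, 2.0):
        print(f"t={t}: L={L(t):.3f}, U={U(t):.3f}")
```

The curves are left unclipped so that they match (14.25) exactly. Since any cdf lies in [0, 1], clipping L at 0 and U at 1 would not change which curves fit inside the band.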

The usefulness is similar to that of confidence intervals. If the band is very wide, we know we really don’t have enough data to decide much about the distribution of X. But if the band is narrow and some member of the family comes reasonably close to it, we would probably decide that the model is a good one, even if no member of the family falls within the band. Once again, we should NOT pounce on tiny deviations from the model.
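One way to put a number on "reasonably close to the band" is to measure how far a candidate cdf strays outside it. The sketch below is an illustration under stated assumptions, not a procedure given in the text: the band is taken to be the empirical cdf plus or minus $1.358\,n^{-1/2}$, the candidate cdf is assumed continuous, and `ks_distance`, `band_excess` and the exponential model are hypothetical names and choices.

```python
import math


def ks_distance(sample, model_cdf):
    """Largest vertical gap between the empirical cdf and model_cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = model_cdf(x)
        # The empirical cdf jumps from (i-1)/n to i/n at x, so the gap
        # is checked on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def band_excess(sample, model_cdf, c=1.358):
    """How far model_cdf strays outside the band; 0.0 if it stays inside."""
    half_width = c / math.sqrt(len(sample))
    return max(0.0, ks_distance(sample, model_cdf) - half_width)


def exponential_cdf(rate):
    """cdf of an exponential distribution, used as a hypothetical model."""
    def F(t):
        return 1.0 - math.exp(-rate * t) if t > 0 else 0.0
    return F


if __name__ == "__main__":
    # Hypothetical data; the rate is estimated as 1 / sample mean.
    data = [0.3, 1.2, 0.7, 2.5, 0.1, 0.9, 1.8, 0.4]
    model = exponential_cdf(len(data) / sum(data))
    print("excess outside the band:", band_excess(data, model))
```

An excess of 0 means the candidate lies within the band. A small positive excess is the "near the band" case, and how small counts as acceptable remains a judgment call for the analyst.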

Warning: The Kolmogorov-Smirnov procedure available in the R language performs only a hypothesis test, rather than forming a confidence band. In other words, it simply checks to see whether a member of the family falls within the band. This is not what we want, because we may be perfectly happy if a member is only near the band.

Of course, another, less formal way of assessing data for suitability for some model is to plot the data in a histogram or something of that nature.

14.3 Bias Vs. Variance—Again<br />

In our unit on estimation, Section 12.4, we saw a classic tradeoff in histogram- and kernel-based density estimators. With histograms, for instance, a wider bin width produces a graph which is smoother, but possibly too smooth, i.e. with less oscillation than the true population curve has. The same problem occurs with larger values of h in the kernel case.
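The effect of h can be seen in a small experiment. The sketch below is only an illustration: it assumes a Gaussian kernel (the passage does not name one), uses made-up bimodal data, and counts local maxima on a grid as a crude stand-in for "oscillation". The function names are hypothetical.

```python
import math
import random


def gaussian_kde(sample, h):
    """Kernel density estimate with a Gaussian kernel and bandwidth h."""
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def f_hat(t):
        return norm * sum(math.exp(-0.5 * ((t - x) / h) ** 2) for x in sample)

    return f_hat


def count_local_maxima(f, lo, hi, steps=400):
    """Count strict local maxima of f on an evenly spaced grid over [lo, hi]."""
    ys = [f(lo + (hi - lo) * k / steps) for k in range(steps + 1)]
    return sum(1 for k in range(1, steps) if ys[k - 1] < ys[k] > ys[k + 1])


if __name__ == "__main__":
    # Hypothetical bimodal population: half the data near 0, half near 6.
    random.seed(1)
    data = [random.gauss(0, 1) for _ in range(100)]
    data += [random.gauss(6, 1) for _ in range(100)]
    for h in (0.05, 0.5, 5.0):
        bumps = count_local_maxima(gaussian_kde(data, h), -4.0, 10.0)
        print(f"h={h}: {bumps} local maxima")
```

With a very small h the estimate tends to show many spurious bumps (high variance). With a very large h it can merge the two true modes into one (high bias).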

This is actually yet another example of the bias/variance tradeoff, discussed above and, as mentioned, ONE OF THE MOST RECURRING NOTIONS IN STATISTICS. A large
