Studies in the History of Statistics and Probability ... - Sheynin, Oscar
values of (2.1) realized in the n experiments and is therefore itself random. In addition, there exists a non-random (theoretical) distribution function

F(x) = P[ξ < x] = P[x_i < x]   (2.3)

of each result of the experiment.

Kolmogorov proved that at n → ∞ the magnitude

λ = √n sup |F(x) − F_n(x)|   (2.4)

has some standard distribution (the Kolmogorov distribution); the supremum is taken over the values of x. This result is valid under a single assumption, that F(x) is continuous. Now not only the asymptotic distribution of (2.4) is known, but also its distributions at n = 2, 3, ...

The practical sense of the empirical distribution function F_n(x) consists, first of all, in that its graph vividly represents the sample values (2.1). In a certain sense this function, at sufficiently large values of n, resembles the theoretical distribution function F(x). [...]

There also exists another method of representation of a sample, called the histogram. [...] Given a large number of observations, it resembles the density of distribution of the random variable ξ. However, it is only expressive (and almost independent of the choice of the intervals of grouping) when the number of observations is of the order of at least a few tens. The histogram is more commonly used, but in all cases I decidedly prefer to apply the empirical distribution function.
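[A minimal computational sketch, not part of the original text: Python with NumPy and SciPy is an assumed choice, the uniform sample and the helper names F_n and kolmogorov_lambda are purely illustrative. It shows how the empirical distribution function (2.2) and the statistic λ of (2.4) can be evaluated when the theoretical law F(x) is fully specified.]

import numpy as np
from scipy.stats import uniform, kstwobign

rng = np.random.default_rng(0)
x = rng.uniform(size=200)            # the sample (2.1); illustrative data only

# Empirical distribution function (2.2): F_n(t) = (number of x_i < t) / n.
# Shown for reference; the statistic below needs it only at the jump points.
def F_n(t, sample):
    return np.searchsorted(np.sort(sample), t, side='left') / len(sample)

# Statistic (2.4): lambda = sqrt(n) * sup |F(x) - F_n(x)|, the supremum over x.
# For a continuous F the supremum is attained at the sample points.
def kolmogorov_lambda(sample, F):
    n = len(sample)
    xs = np.sort(sample)
    cdf = F(xs)
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return np.sqrt(n) * max(d_plus, d_minus)

lam = kolmogorov_lambda(x, uniform.cdf)   # F(x) fully specified: uniform on [0, 1]
print(lam, kstwobign.sf(lam))             # p-value from the Kolmogorov distribution

[For comparison, scipy.stats.kstest(x, uniform.cdf) returns the unscaled supremum sup |F_n(x) − F(x)| together with a p-value.]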
The Kolmogorov criterion based on the statistic λ, see (2.4), can be applied for testing the fit of a supposed theoretical law F(x) to the observational data (2.1) represented by function (2.2). However, that theoretical law ought to be precisely known. A common (but gradually being abandoned) mistake was the application of the Kolmogorov criterion for testing a hypothesis of the kind The theoretical distribution function is normal. Indeed, the normal law is only determined up to the choice of its parameters a (the mean) and σ (the mean square scatter).

In the hypothesis formulated just above these parameters are not mentioned; it is assumed that they are determined by the sample data, naturally through the estimators

x̄ = (1/n) Σ x_i;   s² = [1/(n − 1)] Σ (x_i − x̄)²,   i = 1, 2, ..., n.

Thus, instead of statistic (2.4), the statistic

√n sup |F_0[(x − x̄)/s] − F_n(x)|   (2.5)

is meant. Here, F_0 is the standard normal law N(0, 1).

Statistic (2.5) differs from (2.4) in that, instead of F(x), it includes F_0[(x − x̄)/s], which depends on (2.1) through x̄ and s and is therefore random. Typical [...]
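[A hedged sketch, again Python and not part of the original, of statistic (2.5) and of why the mistaken practice described above misleads: under the hypothesis of normality, with x̄ and s estimated from the same sample, (2.5) is stochastically smaller than (2.4), so referring it to the Kolmogorov distribution accepts normality too readily. The sample size n = 50, the number of trials and the function name statistic_2_5 are arbitrary illustrative choices.]

import numpy as np
from scipy.stats import norm, kstwobign

# Statistic (2.5): sqrt(n) * sup |F_0((x - xbar)/s) - F_n(x)|,
# with xbar and s estimated from the same sample.
def statistic_2_5(sample):
    n = len(sample)
    xs = np.sort(sample)
    xbar = xs.mean()
    s = xs.std(ddof=1)                   # the estimator s with the 1/(n - 1) factor
    cdf = norm.cdf((xs - xbar) / s)      # F_0 is the standard normal law N(0, 1)
    d_plus = np.max(np.arange(1, n + 1) / n - cdf)
    d_minus = np.max(cdf - np.arange(0, n) / n)
    return np.sqrt(n) * max(d_plus, d_minus)

rng = np.random.default_rng(1)
n, trials = 50, 2000                     # illustrative settings

# Null distribution of (2.5) by simulation: normal samples, parameters re-estimated.
sim = np.array([statistic_2_5(rng.normal(size=n)) for _ in range(trials)])

# The mistaken practice compares (2.5) with the Kolmogorov distribution of (2.4).
# The 95% point of the latter is noticeably larger than the simulated 95% point
# of (2.5), so the mistaken test accepts the hypothesis of normality too readily.
print("Kolmogorov 95% point:   ", kstwobign.ppf(0.95))
print("Simulated 95% of (2.5): ", np.quantile(sim, 0.95))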
