12.07.2015 Views

1 Studies in the History of Statistics and Probability ... - Sheynin, Oscar

1 Studies in the History of Statistics and Probability ... - Sheynin, Oscar

1 Studies in the History of Statistics and Probability ... - Sheynin, Oscar

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>Studies</strong> <strong>in</strong> <strong>the</strong> <strong>History</strong> <strong>of</strong> <strong>Statistics</strong> <strong>and</strong> <strong>Probability</strong>.Collected Translationsvol. 2V. N. Tutubal<strong>in</strong>, Yu. I. AlimovOn Applied Ma<strong>the</strong>matical <strong>Statistics</strong>Compiled <strong>and</strong> translated by <strong>Oscar</strong> Sheyn<strong>in</strong>Internet: www.sheyn<strong>in</strong>.de©<strong>Oscar</strong> Sheyn<strong>in</strong>, 2011ISBN 978-3-942944-04-5Berl<strong>in</strong>, 20111


ContentsIntroduction by CompilerI. V. N. Tutubal<strong>in</strong>, Theory <strong>of</strong> probability <strong>in</strong> natural science, 1972II. V. N. Tutubal<strong>in</strong>, Treatment <strong>of</strong> observational series, 1973III. V. N. Tutubal<strong>in</strong>, The boundaries <strong>of</strong> applicability(Stochastic methods <strong>and</strong> <strong>the</strong>ir possibilities), 1977IV. Yu. I. Alimov, An alternative to <strong>the</strong> method <strong>of</strong> ma<strong>the</strong>maticalstatistics, 1980V. V. N. Tutubal<strong>in</strong>, Answer<strong>in</strong>g Alimov’s critical comments onapply<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability, 1978VI. O. Sheyn<strong>in</strong>, On <strong>the</strong> Bernoulli law <strong>of</strong> large numbers2


Introduction by CompilerI am present<strong>in</strong>g translations <strong>of</strong> some contributions special <strong>in</strong> that<strong>the</strong>y were devoted to <strong>the</strong> practical aspect <strong>of</strong> applied statistics. In anycase, an acqua<strong>in</strong>tance with <strong>the</strong>m compels <strong>the</strong> reader to th<strong>in</strong>k aboutunexpected circumstances. I never met Yuri Ivanovich Alimov, butsome decades ago I had attended a short course <strong>of</strong> lectures at MoscowUniversity delivered by Valery Nikolaevich Tutubal<strong>in</strong>. I regret that hehad no desire to have a look at his previous work. He allowed me to<strong>in</strong>clude here (see below <strong>in</strong> translation) his letter to me expla<strong>in</strong><strong>in</strong>g hisreluctance.Tutubal<strong>in</strong> himself [v, beg<strong>in</strong>n<strong>in</strong>g <strong>of</strong>] <strong>in</strong>dicated what prompted him tocompile his booklets [i – iii] <strong>and</strong>, as he reasonably supposed, alsoserved as a catalyst for Alimov [iv]: <strong>the</strong> amount <strong>of</strong> falsehoods arrivedat by apply<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability is too great to be tolerated. Hecited Grekova (1976) who had quoted scientific lore which stated thatpure ma<strong>the</strong>matics achieves <strong>the</strong> probable by proper methods <strong>and</strong>applied ma<strong>the</strong>matics achieves <strong>the</strong> necessary by possible means. Theproblem <strong>the</strong>refore reduces to verify<strong>in</strong>g those possible means, toascerta<strong>in</strong><strong>in</strong>g <strong>the</strong> conditions for those means to rema<strong>in</strong> possible.Tutubal<strong>in</strong> <strong>in</strong>tended his booklets for a ra<strong>the</strong>r broad circle <strong>of</strong> readerseven though he was discuss<strong>in</strong>g most serious subjects [ii]. But <strong>the</strong>n, <strong>in</strong><strong>the</strong> first place <strong>in</strong> [iii], his text <strong>in</strong>cluded hardly comprehensiblestatements <strong>and</strong> an unusual pronouncement on Bernoulli’s law <strong>of</strong> largenumbers which should be read toge<strong>the</strong>r with Alimov’s works.Two <strong>of</strong> Tutubal<strong>in</strong>’s statements <strong>in</strong> <strong>the</strong> same booklet (see my Notes17 <strong>and</strong> 18) were no doubt watered down to pass censorship; nowadays,<strong>the</strong>y should have been drastically altered.Two po<strong>in</strong>ts ought to be <strong>in</strong>dicated. First, concern<strong>in</strong>g <strong>the</strong> application<strong>of</strong> probability to adm<strong>in</strong>istration <strong>of</strong> justice see my Note 4 to booklet [i].Second, Tutubal<strong>in</strong> [i] overestimated Laplace’s <strong>in</strong>fluence with respectboth to <strong>the</strong>ory <strong>and</strong> general th<strong>in</strong>k<strong>in</strong>g. I th<strong>in</strong>k that Fourier (1829, pp. 375– 376) correctly described Laplace as a <strong>the</strong>oretician:We cannot affirm that it was his dest<strong>in</strong>y to create a science entirelynew [...]; to give to ma<strong>the</strong>matical doctr<strong>in</strong>es pr<strong>in</strong>ciples orig<strong>in</strong>al <strong>and</strong> <strong>of</strong>immense extent [...]; or, like Newton, [...] to extend to all <strong>the</strong> universe<strong>the</strong> terrestrial dynamics <strong>of</strong> Galileo; but Laplace was born to perfecteveryth<strong>in</strong>g, to exhaust everyth<strong>in</strong>g <strong>and</strong> to drive back every limit <strong>in</strong> orderto solve what might have appeared <strong>in</strong>capable <strong>of</strong> solution.Nei<strong>the</strong>r Boltzmann (who cited many scholars <strong>and</strong> philosophers), norPo<strong>in</strong>caré (who regrettably knew only Bertr<strong>and</strong>) referred to Laplaceeven once, <strong>and</strong> Maxwell only mentioned him twice <strong>in</strong> a very generalway.As to general th<strong>in</strong>k<strong>in</strong>g, Quetelet regrettably overshadowedLaplace’s Essai by his spectacular but poorly justified announcements<strong>and</strong> proposals later rejected by German statisticians along with <strong>the</strong><strong>the</strong>ory <strong>of</strong> probability.Alimov’s booklet [iv] is written <strong>in</strong> bad general style. Witness hisorig<strong>in</strong>al first sentence (altered <strong>in</strong> translation): ... ma<strong>the</strong>maticians <strong>and</strong>3


those who applies it ... The booklet is <strong>in</strong>tended for a much betterqualified readership. He <strong>in</strong>dicates <strong>the</strong> weak po<strong>in</strong>ts <strong>of</strong> <strong>the</strong> attempts toapply probability <strong>the</strong>ory, but his positive recommendations are notsufficiently isolated from <strong>the</strong> context <strong>and</strong> <strong>the</strong> exposition is not at allconducive for easy read<strong>in</strong>g. I only translated parts <strong>of</strong> his booklet <strong>and</strong>described much <strong>in</strong> my own words.Alimov’s criticism <strong>of</strong> <strong>the</strong> usual practical aspect <strong>of</strong> appliedma<strong>the</strong>matical statistics is much more radical than Tutubal<strong>in</strong>’s, sufficeit to mention <strong>the</strong> title <strong>of</strong> his contribution [iv], <strong>and</strong> he also overenthusiasticallyrejected many chapters <strong>of</strong> that discipl<strong>in</strong>e.A special comment is warranted by <strong>the</strong> authors’ separation <strong>of</strong> twounderst<strong>and</strong><strong>in</strong>g <strong>of</strong> r<strong>and</strong>omness, its narrow ma<strong>the</strong>matical mean<strong>in</strong>g <strong>and</strong>its more general scientific underst<strong>and</strong><strong>in</strong>g. This latter is still important;its beg<strong>in</strong>n<strong>in</strong>g can be traced to Po<strong>in</strong>caré (1896/1912, p. 4) who<strong>in</strong>dicated that a very small cause can have a considerable effect whichwas his ma<strong>in</strong> explanation <strong>of</strong> r<strong>and</strong>omness. His idea (effectivelypronounced earlier by several scholars <strong>in</strong>clud<strong>in</strong>g Maxwell <strong>and</strong> even byAristotle) was greatly generalized <strong>in</strong> <strong>the</strong> studies <strong>of</strong> chaotic phenomenawhich began several decades ago. I provide an example illustrat<strong>in</strong>g amistake made by imag<strong>in</strong><strong>in</strong>g ma<strong>the</strong>matical r<strong>and</strong>omness <strong>in</strong>stead <strong>of</strong>r<strong>and</strong>omness <strong>in</strong> <strong>the</strong> general sense (or even simply <strong>in</strong>def<strong>in</strong>iteness).William Herschel (1817/1912, p. 579) formulated a statement about<strong>the</strong> size <strong>of</strong> <strong>the</strong> stars. Not know<strong>in</strong>g anyth<strong>in</strong>g about it or about <strong>the</strong>existence <strong>of</strong> different spectral classes, he presumed that a starr<strong>and</strong>omly chosen from more than 14 thous<strong>and</strong> stars <strong>of</strong> <strong>the</strong> first sevenmagnitudes, is not likely to differ much from a certa<strong>in</strong> mean size <strong>of</strong><strong>the</strong>m all. Actually, <strong>the</strong> size <strong>of</strong> <strong>the</strong> stars differ enormously <strong>and</strong> a meansize is only a purely abstract notion.Here now is Tutubal<strong>in</strong>’s explanation <strong>of</strong> February 2011.Philosophers <strong>of</strong> science had successfully proved that nei<strong>the</strong>r <strong>the</strong>orynor experiment were <strong>of</strong> any consequence <strong>in</strong> science <strong>and</strong> were notsuited for anyth<strong>in</strong>g. The only possible explanation is that scientificcognition, just like religious cognition, is a miracle <strong>and</strong> revelation. Iprovided a h<strong>in</strong>t <strong>of</strong> <strong>the</strong>ology <strong>of</strong> science <strong>in</strong> my paper <strong>in</strong> UspekhiFizicheskikh Nauk vol. 163, No. 7, 1993, pp. 93 – 109.If you will not colour <strong>the</strong>ologically your <strong>in</strong>vestigations, <strong>the</strong>y will notgive rise to such <strong>in</strong>terest as <strong>the</strong>y really deserve.Perhaps most extraord<strong>in</strong>ary events do happen (with an extremelylow probability). But suppose that a ma<strong>the</strong>matician had somehowdiv<strong>in</strong>ed <strong>the</strong> yet unknown Pythagorean proposition. Even <strong>the</strong>n he stillhas to justify it. At first, he can draw a right triangle, measure its sidesetc, <strong>the</strong>n rigorously consider his task.After read<strong>in</strong>g Tutubal<strong>in</strong>’s paper mentioned above, I am still unableto say anyth<strong>in</strong>g else on this subject, but I saw a significant statementon p. 98: for two hundred years no progress was made about <strong>the</strong>fundamental problem: when does statistical stability emerge?I have now found a highly relevant statement by Kolmogorov <strong>in</strong> <strong>the</strong>Russian translation <strong>of</strong> 1986 <strong>of</strong> his Logical foundations <strong>of</strong> probability4


(Lect. Notes Math., No. 1021, 1983, pp. 1 – 5): R<strong>and</strong>omness <strong>in</strong> <strong>the</strong>wide sense <strong>in</strong>dicates phenomena which do not exhibit regularities, donot necessarily obey any stochastic laws. It should be dist<strong>in</strong>guishedfrom stochastic r<strong>and</strong>omness, a subject <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability.BibliographyGrekova I. (1976 Russian), Peculiar methodological features <strong>of</strong> appliedma<strong>the</strong>matics on <strong>the</strong> current stage <strong>of</strong> its development. Voprosy Filos<strong>of</strong>ii, No. 6, pp.104 – 114.Fourier J. B. J. (1829), Historical Eloge <strong>of</strong> <strong>the</strong> Marquis De Laplace. London,End<strong>in</strong>b. <strong>and</strong> Dubl<strong>in</strong> Phil. Mag., ser. 2, vol. 6, pp. 370 – 381. The orig<strong>in</strong>al French textwas only published <strong>in</strong> 1831.Herschel W. (1817), Astronomical observations <strong>and</strong> experiments tend<strong>in</strong>g to<strong>in</strong>vestigate <strong>the</strong> local arrangement <strong>of</strong> celestial bodies <strong>in</strong> space. Scient. Papers, vol. 2.London, 1912, pp. 575 – 591. Repr<strong>in</strong>t <strong>of</strong> book: London, 2003.Po<strong>in</strong>caré H. (1896), Calcul des probabilités. Paris, 1912; repr<strong>in</strong>ted 1923.5


IV. N. Tutubal<strong>in</strong>Theory <strong>of</strong> <strong>Probability</strong> <strong>in</strong> Natural ScienceTeoria Veroiatnostei v Estestvoznanii. Moscow, 1972IntroductionEven from <strong>the</strong> time <strong>of</strong> Laplace, Gauss <strong>and</strong> Poisson <strong>the</strong> <strong>the</strong>ory <strong>of</strong>probability is us<strong>in</strong>g a complicated ma<strong>the</strong>matical arsenal. At present, itis apply<strong>in</strong>g practically <strong>the</strong> entire ma<strong>the</strong>matical analysis <strong>in</strong>clud<strong>in</strong>g <strong>the</strong><strong>the</strong>ory <strong>of</strong> partial differential equations <strong>and</strong> <strong>in</strong> addition, beg<strong>in</strong>n<strong>in</strong>g withKolmogorov’s classic (1933), measure <strong>the</strong>ory <strong>and</strong> functional analysis.Never<strong>the</strong>less, books on <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability for a wide circle <strong>of</strong>readers usually beg<strong>in</strong> by stat<strong>in</strong>g that <strong>the</strong> fundamental problems <strong>of</strong>apply<strong>in</strong>g it are quite simple for a layman to underst<strong>and</strong>. That wasCournot’s (1843) op<strong>in</strong>ion, <strong>and</strong> we wish to repeat his statement righ<strong>the</strong>re.However, it could have been also stated that those problems aredifficult even for specialists s<strong>in</strong>ce scientifically <strong>the</strong>y are still not quiteclear. More precisely, when discuss<strong>in</strong>g fundamental stochasticproblems, a specialist fully master<strong>in</strong>g its ma<strong>the</strong>matical tools has noadvantage over a layman s<strong>in</strong>ce <strong>the</strong>y do not help here. In this case,important is an experience <strong>of</strong> concrete applications which for ama<strong>the</strong>matician is not easier (if not more difficult) to acquire than foran eng<strong>in</strong>eer or researcher engaged <strong>in</strong> direct applications.At present, ideas about <strong>the</strong> scope <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability tookshape a bit more perfectly than <strong>in</strong> <strong>the</strong> time <strong>of</strong> Laplace <strong>and</strong> Cournot. Webeg<strong>in</strong> by describ<strong>in</strong>g <strong>the</strong>m.1. Does Each Event Have <strong>Probability</strong>?1.1. The concept <strong>of</strong> statistical stability (<strong>of</strong> a statistical ensemble).Textbooks on <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability, especially old ones, usuallystate that each r<strong>and</strong>om event has probability whereas a r<strong>and</strong>om event issuch that can ei<strong>the</strong>r occur or not. Several examples are <strong>of</strong>fered, such as<strong>the</strong> occurrence <strong>of</strong> heads <strong>in</strong> a co<strong>in</strong> toss or <strong>of</strong> ra<strong>in</strong> this even<strong>in</strong>g or asuccessful pass<strong>in</strong>g <strong>of</strong> an exam<strong>in</strong>ation by a student etc. As a result, <strong>the</strong>reader gets an impression that, if we do not know whe<strong>the</strong>r a givenevent happens or not, we may discuss its probability, <strong>and</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong>probability thus becomes a science <strong>of</strong> sciences, or at least anabsolutely special science <strong>in</strong> which some substantial <strong>in</strong>ferences may bereached out <strong>of</strong> complete ignorance.Modern science naturally vigorously rejects that underst<strong>and</strong><strong>in</strong>g <strong>of</strong><strong>the</strong> concept <strong>of</strong> probability. In general, science prefers experimentswhose results are stable, i. e. such that <strong>the</strong> studied event <strong>in</strong>variablyoccurs or not. However, such complete stability <strong>of</strong> results is notalways achievable. Thus, accord<strong>in</strong>g to <strong>the</strong> views nowadays accepted <strong>in</strong>physics, it is impossible for experiments perta<strong>in</strong><strong>in</strong>g to quantummechanics. On <strong>the</strong> contrary, it can be considered established6


sufficiently securely that a careful <strong>and</strong> honest experimentalist can <strong>in</strong>many cases achieve statistical, if not complete stability <strong>of</strong> his results.As it is now thought, events, connected with such experiments, are<strong>in</strong>deed compris<strong>in</strong>g <strong>the</strong> scope <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability. And so, <strong>the</strong>possibility <strong>of</strong> apply<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability is not, generallyspeak<strong>in</strong>g, presented for free, it is a prize for extensive <strong>and</strong> pa<strong>in</strong>stak<strong>in</strong>gtechnical <strong>and</strong> <strong>the</strong>oretic work on stabiliz<strong>in</strong>g <strong>the</strong> conditions, <strong>and</strong><strong>the</strong>refore <strong>the</strong> results, <strong>of</strong> an experiment. But what exactly is meant bystatistical stability for which, as just stated, we ought to strive? How todeterm<strong>in</strong>e whe<strong>the</strong>r we have already achieved that desired situation, orshould we still perfect someth<strong>in</strong>g?It should be recognized that nowadays we do not have an exhaustiveanswer. Mises (1928/1930) had formulated some pert<strong>in</strong>ent dem<strong>and</strong>s.Let µ A be <strong>the</strong> number <strong>of</strong> occurrences <strong>of</strong> event A <strong>in</strong> n experiments, <strong>the</strong>nµ A /n is called <strong>the</strong> frequency <strong>of</strong> A. The first dem<strong>and</strong> consisted <strong>in</strong> that <strong>the</strong>frequency ought to become near to some number P(A) which is called<strong>the</strong> probability <strong>of</strong> <strong>the</strong> event A <strong>and</strong> Mises wrote it down aslim µ A /n = P (A), n → ∞.In such a form that dem<strong>and</strong> can not be experimentally checked s<strong>in</strong>ce itis practically impossible to compel n to tend to <strong>in</strong>f<strong>in</strong>ity.The second dem<strong>and</strong> consisted <strong>in</strong> that, if we had agreed beforeh<strong>and</strong>that not all, but only a part <strong>of</strong> <strong>the</strong> trials will be considered (forexample, trials <strong>of</strong> even numbers), <strong>the</strong> frequency <strong>of</strong> A, calculatedaccord<strong>in</strong>gly, should be close to <strong>the</strong> same number P (A); it is certa<strong>in</strong>lypresumed that <strong>the</strong> number <strong>of</strong> trials is sufficiently large.Let us beg<strong>in</strong> with <strong>the</strong> merit <strong>of</strong> <strong>the</strong> Mises formulation. Properlyspeak<strong>in</strong>g, it consists <strong>in</strong> that some cases <strong>in</strong> which <strong>the</strong> application <strong>of</strong> <strong>the</strong><strong>the</strong>ory <strong>of</strong> probability would have been mistaken, are excluded, <strong>and</strong>here <strong>the</strong> second dem<strong>and</strong> is especially typical; <strong>the</strong> first one is apparentlywell realized by all those apply<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability <strong>and</strong> nomistakes are occurr<strong>in</strong>g here.Consider, for example, is it possible to discuss <strong>the</strong> probability <strong>of</strong> anarticle manufactured by a certa<strong>in</strong> shop be<strong>in</strong>g defective 1 . One <strong>of</strong> <strong>the</strong>causes <strong>of</strong> defects can be <strong>the</strong> not quite satisfactory condition <strong>of</strong> a part <strong>of</strong>workers, especially after a festive occasion. Accord<strong>in</strong>g to <strong>the</strong> secondMises dem<strong>and</strong>, we ought to compare <strong>the</strong> frequency <strong>of</strong> defectivearticles manufactured dur<strong>in</strong>g Mondays <strong>and</strong> <strong>the</strong> o<strong>the</strong>r days <strong>of</strong> <strong>the</strong> week,<strong>and</strong> <strong>the</strong> same applies to <strong>the</strong> end <strong>of</strong> a quarter, or year due to <strong>the</strong> rushwork. If <strong>the</strong>se frequencies are noticeably different, it is useless todiscuss <strong>the</strong> probability <strong>of</strong> defective articles. F<strong>in</strong>ally, defective articlescan appear because <strong>of</strong> possible low quality <strong>of</strong> raw materials, deviationfrom accepted technology, etc.Thus, know<strong>in</strong>g next to noth<strong>in</strong>g about <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability, <strong>and</strong>only mak<strong>in</strong>g use <strong>of</strong> <strong>the</strong> Mises rules, we see that for apply<strong>in</strong>g <strong>the</strong> <strong>the</strong>oryfor analyz<strong>in</strong>g <strong>the</strong> quality <strong>of</strong> manufactured articles it is necessary tocreate beforeh<strong>and</strong> sufficiently adjusted conditions. The <strong>the</strong>ory <strong>of</strong>probability is someth<strong>in</strong>g like butter for <strong>the</strong> porridge: first, you ought toprepare <strong>the</strong> porridge. However, it should be noted at once that <strong>the</strong><strong>the</strong>ory <strong>of</strong> probability is <strong>of</strong>ten most advantageous not when it can be7


applied, but when, after attempt<strong>in</strong>g to make use <strong>of</strong> it, a lack <strong>of</strong>statistical homogeneity (which is <strong>the</strong> same as stability) is revealed.If <strong>the</strong> articles manufactured by a certa<strong>in</strong> shop may be considered asa statistically homogeneous totality, <strong>the</strong> serious question still is,whe<strong>the</strong>r <strong>the</strong> quality <strong>of</strong> those articles can be improved withoutfundamentally perfect<strong>in</strong>g technology. If, however, <strong>the</strong> quality isfluctuat<strong>in</strong>g (which should be stochastically established), <strong>the</strong>n <strong>the</strong>pert<strong>in</strong>ent cause can undoubtedly be revealed <strong>and</strong> <strong>the</strong> quality improved.The ma<strong>in</strong> shortcom<strong>in</strong>g <strong>of</strong> <strong>the</strong> Mises formulation is its <strong>in</strong>def<strong>in</strong>iteness.It is not stated how large should <strong>the</strong> number <strong>of</strong> experiments n be forensur<strong>in</strong>g <strong>the</strong> given beforeh<strong>and</strong> closeness <strong>of</strong> µ A /n to P (A). A quitesatisfactory answer can only be given (see below) after additionallypresum<strong>in</strong>g an <strong>in</strong>dependence <strong>of</strong> <strong>the</strong> results <strong>of</strong> <strong>in</strong>dividual trials. Anexperimental check <strong>of</strong> <strong>in</strong>dependence is partially possible, but difficult<strong>and</strong> always, without exception!, <strong>in</strong>complete.But <strong>the</strong> situation with <strong>the</strong> Mises second dem<strong>and</strong> is much worse. Asformulated above, it is simply contradictory s<strong>in</strong>ce, <strong>in</strong>dicat<strong>in</strong>gbeforeh<strong>and</strong> some part <strong>of</strong> <strong>the</strong> n trials, we could have accidentallychosen those <strong>in</strong> which <strong>the</strong> event A had occurred (or not) <strong>and</strong> itsfrequency will be very different from <strong>the</strong> frequency calculated for all<strong>the</strong> trials. Mises certa<strong>in</strong>ly thought not about select<strong>in</strong>g any part <strong>of</strong> <strong>the</strong>trials, but ra<strong>the</strong>r <strong>of</strong> formulat<strong>in</strong>g a reasonable rule for achiev<strong>in</strong>g that.Such a rule should depend on our ideas about <strong>the</strong> possible ways <strong>of</strong>corrupt<strong>in</strong>g statistical homogeneity. Thus, fear<strong>in</strong>g <strong>the</strong> consequences <strong>of</strong> aSunday dr<strong>in</strong>k<strong>in</strong>g bout, we ought to isolate <strong>the</strong> part <strong>of</strong> <strong>the</strong> productionmanufactured on Mondays; wish<strong>in</strong>g to check <strong>the</strong> <strong>in</strong>dependence <strong>of</strong>event A from ano<strong>the</strong>r event B, we form two parts <strong>of</strong> <strong>the</strong> trials, one <strong>in</strong>which B occurred, <strong>the</strong> o<strong>the</strong>r one, when it failed. These reasonableconsiderations are difficult to apply <strong>in</strong> <strong>the</strong> general case, i. e., <strong>the</strong>y canhardly be formulated <strong>in</strong> <strong>the</strong> boundaries <strong>of</strong> a ma<strong>the</strong>matical <strong>the</strong>ory.We see that <strong>the</strong>re does not exist any ma<strong>the</strong>matically rigorousgeneral method for decid<strong>in</strong>g whe<strong>the</strong>r a given event has probability ornot. This certa<strong>in</strong>ly does not mean that <strong>in</strong> a particular case we can notbe completely sure that stochastic methods may be applied. Forexample, <strong>the</strong>re can not be even a slightest doubt <strong>in</strong> that <strong>the</strong> Brownianmotion can be stochastically described. Brownian motion is adisorderly motion <strong>of</strong> small particles suspended <strong>in</strong> a liquid <strong>and</strong> iscaused by <strong>the</strong> shocks <strong>of</strong> its mov<strong>in</strong>g molecules. Here, our certa<strong>in</strong>ty isjustified ra<strong>the</strong>r by general ideas about <strong>the</strong> k<strong>in</strong>etic molecular <strong>the</strong>orythan by experimental checks <strong>of</strong> statistical stability.In o<strong>the</strong>r cases, such as co<strong>in</strong> toss<strong>in</strong>g, we base our knowledge on <strong>the</strong>experience <strong>of</strong> a countless number <strong>of</strong> gamblers play<strong>in</strong>g heads or tails.Note, however, that many em<strong>in</strong>ent scientists did not th<strong>in</strong>k that <strong>the</strong>equal probability <strong>of</strong> ei<strong>the</strong>r outcome was evident. Mises, for example,declared that before experiment<strong>in</strong>g we did not know about it at all;anyway, <strong>the</strong>re is no unique method for decid<strong>in</strong>g about <strong>the</strong> existence <strong>of</strong>statistical stability, or, as <strong>the</strong> physicists say, <strong>of</strong> a statistical ensemble.The stochastic approach is <strong>the</strong>refore never ma<strong>the</strong>matically rigorous(provided that a statistical ensemble does exist) but, anyway, it is notless rigorous than <strong>the</strong> application <strong>of</strong> any o<strong>the</strong>r ma<strong>the</strong>matical method <strong>in</strong>natural science. For be<strong>in</strong>g conv<strong>in</strong>ced, it is sufficient to read § 1 (What8


is energy?) from chapter 4 <strong>of</strong> Feynman (1963). In an excellent stylebut, regrettably <strong>in</strong> a passage too long for be<strong>in</strong>g quoted, it is stated <strong>the</strong>rethat <strong>the</strong> law <strong>of</strong> conservation <strong>of</strong> energy can be corroborated <strong>in</strong> eachconcrete case by f<strong>in</strong>d<strong>in</strong>g out where did energy go, but that modernphysics has no general concept <strong>of</strong> energy. This does not prevent usfrom be<strong>in</strong>g so sure <strong>in</strong> that law that we make a laugh<strong>in</strong>g-stock <strong>of</strong>anyone tell<strong>in</strong>g us that <strong>in</strong> a certa<strong>in</strong> case <strong>the</strong> efficiency was greater than100%. Many conclusions derived by apply<strong>in</strong>g stochastic methods tosome statistical ensembles are not less certa<strong>in</strong> than <strong>the</strong> law <strong>of</strong>conservation <strong>of</strong> energy.The circumstances are quite different for apply<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory <strong>of</strong>probability when <strong>the</strong>re certa<strong>in</strong>ly exists no statistical ensemble or itsexistence is doubtful. In such cases modern science generally denies<strong>the</strong> possibility <strong>of</strong> those applications, but temptation is <strong>of</strong>ten strong...Let us first consider <strong>the</strong> reason why.1.2. The restrictiveness <strong>of</strong> <strong>the</strong> concept <strong>of</strong> statistical ensemble(statistical homogeneity). The reason is that that concept is ra<strong>the</strong>rrestrictive. Consider <strong>the</strong> examples cited above: co<strong>in</strong> toss<strong>in</strong>g, pass<strong>in</strong>g anexam<strong>in</strong>ation, ra<strong>in</strong>fall. The existence <strong>of</strong> an ensemble is only doubtless<strong>in</strong> <strong>the</strong> first <strong>of</strong> those. The bus<strong>in</strong>ess is much worse <strong>in</strong> <strong>the</strong> o<strong>the</strong>r twoexamples. We may discuss <strong>the</strong> probability <strong>of</strong> a successful pass<strong>in</strong>g <strong>of</strong>an exam<strong>in</strong>ation by a r<strong>and</strong>omly chosen student (better, by that student<strong>in</strong> a r<strong>and</strong>omly chosen <strong>in</strong>stitute <strong>and</strong> discipl<strong>in</strong>e <strong>and</strong> exam<strong>in</strong>ed by ar<strong>and</strong>omly chosen <strong>in</strong>structor). R<strong>and</strong>omly chosen means chosen <strong>in</strong> anexperiment from a statistical ensemble <strong>of</strong> experiments. Here, however,that ensemble consists <strong>of</strong> exactly one non-reproducible experiment <strong>and</strong>we can not consider that probability.It is possible to discuss <strong>the</strong> probability <strong>of</strong> ra<strong>in</strong>fall dur<strong>in</strong>g a givenday, 11 May, say, <strong>of</strong> a r<strong>and</strong>omly chosen year, but not <strong>of</strong> its happen<strong>in</strong>g<strong>in</strong> <strong>the</strong> even<strong>in</strong>g today. In such a case, when consider<strong>in</strong>g that probability<strong>in</strong> <strong>the</strong> same morn<strong>in</strong>g, we ought to allow for all <strong>the</strong> wea<strong>the</strong>rcircumstances, <strong>and</strong> we certa<strong>in</strong>ly will not f<strong>in</strong>d any o<strong>the</strong>r day with <strong>the</strong>mbe<strong>in</strong>g exactly <strong>the</strong> same, for example, with <strong>the</strong> same synoptic chart, atleast dur<strong>in</strong>g <strong>the</strong> period when meteorological observations have beenmade.Many contributions on apply<strong>in</strong>g <strong>the</strong> concept <strong>of</strong> stochastic processhave appeared recently. It should describe ensembles <strong>of</strong> suchexperiments whose outcome is not an event, or even a measurement(that is, not a s<strong>in</strong>gle number), but a function, for example a path <strong>of</strong> aBrownian motion. We will not discuss <strong>the</strong> scope <strong>of</strong> that concept evenif <strong>the</strong> existence <strong>of</strong> a statistical ensemble is certa<strong>in</strong> but consider <strong>the</strong>opposite case. Or, we will cite two concrete problems.The first one concerns manufactur<strong>in</strong>g. We observe <strong>the</strong> value <strong>of</strong>some economic <strong>in</strong>dicator, labour productivity, say, dur<strong>in</strong>g a number <strong>of</strong>years (months, days) <strong>and</strong> wish to forecast its values. It is tempt<strong>in</strong>g toapply <strong>the</strong> <strong>the</strong>ory <strong>of</strong> forecast<strong>in</strong>g stochastic processes. However, ourexperiment only provides <strong>the</strong> observed values <strong>and</strong> is not <strong>in</strong> pr<strong>in</strong>ciplereproducible, <strong>and</strong> <strong>the</strong>re is no statistical ensemble.The o<strong>the</strong>r problem is geological. We measured <strong>the</strong> content <strong>of</strong> auseful component <strong>in</strong> some test po<strong>in</strong>ts <strong>of</strong> a deposit <strong>and</strong> wish todeterm<strong>in</strong>e its mean content, <strong>and</strong> thus <strong>the</strong> reserves if <strong>the</strong> configuration9


<strong>of</strong> <strong>the</strong> deposit is known. It is tempt<strong>in</strong>g to apply here <strong>the</strong> <strong>the</strong>ory <strong>of</strong>estimat<strong>in</strong>g <strong>the</strong> mean <strong>of</strong> a stochastic process, but here also it is unclearwhat should constitute <strong>the</strong> ensemble <strong>of</strong> realizations. If a newrealization is understood as similar values at po<strong>in</strong>ts chosen alongano<strong>the</strong>r l<strong>in</strong>e, it is unclear whe<strong>the</strong>r <strong>the</strong>y will possess <strong>the</strong> same statisticalproperties, <strong>and</strong> still less clear if data perta<strong>in</strong><strong>in</strong>g to o<strong>the</strong>r deposits arechosen.These examples are sufficiently important for underst<strong>and</strong><strong>in</strong>g <strong>the</strong>wish to create such stochastic methods which will not need ensembles.However, modern probability <strong>the</strong>ory has no such methods but onlyparticular means for sav<strong>in</strong>g <strong>the</strong> concept <strong>of</strong> statistical homogeneity <strong>and</strong>even <strong>the</strong>y are not at all universally applicable. So how should weregard <strong>the</strong> application <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability <strong>in</strong> such cases?1.3. Relations between medic<strong>in</strong>e <strong>and</strong> magic. The problem statedabove resembles that <strong>of</strong> <strong>the</strong> relations between medic<strong>in</strong>e <strong>and</strong> magicwhose idea I have borrowed from Feynman (1963) but am consider<strong>in</strong>git <strong>in</strong> more detail. Suppose we discuss <strong>the</strong> treatment <strong>of</strong> malaria, <strong>and</strong> <strong>the</strong>shaman knows that <strong>the</strong> Peruvian bark will help whereas shak<strong>in</strong>g asnake above <strong>the</strong> patient’s face is <strong>of</strong> no use. So he prescribes <strong>in</strong> essence<strong>the</strong> same treatment as a physician will. True, <strong>the</strong> doctor will givequ<strong>in</strong><strong>in</strong>e <strong>in</strong>stead <strong>of</strong> <strong>the</strong> bark, but this is not very important, <strong>and</strong>, whichis <strong>the</strong> ma<strong>in</strong> po<strong>in</strong>t, he knows <strong>the</strong> life cycle <strong>of</strong> <strong>the</strong> plasmodium <strong>and</strong> willcorrectly prescribe <strong>the</strong> duration <strong>of</strong> <strong>the</strong> treatment.The physician has <strong>the</strong>refore more chances <strong>of</strong> success, but <strong>the</strong> ma<strong>in</strong>difference between medic<strong>in</strong>e <strong>and</strong> magic consists <strong>in</strong> <strong>the</strong> attitudes <strong>of</strong> <strong>the</strong>doctor <strong>and</strong> <strong>the</strong> shaman <strong>in</strong> case <strong>of</strong> failure. The shaman will expla<strong>in</strong> it by<strong>the</strong> devil’s meddl<strong>in</strong>g <strong>and</strong> do noth<strong>in</strong>g more; <strong>the</strong> doctor, however, willlook for <strong>the</strong> real cause <strong>of</strong> failure <strong>and</strong> hope that such knowledge will atleast help o<strong>the</strong>r patients if not <strong>the</strong> first one who could have died. Thehistory <strong>of</strong> science is a history <strong>of</strong> ever more precise cognition <strong>of</strong> realitywhich is <strong>in</strong>deed restrict<strong>in</strong>g <strong>the</strong> arbitrary <strong>in</strong>tervention <strong>of</strong> <strong>the</strong> devil <strong>in</strong>whose face <strong>the</strong> shaman feels himself hopeless.However, we do not succeed <strong>in</strong> really banish<strong>in</strong>g <strong>the</strong> devil. Even <strong>in</strong>ma<strong>the</strong>matics he is able to <strong>in</strong>terfere which is manifested for example <strong>in</strong>contradictions; most troublesome are those perta<strong>in</strong><strong>in</strong>g to <strong>the</strong> set <strong>the</strong>ory.A gr<strong>and</strong> attempt to expel <strong>the</strong> devil from ma<strong>the</strong>matics connected with<strong>the</strong> names <strong>of</strong> Bertr<strong>and</strong> Russell, Hilbert, Gödel, <strong>and</strong> o<strong>the</strong>r first-ratema<strong>the</strong>maticians had been attempted <strong>in</strong> <strong>the</strong> first half <strong>of</strong> <strong>the</strong> 20 th century,<strong>and</strong> what did emerge?It occurred that along with <strong>the</strong> devil it would have been necessary tobanish some notions which we do not at all wish to be deprived <strong>of</strong>, forexample <strong>the</strong> idea <strong>of</strong> a number cont<strong>in</strong>uum. It is impossible, say (without<strong>of</strong>fer<strong>in</strong>g <strong>the</strong> devil a f<strong>in</strong>ger <strong>in</strong>stead <strong>of</strong> which he will snap <strong>of</strong>f your h<strong>and</strong>),to state that a function cont<strong>in</strong>uous on an <strong>in</strong>terval reaches its maximumvalue. Such excessively radical exorcism (constructive ma<strong>the</strong>maticalanalysis) was naturally not recognized; we have to tolerate <strong>the</strong> devil.True, for <strong>the</strong> ma<strong>the</strong>matical <strong>the</strong>ory <strong>of</strong> probability that devil isactually only an imp who <strong>in</strong>flicts no special harm. However, I recallthat once, desir<strong>in</strong>g to apply transf<strong>in</strong>ite <strong>in</strong>duction (a ma<strong>the</strong>matical trick<strong>in</strong>volv<strong>in</strong>g someth<strong>in</strong>g devilish) for prov<strong>in</strong>g a <strong>the</strong>orem, I discoveredmuch to my relief that <strong>the</strong> process <strong>of</strong> <strong>in</strong>duction did not actually10


dem<strong>and</strong> to apply transf<strong>in</strong>ite numbers but was ra<strong>the</strong>r reduced to usualma<strong>the</strong>matical <strong>in</strong>duction.In <strong>the</strong> applied <strong>the</strong>ory <strong>of</strong> probability <strong>the</strong> harmless imp turns out asharp horned devil who favours to corrupt meanly statisticalhomogeneity. So far as we keep to <strong>the</strong> concept <strong>of</strong> ensemble <strong>and</strong> checkthat homogeneity by available methods, we are able at least to reveal<strong>in</strong> time <strong>the</strong> devilish dirty trick whereas, ab<strong>and</strong>on<strong>in</strong>g it, we whollysurrender ourselves to <strong>the</strong> devil’s rule <strong>and</strong> ought to be prepared forsurprises. Thus, from <strong>the</strong> po<strong>in</strong>t <strong>of</strong> view <strong>of</strong> modern probability <strong>the</strong>ory,<strong>the</strong> boundary between science <strong>and</strong> magic is def<strong>in</strong>ed by <strong>the</strong> notion <strong>of</strong>statistical ensemble. It follows that <strong>in</strong>ferences, derived by apply<strong>in</strong>gthat <strong>the</strong>ory when a statistical ensemble <strong>of</strong> experiments is lack<strong>in</strong>g, hasno scientific certa<strong>in</strong>ty.Unlike <strong>the</strong> arsenal <strong>of</strong> magic, <strong>the</strong> tools <strong>of</strong> science must be entirelyjustified. However, when conclud<strong>in</strong>g that, for example, <strong>the</strong> error <strong>of</strong> aresult obta<strong>in</strong>ed from a s<strong>in</strong>gle realization <strong>of</strong> a stochastic process issituated <strong>in</strong> <strong>the</strong> given <strong>in</strong>terval with probability 0.95, we do not know towhat does that probability correspond, − to an ensemble <strong>of</strong> realizationswhich we ought to conjuncture by issu<strong>in</strong>g from <strong>the</strong> s<strong>in</strong>gle observedrealization so as to apply <strong>the</strong> notion <strong>of</strong> stochastic process?But all those o<strong>the</strong>r realizations are irrelevant <strong>and</strong> it is very easy toprovide examples <strong>of</strong> faulty <strong>in</strong>ferences made when apply<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory<strong>of</strong> probability <strong>in</strong> manufactur<strong>in</strong>g, geology, etc where it is senseless todiscuss statistical ensembles. Historically, science emerged frommagic but treats it disda<strong>in</strong>fully <strong>and</strong> would wish to ignore it. However,we should not wholly yield to that temptation ei<strong>the</strong>r.A representative <strong>of</strong> <strong>the</strong> constructive direction <strong>in</strong> ma<strong>the</strong>maticsconsiders <strong>the</strong> usual ma<strong>the</strong>matical analysis a magic. We should ra<strong>the</strong>rdist<strong>in</strong>guish between white <strong>and</strong> black magic <strong>the</strong> latter connected withbe<strong>in</strong>g subjectively unconscionable. At present, we can not ignorehonest attempts to apply probability <strong>the</strong>ory when statistical ensemblesare lack<strong>in</strong>g. I venture to forecast that someth<strong>in</strong>g be<strong>in</strong>g magic todaywill become science tomorrow. It would have been unreasonable tokeep too strongly to <strong>the</strong> established concept <strong>of</strong> statistical homogeneity.However, here I will entirely hold on to that concept s<strong>in</strong>ce nowadaysany o<strong>the</strong>r method <strong>of</strong> obta<strong>in</strong><strong>in</strong>g really plausible results is lack<strong>in</strong>g.1.4. Summary. Thus, while perfection <strong>of</strong> experiment<strong>in</strong>g is go<strong>in</strong>g on<strong>in</strong> one or ano<strong>the</strong>r branch <strong>of</strong> science or technology, a special situation<strong>of</strong>ten arises when statistical stability is present but complete stability<strong>of</strong> <strong>the</strong> results is impossible to achieve. The former is characterized bystability <strong>of</strong> <strong>the</strong> frequencies <strong>of</strong> <strong>the</strong> occurrences <strong>of</strong> <strong>the</strong> various eventsconnected with <strong>the</strong> experiment’s outcome.An exhaust<strong>in</strong>g check <strong>of</strong> such stability (statistical homogeneity,statistical ensemble) is impossible, but <strong>in</strong> many cases <strong>the</strong> presence <strong>of</strong> astatistical ensemble is sufficiently certa<strong>in</strong>. Accord<strong>in</strong>g to modern ideas,<strong>the</strong>se cases <strong>in</strong>deed comprise <strong>the</strong> field <strong>of</strong> scientific applications <strong>of</strong> <strong>the</strong>probability <strong>the</strong>ory.And still <strong>the</strong>re exists a readily understood wish to apply it also <strong>in</strong>o<strong>the</strong>r cases <strong>in</strong> which <strong>the</strong> results <strong>of</strong> <strong>the</strong> experiments are not def<strong>in</strong>ite, but<strong>the</strong> existence <strong>of</strong> a statistically homogeneous ensemble is impossible.For <strong>the</strong> time be<strong>in</strong>g, such applications belong to magic ra<strong>the</strong>r than11


science, but, provided subjective honesty, <strong>the</strong>y can not be ignored. Infuture it will perhaps be possible to make <strong>the</strong>m scientific. As testifiedby <strong>the</strong> entire history <strong>of</strong> science, its orig<strong>in</strong> had occurred by issu<strong>in</strong>g fromfactual material collected while practis<strong>in</strong>g magic.2. The Foundations <strong>of</strong> <strong>the</strong> Ma<strong>the</strong>matical Arsenal<strong>of</strong> <strong>the</strong> Theory <strong>of</strong> <strong>Probability</strong>Modern probability is sharply divided <strong>in</strong>to ma<strong>the</strong>matical <strong>and</strong> appliedparts. Ma<strong>the</strong>matical statistics adjo<strong>in</strong>s <strong>the</strong> former whereas <strong>the</strong> latter isclosely connected with <strong>the</strong> so-called applied statistics. An attempt todef<strong>in</strong>e those sciences would have led us <strong>in</strong>to such scholastic jungle,that, terror-stricken, we ab<strong>and</strong>on this thought. Here, we wish to adoptsome <strong>in</strong>termediate st<strong>and</strong>, <strong>and</strong> we beg<strong>in</strong> with <strong>the</strong> ma<strong>the</strong>matical <strong>the</strong>ory<strong>of</strong> probability.It busies itself with study<strong>in</strong>g <strong>the</strong> conclusions <strong>of</strong> <strong>the</strong> Kolmogorovaxiomatics (1933) <strong>and</strong> has essentially advanced <strong>in</strong> develop<strong>in</strong>g purelyma<strong>the</strong>matical methods. However, it wholly leaves aside <strong>the</strong> question<strong>of</strong> which phenomena <strong>of</strong> <strong>the</strong> real world does <strong>the</strong> axiomatic modelcorrespond to well enough, or somewhat worse, or not at all,respectively. It is possible to adduce really far-fetched examples <strong>of</strong>mistakes made by ma<strong>the</strong>maticians lack<strong>in</strong>g sufficient experience <strong>and</strong>practical <strong>in</strong>tuition when attempt<strong>in</strong>g to work <strong>in</strong> applications.However, <strong>the</strong> axiomatic model is suitable for develop<strong>in</strong>g <strong>the</strong>ma<strong>the</strong>matical arsenal. There, <strong>the</strong> generally known stochastic concepts<strong>and</strong> <strong>the</strong>orems simply become particular cases <strong>of</strong> <strong>the</strong> correspond<strong>in</strong>gconcepts <strong>and</strong> <strong>the</strong>orems <strong>of</strong> ma<strong>the</strong>matical analysis. In this chapter, wewill <strong>in</strong>deed describe <strong>the</strong> pert<strong>in</strong>ent subject. The follow<strong>in</strong>g chapters aredevoted to <strong>the</strong> substantial stochastic <strong>the</strong>orems.The reader ought to bear <strong>in</strong> m<strong>in</strong>d that this booklet is not a textbook,<strong>and</strong> that here <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability is <strong>the</strong>refore dealt with briefly<strong>and</strong> sometimes summarily. Its knowledge is not formally required, <strong>and</strong>all <strong>the</strong> concepts necessary for underst<strong>and</strong><strong>in</strong>g <strong>the</strong> follow<strong>in</strong>g chapters aredef<strong>in</strong>ed, but examples are not sufficiently numerous. Without <strong>the</strong>m, itis impossible to learn how to apply <strong>the</strong> axiomatic model, <strong>and</strong> it wouldbe better if <strong>the</strong> reader is, or <strong>in</strong>tends to be acqua<strong>in</strong>ted with <strong>the</strong> <strong>the</strong>ory <strong>of</strong>probability by means <strong>of</strong> any textbook even if it does not keep to <strong>the</strong>axiomatic approach. From modern textbooks, we especially advise pt 1[vol. 1] <strong>of</strong> Feller (1950).2.1. Discrete space <strong>of</strong> elementary events. In <strong>the</strong> simplest case quitesufficient for solv<strong>in</strong>g many problems <strong>the</strong> entire <strong>the</strong>ory <strong>of</strong> probabilityconsists <strong>of</strong> one notion, one axiom <strong>and</strong> one def<strong>in</strong>ition. Here <strong>the</strong>y are.The concept <strong>of</strong> stochastic space. A stochastic space Ω is any f<strong>in</strong>iteor countable set correspond<strong>in</strong>g to whose elements ω 1 , ω 2 , ..., ω n , ...non-negative numbers P(ω i ) ≥ 0 called <strong>the</strong>ir probabilities are attached.Set means here <strong>the</strong> same as totality, that is, someth<strong>in</strong>g consist<strong>in</strong>g <strong>of</strong>separate elements. A set is called countable if its elements can benumbered 1, 2, ..., n, ...We will <strong>in</strong>troduce <strong>the</strong> notationΩ = {ω 1 , ω 2 , ..., ω n , ...}, or Ω = {ω i :i = 1, 2, ..., n, ...}12


for stat<strong>in</strong>g that Ω consists <strong>of</strong> elements ω 1 , ω 2 , ..., ω n , ... Elements ω i arealso called elementary events or outcomes.Axiom. The sum <strong>of</strong> <strong>the</strong> probabilities <strong>of</strong> all <strong>the</strong> elementary events is1:P(ω 1 ) + ... + P(ω n ) + ... =∞∑ ∑P(ω ) = P(ω ) = 1.ii= 1 ω ⊆ ΩiiDef<strong>in</strong>ition. An event is any subset (part <strong>of</strong> set) <strong>of</strong> <strong>the</strong> set <strong>of</strong>elementary events; <strong>the</strong> probability <strong>of</strong> an event is <strong>the</strong> sum <strong>of</strong>probabilities <strong>of</strong> its elementary events. That set A is a subset <strong>of</strong> set Ω (i.e., that A consists <strong>of</strong> some elements <strong>in</strong>cluded <strong>in</strong> Ω) is written asA ⊆ Ω . The probability <strong>of</strong> event A is denoted by P(A) <strong>and</strong> <strong>the</strong>def<strong>in</strong>ition is written down asP(A) =∑ω ⊆ AiP(ω ).iThe explanation below <strong>the</strong> symbol <strong>of</strong> summ<strong>in</strong>g means that those <strong>and</strong>only those P(ω i ) are summed which are <strong>in</strong>cluded <strong>in</strong> A.The described ma<strong>the</strong>matical model can be applied for very manystochastic problems. However, all <strong>of</strong> <strong>the</strong>m are <strong>in</strong>itially formulated not<strong>in</strong> <strong>the</strong> term<strong>in</strong>ology <strong>of</strong> <strong>the</strong> space <strong>of</strong> elementary events, i. e., not <strong>in</strong> <strong>the</strong>axiomatic language but <strong>in</strong> ord<strong>in</strong>ary terms. This [?] is unavoidablebecause only by consider<strong>in</strong>g problems any student <strong>of</strong> probabilitybecomes acqua<strong>in</strong>ted with those concrete situations <strong>in</strong> which it isapplicable. It is impossible to describe such situations <strong>in</strong> <strong>the</strong> axiomaticlanguage <strong>and</strong> it is <strong>the</strong>refore necessary to learn how to translate <strong>the</strong>conditions <strong>of</strong> problems <strong>in</strong>to <strong>the</strong> language <strong>of</strong> elementary events.The situation here is quite similar to that which school studentsencounter when solv<strong>in</strong>g problems <strong>in</strong> compil<strong>in</strong>g systems <strong>of</strong> equations:<strong>the</strong>re, a translation from one language <strong>in</strong>to ano<strong>the</strong>r one is also needed.Such translations can be ei<strong>the</strong>r very easy or difficult or ambiguouswith differ<strong>in</strong>g systems <strong>of</strong> equations appear<strong>in</strong>g <strong>in</strong> <strong>the</strong> same problem. Inthis last-mentioned case, one such system can be difficult to compilebut easy to solve with <strong>the</strong> alternative system be<strong>in</strong>g opposite <strong>in</strong> thatsense (easy <strong>and</strong> difficult respectively).We stress <strong>the</strong>refore that, <strong>in</strong>troduc<strong>in</strong>g a space <strong>of</strong> elementary eventscorrespond<strong>in</strong>g to a given problem, is not a purely ma<strong>the</strong>maticaloperation as a pro<strong>of</strong> <strong>of</strong> a <strong>the</strong>orem, but <strong>in</strong>deed a translation from onelanguage <strong>in</strong>to ano<strong>the</strong>r one, <strong>and</strong> it is senseless to strive for such a rigouras adopted <strong>in</strong> ma<strong>the</strong>matics. Clear-cut ma<strong>the</strong>matical formulations arenow concluded here <strong>and</strong> we are turn<strong>in</strong>g to <strong>the</strong> rules <strong>of</strong> translation.Stochastic problems usually have to do with some experiments, with<strong>the</strong> set Ω consist<strong>in</strong>g <strong>of</strong> all its possible outcomes. Thus, <strong>in</strong> co<strong>in</strong> toss<strong>in</strong>gΩ consists <strong>of</strong> two elementary outcomesΩ = {heads; tails}<strong>and</strong> <strong>in</strong> throw<strong>in</strong>g a die <strong>the</strong>re are six such outcomes13


Ω = {1, 2, 3, 4, 5, 6}.For <strong>the</strong> case <strong>of</strong> two dice Ω consists <strong>of</strong> all pairs (m, n), show<strong>in</strong>g <strong>the</strong>numbers <strong>of</strong> po<strong>in</strong>ts on <strong>the</strong>m:Ω = [(m, n): m = 1, ..., 6; n = 1, ..., 6].An event A can here be, say, an even sum <strong>of</strong> <strong>the</strong> po<strong>in</strong>ts:A = [(m, n): m = 1, ..., 6; n = 1, ..., 6; m + n is even}.In general, descriptions <strong>of</strong> <strong>the</strong> set <strong>of</strong> elementary outcomes areusually easily made, but <strong>the</strong> situation is quite different whendeterm<strong>in</strong><strong>in</strong>g <strong>the</strong> probabilities P(ω i ) <strong>of</strong> separate elementary events given<strong>the</strong> conditions <strong>of</strong> a problem. Accord<strong>in</strong>g to <strong>the</strong> frequentist concept <strong>of</strong>probability it will be necessary to make a large number <strong>of</strong> experiments<strong>and</strong> assume <strong>the</strong> frequencies <strong>of</strong> <strong>the</strong> occurrence <strong>of</strong> <strong>the</strong> elementaryoutcomes ω i as <strong>the</strong> approximate values <strong>of</strong> P (ω i ). This, however, is notalways possible; actually, such determ<strong>in</strong>ations <strong>of</strong> probabilities arecomplicated so that a large part is played by cases <strong>in</strong> whichprobabilities can be determ<strong>in</strong>ed by some speculations withoutexperiment<strong>in</strong>g.For example, <strong>the</strong> set Ω ra<strong>the</strong>r <strong>of</strong>ten consists <strong>of</strong> a f<strong>in</strong>ite number N <strong>of</strong>elements whose probabilities appear undoubtedly equal to one ano<strong>the</strong>r.Accord<strong>in</strong>g to <strong>the</strong> axiom, <strong>the</strong> probability <strong>of</strong> each elementary event will<strong>the</strong>n be 1/N, <strong>and</strong> if A consists <strong>of</strong> M elementary events,MP( A) = ∑ P(ω i) = .(2.1)Nω ⊆ AiIn words: <strong>the</strong> probability <strong>of</strong> an event is equal to <strong>the</strong> ratio <strong>of</strong> <strong>the</strong>number <strong>of</strong> favourable outcomes (outcomes <strong>in</strong>cluded <strong>in</strong> <strong>the</strong> event) to<strong>the</strong> number <strong>of</strong> all possible outcomes. When formula (2.1) is applicable,we are discuss<strong>in</strong>g a problem <strong>in</strong> classical probability.Accord<strong>in</strong>g to modern <strong>in</strong>terpretation, formula (2.1) is not a def<strong>in</strong>ition<strong>of</strong> probability, it is only applicable when all <strong>the</strong> elementary events areequally probable. And when does this happen? is a ra<strong>the</strong>r subtlequestion. For example, long experiments with dice <strong>in</strong>dicate that <strong>the</strong>irvarious faces are not generally equally probable; it is difficult tomanufacture a perfectly symmetric dice. On <strong>the</strong> o<strong>the</strong>r h<strong>and</strong>, specialmeasures undertaken when draw<strong>in</strong>g lottery tickets by chance ensureequal probability <strong>of</strong> w<strong>in</strong>n<strong>in</strong>g for each.To illustrate <strong>the</strong> possibilities <strong>of</strong> <strong>the</strong> ma<strong>the</strong>matical model we willconsider <strong>the</strong> cast<strong>in</strong>g <strong>of</strong> lots assum<strong>in</strong>g that such measures weresufficient for ensur<strong>in</strong>g <strong>the</strong> application <strong>of</strong> <strong>the</strong> concept <strong>of</strong> classicalprobability. When distribut<strong>in</strong>g apartments <strong>in</strong> a house be<strong>in</strong>g built by acooperative, <strong>the</strong> cast<strong>in</strong>g <strong>of</strong> lots is sometimes achieved <strong>in</strong> two stages. Atfirst, lot only decides <strong>the</strong> order <strong>of</strong> draw<strong>in</strong>g lots by <strong>the</strong> members at <strong>the</strong>second stage, when <strong>the</strong> actual distribution by chance follows. Is suchprocedure consist<strong>in</strong>g <strong>of</strong> two stages necessary? Or, who has more14


chances to draw a more suitable apartment, <strong>the</strong> first or <strong>the</strong> last <strong>in</strong> <strong>the</strong>order <strong>of</strong> <strong>the</strong> f<strong>in</strong>al draw<strong>in</strong>gs?Suppose <strong>the</strong>re are N apartments, numbers 1, 2, .., n <strong>of</strong> <strong>the</strong>m worse,<strong>and</strong> <strong>the</strong> rest numbers, n + 1, ..., N, better. Determ<strong>in</strong>e <strong>the</strong> probabilitythat <strong>the</strong> member <strong>of</strong> <strong>the</strong> cooperative k-th <strong>in</strong> <strong>the</strong> order <strong>of</strong> draw<strong>in</strong>g willdraw a worse apartment. The experiment, or draw<strong>in</strong>g <strong>the</strong> N tickets hasoutcomesi 1 , i 2 , ..., i N , all i k are different.Here, i 1 is <strong>the</strong> number <strong>of</strong> <strong>the</strong> apartment drawn by <strong>the</strong> first person, i 2 ,same by <strong>the</strong> second person, etc. The total number <strong>of</strong> all <strong>the</strong>possibilities isN( N −1)...2 ⋅ 1 = N !.If <strong>the</strong> tickets are thoroughly shuffled, all <strong>the</strong> elementary events shouldbe equally probable <strong>and</strong> we will have a problem <strong>in</strong> classicalprobability. Let A k be <strong>the</strong> event <strong>of</strong> <strong>the</strong> k-th member <strong>of</strong> <strong>the</strong> cooperativeto draw a worse apartment. In o<strong>the</strong>r words, A k consists <strong>of</strong> suchelementary events i 1 , i 2 , ..., i N , that i k takes one <strong>of</strong> <strong>the</strong> values 1, 2, ..., nwith i 1 , ..., i k−1 , i k+1 , ..., i N be<strong>in</strong>g arbitrary. Let us count <strong>the</strong> number <strong>of</strong>those elementary events.For i k <strong>the</strong>re are n possibilities;for i 1 , i 2 , ..., i k−1 , <strong>the</strong>re are N – 1, N – 2, ..., N – k + 1 possibilities;for i k+1 , ..., i N , <strong>the</strong>re are (N – k), ..., N − (N – 1) = 1 possibilities.Multiply<strong>in</strong>g all <strong>the</strong> possibilities we see that event A k consists <strong>of</strong>(N – 1)!n = (N!/N)n elementary events so that( N !/ N)n nP( Ak) = =N ! N<strong>and</strong> does not depend on k.In o<strong>the</strong>r words, <strong>the</strong> probability <strong>of</strong> choos<strong>in</strong>g a worse apartment cannot depend on <strong>the</strong> order <strong>of</strong> draw<strong>in</strong>g, so that <strong>the</strong> first draw<strong>in</strong>gs aresuperfluous. However, we assumed that <strong>the</strong> tickets were thoroughlyshuffled; o<strong>the</strong>rwise <strong>the</strong> chances <strong>of</strong> <strong>the</strong> members <strong>of</strong> <strong>the</strong> cooperative arenot identical <strong>and</strong> <strong>the</strong> first draw<strong>in</strong>gs will essentially equalize <strong>the</strong>chances. It is regrettably unknown how exactly should <strong>the</strong> tickets beshuffled for ensur<strong>in</strong>g equal chances whereas <strong>the</strong> method <strong>of</strong> shuffl<strong>in</strong>gadopted for draw<strong>in</strong>g lottery tickets is too tiresome. It follows thatdraw<strong>in</strong>gs <strong>of</strong> lots <strong>in</strong> two stages can not be held absolutely superfluous 2 .2.2. Conditional probability. The reader acqua<strong>in</strong>ted with urnstochastic models had undoubtedly noted that <strong>the</strong> model <strong>of</strong> <strong>the</strong> space<strong>of</strong> elementary events is quite isomorphic to <strong>the</strong> model <strong>of</strong> extract<strong>in</strong>gballs from an urn <strong>and</strong> only differs <strong>in</strong> that different elementary eventscan now have differ<strong>in</strong>g probabilities <strong>and</strong> <strong>the</strong> number <strong>of</strong> <strong>the</strong>se eventscan be <strong>in</strong>f<strong>in</strong>ite. Indeed, <strong>the</strong> real part played by <strong>the</strong> Kolmogorovaxiomatics only becomes clear when consider<strong>in</strong>g uncountable spaces<strong>of</strong> elementary events, but even <strong>in</strong> <strong>the</strong> simplest case (f<strong>in</strong>ite or countable15


number <strong>of</strong> events) <strong>the</strong> advantage <strong>of</strong> <strong>the</strong> axiomatic approach is that itdist<strong>in</strong>ctly separates <strong>the</strong> solution <strong>of</strong> stochastic problems <strong>in</strong>to two parts:1. Choice <strong>of</strong> <strong>the</strong> ma<strong>the</strong>matical model <strong>of</strong> <strong>the</strong> phenomenon orexperiment.2. Calculation with<strong>in</strong> its limits.We are thus follow<strong>in</strong>g Descartes’ advice: separate each problem<strong>in</strong>to so many parts that <strong>the</strong>y become solvable. The first part, that is, <strong>the</strong>choice <strong>of</strong> <strong>the</strong> ma<strong>the</strong>matical model, is undoubtedly more difficult, <strong>and</strong><strong>the</strong> difficulty, as stated above, lies <strong>in</strong> determ<strong>in</strong><strong>in</strong>g <strong>the</strong> probabilities <strong>of</strong><strong>the</strong> elementary events. A formulation <strong>of</strong> more or less general rules forovercom<strong>in</strong>g this difficulty dem<strong>and</strong>s an <strong>in</strong>troduction <strong>of</strong> some newconcepts. We have considered <strong>the</strong> concept <strong>of</strong> classical probability;ano<strong>the</strong>r useful concept is that <strong>of</strong> conditional probability, but it isexpedient to beg<strong>in</strong> by consider<strong>in</strong>g usual operations on events. In <strong>the</strong>set-<strong>the</strong>oretic context now adopted <strong>the</strong>se operations co<strong>in</strong>cide with those<strong>in</strong> <strong>the</strong> set <strong>the</strong>ory.A sum (unification) <strong>of</strong> events. A sum A∪ B <strong>of</strong> events A <strong>and</strong> B is anevent consist<strong>in</strong>g <strong>of</strong> those elementary events that enter <strong>in</strong>to A or B (orboth).A product (<strong>in</strong>tersection) <strong>of</strong> events. A product AB <strong>of</strong> events A <strong>and</strong> Bis an event consist<strong>in</strong>g <strong>of</strong> those elementary events that enter both A <strong>and</strong>B.A complementary (contrary) event A <strong>of</strong> event A is an eventcomposed <strong>of</strong> those elements that do not enter event A.If an experiment concludes by one <strong>of</strong> those elementary outcomeswhich enter some event C, we say that event C had occurred. Thus, <strong>the</strong>sum <strong>of</strong> events A <strong>and</strong> B occurs if at least one <strong>of</strong> those events hasoccurred. The product AB occurs if both events A <strong>and</strong> B has occurred.Complement A <strong>of</strong> event A occurs if A has not occurred.Ma<strong>the</strong>matically, conditional probability P(A/B) that A occurs if Bhas occurred is determ<strong>in</strong>ed by <strong>the</strong> equalityP( AB)P( A / B) = , P( B) ≠ 0.P( B)It follows that P(AB) = P(B) P(A/B).The part played by <strong>the</strong> concept <strong>of</strong> conditional probability is revealedby its frequentist <strong>in</strong>terpretation. Consider n experiments with events A<strong>and</strong> B occurr<strong>in</strong>g or not <strong>in</strong> each <strong>and</strong> let µ A , µ B , <strong>and</strong> µ AB be <strong>the</strong> number <strong>of</strong>occurrences <strong>of</strong> events A, B, <strong>and</strong> AB. It is evident that µ AB is also <strong>the</strong>number <strong>of</strong> occurrences <strong>of</strong> event A <strong>in</strong> those experiments, <strong>and</strong> <strong>the</strong> ratioµ AB /µ B −<strong>the</strong> conditional frequency <strong>of</strong> event A if event B has occurred.ThenµABµ / ( )( / )µ = ABn P ABP A Bµ / n≈ P( B)=BB<strong>and</strong> <strong>the</strong> conditional probability is <strong>in</strong>terpreted as <strong>the</strong> conditionalfrequency.16


Let <strong>the</strong> space <strong>of</strong> elementary events Ω be separated <strong>in</strong>to parts B 1 , B 2 ,..., B n , so that Ω = B1 ∪ B2 ∪... ∪ Bn<strong>and</strong> no two sets B i <strong>and</strong> B j havecommon elements. Then, for any A ⊆ Ω we will haveA = AB1 ∪ AB2 ∪... ∪ ABnwhich means that <strong>the</strong> elementary events <strong>in</strong>cluded <strong>in</strong> A are separated<strong>in</strong>to those enter<strong>in</strong>g B 1 , B 2 , ..., B n <strong>and</strong> obviouslyP(A) = P(AB 1 ) + P(AB 2 ) + ... + P(AB n ).This follows from <strong>the</strong> def<strong>in</strong>ition <strong>of</strong> P(A), see § 2.1.By def<strong>in</strong>ition <strong>of</strong> conditional probabilitynP( A) = P( AB ) = P( B ) P( AB )n∑ ∑ (2.2)i i ii= 1 i=1which is <strong>the</strong> formula <strong>of</strong> complete probability.There is ano<strong>the</strong>r, <strong>the</strong> so-called Bayes formulaP( ABi ) P( Bi ) P( A / Bi)P( Bi/ A) = =.nP( A)P( B ) P( A / B )∑i=1ii(2.3)We have derived formulas (2.2) <strong>and</strong> (2.3) by issu<strong>in</strong>g from <strong>the</strong>def<strong>in</strong>ition <strong>of</strong> conditional probability <strong>and</strong> apply<strong>in</strong>g really trivialtransformations. They can not <strong>the</strong>refore be called substantialma<strong>the</strong>matical <strong>the</strong>orems, but <strong>the</strong>y never<strong>the</strong>less play an important part.Let us first consider <strong>the</strong> application <strong>of</strong> formula (2.2). Suppose, for<strong>the</strong> sake <strong>of</strong> def<strong>in</strong>iteness, that event A means that some article isdefective <strong>and</strong> assume also that that event is not by itself statisticallystable; more def<strong>in</strong>itely, that <strong>the</strong>re are mutually exclusive conditions <strong>of</strong>manufactur<strong>in</strong>g B 1 , B 2 , ..., B n such that given B i , it is possible toconsider P(A/B i ) so that statistical stability is present.Suppose now that all <strong>the</strong> products manufactured under thoseconditions are stored without be<strong>in</strong>g sorted out but that <strong>the</strong>ir sharecorrespond<strong>in</strong>g to condition B i is given <strong>and</strong> equal to P(B i ). Considernow an experiment <strong>in</strong> which one article is chosen at r<strong>and</strong>om <strong>and</strong>checked. Two outcomes are possible: A (defective) <strong>and</strong> A (qualitysufficient). Its r<strong>and</strong>om extraction means that such an experiment isstatistically stable, P(A) is expressed by formula (2.2) <strong>and</strong>P( A) = 1 − P( A).Unjustified hope had been previously connected with <strong>the</strong> Bayesformula (2.3) s<strong>in</strong>ce subjective <strong>in</strong>terpretation <strong>of</strong> probability was notruled out. For example, when hav<strong>in</strong>g hypo<strong>the</strong>ses B 1 , B 2 , ..., B n trustedwith probabilities P(B 1 ), P(B 2 ), ..., P(B n ), it was thought that anexperiment was desirable for <strong>in</strong>dicat<strong>in</strong>g <strong>the</strong> proper hypo<strong>the</strong>sis.17


Suppose that event A will occur <strong>in</strong> that experiment with probabilityP(A/B i ) if hypo<strong>the</strong>sis B i is <strong>in</strong>deed correct. After calculat<strong>in</strong>g P(B i /A)accord<strong>in</strong>g to that formula, we will obta<strong>in</strong> new estimates <strong>of</strong> <strong>the</strong>likelihood <strong>of</strong> <strong>the</strong> various hypo<strong>the</strong>ses.Modern probability <strong>the</strong>ory considers subjective probability as aconcept <strong>of</strong> magic 3 <strong>and</strong> only <strong>the</strong> term<strong>in</strong>ology is preserved accord<strong>in</strong>g towhich probabilities P(B 1 ), P(B 2 ), ..., P(B n ) are called prior, <strong>and</strong>P(B 1 /A), P(B 2 /A), ..., P(B n /A), posterior. But magic should be treatedcarefully: <strong>the</strong>re exists an important scientific doma<strong>in</strong> where <strong>the</strong>mentioned magical consideration is revived <strong>in</strong> an undoubtedlyscientific manner, <strong>the</strong> doma<strong>in</strong> <strong>of</strong> mach<strong>in</strong>e diagnostics.Suppose that a certa<strong>in</strong> hospital admits patients suffer<strong>in</strong>g fromdiseases B 1 , B 2 , ..., B n . The prior probabilities P(B 1 ), P(B 2 ), ..., P(B n )are <strong>in</strong>terpreted as frequencies <strong>of</strong> <strong>the</strong> correspond<strong>in</strong>g diseases. Event Ashould be understood here as <strong>the</strong> totality <strong>of</strong> <strong>the</strong> results <strong>of</strong> a diagnosticexam<strong>in</strong>ation <strong>of</strong> a patient. Posterior probabilities P(B 1 /A), P(B 2 /A), ...,P(B n /A) <strong>of</strong>fer some objective method <strong>of</strong> summ<strong>in</strong>g <strong>the</strong> <strong>in</strong>formationconta<strong>in</strong>ed <strong>in</strong> those exam<strong>in</strong>ations; objective does not necessarily meangood enough, but, anyway, not to be neglected beforeh<strong>and</strong>.The problem is only to f<strong>in</strong>d <strong>the</strong> probabilities P(A/B i ) needed forcalculat<strong>in</strong>g those posterior probabilities. It seems that for statisticallyderiv<strong>in</strong>g it, it suffices to look at its frequency as given <strong>in</strong> <strong>the</strong> casehistories <strong>of</strong> those suffer<strong>in</strong>g from B i , but here we encounter a veryunpleasant surprise: A is <strong>the</strong> result <strong>of</strong> a large number <strong>of</strong> exam<strong>in</strong>ations,a totality, so to say, <strong>of</strong> all <strong>the</strong> <strong>in</strong>dications revealable <strong>in</strong> a given patient<strong>and</strong> essential for diagnos<strong>in</strong>g him/her. Even <strong>the</strong> simplest exam<strong>in</strong>ation<strong>in</strong>cludes nowadays a number <strong>of</strong> analyses <strong>and</strong> <strong>in</strong>vestigations <strong>and</strong> partial<strong>in</strong>vestigations by many physicians <strong>of</strong> various specialities. It will not bean exaggeration at all to say that <strong>the</strong> amount <strong>of</strong> <strong>in</strong>formation is such that50 b<strong>in</strong>ary digits will be needed to write it down; actually, that numberwill perhaps only suffice after thoroughly select<strong>in</strong>g <strong>the</strong> <strong>in</strong>dicationsessential for <strong>the</strong> diagnosis.When adopt<strong>in</strong>g <strong>the</strong>se 50, we will have 2 50 ≈ 10 15 various possiblevalues <strong>of</strong> A. Suppose that previous statistics collected data on 10 4patients, <strong>the</strong>n, <strong>in</strong> <strong>the</strong> mean, 10 −11 observations will be available foreach possible value <strong>of</strong> A. Practically this means that an overwhelm<strong>in</strong>gmajority <strong>of</strong> <strong>the</strong>se values are not covered by any observations, almosteach new patient will provide a previously unknown result <strong>of</strong>exam<strong>in</strong>ation <strong>and</strong> it will be absolutely impossible to determ<strong>in</strong>e directly<strong>the</strong> probability P(A/B i ).Generally speak<strong>in</strong>g, <strong>in</strong> practical statistical <strong>in</strong>vestigations, whendesir<strong>in</strong>g to consider at once many factors <strong>and</strong> connections between<strong>the</strong>m, we usually f<strong>in</strong>d ourselves <strong>in</strong> a bl<strong>in</strong>d alley. Classify<strong>in</strong>g statisticalmaterial accord<strong>in</strong>g to several <strong>in</strong>dices very soon provides groups <strong>of</strong> oneobservation, <strong>and</strong> it is not known what to do with <strong>the</strong>m. Then, <strong>the</strong>Bayes <strong>the</strong>orem be<strong>in</strong>g ma<strong>the</strong>matically trivial naturally can not by itselfprovide any practical result. Never<strong>the</strong>less, consideration <strong>of</strong> manyfactors <strong>in</strong> medic<strong>in</strong>e is possible. There are contributions whose resultsare difficult to doubt, but it is premature to describe <strong>the</strong>m for <strong>the</strong>general reader. One <strong>of</strong> <strong>the</strong> possibilities here is connected with apply<strong>in</strong>g<strong>the</strong> concept <strong>of</strong> <strong>in</strong>dependence whose formulation we will now provide.18


2.3. Independence. When desir<strong>in</strong>g to consider <strong>the</strong> completestochastic characteristic <strong>of</strong> events A 1 , A 2 , ..., A n , we will need to know<strong>the</strong> probabilities <strong>of</strong> every possible setP(C 1 , C 2 , ..., C n )where each C i can take two values, A i <strong>and</strong> Ai.It is not difficult tocalculate that 2 n probabilities are needed. This number <strong>in</strong>creases veryrapidly with n <strong>and</strong> <strong>the</strong> pert<strong>in</strong>ent possibilities <strong>of</strong> any experiment become<strong>in</strong>sufficient. We expect such stochastic models to be applicable only ifthat difficulty is somehow overcome <strong>and</strong> <strong>the</strong> ma<strong>in</strong> part is played hereby <strong>the</strong> concept <strong>of</strong> <strong>in</strong>dependence.Def<strong>in</strong>ition. Two events, A <strong>and</strong> B, are <strong>in</strong>dependent if <strong>the</strong> conditionalP(A/B) <strong>and</strong> unconditional probabilities co<strong>in</strong>cide:P( AB)P( A / B) = P( A) or P( AB) P( A) P( B).P( B )= =For n events A 1 , A 2 , ..., A n <strong>in</strong>dependence is def<strong>in</strong>ed by equalityP(C 1 C 2 , ..., C n ) = P(C 1 ) P(C 2 ) ... P(C n ) (2.4)where each C i can take values A i <strong>and</strong> Ai.S<strong>in</strong>ce P( Ai) = 1 – P(A i ), <strong>the</strong>probabilities for <strong>in</strong>dependent events can be given by only n valuesP(A 1 ), P(A 2 ), ..., P(A n ).Independent events do exist; <strong>the</strong>y are realized <strong>in</strong> experiments carriedout <strong>in</strong>dependently one from ano<strong>the</strong>r (<strong>in</strong> <strong>the</strong> usual physical mean<strong>in</strong>g). Atextbook on <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability should show <strong>the</strong> reader how <strong>the</strong>correspond<strong>in</strong>g space <strong>of</strong> elementary events is constructed here, but thisbooklet is not a textbook. I have provided a sufficiently detailedexposition <strong>of</strong> <strong>the</strong> most essential notions <strong>of</strong> that <strong>the</strong>ory so as to showhow it is done, briefly <strong>and</strong> conveniently (one concept, one axiom, onedef<strong>in</strong>ition) <strong>in</strong> <strong>the</strong> set-<strong>the</strong>oretic language. The fur<strong>the</strong>r development isalso <strong>of</strong>fered briefly <strong>and</strong> conveniently, but from <strong>the</strong> textbook style I amturn<strong>in</strong>g to <strong>the</strong> style <strong>of</strong> a summary.2.4. R<strong>and</strong>om variables. Def<strong>in</strong>ition. A r<strong>and</strong>om variable is a functiondef<strong>in</strong>ed on a set <strong>of</strong> elementary events. They are usually denoted byGreek letters ξ, η, ζ etc. When desir<strong>in</strong>g to <strong>in</strong>clude <strong>the</strong> argumentω ⊂ Ω , we write ξ(ω), η(ω), ζ(ω) etc.A set <strong>of</strong> possible values a 1 , a 2 , ..., a n , ... <strong>of</strong> events, all <strong>of</strong> <strong>the</strong>mdifferent,{ω: ω ⊂ Ω , ξ(ω) = a i } = {ξ= a i }is connected with each r<strong>and</strong>om variable ξ = ξ(ω), as well asprobabilities∑P{ξ = a } = P(ω) = p , ω:ξ(ω) = a .i i i19


The tablea 1 a 2 ... a n ...p 1 p 2 ... p n ...is called <strong>the</strong> distribution <strong>of</strong> <strong>the</strong> variable ξ.It should be clearly imag<strong>in</strong>ed that, practically speak<strong>in</strong>g, almostalways we have to deal not with r<strong>and</strong>om variables <strong>the</strong>mselves but onlywith <strong>the</strong>ir distributions. In a word, <strong>the</strong> reason is that <strong>the</strong> r<strong>and</strong>omvariables, be<strong>in</strong>g functions <strong>of</strong> elementary events, are usuallyunobservable. As a result <strong>of</strong> an experiment whose outcome is one <strong>of</strong><strong>the</strong> elementary events ω, we usually determ<strong>in</strong>e a value <strong>of</strong> a r<strong>and</strong>omvariable ξ(ω), but we will not f<strong>in</strong>d out ω.Let us consider a throw <strong>of</strong> a die although <strong>in</strong>troduc<strong>in</strong>g <strong>the</strong> set <strong>of</strong>elementary events <strong>in</strong> a complicated way underst<strong>and</strong><strong>in</strong>g ω as <strong>the</strong> set <strong>of</strong>values <strong>of</strong> <strong>the</strong> coord<strong>in</strong>ates <strong>and</strong> velocities <strong>of</strong> <strong>the</strong> die at <strong>the</strong> moment whenwe let it go. More precisely, ω will be <strong>the</strong> set <strong>of</strong> those numbers writtendown precisely enough for uniquely determ<strong>in</strong><strong>in</strong>g <strong>the</strong> outcome ξ(ω).Such a determ<strong>in</strong>ation is not now possible for <strong>the</strong> microcosm but <strong>in</strong> ourcase we do not doubt it although no one ever checked that possibility.In any case, it is extremely difficult to observe ω so precisely, <strong>and</strong>practically although not <strong>in</strong> pr<strong>in</strong>ciple even impossible but <strong>the</strong>observation <strong>of</strong> ξ(ω) is easy, <strong>and</strong> that is what <strong>the</strong> gamblers are onlydo<strong>in</strong>g. The space <strong>of</strong> elementary events Ω is extremely convenient as aconcept, as we have seen <strong>and</strong> will see <strong>in</strong> <strong>the</strong> sequel, but as a rule it isnot actually observable. It is easier to observe events <strong>of</strong> <strong>the</strong> k<strong>in</strong>d {ξ =a i }.And still, such events are too numerous <strong>and</strong> it is preferable tocharacterize <strong>the</strong> distribution <strong>of</strong> a r<strong>and</strong>om variable by severalparameters, i. e. by functions <strong>of</strong> <strong>the</strong> values a i <strong>and</strong> probabilities p i .Considered are not arbitrary distributions, but such as are uniquelydeterm<strong>in</strong>ed by a small number <strong>of</strong> parameters. F<strong>in</strong>e, if one or twoparameters is (are) needed, endurable if three or four. However,determ<strong>in</strong>e experimentally more than four parameters, <strong>and</strong> your resultswill be questioned. The po<strong>in</strong>t is, that, as empirically noted, whenselect<strong>in</strong>g too many parameters any experimental results can be fitted toany law <strong>of</strong> distribution.Expectation is <strong>the</strong> most important parameter <strong>of</strong> distribution. We willdef<strong>in</strong>e it not <strong>in</strong> its usual form; <strong>the</strong> generally accepted def<strong>in</strong>ition willappear as a very simple <strong>the</strong>orem.Def<strong>in</strong>ition. An expectation <strong>of</strong> a r<strong>and</strong>om variable ξ = ξ(ω) is numberEξ determ<strong>in</strong>ed by <strong>the</strong> formula∑Eξ = ξ(ω) P(ω),ω ∈Ω.It is assumed here that <strong>the</strong> series absolutely converges; o<strong>the</strong>rwise, <strong>the</strong>r<strong>and</strong>om variable is said to have no expectation.It is not difficult to conv<strong>in</strong>ce ourselves that our def<strong>in</strong>ition actuallyco<strong>in</strong>cides with <strong>the</strong> accepted formula [...]20


Eξ = ∑ aipi.Our form <strong>of</strong> def<strong>in</strong>ition is however more convenient for prov<strong>in</strong>g <strong>the</strong><strong>the</strong>orems on <strong>the</strong> properties <strong>of</strong> <strong>the</strong> expectation. Let us prove, forexample, [<strong>the</strong> <strong>the</strong>orem about <strong>the</strong> expectation <strong>of</strong> a sum <strong>of</strong> variables].[...] In many textbooks that statement is proved defectively. [...]The second most important parameter <strong>of</strong> <strong>the</strong> distribution <strong>of</strong> ar<strong>and</strong>om variable is variance.Def<strong>in</strong>ition. [...]For r<strong>and</strong>om variables as also for events, <strong>the</strong> concept <strong>of</strong><strong>in</strong>dependence is most important. We def<strong>in</strong>e <strong>in</strong>dependence (<strong>in</strong> totality)for three r<strong>and</strong>om variables, <strong>and</strong> <strong>the</strong> def<strong>in</strong>ition is similar for any number<strong>of</strong> <strong>the</strong>m.Def<strong>in</strong>ition. [...]We will prove that for <strong>in</strong>dependent r<strong>and</strong>om variablesE(ξ ⋅ η⋅ ς) = Eξ ⋅ Eη⋅Eς.Pro<strong>of</strong>. [...]It easily follows that <strong>the</strong> variance <strong>of</strong> a sum <strong>of</strong> <strong>in</strong>dependent r<strong>and</strong>omvariables is equal to <strong>the</strong> sum <strong>of</strong> <strong>the</strong> variances <strong>of</strong> its terms.We have concluded <strong>the</strong> exposition <strong>of</strong> <strong>the</strong> ma<strong>in</strong> stochastic conceptsfor <strong>the</strong> discrete case, when <strong>the</strong> experiment has [only] a f<strong>in</strong>ite or acountable number <strong>of</strong> elementary outcomes. Now, we have to considerwhat happens when it is more natural to describe <strong>the</strong> experiment by amore complicated space.2.5. Transition to <strong>the</strong> general space <strong>of</strong> elementary events. If anexperiment results <strong>in</strong> some measurement, it is possible to state that,s<strong>in</strong>ce <strong>the</strong> precision <strong>of</strong> all measurements is only f<strong>in</strong>ite, <strong>the</strong> set <strong>of</strong>elementary outcomes will at most be countable. However, <strong>the</strong> history<strong>of</strong> <strong>the</strong> development <strong>of</strong> science <strong>in</strong>dicates that physical <strong>the</strong>ories are muchsimplified by consider<strong>in</strong>g cont<strong>in</strong>uous models for which experimentalresults can be any number. Differential equations can only be applied<strong>in</strong> such models.Readers, familiar with difference equations will easily imag<strong>in</strong>e howmore elegant <strong>and</strong> simple are <strong>the</strong> differential equations. Thus, althoughmodern physics has some vague ideas about <strong>the</strong> possible discreteness<strong>of</strong> space, it certa<strong>in</strong>ly is not at all easy to ab<strong>and</strong>on <strong>the</strong> notion <strong>of</strong>cont<strong>in</strong>uum. And, allow<strong>in</strong>g that notion, what k<strong>in</strong>d <strong>of</strong> probability <strong>the</strong>oryshould we have? The answer to this question is given by <strong>the</strong>celebrated Kolmogorov axiomatics (Kolmogorov 1933; Feller 1950<strong>and</strong> 1966). Its foundation is <strong>the</strong> notion <strong>of</strong> <strong>the</strong> space <strong>of</strong> elementaryoutcomes Ω which can now be arbitrary. Some (but not all!) <strong>of</strong> itssubspaces are held to be so to say observable as an experimental result<strong>and</strong> called events. If A is an event, we are able to say whe<strong>the</strong>r itoccurred <strong>in</strong> an experiment or not <strong>and</strong> <strong>in</strong> this sense it is observable. Wemay thus discuss <strong>the</strong> frequency <strong>of</strong> its occurrence <strong>and</strong> consequently <strong>the</strong>probability P(A).The ma<strong>in</strong> dem<strong>and</strong> <strong>of</strong> <strong>the</strong> Kolmogorov axiomatics conta<strong>in</strong><strong>in</strong>g asthough <strong>in</strong> embryo <strong>the</strong> merits <strong>and</strong> shortcom<strong>in</strong>gs <strong>of</strong> <strong>the</strong> entire <strong>the</strong>ory isthat, given a countable set <strong>of</strong> events A 1 , A 2 , ..., A n , ..., <strong>the</strong>ir sum <strong>and</strong>21


<strong>in</strong>tersection are also events; <strong>in</strong> addition, it is also assumed that Ω is anevent with P(Ω) = 1 <strong>and</strong> that <strong>the</strong> complement <strong>of</strong> any event is also anevent.Concern<strong>in</strong>g probabilities, <strong>the</strong> follow<strong>in</strong>g fundamental property isassumed. If <strong>the</strong> events A 1 , A 2 , ..., A n , ... do not <strong>in</strong>tersect <strong>in</strong> pairs (haveno common events)∞ ∞P[ U Ai] = ∑ P( Ai),(2.5)i=1i=1where <strong>the</strong> symbol U means a sum. For <strong>the</strong> discrete case, this statementcan be declared a <strong>the</strong>orem derived by issu<strong>in</strong>g from <strong>the</strong> mentioneddef<strong>in</strong>ition <strong>of</strong> § 3.1. In <strong>the</strong> general case, it is an axiom whereas thatdef<strong>in</strong>ition is useless.We will consider what does <strong>the</strong> application <strong>of</strong> <strong>the</strong> Kolmogorovaxiomatics dem<strong>and</strong> by discuss<strong>in</strong>g a concrete example, experimentalr<strong>and</strong>om throws <strong>of</strong> a po<strong>in</strong>t on <strong>in</strong>terval [0, 1]. Here, <strong>the</strong> space <strong>of</strong>elementary outcomes Ω should apparently consist <strong>of</strong> all po<strong>in</strong>ts <strong>of</strong> that<strong>in</strong>terval. If 0 ≤ a < b ≤ 1, it would have been extremely annoy<strong>in</strong>g to beforbidden to discuss <strong>the</strong> probability <strong>of</strong> a r<strong>and</strong>om po<strong>in</strong>t ω occurr<strong>in</strong>gwith<strong>in</strong> <strong>in</strong>terval [a, b]. And so, we desire to call events sets <strong>of</strong> <strong>the</strong> k<strong>in</strong>d{ω:a ≤ ω ≤ b}<strong>and</strong> we will assume thatP{ω:a ≤ ω ≤ b} = b – a,or, that <strong>the</strong> probability <strong>of</strong> a r<strong>and</strong>om po<strong>in</strong>t to fall on an <strong>in</strong>terval is equalto <strong>the</strong> <strong>in</strong>terval’s length. So far, everyth<strong>in</strong>g is natural.Now, however, we must assume that events are not only <strong>in</strong>tervals,but anyth<strong>in</strong>g obta<strong>in</strong>able from <strong>the</strong>m by summ<strong>in</strong>g <strong>and</strong> <strong>in</strong>tersect<strong>in</strong>g <strong>the</strong>ircountable number as also by <strong>in</strong>clud<strong>in</strong>g complements. Select<strong>in</strong>g po<strong>in</strong>t c,0 ≤ c ≤ 1, <strong>and</strong> a sequence <strong>of</strong> <strong>in</strong>tervals [c – 1/n, c + 1/n], we see that <strong>the</strong><strong>in</strong>tersection <strong>of</strong> <strong>the</strong>ir countable number consists <strong>of</strong> a s<strong>in</strong>gle po<strong>in</strong>t c, sothat any po<strong>in</strong>t is an event. The set <strong>of</strong> rational po<strong>in</strong>ts is obta<strong>in</strong>ed bysumm<strong>in</strong>g a countable number <strong>of</strong> po<strong>in</strong>ts <strong>and</strong> is <strong>the</strong>refore an event. Theset <strong>of</strong> irrational po<strong>in</strong>ts is its complement <strong>and</strong> <strong>the</strong>refore also an event.We thus consider observable whe<strong>the</strong>r a po<strong>in</strong>t thrown on an <strong>in</strong>tervalis rational or irrational although physically this is impossible, <strong>and</strong> wesee that it is necessary to apply carefully <strong>the</strong> Kolmogorov model,o<strong>the</strong>rwise it can lead to physically absurd corollaries.Particularly complicated versions <strong>of</strong> such models are applied <strong>in</strong> <strong>the</strong><strong>the</strong>ory <strong>of</strong> stochastic processes. There, <strong>the</strong> researcher ought to beespecially careful, ought to possess a certa<strong>in</strong> taste for natural science.O<strong>the</strong>rwise it is easy to derive such results by issu<strong>in</strong>g from <strong>the</strong> acceptedma<strong>the</strong>matical model which at best can not be physically <strong>in</strong>terpreted,<strong>and</strong> at worst <strong>of</strong>fer an occasion for a wrong <strong>in</strong>terpretation. As anexample, I cite a ma<strong>the</strong>matical <strong>the</strong>orem accord<strong>in</strong>g to which <strong>the</strong>coefficient <strong>of</strong> diffusion <strong>of</strong> <strong>the</strong> Brownian motion can be determ<strong>in</strong>ed22


absolutely precisely if <strong>the</strong> pert<strong>in</strong>ent path dur<strong>in</strong>g any however short<strong>in</strong>terval <strong>of</strong> time is known.You can encounter a viewpo<strong>in</strong>t stat<strong>in</strong>g that a practical estimate <strong>of</strong><strong>the</strong> coefficient <strong>of</strong> diffusion does not <strong>the</strong>refore present any difficulties.This op<strong>in</strong>ion has been established to some extent <strong>in</strong> <strong>the</strong> literature on<strong>the</strong> statistics <strong>of</strong> stationary processes, but it is completely wrong. Twocircumstances prevent its application to real Brownian motion. First,<strong>the</strong> ma<strong>the</strong>matical Brownian motion, i. e., <strong>the</strong> Wiener process, does notdescribe <strong>the</strong> real process over short <strong>in</strong>tervals <strong>of</strong> time whereas exactly<strong>the</strong> change <strong>of</strong> <strong>the</strong> position <strong>of</strong> <strong>the</strong> particle dur<strong>in</strong>g <strong>in</strong>f<strong>in</strong>itely short<strong>in</strong>tervals enters <strong>the</strong> estimation <strong>of</strong> <strong>the</strong> coefficient <strong>of</strong> diffusion. Second,<strong>the</strong> idea <strong>of</strong> know<strong>in</strong>g exactly <strong>the</strong> path <strong>of</strong> some stochastic process dur<strong>in</strong>gsome <strong>in</strong>terval <strong>of</strong> time is absolutely unrealistic; we do not at all knowhow to def<strong>in</strong>e precisely a non-regularly chang<strong>in</strong>g function which is notdescribable by an analytic expression. I am unable to dwell here <strong>in</strong>more detail on <strong>the</strong> <strong>the</strong>ory <strong>of</strong> stochastic processes <strong>and</strong> am return<strong>in</strong>g toprobability P. For <strong>in</strong>tervals, it co<strong>in</strong>cides with <strong>the</strong>ir length.However, it is possible to construct very complicated sets <strong>of</strong><strong>in</strong>tervals <strong>and</strong> ma<strong>the</strong>matical correctness dem<strong>and</strong>s that it be possible todef<strong>in</strong>e additionally that probability for all such sets while reta<strong>in</strong><strong>in</strong>g <strong>the</strong>ma<strong>in</strong> property <strong>of</strong> countable additivity (2.5). The French ma<strong>the</strong>maticianLebesgue provided a construction (<strong>the</strong> Lebesgue measure) allow<strong>in</strong>g toascerta<strong>in</strong> <strong>the</strong> possibility <strong>of</strong> such an additional def<strong>in</strong>ition. It iscomplicated <strong>and</strong> we will not discuss it here. However, it can be appliedfor spaces Ω <strong>of</strong> a very general k<strong>in</strong>d, consist<strong>in</strong>g for example <strong>of</strong>functions which is important for <strong>the</strong> <strong>the</strong>ory <strong>of</strong> stochastic processes.Until now, we have discussed <strong>the</strong> complications necessarilydem<strong>and</strong>ed by <strong>the</strong> Kolmogorov axiomatics; on <strong>the</strong> o<strong>the</strong>r h<strong>and</strong>, it ishowever connected with most important simplifications. The<strong>in</strong>troduction <strong>of</strong> a measure hav<strong>in</strong>g <strong>the</strong> property <strong>of</strong> countable additivityallows to apply <strong>the</strong> concept <strong>of</strong> Lebesgue <strong>in</strong>tegral; as a concept, it is<strong>in</strong>comparably simpler <strong>and</strong> more general than <strong>the</strong> Riemann <strong>in</strong>tegral. In<strong>the</strong> general case, all <strong>the</strong> ma<strong>in</strong> notions <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> r<strong>and</strong>om variablesoccur not more complicated than those described above for <strong>the</strong> discretecase. Thus, a remarkable simplicity, generality <strong>and</strong> order is orig<strong>in</strong>ated<strong>in</strong> <strong>the</strong> ma<strong>in</strong> notions <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability. However, <strong>the</strong>Lebesgue <strong>in</strong>tegral is not more than a concept. No one calculates<strong>in</strong>tegrals by apply<strong>in</strong>g <strong>the</strong> Lebesgue extension <strong>of</strong> measure, <strong>the</strong> Riemann<strong>in</strong>tegral is preferred.It is necessary to mention here a certa<strong>in</strong> difficulty that takes placewhen teach<strong>in</strong>g ma<strong>the</strong>matical analysis, both at home <strong>and</strong> abroad. Ingeneral, noth<strong>in</strong>g negative can be said about its part deal<strong>in</strong>g withfunctions <strong>of</strong> one variable, although it is somewhat tedious; <strong>the</strong> horrorbeg<strong>in</strong>s with <strong>the</strong> transition to functions <strong>of</strong> several variables. Thetreatment <strong>of</strong> <strong>the</strong> differential, <strong>and</strong> especially <strong>in</strong>tegral calculus is herenowadays absolutely unsatisfactory. Take for example <strong>the</strong> set <strong>of</strong> <strong>the</strong>Green, Stokes <strong>and</strong> Ostrogradsky formulas <strong>in</strong>troduced without anyconnection between <strong>the</strong>m. Indeed, <strong>the</strong>re exists now a united viewpo<strong>in</strong>tabout all <strong>of</strong> <strong>the</strong>m <strong>and</strong> it even <strong>in</strong>cludes <strong>the</strong> Newton – Leibniz formula. Itis not treated <strong>in</strong> textbooks, but can be read <strong>in</strong> Arnold’s lectures (1968)on <strong>the</strong>oretical mechanics.23


The exposition <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability also suffers from thatcircumstance although less than <strong>the</strong>oretical mechanics. We are<strong>the</strong>refore unable to apply ei<strong>the</strong>r <strong>the</strong> notion <strong>of</strong> <strong>the</strong> Lebesgue <strong>in</strong>tegral ora number <strong>of</strong> useful properties <strong>of</strong> <strong>the</strong> ord<strong>in</strong>ary multiple <strong>in</strong>tegral <strong>and</strong> arerestrict<strong>in</strong>g <strong>the</strong> description to a necessary m<strong>in</strong>imum. Just as <strong>in</strong> <strong>the</strong>discrete case, we pass on from r<strong>and</strong>om variables <strong>the</strong>mselves to <strong>the</strong>irdistributions, but our deliberations ought to be suitable for severalvariables at once ra<strong>the</strong>r than for one only. In o<strong>the</strong>r words, we willconsider vector ξ = ξ(ξ 1 , ξ 2 , ..., ξ n ). Our ma<strong>in</strong> pr<strong>in</strong>ciple is to <strong>in</strong>troducesuch characteristics that admit an easy transition from one coord<strong>in</strong>atesystem to ano<strong>the</strong>r one although a so-called jo<strong>in</strong>t distribution functionFξ 1,ξ 2 ,...,ξ n(x 1 , x 2 , ..., x n ) = P(ξ 1 < x 1 , ξ 2 < x 2 , ..., ξ n < x n )has been applied <strong>in</strong>stead. The transition from coord<strong>in</strong>ates x 1 , x 2 , ..., x nto o<strong>the</strong>r coord<strong>in</strong>ates y 1 , y 2 , ..., y n becomes not only difficult, it is evenimpossible to describe that procedure by a formula without actually<strong>in</strong>troduc<strong>in</strong>g a stochastic measureµ ( A) = µ ( A) = P{ξ = (ξ ,ξ ,...,ξ ) ∈ A}.ξ ξ 1,ξ 2 ,...,ξ n1 2nHere, <strong>the</strong> vector ξ = ξ(ξ 1 , ξ 2 , ..., ξ n ) is an event, an element <strong>of</strong> <strong>the</strong> setA. The jo<strong>in</strong>t distribution function is thus practically useless. Actually,we have to apply densityp ( x) = p ( x ,..., x ).ξ ξ 1,ξ 2 ,...,ξ n 1nIt is def<strong>in</strong>ed by dem<strong>and</strong><strong>in</strong>g that for any (not too complicated) set A <strong>in</strong> amany-dimensional spaceP{ξ A} ... p ( x , x ,..., x ) dx dx ... dx .∈ = ∫ ∫ξ 1,ξ 2 ,...,ξn 1 2 n 1 2 nThe <strong>in</strong>tegration is over set A. Density plays here <strong>the</strong> same part asdistribution <strong>of</strong> a r<strong>and</strong>om variable <strong>in</strong> <strong>the</strong> discrete case. In particular,∞ ∞E f (ξ ,...,ξ ) ... f ( x ,..., x ) p ( x ,..., x ) dx ... dx .= ∫ ∫1 n 1 n ξ 1,...,ξ n 1 n 1 n−∞ −∞Most important is <strong>the</strong> formula connect<strong>in</strong>g <strong>the</strong> densities <strong>of</strong> a r<strong>and</strong>omvector <strong>in</strong> various systems <strong>of</strong> coord<strong>in</strong>ates, a particular case <strong>of</strong> <strong>the</strong>formula for <strong>the</strong> change <strong>of</strong> <strong>the</strong> variables <strong>in</strong> multiple <strong>in</strong>tegrals, <strong>and</strong> I donot <strong>in</strong>troduce it. Note that usual courses <strong>in</strong> ma<strong>the</strong>matical analysis evenlack <strong>the</strong> necessary notation.The densities <strong>of</strong> distribution <strong>of</strong> <strong>the</strong> sum, <strong>the</strong> product, ratio <strong>and</strong> o<strong>the</strong>roperations on r<strong>and</strong>om variables can be immediately derived by issu<strong>in</strong>gfrom it. On <strong>the</strong> contrary, for one-dimensional variables <strong>the</strong> notion <strong>of</strong>distribution function is very useful. Here is its def<strong>in</strong>ition:F ξ (x) = P(ξ < x)24


where x is any real number. If density <strong>of</strong> distribution p ξ (x) exists, <strong>the</strong>nxFξ( x) = ∫ pξ( x) dx.−∞I also <strong>in</strong>troduce <strong>the</strong> formulas for expectation <strong>and</strong> variance <strong>in</strong> thiscase:∞∞∫ ξ ∫−∞−∞Eξ = xp ( x) dx, E f (ξ) = f ( x) p ( x) dx,ξ∞2 2var ξ = E(ξ − Eξ) = ∫ ( x − Eξ) pξ( x) dx.−∞3. Bernoulli Trials. The Poisson Jurors3.1. Bernoulli trials. And so, it is <strong>in</strong>comparably simpler to<strong>in</strong>troduce probabilities <strong>of</strong> <strong>in</strong>dependent, ra<strong>the</strong>r than dependent events.Therefore, stochastic models with <strong>in</strong>dependent events have much morechances to be practically applied. The most simple <strong>and</strong> thus <strong>the</strong> mostwidely applicable is <strong>the</strong> model <strong>in</strong> which we imag<strong>in</strong>e a certa<strong>in</strong> numbern <strong>of</strong> <strong>in</strong>dependent trials, each <strong>of</strong> <strong>the</strong>m result<strong>in</strong>g <strong>in</strong> one <strong>of</strong> <strong>the</strong> twopossible outcomes called success <strong>and</strong> failure. The probability <strong>of</strong>success is supposed to be <strong>the</strong> same throughout <strong>and</strong> is denoted by p sothat failure will be q = 1 – p. Denote also success <strong>and</strong> failure by 1 <strong>and</strong>0, <strong>the</strong>n <strong>the</strong> result <strong>of</strong> n trials will be a sequence <strong>of</strong> <strong>the</strong>se numbers hav<strong>in</strong>glength n.The set <strong>of</strong> elementary events ω, Ω = {ω}, thus consists <strong>of</strong> all suchsequences <strong>of</strong> length n <strong>and</strong> <strong>the</strong>refore has 2 n elements. Tak<strong>in</strong>g<strong>in</strong>dependence <strong>of</strong> <strong>in</strong>dividual trials <strong>in</strong>to account, we ought to provide adef<strong>in</strong>ition accord<strong>in</strong>g to which <strong>the</strong> probability p(ω) <strong>of</strong> each elementaryevent ω will be calculated by chang<strong>in</strong>g each 1 by number p, <strong>and</strong> eachfailure by chang<strong>in</strong>g each 0 by number q <strong>and</strong> multiply <strong>the</strong> obta<strong>in</strong>ednumbers. We will <strong>the</strong>n haveP(ω) = p µ(ω) q n−µ(ω)where µ(ω) is <strong>the</strong> number <strong>of</strong> unities <strong>in</strong> <strong>the</strong> sequence <strong>of</strong> <strong>the</strong> ω’s.Experiments described by this stochastic models are calledBernoulli trials, <strong>and</strong> <strong>the</strong> r<strong>and</strong>om variable µ = µ(ω) is <strong>the</strong> number <strong>of</strong>successes <strong>in</strong> n such trials. Let us determ<strong>in</strong>e <strong>the</strong> distribution <strong>of</strong> thatr<strong>and</strong>om variable. Its possible values are evidently numbers 0, 1, ..., nso that∑ ∑µ(ω) n µ(ω)= = = =P{µ m} P(ω)p q −∑m n−m m n−mp q = p q ⋅ (number <strong>of</strong> such ω that µ(ω) = m).25


The summations are over ω: µ(ω)= m. However, <strong>the</strong> number <strong>of</strong> suchsequences <strong>of</strong> ω’s that µ(ω) = m is clearly equal to <strong>the</strong> number <strong>of</strong>mpossible selections <strong>of</strong> m symbols out <strong>of</strong> n, C . And so,P{ µ = m} = C m np m q n-m (3.1)which is <strong>the</strong> ma<strong>in</strong> formula <strong>of</strong> <strong>the</strong> Bernoulli trials.Its <strong>the</strong>ory is seen to be almost trivial but not trivial is to learn how toapply it, that is, how to f<strong>in</strong>d those phenomena that are sufficiently welldescribed by that pattern. A classical example <strong>of</strong> <strong>the</strong> trials is a toss <strong>of</strong> aco<strong>in</strong>, but when attempt<strong>in</strong>g to discover someth<strong>in</strong>g more <strong>in</strong>terest<strong>in</strong>g, weenter <strong>the</strong> doma<strong>in</strong> <strong>of</strong> doubtfulness. Thus, is it possible to consider abirth <strong>of</strong> an <strong>in</strong>fant <strong>of</strong> one or ano<strong>the</strong>r sex as a Bernoulli trial (<strong>and</strong> regarda male birth, say, as a success)?Accord<strong>in</strong>g to genetic ideas, this is quite natural. However, thoseideas lead just as naturally to <strong>the</strong> frequency <strong>of</strong> male births p = 1/2whereas it somewhat exceeds 1/2 as established by exam<strong>in</strong><strong>in</strong>g such animmense material that it becomes impossible to question it. Then,however, it is perhaps permissible to admit <strong>the</strong> opposite hypo<strong>the</strong>sis <strong>of</strong>p ≠ 1/2? Once more, no, s<strong>in</strong>ce <strong>the</strong> Bernoulli trials presume a constantprobability <strong>of</strong> success whereas <strong>the</strong> statistical data certa<strong>in</strong>ly <strong>in</strong>dicatethat <strong>the</strong> frequency <strong>of</strong> male births <strong>in</strong>creases after long wars. Thedependence <strong>of</strong> <strong>the</strong> probability <strong>of</strong> male births on <strong>the</strong> social conditions<strong>of</strong> <strong>the</strong> family [<strong>and</strong> on o<strong>the</strong>r circumstances] is also be<strong>in</strong>g discussed sothat <strong>the</strong> model <strong>of</strong> Bernoulli trials does not <strong>in</strong> this case completelycorrespond to reality.Then, statistically <strong>in</strong>vestigat<strong>in</strong>g that frequency we f<strong>in</strong>d out that,strictly speak<strong>in</strong>g, <strong>the</strong> model <strong>of</strong> those trials is unacceptable; however,s<strong>in</strong>ce <strong>the</strong> probability <strong>of</strong> male birth is never<strong>the</strong>less very near to 1/2, it isonly possible to reject <strong>the</strong> hypo<strong>the</strong>sis <strong>of</strong> its applicability throughstatistical research based on pr<strong>of</strong>ound corollaries <strong>of</strong> formula (3.1). Wewill see now how it is carried out <strong>in</strong> Chapter 4.An application <strong>of</strong> stochastic methods results <strong>in</strong> a conclusion that,strictly speak<strong>in</strong>g, we ought not to discuss <strong>the</strong> probability <strong>of</strong> male births(or statistical stability). However, <strong>in</strong> <strong>the</strong> f<strong>in</strong>al analysis we will f<strong>in</strong>d outmuch more than had <strong>the</strong>re been an ideal conformity with <strong>the</strong> <strong>the</strong>ory <strong>of</strong>probability: we discover for sure that <strong>the</strong>re exists a still unidentifiedagent regulat<strong>in</strong>g <strong>the</strong> numbers <strong>of</strong> men <strong>and</strong> women.The model <strong>of</strong> Bernoulli trials is <strong>of</strong>ten applied for estimat<strong>in</strong>g someplans <strong>of</strong> acceptance <strong>in</strong>spection <strong>in</strong> which <strong>the</strong> manufactur<strong>in</strong>g <strong>of</strong> faulty(failures) or suitable (successes) articles must be described by thatpattern. However, after recall<strong>in</strong>g <strong>the</strong> discussion <strong>in</strong> Chapter 1 <strong>of</strong> <strong>the</strong>possibility <strong>of</strong> a stochastic description <strong>of</strong> manufactur<strong>in</strong>g faultyproducts, it becomes evident that that model can only be made use <strong>of</strong>when <strong>the</strong> <strong>in</strong>dustrial process is arranged well enough.We will discuss at length <strong>the</strong> attempt to apply <strong>the</strong> same model to <strong>the</strong>problem <strong>of</strong> legal verdicts. Pert<strong>in</strong>ent <strong>in</strong>vestigations are connected with<strong>the</strong> names <strong>of</strong> such first-rate scholars as Laplace <strong>and</strong> Poisson, <strong>and</strong> <strong>the</strong>irstudy is very <strong>in</strong>structive. It shows by an example taken from historythat a perfect comm<strong>and</strong> <strong>of</strong> <strong>the</strong> ma<strong>the</strong>matical methods <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong>n26


probability can be coupled with an absolutely wrong approach toreality 4 .3.2. Poisson’s jurors. Laplace, <strong>and</strong> <strong>the</strong>n Poisson <strong>in</strong>vestigated <strong>the</strong>issue <strong>of</strong> <strong>the</strong> probabilities <strong>of</strong> mistaken legal verdicts. A certa<strong>in</strong> juror cannaturally make a mistake. Laplace assigned jurors a very modestability <strong>of</strong> correct judgement: he thought that for each separatelyconsidered juror <strong>the</strong> probability <strong>of</strong> a mistake was a r<strong>and</strong>om variableuniformly distributed on segment [0, 1/2]. Poisson did not agree; hera<strong>the</strong>r believed that <strong>the</strong> probability <strong>of</strong> a correct judgement should beestimated by issu<strong>in</strong>g from statistical data. The impossibility <strong>of</strong>precisely establish<strong>in</strong>g whe<strong>the</strong>r rightly or not a given accused personwas found guilty presents here <strong>the</strong> greatest difficulty <strong>of</strong> a directstatistical estimate.Poisson’s ideas widely applied now also consisted <strong>in</strong> that <strong>in</strong> such asituation it was necessary to construct a statistical model with <strong>the</strong>unknown probability enter<strong>in</strong>g it as a parameter <strong>and</strong> to attempt todeterm<strong>in</strong>e it by pert<strong>in</strong>ent data.Let us consider <strong>the</strong> adm<strong>in</strong>istration <strong>of</strong> justice <strong>in</strong> more detail. The trialis based on <strong>the</strong> <strong>in</strong>quest. Denote <strong>the</strong> event consist<strong>in</strong>g <strong>in</strong> that <strong>the</strong>evidence collected at <strong>the</strong> <strong>in</strong>quest was sufficient for <strong>the</strong> trial to declare<strong>the</strong> defendant guilty by A, <strong>and</strong> <strong>the</strong> contrary event by A . Given A, all<strong>the</strong> jurors, provided <strong>the</strong>ir judgement is faultless, ought to unanimouslyvote for <strong>the</strong> prosecution; o<strong>the</strong>rwise (event A ) for <strong>the</strong> defence.Actually, ra<strong>the</strong>r <strong>of</strong>ten <strong>the</strong> votes are divided ow<strong>in</strong>g to mistakes madeby <strong>the</strong> jurors. Poisson’s ma<strong>in</strong> proposition was that such divisionconformed to <strong>the</strong> Bernoulli pattern. If n is <strong>the</strong> number <strong>of</strong> jurors, p, <strong>the</strong>probability <strong>of</strong> a correct judgement <strong>of</strong> each juror, <strong>the</strong> number <strong>of</strong> votesfor <strong>the</strong> prosecution, µ, it is described <strong>in</strong> <strong>the</strong> follow<strong>in</strong>g way.1) Given A, µ is <strong>the</strong> number <strong>of</strong> successes for <strong>the</strong> n pert<strong>in</strong>entBernoulli trials with probability <strong>of</strong> success p.2) Given A , µ is <strong>the</strong> number <strong>of</strong> failures for <strong>the</strong> same pattern.Accord<strong>in</strong>g to <strong>the</strong> French legislation, n = 12 <strong>and</strong> <strong>the</strong> defendant wasdeclared guilty if µ ≥ 7. The probability <strong>of</strong> that outcome isP g = P(A)P{µ ≥ 7/A} + P( A )P{µ ≥ 7/ A } =12 12m m 12 − m m 12−m m12− + −12−m= 7 m=7∑ ∑ (3.2)P( A) C p (1 p) [1 P( A)] C p (1 p) .Crim<strong>in</strong>al statistics provides <strong>the</strong> frequency <strong>of</strong> such verdicts which isapproximately equal to P g <strong>and</strong> Poisson thoroughly checked its stabilityover <strong>the</strong> years. However, expression (3.2) <strong>in</strong>cludes two unknownparameters, P(A) <strong>and</strong> p. Know<strong>in</strong>g only P g , it is impossible to determ<strong>in</strong>e<strong>the</strong>m <strong>and</strong> it is <strong>the</strong>refore necessary to turn to statistics which will<strong>in</strong>dicate not only whe<strong>the</strong>r defendants were found guilty or exonerated,but [<strong>in</strong> one case, see below] by how many votes as well. Thus, be<strong>in</strong>gaccused exactly by seven votes has probabilityP g {µ = 7} = P(A)P{µ = 7/A} + P( A )P{µ = 7/ A } =7 7 5 7 5 7P( A) C p (1 − p) + [1 − P( A)] C p (1 − p) .(3.3)12 1227


Know<strong>in</strong>g <strong>the</strong> left parts <strong>of</strong> relations (3.2) <strong>and</strong> (3.3) approximatelyequal to <strong>the</strong> frequencies provided by <strong>the</strong> crim<strong>in</strong>al statistics it ispossible <strong>in</strong> pr<strong>in</strong>ciple to determ<strong>in</strong>e both P(A) <strong>and</strong> p. Equations (3.2) <strong>and</strong>(3.3) are <strong>of</strong> a high degree <strong>and</strong> <strong>the</strong>ir solution is not easy. Poisson,however, developed a general method <strong>of</strong> <strong>the</strong>ir solution <strong>and</strong> f<strong>in</strong>allysuccessfully solved <strong>the</strong>m. In that, <strong>the</strong> 19 th century, follow<strong>in</strong>g Laplace<strong>and</strong> Poisson, problems on probabilities <strong>of</strong> verdicts entered alltextbooks on probability <strong>the</strong>ory, but <strong>in</strong> <strong>the</strong> next century suchapplications <strong>of</strong> <strong>the</strong> <strong>the</strong>ory were declared absolutely nonsensical. Weought to f<strong>in</strong>d out <strong>the</strong> reason why.Poisson’s ma<strong>in</strong> presumption was <strong>in</strong>dependence <strong>of</strong> <strong>the</strong> jurors’<strong>in</strong>dividual judgements. Fully underst<strong>and</strong><strong>in</strong>g <strong>the</strong> need to check <strong>the</strong>stability <strong>of</strong> frequencies, he (1837) did not say a word about anexperimental check <strong>of</strong> <strong>in</strong>dependence. How was such a procedurepossible? When solv<strong>in</strong>g equations (3.2) <strong>and</strong> (3.3), Poisson found outthat <strong>the</strong> probability <strong>of</strong> a correct judgement <strong>of</strong> an <strong>in</strong>dividual jurorapproximately equalled 2/3 so that a correct unanimous accusation hadprobability (2/3) 12 < 0.01 <strong>and</strong> was almost impossible. However, <strong>in</strong>neighbour<strong>in</strong>g Engl<strong>and</strong>, as Poisson himself noted, <strong>the</strong> law dem<strong>and</strong>ed aunanimous decision <strong>of</strong> all <strong>the</strong> 12 jurors, <strong>and</strong> English courtspronounced much more condemn<strong>in</strong>g sentences, death sentences<strong>in</strong>cluded, than <strong>the</strong> courts <strong>in</strong> France. To rem<strong>in</strong>d, <strong>the</strong> expositionconcerned <strong>the</strong> 19 th century.Poisson considered that circumstance as a cause for national pride,Engl<strong>and</strong> was seen as a much less civilized nation although it shouldhave been seen as an argument for doubt<strong>in</strong>g his own stochastic model.True, it should be said <strong>in</strong> all fairness that anyway he was unable tocheck it given <strong>the</strong> French crim<strong>in</strong>al statistics. Indeed, protect<strong>in</strong>g <strong>the</strong>secret <strong>of</strong> <strong>the</strong> jurors’ vot<strong>in</strong>g, <strong>the</strong> French judicial code did not dem<strong>and</strong> to<strong>in</strong>dicate <strong>the</strong> number <strong>of</strong> condemn<strong>in</strong>g votes <strong>the</strong> only exception hav<strong>in</strong>gbeen <strong>the</strong> case <strong>of</strong> <strong>the</strong> m<strong>in</strong>imal necessary votes.Thus, from <strong>the</strong> modern viewpo<strong>in</strong>t, Poisson’s error, formallyspeak<strong>in</strong>g, consisted <strong>in</strong> recommend<strong>in</strong>g a stochastic model withoutcheck<strong>in</strong>g it. He determ<strong>in</strong>ed two unknown parameters by two observedmagnitudes with no possibility <strong>of</strong> such check<strong>in</strong>g. It is <strong>in</strong>terest<strong>in</strong>g todescribe <strong>the</strong> pert<strong>in</strong>ent op<strong>in</strong>ion <strong>of</strong> Cournot (1843). Poisson’scontemporary, he apparently was not as ma<strong>the</strong>matically powerful asPoisson, much less as Laplace. However, we ought to recognize tha<strong>the</strong> possessed more common sense <strong>of</strong> a natural scientist, than thosefirst-rate scholars.In particular, he clearly understood that <strong>in</strong>dependence <strong>of</strong> <strong>the</strong> jurors’judgement was only a premise that should have been experimentallychecked. He even proposed such a change <strong>of</strong> <strong>the</strong> judicial code which,without violat<strong>in</strong>g <strong>the</strong> secret <strong>of</strong> <strong>the</strong> jurors’ vot<strong>in</strong>g, would have allowedto obta<strong>in</strong> <strong>the</strong> necessary statistical data. As to <strong>the</strong> <strong>in</strong>dependence itself,Cournot believed that, if it did not exist <strong>in</strong> all <strong>the</strong> totality <strong>of</strong> legalproceed<strong>in</strong>gs <strong>in</strong> general, <strong>the</strong>n <strong>in</strong> any case legislation can be separated<strong>in</strong>to groups <strong>of</strong> <strong>in</strong>dependent cases. He even found out that two suchgroups concern<strong>in</strong>g crimes aga<strong>in</strong>st <strong>the</strong> person <strong>and</strong> aga<strong>in</strong>st property willhave very near to each o<strong>the</strong>r values <strong>of</strong> <strong>the</strong> parameters P(A) <strong>and</strong> p asdeterm<strong>in</strong>ed accord<strong>in</strong>g to <strong>the</strong> Poisson method.28


Nowadays we are sure that no <strong>in</strong>dependence <strong>of</strong> <strong>the</strong> judgement <strong>of</strong><strong>in</strong>dividual jurors does exist, so that <strong>the</strong> groups isolated by Cournotwould have most likely consisted <strong>of</strong> one case only. True, thisstatement is not really proven so that accord<strong>in</strong>g to modern scienceCournot’s po<strong>in</strong>t <strong>of</strong> view is formally <strong>in</strong>vulnerable which once aga<strong>in</strong>confirms that he had essentially outstripped his time.For our days, an important conclusion from <strong>the</strong> above is that it is byno means permissible to use all <strong>the</strong> available statistical <strong>in</strong>formation fordeterm<strong>in</strong><strong>in</strong>g <strong>the</strong> parameters <strong>of</strong> a statistical model; it is absolutelynecessary to leave some part <strong>of</strong> it for check<strong>in</strong>g <strong>the</strong> model itself,o<strong>the</strong>rwise, great scientific efforts can result <strong>in</strong> complete rubbish.4. Substantial Theorems <strong>of</strong> <strong>the</strong> Theory <strong>of</strong> <strong>Probability</strong>4.1. The Poisson <strong>the</strong>orem. When compil<strong>in</strong>g his treatise, Poisson(1837) discovered one <strong>of</strong> <strong>the</strong> ma<strong>in</strong> statistical laws. Calculat<strong>in</strong>g <strong>the</strong>probabilities P(µ = m) that m successes will be achieved <strong>in</strong> n Bernoullitrials, he found out an approximate formula for large values <strong>of</strong> n <strong>and</strong>small values <strong>of</strong> p:λP{µ m}em!m−λ= ≈ (4.1)where λ = np; for more details see Gnedenko (1950). The exactexpression for P{µ = m} depends on three parameters, n, m <strong>and</strong> p; <strong>in</strong><strong>the</strong> approximate expression, n <strong>and</strong> p are comb<strong>in</strong>ed <strong>in</strong>to one.At first sight this simplification seems trivial <strong>and</strong> Poisson himselfdid not th<strong>in</strong>k that his formula was really important. Indeed, his treatise<strong>in</strong>cluded a large number <strong>of</strong> more precise <strong>and</strong> almost as suitableformulas. However, <strong>the</strong> comb<strong>in</strong><strong>in</strong>g mentioned allows to compile acomparatively short table for calculat<strong>in</strong>g (4.1) with two entries, m <strong>and</strong>λ, whereas <strong>the</strong> precise expression for P{µ = m} would have dem<strong>and</strong>eda table with three entries which is not done yet <strong>in</strong> a sufficientlyconvenient form.Never<strong>the</strong>less, <strong>the</strong> ma<strong>in</strong> role <strong>of</strong> <strong>the</strong> formula (4.1) consists not <strong>in</strong>convenient calculation. Strictly speak<strong>in</strong>g, we express it as ama<strong>the</strong>matical <strong>the</strong>orem (Gnedenko 1950) concern<strong>in</strong>g Bernoulli trials, i.e. <strong>in</strong>dependent trials with two outcomes <strong>and</strong> a constant probability <strong>of</strong>success. The most important circumstance is that those conditions maybe violated without deny<strong>in</strong>g its conclusion, that is, <strong>the</strong> equality (4.1).For example, we may admit that different trials have differ<strong>in</strong>gprobabilities <strong>of</strong> successp 1 , p 2 , ..., p n , ...(if only all <strong>of</strong> <strong>the</strong>m are low). Then <strong>the</strong> exact expression for P{µ = m}from Chapter 3 as well as good enough approximate expressionsbecome useless (because <strong>the</strong>y are too exact). The comparatively roughexpression (4.1) rema<strong>in</strong>s valid if only p 1 + p 2 + ...+ p n , or, if desired,np be substituted <strong>in</strong>stead <strong>of</strong> λ. It follows that for calculat<strong>in</strong>g λ it is notnecessary to know <strong>the</strong> values <strong>of</strong> p i , suffice it to know one s<strong>in</strong>gleparameter, <strong>the</strong>ir mean, <strong>the</strong> new value <strong>of</strong> <strong>the</strong> probability <strong>of</strong> success.29


Note also that we <strong>of</strong>ten are unable to repeat an experiment with agiven probability <strong>of</strong> success for a sufficiently large number <strong>of</strong> times sothat it is perhaps impossible to f<strong>in</strong>d out p i , which does not necessarilyprevent us from calculat<strong>in</strong>g p . The approximate value (4.1) <strong>the</strong>reforehas much more chances to f<strong>in</strong>d practical application than <strong>the</strong> exactformula for P{µ = m}.Then, <strong>the</strong> dem<strong>and</strong> <strong>of</strong> <strong>in</strong>dependence <strong>of</strong> <strong>the</strong> <strong>in</strong>dividual trials can beweakened: it may be assumed that <strong>the</strong>y have more than two outcomesbut such possibilities exceed <strong>the</strong> limits <strong>of</strong> this booklet.Those possibilities <strong>of</strong> weaken<strong>in</strong>g <strong>the</strong> conditions <strong>of</strong> <strong>the</strong> Poisson<strong>the</strong>orem without chang<strong>in</strong>g its conclusion lead to <strong>the</strong> Poissondistribution attach<strong>in</strong>g probabilities (4.1) to values m = 0, 1, 2, ...becom<strong>in</strong>g one <strong>of</strong> <strong>the</strong> most universal laws. Consider for example <strong>the</strong>problem <strong>of</strong> <strong>the</strong> number <strong>of</strong> refusals dur<strong>in</strong>g time T <strong>in</strong> cases <strong>of</strong>complicated systems. Suppose a system consists <strong>of</strong> n elements <strong>and</strong> p i =p i (T) is <strong>the</strong> probability <strong>of</strong> a refusal <strong>of</strong> <strong>the</strong> i-th element (<strong>and</strong> that afterrefusal <strong>the</strong> damaged element is <strong>in</strong>stantly replaced). The number <strong>of</strong>refusals is <strong>the</strong> number <strong>of</strong> successes <strong>in</strong> n trials with <strong>the</strong> i-th trial be<strong>in</strong>gconnected with <strong>the</strong> i-th element <strong>and</strong> its success means a refusal <strong>of</strong> thatelement. A given element can experience more than one refusal <strong>and</strong> itsrefusal can somewhat <strong>in</strong>fluence <strong>the</strong> refusals <strong>of</strong> <strong>the</strong> o<strong>the</strong>r elements, − all<strong>the</strong> same, if p i are sufficiently low, we expect <strong>the</strong> Poisson distributionwith <strong>the</strong> parameter λ = n p to describe <strong>the</strong> number <strong>of</strong> refusals.Deviations are only possible if <strong>the</strong> connections between <strong>the</strong> refusals<strong>of</strong> different elements are strong. Given low values <strong>of</strong> p i it is natural toexpect a l<strong>in</strong>ear dependence <strong>of</strong> p i = p i (T) on T:p ( T ) = p′T.iTheniλ = λ(T) = λ′T (4.3)<strong>and</strong> <strong>the</strong> probability <strong>of</strong> m refusals will beP(λ′T )m!m−λ′T{µ = m} ≈ e .For <strong>the</strong> probability <strong>of</strong> failure-free work dur<strong>in</strong>g time T, that is, for µ =0, we have−λ′P{µ = 0} ≈ e Twhich is <strong>the</strong> generally known exponent law for <strong>the</strong> time <strong>of</strong> failure-freework.The Poisson <strong>and</strong> <strong>the</strong> exponent laws <strong>the</strong>refore correspond to eacho<strong>the</strong>r. There occurs some harmonious correspondence that we mayhope to apply beneficially for solv<strong>in</strong>g practical problems.Our model does not allow for ag<strong>in</strong>g; to achieve that we ought toreplace <strong>the</strong> l<strong>in</strong>ear dependence (4.3) by a more complicated dependence30


λ = λ(T)with λ(T) be<strong>in</strong>g actually approximated by a function <strong>of</strong> a most simplek<strong>in</strong>d, for example by a polynomial. Its coefficients can be obta<strong>in</strong>ed byone or ano<strong>the</strong>r method, for <strong>in</strong>stance by <strong>the</strong> method <strong>of</strong> least squares.However, <strong>the</strong> number <strong>of</strong> parameters necessary to be <strong>in</strong>cluded will <strong>in</strong>this case be larger than when ag<strong>in</strong>g is not allowed for <strong>and</strong>, accord<strong>in</strong>gly,<strong>the</strong> model will enjoy less faith.In general, with or without allow<strong>in</strong>g for ag<strong>in</strong>g, it is natural to apply<strong>the</strong> Poisson law for describ<strong>in</strong>g <strong>the</strong> number <strong>of</strong> failures, only itsparameter is determ<strong>in</strong>ed <strong>in</strong> differ<strong>in</strong>g ways. The fit <strong>of</strong> <strong>the</strong> Poisson law,its agreement with <strong>the</strong> actual data should be checked by statisticaltests. If a good agreement is lack<strong>in</strong>g, it will be likely more natural tosuspect <strong>the</strong> statistical homogeneity <strong>of</strong> <strong>the</strong> data ra<strong>the</strong>r <strong>the</strong>n <strong>the</strong>applicability <strong>of</strong> <strong>the</strong> Poisson law. Only after check<strong>in</strong>g that out may weth<strong>in</strong>k about choos<strong>in</strong>g ano<strong>the</strong>r distribution for describ<strong>in</strong>g <strong>the</strong> number <strong>of</strong>failures.None <strong>of</strong> this certa<strong>in</strong>ly applies to <strong>the</strong> case <strong>of</strong> strong ties between <strong>the</strong>failure <strong>of</strong> different elements. For example, if <strong>the</strong> failure <strong>of</strong> one part <strong>of</strong><strong>the</strong> system automatically leads to a failure <strong>of</strong> its o<strong>the</strong>r part, <strong>the</strong> totalnumber <strong>of</strong> failures will be doubled. In such cases, even if <strong>the</strong> number<strong>of</strong> <strong>in</strong>itial failures follows <strong>the</strong> Poisson law, <strong>the</strong> doubled figure will not,<strong>and</strong> it is more natural to apply here <strong>the</strong> normal law. And <strong>the</strong> Poissonlaw is certa<strong>in</strong>ly only applicable when a failure really is a rare event.An excellent set <strong>of</strong> o<strong>the</strong>r examples <strong>of</strong> <strong>the</strong> application <strong>of</strong> <strong>the</strong> Poissonlaw is to be found <strong>in</strong> Feller (1950) only <strong>the</strong> <strong>the</strong>ory <strong>of</strong> rare excursions <strong>of</strong>stochastic processes can possibly be added to it. Just as any rare event,<strong>the</strong> number <strong>of</strong> such excursions beyond a sufficiently high level obeys<strong>the</strong> Poisson law.4.2. The Central Limit Theorem. The Poisson law is determ<strong>in</strong>edby a s<strong>in</strong>gle parameter λ. It is not difficult to show that λ is <strong>the</strong>expectation <strong>of</strong> a r<strong>and</strong>om variable distributed accord<strong>in</strong>g to that law.Here, we will consider an even more universal stochastic law, <strong>the</strong> socallednormal law determ<strong>in</strong>ed by two parameters, expectation <strong>and</strong>variance. It was discovered at about <strong>the</strong> same time by Gauss <strong>and</strong>Laplace who issued from absolutely different considerations.Gauss discovered that exactly <strong>in</strong> <strong>the</strong> case <strong>of</strong> a normal law <strong>of</strong>distribution <strong>of</strong> <strong>the</strong> observational errors it is most natural to choose <strong>the</strong>arithmetic mean <strong>of</strong> <strong>the</strong> <strong>in</strong>dividual measurements as <strong>the</strong> estimator <strong>of</strong> <strong>the</strong>real value <strong>of</strong> <strong>the</strong> measured magnitude. Laplace’s start<strong>in</strong>g po<strong>in</strong>t was hisdiscovery <strong>of</strong> an extremely powerful method <strong>of</strong> calculat<strong>in</strong>g <strong>the</strong>distribution <strong>of</strong> a sum <strong>of</strong> r<strong>and</strong>om variables. Gauss’ ideas are importantfor treat<strong>in</strong>g <strong>the</strong> results <strong>of</strong> measurement <strong>and</strong> were fur<strong>the</strong>r developed <strong>in</strong>ma<strong>the</strong>matical statistics as <strong>the</strong> so-called method <strong>of</strong> maximumlikelihood. Laplace’s ideas concerned <strong>the</strong> properties <strong>of</strong> arithmeticoperations on a large number <strong>of</strong> r<strong>and</strong>om variables <strong>and</strong> actuallyconstitute <strong>the</strong> foundation <strong>of</strong> modern probability <strong>the</strong>ory. In this booklet,devoted to <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability ra<strong>the</strong>r <strong>the</strong>n ma<strong>the</strong>matical statistics,we adopt <strong>the</strong> Laplacean approach 5 .Consider arbitrary <strong>in</strong>dependent r<strong>and</strong>om variables31


ξ 1 , ξ 2 , ..., ξ n , ... (4.4)tak<strong>in</strong>g <strong>the</strong> same values0, ± 1, ± 2, ..., ± mwith <strong>the</strong> same probabilitiesP{ξ i = m} = p m .Such variables are called identically distributed. Probabilities p m arearbitrary, <strong>the</strong>y only obey <strong>the</strong> condition <strong>of</strong> add<strong>in</strong>g up to 1 <strong>and</strong>sufficiently rapidly decrease as m → ± ∞. More precisely, it isnecessary that <strong>the</strong> variables ξ i have a f<strong>in</strong>ite expectation <strong>and</strong> a f<strong>in</strong>itevariance∞∞2 2Eξi= ∑ mpm = a, varξi= ∑ ( m − a) pm= σ .m=−∞m=−∞O<strong>the</strong>rwise, <strong>the</strong> set <strong>of</strong> probabilities {p m } is absolutely arbitrary. It can<strong>the</strong>refore be impossible to describe that set by any f<strong>in</strong>ite number <strong>of</strong>parameters.Laplace discovered that for a large number <strong>of</strong> terms <strong>of</strong> <strong>the</strong> set (4.4)<strong>the</strong> distribution <strong>of</strong> <strong>the</strong>ir sum becomes <strong>in</strong>comparably simpler than that<strong>of</strong> <strong>the</strong>ir separate terms so that, allow<strong>in</strong>g for some additional conditions(Gnedenko 1950, Chapter 8) 6 ,21 −xn( n)σ nP{ξ 1+ ... + ξn= m} ≈ exp[ − ],2π 2m − naxn( m) = . (4.5a,b)σ nIt is beneficial to bear <strong>in</strong> m<strong>in</strong>d <strong>the</strong> follow<strong>in</strong>g simple considerationswhich help to underst<strong>and</strong> <strong>the</strong> geometric mean<strong>in</strong>g <strong>of</strong> equality (4.5).Suppose that we desire to show graphically <strong>the</strong> distribution <strong>of</strong> eachr<strong>and</strong>om variable (4.4) <strong>and</strong> <strong>the</strong>ir sum. We choose an abscissa axis,<strong>in</strong>dicate po<strong>in</strong>ts0, ± 1, ± 2, ..., ± m ...<strong>and</strong> show probability as a rectangle with base 1, its midpo<strong>in</strong>t be<strong>in</strong>g atm, <strong>and</strong> area (that is, its height) p m . We will have some, generallyspeak<strong>in</strong>g, irregular set <strong>of</strong> rectangles. An attempt to show <strong>the</strong>distribution <strong>of</strong> <strong>the</strong> probabilities <strong>of</strong> <strong>the</strong> sum <strong>of</strong> those variables for alarge n will be unsuccessful because <strong>the</strong> possible values <strong>of</strong> that sumcan be very large <strong>and</strong> <strong>the</strong> probabilities <strong>of</strong> <strong>the</strong> separate values, small, ascan be proven. A change <strong>of</strong> <strong>the</strong> scale will be <strong>the</strong>refore needed so thatshow<strong>in</strong>g <strong>the</strong> values <strong>of</strong> <strong>the</strong> r<strong>and</strong>om variable32


(1/B n )(m − A n )<strong>in</strong>stead <strong>of</strong> <strong>the</strong> value <strong>of</strong> <strong>the</strong> sum m = (ξ 1 + ...+ ξ n ) will be necessary.And now <strong>the</strong> essence <strong>of</strong> Laplace’s discovery can be expressed by as<strong>in</strong>gle phrase: <strong>the</strong> figure should be shifted byA n = E(ξ 1 + ... + ξ n ) = na<strong>and</strong> <strong>the</strong> coefficient <strong>of</strong> <strong>the</strong> change <strong>of</strong> <strong>the</strong> scale should be equal toBn= var(ξ + ... + ξ ) = σ n.1nThe r<strong>and</strong>om variable* 1sn= [(ξ1+ ... + ξn) − E(ξ1+ ... + ξn)]var(ξ + ... + ξ )1n(4.6)is called <strong>the</strong> normed sum. Obviously,* *Esn= 0, var sn= 1<strong>and</strong> <strong>the</strong> numbers (4.5b) are <strong>the</strong> possible values <strong>of</strong> that normed sum. Letus attempt to show its probabilities as rectangles with bases1xn( m + 1) − xn( m) = ,σ n<strong>the</strong>ir midpo<strong>in</strong>ts co<strong>in</strong>cid<strong>in</strong>g with po<strong>in</strong>ts x n (m) <strong>and</strong> areas equal toprobabilitiesP s x m P m*{n=n( )} = {ξ1+ ... + ξn= }.The heights <strong>of</strong> <strong>the</strong>se rectangles should be*σ nP{ sn = xn ( m)} = σ nP{ξ 1+ ... + ξn= m}.Thus, because <strong>of</strong> (4.5a) <strong>the</strong> upper bases <strong>of</strong> <strong>the</strong>se rectangles will bealmost exactly situated along a curve described by equation21 xy = y( x) = exp[ − ](4.7)2π 2<strong>in</strong>dependent <strong>of</strong> anyth<strong>in</strong>g <strong>and</strong> calculated once <strong>and</strong> for all.A result absolutely not foreseen <strong>and</strong> almost miraculous! Disorder <strong>in</strong>probabilities p m somehow gives birth to a unique curve (4.7) whichsimply occurs by summ<strong>in</strong>g r<strong>and</strong>om variables <strong>and</strong> transform<strong>in</strong>g <strong>the</strong>scale <strong>of</strong> <strong>the</strong> figure. That is Laplace’s remarkable discovery without33


which <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability would have almost lacked orig<strong>in</strong>al (notreducible to known concepts <strong>of</strong> ma<strong>the</strong>matical analysis) contents.True, <strong>the</strong> statement (4.7) has some exceptions. For example, if all<strong>the</strong> r<strong>and</strong>om variables (4.4) only take even values (so that p m ≠ 0 onlyfor even values <strong>of</strong> m) <strong>the</strong> sum (ξ 1 + ...+ ξ n ) is also always even,whereas at odd values <strong>of</strong> m <strong>the</strong> left part <strong>of</strong> (4.5a) vanishes <strong>and</strong> thatequality is violated. This exception is actually <strong>the</strong> only one (Gnedenko1950, Chapter 8) <strong>and</strong>, for avoid<strong>in</strong>g it <strong>and</strong> because <strong>of</strong> a number <strong>of</strong> o<strong>the</strong>rconsiderations, it is preferred to formulate <strong>the</strong> central limit <strong>the</strong>orem <strong>in</strong>terms <strong>of</strong> distribution functions. Here <strong>the</strong> appropriate formulation iseffectively known to Laplace.Theorem. Let ξ 1 , ..., ξ n , ... be a sequence <strong>of</strong> <strong>in</strong>dependent identicallydistributed r<strong>and</strong>om variables hav<strong>in</strong>g a f<strong>in</strong>ite expectation a <strong>and</strong> f<strong>in</strong>itevariance σ 2 *, <strong>and</strong> suppose that sn, see (4.6), is <strong>the</strong> normed sum <strong>of</strong> thosevariables. Then, as n → ∞,x* 1n ∫−∞2xP{ s < x} → exp[ − ] dx2π 2<strong>and</strong> for − ∞ < x < ∞ <strong>the</strong> convergence is uniform for every x.This formula is still less sensitive to violations <strong>of</strong> its conditions <strong>the</strong>neven <strong>the</strong> Poisson <strong>the</strong>orem remarkable <strong>in</strong> this connection. Thedevelopment <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability is essentially l<strong>in</strong>ked with <strong>the</strong>perfection <strong>of</strong> its pro<strong>of</strong> <strong>and</strong> weaken<strong>in</strong>g <strong>of</strong> its conditions. It is possible todeny, i. e. to replace by less restrictive each <strong>of</strong> <strong>the</strong> latter without<strong>in</strong>validat<strong>in</strong>g its conclusion. Liapunov denied <strong>the</strong> identical distribution<strong>of</strong> <strong>the</strong> r<strong>and</strong>om variables <strong>and</strong> thus occurred his <strong>the</strong>orem (Gnedenko1950, Chapter 8) whereas Bernste<strong>in</strong> (1926) denied <strong>in</strong>dependence.Attempts were recently made to ab<strong>and</strong>on essentially <strong>the</strong> condition <strong>of</strong>r<strong>and</strong>omness <strong>of</strong> <strong>the</strong> variables (4.4). It is also possible to replace r<strong>and</strong>omvariables tak<strong>in</strong>g numerical values by r<strong>and</strong>om elements <strong>of</strong> some groups(<strong>and</strong> to consider <strong>the</strong> relevant group operation <strong>in</strong>stead <strong>of</strong> summ<strong>in</strong>g).Certa<strong>in</strong> success was achieved but it is too soon to discuss this subjec<strong>the</strong>re. The f<strong>in</strong>iteness <strong>of</strong> <strong>the</strong> variance can not be denied (Gnedenko &Kolmogorov 1949) s<strong>in</strong>ce <strong>the</strong> convergence to <strong>the</strong> normal law will nothold. It is only possible to weaken somewhat that condition.4.3. The normal distribution. And so, <strong>the</strong> most widely applieddistribution <strong>of</strong> probabilities is <strong>the</strong> Gauss – Laplace law whose densityis provided by formula (4.7). In o<strong>the</strong>r words, it is said that <strong>the</strong> r<strong>and</strong>omvariable ξ has a st<strong>and</strong>ard normal distribution if (practically for any) setA <strong>the</strong> equation21 xP{ξ ∈ A} = ∫ exp[ − ] dxA2π 2is valid.It is natural to consider r<strong>and</strong>om variables <strong>of</strong> <strong>the</strong> typeη = σξ + a34


along with ξ. For example, if ξ is <strong>the</strong> result <strong>of</strong> some measurement, <strong>and</strong>η is its result <strong>in</strong> ano<strong>the</strong>r system <strong>of</strong> units, it is not difficult to show that<strong>the</strong> density <strong>of</strong> distribution <strong>of</strong> η is21 ( x − a)pη ( x) = exp[ − ]2σ 2π 2σwith expectation <strong>and</strong> variance <strong>of</strong> η be<strong>in</strong>g Eη =a <strong>and</strong> varη = σ 2 . Thedistribution <strong>of</strong> <strong>the</strong> r<strong>and</strong>om variable η is called normal with parametersa <strong>and</strong> σ <strong>and</strong> denoted by N(a, σ).Most important is <strong>the</strong> follow<strong>in</strong>g <strong>the</strong>orem: If η 1 <strong>and</strong> η 2 are<strong>in</strong>dependent r<strong>and</strong>om variables hav<strong>in</strong>g distributions N(a 1 , σ 1 ) <strong>and</strong> N(a 2 ,σ 2 ), <strong>the</strong>ir sum will have distribution N(a 1 + a 2 ,σ + σ ).2 21 24.4. The De Moivre – Laplace <strong>the</strong>orem. Consider n Bernoullitrials with probability <strong>of</strong> success p <strong>in</strong> each. The number <strong>of</strong> successes µcan be represented as a sumµ = µ 1 + µ 2 + ... + µ nwhere <strong>the</strong> r<strong>and</strong>om variable µ k is <strong>the</strong> number <strong>of</strong> successes <strong>in</strong> <strong>the</strong> k-thtrial, i. e., is equal to 1 or 0 for achiev<strong>in</strong>g a success or not. S<strong>in</strong>ce alltrials are identical, µ is <strong>the</strong> sum <strong>of</strong> <strong>in</strong>dependent r<strong>and</strong>om variables <strong>and</strong><strong>the</strong> central limit <strong>the</strong>orem is applicable to it. The distribution <strong>of</strong> µ is<strong>the</strong>refore approximately normal with parametersEµ= np, var µ = npqwhich is <strong>in</strong>deed <strong>the</strong> De Moivre – Laplace <strong>the</strong>orem 7 .4.5. The application <strong>of</strong> <strong>the</strong> central limit <strong>the</strong>oremCheck<strong>in</strong>g statistical homogeneity. In Chapter 1 we have discussed atlength <strong>the</strong> statement that a scientific application <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong>probability is conditioned by check<strong>in</strong>g statistical homogeneity. Here,f<strong>in</strong>ally, we can expla<strong>in</strong> <strong>the</strong> ma<strong>in</strong> pert<strong>in</strong>ent methods.The discussion usually concerns <strong>the</strong> follow<strong>in</strong>g problem. In n 1 trials<strong>the</strong> event A occurred µ 1 times, <strong>in</strong> n 2 trials, µ 2 times. May we believethat <strong>the</strong> probability <strong>of</strong> success was <strong>the</strong> same <strong>in</strong> both series? Or, is <strong>the</strong>difference <strong>of</strong> <strong>the</strong> frequencies µ 1 /n 1 <strong>and</strong> µ 2 /n 2 sufficiently small <strong>and</strong>possible to be expla<strong>in</strong>ed by purely r<strong>and</strong>om causes?It is natural to assume that µ 1 <strong>and</strong> µ 2 are approximately normallydistributed whe<strong>the</strong>r <strong>the</strong> trials were dependent or not. If <strong>the</strong>probabilities <strong>in</strong> <strong>the</strong> series are p 1 <strong>and</strong> p 2 <strong>the</strong>nE(µ 1 /n 1 ) = p 1 , E(µ 2 /n 2 ) = p 2 ,<strong>and</strong>, if p 1 = p 2 = p, E(µ 1 /n 1 − µ 2 /n 2 ) = 0. Also, it is natural to assumethat <strong>the</strong> trials <strong>in</strong> both series are <strong>in</strong>dependent, <strong>the</strong>n <strong>the</strong> magnitude (µ 1 /n 1− µ 2 /n 2 ) should be approximately normally distributed with zeroexpectation <strong>and</strong> variance35


µ µ µ µn n n n1 2 1 2var( − ) = var( ) + var( ).1 2 1 2If <strong>the</strong> terms <strong>in</strong> <strong>the</strong> right side are known, we could have said bymeans <strong>of</strong> a table <strong>of</strong> <strong>the</strong> normal distribution whe<strong>the</strong>r <strong>the</strong> mentioneddifference can be expla<strong>in</strong>ed by purely r<strong>and</strong>om causes or not. And forcalculat<strong>in</strong>g those variances (but not at all for apply<strong>in</strong>g <strong>the</strong> central limit<strong>the</strong>orem) we have to assume that <strong>the</strong> trials <strong>in</strong> both series are<strong>in</strong>dependent, that is, that two series <strong>of</strong> Bernoulli trials were madewhereas <strong>the</strong> central limit <strong>the</strong>orem does not dem<strong>and</strong> complete<strong>in</strong>dependence. Then, if p 1 = p 2 = p (whose value is unknown),µ µ 1 1n n n n1 2var( ) + var( ) = p(1 − p)( + ).1 2 1 2It can be shown that <strong>the</strong> unknown p may be replaced here byµ + µˆp =n + n1 21 2<strong>and</strong>, assum<strong>in</strong>g that <strong>the</strong> probabilities were identical, we see thatµ µ 1 1ˆ ˆn n n n1 2ξ = ( − ) ÷ p(1 − p)( + )1 2 1 2has an approximately st<strong>and</strong>ard normal distribution. Now, ξ can also becalculated only by issu<strong>in</strong>g from numbers (µ 1 , n 1 ) <strong>and</strong> (µ 2 , n 2 ), i. e., by<strong>the</strong> known results <strong>of</strong> experiment<strong>in</strong>g. It is known that <strong>the</strong> absolutevalues <strong>of</strong> ξ exceed<strong>in</strong>g 2 or 3 are unlikely. It follows that, whenobta<strong>in</strong><strong>in</strong>g values <strong>of</strong> that order, we should conclude that ei<strong>the</strong>r <strong>the</strong>hypo<strong>the</strong>sis p 1 = p 2 does not hold or, if it does, that an unlikely eventhad taken place.How to choose between <strong>the</strong>se conclusions? Some authors th<strong>in</strong>k that<strong>the</strong> decision <strong>the</strong>ory can allegedly numerically express <strong>the</strong> risk <strong>of</strong> eachpossible choice <strong>and</strong> thus help here. However, <strong>the</strong> risk <strong>the</strong>re isexpressed by magnitudes which ei<strong>the</strong>r are senseless or <strong>in</strong> any case willnever be known to <strong>the</strong> researcher. The <strong>the</strong>ory based on a quantitativeexpression <strong>of</strong> risk is always useless except <strong>in</strong> studies <strong>of</strong> games <strong>of</strong>chance 8 . Actually, <strong>the</strong> choice between <strong>the</strong> two mentioned decisions is acomplicated procedure; at present, it is impossible to study it with<strong>in</strong><strong>the</strong> limits <strong>of</strong> a ma<strong>the</strong>matical <strong>the</strong>ory.When solv<strong>in</strong>g such problems, we have to compare somehow <strong>the</strong>importance <strong>of</strong> each possible solution should it occur wrong. Bothscientific <strong>and</strong> moral considerations denoted by <strong>the</strong> word conscienceare <strong>in</strong>volved here. Approximately similar but somewhat simpler ischeck<strong>in</strong>g <strong>the</strong> hypo<strong>the</strong>sis that <strong>the</strong> probability <strong>of</strong> success p <strong>in</strong> a givenseries <strong>of</strong> Bernoulli trials equals a given number p 0 . If it is also true fora series <strong>of</strong> n trials with µ successes, <strong>the</strong>n36


µ( − p0) ÷np0 (1 − p0)nhas an approximate st<strong>and</strong>ard normal distribution. Exactly bycalculat<strong>in</strong>g that magnitude can <strong>the</strong> hypo<strong>the</strong>sis <strong>of</strong> a male birth be<strong>in</strong>gequal to 1/2, see § 3.1, be checked (<strong>and</strong> rejected).The arc s<strong>in</strong>e transformation. I have just expounded <strong>the</strong> pr<strong>in</strong>ciples <strong>of</strong>check<strong>in</strong>g <strong>the</strong> equality <strong>of</strong> probabilities <strong>in</strong> two series <strong>of</strong> Bernoulli trials.Now, I aim at <strong>in</strong>dicat<strong>in</strong>g by an example <strong>the</strong> pert<strong>in</strong>ent convenientmethods developed <strong>in</strong> ma<strong>the</strong>matical statistics. Noth<strong>in</strong>g new <strong>in</strong>pr<strong>in</strong>ciple is here <strong>in</strong>volved, but <strong>the</strong> practical convenience is essential.As an example, I choose <strong>the</strong> so-called arc s<strong>in</strong>e transformationdiscovered by <strong>the</strong> celebrated English statistician Fisher. His idea wasvery simple: we consider some function f(µ/n) <strong>in</strong>stead <strong>of</strong> µ/n, <strong>of</strong> <strong>the</strong>frequency <strong>of</strong> success itself. We haveµ µ µf ( ) = f [( − p) + p] = f ( p) + f ′( p)( − p) + ...n n nFor large values <strong>of</strong> n, that frequency is close to p, so that we ignore<strong>the</strong> o<strong>the</strong>r terms <strong>of</strong> that expression <strong>and</strong>µ µvar f ( ) = var[ f ′( p)( − p)]=nnµ p(1 − p)′ ′nn2 2[ f ( p)] var = [ f ( p)] .Let us choose <strong>the</strong> function f <strong>in</strong> such a manner that <strong>the</strong> expression forvar f(µ/n) would not depend on <strong>the</strong> unknown parameter p. Moreprecisely, we assume that′ − =2[ f ( p)] p(1 p) 1.This is a differential equation <strong>and</strong> we may choose any <strong>of</strong> its solutionsas f(p); <strong>in</strong> particular,f(p) = 2arcs<strong>in</strong>√p.S<strong>in</strong>ce, if allow<strong>in</strong>g for <strong>the</strong> approximation made, f(µ/n) is a l<strong>in</strong>earfunction <strong>of</strong> µ <strong>and</strong>, for large values <strong>of</strong> n, <strong>the</strong> distribution <strong>of</strong> µ isapproximately normal, <strong>the</strong> expressionµ µf ( ) = 2arcs<strong>in</strong>(4.8)nnis also approximately normal. Its expectation is approximatelyf(p) = 2arcs<strong>in</strong>√p37


<strong>and</strong> variance approximately 1/n which is how we have chosen <strong>the</strong>function f <strong>and</strong> it does not <strong>the</strong>refore depend on p. It also occurs that <strong>the</strong>distribution <strong>of</strong> (4.8) is even more close to <strong>the</strong> normal that that <strong>of</strong> <strong>the</strong>number µ itself.Now let us have those two series <strong>of</strong> Bernoulli trials with n 1 , µ 1 , p 1<strong>and</strong> n 2 , µ 2 , p 2 <strong>and</strong> suppose we wish to check <strong>the</strong> equality <strong>of</strong> <strong>the</strong>probabilities. Assum<strong>in</strong>g that <strong>the</strong> two series are <strong>in</strong>dependent, <strong>the</strong>magnitude2arcs<strong>in</strong>µ µ− 2arcs<strong>in</strong>nn1 21 2is approximately normal <strong>and</strong> its expectation2arcs<strong>in</strong>p − 2arcs<strong>in</strong>p1 2vanishes if <strong>the</strong> hypo<strong>the</strong>sis is true. The variance <strong>of</strong> that r<strong>and</strong>om variableis <strong>the</strong> sum <strong>of</strong> <strong>the</strong> variances, (1/n 1 ) + (1/n 2 ), <strong>and</strong>µ µ 1 1− ÷ + (4.9)n n n n1 2[2arcs<strong>in</strong> 2arcs<strong>in</strong> ]1 2 1 2has a st<strong>and</strong>ard normal distribution N(0, 1). After calculat<strong>in</strong>g (4.9) wemay ei<strong>the</strong>r adopt or reject <strong>the</strong> hypo<strong>the</strong>sis <strong>of</strong> equal probabilities.Ma<strong>the</strong>matical statistics has plenty <strong>of</strong> such simple but veryconvenient methods; here, convenience is really atta<strong>in</strong>ed whenapply<strong>in</strong>g tables <strong>of</strong> <strong>the</strong> function 2arc s<strong>in</strong>√x <strong>in</strong>cluded <strong>in</strong> many collections<strong>of</strong> statistical tables; even a slide-rule will do.Behaviour <strong>of</strong> <strong>the</strong> sum <strong>of</strong> <strong>in</strong>dependent r<strong>and</strong>om variables. Whenconsider<strong>in</strong>g <strong>in</strong>dependent identically distributed r<strong>and</strong>om variables (4.4)with expectation <strong>and</strong> variance a <strong>and</strong> σ 2 respectively, <strong>the</strong>ir normed sumsξ + ... + ξn− na=σ n* 1ncan not be especially large at any fixed n. Thus, <strong>in</strong> case <strong>of</strong> <strong>the</strong> normaldistribution we easily f<strong>in</strong>d by means <strong>of</strong> its table thatP s ≤ =*{|n| 3} 0.997<strong>and</strong> it is almost certa<strong>in</strong> that|ξ 1 + ... + ξ n − na| ≤ 3σ√n. (4.10)True, we ought to caution readers that that statement was derived for* *any fixed n. When consider<strong>in</strong>g a set s1 ,..., sn,...we certa<strong>in</strong>ly can notstate that all <strong>of</strong> its terms were less than 3 <strong>in</strong> absolute value. The38


distribution <strong>of</strong> <strong>the</strong> maximal term |s k |, 1 ≤ k ≤ n, presents a specialproblem which we will not discuss.For any fixed n <strong>the</strong> deviation <strong>of</strong> ξ 1 + ... + ξ n from its expectation nais only possible by a magnitude <strong>of</strong> <strong>the</strong> order <strong>of</strong> √n. This means that <strong>the</strong>non-r<strong>and</strong>om magnitude na, certa<strong>in</strong>ly if a ≠ 0, plays a predom<strong>in</strong>ant partas compared with r<strong>and</strong>om deviations <strong>of</strong> <strong>the</strong> order √n. Now, divid<strong>in</strong>g(4.10) by n, we getξ + ... + ξ 3σ− ≤ (4.11)nn1n| a |which is valid with probability 0.997 (if we believe <strong>in</strong> <strong>the</strong> normaldistribution). When replac<strong>in</strong>g 3σ by 4σ or 5σ, this <strong>in</strong>equality will bevalid even with a higher probability. Given a large n, <strong>the</strong> difference <strong>in</strong><strong>the</strong> left side <strong>of</strong> (4.11) is practically certa<strong>in</strong>ly small which is <strong>the</strong>celebrated law <strong>of</strong> large numbers.It is <strong>in</strong>terest<strong>in</strong>g to dwell on <strong>the</strong> history <strong>of</strong> its pro<strong>of</strong> <strong>and</strong><strong>in</strong>terpretation. It was note long ago that <strong>the</strong> results <strong>of</strong> separateobservations, physical, meteorological, demographic or o<strong>the</strong>r, fluctuateessentially whereas <strong>the</strong> mean values <strong>of</strong> a large number <strong>of</strong> observationsreveal a remarkable stability. The first statisticians had seen herediv<strong>in</strong>e <strong>in</strong>tervention, but, as a scientific underst<strong>and</strong><strong>in</strong>g <strong>of</strong> <strong>the</strong> world wasbe<strong>in</strong>g established, that stability became a scientific fact.In <strong>the</strong> 18 th , <strong>the</strong> century <strong>of</strong> reason, ma<strong>the</strong>matics became very trusted;it was believed that <strong>the</strong> ma<strong>in</strong> laws <strong>of</strong> natural sciences <strong>and</strong> even <strong>of</strong>economics, moral philosophy <strong>and</strong> politics can be derived by thatscience. A desire to regard <strong>the</strong> stability <strong>of</strong> mean values as ama<strong>the</strong>matical <strong>the</strong>orem had been established <strong>and</strong> that op<strong>in</strong>ion persisted<strong>in</strong> <strong>the</strong> 19 th century. Exactly <strong>in</strong> that sense did Poisson <strong>in</strong>terpret <strong>the</strong>discovery <strong>of</strong> his form <strong>of</strong> <strong>the</strong> law <strong>of</strong> large numbers <strong>and</strong> he thought tha<strong>the</strong> had succeeded <strong>in</strong> prov<strong>in</strong>g that <strong>the</strong> mean <strong>of</strong> really made observationsshould be stable.Chebyshev essentially developed <strong>the</strong> ma<strong>the</strong>matical form <strong>of</strong> <strong>the</strong> law<strong>of</strong> large numbers by reduc<strong>in</strong>g its pro<strong>of</strong> to <strong>the</strong> application <strong>of</strong> <strong>the</strong>[Bienaymé –] Chebyshev <strong>in</strong>equality. His pro<strong>of</strong> can be found <strong>in</strong> anytextbook on <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability, <strong>and</strong> after him that law began tobe considered as a very simple <strong>the</strong>orem <strong>in</strong>dependent <strong>of</strong>, <strong>and</strong>expounded before <strong>the</strong> central limit <strong>the</strong>orem. Students are now eventaught to apply <strong>the</strong> <strong>in</strong>equalityP2ξ + ... + ξ σ− > ≤nnε1n[| a | ε]2for estimat<strong>in</strong>g <strong>the</strong> probability that <strong>the</strong> mean will deviate from a morethan by ε. However, such an application (<strong>of</strong> <strong>the</strong> Bienaymé –Chebyshev <strong>in</strong>equality) is absolutely absurd because <strong>the</strong> central limit<strong>the</strong>orem provides a much more precise result. True, <strong>the</strong> Chebyshevform <strong>of</strong> <strong>the</strong> law <strong>of</strong> large numbers dem<strong>and</strong>s less ma<strong>the</strong>maticalrestrictions to be imposed on <strong>the</strong> r<strong>and</strong>om variables ξ i as compared with<strong>the</strong> central limit <strong>the</strong>orem, but it is just <strong>the</strong> same practically impossible39


to check whe<strong>the</strong>r <strong>the</strong> appropriate ma<strong>the</strong>matical restrictions are met.And it is certa<strong>in</strong>ly impossible to dist<strong>in</strong>guish when only <strong>the</strong> Chebyshev<strong>the</strong>orem is valid from <strong>the</strong> case when both it <strong>and</strong> <strong>the</strong> central limit<strong>the</strong>orem are valid.The evolution <strong>of</strong> <strong>the</strong> op<strong>in</strong>ion on <strong>the</strong> natural scientific significance <strong>of</strong><strong>the</strong> law <strong>of</strong> large numbers is connected with Mises. He especially<strong>in</strong>dicated that <strong>the</strong>re can not be any ma<strong>the</strong>matic pro<strong>of</strong> that <strong>the</strong> mean <strong>of</strong><strong>the</strong> results <strong>of</strong> an experiment should be close to some number.Nowadays, we believe much less than at <strong>the</strong> time <strong>of</strong> Laplace <strong>and</strong>Poisson that <strong>the</strong> laws <strong>of</strong> <strong>the</strong> outer world can be ma<strong>the</strong>maticallyderived. There exist too many causes which can change <strong>the</strong> course <strong>of</strong>an experiment from what should have followed accord<strong>in</strong>g to ourma<strong>the</strong>matical model.For example, <strong>the</strong> conditions <strong>of</strong> all <strong>the</strong> known ma<strong>the</strong>matical <strong>the</strong>oremson <strong>the</strong> law <strong>of</strong> large numbers <strong>in</strong>clude as an assumption that (4.4) is asequence <strong>of</strong> r<strong>and</strong>om variables. Practically this means that we maydiscuss <strong>the</strong> probability <strong>of</strong> an event consist<strong>in</strong>g <strong>in</strong> that ξ 1 took a valuefrom some number set A 1 , ξ 2 , <strong>the</strong> same from A 2 , etc, so that for anysets A 1 , ..., A n <strong>the</strong> event {ξ1∈ A1,...,ξn∈ An} should be statisticallystable. However, possible sets A 1 , ..., A n are so numerous that anexperimental check <strong>of</strong> <strong>the</strong> stability <strong>of</strong> all such events is impossible.And <strong>the</strong> violation <strong>of</strong> statistical stability wholly depreciates anystochastic <strong>the</strong>orem <strong>and</strong> can be <strong>the</strong> cause <strong>of</strong> <strong>the</strong> observed violation <strong>of</strong><strong>the</strong> stability <strong>of</strong> experimental means.The natural scientific significance <strong>of</strong> <strong>the</strong> law <strong>of</strong> large numbers isnow reduced to an underst<strong>and</strong><strong>in</strong>g that when stochastic models areapplied <strong>the</strong> correspond<strong>in</strong>g <strong>the</strong>orems reflect <strong>the</strong> experimental fact <strong>of</strong> <strong>the</strong>stability <strong>of</strong> means. In Chapter 1 we <strong>in</strong>dicated that <strong>the</strong>re are manyproblems, for <strong>in</strong>stance <strong>in</strong> geology or economics (<strong>the</strong>ir examples can bemultiplied without any difficulty) <strong>in</strong> which it is senseless to discuss <strong>the</strong>statistical homogeneity <strong>of</strong> <strong>the</strong> ensemble <strong>of</strong> experiments. It is<strong>in</strong>terest<strong>in</strong>g that <strong>in</strong> such cases stability <strong>of</strong> means ra<strong>the</strong>r <strong>of</strong>ten alsopersists. We must acknowledge that we do not nowadays have anysatisfactory ma<strong>the</strong>matical explanation <strong>of</strong> <strong>the</strong> stability.In <strong>the</strong> 20 th century <strong>the</strong> study <strong>of</strong> <strong>the</strong> law <strong>of</strong> large numbers by means <strong>of</strong>a model <strong>of</strong> <strong>the</strong> space <strong>of</strong> elementary events had been essentiallyadvanced. The so-called strong law <strong>of</strong> large numbers connected withBorel <strong>and</strong> especially Kolmogorov was discovered. For expla<strong>in</strong><strong>in</strong>g itsessence recall that <strong>in</strong> <strong>the</strong> Kolmogorov model <strong>the</strong> r<strong>and</strong>om variables(4.4) are functions ξ 1 (ω), ..., ξ n (ω) considered <strong>in</strong> <strong>the</strong> space <strong>of</strong>elementary events. It is possible to consider <strong>the</strong> event, that is, <strong>the</strong> setξ (ω) + ... + ξ (ω)= , n → ∞n1n{ω : lim a}consist<strong>in</strong>g <strong>of</strong> those elementary events ω for which that limit exists <strong>and</strong>is equal to a. The <strong>the</strong>orems <strong>of</strong> <strong>the</strong> type <strong>of</strong> <strong>the</strong> strong law <strong>of</strong> largenumbers state that <strong>the</strong> probability <strong>of</strong> that set is 1 whereas <strong>the</strong> usual lawdoes not deal with that set at all, it only discusses <strong>the</strong> sets <strong>of</strong> <strong>the</strong> type40


ξ (ω) + ... + ξ (ω)n1n{ω :| − a | > ε}, ε > 0for any f<strong>in</strong>ite n <strong>and</strong> states that <strong>the</strong> probability <strong>of</strong> such sets tend tovanish as n → ∞.After consider<strong>in</strong>g ra<strong>the</strong>r subtle ma<strong>the</strong>matical examples it occurs that<strong>the</strong> strong law <strong>of</strong> large numbers is really strong: <strong>the</strong> ord<strong>in</strong>ary law iscerta<strong>in</strong>ly obeyed when <strong>the</strong> strong law is valid, but <strong>the</strong> <strong>in</strong>versestatement is not necessarily true. From <strong>the</strong> <strong>the</strong>orems concern<strong>in</strong>g <strong>the</strong>strong law we <strong>in</strong>dicate a very elegant Kolmogorov statement: for<strong>in</strong>dependent <strong>and</strong> identically distributed r<strong>and</strong>om variables <strong>the</strong> existence<strong>of</strong> expectation is sufficient for it to hold. Ma<strong>the</strong>matically <strong>in</strong>terest<strong>in</strong>g isthat <strong>the</strong> existence <strong>of</strong> variances is not dem<strong>and</strong>ed.A special ma<strong>the</strong>matical tool was needed for prov<strong>in</strong>g that <strong>the</strong>orem<strong>and</strong> <strong>in</strong> particular Kolmogorov discovered a remarkable <strong>in</strong>equality thatgoes under his name <strong>and</strong> generalizes <strong>the</strong> [Bienaymé −] Chebyshev<strong>in</strong>equality; <strong>the</strong> tool itself can certa<strong>in</strong>ly be applied <strong>in</strong> natural science.However, no such applications <strong>in</strong> which essentially more can beelicited from only <strong>the</strong> formulation <strong>of</strong> <strong>the</strong> strong law than from <strong>the</strong>usual law are discovered. This is connected with <strong>the</strong> fact that (seeChapter 2) a r<strong>and</strong>om variable ξ = ξ(ω) as a function <strong>of</strong> an elementaryevent is usually not observed; we know <strong>the</strong> value <strong>of</strong> ξ(ω), but not ωitself. We can ra<strong>the</strong>r discuss probabilities <strong>of</strong> various events. Similarly,it is somewhat senseless to discuss <strong>the</strong> observation <strong>of</strong> <strong>the</strong> limit <strong>of</strong> ξ ,we can only study ξ for a f<strong>in</strong>ite n. Those circumstances lead to anynon-ma<strong>the</strong>matical applications <strong>of</strong> <strong>the</strong> strong law be<strong>in</strong>g unlikely.To conclude <strong>the</strong> problem <strong>of</strong> <strong>the</strong> application <strong>of</strong> <strong>the</strong> central limit<strong>the</strong>orem we will dwell on <strong>the</strong> statement made by no o<strong>the</strong>r but <strong>the</strong>undoubtedly great scientist <strong>of</strong> genius, Laplace, which for us is only<strong>in</strong>terest<strong>in</strong>g as be<strong>in</strong>g a psychological curious historical example. Hediscovered <strong>the</strong> mentioned above fact that for large values <strong>of</strong> n <strong>the</strong> sumξ 1 + ... + ξ n behaves approximately like <strong>the</strong> non-r<strong>and</strong>om magnitude nawhereas <strong>the</strong> r<strong>and</strong>om variations have order √n so that with an<strong>in</strong>creas<strong>in</strong>g n that non-r<strong>and</strong>om magnitude will f<strong>in</strong>ally prevail over thosevariations. It follows that if a > 0, <strong>the</strong> sum <strong>of</strong> <strong>the</strong> r<strong>and</strong>om variables willalso become positive.Without any explanation Laplace <strong>in</strong>fers that a colony situated faracross <strong>the</strong> sea will f<strong>in</strong>ally achieve <strong>in</strong>dependence. He evidentlyimag<strong>in</strong>ed <strong>the</strong> strive for <strong>in</strong>dependence as some non-r<strong>and</strong>om factorwhose action was ga<strong>the</strong>r<strong>in</strong>g force with time whereas <strong>the</strong> oppositeefforts <strong>of</strong> <strong>the</strong> metropolitan country as r<strong>and</strong>om variables with zeroexpectation. The first assumption is sufficiently underst<strong>and</strong>able but <strong>the</strong>second one is very strange. However, <strong>in</strong> <strong>the</strong> long run Laplace was <strong>in</strong><strong>the</strong> right: colonies did free <strong>the</strong>mselves but we can not consider <strong>the</strong>effort to hold on to <strong>the</strong>m as a r<strong>and</strong>om variable, it does not possessstatistical stability. In <strong>the</strong> 19 th century <strong>the</strong>re was noth<strong>in</strong>g special <strong>in</strong>send<strong>in</strong>g out an expeditionary corps for putt<strong>in</strong>g down a rebellion <strong>in</strong> acolony but <strong>in</strong> our days that would have led to vigorous protests <strong>in</strong> <strong>the</strong>metropolitan country as well.41


A scientist, discover<strong>in</strong>g someth<strong>in</strong>g remarkable (as <strong>the</strong> Laplaceancentral limit <strong>the</strong>orem) evidently can not keep from apply<strong>in</strong>g iteverywhere. For example, <strong>in</strong> our time Wiener proposed to apply <strong>the</strong><strong>the</strong>ory <strong>of</strong> extrapolation <strong>of</strong> stochastic processes for forecast<strong>in</strong>g <strong>the</strong> route<strong>of</strong> an airplane under anti-aircraft fire. That route however is not astochastic process, or at least not such process for which <strong>the</strong>re exists a<strong>the</strong>ory <strong>of</strong> extrapolation <strong>and</strong> Wiener’s proposal was senseless.Evidently, science is collectively created; true, it is not beyondquestion whe<strong>the</strong>r an essential discovery can be made collectively, or isit necessary to have an outst<strong>and</strong><strong>in</strong>g scientist <strong>in</strong> a collective with itso<strong>the</strong>r members work<strong>in</strong>g <strong>in</strong> essence as his assistants. But what isundoubtedly a collective process is <strong>the</strong> delivery <strong>of</strong> science from <strong>the</strong>rubbish which some scientists usually adduce to <strong>the</strong>ir real discoveries.4.6. When <strong>the</strong> central limit <strong>the</strong>orem can not be applied? That<strong>the</strong>orem is one <strong>of</strong> <strong>the</strong> reasons for believ<strong>in</strong>g that observational resultsusually obey <strong>the</strong> normal distribution. If only <strong>the</strong>y, ξ 1 , ..., ξ n , are known,but not <strong>the</strong> parameters <strong>of</strong> <strong>the</strong> correspond<strong>in</strong>g law, we are able todeterm<strong>in</strong>e <strong>the</strong>m approximately by appropriate methods. Indeed,accord<strong>in</strong>g to <strong>the</strong> law <strong>of</strong> large numbersa = Eξ i ≈ (1/n) (ξ 1 +... + ξ n ) = ξ .It can be shown that1σ ∑ (ξ ξ) .−n2 2 2≈i− = sn 1 i=1The <strong>the</strong>ory <strong>of</strong> errors allows to determ<strong>in</strong>e <strong>the</strong> precision <strong>of</strong> thoseapproximate values.In general, <strong>the</strong> observations are ra<strong>the</strong>r well describable by <strong>the</strong>normal law thus determ<strong>in</strong>ed. In o<strong>the</strong>r words ifF(x) = P{ ξ i < x},N(x, ξ , s) be<strong>in</strong>g those probabilities calculated accord<strong>in</strong>g to <strong>the</strong> normaldistribution, <strong>the</strong>nF(x) ≈ N(x, ξ , s).However, this approximate equality is sometimes very perceptivelyviolated. It happens when <strong>the</strong> values <strong>of</strong> x are such that F(x) is near 0 or1, − that its so-called tail areas are <strong>in</strong>volved.Let us beg<strong>in</strong> by consider<strong>in</strong>g why those areas are practicallysignificant <strong>in</strong> a special way. Suppose we <strong>in</strong>tend to build some tallstructure which will have to withst<strong>and</strong> high w<strong>in</strong>ds (or, if you wish, aspillway which has to pass spr<strong>in</strong>g floods, etc). We desire to reckonwith such w<strong>in</strong>d velocities that happen sufficiently rarely, once <strong>in</strong> acentury, say. But how are we to f<strong>in</strong>d out that velocity? Or, if ξ(t) is thatvelocity at moment t, we ought to <strong>in</strong>dicate such a number x, that42


P{max ξ(t) ≥ x} = 0.01, 0 ≤ t ≤ 1where t is measured <strong>in</strong> years <strong>and</strong> <strong>the</strong> left part <strong>of</strong> <strong>the</strong> <strong>in</strong>equality is <strong>the</strong>maximal yearly w<strong>in</strong>d velocity.Suppose that we know <strong>the</strong> values ξ 1 , ξ 2 , ..., ξ n <strong>of</strong> <strong>the</strong> maximalvelocity dur<strong>in</strong>g <strong>the</strong> first, <strong>the</strong> second, ..., n-th year dur<strong>in</strong>g whichmeteorological observations were made. However, w<strong>in</strong>d velocities hadnot been recorded cont<strong>in</strong>uously but only several times a day, so thatthose maximal yearly velocities are <strong>in</strong> essence unknown. For <strong>the</strong> timebe<strong>in</strong>g, let us never<strong>the</strong>less abstract ourselves from this extremelyessential difficulty.And so, we have those observations <strong>of</strong> <strong>the</strong> r<strong>and</strong>om variable ξ, <strong>the</strong>maximal yearly w<strong>in</strong>d velocity, <strong>and</strong> we wish to assign an x such thatP{ξ ≥ x} = 0.01. (4.11)Had <strong>the</strong> number n been very large, we would have been obliged toselect such an x that about a hundredth part <strong>of</strong> <strong>the</strong> ξ i will be larger thanit. The trouble, however, is that n, <strong>the</strong> number <strong>of</strong> years dur<strong>in</strong>g whichobservations are available, is much less than 100. Then, if x is suchthat (4.11) is fulfilled, that is,P{ξ i ≥ x} = 0.01 for each i,<strong>the</strong> number <strong>of</strong> variables ξ i larger than x will obey <strong>the</strong> Poisson law withparameter λ = 0.01n < 1. It will follow that most likely all <strong>of</strong> our ξ i willbe less than x so that we are only able to say that x should be largerthan each <strong>of</strong> <strong>the</strong> ξ i ′s with no upper boundary available.Therefore, we are tempted to smooth our ξ 1 , ..., ξ n by some law, forexample by <strong>the</strong> normal law N( x; ξ, s ) <strong>and</strong> determ<strong>in</strong>e x from equationN( x; ξ, s ) = 1− 0.01 = 0.99.Or, we will propose to identify <strong>the</strong> tail areas <strong>of</strong> <strong>the</strong> unknown functionF(x) with those <strong>of</strong> <strong>the</strong> normal law.We turn <strong>the</strong> readers’ attention to <strong>the</strong> fact that such a procedureshould not be trusted ei<strong>the</strong>r when apply<strong>in</strong>g <strong>the</strong> normal, or any o<strong>the</strong>rlaw, <strong>and</strong> that <strong>the</strong>re exist both <strong>the</strong>oretical grounds <strong>and</strong> considerationsbased on statistical experiments for that <strong>in</strong>ference. Theoretical groundsconsist <strong>in</strong> that <strong>the</strong> central limit <strong>the</strong>orem only states that <strong>the</strong> difference*between <strong>the</strong> exact distribution function P{ sn< x}<strong>and</strong> <strong>the</strong> normal lawis small:P s x x*{n< } − N( ) → 0.For example, if that probability P = 0.95, N(x) = 0.99 <strong>and</strong> <strong>the</strong>difference is only 0.04 which is sufficiently small. However, <strong>the</strong>relative error*[1 − P{ sn< x}] ÷ [1 − N( x)] = 400%43


is very large. It is not <strong>in</strong>different that actuallyP s*{nx}≥ = 0.05 so that*<strong>the</strong> event { sn≥ x}occurs once <strong>in</strong> 20 cases (once <strong>in</strong> 20 years, so to say)whereas by means <strong>of</strong> <strong>the</strong> normal distribution we found out that ithappens once <strong>in</strong> a hundred cases (once <strong>in</strong> a century, so to say). Westress that <strong>the</strong> central limit <strong>the</strong>orem does not state that*[1 P{ snx}] [1 N( x)] 1− < ÷ − → (4.12)uniformly for every x, <strong>and</strong> such a conclusion is actually wrong.Thus, <strong>in</strong> <strong>the</strong> doma<strong>in</strong> <strong>of</strong> probabilities close to 1 (<strong>and</strong> to 0) <strong>the</strong>application <strong>of</strong> <strong>the</strong> normal distribution can lead (<strong>and</strong> as a rule actuallyleads) to a large relative error whereas accord<strong>in</strong>g to <strong>the</strong> central limit<strong>the</strong>orem <strong>the</strong> absolute error will be small. In particular, it should beborne <strong>in</strong> m<strong>in</strong>d that <strong>the</strong> equalityPξ + ... + ξ 3σnn1n{| − a | ≤ } = 0.997applied <strong>in</strong> § 4.5 is somewhat tentative. Instead <strong>of</strong> 0.997 values 0.990 or0.980 can easily happen. Only when n is very large will 0.997 actuallyoccur.The ratio <strong>in</strong> <strong>the</strong> left side <strong>of</strong> (4.12) is stochastically studied by means<strong>of</strong> <strong>the</strong> so-called <strong>the</strong>orems <strong>of</strong> large deviations (Feller 1966). Theirpractical significance is however <strong>in</strong>sufficiently clear. Incidentally, <strong>the</strong>y<strong>in</strong>dicate that <strong>the</strong> result will not be better if o<strong>the</strong>r frequently occurr<strong>in</strong>gdistributions, for example, <strong>the</strong> Pearson curves, are applied <strong>in</strong>stead <strong>of</strong><strong>the</strong> normal law.As to <strong>the</strong> available statistical experience <strong>of</strong> work<strong>in</strong>g with <strong>the</strong> tailareas <strong>of</strong> distributions, it shows that <strong>the</strong>ir behaviour is irregular. Theviolation <strong>of</strong> statistical homogeneity <strong>in</strong>fluenc<strong>in</strong>g <strong>the</strong> outcome <strong>of</strong>separate trials possibly especially concerns those areas. In such cases<strong>the</strong> attempts <strong>of</strong> describ<strong>in</strong>g <strong>the</strong> trials by statistical methods are hopeless.The study <strong>of</strong> <strong>the</strong> values <strong>of</strong> w<strong>in</strong>d velocities possibly occurr<strong>in</strong>g once <strong>in</strong>a century becomes complicated also because maximal yearly valuesare meant. If <strong>the</strong> values <strong>of</strong> those velocities at given moments arenaturally assumed to be normally distributed, that maximal value willbe naturally considered by means <strong>of</strong> some distribution <strong>of</strong> extremevalues. However, <strong>the</strong>se latter are only derived for <strong>in</strong>dependentmagnitudes <strong>and</strong> are <strong>the</strong>refore unable to allow for a gradual <strong>in</strong>crease <strong>of</strong>w<strong>in</strong>d velocities under certa<strong>in</strong> meteorological conditions. In addition,<strong>the</strong> <strong>the</strong>ory <strong>of</strong> extreme values itself is <strong>of</strong>ten applied at an essentialstretch. Recall also <strong>the</strong> lack <strong>of</strong> cont<strong>in</strong>uous records <strong>of</strong> w<strong>in</strong>d velocities<strong>and</strong> you will be able to say absolutely for sure that nowadays <strong>the</strong>reexist no scientific method <strong>of</strong> f<strong>in</strong>d<strong>in</strong>g out how strong can <strong>the</strong> w<strong>in</strong>d beonce <strong>in</strong> a century. The designers should f<strong>in</strong>d some o<strong>the</strong>r method forstat<strong>in</strong>g how reliable are <strong>the</strong>ir build<strong>in</strong>gs.Notes44


1. This example <strong>and</strong> considerations perta<strong>in</strong><strong>in</strong>g to medical statistics below arecerta<strong>in</strong>ly <strong>in</strong> order. However, it is <strong>in</strong>structive that Soviet authors apparently avoidedillustrations concern<strong>in</strong>g touchy social statistics. O. S.2. A draw<strong>in</strong>g <strong>of</strong> lots described <strong>in</strong> <strong>the</strong> Talmud (Sheyn<strong>in</strong> 1998) shows that <strong>the</strong>participants doubted <strong>the</strong> irrelevance <strong>of</strong> <strong>the</strong> order <strong>of</strong> draw<strong>in</strong>g.Laplace (1812/1886, p. 413) was apparently <strong>the</strong> first to note that prelim<strong>in</strong>arydraw<strong>in</strong>gs tend to equalize chances <strong>of</strong> <strong>the</strong> participants. O. S.3. Sometimes experts have to apply subjective probability for various estimations.The same apparently may be said about jurors. Jakob Bernoulli, <strong>in</strong> his ArsConject<strong>and</strong>i, <strong>in</strong>troduced non-additive subjective probabilities. He could haveborrowed that idea from <strong>the</strong> scholastic <strong>the</strong>ory <strong>of</strong> probabilism accord<strong>in</strong>g to which <strong>the</strong>op<strong>in</strong>ion <strong>of</strong> each Fa<strong>the</strong>r <strong>of</strong> <strong>the</strong> Church was considered probable. O. S.4. The author’s conclusion is too harsh. Laplace <strong>and</strong> Poisson apparently onlyexam<strong>in</strong>ed <strong>the</strong> ideal case; <strong>the</strong> former mentioned this restriction only <strong>in</strong> pass<strong>in</strong>g, <strong>the</strong>latter did not at all. Their work concerned general recommendations, for example,about <strong>the</strong> needed number <strong>of</strong> jurors. Dur<strong>in</strong>g <strong>the</strong> latest few decades <strong>the</strong> <strong>in</strong>terest <strong>in</strong>stochastic studies <strong>of</strong> <strong>the</strong> adm<strong>in</strong>istration <strong>of</strong> justice has been revived, although muchmore stress is now laid on <strong>in</strong>terpret<strong>in</strong>g background <strong>in</strong>formation (e. g., on estimat<strong>in</strong>g<strong>the</strong> number <strong>of</strong> possible perpetrators).Laplace considered <strong>the</strong> juror’s mistake (see somewhat below) accord<strong>in</strong>g to <strong>the</strong>Bayesian approach <strong>and</strong> apparently only as a first approximation. See his actualunderst<strong>and</strong><strong>in</strong>g <strong>of</strong> that po<strong>in</strong>t elsewhere (Laplace 1812/1886, p. 523). Gelf<strong>and</strong> &Solomon (1973, p. 273) somewhat s<strong>of</strong>tened <strong>the</strong> issue <strong>of</strong> <strong>the</strong> <strong>in</strong>terdependence <strong>of</strong>jurors. O. S.5. The author cited <strong>the</strong> first Gauss’ justification <strong>of</strong> <strong>the</strong> pr<strong>in</strong>ciple <strong>of</strong> least squares(which he later ab<strong>and</strong>oned). Gauss arrived at <strong>the</strong> normal distribution by assum<strong>in</strong>g, <strong>in</strong>part, that <strong>the</strong> arithmetic mean was <strong>the</strong> best estimator <strong>of</strong> a set <strong>of</strong> measurements.Incidentally, <strong>the</strong> true value mentioned by <strong>the</strong> author has been later understood as <strong>the</strong>limit <strong>of</strong> <strong>the</strong> appropriate arithmetic mean (Sheyn<strong>in</strong> 2007). O. S.6. The author could have stressed that a rigorous pro<strong>of</strong> <strong>of</strong> <strong>the</strong> central limit <strong>the</strong>oremwas only due to Liapunov <strong>and</strong> Markov, <strong>the</strong>n to Chebyshev. I also note (Zolotarev1999, p. 794) that that <strong>the</strong>orem is now understood <strong>in</strong> a somewhat more general sense(as <strong>the</strong> appearance <strong>of</strong> <strong>the</strong> normal distribution or its analogues). O. S.7. It is <strong>in</strong> order to note additionally that De Moivre proved his <strong>the</strong>orem (<strong>the</strong> firstpro<strong>of</strong> <strong>of</strong> <strong>the</strong> most simple case <strong>of</strong> <strong>the</strong> central limit <strong>the</strong>orem) not at all as <strong>the</strong> authordid. O. S.8. The author did not, however, mention any such <strong>the</strong>ory. O. S.BibliographyArnold V. I. (1968), Lektsii po Klassicheskoi Mekhanike (Lectures on ClassicalMechanics), pts 1 – 2. Moscow.Bernste<strong>in</strong> S. N. (1926), Sur l’extension du théorème limite du calcul desprobabilités aux sommes de quantités dépendantes. Math. Annalen, Bd. 97, pp. 1 –59.Cournot A. A. (1843), Exposition de la théorie des chances et des probabilités.Paris, 1984.Feller W. (1950, 1966), Introduction to <strong>Probability</strong> Theory <strong>and</strong> Its Applications.New York, vol. 1, 1950; vol. 2, 1966. Later editions available.Feynman R. P., Leighton R. B., S<strong>and</strong>s M. (1963), Lectures <strong>in</strong> Physics, vol. 1, pt.1. English – German edition. München – Wien − Read<strong>in</strong>g (Mass.), 1974.Gelf<strong>and</strong> A. E., Solomon H. (1973), A study <strong>of</strong> Poisson’s models for jury verdicts<strong>in</strong> crim<strong>in</strong>al <strong>and</strong> civil trials. J. Amer. Stat. Assoc., vol. 68, pp. 271 – 278.Gnedenko B. V. (1950, Russian), Theory <strong>of</strong> <strong>Probability</strong>. Moscow, 1969, 1973.Gnedenko B. V., Kolmogorov A. N. (1949, Russian), Grenzverteilungen vonSummen unabhängiger Zufallgrößen. Berl<strong>in</strong>, 1959.Kolmogorov A. N. (1933, German), Foundations <strong>of</strong> <strong>the</strong> Theory <strong>of</strong> <strong>Probability</strong>.New York, 1950, 1956.Laplace P. S. (1812), Théorie analytique des probabilités. Oeuvr. Compl., t. 7.Paris, 1886.Mises R. von (1928, German), <strong>Probability</strong>, <strong>Statistics</strong> <strong>and</strong> Truth. New York, 1981.45


Poisson S.-D. (1837), Recherches sur la probabilité des jugements en matièrecrim<strong>in</strong>elle et en matière civile etc. Paris, 2003.Sheyn<strong>in</strong> O. (1998), Statistical th<strong>in</strong>k<strong>in</strong>g <strong>in</strong> <strong>the</strong> Bible <strong>and</strong> <strong>the</strong> Talmud. Annals <strong>of</strong>Sci., vol. 55, pp. 185 – 198.--- (2007), The true value <strong>of</strong> a measured constant <strong>and</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> errors. Hist.Scientiarum, vol. 17, pp. 38 – 48.Zolotarev V. M. (1999), Central limit <strong>the</strong>orem. In Prokhorov Yu. V., Editor,Veroiatnost i Matematicheskaia Statistika. Enziklopedia (<strong>Probability</strong> <strong>and</strong> Math. Stat.Enc.). Moscow, pp. 794 – 796.46


IIV. N. Tutubal<strong>in</strong>Treatment <strong>of</strong> Observational SeriesStatisticheskaia Obrabotka Riadov Nabliudenii. Moscow, 1973IntroductionFacts are known to be <strong>the</strong> breath <strong>of</strong> <strong>the</strong> scholar’s life. In our century<strong>of</strong> exact scientific methods, observation usually means measure, <strong>and</strong>facts which we have to deal with, are as a rule expressed <strong>in</strong> numbers.In any scientific establishment you will be shown long series <strong>of</strong>numbers also represented by graphs drawn by coloured pencils onsquared paper. All <strong>of</strong> <strong>the</strong>m are observational series. What benefit can4we elicit <strong>of</strong> such coloured splendour whose collection dem<strong>and</strong>edmany long years <strong>of</strong> efforts by many authors?Observational series <strong>of</strong>ten lead to some evident conclusion. Thus,after <strong>the</strong> <strong>in</strong>troduction <strong>of</strong> antibiotics <strong>in</strong>to medical practice, mortalityfrom most <strong>in</strong>fectious diseases sharply decl<strong>in</strong>ed, but no ma<strong>the</strong>maticaltreatment for such conclusions is necessary: <strong>the</strong> result speaks for itself.In o<strong>the</strong>r cases, however, conclusions can be not so unquestionable, <strong>and</strong>we have to apply statistical treatment <strong>and</strong> attempt to make <strong>the</strong>m morereliable by ma<strong>the</strong>matical methods.It is important to imag<strong>in</strong>e that <strong>in</strong> many cases <strong>the</strong> statistical treatmentis beneficial but that perhaps even more <strong>of</strong>ten it is useless <strong>and</strong>sometimes even harmful s<strong>in</strong>ce it prompts us to make wrongconclusions. Thus, antibiotics are useless <strong>in</strong> cases <strong>of</strong> viral <strong>in</strong>fection.This booklet deals with <strong>in</strong>stances <strong>in</strong> which statistical treatment isscientifically justified.1. Two Ma<strong>in</strong> Ma<strong>the</strong>matical Models <strong>of</strong> Observational Series1.1. Why is statistical treatment needed? As stated <strong>in</strong> <strong>the</strong>Introduction, it can be not necessary at all. One more such exampleconcerns <strong>the</strong> reliability <strong>of</strong> mach<strong>in</strong>ery. Suppose we discovered somepreventive measure that obviously lowers <strong>the</strong> number <strong>of</strong> failures. Ourobservational series (for example, <strong>the</strong> number <strong>of</strong> failures over someyears) certa<strong>in</strong>ly confirms <strong>the</strong> efficacy <strong>of</strong> our f<strong>in</strong>d<strong>in</strong>g <strong>and</strong> we may besatisfied. Human nature, however, is <strong>in</strong>cessantly wish<strong>in</strong>g somewhatbetter; s<strong>in</strong>ce <strong>the</strong>re are less failures, we will wish to have none <strong>of</strong> <strong>the</strong>mat all, so we propose ano<strong>the</strong>r development <strong>and</strong> desire to confirm itsefficacy by show<strong>in</strong>g that <strong>the</strong> number <strong>of</strong> yearly failures will lower stillmore.You can guarantee that this will not be so easy. When <strong>the</strong> number <strong>of</strong>yearly failures is small, it will be noticeably <strong>in</strong>fluenced by r<strong>and</strong>omcauses. This does not yet mean that it can be studied by purelystatistical methods, because <strong>the</strong>ir applicability dem<strong>and</strong>s <strong>the</strong> probablylack<strong>in</strong>g statistical homogeneity [i]. However, such models allow toreach some important conclusions which we need to bear <strong>in</strong> m<strong>in</strong>d. Inaddition, once a good technological result is already achieved, <strong>and</strong> westrive for a still better outcome, statistical homogeneity occurs ra<strong>the</strong>r<strong>of</strong>ten.47


We will <strong>the</strong>refore assume that a stochastic model for <strong>the</strong> number <strong>of</strong>failures is valid <strong>and</strong> consider <strong>the</strong> check <strong>of</strong> efficacy <strong>of</strong> <strong>the</strong> <strong>in</strong>novation.When recogniz<strong>in</strong>g stochastic methods <strong>in</strong> general it is very natural toacknowledge <strong>the</strong> Poisson distribution <strong>of</strong> rare events as well, i. e., toapply <strong>the</strong> formulakλP{µ = k}= ek!−λ<strong>in</strong> which λ is <strong>the</strong> mean yearly number <strong>of</strong> failures. Suppose we have<strong>in</strong>troduced <strong>the</strong> <strong>in</strong>novation at <strong>the</strong> beg<strong>in</strong>n<strong>in</strong>g <strong>of</strong> a year <strong>and</strong> that dur<strong>in</strong>gthat year no failures have occurred whereas <strong>the</strong> mean number <strong>of</strong> <strong>the</strong>mfor <strong>the</strong> previous years was 2. May we conclude that <strong>the</strong> newdevelopment was effective?That number, 2, was derived from previous statistical data <strong>and</strong> itdoes not necessarily co<strong>in</strong>cide with <strong>the</strong> real value <strong>of</strong> λ, but for <strong>the</strong> timebe<strong>in</strong>g we will disregard this circumstance. And so, λ = 2. Then <strong>the</strong>probability <strong>of</strong> a purely r<strong>and</strong>om lack <strong>of</strong> failures, or <strong>of</strong> µ = 0, will beP{µ = 0} = e −2 ≈ 1/7.Therefore, if recogniz<strong>in</strong>g <strong>the</strong> <strong>in</strong>novation’s efficacy, <strong>and</strong> award<strong>in</strong>gprizes to its <strong>in</strong>ventors, <strong>the</strong> loss <strong>of</strong> money will have probability 1/7.Thus, 1/7 <strong>of</strong> all <strong>the</strong> employees propos<strong>in</strong>g someth<strong>in</strong>g useless, forexample, perfum<strong>in</strong>g <strong>the</strong> mach<strong>in</strong>ery, will get prizes. The trouble is notso much that <strong>the</strong> money will be lost, but ra<strong>the</strong>r that absolutely falseviewpo<strong>in</strong>ts will be accepted. And so, perfum<strong>in</strong>g <strong>of</strong> mach<strong>in</strong>ery isentered <strong>in</strong> search eng<strong>in</strong>es which do not dist<strong>in</strong>guish between truth <strong>and</strong>rubbish <strong>and</strong> <strong>the</strong>refore f<strong>in</strong>d its way <strong>in</strong>to general practice. Next year 1/7<strong>of</strong> those who applied that method will once more be happy <strong>and</strong> publishpert<strong>in</strong>ent rapturous papers with <strong>the</strong> unlucky 6/7 keep<strong>in</strong>g silencebecause papers on setbacks can not be written 1 . That process<strong>in</strong>tensifies as an avalanche; chairs <strong>of</strong> perfum<strong>in</strong>g are established atuniversities, conferences organized, dissertations <strong>and</strong> textbookscompiled.Such a picture although really sad is not a po<strong>in</strong>tless abstraction s<strong>in</strong>cesome pert<strong>in</strong>ent examples are known, <strong>and</strong> we will provide some <strong>in</strong> <strong>the</strong>sequel. Considerations <strong>of</strong> that picture compels us, as we see it, toestimate <strong>in</strong> a new manner <strong>the</strong> merits <strong>of</strong> real science <strong>of</strong> that wonderfulachievement, <strong>of</strong> collective <strong>in</strong>tellect. Sciences <strong>of</strong> perfum<strong>in</strong>g do emergenow <strong>and</strong> <strong>the</strong>n, <strong>and</strong> even <strong>of</strong>ten, flourish (<strong>the</strong> more numerous are thoseparticipat<strong>in</strong>g <strong>the</strong> more reports about successes are made s<strong>in</strong>ce 1/7 <strong>of</strong><strong>the</strong>m will become yearly successful) but do not live long.Someone will always destroy <strong>the</strong>m, <strong>and</strong> only <strong>the</strong> really valuablesurvives. The part played by stochastic methods <strong>in</strong> that selfpurification<strong>of</strong> science is far from be<strong>in</strong>g <strong>the</strong> least important, althoughto declare that its role is exclusive will be nonsensical. We should beable to say whe<strong>the</strong>r <strong>the</strong> observed outcome can have been purelyr<strong>and</strong>om 2 .However, just as any o<strong>the</strong>r science, ma<strong>the</strong>matical statistics can haveits own branches treat<strong>in</strong>g perfum<strong>in</strong>g. We will consider <strong>the</strong> general48


structure <strong>of</strong> statistical methods, discuss what is certa<strong>in</strong> <strong>and</strong> whattentative <strong>the</strong>re <strong>and</strong> on what premises are <strong>the</strong>y founded.1.2. The part played by ma<strong>the</strong>matical models. Any statisticaltreatment must be preceded by a ma<strong>the</strong>matical model <strong>of</strong> <strong>the</strong>phenomenon studied stat<strong>in</strong>g which magnitudes are r<strong>and</strong>om, which not;which are dependent, <strong>and</strong> which not, etc. Sometimes you willencounter a delusion that tells you that if any magnitude is notdeterm<strong>in</strong>ate (if its values can not be precisely predicted), it may beconsidered r<strong>and</strong>om. This is completely wrong because r<strong>and</strong>omnessdem<strong>and</strong>s statistical stability. Therefore, <strong>in</strong>determ<strong>in</strong>ate behaviour is notgenerally speak<strong>in</strong>g, r<strong>and</strong>omness; or, if you wish, <strong>in</strong> addition todeterm<strong>in</strong>ate <strong>and</strong> r<strong>and</strong>om <strong>the</strong>re exist <strong>in</strong>determ<strong>in</strong>ate magnitudes whichwe do not know how to deal with.A ma<strong>the</strong>matical model can <strong>in</strong>clude ei<strong>the</strong>r determ<strong>in</strong>ate or r<strong>and</strong>ommagnitudes, or both, but, as <strong>of</strong> today, not those last mentioned. The art<strong>of</strong> choos<strong>in</strong>g a ma<strong>the</strong>matical model <strong>the</strong>refore consists <strong>in</strong> approximatelyrepresent<strong>in</strong>g <strong>the</strong> <strong>in</strong>determ<strong>in</strong>ate magnitudes appear<strong>in</strong>g practicallyalways as ei<strong>the</strong>r determ<strong>in</strong>ate or r<strong>and</strong>om. It is also necessary that <strong>the</strong>values <strong>of</strong> <strong>the</strong> determ<strong>in</strong>ate magnitudes or <strong>the</strong> distributions <strong>of</strong> <strong>the</strong>probabilities <strong>of</strong> <strong>the</strong> r<strong>and</strong>om variables be derivable from <strong>the</strong>experimental material at h<strong>and</strong> (or available <strong>in</strong> pr<strong>in</strong>ciple).Let us return to <strong>the</strong> determ<strong>in</strong>ation <strong>of</strong> <strong>the</strong> efficacy <strong>of</strong> a newpreventive measure. We have an observational seriesµ 1 , µ 2 , ..., µ n , µ (1.1)where µ i are <strong>the</strong> numbers <strong>of</strong> failures for <strong>the</strong> previous years <strong>and</strong> µ , <strong>the</strong>same for <strong>the</strong> year when <strong>the</strong> <strong>in</strong>novation is be<strong>in</strong>g tested. Where is <strong>the</strong>ma<strong>the</strong>matical model here? In case <strong>of</strong> rare failures it is ra<strong>the</strong>rreasonable to assume that <strong>the</strong> series (1.1) is composed <strong>of</strong> r<strong>and</strong>omvariables. However, when <strong>in</strong>troduc<strong>in</strong>g that term, we oblige ourselvesto state <strong>the</strong> statistical ensemble <strong>of</strong> experiments <strong>in</strong> which <strong>the</strong> variable isrealized. Two paths are open: ei<strong>the</strong>r we believe that <strong>the</strong> number <strong>of</strong>failures before <strong>the</strong> <strong>in</strong>novation was implemented are realizations <strong>of</strong> ar<strong>and</strong>om variable, or we imag<strong>in</strong>e <strong>the</strong> results <strong>of</strong> many sets <strong>of</strong> mach<strong>in</strong>eryidentical to our set work<strong>in</strong>g under <strong>the</strong> same conditions. In <strong>the</strong> first, butnot necessarily <strong>in</strong> <strong>the</strong> second case <strong>the</strong> magnitudesµ 1 , µ 2 , ..., µ n (1.2)ought to be identically distributed. Or, assum<strong>in</strong>g a Poisson distribution,we have <strong>in</strong> <strong>the</strong> first caseEµ 1 = Eµ 2 = ... = Eµ n = λ (1.3)<strong>and</strong>, <strong>in</strong> <strong>the</strong> second case we may assume thatEµ 1 = λ 1 , Eµ 2 = λ 2 , ..., Eµ n = λ nwhere49


λ 1 , λ 2 , ..., λ n (1.4)can differ.Theoretically, <strong>the</strong> second case is more general <strong>and</strong> <strong>the</strong>refore, at aglance, more <strong>in</strong>vit<strong>in</strong>g, but we will see now that it does not lead toanyth<strong>in</strong>g <strong>and</strong> should be left aside. Indeed, we have to know <strong>the</strong> value<strong>of</strong> <strong>the</strong> Poisson parameter λ = Eµ for <strong>the</strong> number <strong>of</strong> yearly failuresdur<strong>in</strong>g <strong>the</strong> test <strong>of</strong> <strong>the</strong> <strong>in</strong>novation had it been <strong>in</strong>effective. However, if<strong>the</strong>re is no connection between <strong>the</strong> numbers (1.4), this parameter is notat all l<strong>in</strong>ked with our observations (1.2). And so, we are unable todeterm<strong>in</strong>e λ . Then, when estimat<strong>in</strong>g (1.4) we should choose estimatorsˆλ ibased on a s<strong>in</strong>gle realization (if we only observed one set <strong>of</strong>mach<strong>in</strong>ery) <strong>and</strong> we can only very roughly assume thatλ ˆ = µ ,...,λ ˆ = µ .1 1nnThus, when choos<strong>in</strong>g a very general model, we are unable todeterm<strong>in</strong>e its parameters which happens always. On <strong>the</strong> o<strong>the</strong>r h<strong>and</strong>, aparticular model with equalities (1.3) is able to provide betterapproximationˆλ = µ. (1.5)For <strong>the</strong> case <strong>of</strong> an <strong>in</strong>effective <strong>in</strong>novation it is natural to assumeapproximately thatλ = Eµ ≈ ˆλ = µ. (1.6)This particular model enables us, <strong>in</strong> general, to solve our problem,but it has ano<strong>the</strong>r disadvantage: it can be wrong. For example, ag<strong>in</strong>g <strong>of</strong><strong>the</strong> mach<strong>in</strong>ery can lead to <strong>in</strong>crease <strong>of</strong> <strong>the</strong> mean number <strong>of</strong> failuresfrom year to year:Eµ 1 < Eµ 2 < ... < Eµ n < Eµ -Here, equality (1.6) will underestimate <strong>the</strong> actual value <strong>of</strong> Eµ. Supposethat ˆλ = 2 but that actually Eµ= 4, <strong>the</strong>n properlyP{µ = 0} = e −4 ≈ 1/55<strong>in</strong>stead <strong>of</strong> <strong>the</strong> result <strong>of</strong> our calculation, P ≈ 1/7, see § 1.1, after whichwe will not admit that <strong>the</strong> <strong>in</strong>novation is effective although actuallyalmost surely it is such.We see that when construct<strong>in</strong>g a statistical model we have to choosebetween Scylla <strong>and</strong> Charybdis, that is, between a general model,useless s<strong>in</strong>ce we are unable to def<strong>in</strong>e its parameters <strong>and</strong> a particularmodel, possibly wrong <strong>and</strong> <strong>the</strong>refore lead<strong>in</strong>g to false conclusions. It isonly unknown which is Scylla <strong>and</strong> which is Charybdis.50


Suppose that we have adopted <strong>the</strong> particular model, i. e. declaredthat <strong>the</strong> magnitudes (1.2) are identically distributed r<strong>and</strong>om variables.Will this be <strong>the</strong> sole necessary assumption? No, s<strong>in</strong>ce we badly need toknow how large can be <strong>the</strong> error <strong>of</strong> <strong>the</strong> approximate equality (1.6). Forexample, if ˆλ=2, can <strong>the</strong> real value <strong>of</strong> λ be 4? In o<strong>the</strong>r words, weshould be able to calculate <strong>the</strong> variance <strong>of</strong> (1.5). It is equal tonˆ 1var λ = [ var µ + cov(µ µ )].∑ ∑2i i jn i= 1i≠jFor <strong>the</strong> Poisson lawvarµ i = Eµ i = λ<strong>and</strong>, as a rough estimate, it is possible to assume varµ i = ˆλ µ, = but wecan not say anyth<strong>in</strong>g abut <strong>the</strong> covariations. The available data areusually far from adequate for estimat<strong>in</strong>g it. Noth<strong>in</strong>g is left than tosuppose that <strong>the</strong> variables (1.2) are <strong>in</strong>dependent, i. e. to consider that<strong>the</strong> covariations vanish.We thus arrive at a model <strong>of</strong> <strong>in</strong>dependent identically distributedr<strong>and</strong>om variables, that is, to a sample. The reader will probably agreethat our considerations, if not logically prove that only a model <strong>of</strong> asample is useful, are still sufficiently conv<strong>in</strong>c<strong>in</strong>g <strong>in</strong> show<strong>in</strong>g that it isdifficult to tear away from <strong>the</strong> sphere <strong>of</strong> ideas lead<strong>in</strong>g to <strong>the</strong> model <strong>of</strong> asample. It is <strong>the</strong>refore very popular <strong>and</strong> researchers are try<strong>in</strong>g to workwith it provided that its falsity is not proven.The chapters <strong>of</strong> ma<strong>the</strong>matical statistics devoted to samples areundoubtedly <strong>in</strong> its best <strong>and</strong> <strong>the</strong> most developed part. However, <strong>the</strong>model <strong>of</strong> sample is sufficiently (<strong>and</strong> even too) <strong>of</strong>ten wrong. We sawthat if <strong>the</strong> mach<strong>in</strong>ery is ag<strong>in</strong>g, <strong>the</strong> observations (1.2) do not compose asample. The same is true when a prelim<strong>in</strong>ary period is <strong>in</strong>volved, when<strong>the</strong> work beg<strong>in</strong>s by elim<strong>in</strong>at<strong>in</strong>g defects after which <strong>the</strong> number <strong>of</strong>failures drops. O<strong>the</strong>r causes violat<strong>in</strong>g <strong>the</strong> identity <strong>of</strong> <strong>the</strong> distribution <strong>of</strong><strong>the</strong> variables (1.2) also exist.Their <strong>in</strong>dependence can also be violated. For example, if a failurewill lead to a capital repair with <strong>the</strong> replacement <strong>of</strong> many depreciatedalthough still workable mach<strong>in</strong>e parts, a negative correlation betweenµ i <strong>and</strong> <strong>the</strong> depreciated <strong>and</strong> worn-out µ j will appear. If, however, <strong>the</strong>wear <strong>and</strong> tear <strong>of</strong> a mach<strong>in</strong>e part <strong>in</strong>tensifies <strong>the</strong> depreciation <strong>of</strong> <strong>the</strong> o<strong>the</strong>rparts <strong>and</strong> no replacements are made, <strong>the</strong> appeared correlation will bepositive.When <strong>in</strong>troduc<strong>in</strong>g models differ<strong>in</strong>g from a model <strong>of</strong> a sample, weshould evidently specify <strong>the</strong>ir dist<strong>in</strong>ction by a small number <strong>of</strong>parameters determ<strong>in</strong>able ei<strong>the</strong>r <strong>the</strong>oretically or by available statisticaldata. Complicated models, as stated above, are absolutely useless. It ispractically possible to allow for ei<strong>the</strong>r deviations from <strong>the</strong> identity <strong>of</strong>distributions given by determ<strong>in</strong>ate functions or from <strong>in</strong>dependenceprovided that identity is preserved. We will now consider such models.It should be borne <strong>in</strong> m<strong>in</strong>d that both <strong>the</strong>se models <strong>and</strong> <strong>the</strong> model <strong>of</strong>sample are sufficiently tentative. True, if a model is proper, our51


conclusions are derived <strong>in</strong> a purely ma<strong>the</strong>matical way <strong>and</strong> <strong>the</strong>reforecerta<strong>in</strong>. However, on <strong>the</strong> whole everyth<strong>in</strong>g depends on <strong>the</strong> model.Statistical methods are as certa<strong>in</strong> (not more or less) as <strong>the</strong> conclusions<strong>of</strong> o<strong>the</strong>r sciences apply<strong>in</strong>g ma<strong>the</strong>matical means, for example physics,astronomy or strength <strong>of</strong> material. In practical problems <strong>the</strong>se sciencescan provide guid<strong>in</strong>g l<strong>in</strong>es but can not guarantee that we have correctlyapplied <strong>the</strong>m.1.3. Model <strong>of</strong> trend with an error. In a ma<strong>the</strong>matical model <strong>of</strong> anobservational series someth<strong>in</strong>g is always determ<strong>in</strong>ate <strong>and</strong> someth<strong>in</strong>gr<strong>and</strong>om. We will consider a model <strong>in</strong> which that seriesx 1 , x 2 , ..., x nis given by formulax i = f(t i ) + δ i . (1.7)Here t i is <strong>the</strong> value <strong>of</strong> some determ<strong>in</strong>ate variable specify<strong>in</strong>g <strong>the</strong> i-<strong>the</strong>xperiment, f(t), some determ<strong>in</strong>ate function (<strong>the</strong> trend) <strong>and</strong> δ i , ar<strong>and</strong>om variable usually called <strong>the</strong> error <strong>of</strong> that experiment. Thissituation means that <strong>the</strong> Lord determ<strong>in</strong>ed <strong>the</strong> true dependence by f(t)so that we should have observed f(t i ) <strong>in</strong> experiment i, but that <strong>the</strong> devil<strong>in</strong>serted <strong>the</strong> error δ i .For example, f(t) can represent one or ano<strong>the</strong>r coord<strong>in</strong>ate <strong>of</strong> anobject <strong>in</strong> space as dependent on time, <strong>and</strong> x i is our measurement <strong>of</strong> thatcoord<strong>in</strong>ate at moment t i . The devil’s <strong>in</strong>terference δ i can certa<strong>in</strong>ly bedeterm<strong>in</strong>ate, r<strong>and</strong>om or generally <strong>of</strong> an <strong>in</strong>determ<strong>in</strong>ate nature. Thus, <strong>the</strong>observed x i can be corrupted by a systematic error so that Eδ i is notnecessarily zero. We may assume that Eδ i = C <strong>and</strong> does not depend oni but it is also possible to consider Eδ i = φ(t i ) is a function <strong>of</strong> t i . Stillworse will happen if Eδ i depends on a variable u i which we can notcheck. In nei<strong>the</strong>r <strong>of</strong> those cases statistical treatment can elim<strong>in</strong>ate <strong>the</strong>errors.However, a sufficiently thorough plann<strong>in</strong>g <strong>of</strong> <strong>the</strong> observations canallow us to hope that <strong>the</strong> errors will be purely r<strong>and</strong>om <strong>in</strong> <strong>the</strong> sense thatstatistical homogeneity is ma<strong>in</strong>ta<strong>in</strong>ed <strong>and</strong> <strong>the</strong>re is no systematic shift:Eδ i = 0. More precisely, <strong>the</strong> systematic error will be sufficiently small<strong>and</strong> can be neglected. Such situations <strong>in</strong>deed comprise <strong>the</strong> scope <strong>of</strong> <strong>the</strong>statistical methods.After recall<strong>in</strong>g what was said <strong>in</strong> § 1.2 it becomes clear that mostsimple statistical assumptions should be imposed on <strong>the</strong> errors δ i . Most<strong>of</strong>ten <strong>the</strong>se errors are supposed to be <strong>in</strong>dependent <strong>and</strong> identicallydistributed. Normality is also usually assumed. Only one <strong>of</strong> <strong>the</strong>irdeviations from <strong>the</strong> model <strong>of</strong> sample was brought <strong>in</strong>to use: it issometimes thought that <strong>the</strong>ir variances are not equal to one ano<strong>the</strong>r butproportional to numbers assigned accord<strong>in</strong>g to some considerations.Or, it is assumed that such numbers w i called weights <strong>of</strong> observationsare known thatw 1 var δ 1 = w 2 var δ 2 = ... = w n var δ n = σ 252


<strong>and</strong> <strong>the</strong> variances are <strong>in</strong>versely proportional to <strong>the</strong> weights2σvar δ = .iw iI have described <strong>the</strong> assumptions imposed on <strong>the</strong> r<strong>and</strong>omcomponent <strong>of</strong> our observations. Now I pass to <strong>the</strong>ir determ<strong>in</strong>atecomponent f(t) o<strong>the</strong>rwise called trend.The most simple <strong>and</strong> classical case consists <strong>in</strong> that <strong>the</strong> function f(t)is <strong>of</strong> a quite def<strong>in</strong>ite class but depends on some unknown parametersc 1 , c 2 , ..., c k :f(t) = F(t, c 1 , c 2 , ..., c k ) (1.8)where <strong>the</strong> function F is given by a known formula or an algorithm <strong>of</strong>calculation. For example, <strong>in</strong> case <strong>of</strong> <strong>the</strong> motion <strong>of</strong> an object <strong>in</strong> spacethose parameters can be understood as its coord<strong>in</strong>ates <strong>and</strong> velocities atany def<strong>in</strong>ite moment; o<strong>the</strong>r, more opportune parameters can also be<strong>in</strong>troduced.Then any coord<strong>in</strong>ate f(t) will be uniquely determ<strong>in</strong>ed by <strong>the</strong>parameters <strong>and</strong> <strong>the</strong> Newtonian laws <strong>of</strong> motion (if that object has noeng<strong>in</strong>e). The problem consists <strong>in</strong> determ<strong>in</strong><strong>in</strong>g estimates <strong>of</strong> <strong>the</strong>parameters c ˆigiven observations (1.7). It is solved by <strong>the</strong> Gaussianmethod <strong>of</strong> least squares: <strong>the</strong> estimates are determ<strong>in</strong>ed <strong>in</strong> such a waythat <strong>the</strong> m<strong>in</strong>imal value <strong>of</strong> <strong>the</strong> functionn∑i=1[ x − F( t ; c ,..., c ]i i 1 k2<strong>of</strong> c i will be atta<strong>in</strong>ed at po<strong>in</strong>t ( cˆ1,..., c ˆk).More <strong>of</strong>ten, however, is <strong>the</strong> case <strong>in</strong> which <strong>the</strong> real dependence f(t) isunknown. Here also <strong>the</strong> equality (1.8) is applied but <strong>the</strong> function F ischosen more or less arbitrarily. Thus, a polynomial might be chosen<strong>and</strong> <strong>the</strong> method <strong>of</strong> least squares once more applied.Such a non-classical situation when f(t) is not known beforeh<strong>and</strong>dem<strong>and</strong>s a more detailed analysis, see a concrete example <strong>in</strong> <strong>the</strong> nextChapter. Here, however, we describe an absolutely different modelalso applied for statistically treat<strong>in</strong>g observational series.1.4. Model <strong>of</strong> a stochastic process. The ma<strong>in</strong> attention is turned to<strong>the</strong> isolation <strong>of</strong> <strong>the</strong> determ<strong>in</strong>ate component, <strong>the</strong> trend f(t). The values<strong>the</strong>mselves, x i , <strong>of</strong> <strong>the</strong> observational series (1.8) are not r<strong>and</strong>om;r<strong>and</strong>om are only <strong>the</strong> additional magnitudes δ i considered as errors,noise, <strong>and</strong> generally <strong>the</strong> devil’s mach<strong>in</strong>ations. Ano<strong>the</strong>r approach ispossible with r<strong>and</strong>omness be<strong>in</strong>g considered <strong>the</strong> ma<strong>in</strong> property <strong>of</strong> <strong>the</strong>series under study which we now denote byξ 1 , ξ 2 , ..., ξ n . (1.9)Here, <strong>the</strong> most simple model consists <strong>in</strong> treat<strong>in</strong>g that set as arealization <strong>of</strong> an n-dimensional r<strong>and</strong>om variable. Such a model can beuseful if <strong>the</strong> experiment provid<strong>in</strong>g it can be repeated many times over,53


i. e., if many observational series can be obta<strong>in</strong>ed under similarstatistically homogeneous conditions. More <strong>of</strong>ten, however, we haveonly one such series, distributions <strong>of</strong> probabilities certa<strong>in</strong>ly can not bereconstructed <strong>and</strong> <strong>the</strong> model <strong>of</strong> an n-dimensional distribution isabsolutely useless. However, if we assume that <strong>the</strong> jo<strong>in</strong>t distribution <strong>of</strong><strong>the</strong> magnitude ξ 1 , ξ 2 , is <strong>the</strong> same as that <strong>of</strong> ξ 2 , ξ 3 , <strong>of</strong> ξ 3 , ξ 4 , etc, <strong>the</strong>n <strong>the</strong>pairs (ξ 1 , ξ 2 ), (ξ 2 , ξ 3 ),..., (ξ n−1 , ξ n ) provide many realizations, althoughperhaps not mutually <strong>in</strong>dependent, <strong>of</strong> that bivariate distribution. Such adistribution is <strong>the</strong>refore determ<strong>in</strong>able <strong>in</strong> pr<strong>in</strong>ciple.It is convenient to generalize somewhat <strong>the</strong> ma<strong>the</strong>matical model. Letus consider a sequence <strong>of</strong> r<strong>and</strong>om variables <strong>in</strong>f<strong>in</strong>ite <strong>in</strong> both directions... ξ −1 , ξ 0 , ξ 1 , ξ 2 , ..., ξ n , ξ n+1 , ... (1.10)called a stochastic process. We assume that <strong>the</strong>oretically <strong>the</strong>re existdistributions <strong>of</strong> probabilities <strong>of</strong> any f<strong>in</strong>ite set{ξ α , ξ β , ξ γ } (1.11)<strong>of</strong> r<strong>and</strong>om variables. Our observational series (1.9) is a part <strong>of</strong> <strong>the</strong><strong>in</strong>f<strong>in</strong>ite sequence (1.10) <strong>and</strong> only allows us to reach some conclusionsabout that whole process if <strong>the</strong> model <strong>in</strong>cludes a rule represent<strong>in</strong>gdistributions <strong>of</strong> magnitudes (1.11) with negative <strong>and</strong> large positivesubscripts through <strong>the</strong> distribution <strong>of</strong> <strong>the</strong> observed variables (1.9).Without such a rule <strong>the</strong> model <strong>of</strong> a stochastic process is useless.In <strong>the</strong> most simple <strong>and</strong> most natural case <strong>the</strong> condition <strong>of</strong>stationarity is imposed: for any τ <strong>the</strong> distribution <strong>of</strong> <strong>the</strong> variables (ξ α+τ ,..., ξ γ+ τ ) co<strong>in</strong>cides with that for τ = 0. The model <strong>of</strong> a stochasticprocess consists <strong>in</strong> that [now] we consider our observations (1.9) as apart <strong>of</strong> <strong>the</strong> realization (1.10) <strong>of</strong> a stationary stochastic process.When assum<strong>in</strong>g a model <strong>of</strong> a stochastic process, only bivariatedistributions are usually applied <strong>and</strong> <strong>in</strong> addition only <strong>the</strong> correlationbetween <strong>the</strong> different values <strong>of</strong> that process are studied. It ought to besaid that <strong>in</strong> spite <strong>of</strong> <strong>the</strong> popularity <strong>of</strong> <strong>the</strong> concept <strong>of</strong> stochastic process,only quite a few examples can be cited <strong>in</strong> which it allowed to describeadequately <strong>the</strong> statistical properties <strong>of</strong> observational series. Mostpublications beg<strong>in</strong> by stat<strong>in</strong>g that a pert<strong>in</strong>ent stochastic processspecified <strong>in</strong> such <strong>and</strong> such a way is given, but <strong>the</strong>re really are only afew works where <strong>the</strong>se specifications are <strong>in</strong>deed determ<strong>in</strong>ed<strong>the</strong>oretically or experimentally.The <strong>the</strong>ory <strong>of</strong> stochastic processes is here suitable for solv<strong>in</strong>gabstract problems: what will happen if a white noise <strong>of</strong> a given<strong>in</strong>tensity <strong>in</strong>fluences some system. Such problems, however, only<strong>in</strong>directly bear on <strong>the</strong> real behaviour <strong>of</strong> a system because under realconditions it is not likely <strong>the</strong> white noise that <strong>in</strong>fluences <strong>the</strong> system, –it does not even concern a stochastic process (lack <strong>of</strong> statisticalhomogeneity). But meanwhile <strong>of</strong>ten no one studies what is reallyact<strong>in</strong>g on <strong>the</strong> system because such <strong>in</strong>vestigations are complicated,difficult <strong>and</strong> expensive so that it is much easier to restrict <strong>the</strong> attentionto arbitrary prior assumptions.54


It is <strong>in</strong>terest<strong>in</strong>g <strong>the</strong>refore to see what occurred when <strong>the</strong> mostem<strong>in</strong>ent statisticians attempted to study actual data by models <strong>of</strong> astochastic process. Ra<strong>the</strong>r <strong>of</strong>ten <strong>the</strong>y experienced failure, see Chapter3. We will also briefly mention <strong>the</strong> statistical <strong>the</strong>ory <strong>of</strong> turbulence <strong>in</strong>which <strong>the</strong> notion <strong>of</strong> stochastic process has been applied with brilliantsuccess.2. The Method <strong>of</strong> Least SquaresGauss discovered <strong>and</strong> <strong>in</strong>troduced it <strong>in</strong>to general usage. The classicalcase which he considered consisted <strong>in</strong> that some known relationsshould be ma<strong>in</strong>ta<strong>in</strong>ed between <strong>the</strong> terms <strong>of</strong> <strong>the</strong> observational seriesx 1 , x 2 , ..., x nhad not <strong>the</strong> observations been corrupted by errors. For example, <strong>in</strong> <strong>the</strong>case <strong>of</strong> <strong>the</strong> path <strong>of</strong> an object <strong>in</strong> space 3 it would have been possible toexpress all terms <strong>of</strong> <strong>the</strong> series through a few <strong>of</strong> its first terms had <strong>the</strong>sebeen known absolutely precisely. This classical case can becomparatively easily studied with<strong>in</strong> <strong>the</strong> boundaries <strong>of</strong> ma<strong>the</strong>maticalstatistics. Practical applications <strong>of</strong> <strong>the</strong> method <strong>of</strong> least squares canencounter more or less essential calculational difficulties which weleave aside. O<strong>the</strong>r difficulties are connected with <strong>the</strong> possible nonfulfilment<strong>of</strong> <strong>the</strong> assumption <strong>of</strong> <strong>the</strong> model <strong>of</strong> trend with error. Thus,errors <strong>of</strong> successive measurements <strong>of</strong> distances by radar apparentlycan not be assumed <strong>in</strong>dependent r<strong>and</strong>om variables. It is <strong>in</strong> generalunclear whe<strong>the</strong>r <strong>the</strong>y possess a statistical character so that statisticalmethods are here unreliable <strong>and</strong> moreover helpless.The observations <strong>the</strong>mselves, however, are highly precise <strong>and</strong> canbe made many times, so that statistical methods are not needed <strong>the</strong>re.In spite <strong>of</strong> all <strong>the</strong> merits <strong>of</strong> <strong>the</strong> classical case, its shortcom<strong>in</strong>g is that itoccurs comparatively rarely. Much more <strong>of</strong>ten we are conv<strong>in</strong>ced thatour observations can be approximated by a smooth dependencex i ≈ f(t i )where t i is a variable describ<strong>in</strong>g <strong>the</strong> conditions <strong>of</strong> <strong>the</strong> i-th experiment.The exact form <strong>of</strong> <strong>the</strong> function f(t) is, however, unknown.Methods strongly resembl<strong>in</strong>g those <strong>of</strong> <strong>the</strong> classical case are appliedhere, but <strong>the</strong>ir study <strong>in</strong>dicates that <strong>the</strong>y are not ma<strong>the</strong>maticallyjustified. Ma<strong>the</strong>matical statistics widely applies ma<strong>the</strong>matics but is notreduced to that comparatively very transparent science. <strong>Statistics</strong> isra<strong>the</strong>r an art <strong>and</strong> as such it has its own secrets <strong>and</strong> we will <strong>in</strong>deedbeg<strong>in</strong> by study<strong>in</strong>g <strong>the</strong>m.2.1. The secrets <strong>of</strong> <strong>the</strong> statistical art. When wish<strong>in</strong>g to apply <strong>the</strong>method <strong>of</strong> least squares we can <strong>in</strong> most cases use a computerprogramme compiled once <strong>and</strong> for all. It is just necessary to enter <strong>the</strong>data, wait for <strong>the</strong> calculations to be made <strong>and</strong> <strong>the</strong> pr<strong>in</strong>ter will provide aformula for a curve fitt<strong>in</strong>g <strong>the</strong> observations. However, he who passesall <strong>the</strong>se procedures to a mach<strong>in</strong>e will be wrong. It is absolutelynecessary to represent <strong>the</strong> available data <strong>in</strong> a visible way <strong>and</strong> at least toglance at <strong>the</strong> figure.55


The human eye is able to detect such special features <strong>in</strong> <strong>the</strong> materialthat <strong>the</strong> mach<strong>in</strong>e will miss. For example, if <strong>the</strong> first half <strong>of</strong> <strong>the</strong>observations is situated above, <strong>and</strong> <strong>the</strong> second half, below <strong>the</strong> fitt<strong>in</strong>gcurve, <strong>the</strong>n, obviously, <strong>the</strong> assumption <strong>of</strong> <strong>in</strong>dependent errors <strong>in</strong> <strong>the</strong>model <strong>of</strong> trend with error is violated. In such a case no computercalculation has any sense [s<strong>in</strong>ce] <strong>the</strong> mach<strong>in</strong>e is unable to note <strong>the</strong>sespecial features all by itself. A pert<strong>in</strong>ent programme can certa<strong>in</strong>ly becompiled but <strong>the</strong> trouble is that <strong>the</strong>re are so many possible features <strong>of</strong><strong>the</strong> data for <strong>in</strong>clud<strong>in</strong>g <strong>the</strong> study <strong>of</strong> all <strong>of</strong> <strong>the</strong>m <strong>in</strong> <strong>the</strong> programme.It is natural to entrust <strong>the</strong> application <strong>of</strong> any given statistical test to amach<strong>in</strong>e, which however is barely able to formulate <strong>the</strong> necessarytests. This should be done by a statistician by issu<strong>in</strong>g from a visualestimation <strong>of</strong> <strong>the</strong> statistical data that should be <strong>the</strong>refore represented <strong>in</strong>a graphical way. It follows <strong>the</strong>refore that <strong>the</strong> statistical art is based <strong>in</strong><strong>the</strong> first <strong>in</strong>stance on visual estimation.He who wholly trusts <strong>the</strong> automatic computer calculations depriveshimself <strong>of</strong> <strong>the</strong> possibility <strong>of</strong> check<strong>in</strong>g <strong>the</strong> statistical model <strong>and</strong>, as aresult, <strong>the</strong> more is given over to mach<strong>in</strong>e treatment, <strong>the</strong> less trust itdeserves. However, if statistical material dem<strong>and</strong>s to be estimated by<strong>the</strong> naked eye, this will be possible for functions <strong>of</strong> one variable wellenough (s<strong>in</strong>ce <strong>the</strong>y can be represented by graphs), much worse withfunctions <strong>of</strong> two variables (<strong>the</strong>y can be depicted by isol<strong>in</strong>es like <strong>the</strong>heights above sea level on topographic maps) but we are absolutelyunable to study functions <strong>of</strong> a larger number <strong>of</strong> variables.That is <strong>the</strong> doma<strong>in</strong> where we may only reckon on help from <strong>the</strong>computer. First steps were done here. Such directions like multivariatestatistical analysis <strong>and</strong> design <strong>of</strong> extremal experiments have emerged,but it is still a very long way to go before really reliable methods arecreated. The methods <strong>of</strong> <strong>the</strong> directions just mentioned are sometimeseffective, sometimes not <strong>and</strong> we do not know <strong>the</strong> reason why. Thema<strong>in</strong> shortcom<strong>in</strong>g here is <strong>the</strong> low moral level <strong>of</strong> research, <strong>the</strong> custom<strong>of</strong> pretend<strong>in</strong>g <strong>the</strong> desired to be real so that we do not know whatexactly can we trust <strong>in</strong>.And so, when desir<strong>in</strong>g to apply <strong>the</strong> method <strong>of</strong> least squares, weshould beg<strong>in</strong> by draw<strong>in</strong>g a graph <strong>of</strong> <strong>the</strong> observational series. Thereader will imag<strong>in</strong>e what transpires here by hav<strong>in</strong>g a look on <strong>the</strong>broken l<strong>in</strong>e on Fig. 1; its mean<strong>in</strong>g is yet unimportant. Such a brokenl<strong>in</strong>e obviously fluctuates about some smoothly chang<strong>in</strong>g curve. Thiscurve is <strong>in</strong>deed express<strong>in</strong>g <strong>the</strong> true regularity whereas <strong>the</strong> fluctuations<strong>of</strong> that broken l<strong>in</strong>e are occasioned by r<strong>and</strong>om causes <strong>and</strong> have norelation [...]The italicized phrases usually comprise all <strong>the</strong> available <strong>in</strong>formationabout <strong>the</strong> real studied dependence. Underst<strong>and</strong>ably, it is too diffuse<strong>and</strong> <strong>in</strong>determ<strong>in</strong>ate for directly admitt<strong>in</strong>g some scientific <strong>in</strong>vestigation.Conclusions reached by a naked eye study should be transferred <strong>in</strong>to ama<strong>the</strong>matical model applicable for statistical treatment. Thattransformation is <strong>the</strong> second mystery <strong>of</strong> <strong>the</strong> statistical art.In case <strong>of</strong> <strong>the</strong> method <strong>of</strong> least squares most <strong>of</strong>ten a modelx i = P(t i ) + δ i , i = 1, 2, ..., n56


is applied with P(t) be<strong>in</strong>g a polynomial whose coefficients should beestimated by that method. [...] It is usually said that <strong>in</strong> case ano<strong>the</strong>rmodelx i = f(t i ) + δ i , i = 1, 2, ..., nis valid with f(t) not be<strong>in</strong>g a polynomial, we may apply <strong>the</strong> Weierstrass<strong>the</strong>orem accord<strong>in</strong>g to which we can approximate f(t) by <strong>the</strong> polynomialP(t) as precisely as desired. However, that reference is , however,<strong>in</strong>appropriate because any cont<strong>in</strong>uous function can be approximated bya polynomial <strong>of</strong> a sufficiently high degree. In practice we attempt tochoose a low ra<strong>the</strong>r than a high degree. [...] A polynomial <strong>of</strong> a higherdegree [can be] fur<strong>the</strong>r from reality than that <strong>of</strong> a lower degree. Then,<strong>the</strong> Weierstrass <strong>the</strong>orem is also valid for functions <strong>of</strong> several variables.However, if <strong>the</strong> earth’s surface as shown by isol<strong>in</strong>es on a topographicmap is considered a typical function <strong>of</strong> two variables, <strong>and</strong> itsapproximation is attempted, <strong>the</strong> result will be usually unsatisfactory:<strong>the</strong> degree <strong>of</strong> <strong>the</strong> polynomial should be too high. The <strong>the</strong>oreticalWeierstrass <strong>the</strong>orem <strong>and</strong> practical smooth<strong>in</strong>g differ.The method <strong>of</strong> smooth<strong>in</strong>g by a polynomial is <strong>the</strong>refore notma<strong>the</strong>matically justified <strong>and</strong> <strong>the</strong> success <strong>of</strong> that procedure is one moremystery <strong>of</strong> <strong>the</strong> statistical art. How can we expla<strong>in</strong> <strong>the</strong> ra<strong>the</strong>r <strong>of</strong>tensuccess here? Apparently <strong>the</strong> human eye feels well enough <strong>the</strong>behaviour <strong>of</strong> <strong>the</strong> graphs <strong>of</strong> analytic functions, polynomials <strong>in</strong>particular. As students are taught, only a few po<strong>in</strong>ts ought to becalculated, − discont<strong>in</strong>uities, extrema, sometimes po<strong>in</strong>ts <strong>of</strong> <strong>in</strong>flexion, −<strong>and</strong> functions are <strong>the</strong>n reconstructed quite well. It may be supposedthat we are able to catch whe<strong>the</strong>r <strong>the</strong> real dependence is approximatedwhen smooth<strong>in</strong>g a broken l<strong>in</strong>e by a polynomial well enough. Thisstatistical procedure <strong>of</strong> smooth<strong>in</strong>g a function by a polynomial isprobably only applied when success is expected after hav<strong>in</strong>g a look at<strong>the</strong> graph by naked eye.The situation changes at once if that procedure is attempted to bewholly accomplished automatically. In that case no data will beprelim<strong>in</strong>arily estimated <strong>and</strong> <strong>the</strong> portion <strong>of</strong> successful smooth<strong>in</strong>g willbe sharply reduced. The case <strong>of</strong> functions <strong>of</strong> many variables is quitecomplicated. We are unable to show ei<strong>the</strong>r <strong>the</strong> experimental data or <strong>the</strong>result <strong>of</strong> smooth<strong>in</strong>g <strong>and</strong> can not even say whe<strong>the</strong>r it was successful ornot.2.2. Smooth<strong>in</strong>g by a polynomial: an example. It is time to expla<strong>in</strong><strong>the</strong> provenance <strong>of</strong> <strong>the</strong> observations represented on Fig. 1. [It was <strong>the</strong>study <strong>of</strong> <strong>the</strong> damage <strong>of</strong> <strong>in</strong>sulation <strong>of</strong> <strong>the</strong> stators <strong>of</strong> large turbogenerators(Belova et al 1965, 1967).] The total number <strong>of</strong> failures isnaturally comprised <strong>of</strong> failures <strong>of</strong> separate generators. Ra<strong>the</strong>r early <strong>in</strong>our work we decided that <strong>the</strong> probability <strong>of</strong> a failure <strong>of</strong> a givengenerator is proportional to <strong>the</strong> total area <strong>of</strong> its <strong>in</strong>sulation <strong>and</strong> littledepends on its constructive or operational conditions; hydrogencool<strong>in</strong>g was <strong>the</strong>n almost non-exist<strong>in</strong>g. We have <strong>the</strong>refore studied <strong>the</strong>behaviour <strong>of</strong> a unit area <strong>of</strong> that <strong>in</strong>sulation (100sq. m. correspond<strong>in</strong>g <strong>in</strong>its order to an area <strong>of</strong> <strong>in</strong>sulation a large mach<strong>in</strong>e) without allow<strong>in</strong>g forany o<strong>the</strong>r peculiarities. The problem <strong>of</strong> ag<strong>in</strong>g <strong>of</strong> <strong>the</strong> <strong>in</strong>sulation, i. e., <strong>of</strong>57


<strong>the</strong> <strong>in</strong>crease <strong>of</strong> <strong>the</strong> probability <strong>of</strong> failure with time, was formulated.[...]The values <strong>of</strong> <strong>the</strong> frequencies <strong>of</strong> failures comprise a broken l<strong>in</strong>e.Their scatter <strong>in</strong>creases with t, a circumstance connected with a sharpdecrease <strong>of</strong> <strong>the</strong> area <strong>of</strong> <strong>in</strong>sulation, i. e., <strong>of</strong> <strong>the</strong> amount <strong>of</strong> experimentalmaterial.We are <strong>in</strong>terested <strong>in</strong> <strong>the</strong> values <strong>of</strong> probabilities p(t) <strong>of</strong> a failure <strong>of</strong> aunit area <strong>of</strong> <strong>in</strong>sulation aged t dur<strong>in</strong>g unit time (10 4 work<strong>in</strong>g hours,about 1.5 years). For small values <strong>of</strong> t <strong>the</strong> amount <strong>of</strong> experimentalmaterial is large, but p(t) <strong>the</strong>mselves are low, 0.01 – 0.02, so that <strong>the</strong>irdirect determ<strong>in</strong>ation through frequencies is fraught with very largeerrors. The mean square deviation <strong>of</strong> <strong>the</strong> frequency, µ i /S i , where µ i is<strong>the</strong> number <strong>of</strong> failures dur<strong>in</strong>g time <strong>in</strong>terval between (i – 1)-th <strong>and</strong> i-thtime units <strong>and</strong> S i , <strong>the</strong> correspond<strong>in</strong>g area <strong>of</strong> <strong>in</strong>sulation, is known to beequal top( ti)[1 − p( ti)]Siwhere t i =10 4 i hours. For t 1 = 10 5 p(t i ) ≈ 0.02, S i = 200, so thatdeviation is roughly 0.01 or 50% <strong>of</strong> p(t) itself.Then it is natural to attempt to heighten <strong>the</strong> precision <strong>of</strong> determ<strong>in</strong><strong>in</strong>gp(t) by smooth<strong>in</strong>g s<strong>in</strong>ce <strong>the</strong> estimation <strong>of</strong> this probability <strong>the</strong>n dependson all o<strong>the</strong>r experimental data. But <strong>the</strong>n, a statistical model isnecessary here. It is ra<strong>the</strong>r natural to consider <strong>the</strong> observed number <strong>of</strong>failures (1.2) as r<strong>and</strong>om variables with a Poisson distribution.Underst<strong>and</strong>ably,Eµ i = S i p(t i ).It is somewhat more difficult to agree that <strong>the</strong> magnitudes µ i are<strong>in</strong>dependent. Here, however, <strong>the</strong> follow<strong>in</strong>g considerations applicableto any rare events will help. Take for example µ 1 , µ 2 . Failure occurr<strong>in</strong>gdur<strong>in</strong>g <strong>the</strong> first <strong>in</strong>terval <strong>of</strong> time <strong>in</strong>fluences <strong>the</strong> behaviour <strong>of</strong> <strong>the</strong><strong>in</strong>sulation <strong>in</strong> <strong>the</strong> second <strong>in</strong>terval, but that action is only restricted to <strong>the</strong>failed mach<strong>in</strong>es whose portion was small. Hav<strong>in</strong>g admitted<strong>in</strong>dependence, <strong>the</strong> ma<strong>the</strong>matical model is completely given although itis connected not with <strong>the</strong> most convenient normal, but with <strong>the</strong>Poisson distribution. Then, <strong>the</strong> variancesvar µ i = Eµ i = S i p(t i )depend on probabilities p(t i ) which we <strong>in</strong>deed aim to derive. Atransformation to magnitudesv = 2 µ(2.1)iiessentially equalizes <strong>the</strong> variances <strong>and</strong> <strong>the</strong>refore helps.These magnitudes v 1 , v 2 , ..., v n from which we later return tomagnitudes (1.2) are smoo<strong>the</strong>d. The smooth<strong>in</strong>g itself is easy <strong>in</strong> essence58


ut some ma<strong>the</strong>matical tricks described <strong>in</strong> detail elsewhere (Belova etal 1965, 1967) are applied.The approximate expressionp(t i ) = p(x i ) ≈ (1/4)[b 0 – 0.1333b 2 + b 2 x 2 ] 2 + 0.35/S i , x i = t i /22,where t i is measured <strong>in</strong> <strong>the</strong> selected <strong>in</strong>tervals <strong>of</strong> time <strong>and</strong> <strong>the</strong> last termis necessary for allow<strong>in</strong>g for <strong>the</strong> systematic error that occurred whentransferr<strong>in</strong>g to v i (2.1) should be considered f<strong>in</strong>al.Estimates for b 0 <strong>and</strong> b 2 <strong>and</strong> <strong>the</strong>ir variances arebˆ = 0.225, bˆ = 0.20, var bˆ = 2.12⋅ 10 , var bˆ= 44.5⋅10 .−4 −40 2 0 2The magnitudes 0.35/S i are smoo<strong>the</strong>d by a certa<strong>in</strong> polynomial.Careful statistical work concern<strong>in</strong>g probabilities <strong>of</strong> failures shouldapply our answer exactly <strong>in</strong> <strong>the</strong> provided form. However, it is not vividenough <strong>and</strong> we have <strong>the</strong>refore represented it <strong>in</strong> a simplified way. Aconfidence rectangle for (b 0 , b 2 ) with an 80% coefficient was <strong>the</strong>refore<strong>in</strong>dicated with curves p 1 (t), p 2 (t), p 3 (t) added. Curve p 2 (t) provides <strong>the</strong>best estimate <strong>of</strong> <strong>the</strong> real probability <strong>of</strong> failure which we are able to<strong>of</strong>fer. It corresponds to <strong>the</strong> estimates <strong>of</strong> b 0 <strong>and</strong> b 2 . The o<strong>the</strong>r curves areobta<strong>in</strong>ed if <strong>the</strong> real po<strong>in</strong>t (b 0 , b 2 ) is replaced by <strong>the</strong> left lower <strong>and</strong> rightupper vertices <strong>of</strong> <strong>the</strong> confidence rectangle respectively. They providean idea about <strong>the</strong> order <strong>of</strong> <strong>the</strong> possible error <strong>of</strong> curve p 2 (t) but, strictlyspeak<strong>in</strong>g, are not <strong>the</strong> boundaries <strong>of</strong> <strong>the</strong> confidence region for <strong>the</strong> truecurve. The confidence region for p(t) can be constructed <strong>in</strong> differentways. Strictly speak<strong>in</strong>g, it is not needed s<strong>in</strong>ce all <strong>the</strong> <strong>in</strong>formationapplied for construct<strong>in</strong>g it is summed <strong>in</strong> <strong>the</strong> mentionedvariances, var bˆˆ0<strong>and</strong> var b2.The region between p 1 (t) <strong>and</strong> p 3 (t) can beconsidered as some approximate (hav<strong>in</strong>g <strong>the</strong> adequate order) version<strong>of</strong> <strong>the</strong> confidence region.Various versions <strong>of</strong> check<strong>in</strong>g our model by statistical tests are <strong>of</strong>fundamental significance. In statistics, no check is exhaust<strong>in</strong>g but anumber <strong>of</strong> well passed tests never<strong>the</strong>less produces a feel<strong>in</strong>g <strong>of</strong>certitude <strong>in</strong> <strong>the</strong> results. We will dwell <strong>in</strong> detail on all <strong>the</strong>se checks.The simplest criterion is <strong>the</strong> study <strong>of</strong> <strong>the</strong> f<strong>in</strong>al result. Let us have agood look at <strong>the</strong> curve represent<strong>in</strong>g <strong>the</strong> dependence sought. Do not <strong>the</strong>actual data deviate too much? (The maximal among <strong>the</strong> 22 deviationsis µ 20 = 4 <strong>and</strong> accord<strong>in</strong>g to <strong>the</strong> Poisson formula P{µ 20 ≥ 4} ≈ 0.12.) Initself, this is not especially significant, <strong>and</strong> for <strong>the</strong> maximal <strong>of</strong> 22deviations with only 1/22 ≈ 0.05, it is quite acceptable. [...]It is possible to compile an expression similar to <strong>the</strong> sum <strong>of</strong> <strong>the</strong>squares <strong>of</strong> deviations <strong>of</strong> <strong>the</strong> experimental data from <strong>the</strong> smooth curvep 2 (t). But it is better to deal with magnitudes v i (2.1) s<strong>in</strong>ce <strong>the</strong>irvariance does not essentially depend on <strong>the</strong> unknown probabilities p(t i )<strong>and</strong> is roughly equal to 1. In our case, <strong>the</strong> variance <strong>of</strong> <strong>the</strong> observationsis thus known almost exactly, a circumstance connected with <strong>the</strong>Poisson distribution, which depends only on one parameter ra<strong>the</strong>r thantwo as <strong>the</strong> normal law does. In general, <strong>the</strong> test apply<strong>in</strong>g <strong>the</strong> sum <strong>of</strong> <strong>the</strong>squares <strong>of</strong> deviations also shows that <strong>the</strong> f<strong>in</strong>al curve fits well enough.59


Ano<strong>the</strong>r group <strong>of</strong> tests is connected with <strong>the</strong> choice <strong>of</strong> <strong>the</strong> degree<strong>and</strong> <strong>the</strong> number <strong>of</strong> terms <strong>of</strong> <strong>the</strong> approximat<strong>in</strong>g polynomial. Here, wealso deal with v i (2.1) <strong>and</strong> test what happens when <strong>the</strong>y areapproximated by various polynomials up to <strong>the</strong> third degree <strong>in</strong>clusive.It is obvious that <strong>the</strong> polynomial sought <strong>in</strong>cludes a free term. Then weadd, <strong>in</strong> turn, terms <strong>of</strong> <strong>the</strong> first, second <strong>and</strong> third degree. The bestimprovement <strong>of</strong> approximation is reached when polynomials <strong>of</strong> <strong>the</strong>typec 0 + c 2 t 2 (2.2)are chosen.And now we check that <strong>the</strong> addition <strong>of</strong> terms <strong>of</strong> <strong>the</strong> first <strong>and</strong> thirddegree to it does not significantly improve <strong>the</strong> approximation; fordetails, see Belova et al (1965, 1967). After all <strong>the</strong>se checks webecome sure that apply<strong>in</strong>g a polynomial (2.2) we have <strong>in</strong>deed ascompletely as was possible elicited <strong>the</strong> determ<strong>in</strong>ate component from<strong>the</strong> available data.However, hav<strong>in</strong>g happily concluded <strong>the</strong> tests <strong>of</strong> <strong>the</strong> hypo<strong>the</strong>sesconnected with <strong>the</strong> smooth<strong>in</strong>g, we do not at all check <strong>the</strong> ma<strong>in</strong>hypo<strong>the</strong>sis, that <strong>the</strong> probability <strong>of</strong> <strong>the</strong> failure <strong>of</strong> a unit area <strong>of</strong><strong>in</strong>sulation does not depend on <strong>the</strong> constructive or operationalpeculiarities <strong>of</strong> <strong>the</strong> pert<strong>in</strong>ent mach<strong>in</strong>e. Indeed, we are only check<strong>in</strong>gwhe<strong>the</strong>r <strong>the</strong> magnitudes µ i are obey<strong>in</strong>g <strong>the</strong> Poisson distribution (<strong>and</strong>, <strong>in</strong>part, whe<strong>the</strong>r <strong>the</strong>y are <strong>in</strong>dependent).However, that distribution also occurs when <strong>the</strong> probabilities <strong>of</strong>failures occurr<strong>in</strong>g on different areas <strong>of</strong> <strong>the</strong> <strong>in</strong>sulation are unequal(provided all <strong>the</strong> probabilities are sufficiently low). The mostimportant hypo<strong>the</strong>sis <strong>of</strong> statistical homogeneity <strong>of</strong> <strong>the</strong> various unitareas <strong>of</strong> <strong>in</strong>sulation is yet left unchecked <strong>and</strong> can not be checked byissu<strong>in</strong>g from <strong>the</strong> generalized data <strong>of</strong> Fig. 1 4 . At <strong>the</strong> same time most<strong>in</strong>terest<strong>in</strong>g is exactly <strong>the</strong> isolation <strong>and</strong> study <strong>of</strong> mach<strong>in</strong>es with high <strong>and</strong>low break-down rates (or a confirmation that all <strong>of</strong> <strong>the</strong>m have <strong>the</strong> samerate <strong>of</strong> failures). We will see now how <strong>the</strong>se problems can be solved.2.3. Check <strong>of</strong> statistical homogeneity. The most importantcondition <strong>of</strong> acquir<strong>in</strong>g a statistically homogeneous totality, or, so tosay, <strong>the</strong> most important mystery <strong>of</strong> <strong>the</strong> statistical art consists <strong>in</strong>carefully select<strong>in</strong>g <strong>the</strong> material to be studied. Thus, <strong>the</strong> data <strong>of</strong> Fig. 1does not <strong>in</strong>clude failures <strong>of</strong> <strong>the</strong> <strong>in</strong>sulation occurr<strong>in</strong>g because <strong>of</strong> causes[<strong>of</strong> various causes <strong>of</strong> its r<strong>and</strong>om damage]. We supposed that suchcauses, although usually called r<strong>and</strong>om, are not r<strong>and</strong>om <strong>in</strong> <strong>the</strong>stochastic sense s<strong>in</strong>ce <strong>the</strong>y are not statistically stable.The selection <strong>of</strong> material was made easier by <strong>the</strong> fact that a failure<strong>of</strong> a large mach<strong>in</strong>e is an extreme event whose causes are thoroughly<strong>in</strong>vestigated <strong>and</strong> duly registered. The most suitable for <strong>in</strong>clud<strong>in</strong>g afailure <strong>in</strong>to statistical treatment was <strong>the</strong> formulation local defect <strong>of</strong><strong>in</strong>sulation. In general, however, all failures were <strong>in</strong>cluded if an aliencause was not clearly <strong>in</strong>dicated. Failures <strong>in</strong>cluded <strong>in</strong>to statisticaltreatment composed about a half <strong>of</strong> all <strong>the</strong> failures <strong>of</strong> <strong>in</strong>sulation.When select<strong>in</strong>g material, <strong>the</strong> statistician must <strong>in</strong>variably keep to somepr<strong>in</strong>ciple once <strong>and</strong> for all.60


It is clear <strong>the</strong>refore that no special significance can be attached tostatistical calculations <strong>of</strong> reliability. This conclusion is important for apr<strong>in</strong>cipled evaluation <strong>of</strong> <strong>the</strong> real mean<strong>in</strong>g <strong>of</strong> <strong>the</strong> reliability <strong>the</strong>ory.Now, however, our <strong>in</strong>terest is concentrated on ano<strong>the</strong>r po<strong>in</strong>t, onascerta<strong>in</strong><strong>in</strong>g whe<strong>the</strong>r our thoroughly selected totality was statisticallyhomogeneous. Suppose that practically <strong>the</strong> derived curve preciselyexpresses <strong>the</strong> probability <strong>of</strong> failure, p(t). If <strong>the</strong> failures <strong>of</strong> <strong>the</strong> <strong>in</strong>sulationare mostly due to its local damage, it is logical to assume that a failure<strong>of</strong> a certa<strong>in</strong> mach<strong>in</strong>e does not <strong>in</strong>fluence (or little <strong>in</strong>fluences) its failureafter repair.But <strong>the</strong>n <strong>the</strong> total number <strong>of</strong> failures ξ i dur<strong>in</strong>g all <strong>the</strong> operationaltime <strong>of</strong> a mach<strong>in</strong>e is a sum <strong>of</strong> <strong>in</strong>dependent r<strong>and</strong>om variables, − <strong>the</strong>number <strong>of</strong> failures dur<strong>in</strong>g <strong>the</strong> first, <strong>the</strong> second, ... selected <strong>in</strong>tervals <strong>of</strong>time. Each term obeys <strong>the</strong> Poisson distribution, so that <strong>the</strong> totalnumber <strong>of</strong> failures also obeys it. The parameter <strong>of</strong> that distribution formach<strong>in</strong>e i for <strong>the</strong> (k – 1)-th time <strong>in</strong>terval isλ ik = p(t k )S i ≈ p 2 (t k )S i (2.3)where as before S i is <strong>the</strong> area <strong>of</strong> <strong>in</strong>sulation <strong>of</strong> mach<strong>in</strong>e i.Therefore, <strong>the</strong> parameterλ i = Eξ i (2.4)<strong>of</strong> <strong>the</strong> total number <strong>of</strong> failures for <strong>the</strong> i-th mach<strong>in</strong>e can be calculatedby summ<strong>in</strong>g <strong>the</strong> expressions (2.3) over such t k that are less than <strong>the</strong>general work<strong>in</strong>g time <strong>of</strong> <strong>the</strong> pert<strong>in</strong>ent mach<strong>in</strong>e. We may thus considerthat <strong>the</strong> numbers (2.4) are known for all <strong>the</strong> mach<strong>in</strong>es. [...] Thismethod <strong>of</strong> determ<strong>in</strong><strong>in</strong>g λ i is only valid when statistical homogeneity issupposed, o<strong>the</strong>rwise <strong>the</strong> computed curve p 2 (t) only provides a generalcharacteristic <strong>of</strong> <strong>the</strong> breakdown rate.Some mach<strong>in</strong>es will have a higher, o<strong>the</strong>r mach<strong>in</strong>es, a lower rate, −will have ei<strong>the</strong>r more or less failures than <strong>in</strong>dicated by <strong>the</strong> Poisson lawwith parameter calculated accord<strong>in</strong>g to our rule. So it seems that wehave established <strong>the</strong> effect to be sought for <strong>in</strong> order to check violations<strong>of</strong> statistical homogeneity. However, <strong>the</strong> trouble is that it is verydifficult to discern that effect. Indeed, suppose we have determ<strong>in</strong>edthat for a certa<strong>in</strong> mach<strong>in</strong>e λ i = 0.1 whereas ξ i = 2. S<strong>in</strong>ce2λiλ 1ie −P{ξ i≥ 2} = + ... ≈2 200it would seem that we detected a significant departure from thathomogeneity. But statistics covers several hundred mach<strong>in</strong>es, so thatfor one (<strong>and</strong> even for a few) <strong>of</strong> <strong>the</strong>m an event with probability 1/200can well happen.There are several possible ways for establish<strong>in</strong>g a useful statisticaltest <strong>of</strong> homogeneity. One <strong>of</strong> <strong>the</strong>m is, to apply <strong>the</strong> Poisson <strong>the</strong>oremonce more. Consider <strong>the</strong> total number <strong>of</strong> mach<strong>in</strong>es that experiencedone, two, three, ... failures. We will show that <strong>the</strong> distribution <strong>of</strong>61


probabilities for those magnitudes can be derived. Introduce a r<strong>and</strong>omvariablef k (ξ) = 1, if ξ i = k; 0, if not, k = 1, 2, 3, ...S<strong>in</strong>ce ξ i is <strong>the</strong> number <strong>of</strong> failures for <strong>the</strong> i-th mach<strong>in</strong>e, <strong>the</strong> number <strong>of</strong>mach<strong>in</strong>es that had k failures is equal to ∑f k (ξ i ), a sum <strong>of</strong> <strong>in</strong>dependentr<strong>and</strong>om variables. For most mach<strong>in</strong>es λ i is near zero, <strong>the</strong>refore, if k ≠0, <strong>the</strong> probabilityP{f k (ξ i ) = 1}is low, <strong>and</strong> <strong>the</strong> sum above roughly obeys <strong>the</strong> Poisson distribution. Itsparameter is derived fromkλiE[ ∑ fk (ξi)] = ∑ E fk (ξi), E fk (ξi) = P{ξ i= k}= ek!ii−λiwhere, provided <strong>the</strong> hypo<strong>the</strong>sis <strong>of</strong> statistical homogeneity is valid, λ i iscalculated as stated above. A simple calculation (Belova et al 1965,1967) <strong>in</strong>dicates that at different values <strong>of</strong> k <strong>the</strong> studied sums are closeto <strong>in</strong>dependent r<strong>and</strong>om variables.How does deviation from statistical homogeneity reveal itself?Some mach<strong>in</strong>es will have a higher breakdown rate <strong>and</strong> experience twoor more failures, o<strong>the</strong>r will deviate <strong>in</strong> <strong>the</strong> opposite sense <strong>and</strong> workfailure-freely. When statistical homogeneity is corrupted, <strong>the</strong> number<strong>of</strong> mach<strong>in</strong>es with two or more failures will <strong>in</strong>crease, <strong>and</strong> will decreasefor those with one failure.The treatment <strong>of</strong> actual data resulted <strong>in</strong> <strong>the</strong> follow<strong>in</strong>g number <strong>of</strong>mach<strong>in</strong>es with 1, 2, 3 <strong>and</strong> 4 failures (l<strong>in</strong>e 1) as compared with <strong>the</strong>correspond<strong>in</strong>g expectations (l<strong>in</strong>e 2).1. 27 10 1 12. 29.6 5.7 1.5 0.44The number <strong>of</strong> mach<strong>in</strong>es with one failure decreased <strong>in</strong>significantly but<strong>of</strong> those with two failures <strong>in</strong>creased noticeably: for <strong>the</strong> Poisson lawwith parameter 5.7 <strong>the</strong> probability <strong>of</strong> 10 or more is 0.065. For k = 3<strong>and</strong> 4 <strong>the</strong> deviations were small.The only deviation worth discuss<strong>in</strong>g is that for mach<strong>in</strong>es hav<strong>in</strong>g 2failures. However, we may consider it maximal for four <strong>in</strong>dependentdeviations <strong>and</strong> <strong>the</strong>n its probability is 1 − (1 – 0.065) 4 ≈ 0.25 so that itsdeviation is not especially significant.Although <strong>the</strong> hypo<strong>the</strong>sis <strong>of</strong> statistical homogeneity had passed ara<strong>the</strong>r rigid test with credit, some shadow <strong>of</strong> doubt is still cast on it.This seems to mean that for most mach<strong>in</strong>es <strong>the</strong> breakdown rate isroughly <strong>the</strong> same but that small groups <strong>of</strong> <strong>the</strong>m it can st<strong>and</strong> out. Awide scatter would have led to an essentially more significant result <strong>of</strong><strong>the</strong> test. [...]It follows that <strong>in</strong> general <strong>the</strong> derived fitt<strong>in</strong>g mean curve p 2 (t) can beapplied for an approximate calculation <strong>of</strong> <strong>the</strong> mean number <strong>of</strong> failures62


<strong>of</strong> various groups <strong>of</strong> mach<strong>in</strong>es, <strong>and</strong> this provides us a test forestimat<strong>in</strong>g <strong>the</strong> reliability <strong>of</strong> <strong>the</strong> <strong>in</strong>sulation. Purely statistical methodscerta<strong>in</strong>ly do not concern <strong>the</strong> improvement <strong>of</strong> that reliability which is atechnological problem. But at least we may say whe<strong>the</strong>r <strong>the</strong> reliability<strong>of</strong> <strong>in</strong>sulation had changed <strong>and</strong> <strong>in</strong> which direction or that it rema<strong>in</strong>ed asit was previously. This is <strong>the</strong> practical significance <strong>of</strong> <strong>the</strong> work donewhich would not be so important had <strong>the</strong> comparatively high statisticalhomogeneity <strong>of</strong> <strong>the</strong> <strong>in</strong>sulation not been established. [...]2.4. The naked eye study. We had assumed that smooth<strong>in</strong>g bypolynomials is usually successful because <strong>the</strong> data for that treatment isselected beforeh<strong>and</strong> by naked eye. It would have been improper to failto mention that physicists <strong>and</strong> eng<strong>in</strong>eers also <strong>of</strong>ten perform <strong>the</strong>smooth<strong>in</strong>g itself by naked eye without apply<strong>in</strong>g <strong>the</strong> method <strong>of</strong> leastsquares. And how do we decide that <strong>the</strong> smooth<strong>in</strong>g <strong>in</strong> a given case wassuccessful? Perhaps because <strong>the</strong> curve derived by least squares passesexactly where it would have been if drawn without apply<strong>in</strong>g thatmethod?An experimental smooth<strong>in</strong>g by naked eye <strong>of</strong> <strong>the</strong> broken l<strong>in</strong>e <strong>in</strong> § 2.2was carried out. Participants were ma<strong>the</strong>maticians, workers at astatistical laboratory, <strong>and</strong> eng<strong>in</strong>eers. Each received a list <strong>of</strong> paper withonly that l<strong>in</strong>e shown [...]. The results achieved by an overwhelm<strong>in</strong>gmajority were very good. Fifteen out <strong>of</strong> sixteen <strong>of</strong> those participat<strong>in</strong>ghad almost completely drawn <strong>the</strong>ir curves between <strong>the</strong> two curves,p 1 (t) <strong>and</strong> p 3 (t) as shown on Fig. 2. [...]I. V. Girsanov, <strong>the</strong> chief <strong>of</strong> one <strong>of</strong> <strong>the</strong> sections <strong>of</strong> <strong>the</strong> statisticallaboratory, achieved <strong>the</strong> best result; he unfortunately perished <strong>in</strong> a latertourist mounta<strong>in</strong> tour. [...] In general, <strong>the</strong> results <strong>of</strong> smooth<strong>in</strong>g bynaked eye are quite comparable <strong>in</strong> precision with <strong>the</strong> method <strong>of</strong> leastsquares. Had we been only <strong>in</strong>terested <strong>in</strong> curve p 2 (t), we could havewell drawn it without any calculations. However, a thorough statisticaltreatment dem<strong>and</strong>s an estimation <strong>of</strong> precision as well for which astatistical model <strong>and</strong> science <strong>in</strong> general are necessary.Thus, when estimat<strong>in</strong>g <strong>the</strong> probability <strong>of</strong> success <strong>in</strong> Bernoulli trials,we turn to frequencies, but for underst<strong>and</strong><strong>in</strong>g how large can <strong>the</strong>deviations <strong>of</strong> frequency from probability be, we should, first, consider<strong>the</strong> trials <strong>in</strong>dependent (<strong>the</strong> statistical model) <strong>and</strong> second, apply <strong>the</strong> DeMoivre – Laplace <strong>the</strong>orem which (however done) is proven <strong>in</strong> acomplicated manner [<strong>in</strong> essence, by <strong>the</strong> former <strong>in</strong> 1733] <strong>and</strong> this isundoubtedly science.When smooth<strong>in</strong>g a broken l<strong>in</strong>e by naked eye, we do not even haveto know <strong>the</strong> number <strong>of</strong> observations used for calculat<strong>in</strong>g its po<strong>in</strong>ts [...]<strong>and</strong> anyway it is impossible to <strong>in</strong>dicate <strong>the</strong> confidence region for <strong>the</strong>curve sought. Here, we need all <strong>the</strong> science connected with <strong>the</strong> method<strong>of</strong> least squares <strong>and</strong> still <strong>the</strong> almost complete co<strong>in</strong>cidence <strong>of</strong> <strong>the</strong> areashown on Fig.2 with that between <strong>the</strong> curves p 1 (t) <strong>and</strong> p 3 (t) dem<strong>and</strong>s tobe somehow expla<strong>in</strong>ed.Note, however, that for small values <strong>of</strong> t that first area is somewhatnarrower than <strong>the</strong> second one whereas that latter, as shown bycalculation, is 1.5 – 2 times narrower <strong>the</strong>re than a thoroughlyconstructed confidence region with <strong>the</strong> usual confidence coefficient <strong>of</strong>0.70 – 0. 95. This means that <strong>the</strong> <strong>in</strong>def<strong>in</strong>iteness <strong>of</strong> <strong>the</strong> naked eye63


smooth<strong>in</strong>g is, however, <strong>in</strong> general less than it is when calculatedaccord<strong>in</strong>g to <strong>the</strong> rules <strong>of</strong> statistics.It occurs because, when decid<strong>in</strong>g by naked eye, we have to do witha given graph, with a result determ<strong>in</strong>ed by r<strong>and</strong>om experiment<strong>in</strong>g; on<strong>the</strong> o<strong>the</strong>r h<strong>and</strong>, when work<strong>in</strong>g by statistical methods, we apply astatistical model <strong>and</strong> <strong>the</strong>refore also cover <strong>the</strong> possible scatter <strong>of</strong> <strong>the</strong>results <strong>of</strong> r<strong>and</strong>om experiments <strong>the</strong>mselves from one <strong>of</strong> <strong>the</strong>irrealizations to ano<strong>the</strong>r one. However, <strong>the</strong> general problem <strong>of</strong> <strong>the</strong> realpossibilities <strong>of</strong> <strong>the</strong> naked eye methods dem<strong>and</strong>s wide experimental<strong>in</strong>vestigation.3. The Theory <strong>of</strong> Stochastic ProcessesHowever beneficial (<strong>in</strong> suitable cases) is <strong>the</strong> method <strong>of</strong> leastsquares, a glance at <strong>the</strong> observational series <strong>of</strong>ten conv<strong>in</strong>ces us that <strong>the</strong>model <strong>of</strong> trend with error can not describe <strong>the</strong> observations, s<strong>in</strong>ce weare unable to isolate by naked eye a determ<strong>in</strong>ate curve withobservational po<strong>in</strong>ts chaotically scattered around it. This is whatSlutsky (1927/1937, p. 105), a co-creator <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> stochasticprocesses, wrote about it:Almost all <strong>of</strong> <strong>the</strong> phenomena <strong>of</strong> economic life, like many o<strong>the</strong>rprocesses, social, meteorological, <strong>and</strong> o<strong>the</strong>rs, occur <strong>in</strong> sequences <strong>of</strong>ris<strong>in</strong>g <strong>and</strong> fall<strong>in</strong>g movements, like waves. Just as waves follow<strong>in</strong>g eacho<strong>the</strong>r on <strong>the</strong> sea do not repeat each o<strong>the</strong>r perfectly, so economic cyclesnever repeat earlier ones exactly ei<strong>the</strong>r <strong>in</strong> duration or <strong>in</strong> amplitude.Never<strong>the</strong>less, <strong>in</strong> both cases, it is almost always possible to detect, even<strong>in</strong> <strong>the</strong> multitude <strong>of</strong> <strong>in</strong>dividual peculiarities <strong>of</strong> <strong>the</strong> phenomena, marks <strong>of</strong>certa<strong>in</strong> approximate uniformities <strong>and</strong> regularities. The eye <strong>of</strong> <strong>the</strong>observer <strong>in</strong>st<strong>in</strong>ctively discovers on waves <strong>of</strong> a certa<strong>in</strong> order o<strong>the</strong>rsmaller waves, so that <strong>the</strong> idea <strong>of</strong> harmonic analysis [...] presents itselfto <strong>the</strong> m<strong>in</strong>d almost spontaneously.The idea <strong>of</strong> harmonic analysis can never<strong>the</strong>less attempted to beachieved by <strong>the</strong> model <strong>of</strong> trend with error. It is done by <strong>the</strong> so-calledmethod <strong>of</strong> periodogram that preceded <strong>the</strong> methods <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong>stochastic processes <strong>and</strong> we will briefly consider it.3.1. The periodogram method. Suppose that our observationsmade at discrete moments <strong>of</strong> time, each second, say, can be describedby <strong>the</strong> modelx t = s<strong>in</strong>(λ 0 t + φ) + δ t , t = 0, 1, ..., n (3.1)where λ 0 is some parameter (circular frequency <strong>of</strong> oscillation), φ, <strong>the</strong>phase <strong>of</strong> oscillation <strong>and</strong> δ i , r<strong>and</strong>om error. Suppose that λ 0 is much lessthan 2π, so that successive observations <strong>of</strong> only one component s<strong>in</strong>(λ 0 t+ φ) would provide a clearly seen s<strong>in</strong>e curve each unit <strong>of</strong> time (whichis much shorter than <strong>the</strong> period <strong>of</strong> oscillation, 2π/λ 0 ). The addition <strong>of</strong>r<strong>and</strong>om errors (suppose, for <strong>the</strong> sake <strong>of</strong> simplicity, <strong>in</strong>dependent) willcerta<strong>in</strong>ly corrupt <strong>the</strong> picture. So how to reconstruct <strong>the</strong> frequency λ 0 ?Multiply our observations x i by s<strong>in</strong>(λt) <strong>and</strong> cos(λt) where λ is avariable, <strong>and</strong> consider <strong>the</strong> sums64


nnA(λ) x s<strong>in</strong> λ t, B(λ) = x cosλ t.= ∑ ∑tt= 1 t=1tIn previous times this calculation for various values <strong>of</strong> fairly many λwas ra<strong>the</strong>r tedious, but computers removed that difficulty. Calculatenow <strong>the</strong> functionC(λ) = A 2 (λ) + B 2 (λ)called periodogram. Formerly, it was imag<strong>in</strong>ed as a function <strong>of</strong> <strong>the</strong>period, 2π/λ, ra<strong>the</strong>r than <strong>of</strong> frequency λ, which expla<strong>in</strong>s <strong>the</strong> orig<strong>in</strong> <strong>of</strong>that term.The ma<strong>in</strong> statement is that, given a sufficiently large number <strong>of</strong>observations n, <strong>the</strong> periodogram as a function <strong>of</strong> λ will take a clearlyexpressed maximal value <strong>in</strong> a small vic<strong>in</strong>ity <strong>of</strong> <strong>the</strong> real frequency λ 0 . If<strong>the</strong> determ<strong>in</strong>ate part <strong>of</strong> <strong>the</strong> observations consists <strong>of</strong> several harmonicss<strong>in</strong>(λ 0 t + φ) ra<strong>the</strong>r than one; that is, ifkx = ∑ A s<strong>in</strong>(λ t + φ ) + δ ,(3.2)t j j j tj=0<strong>the</strong>n <strong>the</strong> periodogram will have several maximal values situated closeto λ 0 , ..., λ k . Their heights will depend on <strong>the</strong> number <strong>of</strong> observations,n, <strong>and</strong> amplitudes, A j . When not know<strong>in</strong>g beforeh<strong>and</strong> <strong>the</strong> number <strong>of</strong>harmonics <strong>and</strong> <strong>the</strong> variances σ 2 <strong>of</strong> <strong>the</strong> errors δ i , we f<strong>in</strong>d ourselves <strong>in</strong> ara<strong>the</strong>r difficult situation. The periodogram generally has very manylocal maxima <strong>and</strong> it is <strong>in</strong>comprehensible how to <strong>in</strong>terpret <strong>the</strong>m, ei<strong>the</strong>ras really correspond<strong>in</strong>g to latent periods λ j or as occurr<strong>in</strong>g purelyr<strong>and</strong>omly 5 .These difficulties can be somehow overcome. It is worse that as arule <strong>the</strong>re is no guarantee that model (3.2) is valid. We can likely besure that it is not. For example, when study<strong>in</strong>g series <strong>of</strong> observationstaken from economics, it is seen by naked eye that <strong>the</strong>y ra<strong>the</strong>rsmoothly depend on time (<strong>the</strong>y rarely change from <strong>in</strong>creas<strong>in</strong>g todecreas<strong>in</strong>g or vice versa) which should not occur for observationsrepresented by model (3.2): <strong>the</strong>y ought to be scattered around <strong>the</strong>smooth curve, <strong>the</strong> ma<strong>in</strong> term at <strong>the</strong> right side <strong>of</strong> (3.2). We maycerta<strong>in</strong>ly assume that that curve itself badly corresponds to our idea <strong>of</strong>a smooth curve <strong>and</strong> that its roughness compensated r<strong>and</strong>om scatter, butan unreasonably large number <strong>of</strong> harmonics is needed for that tohappen.The most important problem <strong>the</strong>refore consists <strong>in</strong> determ<strong>in</strong><strong>in</strong>g howreasonable are <strong>the</strong> results provided by <strong>the</strong> periodogram method when<strong>the</strong> model (3.2) is wrong <strong>and</strong> <strong>the</strong> observational series {x} is describedby some o<strong>the</strong>r model. The generally known English statistician M. G.Kendall [Sir Maurice Kendall] carried out such experimental studiesdescribed <strong>in</strong> his ra<strong>the</strong>r rare book (1946) on ma<strong>the</strong>matical statistics.This small contribution is one <strong>of</strong> <strong>the</strong> most remarkable books onma<strong>the</strong>matical statistics. Its epigraph is curious:65


To George Udny YuleTo borrow a strik<strong>in</strong>g illustration from Abraham Tucker, <strong>the</strong>substructure <strong>of</strong> our convictions is not so much to be compared to <strong>the</strong>solid foundations <strong>of</strong> an ord<strong>in</strong>ary build<strong>in</strong>g, as to <strong>the</strong> piles <strong>of</strong> <strong>the</strong> houses<strong>of</strong> Rotterdam which rest somehow <strong>in</strong> a deep bed <strong>of</strong> s<strong>of</strong>t mud.J. A. Venn, The Logic <strong>of</strong> Chance [1886] 6We (§ 1.1) stated that sciences <strong>of</strong> perfum<strong>in</strong>g do emerge [...] flourish[...] but do not live long. This had <strong>in</strong>deed happened to <strong>the</strong> periodogrammethod which was ru<strong>in</strong>ed <strong>in</strong> particular by Kendall (1946). Heconsiders <strong>the</strong> model <strong>of</strong> autoregression (we will soon deal with it)which is as applicable as model (3.2) if not to a greater extent toanalyz<strong>in</strong>g series <strong>in</strong> economics. And <strong>in</strong> case <strong>of</strong> that model <strong>the</strong>periodogram method isolates frequencies that have absolutely noth<strong>in</strong>g<strong>in</strong> common with its structure. Kendall concludes his op<strong>in</strong>ion about thatmethod <strong>in</strong> a brief sentence: As mislead<strong>in</strong>g as it could be.It seems <strong>in</strong> particular that exactly <strong>in</strong> <strong>the</strong> same way Kendall regards<strong>the</strong> works <strong>of</strong> <strong>the</strong> renown English economist Beveridge who iscelebrated due to his compilation <strong>and</strong> analysis by <strong>the</strong> periodogrammethod a few long series <strong>in</strong> economics, for example <strong>of</strong> cost <strong>of</strong> wheat<strong>in</strong> Europe cover<strong>in</strong>g 370 years. It could have been <strong>in</strong>terest<strong>in</strong>g to know<strong>the</strong> considerations that had guided him while compil<strong>in</strong>g that series <strong>and</strong>whe<strong>the</strong>r it was done properly, but this is likely impossible. Beveridgecompiled a periodogram <strong>and</strong> isolated many periods <strong>in</strong> his series whichare likely senseless.3.2. Stochastic processes. Nowadays correlation <strong>and</strong> spectral<strong>the</strong>ories <strong>of</strong> stationary stochastic processes are applied <strong>in</strong>stead <strong>of</strong>periodograms. A stochastic process is a function <strong>of</strong> variable t <strong>of</strong>ten butnot necessarily play<strong>in</strong>g <strong>the</strong> part <strong>of</strong> time <strong>and</strong> <strong>of</strong> an elementary r<strong>and</strong>omevent ω. We will denote a stochastic process <strong>in</strong> an abbreviated form asξ i leav<strong>in</strong>g aside <strong>the</strong> r<strong>and</strong>om argument ω s<strong>in</strong>ce <strong>the</strong> functionaldependence <strong>of</strong> <strong>the</strong> stochastic process on w is never considered <strong>in</strong>applications. Had we desired to describe clearly <strong>the</strong> space <strong>of</strong>elementary events, Ω = {ω}, <strong>the</strong> separate elementary events wouldhave been as a rule extremely complicated. Thus, a separateelementary event is <strong>of</strong>ten understood as a function ω = ω(t) <strong>of</strong>argument t.In that case, <strong>the</strong> value <strong>of</strong> <strong>the</strong> stochastic process at moment t <strong>and</strong>elementary event ω is ω(t) which is a tautology pure <strong>and</strong> simple <strong>and</strong>practically does not lead anywhere. Such an underst<strong>and</strong><strong>in</strong>g isnecessary for develop<strong>in</strong>g an axiomatic <strong>the</strong>ory but it is not practicallyapplicable. Applications <strong>in</strong>variably discuss only distributions <strong>of</strong>probabilities <strong>of</strong> process ξ i at some moments t 1 , t 2 , ..., t n . Two cases arepossible with time t tak<strong>in</strong>g discrete values (observations are made atdiscrete moments <strong>of</strong> time) or cont<strong>in</strong>uous values on some <strong>in</strong>terval.The concept <strong>of</strong> stochastic process with cont<strong>in</strong>uous time dem<strong>and</strong>s tobe very cautiously treated. When underst<strong>and</strong><strong>in</strong>g <strong>the</strong> relevantma<strong>the</strong>matical <strong>the</strong>orems too seriously, <strong>the</strong> realizations <strong>of</strong>ten acquireparadoxical properties able to direct <strong>the</strong> researcher’s m<strong>in</strong>d along awrong route. I [i] mentioned <strong>the</strong> paradox concerned with <strong>the</strong> property66


<strong>of</strong> <strong>the</strong> ma<strong>the</strong>matical model <strong>of</strong> <strong>the</strong> Brownian motion allow<strong>in</strong>g todeterm<strong>in</strong>e precisely <strong>the</strong> coefficient <strong>of</strong> diffusion given observations <strong>of</strong> ahowever small <strong>in</strong>terval <strong>of</strong> a realization <strong>of</strong> that motion. He who believesthat this is <strong>in</strong>deed true for a physical Brownian motion will be wrong.Here is ano<strong>the</strong>r such example. Any broadcast<strong>in</strong>g station istransmitt<strong>in</strong>g over a waveb<strong>and</strong> <strong>of</strong> restricted width. If a radio signal isconsidered as a stochastic process, its spectrum will be conta<strong>in</strong>ed <strong>in</strong>that f<strong>in</strong>ite <strong>in</strong>terval. And <strong>the</strong>re exists a ma<strong>the</strong>matical <strong>the</strong>orem stat<strong>in</strong>gthat with probability 1 a realization ξ i <strong>of</strong> such a process is an analyticalfunction <strong>of</strong> t. Consequently, after listen<strong>in</strong>g for any however short<strong>in</strong>terval <strong>of</strong> time, we may unambiguously establish what was <strong>and</strong> whatwill be broadcast, an obviously absurd conclusion.It is certa<strong>in</strong>ly easy to <strong>in</strong>dicate <strong>the</strong> mistake here. First, a broadcast isnot a stochastic process s<strong>in</strong>ce it is not an element <strong>of</strong> some statisticalensemble; second, an analytical function can be reconstructed given itsvalues on any <strong>in</strong>terval only if <strong>the</strong>y are given absolutely preciselywhich is impossible for a function <strong>of</strong> a cont<strong>in</strong>uous variable. Even as<strong>in</strong>gle number can not be written down precisely, much less a totality<strong>of</strong> an <strong>in</strong>f<strong>in</strong>itely many numbers. Third, a radio signal is not an analyticalfunction <strong>of</strong> time because <strong>in</strong> <strong>the</strong> 19 th century <strong>the</strong>re were no broadcast<strong>in</strong>gstations whereas an analytical function vanish<strong>in</strong>g on some <strong>in</strong>tervalvanishes everywhere.A digression about <strong>the</strong> concept <strong>of</strong> function <strong>in</strong> ma<strong>the</strong>matics is <strong>in</strong>order here 7 . At <strong>the</strong> emergence <strong>of</strong> ma<strong>the</strong>matical analysis it was usuallyunderstood as a formula determ<strong>in</strong><strong>in</strong>g a dependence y = y(x). And allfunctions except at a few po<strong>in</strong>ts were cont<strong>in</strong>uous <strong>and</strong> differentiable.The problem concern<strong>in</strong>g <strong>the</strong> pro<strong>of</strong> <strong>of</strong> differentiability did not evenexist. Later, however, <strong>in</strong> <strong>the</strong> 19 th century an idea was established that afunction is simply a relation between <strong>the</strong> sets <strong>of</strong> values <strong>of</strong> <strong>the</strong> argumentx <strong>and</strong> <strong>the</strong> function y = y(x). It is usually dem<strong>and</strong>ed that exactly onevalue <strong>of</strong> y corresponded to each value <strong>of</strong> x, but that <strong>the</strong> <strong>in</strong>verse was notnecessarily true. And <strong>the</strong>re was no cause for an arbitrarycorrespondence y = y(x) to be cont<strong>in</strong>uous or differentiable.It is ra<strong>the</strong>r difficult but <strong>the</strong>refore <strong>in</strong>terest<strong>in</strong>g to provide an example<strong>of</strong> a cont<strong>in</strong>uous but nowhere differentiable function. The first suchexample was due to Weierstrass, later o<strong>the</strong>r <strong>and</strong> more simple exampleswere discovered. Such objects proved very <strong>in</strong>terest<strong>in</strong>g forma<strong>the</strong>maticians <strong>and</strong> to <strong>the</strong>m <strong>the</strong>ir attention had been to a large extentswung. For us, it is especially <strong>in</strong>terest<strong>in</strong>g that ma<strong>the</strong>maticalconsiderations concern<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory <strong>of</strong> stochastic processes lead to<strong>the</strong> realization <strong>of</strong> many such processes which should be recognized ascont<strong>in</strong>uous but not differentiable functions (or functions only twice,say, differentiable with a cont<strong>in</strong>uous but not anymore differentiablesecond derivative).This was <strong>in</strong>deed joyful because it apparently proved that nondifferentiablefunctions <strong>in</strong>deed existed <strong>in</strong> nature. However, we wish tocast a shadow on that joy: it is absolutely absurd to believe that such afunction can be experimentally observed. Such a realization <strong>of</strong> astochastic process can not be given ei<strong>the</strong>r by a formula, or a table, agraph, or an algorithm <strong>of</strong> calculation. When consider<strong>in</strong>g it <strong>in</strong>deed real,exactly known at all <strong>of</strong> its po<strong>in</strong>ts, we will be able to come to absurd67


conclusions. The same concerns realizations <strong>of</strong> such processes whichshould be analytic functions. Here, we will discuss stochasticprocesses with discrete time which do not tacitly conta<strong>in</strong> suchparadoxes as processes with cont<strong>in</strong>uous time <strong>and</strong> <strong>in</strong> general we mayusually state that <strong>the</strong> observed values ξ i are precisely known.One remark concern<strong>in</strong>g term<strong>in</strong>ology. In Russian literature, <strong>the</strong> termtime series usually denotes a stochastic (<strong>and</strong> <strong>of</strong>ten, a stationarystochastic) process, see its def<strong>in</strong>ition <strong>in</strong> Chapter 1. In <strong>the</strong> Englishliterature, however, <strong>the</strong> same term denotes <strong>the</strong> values <strong>of</strong> any variable<strong>in</strong>clud<strong>in</strong>g non-r<strong>and</strong>om ones depend<strong>in</strong>g on time <strong>and</strong> observed at itsdiscrete moments. Here, we call such objects observational series <strong>and</strong>will not apply <strong>the</strong> term time series but ra<strong>the</strong>r ei<strong>the</strong>r stochastic process(when r<strong>and</strong>omness is supposed to exist) or observational series (whenit can exist or not). Because <strong>of</strong> causes described <strong>in</strong> Chapter 1,stationary stochastic processes are play<strong>in</strong>g <strong>the</strong> ma<strong>in</strong> part.The concept <strong>of</strong> stochastic process allows us to imag<strong>in</strong>e a jo<strong>in</strong>tdistribution <strong>of</strong> r<strong>and</strong>om variables ξ i although usually <strong>the</strong> discussion isonly restricted to bivariate distributions, <strong>and</strong> only <strong>the</strong> expectation <strong>and</strong><strong>the</strong> correlation functionm(t) = Eξ i , B(s, t) = E[(ξ s − m(s)] [(ξ t − m(t)]are studied. For stationary processes distributions <strong>of</strong> probabilities donot change <strong>in</strong> time, som(t) = Eξ i = m (3.3)does not depend on t <strong>and</strong> <strong>the</strong> correlation function only depends on <strong>the</strong>difference <strong>of</strong> <strong>the</strong> arguments, (t – s):B(s, t) = B(t – s). (3.4)A process only satisfy<strong>in</strong>g conditions (3.3) <strong>and</strong> (3.4) is calledstationary <strong>in</strong> <strong>the</strong> wide sense. Exactly this is <strong>the</strong> ma<strong>in</strong> concept withwhich modern ma<strong>the</strong>matical statistics is advis<strong>in</strong>g to approachobservational series. The <strong>the</strong>ory <strong>of</strong> stochastic processes only deal<strong>in</strong>gwith mean value <strong>and</strong> correlation function is called correlation <strong>the</strong>ory.We will consider it now.3.3. Correlation <strong>and</strong> spectral <strong>the</strong>ories. The ma<strong>in</strong> achievement <strong>of</strong><strong>the</strong> general <strong>the</strong>ory <strong>of</strong> stationary stochastic processes is <strong>the</strong> <strong>the</strong>oremestablish<strong>in</strong>g that <strong>in</strong> <strong>the</strong> general case <strong>the</strong> correlation function can berepresented asπB( s, t) = B( t − s) = ∫ cos λ( t − s) dF(λ)(3.5)−πwhere F(λ) is a restricted non-decreas<strong>in</strong>g function. It is usuallypresumed that <strong>the</strong>re exists a spectral density, i. e., such function f(x) ≥0 that68


dF(λ) = f (λ) dλ, so that B( t − s) = cosλ( t − s) f (λ) dλ.π∫−πSpectral analysis, that is, an experimental determ<strong>in</strong>ation <strong>of</strong> <strong>the</strong>spectral density f(λ), is <strong>the</strong>refore sometimes expla<strong>in</strong>ed as <strong>the</strong>determ<strong>in</strong>ation <strong>of</strong> <strong>the</strong> variances <strong>of</strong> <strong>the</strong> separate r<strong>and</strong>om components <strong>of</strong><strong>the</strong> process. For practically apply<strong>in</strong>g <strong>the</strong> correlation or spectral <strong>the</strong>oryit is necessary, first, to f<strong>in</strong>d out <strong>the</strong> practical conclusions possible from<strong>the</strong> correlation function or spectrum (spectral density); <strong>and</strong>, second, tobe able to estimate <strong>the</strong> correlation function (or spectral density) byobservations.That correlation function is normally applied <strong>in</strong> statistical problems.For example, <strong>the</strong> variance <strong>of</strong> <strong>the</strong> arithmetic mean ξ is expressedthrough <strong>the</strong> sum <strong>of</strong> paired covariations, i. e., through a correlationfunction. It can also be expressed through <strong>the</strong> spectral density.However, an estimate <strong>of</strong> a spectrum, or <strong>of</strong> a correlation function, issometimes applied as a magic remedy allegedly mak<strong>in</strong>g it possible topenetrate <strong>the</strong> essence <strong>of</strong> <strong>the</strong> observed process. It should be clearlyimag<strong>in</strong>ed that <strong>the</strong> correlation <strong>the</strong>ory generally deals with suchcharacteristics that are far from determ<strong>in</strong><strong>in</strong>g <strong>the</strong> process as a whole <strong>and</strong><strong>of</strong>ten only provides a superficial <strong>in</strong>formation about it. If we are<strong>in</strong>terested <strong>in</strong> some problem <strong>of</strong> its structure, we must be able t<strong>of</strong>ormulate it <strong>in</strong> terms <strong>of</strong> <strong>the</strong> correlation <strong>the</strong>ory while bear<strong>in</strong>g <strong>in</strong> m<strong>in</strong>dthat usually we do not know precisely ei<strong>the</strong>r <strong>the</strong> correlation function or<strong>the</strong> spectral density but estimate <strong>the</strong>m by observations. We should thusconsider comparatively rough characteristics determ<strong>in</strong>able by issu<strong>in</strong>gfrom non-precise data.For example, <strong>the</strong>re exists <strong>the</strong> so-called method <strong>of</strong> canonicalexpansion whose application dem<strong>and</strong>s <strong>the</strong> knowledge <strong>of</strong> <strong>the</strong>eigenfunctions <strong>of</strong> an <strong>in</strong>tegral equation <strong>in</strong> which a correlation function<strong>of</strong> a process is <strong>in</strong>cluded as a series. This method ought to berecognized as practically hopeless because <strong>the</strong> <strong>in</strong>accuracy <strong>of</strong> <strong>the</strong>equation’s kernel very essentially <strong>in</strong>fluences <strong>the</strong> eigenfunctions. I donot know about any practical application <strong>of</strong> that method. All so-calledapplications issue from arbitrarily given correlation functions <strong>and</strong> donot deal with statistical material.The estimation <strong>of</strong> <strong>the</strong> correlation function <strong>and</strong> spectrum is ra<strong>the</strong>rcomplicated. At first you should estimate <strong>and</strong> subtract <strong>the</strong> mean valuem <strong>of</strong> <strong>the</strong> process. Its estimate is <strong>the</strong> arithmetic mean mˆ = ξ . Theestimate <strong>of</strong>B(u) = Eξ t ξ t+n – m 2will beˆ 1B u m u nn−u2( ) = ∑ ξ ξ ˆt t+u− , = 0, 1, ..., −1.n − u t=169


It possesses a number <strong>of</strong> unpleasant properties. First, for an ergodicprocess <strong>the</strong> actual values <strong>of</strong> B(u) rapidly decrease with an <strong>in</strong>crease <strong>of</strong>u. However, <strong>the</strong> st<strong>and</strong>ard deviations <strong>of</strong> <strong>the</strong> estimates B ˆ( u ) are roughly<strong>the</strong> same for any u <strong>and</strong> have order 1/ n − u.Thus, for u <strong>of</strong> <strong>the</strong> order<strong>of</strong> a few dozen <strong>the</strong> magnitudes B(u) <strong>the</strong>mselves are very small, onlyhundredth <strong>and</strong> thous<strong>and</strong>th parts <strong>of</strong> B(0) whereas <strong>the</strong> st<strong>and</strong>arddeviations (if n is not too large), tenth parts <strong>of</strong> B(0) so that <strong>the</strong> estimateis senseless.Second, <strong>the</strong>se estimates B ˆ( u ) when <strong>the</strong> values u are close to eacho<strong>the</strong>r are not scattered chaotically near <strong>the</strong> real values because <strong>the</strong>neighbour<strong>in</strong>g estimates B ˆ( u ) , B ˆ( u + 1) , B ˆ( u + 2) , ... are correlatedwith each o<strong>the</strong>r. When look<strong>in</strong>g at a graph <strong>of</strong> <strong>the</strong>ir values <strong>the</strong> eyeautomatically selects ra<strong>the</strong>r regular oscillations, see Fig. 3, atunreasonably large values <strong>of</strong> u where actually B(u) can not bedist<strong>in</strong>guished from zero. Therefore, when estimat<strong>in</strong>g <strong>the</strong> correlationfunction we can not trust our eyes <strong>and</strong> all our actions becomeuncerta<strong>in</strong>.The estimation <strong>of</strong> <strong>the</strong> spectral density f(λ) is preferable. Whenestimat<strong>in</strong>g it at po<strong>in</strong>ts λ = λ 1 , λ 2 , ..., λ m not too close to each o<strong>the</strong>r, <strong>the</strong>respective estimates f ˆ(λ i) will be almost <strong>in</strong>dependent r<strong>and</strong>omvariables, a fact first discovered by Slutsky. For estimat<strong>in</strong>g <strong>the</strong> spectraldensity we apply <strong>the</strong> same periodogram only suitably normed. It ishowever very <strong>in</strong>dent because its variance does not tend to vanish as <strong>the</strong>number <strong>of</strong> observations <strong>in</strong>creases. Therefore <strong>the</strong> periodogram issmoo<strong>the</strong>d, i. e. a mean value with some weight is taken 8 <strong>and</strong> we obta<strong>in</strong>an estimate not <strong>of</strong> <strong>the</strong> spectral density itself but <strong>of</strong> <strong>the</strong> functionresult<strong>in</strong>g from tak<strong>in</strong>g its mean with <strong>the</strong> same weight. This means that<strong>the</strong> <strong>in</strong>terval <strong>of</strong> tak<strong>in</strong>g <strong>the</strong> mean should be small. However, thatprocedure when a small <strong>in</strong>terval is chosen will little decrease <strong>the</strong>variance <strong>of</strong> <strong>the</strong> periodogram. Practical recommendations are here aresult <strong>of</strong> a compromise between <strong>the</strong>se contradictory dem<strong>and</strong>s.I can not go <strong>in</strong>to details <strong>of</strong> ma<strong>the</strong>matical tricks <strong>and</strong> I ought to saythat textbooks on <strong>the</strong> <strong>the</strong>ory <strong>of</strong> stochastic processes do not usuallydescribe <strong>the</strong> estimation <strong>of</strong> <strong>the</strong> correlation function or spectral density<strong>in</strong> any scientific manner. As I noted above, textbooks prefer to issuefrom a stochastic process given along with its correlation function.As a very reliable source <strong>of</strong> <strong>in</strong>formation concern<strong>in</strong>g statisticalproblems, I can cite Hannan (1960). This book is, however, veryconcise <strong>and</strong> difficult to read. Jenk<strong>in</strong>s & Watts [1971 – 1972] is easierto read, but less reliable. For example, <strong>the</strong>y do not say sufficientlyclearly that none <strong>of</strong> <strong>the</strong> provided formulas for <strong>the</strong> variances <strong>of</strong> <strong>the</strong>estimates <strong>of</strong> <strong>the</strong> correlation function <strong>and</strong> spectral density is at allapplicable to each stationary process; some strong conditionsma<strong>the</strong>matically express<strong>in</strong>g <strong>the</strong> property <strong>of</strong> ergodicity are necessary.Never<strong>the</strong>less, that book is usable although regrettably <strong>the</strong>ir practicalexamples should be studied very critically.I wish to warn <strong>the</strong> reader who will study <strong>the</strong> sources <strong>in</strong>dicated that<strong>the</strong> <strong>in</strong>itial material on which <strong>the</strong> methods <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> stochasticprocesses had been developed mostly consisted <strong>of</strong> economic data,70


usually very little <strong>of</strong> <strong>the</strong>m. Indeed, we may trace <strong>the</strong> change <strong>of</strong> someeconomic <strong>in</strong>dicator over decades or at best over a few centuries (as <strong>in</strong><strong>the</strong> case <strong>of</strong> <strong>the</strong> Beveridge series). A year usually means oneobservation (o<strong>the</strong>rwise seasonal periodicity which we should somehowdeal with will <strong>in</strong>terfere, <strong>and</strong> <strong>in</strong> general most economic <strong>in</strong>dicators arecalculated on a yearly basis). We <strong>the</strong>refore have tens or hundreds <strong>of</strong>observations whereas calculations show that for a reliable estimate <strong>of</strong><strong>the</strong> correlation function or spectral density we need thous<strong>and</strong>s <strong>and</strong> tens<strong>of</strong> thous<strong>and</strong>s <strong>of</strong> <strong>the</strong>m. Already Kendall (1946) formulated thisconclusion <strong>in</strong> respect <strong>of</strong> <strong>the</strong> former.As a result, ma<strong>the</strong>maticians attempt to atta<strong>in</strong> someth<strong>in</strong>g by select<strong>in</strong>gan optimal method <strong>of</strong> smooth<strong>in</strong>g periodograms, but with a smallnumber <strong>of</strong> observations this method is generally hopeless. The realapplicability <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> stochastic processes is <strong>in</strong> <strong>the</strong> spherewhere any number <strong>of</strong> observations is available. Radio physicists havelong ago developed methods allow<strong>in</strong>g easily <strong>and</strong> simply to obta<strong>in</strong>estimates <strong>of</strong> <strong>the</strong> spectrum <strong>of</strong> a stochastic process if unnecessary toeconomize on <strong>the</strong> number <strong>of</strong> observations. They apply systems <strong>of</strong>filters separat<strong>in</strong>g b<strong>and</strong>s <strong>of</strong> frequencies (Mon<strong>in</strong> & Jaglom 1967, pt. 2).3.4. A survey <strong>of</strong> practical applications. Among <strong>the</strong> creators <strong>of</strong> <strong>the</strong><strong>the</strong>ory <strong>of</strong> stochastic processes who had also dealt with statisticalmaterials we should mention Yale, Slutsky <strong>and</strong> M. G. Kendall (<strong>and</strong>most important are Kolmogorov’s contributions, see below). Thoseworks had appeared even before World War II, that is, when automaticmeans <strong>of</strong> treat<strong>in</strong>g <strong>the</strong> material were unavailable, <strong>and</strong> <strong>the</strong>se pioneershad to work with hundreds <strong>of</strong> observations at <strong>the</strong> most.Thous<strong>and</strong>s <strong>and</strong> tens <strong>of</strong> thous<strong>and</strong>s are needed <strong>in</strong> <strong>the</strong> correlation<strong>the</strong>ory because we are attempt<strong>in</strong>g to f<strong>in</strong>d out too much, not an estimate<strong>of</strong> one or a few parameters, but <strong>in</strong>f<strong>in</strong>itely many magnitudes B(u), u =0, 1, 2, ... i. e. <strong>the</strong> correlation function (or spectral density, <strong>the</strong> functionf(λ) for λ tak<strong>in</strong>g values on [− π, π]).We can choose ano<strong>the</strong>r approach for achiev<strong>in</strong>g practically effectivemethods <strong>of</strong> correlation <strong>the</strong>ory when hav<strong>in</strong>g a small number <strong>of</strong>observations, namely, look<strong>in</strong>g for models depend<strong>in</strong>g on a smallnumber <strong>of</strong> parameters. Slutsky provided one such model, <strong>the</strong> model <strong>of</strong>mov<strong>in</strong>g average. Imag<strong>in</strong>e an <strong>in</strong>f<strong>in</strong>ite sequence <strong>of</strong> <strong>in</strong>dependentidentically distributed r<strong>and</strong>om variables... ξ −1 , ξ 0 , ξ 1 , ..., ξ n , ...<strong>in</strong>stead <strong>of</strong> which we observe <strong>the</strong> sequence... ς −1 , ς 0, ς 1, .. , ς n, ... (3.6)wheremς = ∑ α ξ .(3.7)n k n−kk = 0In o<strong>the</strong>r words, ς n is a sum <strong>of</strong> some number <strong>of</strong> <strong>in</strong>dependentmagnitudes ξ n−k multiplied by suitable α k . Slutsky modelled <strong>the</strong> system71


{ξ n }; for obta<strong>in</strong><strong>in</strong>g ς n he superimposed a frame with a w<strong>in</strong>dow throughwhich ξ n , ξ n−1 , ..., ξ n−k were seen. For obta<strong>in</strong><strong>in</strong>g ξ n+1 <strong>the</strong> frame wasmoved one step to <strong>the</strong> right, hence <strong>the</strong> term, mov<strong>in</strong>g average. Numbersα 0 , ..., α m were parameters.He showed that his model could provide a picture <strong>of</strong> wavyoscillations very similar to oscillations <strong>of</strong> economic <strong>in</strong>dicators.However, he did not state that all <strong>the</strong> statistical properties <strong>of</strong> someobservational series taken from practice are thus described. As far as Iknow, no such examples are provided <strong>in</strong> careful statistical works.It is necessary to say here that statistical work with observationalseries dem<strong>and</strong>s versatile statistical checks <strong>of</strong> <strong>the</strong> adopted model.Slutsky, as well as <strong>the</strong> representatives <strong>of</strong> <strong>the</strong> serious English schoolsuch as Yule <strong>and</strong> Kendall 9 understood it perfectly well but this attitudeis now regrettably lost, certa<strong>in</strong>ly if hav<strong>in</strong>g <strong>in</strong> m<strong>in</strong>d an average work onapplications <strong>of</strong> stochastic processes.A conviction that <strong>the</strong>se processes must be universally applicable ischaracteristic for <strong>the</strong> bulk <strong>of</strong> publications <strong>and</strong>, as a result, <strong>the</strong> ma<strong>in</strong>premises with <strong>the</strong> most important <strong>of</strong> <strong>the</strong>m, that <strong>the</strong> phenomenon itselfshould be <strong>of</strong> a statistical ra<strong>the</strong>r than <strong>of</strong> just an <strong>in</strong>determ<strong>in</strong>ate essence,are not checked at all. A current <strong>of</strong> publications thus appears which donot deserve to be seriously considered at all. It is a fact that only a fewworks are left for be<strong>in</strong>g seriously analyzed.Among <strong>the</strong>se latter we mention first <strong>of</strong> all Yule (1927). He studies<strong>the</strong> change <strong>of</strong> <strong>the</strong> number <strong>of</strong> solar spots <strong>in</strong> time. First <strong>of</strong> all Yulerejects <strong>the</strong> model <strong>of</strong> periodogram because <strong>in</strong> that case r<strong>and</strong>omness isonly <strong>in</strong>herent <strong>in</strong> <strong>the</strong> errors <strong>of</strong> our measurements <strong>and</strong> does not at all<strong>in</strong>fluence <strong>the</strong> course <strong>of</strong> <strong>the</strong> process itself. He remarks that we ought tohave some such model <strong>in</strong> which a r<strong>and</strong>om <strong>in</strong>terference <strong>in</strong>fluences <strong>the</strong>subsequent behaviour <strong>of</strong> <strong>the</strong> process.Imag<strong>in</strong>e for example that we observe <strong>the</strong> oscillation <strong>of</strong> a pendulumbut that naughty boys have begun to shoot it with peas. Each r<strong>and</strong>omhit changes its velocity <strong>and</strong> <strong>the</strong>refore <strong>in</strong>fluences <strong>the</strong> entire subsequentprocess. It is difficult to expect here statistically homogeneousshoot<strong>in</strong>g, but <strong>in</strong> real processes, such as solar activity or economic lifestatistical homogeneity <strong>of</strong> r<strong>and</strong>om <strong>in</strong>terference sometimes possiblyexists.Let us observe <strong>the</strong> position <strong>of</strong> <strong>the</strong> pendulum at discrete moments <strong>of</strong>time (but sufficiently <strong>of</strong>ten, so that many observations will occurdur<strong>in</strong>g one period <strong>of</strong> <strong>the</strong> <strong>in</strong>itial oscillations). We will obta<strong>in</strong> a sequence<strong>of</strong> observationsξ 0 , ξ 0 , ..., ξ n , ...<strong>and</strong> Yule supposes that it can be described by a model <strong>of</strong> <strong>the</strong> typeξ n + aξ n−1 + bξ n−2 = δ n (3.8)where a <strong>and</strong> b are numbers (parameters <strong>of</strong> <strong>the</strong> model), {δ n }, asequence <strong>of</strong> identically distributed <strong>in</strong>dependent r<strong>and</strong>om variables suchthat δ n does not depend on ξ n−1 , ξ n−2 , ..., Eδ n = 0 <strong>and</strong> σ 2 = varδ n is <strong>the</strong>third parameter <strong>of</strong> <strong>the</strong> model.72


This is <strong>the</strong> celebrated model <strong>of</strong> autoregression (<strong>of</strong> <strong>the</strong> second order)which was applied by many statisticians deserv<strong>in</strong>g complete trust.Yule’s considerations lead<strong>in</strong>g to model (3.8) were, however, not quiteclear. In particular, for <strong>the</strong> case <strong>of</strong> <strong>the</strong> pendulum, {δ n } is not asequence <strong>of</strong> <strong>in</strong>dependent r<strong>and</strong>om variables but is ra<strong>the</strong>r describable bySlutsky’s mov<strong>in</strong>g average. However, <strong>in</strong>troduc<strong>in</strong>g additionalparameters <strong>of</strong> that average <strong>in</strong>to <strong>the</strong> model will mean hav<strong>in</strong>g too manyparameters <strong>and</strong> extremely complicated work <strong>in</strong> its application.Yule’s mistake certa<strong>in</strong>ly does not logically prove that <strong>the</strong> modelis not applicable to sunspots or some economic <strong>in</strong>dicator, but <strong>of</strong> courseit is a bad omen.Descartes noted that <strong>the</strong> world can be expla<strong>in</strong>ed <strong>in</strong> many differentmanners <strong>and</strong> <strong>the</strong> problem only is, to choose that which is really valid.Most chances to be valid certa<strong>in</strong>ly has that manner which is <strong>the</strong> mostnatural <strong>and</strong> harmonious <strong>and</strong> does not conta<strong>in</strong> contradictions. If,however, it occurs that <strong>the</strong> creator <strong>of</strong> a <strong>the</strong>ory committed a mistake at<strong>the</strong> very outset, even if only concern<strong>in</strong>g a particular case, our chances<strong>of</strong> success <strong>in</strong> o<strong>the</strong>r cases will sharply dim<strong>in</strong>ish.As to sunspots, Yule himself did not achieve a decisive positiveresult. He was compelled to change his model (3.8) by assum<strong>in</strong>g thatwe observe not <strong>the</strong> variables {ξ n } <strong>the</strong>mselves, but that our observationswere corrupted by an additional r<strong>and</strong>om error. He had to make thischange because his model did not pass a statistical check to which hesubjected it, as was supposed to be done. The change <strong>of</strong> <strong>the</strong> modelallows to make ends meet but <strong>in</strong> statistics <strong>in</strong>troduc<strong>in</strong>g an additionalparameter is very bad.In general, Yule’s contribution (1927) is an example <strong>of</strong> a statisticalmasterpiece which, however, provided a dubious (if not negative)result <strong>of</strong>ten happen<strong>in</strong>g exactly with masterpieces.The <strong>in</strong>terest emerged <strong>in</strong> forecast<strong>in</strong>g stochastic processes led ano<strong>the</strong>rrepresentative <strong>of</strong> <strong>the</strong> English school, Moran (1954), to study <strong>the</strong>possibilities <strong>of</strong> apply<strong>in</strong>g model (3.8) for predict<strong>in</strong>g solar activity. S<strong>in</strong>ceδ n does not depend on <strong>the</strong> previous behaviour <strong>of</strong> <strong>the</strong> process, that is, onvariables ξ n−1 , ξ n−2 , ..., <strong>the</strong> best possible method <strong>of</strong> forecast<strong>in</strong>g <strong>the</strong>estimate <strong>of</strong> ξ n from all <strong>the</strong> previous <strong>in</strong>formation is to assume thatˆξ = − ( a ξ + b ξ ).n n−1 n−2Moran did that <strong>and</strong> had showed his result to his friends among radiophysicists who told him that a forecast <strong>of</strong> such a quality could havebeen possible without any science, just by naked eye. And so it was, asproved by an experiment. That was <strong>the</strong> second failure <strong>of</strong> <strong>the</strong> model <strong>of</strong>autoregression.That model possesses, however, an excellent property: it is easilyapplied. Its parameters are easy to estimate , <strong>the</strong> correlation function is<strong>of</strong> <strong>the</strong> k<strong>in</strong>d <strong>of</strong> fad<strong>in</strong>g s<strong>in</strong>usoidal oscillations <strong>and</strong> is comparatively easyto be <strong>in</strong>terpreted. The spectral density is also expressed <strong>in</strong> a simpleway. It made sense <strong>the</strong>refore to test it many times on differ<strong>in</strong>g material<strong>and</strong> hope that cases <strong>in</strong> which it works well enough will be found. It isbest to read about <strong>the</strong> application <strong>of</strong> <strong>the</strong> autoregression model <strong>in</strong>Kendall & Stuart (1968).73


Kendall did that even before Moran’s work (1954) appeared. Herestricted his attention to such values <strong>of</strong> <strong>the</strong> parameters a <strong>and</strong> b <strong>in</strong>formula (3.8) which determ<strong>in</strong>e a stationary process, <strong>and</strong> he mostlyworked with series from economics. Such series rarely oscillate aroundone level creat<strong>in</strong>g a stationary process. They usually have a tendency,a trend. The production <strong>of</strong> electrical energy, say, <strong>in</strong>creasesexponentially <strong>and</strong> <strong>the</strong>refore has a l<strong>in</strong>ear trend when described on alogarithmic scale. The problem consists <strong>in</strong> describ<strong>in</strong>g <strong>the</strong> deviationsdur<strong>in</strong>g different years from <strong>the</strong> general tendency.Kendall thought it possible to determ<strong>in</strong>e <strong>the</strong> trend by some method(but certa<strong>in</strong>ly not by naked eye which is too subjective for a rigorousstatistical school) <strong>and</strong> to subtract it. This additionally complicates <strong>the</strong>statistical structure <strong>of</strong> <strong>the</strong> rema<strong>in</strong><strong>in</strong>g deviations, but <strong>the</strong>re is noth<strong>in</strong>g tobe done about it. Exactly such deviations as though form<strong>in</strong>g astationary process were studied by <strong>the</strong> method <strong>of</strong> autoregression.It is difficult to pronounce a def<strong>in</strong>ite op<strong>in</strong>ion about his results. Insome cases <strong>the</strong> statistical tests were happily passed, but not <strong>in</strong> o<strong>the</strong>rcases. May we consider that success was really achieved <strong>in</strong> thoseformer or should we expla<strong>in</strong> it only by <strong>the</strong> small number <strong>of</strong>observations? And no explanation is known why, for example, <strong>the</strong>model <strong>of</strong> autoregression with <strong>the</strong> trend be<strong>in</strong>g elim<strong>in</strong>ated does not suit<strong>the</strong> series <strong>of</strong> <strong>the</strong> cost <strong>of</strong> wheat but suits <strong>the</strong> total head <strong>of</strong> sheep. Nodecisive success <strong>in</strong> treat<strong>in</strong>g economic series was thus achieved.Kendall (1946) <strong>in</strong>vestigated <strong>the</strong> process <strong>of</strong> autoregressionconstructed accord<strong>in</strong>g to equation (3.8) by means <strong>of</strong> tables <strong>of</strong> r<strong>and</strong>omnumbers; <strong>the</strong> longest <strong>of</strong> <strong>the</strong> modelled series had 480 terms. Inconclud<strong>in</strong>g, let us have a look at <strong>the</strong> empirical estimate <strong>of</strong> a correlationfunction (Fig. 3, dotted l<strong>in</strong>e). See how much <strong>the</strong> estimate differs from<strong>the</strong> real values (cont<strong>in</strong>uous l<strong>in</strong>e) <strong>and</strong> fades considerably slower than<strong>the</strong> real function.Hannan (1960) published an estimate <strong>of</strong> <strong>the</strong> spectral density <strong>of</strong>Kendall’s series. The graphs <strong>of</strong> <strong>the</strong> <strong>the</strong>oretical density <strong>and</strong> its variousestimates are shown on Fig. 4. It is seen that <strong>the</strong>y are pretty littlesimilar to <strong>the</strong> true density. In particular, <strong>the</strong> later takes a maximalvalue near po<strong>in</strong>t λ = π/5 whereas <strong>the</strong> maximal values <strong>of</strong> all <strong>the</strong>estimates are at po<strong>in</strong>t λ = π/15.An unaccustomed eye can imag<strong>in</strong>e that small values <strong>of</strong> <strong>the</strong> spectraldensity are estimated well enough, but noth<strong>in</strong>g <strong>of</strong> <strong>the</strong> sort is reallytak<strong>in</strong>g place. The relative error is here just as great as <strong>in</strong> <strong>the</strong> left side <strong>of</strong><strong>the</strong> graph, i. e., as for large values <strong>of</strong> <strong>the</strong> density. We see that <strong>the</strong>correlation <strong>the</strong>ory, created by <strong>the</strong> founders <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> stochasticprocesses for treat<strong>in</strong>g discrete observational series, such as <strong>the</strong> number<strong>of</strong> sunspots <strong>in</strong> various years or <strong>the</strong> values <strong>of</strong> economic <strong>in</strong>dicatorsexactly <strong>in</strong> those cases did not atta<strong>in</strong> undoubted success.The idea <strong>of</strong> a ma<strong>the</strong>matical description <strong>of</strong> wavy processesencountered <strong>the</strong> practical difficulty <strong>in</strong> that any proper estimation <strong>of</strong> <strong>the</strong>correlation function dem<strong>and</strong>s not tens or hundreds <strong>of</strong> separateobservations, but (Kendall 1946) tens <strong>and</strong> hundreds <strong>of</strong> pert<strong>in</strong>ent waveswhich means thous<strong>and</strong>s <strong>and</strong> tens <strong>of</strong> thous<strong>and</strong>s observations. On <strong>the</strong>o<strong>the</strong>r h<strong>and</strong>, parametric models such as <strong>the</strong> model <strong>of</strong> autoregression hadnot been conv<strong>in</strong>c<strong>in</strong>gly statistically confirmed. Consequently, <strong>the</strong>74


applications <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> stochastic processes to that material, <strong>and</strong>to forecast<strong>in</strong>g <strong>in</strong> particular, are not sufficiently scientifically justified.The worst circumstance is that many contributions are published <strong>in</strong>that field such as Ivakhnenko & Lapa (1971) which do not sufficientlycheck <strong>the</strong> adopted model statistically <strong>and</strong> <strong>the</strong>refore can not beconsidered seriously.The situation would have been quite bad but at <strong>the</strong> same time newfields <strong>of</strong> application <strong>of</strong> <strong>the</strong> correlation <strong>the</strong>ory <strong>in</strong> aero-hydrodynamics<strong>and</strong> physics which constitute <strong>the</strong> real worth <strong>of</strong> that <strong>the</strong>ory werecreated. We will <strong>in</strong>deed consider <strong>the</strong>se applications.3.5. Processes with stationary <strong>in</strong>crements. When hav<strong>in</strong>g somema<strong>the</strong>matical tool <strong>and</strong> wish<strong>in</strong>g to describe natural phenomena by itsmeans, <strong>the</strong> most important consideration is, not to ask nature for toomuch, not to attempt to apply that tool <strong>in</strong> cases <strong>in</strong> which it is helpless.Thus, when imag<strong>in</strong><strong>in</strong>g some wavy phenomenon, we would have likedto apply <strong>the</strong> <strong>the</strong>ory <strong>of</strong> stationary stochastic processes for describ<strong>in</strong>g it.However, it was gradually understood that <strong>the</strong> largest waves <strong>in</strong> <strong>the</strong>observed process can ei<strong>the</strong>r be not <strong>of</strong> a statistical essence at all, or thatour observations conta<strong>in</strong> <strong>in</strong>sufficient data for determ<strong>in</strong><strong>in</strong>g <strong>the</strong>irstatistical characteristics, or, f<strong>in</strong>ally, that a purely statistical descriptioncan be short <strong>of</strong> our aims.For example, <strong>the</strong> cyclic recurrence <strong>of</strong> economic life apparently hasall <strong>the</strong>se <strong>in</strong>dications. Here, we can not on pr<strong>in</strong>ciple consider aphenomenon as statistical because only one realization <strong>and</strong> nostatistical ensemble is available. And <strong>of</strong> course we usually have<strong>in</strong>sufficient observations. F<strong>in</strong>ally, a statistical description does notsatisfy us because we need to know, for example, not how one orano<strong>the</strong>r decl<strong>in</strong>e or rise is develop<strong>in</strong>g <strong>in</strong> <strong>the</strong> mean but what happenswith <strong>the</strong> particular decl<strong>in</strong>e or rise exist<strong>in</strong>g this moment.It is absolutely impossible to reckon on describ<strong>in</strong>g phenomena <strong>of</strong><strong>the</strong> largest scale <strong>in</strong> <strong>the</strong> boundaries <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> stochastic processes.The situation is different for phenomena on a small scale; <strong>in</strong> such casesperhaps someth<strong>in</strong>g can be done. Take ano<strong>the</strong>r example, <strong>the</strong> course <strong>of</strong>meteorological processes. It is absolutely clear that a statisticaldescription <strong>of</strong> <strong>the</strong> largest changes <strong>of</strong> <strong>the</strong> wea<strong>the</strong>r on a secular scale isimpossible <strong>and</strong> senseless. It is uncerta<strong>in</strong> beforeh<strong>and</strong> whe<strong>the</strong>r statisticalmethods can be applied for describ<strong>in</strong>g changes <strong>of</strong> <strong>the</strong> wea<strong>the</strong>r on asmall scale dur<strong>in</strong>g a few days, for predict<strong>in</strong>g it, say. However,experience shows that this is sufficiently useless. Still, when restrict<strong>in</strong>gforecasts to small territories <strong>and</strong> short <strong>in</strong>tervals, <strong>the</strong> success <strong>of</strong>statistical methods is brilliant. The relevant <strong>the</strong>ory is called statisticalKolmogorov – Obukhov <strong>the</strong>ory <strong>of</strong> turbulence <strong>and</strong> we will later say afew words about it.We turn now to geology <strong>and</strong> formulate, for example, <strong>the</strong> problem <strong>of</strong>estimat<strong>in</strong>g <strong>the</strong> reserves <strong>of</strong> a deposit given <strong>the</strong> per cent <strong>of</strong> <strong>the</strong> usefulcomponent <strong>in</strong> a number <strong>of</strong> sample po<strong>in</strong>ts. Here also we encounter <strong>the</strong>risk <strong>of</strong> apply<strong>in</strong>g stationary processes for describ<strong>in</strong>g that per cent over<strong>the</strong> entire deposit. The situation with <strong>the</strong> ensemble <strong>of</strong> realizations <strong>and</strong><strong>the</strong> availability <strong>of</strong> data is very bad for determ<strong>in</strong><strong>in</strong>g <strong>the</strong> statistics <strong>of</strong> <strong>the</strong>largest fluctuations. On <strong>the</strong> o<strong>the</strong>r h<strong>and</strong>, <strong>the</strong> largest irregularities occuron a large scale <strong>and</strong> likely change smoothly; it may be <strong>the</strong>refore75


expected that we know <strong>the</strong>m accurately enough <strong>and</strong> do not need anystatistical description. But what should be done with irregularities on asmall scale which can <strong>in</strong>fluence <strong>the</strong> estimation <strong>of</strong> <strong>the</strong> reserves as well?Take f<strong>in</strong>ally radio physics <strong>in</strong> which <strong>the</strong> concept <strong>of</strong> stationary processis recognized best <strong>of</strong> all. All k<strong>in</strong>ds <strong>of</strong> <strong>in</strong>terferences <strong>and</strong> noises are hereusually considered as stationary stochastic processes. However, <strong>the</strong>reis a special noise, <strong>the</strong> flicker noise or shimmer<strong>in</strong>g expla<strong>in</strong>ed by chaoticvariations <strong>of</strong> <strong>the</strong> emissive capability <strong>of</strong> <strong>the</strong> cathode electronic tubes. Itis sufficiently clearly <strong>in</strong>dicated, see for example Rytov (1966), that <strong>the</strong>shimmer<strong>in</strong>g can hardly be described by <strong>the</strong> model <strong>of</strong> stationarystochastic process.It follows that at present we beg<strong>in</strong> to realize that a ma<strong>the</strong>maticaldescription <strong>of</strong> <strong>the</strong> largest waves <strong>of</strong> wavy processes by methods <strong>of</strong>ma<strong>the</strong>matical statistics is <strong>in</strong> most cases impossible. We have to reckonon describ<strong>in</strong>g phenomena on a smaller scale but we certa<strong>in</strong>ly have t<strong>of</strong>orfeit much. Thus, <strong>the</strong> <strong>the</strong>ory <strong>of</strong> <strong>the</strong> microstructure <strong>of</strong> turbulence isuseless for predict<strong>in</strong>g <strong>the</strong> wea<strong>the</strong>r because it does not describe <strong>the</strong>most essential phenomena occurr<strong>in</strong>g on a large scale. However, it isuseful <strong>in</strong> o<strong>the</strong>r fields, for example when calculat<strong>in</strong>g <strong>the</strong> passage <strong>of</strong>light through <strong>the</strong> atmosphere which is important for astronomy (fortak<strong>in</strong>g <strong>in</strong>to account <strong>the</strong> corruption <strong>of</strong> images <strong>in</strong> telescopes).Kolmogorov <strong>in</strong>troduced a universal concept <strong>of</strong> process withstationary <strong>in</strong>crements which can hopefully replace <strong>the</strong> concept <strong>of</strong>stationary stochastic process <strong>in</strong> all <strong>the</strong> cases considered above. Fordiscrete time it means that we turn from an observed process... ξ −1 , ξ 0 , ξ 1 , ..., ξ n , ...to differences... η −1 = ξ −1 − ξ −2 , η 0 = ξ 0 − ξ −1 , η 1 = ξ 1 − ξ 0 , ...<strong>and</strong> consider <strong>the</strong>m a realization <strong>of</strong> a stationary stochastic process.For processes with cont<strong>in</strong>uous time we turn <strong>in</strong>stead from ξ(t) to <strong>the</strong>derivativeη (t) = ξ′(t)<strong>and</strong> call it stationary stochastic process; <strong>the</strong> differentiation shouldsometimes be understood <strong>in</strong> a generalized sense.Let us expla<strong>in</strong> <strong>in</strong> more detail what do we expect when turn<strong>in</strong>g todifferences or derivatives. Imag<strong>in</strong>e that <strong>the</strong> observed process is a sumξ(t) = a(t) + ς(t)<strong>of</strong> some r<strong>and</strong>om or not component a(t) similar to large waves <strong>and</strong> <strong>the</strong>o<strong>the</strong>r component chang<strong>in</strong>g much more rapidly <strong>and</strong> can reasonably becalled a stationary stochastic process. We recognize our <strong>in</strong>ability todescribe <strong>the</strong> changes <strong>of</strong> a(t) <strong>and</strong> wish to study <strong>the</strong> changes on a smallscale mostly determ<strong>in</strong>ed by <strong>the</strong> o<strong>the</strong>r component. This is <strong>in</strong>deed76


achieved by differentiat<strong>in</strong>g because <strong>the</strong> large component a(t) likelychanges slowly, so that its derivation is small. We haveη(t) = ξ′(t) = a′(t) + ς′(t) ≈ ς′(t)which means that ξ′(t) practically does not <strong>in</strong>clude any componentconnected with a(t). The same is achieved by tak<strong>in</strong>g <strong>the</strong> differences <strong>in</strong>case <strong>of</strong> discrete time.For cont<strong>in</strong>uous time, ra<strong>the</strong>r than differentiat<strong>in</strong>g, we certa<strong>in</strong>ly canalso study differences∆ τ ξ(t) = ξ(t + τ) − ξ(t) ≈t+τ∫tη( s)dswhere η(s) is a stationary process. The second equality is needed forconstruct<strong>in</strong>g a correlation <strong>and</strong> spectral <strong>the</strong>ory <strong>of</strong> processes withstationary <strong>in</strong>crements be<strong>in</strong>g <strong>in</strong>tegrals <strong>of</strong> a stationary process.Instead <strong>of</strong> a correlation function a structural function <strong>in</strong>troduced byKolmogorov is be<strong>in</strong>g used:D2(τ) = E[ ∆τξ( t)] ,that is, <strong>the</strong> variance <strong>of</strong> <strong>the</strong> <strong>in</strong>crement <strong>of</strong> <strong>the</strong> process dur<strong>in</strong>g time τ.Practical application <strong>of</strong> processes with stationary <strong>in</strong>crements can bestudied by means <strong>of</strong> Mon<strong>in</strong> & Jaglom (1967, pt. 2/1975).The situation that emerged nowadays <strong>in</strong> science can be <strong>the</strong>reforedescribed <strong>in</strong> <strong>the</strong> follow<strong>in</strong>g way. We do not expect that generalstatistical methods can characterize wavy processes as a whole, i. e.,<strong>in</strong>clud<strong>in</strong>g large waves. In general, <strong>the</strong> notion <strong>of</strong> stationary stochasticprocess is compromised. For applications, it is <strong>the</strong> turn <strong>of</strong> <strong>the</strong> concept<strong>of</strong> stochastic process with stationary <strong>in</strong>crements that does not claim tocover a phenomenon as a whole but can cover it <strong>in</strong> <strong>the</strong> sphere <strong>of</strong> <strong>the</strong>small scale. Its possibilities are not yet sufficiently <strong>in</strong>vestigated. Thesituation concern<strong>in</strong>g <strong>the</strong> examples with which we have dealt is this.In economics, <strong>the</strong>re exist works <strong>of</strong> <strong>the</strong> American school founded byBox, for example Box, Jenk<strong>in</strong>s & Bacon (1967); Box & Jenk<strong>in</strong>s(1970). However, <strong>the</strong> quality <strong>of</strong> statistical approach is <strong>the</strong>re doubtful:no statistical checks are made, attention is concentrated on forecast<strong>in</strong>gwhereas <strong>the</strong> exclusion <strong>of</strong> <strong>the</strong> large-scale component compels us toth<strong>in</strong>k that it would have been better to ab<strong>and</strong>on altoge<strong>the</strong>r predictionss<strong>in</strong>ce <strong>the</strong>y depend <strong>in</strong> <strong>the</strong> first place on <strong>the</strong> excluded component. Ingeneral, <strong>the</strong> situation is doubtful. We will consider it later.For meteorology, processes with stationary <strong>in</strong>crements are <strong>of</strong> nospecial significance. The Kolmogorov – Obukhov <strong>the</strong>ory <strong>of</strong> turbulencera<strong>the</strong>r belongs to aero-hydrodynamics. Brilliant success is achieved<strong>the</strong>re: conclusions made by <strong>the</strong> creators <strong>of</strong> <strong>the</strong> <strong>the</strong>ory wereexperimentally confirmed. That success rema<strong>in</strong>s, however, <strong>the</strong> onlyone atta<strong>in</strong>ed.In geology, we have <strong>the</strong> book <strong>of</strong> Ma<strong>the</strong>ron (1962). The factualmaterial <strong>in</strong>cluded <strong>the</strong>re supports <strong>in</strong> some measure <strong>the</strong> hypo<strong>the</strong>sis that77


<strong>the</strong> structural function <strong>of</strong> <strong>the</strong> contents <strong>of</strong> <strong>the</strong> useful component is <strong>of</strong> <strong>the</strong>typeD(r) = αlnr + βwhere r is <strong>the</strong> distance between sample po<strong>in</strong>ts <strong>and</strong> α <strong>and</strong> β, parametersdeterm<strong>in</strong>ed by observation. However, <strong>the</strong> book has a number <strong>of</strong><strong>in</strong>consistencies. Thus, <strong>the</strong> logarithmic dependence is cont<strong>in</strong>ued <strong>in</strong>to <strong>the</strong><strong>in</strong>terval <strong>of</strong> small values <strong>of</strong> r which is impossible because D(r) is a nonnegativemagnitude. Then, <strong>in</strong> some cases <strong>the</strong> subject concerns <strong>the</strong>content, <strong>in</strong> o<strong>the</strong>r <strong>in</strong>stances, its logarithm. In addition, no statisticalchecks are made. But still, <strong>the</strong> factual material impresses so stronglythat careful reliable studies <strong>in</strong> <strong>the</strong> same direction become desirable.In radio physics, <strong>the</strong> scientific level is high <strong>and</strong> similar<strong>in</strong>consistencies just can not occur. However, as far as we know, noreports about successful apply<strong>in</strong>g <strong>the</strong> model <strong>of</strong> process with stationary<strong>in</strong>crements are <strong>in</strong> existence. Rytov (1966) only formulated ahypo<strong>the</strong>sis that <strong>the</strong> phenomenon <strong>of</strong> flicker should be thus described.In conclud<strong>in</strong>g, I deal <strong>in</strong> more detail with <strong>the</strong> statistical <strong>the</strong>ory <strong>of</strong>turbulence <strong>and</strong> <strong>the</strong> problem <strong>of</strong> forecast<strong>in</strong>g.3.6. Statistical <strong>the</strong>ory <strong>of</strong> turbulence. This <strong>the</strong>ory provides abrilliant success <strong>of</strong> a purely statistical description <strong>of</strong> a phenomenon, <strong>of</strong>a highly developed <strong>and</strong> very complicated turbulence with a largenumber <strong>of</strong> vortical movements on differ<strong>in</strong>g scales. Kolmogorov <strong>and</strong>Obukhov founded <strong>the</strong> basis <strong>of</strong> <strong>the</strong> <strong>the</strong>ory before 1941. Experimentalconfirmation <strong>of</strong> <strong>the</strong>ir <strong>the</strong>oretical conclusions dem<strong>and</strong>ed perfectmeasur<strong>in</strong>g <strong>in</strong>struments <strong>and</strong> up to 25 years. Application <strong>of</strong> that <strong>the</strong>oryto problems <strong>in</strong> propagation <strong>of</strong> electromagnetic <strong>and</strong> acousticoscillations <strong>in</strong> <strong>the</strong> atmosphere is also be<strong>in</strong>g developed.A precise knowledge <strong>of</strong> <strong>the</strong> field <strong>of</strong> velocities <strong>in</strong> a turbulent currentis underst<strong>and</strong>ably both impossible <strong>and</strong> useless. Indeed, had we somemethod <strong>of</strong> calculat<strong>in</strong>g all <strong>the</strong> velocities at all po<strong>in</strong>ts, <strong>the</strong>ir registrationwith sufficient precision would have alone dem<strong>and</strong>ed an unimag<strong>in</strong>ableamount <strong>of</strong> paper or magnetic tape <strong>and</strong> work with so much <strong>in</strong>formationis absolutely impossible. The situation should be resolved by someversion <strong>of</strong> a statistical description.It occurred that <strong>the</strong> ma<strong>in</strong> suitable notions can be borrowed from <strong>the</strong>correlation <strong>the</strong>ory; however, <strong>in</strong> <strong>the</strong>ir <strong>in</strong>itial form <strong>the</strong>y were <strong>in</strong>sufficient.There is a scientific law stat<strong>in</strong>g that ex nihilo nihil fit which means thatan application <strong>of</strong> established <strong>the</strong>ories does not cover anyth<strong>in</strong>g new.Without go<strong>in</strong>g <strong>in</strong>to ma<strong>the</strong>matical detail, I will attempt to showexactly how does this law work <strong>in</strong> case <strong>of</strong> turbulence <strong>and</strong> what newconsiderations it was necessary to draw for gett<strong>in</strong>g <strong>the</strong> th<strong>in</strong>gs mov<strong>in</strong>g.Imag<strong>in</strong>e a turbulent current. Its mean velocity depends on concreteconditions (what <strong>and</strong> where is <strong>the</strong> current set <strong>in</strong>to motion [...]) <strong>and</strong> it issenseless to describe it by statistical methods. However, <strong>the</strong>differences <strong>of</strong> velocity <strong>in</strong> various po<strong>in</strong>ts <strong>of</strong> <strong>the</strong> current <strong>and</strong> <strong>in</strong> differ<strong>in</strong>gmoments <strong>of</strong> time less depend on <strong>in</strong>itial conditions <strong>and</strong> to a largerextent are determ<strong>in</strong>ed by <strong>the</strong> properties <strong>of</strong> <strong>the</strong> liquid or gas itself. So,let us study <strong>the</strong> differences78


u(x 1 , x 2 , t 1 , t 2 ) = v(x 1 , t 1 ) − v(x 2 , t 2 )where v(x, t) is <strong>the</strong> velocity <strong>of</strong> <strong>the</strong> liquid at po<strong>in</strong>t x <strong>and</strong> moment t with<strong>the</strong> po<strong>in</strong>t x be<strong>in</strong>g remote from <strong>the</strong> boundaries <strong>of</strong> <strong>the</strong> current <strong>and</strong> tsufficiently large for <strong>the</strong> stationary condition to be established.It is natural to suppose that <strong>the</strong> turbulence is stationary <strong>in</strong> <strong>the</strong> sensethat <strong>the</strong> statistical characteristics <strong>of</strong> <strong>the</strong> difference u only depend on <strong>the</strong>difference t 1 − t 2 = τ. The three-dimensional variables x 1 , x 2 as also <strong>the</strong>difference u itself, that is, a three-dimensional vector, still rema<strong>in</strong>. Wehave a three-dimensional field <strong>of</strong> vectors depend<strong>in</strong>g on six space <strong>and</strong>two temporal variables. Its statistical properties however only dependon <strong>the</strong> difference between <strong>the</strong> latter. If stopp<strong>in</strong>g here <strong>and</strong> expect<strong>in</strong>g todeterm<strong>in</strong>e experimentally <strong>the</strong> statistical characteristics <strong>of</strong> such a field,<strong>the</strong> experiment will <strong>in</strong>variably fail: it is practically impossible <strong>and</strong>science f<strong>in</strong>ds itself <strong>in</strong> a cul-de-sac.And this is exactly <strong>the</strong> situation <strong>in</strong> some o<strong>the</strong>r sciences. R<strong>and</strong>omstress tensors, r<strong>and</strong>om strength, elasticity etc can be <strong>in</strong>troduced but <strong>the</strong>advantage <strong>of</strong> <strong>the</strong>se notions is zero s<strong>in</strong>ce <strong>the</strong>ir statistical characteristicscan not be determ<strong>in</strong>ed. Fur<strong>the</strong>r <strong>the</strong>oretical development <strong>of</strong> <strong>the</strong> <strong>the</strong>ory<strong>of</strong> turbulence was necessary, o<strong>the</strong>rwise no science would haveemerged <strong>the</strong>re.First <strong>of</strong> all, <strong>in</strong> a sufficiently developed turbulence all po<strong>in</strong>ts <strong>and</strong> alldirections should have <strong>the</strong> same rights. This statement seems simplebut actually is ra<strong>the</strong>r subtle. Indeed, we can imag<strong>in</strong>e a measur<strong>in</strong>gdevice consist<strong>in</strong>g <strong>of</strong> three vectors (x, e 1 , e 2 ) <strong>the</strong> last two <strong>of</strong> <strong>the</strong>mapplied to <strong>the</strong> beg<strong>in</strong>n<strong>in</strong>g <strong>of</strong> vector x <strong>and</strong> all three fixed toge<strong>the</strong>r. Anobservation consists <strong>in</strong> apply<strong>in</strong>g <strong>the</strong> beg<strong>in</strong>n<strong>in</strong>g <strong>of</strong> vector x to po<strong>in</strong>t x 1<strong>of</strong> <strong>the</strong> current so that its end will be at po<strong>in</strong>t x 2 = x 1 + x <strong>and</strong> weconstruct <strong>the</strong> projection <strong>of</strong> <strong>the</strong> difference <strong>of</strong> velocitys v(x 2 , t) − v(x 1 , t)on directions e 1 <strong>and</strong> e 2 which will be two r<strong>and</strong>om variables. Incorrelation <strong>the</strong>ory, <strong>the</strong>ir correlation is considered observable. Thiscorrelation should not change when <strong>the</strong> triplet (x, e 1 , e 2 ) is rotatedanyhow as a solid body nor should it depend on po<strong>in</strong>t x 1 . Turbulencesatisfy<strong>in</strong>g this condition is called locally isotropic.It can be shown that, given such turbulence <strong>and</strong> an <strong>in</strong>compressibleliquid, all <strong>the</strong> statistical characteristics <strong>of</strong> <strong>the</strong> vector field u(x 1 , x 2 , t 1 , t 2 )are expressed through characteristics <strong>of</strong> any <strong>of</strong> its components, i. e., <strong>of</strong><strong>the</strong> projection <strong>of</strong> that field on any coord<strong>in</strong>ate axis. We may consider x 1<strong>and</strong> x 2 situated on that axis <strong>and</strong> so <strong>the</strong> problem is reduced to oner<strong>and</strong>om function <strong>of</strong> two one-dimensional space <strong>and</strong> two temporalvariables.The reduction to one k<strong>in</strong>d <strong>of</strong> variables, ei<strong>the</strong>r space or temporal, ispossible due to <strong>the</strong> hypo<strong>the</strong>sis <strong>of</strong> freez<strong>in</strong>g which means that <strong>the</strong>turbulent curls are carried along <strong>the</strong> ma<strong>in</strong> current without change, asthough <strong>the</strong>y were frozen <strong>in</strong> <strong>the</strong> liquid. In such cases we do not have tomeasure turbulence <strong>in</strong> various po<strong>in</strong>ts x 1 <strong>and</strong> x 2 . We arrange <strong>the</strong> l<strong>in</strong>e (x 1 ,x 2 ) along <strong>the</strong> velocity <strong>of</strong> <strong>the</strong> ma<strong>in</strong> current, put our measur<strong>in</strong>g device atpo<strong>in</strong>t x 2 <strong>and</strong> wait for <strong>the</strong> turbulence to move from x 1 to x 2 . Thus, all isreduced to temporal functions only. This hypo<strong>the</strong>sis (strictly speak<strong>in</strong>g,its statistical characteristics ra<strong>the</strong>r than <strong>the</strong> turbulence itself) waschecked experimentally <strong>and</strong> fit well enough.79


After reduc<strong>in</strong>g everyth<strong>in</strong>g to one space or temporal function, that is,to an ord<strong>in</strong>ary process with stationary <strong>in</strong>crements, we may expectsometh<strong>in</strong>g. Still, for determ<strong>in</strong>ation by experiment we need <strong>the</strong>structural function, which is too much. We need it <strong>in</strong> a parameter formD(r) where r is <strong>the</strong> distance between <strong>the</strong> po<strong>in</strong>ts where <strong>the</strong> component<strong>of</strong> <strong>the</strong> velocity is measured.The most important considerations are here due to Kolmogorov.Accord<strong>in</strong>g to <strong>the</strong>m, D(r) can only depend on <strong>the</strong> viscosity <strong>of</strong> <strong>the</strong> liquidwhich is responsible for <strong>the</strong> dissipation, <strong>the</strong> conversion <strong>of</strong> <strong>the</strong> energy<strong>of</strong> <strong>the</strong> turbulent heterogeneities <strong>in</strong>to heat (<strong>and</strong> thus reduc<strong>in</strong>gturbulence) <strong>and</strong> on <strong>the</strong> amount <strong>of</strong> energy that be<strong>in</strong>g adopted from <strong>the</strong>ma<strong>in</strong> current is gradually passed from large to small whirls (<strong>and</strong> thussupport<strong>in</strong>g turbulence). The energy is certa<strong>in</strong>ly considered for a unitmass <strong>of</strong> <strong>the</strong> liquid <strong>and</strong> unit time. ThereforeD( r) = φ( r, v, ε)where φ is some universal function, v <strong>and</strong> ε , parameters. Viscosity vis known, <strong>and</strong> <strong>the</strong> amount <strong>of</strong> energy ε is <strong>the</strong> only parameter chang<strong>in</strong>gfrom one experiment to ano<strong>the</strong>r.If <strong>the</strong> distance r is sufficiently small as compared with <strong>the</strong> size <strong>of</strong><strong>the</strong> current, for <strong>the</strong> model <strong>of</strong> isotropic turbulence to be applicable butlarge enough so that viscosity is not yet essential for whirls <strong>of</strong> size r,<strong>the</strong>n D(r) does not depend on v. In this case <strong>the</strong> consideration <strong>of</strong>similarity leads toD( r) Cεr2/3 2/3= (3.9)where C is a universal constant.For lesser r when viscosity is essential, a formula is not foundalthough it is known thatD rr( v ε)1/2( ) = ( vε) β[ ]3 1/4where β is some universal function <strong>of</strong> one variable. The dependence(3.9) is called <strong>the</strong> two thirds Kolmogorov law.There exist spectral analogues <strong>of</strong> all those statements concern<strong>in</strong>g <strong>the</strong>structural function. These conclusions were published <strong>in</strong> 1940 – 1941<strong>and</strong> all <strong>of</strong> <strong>the</strong>m were hypo<strong>the</strong>tical. Intense experimental checks hadbegun after <strong>the</strong> war [<strong>in</strong> 1945]. Structural functions are very similar tocorrelation functions so that <strong>the</strong>ir estimates have <strong>the</strong> same unpleasantproperties <strong>and</strong> it was more convenient to carry out <strong>the</strong> check byempirically measur<strong>in</strong>g <strong>the</strong> spectra. No one certa<strong>in</strong>ly calculatedsmoo<strong>the</strong>d periodograms, filters were used, see Mon<strong>in</strong> & Jaglom(1967).For my part, I will just say that <strong>the</strong> measurements had confirmedeveryth<strong>in</strong>g, <strong>the</strong> two thirds law for a sufficiently wide <strong>in</strong>terval <strong>of</strong> <strong>the</strong>values <strong>of</strong> r, <strong>the</strong> universality <strong>of</strong> <strong>the</strong> constant C <strong>and</strong> <strong>the</strong> universaldependence <strong>of</strong> D(r) expressed through function β for small values <strong>of</strong> r.80


Reason<strong>in</strong>g based on common sense <strong>and</strong> <strong>the</strong> dimensionality <strong>the</strong>oremoccurred exceptionally successful although <strong>the</strong>y can not be absolutelyprecise because some physical consideration oppose <strong>the</strong>m. The entire<strong>the</strong>ory is <strong>of</strong> a purely statistical essence; its aim is to cover <strong>the</strong> ma<strong>in</strong>features <strong>of</strong> <strong>the</strong> statistics <strong>of</strong> <strong>the</strong> studied phenomenon by issu<strong>in</strong>g fromra<strong>the</strong>r rough considerations <strong>and</strong> to approach <strong>the</strong> possibility <strong>of</strong> anexperimental check. Now let us pass to a failed example.3.7. Statistical forecast. [...] We firmly believe <strong>in</strong> scientificpredictions, for example <strong>in</strong> calculations <strong>of</strong> <strong>the</strong> future situation <strong>of</strong> <strong>the</strong>planets based on <strong>the</strong> law <strong>of</strong> universal gravitation. Actually, our beliefis certa<strong>in</strong>ty <strong>and</strong> it is never deceived, although <strong>the</strong> general <strong>the</strong>ory <strong>of</strong>relativity is known to <strong>in</strong>troduce corrections here. Are scientificmethods <strong>of</strong> forecast<strong>in</strong>g stochastic processes able to provide areasonable if not firm certa<strong>in</strong>ty <strong>in</strong> predict<strong>in</strong>g <strong>the</strong> future?Kolmogorov <strong>and</strong> somewhat later Wiener <strong>in</strong>dependently developedmethods <strong>of</strong> forecast<strong>in</strong>g stationary stochastic processes. In hiscontribution on <strong>the</strong> <strong>the</strong>ory <strong>of</strong> turbulence Kolmogorov clearly statesthat he considers his hypo<strong>the</strong>ses about <strong>the</strong> structure <strong>of</strong> turbulence verylikely. It is curious to compare this with <strong>the</strong> absence <strong>in</strong> his works on<strong>the</strong> prediction <strong>of</strong> stochastic processes <strong>of</strong> any h<strong>in</strong>t on <strong>the</strong> possibility <strong>of</strong>practical applications.Both <strong>in</strong> his report (1952) <strong>and</strong> <strong>in</strong> Cybernetics (1969?) Wiener<strong>in</strong>dicated that <strong>the</strong> <strong>the</strong>ory <strong>of</strong> forecast<strong>in</strong>g was practically important. In<strong>the</strong> first case he stated that he was prompted byThe problem <strong>of</strong> predict<strong>in</strong>g <strong>the</strong> future position <strong>of</strong> an airplane byissu<strong>in</strong>g from general statistical <strong>in</strong>formation on <strong>the</strong> methods <strong>of</strong> its flight<strong>and</strong> from more specific knowledge <strong>of</strong> its previous path. [...] My workwas concerned with <strong>in</strong>struments necessary for realiz<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory <strong>of</strong>predicted fir<strong>in</strong>g <strong>in</strong> an automatic device for shoot<strong>in</strong>g at <strong>the</strong> airplane(Translated back from Russian.)It is known, however, that such a method <strong>of</strong> shoot<strong>in</strong>g was notrealized, not because <strong>of</strong> calculational difficulties but first <strong>of</strong> all s<strong>in</strong>ce<strong>the</strong> path <strong>of</strong> an airplane can not be described by a model <strong>of</strong> stationarystochastic process. There possibly is a statistical component <strong>in</strong> <strong>the</strong>airplane’s manoeuvre, but how can it be isolated? The manoeuvredepends so much on <strong>the</strong> concrete conditions that we can not at alldiscuss <strong>the</strong> statistical homogeneity <strong>of</strong> all <strong>the</strong> routes <strong>of</strong> <strong>the</strong> flight. Wecan attempt to isolate <strong>the</strong> statistical elements, but this problem is toodifficult for be<strong>in</strong>g solvable under war conditions.In all o<strong>the</strong>r processes, economic, technological, meteorological, etc.we usually encounter <strong>the</strong> fact that <strong>the</strong> statistical element, even ifpresent, does not cover <strong>the</strong> entire phenomenon. Thus, only <strong>the</strong> rapidcomponent on <strong>the</strong> small scale can yield to statistical description. Andeven that fact is only scientifically established <strong>in</strong> exceptional cases, forexample for <strong>the</strong> microstructure <strong>of</strong> turbulence. Ano<strong>the</strong>r such exampleconcerns <strong>the</strong> change <strong>of</strong> <strong>the</strong> frequency <strong>of</strong> <strong>the</strong> generator <strong>of</strong> oscillationsdur<strong>in</strong>g very short periods <strong>of</strong> time, when <strong>the</strong> action <strong>of</strong> flicker <strong>and</strong> o<strong>the</strong>rtechnological causes <strong>of</strong> <strong>the</strong> change does not have enough time forbe<strong>in</strong>g felt (Rytov 1966).81


In a great majority <strong>of</strong> cases <strong>the</strong> possibility <strong>of</strong> a statistical description<strong>of</strong> at least any s<strong>in</strong>gle aspect <strong>of</strong> <strong>the</strong> studied phenomenon is notestablished with certa<strong>in</strong>ty. Here <strong>the</strong> causes can be ei<strong>the</strong>r an <strong>in</strong>sufficientamount <strong>of</strong> experimental material or lack <strong>of</strong> underst<strong>and</strong><strong>in</strong>g <strong>the</strong> need toperform all imag<strong>in</strong>able statistical checks. In such cases a statisticalforecast is not more scientific than a prediction by eye which is how itis done as a rule. The only possible advantage <strong>of</strong> <strong>the</strong> former is that itcan be more precise but generally its error is large <strong>and</strong> that advantageis hardly realized.For example, if we are <strong>in</strong>terested <strong>in</strong> forecast<strong>in</strong>g micro-irregularities<strong>of</strong> turbulence (for which statistical homogeneity is established), <strong>the</strong>best statistical prediction <strong>of</strong> <strong>the</strong> values ξ(t + τ) <strong>of</strong> some characteristicfor moment t + τ given <strong>the</strong> values ξ(s), s ≤ t, almost does not differfrom <strong>the</strong> trivial forecast <strong>of</strong> ξ(t + τ) = ξ(t). Consequently, <strong>the</strong> advantage<strong>of</strong> <strong>the</strong> statistical forecast is not evident beforeh<strong>and</strong> but should beexperimentally established.I (§ 3.4) have mentioned Moran’s experimental prediction <strong>of</strong> <strong>the</strong>number <strong>of</strong> sunspots that <strong>in</strong>dicated <strong>the</strong> uselessness <strong>of</strong> <strong>the</strong> statisticalmethod <strong>of</strong> forecast<strong>in</strong>g. Let us approach <strong>the</strong> method <strong>of</strong> predictionrecently provided by Box, Jenk<strong>in</strong>s & Bacon (1967) <strong>and</strong> Box & Jenk<strong>in</strong>s(1970) from <strong>the</strong> same viewpo<strong>in</strong>t. The method consists <strong>of</strong> two parts.First, <strong>the</strong> differences <strong>in</strong> <strong>the</strong> available observational series should becalculated <strong>and</strong> attempted to be described by a model <strong>of</strong> a stationaryprocess be<strong>in</strong>g a comb<strong>in</strong>ation <strong>of</strong> <strong>the</strong> models <strong>of</strong> autoregression <strong>and</strong>mov<strong>in</strong>g average. If unsuccessful, second differences should becalculated etc.This part <strong>of</strong> <strong>the</strong> method does not give rise to any special objections;<strong>the</strong> only reservation is that <strong>the</strong> more differences we calculate, <strong>the</strong> more<strong>in</strong>formation about <strong>the</strong> <strong>in</strong>itial process we lose. In most cases we will beable to describe <strong>the</strong> differences <strong>of</strong> a sufficiently high order, but how dowe return back from <strong>the</strong>m?The second part <strong>of</strong> <strong>the</strong> method provides an answer althoughma<strong>the</strong>matically it is <strong>in</strong>correct Thus, sums <strong>of</strong> <strong>in</strong>f<strong>in</strong>itely many identicallydistributed r<strong>and</strong>om variables are considered, but such series are alwaysdivergent. Nowadays ma<strong>the</strong>matics does not regard divergent series asnegatively as previously because generalized functions enabled tomake many <strong>of</strong> <strong>the</strong>m sensible, but this does not concern <strong>the</strong> <strong>in</strong>dicatedtype <strong>of</strong> series. Worse <strong>of</strong> all, <strong>the</strong>se series are formally applied <strong>in</strong> <strong>the</strong><strong>the</strong>ory <strong>of</strong> conditional expectations whereas that procedure allows toprovide anyth<strong>in</strong>g.It seems that that second part is not applicable at all to observationalseries described by <strong>the</strong> model <strong>of</strong> trend with error. And no statisticaltests which would have excluded that model is made. In general, all<strong>the</strong> recommendations are directed to consider only correlationfunctions <strong>and</strong> forget <strong>the</strong> observations <strong>the</strong>mselves which radicallyopposes a sound statistical tradition. There is <strong>the</strong>refore no guaranteethat <strong>the</strong> provided method <strong>of</strong> prediction is scientifically justified; <strong>in</strong>particular, that <strong>the</strong> error <strong>of</strong> <strong>the</strong> forecast will be situated with<strong>in</strong> <strong>the</strong>calculated confidence <strong>in</strong>tervals.Forecasts by eye have absolutely <strong>the</strong> same rights, <strong>the</strong> only problemis which method results <strong>in</strong> a larger error. Experimental material for82


answer<strong>in</strong>g that question is extremely restricted. In <strong>the</strong> sources citedabove <strong>the</strong>re is <strong>in</strong> essence only one example <strong>of</strong> a forecast, see Fig. 5borrowed from Box, Jenk<strong>in</strong>s & Bacon (1967). The cont<strong>in</strong>uous brokenl<strong>in</strong>e shows <strong>the</strong> logarithms <strong>of</strong> <strong>the</strong> monthly receipts from <strong>the</strong> sale <strong>of</strong>plane tickets dur<strong>in</strong>g 1949 – 1960. The dotted l<strong>in</strong>e is <strong>the</strong> result <strong>of</strong> aforecast made from data up to July 1957. The straight l<strong>in</strong>es above <strong>and</strong>below <strong>the</strong> graph show <strong>the</strong> result <strong>of</strong> an experiment consist<strong>in</strong>g <strong>in</strong>smooth<strong>in</strong>g <strong>the</strong> yearly extrema by a straight l<strong>in</strong>e <strong>and</strong> forecast<strong>in</strong>g by <strong>the</strong>eye <strong>the</strong> future results.This is shown by cont<strong>in</strong>u<strong>in</strong>g those two straight l<strong>in</strong>es through August1957 – 1960. The forecast almost co<strong>in</strong>cided with that provided by <strong>the</strong>three authors. The extrema corresponded to July or August or to one <strong>of</strong><strong>the</strong> w<strong>in</strong>ter months (maxima <strong>and</strong> m<strong>in</strong>ima respectively) <strong>of</strong> each year. Itis impossible to repeat that experiment for o<strong>the</strong>r months because <strong>the</strong>data on <strong>the</strong> graph are unreadable <strong>and</strong> no table <strong>of</strong> <strong>the</strong> forecast results isprovided.It is strange that Box & Jenk<strong>in</strong>s (1970) did not show <strong>the</strong> describedexperiment on <strong>the</strong>ir Fig. 9.2 (p. 308). Here, <strong>the</strong>ir forecast is essentiallybetter than that made by eye, <strong>and</strong> it is closer to <strong>the</strong> actual data.However, <strong>the</strong> model, its parameters <strong>and</strong> <strong>the</strong> <strong>in</strong>terval <strong>of</strong> prediction, allare <strong>the</strong> same, so how can we expla<strong>in</strong> <strong>the</strong> improvement? In general, <strong>the</strong>contributions <strong>of</strong> that school do not pass an attentive analysis.Borrow<strong>in</strong>g an expression from <strong>the</strong> Russian author Bulgakov, <strong>the</strong>irstatistics can be called a statistics <strong>of</strong> a light-weighted type s<strong>in</strong>ce it ispresented as universally applicable <strong>and</strong> not dem<strong>and</strong><strong>in</strong>g statisticalchecks, <strong>and</strong> it is <strong>in</strong>tended to be generally popular but it does not ensurea reliable result.The general conclusion from all <strong>the</strong> above is that we should notespecially rely on statistical methods <strong>of</strong> forecast<strong>in</strong>g. For apply<strong>in</strong>g, <strong>and</strong>rely<strong>in</strong>g on <strong>the</strong>m we should first <strong>of</strong> all establish whe<strong>the</strong>r <strong>the</strong> studiedphenomenon can be described by a model <strong>of</strong> stochastic process.Notes1. Moran (see § 3.4) possibly was an exception. O. S.2. The separation <strong>of</strong> <strong>the</strong> r<strong>and</strong>om from div<strong>in</strong>e design was De Moivre’s ma<strong>in</strong> goal,see his Dedication to Newton <strong>of</strong> <strong>the</strong> first edition <strong>of</strong> his Doctr<strong>in</strong>e <strong>of</strong> Chances repr<strong>in</strong>ted<strong>in</strong> its third edition. O. S.3. The formal <strong>in</strong>troduction <strong>of</strong> least squares was due to Legendre. The author’sexample <strong>of</strong> an artificial object <strong>in</strong> space certa<strong>in</strong>ly had noth<strong>in</strong>g <strong>in</strong> common with thosetimes. O. S.4. In <strong>the</strong> sequel, <strong>the</strong> author applied <strong>the</strong> three curves <strong>of</strong> that figure. Their equationsare <strong>of</strong> <strong>the</strong> form c 0 + c 1 t 2 + c 2 t 4 . All <strong>the</strong> o<strong>the</strong>r Figures are sufficiently described <strong>in</strong> <strong>the</strong>ma<strong>in</strong> text. O. S.5. Those magnitudes are frequencies ra<strong>the</strong>r than periods. O. S.6. Follow<strong>in</strong>g a nasty tradition, Venn did not provide an exact reference, <strong>and</strong> Fisherfollowed suit. Abraham Tucker (1705 – 1774) is remembered for his contribution(1768 – 1778). O. S.7. On <strong>the</strong> history <strong>of</strong> <strong>the</strong> notion <strong>of</strong> function see Youshkevich (1977). O. S.8. Not clear enough. O. S.9. The author apparently had <strong>in</strong> m<strong>in</strong>d Karl Pearson’s generally knownshortcom<strong>in</strong>gs. Student (Gosset) was also serious, but Kendall did not at all belong to<strong>the</strong> Pearson school. O. S.Bibliography83


Belova L. A., Mamikonianz L. G., Tutubal<strong>in</strong> V. N. (1965 Russian), <strong>Probability</strong><strong>of</strong> a breakdown puncture <strong>of</strong> <strong>the</strong> <strong>in</strong>sulation <strong>of</strong> <strong>the</strong> coil <strong>of</strong> turbo-generators depend<strong>in</strong>gon <strong>the</strong> duration <strong>of</strong> work. Elektrichestvo, No 4, pp. 42 – 47.--- (1967 Russian), On statistical homogeneity <strong>of</strong> <strong>the</strong> <strong>in</strong>sulation <strong>of</strong> <strong>the</strong> frame <strong>of</strong>stators <strong>of</strong> turbo-generators. Elektrichestvo, No. 6, pp. 40 – 46.Box G. E. P., Jenk<strong>in</strong>s G. M., Bacon D. W. (1967), Models for forecast<strong>in</strong>gseasonal <strong>and</strong> non-seasonal time series. In Spectral Analysis <strong>of</strong> Time Series. NewYork, pp. 271 – 311.Box G. E. P., Jenk<strong>in</strong>s G. M. (1970), Time Series Analysis, Forecast<strong>in</strong>g <strong>and</strong>Control. San Francisco.Hannan E. J. (1960), Time Series Analysis. London.Ivakhnenko A. G., Lapa V. G. (1971), Predskazanie slucha<strong>in</strong>ykh prozessov(Prediction <strong>of</strong> Stochastic Processes). Kiev.Jenk<strong>in</strong>s G. M., Watts D. G. (1968), Spectral Analysis <strong>and</strong> Its Application. SanFrancisco, 1971.Kendall M. G. (1946), Contributions to <strong>the</strong> Study <strong>of</strong> Oscillatory Time Series.Cambridge.Kendall M. G., Stuart A. (1968), The Advanced Theory <strong>of</strong> <strong>Statistics</strong>, vol. 3.London, 1976.Ma<strong>the</strong>ron G. (1962), Traité de géostatistique appliquée. Paris.Mon<strong>in</strong> A. S., Jaglom A. M. (1967 Russian), Statistical Fluid Mechanics.Cambridge, Mass., 1973 – 1975.Moran P. A. P. (1954), Some experiments on <strong>the</strong> prediction <strong>of</strong> sunspot numbers.J. Roy. Stat. Soc., vol. B16, pp. 112 – 117.Rytov S. M. (1966), Vvedenie v Statisticheskuiu Radi<strong>of</strong>iziku (Introduction <strong>in</strong>Statistical Radio Physics). Moscow, 1976.Slutsky E. E. (1927 Russian), Summation <strong>of</strong> r<strong>and</strong>om causes as <strong>the</strong> source <strong>of</strong>cyclic processes. Econometrica, vol. 5, 1937, pp. 105 – 146.Tucker A. (1768 – 1778), The Light <strong>of</strong> Nature Pursued, vols 1 – 7. Publishedunder <strong>the</strong> name Edw. Search. Abridged edition, 1807.Wiener N. (1952), Comprehensive view <strong>of</strong> prediction <strong>the</strong>ory. Proc. Intern. Congr.Ma<strong>the</strong>maticians 1950. Cambridge, Mass., vol. 2, pp. 308 – 321.--- (1969), Survey <strong>of</strong> Cybernetics. London. Possible reference; author had notprovided exact source.Youshkevich A. P. (1977), On <strong>the</strong> history <strong>of</strong> <strong>the</strong> notion <strong>of</strong> function. Arch. Hist.Ex. Sci., vol. 26.Yule G. U. (1927), On a method <strong>of</strong> <strong>in</strong>vestigat<strong>in</strong>g periodicities <strong>in</strong> disturbed seriesetc. Phil. Trans. Roy. Soc., vol. A226, pp. 267 – 298.84


IIIV. N. Tutubal<strong>in</strong>The Boundaries <strong>of</strong> Applicability(Stochastic Methods <strong>and</strong> Their Possibilities)Granitsy Primenimosti(veroiatnostno-statisticheskie metody i ikh vozmoznosti).Moscow, 19771. IntroductionI have published two booklets [i, ii]. The first was devoted toelementary statistical methods, <strong>the</strong> second one, to somewhat morecomplicated methods. Their ma<strong>in</strong> idea was that stochastic methods(like <strong>the</strong> methods <strong>of</strong> any o<strong>the</strong>r science) can not be applied withoutexam<strong>in</strong>ation to any problem <strong>in</strong>terest<strong>in</strong>g for <strong>the</strong> researcher; <strong>the</strong>re existdef<strong>in</strong>ite boundaries <strong>of</strong> that applicability.Ra<strong>the</strong>r numerous comments followed, naturally positive <strong>and</strong>negative <strong>and</strong>, as far as I know, <strong>the</strong> former prevailed. In purelyscientific matters a numerical prevalence (dur<strong>in</strong>g some short period)can mean noth<strong>in</strong>g; concern<strong>in</strong>g publications, it is not so. The possibility<strong>of</strong> repr<strong>in</strong>t<strong>in</strong>g [i] for a broader circle <strong>of</strong> readers had been discussed.However, consider<strong>in</strong>g that problem, I have gradually concluded thatdur<strong>in</strong>g <strong>the</strong> last five years its contents had <strong>in</strong> some specific sense, seebelow, become dated.The po<strong>in</strong>t is certa<strong>in</strong>ly not that previously stochastic methods shouldnot have been applied if <strong>the</strong> studied phenomenon was not statisticallystable, but that now it became possible. This could have happened ifnew methods not dem<strong>and</strong><strong>in</strong>g that condition were developed, butscience does not advance so rapidly. However, a quite def<strong>in</strong>ite <strong>and</strong>provable by referr<strong>in</strong>g to publications shift <strong>in</strong> <strong>the</strong> viewpo<strong>in</strong>t on <strong>the</strong>sphere <strong>of</strong> applications <strong>of</strong> stochastic methods had happened. It willeventually make prov<strong>in</strong>g such a simple circumstance as <strong>the</strong> need torestrict somehow <strong>the</strong> application <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability almostunnecessary.Then, a rapid development <strong>of</strong> concrete statistical <strong>in</strong>vestigations iscerta<strong>in</strong>ly <strong>in</strong> <strong>the</strong> spirit <strong>of</strong> our time. They are difficult, dem<strong>and</strong><strong>in</strong>g almostsuperhuman patience <strong>and</strong> <strong>in</strong>sistence, but <strong>the</strong>y still emerge <strong>and</strong> arebe<strong>in</strong>g done. In a s<strong>in</strong>gle statistical <strong>in</strong>vestigation, <strong>the</strong> study <strong>of</strong> statisticalstability is practically impossible (<strong>and</strong> at best only if <strong>the</strong> result isnegative). However, a repeated (actually, dur<strong>in</strong>g many years)statistical <strong>in</strong>vestigation accompanied by checks <strong>of</strong> <strong>the</strong> conclusions onever new material provides <strong>the</strong>m quite sufficient certa<strong>in</strong>ty.More precisely, we always come to underst<strong>and</strong> what we knowcerta<strong>in</strong>ly; what somewhat doubtfully; <strong>and</strong> what we do not know at all.For a publication <strong>in</strong>tended for a wide circle <strong>of</strong> readers it is <strong>the</strong>reforeextremely important to show how should statistical <strong>in</strong>vestigations becarried out from <strong>the</strong> methodical po<strong>in</strong>t <strong>of</strong> view so that <strong>the</strong> conclusionsare sufficiently certa<strong>in</strong> for be<strong>in</strong>g practically applied. No generalma<strong>the</strong>matical results are here available, this can only be done byexamples.85


I have published someth<strong>in</strong>g <strong>in</strong> that direction [i, ii] but now I wouldhave wished to accomplish such work fuller <strong>and</strong> better. F<strong>in</strong>ally, foreach author <strong>the</strong> aim <strong>of</strong> publication consists not only <strong>in</strong> <strong>in</strong>struct<strong>in</strong>go<strong>the</strong>rs, but to learn someth<strong>in</strong>g himself as well. In those booklets, Ihave made some ra<strong>the</strong>r extreme statements on <strong>the</strong> practical uselessness<strong>of</strong> certa<strong>in</strong> specific methods, for example [...]. It is not difficult toquestion such viewpo<strong>in</strong>ts; concern<strong>in</strong>g each def<strong>in</strong>ite problem it issufficient to <strong>in</strong>dicate at least one successful practical application <strong>of</strong> <strong>the</strong>discussed method. Obviously nei<strong>the</strong>r I, nor anyone else is acqua<strong>in</strong>tedwith all <strong>the</strong> pert<strong>in</strong>ent literature but I attempted to accomplish a sample<strong>of</strong> sorts from an <strong>in</strong>f<strong>in</strong>ite amount <strong>of</strong> <strong>in</strong>vestigations so that <strong>the</strong> partisans<strong>of</strong> one or ano<strong>the</strong>r method could have felt <strong>of</strong>fended by my extremepo<strong>in</strong>t <strong>of</strong> view <strong>and</strong> prove <strong>the</strong> opposite.However, concern<strong>in</strong>g <strong>the</strong> application <strong>of</strong> <strong>the</strong> Bernoulli pattern tojudicial verdicts, nowadays no one will probably argue; it is generallyacknowledged rubbish 1 . All <strong>the</strong> o<strong>the</strong>r problems are, however, quitevital. I have thus considered <strong>the</strong> publication <strong>of</strong> those statements not asf<strong>in</strong>al conclusions but as <strong>the</strong> beg<strong>in</strong>n<strong>in</strong>g <strong>of</strong> a big work for betterascerta<strong>in</strong><strong>in</strong>g <strong>the</strong> actual situation.It was thought that we will have to do with a comparatively smallamount <strong>of</strong> concrete material. However, this is not <strong>the</strong> only essentialadvantage <strong>of</strong> <strong>the</strong> described method <strong>of</strong> sampl<strong>in</strong>g as compared with afull study <strong>of</strong> <strong>the</strong> publications. It is known that scientific papers areusually too short so that read<strong>in</strong>g <strong>the</strong>m means decod<strong>in</strong>g 2 whereas <strong>in</strong> thiscase all difficult questions could have been resolved by ask<strong>in</strong>g <strong>the</strong>authors <strong>the</strong>mselves.Of course, along with really scientific objections I have receivedo<strong>the</strong>r, <strong>in</strong>significant letters. Usually such are reports about <strong>the</strong> results <strong>of</strong><strong>in</strong>vestigations <strong>in</strong> which <strong>the</strong> correspondent did not participate but onlyknows about <strong>the</strong>m by hearsay. In such cases, s<strong>in</strong>ce no def<strong>in</strong>ite data areprovided, it always rema<strong>in</strong>s <strong>in</strong>comprehensible whe<strong>the</strong>r <strong>the</strong> success wasachieved ow<strong>in</strong>g to a correct application <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability or<strong>in</strong> spite <strong>of</strong> its wrong use which is not excluded ei<strong>the</strong>r. For example, if<strong>the</strong> report <strong>in</strong>forms about <strong>the</strong> successful work <strong>of</strong> some technical system,that could have been achieved both because <strong>of</strong> a correct estimation <strong>of</strong><strong>the</strong> essence <strong>of</strong> r<strong>and</strong>om disturbances but also because <strong>the</strong> designerneglected wrong stochastic estimation <strong>and</strong> guided himself by hiseng<strong>in</strong>eer experience which had proved sufficient.On <strong>the</strong> whole, <strong>the</strong> desired result was however achieved: I have<strong>in</strong>deed obta<strong>in</strong>ed objections <strong>of</strong> a scientific k<strong>in</strong>d, although a smallnumber <strong>of</strong> <strong>the</strong>m. They concerned <strong>the</strong> tail areas <strong>of</strong> distributions,forecast<strong>in</strong>g stochastic processes <strong>and</strong> possibilities <strong>of</strong> a periodogramanalysis. Regard<strong>in</strong>g <strong>the</strong> first two items, I was able to become thusacqua<strong>in</strong>ted with <strong>in</strong>terest<strong>in</strong>g <strong>and</strong>, judg<strong>in</strong>g by <strong>the</strong>ir first results,promis<strong>in</strong>g studies, far, however, from be<strong>in</strong>g accomplished. Therefore,I should not yet reject my statement that no reliable practicalapplication <strong>of</strong> <strong>the</strong> pert<strong>in</strong>ent methods is known. In spite <strong>of</strong> all <strong>of</strong> itsnegative essence, it is useful <strong>in</strong> that it stresses <strong>the</strong> need to workpractically <strong>in</strong> those fields.The most remarkable <strong>and</strong> scientifically irrefutable was <strong>the</strong> objectionmade by Pr<strong>of</strong>essor V. A. Tim<strong>of</strong>eev concern<strong>in</strong>g <strong>the</strong> application <strong>of</strong>86


periodograms. It occurred that work with <strong>the</strong>m can be successful forexample when adjust<strong>in</strong>g systems <strong>of</strong> automatic regulation for isolat<strong>in</strong>gspecific periods <strong>of</strong> disturbances so as to suppress <strong>the</strong>m. The appliedtechnique is not stochastic but I considered it necessary to describebriefly <strong>the</strong> example provided by Tim<strong>of</strong>eev (§ 2.3 below).Then, when becom<strong>in</strong>g acqua<strong>in</strong>ted with some statistical medicalproblems, I encountered an apparently promis<strong>in</strong>g example <strong>of</strong>application <strong>of</strong> multivariate analysis (§ 2.2 below). It is almostdoubtless that such methods can also be widely applied <strong>in</strong> technologyfor solv<strong>in</strong>g various problems <strong>of</strong> reliability <strong>of</strong> mach<strong>in</strong>ery. However,much efforts should be made for exclud<strong>in</strong>g <strong>the</strong> almost.I thought it useful to discuss also a problem <strong>of</strong> a more generalnature: what k<strong>in</strong>d <strong>of</strong> aims is it reasonable to formulate for a stochasticstudy? Naturally, <strong>the</strong>y should not be ei<strong>the</strong>r too particular (that wouldbe un<strong>in</strong>terest<strong>in</strong>g), or too general (unatta<strong>in</strong>able), see <strong>the</strong> historicalmaterial <strong>in</strong> Chapter 1.I am s<strong>in</strong>cerely grateful to <strong>the</strong> Editor, V. I. Kovalev 3 , who <strong>in</strong>itiatedthis booklet <strong>and</strong> <strong>in</strong>variably helped me.1. Extreme Op<strong>in</strong>ions about <strong>the</strong> Theory <strong>of</strong> <strong>Probability</strong>1.1. Laplace’s s<strong>in</strong>gular <strong>and</strong> very facile metaphysics. Both <strong>in</strong>teach<strong>in</strong>g <strong>and</strong> dur<strong>in</strong>g practical work I have to encounter (although evermore rarely) delusions about <strong>the</strong> actual possibilities <strong>of</strong> stochasticmethods. In an <strong>in</strong>tentionally rough way <strong>the</strong>y can be expressed thus.Consider some event. We are obviously unable to say whe<strong>the</strong>r itoccurs or not. It is <strong>the</strong>refore r<strong>and</strong>om, so let us study it by stochasticmethods.If you beg<strong>in</strong> to argue, a few textbooks can be cited where <strong>in</strong>deed anapproximately same statement (although less roughly) is written. Itfollows that <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability is a special science <strong>in</strong> whichsome essential conclusions can be made out <strong>of</strong> complete ignorance.From many viewpo<strong>in</strong>ts (historical, psychological, etc) it seems<strong>in</strong>terest<strong>in</strong>g to f<strong>in</strong>d out <strong>the</strong> historical roots <strong>of</strong> that delusion. In general,<strong>the</strong> study <strong>of</strong> <strong>the</strong> emergence <strong>of</strong> some approach (scientific approach <strong>in</strong>particular) is extremely difficult s<strong>in</strong>ce it usually dem<strong>and</strong>s an analysis<strong>of</strong> great many sources. The <strong>the</strong>ory <strong>of</strong> probability was, however, lucky<strong>in</strong> some sense.At <strong>the</strong> turn <strong>of</strong> <strong>the</strong> 18 th century a greatest scholar, Laplace, summed<strong>and</strong> essentially advanced both its general ideology <strong>and</strong> concreteresults. Be<strong>in</strong>g extremely diligent, he left a very detailed description <strong>of</strong>his views <strong>and</strong> results <strong>in</strong> his Théorie analytique des probabilités (TAP).We consider it permissible to restrict our attention by analyz<strong>in</strong>g thiss<strong>in</strong>gle source although a strict historian <strong>of</strong> science certa<strong>in</strong>ly will notapprove <strong>of</strong> such a view. For his part, he will be <strong>in</strong> <strong>the</strong> right; forexample, it is extremely important for <strong>the</strong> history <strong>of</strong> science to study<strong>the</strong> evolution <strong>of</strong> Laplace’s own ideas <strong>and</strong> his relations with o<strong>the</strong>rscientists, but we are actually pursu<strong>in</strong>g a narrow applied aim.In our century <strong>of</strong> rapid development <strong>of</strong> <strong>the</strong> science <strong>of</strong> science weought to describe our source [see Bibliography]. It is a great volumeconta<strong>in</strong><strong>in</strong>g about 58 lists 4 <strong>and</strong> it is pleasant to note that also <strong>in</strong> our time87


only a small number <strong>of</strong> monographs are more volum<strong>in</strong>ous, so thathuman capability <strong>of</strong> writ<strong>in</strong>g great books has not changed much.The TAP is separated <strong>in</strong>to two parts utterly different <strong>in</strong> style. Thefirst part, <strong>the</strong> Essai, is an Introduction <strong>and</strong> summary <strong>of</strong> <strong>the</strong> book <strong>and</strong> itobeys an <strong>in</strong>dispensable condition <strong>of</strong> hav<strong>in</strong>g no formulas. Thus, <strong>the</strong>formula21 xf ( x ) = exp( − )2π 2is expressed by words toge<strong>the</strong>r with <strong>the</strong> def<strong>in</strong>ition <strong>of</strong> <strong>the</strong> numbers π<strong>and</strong> e. Such phrases are certa<strong>in</strong>ly little adaptable for perception.However, <strong>the</strong> Essai also conta<strong>in</strong>s many materials <strong>of</strong> philosophical,general scientific <strong>and</strong> applied nature described, as I see it, <strong>in</strong> a mostwonderful style 5 . Had that style not been so beautiful, we wouldperhaps have no need to counter, after a century <strong>and</strong> a half, attempts atapply<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability universally <strong>and</strong> <strong>in</strong>discrim<strong>in</strong>ately.The Essai is about 12 lists long; <strong>the</strong> rest consists <strong>of</strong> <strong>the</strong> TAP properwhere Laplace applied ma<strong>the</strong>matical analysis <strong>in</strong> plenty <strong>and</strong>, for us,ra<strong>the</strong>r strangely. This strangeness extremely impedes <strong>the</strong>underst<strong>and</strong><strong>in</strong>g <strong>of</strong> <strong>the</strong> second part <strong>of</strong> <strong>the</strong> book (whereas <strong>the</strong> same is trueconcern<strong>in</strong>g <strong>the</strong> Essai ow<strong>in</strong>g to <strong>the</strong> complete absence <strong>the</strong>re <strong>of</strong> analyticalformulas). It is apparently difficult to f<strong>in</strong>d someone nowadays whocould be able to boast about hav<strong>in</strong>g read (<strong>and</strong> understood) <strong>the</strong> TAPproper. However, many people have read <strong>the</strong> Essai whereas <strong>the</strong>attempts to underst<strong>and</strong> <strong>the</strong> second, ma<strong>the</strong>matical part led to <strong>the</strong>creation <strong>of</strong> more rigorous (<strong>and</strong> <strong>the</strong>refore more easily underst<strong>and</strong>able)methods <strong>of</strong> prov<strong>in</strong>g limit <strong>the</strong>orems <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability. We arehere only <strong>in</strong>terested <strong>in</strong> <strong>the</strong> Essai.As stated above, it is a work <strong>of</strong> a ra<strong>the</strong>r free style. A scientist’spsychology is doubtlessly such that he builds a superstructure abovehis concrete scientific results. It consists <strong>of</strong> general ideas <strong>and</strong> emotionsemerg<strong>in</strong>g out <strong>of</strong> those results <strong>and</strong> provid<strong>in</strong>g new faith, will <strong>and</strong> energy.The concrete results are usually published whereas <strong>the</strong> superstructurerema<strong>in</strong>s <strong>the</strong> property <strong>of</strong> a narrow circle <strong>of</strong> students <strong>and</strong> friends 6 .Laplace, however, published both <strong>and</strong> thus, as I see it, rendered hisreaders an <strong>in</strong>estimable service.In his Essai, not be<strong>in</strong>g shy <strong>of</strong> <strong>the</strong> boundaries <strong>of</strong> a purely scientificpublication, Laplace carried out a wide polemic. Many scientistsendured quite a lot: Pascal (pp. 70 <strong>and</strong> 110) 7 for a number <strong>of</strong>unfounded statements <strong>in</strong> his Pensées about <strong>the</strong> estimation <strong>of</strong>probabilities <strong>of</strong> testimonies; <strong>the</strong> author <strong>of</strong> <strong>the</strong> Novum Organum(Bacon, p. 113) for his <strong>in</strong>ductive reason<strong>in</strong>g which led him to believethat <strong>the</strong> Earth was motionless (<strong>and</strong> thus to deny <strong>the</strong> Copernicanteach<strong>in</strong>g); <strong>and</strong> many o<strong>the</strong>rs, but <strong>the</strong> great Leibniz endured <strong>the</strong> most.Leibniz is mentioned <strong>in</strong> connection with summ<strong>in</strong>g <strong>the</strong> series (p. 96)11+ x = − + − +2 31 x x x ...(1.1)88


at po<strong>in</strong>t x = 1. However, preced<strong>in</strong>g <strong>the</strong> criticism <strong>of</strong> Leibniz’ procedure,Laplace describes <strong>the</strong> follow<strong>in</strong>g case, perhaps too far-fetched to betrue, but characteristic <strong>of</strong> his attitude to Leibniz. When consider<strong>in</strong>g <strong>the</strong>b<strong>in</strong>ary number system, Leibniz thought that <strong>the</strong> unit represented God,<strong>and</strong> zero, Noth<strong>in</strong>g. The Supreme Be<strong>in</strong>g pulled all <strong>the</strong> o<strong>the</strong>r creaturesout <strong>of</strong> Noth<strong>in</strong>g just like <strong>in</strong> b<strong>in</strong>ary arithmetic zero is zero but all <strong>the</strong>numbers are expressed by units <strong>and</strong> zeros. This idea so pleasedLeibniz, that he told <strong>the</strong> Jesuit Grimaldi, president <strong>of</strong> <strong>the</strong> ma<strong>the</strong>maticalcouncil <strong>of</strong> Ch<strong>in</strong>a, about it <strong>in</strong> <strong>the</strong> hope that this symbolic representation<strong>of</strong> creation would convert <strong>the</strong> emperor <strong>of</strong> that time (who had aparticular predilection for <strong>the</strong> sciences) to Christianity 8 .Laplace goes on: Leibniz, always directed by a s<strong>in</strong>gular <strong>and</strong> veryfacile metaphysics, reasoned thus: S<strong>in</strong>ce at x = 1 <strong>the</strong> particular sums <strong>of</strong><strong>the</strong> series (1.1) alternatively become 0 <strong>and</strong> 1, we will take <strong>the</strong>expectation, i. e., 1/2, as its sum. We know now that such a method <strong>of</strong>summ<strong>in</strong>g is far from be<strong>in</strong>g stupid <strong>and</strong> may be sometimes applied, butLaplace hastens to defeat Leibniz, already compromised by <strong>the</strong>preced<strong>in</strong>g story.It is <strong>in</strong>deed remarkable that now, a century <strong>and</strong> a half later, we mayrightfully say <strong>the</strong> same about Laplace: directed by a s<strong>in</strong>gular <strong>and</strong> veryfacile metaphysics. This does not at all touch his concrete scientificwork but fully concerns his general ideas connected with concretescientific foundation. His Essai beg<strong>in</strong>s thus (p. 1):Here, I shall present, without us<strong>in</strong>g Analysis, <strong>the</strong> pr<strong>in</strong>ciples <strong>and</strong>general results <strong>of</strong> <strong>the</strong> Théorie, apply<strong>in</strong>g <strong>the</strong>m to <strong>the</strong> most importantquestions <strong>of</strong> life, which are <strong>in</strong>deed, for <strong>the</strong> most part, only problems <strong>in</strong>probability.So, which most important questions <strong>of</strong> life did Laplace th<strong>in</strong>k about,<strong>and</strong> how had he connected <strong>the</strong>m with <strong>the</strong> aims <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong>probability? That <strong>the</strong>ory <strong>in</strong>cludes <strong>the</strong> central limit <strong>the</strong>orem (CLT)which establishes that under def<strong>in</strong>ite conditions <strong>the</strong> sumS n = ξ 1 + ... + ξ n<strong>of</strong> a large number <strong>of</strong> r<strong>and</strong>om terms ξ i approximately follows <strong>the</strong>normal law. When measur<strong>in</strong>g <strong>the</strong> deviation <strong>of</strong> <strong>the</strong> r<strong>and</strong>om variable S nfrom its expectation ES n <strong>in</strong> terms <strong>of</strong> var Sn,we <strong>the</strong>refore obta<strong>in</strong>values <strong>of</strong> a r<strong>and</strong>om variable obey<strong>in</strong>g <strong>the</strong> st<strong>and</strong>ard normal law. Brieflyit is written <strong>in</strong> <strong>the</strong> formSn− ESvar Snn→ N(0,1).Here, N(0, 1) is <strong>the</strong> st<strong>and</strong>ard normal distribution (with zeroexpectation <strong>and</strong> unit variance). Consider now <strong>the</strong> case <strong>of</strong> n → ∞. If <strong>the</strong>expectations <strong>of</strong> all <strong>the</strong> ξ i are <strong>the</strong> same <strong>and</strong> equal a, <strong>the</strong> variance also <strong>the</strong>same <strong>and</strong> equal σ 2 , <strong>and</strong> <strong>the</strong> r<strong>and</strong>om variables ξ i <strong>the</strong>mselves<strong>in</strong>dependent. Follow<strong>in</strong>g generally known rules, we get89


nn2ESn = ∑ Eξi= na, var Sn = ∑ varξi = nσ , var Sn= σ n.i= 1 i=1For a r<strong>and</strong>om variable obey<strong>in</strong>g <strong>the</strong> law N(0, 1) typical are absolutevalues <strong>of</strong> <strong>the</strong> order 1. For example, <strong>the</strong> probability <strong>of</strong> its absolute valueexceed<strong>in</strong>g 3 is about 0.003 (hence <strong>the</strong> three sigma rule): we see that<strong>the</strong> <strong>in</strong>equality| Sn− E Sn|≤ 3, so that | Sn− na | ≤ 3σ nvar Snis practically certa<strong>in</strong>.Let a ≠ 0. Then na is <strong>the</strong> typical value <strong>of</strong> S n <strong>and</strong> its r<strong>and</strong>omdeviations do not exceed 3σ√n, a magnitude that <strong>in</strong>creases with nessentially slower than na. Given a large n, <strong>the</strong> order <strong>of</strong> <strong>the</strong>determ<strong>in</strong>ate component na exceeds that <strong>of</strong> <strong>the</strong> r<strong>and</strong>om deviations.Such is <strong>the</strong> purely scientific result known (at least <strong>in</strong> some particularcases) to Laplace. Let us see now what philosophical <strong>and</strong> emotionalsuperstructure did he build above it. Here is one more quotation fromhis Essai (pp. 37 – 38):Every time that a great power, <strong>in</strong>toxicated by <strong>the</strong> love <strong>of</strong> conquest,aspires to world dom<strong>in</strong>ation, <strong>the</strong> love <strong>of</strong> <strong>in</strong>dependence produces,among <strong>the</strong> threatened nations, a coalition to which that power almostalways becomes a victim. [...] It is important <strong>the</strong>n, for both <strong>the</strong> stability<strong>and</strong> <strong>the</strong> prosperity <strong>of</strong> <strong>the</strong> states, that <strong>the</strong>y not be extended beyond thoseboundaries to which <strong>the</strong>y are cont<strong>in</strong>ually restored by <strong>the</strong> action <strong>of</strong><strong>the</strong>se causes.This conclusion is reasonable, excellent <strong>and</strong> <strong>in</strong>deed typical for <strong>the</strong>post-Napoleon France. But <strong>the</strong>n Laplace adds: This is ano<strong>the</strong>r result <strong>of</strong><strong>the</strong> probability calculus. He bears <strong>in</strong> m<strong>in</strong>d that, just as <strong>the</strong> determ<strong>in</strong>atecomponent prevails over r<strong>and</strong>omness, see above, so also <strong>in</strong> politics,what is dest<strong>in</strong>ed actually happens. But was it necessary to justify thatstatement by <strong>the</strong> CLT? For <strong>the</strong> modern reader it is quite obvious thatwe can only see here a remote analogy, peculiar not for science butexactly for metaphysics, <strong>and</strong> a s<strong>in</strong>gular <strong>and</strong> very facile metaphysics atthat.A bit later Laplace (p. 38) states, aga<strong>in</strong> cit<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory <strong>of</strong>probability: When a vast sea or a great distance separates a colonyfrom <strong>the</strong> centre <strong>of</strong> <strong>the</strong> empire, <strong>the</strong> colony will sooner or later free itselfbecause it <strong>in</strong>variably attempts to get free. And elsewhere he (p. 123)says:The sequence <strong>of</strong> historic events shows us <strong>the</strong> constant action <strong>of</strong> <strong>the</strong>great moral pr<strong>in</strong>ciples amidst <strong>the</strong> passions <strong>and</strong> <strong>the</strong> various <strong>in</strong>tereststhat disturb societies <strong>in</strong> every way.90


He concludes that s<strong>in</strong>ce <strong>the</strong> action <strong>of</strong> <strong>the</strong> great moral pr<strong>in</strong>ciples isconstant, <strong>and</strong>, as <strong>the</strong> CLT teaches us, <strong>the</strong>y will <strong>in</strong> any case prevail overr<strong>and</strong>omness, it is better to keep to <strong>the</strong>m, o<strong>the</strong>rwise you will experiencebad times. That conclusion is really commendable, but from <strong>the</strong>scientific viewpo<strong>in</strong>t it is obviously not better than convert<strong>in</strong>g <strong>the</strong>Ch<strong>in</strong>ese emperor to Christianity desired by Leibniz. At <strong>the</strong> end <strong>of</strong> <strong>the</strong>Essai (p. 123) we f<strong>in</strong>d <strong>the</strong> celebrated phrase:It is remarkable that a science that began by consider<strong>in</strong>g games <strong>of</strong>chance should itself be raised to <strong>the</strong> rank <strong>of</strong> <strong>the</strong> most importantsubjects <strong>of</strong> human knowledge.He means exactly those political applications <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong>probability.All <strong>the</strong> strangeness <strong>of</strong> metaphysics <strong>in</strong> <strong>the</strong> philosophical <strong>and</strong>emotional spheres notwithst<strong>and</strong><strong>in</strong>g, Laplace shows an amaz<strong>in</strong>g <strong>in</strong>sightwhen concretely apply<strong>in</strong>g <strong>the</strong> probability <strong>the</strong>ory. I have lookedthrough <strong>the</strong> ATP with a special aim, to f<strong>in</strong>d at least one wrong def<strong>in</strong>itestatement. It seemed that support<strong>in</strong>g myself with a hundred <strong>and</strong> fiftyyears dur<strong>in</strong>g which science has been s<strong>in</strong>ce develop<strong>in</strong>g <strong>and</strong> given suchstrangeness <strong>of</strong> his general philosophical views, it will not be difficultto f<strong>in</strong>d <strong>the</strong>re def<strong>in</strong>ite errors as well. Indeed, he considered somedubious problems on <strong>the</strong> probability <strong>of</strong> judicial decisions etc.It occurred, however, that it was not at all easy to f<strong>in</strong>d at least onewrong statement 9 . A great many applications that he considered can beseparated <strong>in</strong>to three parts:1. Obvious <strong>and</strong> absolutely unquestionable problems such as partialcensuses <strong>of</strong> population or <strong>the</strong> change <strong>of</strong> <strong>the</strong> frequency <strong>of</strong> male births <strong>in</strong>Paris due to foundl<strong>in</strong>gs.2. Treatment <strong>of</strong> <strong>the</strong> results <strong>of</strong> astronomical observations. It isdifficult to discuss those applications s<strong>in</strong>ce vast material ought to bestudied.3. Obviously dubious problems like <strong>the</strong> probabilities <strong>of</strong> judicialdecisions. Here, however, Laplace’s conclusions are so careful thatpurely scientific errors are simply impossible.There is noth<strong>in</strong>g to say here about <strong>the</strong> first group, but someth<strong>in</strong>g<strong>in</strong>structive can be noted concern<strong>in</strong>g <strong>the</strong> second one. There, Laplace (p.46) quotes <strong>the</strong> result <strong>of</strong> <strong>the</strong> treatment <strong>of</strong> observations: <strong>the</strong> ratio <strong>of</strong> <strong>the</strong>masses <strong>of</strong> Jupiter <strong>and</strong> <strong>the</strong> Sun is equal to 1:1071 <strong>and</strong> states that hisprobabilistic method gives odds <strong>of</strong> 1 000 000 to 1 that this result is nota hundredth <strong>in</strong> error 10 . Accord<strong>in</strong>g to modern data, that ratio is a littlemore than 2% larger so that <strong>the</strong> odds are obviously wrong.The great question here is, however, was that occasioned by amistaken treatment <strong>of</strong> <strong>the</strong> observations or by a systematic error <strong>of</strong>those observations impossible to elim<strong>in</strong>ate by any statistical treatment.I was unable to answer that question. In general, it is very easy tocommit such an error, <strong>and</strong> it is relevant to remark that quite recently<strong>the</strong> mass <strong>of</strong> <strong>the</strong> Moon was corrected <strong>in</strong> its third significant digit so that<strong>the</strong> precision <strong>of</strong> modern numbers should be carefully considered. If,however, we tend to believe that <strong>the</strong> observations were treatedcorrectly, <strong>and</strong> modern numbers are also correct, we arrive at an91


<strong>in</strong>structive conclusion that <strong>the</strong> presence <strong>of</strong> systematic errors ought tobe allowed for.It is <strong>in</strong>terest<strong>in</strong>g to quote also Laplace’s viewpo<strong>in</strong>t on <strong>the</strong> problems<strong>of</strong> <strong>the</strong> probabilities <strong>of</strong> judicial decisions etc. Unlike, for <strong>in</strong>stancePoisson, he (p. 120) did not overestimate <strong>the</strong>ir reality:So many passions, varied <strong>in</strong>terests <strong>and</strong> circumstances complicatequestions about <strong>the</strong>se matters that <strong>the</strong>y are almost always <strong>in</strong>soluble.In essence, Laplace considered <strong>the</strong> relevant ma<strong>the</strong>matical problemsas models (<strong>in</strong> <strong>the</strong> modern sense <strong>of</strong> that word) <strong>and</strong> thought thatconclusions <strong>of</strong> precise calculations were <strong>in</strong>variably better than <strong>the</strong>most ref<strong>in</strong>ed general reason<strong>in</strong>g. As an example, I take up <strong>the</strong> desirednumber <strong>of</strong> jurors. Laplace does not attempt to f<strong>in</strong>d <strong>the</strong>ir optimalnumber. His only careful recommendation (p. 80) is that, hav<strong>in</strong>g 12jurors, <strong>the</strong> number <strong>of</strong> votes necessary for conviction should apparentlybe <strong>in</strong>creased from 8 to 9 s<strong>in</strong>ce, as <strong>the</strong> solution <strong>of</strong> model problems hadshowed him, 8 votes do not sufficiently guarantee aga<strong>in</strong>st mistakenconvictions.Bear<strong>in</strong>g <strong>in</strong> m<strong>in</strong>d <strong>the</strong> exposition below, it is important to note thatLaplace readily recognized <strong>the</strong> existence <strong>of</strong> problems unsolvable by<strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability although (see above) <strong>the</strong> most importantquestions <strong>of</strong> life,[...] are <strong>in</strong>deed, for <strong>the</strong> most part, only problems <strong>in</strong>probability. In our century, <strong>the</strong> follow<strong>in</strong>g formulations are almostequivalent:The given problem does not belong to one or ano<strong>the</strong>r branch <strong>of</strong>science; The given problem belongs to this branch <strong>of</strong> science but isunsolvable.1.2. Speculative criticism <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability. We seethat by <strong>the</strong> time <strong>of</strong> Laplace a somewhat contradictory situation hadalready formed <strong>in</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability. Concrete results occurred<strong>in</strong>comparably more modest than <strong>the</strong> wide perspectives imag<strong>in</strong>ed byhim. We ought to stress that such a situation exists elsewhere as well.Thus, it is widely believed that physics considers <strong>the</strong> mostfundamental laws <strong>of</strong> nature from which <strong>the</strong> laws <strong>of</strong> o<strong>the</strong>r, for examplebiological phenomena can <strong>in</strong> pr<strong>in</strong>ciple be, or will be <strong>in</strong> <strong>the</strong> remotefuture derived. Biology also readily speaks, for example, about <strong>the</strong>need for learn<strong>in</strong>g to rule <strong>the</strong> biosphere as a whole.It seems that <strong>the</strong> psychology <strong>of</strong> a scientist is arranged <strong>in</strong> such a waythat for engag<strong>in</strong>g <strong>in</strong> science a certa<strong>in</strong> psychological atmosphere isabsolutely necessary for attach<strong>in</strong>g a certa<strong>in</strong> concord <strong>and</strong> generality toconcrete results which <strong>of</strong>ten are modest <strong>and</strong> isolated. In particular, <strong>the</strong>pass<strong>in</strong>g <strong>of</strong> an unfail<strong>in</strong>g <strong>in</strong>terest <strong>in</strong> scientific pursuits from onegeneration to <strong>the</strong> next one can hardly be realized without work<strong>in</strong>g outsuch a psychological arrangement.Suppose that a school student tends to choose physics as his futurepr<strong>of</strong>ession; tell him: All your life you will have to sit by <strong>the</strong> cyclotron<strong>and</strong> measure no one knows what, <strong>and</strong> he will hardly become aphysicist. But tell him: You will be able to contribute to <strong>the</strong> study <strong>of</strong><strong>the</strong> most fundamental laws <strong>of</strong> nature, <strong>and</strong> <strong>the</strong> result will be different.92


The verification <strong>of</strong> <strong>the</strong> truth <strong>of</strong> a scientific proposition by practice,<strong>in</strong> <strong>the</strong> first place concern<strong>in</strong>g fundamental sciences, has a specialproperty, namely, that it <strong>of</strong>ten takes more than a generation.Consequently, at least because <strong>of</strong> this <strong>the</strong> transfer <strong>of</strong> <strong>in</strong>terest <strong>in</strong> sciencefrom one generation to <strong>the</strong> next one is essentially important.On <strong>the</strong> o<strong>the</strong>r h<strong>and</strong>, it is also important to br<strong>in</strong>g that generalpsychological arrangement <strong>in</strong> correspondence with <strong>the</strong> actual results.Such efforts are go<strong>in</strong>g on <strong>in</strong> all sciences under differ<strong>in</strong>g circumstances.In <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability <strong>the</strong> tension <strong>of</strong> passions is somewhatstronger than, say, <strong>in</strong> ma<strong>the</strong>matics as a whole: it is possibly partlyconnected with Laplace. He was at <strong>the</strong> source <strong>of</strong> modern probability<strong>and</strong> <strong>the</strong> literary merits <strong>of</strong> his contribution laid an excessive discrepancybetween its emotional <strong>and</strong> philosophical <strong>and</strong> its concrete scientificaspects.The too wide general hopes are characterized by <strong>the</strong> emotionalshortcom<strong>in</strong>g <strong>of</strong> chang<strong>in</strong>g <strong>in</strong>to disappo<strong>in</strong>tment once encounter<strong>in</strong>g a realproblem. In a purely scientific aspect it consists <strong>in</strong> that <strong>the</strong> researcher,when formulat<strong>in</strong>g new problems, is not sufficiently critical. As aresult, efforts <strong>and</strong> material values are spent on futile attempts to solveproblems whereas <strong>the</strong> impossibility <strong>of</strong> achiev<strong>in</strong>g this would be obvioushad he been a bit more critical.In any case, certa<strong>in</strong> ideas were be<strong>in</strong>g developed <strong>in</strong> scienceconcern<strong>in</strong>g <strong>the</strong> sphere <strong>of</strong> application <strong>of</strong> <strong>the</strong> stochastic methods.Actually, each scientist, who carried out some applied study <strong>in</strong>volv<strong>in</strong>gprobability <strong>the</strong>ory, made a certa<strong>in</strong> contribution to <strong>the</strong>se ideas.However, <strong>the</strong>ir clear formulation (brilliant also <strong>in</strong> <strong>the</strong> purely literarysense) is due to Mises (1928, p. 14). He himself also attempted toconstruct a peculiar ma<strong>the</strong>matical foundation <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong>probability which stirred up animated criticism <strong>and</strong> at present <strong>the</strong>generally recognized axiomatization <strong>of</strong> probability is that provided byKolmogorov (1933/1974). Never<strong>the</strong>less <strong>the</strong> concept itself <strong>of</strong> practicalapplication largely follows Mises’ idea.I rem<strong>in</strong>d briefly this concept <strong>of</strong> statistical homogeneity or statisticalensemble (collective). For ascerta<strong>in</strong><strong>in</strong>g <strong>the</strong> pr<strong>in</strong>ciples I restrict myattention to <strong>the</strong> most simple case when an experiment can ei<strong>the</strong>r leadto <strong>the</strong> occurrence <strong>of</strong> some event A or not. Denote by n A <strong>the</strong> number <strong>of</strong>its occurrences <strong>in</strong> n experiments repeated under presumably <strong>the</strong> sameconditions. The ratio n A /n is called <strong>the</strong> frequency <strong>of</strong> <strong>the</strong> occurrence <strong>of</strong>event A. Even before Mises statisticians (for example Poisson whostudied <strong>the</strong> probability <strong>of</strong> judicial verdicts) understood perfectly wellthat for <strong>the</strong> applicability <strong>of</strong> stochastic methods to study <strong>the</strong> event A <strong>the</strong>stability <strong>of</strong> <strong>the</strong> frequency n A /n as n <strong>in</strong>creases should experience everless fluctuations <strong>and</strong> tend, <strong>in</strong> some sense, to a limit (which is <strong>in</strong>deedunderstood as <strong>the</strong> probability P(A) <strong>of</strong> A).Mises supplemented <strong>the</strong>se ideas by a clear formulation <strong>of</strong> ano<strong>the</strong>rproperty that was also <strong>in</strong>tuitively perfectly well understood bystatisticians. Here it is. Separate <strong>the</strong> n trials beforeh<strong>and</strong> <strong>in</strong>tosufficiently large totalities n 1 , n 2 , ..., <strong>the</strong>n <strong>the</strong> respective frequenciesn A /n 1 , n A /n 2 , ... should also be close to each o<strong>the</strong>r. The separation oughtto be done by draw<strong>in</strong>g on <strong>the</strong> previous <strong>in</strong>formation; thus, two totalitiescould have been trials done <strong>in</strong> summer <strong>and</strong> w<strong>in</strong>ter with <strong>the</strong> frequencies93


n A1 /n 1 , n A2 /n 2 , ..., becom<strong>in</strong>g known after <strong>the</strong> trials. Quite admissible<strong>and</strong> practically useful is also <strong>the</strong> separation <strong>of</strong> <strong>the</strong> trials <strong>in</strong>to parts <strong>of</strong><strong>the</strong> collected material although <strong>in</strong> this case <strong>the</strong> problem <strong>of</strong> <strong>in</strong>tentionalor <strong>in</strong>tuitive arbitrary fit becomes acute.The dem<strong>and</strong> <strong>in</strong>dicated by Mises is important. Suppose that event Ais <strong>the</strong> production <strong>of</strong> defective articles whose probability P(A)experiences, say, seasonal fluctuations:P(A) = P t (A) = p 0 + p 1 s<strong>in</strong>(ωt + φ).Here t is <strong>the</strong> moment <strong>of</strong> observation, p 0 <strong>and</strong> p 1 are some constants suchthat P t (A) ≥ 0. Suppose that t = 1, 2, ..., n. It is not difficult to showthat, for <strong>in</strong>dependent results <strong>of</strong> observation at those moments <strong>the</strong> ration A /n will tend to p 0 (if only ω ≠ 2π). At <strong>the</strong> same time <strong>the</strong> separationaccord<strong>in</strong>g to <strong>the</strong> seasons if <strong>the</strong> seasonal fluctuations really exist willshow that Mises’ dem<strong>and</strong> is violated. The knowledge that suchfluctuations exist can be practically very important.Here, however, a very complicated question emerges: suppose thatwe did not know whe<strong>the</strong>r seasonal fluctuations existed. How couldhave we suspected that <strong>the</strong> data should be separated accord<strong>in</strong>g to <strong>the</strong>seasons? And, on <strong>the</strong> whole, is <strong>the</strong>re any general method for choos<strong>in</strong>g<strong>the</strong> separate groups or should we test all possible groups? We can onlysay that such general method does not exist <strong>and</strong> that it is obviouslysenseless to test all possible groups because, whatever is <strong>the</strong> situation,a certa<strong>in</strong> group can conta<strong>in</strong> all <strong>the</strong> occurrences <strong>of</strong> <strong>the</strong> event A, <strong>and</strong>ano<strong>the</strong>r one, none <strong>of</strong> <strong>the</strong>m so that <strong>the</strong> equality <strong>of</strong> <strong>the</strong> frequencies willbe violated as much as possible. The researcher chooses <strong>the</strong> groups<strong>in</strong>tuitively or bases his choice on <strong>the</strong> available pert<strong>in</strong>ent <strong>in</strong>formation.Then, we wish to discuss ano<strong>the</strong>r problem: suppose that <strong>the</strong> Misesdem<strong>and</strong>s are fulfilled, will that be sufficient for apply<strong>in</strong>g stochasticmethods? In o<strong>the</strong>r words, are those dem<strong>and</strong>s not only necessary, butalso sufficient? Hav<strong>in</strong>g such a general problem, we can only discusssome versions <strong>of</strong> a ma<strong>the</strong>matical <strong>the</strong>orem establish<strong>in</strong>g, say, that, giventhat <strong>the</strong> Mises conditions are fulfilled, some proposition is true, forexample <strong>the</strong> law <strong>of</strong> large numbers.Here, however, <strong>the</strong> same question emerges: how are we to choose<strong>the</strong> groups <strong>of</strong> observations? When admitt<strong>in</strong>g all possible groups such adem<strong>and</strong> will be contradictory, hence can not underlie a ma<strong>the</strong>maticalpro<strong>of</strong>. If not all possible, <strong>the</strong>n it ought to be stated which groups, <strong>and</strong>this is difficult.We see that once we only beg<strong>in</strong> th<strong>in</strong>k<strong>in</strong>g about <strong>the</strong> simplest problemconcern<strong>in</strong>g <strong>the</strong> possible presence <strong>of</strong> seasonal fluctuations <strong>of</strong> <strong>the</strong>probability <strong>of</strong> produc<strong>in</strong>g defective articles, let alone proceed to<strong>in</strong>vestigate it, we conclude that available general scientificprescriptions are obviously <strong>in</strong>sufficient for solv<strong>in</strong>g a given concreteproblem. I do not know even a s<strong>in</strong>gle exception from this rule. It doesnot, however, follow that no practical problem can be solved at all, seebelow, but I note now that <strong>in</strong> spite <strong>of</strong> all <strong>the</strong> shortcom<strong>in</strong>gs <strong>of</strong> thatconcept, it still establishes absolutely clearly that some restrictions <strong>of</strong><strong>the</strong> sphere <strong>of</strong> <strong>the</strong> application <strong>of</strong> statistical methods are necessary.94


In <strong>the</strong> purely scientific sense this conclusion is not at all new. Wesaw how careful was Laplace concern<strong>in</strong>g those stochastic applicationswhere <strong>in</strong>deed such carefulness was needed. Poisson, although hiscontribution on <strong>the</strong> probabilities <strong>of</strong> judicial verdicts was wrong on <strong>the</strong>whole 11 , perfectly well understood <strong>the</strong> need to verify a number <strong>of</strong>assumptions by factual materials <strong>and</strong> performed some checksobta<strong>in</strong><strong>in</strong>g an excellent fit [i]. And <strong>in</strong> general <strong>the</strong>re was likely noresearcher who did not somehow choose to solve such problems where<strong>the</strong> application <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability could have provedeffective.So <strong>the</strong> discussion can only concern methodical problems (methods<strong>of</strong> teach<strong>in</strong>g). What should be <strong>in</strong>cluded <strong>in</strong> textbooks <strong>in</strong>tended forbeg<strong>in</strong>ners, or <strong>in</strong> a paper designed for be<strong>in</strong>g widely debated? Suchconsiderations lead to a special k<strong>in</strong>d <strong>of</strong> reason<strong>in</strong>g that I am <strong>in</strong>deedcall<strong>in</strong>g speculative criticism <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability.A student, beg<strong>in</strong>n<strong>in</strong>g to study a subject usually does not master anyconcrete material. This concerns not only students <strong>of</strong> purelyma<strong>the</strong>matical specialities for which <strong>the</strong> curriculum does not envisageany such material, but also those follow<strong>in</strong>g applied specialities whostudy <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probabilities (toge<strong>the</strong>r with all <strong>the</strong>oreticaldiscipl<strong>in</strong>es) dur<strong>in</strong>g <strong>the</strong>ir first years <strong>of</strong> learn<strong>in</strong>g. If, however, weconsider a paper discuss<strong>in</strong>g problems <strong>of</strong> pr<strong>in</strong>ciple, it is addressed topeople who are mostly acqua<strong>in</strong>ted with factual materials, althoughdifferent from one <strong>of</strong> <strong>the</strong>m to ano<strong>the</strong>r. This is <strong>in</strong>deed what dem<strong>and</strong>s aspeculative discussion <strong>of</strong> <strong>the</strong> problem.Such discussions are based on a s<strong>in</strong>gle pr<strong>in</strong>ciple: s<strong>in</strong>ce <strong>the</strong> necessity<strong>of</strong> restrictions <strong>in</strong> applications <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability isacknowledged, let us see whe<strong>the</strong>r we are able to verify <strong>the</strong>ir realization<strong>in</strong> practice. It is easily established that <strong>the</strong> restrictions are generallyformulated too <strong>in</strong>def<strong>in</strong>itely, <strong>and</strong> if desir<strong>in</strong>g to check <strong>the</strong> conclusionsra<strong>the</strong>r than <strong>the</strong> restrictions, we f<strong>in</strong>d that an exhaust<strong>in</strong>g verification ishere also impossible.Pert<strong>in</strong>ent examples can be seen <strong>in</strong> [i] <strong>and</strong> Tutubal<strong>in</strong> (1972).However, some contributions <strong>of</strong> Alimov have become recently known.His style is very vivid, <strong>and</strong> many quotations <strong>of</strong> his statements isdesirable, but we have to choose only one (1974, p. 21):Thus, <strong>the</strong> correctness <strong>of</strong> compar<strong>in</strong>g n measurements with n<strong>in</strong>dependent r<strong>and</strong>om variables is not threatened by any experimentalcheck. Follow<strong>in</strong>g an established tradition, such comparisons areassumed as a basis <strong>of</strong> many branches <strong>of</strong> ma<strong>the</strong>matical statistics, <strong>of</strong> <strong>the</strong><strong>the</strong>ory <strong>of</strong> Monte Carlo methods, r<strong>and</strong>om search<strong>in</strong>g, rationalization <strong>of</strong>experiments <strong>and</strong> a number <strong>of</strong> o<strong>the</strong>r apparently serious discipl<strong>in</strong>es.Be<strong>in</strong>g impossible to check experimentally, <strong>the</strong>y are significantly, so tosay, present at <strong>the</strong> development <strong>of</strong> systems <strong>of</strong> automatic control.Here, Alimov bears <strong>in</strong> m<strong>in</strong>d that, hav<strong>in</strong>g one sample, it is impossibleto verify ei<strong>the</strong>r <strong>the</strong> <strong>in</strong>dependence <strong>of</strong> separate observations or <strong>the</strong>co<strong>in</strong>cidence <strong>of</strong> <strong>the</strong>ir laws <strong>of</strong> distribution. In general, imag<strong>in</strong><strong>in</strong>g anensemble <strong>of</strong> many possible samples given one really observable, is forhim <strong>in</strong>admissible. Accord<strong>in</strong>gly, he proposes to ab<strong>and</strong>on <strong>the</strong> ma<strong>in</strong>95


notions <strong>and</strong> methods <strong>of</strong> ma<strong>the</strong>matical statistics: confidence <strong>in</strong>tervals,distribution <strong>of</strong> sample characteristics, criteria <strong>of</strong> fit, consistency,unbiasedness <strong>and</strong> efficiency <strong>of</strong> estimators.In particular, <strong>the</strong> problem <strong>of</strong> Laplace’s wrong estimation <strong>of</strong> <strong>the</strong>confidence <strong>in</strong>terval for <strong>the</strong> mass <strong>of</strong> Jupiter 12 should have been solvedsimply, although, as I see it, somewhat cruelly: eng<strong>in</strong>eers applyconfidence <strong>in</strong>tervals for avoid<strong>in</strong>g responsibility to <strong>the</strong> direct customer.Accord<strong>in</strong>g to Alimov (1974, pp. 31 – 32), <strong>the</strong> sense <strong>of</strong> classicalformulations <strong>of</strong> a number <strong>of</strong> results essentially differs from thatattributed to <strong>the</strong>m by tradition, <strong>and</strong>, after be<strong>in</strong>g ascerta<strong>in</strong>ed, becomesimply un<strong>in</strong>terest<strong>in</strong>g for an applied scientist.The quoted paper is written very expressively <strong>and</strong> clearly. The onlypo<strong>in</strong>t which we still did not underst<strong>and</strong> is why does <strong>the</strong> Mises conceptor <strong>the</strong> related second Kolmogorov axiomatics 13 better correspond to<strong>the</strong> <strong>in</strong>terests <strong>of</strong> that scientist than <strong>the</strong> classical set-<strong>the</strong>oretic axiomatics.In any case, <strong>the</strong> assumptions <strong>of</strong> a <strong>the</strong>ory can not be logically verified.His work should possibly be understood <strong>in</strong> <strong>the</strong> follow<strong>in</strong>g way.The concept accord<strong>in</strong>g to, say, Ville – Postnikov 14 provides ano<strong>the</strong>rspeculatively possible approach to applied problems whereas <strong>the</strong>traditional methods <strong>of</strong> ma<strong>the</strong>matical statistics <strong>the</strong>n seem absurd.Consequently, if two speculative models contradict each o<strong>the</strong>r, at leastone <strong>of</strong> <strong>the</strong>m is very doubtful. However, Alimov’s text <strong>in</strong>dicates nodecisive grounds for such an <strong>in</strong>terpretation.Alimov’s views about <strong>the</strong> classical <strong>the</strong>ory <strong>of</strong> probability, at leastwhen compar<strong>in</strong>g <strong>the</strong>m with Laplace’s underst<strong>and</strong><strong>in</strong>g, are reallyextreme. We do not agree with <strong>the</strong>m, see Chapter 2. On <strong>the</strong> o<strong>the</strong>r h<strong>and</strong>,we can easily imag<strong>in</strong>e factual material <strong>the</strong> acqua<strong>in</strong>tance with whichmust only lead to such views. Now, however, we note that <strong>the</strong>methodical aims, <strong>the</strong> only ones that <strong>the</strong> speculative criticism <strong>of</strong>probability <strong>the</strong>ory is able to pursue, seems to be although not achieved,but such whose atta<strong>in</strong>ment is seen <strong>in</strong> pr<strong>in</strong>ciple secured.Planck wrote 15 :A new scientific truth does not triumph by conv<strong>in</strong>c<strong>in</strong>g its opponents<strong>and</strong> mak<strong>in</strong>g <strong>the</strong>m see <strong>the</strong> light but ra<strong>the</strong>r because its opponentseventually die <strong>and</strong> a new generation grows up that is familiar with it.It is doubtless, at least s<strong>in</strong>ce methodology <strong>of</strong> teach<strong>in</strong>g <strong>in</strong>variablyfollows science, that <strong>the</strong> same happens <strong>in</strong> teach<strong>in</strong>g understood <strong>in</strong> awide sense (<strong>in</strong>clud<strong>in</strong>g propag<strong>and</strong>a <strong>of</strong> some views). The po<strong>in</strong>t iscerta<strong>in</strong>ly not that critical op<strong>in</strong>ions (expressed, say, <strong>in</strong> my or Alimov’scontributions) change <strong>the</strong> viewpo<strong>in</strong>t <strong>of</strong> <strong>the</strong> public on <strong>the</strong> problems <strong>of</strong><strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability. On <strong>the</strong> contrary, those works only serve asexpressions <strong>of</strong> <strong>the</strong> changed public op<strong>in</strong>ion. No matter that even nowmany university lecturers possibly keep repeat<strong>in</strong>g to <strong>the</strong> students thatThe <strong>the</strong>ory <strong>of</strong> probability studies r<strong>and</strong>om events; r<strong>and</strong>om are suchevents that can ei<strong>the</strong>r happen or not.Yes, public op<strong>in</strong>ion had changed which is reflected <strong>in</strong> newtextbooks. For example, <strong>in</strong> a recently published textbook by Borovkov(1972) <strong>the</strong>re is not even a trace <strong>of</strong> Laplace’s strange <strong>and</strong> very facilemetaphysics. On <strong>the</strong> whole, it is doubtless that <strong>the</strong> ris<strong>in</strong>g generation96


ought to learn at once <strong>the</strong> simple truth that a thorough comparison <strong>of</strong><strong>the</strong> <strong>the</strong>ory with reality is necessary for <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability as foro<strong>the</strong>r sciences.What, however, does such comparison consist <strong>of</strong>, <strong>and</strong> how do wesearch for it? Alimov believes that <strong>in</strong> <strong>the</strong> most important cases this is<strong>in</strong> general impossible. Indeed, scientifically thorough works where it isdone, are ra<strong>the</strong>r rare. We are end<strong>in</strong>g this Chapter by discuss<strong>in</strong>g ageneral pert<strong>in</strong>ent problem about what can we reckon on here <strong>and</strong>provide some concrete results <strong>in</strong> <strong>the</strong> next Chapter.1.3. The superstition <strong>of</strong> science <strong>and</strong> a more realistic view.Alimov’s proposal to abolish a larger part <strong>of</strong> ma<strong>the</strong>matical statistics isnot <strong>the</strong> most severe from what can be said about science <strong>in</strong> general.Tolstoi (1910) <strong>in</strong>cluded a whole chapter entitled False science. Hisma<strong>in</strong> idea was that <strong>the</strong> empty sciences such as ma<strong>the</strong>matics,astronomy, physics do not at all answer such ma<strong>in</strong> moral questions likeWhy am I liv<strong>in</strong>g <strong>and</strong> how should I live. In addition, <strong>the</strong> contents <strong>of</strong>sciences consists <strong>of</strong> separate weakly connected fragments <strong>of</strong>knowledge which had <strong>in</strong>terested, no one knows why, some smallgroup <strong>of</strong> people. And scientists had freed <strong>the</strong>mselves from worknecessary for life (here, Tolstoi first <strong>of</strong> all thought about <strong>the</strong> work <strong>of</strong>peasants) <strong>and</strong> are liv<strong>in</strong>g an unreasonable life.It is extremely <strong>in</strong>terest<strong>in</strong>g to see what can be answered <strong>in</strong> our timeto <strong>the</strong>se accusations. Nowadays, s<strong>in</strong>ce <strong>the</strong> power <strong>of</strong> science is ever<strong>in</strong>creas<strong>in</strong>g, moral problems are discussed especially <strong>in</strong>tensively, seefor example a review <strong>of</strong> <strong>the</strong>se problems (Gulyga 1975). As to <strong>the</strong>fragmentary contents <strong>of</strong> natural sciences, this is true to some extent.Indeed, we do not dwell with an all-embrac<strong>in</strong>g <strong>the</strong>ory cover<strong>in</strong>g <strong>the</strong>entire nature <strong>and</strong> issu<strong>in</strong>g from common pr<strong>in</strong>ciples, but with many<strong>the</strong>ories <strong>of</strong> different phenomena perta<strong>in</strong><strong>in</strong>g to physics, chemistry,biology, etc. <strong>and</strong> many extremely important th<strong>in</strong>gs do not today yieldto scientific analysis.But does it follow that <strong>the</strong> contents <strong>of</strong> science had formed r<strong>and</strong>omly,only to please <strong>the</strong> whims <strong>of</strong> some people? I will try to show that this isnot only <strong>in</strong>correct, but extremely unjust (<strong>the</strong> same concerns <strong>the</strong>statement that <strong>the</strong> scientists had freed <strong>the</strong>mselves from work necessaryfor life). At first, I allow myself an example show<strong>in</strong>g <strong>the</strong> differencebetween science <strong>and</strong> magic. [Cf. [i, § 1.3].]I will now allow myself some useful for underst<strong>and</strong><strong>in</strong>g <strong>the</strong> problemif remote association. Let us compare <strong>the</strong> movement <strong>of</strong> science dur<strong>in</strong>gmany centuries towards certa<strong>in</strong> knowledge with ano<strong>the</strong>r century-longmovement for <strong>the</strong> development <strong>of</strong> a country’s North <strong>and</strong> East, forexample <strong>in</strong> Russia. The Russian peasant had been able to getacclimatized <strong>and</strong> build villages only where till<strong>in</strong>g <strong>the</strong> soil was possible(practically, along river valleys).Just <strong>the</strong> same, science had only developed where comparativelycerta<strong>in</strong> knowledge was possible. As a result, when look<strong>in</strong>g at a map,we see clusters <strong>of</strong> villages along <strong>the</strong> rivers with practically no<strong>in</strong>habitants <strong>in</strong> between <strong>the</strong>m. Turn<strong>in</strong>g to science, we see that somespheres (celestial mechanics) are well developed <strong>and</strong> more thanplentifully cover practical requirements, whereas we only learn how to97


solve scientifically many not less important problems from wea<strong>the</strong>rforecast<strong>in</strong>g to prevention <strong>of</strong> <strong>the</strong> flu.Elsewhere Tolstoi compares natural sciences with pleasures, −games, rid<strong>in</strong>g, skat<strong>in</strong>g, etc, out<strong>in</strong>gs, − <strong>and</strong> concludes that enjoymentshould not impede <strong>the</strong> ma<strong>in</strong> bus<strong>in</strong>ess <strong>of</strong> life. In his time, scientistsapparently yet constituted such a th<strong>in</strong> layer <strong>of</strong> <strong>the</strong> population, that <strong>the</strong>great writer had no occasion to feel <strong>the</strong> labour<strong>in</strong>g pr<strong>in</strong>ciple <strong>of</strong> sciences’nature 16 . Briefly, natural sciences constitute one <strong>of</strong> <strong>the</strong> many spheres<strong>of</strong> human activity with all <strong>the</strong> thus follow<strong>in</strong>g shortcom<strong>in</strong>gs <strong>and</strong> merits.Consequently, for example <strong>the</strong> criticism <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability <strong>of</strong><strong>the</strong> speculative k<strong>in</strong>d (cf. § 1.2) can only pursue restrictive aims.Indeed, it logically shows that <strong>the</strong> premises for apply<strong>in</strong>g that <strong>the</strong>orycan not be verified. This, however, concerns <strong>the</strong> premises <strong>of</strong> anyscience; although <strong>the</strong> lack <strong>of</strong> logic undoubtedly somewhat lowers <strong>the</strong>certa<strong>in</strong>ty <strong>of</strong> knowledge, <strong>in</strong> many cases <strong>the</strong> conclusions <strong>of</strong> probability<strong>the</strong>ory still have a quite sufficient certa<strong>in</strong>ty for admitt<strong>in</strong>g <strong>the</strong>m asscientific.Many authors <strong>in</strong>clud<strong>in</strong>g Laplace discussed how <strong>the</strong> practicalapplicability <strong>and</strong> certa<strong>in</strong>ty <strong>of</strong> those conclusions is established. Hisreason<strong>in</strong>g <strong>in</strong> <strong>the</strong> Essai is not rich <strong>in</strong> content <strong>and</strong> is reduced to stat<strong>in</strong>gthat <strong>in</strong>duction was not reliable [cf. his criticism <strong>of</strong> Bacon <strong>in</strong> § 1.1] <strong>and</strong>that analogies were still worse. In my context, <strong>the</strong> response is utmostsimple: <strong>the</strong> practical verification is achieved by <strong>the</strong> work <strong>of</strong> manypeople <strong>and</strong> many generations; <strong>the</strong>y ever aga<strong>in</strong> return to study<strong>in</strong>g agiven problem.If several large boulders were ly<strong>in</strong>g on a peasant’s plot, he had tobypass <strong>the</strong>m when plough<strong>in</strong>g. But if his son becomes able to remove<strong>the</strong>m, he will do it. Just <strong>the</strong> same, <strong>in</strong> science it is not forbidden toapproach old problems by new methods <strong>and</strong> ei<strong>the</strong>r to confirm or refute<strong>the</strong> previous results. In statistics, this means that, hav<strong>in</strong>g a smallamount <strong>of</strong> data, it is impossible to say anyth<strong>in</strong>g <strong>in</strong> a certa<strong>in</strong> way, butdur<strong>in</strong>g a prolonged statistical <strong>in</strong>vestigation, with new material be<strong>in</strong>gever aga<strong>in</strong> available, no doubts are f<strong>in</strong>ally left.Alimov is <strong>in</strong> <strong>the</strong> right when assert<strong>in</strong>g that, hav<strong>in</strong>g one sample, it isnot at all possible to verify whe<strong>the</strong>r we are deal<strong>in</strong>g with <strong>in</strong>dependentr<strong>and</strong>om variables. However, <strong>the</strong> situation is sharply changed after afew new samples become available. Then, <strong>in</strong> particular, we can check<strong>the</strong> previously calculated confidence <strong>in</strong>tervals.I had occasion to encounter some people keep<strong>in</strong>g to logicalreason<strong>in</strong>g for whom <strong>the</strong> very concept <strong>of</strong> statistical test<strong>in</strong>g <strong>of</strong>hypo<strong>the</strong>ses caused a feel<strong>in</strong>g <strong>of</strong> displeasure. That concept from <strong>the</strong> verybeg<strong>in</strong>n<strong>in</strong>g fixes <strong>the</strong> level <strong>of</strong> significance, i. e. some non-zeroprobability to reject mistakenly an actually true hypo<strong>the</strong>sis. Someconsider this unacceptable, but <strong>the</strong> process <strong>of</strong> cognition does notconsist <strong>of</strong> a s<strong>in</strong>gle test, <strong>and</strong> even when we reject a hypo<strong>the</strong>sis, we donot, happily, pass a death sentence. If new data appear, we will test itanew.Tolstoi would have hardly rejected <strong>the</strong> viewpo<strong>in</strong>t that science issome sphere <strong>of</strong> labour not higher, not lower than any o<strong>the</strong>r sphere(<strong>in</strong>dustry, agriculture, fish<strong>in</strong>g etc). To support this assumption I cancite his admission, <strong>in</strong> <strong>the</strong> same book, that <strong>in</strong> its sphere <strong>of</strong> cognition <strong>of</strong>98


<strong>the</strong> material world science had <strong>in</strong>deed essentially advanced. Andmodern development leaves no doubt <strong>in</strong> <strong>the</strong> existence <strong>of</strong> <strong>the</strong> really truescience <strong>in</strong> contrast to <strong>the</strong> false science.What are <strong>the</strong> practical conclusions from <strong>the</strong> considerations above?Once we acknowledge science as a k<strong>in</strong>d <strong>of</strong> active human work, itfollows, on <strong>the</strong> one h<strong>and</strong>, that at each moment it is <strong>in</strong>complete <strong>and</strong>fragmentary; <strong>in</strong>deed, active work always lacks someth<strong>in</strong>g (or evenvery much). On <strong>the</strong> o<strong>the</strong>r h<strong>and</strong>, what also follows is universality: manwill always engage <strong>in</strong> science <strong>and</strong> attempt to widen <strong>the</strong> sphere <strong>of</strong> <strong>the</strong>certa<strong>in</strong>ty known.In a number <strong>of</strong> fields <strong>of</strong> application <strong>of</strong> ma<strong>the</strong>matics <strong>and</strong> probability<strong>the</strong>ory <strong>in</strong> particular to real phenomena <strong>the</strong> situation became abnormals<strong>in</strong>ce <strong>the</strong> practical possibilities <strong>of</strong> application are overestimated. Insuch cases it is expedient to stress <strong>the</strong> unavoidable fragmentary state <strong>of</strong>all <strong>the</strong> exist<strong>in</strong>g applications: <strong>in</strong> ma<strong>the</strong>matics, too gr<strong>and</strong> <strong>in</strong>tentions canoccur unatta<strong>in</strong>able <strong>and</strong> <strong>the</strong>ir <strong>in</strong>evitable failure will create for thatscience an extremely undesirable blow to its prestige, a situation <strong>in</strong>which science can not normally develop.Thus, some years ago it was thought that, had <strong>the</strong>re occurred apossibility <strong>of</strong> solv<strong>in</strong>g great problems <strong>of</strong> l<strong>in</strong>ear programm<strong>in</strong>g cover<strong>in</strong>g<strong>the</strong> economics <strong>of</strong> <strong>the</strong> entire nation, economic plann<strong>in</strong>g should bereorganized on that foundation. It is now absolutely clear that such aproblem can not be ei<strong>the</strong>r formulated or solved at least because, giventhat global sett<strong>in</strong>g, such a notion <strong>of</strong> l<strong>in</strong>ear programm<strong>in</strong>g as set <strong>of</strong>possible technological methods has no sense 17 . As a result, <strong>the</strong> study <strong>of</strong>local problems for which l<strong>in</strong>ear programm<strong>in</strong>g can be effective, is not atall sufficiently developed.Awkward <strong>and</strong> absolutely useless concepts emerge when attempt<strong>in</strong>gto comb<strong>in</strong>e global problems <strong>of</strong> l<strong>in</strong>ear programm<strong>in</strong>g with a stochasticdescription <strong>of</strong> <strong>the</strong> possible <strong>in</strong>determ<strong>in</strong>ateness. Here also only properlyisolated local problems can have sense. In general, when apply<strong>in</strong>g <strong>the</strong>probability <strong>the</strong>ory to describe an <strong>in</strong>determ<strong>in</strong>ate situation, it isextremely important to atta<strong>in</strong> some unity between <strong>the</strong> extent <strong>of</strong>rough<strong>in</strong>g out <strong>the</strong> reality still admissible for a stochastic model <strong>and</strong> <strong>the</strong>amount <strong>of</strong> <strong>in</strong>formation to be extracted from reality for determ<strong>in</strong><strong>in</strong>g <strong>the</strong>parameters <strong>of</strong> <strong>the</strong> model. This situation is perfectly well described by<strong>the</strong> proverb: You can not run with <strong>the</strong> hare <strong>and</strong> hunt with <strong>the</strong> hounds.In o<strong>the</strong>r words, a model that adequately describes reality <strong>in</strong> detail c<strong>and</strong>em<strong>and</strong> so much <strong>in</strong>formation for determ<strong>in</strong><strong>in</strong>g its parameters, that it isimpossible to collect it. And a rough model only dem<strong>and</strong><strong>in</strong>g a littleamount <strong>of</strong> statistical <strong>in</strong>formation can be unsuited for describ<strong>in</strong>g reality.The ma<strong>in</strong> dem<strong>and</strong> on a researcher who practically applies <strong>the</strong> <strong>the</strong>ory <strong>of</strong>probability is <strong>in</strong>deed to be able to f<strong>in</strong>d a way out <strong>of</strong> <strong>the</strong>se difficulties.2. Logical <strong>and</strong> Illogical Applications <strong>of</strong> <strong>the</strong> Theory <strong>of</strong> <strong>Probability</strong>Five years ago I thought it expedient to explicate, <strong>in</strong> a popularbooklet, <strong>the</strong> elements <strong>of</strong> <strong>the</strong> ma<strong>the</strong>matical arsenal <strong>of</strong> probability<strong>the</strong>ory. However, almost at <strong>the</strong> same time as that booklet hadappeared, a sufficient number <strong>of</strong> textbooks on <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probabilityhad been published with <strong>the</strong> ma<strong>the</strong>matical aspect be<strong>in</strong>g described evenmore than completely. Then, a tradition beg<strong>in</strong>s to take shape (<strong>and</strong>99


wholly dom<strong>in</strong>ates now <strong>the</strong> teach<strong>in</strong>g <strong>of</strong> ma<strong>the</strong>matical analysis <strong>and</strong> anumber <strong>of</strong> o<strong>the</strong>r ma<strong>the</strong>matical discipl<strong>in</strong>es) which sharply separates <strong>the</strong>pert<strong>in</strong>ent contents <strong>in</strong>to ma<strong>the</strong>matical <strong>and</strong> applied parts.At <strong>the</strong> beg<strong>in</strong>n<strong>in</strong>g <strong>of</strong> <strong>the</strong> century textbooks on <strong>the</strong> <strong>the</strong>ory <strong>of</strong>probability had conta<strong>in</strong>ed very many real examples <strong>of</strong> statistical data;<strong>in</strong> <strong>the</strong> new textbooks such examples are disappear<strong>in</strong>g. A naturalprocess <strong>of</strong> demarcat<strong>in</strong>g teach<strong>in</strong>g ma<strong>the</strong>matical <strong>the</strong>ory <strong>and</strong> applicationsis possibly go<strong>in</strong>g on. Indeed, had we wished to <strong>in</strong>clude applications <strong>in</strong>a textbook on ma<strong>the</strong>matical analysis, we would have to expoundmechanics, physics, probability <strong>the</strong>ory <strong>and</strong> much o<strong>the</strong>r material.It is a fact, however, that <strong>the</strong> applications <strong>of</strong> ma<strong>the</strong>matical analysisnaturally f<strong>in</strong>d <strong>the</strong>mselves <strong>in</strong> courses <strong>and</strong> textbooks on mechanics <strong>and</strong>physics, but that <strong>the</strong> applications <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability, whiledisappear<strong>in</strong>g from textbooks on ma<strong>the</strong>matical sciences, are not yetbe<strong>in</strong>g <strong>in</strong>serted elsewhere. It follows that <strong>the</strong> ma<strong>in</strong> methods <strong>of</strong> properwork with actual data <strong>and</strong>, <strong>in</strong> particular, <strong>of</strong> how to decide whe<strong>the</strong>rsome statistical premises are fulfilled or not, are not <strong>in</strong>cludedanywhere.I have <strong>the</strong>refore thought it appropriate to <strong>in</strong>sert here a part <strong>of</strong> <strong>the</strong>semethods. They are <strong>in</strong>deed constitut<strong>in</strong>g its, so to say, didactical part.All such methods are particular, <strong>and</strong> are described <strong>in</strong> a natural way byconcrete examples. However, <strong>the</strong> <strong>in</strong>clusion <strong>of</strong> a few such examples,that seemed to me important for one or ano<strong>the</strong>r reason, pursues <strong>in</strong>addition ano<strong>the</strong>r <strong>and</strong> more general aim. I attempted to prove that, <strong>in</strong>spite <strong>of</strong> a possible logical groundlessness, a stochastic <strong>in</strong>vestigationcan provide a practically doubtless result. Confidence <strong>in</strong>tervals, criteria<strong>of</strong> significance <strong>and</strong> o<strong>the</strong>r statistical methods to which, <strong>in</strong> particular,Alimov objects, are serv<strong>in</strong>g <strong>in</strong> <strong>the</strong>se examples perfectly well <strong>and</strong> allowus to make def<strong>in</strong>ite practical conclusions. But <strong>of</strong> course, realapplications <strong>of</strong> probability <strong>the</strong>ory both at <strong>the</strong> time <strong>of</strong> Laplace <strong>and</strong>nowadays are <strong>of</strong> a particular <strong>and</strong> concrete type. As to my attitudetowards all-embrac<strong>in</strong>g global constructions, it is sufficiently expressed<strong>in</strong> Chapter 1.2.1. On a new confirmation <strong>of</strong> <strong>the</strong> Mendelian laws. We explicateKolmogorov’s paper (1940) directly connected with <strong>the</strong> discussion <strong>of</strong>biological problems which took place <strong>the</strong>n 18 .At first, some simple <strong>the</strong>oretical <strong>in</strong>formation. Suppose thatsuccessive repetitions <strong>of</strong> an observed event constitute a genu<strong>in</strong>estatistical ensemble <strong>and</strong> its results are values <strong>of</strong> some r<strong>and</strong>om variableξ. The results <strong>of</strong> n experiments are traditionally denotedx 1 , ..., x n (2.1)(not ξ 1 ,..., ξ n ) <strong>and</strong> F n (x) is called <strong>the</strong> empirical distribution function:F x<strong>the</strong> number <strong>of</strong> x < x among all x ,..., x= (2.2)ni1 nn( ) .This function changes by jumps <strong>of</strong> size 1/n at po<strong>in</strong>ts (2.1); for <strong>the</strong>sake <strong>of</strong> simplicity we assume that among those numbers <strong>the</strong>re are noequal to each o<strong>the</strong>r. That function <strong>the</strong>refore depends on <strong>the</strong> r<strong>and</strong>om100


values <strong>of</strong> (2.1) realized <strong>in</strong> <strong>the</strong> n experiments <strong>and</strong> is <strong>the</strong>refore itselfr<strong>and</strong>om. In addition, <strong>the</strong>re exists a non-r<strong>and</strong>om (<strong>the</strong>oretical)distribution functionF(x) = P[ξ < x] = P[x i < x] (2.3)<strong>of</strong> each result <strong>of</strong> <strong>the</strong> experiment.Kolmogorov proved that at n → ∞ <strong>the</strong> magnitudeλ = sup n | F( x) − F ( x) |(2.4)nhas some st<strong>and</strong>ard distribution (<strong>the</strong> Kolmogorov distribution); <strong>the</strong>supremum is taken over <strong>the</strong> values <strong>of</strong> x. This result is valid under as<strong>in</strong>gle assumption that F(x) is cont<strong>in</strong>uous. Now not only <strong>the</strong>asymptotic distribution <strong>of</strong> (2.4) is known, but also its distributions at n= 2, 3, ...The practical sense <strong>of</strong> <strong>the</strong> empirical distribution function F n (x)consists, first <strong>of</strong> all, <strong>in</strong> that its graph vividly represents <strong>the</strong> samplevalues (2.1). In a certa<strong>in</strong> sense this function at sufficiently large values<strong>of</strong> n resembles <strong>the</strong> <strong>the</strong>oretical distribution function F(x). [...]There also exists ano<strong>the</strong>r method <strong>of</strong> representation <strong>of</strong> a samplecalled histogram [...] Given a large number <strong>of</strong> observations, itresembles <strong>the</strong> density <strong>of</strong> distribution <strong>of</strong> r<strong>and</strong>om variable ξ. However, itis only expressive (<strong>and</strong> almost <strong>in</strong>dependent from <strong>the</strong> choice <strong>of</strong> <strong>the</strong><strong>in</strong>tervals <strong>of</strong> group<strong>in</strong>g) for <strong>the</strong> number <strong>of</strong> observations <strong>of</strong> <strong>the</strong> order <strong>of</strong> atleast a few tens. The histogram is more commonly used, but <strong>in</strong> allcases I decidedly prefer to apply <strong>the</strong> empirical distribution function.The Kolmogorov criterion based on statistics λ, see (2.4), can beapplied for test<strong>in</strong>g <strong>the</strong> fit <strong>of</strong> <strong>the</strong> supposed <strong>the</strong>oretical law F(x) to <strong>the</strong>observational data (2.1) represented by function (2.2). However, that<strong>the</strong>oretical law ought to be precisely known. A common (but graduallybe<strong>in</strong>g ab<strong>and</strong>oned) mistake was <strong>the</strong> application <strong>of</strong> <strong>the</strong> Kolmogorovcriterion for test<strong>in</strong>g <strong>the</strong> hypo<strong>the</strong>sis <strong>of</strong> <strong>the</strong> k<strong>in</strong>d The <strong>the</strong>oreticaldistribution function is normal. Indeed, <strong>the</strong> normal law is onlydeterm<strong>in</strong>ed to <strong>the</strong> choice <strong>of</strong> its parameters a (<strong>the</strong> mean) <strong>and</strong> σ (meansquare scatter). In <strong>the</strong> hypo<strong>the</strong>sis formulated just above <strong>the</strong>separameters are not mentioned; it is assumed that <strong>the</strong>y are determ<strong>in</strong>edby sample data, naturally through <strong>the</strong> estimators1x s x xn2 2; = ∑ (i− ) .n −1i=1Thus, <strong>in</strong>stead <strong>of</strong> statistic (2.4), <strong>the</strong> statisticx − xsup n | F0[ ] − Fn( x) |(2.5)sis meant. Here, F 0 is <strong>the</strong> st<strong>and</strong>ard normal law N(0, 1).Statistic (2.5) differs from (2.4) <strong>in</strong> that <strong>in</strong>stead <strong>of</strong> F(x) it <strong>in</strong>cludes F 0which depends on (2.1), x <strong>and</strong> s <strong>and</strong> is <strong>the</strong>refore r<strong>and</strong>om. Typical101


values <strong>of</strong> (2.5) are essentially less than those <strong>of</strong> <strong>the</strong> Kolmogorovstatistic (2.4). Therefore, when apply<strong>in</strong>g <strong>the</strong> Kolmogorov distributionfor (2.5), we will widen <strong>the</strong> boundaries <strong>of</strong> <strong>the</strong> confidence region <strong>and</strong>thus admit <strong>the</strong> hypo<strong>the</strong>sis <strong>of</strong> normality more <strong>of</strong>ten than proper.And so, <strong>the</strong> careful practical application <strong>of</strong> <strong>the</strong> Kolmogorov test <strong>in</strong><strong>the</strong> most elementary (<strong>and</strong> <strong>the</strong>refore most common) situation isimpossible. That criterion helps <strong>in</strong> those cases when many suchexamples are available which were already tested by some statisticalcriteria <strong>and</strong> we wish to secure a general po<strong>in</strong>t <strong>of</strong> view concern<strong>in</strong>g <strong>the</strong>irnumerous applications.Let us pass now to <strong>the</strong> essence <strong>of</strong> <strong>the</strong> problem on <strong>the</strong> confirmation<strong>of</strong> <strong>the</strong> Mendelian laws. Here is <strong>the</strong> classical situation. Some <strong>in</strong>dicationhas two alleles, A (dom<strong>in</strong>ant) <strong>and</strong> a (recessive). Two pure l<strong>in</strong>es withgenotypes AA <strong>and</strong> aa are taken <strong>and</strong> compulsorily crossed. A hybridwith genotype Aa emerges with its phenotype correspond<strong>in</strong>g to<strong>in</strong>dication A. Then a second generation is obta<strong>in</strong>ed under free cross<strong>in</strong>g.When admitt<strong>in</strong>g <strong>the</strong> hypo<strong>the</strong>sis <strong>of</strong> absolute r<strong>and</strong>omness <strong>of</strong> <strong>the</strong>comb<strong>in</strong>ations <strong>of</strong> <strong>the</strong> gametes, <strong>the</strong> probability <strong>of</strong> <strong>the</strong> occurrence <strong>of</strong>genotype aa is 1/4. Only <strong>in</strong>dividuals with genotype aa reveal<strong>in</strong>dication a <strong>in</strong> <strong>the</strong>ir phenotype so that <strong>the</strong> probability <strong>of</strong> its occurrenceis also 1/4. And so, if <strong>the</strong>re will be n <strong>in</strong>dividuals <strong>in</strong> <strong>the</strong> secondgeneration, <strong>the</strong> number <strong>of</strong> occurrences <strong>of</strong> <strong>in</strong>dication a <strong>in</strong> <strong>the</strong> phenotypemay be considered as <strong>the</strong> number <strong>of</strong> successes µ <strong>in</strong> n Bernoulli trialswith probability <strong>of</strong> success p = 1/4.This is <strong>the</strong> simplest case <strong>of</strong> <strong>the</strong> Mendelian law. Vast experimentalmaterial had been collected up to 1940 from which it was seen that <strong>in</strong>many cases such a simplest law was <strong>in</strong>deed obeyed. Essentialdeviations (perhaps connected with a differ<strong>in</strong>g survivorship <strong>of</strong><strong>in</strong>dividuals <strong>of</strong> different genotypes <strong>and</strong> o<strong>the</strong>r causes) was also revealed.The school <strong>of</strong> Lyssenko had been attempt<strong>in</strong>g to prove that that lawwas not work<strong>in</strong>g. To atta<strong>in</strong> that aim, experiments were carried out, <strong>in</strong>particular by Ermolaeva (1939). They were peculiar <strong>in</strong> that <strong>the</strong>material was considered not from all <strong>the</strong> <strong>in</strong>dividuals <strong>of</strong> <strong>the</strong> secondgeneration taken toge<strong>the</strong>r, but separately for families. It is better toexpla<strong>in</strong> <strong>the</strong> mean<strong>in</strong>g <strong>of</strong> that term by an example. In experiments withtomatoes a family is consist<strong>in</strong>g <strong>of</strong> all <strong>the</strong> plants <strong>of</strong> <strong>the</strong> secondgeneration grown <strong>in</strong> <strong>the</strong> same box. Each box is sown with seeds takenfrom <strong>the</strong> fruit <strong>of</strong> exactly one plant <strong>of</strong> <strong>the</strong> first generation. Theseparation <strong>in</strong>to families occurs quite naturally.However, Kolmogorov (see above) showed that Ermolaeva’s mostnumerous series <strong>of</strong> experiments can be expla<strong>in</strong>ed exactly by <strong>the</strong> mostelementary Mendel model. Suppose that for k families number<strong>in</strong>g n 1 ,n 1 , ..., n k <strong>the</strong> number <strong>of</strong> manifested recessive alleles was µ 1 , µ 2 , ..., µ k ,<strong>the</strong>n <strong>the</strong> classical De Moivre – Laplace <strong>the</strong>orem [prov<strong>in</strong>g that <strong>the</strong>b<strong>in</strong>omial law tended to normality] leads to <strong>the</strong> normed magnitudes* µi− nip 1 3µi= , p = , q = 1− p =n pq 4 4ihav<strong>in</strong>g approximately <strong>the</strong> st<strong>and</strong>ard normal distribution N(0, 1); <strong>the</strong>precision <strong>of</strong> approximation is quite sufficient for n i <strong>of</strong> <strong>the</strong> order <strong>of</strong>102


*several dozen. The totality µ ican thus be considered (if <strong>the</strong>Mendelian model is valid) a sample with <strong>the</strong>oretical distribution be<strong>in</strong>g<strong>the</strong> st<strong>and</strong>ard normal law.Kolmogorov studied two most numerous series <strong>of</strong> Ermolaeva’sexperiments <strong>and</strong> respectively two samples (2.6) with 98 <strong>and</strong> 123observations. [...] He obta<strong>in</strong>ed λ = 0.82 <strong>and</strong> 0.75. The probability <strong>of</strong> abetter fit (a lesser λ) was 0.49 <strong>and</strong> − 0.37 so that those values <strong>of</strong> λ werequite satisfactory.A purely statistical <strong>in</strong>vestigation thus changed <strong>the</strong> results: an allegedrefutation <strong>of</strong> <strong>the</strong> Mendelian laws became <strong>the</strong>ir essential confirmation.Apart from <strong>the</strong> opponents <strong>of</strong> <strong>the</strong> Mendel <strong>the</strong>ory Kolmogorov alsomentioned <strong>the</strong> work <strong>of</strong> his followers, En<strong>in</strong> (1939) <strong>in</strong> particular. He didnot subject that paper to a detailed analysis, but <strong>in</strong>dicated that <strong>the</strong>agreement with <strong>the</strong> ma<strong>in</strong> model <strong>of</strong> Bernoulli trials was too good (<strong>the</strong>frequencies concern<strong>in</strong>g separate families deviated from p = 1/4 lessthan it should have occurred accord<strong>in</strong>g to <strong>the</strong> ma<strong>in</strong> model <strong>of</strong> Bernoullitrials). A detailed analysis is <strong>in</strong>structive from many viewpo<strong>in</strong>ts <strong>and</strong> Iam <strong>the</strong>refore provid<strong>in</strong>g En<strong>in</strong>’s ma<strong>in</strong> results.He considers <strong>the</strong> segregation <strong>of</strong> <strong>the</strong> tomato hybrids accord<strong>in</strong>g todiffer<strong>in</strong>g leaves: normal <strong>and</strong> potato-like. His results are separated <strong>in</strong>totwo groups depend<strong>in</strong>g on <strong>the</strong> time <strong>of</strong> sow<strong>in</strong>g <strong>the</strong> seeds <strong>of</strong> <strong>the</strong> hybridplants <strong>in</strong> <strong>the</strong> hothouse (February or April). [...]All <strong>the</strong> material except one observation is shown on Fig. 3. Weought to decide now what k<strong>in</strong>d <strong>of</strong> statistical treatment is needed. Inapplied ma<strong>the</strong>matical statistics <strong>the</strong> application <strong>of</strong> each given statisticaltest is objective, [...] but which criteria should be chosen is anessentially subjective question. The answer depends on whichs<strong>in</strong>gularities <strong>of</strong> <strong>the</strong> data seem suspicious <strong>and</strong> <strong>the</strong> statistician more orless adequately converts this impression <strong>in</strong>to statistical tests. There areno common rules, we can only discuss examples.The matter is that <strong>in</strong> pr<strong>in</strong>ciple any given result <strong>of</strong> observations isunlikely (<strong>and</strong> <strong>in</strong> our present case <strong>of</strong> a cont<strong>in</strong>uous law <strong>of</strong> distribution<strong>the</strong> probability <strong>of</strong> any concrete result is simply equal to zero).Therefore, a criterion can also be found that will reject any hypo<strong>the</strong>sisconsidered <strong>in</strong> any circumstances. We ought not to be here superdiligent<strong>and</strong> only admit criteria hav<strong>in</strong>g a substantial sense suitable for<strong>the</strong> concrete natural scientific problem. On <strong>the</strong> o<strong>the</strong>r h<strong>and</strong>, if notwish<strong>in</strong>g to reject some tested hypo<strong>the</strong>sis, it will be usually possible tochoose such criteria that will not do that. Here, we are alreadyspeak<strong>in</strong>g about <strong>the</strong> honesty <strong>of</strong> <strong>the</strong> statistician.Concern<strong>in</strong>g <strong>the</strong> material presented on Fig. 3, we first turn ourattention to <strong>the</strong> empirical function for <strong>the</strong> first series <strong>of</strong> observations. Itis situated completely above <strong>the</strong> <strong>the</strong>oretical function <strong>and</strong> <strong>in</strong> general isquite well smoo<strong>the</strong>d by some straight l<strong>in</strong>e (dotted on <strong>the</strong> Figure)almost parallel to <strong>the</strong> <strong>the</strong>oretical. The entire difference is some shift to<strong>the</strong> left. S<strong>in</strong>ce we deal with a shift (we see it perfectly well, but do notknow whe<strong>the</strong>r it is significant or not), we ought to apply <strong>the</strong> test basedon <strong>the</strong> sample mean. It is equal to – 0.64 <strong>and</strong> its variance is1/ 11 ≈ 0.30; to rem<strong>in</strong>d, <strong>the</strong> tested hypo<strong>the</strong>sis concerns <strong>the</strong> st<strong>and</strong>ardnormal distribution for <strong>the</strong> values <strong>of</strong> µ*. The deviation exceeds two103


sigma <strong>in</strong> absolute value <strong>and</strong> is highly significant. The first series <strong>of</strong>experiments is not, strictly speak<strong>in</strong>g, a confirmation <strong>of</strong> <strong>the</strong> Mendelianlaws.Let us ask ourselves now, how large should <strong>the</strong> deviation be fromthose laws that we ought to admit when consider<strong>in</strong>g this series <strong>of</strong>experiments. It is certa<strong>in</strong>ly possible to say at once now that <strong>the</strong>discussion is po<strong>in</strong>tless when declar<strong>in</strong>g that such a result compels us todoubt <strong>the</strong> presence <strong>of</strong> a statistical ensemble; or, roughly <strong>the</strong> same, todoubt <strong>the</strong> <strong>in</strong>dependence <strong>of</strong> <strong>the</strong> separate outcomes <strong>of</strong> <strong>the</strong> experiments.But let us try to manage by less cruel means. Suppose that eachplant reveals <strong>the</strong> recessive <strong>in</strong>dication <strong>in</strong>dependently from o<strong>the</strong>rs, butthat <strong>the</strong> probability <strong>of</strong> success (appearance <strong>of</strong> a plant with potato-likeleaves) p differs from 1/4: p = 1/4 + ∆p. How large should ∆p be forexpla<strong>in</strong><strong>in</strong>g <strong>the</strong> observed shift <strong>of</strong> <strong>the</strong> empirical distribution function?Suppose that ∆p = − 1/40. We thought that <strong>the</strong> magnitude (2.6) with p= p 0 = 3/4 <strong>and</strong> q = q 0 = 3/4 has a st<strong>and</strong>ard normal distribution;actually, this will be true for[µ − n( p + ∆q)]0µ* = .n( p + ∆ p)( q + ∆q)0 0When calculat<strong>in</strong>g <strong>the</strong> difference betweenneglect <strong>the</strong> change <strong>of</strong> <strong>the</strong> denom<strong>in</strong>ator so that* * n∆pµ0≈ µ + .np q0 0*µ0<strong>and</strong>*µ , we mayThe magnitudes n differ <strong>in</strong> different experiments, but, accord<strong>in</strong>g toTable 1, np = np 0 = n/4 mostly exceeds 50, so that n ≥ 200. Therefore,<strong>the</strong> systematic shift isn ∆ p 4= n ∆ p ≈ 30∆pnp q 30 0<strong>and</strong> ∆p = − 1/40 quite well expla<strong>in</strong>s <strong>the</strong> systematic shift <strong>of</strong> − 0.64.An estimate by naked eye us<strong>in</strong>g <strong>the</strong> dotted l<strong>in</strong>e on Fig. 3 provides− 0.58, little differ<strong>in</strong>g from − 0.64 s<strong>in</strong>ce <strong>the</strong> mean square deviation <strong>of</strong><strong>the</strong> arithmetic mean is 1/ 11 ≈ 0.30.At present, <strong>the</strong>re are tables <strong>of</strong> <strong>the</strong> distribution <strong>of</strong> <strong>the</strong> statisticλ′ = sup |F(x) − F n (x)| (2.7)xalso for f<strong>in</strong>ite values <strong>of</strong> n, see for example Bolshev & Smirnov (1967).For <strong>the</strong> first series <strong>of</strong> observations (n = 11) that statistic is 0.28. It isvery moderately significant for levels higher than 20%.Consequently, when apply<strong>in</strong>g this test, we are not compelled toconsider that <strong>the</strong> data <strong>of</strong> <strong>the</strong> first series reject <strong>the</strong> applicability <strong>of</strong> <strong>the</strong>104


Mendelian laws. It is not sufficiently clear which conclusion has morenatural scientific sense: ei<strong>the</strong>r that <strong>the</strong> data do not agree with thoselaws, but that <strong>the</strong> discrepancy can be understood by a slight change <strong>of</strong>p (equal to 10%); or, that somewhat reluctantly we may suppose that<strong>the</strong>re is no obvious contradiction with those laws.However, En<strong>in</strong> provides some explanation <strong>of</strong> <strong>the</strong> possiblediscrepancy: <strong>the</strong> plants <strong>in</strong> <strong>the</strong> hothouse sown <strong>in</strong> February suffered froma shortage <strong>of</strong> heat <strong>and</strong> light <strong>and</strong> a considerable part <strong>of</strong> <strong>the</strong> sproutedseeds perished. Plants hav<strong>in</strong>g a recessive <strong>in</strong>dication could have wellhad a somewhat lower probability <strong>of</strong> survival (which should bechecked by a special experiment). The f<strong>in</strong>al results <strong>of</strong> <strong>the</strong> first seriescan be considered as some modest confirmation <strong>of</strong> <strong>the</strong> Mendelianlaws.We turn now to <strong>the</strong> second series. The pert<strong>in</strong>ent empiricaldistribution function on Fig. 3 is only badly smoo<strong>the</strong>d by a straightl<strong>in</strong>e (accord<strong>in</strong>g, however, to my somewhat subjective op<strong>in</strong>ion). In anycase, <strong>the</strong> scatter <strong>of</strong> <strong>the</strong> observations is essentially less than supposed by<strong>the</strong> st<strong>and</strong>ard normal distribution. The most simple way to show it by astatistical criterion is to calculate <strong>the</strong> sum <strong>of</strong> <strong>the</strong> squares <strong>of</strong> <strong>the</strong>observations. It is equal to 2.85 whereas its distribution (if <strong>the</strong> checkedhypo<strong>the</strong>sis is valid) is <strong>the</strong> chi-squared law with 14 degrees <strong>of</strong> freedom.As <strong>in</strong>dicated by <strong>the</strong> tables <strong>of</strong> that law, that value is thus practicallyimpossible. The value <strong>of</strong> <strong>the</strong> statistics (2.7) is 0.33; with n = 14 that issignificant at about <strong>the</strong> 5% level.The shift <strong>of</strong> <strong>the</strong> first series <strong>of</strong> observations was <strong>in</strong> some wayreasonably expla<strong>in</strong>ed; <strong>the</strong> second series has an <strong>in</strong>significant shift (<strong>the</strong>sample mean is − 0.21) but an essentially smaller than supposedvariance. The Mendelian laws are thus obeyed more precisely thansupposed which is hardly possible. The most probable statisticalconclusion is that <strong>the</strong> results were tampered with deliberately or not.The corruption <strong>of</strong> normality <strong>of</strong> <strong>the</strong> distribution (<strong>the</strong> impossibility <strong>of</strong>smooth<strong>in</strong>g <strong>the</strong> empirical distribution by a straight l<strong>in</strong>e) also <strong>in</strong>dicatessome defect; however, for <strong>the</strong> given number <strong>of</strong> observations thisconclusion would be difficult to justify by a statistical test.In general, as far as was possible to ascerta<strong>in</strong>, <strong>the</strong> trouble isapparently that <strong>the</strong> experimental data are not provided <strong>in</strong> full. And so,it is possible to confirm <strong>the</strong> Mendelian laws while <strong>in</strong>tend<strong>in</strong>g to refute<strong>the</strong>m, <strong>and</strong> it is also possible to throw <strong>the</strong>m <strong>in</strong>to doubt when <strong>in</strong>tend<strong>in</strong>gto confirm <strong>the</strong>m, <strong>and</strong> all <strong>of</strong> this is revealed by a purely statistical<strong>in</strong>vestigation.Here, we encountered a curious violation <strong>of</strong> <strong>the</strong> order be<strong>in</strong>gestablished <strong>in</strong> ma<strong>the</strong>matical statistics. When act<strong>in</strong>g strictlyscientifically, statistical tests should be chosen beforeh<strong>and</strong> <strong>and</strong> <strong>the</strong>experiment carried out <strong>and</strong> <strong>the</strong> verdict passed only afterwards.Actually, <strong>the</strong> tests are more <strong>of</strong>ten chosen by issu<strong>in</strong>g from peculiarities<strong>of</strong> <strong>the</strong> material noted by naked eye. They serve for check<strong>in</strong>g whe<strong>the</strong>r<strong>the</strong>se peculiarities are statistically significant or not. However, hav<strong>in</strong>gestablished <strong>in</strong> our case that useful are tests based on <strong>the</strong> sample mean<strong>and</strong> <strong>the</strong> sum <strong>of</strong> <strong>the</strong> squares <strong>of</strong> <strong>the</strong> sample values, we could have, whenanalyz<strong>in</strong>g new similar material, strictly followed statistical science.105


But <strong>the</strong>n <strong>the</strong> newly appear<strong>in</strong>g peculiarities <strong>of</strong> that material will havenot been noticed.What k<strong>in</strong>d <strong>of</strong> peculiarities could happen? For example, on Fig. 2a<strong>and</strong> 2b a certa<strong>in</strong> non-zero number <strong>of</strong> observations is shown <strong>in</strong> <strong>the</strong>region µ* ≤ − 3. The probability <strong>of</strong> one observation be<strong>in</strong>g <strong>the</strong>re(assum<strong>in</strong>g that <strong>the</strong> Mendelian laws are valid) is 0.0014, <strong>and</strong>, <strong>of</strong> one out<strong>of</strong> approximately a hundred (to recall, <strong>the</strong> numbers <strong>of</strong> observationswere 98 <strong>and</strong> 123), about a hundred times higher; here, almost preciselyso. Thus, <strong>the</strong> probability <strong>of</strong> observations appear<strong>in</strong>g <strong>in</strong> that region <strong>in</strong>both series is about 0.14 2 ≈ 0.02, which means that a deviation fromnormality N(0, 1) is significant on <strong>the</strong> level ≈ 2%. So, are <strong>the</strong>Mendelian laws never<strong>the</strong>less wrong? Well, first <strong>of</strong> all, we have chosena test correspond<strong>in</strong>g to known data; second, a perfectly reasonableattitude does not mean dogmatically follow<strong>in</strong>g tests <strong>of</strong> significance. Areasonable answer apparently means that <strong>the</strong> bulk <strong>of</strong> observationsperfectly agrees with <strong>the</strong> Mendelian laws but sharp deviations perhapsdo occur. It can be supposed that a deficiency <strong>in</strong> <strong>the</strong> number <strong>of</strong>displayed recessive <strong>in</strong>dications has some biological sense (if,accord<strong>in</strong>g to a very simple explanation, <strong>the</strong>re exists a connection withsurvivorship).Incidentally, <strong>the</strong> above sufficiently illustrates <strong>the</strong> simple idea thattruth <strong>in</strong> science is established by <strong>the</strong> work <strong>of</strong> a number <strong>of</strong> generations<strong>and</strong> is not always atta<strong>in</strong>ed at each separate <strong>in</strong>vestigation.2.2. No one knows <strong>the</strong> hour ... The ancient say<strong>in</strong>g, No one knows<strong>the</strong> hour <strong>of</strong> his death, became somewhat shaken (certa<strong>in</strong>ly <strong>in</strong> <strong>the</strong>statistical ra<strong>the</strong>r than <strong>in</strong>dividual sense) after life tables have beencompiled <strong>and</strong> it occurred that <strong>the</strong> probability <strong>of</strong> liv<strong>in</strong>g up to a def<strong>in</strong>iteage, is subject to fluctuations (depend<strong>in</strong>g on <strong>the</strong> conditions <strong>of</strong> life),which are, however, not too essential. A fur<strong>the</strong>r step towards an<strong>in</strong>dividual forecast based on multivariate statistical analysis is partlymade <strong>and</strong> partly be<strong>in</strong>g made. I am describ<strong>in</strong>g one <strong>of</strong> <strong>the</strong> mostoutst<strong>and</strong><strong>in</strong>g contributions <strong>in</strong> this field, <strong>the</strong> so-called Fram<strong>in</strong>gham<strong>in</strong>vestigation (one <strong>of</strong> <strong>the</strong> pert<strong>in</strong>ent publications is Truett et al 1967).The cardiovascular diseases are known to be one <strong>of</strong> <strong>the</strong> centralproblems <strong>of</strong> modern medic<strong>in</strong>e. They are manifested <strong>in</strong> different ways;one <strong>of</strong> <strong>the</strong> most common k<strong>in</strong>d is <strong>the</strong> so-called ischemic heart disease(IHD). Accord<strong>in</strong>g to <strong>the</strong> classification adopted <strong>in</strong> <strong>the</strong> cited work, itcomprises cases <strong>of</strong> myocardial <strong>in</strong>farction, coronary <strong>in</strong>sufficiency,ang<strong>in</strong>a pectoris <strong>and</strong> deaths occasioned by disturbances <strong>of</strong> <strong>the</strong> coronaryblood circulation. We know well enough that <strong>the</strong> IHD <strong>of</strong>ten affectspeople yet be<strong>in</strong>g <strong>in</strong> <strong>the</strong> prime <strong>of</strong> creative power which makes <strong>the</strong>problem especially acute.There exist some ra<strong>the</strong>r vague ideas on <strong>the</strong> part played by <strong>the</strong> factors<strong>of</strong> modern <strong>in</strong>dustrialized life <strong>in</strong> <strong>the</strong> development <strong>of</strong> <strong>the</strong> IHD (littlephysical activity, nervous-emotional stress, irrational diet, etc) <strong>and</strong>also by <strong>the</strong> possible <strong>in</strong>fluence <strong>of</strong> genetic factors. These ideas arecerta<strong>in</strong>ly extremely important but we would like to have, <strong>in</strong> addition togeneral (but <strong>in</strong>sufficiently clear <strong>and</strong> <strong>in</strong>completely proven) ideas someamount <strong>of</strong> scientific (i. e. trustworthy) <strong>in</strong>formation.That, perhaps not cover<strong>in</strong>g <strong>the</strong> entire problem, would provide areliable foundation for some practical conclusions. Important is, for106


example, <strong>the</strong> problem <strong>of</strong> <strong>the</strong> objectively established risk factors. To<strong>the</strong>se belong, on <strong>the</strong> one h<strong>and</strong>, portents <strong>of</strong> an illness established bymodern diagnostic means (e. g. changes <strong>in</strong> electro-cardiogram), on <strong>the</strong>o<strong>the</strong>r h<strong>and</strong>, factors <strong>of</strong> life <strong>and</strong> behaviour (age, smok<strong>in</strong>g, amount <strong>of</strong>cholesterol <strong>in</strong> <strong>the</strong> blood, etc). S<strong>in</strong>ce <strong>the</strong> bus<strong>in</strong>ess concerns someprecisely determ<strong>in</strong>ed factors ra<strong>the</strong>r than vaguely underst<strong>and</strong>ableexcessive tempo <strong>of</strong> modern life, a scientific <strong>in</strong>vestigation <strong>of</strong> <strong>the</strong>ir part is<strong>in</strong> pr<strong>in</strong>ciple not unlikely.The possible ways <strong>of</strong> <strong>the</strong> development <strong>of</strong> <strong>the</strong> IHD are little known,so <strong>the</strong> statistical method <strong>of</strong> study<strong>in</strong>g it is <strong>the</strong> ma<strong>in</strong> method. As usual,expectations here will be chiefly based on rely<strong>in</strong>g that a large amount<strong>of</strong> data will be able to compensate <strong>the</strong> deficiency <strong>of</strong> <strong>in</strong>formation about<strong>the</strong> essence <strong>of</strong> <strong>the</strong> phenomenon (<strong>in</strong> this case, <strong>of</strong> <strong>the</strong> IHD). And s<strong>in</strong>cethat disease develops gradually, over many years, it is desirable that<strong>the</strong> <strong>in</strong>vestigation covers not only a large number <strong>of</strong> people, but a verylong period <strong>of</strong> time as well (if possible, <strong>the</strong>ir whole life).A s<strong>in</strong>gle exam<strong>in</strong>ation <strong>of</strong> a large number <strong>of</strong> people presents seriousdifficulties; <strong>and</strong>, tak<strong>in</strong>g <strong>in</strong>to account that people usually move severaltimes dur<strong>in</strong>g <strong>the</strong>ir lifetime, you will underst<strong>and</strong> that <strong>the</strong> real difficultiesare great. It ought to be also borne <strong>in</strong> m<strong>in</strong>d that <strong>the</strong> relative number <strong>of</strong>cases (<strong>of</strong> people f<strong>in</strong>ally develop<strong>in</strong>g <strong>the</strong> IHD) is ra<strong>the</strong>r small, so that <strong>the</strong>population to be exam<strong>in</strong>ed mostly consists <strong>of</strong> non-cases (o<strong>the</strong>r people).Therefore, <strong>the</strong> loss <strong>of</strong> a non-case by <strong>the</strong> researcher is comparativelyunimportant, but los<strong>in</strong>g at least one case is extremely undesirable.However, if we allow <strong>the</strong> loss <strong>of</strong> people (for example, occasioned by<strong>the</strong> man’s move or refusal to come for <strong>the</strong> exam<strong>in</strong>ation), we do notknow whe<strong>the</strong>r it was a case or not <strong>and</strong> it should be attempted that <strong>the</strong>losses be as small as possible, so perhaps <strong>the</strong> greatest part <strong>of</strong> <strong>the</strong> entireeffort is spent to atta<strong>in</strong> that goal.The exam<strong>in</strong>ation covered practically <strong>the</strong> whole population <strong>of</strong> asmall American town Fram<strong>in</strong>gham aged 30 – 62 years at its beg<strong>in</strong>n<strong>in</strong>g.It is go<strong>in</strong>g on for more than 20 years <strong>and</strong> <strong>the</strong> cited source reports <strong>the</strong>results <strong>of</strong> <strong>the</strong> first twelve years. They are based on <strong>in</strong>vestigat<strong>in</strong>g 2187men <strong>and</strong> 2669 women not suffer<strong>in</strong>g <strong>in</strong>itially from <strong>the</strong> IHD. Itsdevelopment dur<strong>in</strong>g those twelve years was revealed <strong>in</strong> 258 men(11.8%) <strong>and</strong> 129 women (4.8%); it was known long ago that womensuffer from IHD more rarely than men.The connection between <strong>the</strong> risk factors measured dur<strong>in</strong>g <strong>the</strong> firstexam<strong>in</strong>ation <strong>and</strong> <strong>the</strong> probability <strong>of</strong> <strong>the</strong> development <strong>of</strong> <strong>the</strong> IHD dur<strong>in</strong>g<strong>the</strong> 12 follow<strong>in</strong>g years was considered. In general, it is possible to listra<strong>the</strong>r many such factors, but only seven were taken account <strong>of</strong>:1. Age (<strong>in</strong> years). 2. Content <strong>of</strong> cholesterol <strong>in</strong> <strong>the</strong> blood serum(mm/100 millilitre). 3. Systolic blood pressure (mm <strong>of</strong> mercurycolumn). 4. Relative weight (weight expressed <strong>in</strong> per cents <strong>of</strong> man’sweight relative to mean weight for appropriate sex <strong>and</strong> stature). 5.Content <strong>of</strong> haemoglob<strong>in</strong> (g/100 millilitre). 6. Smok<strong>in</strong>g (0, nonsmokers;1, 2 <strong>and</strong> 3, smok<strong>in</strong>g less than a packet daily, smok<strong>in</strong>g apacket <strong>and</strong> more than a packet). 7. Electro-cardiogram (0, normal, 1,abnormal).107


Treat<strong>in</strong>g observations whose results depend on many factors isfraught with an absolutely general difficulty <strong>and</strong> overcom<strong>in</strong>g it waspossibly <strong>the</strong> ma<strong>in</strong> f<strong>in</strong>d<strong>in</strong>g <strong>of</strong> <strong>the</strong> work done. The po<strong>in</strong>t is that <strong>the</strong> result<strong>of</strong> observation (<strong>in</strong> this case, <strong>the</strong> emergence <strong>of</strong> <strong>the</strong> IHD) is generallyconnected with <strong>the</strong> values <strong>of</strong> <strong>the</strong> risk factors <strong>in</strong> a barely understoodway. When <strong>the</strong>re are a few such factors, one or two, say, <strong>the</strong> data areusually divided <strong>in</strong>to <strong>in</strong>tervals accord<strong>in</strong>g to <strong>the</strong>ir value; <strong>in</strong> <strong>the</strong> mostsimple case, <strong>in</strong>to two, but this is very crude <strong>and</strong> it is better to havemore.If each factor is subdivided <strong>in</strong>to several levels, all <strong>the</strong>ircomb<strong>in</strong>ations should be applied to form <strong>the</strong> appropriate groupsprovid<strong>in</strong>g <strong>the</strong> frequencies <strong>of</strong> <strong>the</strong> IHD be<strong>in</strong>g estimates <strong>of</strong> <strong>the</strong>probabilities. These will <strong>in</strong>deed adequately describe <strong>the</strong> data(somewhat roughly because <strong>the</strong> values <strong>of</strong> <strong>the</strong> risk factors areconsidered approximately).For example, <strong>the</strong> contents <strong>of</strong> cholesterol can be considered on fourlevels [...], <strong>the</strong> values <strong>of</strong> <strong>the</strong> systolic blood pressure also on four levels[...]. We <strong>the</strong>n arrange a two-dimensional classification [...] <strong>and</strong> obta<strong>in</strong>16 groups with <strong>the</strong> frequency <strong>of</strong> <strong>the</strong> emergence <strong>of</strong> <strong>the</strong> IHD calculated<strong>in</strong> each <strong>of</strong> <strong>the</strong>m not for all 4856 observations, but for <strong>the</strong>ir number <strong>in</strong><strong>the</strong> group which is 16 times smaller <strong>in</strong> <strong>the</strong> mean. Jo<strong>in</strong><strong>in</strong>g men <strong>and</strong>women toge<strong>the</strong>r will likely be thought <strong>in</strong>admissible so that <strong>the</strong> number<strong>of</strong> observations becomes about twice smaller.In general, a modest number <strong>of</strong> observations <strong>of</strong> <strong>the</strong> order <strong>of</strong> ahundred (when hav<strong>in</strong>g a great many total number <strong>of</strong> observations) willbe left for each frequency. But what happens if we add three moregroups <strong>of</strong> different ages? And four more accord<strong>in</strong>g to <strong>the</strong> <strong>in</strong>tensity <strong>of</strong>smok<strong>in</strong>g? [..] As a result, we will obta<strong>in</strong> a classification with eachgroup conta<strong>in</strong><strong>in</strong>g at best one observation <strong>and</strong> cases <strong>of</strong> no observationsat all are not excluded. Consequently, we will be unable to determ<strong>in</strong>eany probabilities. [...]The same difficulty occurs <strong>in</strong> many technical problems concern<strong>in</strong>g<strong>the</strong> reliability <strong>of</strong> mach<strong>in</strong>ery established by several types <strong>of</strong> checks.Suppose that <strong>the</strong> results <strong>of</strong> <strong>the</strong> checks arex 1 , ..., x k (2.8)<strong>and</strong> we would like to derive <strong>the</strong> probability <strong>of</strong> failure-free work as afunction p(x 1 , ..., x k ). The attempt to achieve this by multivariateanalysis will be senseless.Let us see how this problem was solved <strong>in</strong> <strong>the</strong> Fram<strong>in</strong>gham<strong>in</strong>vestigation. As far as it is possible to judge, its solution had an<strong>in</strong>disputable part, but <strong>the</strong> o<strong>the</strong>r part was absolutely illogical. This doesnot mean that it is <strong>in</strong> essence wrong, but that it possibly needs somespecification. The first part can be thus expounded.When hav<strong>in</strong>g to do with several variables, <strong>the</strong>ir only well studiedfunction is <strong>the</strong> l<strong>in</strong>ear function; <strong>the</strong>re exists an entire pert<strong>in</strong>ent science,l<strong>in</strong>ear algebra, which also partly studies <strong>the</strong> function <strong>of</strong> <strong>the</strong> seconddegree. It would <strong>the</strong>refore be expedient to represent <strong>the</strong> unknownprobability p(x 1 , ..., x k ) by a l<strong>in</strong>ear function. This, however, isobviously impossible because probability changes from 0 to 1 whereas108


a l<strong>in</strong>ear function is not restricted. We will <strong>the</strong>refore take a necessarystep to fur<strong>the</strong>r complication suppos<strong>in</strong>g thatp(x 1 , ..., x k ) = f(a 0 + a 1 x 1 + ... + a k x k )where f is a function <strong>of</strong> one variable chang<strong>in</strong>g from 0 to 1.There still rema<strong>in</strong>s <strong>the</strong> problem <strong>of</strong> choos<strong>in</strong>g f; many considerations<strong>of</strong> simplicity show that most convenient is <strong>the</strong> so-called logisticfunction1f ( y) = .y1 + e −F<strong>in</strong>ally, chang<strong>in</strong>g notation to br<strong>in</strong>g it <strong>in</strong> correspondence with <strong>the</strong> citedwork, we have <strong>the</strong> ma<strong>in</strong> hypo<strong>the</strong>sis <strong>in</strong> <strong>the</strong> form1p( x1,..., xk) =.k1−exp[ α β x ]− −∑i=1ii(2.9)This function is called <strong>the</strong> multidimensional logistic function. Wehave certa<strong>in</strong>ly not proved that <strong>the</strong> probability sought, p, must beexpounded by (2.9), but arrived at that function without mak<strong>in</strong>g anylogical absurdities.After formulat<strong>in</strong>g <strong>the</strong> ma<strong>in</strong> hypo<strong>the</strong>sis (2.9) <strong>the</strong> parameters <strong>of</strong> thatfunction ought to be estimated <strong>and</strong> it is here that <strong>the</strong> authorsdeliberately admit a logical contradiction. They suggest <strong>the</strong> model <strong>of</strong> amultidimensional normal distribution for <strong>the</strong> results <strong>of</strong> <strong>the</strong> exam<strong>in</strong>ation(2.8). This is obviously impossible because two <strong>of</strong> <strong>the</strong> seven factors,NNo. 6 <strong>and</strong> 7, are measured <strong>in</strong> discrete units so that normality isformally impossible. Then, it is ra<strong>the</strong>r strange to suppose that age isnormally distributed. In general, unlike <strong>the</strong> small illogicalities <strong>of</strong>choos<strong>in</strong>g a statistical test when data are already available (§ 2.1), herewe see a serious corruption <strong>of</strong> logic which can only be exonerated by<strong>the</strong> result obta<strong>in</strong>ed (cf. <strong>the</strong> proverb: Victors are not judged).More precisely, <strong>the</strong> ma<strong>in</strong> hypo<strong>the</strong>sis consisted <strong>in</strong> that <strong>the</strong>re are twomany-dimensional normal totalities, one consist<strong>in</strong>g <strong>of</strong> <strong>the</strong> observations<strong>of</strong> <strong>the</strong> risk factors for those who were not taken ill dur<strong>in</strong>g <strong>the</strong> next 12years, <strong>the</strong> second concerned those who were. A problem is formulatedabout <strong>the</strong> methods <strong>of</strong> dist<strong>in</strong>guish<strong>in</strong>g <strong>the</strong>se totalities.The classical supposition <strong>of</strong> <strong>the</strong> discrim<strong>in</strong>ant analysis states that <strong>the</strong>covariance matrices <strong>of</strong> both totalities are <strong>the</strong> same which leads (ara<strong>the</strong>r remarkable fact!) to <strong>the</strong> expression (2.9) which we arrived at byconsiderations <strong>of</strong> simplicity. Nowadays this probability is understoodas <strong>the</strong> posterior probability <strong>of</strong> be<strong>in</strong>g taken ill given that <strong>the</strong>observations provided values (2.8) <strong>of</strong> risk factors. This time, however,<strong>the</strong> authors also obta<strong>in</strong>ed a method <strong>of</strong> estimat<strong>in</strong>g <strong>the</strong> parameters <strong>of</strong>(2.9). It is illogical to <strong>the</strong> same extent as <strong>the</strong> supposition <strong>of</strong> normality.Our own reason<strong>in</strong>g which first led us to <strong>the</strong> expression (2.9) wouldhave led us to a quite ano<strong>the</strong>r <strong>and</strong> more complicated <strong>in</strong> <strong>the</strong>109


calculational sense method <strong>of</strong> estimation <strong>of</strong> those parameters, <strong>the</strong>method <strong>of</strong> maximal likelihood. It can also be realized <strong>and</strong> it is believedthat, for <strong>the</strong> data given, both methods provide results very close toeach o<strong>the</strong>r. It is <strong>in</strong>terest<strong>in</strong>g, however, to see what practical conclusionswere made <strong>in</strong> <strong>the</strong> cited source. After estimat<strong>in</strong>g somehow <strong>the</strong> values <strong>of</strong><strong>the</strong> parameters, we can apply formula (2.9) to f<strong>in</strong>d out <strong>the</strong> approximatevalue <strong>of</strong> <strong>the</strong> probability ˆp <strong>of</strong> develop<strong>in</strong>g <strong>the</strong> IHD for each exam<strong>in</strong>edperson dur<strong>in</strong>g <strong>the</strong> next 12 years.The highest probabilities were observed for men <strong>of</strong> 30 – 39 <strong>and</strong> 40 –49 years (0.986 <strong>and</strong> 0.742 that <strong>the</strong> IHD developed) <strong>and</strong> 50 – 62 years(0.770 did not develop). For women <strong>the</strong> probabilities <strong>of</strong> develop<strong>in</strong>g <strong>the</strong>disease were 0.838 for ages 30 – 49 <strong>and</strong> 0.773 for ages 50 – 62 [that itdid not develop?]. To a certa<strong>in</strong> extent <strong>the</strong>se results refute <strong>the</strong> classicalsay<strong>in</strong>g which served as <strong>the</strong> title <strong>of</strong> this § 2.2. True, it should be borne<strong>in</strong> m<strong>in</strong>d that formally <strong>the</strong>se figures concern forecast<strong>in</strong>g alreadyhappened events. Such a forecast <strong>of</strong> future events is only possible if<strong>the</strong> coefficients <strong>of</strong> function (2.9) are roughly <strong>the</strong> same <strong>in</strong> ano<strong>the</strong>r placeor time as those established by <strong>the</strong> authors. Such a supposition isprobable but not yet verified.Then, hav<strong>in</strong>g arranged <strong>the</strong> set <strong>of</strong> values ˆp for each exam<strong>in</strong>edperson, <strong>the</strong>y can be subdivided <strong>in</strong>to several equally numerous groups(<strong>of</strong> ten people, for example) such that those with <strong>the</strong> lowest values <strong>of</strong>ˆp are placed <strong>in</strong> <strong>the</strong> first <strong>of</strong> <strong>the</strong>m, people hav<strong>in</strong>g higher, still higher, ...values comprise <strong>the</strong> second, third, ... group. Had <strong>the</strong>re been noconnection between <strong>the</strong> considered l<strong>in</strong>ear comb<strong>in</strong>ation <strong>of</strong> risk factorswith <strong>the</strong> IHD, <strong>the</strong> number <strong>of</strong> cases <strong>of</strong> that disease <strong>in</strong> all groups wouldhave been approximately <strong>the</strong> same, but actually <strong>the</strong> emerged picture isdifferent <strong>in</strong> pr<strong>in</strong>ciple, see for example Table 3 borrowed from Truett etal (1967). The expected number <strong>of</strong> cases <strong>of</strong> <strong>the</strong> IHD was determ<strong>in</strong>edby summ<strong>in</strong>g up <strong>the</strong> probabilities ˆp <strong>of</strong> all people <strong>in</strong> <strong>the</strong> appropriategroup.In that table, we are surprised first <strong>of</strong> all by <strong>the</strong> great difference(amount<strong>in</strong>g to a few dozen times) between <strong>the</strong> sickness rate <strong>in</strong> groups<strong>of</strong> high <strong>and</strong> low risk. Second, <strong>in</strong> spite <strong>of</strong> <strong>the</strong> obvious non-normality <strong>of</strong><strong>the</strong> distributions, <strong>the</strong> results obta<strong>in</strong>ed by means <strong>of</strong> a normality modelagree well with <strong>the</strong> actual data. The isolation <strong>of</strong> groups <strong>of</strong> people witha higher danger <strong>of</strong> develop<strong>in</strong>g <strong>the</strong> IHD is thus possible by issu<strong>in</strong>g from<strong>the</strong> most simple cl<strong>in</strong>ical exam<strong>in</strong>ation (provid<strong>in</strong>g <strong>the</strong> listed above riskfactors). The same conclusion can be made when consider<strong>in</strong>g <strong>the</strong> datarepresented <strong>in</strong> separate age groups.However, it should not be thought that those results are reallysuitable for <strong>in</strong>dividual forecasts. Those can only be successful forcases <strong>of</strong> very high or very low <strong>in</strong>dividual risk ˆp but for all <strong>the</strong> totality<strong>the</strong> result would have been bad. This is connected with <strong>the</strong> IHDoccurr<strong>in</strong>g never<strong>the</strong>less rarely (11.8% <strong>in</strong> <strong>the</strong> mean for 12 years).Indeed, issu<strong>in</strong>g from <strong>the</strong> values <strong>of</strong> ˆp we can only forecast <strong>the</strong> disease<strong>in</strong> people with a sufficiently high ˆp <strong>and</strong> <strong>the</strong> opposite for all <strong>the</strong> o<strong>the</strong>rs.When choos<strong>in</strong>g <strong>the</strong> boundary <strong>of</strong> <strong>the</strong> group with <strong>the</strong> highest risk <strong>in</strong>Table 3 as <strong>the</strong> critical value <strong>of</strong> ˆp , a forecast <strong>of</strong> <strong>the</strong> disease would havebeen wrong <strong>in</strong> 100 – 37.5 = 62.5% <strong>of</strong> cases. And on <strong>the</strong> o<strong>the</strong>r h<strong>and</strong>110


258 – 82 – 176 cases or 68.5% <strong>of</strong> all cases <strong>of</strong> <strong>the</strong> disease would haveoccurred <strong>in</strong> spite <strong>of</strong> our promise <strong>of</strong> <strong>the</strong> opposite. The problem <strong>of</strong><strong>in</strong>dividual forecast is <strong>the</strong>refore far from be<strong>in</strong>g solved.Let us now have a look at <strong>the</strong> estimates <strong>of</strong> <strong>the</strong> coefficients <strong>in</strong>formula (2.9) <strong>and</strong> at <strong>the</strong> possible conclusions. These estimates differfor different age groups <strong>and</strong> also for women <strong>and</strong> men. For <strong>the</strong> group <strong>of</strong>men <strong>of</strong> all ages <strong>the</strong> estimate ˆα = − 10.8986. O<strong>the</strong>r estimates are shown<strong>in</strong> Table 4. It is seen <strong>the</strong>re that <strong>the</strong> coefficients <strong>of</strong> two factors out <strong>of</strong>seven (relative weight <strong>and</strong> haemoglob<strong>in</strong>) comparatively little exceed<strong>the</strong>ir mean square errors <strong>in</strong> absolute value. They should be recognizedas less <strong>in</strong>fluenc<strong>in</strong>g <strong>the</strong> IHD than <strong>the</strong> o<strong>the</strong>r five. One <strong>of</strong> those five, <strong>the</strong>age, can not be changed, <strong>and</strong> it is convenient to refer <strong>the</strong> action <strong>of</strong> <strong>the</strong>rest <strong>of</strong> <strong>the</strong>m with <strong>the</strong> <strong>in</strong>fluence <strong>of</strong> age.For example, a daily packet <strong>of</strong> cigarettes provides 2 po<strong>in</strong>ts <strong>and</strong> <strong>the</strong>correspond<strong>in</strong>g <strong>in</strong>crement <strong>of</strong> <strong>the</strong> l<strong>in</strong>ear function is 0.7220 whichapproximately means 10 years <strong>of</strong> age. In o<strong>the</strong>r words, smok<strong>in</strong>g apacket daily br<strong>in</strong>gs forward by ten years <strong>the</strong> occurrence <strong>of</strong> myocardial<strong>in</strong>fraction. This figure rema<strong>in</strong>s approximately <strong>the</strong> same <strong>in</strong> <strong>the</strong> differentage groups <strong>of</strong> men. For women, <strong>the</strong> harm <strong>of</strong> smok<strong>in</strong>g is representedessentially weaker. It is not quite clear why, ei<strong>the</strong>r <strong>the</strong> figuresrepresent reality or <strong>the</strong> number <strong>of</strong> smok<strong>in</strong>g women was small (1562women out <strong>of</strong> 2669 did not smoke at all, <strong>and</strong> only 301 used a packetdaily) <strong>and</strong> <strong>the</strong> statistics was <strong>in</strong>complete.It is <strong>in</strong>convenient to compare <strong>the</strong> <strong>in</strong>fluence <strong>of</strong> cholesterol <strong>and</strong> bloodpressure with <strong>the</strong> action <strong>of</strong> age by means <strong>of</strong> Table 4. The po<strong>in</strong>t is that<strong>the</strong> coefficients <strong>of</strong> <strong>the</strong> l<strong>in</strong>ear comb<strong>in</strong>ation <strong>of</strong> any dimensionalmagnitudes are also dimensional (whereas <strong>in</strong> our case, it is dem<strong>and</strong>edthat <strong>the</strong> l<strong>in</strong>ear comb<strong>in</strong>ation be dimensionless). The comparison <strong>of</strong> <strong>the</strong>type we have applied leads, for example, to such a result as 7mg % 19 <strong>of</strong>cholesterol is equivalent to a year <strong>of</strong> life. We know very well what is ayear <strong>of</strong> life, but is 7mg % much or little?For answer<strong>in</strong>g that question we ought to know how large are <strong>the</strong>fluctuations <strong>of</strong> <strong>the</strong> content <strong>of</strong> cholesterol. Or, that content be dividedby its mean square deviation <strong>and</strong> thus expressed as a dimensionlessmagnitude. After accomplish<strong>in</strong>g this procedure with all <strong>the</strong> riskfactors, a comparison <strong>of</strong> <strong>the</strong>ir coefficients provides <strong>the</strong> follow<strong>in</strong>garrangement <strong>of</strong> <strong>the</strong> factors <strong>in</strong> a decreas<strong>in</strong>g order <strong>of</strong> importance: age,cholesterol, smok<strong>in</strong>g, blood pressure, abnormal electro-cardiogram.Weight <strong>and</strong> haemoglob<strong>in</strong> <strong>in</strong>fluence less. The somewhat conditionalcharacter <strong>of</strong> such norm<strong>in</strong>g that has only a statistical sense should beunderst<strong>and</strong>able.A question such as <strong>the</strong> follow<strong>in</strong>g naturally comes to m<strong>in</strong>d: If,accord<strong>in</strong>g to Table 4, 7mg % cholesterol is equivalent to <strong>the</strong> samenumber <strong>of</strong> years <strong>of</strong> life as 4.5mm <strong>of</strong> mercury column <strong>of</strong> bloodpressure, <strong>the</strong>n what is easier to decrease, <strong>the</strong> former by 7mg % or <strong>the</strong>latter by 4.5mm <strong>of</strong> mercury column? Or, <strong>the</strong> question is formulatedabout dimensionless magnitudes concern<strong>in</strong>g <strong>the</strong> range <strong>of</strong> <strong>the</strong> scatter,but this is not <strong>the</strong> same. Concern<strong>in</strong>g <strong>the</strong> comparative possibilities <strong>of</strong><strong>in</strong>fluenc<strong>in</strong>g <strong>the</strong> cholesterol <strong>and</strong> blood pressure, we likely enter ascientific cul-de-sac: <strong>in</strong> all probability, those are quantitatively<strong>in</strong>comparable. In general, <strong>the</strong> dom<strong>in</strong>ant present style <strong>of</strong> reason<strong>in</strong>g111


when formulat<strong>in</strong>g problems about optimal solutions concern<strong>in</strong>geveryth<strong>in</strong>g happen<strong>in</strong>g <strong>in</strong> life usually very soon leads to a cul-de-sac.In addition, <strong>the</strong> cited <strong>in</strong>vestigation concerned a totality <strong>of</strong> men thatwas never attempted to be <strong>in</strong>fluenced by pharmacological means. Andit is absolutely unclear whe<strong>the</strong>r its quantitative characterization willpersist had it been o<strong>the</strong>rwise. Most likely, not, <strong>and</strong> that anycomparison <strong>of</strong> 7mm % cholesterol with 4.5mm mercury column <strong>of</strong>systolic blood pressure becomes senseless.Thus, we can not at all attribute practical significance to <strong>the</strong> cited<strong>in</strong>vestigation <strong>of</strong> be<strong>in</strong>g able to <strong>in</strong>dicate a desired way <strong>of</strong> <strong>in</strong>fluenc<strong>in</strong>g riskfactors for prevent<strong>in</strong>g <strong>the</strong> IHD. In my op<strong>in</strong>ion, that <strong>in</strong>vestigation hasno purely applied medical significance at all. However, <strong>the</strong>re exists acerta<strong>in</strong> set <strong>of</strong> studies for which its role should be essential. I bear <strong>in</strong>m<strong>in</strong>d <strong>the</strong> exam<strong>in</strong>ation <strong>of</strong> various medic<strong>in</strong>al preparations. Statistical<strong>in</strong>vestigation is <strong>the</strong> only way to obta<strong>in</strong> trustworthy results about <strong>the</strong>irefficacy.In <strong>the</strong> most simple case two totalities <strong>of</strong> men are formed by r<strong>and</strong>omselection, one <strong>of</strong> <strong>the</strong>m (<strong>the</strong> experimental group) is treated by apreparation, <strong>the</strong> o<strong>the</strong>r one (<strong>the</strong> control group) gets placebo, a harmlesssubstance similar <strong>in</strong> appearance. Results are compared, <strong>and</strong> nei<strong>the</strong>r is itforbidden to compare <strong>the</strong>m with general statistical <strong>in</strong>formation about<strong>the</strong> mean rate <strong>of</strong> <strong>the</strong> IHD <strong>in</strong> a city, country, etc. However, <strong>the</strong> so-calledplacebo effect is regrettably <strong>of</strong>ten revealed <strong>in</strong> modern cardiology.It consists <strong>in</strong> that <strong>the</strong> results <strong>of</strong> both groups practically co<strong>in</strong>cidewhereas <strong>the</strong> mean results for a larger totality are much worse 20 . Thiscan be ra<strong>the</strong>r likely expla<strong>in</strong>ed: apart from medic<strong>in</strong>es, <strong>the</strong> physicianapplies o<strong>the</strong>r means for help<strong>in</strong>g patients like advis<strong>in</strong>g him/her about arational way <strong>of</strong> life. Dur<strong>in</strong>g modern experiments, even <strong>the</strong> doctor doesnot know to which group does a given man belong. As a result, aga<strong>in</strong>st<strong>the</strong> background <strong>of</strong> general treatment by a skilled physician, <strong>the</strong> effect<strong>of</strong> chemo<strong>the</strong>rapy, if it exists, is absolutely imperceptible. On <strong>the</strong> o<strong>the</strong>rh<strong>and</strong>, <strong>the</strong> overwhelm<strong>in</strong>g majority <strong>of</strong> <strong>the</strong> population ei<strong>the</strong>r does notvisit physicians, or get treated by less skilled specialists, <strong>and</strong> <strong>the</strong>results are much worse 21 .The placebo effect makes it impossible to judge <strong>the</strong> real benefit <strong>of</strong>pharmacological means. It can be supposed that <strong>the</strong> po<strong>in</strong>t is, that <strong>the</strong>frequency <strong>of</strong> <strong>the</strong> occurrence <strong>of</strong> <strong>the</strong> IHD is not so high. Whenconsider<strong>in</strong>g <strong>the</strong> extreme case by suppos<strong>in</strong>g that some sickness rate is1% <strong>and</strong> that some prevent<strong>in</strong>g means lowers it to 0.5%, any reliableestimation <strong>of</strong> <strong>the</strong> efficacy <strong>of</strong> <strong>the</strong> preparation applied should be basedon at least a thous<strong>and</strong> patients (on two thous<strong>and</strong> when count<strong>in</strong>g <strong>the</strong>control group). And <strong>in</strong> that control group <strong>the</strong> number <strong>of</strong> people takenill µ 1 will obey <strong>the</strong> Poisson distribution with parameter 10, <strong>and</strong> <strong>in</strong> <strong>the</strong>experimental group, <strong>the</strong>ir number µ 2 will obey <strong>the</strong> same distributionwith parameter 5. The probability <strong>of</strong> a wrong result (µ 2 ≥ µ 1 ) will be,roughly, Φ( − 5 / 15) = Φ( −1.29) ≈ 0.10, which is not so small 22 .However, had we been able to select for <strong>the</strong> experiment a group <strong>of</strong>people with probability <strong>of</strong> be<strong>in</strong>g taken ill <strong>of</strong> 20%, <strong>the</strong>n for <strong>the</strong> sametw<strong>of</strong>old decrease <strong>of</strong> <strong>the</strong> sickness rate (down to 10% due to <strong>the</strong> action<strong>of</strong> <strong>the</strong> preparation) 100 patients will much better satisfy us.112


The Fram<strong>in</strong>gham <strong>in</strong>vestigation <strong>in</strong>deed <strong>in</strong>dicates that <strong>in</strong> pr<strong>in</strong>ciplechoos<strong>in</strong>g people with a many times greater probability <strong>of</strong> be<strong>in</strong>g takenill is possible by issu<strong>in</strong>g from a simple medical exam<strong>in</strong>ation. Note thatwe supposed that <strong>the</strong> relative efficacy <strong>of</strong> a preparation is <strong>the</strong> same forall <strong>the</strong> risk groups. If this premise seems unfounded, it was perhapsunreasonable to restrict <strong>the</strong> experiment by <strong>the</strong> group <strong>of</strong> highest risk.However, afterwards it is extremely necessary <strong>in</strong> any case to separate<strong>the</strong> exam<strong>in</strong>ed patients <strong>in</strong>to risk groups for check<strong>in</strong>g whe<strong>the</strong>r anyth<strong>in</strong>gis (more <strong>in</strong>structive than <strong>the</strong> placebo effect) will be found when groups<strong>of</strong> roughly <strong>the</strong> same risk are compared with each o<strong>the</strong>r. In o<strong>the</strong>r words,<strong>the</strong> results <strong>of</strong> <strong>the</strong> From<strong>in</strong>gham <strong>in</strong>vestigation should become aconstituent part <strong>of</strong> treat<strong>in</strong>g <strong>the</strong> results <strong>of</strong> exam<strong>in</strong><strong>in</strong>g (each or at leastmany) medical preparations.The stated above method naturally applies not only to medicalpreparations but also to many technological means <strong>in</strong>tended toheighten <strong>the</strong> reliability <strong>of</strong> <strong>the</strong> work <strong>of</strong> <strong>the</strong> mach<strong>in</strong>ery.Because <strong>the</strong> Fram<strong>in</strong>gham experiment is so methodically important,<strong>the</strong> problem <strong>of</strong> its results be<strong>in</strong>g justified is raised. It seems that thatstudy provides an example <strong>of</strong> obta<strong>in</strong><strong>in</strong>g important results as well as <strong>of</strong>a comprehensive discussion <strong>of</strong> possible doubts <strong>and</strong> objections by <strong>the</strong>authors <strong>the</strong>mselves.The authors published not only positive, but also some negativeresults. Thus, all <strong>the</strong> numerical estimates were based on 12 years <strong>of</strong>observ<strong>in</strong>g a certa<strong>in</strong> population, but are <strong>the</strong>y applicable to o<strong>the</strong>rpopulations? Or, to ask quite sharply, is not <strong>the</strong> observed arrangement<strong>in</strong>to groups <strong>of</strong> different risks just an artefact connected with select<strong>in</strong>g ara<strong>the</strong>r large number <strong>of</strong> parameters? It is <strong>in</strong>deed known that anarrangement <strong>of</strong> an already collected material accord<strong>in</strong>g to a largenumber <strong>of</strong> <strong>in</strong>dications can be done not badly, but that <strong>the</strong> obta<strong>in</strong>edformulas quit be<strong>in</strong>g useful when new observations are added.The authors attempted to apply <strong>the</strong> obta<strong>in</strong>ed expression for ˆp toisolate those totalities <strong>in</strong> which new cases <strong>of</strong> <strong>the</strong> IHD must occur (after<strong>the</strong> 12 years <strong>of</strong> observation). This experiment was quite successful formen aged 30 – 39 years <strong>and</strong> women aged 30 – 49 years: 8 new casesfor men from a high risk group <strong>and</strong> 10 for women; 2 <strong>and</strong> 5 for thosefrom groups <strong>of</strong> low risk. However, for o<strong>the</strong>r ages <strong>the</strong> experimentproved a real failure. The possible explanation is that for groups moreadvanced <strong>in</strong> age such a simple medical exam<strong>in</strong>ation made more than12 years ago was not <strong>in</strong>dicative at all.As to <strong>the</strong> applicability <strong>of</strong> <strong>the</strong> concrete numerical values <strong>of</strong> <strong>the</strong>coefficients <strong>of</strong> <strong>the</strong> function (2.9), this question can only be solvedexperimentally. I do not know whe<strong>the</strong>r that was accomplished. Usualestimates based on <strong>the</strong> model <strong>of</strong> two normal totalities do not admit <strong>the</strong>possibility <strong>of</strong> an artefact.Thus, when analyz<strong>in</strong>g one-dimensional samples (§ 2.1), it waspossible to be directly conv<strong>in</strong>ced <strong>in</strong> <strong>the</strong> correctness <strong>of</strong> <strong>the</strong> resultssimply by look<strong>in</strong>g at <strong>the</strong> data represented <strong>in</strong> a form convenient forunderst<strong>and</strong><strong>in</strong>g. For a multivariate analysis such representation isimpossible which seriously hampers statistical studies <strong>in</strong> manydimensionalspaces.113


2.3. Periodogram for damp<strong>in</strong>g fluctuations. The publication <strong>of</strong><strong>the</strong> two previous booklets afforded me <strong>the</strong> pleasure <strong>of</strong> a veryremarkable acqua<strong>in</strong>tance with Tim<strong>of</strong>eev, pr<strong>of</strong>essor at <strong>the</strong> generallyknown Len<strong>in</strong>grad Electro-Technical Institute. Vladimir Andreevichregrettably died 5 April 1975, but he left a few books (1960; 1973;1975) describ<strong>in</strong>g <strong>the</strong> now little known world <strong>of</strong> practically effectivema<strong>the</strong>matical methods. In connection with <strong>the</strong> wide development <strong>of</strong>computer ma<strong>the</strong>matics, which extremely broadened <strong>the</strong> scope <strong>of</strong>practically possible calculations, <strong>the</strong> attention to simple, <strong>in</strong> particulargraphical methods <strong>of</strong> calculation weakened. For example, I have onlycome to know what is a Lille orthogon 23 from Tim<strong>of</strong>eev’s books (it isa graphical procedure for calculat<strong>in</strong>g <strong>the</strong> values <strong>of</strong> a polynomial alsoapplicable for deriv<strong>in</strong>g <strong>the</strong> root [?]).Of course, a computer can accomplish this <strong>in</strong>comparably faster, but<strong>in</strong> those cases <strong>in</strong> which <strong>the</strong> polynomial appears <strong>in</strong> a technical problem(<strong>and</strong> it is possible to <strong>in</strong>fluence its coefficients for select<strong>in</strong>g somesuitable version) a graphical solution is preferable. At <strong>the</strong> same timemany such methods are non-trivial <strong>in</strong>ventions (similar to <strong>the</strong> <strong>in</strong>vention<strong>of</strong> mach<strong>in</strong>es or mechanisms) almost impossible to hit upon by oneself.These <strong>in</strong>ventions were be<strong>in</strong>g made over centuries, but now much less<strong>in</strong>terest is regrettably shown for <strong>the</strong>m.Here also <strong>the</strong> metaphor compar<strong>in</strong>g <strong>the</strong> progress <strong>of</strong> science with <strong>the</strong>development <strong>of</strong> a new territory (§ 1.3) comparatively accuratelydescribes <strong>the</strong> picture: <strong>the</strong> dem<strong>and</strong> for certa<strong>in</strong> products fell, <strong>and</strong> <strong>the</strong>settlements exist<strong>in</strong>g on <strong>the</strong>ir manufactur<strong>in</strong>g are ab<strong>and</strong>oned. In science,like <strong>in</strong> o<strong>the</strong>r no less serious th<strong>in</strong>gs, much depends on whims <strong>of</strong>fashion.It is <strong>in</strong>terest<strong>in</strong>g to describe Tim<strong>of</strong>eev’s op<strong>in</strong>ion about <strong>the</strong> speculative(as I named it here) criticism <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability. It <strong>in</strong>variablystates that we are unable to prove logically that <strong>the</strong> premises <strong>of</strong> <strong>the</strong><strong>the</strong>ory are feasible. Tim<strong>of</strong>eev noted that <strong>in</strong> essence such reason<strong>in</strong>galways has <strong>the</strong> form <strong>of</strong> reduction ad absurdum, but that that method,widely used <strong>in</strong> ma<strong>the</strong>matics, is not ma<strong>the</strong>matical but judicial, wasobligatory <strong>in</strong> courts <strong>of</strong> ancient Greece exactly at <strong>the</strong> time whengeometry had been formed. Perhaps it came to ma<strong>the</strong>matics fromplead<strong>in</strong>gs.And now <strong>the</strong> periodogram. Periodic dependences should be isolated<strong>in</strong> a series <strong>of</strong> observations x 0 , x 1 , ..., x n (or, for <strong>the</strong> cont<strong>in</strong>uous case, <strong>in</strong>x(t), 0≤ t ≤ T). To achieve this aim, some method <strong>of</strong> compar<strong>in</strong>g <strong>the</strong>observations with an ideally periodic functione iωt = cosωt + is<strong>in</strong>ωtor with some o<strong>the</strong>r periodic function is applied. Most complicated is<strong>the</strong> determ<strong>in</strong>ation <strong>of</strong> a latent period (or a few periods) <strong>in</strong> ourobservations.In ma<strong>the</strong>matical statistics <strong>the</strong>re is a pert<strong>in</strong>ent method consist<strong>in</strong>g <strong>of</strong>calculat<strong>in</strong>g <strong>the</strong> expressionTIT(ω) = | ∫ e x( t) dt |0−iωt2114


or, for discrete observations,IT(ω) = | ∑ e x |nt=0−iωt2tat many values <strong>of</strong> ω <strong>and</strong> determ<strong>in</strong><strong>in</strong>g one or a few maxima <strong>of</strong> I T (ω).Actually, however, as is possible to f<strong>in</strong>d <strong>in</strong> Tim<strong>of</strong>eev’s book 24 , <strong>the</strong>reexist a few o<strong>the</strong>r expressions differ<strong>in</strong>g from I T (ω) by <strong>the</strong> limits <strong>of</strong><strong>in</strong>tegration (summation) <strong>and</strong> also by tak<strong>in</strong>g <strong>in</strong>to account not only <strong>the</strong>modulus, but <strong>the</strong> argument <strong>of</strong> <strong>the</strong> complex magnitudes as well.Various graphs are thus obta<strong>in</strong>ed whose behaviour at different ωallows to localize <strong>the</strong> possible periods conta<strong>in</strong>ed <strong>in</strong> <strong>the</strong> observations.Strictly speak<strong>in</strong>g, Tim<strong>of</strong>eev’s considerations are not stochastic s<strong>in</strong>ce astochastic approach dem<strong>and</strong>s to apply <strong>the</strong> notion <strong>of</strong> an ensemble <strong>of</strong>imag<strong>in</strong>ed realizations <strong>of</strong> <strong>the</strong> observations (<strong>of</strong> which we see only one)<strong>and</strong> to make estimations based on <strong>the</strong>se notions <strong>of</strong> <strong>the</strong> statisticalsignificance <strong>of</strong> <strong>the</strong> isolated periods. This is possible if certa<strong>in</strong>assumptions about <strong>the</strong> applied model describ<strong>in</strong>g <strong>the</strong> observations aremade.For example, it is very convenient if <strong>the</strong> model, <strong>in</strong> <strong>the</strong> discrete case,is <strong>of</strong> <strong>the</strong> k<strong>in</strong>dnx = ∑ a s<strong>in</strong>(ω t + φ ) + ξ(2.10)t k k k tk = 1where ξ 0 , ξ 1 , ..., ξ n are <strong>in</strong>dependent identically distributed r<strong>and</strong>omvariables. In <strong>the</strong> cont<strong>in</strong>uous case such a model with <strong>in</strong>dependence <strong>of</strong><strong>the</strong> values <strong>of</strong> noise ξ(t) at any no matter how close values <strong>of</strong> t (whitenoise) is less realistic. It is possible to provide a number <strong>of</strong> physicalexamples where model (2.10) is realistic.The first example to come across I can mention, can be provided by<strong>the</strong> observations <strong>of</strong> <strong>the</strong> brightness <strong>of</strong> a variable star if measured not to<strong>of</strong>requently. The reasonableness <strong>of</strong> <strong>the</strong> use <strong>of</strong> <strong>the</strong> periodogram method<strong>and</strong> <strong>the</strong> possibility <strong>of</strong> certa<strong>in</strong> estimates <strong>of</strong> significance <strong>in</strong> such cases isdoubtless. However, <strong>in</strong> more complicated cases, <strong>in</strong> which we areunable to discuss a stochastic model <strong>of</strong> noise corrupt<strong>in</strong>g <strong>the</strong> process,stochastic statistical considerations with estimation <strong>of</strong> significance areimpossible.Bas<strong>in</strong>g myself chiefly on <strong>the</strong> application <strong>of</strong> <strong>the</strong> periodogram methodto series <strong>in</strong> economics, I [ii] formulated a number <strong>of</strong> scepticalcomments on <strong>the</strong> actually achieved success. It were <strong>the</strong>se remarks thatprompted <strong>the</strong> only scientifically doubtless objection mentioned above<strong>in</strong> <strong>the</strong> Introduction, <strong>and</strong> it came from Tim<strong>of</strong>eev. He <strong>in</strong>dicated a very<strong>in</strong>terest<strong>in</strong>g unexpected example <strong>of</strong> a practical application <strong>of</strong>periodograms. To repeat, <strong>in</strong> this case <strong>the</strong> study has no stochasticessence (isolation <strong>of</strong> some undeniable peaks on <strong>the</strong> periodogramwhose significance is not needed to estimate). [The author describesthat successful <strong>and</strong> important <strong>in</strong>dustrial case.]115


Conclusion. Some Problems<strong>of</strong> <strong>the</strong> Current Development <strong>of</strong> <strong>the</strong> Theory <strong>of</strong> <strong>Probability</strong>The examples provided <strong>in</strong> Chapter 2 were aimed at illustrat<strong>in</strong>g <strong>the</strong>idea that <strong>the</strong> problem concern<strong>in</strong>g <strong>the</strong> boundaries <strong>of</strong> applicability <strong>of</strong> <strong>the</strong><strong>the</strong>ory <strong>of</strong> probability can not be solved speculatively, by logicaljustification (or by justify<strong>in</strong>g <strong>the</strong> opposite). Nei<strong>the</strong>r does a s<strong>in</strong>glepractical success scientifically assure us <strong>in</strong> <strong>the</strong> correctness <strong>of</strong> a<strong>the</strong>oretical concept. [...]Only prolonged studies last<strong>in</strong>g many years (almost 20 years <strong>in</strong> <strong>the</strong>Fram<strong>in</strong>gham <strong>in</strong>vestigation (§ 2.2)) <strong>and</strong> even carried out by manygenerations <strong>of</strong> scientists (like <strong>the</strong> study <strong>of</strong> problems <strong>of</strong> heredityorig<strong>in</strong>ated by Mendel) provide a reliable result. In a purely methodicalsense such studies ensure complete possibility <strong>of</strong> experimental checks<strong>of</strong> many stochastic assumptions. In particular, checks <strong>of</strong> statisticalhomogeneity (for example, by non-parametric criteria fordist<strong>in</strong>guish<strong>in</strong>g two empirical distribution functions), <strong>of</strong> confidence<strong>in</strong>tervals (recall my rejection <strong>of</strong> that <strong>in</strong>terval for <strong>the</strong> mass <strong>of</strong> Jupiter <strong>in</strong>§ 1.1) <strong>and</strong> <strong>of</strong> much more.And so, it is wrong that no experimental checks are threaten<strong>in</strong>gthose premises (Alimov’s objection). However, if simply collect<strong>in</strong>g <strong>the</strong>(statistical or not) ensemble <strong>of</strong> all <strong>the</strong> <strong>in</strong>stances <strong>in</strong> which stochasticmethods are applied, <strong>and</strong> f<strong>in</strong>d out <strong>in</strong> how many cases Alimov <strong>and</strong> Iwere <strong>in</strong> <strong>the</strong> right, <strong>the</strong>n, as I fear, he would have collected anoverwhelm<strong>in</strong>g majority <strong>of</strong> votes. I would have to take cover beh<strong>in</strong>d <strong>the</strong>argument that <strong>in</strong> science a numerical majority <strong>of</strong> votes might meannoth<strong>in</strong>g.All <strong>the</strong> circumstances concern one aspect <strong>of</strong> <strong>the</strong> problem, <strong>of</strong> what<strong>and</strong> under which conditions can <strong>the</strong>ory give to practice. Let us try toth<strong>in</strong>k what, on <strong>the</strong> contrary, can practice give to <strong>the</strong>ory. Forma<strong>the</strong>matics, this is a venerable question <strong>and</strong> most extremely pert<strong>in</strong>entop<strong>in</strong>ions had been voiced. I beg<strong>in</strong> by quot<strong>in</strong>g <strong>the</strong> op<strong>in</strong>ion <strong>of</strong> <strong>the</strong>celebrated French ma<strong>the</strong>matician Dieudonne (1966, p. 11; translatednow from Russian):In conclud<strong>in</strong>g, I would wish to stress how little does <strong>the</strong> most recenthistory exonerate <strong>the</strong> pious banalities <strong>of</strong> <strong>the</strong> soothsayers <strong>of</strong> a break-upwho are regularly warn<strong>in</strong>g us about <strong>the</strong> pernicious consequences thatma<strong>the</strong>matics will unavoidably attract to itself by ab<strong>and</strong>on<strong>in</strong>gapplications to o<strong>the</strong>r sciences. I do not wish to say that a close contactwith o<strong>the</strong>r fields such as <strong>the</strong>oretical physics is not beneficial for bothsides. It is absolutely clear, however, that among all <strong>the</strong> astonish<strong>in</strong>gachievements described, not a s<strong>in</strong>gle one, possibly except<strong>in</strong>g <strong>the</strong><strong>the</strong>ory <strong>of</strong> distributions, is at all suitable for be<strong>in</strong>g applied <strong>in</strong> physics.Even <strong>in</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> partial differential equations <strong>the</strong> emphasis is nowmuch more on <strong>the</strong> <strong>in</strong>ternal <strong>and</strong> structural problems than on thosehav<strong>in</strong>g a direct physical significance.Even if ma<strong>the</strong>matics be cut <strong>of</strong>f forcibly from all <strong>the</strong> o<strong>the</strong>r streams <strong>of</strong>human activity, it will still have food enough for centuries <strong>of</strong> thoughtabout great problems which we still ought to solve <strong>in</strong> our own science.116


What objections can be made? First, s<strong>in</strong>ce <strong>the</strong> problem concerns an<strong>in</strong>terval <strong>of</strong> a few centuries <strong>of</strong> time, it will be advisable to turn tohistory <strong>and</strong> look whe<strong>the</strong>r <strong>the</strong>re are examples <strong>of</strong> what is happen<strong>in</strong>g withsome fields <strong>of</strong> <strong>in</strong>tellectual activity <strong>the</strong> <strong>in</strong>terest <strong>in</strong> which is preserved aslong as that. An example <strong>of</strong> such an activity is <strong>the</strong> scholasticism <strong>of</strong> <strong>the</strong>Middle Ages.Scholastics were clever <strong>and</strong> diligent. In any case, <strong>the</strong> volume <strong>of</strong><strong>the</strong>ir contributions was <strong>of</strong> <strong>the</strong> same order as, say, those <strong>of</strong> Laplace (<strong>the</strong>amount <strong>of</strong> paper that a man can cover with writ<strong>in</strong>g dur<strong>in</strong>g his lifetimelikely little depends on <strong>the</strong> contents <strong>of</strong> <strong>the</strong> written). Universities <strong>and</strong>academies had been <strong>in</strong>itially created for scholastics because <strong>of</strong> <strong>the</strong>importance <strong>of</strong> <strong>the</strong> moral <strong>and</strong> ethical applications <strong>of</strong> <strong>the</strong>ir work, actualor imag<strong>in</strong>ed. No one had expelled <strong>the</strong>m from those <strong>in</strong>stitutions with ared-hot broom 25 , but it somehow happened all by itself that scientists,physicists, ma<strong>the</strong>maticians, chemists etc., had occupied <strong>the</strong>ir places.Why did that occur?I believe that <strong>the</strong> reason was that scholasticism had graduallywithdrawn <strong>in</strong>to its own bus<strong>in</strong>ess <strong>and</strong> quit to provide society solutions<strong>of</strong> moral problems essential for everyday life. For example, now, as <strong>in</strong><strong>the</strong> Middle Ages, each solves for himself whe<strong>the</strong>r to marry or not.Scholastics naturally discussed that problem but <strong>the</strong>ir answers becamelong summaries <strong>of</strong> <strong>the</strong> diverse op<strong>in</strong>ions <strong>of</strong> fa<strong>the</strong>rs <strong>of</strong> <strong>the</strong> church <strong>and</strong>ancient philosophers 26 . What <strong>the</strong>n should have done a practicallywork<strong>in</strong>g clergyman when asked by his parishioner?Such questions gradually occurred unbecom<strong>in</strong>g to serious science<strong>and</strong> <strong>the</strong>n it somehow happened all by itself that <strong>the</strong> society had begunto consider unbecom<strong>in</strong>g scholasticism as a whole. This examplecompels us to th<strong>in</strong>k carefully what would have happened toma<strong>the</strong>matics had it been for centuries cut <strong>of</strong>f from all <strong>the</strong> streams <strong>of</strong>activities. The action <strong>of</strong> <strong>the</strong> ensu<strong>in</strong>g phenomenon would have beenvery simple: <strong>the</strong> number <strong>of</strong> young men wish<strong>in</strong>g to devote <strong>the</strong>mselvesto ma<strong>the</strong>matics would have gradually decreased s<strong>in</strong>ce those o<strong>the</strong>rstreams <strong>in</strong>deed play <strong>the</strong> most important part <strong>in</strong> attract<strong>in</strong>g <strong>the</strong>ir <strong>in</strong>terest.However, f<strong>in</strong>ally it is possible to admit, <strong>and</strong> Dieudonne’s articleconv<strong>in</strong>ces us, that <strong>the</strong>re exists ma<strong>the</strong>matics <strong>of</strong> different types; one isdirected towards its own <strong>in</strong>terests, <strong>the</strong> o<strong>the</strong>r one, towards applications.Both have a quite lawful right to exist because, for example, <strong>the</strong>Kolmogorov axiomatics <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability, necessary <strong>in</strong> asense for applications as well, had emerged on <strong>the</strong> basis <strong>of</strong> <strong>the</strong> <strong>the</strong>ory<strong>of</strong> functions <strong>of</strong> a real variable (obviously belong<strong>in</strong>g to <strong>the</strong> first type).But <strong>the</strong>n, to which type does <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability belong?The dist<strong>in</strong>guish<strong>in</strong>g feature <strong>of</strong> ma<strong>the</strong>matics <strong>of</strong> <strong>the</strong> first type is itssomewhat special elegance (present<strong>in</strong>g a comparative simplicity whichmakes it possible to perceive that quality). The <strong>the</strong>ory <strong>of</strong> probabilityhas ra<strong>the</strong>r many results <strong>of</strong> exactly that k<strong>in</strong>d, mostly connected with <strong>the</strong>Kolmogorov axiomatics <strong>and</strong> resembl<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory <strong>of</strong> functions <strong>of</strong> areal variable. However, <strong>the</strong> ma<strong>in</strong> contents <strong>of</strong> that science hav<strong>in</strong>g beenformed at <strong>the</strong> time <strong>of</strong> Laplace 27 , develop<strong>in</strong>g after him <strong>and</strong> be<strong>in</strong>gelaborated nowadays is not, alas! beautiful at all. For example, limit<strong>the</strong>orems are usually ra<strong>the</strong>r decently formulated, but as a rule <strong>the</strong>irpro<strong>of</strong>s are helplessly long, difficult <strong>and</strong> entangled. Their sole raison117


d’être consists <strong>in</strong> obta<strong>in</strong><strong>in</strong>g comparatively simple stochasticdistributions possibly describ<strong>in</strong>g some real phenomena 28 .Turn<strong>in</strong>g to reality always refreshes whereas sever<strong>in</strong>g <strong>the</strong>connections with it spells danger <strong>of</strong> a scholastic degeneration.Ma<strong>the</strong>matics is wonderful, but at <strong>the</strong> h<strong>and</strong>s <strong>of</strong> its separaterepresentatives it can degenerate, for example <strong>in</strong>to scholasticism. It isregrettably sufficiently easy to overstep <strong>the</strong> limits beyond whichbeg<strong>in</strong>s scholasticism. Internal problems strongly attract. A man alwayswishes to tidy his own home both because it is his home <strong>and</strong> because itis <strong>the</strong> easiest. So where is that dangerous limit overstepp<strong>in</strong>g which wewill only be floor polish<strong>in</strong>g <strong>in</strong> our own apartment without provid<strong>in</strong>ganyth<strong>in</strong>g for <strong>the</strong> society? That limit is only well seen <strong>in</strong> a historicalperspective, but at each moment it is extremely <strong>in</strong>def<strong>in</strong>ite <strong>and</strong>unsteady.In a strange way statistics can partly help here, this time assum<strong>in</strong>g<strong>the</strong> aspect <strong>of</strong> science <strong>of</strong> science (Nalimov et al 1969). Ra<strong>the</strong>r recently acomparatively simple method was applied. It consisted <strong>of</strong> a formalstudy <strong>of</strong> <strong>the</strong> bibliographies appended to each scientific paper. Thenumber <strong>of</strong> those <strong>in</strong>terested <strong>in</strong> a given circle <strong>of</strong> problems can beroughly estimated by perceiv<strong>in</strong>g which groups <strong>of</strong> authors quote eacho<strong>the</strong>r. Attempts to build up some system <strong>of</strong> adm<strong>in</strong>istrative estimates bybas<strong>in</strong>g yourself on such studies will certa<strong>in</strong>ly cause all <strong>the</strong> authors tocite each o<strong>the</strong>r <strong>in</strong> a purely formal way; <strong>and</strong> it is practically impossibleto dist<strong>in</strong>guish whe<strong>the</strong>r references were essentially needed or <strong>in</strong>cludedas a payback.However, without any adm<strong>in</strong>istrative pressure <strong>the</strong> study <strong>of</strong>references is a valuable <strong>and</strong> more or less objective method <strong>of</strong> science<strong>of</strong> science. And such analysis shows that <strong>in</strong> probability <strong>the</strong>ory onlyvery small (as compared, for example, with physics) groups <strong>of</strong> authorsrefer to each o<strong>the</strong>r. This means that <strong>the</strong> <strong>in</strong>terest has narrowed whichwas largely caused by its unwieldy ma<strong>the</strong>matical mach<strong>in</strong>ery <strong>and</strong> whichis a typical sign <strong>of</strong> a scholastic degeneration.Perhaps <strong>the</strong> simplest way to combat that danger is to turn to physicalapplications. Their seriousness was never doubted by anyone, <strong>and</strong> here<strong>the</strong> <strong>in</strong>terest now concentrates <strong>in</strong> particular around <strong>the</strong> problems <strong>of</strong>statistical physics. Most wide <strong>and</strong> complicated ma<strong>the</strong>matical arsenalable to satisfy various tastes is applied. Physical problems are also<strong>in</strong>terest<strong>in</strong>g <strong>in</strong> that very much can be done by ma<strong>the</strong>matical means.However, much more variable is <strong>the</strong> field <strong>of</strong> so to say purelystatistical applications. In very many important matters a far reach<strong>in</strong>gma<strong>the</strong>matical analysis is impossible, but if a vast statistical materialcan be available, it can compensate to a necessary extent <strong>the</strong> scarcity<strong>of</strong> <strong>the</strong>oretical ideas. In all such cases statistical treatment is one <strong>of</strong> <strong>the</strong>ma<strong>in</strong> means <strong>of</strong> study. Here, I provided examples <strong>of</strong> exactly that k<strong>in</strong>dabsolutely leav<strong>in</strong>g aside <strong>the</strong> doubtless problem <strong>of</strong> physicalapplications.In purely statistical problems <strong>the</strong> ma<strong>in</strong> part is played by somestochastic model <strong>of</strong> <strong>the</strong> phenomenon. In <strong>the</strong> simplest case <strong>the</strong> supposedk<strong>in</strong>d <strong>of</strong> distributions <strong>of</strong> <strong>the</strong> observations (normal, exponential,Weibull, etc) can be understood as <strong>the</strong> model. In more complicatedcases <strong>the</strong> model gets more complicated as well; <strong>the</strong> <strong>the</strong>ories <strong>of</strong>118


eliability <strong>and</strong> queu<strong>in</strong>g are known to apply ra<strong>the</strong>r complicatedanalytical models. A certa<strong>in</strong> disproportion <strong>in</strong> <strong>the</strong> current development<strong>of</strong> probability <strong>the</strong>ory consists <strong>in</strong> that a ra<strong>the</strong>r large number <strong>of</strong><strong>the</strong>oretical models (even analytically studied to a sufficient extent) iscollected, but at <strong>the</strong> same time <strong>in</strong> many cases <strong>the</strong>y were neverpractically compared with reality. Of course, a creation <strong>of</strong> a <strong>the</strong>oreticalmodel marks a necessary <strong>in</strong>itial period without which no suchcomparison, <strong>and</strong> no underst<strong>and</strong><strong>in</strong>g <strong>of</strong> <strong>the</strong> actual data is at all possible.However, too <strong>of</strong>ten a study stops at that period. At <strong>the</strong> same time, eachcomparison with reality usually calls <strong>in</strong>to be<strong>in</strong>g new models, that is,acts refresh<strong>in</strong>g <strong>in</strong> that sense as well.Figures <strong>and</strong> TablesI did not reproduce <strong>the</strong>m. Fig. 1 − 3 <strong>and</strong> Tables 1 <strong>and</strong> 2 concerned<strong>the</strong> papers <strong>of</strong> Ermolaeva (1939) <strong>and</strong> En<strong>in</strong> (1939). Tables 3 <strong>and</strong> 4 from§ 3.2 expla<strong>in</strong>ed <strong>the</strong> Fram<strong>in</strong>gham experiment. Table 3 provided <strong>the</strong>expected <strong>and</strong> actual number <strong>of</strong> taken ill, <strong>in</strong> each expected <strong>in</strong>terval <strong>of</strong>risk, separately for men <strong>and</strong> women. Table 4 showed <strong>the</strong> estimates <strong>of</strong><strong>the</strong> coefficients <strong>of</strong> <strong>the</strong> factors <strong>of</strong> risk.Notes1. See [i, Note 4]. O. S.2. This was a feature <strong>of</strong> Soviet publications (perhaps <strong>of</strong> Russian papers even now).The late Pr<strong>of</strong>essor Truesdell told me that he was unable to read <strong>the</strong>m <strong>in</strong> translation(also because translations are usually quite formal. O. S.3. Strangely enough, no editor is mentioned <strong>in</strong> <strong>the</strong> booklet. O. S.4. A list is 24 pages typescript or 16 pages <strong>of</strong> published text. O. S.5. No wonder Laplace was elected member <strong>of</strong> <strong>the</strong> French Academy (not to beconfused with <strong>the</strong> Paris Academy whose member he also was) devoted to <strong>the</strong> study<strong>of</strong> <strong>the</strong> French language. O. S.6. Some scientists (Chebyshev, Markov) did not have any superstructure. O. S.7. The author referred (not <strong>in</strong> all necessary cases) to <strong>the</strong> text <strong>of</strong> <strong>the</strong> TAP aspublished <strong>in</strong> 1886. Instead, I provided references to its English translation(2005/2009). O. S.8. This is my quotation from Laplace (2005/2009, p. 97) <strong>in</strong>serted <strong>in</strong>stead <strong>of</strong> <strong>the</strong>author’s description. O. S.9. There are mistakes. One <strong>of</strong> <strong>the</strong>m, noticed by Pearson, concerned his model <strong>of</strong>births <strong>and</strong> deaths, see Sheyn<strong>in</strong> (1976, p. 160). Then, he had been keep<strong>in</strong>g to his ownpractically useless <strong>the</strong>ory <strong>of</strong> errors <strong>and</strong> thus caused French authors to shun Gauss(Sheyn<strong>in</strong> 1977, pp. 52 – 54). Laplace’s astonish<strong>in</strong>g mistake (1796/1884, p. 504) wasto state that <strong>the</strong> planets moved along elliptical paths not <strong>in</strong> accordance with Newton’sdiscovery, but because <strong>of</strong> small differences <strong>in</strong> densities <strong>and</strong> <strong>in</strong> temperatures <strong>of</strong> <strong>the</strong>irvarious parts. O. S.10. Concern<strong>in</strong>g <strong>the</strong> precision <strong>of</strong> his estimate, Laplace (2005/2009, pp. 46 – 47)stated: after a century <strong>of</strong> new observations [...] exam<strong>in</strong>ed <strong>in</strong> <strong>the</strong> same way [...]. Seealso Cournot (1843, § 137). O. S.11. Once more, see [i, Note 4]. O. S.12. See Note 10. O. S.13. The classical set-<strong>the</strong>oretic axiomatization is thus called <strong>the</strong> first one.Gnedenko (1969, p. 118), <strong>in</strong> a brief survey <strong>of</strong> <strong>the</strong> history <strong>of</strong> this problem, named onlyone pert<strong>in</strong>ent publication, Lomnicki (1923). O. S.14. Concern<strong>in</strong>g Ville see, for example, Shafer & Vovk (2001, pp. 48 – 50). Theo<strong>the</strong>r reference is Postnikov (1960) O. S.15. I have found this translation <strong>in</strong> Google with a reference to a commentator <strong>of</strong>Planck. O. S.16. Surely Tolstoi knew about such most actively work<strong>in</strong>g scholars as for exampleMendeleev or Chebyshev. O. S.119


17. The author had underst<strong>and</strong>ably chosen an ideologically safe cause. In 1959Kolmogorov (Sheyn<strong>in</strong> 1998, p. 542) was much more specific. It was necessary, hestated, to express <strong>the</strong> desired optimal state <strong>of</strong> affairs <strong>in</strong> <strong>the</strong> national economy by as<strong>in</strong>gle <strong>in</strong>dicator. Read: to ab<strong>and</strong>on <strong>the</strong> Marxian socially necessary labour as<strong>in</strong>dicator <strong>of</strong> cost <strong>and</strong> measure cost <strong>in</strong> monetary units. O. S.18. This is an underst<strong>and</strong>ably mild expression. Actually, genetics was uprooted asdecided beforeh<strong>and</strong> by <strong>the</strong> party’s leadership, many scientists severely persecuted(Vavilov, <strong>the</strong> world renown scholar, died <strong>in</strong> prison) <strong>and</strong> even Kolmogorov’s paper(1940),see below, was considered dangerous. In 1950, Gnedenko (Sheyn<strong>in</strong> 1998, p.545) mildly criticized it (undoubtedly after discuss<strong>in</strong>g <strong>the</strong> matter with him). In 1948,Fisher most strongly condemned Lyssenko (Ibidem, p, 544).For <strong>the</strong> sake <strong>of</strong> comprehensiveness I add references to Lyssenko <strong>and</strong> Kolman, ahigh rank<strong>in</strong>g party apparatchik who at <strong>the</strong> end <strong>of</strong> his life did not return from a visit tohis sister <strong>in</strong> Sweden <strong>and</strong> <strong>the</strong>n published a book with a telltale title. O. S.19. This is hardly underst<strong>and</strong>able. O. S.20. What does this mean actually? O. S.21. Is this really connected with <strong>the</strong> placebo effect? O. S.22. No explanation provided. Notation Ф usually meant <strong>the</strong> distribution function<strong>of</strong> <strong>the</strong> normal law. O. S.23. This is my attempt <strong>of</strong> translat<strong>in</strong>g that expression from Russian. I did not f<strong>in</strong>d it<strong>in</strong> any o<strong>the</strong>r language. O. S.24. The author provided a wrong reference. O. S.25. This was a beloved expression <strong>of</strong> <strong>the</strong> Soviet press applied <strong>in</strong> appropriate cases.O. S.26. The <strong>in</strong>itial aim <strong>of</strong> scholasticism was <strong>the</strong> study <strong>of</strong> Aristotelian philosophy butsoon it turned to unit<strong>in</strong>g philosophy <strong>and</strong> <strong>the</strong>ology. Accord<strong>in</strong>gly, <strong>the</strong> first universitiesconsisted <strong>of</strong> three faculties devoted to <strong>the</strong>ology, canon law <strong>and</strong> medic<strong>in</strong>e so thatscholasticism had <strong>in</strong>deed been avidly taught <strong>the</strong>re. It was gradually excluded by <strong>the</strong>develop<strong>in</strong>g natural science although its structure proved useful for logic. One <strong>of</strong> itsteach<strong>in</strong>g was <strong>the</strong> so-called probabilism, see [i, Note 3].Rabelais, <strong>in</strong> his immortal Gargantua <strong>and</strong> Pantagruel, had left a vivid picture <strong>of</strong><strong>the</strong> benefits <strong>of</strong> ga<strong>in</strong><strong>in</strong>g useful knowledge (ra<strong>the</strong>r than repeat<strong>in</strong>g Aristotle or ThomasAqu<strong>in</strong>as). There also <strong>the</strong> problem <strong>of</strong> a possible marriage is shown to depend oncircumstances. O. S.27. Modern probability appeared <strong>in</strong> <strong>the</strong> 1930s when such notions as density hadbegun to be considered as ma<strong>the</strong>matical entities. O. S.28. See [iv, Note 2]. O. S.BibliographyAlimov Ju. I. (1974 Ukra<strong>in</strong>ian), On <strong>the</strong> application <strong>of</strong> methods <strong>of</strong> ma<strong>the</strong>maticalstatistics to treat<strong>in</strong>g experimental data. Avtomatika, No. 2, pp. 21 – 33.Bolshev L. N., Smirnov S. V. (1967 Russian), Tablitsy Matematicheskoi Statistiki(Tables <strong>of</strong> Ma<strong>the</strong>matical <strong>Statistics</strong>). Moscow, 1968.Borovkov A. A. (1972 Russian), Wahrsche<strong>in</strong>lichkeitsrechnung. E<strong>in</strong>e E<strong>in</strong>führung.Berl<strong>in</strong>, 1976.Cournot A. A. (1843), Exposition de la théorie des chances et des probabilités.Paris, 1984.Dieudonne J. (1966), Sovremennoe Razvitie Matematiki (Current Development <strong>of</strong>Ma<strong>the</strong>matics). Coll. Translations. Place <strong>of</strong> publ. not provided. Orig<strong>in</strong>al contributionnot named.En<strong>in</strong> T. K. (1939 Russian), The results <strong>of</strong> an analysis <strong>of</strong> <strong>the</strong> assortment <strong>of</strong> hybrids<strong>of</strong> tomatoes. Doklady Akademii Nauk SSSR, vol. 24, No. 2, pp. 176 – 178. Alsopublished at about <strong>the</strong> same time <strong>in</strong> a foreign language <strong>in</strong> C. r. (Doklady) Acad. Sci.URSS.Ermolaeva N. I. (1939 Russian), Once more about <strong>the</strong> “pea laws”. Jarovizatsia,No. 2, pp. 79 – 86. Note <strong>the</strong> scornful term for <strong>the</strong> Mendelian laws.Gnedenko B. V. (1969 Russian), On Hilbert’s sixth problem. In ProblemyGilberta. Moscow, pp. 116 – 120. German translation <strong>of</strong> book: Die HilbertschenProbleme. Leipzig, 1971 (Ostwald Klassiker No. 252), see pp. 145 – 150.Gulyga A. V. (1975 Russian), Can science be immoral? Priroda, No. 12, pp. 45 –49.120


Kolman E. (1939 Russian), Perversion <strong>of</strong> ma<strong>the</strong>matics at <strong>the</strong> service <strong>of</strong>Mendelism. Jarovizatsia, No. 3, pp. 70 – 73.--- (1940), Is it possible to prove or disprove Mendelism by ma<strong>the</strong>matical <strong>and</strong>statistical methods? C. r. (Doklady) Acad, Sci. l’URSS, vol. 28, pp. 834 – 838.--- (1982), We Should Not Have Lived That Way. New York. In Russian with anadditional English title.Kolmogorov A. N. (1933 German), Osnovnye Poniatia Teorii Veroiatnostei(Ma<strong>in</strong> Concepts <strong>of</strong> <strong>the</strong> Theory <strong>of</strong> <strong>Probability</strong>). Moscow, 1974.--- (1940), On a new confirmation <strong>of</strong> Mendel’s laws. C. r. (Doklady) Acad. Sci.l’URSS, vol. 28. No. 9, pp. 834 – 838.Laplace P.-S. (1796), Exposition du système du monde. Oeuvr. Compl., t. 6. Paris,1884. Repr<strong>in</strong>t <strong>of</strong> edition <strong>of</strong> 1835.--- (1812), Théorie analytique des probabilités. Oeuvr. Compl., t. 7. Paris, 1886.--- (1814 French), Philosophical Essay on Probabilities. New York, 1995.Lomnicki A. (1923), Nouveaux fondements du calcul des probabilités. Fondam.Math., Bd. 4, pp. 34 – 71.Lyssenko T. D. (1940), In response to an article by A. N. Kolmogorov. C. r.(Doklady) Acad. Sci. l’URSS, vol. 28, pp. 832 – 833.Mises R. (1928), Wahrsche<strong>in</strong>lichkeit, Statistik und Wahrheit. Wien.Nalimov V. V., Mulchenko Z. M. (1969), Naukometria etc. (Science <strong>of</strong> Science.Study <strong>of</strong> <strong>the</strong> Development <strong>of</strong> Science As an Informational Process). Moscow.Planck M. (1960), Ed<strong>in</strong>stvo Fizicheskoi Kart<strong>in</strong>y Mira (Unity <strong>of</strong> <strong>the</strong> PhysicalPicture <strong>of</strong> <strong>the</strong> World). Coll Translations. Moscow. Orig<strong>in</strong>al contribution not named.Postnikov A. G. (1960), Arifmeticheskoe Modelirovanie Slucha<strong>in</strong>ykh Prozessov(Arithmetical Modell<strong>in</strong>g <strong>of</strong> Stochastic Processes). Moscow.Shafer G., Vovk V. (2001), <strong>Probability</strong> <strong>and</strong> F<strong>in</strong>ance. It’s Only a Game. NewYork.Sheyn<strong>in</strong> O. (1976), Laplace’s work on probability. Arch. Hist. Ex. Sci., vol. 16,pp. 137 – 187.--- (1977), Laplace’s <strong>the</strong>ory <strong>of</strong> errors. Ibidem, vol. 17, pp. 1 – 61.--- (1998), <strong>Statistics</strong> <strong>in</strong> <strong>the</strong> Soviet epoch. Jahrbücher f. Nationalökonomie u.Statistik, Bd. 217, pp. 529 – 549.Tim<strong>of</strong>eev V. A. (1960), Teoria i Praktika Analiza Resultatov Nabliudenii etc(Theory <strong>and</strong> Practice <strong>of</strong> Analys<strong>in</strong>g <strong>the</strong> Results <strong>of</strong> Observation <strong>of</strong> Technical Objectsetc). Trudy Len<strong>in</strong>gradsky Electro-Techn. Inst.--- (1973), Matematicheskie Osnovy Tekhnitcheskoi Kibernetiki (Ma<strong>the</strong>maticalElements <strong>of</strong> Technical Cybernetics). Lecture notes. Pensa.--- (1975), Inzenernye Metody Rashcheta i Issledovanie D<strong>in</strong>amicheskih System.(Eng<strong>in</strong>eer<strong>in</strong>g Methods <strong>of</strong> Calculation <strong>and</strong> Study <strong>of</strong> Dynamic Systems). Len<strong>in</strong>grad.Tolstoi L. N. (1910), Put Zizni (The course <strong>of</strong> life). Poln. Sobr. Soch. (CompleteWorks), vol. 45.Truett J., Cornfield J., Kannel W. (1967), A multivariate analysis <strong>of</strong> <strong>the</strong> risk <strong>of</strong>coronary heart disease <strong>in</strong> Fram<strong>in</strong>gham. J. Chron. Diseases, vol. 20, pp. 511 – 524.Tutubal<strong>in</strong> V. N. (1972 Russian), Teoria Veroiatnostei (Theory <strong>of</strong> <strong>Probability</strong>).Moscow.121


IVYu. I. AlimovAn Alternative to <strong>the</strong> Method <strong>of</strong> Ma<strong>the</strong>matical <strong>Statistics</strong>Alternativa Methodu Matematicheskoi Statistiki. Moscow, 1980IntroductionBoth ma<strong>the</strong>maticians <strong>and</strong> those who have been apply<strong>in</strong>gma<strong>the</strong>matics are <strong>of</strong>ten recently express<strong>in</strong>g <strong>the</strong>ir concern that <strong>in</strong> many<strong>in</strong>stances ma<strong>the</strong>matical models noticeably alienate from reality. As aconsequence, <strong>the</strong> work <strong>of</strong> highly qualified specialists <strong>and</strong> valuablecomputer time is used with <strong>in</strong>sufficient effect. Criticism, occasionallyvery sharp, <strong>of</strong> this situation is seen ever <strong>of</strong>tener <strong>in</strong> papers <strong>and</strong>monographs for specialists <strong>and</strong> <strong>in</strong> textbooks <strong>and</strong> popular scientificeditions, see for example Blekhman et al (1976); Grekova (1976);Venikov (1978); Vysotsky (1979). It is <strong>in</strong>dicative that a paper <strong>of</strong> D.Schwarz called On <strong>the</strong> pernicious <strong>in</strong>fluence <strong>of</strong> ma<strong>the</strong>matics on scienceis didactically quoted <strong>in</strong> Venikov (1978).In particular, models <strong>of</strong>fered by ma<strong>the</strong>matical statistics are <strong>of</strong>tenremote from reality. Tutubal<strong>in</strong>’s booklets [i − iii] are devoted to <strong>the</strong>conditions <strong>and</strong> boundaries <strong>of</strong> <strong>the</strong> applicability <strong>of</strong> stochastic methods,<strong>and</strong> much attention is shown to such problems <strong>in</strong> his textbook (1972).With respect to its restrictive direction, this booklet adjo<strong>in</strong>s thosepublications. I stress at once that my contribution is not at all opposedto statistics as such.I underst<strong>and</strong> statistics as any calculation <strong>of</strong> means or o<strong>the</strong>r comb<strong>in</strong>edtreatment <strong>of</strong> experimental data aim<strong>in</strong>g at provid<strong>in</strong>g <strong>the</strong>ir predictable<strong>in</strong>tegral characteristics. It is assumed that <strong>the</strong>se will be later measuredfor future similar experimental data so that <strong>the</strong> correctness <strong>of</strong> <strong>the</strong>statistical forecast will be actually checked.I am not at all aga<strong>in</strong>st <strong>the</strong> use <strong>of</strong> ma<strong>the</strong>matics <strong>in</strong> statistics ei<strong>the</strong>r;o<strong>the</strong>rwise, <strong>the</strong> latter is simply unth<strong>in</strong>kable so that below I am treat<strong>in</strong>gma<strong>the</strong>matical statistics. Choose any pert<strong>in</strong>ent treatise <strong>and</strong> you will beeasily conv<strong>in</strong>ced that by no means any application <strong>of</strong> ma<strong>the</strong>matics <strong>in</strong>statistics is understood as ma<strong>the</strong>matical statistics. After attentivelylook<strong>in</strong>g, it is seen that ma<strong>the</strong>matical statistics is a very specificdiscipl<strong>in</strong>e possess<strong>in</strong>g its own peculiar method whose dist<strong>in</strong>ctivefeature is <strong>the</strong> conjectur<strong>in</strong>g <strong>of</strong> exactly one storey <strong>of</strong> probabilities calledconfidence probabilities or levels <strong>of</strong> significance above those reallymeasured <strong>in</strong> an experiment. It is possible to disagree with such aspecific approach.Ma<strong>the</strong>matics can be applied <strong>in</strong> statistics <strong>in</strong> a manner somewhatdifferent from what is prescribed by ma<strong>the</strong>matical statistics.In practice, <strong>the</strong> pr<strong>in</strong>ciples <strong>of</strong> statistically treat<strong>in</strong>g experimental datawhich are be<strong>in</strong>g applied for a long time now have noth<strong>in</strong>g <strong>in</strong> commonwith confidence probability <strong>and</strong> are <strong>the</strong>refore alien to <strong>the</strong> foundation <strong>of</strong>ma<strong>the</strong>matical statistics. We f<strong>in</strong>d for example that [a certa<strong>in</strong> magnitude]is equal to 0.0011609 ± 0.000024. Here, only <strong>the</strong> maximal error <strong>of</strong> <strong>the</strong>122


measurement is provided. Recently, physicists have sometimes begunto <strong>in</strong>dicate <strong>in</strong>stead <strong>the</strong> mean square error <strong>of</strong> measur<strong>in</strong>g <strong>the</strong> last digits <strong>of</strong><strong>the</strong> experimental result, usually <strong>in</strong> brackets; for example, <strong>the</strong> velocity<strong>of</strong> light <strong>in</strong> vacuum is [...]. Essential here is that unlike confidenceprobabilities <strong>of</strong> ma<strong>the</strong>matical statistics, <strong>the</strong> maximal <strong>and</strong> <strong>the</strong> meansquare error were actually measured.For many years, ma<strong>the</strong>matical statistics has been activelypropag<strong>and</strong>ized, but still perhaps even nowadays physicists will beunable to refra<strong>in</strong> from smil<strong>in</strong>g had we told <strong>the</strong>m, say, that after treat<strong>in</strong>g<strong>the</strong> observations <strong>of</strong> <strong>the</strong> velocity <strong>of</strong> light, c, accord<strong>in</strong>g to <strong>the</strong>prescriptions <strong>of</strong> ma<strong>the</strong>matical statistics, c is situated <strong>in</strong> such-<strong>and</strong>-suchconfidence <strong>in</strong>terval with confidence probability P = 0.99 <strong>and</strong> with<strong>in</strong> amore narrow <strong>in</strong>terval with P = 0.95.I also refer to physicists <strong>in</strong> <strong>the</strong> sequel. It was <strong>in</strong> physics that <strong>the</strong>basis <strong>of</strong> modern exact natural science had been laid, <strong>the</strong> largest amount<strong>of</strong> experience <strong>of</strong> complicated <strong>and</strong> subtle experimentation accumulated<strong>and</strong> a developed culture <strong>of</strong> a sound treatment <strong>of</strong> experimental data hadbeen achieved. On <strong>the</strong> o<strong>the</strong>r h<strong>and</strong>, it was physics that provided <strong>the</strong>example <strong>of</strong> apply<strong>in</strong>g ma<strong>the</strong>matical structures which is now <strong>of</strong>tenrecognized not favourably enough for o<strong>the</strong>r fundamental <strong>and</strong> applieddiscipl<strong>in</strong>es. I return to that problem at <strong>the</strong> end <strong>of</strong> my booklet.Its ma<strong>in</strong> aim is to describe <strong>the</strong> pr<strong>in</strong>ciples <strong>of</strong> such a treatment <strong>of</strong> datathat absta<strong>in</strong>s from mention<strong>in</strong>g confidence probabilities. Thesepr<strong>in</strong>ciples had appeared even before ma<strong>the</strong>matical statistics had;<strong>in</strong>deed, appeared at <strong>the</strong> same time as <strong>the</strong> first quantitative experimentalresults <strong>in</strong> natural science did. However, <strong>the</strong>y were reflected <strong>in</strong> <strong>the</strong><strong>the</strong>ory <strong>of</strong> probability only much later dur<strong>in</strong>g <strong>the</strong> process <strong>of</strong> <strong>the</strong>development <strong>of</strong> <strong>the</strong> approach connected with Mises. This approach hasbeen vividly discussed for decades, see my papers <strong>and</strong> textbooks(1976, 1977, 1987b; 1978a; 1979).The connection <strong>of</strong> that Mises approach with <strong>the</strong> pr<strong>in</strong>ciples <strong>and</strong>methods different from those <strong>of</strong> ma<strong>the</strong>matical statistics is fundamental<strong>and</strong> <strong>the</strong> contents <strong>of</strong> this booklet is <strong>the</strong>refore largely reduced to aconsistent although only underst<strong>and</strong>ably sketchy description <strong>of</strong> thatapproach. Such an exposition is still lack<strong>in</strong>g <strong>in</strong> <strong>the</strong> literature easilyread by a broad circle <strong>of</strong> readers.I am concentrat<strong>in</strong>g on <strong>the</strong> problems <strong>of</strong> <strong>in</strong>terpretation <strong>and</strong> practicalapplication <strong>of</strong> stochastic notions. Unlike <strong>the</strong> solution <strong>of</strong> purelyma<strong>the</strong>matical issues, any answers to such problems are always to alarge extent arguable <strong>and</strong> <strong>the</strong> reader ought to take it <strong>in</strong>to account. I amdescrib<strong>in</strong>g an approach noticeably different from that <strong>of</strong> <strong>the</strong> st<strong>and</strong>ardtreatises <strong>and</strong> most works on probability <strong>the</strong>ory <strong>and</strong> ma<strong>the</strong>maticalstatistics <strong>and</strong> I repeat that my po<strong>in</strong>t <strong>of</strong> view is not at all new. Itsextreme version is nicely expressed, for example, by Anscombe [1967,p. 3 note]: it is <strong>in</strong>admissible to identify statistics with <strong>the</strong> grotesquephenomenon generally known as ma<strong>the</strong>matical statistics.1. Introductory Remarks about Forecast<strong>in</strong>gThe f<strong>in</strong>al aim <strong>of</strong> research <strong>in</strong> both fundamental <strong>and</strong> applied naturalscience is a reliable forecast <strong>of</strong> <strong>the</strong> results <strong>of</strong> future experiments. Byexperiment I mean not only <strong>in</strong>vestigative, reconnaissance trials, but123


also <strong>the</strong> operations <strong>of</strong> various devices <strong>and</strong> systems. I also underst<strong>and</strong>prediction as design<strong>in</strong>g all k<strong>in</strong>ds <strong>of</strong> <strong>in</strong>struments, devices, systems etc.You can say that a forecast as a dem<strong>and</strong> <strong>of</strong> reproduc<strong>in</strong>g a publishedresult was be<strong>in</strong>g accepted as a def<strong>in</strong>ition <strong>of</strong> <strong>the</strong> f<strong>in</strong>al aim <strong>and</strong>dist<strong>in</strong>ctive feature <strong>of</strong> natural science even at <strong>the</strong>ir birth.That dem<strong>and</strong> apparently <strong>in</strong>cludes <strong>the</strong> most essential dist<strong>in</strong>ctionbetween natural science <strong>and</strong> magic. It should be regrettably stated thatforecast<strong>in</strong>g as <strong>the</strong> f<strong>in</strong>al aim <strong>of</strong> <strong>the</strong> <strong>the</strong>ories <strong>of</strong> natural science has partlyescap<strong>in</strong>g <strong>the</strong> attention <strong>of</strong> even <strong>the</strong> scientists <strong>the</strong>mselves. It seems thatthis circumstance causes <strong>the</strong> passion felt sometimes for such diffuseformulations <strong>of</strong> those goals <strong>of</strong> scientific research which are sometimesnoticeable as explanation or reveal<strong>in</strong>g <strong>the</strong> essence <strong>of</strong> phenomena.As an example I can cite <strong>the</strong> caustically <strong>in</strong>dicated (Kitaigorodsky1978) tendency <strong>of</strong> chemists to expla<strong>in</strong> a phenomenon with highprecision by <strong>in</strong>troduc<strong>in</strong>g after <strong>the</strong> event plenty adjust<strong>in</strong>g parameters<strong>in</strong>to formulas. A proper number <strong>of</strong> <strong>the</strong>se can always achieve an idealco<strong>in</strong>cidence <strong>of</strong> <strong>the</strong> <strong>the</strong>oretical <strong>and</strong> <strong>the</strong> empirical curves, only not before<strong>the</strong> latter was experimentally obta<strong>in</strong>ed.Kitaigorodsky (1978) <strong>of</strong>fered a formula for quantitatively <strong>in</strong>dicat<strong>in</strong>g<strong>the</strong> value P <strong>of</strong> a <strong>the</strong>ory: P = (k/n) − 1. Here, k is <strong>the</strong> number <strong>of</strong>magnitudes which can be predicted by that <strong>the</strong>ory, <strong>and</strong> n, <strong>the</strong> number<strong>of</strong> adjust<strong>in</strong>g parameters. The value <strong>of</strong> a <strong>the</strong>ory is <strong>the</strong>refore non-existentif k = n, <strong>and</strong> it is essential if k is much greater than n. The reader willbe certa<strong>in</strong>ly justified to believe that this proposal is a joke, but <strong>of</strong> ak<strong>in</strong>d that <strong>in</strong>cludes a large part <strong>of</strong> truth.A somewhat exaggerated stress on <strong>the</strong> idea <strong>of</strong> forecast<strong>in</strong>g noticeable<strong>in</strong> <strong>the</strong> newest discipl<strong>in</strong>e (Prognostika 1975/Prognostication 1978) islikely a reaction to <strong>the</strong> mentioned partial disregard <strong>of</strong> that fundamentalidea. In this connection I <strong>in</strong>dicate once more that <strong>in</strong> any concretebranch <strong>of</strong> natural science forecast<strong>in</strong>g is not at all a novelty <strong>and</strong> thatdur<strong>in</strong>g many years a large <strong>and</strong> specific experience <strong>of</strong> forecast<strong>in</strong>g hadbeen acquired with a great deal <strong>of</strong> trouble. It is hardly possible tocreate some essentially new, general <strong>and</strong> at <strong>the</strong> same time substantial<strong>the</strong>ory <strong>of</strong> forecast<strong>in</strong>g. Meanwhile, however, a unification <strong>of</strong>term<strong>in</strong>ology connected with forecast<strong>in</strong>g can undoubtedly play somepositive role.2. The Initial Concepts <strong>of</strong> <strong>the</strong> Applied Theory <strong>of</strong> <strong>Probability</strong>2.1. R<strong>and</strong>om variables <strong>and</strong> <strong>the</strong>ir moments. Denote <strong>the</strong> controlledconditions <strong>of</strong> trials by U, <strong>the</strong>ir result by V <strong>and</strong> <strong>the</strong> magnitude measured<strong>in</strong> trial s by X(s). The forecast <strong>of</strong> X(s + 1) given X(s) <strong>of</strong>ten fails.Permanence (forecast verified many times) is looked for by averag<strong>in</strong>g<strong>and</strong> obta<strong>in</strong><strong>in</strong>g from <strong>in</strong>itial unpredictable magnitude V 1 = X(s)V = E ( X ) = X ( s)mmwhere E m (X), <strong>in</strong> general also unpredictable, is <strong>the</strong> empirical mean <strong>of</strong> anunpredictable magnitude, <strong>of</strong> a r<strong>and</strong>om variable X(s). It is <strong>of</strong>ten stable:E m (X) ≈ E(X) (1)124


which means that sooner or later <strong>the</strong> scatter <strong>of</strong> <strong>the</strong> values <strong>of</strong> E m (X)ra<strong>the</strong>r <strong>of</strong>ten appreciably dim<strong>in</strong>ishes. The author <strong>in</strong>troduced <strong>the</strong> pattern <strong>of</strong> anextended series <strong>of</strong> trials. Bear<strong>in</strong>g <strong>in</strong> m<strong>in</strong>d his statements made <strong>in</strong> <strong>the</strong> sequel, it meansthat <strong>the</strong> behaviour <strong>of</strong> E m (X) is studied throughout <strong>the</strong> series ra<strong>the</strong>r than appreciatedby <strong>the</strong> result <strong>of</strong> <strong>the</strong> last trial. This latter method is called <strong>the</strong> pattern <strong>of</strong> a fixed series.For a predictable permanence it is supposed that (1) persists when <strong>the</strong> series isextended <strong>and</strong> E(X) is <strong>the</strong> predictable rough estimate <strong>of</strong> <strong>the</strong> empirical mean.Expectation <strong>of</strong> a separate measurement is mean<strong>in</strong>gless.The author <strong>in</strong>troduces moments but barely applies <strong>the</strong>m.2.2. Statistical stability. It is <strong>of</strong>ten alleged that homogeneity <strong>of</strong>trials leads to statistical stability. Only controlled conditions <strong>of</strong> trialsare meant <strong>and</strong> <strong>the</strong>refore, on <strong>the</strong> contrary, statistical stability means that<strong>the</strong> trials were homogeneous. Statistical stability is best justified byempirical <strong>in</strong>duction. Without stability E(X) does not exist.R<strong>and</strong>omness (<strong>in</strong> <strong>the</strong> general sense) is identified withunpredictability. It became usual to underst<strong>and</strong> r<strong>and</strong>om variables <strong>in</strong> <strong>the</strong>ma<strong>the</strong>matical sense only as statistically stable unpredictablemagnitudes, <strong>and</strong> even such for which <strong>the</strong> notion <strong>of</strong> distribution <strong>of</strong>probabilities is applicable.This narrow specialized <strong>in</strong>terpretation <strong>of</strong> r<strong>and</strong>om variable is stillbe<strong>in</strong>g willy-nilly confused with its wide general mean<strong>in</strong>g <strong>and</strong> leads toa mistaken belief that <strong>the</strong> applied <strong>the</strong>ory <strong>of</strong> probability <strong>and</strong>ma<strong>the</strong>matical statistics are applicable to any r<strong>and</strong>om variableunderstood <strong>in</strong> <strong>the</strong> general sense, i. e., to unpredictable magnitudes.On <strong>the</strong> o<strong>the</strong>r h<strong>and</strong>, <strong>the</strong> reader beg<strong>in</strong>s to believe that <strong>the</strong>ma<strong>the</strong>matical propositions <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability somehowdirectly concern only such magnitudes. Actually, <strong>the</strong>ir unpredictabilityis not at all a necessary condition for apply<strong>in</strong>g to <strong>the</strong>m <strong>the</strong> <strong>the</strong>ory <strong>of</strong>probability. It is important that when measur<strong>in</strong>g a magnitude manytimes it <strong>in</strong>dicates statistical stability. An artificial <strong>in</strong>troduction <strong>of</strong>unpredictability <strong>in</strong> an experiment by <strong>the</strong> so-called r<strong>and</strong>omization asalso <strong>in</strong> some calculations by <strong>the</strong> Monte Carlo method can be thought tomean an excessively brave challenge to <strong>the</strong> natural scientific tradition 1 .2.3. <strong>Probability</strong> <strong>of</strong> an event. An event A is r<strong>and</strong>om <strong>in</strong> both senses if X A (s)is r<strong>and</strong>om. Stability <strong>of</strong> frequency is established by empirical <strong>in</strong>duction accord<strong>in</strong>g to<strong>the</strong> pattern <strong>of</strong> an extended series. If frequency is stable, E(X), <strong>the</strong> probability <strong>of</strong> anevent, is its predictable rough estimate. If <strong>the</strong> behaviour <strong>of</strong> <strong>the</strong> series is not studied,<strong>and</strong> <strong>the</strong> probability only determ<strong>in</strong>ed by its outcome, <strong>the</strong> statistical stability is not<strong>in</strong>vestigated.Statistical probability is not applicable to <strong>in</strong>dividual trials. For estimat<strong>in</strong>g <strong>the</strong>probability <strong>of</strong> a rare event <strong>of</strong> <strong>the</strong> order <strong>of</strong> 10 −4 , sometimes encountered <strong>in</strong> <strong>the</strong>reliability <strong>the</strong>ory, 10 5 measurements are required.2.4. Distribution <strong>of</strong> probabilities. It is measured for a series <strong>of</strong> an<strong>in</strong>creas<strong>in</strong>g number <strong>of</strong> trials. If <strong>the</strong> empirical distributions are stabilized, F(X) isdeterm<strong>in</strong>ed. This is empirical <strong>in</strong>duction for <strong>the</strong> pattern <strong>of</strong> an extended series. Lack <strong>of</strong>stability <strong>of</strong> <strong>the</strong> empirical distributions means that <strong>the</strong> notion <strong>of</strong> F(X) is not applicable.Often recommended is <strong>the</strong> measurement <strong>of</strong> those F m (X) because <strong>the</strong>ir stability ismore noticeable, ra<strong>the</strong>r than <strong>the</strong> histograms, but this is ak<strong>in</strong> to stat<strong>in</strong>g that an<strong>in</strong>sensitive device is better than a sensitive histogram (<strong>in</strong>dicat<strong>in</strong>g a greater scatter, alack <strong>of</strong> stability).2.5. Statistical <strong>in</strong>dependence. Lack <strong>of</strong> correlation. A necessary<strong>and</strong> sufficient condition <strong>of</strong> <strong>in</strong>dependence isF(X 1 , X 2 , ..., X n ) = F 1 (X 1 ) F 2 (X 2 ) ... F n (X n ).125


It does not exist always even if <strong>the</strong> pert<strong>in</strong>ent magnitudes are<strong>in</strong>tuitively <strong>in</strong>dependent. Statistical <strong>in</strong>dependence can only be discussedwith complete justification after establish<strong>in</strong>g statistical stability.Independence <strong>of</strong> a separate measurement is mean<strong>in</strong>gless. Non-correlation<strong>of</strong> pairs <strong>of</strong> magnitudes X 1 (s), ..., X n (s) means that E(X i , X j ) = 0 for i, j = 1, ..., n <strong>and</strong> i≠ j.2.6. The ma<strong>in</strong> problem <strong>of</strong> <strong>the</strong> applied <strong>the</strong>ory <strong>of</strong> probability. Afterheuristically forecast<strong>in</strong>g <strong>the</strong> <strong>in</strong>itial magnitude V, to predict <strong>the</strong>oretically somesecondary magnitude, <strong>the</strong>ir functions. Forecast<strong>in</strong>g <strong>the</strong> <strong>in</strong>itial magnitudes is always<strong>in</strong>tuitive.2.7. Limit <strong>the</strong>orems <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability. For <strong>the</strong> central limit<strong>the</strong>orem (CLT) magnitudes X 1 (s), ..., X n (s) are considered statistically <strong>in</strong>dependentfor any n <strong>and</strong> <strong>the</strong>ir scatter around <strong>the</strong>ir expectations is supposed to be roughly <strong>the</strong>same. For <strong>the</strong> law <strong>of</strong> large numbers (LLN) <strong>the</strong> second dem<strong>and</strong> is dropped <strong>and</strong> <strong>the</strong>first one weakened so that variance can be even replaced by non-correlation. TheCLT is practically admitted if <strong>the</strong> LLN takes place <strong>in</strong>tuitively.Quantitative estimates dur<strong>in</strong>g <strong>the</strong> pro<strong>of</strong> <strong>of</strong> <strong>the</strong> laws <strong>of</strong> large numbersare only possible by means <strong>of</strong> <strong>the</strong> [Bienaymé −] Chebyshev <strong>in</strong>equalitybut <strong>the</strong>y are rough <strong>and</strong> <strong>in</strong>expedient as compared with <strong>the</strong> CLT. In <strong>the</strong><strong>in</strong>itial period <strong>of</strong> <strong>the</strong> development <strong>of</strong> <strong>the</strong> probability <strong>the</strong>ory <strong>the</strong>fundamental importance <strong>of</strong> limit <strong>the</strong>orems had been essentiallyexaggerated which is not completely done away with even now 2 . Thus,sometimes statements are made assert<strong>in</strong>g that statistical stability is dueto <strong>the</strong> LLN.2.8. The Mises approach. His <strong>in</strong>itial concepts are extremely close to be<strong>in</strong>gexperimental. Instead <strong>of</strong> stability <strong>of</strong> <strong>the</strong> empirical mean he postulates <strong>the</strong> existence <strong>of</strong>E(X) = lim E m (X), m → ∞.The pattern <strong>of</strong> an extended series is meant here. Particular cases are <strong>the</strong> def<strong>in</strong>ition <strong>of</strong>probability as <strong>the</strong> limit <strong>of</strong> frequency <strong>and</strong> <strong>of</strong> F(X) be<strong>in</strong>g <strong>the</strong> limit <strong>of</strong> F m (X). Theconvergence can be understood <strong>in</strong> different ways.R<strong>and</strong>omness (that is, unpredictability) does not enter directly, <strong>the</strong>whole arsenal <strong>of</strong> tools is typically ma<strong>the</strong>matical. In similar ways,ma<strong>the</strong>maticians discuss derivatives <strong>and</strong> <strong>in</strong>tegrals ra<strong>the</strong>r than velocitiesor specific heat. Transitions to <strong>the</strong> limit are only <strong>the</strong> means (ornecessary! expenses) <strong>of</strong> a rigorous formalization 3 . The Mises approachprovides civil rights <strong>in</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability for <strong>the</strong> knownempirical patterns <strong>of</strong> treat<strong>in</strong>g data dat<strong>in</strong>g back to <strong>the</strong> very foundations<strong>of</strong> <strong>the</strong> natural scientific method with its dem<strong>and</strong> <strong>of</strong> repeatedreproduction <strong>of</strong> results.The ma<strong>in</strong> feature <strong>of</strong> <strong>the</strong> Mises approach consists <strong>in</strong> deal<strong>in</strong>g wi<strong>the</strong>veryth<strong>in</strong>g as though consider<strong>in</strong>g an experiment. Not surpris<strong>in</strong>gly,expectation is <strong>in</strong>troduced <strong>in</strong> applications accord<strong>in</strong>g to his postulate<strong>of</strong>ten without cit<strong>in</strong>g Mises.2.9. Comparison with <strong>the</strong> Kolmogorov axiomatization. TheMises approach most likely can not be <strong>in</strong>cluded with<strong>in</strong> <strong>the</strong> boundaries<strong>of</strong> this axiomatization. The ma<strong>in</strong> <strong>the</strong>oretical problem apparentlyconsists <strong>in</strong> discover<strong>in</strong>g existence <strong>the</strong>orems for number sequencesconverg<strong>in</strong>g to <strong>the</strong> given beforeh<strong>and</strong> distribution function. Thisproblem is still only solved for weak convergence (Postnikov 1960).The Mises approach should be specially developed by number<strong>the</strong>oreticmethods unusual for <strong>the</strong> Kolmogorov axiomatics.126


The foundations <strong>of</strong> <strong>the</strong> Mises approach can be quite rigorouslyformed as a clear set <strong>of</strong> axioms. Contrast<strong>in</strong>g it to <strong>the</strong> axiomatic methodis wholly based on a misunderst<strong>and</strong><strong>in</strong>g.2.10. Conclusion. Unpredictability <strong>of</strong> repeatedly measured <strong>in</strong>itialmagnitudes is nei<strong>the</strong>r necessary, nor sufficient for enlist<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory<strong>of</strong> probability s<strong>in</strong>ce it does not ensure <strong>the</strong> <strong>in</strong>itial statistical stability, i.e. <strong>the</strong> stability <strong>of</strong> <strong>the</strong> averaged characteristics <strong>of</strong> <strong>the</strong> <strong>in</strong>itial magnitudes,which is <strong>the</strong> really necessary pert<strong>in</strong>ent condition. Independence <strong>of</strong>trials is <strong>of</strong>ten presumed, but see § 2.5.I did not have an occasion to enlist <strong>of</strong>ficially <strong>the</strong> notion <strong>of</strong><strong>in</strong>dependence <strong>of</strong> trials. The <strong>in</strong>troduction <strong>of</strong> controlled conditions U<strong>in</strong>to quantitative notions, formulas or propositions <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong>probability however constructed apparently can not be even hoped for.The <strong>in</strong>troduction <strong>of</strong> <strong>the</strong> concept <strong>of</strong> <strong>in</strong>dependence <strong>of</strong> trials is notrequired by <strong>the</strong> notions <strong>of</strong> statistical stability <strong>and</strong> statistical<strong>in</strong>dependence <strong>of</strong> magnitudes. On <strong>the</strong> contrary, it should be based on<strong>the</strong>se notions.3. Critical Analysis <strong>of</strong> <strong>the</strong> Method <strong>of</strong> Ma<strong>the</strong>matical <strong>Statistics</strong>Accord<strong>in</strong>g to one <strong>of</strong> <strong>the</strong> usual def<strong>in</strong>itions (Nikit<strong>in</strong>a et al 1972),ma<strong>the</strong>matical statistics studies quantitative relations <strong>of</strong> massphenomena [...] It is closely l<strong>in</strong>ked with <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability. [...]Its methods are universal 4 .3.1. An alternative to <strong>the</strong> general purpose <strong>of</strong> ma<strong>the</strong>maticalstatistics. All treatises state that that purpose is to provide a universalnumerical <strong>the</strong>ory <strong>of</strong> measur<strong>in</strong>g averaged characteristics; to f<strong>in</strong>d outwhe<strong>the</strong>r a given sample is representative.The possibility <strong>of</strong> construct<strong>in</strong>g such a <strong>the</strong>ory is doubtful s<strong>in</strong>ce <strong>the</strong>precision <strong>and</strong> reliability <strong>of</strong> <strong>the</strong> <strong>in</strong>itial presumptions can hardly becalculated or justified. Those presumptions are <strong>in</strong>tuitive forecasts <strong>of</strong>some permanences. When adopt<strong>in</strong>g <strong>the</strong>m, <strong>the</strong> alternative is to absta<strong>in</strong>as much as possible from <strong>the</strong>oretical considerations, to substantiate<strong>the</strong>ir likelihood <strong>of</strong> forecasts by empirical <strong>in</strong>duction. We will discusshow to verify experimentally <strong>the</strong> typical pronouncements made byma<strong>the</strong>matical statistics.3.2. Traditional <strong>in</strong>terpretation <strong>of</strong> limit <strong>the</strong>orems. [Only <strong>the</strong> Bernoulli<strong>the</strong>orem is discussed.] It only deals with one series <strong>of</strong> observations <strong>and</strong>applies two fundamental notions <strong>of</strong> ma<strong>the</strong>matical statistics,<strong>in</strong>dependence <strong>of</strong> trials <strong>and</strong> convergence <strong>in</strong> probability.3.3. Independence <strong>of</strong> trials. Contrary to what is sometimesasserted, stability <strong>of</strong> <strong>the</strong> controlled conditions <strong>of</strong> <strong>the</strong> experiments is notsufficient <strong>and</strong> <strong>the</strong> conditions U can not at all quantitatively enter <strong>the</strong><strong>the</strong>ory <strong>of</strong> probability. It is less superficially stated that each trialengenders a r<strong>and</strong>om variable so that <strong>the</strong> <strong>in</strong>dependence <strong>of</strong> <strong>the</strong> n trials isreduced to <strong>the</strong> statistical <strong>in</strong>dependence <strong>of</strong> <strong>the</strong> n variables.However, a trial (a measurement <strong>of</strong> X) only engenders a realization<strong>of</strong> a r<strong>and</strong>om variable, <strong>the</strong> number X(s). Ma<strong>the</strong>matical statistics has noclear rules for empirically verify<strong>in</strong>g <strong>in</strong>dependence <strong>of</strong> trials, fordiscuss<strong>in</strong>g an ensemble <strong>of</strong> such series. The correspondence n trials – nr<strong>and</strong>om variables means imag<strong>in</strong><strong>in</strong>g an ensemble <strong>of</strong> r<strong>and</strong>om variables.127


Such imag<strong>in</strong><strong>in</strong>g is a peculiar feature <strong>of</strong> ma<strong>the</strong>matical statistics, <strong>and</strong><strong>the</strong>re are no clear rules for empirically verify<strong>in</strong>g <strong>the</strong> results <strong>of</strong> <strong>the</strong>trials.3.4. Convergence <strong>in</strong> probability. For experimentally check<strong>in</strong>g it 5 along series <strong>of</strong> secondary trials is required <strong>and</strong> many samples <strong>of</strong> size nare needed. The author calls form<strong>in</strong>g many samples <strong>the</strong> pattern <strong>of</strong> many series,<strong>and</strong> <strong>the</strong> patterns <strong>of</strong> an extended <strong>and</strong> a fixed series are now both called <strong>the</strong> pattern <strong>of</strong>one (extended or fixed) series. In ma<strong>the</strong>matical statistics, an ensemble <strong>of</strong>sequences <strong>of</strong> trials is only imag<strong>in</strong>ed.3.5. Two compet<strong>in</strong>g ma<strong>the</strong>matical models <strong>of</strong> statistical stability.Thus, <strong>the</strong> traditional formulation <strong>of</strong> <strong>the</strong> limit <strong>the</strong>orems lack clear rulesfor verify<strong>in</strong>g ei<strong>the</strong>r <strong>the</strong> conditions, or conclusions. This is <strong>the</strong> reasonthat had formerly engendered an illusion, not completely dissociatedfrom, that <strong>the</strong> laws <strong>of</strong> large numbers <strong>the</strong>oretically deduce stability <strong>of</strong>means from homogeneity <strong>of</strong> trials. In particular, it followed thatma<strong>the</strong>matical statistics identifies statistical stability with convergence<strong>in</strong> probability as studied <strong>in</strong> <strong>the</strong> laws <strong>of</strong> large numbers. The Misesmodel <strong>of</strong> stability P = lim ω, n → ∞, is not usually mentioned. Theauthor quotes Kolmogorov’s pert<strong>in</strong>ent remark (1956, p. 262):Such considerations can be repeated an unrestricted number <strong>of</strong>times, but it is quite underst<strong>and</strong>able that it will not completely free usfrom <strong>the</strong> necessity <strong>of</strong> turn<strong>in</strong>g dur<strong>in</strong>g <strong>the</strong> last stage to probabilities <strong>in</strong><strong>the</strong> primitive, rough underst<strong>and</strong><strong>in</strong>g <strong>of</strong> that term 6 .To put it o<strong>the</strong>rwise, <strong>the</strong>re is no o<strong>the</strong>r way out except turn<strong>in</strong>g to <strong>the</strong>pattern <strong>of</strong> one series, i. e. to <strong>the</strong> Mises model <strong>of</strong> stability <strong>of</strong>frequencies. If you wish, <strong>the</strong> Mises def<strong>in</strong>ition <strong>of</strong> probability is exactly<strong>the</strong> turn to probabilities <strong>in</strong> <strong>the</strong> primitive rough underst<strong>and</strong><strong>in</strong>g <strong>of</strong> thatterm. Accord<strong>in</strong>g to common sense, <strong>the</strong> turn to <strong>the</strong> last stage should bedone <strong>in</strong> such a manner that <strong>the</strong> probabilities <strong>of</strong> <strong>the</strong> highest rank<strong>in</strong>cluded <strong>in</strong> <strong>the</strong> ma<strong>the</strong>matical model <strong>of</strong> <strong>the</strong> given experiments were<strong>in</strong>deed actually measured <strong>in</strong> that experiment. It is apparently difficultto warrant <strong>the</strong> imag<strong>in</strong>ation <strong>of</strong> probabilities <strong>of</strong> even one superfluousrank. Never<strong>the</strong>less, such imag<strong>in</strong>ation is one <strong>of</strong> <strong>the</strong> fundamentals <strong>of</strong> <strong>the</strong>method <strong>of</strong> ma<strong>the</strong>matical statistics.3.6.1. Postulate <strong>of</strong> <strong>the</strong> existence <strong>of</strong> a distribution <strong>of</strong> probabilitiesfor <strong>the</strong> <strong>in</strong>itial r<strong>and</strong>om variables. All <strong>the</strong> considerations <strong>in</strong>ma<strong>the</strong>matical statistics usually beg<strong>in</strong> by postulat<strong>in</strong>g <strong>the</strong> existence, <strong>and</strong>sometimes even <strong>the</strong> concrete type <strong>of</strong> <strong>the</strong> distribution <strong>of</strong> probabilitiesfor unpredictable magnitudes, <strong>the</strong>n <strong>the</strong> estimation <strong>of</strong> density orparameters <strong>of</strong> <strong>the</strong> objectively exist<strong>in</strong>g distribution is dem<strong>and</strong>ed. TheFisherian <strong>the</strong>ory <strong>of</strong> estimation is constructed accord<strong>in</strong>g to this patternas also <strong>the</strong> method <strong>of</strong> maximal likelihood, <strong>the</strong> <strong>the</strong>ories <strong>of</strong> confidence<strong>in</strong>tervals, <strong>of</strong> order statistics etc 7 . An alternative (see Chapter 2) is toconcentrate on empirical justification <strong>of</strong> predictions <strong>of</strong> statisticalstability.The most difficult <strong>and</strong> <strong>in</strong>terest<strong>in</strong>g problem <strong>of</strong> empirically<strong>in</strong>vestigat<strong>in</strong>g statistical stability is rapidly sped by. Here is Grekova’scritical remark (1976, p. 111) about calculat<strong>in</strong>g a confidence <strong>in</strong>tervalwhen <strong>the</strong> number <strong>of</strong> trials is small:128


A ra<strong>the</strong>r subtle arsenal is developed based on <strong>the</strong> assumption thatwe know <strong>the</strong> distribution <strong>of</strong> probabilities <strong>of</strong> <strong>the</strong> r<strong>and</strong>om variable (<strong>the</strong>normal law). And once more <strong>the</strong> question emerges: wherefrom <strong>in</strong>deeddo we know it? And how precisely? And, f<strong>in</strong>ally, what is <strong>the</strong> practicalvalue <strong>of</strong> <strong>the</strong> product itself, <strong>of</strong> <strong>the</strong> confidence <strong>in</strong>terval? A small number<strong>of</strong> trials means small amount <strong>of</strong> <strong>in</strong>formation, <strong>and</strong> th<strong>in</strong>gs are bad for us.But, whe<strong>the</strong>r <strong>the</strong> confidence <strong>in</strong>terval will be somewhat longer orshorter, is not so important <strong>the</strong> less so s<strong>in</strong>ce <strong>the</strong> confidence probabilitywas assigned arbitrarily.From my viewpo<strong>in</strong>t, this remark is still a ra<strong>the</strong>r mild doubt. We mayadd: Wherefrom <strong>and</strong> how precisely do we know that, given thisconcrete situation, it is proper at all to discuss distributions <strong>of</strong>probabilities? Suppose, however, that <strong>the</strong> distribution <strong>of</strong> probabilities<strong>of</strong> <strong>the</strong> unpredictable magnitudes under discussion does exist. But <strong>the</strong>n(Grekova 1976), it is not necessary to th<strong>in</strong>k highly <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong>estimation. Indeed, this <strong>the</strong>ory allows us to extract <strong>the</strong> maximalamount <strong>of</strong> <strong>in</strong>formation not from sample data <strong>in</strong> general; <strong>the</strong> postulateon <strong>the</strong> type <strong>of</strong> distribution <strong>of</strong> probabilities is also <strong>in</strong>troduced. It onlyrepresents reality with some precision at whose empirical estimation<strong>the</strong> estimation <strong>the</strong>ory is not at all aimed.And <strong>the</strong> <strong>the</strong>ory’s conclusions <strong>and</strong> it itself, generally speak<strong>in</strong>g,changes with <strong>the</strong> change <strong>of</strong> that distribution. It would have beennecessary to calculate <strong>the</strong> vagueness <strong>of</strong> <strong>the</strong> sought estimates <strong>of</strong> <strong>the</strong>parameters caused by <strong>the</strong> expected vagueness <strong>of</strong> <strong>the</strong> postulateddistribution. Then, <strong>the</strong> estimation <strong>the</strong>ory extracts <strong>the</strong> maximal amount<strong>of</strong> <strong>in</strong>formation accord<strong>in</strong>g to some specific criteria whose practicalvalue is not doubtless. F<strong>in</strong>ally, that <strong>the</strong>ory is based on <strong>the</strong> postulate <strong>of</strong><strong>in</strong>dependent trials with which, as we saw, not everyth<strong>in</strong>g was <strong>in</strong> order.It ought to be stated that <strong>the</strong> treatises on ma<strong>the</strong>matical statistics do notmiss <strong>the</strong> opportunity to identify <strong>the</strong> treatment <strong>of</strong> observations, thatreally not at all simple discipl<strong>in</strong>e, with <strong>the</strong> scientific approach <strong>in</strong>statistics. Here is Grekova (1976, p. 112) once more:Ma<strong>the</strong>matical arsenals have some hypnotic property <strong>and</strong>researchers are <strong>of</strong>ten apt to believe unquestionably <strong>the</strong>ir calculations,<strong>and</strong> <strong>the</strong> more so <strong>the</strong> more flowery are <strong>the</strong>ir tools [...].In any applied science, a scientific approach presumes first <strong>of</strong> all acreation <strong>of</strong> an <strong>in</strong>tuitively conv<strong>in</strong>c<strong>in</strong>g empirical foundation. Thecomplication, rigour <strong>and</strong> cost <strong>of</strong> <strong>the</strong> ma<strong>the</strong>matical arsenal should becoord<strong>in</strong>ated with <strong>the</strong> reliability <strong>of</strong> <strong>the</strong> foundation. This pragmatic ruleapplied from long ago is neatly called pr<strong>in</strong>ciple <strong>of</strong> equal stability <strong>of</strong> all<strong>the</strong> elements <strong>of</strong> an [applied − Yu. A.] <strong>in</strong>vestigation (Grekova 1976, p.111). The <strong>the</strong>ory <strong>of</strong> estimation hardly satisfies it <strong>in</strong> due measure.3.6.2. Postulate on <strong>the</strong> existence <strong>of</strong> a distribution <strong>of</strong> probabilitiesfor sample estimates. Imag<strong>in</strong><strong>in</strong>g many additional samples. Theexistence <strong>and</strong> sometimes even <strong>the</strong> type <strong>of</strong> that distribution ispostulated. Suppose that an experiment accord<strong>in</strong>g to <strong>the</strong> pattern <strong>of</strong>many series is carried out. We may only repeat what was said <strong>in</strong> §3.6.1 concern<strong>in</strong>g <strong>the</strong> distributions <strong>of</strong> <strong>the</strong> <strong>in</strong>itial r<strong>and</strong>om variables.129


Actually, however, only <strong>the</strong> parameter <strong>of</strong> <strong>the</strong> distribution is studied. Itsestimate is usually found by treat<strong>in</strong>g all <strong>the</strong> data as a s<strong>in</strong>gle entity. Inma<strong>the</strong>matical statistics, this procedure is accompanied by imag<strong>in</strong><strong>in</strong>gmany additional samples, presum<strong>in</strong>g <strong>the</strong> postulate <strong>of</strong> § 3.6.1 <strong>and</strong><strong>in</strong>dependence <strong>of</strong> <strong>the</strong> trials.The alternative is to discuss, as far as possible, only r<strong>and</strong>omvariables really measured <strong>in</strong> long series <strong>of</strong> trials <strong>and</strong> to keep to <strong>the</strong>pattern <strong>of</strong> one extended series. When several series are available, <strong>the</strong>method <strong>of</strong> maximal likelihood will provide several optimal estimates,so which is <strong>the</strong> most optimal? Not less strange will be <strong>the</strong> concept <strong>of</strong>confidence <strong>in</strong>terval.3.6.3. Postulate on <strong>in</strong>dependence <strong>of</strong> trials. For ma<strong>the</strong>maticalstatistics, it occupies <strong>in</strong> some sense a central position because it l<strong>in</strong>ks<strong>the</strong> postulates <strong>of</strong> §§ 3.6.1 <strong>and</strong> 3.6.2. However, it is hardly elementary,see Chapter 4.3.7. The choice <strong>of</strong> a threshold for discern<strong>in</strong>g. In its very essence itis <strong>in</strong>tuitive <strong>and</strong> unavoidable for verify<strong>in</strong>g <strong>and</strong> compar<strong>in</strong>g variousstatistical hypo<strong>the</strong>ses with each o<strong>the</strong>r. Ma<strong>the</strong>matical statistics can notnaturally avoid it, but only shifts <strong>the</strong> choice to magnitudes not be<strong>in</strong>gmeasured <strong>in</strong> reality. No special benefit is seen <strong>in</strong> that procedure.3.8. The problem <strong>of</strong> representativeness <strong>of</strong> samples. To allappearances, this should be frankly attributed to a problem non-formal<strong>in</strong> its very essence, to <strong>the</strong> choice <strong>of</strong> <strong>the</strong> <strong>in</strong>itial <strong>in</strong>tuitive assumptions.An alternative can be to separate <strong>the</strong> trials <strong>in</strong>to several subsamples <strong>and</strong>only forecast rough averaged characteristics. The size <strong>of</strong> <strong>the</strong>subsamples <strong>and</strong> <strong>the</strong> threshold for discern<strong>in</strong>g should be chosenaccord<strong>in</strong>g to precedents <strong>in</strong> a c<strong>and</strong>id <strong>in</strong>tuitive way <strong>in</strong> terms <strong>of</strong> measuredmagnitudes. Such an empirical <strong>in</strong>tuitive approach embodies <strong>the</strong>fundamental pr<strong>in</strong>ciple <strong>of</strong> natural science, <strong>the</strong> dem<strong>and</strong> <strong>of</strong> multiplerepetition <strong>of</strong> experiments <strong>and</strong> a conv<strong>in</strong>c<strong>in</strong>g reproduction <strong>of</strong> <strong>the</strong>irresults. See Alimov (1976, 1977, 1978b; 1978a; 1979).4. The Mises Formalizations <strong>of</strong> <strong>the</strong> Idea <strong>of</strong> Independent TrialsIn § 3.3 we concluded that a clear rule is required for transition fromone <strong>in</strong>itial sequence <strong>of</strong> trials to an ensemble <strong>of</strong> statistically<strong>in</strong>dependent sequences. That rule should somehow reflect <strong>in</strong>tuitiveideas about <strong>in</strong>dependence <strong>of</strong> trials. We may accept Mises’ general ideato consider <strong>the</strong> trials <strong>in</strong>dependent if <strong>the</strong>ir sequence is very irregular <strong>and</strong>difficult to forecast. He called such sequences irregular collectives.From <strong>the</strong> 1920s many authors (Wald, Feller, Church, Reichenbach)had developed various versions <strong>of</strong> formaliz<strong>in</strong>g <strong>the</strong> concept <strong>of</strong> suchcollectives. Kolmogorov’s algorithmic notion <strong>of</strong> probability <strong>of</strong> 1963 8also bears relation to this problem although it is apparently only<strong>in</strong>directly l<strong>in</strong>ked with <strong>the</strong> idea <strong>of</strong> forecast<strong>in</strong>g. See <strong>the</strong> pert<strong>in</strong>ent <strong>in</strong>itialbibliography, for example, <strong>in</strong> Knut (1977, vol. 2, chapter 3).4.1. Formalization accord<strong>in</strong>g to Ville [e. g., Shafer & Vovk 2001,pp. 48 – 50] <strong>and</strong> Postnikov (1960).4.2. Formalization accord<strong>in</strong>g to Copel<strong>and</strong>. Postnikov (1960)proved that a sequence is irregular <strong>in</strong> Copel<strong>and</strong>’s sense if <strong>and</strong> only if itis irregular accord<strong>in</strong>g to Ville <strong>and</strong> Postnikov.130


4.3. General remarks on §§ 4.1 <strong>and</strong> 4.2. A sequence irregularaccord<strong>in</strong>g to §§ 4.1 or 4.2 presents a simplest example <strong>of</strong> an <strong>in</strong>tuitive<strong>and</strong> rigorous ma<strong>the</strong>matical model <strong>of</strong> trials which can be called<strong>in</strong>dependent <strong>and</strong> identical (identical s<strong>in</strong>ce <strong>the</strong> distributions <strong>of</strong> <strong>the</strong>probabilities for all <strong>the</strong> formed sequences co<strong>in</strong>cide). The idea <strong>of</strong> a poorpredictability <strong>of</strong> one <strong>in</strong>itial sequence is here <strong>in</strong>deed reduced ra<strong>the</strong>rnaturally to dem<strong>and</strong><strong>in</strong>g statistical <strong>in</strong>dependence <strong>of</strong> <strong>the</strong> ensemble <strong>of</strong>sequences. As a result, <strong>in</strong>dependence <strong>of</strong> trials is treated <strong>in</strong> such amanner that provides a sufficiently clear rule for its quantitativeempirical verification.Thus, after be<strong>in</strong>g clearly formulated, <strong>in</strong>dependence <strong>of</strong> trialsobviously becomes a concept derived from <strong>the</strong> notion <strong>of</strong> statisticalstability, cf. our assumption <strong>in</strong> § 2.10. It follows that <strong>the</strong> postulate <strong>of</strong> §3.6.3 even <strong>in</strong> its most simple clear form is evidently more complexthan <strong>the</strong> postulates <strong>of</strong> §§ 3.6.1 <strong>and</strong> 3.6.2. It can not be <strong>the</strong> assumptionfrom which, at least accord<strong>in</strong>g to <strong>the</strong> pattern <strong>of</strong> one series, statisticalstability is deduced.The verification <strong>of</strong> any propositions <strong>of</strong> ma<strong>the</strong>matical statistics willbe <strong>the</strong>refore aimed at verify<strong>in</strong>g <strong>the</strong> postulate <strong>of</strong> § 3.6.3 ra<strong>the</strong>r than atmeasur<strong>in</strong>g <strong>the</strong> sought parameters <strong>of</strong> <strong>the</strong> distributions <strong>of</strong> <strong>the</strong> <strong>in</strong>itialmagnitudes. This measurement, for which, as it seems, ma<strong>the</strong>maticalstatistics is <strong>in</strong>deed created, will only constitute a small <strong>and</strong> so to sayprelim<strong>in</strong>ary part <strong>of</strong> <strong>the</strong> work to be done.The formulations <strong>of</strong> <strong>the</strong> idea <strong>of</strong> <strong>in</strong>dependence <strong>of</strong> trials consideredabove are obviously only applicable when <strong>the</strong> n trials are actuallycarried out many times. The alternative to <strong>the</strong> method <strong>of</strong> ma<strong>the</strong>maticalstatistics <strong>the</strong>refore means that <strong>the</strong> postulate <strong>of</strong> § 3.6.3 should be<strong>in</strong>troduced only after <strong>the</strong> sought parameters or <strong>the</strong> <strong>in</strong>itial distributionitself were reliably measured.4.4. Specification <strong>of</strong> <strong>the</strong> traditional formulations <strong>of</strong> <strong>the</strong> limit<strong>the</strong>orems on <strong>the</strong> basis <strong>of</strong> <strong>the</strong> concept <strong>of</strong> an irregular collective. Theauthor <strong>in</strong>terprets <strong>the</strong> Bernoulli <strong>the</strong>orem by apply<strong>in</strong>g <strong>the</strong> notion <strong>of</strong> irregularity <strong>of</strong>collectives. One <strong>of</strong> <strong>the</strong> conditions <strong>of</strong> his pert<strong>in</strong>ent <strong>the</strong>orem is <strong>the</strong> existence <strong>of</strong> a limit<strong>of</strong> <strong>the</strong> sequence <strong>of</strong> trials, <strong>the</strong> probability accord<strong>in</strong>g to Mises.He notes that his (<strong>and</strong> <strong>the</strong>refore <strong>the</strong> Bernoulli) <strong>the</strong>orem does not claim to justify<strong>the</strong> statistical stability <strong>of</strong> <strong>the</strong> frequency which is now one <strong>of</strong> his preconditions. Heconcludes that <strong>the</strong> limit <strong>the</strong>orems (<strong>in</strong> general!) are not actually fundamentalpropositions as it was thought <strong>in</strong> <strong>the</strong> <strong>in</strong>itial period <strong>of</strong> <strong>the</strong> development <strong>of</strong> <strong>the</strong> <strong>the</strong>ory<strong>of</strong> probability.4.5. An example from classical statistical physics. [Concern<strong>in</strong>g <strong>the</strong>work <strong>of</strong> an oscillator be<strong>in</strong>g <strong>in</strong> <strong>the</strong>rmal equilibrium with a <strong>the</strong>rmostat.]5. ConclusionAn alternative to <strong>the</strong> method <strong>of</strong> ma<strong>the</strong>matical statistics can bedescribed <strong>in</strong> a few words <strong>in</strong> <strong>the</strong> follow<strong>in</strong>g way. In applied research,<strong>and</strong> more precisely beyond fundamental physics, we should as far aspossible absta<strong>in</strong> from <strong>in</strong>troduc<strong>in</strong>g stochastic magnitudes not measured<strong>in</strong> real experiments <strong>in</strong> our <strong>in</strong>itial assumptions. The so-called numericalexperiments compare a computer <strong>and</strong> a paper model but not model<strong>and</strong> reality.The objects <strong>of</strong> study <strong>in</strong> economics, sociology <strong>and</strong> even moderntechnology are most <strong>of</strong>ten too complicated <strong>and</strong> unstable forconstruct<strong>in</strong>g <strong>the</strong>ir useful models by issu<strong>in</strong>g from general pr<strong>in</strong>ciples131


peculiar for <strong>the</strong> foundations <strong>of</strong> physics but remote from experiment.Advisable here are efficient phenomenal models without specialclaims to fundamentalism. Accord<strong>in</strong>g to <strong>the</strong> pr<strong>in</strong>ciple <strong>of</strong> equal stability<strong>of</strong> all <strong>the</strong> elements <strong>of</strong> an applied <strong>in</strong>vestigation, <strong>in</strong>troduction <strong>of</strong>complicated ma<strong>the</strong>matics should be considered guardedly. I concludeby quot<strong>in</strong>g Wiener (1966 from Russian), hardly an opponent <strong>of</strong>ma<strong>the</strong>matization:Advancement <strong>of</strong> ma<strong>the</strong>matical physics caused sociologists to bejealous <strong>of</strong> <strong>the</strong> power <strong>of</strong> its methods but was hardly accompanied by<strong>the</strong>ir dist<strong>in</strong>ct underst<strong>and</strong><strong>in</strong>g <strong>of</strong> <strong>the</strong> <strong>in</strong>tellectual sources <strong>of</strong> that power.[...] Some backward nations borrowed Western clo<strong>the</strong>s <strong>and</strong>parliamentary forms lack<strong>in</strong>g personality <strong>and</strong> national dist<strong>in</strong>ctivemarks, vaguely believ<strong>in</strong>g as though <strong>the</strong>se magic garments <strong>and</strong>ceremonies will at once br<strong>in</strong>g <strong>the</strong>m nearer to modern culture <strong>and</strong>technology, − so also economists began to dress <strong>the</strong>ir very <strong>in</strong>exactideas <strong>in</strong> rigorous formulas <strong>of</strong> <strong>in</strong>tegral <strong>and</strong> differential calculuses. [...]However difficult is <strong>the</strong> selection <strong>of</strong> reliable data <strong>in</strong> physics, it is muchmore difficult to collect vast economic or sociological <strong>in</strong>formationconsist<strong>in</strong>g <strong>of</strong> numerous series <strong>of</strong> homogeneous data. [...] Under <strong>the</strong>secircumstances, it is hopeless to secure too precise def<strong>in</strong>itions <strong>of</strong>magnitudes brought <strong>in</strong>to play. To attribute to such magnitudes,<strong>in</strong>determ<strong>in</strong>ate <strong>in</strong> <strong>the</strong>ir very essence, some special precision is useless.Whatever is <strong>the</strong> excuse, application <strong>of</strong> precise formulas to <strong>the</strong>se to<strong>of</strong>reely determ<strong>in</strong>ed magnitudes is noth<strong>in</strong>g but a deception, a va<strong>in</strong> waste<strong>of</strong> time.Notes1. Both r<strong>and</strong>omization <strong>and</strong> <strong>the</strong> Monte Carlo method are mentioned by Prokhorov(1999) <strong>and</strong> Dodge (2003). Tutubal<strong>in</strong>, who had sided with Alimov, later applied <strong>the</strong>Monte Carlo method <strong>in</strong> a jo<strong>in</strong>t contribution (Tutubal<strong>in</strong> et al 2009, p. 189). O. S.2. Concern<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability <strong>the</strong> author was likely wrong, see Tutubal<strong>in</strong>[i, § 4.2], who [i, § 4.5] also remarked that for natural science <strong>the</strong> significance <strong>of</strong> <strong>the</strong>LLN only consisted <strong>in</strong> reflect<strong>in</strong>g <strong>the</strong> experimental fact <strong>of</strong> <strong>the</strong> stability <strong>of</strong> <strong>the</strong> mean.The author’s next sentence had to do with <strong>the</strong> application <strong>of</strong> <strong>the</strong> LLN to statistics, bu<strong>the</strong> only stated what that <strong>the</strong>orem did not achieve.Concern<strong>in</strong>g <strong>the</strong> CLT I quote Kolmogorov (1956, p. 269): Even now, it is difficultto overestimate [its] importance. O. S.3. In spite <strong>of</strong> numerous efforts made, <strong>the</strong> Mises approach rema<strong>in</strong>s actuallyquestionable, see end <strong>of</strong> [vi]. O. S.4. It is worthwhile to quote ano<strong>the</strong>r def<strong>in</strong>ition (Kolmogorov & Prokhorov1974/1977, p. 721):[Ma<strong>the</strong>matical statistics is] <strong>the</strong> branch <strong>of</strong> ma<strong>the</strong>matics devoted to <strong>the</strong>ma<strong>the</strong>matical methods for <strong>the</strong> systematization, analysis <strong>and</strong> use <strong>of</strong> statistical data for<strong>the</strong> draw<strong>in</strong>g <strong>of</strong> scientific <strong>and</strong> practical <strong>in</strong>ferences. O. S.5. See <strong>the</strong> Introduction to [v]. O. S.6. I illustrate pr<strong>in</strong>cipal <strong>and</strong> secondary magnitudes (§ 2.6) by Kolmogorov’sreason<strong>in</strong>g. Frequency µ/n tends to probability p, <strong>and</strong> <strong>the</strong> probability P(|µ/n − p| < ε) isa secondary magnitude which <strong>in</strong> turn should be measured as well. O. S.7. This statement is not altoge<strong>the</strong>r correct. See Wilks (1962, Chapter 11) <strong>and</strong>Walsh (1962) who discuss non-parametric estimation <strong>and</strong> order statisticsrespectively. O. S.8. Perhaps Kolmogorov (1963). O. S.132


BibliographyAlimov Yu. I. (1976, 1977, 1978b), Elementy Teorii Eksperimenta (Elements <strong>of</strong><strong>the</strong> Theory <strong>of</strong> Experiments), pts 1 – 3. Sverdlovsk.--- (1978a Russian), On <strong>the</strong> applications <strong>of</strong> <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability considered <strong>in</strong>V. N. Tutubal<strong>in</strong>’s works. Avtomatika, No. 1, pp. 71 – 82.--- (1979 Russian), Once more about realism <strong>and</strong> fantasy <strong>in</strong> <strong>the</strong> applications <strong>of</strong> <strong>the</strong><strong>the</strong>ory <strong>of</strong> probability. Ibidem, No 4, pp. 103 – 110.Anscombe F. J. (1967), Topics <strong>in</strong> <strong>the</strong> <strong>in</strong>vestigation <strong>of</strong> l<strong>in</strong>ear relations. J. Roy.Stat. Soc., vol. B 29, pp. 1 – 52.Blekhman I. I., Myshkis A. D., Panovko Ya. G. (1976), Prikladnaia Matematika(Applied Math.). Kiev.Dodge Y. (2003), Oxford Dictionary <strong>of</strong> Statistical Terms. Oxford.Grekova I. (1976 Russian), Special methodical features <strong>of</strong> applied ma<strong>the</strong>maticson <strong>the</strong> current stage <strong>of</strong> its development. Voprosy Filos<strong>of</strong>ii No. 6, pp. 104 – 114.Kitaigorodsky A. I. (1978), Molekuliarnye Sily (Molecular Forces). Moscow.Knut D. E. (1977 Russian), The Art <strong>of</strong> Computer Progamm<strong>in</strong>g, vol. 2. Moscow.The author referred to this Russian edition.Kolmogorov A. N. (1956 Russian), Theory <strong>of</strong> probability. In: Matematika. EeSoderzanie, Metody i Znachenie (Ma<strong>the</strong>matics. Its Contents, Methods <strong>and</strong>Importance), vol. 2. Moscow, pp. 252 – 284.Kolmogorov A. N., Prokhorov Yu. V. (1974 Russian), <strong>Statistics</strong>. Great Sov.Enc., 3 rd ed., English version, 1977, vol. 15, pp. 721 – 725.Nikit<strong>in</strong>a E. P., Freidl<strong>in</strong>a V. D., Yarkho A. V. (1972), Kollekzia OpredeleniyTerm<strong>in</strong>a “Statistika” (Collection <strong>of</strong> Def<strong>in</strong>itions <strong>of</strong> <strong>the</strong> Term “<strong>Statistics</strong>”). Moscow.Postnikov A. G. (1960), Arifmeticheskoe Modelirovanie Slucha<strong>in</strong>ykh Prozessov(Arithmetical Modell<strong>in</strong>g <strong>of</strong> Stochastic Processes). Moscow. Perhaps <strong>in</strong>cluded <strong>in</strong>author’s Izbrannye Trudy (Sel. Works). Moscow, 2005.Prognostication (1975 Russian), Great Sov. Enc., 3 rd ed., English version, vol. 21,1978.Prokhorov Yu. V., Editor (1999), Veroiatnost i Matematicheskaia Statistika.Enziklopedia (<strong>Probability</strong> <strong>and</strong> Math. <strong>Statistics</strong>. An Enc.). Moscow.Shafer G., Vovk V. (2001), <strong>Probability</strong> <strong>and</strong> F<strong>in</strong>ance. It’s Only a Game. NewYork.Tutubal<strong>in</strong> V. N. (1972), Teoria veroiatnostei (Theory <strong>of</strong> <strong>Probability</strong>). Moscow.Tutubal<strong>in</strong> V. N., Barabasheva Yu. M., Devyatkova G. N., Uger E. G. (2009Russian), Kolmogorov’s criteria <strong>and</strong> Mendel’s heredity laws. Istoriko-Matematich.Issledovania, ser. 2, No. 13/48, pp. 185 – 197.Venikov V. A. (1978), Perekhodnye Elektromekhanicheskie Prozessy vElektricheskikh Sistemakh. Moscow.Vysotsky M. (1979), Pod Znakom Integrala (Under <strong>the</strong> Sign <strong>of</strong> Integral).Moscow.Walsh J. E. (1962), Nonparametric confidence <strong>in</strong>tervals <strong>and</strong> tolerance regions. In:Sarhan A. E., Greenberg B. G., Editors, Contributions to Order <strong>Statistics</strong>. New York– London, pp. 136 – 143.Wiener N. (1966 Russian), Tvorez i Robot (Creator <strong>and</strong> Robot). Moscow. Theauthor referred to this Russian edition. German: possibly Mensch undMenschenmasch<strong>in</strong>e.Wilks S. S. (1962), Ma<strong>the</strong>matical <strong>Statistics</strong>. New York.133


VV. N. Tutubal<strong>in</strong>Answer<strong>in</strong>g Alimov’s Critical Commentson Apply<strong>in</strong>g <strong>the</strong> Theory <strong>of</strong> <strong>Probability</strong>Otvet na kriticheskie zamechania Yu. I. Alimovav sviazi s problemami prilozenia teorii veroiatnostei.Avtomatika, No. 5, vol. 8, 1978, pp. 88 – 91Introduction by <strong>the</strong> Translator: The Ma<strong>in</strong> Ideas <strong>of</strong> Alimov (1978)Page 71. The stability <strong>of</strong> <strong>the</strong> <strong>in</strong>itial means is a postulate whoselikelihood should be experimentally justified.Page 73. The LLN was, <strong>and</strong> sometimes still is considered a bridgeconnect<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability with practice. Accord<strong>in</strong>g to <strong>the</strong>context (p. 72), <strong>the</strong> author denies this statement because statisticalstability <strong>of</strong> <strong>the</strong> trials had to be proved.Page 73. The proximity <strong>of</strong> <strong>the</strong> empirical frequency to <strong>the</strong> <strong>in</strong>itialprobability should be estimated by measurement.Page 74. Not practice is follow<strong>in</strong>g Mises as Tutubal<strong>in</strong> remarked, butra<strong>the</strong>r <strong>the</strong> <strong>in</strong>verse had happened.Page 75. The significance <strong>of</strong> <strong>the</strong> LLN <strong>and</strong> o<strong>the</strong>r limit <strong>the</strong>orems <strong>in</strong>statistics is reduced to solv<strong>in</strong>g an ord<strong>in</strong>ary problem.Page 76. An explanation <strong>of</strong> <strong>the</strong> <strong>in</strong>dependence <strong>of</strong> trials is notfundamentally important for <strong>the</strong> Mises approach. Statistical<strong>in</strong>dependence can be revealed by most various sequences <strong>of</strong> trials<strong>in</strong>clud<strong>in</strong>g periodic sequences.Page 77. For applications, <strong>the</strong> transition <strong>of</strong> <strong>the</strong> empirical frequencyto probability is an undist<strong>in</strong>guished expense <strong>of</strong> a rigorousformalization ra<strong>the</strong>r than any essential feature <strong>of</strong> <strong>the</strong> Mises approach.Page 77. Without due substantiation but <strong>in</strong> agreement with <strong>the</strong>former pronouncement <strong>the</strong> author alleges that <strong>the</strong> so-called stronglaws <strong>of</strong> large numbers are very remote from <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability.[The ma<strong>in</strong> text]Alimov (1978) critically commented on some <strong>of</strong> my publications<strong>and</strong> his paper is <strong>the</strong> only one that I know to publish a response to mymethodical <strong>and</strong> popular scientific works. S<strong>in</strong>ce discussions, <strong>in</strong>clud<strong>in</strong>gthose carried out <strong>in</strong> public, are most necessary for <strong>the</strong> development <strong>of</strong>science <strong>and</strong> teach<strong>in</strong>g, <strong>the</strong> <strong>in</strong>itiative <strong>of</strong> <strong>the</strong> periodical Avtomatika as wellas <strong>the</strong> serious (as will be seen below) work <strong>of</strong> Alimov only due towhich that discussion became possible should be appreciated verypositively.Alimov is well known because <strong>of</strong> a number <strong>of</strong> his publications,mostly <strong>of</strong> a critical k<strong>in</strong>d, on <strong>the</strong> application <strong>of</strong> <strong>the</strong> probability <strong>the</strong>ory. Ith<strong>in</strong>k that <strong>the</strong> general aim <strong>of</strong> his contributions differ but little fromm<strong>in</strong>e. We both apparently agree that <strong>the</strong> amount <strong>of</strong> falsehoods arrivedat by apply<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability is too great to be tolerated. In ahistorical perspective, my statement made publicly is all by itself aquite effective means <strong>of</strong> combat<strong>in</strong>g that evil. And <strong>in</strong>deed <strong>in</strong>tr<strong>in</strong>sic134


processes are now go<strong>in</strong>g on <strong>in</strong> <strong>the</strong> society due to which <strong>the</strong> part playedby moral elements sharply <strong>in</strong>creases. It is this circumstance to which I<strong>and</strong> Alimov are beholden for some not excessive popularity <strong>of</strong> ourpublications; o<strong>the</strong>rwise <strong>the</strong>y would just have not been popular.Thus, connect<strong>in</strong>g <strong>the</strong> problem <strong>of</strong> <strong>the</strong> truth <strong>of</strong> scientific work <strong>in</strong> <strong>the</strong>first place with <strong>the</strong> level <strong>of</strong> social morals, I consider <strong>the</strong> possibility <strong>of</strong>solv<strong>in</strong>g that problem by purely scientific means ra<strong>the</strong>r sceptically, forexample by describ<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory <strong>of</strong> probability accord<strong>in</strong>g to Misesra<strong>the</strong>r than by <strong>the</strong> generally recognized Kolmogorov axiomatics. I donot mention <strong>the</strong> idea <strong>of</strong> <strong>of</strong>ficial censur<strong>in</strong>g voiced by Alimov (1978, p.82). That would have been only really helpful if those responsible willbe at <strong>the</strong> same time as though automatically endowed with <strong>the</strong> truth orat least with a tendency to it.Incidentally, I would like to turn Alimov’s attention to acircumstance which I myself previously experienced, that apparentlyany attempt to retell or cite <strong>the</strong> viewpo<strong>in</strong>t <strong>of</strong> o<strong>the</strong>r people <strong>in</strong>troducesunavoidable corruption. Thus, Alimov (1978, p. 74) says: Whencompar<strong>in</strong>g <strong>the</strong> Mises approach with a dead language, Tutubal<strong>in</strong>never<strong>the</strong>less notes ...Actually, I (1972, p. 148) wrote:In general, <strong>the</strong> present attitude <strong>of</strong> specialists towards <strong>the</strong> language<strong>of</strong> <strong>the</strong> Mises <strong>the</strong>ory can be compared with <strong>the</strong> attitude towards a deadlanguage <strong>in</strong> which for some reason no one wishes to speak although,after be<strong>in</strong>g appropriately corrected <strong>and</strong> altered, it will be quitecapable <strong>of</strong> express<strong>in</strong>g everyth<strong>in</strong>g spoken <strong>in</strong> a live language.Thus, after Alimov quite properly but [too] briefly arranged myviewpo<strong>in</strong>t, my friendly attitude towards probability <strong>the</strong>ory accord<strong>in</strong>gto Mises absolutely disappeared <strong>and</strong> became replaced by disda<strong>in</strong>.This example taken toge<strong>the</strong>r with my general op<strong>in</strong>ion about <strong>the</strong>corruptions <strong>of</strong> such a k<strong>in</strong>d be<strong>in</strong>g practically unavoidable, sufficientlyexpla<strong>in</strong>s why I do not reply <strong>in</strong> detail to each po<strong>in</strong>t <strong>of</strong> Alimov’scriticism. Concern<strong>in</strong>g general pronouncements, all is reduced toselect<strong>in</strong>g some shade <strong>of</strong> conception. For example, if Alimov [p. 75]th<strong>in</strong>ks that <strong>the</strong> law <strong>of</strong> large numbers is a limit <strong>the</strong>orem suited forsolv<strong>in</strong>g an ord<strong>in</strong>ary modest problem <strong>of</strong> probability <strong>the</strong>ory unconnectedwith <strong>the</strong> pr<strong>in</strong>ciples <strong>of</strong> its applicability, <strong>the</strong>n let him be <strong>in</strong> <strong>the</strong> right.However, it is much more <strong>in</strong>terest<strong>in</strong>g to turn to concrete examples<strong>of</strong> application <strong>of</strong> <strong>the</strong> <strong>the</strong>ory because <strong>the</strong>y are always richer. Forexample, prom<strong>in</strong>ent physicists who had been creat<strong>in</strong>g that scienceusually philosophically <strong>in</strong>terpreted it <strong>the</strong>mselves without need<strong>in</strong>gphilosophers. Not that I deny <strong>the</strong> social utility <strong>of</strong> philosophers, but<strong>the</strong>ir customers are not lead<strong>in</strong>g scientists but <strong>the</strong> multitude <strong>of</strong> thosewho do not (yet) occupy lead<strong>in</strong>g places <strong>in</strong> science.Alimov’s ma<strong>in</strong> merit as a critic, as it seems to me, is that heconsidered concrete numerical data. I bear <strong>in</strong> m<strong>in</strong>d <strong>the</strong> experimentalverification <strong>of</strong> <strong>the</strong> most simple Mendelian law <strong>of</strong> assortment <strong>of</strong><strong>in</strong>dications <strong>in</strong> <strong>the</strong> ratio 3:1. The data was provided by Ermolaeva(1939), a representative <strong>of</strong> <strong>the</strong> Lyssenko school, <strong>and</strong> En<strong>in</strong> (1939), itsopponent. Kolmogorov (1940) published a detailed analysis <strong>of</strong>135


Ermolaeva’s results <strong>and</strong> concluded that, <strong>in</strong>stead <strong>of</strong> refut<strong>in</strong>g <strong>the</strong>Mendelian law, she completely confirmed it. There also, withoutm<strong>in</strong>utely analys<strong>in</strong>g En<strong>in</strong>’s paper, Kolmogorov implied that his resultsare doubtful because <strong>the</strong>y confirmed that law too f<strong>in</strong>ely.In a popular scientific booklet, I [iii] thought it expedient to rem<strong>in</strong>dreaders about Kolmogorov’s paper <strong>and</strong> supplemented it by treat<strong>in</strong>gEn<strong>in</strong>’s results. Alimov treated <strong>the</strong> same data o<strong>the</strong>rwise <strong>and</strong> formulateda number <strong>of</strong> objections. He directed <strong>the</strong>m to me alone although a part<strong>of</strong> <strong>the</strong>m to <strong>the</strong> same extent concerned Kolmogorov’s calculations. Ibeg<strong>in</strong> with <strong>the</strong> objection which I underst<strong>and</strong> <strong>and</strong> consider essential.He notes that <strong>in</strong> many cases <strong>the</strong> families considered by Ermolaevawere small (not more than 10 observations). Then <strong>the</strong> normalapproximation <strong>of</strong> <strong>the</strong> frequencies <strong>of</strong> a certa<strong>in</strong> phenotype <strong>in</strong>troduced byKolmogorov ought to be very rough. In particular, <strong>the</strong> presence <strong>of</strong>normed frequencies smaller than − 3 which I [iii] considered assignificant deviations from <strong>the</strong> Mendelian law can be expla<strong>in</strong>ed. asAlimov believes, by <strong>the</strong> asymmetry <strong>of</strong> <strong>the</strong> b<strong>in</strong>omial law. Alimovdeclared that my conclusion was wrong (that was somewhat hastily, heshould have said unjustified). Any student <strong>of</strong> a technical <strong>in</strong>stitute, as hestates, would have avoided such a mistake caused by <strong>the</strong> generalcorruption <strong>of</strong> concepts due to <strong>the</strong> application <strong>of</strong> <strong>the</strong> non-Miseslanguage <strong>and</strong> <strong>the</strong> rituals <strong>of</strong> ma<strong>the</strong>matical statistics.Actually, everyth<strong>in</strong>g is much simpler. Before prepar<strong>in</strong>g my booklet,I did not acqua<strong>in</strong>t myself with Ermolaeva’s paper which was notreadily available. Now, however, s<strong>in</strong>ce her data became an object <strong>of</strong>discussion, I had a look at that source. The data on <strong>the</strong> assortment <strong>in</strong>separate families are provided <strong>the</strong>re <strong>in</strong> Tables 4 <strong>and</strong> 6. In Table 4 <strong>the</strong>families are numbered from 1 to 100, but for some unknown reasonnumbers 50 <strong>and</strong> 87 are omitted. In Table 6, <strong>the</strong> number<strong>in</strong>g beg<strong>in</strong>s with22 <strong>and</strong> cont<strong>in</strong>ues until 148, but numbers 92, 95, 115, 127, 144 areabsent. At <strong>the</strong> same time, <strong>the</strong> table show<strong>in</strong>g <strong>the</strong> total, states 100 <strong>and</strong>127 families respectively.Kolmogorov <strong>in</strong>serted a venomous pert<strong>in</strong>ent remark; he counted 98families <strong>in</strong> <strong>the</strong> first, <strong>and</strong> 123 (actually, 122) <strong>in</strong> <strong>the</strong> second table. Thegeneral style <strong>of</strong> her contribution, let me say it frankly, is abom<strong>in</strong>able.The author obviously does not underst<strong>and</strong> <strong>the</strong> mean<strong>in</strong>g <strong>of</strong> <strong>the</strong> errorscalculated by biometric methods for <strong>the</strong> number <strong>of</strong> assortments. It isquite clear that her data do not really deserve to be seriouslyconsidered.However, if only discuss<strong>in</strong>g Ermolaeva’s tables such that <strong>the</strong>y are,Alimov is still unjustly reproach<strong>in</strong>g me for discover<strong>in</strong>g non-existentdeviations from <strong>the</strong> Mendelian law. Indeed, Table 4 <strong>in</strong>cludes a result<strong>of</strong> assortment 0:17, <strong>and</strong> 0:10 <strong>in</strong> Table 6 <strong>in</strong>stead <strong>of</strong> <strong>the</strong> expected ratio3:1. Their probabilities are 4 −17 <strong>and</strong> 4 −10 respectively so that, hav<strong>in</strong>g200 plus trials, such events could not have occurred.Concern<strong>in</strong>g both Kolmogorov’s <strong>and</strong> my own treatment, I wouldlike to <strong>in</strong>dicate that, <strong>in</strong> spite <strong>of</strong> Alimov’ op<strong>in</strong>ion, correct scientificresults are possibly <strong>of</strong>ten obta<strong>in</strong>ed not because we do everyth<strong>in</strong>gproperly, but ow<strong>in</strong>g to some special luck.I did not underst<strong>and</strong> <strong>the</strong> mean<strong>in</strong>g <strong>of</strong> Alimov’s objection to <strong>the</strong>calculation <strong>of</strong> <strong>the</strong> confidence level. From <strong>the</strong> times <strong>of</strong> Laplace, after136


obta<strong>in</strong><strong>in</strong>g a deviation from <strong>the</strong> <strong>the</strong>ory assumed to be valid, scientistshave been attempt<strong>in</strong>g to calculate, if possible, <strong>the</strong> probability <strong>of</strong> adeviation not less than that. If that probability was high, 1/2, say,everyth<strong>in</strong>g was <strong>in</strong> order; o<strong>the</strong>rwise, suppos<strong>in</strong>g that its order was1/1000, it was advisable to look for <strong>the</strong> cause <strong>of</strong> <strong>the</strong> deviation. If,f<strong>in</strong>ally, it was moderate, its order be<strong>in</strong>g 1/10, say, <strong>the</strong> case wasdoubtful <strong>and</strong> a f<strong>in</strong>al decision impossible. Can we object to such k<strong>in</strong>d <strong>of</strong>apply<strong>in</strong>g <strong>the</strong> confidence level?I do not underst<strong>and</strong> Alimov’s concept <strong>of</strong> <strong>in</strong>dependence ei<strong>the</strong>r. On p.80 he th<strong>in</strong>ks that secondary trials, that is, data on <strong>the</strong> assortment <strong>of</strong><strong>in</strong>dications <strong>in</strong> different families, unconnected with each o<strong>the</strong>r, can bestatistically dependent. But how could that occur with <strong>the</strong> outcomes <strong>of</strong>different trials unconnected with each o<strong>the</strong>r? If as a result <strong>of</strong> one trialevents A <strong>and</strong> B can ei<strong>the</strong>r happen or fail, <strong>the</strong>y can be statisticallydependent <strong>and</strong>, when treat<strong>in</strong>g this dependence accord<strong>in</strong>g to Mises, weshould use a s<strong>in</strong>gle record. But <strong>in</strong> case <strong>of</strong> two absolutely different trialswe should apparently <strong>in</strong>troduce someth<strong>in</strong>g like a direct product <strong>of</strong> tworecords.F<strong>in</strong>ally, concern<strong>in</strong>g my treatment <strong>of</strong> En<strong>in</strong>’s data, Alimov remarksfirst <strong>of</strong> all that his number <strong>of</strong> families is so small (11 + 14 = 25), that<strong>the</strong>ir treatment did not warrant <strong>the</strong> waste <strong>of</strong> ei<strong>the</strong>r time or paper with aspecial non-l<strong>in</strong>ear scale. I will answer that by stat<strong>in</strong>g that, on <strong>the</strong>contrary, I aimed at show<strong>in</strong>g that <strong>the</strong> image <strong>of</strong> a distribution functionunlike that <strong>of</strong> a histogram allows to obta<strong>in</strong> sensible results even whenhav<strong>in</strong>g such a small sample size.Then, Alimov states that it was possible to arrive at my conclusionsby compil<strong>in</strong>g an extended sample 1 . To some extent this is correct, butto some extent wrong. After tak<strong>in</strong>g samples <strong>of</strong> about <strong>the</strong> same size, <strong>the</strong>frequencies <strong>in</strong> En<strong>in</strong>’s second sample will be closer to <strong>the</strong> <strong>the</strong>oreticalmagnitudes than Ermolaeva’s similar frequencies. This is seen <strong>in</strong>Alimov’s table (1978, p. 78). It can be <strong>the</strong>refore concluded, ifErmolaeva’s data are considered as a st<strong>and</strong>ard, that <strong>the</strong>re is sometrouble with En<strong>in</strong>’s materials.However, after calculat<strong>in</strong>g <strong>the</strong> chi-squared statistic (Tutubal<strong>in</strong> [iii]),a st<strong>and</strong>ard is not needed. Actually, Alimov (1978, pp. 80 – 81)believes that En<strong>in</strong>’s data should be treated not by means <strong>of</strong> <strong>the</strong> normaldistribution <strong>of</strong> <strong>the</strong> normed frequencies, but by a more subtle model. Inpr<strong>in</strong>ciple, I completely agree, only that model should not be a mixture<strong>of</strong> b<strong>in</strong>omial distributions (Alimov, p. 80, formula (21)), but it shoulddirectly consider <strong>the</strong> actual numerical strength <strong>of</strong> <strong>the</strong> families. A series<strong>of</strong> b<strong>in</strong>omial trials would be obta<strong>in</strong>ed hav<strong>in</strong>g a known number <strong>of</strong> trials<strong>and</strong> a known probability <strong>of</strong> success. Underst<strong>and</strong>ably, such a model isbarely convenient <strong>and</strong> <strong>the</strong>refore <strong>the</strong> stupidest Monte Carlo method 2will apparently be most effective for calculat<strong>in</strong>g <strong>the</strong> various pert<strong>in</strong>entprobabilities. Thus, for example, <strong>the</strong> true distribution <strong>of</strong> <strong>the</strong>Kolmogorov statistic or some o<strong>the</strong>r statistic measur<strong>in</strong>g <strong>the</strong> deviationfrom <strong>the</strong> Mendelian law can be determ<strong>in</strong>ed. S<strong>in</strong>ce such statistics arera<strong>the</strong>r diverse, we conclude that not only <strong>the</strong> electron or <strong>the</strong> atom butalso <strong>the</strong> certa<strong>in</strong>ly carelessly constructed Ermolaeva’s tables are<strong>in</strong>exhaustible 3 .137


Notes1. Alimov [iv, § 2.1] <strong>in</strong>troduced extended series <strong>of</strong> observations. O. S.2. Without say<strong>in</strong>g anyth<strong>in</strong>g else, I note that Tutubal<strong>in</strong> himself applied that method<strong>in</strong> a jo<strong>in</strong>t paper (Tutubal<strong>in</strong> et al 2009, p. 189). O. S.3. That <strong>the</strong> electron is <strong>in</strong>exhaustible is Len<strong>in</strong>’s celebrated statement from hisMaterialism <strong>and</strong> Empirical Criticism (1909, <strong>in</strong> Russian). The notion <strong>of</strong> electron is<strong>in</strong>tr<strong>in</strong>sically contradictory, so perhaps <strong>the</strong> author <strong>in</strong>directly stated <strong>the</strong> same aboutthose tables. Anyway, Len<strong>in</strong>’s statement rema<strong>in</strong>s unjustified. O. S.BibliographyAlimov Yu. I. (1978 Russian), On <strong>the</strong> problem <strong>of</strong> apply<strong>in</strong>g <strong>the</strong> <strong>the</strong>ory <strong>of</strong>probability considered by V. N. Tutubal<strong>in</strong>. Avtomatika, No. 1, pp. 71 – 82.En<strong>in</strong> T. K. (1939 Russian), The results <strong>of</strong> an analysis <strong>of</strong> <strong>the</strong> assortment <strong>of</strong> hybrids<strong>of</strong> tomatoes. Doklady Akademii Nauk SSSR, vol. 24, No. 2, pp. 176 – 178. Alsopublished at about <strong>the</strong> same time <strong>in</strong> a foreign language <strong>in</strong> C. r. (Doklady) Acad. Sci.URSS.Ermolaeva N. I. (1939 Russian), Once more about <strong>the</strong> “pea laws”. Jarovizatsia,No. 2, pp. 79 – 86.Kolmogorov A. N. (1940), On a new confirmation <strong>of</strong> Mendel’s laws. C. r.(Doklady) Acad. Sci. URSS, vol. 28, No. 9, pp. 834 – 838.Tutubal<strong>in</strong> V. N. (1972), Teoria Veroiatnostei (Theory <strong>of</strong> probability). Moscow.Tutubal<strong>in</strong> V. N, Barabasheva Yu. M., Devyatkova G. N., Uger E. G. (2009Russian), Kolmogorov’s criteria <strong>and</strong> verification <strong>of</strong> Mendel’s heredity laws. Istoriko-Matematich. Issledovania, ser. 2, issue 13/48, pp. 185 – 197.138


VI<strong>Oscar</strong> Sheyn<strong>in</strong>On <strong>the</strong> Bernoulli Law <strong>of</strong> Large NumbersBernoulli considered (<strong>in</strong>dependent) trials with a constant probability<strong>of</strong> success, <strong>and</strong> rigorously proved that <strong>the</strong> frequency <strong>of</strong> success tendsto that probability. Mises, however, treated collectives, totalities <strong>of</strong>phenomena or events differ<strong>in</strong>g from each o<strong>the</strong>r <strong>in</strong> some <strong>in</strong>dication, <strong>and</strong>characterized by <strong>the</strong> existence <strong>of</strong> <strong>the</strong> limit<strong>in</strong>g frequency <strong>of</strong> success <strong>and</strong>by irregularity. The latter property meant that for any part <strong>of</strong> <strong>the</strong>collective that limit<strong>in</strong>g frequency was <strong>the</strong> same.Alimov noted that artificially constructed collectives proved that <strong>the</strong>empirical frequency <strong>of</strong> success can become more stable as <strong>the</strong> number<strong>of</strong> trials <strong>in</strong>creased, but have no limit. Therefore, <strong>the</strong> existence <strong>of</strong> thatlimit is an experimental fact. I have described his viewpo<strong>in</strong>t <strong>in</strong> somedetail <strong>in</strong> an Introduction to [v]. Tutubal<strong>in</strong> largely sided with Alimov.In <strong>the</strong> same Ars Conject<strong>and</strong>i, previous to prov<strong>in</strong>g <strong>the</strong> LLN,Bernoulli stated that his law was also valid <strong>in</strong> its <strong>in</strong>verse sense (<strong>and</strong> DeMoivre <strong>in</strong>dependently stated <strong>the</strong> same with respect to <strong>the</strong> first version<strong>of</strong> <strong>the</strong> CLT proved by him <strong>in</strong> 1733). In o<strong>the</strong>r words, an unknown <strong>and</strong>even a non-exist<strong>in</strong>g probability (one <strong>of</strong> Bernoulli’s examples) could beestimated by <strong>the</strong> limit<strong>in</strong>g frequency.In a little known companion paper (1765) to his ma<strong>in</strong> memoir(1764), Bayes all but proved his own limit <strong>the</strong>orem explicat<strong>in</strong>g that<strong>in</strong>verse LLN. He did not make <strong>the</strong> f<strong>in</strong>al step from <strong>the</strong> case <strong>of</strong> a largef<strong>in</strong>ite number <strong>of</strong> trials because he opposed <strong>the</strong> application <strong>of</strong> divergentseries which was usual <strong>in</strong> those times. That was done <strong>in</strong> 1908 byTimerd<strong>in</strong>g, <strong>the</strong> Editor <strong>of</strong> <strong>the</strong> German translation <strong>of</strong> Bayes, certa<strong>in</strong>lywithout us<strong>in</strong>g divergent series.Bayes – Timerd<strong>in</strong>g exam<strong>in</strong>ed <strong>the</strong> behaviour <strong>of</strong> <strong>the</strong> centred <strong>and</strong>normed r<strong>and</strong>om variable η, <strong>the</strong> unknown probability, (η − Eη)/var ηwhereas <strong>the</strong> direct LLN dealt with <strong>the</strong> frequency ξ, (ξ − Eξ)/varξ. Hisma<strong>in</strong> memoir became widely known <strong>and</strong> for a long time <strong>the</strong> Bayesapproach had been fiercely opposed, partly because an unknownconstant was treated as a r<strong>and</strong>om variable (with a uniformdistribution). Note that varη > varξ which is quite natural s<strong>in</strong>ceprobability is only unknown <strong>in</strong> <strong>the</strong> <strong>in</strong>verse case. For atta<strong>in</strong><strong>in</strong>g <strong>the</strong> sameprecision <strong>the</strong> <strong>in</strong>verse case <strong>the</strong>refore dem<strong>and</strong>s more trials than <strong>the</strong> directlaw. Mises could have called Bayes his ma<strong>in</strong> predecessor; actually,however, he only described <strong>the</strong> work <strong>of</strong> <strong>the</strong> English ma<strong>the</strong>matician,<strong>and</strong> <strong>in</strong>adequately at that. Bayes completed <strong>the</strong> first stage <strong>of</strong> <strong>the</strong>development <strong>of</strong> probability <strong>the</strong>ory.Alimov’s viewpo<strong>in</strong>t was largely correct s<strong>in</strong>ce he considered an<strong>in</strong>comparably more general pattern than Bernoulli <strong>and</strong> thought about<strong>the</strong> necessary checks, but he [iv] was too radical <strong>in</strong> deny<strong>in</strong>g importantparts <strong>of</strong> ma<strong>the</strong>matical statistics as also too brave <strong>in</strong> alter<strong>in</strong>g <strong>the</strong> Misesapproach. To borrow an expression from Tutubal<strong>in</strong> [end <strong>of</strong> ii], he<strong>in</strong>troduced <strong>the</strong> Mises approach <strong>of</strong> a light-weighted type.139


Concern<strong>in</strong>g <strong>the</strong> rigor <strong>of</strong> <strong>the</strong> frequentist <strong>the</strong>ory, witness Uspensky etal (1990, § 1.3.4):Until now, it proved impossible to embody Mises’ <strong>in</strong>tention <strong>in</strong> adef<strong>in</strong>ition <strong>of</strong> r<strong>and</strong>omness that was satisfactory from any po<strong>in</strong>t <strong>of</strong> view.I ought to add, however, that Kolmogorov (1963, p. 369) hadessentially s<strong>of</strong>tened his viewpo<strong>in</strong>t about that <strong>the</strong>ory:I have come to realize that <strong>the</strong> concept <strong>of</strong> r<strong>and</strong>om distribution <strong>of</strong> aproperty <strong>in</strong> a large f<strong>in</strong>ite population can have a strict formalma<strong>the</strong>matical exposition.In <strong>the</strong> 19 th <strong>and</strong> 20 th centuries statisticians had been reluctant tojustify <strong>the</strong>ir studies by <strong>the</strong> Bernoulli LLN. They did not refer ei<strong>the</strong>r to<strong>the</strong> <strong>in</strong>verse law or to Poisson (which would not have changed much).Maciejewski (1911, p. 96) even <strong>in</strong>troduced la loi des gr<strong>and</strong>s nombresdes statisticiens that only stated that <strong>the</strong> fluctuation <strong>of</strong> statisticalnumbers dim<strong>in</strong>ished with <strong>the</strong> <strong>in</strong>crease <strong>in</strong> <strong>the</strong> number <strong>of</strong> trials.Romanovsky (1924, pt 1, p. 15) stressed <strong>the</strong> natural scientific essence<strong>of</strong> <strong>the</strong> LLN <strong>and</strong> called it physical. Chuprov (1924, p. 465) declaredthat <strong>the</strong> LLN <strong>in</strong>cluded ei<strong>the</strong>r ma<strong>the</strong>matical formulas or empiricalrelations <strong>and</strong> <strong>in</strong> his letters <strong>of</strong> that time he effectively denied that <strong>the</strong>LLN provided a bridge between probability <strong>and</strong> statistics.BibliographyBayes T. (1764), An essay towards solv<strong>in</strong>g a problem <strong>in</strong> <strong>the</strong> doctr<strong>in</strong>e <strong>of</strong> chances.Phil. Trans. Roy. Soc., vol 53 for 1763, pp. 360 – 418. Repr<strong>in</strong>t: Biometrika, vol. 45,1958, pp. 293 – 345.--- (1765), Demonstration <strong>of</strong> <strong>the</strong> second rule <strong>in</strong> <strong>the</strong> essay [<strong>of</strong> 1764]. Ibidem, vol.54 for 1764, pp. 296 – 325.Chuprov A. A. (1924), Ziele und Wege der stochastischen Grundlagen derstatistische Theorie. Nord. Stat. Tidskr., t. 3, pp. 433 – 493.Kolmogorov A. N. (1963), On tables <strong>of</strong> r<strong>and</strong>om numbers. Sankhya, Indian J.Stat., vol. A25, pp. 369 – 376.Maciejewski C. (1911), Nouveaux fondements de la théorie de la statistique.Paris.Romanovsky V. I. (1924 Russian), Theory <strong>of</strong> probability <strong>and</strong> statistics accord<strong>in</strong>gto some newest Western scholars. Vestnik Statistiki, No. 4 – 6, pp. 1 – 38; No. 7 – 9,pp. 5 – 34.Uspensky V. A., Semenov A. L., Shen A. Kh. (1990 Russian), Can an(<strong>in</strong>dividual) sequence <strong>of</strong> zeros <strong>and</strong> ones be r<strong>and</strong>om? Uspekhi Matematich. Nauk, vol.45, pp. 105 – 162. This periodical is be<strong>in</strong>g translated cover to cover.140

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!