12.07.2015 Views

1 Studies in the History of Statistics and Probability ... - Sheynin, Oscar



Treating observations whose results depend on many factors is fraught with an absolutely general difficulty, and overcoming it was possibly the main finding of the work done. The point is that the result of observation (in this case, the emergence of the IHD) is generally connected with the values of the risk factors in a barely understood way. When there are only a few such factors, one or two, say, the data are usually divided into intervals according to their values; in the simplest case into two, but this is very crude and it is better to have more.

If each factor is subdivided into several levels, all their combinations should be used to form the appropriate groups, and the frequency of the IHD in each group then serves as an estimate of the probability. These frequencies will indeed adequately describe the data (somewhat roughly, because the values of the risk factors are considered approximately).

For example, the cholesterol content can be considered on four levels [...], the values of the systolic blood pressure also on four levels [...]. We then arrange a two-dimensional classification [...]
and obtain 16 groups, with the frequency of the emergence of the IHD calculated in each of them not from all 4856 observations but from the number in the group, which is on average 16 times smaller. Joining men and women together will likely be thought inadmissible, so that the number of observations per group is roughly halved again.

In general, even when the total number of observations is very large, only a modest number, of the order of a hundred, will be left for each frequency. But what happens if we add three more groups of different ages? And four more according to the intensity of smoking? [...] As a result, we will obtain a classification with each group containing at best one observation, and cases of no observations at all are not excluded. Consequently, we will be unable to determine any probabilities. [...]

The same difficulty occurs in many technical problems concerning the reliability of machinery established by several types of checks. Suppose that the results of the checks are

$x_1, \ldots, x_k$ (2.8)

and we would like to derive the probability of failure-free work as a function $p(x_1, \ldots, x_k)$.
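
The thinning-out of observations described above for the IHD data can be checked with a few lines of arithmetic. The following is only a rough sketch: the total of 4856 observations and the four levels each of cholesterol and systolic blood pressure come from the text, whereas reading "three more groups of different ages" and "four more according to the intensity of smoking" as factors with 3 and 4 levels is an assumption, and the names `mean_per_cell`, `steps` and `history` are purely illustrative.

```python
from math import prod

def mean_per_cell(n_total, levels):
    """Average number of observations per cell when n_total observations
    are cross-classified by factors with the given numbers of levels."""
    return n_total / prod(levels)

N_TOTAL = 4856  # total number of observations quoted in the text

# (factor name, number of levels); the age and smoking level counts
# are an assumed reading of the text, not figures taken from the study.
steps = [
    ("cholesterol", 4),
    ("systolic pressure", 4),
    ("sex", 2),
    ("age", 3),       # assumption
    ("smoking", 4),   # assumption
]

levels = []
history = []  # (factor added, number of cells, mean observations per cell)
for name, k in steps:
    levels.append(k)
    history.append((name, prod(levels), mean_per_cell(N_TOTAL, levels)))

for name, cells, mean in history:
    print(f"+ {name:<18} cells = {cells:4d}   mean per cell = {mean:7.1f}")
```

Under these assumptions the mean falls from about 300 per cell for the two-dimensional classification to about 150 once the sexes are separated, and to roughly a dozen after age and smoking are added. These are averages only; real data fill the cells unevenly, so sparse and empty cells appear earlier than the means suggest, and every further factor divides the figure again.
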
Attempting to derive $p(x_1, \ldots, x_k)$ by multivariate analysis will be senseless.

Let us see how this problem was solved in the Framingham investigation. As far as it is possible to judge, its solution had an indisputable part, but the other part was absolutely illogical. This does not mean that it is in essence wrong, but that it possibly needs some specification. The first part can be expounded as follows.

When dealing with several variables, their only well-studied function is the linear function; there exists an entire pertinent science, linear algebra, which also partly studies functions of the second degree. It would therefore be expedient to represent the unknown probability $p(x_1, \ldots, x_k)$ by a linear function. This, however, is obviously impossible because probability changes from 0 to 1 whereas
