13.11.2012 Views

Introduction to Categorical Data Analysis


11.2 R. A. FISHER’S CONTRIBUTIONS

introduced degrees of freedom to characterize the family of chi-squared distributions. Fisher claimed that, for tests of independence in I × J tables, X² had df = (I − 1)(J − 1). By contrast, in 1900 Pearson had argued that, for any application of his statistic, df equalled the number of cells minus 1, or IJ − 1 for two-way tables. Fisher pointed out, however, that estimating hypothesized cell probabilities using estimated row and column probabilities resulted in an additional (I − 1) + (J − 1) constraints on the fitted values, thus affecting the distribution of X².
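The degrees-of-freedom argument can be checked with a little arithmetic: starting from Pearson's IJ − 1 and subtracting the (I − 1) + (J − 1) constraints leaves (I − 1)(J − 1). The Python sketch below is illustrative only. The function name `pearson_chi_squared` and the 2 × 3 table of counts are hypothetical and do not come from the text.

```python
def pearson_chi_squared(table):
    """Return (X2, df_fisher, df_pearson_1900) for an I x J table of counts."""
    n_rows = len(table)
    n_cols = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    n = sum(row_totals)

    x2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Fitted value under independence, built from the estimated
            # row and column proportions.
            expected = row_totals[i] * col_totals[j] / n
            x2 += (table[i][j] - expected) ** 2 / expected

    df_pearson_1900 = n_rows * n_cols - 1        # number of cells minus 1
    constraints = (n_rows - 1) + (n_cols - 1)    # estimated margins
    df_fisher = df_pearson_1900 - constraints    # equals (I - 1)(J - 1)
    return x2, df_fisher, df_pearson_1900


# Hypothetical 2 x 3 table of counts, for illustration only.
counts = [[10, 20, 30], [20, 20, 20]]
x2, df_fisher, df_pearson = pearson_chi_squared(counts)
```

For this made-up table the two formulas disagree (2 versus 5), which is exactly the gap Fisher and Pearson were arguing about.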

Not surprisingly, Pearson reacted critically to Fisher’s suggestion that his formula for df was incorrect. He stated:

I hold that such a view [Fisher’s] is entirely erroneous, and that the writer has done no service to the science of statistics by giving it broad-cast circulation in the pages of the Journal of the Royal Statistical Society. ... I trust my critic will pardon me for comparing him with Don Quixote tilting at the windmill; he must either destroy himself, or the whole theory of probable errors, for they are invariably based on using sample values for those of the sampled population unknown to us.

Pearson claimed that using row and column sample proportions to estimate unknown probabilities had negligible effect on large-sample distributions. Fisher was unable to get his rebuttal published by the Royal Statistical Society, and he ultimately resigned his membership.

Statisticians soon realized that Fisher was correct. For example, in an article in 1926, Fisher provided empirical evidence to support his claim. Using 11,688 2 × 2 tables randomly generated by Karl Pearson’s son, E. S. Pearson, he found a sample mean of X² for these tables of 1.00001; this is much closer to the 1.0 predicted by his formula for E(X²) of df = (I − 1)(J − 1) = 1 than Pearson’s IJ − 1 = 3. Fisher maintained much bitterness over Pearson’s reaction to his work. In a later volume of his collected works, writing about Pearson, he stated “If peevish intolerance of free opinion in others is a sign of senility, it is one which he had developed at an early age.”
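Fisher's empirical check can be imitated by simulation. The sketch below is not a reconstruction of E. S. Pearson's tables: the sample size per table, the row and column probabilities, the seed, and the function names are all assumptions chosen for illustration. It generates 2 × 2 tables under independence, computes X² for each, and averages the results.

```python
import random


def x2_for_2x2(n11, n12, n21, n22):
    """Pearson X2 for a 2 x 2 table, or None if any margin is zero."""
    n = n11 + n12 + n21 + n22
    r1, r2 = n11 + n12, n21 + n22
    c1, c2 = n11 + n21, n12 + n22
    if 0 in (r1, r2, c1, c2):
        return None  # X2 is undefined when a row or column is empty
    # Shortcut form of X2 for 2 x 2 tables.
    return n * (n11 * n22 - n12 * n21) ** 2 / (r1 * r2 * c1 * c2)


def simulate_mean_x2(num_tables=11688, n=100, p_row=0.4, p_col=0.3, seed=1926):
    """Average X2 over tables simulated with independent rows and columns.

    n, p_row, p_col and seed are assumed values, not taken from the text.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(num_tables):
        cells = [[0, 0], [0, 0]]
        for _ in range(n):
            # Row and column are drawn independently, so the null holds.
            i = 0 if rng.random() < p_row else 1
            j = 0 if rng.random() < p_col else 1
            cells[i][j] += 1
        x2 = x2_for_2x2(cells[0][0], cells[0][1], cells[1][0], cells[1][1])
        if x2 is not None:
            values.append(x2)
    return sum(values) / len(values)


mean_x2 = simulate_mean_x2()
print(f"mean X2 over simulated tables: {mean_x2:.3f}")
```

Under these assumptions the printed mean should fall near 1, the value implied by df = (I − 1)(J − 1), and well away from 3.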

Fisher, who was also a famed geneticist, made good use of CDA methods in his applied work. In one article, Fisher used Pearson’s goodness-of-fit test to test Mendel’s theories of natural inheritance. Calculating a summary P-value from the results of several of Mendel’s experiments, he obtained an unusually large value (P = 0.99996) for the right-tail probability of the reference chi-squared distribution. In other words, X² was so small that the fit seemed too good, leading Fisher in 1936 to comment “the general level of agreement between Mendel’s expectations and his reported results shows that it is closer than would be expected in the best of several thousand repetitions. ... I have no doubt that Mendel was deceived by a gardening assistant, who knew only too well what his principal expected from each trial made.” In a letter written at the time, he stated “Now, when data have been faked, I know very well how generally people underestimate the frequency of wide chance deviations, so that the tendency is always to make them agree too well with expectations.”
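To see why a right-tail probability close to 1 signals a fit that is "too good", it helps to compute the tail area directly. The sketch below uses the closed-form expression for the chi-squared right tail that holds when df is even. The function name `chi2_right_tail` and the X² and df values passed to it are hypothetical; they are not Mendel's data or Fisher's calculation.

```python
import math


def chi2_right_tail(x, df):
    """P(chi-squared with df degrees of freedom > x), for even df only.

    Uses the identity P(chi2_{2m} > x) = exp(-x/2) * sum_{k<m} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k  # build (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


# Hypothetical values: an X2 far below its df gives a tail area near 1,
# while an X2 far above its df gives a tail area near 0.
p_small_x2 = chi2_right_tail(40.0, 84)
p_large_x2 = chi2_right_tail(120.0, 84)
```

Since a chi-squared variable has mean equal to its df, an X² far below df sits deep in the left tail, so almost all of the distribution lies to its right.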

In 1934 the fifth edition of Fisher’s classic text Statistical Methods for Research Workers introduced “Fisher’s exact test” for 2 × 2 contingency tables. In his 1935 book The Design of Experiments, Fisher described the tea-tasting experiment
