Biostatistics

CHAPTER 13 NONPARAMETRIC AND DISTRIBUTION-FREE STATISTICS

13.7 THE KOLMOGOROV–SMIRNOV GOODNESS-OF-FIT TEST

When one wishes to know how well the distribution of sample data conforms to some theoretical distribution, a test known as the Kolmogorov–Smirnov goodness-of-fit test provides an alternative to the chi-square goodness-of-fit test discussed in Chapter 12. The test gets its name from A. Kolmogorov and N. V. Smirnov, two Russian mathematicians who introduced two closely related tests in the 1930s.

Kolmogorov's work (6) is concerned with the one-sample case as discussed here. Smirnov's work (7) deals with the case involving two samples, in which interest centers on testing the hypothesis that the distributions of the two parent populations are identical. The test for the first situation is frequently referred to as the Kolmogorov–Smirnov one-sample test. The test for the two-sample case, commonly referred to as the Kolmogorov–Smirnov two-sample test, will not be discussed here.

The Test Statistic  In using the Kolmogorov–Smirnov goodness-of-fit test, a comparison is made between some theoretical cumulative distribution function, F_T(x), and a sample cumulative distribution function, F_S(x). The sample is a random sample from a population with unknown cumulative distribution function F(x). It will be recalled (Section 4.2) that a cumulative distribution function gives the probability that X is equal to or less than a particular value, x. That is, by means of the sample cumulative distribution function, F_S(x), we may estimate P(X ≤ x). If there is close agreement between the theoretical and sample cumulative distributions, the hypothesis that the sample was drawn from the population with the specified cumulative distribution function, F_T(x), is supported. If, however, there is a discrepancy between the theoretical and observed cumulative distribution functions too great to be attributed to chance alone when H_0 is true, the hypothesis is rejected.

The difference between the theoretical cumulative distribution function, F_T(x), and the sample cumulative distribution function, F_S(x), is measured by the statistic D, which is the greatest vertical distance between F_S(x) and F_T(x). When a two-sided test is appropriate, that is, when the hypotheses are

    H_0: F(x) = F_T(x)  for all x from −∞ to +∞
    H_A: F(x) ≠ F_T(x)  for at least one x

the test statistic is

    D = sup_x |F_S(x) − F_T(x)|                                  (13.7.1)

which is read, "D equals the supremum (greatest), over all x, of the absolute value of the difference F_S(x) minus F_T(x)."

The null hypothesis is rejected at the α level of significance if the computed value of D exceeds the value shown in Appendix Table M for 1 − α (two-sided) and the sample size n.
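
As an illustration of how D in Equation 13.7.1 is computed, the following sketch evaluates the statistic for a simulated sample against a standard normal F_T(x). The simulated data and the cross-check against SciPy's `kstest` are additions for illustration, not part of the text; the critical values of Appendix Table M are not reproduced here, so SciPy's p-value stands in for the table lookup.

```python
import numpy as np
from scipy.stats import norm, kstest

# Simulated sample (illustrative only): 50 draws from a standard normal,
# tested against the hypothesized F_T(x) = standard normal CDF.
rng = np.random.default_rng(42)
x = np.sort(rng.normal(loc=0.0, scale=1.0, size=50))
n = len(x)

# Theoretical CDF evaluated at the ordered sample values x_(1), ..., x_(n).
ft = norm.cdf(x)

# The sample CDF F_S jumps from (i-1)/n to i/n at x_(i), so the supremum
# of |F_S(x) - F_T(x)| is attained at one of these jump points: check the
# distance just after each jump (i/n - F_T) and just before it (F_T - (i-1)/n).
i = np.arange(1, n + 1)
d_plus = np.max(i / n - ft)          # largest amount F_S exceeds F_T
d_minus = np.max(ft - (i - 1) / n)   # largest amount F_T exceeds F_S
D = max(d_plus, d_minus)

# Cross-check: SciPy's one-sample two-sided KS test computes the same D.
stat, pvalue = kstest(x, norm.cdf)
print(D, stat, pvalue)
```

Because the two-sided statistic only needs to be checked at the n jump points of F_S(x), the hand computation above agrees with SciPy's `statistic` to floating-point precision.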
