01.06.2013 Views

Statistical Methods in Medical Research 4ed


292 Analysing non-normal data

It is useful to apply a continuity correction of 1 unit, since the possible values of S turn out to be separated by an interval of 2 units. The standardized normal deviate is $22/11.18 = 1.97$, for which $P = 0.049$, rather close to the exact value.
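The arithmetic of the continuity-corrected normal approximation can be sketched as follows. This is only an illustration under stated assumptions: the sample size n = 10 is inferred from the 8 DF quoted for the t test, the value S = 23 is inferred from the corrected numerator of 22, and the standard error uses the usual no-ties formula var(S) = n(n − 1)(2n + 5)/18, which reproduces the 11.18 in the text. The function name `kendall_s_normal_approx` is hypothetical.

```python
import math

def kendall_s_normal_approx(s, n):
    """Continuity-corrected normal approximation for Kendall's S (no ties).

    Returns the standard error of S, the standardized deviate and the
    two-sided P value.
    """
    # Null variance of S in the absence of ties.
    sd = math.sqrt(n * (n - 1) * (2 * n + 5) / 18)
    # Continuity correction of 1 unit, since values of S are 2 units apart.
    z = (abs(s) - 1) / sd
    # Two-sided tail area of the standard normal distribution.
    p = math.erfc(z / math.sqrt(2))
    return sd, z, p

# Assumed inputs (not stated on this page): S = 23, n = 10.
sd, z, p = kendall_s_normal_approx(23, 10)
print(round(sd, 2), round(z, 2), round(p, 3))  # 11.18 1.97 0.049
```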

For this set of data Spearman's rank correlation coefficient is 0.68. The approximate t test from (7.19) gives $t = 2.66$ on 8 DF ($P = 0.029$). A more exact significance level, allowing for the discreteness of the distribution, is 0.031. Both these values are smaller, indicating a more significant result, than for Kendall's test. However, different measures of association which are not logically equivalent will necessarily achieve different levels of significance, and the reader will hardly need to be reminded that the method chosen for final presentation should not be selected merely as the one showing the most significance.

10.6 Permutation and Monte Carlo tests

In the preceding sections the significance levels of various distribution-free tests have been given by reference to tables specially constructed for the purpose. In most cases these tables only apply to small sample sizes; for larger samples, approximations to the appropriate null distribution are generally used (as given in Table 10.1). However, it is instructive to consider in more detail how the special tables have been constructed. This is because an explanation of the tables will illustrate that many of the distribution-free tests can be justified as instances of a more general class of tests known as permutation tests.

As an example consider the comparison of two independent samples of sizes 4 (sample 1) and 5 (sample 2) using the Wilcoxon rank sum test. As above, the sum of the ranks from sample 1 is denoted by $T_1$. Reference to Table A7 shows that the difference between the groups is significant at the 5% level if $T' \le 11$. As explained in the legend to Table A7, $T' = T_1$ if $T_1 \le E(T_1)$ and $T' = 2E(T_1) - T_1$ ($= 40 - T_1$ in this example) if $T_1 > E(T_1)$: $T'$ is considered because the test is two-sided. If, for simplicity, it is assumed that there are no ties, then the ranks of the four values from sample 1 will be four values chosen from $\{1, 2, 3, 4, 5, 6, 7, 8, 9\}$. Under the null hypothesis that both samples were drawn from the same population, any set of four of these values is equally likely. There are, therefore, $\binom{9}{4} = 126$ different sets of four ranks that could have come from sample 1 (see §3.6 for a definition of the binomial coefficient $\binom{n}{r}$). From Table 10.1 it can be seen that the only possible values for $T'$ are from 10 to $E(T_1) = 20$, that is, 11 distinct values. Consequently some values of $T'$ will arise from several sets of ranks. Indeed, because each set of ranks is equally likely, the value of $P(T' = t)$ under the null distribution is simply the number of sets of ranks giving $T' = t$, divided by 126.
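The counting argument above can be checked by brute-force enumeration. The following is a minimal sketch, assuming no ties and using the definition of $T'$ quoted from the legend to Table A7; the function name `t_prime_counts` is hypothetical and the code is an illustration rather than the method by which the published tables were produced.

```python
from collections import Counter
from itertools import combinations

def t_prime_counts(n1, n2):
    """Count the rank sets for sample 1 that give each value of T' (no ties)."""
    n = n1 + n2
    two_e = n1 * (n + 1)  # 2 * E(T1), kept as an integer
    counts = Counter()
    # Under the null hypothesis every set of n1 ranks is equally likely.
    for ranks in combinations(range(1, n + 1), n1):
        t1 = sum(ranks)
        # T' = T1 if T1 <= E(T1), otherwise T' = 2E(T1) - T1.
        t_prime = t1 if 2 * t1 <= two_e else two_e - t1
        counts[t_prime] += 1
    return counts

counts = t_prime_counts(4, 5)
total = sum(counts.values())
print(total)        # 126
print(len(counts))  # 11 distinct values of T'

# Cumulative null probabilities P(T' <= t), smallest t first.
cumulative = 0
for t in sorted(counts):
    cumulative += counts[t]
    print(t, counts[t], round(cumulative / total, 4))
```

Under these assumptions the enumeration gives $P(T' \le 11) = 4/126 \approx 0.032$ and $P(T' \le 12) = 8/126 \approx 0.063$, which is consistent with 11 being the tabulated 5% critical value.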

The smallest value of $T'$ is 10, which occurs only when sample 1 has the four smallest ranks, or the four largest ranks. The second smallest value of $T'$, 11, also can be obtained in only two ways, namely when the ranks for sample 1 are either $\{1, 2, 3, 5\}$ or $\{5, 7, 8, 9\}$. The next smallest value, 12, can occur in four ways,
